A number of experiments were used to evaluate the system. They fell into three categories: modular experiments to test the individual components that make up the system, end-to-end experiments to test the system as a whole, and user experiments to gain insight into how users felt about the system.

Modular Testing


An experiment was carried out to determine how accurately the words on a page can be segmented.
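The section does not say how segmentation success was scored. One common approach, shown here only as an illustrative sketch, is to compare predicted word bounding boxes with hand-annotated ground-truth boxes using intersection over union (IoU). The `(x0, y0, x1, y1)` box format, the 0.5 threshold, the greedy matching and the names `iou` and `segmentation_scores` are all assumptions, not the system's actual evaluation procedure.

```python
from typing import List, Optional, Tuple

# Hypothetical box format: (x0, y0, x1, y1) in pixels, with x1 > x0 and y1 > y0.
Box = Tuple[int, int, int, int]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def segmentation_scores(predicted: List[Box], ground_truth: List[Box],
                        threshold: float = 0.5) -> Tuple[float, float]:
    """Greedily match each predicted box to its best unmatched ground-truth box.

    A prediction counts as correct when its IoU with that box reaches the
    (assumed) threshold. Returns (precision, recall).
    """
    unmatched = list(ground_truth)
    hits = 0
    for box in predicted:
        best: Optional[Box] = max(unmatched, key=lambda g: iou(box, g), default=None)
        if best is not None and iou(box, best) >= threshold:
            hits += 1
            unmatched.remove(best)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(ground_truth) if ground_truth else 0.0
    return precision, recall


# Invented example: one word is found almost exactly, one prediction is spurious.
truth = [(0, 0, 10, 10), (20, 0, 30, 10)]
found = [(1, 0, 10, 10), (40, 0, 50, 10)]
print(segmentation_scores(found, truth))  # (0.5, 0.5)
```

Precision penalises spurious or merged boxes, while recall penalises words the segmenter missed, so reporting both gives a fuller picture than a single success rate.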


Features are used to prune the dataset so that more accurate matching can be performed on the words that are similar to the search key provided by the user. Experiments were conducted to determine which features were most important and to what extent features can be used to prune the dataset. In some experiments, the different features were assigned different importance weights. Similarly, other experiments allowed for feature correspondence variation, in which the features of words in the dataset were permitted to differ to some extent from the features of the search key. All of these experiments were conducted on a 20% subset of the total collection, made up of 2921 images.
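To make the roles of feature weights and feature correspondence variation concrete, the following minimal sketch prunes a dataset by scoring each word's weighted feature mismatch against the search key, where a per-feature tolerance lets values differ slightly and still count as corresponding. The feature names (`ascenders`, `descenders`, `width_bins`), the weights, tolerances and threshold, and the helper names are hypothetical; the section does not state which features or scoring rule the system uses.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Dict, List


@dataclass
class WordImage:
    image_id: str
    # Hypothetical integer features extracted from a segmented word image.
    features: Dict[str, int]


def mismatch_score(key: Dict[str, int], candidate: Dict[str, int],
                   weights: Dict[str, float], tolerance: Dict[str, int]) -> float:
    """Sum the weights of features whose values differ by more than their tolerance."""
    score = 0.0
    for name, weight in weights.items():
        if abs(key[name] - candidate[name]) > tolerance.get(name, 0):
            score += weight
    return score


def prune(dataset: List[WordImage], key: Dict[str, int], weights: Dict[str, float],
          tolerance: Dict[str, int], max_score: float) -> List[WordImage]:
    """Keep only the words whose weighted mismatch does not exceed max_score."""
    return [w for w in dataset
            if mismatch_score(key, w.features, weights, tolerance) <= max_score]


# Invented example values, chosen only to show the effect of weights and tolerance.
weights = {"ascenders": 2.0, "descenders": 1.0, "width_bins": 0.5}
tolerance = {"width_bins": 1}  # width may differ by one bin and still correspond
key = {"ascenders": 2, "descenders": 1, "width_bins": 5}
dataset = [
    WordImage("a", {"ascenders": 2, "descenders": 1, "width_bins": 5}),  # score 0.0
    WordImage("b", {"ascenders": 3, "descenders": 1, "width_bins": 5}),  # score 2.0
    WordImage("c", {"ascenders": 2, "descenders": 0, "width_bins": 6}),  # score 1.0
]
kept = prune(dataset, key, weights, tolerance, max_score=1.0)
print([w.image_id for w in kept])  # ['a', 'c']
```

Raising a feature's weight makes a mismatch on it more likely to prune a word, while widening its tolerance lets more words through; both trade pruning strength against the risk of discarding a true match.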

Accurate Matching

Accurate matching experiments were performed to determine which of the four implemented matching algorithms was the most accurate and which was the fastest. These experiments were also conducted on a 20% subset of the total collection.
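A comparison of accuracy and speed like the one described above can be organised as a small harness that runs every matcher over the same labelled queries and records top-1 accuracy and mean time per query. The two distance-based matchers below are placeholders standing in for the four algorithms, which this section does not name; the feature vectors, labels and function names are invented for illustration.

```python
import math
import time
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
Dataset = List[Tuple[str, Vector]]
# A matcher returns the label of the dataset entry it considers the best match.
Matcher = Callable[[Vector, Dataset], str]


def euclidean_matcher(query: Vector, dataset: Dataset) -> str:
    """Placeholder matcher: nearest neighbour under Euclidean distance."""
    return min(dataset, key=lambda item: math.dist(query, item[1]))[0]


def manhattan_matcher(query: Vector, dataset: Dataset) -> str:
    """Placeholder matcher: nearest neighbour under Manhattan distance."""
    return min(dataset, key=lambda item: sum(abs(a - b) for a, b in zip(query, item[1])))[0]


def evaluate(matchers: Dict[str, Matcher], queries: Dataset,
             dataset: Dataset) -> Dict[str, Tuple[float, float]]:
    """Return {name: (top-1 accuracy, mean seconds per query)} for each matcher."""
    results = {}
    for name, matcher in matchers.items():
        correct = 0
        start = time.perf_counter()
        for true_label, vector in queries:
            if matcher(vector, dataset) == true_label:
                correct += 1
        elapsed = time.perf_counter() - start
        results[name] = (correct / len(queries), elapsed / len(queries))
    return results


# Invented toy data: labelled feature vectors and noisy queries.
dataset = [("w1", (0.0, 0.0)), ("w2", (1.0, 1.0)), ("w3", (5.0, 0.0))]
queries = [("w1", (0.1, 0.2)), ("w2", (0.9, 1.2)), ("w3", (4.0, 0.5))]
results = evaluate({"euclidean": euclidean_matcher, "manhattan": manhattan_matcher},
                   queries, dataset)
for name, (accuracy, seconds) in results.items():
    print(f"{name}: accuracy={accuracy:.2f}, mean time={seconds * 1e6:.1f} us")
```

Running every matcher on identical queries and data keeps the accuracy and timing figures directly comparable; wall-clock timings will vary between runs and machines.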

End-to-end Testing

End-to-end testing was conducted to determine how the system functions as a whole. The optimal values found in the modular experiments were used as a starting point and then varied to observe their effect. Testing was also carried out to see how the system held up as the scale increased, using subsets of 20%, 40%, 60%, 80% and 100% of the collection.
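The scalability test can be sketched by drawing nested subsets of the collection, so that each larger subset contains the smaller ones, and timing the same search on each. Whether the original subsets were nested or drawn independently is not stated, so the nesting, the fixed random seed, the dummy image ids and the `search` callable are assumptions made for this illustration.

```python
import random
import time
from typing import Callable, Dict, List, Sequence, TypeVar

T = TypeVar("T")


def nested_subsets(items: Sequence[T], percentages: Sequence[int],
                   seed: int = 0) -> Dict[int, List[T]]:
    """Shuffle once with a fixed seed and take prefixes, so each subset contains the smaller ones."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    return {p: shuffled[: len(shuffled) * p // 100] for p in percentages}


def time_search(search: Callable[[T, List[T]], object], query: T,
                subsets: Dict[int, List[T]]) -> Dict[int, float]:
    """Time one run of `search` (a hypothetical stand-in for the full pipeline) per subset."""
    timings = {}
    for p, subset in subsets.items():
        start = time.perf_counter()
        search(query, subset)
        timings[p] = time.perf_counter() - start
    return timings


# Invented example: 1000 dummy image ids and a trivial linear-scan search.
collection = [f"img_{i:04d}" for i in range(1000)]
subsets = nested_subsets(collection, [20, 40, 60, 80, 100])
timings = time_search(lambda q, s: [x for x in s if x == q], "img_0001", subsets)
for p in sorted(timings):
    print(f"{p:3d}%: {len(subsets[p])} images, {timings[p] * 1e3:.3f} ms")
```

Nesting keeps the smaller subsets' images in every larger run, so differences in timing or accuracy reflect added scale rather than a different sample.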

User Testing

User testing was conducted to determine how users felt about the usefulness and usability of the system. Users were asked to use the system to translate a word and were then asked to comment on the way searching was conducted, the way results were displayed and how accurate they felt the system was. Finally, they were asked how they would improve the system.


