Reference Information
Title: "A Visual Approach to Sketched Symbol Recognition"
Authors: Tom Y. Ouyang and Randall Davis
Citation: "A Visual Approach to Sketched Symbol Recognition", Tom Y. Ouyang and Randall Davis, Proceedings of the 21st International Joint Conference on Artificial Intelligence, pp. 1463-1468, 2009.
Summary
This paper presents an approach to recognizing freehand sketched symbols that relies on the visual appearance of the sketch rather than on other features such as geometric properties. The authors motivate the method with the task of drawing diagrams, and they also present a new symbol classifier.
Basing recognition on the visual properties of a sketch makes the system less sensitive to differences in how the strokes were drawn. The recognizer was designed to handle both shapes and characters so that it would be useful for drawing diagrams. It builds on off-line handwriting recognizers because of their high level of performance, and it extends those off-line approaches with temporal properties of the sketch.
For the recognition approach, the sketch is first resampled, scaled, and translated to normalize the data. The sketch is then converted to feature images, with a total of five features calculated for every point. The features cover stroke orientation relative to a set of reference angles and stroke endpoints, and each feature is rendered onto its own grid, producing one feature image per feature. The next step increases noise tolerance by applying Gaussian smoothing and downsizing the images. Recognition is then performed by template matching with an image deformation model (IDM). Two optimizations improve performance: coarse candidate pruning, which uses nearest neighbors to eliminate some of the template candidates, and hierarchical clustering, which organizes the training examples into a hierarchy that limits the number of templates checked during recognition. Rotational invariance is obtained by matching the sketch at a series of rotations spanning 360 degrees.
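To make a few of these steps concrete, here is a minimal sketch in plain Python of the normalization, the orientation features, and an IDM-style distance. It is not the authors' code. The four reference angles, the linear fall-off of the orientation response, the one-pixel shift range, the 3x3 patch, and all function names are my own assumptions for illustration, and the distance works on a single feature image rather than all five.

```python
import math

def normalize(points):
    """Translate points to their centroid and scale so the mean squared
    distance from the centroid is 1 (an assumed normalization)."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    centered = [(x - cx, y - cy) for x, y in points]
    scale = math.sqrt(sum(x * x + y * y for x, y in centered) / n) or 1.0
    return [(x / scale, y / scale) for x, y in centered]

def orientation_features(angle_deg, refs=(0, 45, 90, 135)):
    """Response of each reference angle: 1 when the stroke direction is
    aligned with it, falling linearly to 0 at 45 degrees away.
    Directions are treated as undirected (0 and 180 are the same)."""
    out = []
    for ref in refs:
        diff = abs(angle_deg - ref) % 180
        diff = min(diff, 180 - diff)
        out.append(max(0.0, 1.0 - diff / 45.0))
    return out

def idm_distance(a, b, shift=1, patch=1):
    """IDM-style distance between two equally sized grids (lists of rows).
    For every pixel of b, the surrounding patch is compared with patches
    of a shifted by up to `shift` pixels, and the cheapest match counts."""
    h, w = len(a), len(a[0])

    def px(img, r, c):
        # Pixels outside the grid are treated as empty.
        return img[r][c] if 0 <= r < h and 0 <= c < w else 0.0

    total = 0.0
    for r in range(h):
        for c in range(w):
            best = math.inf
            for dr in range(-shift, shift + 1):
                for dc in range(-shift, shift + 1):
                    cost = 0.0
                    for pr in range(-patch, patch + 1):
                        for pc in range(-patch, patch + 1):
                            d = px(a, r + dr + pr, c + dc + pc) - px(b, r + pr, c + pc)
                            cost += d * d
                    best = min(best, cost)
            total += best
    return total
```

The point of the deformation step is that a stroke drawn one pixel off from the template costs nothing, while a larger displacement is still penalized, which is one way to read the claim that the method is less sensitive to differences in strokes.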
The approach was evaluated on three datasets, each from a different input domain: handwritten characters, shapes, and circuits. On each dataset, the proposed classifier was compared against four other classifiers using user-independent cross-validation. The proposed method outperformed the other algorithms in every domain tested, and the performance optimizations made a large difference in runtime.
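User-independent cross-validation means that no user's sketches appear in both the training and the test set of a fold. A minimal sketch of one way to build such folds is below. It assumes a leave-one-user-out scheme and samples stored as dictionaries with a "user" key; both are my assumptions, and the paper's exact fold structure may differ.

```python
from collections import defaultdict

def user_independent_folds(samples):
    """Yield (held_out_user, train, test) so that the held-out user's
    samples never appear in the training set of that fold."""
    by_user = defaultdict(list)
    for sample in samples:
        by_user[sample["user"]].append(sample)
    for user in sorted(by_user):
        test = by_user[user]
        train = [s for u, items in by_user.items() if u != user for s in items]
        yield user, train, test
```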
Thoughts
I think it's interesting that this paper set out to design an approach that can effectively recognize diagrams. Diagrams are used in a wide variety of fields, and the ability to recognize such sketches would be very useful. However, the mix of input types in a diagram (both shapes and text) presents an interesting challenge for recognition. Having focused mostly on recognizers that handle a single type of input so far, I found it interesting to read about one that can handle multiple types. This seems like a very useful property for recognizers in many different domains.
The attention paid to optimizations that improve runtime performance was useful. The method for obtaining rotation invariance was also interesting. A technique like this could be used to add rotation invariance to other approaches as well; however, the cost of creating multiple versions of the sketch at different rotations would have to be taken into account.
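To get a feel for that cost, here is a minimal sketch of a rotation search wrapped around an arbitrary distance function. The number of steps, the point-wise distance, and the function names are hypothetical choices of mine, not details from the paper.

```python
import math

def rotate(points, degrees):
    """Rotate (x, y) points counter-clockwise about the origin."""
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    return [(x * c - y * s, x * s + y * c) for x, y in points]

def point_distance(a, b):
    """Sum of squared distances between corresponding points."""
    return sum((ax - bx) ** 2 + (ay - by) ** 2 for (ax, ay), (bx, by) in zip(a, b))

def best_rotation(points, template, distance=point_distance, steps=8):
    """Try `steps` evenly spaced rotations spanning 360 degrees and
    return (smallest distance, angle in degrees that produced it)."""
    candidates = []
    for k in range(steps):
        angle = k * 360 / steps
        candidates.append((distance(rotate(points, angle), template), angle))
    return min(candidates)
```

Each template comparison now costs `steps` distance evaluations instead of one, so the pruning and clustering optimizations matter even more once rotation is handled this way.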