Tuesday, April 23, 2013

Reading Assignment: Envisioning Sketch Recognition: A Local Feature Based Approach to Recognizing Informal Sketches

Reference Information
Title: "Envisioning Sketch Recognition: A Local Feature Based Approach to Recognizing Informal Sketches"
Author: Michael Oltmans
Citation: "Envisioning Sketch Recognition: A Local Feature Based Approach to Recognizing Informal Sketches", Michael Oltmans, 2007.

Summary
This thesis presented a solution for recognizing free-drawn sketches using the visual properties of the sketch, applying techniques from the computer vision field to the recognition process. The system is designed to be interruption-free and does not provide feedback to the user; this means, however, that the recognition shortcuts available to feedback-driven systems cannot be used. Issues with signal noise, conceptual variation, overtracing, and segmentation were addressed, and the author argues they are handled effectively precisely because the system relies on visual properties.

The approach considers the classification of isolated shapes, where shapes are treated as collections of visual parts together with their conceptual variations. The visual parts are based on shape context features and are constructed by creating a circular "bullseye" centered on points placed every 5 pixels in the image. The bullseye consists of a central point surrounded by concentric rings, each divided into wedge-shaped sections that are rotated slightly off the horizontal. The rings grow logarithmically in size moving outward from the center, so that the inner rings capture fine detail while the outer rings provide broader context. Each region of the bullseye is treated as a point bucket that counts the number of sketch points residing within it. A vocabulary of parts is built from the parts of each training example. A support vector machine is trained on the training data; classification then measures the difference between a sketch and the vocabulary parts by calculating a match vector from the distances between bullseye histograms. An isolated shape recognizer assigns a set of labels to shape parts, while the full sketch processor identifies the components of a complete sketch.
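
The bullseye histogram described above can be sketched roughly as follows. The ring and wedge counts, the 64-pixel outer radius, and the bucket indexing here are illustrative assumptions, not the thesis's exact parameters:

```python
import math

def bullseye_histogram(points, center, n_rings=4, n_wedges=8, r_max=64.0):
    """Count ink points in a log-polar 'bullseye' around a center point.

    Inner rings are thin (fine detail near the center); outer rings are
    wide (coarse context), thanks to the logarithmic radial scale.
    """
    cx, cy = center
    hist = [[0] * n_wedges for _ in range(n_rings)]
    for (x, y) in points:
        dx, dy = x - cx, y - cy
        r = math.hypot(dx, dy)
        if r == 0 or r > r_max:
            continue  # skip the center point itself and far-away ink
        # logarithmic ring index: finer bins near the center
        ring = int(n_rings * math.log(r + 1) / math.log(r_max + 1))
        ring = min(ring, n_rings - 1)
        # angular wedge index
        theta = math.atan2(dy, dx) % (2 * math.pi)
        wedge = int(n_wedges * theta / (2 * math.pi))
        wedge = min(wedge, n_wedges - 1)
        hist[ring][wedge] += 1
    return hist
```

Comparing two such histograms (e.g., by histogram distance) is then what produces the entries of the match vector.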

In addition to visual information, temporal information about the stroke is used to determine the stroke's direction. Features are made rotationally invariant by aligning them horizontally and calculating two histograms for each bullseye, one for the original and one for its reverse.

The approach was evaluated by testing the isolated shape recognizer and the full sketch processor separately on sets of circuit symbols and shapes. The classifier was compared to one based on Zernike moments, and it was determined that the proposed approach outperformed it.

Thoughts
The approach presented within this paper was very intriguing, and raised some interesting questions with regards to sketch recognition. The use of visual properties for classification as opposed to relying solely on stroke information or geometric properties is one of the more prominent features of this approach. It's like using a mix of offline and online properties, such as essentially creating point buckets for comparisons while using temporal data to determine the direction of the strokes. The fact that it helps to eliminate noise is very useful, and it would be interesting to see if this method performs better in some domains rather than others. For example, it was mentioned within the paper that different match vector approaches work better for various domains, such as photographs vs. sketches.

Also, there were some questions regarding the method itself. For example, the details regarding some areas of the paper seemed to be a bit vague, such as the resampling processes that were used. In addition, it would be interesting to compare the runtime of this system against other similar systems. It seems as though creating a new shape context feature every 5 pixels along the sketch strokes could result in an excessive amount of computations that might bog down the classifications, so it could be useful to have some data with regards to this.

Reading Assignment: A Visual Approach to Sketched Symbol Recognition

Reference Information
Title: "A Visual Approach to Sketched Symbol Recognition"
Authors: Tom Y. Ouyang and Randall Davis
Citation: "A Visual Approach to Sketched Symbol Recognition", Tom Y. Ouyang and Randall Davis, Proceedings of the 21st International Joint Conference on Artificial Intelligence, pp. 1463-1468, 2009.

Summary
This paper presented an approach to recognizing freehand drawings based on the visual appearance of the sketch rather than on other features such as geometric properties. The method is framed specifically around the task of drawing diagrams. A new symbol classifier was also presented.

By basing recognition on visual properties of a sketch, the system becomes less sensitive to differences in strokes. The recognizer was designed to handle both shapes and characters in order to be useful for the task of drawing diagrams. In addition, the recognizer was based on off-line handwriting recognizers due to their high level of performance, with temporal properties of the sketches added on top of the off-line approach.

For the recognition approach, the sketch is first resampled, scaled, and translated to normalize the data. The strokes are then converted to feature images, with a total of five features calculated for every point. The features include orientations relative to reference angles plus endpoint locations, and each feature is spanned across a grid representing a feature image. The next step increases noise tolerance by applying Gaussian smoothing and downsampling the images. Recognition is then performed with template matching using an image deformation model (IDM). Performance was optimized by using coarse candidate pruning to eliminate some template candidates before the nearest-neighbor search, and by hierarchical clustering to build a hierarchy of training examples that limits the number of templates checked during recognition. Rotational invariance is achieved by generating versions of the sketch at a series of rotations spanning 360 degrees.
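
The smoothing and downsampling step can be illustrated with a small sketch. The 3x3 kernel and the block-max downsampling are assumptions for illustration; the paper's exact kernel width and downsampling scheme may differ:

```python
def gaussian_smooth(img, kernel=((1, 2, 1), (2, 4, 2), (1, 2, 1))):
    """Smooth a 2-D feature image with a small Gaussian-like kernel.

    Borders are handled by renormalizing over the in-range neighbors.
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc, norm = 0.0, 0
            for ky in (-1, 0, 1):
                for kx in (-1, 0, 1):
                    yy, xx = y + ky, x + kx
                    if 0 <= yy < h and 0 <= xx < w:
                        wgt = kernel[ky + 1][kx + 1]
                        acc += wgt * img[yy][xx]
                        norm += wgt
            out[y][x] = acc / norm
    return out

def downsample(img, factor=2):
    """Downsample by taking the maximum over factor x factor blocks
    (block-max is an assumption; it preserves peak feature responses)."""
    h, w = len(img), len(img[0])
    return [[max(img[y + dy][x + dx]
                 for dy in range(factor) for dx in range(factor))
             for x in range(0, w, factor)]
            for y in range(0, h, factor)]
```

Together these two steps make the feature images tolerant of small shifts in the ink before template matching.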

Evaluation was performed by testing the approach against three different datasets, each with a different domain of input: handwritten characters, shapes, and circuits. Each of these was tested with four different classifiers in addition to the one proposed in this paper with user-independent cross validation. It was determined that the method presented in this paper performed better than the other algorithms that it was tested against for each of the domains that were tested and that the performance optimizations made a large difference in runtime.

Thoughts
I think it's interesting that this paper was designing an approach that is able to effectively recognize diagrams. Diagrams are used in a wide variety of fields, and the ability to recognize such sketches would be very useful. However, the mix of types of input associated with a diagram (both shapes and text) presents an interesting challenge for recognition. Having focused mostly on recognizers that use a single type of input so far, it was interesting to read about one that could handle multiple types of input. This seems as though it would be a very useful property to have for recognizers for many different domains.

The consideration of optimizations in order to improve runtime performance was a useful consideration. In addition, the method for obtaining rotation invariance was interesting. It could be useful to use a technique like this for other approaches, as well, in order to obtain rotation invariance; however, the performance consideration of creating multiple versions of the sketch at different rotations would have to be considered.

Thursday, April 11, 2013

Reading Assignment: ShortStraw: A Simple and Effective Corner Finder for Polylines


Reference Information
Title: "ShortStraw: A Simple and Effective Corner Finder for Polylines"
Authors: A. Wolin, B. Eoff, and T. Hammond
Citation: "ShortStraw: A Simple and Effective Corner Finder for Polylines", A. Wolin, B. Eoff, T. Hammond, Proceedings of the Fifth Eurographics Conference on Sketch-Based Interfaces and Modeling, pp. 33-40, 2008.

Summary
ShortStraw is a system created with the goal of being an easy-to-implement, accurate corner finding algorithm for freehand sketches using the polyline corner finding method. It is designed with the intention that it can be used for educational purposes, such that beginning-level programming students can implement it without too much difficulty. Polyline corner finders find the corners of a gesture by finding the minimum set of points within the gesture that can be used as splitting points to retrieve only a set of lines from the stroke.

ShortStraw is a bottom-up algorithm that first resamples the points to be equidistant, calculates a straw distance for each resampled point (the Euclidean distance between the points a constant window before and after it), and then identifies corners as the points with locally minimal straw distances. Then, the found corners are processed with a top-down approach in order to account for missing corners and false positives. This processing includes a line test on each pair of consecutive corners, dividing the straight-line distance by the path distance to see if it is within a particular threshold. If not, a missing corner is searched for within a window midway between the two points by using the minimum straw distance. False positives are removed by performing a collinearity check.
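
The bottom-up pass can be sketched in a few lines. The resampling step and the top-down repair pass are omitted here; the window size of 3 and the 0.95-times-median threshold follow my reading of the paper:

```python
import math

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def bottom_up_corners(pts, window=3, median_factor=0.95):
    """Bottom-up pass of ShortStraw on already-resampled points.

    For each interior point i, the 'straw' is the distance between the
    points `window` before and after it; unusually short straws indicate
    corners, and within each short-straw run the local minimum is taken.
    """
    n = len(pts)
    straws = {}
    for i in range(window, n - window):
        straws[i] = dist(pts[i - window], pts[i + window])
    ordered = sorted(straws.values())
    threshold = median_factor * ordered[len(ordered) // 2]
    corners = [0]  # stroke endpoints always count as corners
    i = window
    while i < n - window:
        if straws[i] < threshold:
            best = i
            while i < n - window and straws[i] < threshold:
                if straws[i] < straws[best]:
                    best = i
                i += 1
            corners.append(best)
        else:
            i += 1
    corners.append(n - 1)
    return corners
```

On an L-shaped polyline this picks out the bend: the straw at the corner cuts across the angle and is therefore shorter than the straws along the straight segments.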

Evaluation of ShortStraw occurred by testing the system on sketches drawn by students and comparing its accuracy to that of Sezgin's and Kim and Kim's corner finding algorithms. Correct corners found accuracy (number of correct corners found / number of correct corners a human would perceive) and all-or-nothing accuracy (number of correctly segmented strokes / total number of strokes) were both used. It was postulated that all-or-nothing accuracy is the more important measure, since correct corners found can be inflated by simply returning every point. It was determined that the all-or-nothing accuracy of ShortStraw was significantly better than that of the other corner finding algorithms it was tested against. Ease of implementation was evaluated by having an undergraduate student implement the system, and it was determined that it was indeed simple to implement. In addition, it was determined that the algorithm runs quickly.

Thoughts
I thought that it was a great idea to aim to create a corner finding algorithm that is easy to understand and easy to implement for educational purposes. It makes the algorithm easy to understand for those reading the paper, and provides a means for introducing newer programmers to the field. I thought that it was even better, however, that the ShortStraw design was actually tested by having a student implement it. This provided a means to actually evaluate that particular goal of the design, instead of just jumping to conclusions and declaring that the algorithm is short, therefore it must be easy to understand.

In addition, I found the discussion of different accuracy measures to be very interesting. Instead of computing a single accuracy measure to report to the user, different methods of accuracy measure were provided, each with their own merits. Then, the all-or-nothing accuracy was determined to be the most important measurement. This is something to consider when reading other papers. Instead of just accepting a single measurement as the definitive accuracy measurement, this paper made it apparent that it should be taken into account what kind of accuracy measurement is being used and whether it is actually the best measurement for the given situation.

It was also useful to learn about polyline corner finding, since we have previously learned about Sezgin's temporal means of corner finding. This is useful as it adds to our knowledge another method for corner finding that may be more useful in some situations.

Tuesday, April 9, 2013

Reading Assignment: Sketch Recognition Algorithms for Comparing Complex and Unpredictable Shapes

Reference Information
Title: "Sketch Recognition Algorithms for Comparing Complex and Unpredictable Shapes"
Authors: Martin Field, Stephanie Valentine, Julie Linsey, Tracy Hammond
Citation: "Sketch Recognition Algorithms for Comparing Complex and Unpredictable Shapes", Martin Field, Stephanie Valentine, Julie Linsey, Tracy Hammond, Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pp. 2436-2441, 2011

Summary
This paper discussed applying sketch recognition techniques to the field of engineering education. Specifically, a system called Mechanix was created to recognize freehand sketches of engineering problems for educational purposes, such as completing homework and exams. Feedback is provided to students on their sketches by comparing them against a solution that the class grader/professor also drew using Mechanix. One of the benefits of Mechanix is that it provides an environment for sketching engineering problems that is very similar to the traditional pen and paper approach, offering a natural means of completing assignments that is easier to grade and provides instant feedback.

The algorithms used for Mechanix are designed to identify bodies (in free-body diagrams) and trusses, as well as to use recognition to compare them to the solution sketches provided by the grader. The system uses an online recognition process that runs after each separate stroke is sketched. Identification of bodies occurs by using the pre-existing PaleoSketch system to find any strokes that form a closed shape, which constitutes a body. Then, the body is compared for similarity to the single template solution provided by the grader by using a combination of the Hausdorff distance, the modified Hausdorff distance, and the Tanimoto coefficient. Trusses are recognized as a collection of polygons that share sides, and are identified by using PaleoSketch to build a connection graph to find at least two polygons that share an edge. The identified truss is then compared to the single template provided by the grader, using the connection graph and the properties of the sketch.
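
The three similarity measures used for body comparison can each be written compactly. This is a point-set sketch only; Mechanix's actual normalization and rasterization steps, and how the three scores are combined, are omitted:

```python
import math

def _d(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def directed_hausdorff(A, B):
    """Max over points of A of the distance to the nearest point of B."""
    return max(min(_d(a, b) for b in B) for a in A)

def hausdorff(A, B):
    """Classic (symmetric) Hausdorff distance between two point sets."""
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))

def modified_hausdorff(A, B):
    """Replace the outer max with a mean, which is less outlier-sensitive."""
    fwd = sum(min(_d(a, b) for b in B) for a in A) / len(A)
    rev = sum(min(_d(b, a) for a in A) for b in B) / len(B)
    return max(fwd, rev)

def tanimoto(A, B):
    """Tanimoto coefficient on point sets: overlap divided by union."""
    sa, sb = set(A), set(B)
    return len(sa & sb) / len(sa | sb)
```

The modified Hausdorff distance's averaging is what makes it forgiving of a single stray stroke, while the Tanimoto coefficient rewards overall ink overlap.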

Mechanix was evaluated via user testing and with use during a single section of the ENGR 111 class. The students that used the system had positive feedback and it was determined that the recognition is relatively accurate with little possibility of being able to trick the system.

Thoughts
Mechanix is an application that has very practical purposes of improving both the educational experience for engineering students and the grading process for engineering graders. It was very refreshing to read about such a useful application that I have some prior experience with, and that the sketch recognition techniques that we've been learning can be applied to. Also, it was very useful that the prior work section of this paper conveniently mentioned how Mechanix improves upon or uses each of the prior systems, instead of simply discussing those systems.

In addition, the fact that part of the evaluation was able to occur by having the system used within an actual engineering class was very interesting and potentially quite beneficial. Since Mechanix was designed for that purpose, being able to test its usage within the same domain that it is intended to be used in, by potential actual users of the system, is a great opportunity to receive more accurate evaluation feedback.

Monday, April 8, 2013

Reading Assignment: A Domain-Independent System for Sketch Recognition

Reference Information
Title: "A Domain-Independent System for Sketch Recognition"
Authors: Bo Yu and Shijie Cai
Citation: "A Domain-Independent System for Sketch Recognition", Bo Yu and Shijie Cai, Proceedings of the 1st International Conference on Computer Graphics and Interactive Techniques in Australasia and South East Asia, pp. 141-146, 2003.

Summary
This paper discussed a system for sketch recognition that accepts freehand sketches as input and performs recognition by using stroke approximation with low-level geometric features, without the use of domain-specific knowledge. The output of the recognizer is a hierarchical structure of recognition information, designed to be easily used by the high-level applications that the system can be embedded within. The system was designed with a list of ideal attributes for a sketch recognition system in mind. These attributes include the ability to draw naturally, consistent recognition, an understanding of hierarchical relations, prediction of the sketch during drawing, an efficient and easy-to-use interface, and easy integration into other applications.

The system has two main stages: stroke approximation and post-processing. Stroke approximation occurs during sketching, taking each stroke as input as it is completed. It uses recognition techniques to approximate the shape of the stroke against a set of low-level geometric shapes (lines, arcs, circles, ellipses, and helices). Stroke approximation includes vertex detection, line approximation with feature-area verification, curve approximation, and the handling of shapes with intersecting features. The post-processing stage is performed after the entire sketch is complete and uses the data from the stroke approximation phase to establish shape relations and complete recognition. It includes relation retrieval, cleanup (removing useless stroke elements and merging strokes as necessary), and finally, object recognition to identify the basic objects that may occur within the sketch.
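
The feature-area idea behind the line test can be sketched as follows. The threshold value and the exact area computation here are illustrative assumptions, not the paper's:

```python
import math

def feature_area_line_test(pts, threshold=0.3):
    """Feature-area test for a line fit (a sketch of the idea).

    The feature area is the area between the stroke and the chord
    joining its endpoints, accumulated as a sum of triangles; the
    stroke is accepted as a line when area / chord_length (an average
    deviation from the chord) falls below a threshold.
    """
    (x0, y0), (x1, y1) = pts[0], pts[-1]
    chord = math.hypot(x1 - x0, y1 - y0)
    if chord == 0:
        return False  # degenerate stroke, cannot be a line
    area = 0.0
    for i in range(len(pts) - 1):
        # triangle formed by two consecutive stroke points and the chord start
        ax, ay = pts[i][0] - x0, pts[i][1] - y0
        bx, by = pts[i + 1][0] - x0, pts[i + 1][1] - y0
        area += abs(ax * by - ay * bx) / 2.0
    return (area / chord) < threshold
```

A straight stroke yields zero area and passes; an arched stroke accumulates area between itself and the chord and fails, pushing it toward the curve approximation path instead.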

The user interface itself is designed to allow for creation, deletion, and modification of sketches. The modification feature is the most emphasized portion of the user interface. An evaluation was conducted by testing the system with users. It was determined that it was easy to use, provides useful information as output, and can be easily integrated into other applications as intended.

Thoughts
The idea of creating an easy-to-use recognition system for freehand drawing that can be easily integrated into other applications seems like it could be very useful. In addition, the attributes of an ideal, practical recognition system that were put forth could be very useful with regards to the future design of sketch recognition systems; however, I was curious as to where these properties came from. There is no source provided for the properties, so have they originated from research studies that have been conducted or are they simply the authors' opinions of the ideal attributes that a sketch recognition system should have? Since these are guidelines that the system is compared against throughout the paper, this could be an important distinction to make.

It was interesting to learn about methods for recognizing freehand sketches, as we have mostly focused on single-stroke sketches thus far. Also, the methods for vertex approximation mentioned the noise that is produced when using timing data, which is an interesting result to apply to the information that we learned previously of Sezgin's corner finding methods that used timing data.

Tuesday, March 26, 2013

Reading Assignment: A Few Useful Things to Know About Machine Learning

Reference Information
Title: A Few Useful Things to Know About Machine Learning
Author: Pedro Domingos
Citation: "A Few Useful Things to Know About Machine Learning", Pedro Domingos, Communications of the ACM, pp. 78-87, 2012.

Summary
This paper provided a broad overview of machine learning techniques. Machine learning consists of using learning algorithms with large amounts of training data to generalize from the observed data in order to classify new, unknown data. The focus of this paper was on classification algorithms, along with some key lessons from the field of machine learning regarding classification. A learning algorithm involves three components: a formal representation, an objective function for evaluation, and an optimization method. It should be noted that testing and training data should always be kept separate for more accurate evaluation statistics; the resulting loss of training data can be remedied by a few different solutions, including cross validation.
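
As a quick illustration of how cross validation keeps test data separate while still using all of the data, a minimal k-fold splitter might look like this (a generic sketch, not code from the article):

```python
def k_fold_splits(n_samples, k=5):
    """Yield (train_indices, test_indices) pairs for k-fold cross validation.

    Each sample appears in a test fold exactly once, so every evaluation
    is on data the model did not train on, while across the k rounds
    all of the data still contributes to training.
    """
    # distribute any remainder so fold sizes differ by at most one
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size
```

Averaging the classifier's score over the k test folds gives an accuracy estimate that never reuses training data for evaluation.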

Machine learning is all about generalizing from the data that is given. A learner uses not only training data, but extra assumptions or knowledge about the domain as well; by the "no free lunch" theorems, no learner can beat random guessing without such assumptions. Some problems with generalization include overfitting, underfitting, multiple testing, and the curse of dimensionality (generalizing becomes more difficult as more features are added to the input). Some commonly misread theoretical guarantees were also mentioned: bounds on the number of examples needed are not exact guarantees, and infinite data does not necessarily lead to a correct classifier. The importance of choosing the correct features was reiterated. While relevant, independent features are the most useful, it is difficult to know which features these are when simply presented with the raw data of an input, since the input data tends to be observational rather than experimental. In addition, it is more important to have large amounts of data for training than to have a more complex learning algorithm, though scalability must be taken into account when using lots of data. As for choosing the "best" learning algorithm, it really depends on the particular domain and application for which it is being used.

Finally, combining learning algorithms was discussed by using model ensembles to create more accurate learners from running the data through multiple classifiers. Some combination techniques include boosting, bagging, and stacking.

Thoughts
I have previously taken a course in machine learning, so I knew most of the information that was presented within this article. However, it was a good refresher for the information that I did know, and it provided some very good information regarding lessons and myths in machine learning that I did not previously know about.

For example, it was very interesting to learn about some of the details of possible machine learning problems, such as overfitting, multiple testing, and the curse of dimensionality. It was very useful that possible (and best) solutions were presented for each of these problems. Overall, the article was written in a very understandable format that made for an enjoyable and informative read.

Reading Assignment: The Word-Gesture Keyboard: Reimagining Keyboard Interaction

Reference Information
Title: The Word-Gesture Keyboard: Reimagining Keyboard Interaction
Authors: Shumin Zhai and Per Ola Kristensson
Citation: "The Word-Gesture Keyboard: Reimagining Keyboard Interaction", Shumin Zhai and Per Ola Kristensson, Communications of the ACM, pp. 91-101, 2012.

Summary
This paper discussed the word-gesture keyboard, an alternative method of text input for touch-screen keyboards, such as those on mobile devices. The interaction consists of swiping a single finger across the soft keyboard, running the finger through each letter of a word in one fluid motion. It is designed to be a faster method of text input than traditional keyboard typing, and because it uses the same keyboard layout, it is intended to be easy to learn while allowing usage speed to improve with practice. The faster speed of the word-gesture system stems from the fact that only a single continuous motion with one finger is needed to create a word, and spaces are automatically inserted between words. Ease of use comes from the fact that users are already familiar with the keyboard layout, gestures come more naturally than traditional keyboard usage, and no gestures need to be memorized since the user simply follows the pattern of keys visible on the screen.

The shape of the gesture that is created with this motion is compared against a set of pre-known gestures that are already associated with words in order to perform gesture recognition. The ability to enter commands (such as copy and paste) was added along with the ability to type words. Indexing and pruning are used to make the searching of the known gestures feasible for mobile devices.
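
A much-simplified version of this template matching might look like the following. The resampling length, the mean point-to-point distance score, and the `best_word`/`templates` names are all assumptions for illustration; the real system's matching (and its indexing and pruning) is considerably more sophisticated:

```python
import math

def resample(path, n=32):
    """Resample a gesture path to n equidistant points along its length."""
    if len(path) == 1:
        return [path[0]] * n
    seglens = [math.hypot(path[i + 1][0] - path[i][0],
                          path[i + 1][1] - path[i][1])
               for i in range(len(path) - 1)]
    total = sum(seglens)
    out, acc, i = [path[0]], 0.0, 0
    for j in range(1, n):
        target = total * j / (n - 1)
        # advance to the segment containing the target arc length
        while i < len(seglens) - 1 and acc + seglens[i] < target:
            acc += seglens[i]
            i += 1
        t = 0.0 if seglens[i] == 0 else (target - acc) / seglens[i]
        x = path[i][0] + t * (path[i + 1][0] - path[i][0])
        y = path[i][1] + t * (path[i + 1][1] - path[i][1])
        out.append((x, y))
    return out

def best_word(gesture, templates, n=32):
    """Pick the template word whose ideal path (through its letters' key
    centers) is closest to the gesture by mean point-to-point distance."""
    g = resample(gesture, n)
    def score(tpl):
        t = resample(tpl, n)
        return sum(math.hypot(gx - tx, gy - ty)
                   for (gx, gy), (tx, ty) in zip(g, t)) / n
    return min(templates, key=lambda w: score(templates[w]))
```

Scoring every word in a large lexicon this way is what makes the indexing and pruning mentioned above necessary on mobile hardware.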

The word-gesture method was tested through experimentation as well as by releasing it as a mobile application in order to receive feedback from real users. The experiments included testing users' ability to memorize gesture shapes and the speed of their initial use. Reviews from users of the released application were used to evaluate the general perception of the system. One of the major contributions of the word-gesture keyboard is that by releasing it as an actual product, the idea became more widespread, allowing for the proliferation of this new text input technique.

Thoughts
It was really great to be able to read this paper about a recent system that is now in fairly widespread use in daily life. I have seen variations of this system on my own phone and therefore can relate my own experience with it to the information gained from reading this paper. Many of the papers that we read discuss systems that are not well known; however, the contribution of this paper is known by many now. This made it very interesting to learn how this method that I was previously aware of actually works.

I think that it is a very interesting concept to introduce a new method like this that relies mostly on previously-known concepts such as the physical keyboard. The fact that the user is not required to learn any new gestures, but simply to apply a new type of motion to a well-known system, is a very intriguing idea. It makes one start to think about what other types of new interaction can be applied to existing systems in order to improve upon their usage.

The focus on human psychology that was used in the creation and evaluation of the method was great to read about, since it was discussed why this method actually works. In addition, it was great to see that the main evaluation of the system occurred by putting it into real-world usage and obtaining actual reviews of the product, instead of simply running lab experiments to try and approximate real-world usage.