Reference Information
Title: "Envisioning Sketch Recognition: A Local Feature Based Approach to Recognizing Informal Sketches"
Author: Michael Oltmans
Citation: "Envisioning Sketch Recognition: A Local Feature Based Approach to Recognizing Informal Sketches", Michael Oltmans, 2007.
Summary
This thesis presented a solution for recognizing free-drawn sketches using the visual properties of the sketch, applying techniques from the computer vision field to the recognition process. The system is designed to be interruption-free, providing no feedback to the user; however, this means that the recognition shortcuts that come with feedback systems cannot be used. Issues with signal noise, conceptual variation, overtracing, and segmentation were addressed, and the use of visual properties is believed to handle them effectively.
The approach used here considers the classification of isolated shapes, in which shapes are treated as collections of visual parts and their associated conceptual variations. The visual parts are based on shape context features and are constructed by creating a circular "bullseye" centered around points every 5 pixels in the image. The bullseye consists of a central point with concentric rings around it, cut into wedge-shaped sections that are rotated slightly from the horizontal. The rings increase in size logarithmically as they move out from the center, so that the inner rings provide more detailed information and the outer rings provide more context than detail. Each region of the bullseye is treated as a point bucket, counting the number of sketch points that reside within it. A vocabulary of parts is created from the parts of each training example. A support vector machine is trained on the training data; classification then occurs by measuring the difference between a sketch and the vocabulary parts, calculating a match vector from the distances between the bullseye histograms. An isolated shape recognizer assigns a set of labels to shape parts, while the full sketch processor identifies the components of a full sketch.
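The bullseye construction amounts to a log-polar point histogram. A minimal sketch in Python (the ring count, wedge count, and maximum radius here are illustrative assumptions, not the thesis's actual parameters):

```python
import math

def bullseye_histogram(points, center, n_rings=4, n_wedges=8, r_max=64.0):
    """Count stroke points falling in each log-spaced ring and angular wedge
    around `center`, in the style of a shape-context "bullseye" descriptor."""
    cx, cy = center
    hist = [[0] * n_wedges for _ in range(n_rings)]
    # Ring boundaries double in radius, so inner rings are finer than outer ones.
    edges = [r_max * (2 ** (i - n_rings + 1)) for i in range(n_rings)]
    for (x, y) in points:
        dx, dy = x - cx, y - cy
        r = math.hypot(dx, dy)
        if r == 0 or r > r_max:
            continue  # skip the center point itself and points outside the bullseye
        ring = next(i for i, e in enumerate(edges) if r <= e)
        theta = math.atan2(dy, dx) % (2 * math.pi)
        wedge = min(n_wedges - 1, int(theta / (2 * math.pi) * n_wedges))
        hist[ring][wedge] += 1
    return hist
```

Each descriptor is then just the flattened grid of bucket counts, which makes two bullseyes comparable with any histogram distance.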
In addition to visual information, temporal information is also used to determine the direction of each stroke. Features are made rotationally invariant by aligning them horizontally and calculating two different histograms for each bullseye, one for the original direction and one for its reverse.
The approach was evaluated by testing the isolated shape recognizer and the full sketch processor separately on sets of circuit symbols and shapes. The classifier was compared to one based on Zernike moments, and it was determined that the proposed approach outperformed it.
Thoughts
The approach presented within this paper was very intriguing, and raised some interesting questions with regards to sketch recognition. The use of visual properties for classification as opposed to relying solely on stroke information or geometric properties is one of the more prominent features of this approach. It's like using a mix of offline and online properties, such as essentially creating point buckets for comparisons while using temporal data to determine the direction of the strokes. The fact that it helps to eliminate noise is very useful, and it would be interesting to see if this method performs better in some domains rather than others. For example, it was mentioned within the paper that different match vector approaches work better for various domains, such as photographs vs. sketches.
Also, there were some questions regarding the method itself. For example, the details regarding some areas of the paper seemed to be a bit vague, such as the resampling processes that were used. In addition, it would be interesting to compare the runtime of this system against other similar systems. It seems as though creating a new shape context feature every 5 pixels along the sketch strokes could result in an excessive amount of computations that might bog down the classifications, so it could be useful to have some data with regards to this.
Tuesday, April 23, 2013
Reading Assignment: A Visual Approach to Sketched Symbol Recognition
Reference Information
Title: "A Visual Approach to Sketched Symbol Recognition"
Authors: Tom Y. Ouyang and Randall Davis
Citation: "A Visual Approach to Sketched Symbol Recognition", Tom Y. Ouyang and Randall Davis, Proceedings of the 21st International Joint Conference on Artificial Intelligence, pp. 1463-1468, 2009.
Summary
This paper presented an approach to symbol recognition for recognizing freehand drawings based on the visual appearance of the sketch rather than on other features such as geometric properties. The method was described specifically in the context of drawing diagrams. A new symbol classifier was also presented.
By basing recognition on visual properties of a sketch, the system becomes less sensitive to differences in strokes. The recognizer was designed to handle both shapes and characters in order to be useful for the task of drawing diagrams. In addition, the recognizer was based on off-line handwriting recognizers due to their high level of performance, with temporal properties of sketches added to those off-line approaches.
For the recognition approach, the sketch is first resampled, scaled, and translated to normalize the data. Then, the strokes are converted to feature images, with a total of five features calculated for every point. The features include orientations relative to reference angles as well as endpoints, and each feature is spread across a feature grid that represents a feature image. The next step increases noise tolerance by applying Gaussian smoothing and downsampling the images. Recognition is then performed with template matching using an image deformation model (IDM). Performance was optimized with coarse candidate pruning, which eliminates some template candidates via nearest neighbors, and hierarchical clustering, which builds a hierarchy of training examples used to limit the number of templates checked during recognition. Rotational invariance is maintained by rotating the sketch to a series of different angles spanning 360 degrees.
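The per-point feature step can be sketched roughly as follows. The four reference angles match the paper's description, but the linear falloff and the 45-degree support are my assumptions for illustration, not the paper's exact formulation:

```python
import math

def point_features(stroke):
    """For each point of a resampled stroke, compute five per-point values:
    the similarity of the local stroke direction to four reference angles
    (0, 45, 90, 135 degrees) plus an endpoint indicator.  Each of the five
    values would then be spread onto its own feature grid."""
    refs = [0.0, 45.0, 90.0, 135.0]
    feats = []
    for i, (x, y) in enumerate(stroke):
        if 0 < i < len(stroke) - 1:
            (x0, y0), (x1, y1) = stroke[i - 1], stroke[i + 1]
            # Undirected stroke orientation at this point, in [0, 180).
            theta = math.degrees(math.atan2(y1 - y0, x1 - x0)) % 180.0
            orient = []
            for r in refs:
                diff = min(abs(theta - r), 180.0 - abs(theta - r))
                orient.append(max(0.0, 1.0 - diff / 45.0))  # linear falloff
            end = 0.0
        else:
            orient = [0.0] * 4
            end = 1.0  # the first and last points are stroke endpoints
        feats.append(orient + [end])
    return feats
```

A horizontal stroke, for instance, responds fully on the 0-degree grid, partially on the 45- and 135-degree grids for slanted segments, and marks its two endpoints on the endpoint grid.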
Evaluation was performed by testing the approach against three different datasets, each with a different domain of input: handwritten characters, shapes, and circuits. Each of these was tested with four different classifiers in addition to the one proposed in this paper with user-independent cross validation. It was determined that the method presented in this paper performed better than the other algorithms that it was tested against for each of the domains that were tested and that the performance optimizations made a large difference in runtime.
Thoughts
I think it's interesting that this paper was designing an approach that is able to effectively recognize diagrams. Diagrams are used in a wide variety of fields, and the ability to recognize such sketches would be very useful. However, the mix of types of input associated with a diagram (both shapes and text) presents an interesting challenge for recognition. Having focused mostly on recognizers that use a single type of input so far, it was interesting to read about one that could handle multiple types of input. This seems as though it would be a very useful property to have for recognizers for many different domains.
The consideration of optimizations in order to improve runtime performance was a useful consideration. In addition, the method for obtaining rotation invariance was interesting. It could be useful to use a technique like this for other approaches, as well, in order to obtain rotation invariance; however, the performance consideration of creating multiple versions of the sketch at different rotations would have to be considered.
Thursday, April 11, 2013
Reading Assignment: ShortStraw: A Simple and Effective Corner Finder for Polylines
Reference Information
Title: "ShortStraw: A Simple and Effective Corner Finder for Polylines"
Authors: A. Wolin, B. Eoff, and T. Hammond
Citation: "ShortStraw: A Simple and Effective Corner Finder for Polylines", A. Wolin, B. Eoff, T. Hammond, Proceedings of the Fifth Eurographics Conference on Sketch-Based Interfaces and Modeling, pp. 33-40, 2008.
Summary
ShortStraw is a system created with the goal of being an easy-to-implement, accurate algorithm for corner finding in freehand sketches using the polyline corner finding method. Ideally, it is designed with the intention that it can be used for educational purposes such that beginning-level programming students can implement it without too much difficulty. Polyline corner finders find corners of gestures by finding the minimum set of points within the gesture that can be used as splitting points in order to retrieve only a set of lines from the stroke.
ShortStraw is a bottom-up algorithm that first resamples the points to be equidistant, then computes a straw distance for each resampled point (the Euclidean distance between the points a constant window before and after it), and identifies corners as the points with locally minimal straw distances. Then, processing of the found corners occurs with a top-down approach in order to account for missing corners and false positives. This processing includes using a line test on each pair of consecutive corners, calculating the Euclidean distance divided by the path distance to see if it is within a particular threshold. If not within the threshold, a missing corner is detected within a window midway between the two points by using the minimum straw distance. False positives are removed by performing a collinearity check.
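The bottom-up pass is short enough to sketch directly, which speaks to the paper's ease-of-implementation goal. This assumes the points have already been resampled to be equidistant; the window size and median factor below are representative values and omit the top-down corrections:

```python
import math

def straw_corners(points, window=3, median_factor=0.95):
    """Bottom-up pass of ShortStraw on an equidistantly resampled stroke:
    each point's "straw" is the distance between the points `window` steps
    before and after it; corners are the minima of below-threshold straw
    runs, plus the two stroke endpoints."""
    n = len(points)
    straws = [0.0] * n
    for i in range(window, n - window):
        (x0, y0), (x1, y1) = points[i - window], points[i + window]
        straws[i] = math.hypot(x1 - x0, y1 - y0)
    interior = sorted(straws[window:n - window])
    threshold = interior[len(interior) // 2] * median_factor  # median-based cutoff
    corners = [0]  # the first point is always a corner
    i = window
    while i < n - window:
        if straws[i] < threshold:
            best = i  # index of the minimum straw in this below-threshold run
            while i < n - window and straws[i] < threshold:
                if straws[i] < straws[best]:
                    best = i
                i += 1
            corners.append(best)
        else:
            i += 1
    corners.append(n - 1)  # the last point is always a corner
    return corners
```

On an L-shaped polyline the straw shortens sharply around the bend, so the bend's resampled point falls below the median-based threshold and is reported as the only interior corner.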
Evaluation of ShortStraw occurred by testing the system on sketches done by students, then computing accuracy measures that were compared to those of Sezgin's and Kim and Kim's corner finding algorithms. Correct corners found accuracy (number of correct corners found / number of correct corners a human would perceive) and all-or-nothing accuracy (number of correctly segmented strokes / total number of strokes) were both used. It was postulated that all-or-nothing accuracy is the more important measure, since correct corners found can be manipulated by simply returning every point. It was determined that the all-or-nothing accuracy of ShortStraw was significantly better than that of the other corner finding algorithms it was tested against. Ease of implementation was evaluated by having an undergraduate student implement the system, and it was determined that it was indeed simple to implement. In addition, it was determined that the algorithm runs quickly.
Thoughts
I thought that it was a great idea to aim to create a corner finding algorithm that is easy to understand and easy to implement for educational purposes. It makes the algorithm easy to understand for those reading the paper, and provides a means for introducing newer programmers to the field. I thought that it was even better, however, that the ShortStraw design was actually tested by having a student implement it. This provided a means to actually evaluate that particular goal of the design, instead of just jumping to conclusions and declaring that the algorithm is short, therefore it must be easy to understand.
In addition, I found the discussion of different accuracy measures to be very interesting. Instead of computing a single accuracy measure to report to the user, different methods of accuracy measure were provided, each with their own merits. Then, the all-or-nothing accuracy was determined to be the most important measurement. This is something to consider when reading other papers. Instead of just accepting a single measurement as the definitive accuracy measurement, this paper made it apparent that it should be taken into account what kind of accuracy measurement is being used and whether it is actually the best measurement for the given situation.
It was also useful to learn about polyline corner finding, since we have previously learned about Sezgin's temporal means of corner finding. This is useful as it adds to our knowledge another method for corner finding that may be more useful in some situations.
Tuesday, April 9, 2013
Reading Assignment: Sketch Recognition Algorithms for Comparing Complex and Unpredictable Shapes
Reference Information
Title: "Sketch Recognition Algorithms for Comparing Complex and Unpredictable Shapes"
Authors: Martin Field, Stephanie Valentine, Julie Linsey, Tracy Hammond
Citation: "Sketch Recognition Algorithms for Comparing Complex and Unpredictable Shapes", Martin Field, Stephanie Valentine, Julie Linsey, Tracy Hammond, Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pp. 2436-2441, 2011
Summary
This paper discussed applying sketch recognition techniques to the field of engineering education. Specifically, a system called Mechanix was created to recognize freehand sketches of engineering problems for educational purposes, such as for completing homework and exams. Feedback is provided to students on their sketches based on comparing the sketches to a solution provided by the class grader/professor by also using Mechanix. One of the benefits of Mechanix is that it provides an environment for sketching engineering problems that is very similar to the traditional pen and paper approach, providing a natural means of completing assignments that is easier to grade and contains instant feedback.
The algorithms used for Mechanix are designed to identify bodies (in free-body diagrams) and trusses, as well as to use recognition to compare them to the solution sketches provided by the grader. The system uses an online recognition process that runs after each separate stroke is sketched. Identification of bodies occurs by using the pre-existing PaleoSketch system to find any strokes that form a closed shape, which constitutes a body. Then, the body is compared for similarity to the single template solution provided by the grader by using a combination of the Hausdorff distance, the modified Hausdorff distance, and the Tanimoto coefficient. Trusses are recognized as a collection of polygons that share sides, and are identified by using PaleoSketch to build a connection graph to find at least two polygons that share an edge. The identified truss is then compared to the single template provided by the grader, using the connection graph and the properties of the sketch.
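The three similarity measures used for body comparison are standard and can be sketched on point sets as follows; the Tanimoto version here assumes the two sketches have been rasterized onto a common pixel grid:

```python
import math

def directed_hausdorff(a, b):
    """Largest distance from any point of a to its nearest point of b."""
    return max(min(math.dist(p, q) for q in b) for p in a)

def hausdorff(a, b):
    """Classic Hausdorff distance: worst-case nearest-point distance."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))

def modified_hausdorff(a, b):
    """Modified Hausdorff distance: replace the outer max of each direction
    with a mean, which is less sensitive to a few outlying points."""
    def mean_nearest(src, dst):
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)
    return max(mean_nearest(a, b), mean_nearest(b, a))

def tanimoto(a, b):
    """Tanimoto coefficient on rasterized point sets: |A intersect B| / |A union B|."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)
```

Combining all three is a reasonable design: the Hausdorff variants measure geometric deviation while the Tanimoto coefficient rewards pixel overlap, so a sketch must agree with the template in both senses.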
Mechanix was evaluated via user testing and with use during a single section of the ENGR 111 class. The students that used the system had positive feedback and it was determined that the recognition is relatively accurate with little possibility of being able to trick the system.
Thoughts
Mechanix is an application that has very practical purposes of improving both the educational experience for engineering students and the grading process for engineering graders. It was very refreshing to read about such a useful application that I have some prior experience with, and that the sketch recognition techniques that we've been learning can be applied to. Also, it was very useful that the prior work section of this paper conveniently mentioned how Mechanix improves upon or uses each of the prior systems, instead of simply discussing those systems.
In addition, the fact that part of the evaluation was able to occur by having the system used within an actual engineering class was very interesting and potentially quite beneficial. Since Mechanix was designed for that purpose, being able to test its usage within the same domain that it is intended to be used in, by potential actual users of the system, is a great opportunity to receive more accurate evaluation feedback.
Monday, April 8, 2013
Reading Assignment: A Domain-Independent System for Sketch Recognition
Reference Information
Title: "A Domain-Independent System for Sketch Recognition"
Authors: Bo Yu and Shijie Cai
Citation: "A Domain-Independent System for Sketch Recognition", Bo Yu and Shijie Cai, Proceedings of the 1st International Conference on Computer Graphics and Interactive Techniques in Australasia and South East Asia, pp. 141-146, 2003.
Summary
This paper discussed a system for sketch recognition that accepts freehand sketches as input and performs recognition by using stroke approximation with low-level geometric features without the use of domain-specific knowledge. The output of the recognizer is a hierarchical structure of recognition information, designed to be easily used by high-level applications that the system can be embedded within. The system was designed with a list of ideal attributes for a sketch recognition system in mind. These attributes include the ability to draw naturally, to produce consistent recognition, to understand hierarchical relations, to predict the sketch during drawing, to provide an efficient and easy-to-use interface for the user, and to be easily integrated into other applications.
The system has two main stages: stroke approximation and post-processing. Stroke approximation occurs during sketching, taking each stroke as input as it is completed. It uses recognition techniques to approximate the shape of the stroke against a set of low-level geometric shapes (lines, arcs, circles, ellipses, and helices). Stroke approximation includes vertex detection, line approximation with feature-area verification, curve approximation, and the handling of shapes with intersecting features. The post-processing stage is performed after the entire sketch is complete, and consists of using the data from the stroke approximation phase to create shape relations and complete recognition. It includes relation retrieval, cleanup (removing useless elements of the strokes and merging strokes as necessary), and finally, object recognition to recognize the basic objects that may occur within the sketch.
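Feature-area verification for the line case can be illustrated as follows: the area enclosed between the stroke and the chord joining its endpoints, normalized by the chord length, should stay small for a true line. The shoelace formulation and the threshold value below are my assumptions, not the paper's exact parameters:

```python
import math

def feature_area(stroke):
    """Shoelace area of the polygon formed by the stroke points, closed by
    the chord joining its endpoints."""
    area = 0.0
    n = len(stroke)
    for i in range(n):
        x0, y0 = stroke[i]
        x1, y1 = stroke[(i + 1) % n]  # the wrap-around edge is the chord
        area += x0 * y1 - x1 * y0
    return abs(area) / 2.0

def is_line(stroke, tol=2.0):
    """A stroke passes the line test when its feature area, normalized by
    the chord length, stays below a small per-pixel tolerance."""
    (x0, y0), (x1, y1) = stroke[0], stroke[-1]
    chord = math.hypot(x1 - x0, y1 - y0)
    return chord > 0 and feature_area(stroke) / chord < tol
```

Dividing by the chord length makes the test scale with stroke size: it effectively measures the average deviation from the chord rather than the raw enclosed area.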
The user interface itself is designed to allow for creation, deletion, and modification of sketches. The modification feature is the most emphasized portion of the user interface. An evaluation was conducted by testing the system with users. It was determined that it was easy to use, provides useful information as output, and can be easily integrated into other applications as intended.
Thoughts
The idea of creating an easy-to-use recognition system for freehand drawing that can be easily integrated into other applications seems like it could be very useful. In addition, the attributes of an ideal, practical recognition system that were put forth could be very useful with regards to the future design of sketch recognition systems; however, I was curious as to where these properties came from. There is no source provided for the properties, so have they originated from research studies that have been conducted or are they simply the authors' opinions of the ideal attributes that a sketch recognition system should have? Since these are guidelines that the system is compared against throughout the paper, this could be an important distinction to make.
It was interesting to learn about methods for recognizing freehand sketches, as we have mostly focused on single-stroke sketches thus far. Also, the methods for vertex approximation mentioned the noise that is produced when using timing data, which is an interesting result to apply to the information that we learned previously of Sezgin's corner finding methods that used timing data.
Tuesday, March 26, 2013
Reading Assignment: A Few Useful Things to Know About Machine Learning
Reference Information
Title: A Few Useful Things to Know About Machine Learning
Author: Pedro Domingos
Citation: "A Few Useful Things to Know About Machine Learning", Pedro Domingos, Communications of the ACM, pp. 78-87, 2012.
Summary
This paper provided a broad overview of machine learning techniques. Machine learning consists of using learning algorithms with large amounts of training data to generalize over the set of possible data in order to classify new, unknown data. The focus of this paper was on classification algorithms, along with some key lessons in the field of machine learning regarding classification. A learning algorithm combines three components: a formal representation, an objective function for evaluation, and an optimization method. It should be noted that testing and training data should always be kept separate for more accurate evaluation statistics; contamination of the test data can be avoided in a few different ways, including cross validation.
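Cross validation maintains that separation by rotating which portion of the data is held out. A minimal k-fold split might look like:

```python
def k_fold_splits(n, k=5):
    """Yield (train_indices, test_indices) pairs for k-fold cross validation:
    every example is tested exactly once, by a model that never trained on it."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for test in folds:
        held_out = set(test)
        train = [i for i in range(n) if i not in held_out]
        yield train, test
```

Averaging the k held-out accuracies gives an estimate of generalization error while still letting every example contribute to training in the other folds.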
Machine learning is all about generalizing from the data that is given. A learner uses not only training data but also extra assumptions or knowledge about the domain; by the "no free lunch" theorems, no learner can beat random guessing over all possible target functions without such assumptions. Some problems with generalization include overfitting, underfitting, multiple testing, and the curse of dimensionality (generalizing becomes more difficult as more features are added to the input). Some commonly misunderstood theoretical guarantees were also discussed: bounds on the number of examples needed are typically loose, and even infinite data does not necessarily lead to a correct classifier. The importance of choosing the correct features was reiterated. While relevant, independent features are the most useful, it is difficult to know what these features are when simply presented with the raw data of an input, since the input data tends to be more observational than experimental. In addition, it is more important to have large amounts of data for training than to have a more complex learning algorithm, but scalability must be taken into account when using lots of data. As for choosing the "best" learning algorithm, it really depends on the particular domain and application for which it is being used.
Finally, combining learning algorithms was discussed by using model ensembles to create more accurate learners from running the data through multiple classifiers. Some combination techniques include boosting, bagging, and stacking.
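Of the ensemble techniques mentioned, bagging is the simplest to sketch: train several classifiers on bootstrap resamples of the training data, then combine their predictions by majority vote. The `train_fn` callback and data format below are illustrative assumptions, not from the article.

```python
# A minimal sketch of bagging: each model is trained on a bootstrap
# resample (drawn with replacement) of the data, and predictions are
# combined by majority vote across the models.
import random
from collections import Counter

def bagging_train(train_fn, data, n_models=10, seed=0):
    """train_fn maps a list of examples to a callable model."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # Bootstrap resample: draw len(data) examples with replacement.
        sample = [rng.choice(data) for _ in data]
        models.append(train_fn(sample))
    return models

def bagging_predict(models, x):
    """Majority vote over the ensemble's predictions for input x."""
    votes = Counter(m(x) for m in models)
    return votes.most_common(1)[0][0]
```

Because each model sees a slightly different resample, their errors partially cancel in the vote, which is the variance-reduction effect that makes bagged ensembles more accurate than their members.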
Thoughts
I have previously taken a course in machine learning, so I knew most of the information that was presented within this article. However, it was a good refresher for the information that I did know, and it provided some very good information regarding lessons and myths in machine learning that I did not previously know about.
For example, it was very interesting to learn about some of the details of possible machine learning problems, such as overfitting, multiple testing, and the curse of dimensionality. It was very useful that possible (and best) solutions were presented for each of these problems. Overall, the article was written in a very understandable format that made for an enjoyable and informative read.
Reading Assignment: The Word-Gesture Keyboard: Reimagining Keyboard Interaction
Reference Information
Title: The Word-Gesture Keyboard: Reimagining Keyboard Interaction
Authors: Shumin Zhai and Per Ola Kristensson
Citation: "The Word-Gesture Keyboard: Reimagining Keyboard Interaction", Shumin Zhai and Per Ola Kristensson, Communications of the ACM, pp. 91-101, 2012.
Summary
This paper discussed the word-gesture keyboard, an alternative method of text input on touch-screen keyboards, such as those on mobile devices. The interaction consists of swiping a single finger across the soft keyboard, passing through each letter of a word in one fluid motion. It is designed to be a faster method of text input than tapping keys individually, yet it uses the same keyboard layout, so it is intended to be easy to learn while allowing speed to improve with practice. The speed of the word-gesture system stems from the fact that only a single continuous motion made with one finger is needed to enter a word, and spaces are automatically inserted between words. Ease of use comes from the facts that users are already familiar with the keyboard layout, that gestures come more naturally than traditional key pressing, and that no gestures need to be memorized, since the user simply follows the pattern of keys visible on the screen.
To perform recognition, the shape of the drawn gesture is compared against a set of template gestures already associated with words. The ability to enter commands (such as copy and paste) was added alongside the ability to type words. Indexing and pruning are used to make searching the set of known gestures feasible on mobile devices.
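The template-matching idea can be sketched under heavy simplification: each vocabulary word becomes an ideal path through its letters' key centers, both the drawn gesture and each ideal path are resampled to the same number of points, and the word with the smallest average point-to-point distance wins. The key layout and single-channel distance below are my own placeholders; the actual system uses a richer multi-channel model with language-model integration.

```python
# Toy word-gesture matcher: compare a swipe path to ideal key-center paths.
import math

# Rough QWERTY key-center coordinates (placeholder layout, staggered rows).
ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
KEY_CENTERS = {ch: (x + 0.5 * y, y) for y, row in enumerate(ROWS)
               for x, ch in enumerate(row)}

def resample(points, n=32):
    """Resample a polyline to n points spaced evenly along its length."""
    total = sum(math.dist(points[i - 1], points[i]) for i in range(1, len(points)))
    if total == 0:
        return [points[0]] * n
    step = total / (n - 1)
    out, acc, pts, i = [points[0]], 0.0, list(points), 1
    while i < len(pts):
        d = math.dist(pts[i - 1], pts[i])
        if acc + d >= step and d > 0:
            t = (step - acc) / d
            q = (pts[i - 1][0] + t * (pts[i][0] - pts[i - 1][0]),
                 pts[i - 1][1] + t * (pts[i][1] - pts[i - 1][1]))
            out.append(q)
            pts.insert(i, q)  # continue measuring from the new point
            acc = 0.0
        else:
            acc += d
        i += 1
    while len(out) < n:      # guard against floating-point shortfall
        out.append(pts[-1])
    return out[:n]

def word_path(word, n=32):
    """Ideal swipe path for a word: through the centers of its keys."""
    return resample([KEY_CENTERS[c] for c in word], n)

def recognize(gesture_points, vocabulary, n=32):
    """Return the vocabulary word whose ideal path is closest to the swipe."""
    g = resample(gesture_points, n)
    def cost(word):
        return sum(math.dist(a, b) for a, b in zip(g, word_path(word, n))) / n
    return min(vocabulary, key=cost)
```

Resampling to a fixed point count is also what makes the pruning described in the paper possible: templates whose start/end keys are far from the gesture's endpoints can be discarded before the full comparison.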
The word-gesture method was tested through experimentation and by releasing it as an application for mobile devices in order to receive feedback from real users. The experiments included testing users' ability to memorize gesture shapes and measuring the speed of users' initial attempts. Reviews from users of the released mobile application were used to evaluate the general reception of the system. One of the major contributions of the word-gesture keyboard is that releasing it as an actual product made the idea more widespread, allowing for the proliferation of this new technique of text input.
Thoughts
It was really great to be able to read this paper about a recent system that is now in fairly widespread use in daily life. I have seen variations of this system on my own phone and therefore can relate my own experience with it to the information gained from reading this paper. Many of the papers that we read discuss systems that are not well known; however, the contribution of this paper is known by many now. This made it very interesting to learn how this method that I was previously aware of actually works.
I think that it is a very interesting concept to introduce a new method like this that relies mostly on previously-known concepts such as the physical keyboard. The fact that the user is not required to learn any new gestures, but simply to apply a new type of motion to a well-known system, is a very intriguing idea. It makes one start to think about what other types of new interaction can be applied to existing systems in order to improve upon their usage.
The focus on human psychology that was used in the creation and evaluation of the method was great to read about, since it was discussed why this method actually works. In addition, it was great to see that the main evaluation of the system occurred by putting it into real-world usage and obtaining actual reviews of the product, instead of simply running lab experiments to try and approximate real-world usage.
Monday, March 4, 2013
Reading Assignment: Sketch Based Interfaces: Early Processing for Sketch Understanding
Reference Information
Title: Sketch Based Interfaces: Early Processing for Sketch Understanding
Authors: Tevfik Metin Sezgin, Thomas Stahovich, Randall Davis
Citation: "Sketch Based Interfaces: Early Processing for Sketch Understanding", Tevfik Metin Sezgin, Thomas Stahovich, Randall Davis, PUI, 2001.
Summary
This paper discussed a system implemented for processing freehand sketching, in an attempt to provide a method for natural interaction with a user interface. It interprets freehand strokes into geometric descriptions, producing a representation that the system can use more easily. However, the interpretation stage discussed in this paper is intended to be only the first part of a larger system that can provide understanding and interaction using freehand sketching.
Freehand sketching is more complicated than working from a set of predefined shapes, since anything can be sketched and it still must be recognizable by the system. Therefore, preprocessing is used to distinguish corners from curves and recognize the low-level geometric properties of the stroke. The processing stage includes three phases: approximation, beautification, and basic recognition. Approximation finds the vertices at the corners of the stroke by using a hybrid fit that combines the stroke's shape information with the timing information recorded while sketching; it also determines the curved sections of the stroke. The approximated data is then used in the beautification phase to improve the appearance of the stroke, and the beautified data is used in the basic recognition phase to recognize basic geometric shapes.
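The timing-based half of that hybrid fit can be illustrated with a toy corner detector: the pen slows near corners, so points whose speed falls well below the stroke's mean speed become vertex candidates. The 0.25 threshold below is an arbitrary illustrative value; the paper's actual algorithm also uses curvature data and selects its thresholds differently.

```python
# Toy speed-based vertex candidate detection for a timestamped stroke.
import math

def speed_minima_vertices(points, times, threshold=0.25):
    """points: [(x, y)], times: [t]; return indices of candidate corners."""
    # Speed of the segment ending at each point (speeds[0] is unused).
    speeds = [0.0]
    for i in range(1, len(points)):
        dt = times[i] - times[i - 1] or 1e-9  # avoid division by zero
        speeds.append(math.dist(points[i - 1], points[i]) / dt)
    mean_speed = sum(speeds[1:]) / (len(speeds) - 1)
    # Endpoints of the stroke are always vertices.
    vertices = [0]
    for i in range(1, len(points) - 1):
        if speeds[i] < threshold * mean_speed:
            vertices.append(i)
    vertices.append(len(points) - 1)
    return vertices
```

On an L-shaped stroke drawn quickly along each leg but slowly through the bend, only the bend's point falls below the speed threshold, so the detector returns the two endpoints plus the corner.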
The system was evaluated using a user study in which participants sketched a set of gestures using the system. Results from the evaluation labeled the system as easy and natural to use due to the ability to draw freehand gestures. In addition, it was determined that the system could efficiently and correctly interpret the freehand shapes that were drawn.
Thoughts
The intention of providing a system for allowing users to apply freehand sketching within user interfaces is an appealing idea. Something like this would open up a number of different interactions with user interfaces that have never been possible before. Since this paper simply described a single part in the process of creating such a system, it would be interesting to find out more about other parts of the system.
One of the most notable findings mentioned in this paper is that timing data can be used to interpret strokes. In particular, it was explained that users slow down when drawing corners, allowing corners to be recognized from timing data. One distracting thing that I noticed throughout the paper was that bounds and thresholds were mentioned multiple times without an explanation of how their values were actually determined. Nevertheless, the contributions of this paper regarding the recognition of freehand sketches seem to have been very important to the field of sketch recognition.
Thursday, February 28, 2013
Reading Assignment: Protractor: A Fast and Accurate Gesture Recognizer
Reference Information
Title: Protractor: A Fast and Accurate Gesture Recognizer
Author: Yang Li
Citation: "Protractor: A Fast and Accurate Gesture Recognizer", Yang Li, Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 2169-2172, 2010.
Summary
This paper discussed Protractor, a template-based gesture recognizer that uses a nearest-neighbor approach to classify new gestures based on a set of pre-classified training gestures. The nearest-neighbor approach works by comparing an unknown gesture at runtime to the training examples and assigning it the class of its closest neighbors in terms of similarity. The method Protractor uses for comparing the similarity of gestures is claimed to be novel: it calculates a minimum angular distance between gesture vectors. This contributes to the recognizer's accuracy, speed, and low memory usage. It was suggested that these features make Protractor a good choice for mobile devices, where memory and processing power are in limited supply. The use of nearest neighbors also allows for personalization by users. Gestures are preprocessed before comparison, with resampling and noise reduction similar to that of the $1 recognizer. One of the more unique features of Protractor is its ability to handle variation in the orientation of a gesture.
Protractor was compared to the $1 recognizer during evaluation, and a similar experiment was conducted using the same data that the $1 recognizer was tested on in its comparisons to other recognizers. The results of the experiment showed that Protractor performs faster, no less accurately, and uses less memory than the $1 recognizer. In addition, the experiment was performed on a mobile device to analyze its usefulness for mobile applications.
Thoughts
The main concern that I had while reading about Protractor is the slowness that generally comes with nearest-neighbor algorithms. Since the algorithm requires runtime comparisons against each of the known gestures, the runtime computations could be costly. Therefore, I was rather surprised that the experiments run on a mobile device showed it to be a feasible solution for mobile applications. I thought that this was a great test to perform, as memory and time constraints must be considered for mobile platforms.
In addition, the ability to use a nearest neighbors recognizer that is both fast and that requires smaller amounts of memory is beneficial due to the amount of personalization that can be provided by its use. Since the nearest neighbors algorithm simply needs to compare gestures based on their associated data, it is much simpler for gestures to be added to the database, allowing for more opportunities for user-customization. This attribute of Protractor could allow it to be applied to more applications. Combined with its ability to run reasonably on mobile devices, this opens up even more possibilities for usage of the Protractor recognizer.
Thursday, February 14, 2013
Reading Assignment: PaleoSketch: Accurate Primitive Sketch Recognition and Beautification
Reference Information
Title: PaleoSketch: Accurate Primitive Sketch Recognition and Beautification
Authors: Brandon Paulson and Tracy Hammond
Citation: "PaleoSketch: Accurate Primitive Sketch Recognition and Beautification", Brandon Paulson and Tracy Hammond, Proceedings of the 13th International Conference on Intelligent User Interfaces, pp. 1-10, 2008.
Summary
This paper discussed a recognizer of low-level, primitive gestures that produces beautified versions of the gestures. The motivation behind the creation of this recognizer was to provide a means for integrating sketch recognition into user interfaces for freely-drawn sketches. The idea of the recognizer is to be able to recognize primitive gestures that then provide a foundation for creating more complex shapes by combining primitive shapes hierarchically. In order to improve upon other sketch recognition algorithms, two new features were added (NDDE and DCR) and a new ranking algorithm was used.
The recognizer works by taking a single stroke, calculating features such as the normalized distance between direction extremes (NDDE) and the direction change ratio (DCR), then sending the data through a series of recognizers, one for each primitive that the system is designed to recognize (line, polyline, ellipse, circle, arc, curve, spiral, helix, and complex). The results of each recognizer are sorted into a hierarchy and ranked.
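The two new features can be illustrated directly from their definitions: NDDE is the stroke length between the points of maximum and minimum direction divided by the total stroke length, and DCR is the maximum change in direction divided by the average change in direction. The sketch below omits the smoothing, direction wraparound, and special cases that the paper handles.

```python
# Illustrative NDDE and DCR computations for a stroke given as (x, y) points.
import math

def directions(points):
    """Direction (angle) of each segment of the stroke."""
    return [math.atan2(p2[1] - p1[1], p2[0] - p1[0])
            for p1, p2 in zip(points, points[1:])]

def stroke_length(points, start=0, end=None):
    seg = points[start:end]
    return sum(math.dist(a, b) for a, b in zip(seg, seg[1:]))

def ndde(points):
    """Normalized distance between direction extremes."""
    d = directions(points)
    hi, lo = d.index(max(d)), d.index(min(d))
    a, b = sorted((hi, lo))
    # Segment i spans points[i]..points[i+1], so slice end is b + 2.
    return stroke_length(points, a, b + 2) / stroke_length(points)

def dcr(points):
    """Direction change ratio: max direction change over average change."""
    d = directions(points)
    changes = [abs(d2 - d1) for d1, d2 in zip(d, d[1:])]
    return max(changes) / (sum(changes) / len(changes))
```

Intuitively, smooth arcs keep their direction changes uniform (DCR near 1) and span nearly their whole length between direction extremes (NDDE near 1), while polylines concentrate direction change at a few corners, driving DCR up. This is how the two features separate the polyline and curve cases where the paper reports its largest accuracy gains.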
Experiments were conducted to collect drawn shapes from users, train the system on that data, and then test it against more data collected from users. The data was tested on both the recognizer described in this paper and other notable recognizers. It was shown that the recognizer has improved accuracy of recognition as compared to the other algorithms and that it also recognizes more primitives. Accuracy is most notably improved with regards to recognizing polylines and curves.
Thoughts
A motivation of this work, providing an easier means of integrating sketch recognition into user interfaces, is very similar to that of the quill system that we read about in a previous reading assignment. Other topics mentioned in this paper that we have read about in previous assignments include the Sketchpad, Rubine, and Long work, all cited as previous work that influenced this recognizer. The hybrid recognizer from the previous reading assignment was mentioned as a future goal within this paper.
I found the fact that a gesture is run through multiple recognizers, one for each primitive shape that the system is capable of recognizing, to be very interesting. Since the results of each are ranked, this would be useful for when a particular gesture is similar to multiple types of shapes, since each shape's likelihood of recognition would then be ranked. Also, the idea of building up complex shapes from a series of primitives seems like a very useful process for recognizing complex gestures.
Wednesday, February 13, 2013
Reading Assignment: What!?! No Rubine Features?: Using Geometric-Based Features to Produce Normalized Confidence Values for Sketch Recognition
Reference Information
Title: What!?! No Rubine Features?: Using Geometric-Based Features to Produce Normalized Confidence Values for Sketch Recognition
Authors: Brandon Paulson, Pankaj Rajan, Pedro Davalos, Ricardo Gutierrez-Osuna, Tracy Hammond
Citation: "What!?! No Rubine Features?: Using Geometric-Based Features to Produce Normalized Confidence Values for Sketch Recognition", Brandon Paulson, Pankaj Rajan, Pedro Davalos, Ricardo Gutierrez-Osuna, Tracy Hammond.
Summary
This paper discussed a hybrid approach for sketch recognition that combines gesture-based recognition methods and geometric-based recognition methods. Gesture-based recognition uses the stroke properties of the gesture to classify gestures into a particular gesture class. Geometric-based recognition uses the geometric properties of the sketch itself to classify it as a geometric shape. The idea of a hybrid approach was to use the best aspects of each type of recognition to create an improved recognizer for natural sketches with normalized confidence values.
A set of 44 features was used, with 13 gesture-based features (Rubine's) and 31 geometric-based features. Feature subset selection was performed on this set in order to determine which features were the most important for accurate recognition. The geometric-based features were selected as more significant for the given data set than the gesture-based features.
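The feature subset selection step can be sketched generically as greedy forward selection: starting from an empty set, repeatedly add whichever feature most improves a scoring function, stopping when nothing improves. The `score_fn` placeholder stands in for "train and evaluate a classifier using these features"; the paper's exact selection procedure may differ in detail.

```python
# Generic greedy forward feature-subset selection.

def forward_selection(all_features, score_fn):
    """Grow a feature set greedily while score_fn keeps improving."""
    selected = []
    best_score = score_fn(selected)
    while True:
        candidates = [f for f in all_features if f not in selected]
        if not candidates:
            break
        # Score each candidate added to the current set; keep the best.
        scored = [(score_fn(selected + [f]), f) for f in candidates]
        top_score, top_feature = max(scored)
        if top_score <= best_score:
            break  # no single feature improves the score any further
        selected.append(top_feature)
        best_score = top_score
    return selected
```

Run over the 44 features with classification accuracy as `score_fn`, a procedure like this yields both a reduced feature set and an implicit ranking, which is how the geometric features' greater significance could be observed.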
Thoughts
We haven't discussed geometric-based recognition much yet in class, so this paper provided a great, general explanation of what it is. The idea of combining the two sketch recognition methods, gesture-based and geometric-based, into a hybrid recognition system seems very advantageous because it draws on different kinds of techniques. I found it particularly interesting that the feature selection demonstrated that the geometric features were much more significant than most of the gesture features, even though the Rubine features used as the gesture features are a common method for sketch recognition. It would be interesting to determine exactly why the geometric features were found more significant and whether that result depends on the particular data used to test the features.
I also liked that this paper built on work that we have seen in previous reading assignments, such as the papers describing the Rubine and Long features. It provided a means for showing ways that the topics discussed in the previous papers have influenced future research.
Monday, February 11, 2013
Reading Assignment: Visual Similarity of Pen Gestures
Reference Information
Title: Visual Similarity of Pen Gestures
Authors: A. Chris Long, Jr., James A. Landay, Lawrence A. Rowe, and Joseph Michiels
Citation: "Visual Similarity of Pen Gestures", A. Chris Long, Jr., James A. Landay, Lawrence A. Rowe, Joseph Michiels, Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 360-367, 2000.
Summary
This paper discussed a set of experiments that were conducted in order to create a model for predicting the perceived similarity of gestures. The results of the experiments were used to create the gesture design tool, quill, that was discussed in the previous reading assignment. The motivation behind this research and the tool is that gestures are often difficult for users to remember and recognize. Therefore, the authors wanted to help gesture designers to create improved gestures such that they are easier to recognize by both humans and machines by developing an algorithm to compute the similarity between gestures.
Two experiments with participants were conducted, each designed to determine what properties of a gesture lead a user to find it similar to other gestures. The experiments were designed with prior work in mind, including work with gesture features (such as Rubine's features and multidimensional scaling) and psychological research on perceived similarity. The first experiment showed participants sets of animated gestures and asked them to select the gesture in each set least similar to the others. From the resulting data, a set of features designed to accurately measure similarity was created and a set of prediction equations was developed. In addition, it was determined that the similarity judgments were participant-dependent. The second experiment was similar to the first, but it allowed the prediction equations from the first experiment to be tested. The predictions worked reasonably well, and perceived similarity could be reasonably related to the features calculated for each gesture.
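The general shape of such a predictor can be sketched as a weighted combination of feature differences between two gestures. The features and weights below are invented for illustration only; the paper derived its actual features and regression weights from the experiment data.

```python
# Hedged sketch: predict perceived dissimilarity of two gestures as a
# weighted sum of absolute differences in a few simple feature values.
# The three features and the default weights are illustrative placeholders.
import math

def feature_vector(points):
    """A few simple gesture features: total path length, bounding-box
    aspect ratio, and cosine of the initial direction."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    length = sum(math.dist(points[i], points[i + 1])
                 for i in range(len(points) - 1))
    width = (max(xs) - min(xs)) or 1.0   # guard degenerate strokes
    height = (max(ys) - min(ys)) or 1.0
    dx, dy = points[1][0] - points[0][0], points[1][1] - points[0][1]
    cos0 = dx / (math.hypot(dx, dy) or 1.0)
    return [length, width / height, cos0]

def predicted_dissimilarity(g1, g2, weights=(0.5, 1.0, 1.0)):
    """Weighted sum of absolute feature differences; identical gestures
    score 0, and larger values mean the pair should look less alike."""
    f1, f2 = feature_vector(g1), feature_vector(g2)
    return sum(w * abs(a - b) for w, a, b in zip(weights, f1, f2))
```

A gesture compared with itself scores zero, while a horizontal and a vertical stroke score a positive dissimilarity.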
Thoughts
It was very helpful to read the details of the experiments that were briefly mentioned in the previous reading assignment. It made the previous paper much easier to understand, and the amount of detail that was discussed regarding these experiments was very welcoming compared to the lack of details in the previous paper. I found it very interesting that not only was prior work in gesture recognition used to design the experiments, but that psychological research was considered, as well. The development of features based on experiment data seemed like a great idea, as did the fact that a second experiment tested the developments that resulted from the first experiment. It seemed like a very thorough development process.
Some of the details regarding the experiments were debatable, however. For instance, the usage of only a student population for participants, while convenient, may not be the best representation of users for a gesture design tool. In addition, the fact that the gestures were not drawn by the users, but that animations were viewed instead, may have skewed the results, as well. It would be interesting to conduct similar experiments that take these factors into account in order to see whether the results are affected.
Wednesday, February 6, 2013
Reading Assignment: "Those Look Similar!" Issues in Automating Gesture Design Advice
Reference Information
Title: "Those Look Similar!" Issues in Automating Gesture Design Advice
Authors: A. Chris Long, Jr., James A. Landay, Lawrence A. Rowe
Citation: "'Those Look Similar!' Issues in Automating Gesture Design Advice", A. Chris Long, Jr., James A. Landay, Lawrence A. Rowe, Proceedings of the 2001 workshop on Perceptive User Interfaces, pp. 1-5, 2001.
Summary
This paper discussed quill, a gesture design tool that helps interface designers create pen-based gestures. It actively offers unsolicited design advice, in the form of warnings that appear while the user is creating gestures. It warns when gesture classes are too similar and when a gesture can easily be misrecognized. The idea is to provide a tool that helps novice interface designers create improved gestures that can be easily recognized by both computers and people.
Various experiments were conducted. The first set consisted of participants judging the similarity between gestures, which allowed a model to be created for recognizing gestures that people can easily confuse with other gestures due to their similarities. It was found, however, that the similarity predictions could be wrong.
A set of issues regarding the advice was presented as well, concerning interface challenges such as when to present warnings to users, how much advice to present, and what that advice should contain. Background processes and hierarchical structures were also discussed. The hope was that the advice on these issues could be used in the future to improve other gesture techniques.
Thoughts
I think that it's a great idea to create a tool for those unfamiliar with gestures to easily create and improve upon them. I liked that preliminary experiments regarding gesture design were conducted in order to determine a foundation on which to base the tool that was created. In addition, the advice that was presented from the research and experiments that were conducted seems like it could be very helpful to apply to further studies of this nature. However, the paper seemed to be lacking in implementation details about the tool. Also, it was mentioned that a formal evaluation of quill occurred and that some conclusions were made based on it; however, the evaluation itself was never discussed. In addition, some of the conclusions drawn about presenting advice did not seem to explain the reasoning that backed up the conclusion. It would have been helpful to provide a description of how this conclusion was reached, or to have conducted further experiments to test the validity of the conclusion.
Tuesday, January 29, 2013
Reading Assignment: Gestures without Libraries, Toolkits, or Training: A $1 Recognizer for User Interface Prototypes
Reference Information
Title: Gestures without Libraries, Toolkits, or Training: A $1 Recognizer for User Interface Prototypes
Authors: Jacob O. Wobbrock, Andrew D. Wilson, Yang Li
Citation: "Gestures without Libraries, Toolkits or Training: A $1 Recognizer for User Interface Prototypes", Jacob O. Wobbrock, Andrew D. Wilson, Yang Li, Proceedings of the 20th annual ACM symposium on User interface software and technology, pp. 159-168, 2007.
Summary
This paper discussed the $1 recognizer, a gesture recognition algorithm designed to be cheap, simple, and easy-to-use such that even novice programmers can include it in their own interface systems. Many gesture recognition algorithms rely on complicated math (such as that presented in the Rubine paper) or computationally expensive methods that limit the number of programmers that can implement such a system.
When given a gesture as input, the $1 recognizer works through a series of four steps to recognize it. The first step resamples the path of the gesture to produce a path with N equally spaced points. Step two rotates the gesture based on an angle found using a seed-and-search approach. Step three scales the gesture non-uniformly, then translates it to a particular reference point. Finally, step four performs the recognition by comparing the modified gesture to a set of stored templates. Limitations of this recognizer include requiring comparison to templates, being unable to distinguish gestures that differ only in rotation, scale, or position (a consequence of its invariance to those properties), and being unaware of the timing of the gestures.
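The four steps can be sketched in code. This is a simplified illustration of the $1 pipeline, assuming gestures are lists of 2-D point tuples; it omits the golden-section search the paper uses to refine the rotation, and the template set is supplied by the caller rather than drawn from real training data.

```python
import math

def resample(points, n=64):
    """Step 1: resample the stroke into n equally spaced points."""
    pts = list(points)
    path_len = sum(math.dist(pts[i - 1], pts[i]) for i in range(1, len(pts)))
    interval = path_len / (n - 1)
    out, acc, i = [pts[0]], 0.0, 1
    while i < len(pts):
        d = math.dist(pts[i - 1], pts[i])
        if d > 0 and acc + d >= interval:
            t = (interval - acc) / d
            q = (pts[i - 1][0] + t * (pts[i][0] - pts[i - 1][0]),
                 pts[i - 1][1] + t * (pts[i][1] - pts[i - 1][1]))
            out.append(q)
            pts.insert(i, q)      # keep measuring from the new point
            acc = 0.0
        else:
            acc += d
        i += 1
    while len(out) < n:           # guard against floating-point shortfall
        out.append(pts[-1])
    return out

def normalize(points, size=250.0):
    """Steps 2-3: rotate the indicative angle (first point to centroid) to
    zero and scale to a reference square. Centering on the centroid also
    handles the translation step."""
    cx = sum(p[0] for p in points) / len(points)
    cy = sum(p[1] for p in points) / len(points)
    theta = math.atan2(points[0][1] - cy, points[0][0] - cx)
    c, s = math.cos(-theta), math.sin(-theta)
    rot = [((p[0] - cx) * c - (p[1] - cy) * s,
            (p[0] - cx) * s + (p[1] - cy) * c) for p in points]
    xs = [p[0] for p in rot]
    ys = [p[1] for p in rot]
    w, h = max(xs) - min(xs), max(ys) - min(ys)
    sx = size / w if w > 1e-6 else 1.0   # guard degenerate 1-D strokes
    sy = size / h if h > 1e-6 else 1.0
    return [(x * sx, y * sy) for x, y in rot]

def path_distance(a, b):
    """Step 4: average pointwise distance between two processed gestures."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)

def recognize(points, templates):
    """Label the gesture with its closest stored template."""
    candidate = normalize(resample(points))
    return min(templates, key=lambda name: path_distance(candidate, templates[name]))
```

With a straight line and a "V" as templates, slightly perturbed versions of each are matched to the right class.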
The $1 recognizer was tested against two other algorithms, the Rubine classifier and Dynamic Time Warping (DTW). Tests were conducted by having users provide a series of gestures at varying speeds, then using the three algorithms to recognize the gestures. It was determined that medium speed gestures are recognized more accurately than slow or fast gestures, most likely due to the balance between speed and accuracy. It was also determined that the $1 recognizer had accuracies similar to DTW and better than Rubine for the experiment that was conducted.
Thoughts
I think that it's a great idea to provide a recognition algorithm that is simple, easy-to-understand, and easy for novice programmers to include in their own work. This could serve to not only increase awareness of gesture recognition, but also to increase the number and range of ideas surrounding gesture recognition by having systems implemented by a much wider range of programmers with differing backgrounds.
This paper made for a very good read, and it was easy to understand the workings of the algorithm. It made many references to the Rubine paper that was also assigned as reading for this course, so having previously read that paper made it much easier to understand some of the motivations and the structure of the recognizer mentioned here. In addition, having been written in 2007, this paper is much more current than the others that we have read so far. While it is nice to see the foundations of sketch recognition, it was also nice to read of some more current technology and how it actually applied the foundations that we have learned.
One problem that I have with the paper is that it repeatedly mentioned a major goal of the recognizer being that it should be easily usable by novice programmers; however, it is later mentioned that the programming ease has yet to be tested. Therefore, it is unknown whether or not the algorithm actually accomplishes this major goal that was set for it. However, the evaluation that was provided was very helpful in seeing how this recognizer compares with others.
Monday, January 28, 2013
Reading Assignment: Specifying Gestures by Example
Reference Information
Title: Specifying Gestures by Example
Author: Dean Rubine
Citation: "Specifying Gestures by Example", Dean Rubine, SIGGRAPH '91 Proceedings of the 18th annual conference on Computer graphics and interactive techniques, pp. 329-337, ACM New York, NY, USA, 1991.
Summary
This paper discussed gesture-based interfaces, specifically GRANDMA, which is an object-oriented toolkit for applications with direct manipulation interfaces. It allows for gestures to be added to the interface without being hand coded. A gesture in this sense is a stroke made by a device such as a stylus or a mouse. The gesture recognition toolkit results in a recognizer that is trained from examples of gestures to be able to recognize new gestures that are input to the system.
A gesture-based application, GDP, was described, and the GRANDMA toolkit was used to provide gesture recognition for its interface. Gesture classes were used for sets of associated gestures and are arranged into a hierarchical structure. The GRANDMA toolkit works similarly to the Model-View-Controller pattern, associating an input handler with a view class in order to give all of its instances and subclasses access to it.
A limitation of GRANDMA includes the fact that only single stroke gestures are allowed, eliminating the possibility of using more complex symbols. However, it allows for faster recognition, accomplished with a two-phase interaction technique (combining both gestures and the direct manipulation property of the interface) and eager recognition (recognition of unambiguous gestures). Multi-finger recognition was implemented by processing each finger's stroke as a separate, single stroke then combining them to create a multi-path gesture.
Gesture recognition occurs by first calculating a set of features based on various properties of the gesture (e.g., angles and lengths), then using those features to classify the given gesture into one of a set of gesture classes. The classifier is trained using a set of example gestures with an appropriate amount of variance.
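A toy version of this linear classification scheme can be written in a few lines. The two features and the hand-set class weights below are invented for illustration; Rubine's recognizer uses thirteen features and learns the per-class weights from training examples rather than taking them as input.

```python
# Sketch in the spirit of Rubine's classifier: compute features from a
# stroke, then score each gesture class with a linear function
# (a bias w0 plus one weight per feature) and pick the highest score.
import math

def features(points):
    """Two Rubine-style features: cosine of the initial angle (measured
    over the first three points) and the bounding-box diagonal length."""
    dx, dy = points[2][0] - points[0][0], points[2][1] - points[0][1]
    norm = math.hypot(dx, dy) or 1.0
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    diag = math.hypot(max(xs) - min(xs), max(ys) - min(ys))
    return [dx / norm, diag]

def classify(points, classes):
    """classes maps a name to [w0, w1, ..., wn]; return the name whose
    linear score w0 + sum(wi * fi) is largest."""
    f = features(points)
    def score(weights):
        w0, ws = weights[0], weights[1:]
        return w0 + sum(w * x for w, x in zip(ws, f))
    return max(classes, key=lambda name: score(classes[name]))
```

For example, with one class weighted toward a positive initial x-direction and one toward a negative one, a rightward stroke and a leftward stroke are separated by the first feature alone.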
The importance of gesture-based interfaces was emphasized multiple times throughout the paper, namely for the ability to improve interactions between humans and computers. It was hoped that this may encourage further integration of gesture recognition in interfaces.
Thoughts
This paper provided a great deal of information regarding gesture recognition. I found it to be very helpful for understanding some of the basic problems and approaches associated with such recognition systems. Published in 1991, this paper strove to encourage further interaction between humans and computers, namely through gesture recognition techniques, including multi-finger touch recognition. This is a topic that is still being emphasized today, although improvements and wider usage have occurred. The system presented, GRANDMA, is an object-oriented system that can apply a hierarchical structure to classes of gestures. The object-oriented nature with the hierarchical structure reminded me of the Sketchpad paper read previously, with its usage of hierarchical structures to easily organize the system and to provide simple extensibility.
An important point of this paper is that it presented a simple, fast gesture recognition algorithm. The extensive use of features for classifying various attributes of a particular gesture distinguished the different properties of a stroke that can be used to compare the differences of various classes of gestures. The use of a classifier for recognizing gestures was simple and easy-to-understand, despite the mathematics associated with calculating the features. The simplicity of this algorithm, combined with its extensibility, provided a foundation for further gesture recognition systems to build upon.
Tuesday, January 22, 2013
Reading Assignment: Sketchpad: A Man-Machine Graphical Communication System
Reference Information
Title: Sketchpad: A Man-Machine Graphical Communication System
Author: Ivan E. Sutherland
Citation: "Sketchpad: A Man-Machine Graphical Communication System", Ivan E. Sutherland, DAC '64 Proceedings of the SHARE design automation workshop, pp. 6.329-6.346, ACM New York, NY, USA, 1964.
Summary
This paper discussed Sketchpad, a system designed for allowing users to create line drawings on a computer. The interaction was accomplished by using a light pen for indicating points on the screen and a set of push buttons for accomplishing various commands. The image could be created, manipulated (such as zooming and rotating), and observed on a display screen. An image can consist of any number of subpictures consisting of various symbols. Instances of drawings can be copied. In addition, constraints can be applied to parts of the drawing in order to apply mathematical and geometrical conditions to the image. The structure of the pictures is stored in memory using a ring structure for maintaining the information about a drawing. A generic, hierarchical structure was used for implementing Sketchpad, in which generic functions exist that call more specific subroutines, such that specific operations can be easily added to the system and be called by the generic functions.
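The ring structure described above can be illustrated as a circular, doubly linked list, in which every element can reach its neighbors and the whole ring can be traversed from any member. This is only a rough sketch of the idea; the class and method names are mine, not Sketchpad's.

```python
# Minimal circular doubly linked "ring": each node points to the next and
# previous members, and a single node forms a one-element ring, so related
# picture elements can always be enumerated starting from any of them.

class RingNode:
    def __init__(self, value):
        self.value = value
        self.next = self      # a lone node closes the ring on itself
        self.prev = self

    def insert_after(self, value):
        """Splice a new node into the ring just after this one."""
        node = RingNode(value)
        node.next, node.prev = self.next, self
        self.next.prev = node
        self.next = node
        return node

    def members(self):
        """Traverse the ring once, starting from this node."""
        out, cur = [self.value], self.next
        while cur is not self:
            out.append(cur.value)
            cur = cur.next
        return out
```

Starting from any member yields the same cycle, just rotated, which is what makes this shape convenient for grouping the elements of a drawing.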
The system is designed to aid users with designing and drawing images. It is useful for storing and modifying drawings, increasing understanding of complicated designs, and creating repetitive drawings. In particular, it is useful within fields that could benefit from an easy way to understand and duplicate images, such as engineering.
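The ring structure described in the summary can be thought of as a circular, doubly linked list: every element of a drawing points to its neighbors in both directions, so parts can be inserted and removed without rebuilding an index. The sketch below is a minimal illustration of that idea in Python; the class and method names are my own and do not reflect Sketchpad's actual record layout.

```python
class RingNode:
    """One element of ring storage: each node points forward and
    backward, and the ring is circular."""
    def __init__(self, value):
        self.value = value
        self.next = self
        self.prev = self

class Ring:
    """A circular doubly linked list (hypothetical names; Sketchpad's
    actual in-memory format differed)."""
    def __init__(self):
        self.head = None

    def append(self, value):
        node = RingNode(value)
        if self.head is None:
            self.head = node
        else:
            tail = self.head.prev
            tail.next = node
            node.prev = tail
            node.next = self.head
            self.head.prev = node
        return node

    def remove(self, node):
        # Unlinking needs only the node itself, a property Sketchpad
        # exploited to delete picture parts cheaply.
        if node.next is node:       # only element in the ring
            self.head = None
        else:
            node.prev.next = node.next
            node.next.prev = node.prev
            if self.head is node:
                self.head = node.next

    def __iter__(self):
        if self.head is None:
            return
        node = self.head
        while True:
            yield node.value
            node = node.next
            if node is self.head:
                break

# A picture as a ring of its subparts:
ring = Ring()
for part in ["line", "arc", "line"]:
    ring.append(part)
print(list(ring))  # the subparts, in ring order
```

Removing any part simply splices its neighbors together, which is why a ring suits a drawing whose components are constantly added and deleted.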
Critique
Sketchpad seems to be a very important development in computer science, particularly in the areas of sketch recognition and human-computer interaction. Reading about it now, the use of push buttons seems rather outdated and cumbersome compared to the touch screens that we currently have, but the ability to use a computer to create drawings had to start somewhere. It introduced a novel way of interacting with computers, using a light pen and push buttons to create images on a screen. This is a form of interaction that is often taken for granted in current times, where pens and fingers can be used with touch screens to draw out images with a computer. It is amazing to think that Sketchpad provided these abilities to draw and manipulate images on a machine back in 1964. This paper really shows how far we have come in that time, and yet the similarities that remain despite the time passing show the importance of the ideas discussed with the Sketchpad system. It mentioned ideas that are now fully fledged systems used on a day-to-day basis in many different fields, such as software for aiding both artistic drawings (such as animations) and engineering designs (such as bridge design, circuit design, etc.). Reading this paper makes me interested in learning just how the ideas mentioned here affected the future of computer science and how we got from a system like Sketchpad to the systems that we have today.
I also found it interesting that a generic structure was emphasized for the implementation of Sketchpad, such that a hierarchical structure exists going from general to more specific functions. This was done to provide the ability to easily extend the system. While these ideas have been around for a long time, they are continuously being highlighted in many programming methods today, specifically with object-oriented programming approaches.
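The generic-to-specific structure mentioned above resembles what object-oriented languages now express as polymorphic dispatch: a generic operation is written once and delegates to a shape-specific subroutine. A small illustrative sketch (the shape names and methods are hypothetical, not from the paper):

```python
class Shape:
    # Generic function: written once, it works for any specific shape
    # that supplies its own points() subroutine.
    def describe(self):
        return f"{type(self).__name__} with {len(self.points())} points"

    def points(self):
        raise NotImplementedError  # each specific shape must provide this

class Line(Shape):
    def __init__(self, a, b):
        self.a, self.b = a, b
    def points(self):
        return [self.a, self.b]

class Triangle(Shape):
    def __init__(self, a, b, c):
        self.verts = [a, b, c]
    def points(self):
        return self.verts

# Adding a new shape type requires no change to the generic code:
shapes = [Line((0, 0), (1, 1)), Triangle((0, 0), (1, 0), (0, 1))]
print([s.describe() for s in shapes])
```

The extensibility the paper emphasizes falls out of this arrangement: new specific subroutines slot in without touching the generic ones.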
Wednesday, January 16, 2013
Introduction
E-mail address: shoffmann@neo.tamu.edu
Class standing: 2nd year Master's student
Why are you taking this class?
Sketch recognition sounds like an interesting topic, so I would like to learn more about it. Also, I have had classes with Dr. Hammond in the past and really enjoyed them, so I thought it'd be nice to take another class taught by her.
What experience do you bring to this class?
I have experience with multiple programming languages, especially C++ and Java, from working on various types of projects (including design, AI, game development, etc.).
What are your professional life goals?
I would like to work in the game development industry. Within that field, I simply want to program, preferably working in areas that I enjoy such as dealing with artificial intelligence or human-computer interaction. In addition, I would like to be able to inspire and help others in the field of computer science.
What are your personal life goals?
My personal goals are to achieve my professional goals and to make decisions that both make me happy and that I can be proud of.
What do you want to do after you graduate?
After graduation, I will be working as a software developer for a large software company.
What do you expect to be doing in 10 years?
I expect to still be programming in 10 years. I would like to be developing video games, hopefully doing something that allows me to apply both my knowledge of AI and physics within that field.
What do you think will be the next biggest technological advancement in computer science?
I think that there will be advancements in how we actually interact with devices. Recently, we have had smartphones, tablets, and all other sorts of mobile devices become the "next big thing". While a new type of device is entirely possible as the next advancement, I think that we're more in need of improvements on how we actually interact with and perceive interacting with the devices that we currently use.
If you could travel back in time, who would you like to meet and why?
There are so many great people that have influenced the world to be how it is today, so I don't think that I could pick just one person to meet. It would have to be someone with great, novel ideas who perhaps was not entirely understood at the time, so that I could discuss with them how and why they came up with those ideas. But if I could travel back in time, why can't I travel forwards, as well? I think that would be much more entertaining, to see how things have changed and improved in a future that I would not normally be able to see within my lifetime.
Describe your favorite shoes and why they are your favorite?
My favorite shoes are my purple Converse, because they are purple. And cool.
If you could be fluent in any foreign language that you're not already fluent in, which one would it be and why?
I would like to be fluent in German, because I find it to be a very interesting language that sounds really cool.
Give some interesting fact/story about yourself.
I have a physics minor that I'm still trying to find a good use for.