Tuesday, April 23, 2013

Reading Assignment: Envisioning Sketch Recognition: A Local Feature Based Approach to Recognizing Informal Sketches

Reference Information
Title: "Envisioning Sketch Recognition: A Local Feature Based Approach to Recognizing Informal Sketches"
Author: Michael Oltmans
Citation: "Envisioning Sketch Recognition: A Local Feature Based Approach to Recognizing Informal Sketches", Michael Oltmans, 2007.

Summary
This thesis presents an approach to recognizing free-drawn sketches that relies on the visual properties of the sketch, borrowing techniques from the computer vision field to bring those properties into the recognition process. The system is designed to be interruption-free and does not provide feedback to the user; as a consequence, it cannot use the recognition shortcuts that feedback-based systems rely on. Issues with signal noise, conceptual variation, overtracing, and segmentation are addressed, and the author argues that they are handled effectively because the system works from visual properties.

The approach considers the classification of isolated shapes, where a shape is treated as a collection of visual parts together with their conceptual variations. The visual parts are based on shape context features and are constructed by placing a circular "bullseye" centered on points sampled every 5 pixels in the image. Each bullseye consists of a central point surrounded by concentric rings that are cut into wedge-shaped sections, which are rotated slightly away from horizontal. The ring radii grow on a logarithmic scale, so the inner rings capture fine detail while the outer rings capture broader context. Each region of the bullseye acts as a bucket that counts the sketch points falling inside it. A vocabulary of parts is built from the parts of the training examples. A support vector machine is trained on the training data, and classification proceeds by measuring how a sketch differs from the vocabulary parts, using a match vector computed from the distances between bullseye histograms. An isolated shape recognizer assigns a set of labels to shape parts, while the full sketch processor identifies the components of a full sketch.
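To make the bullseye idea concrete, here is a minimal sketch of a log-polar point histogram and a match vector built from histogram distances. It is not the thesis's implementation: the function names, the number of rings and wedges, the radii, the half-wedge rotation, the chi-squared-style distance, and the "smallest distance to any part of the sketch" definition of a match vector entry are all assumptions chosen for illustration.

```python
import math

def bullseye_histogram(points, center, n_rings=3, n_wedges=8,
                       r_min=5.0, r_max=45.0, offset=-math.pi / 8):
    """Count the sketch points that fall in each (ring, wedge) bin.

    All parameter values are illustrative assumptions, not the thesis's.
    """
    # Outer radius of each ring, spaced evenly in log-radius so that
    # inner rings are narrow (detail) and outer rings are wide (context).
    edges = [r_min * (r_max / r_min) ** (k / (n_rings - 1))
             for k in range(n_rings)]
    wedge_width = 2 * math.pi / n_wedges
    hist = [[0] * n_wedges for _ in range(n_rings)]
    for x, y in points:
        dx, dy = x - center[0], y - center[1]
        r = math.hypot(dx, dy)
        if r > edges[-1]:
            continue  # outside the outermost ring
        ring = next(k for k, edge in enumerate(edges) if r <= edge)
        # The offset rotates the wedges so that horizontal and vertical
        # lines land inside a wedge instead of on a wedge boundary.
        angle = (math.atan2(dy, dx) - offset) % (2 * math.pi)
        wedge = int(angle / wedge_width) % n_wedges
        hist[ring][wedge] += 1
    return hist

def histogram_distance(h1, h2):
    """Chi-squared-style distance between two histograms (assumed metric)."""
    total = 0.0
    for row1, row2 in zip(h1, h2):
        for a, b in zip(row1, row2):
            if a + b > 0:
                total += (a - b) ** 2 / (a + b)
    return total

def match_vector(sketch_parts, vocabulary):
    """One entry per vocabulary part: distance to the closest sketch part."""
    return [min(histogram_distance(v, p) for p in sketch_parts)
            for v in vocabulary]
```

With these default values the ring boundaries fall at roughly 5, 15, and 45 pixels. A match vector of this kind has a fixed length (one entry per vocabulary part) regardless of how many parts a sketch contains, which is what lets it serve as the input to a classifier such as a support vector machine.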

In addition to visual information, temporal information about each stroke is used to determine the direction in which it was drawn. Features are made rotationally invariant by aligning them horizontally and calculating two histograms for each bullseye, one for the original orientation and one for its reverse.
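The following sketch shows one way such an alignment could work. It assumes, as one plausible reading of the summary, that each bullseye is rotated to the local stroke direction estimated from the time-ordered points, and that a second histogram is computed for the opposite direction because the same line can be drawn either way. The function names and the `window` parameter are hypothetical, and only the angular wedges are modeled to keep the example short.

```python
import math

def stroke_direction(stroke, i, window=2):
    """Angle of the local tangent at index i of a time-ordered stroke."""
    a = stroke[max(i - window, 0)]
    b = stroke[min(i + window, len(stroke) - 1)]
    return math.atan2(b[1] - a[1], b[0] - a[0])

def wedge_counts(points, center, reference_angle, n_wedges=8):
    """Count points per angular wedge, measured from reference_angle."""
    width = 2 * math.pi / n_wedges
    counts = [0] * n_wedges
    for x, y in points:
        if (x, y) == center:
            continue  # the center has no meaningful angle
        angle = math.atan2(y - center[1], x - center[0]) - reference_angle
        # Shift by half a wedge so the reference direction sits mid-wedge.
        angle = (angle + width / 2) % (2 * math.pi)
        counts[int(angle / width) % n_wedges] += 1
    return counts

def oriented_histograms(points, stroke, i):
    """Histograms aligned with the stroke direction and with its reverse."""
    theta = stroke_direction(stroke, i)
    forward = wedge_counts(points, stroke[i], theta)
    reverse = wedge_counts(points, stroke[i], theta + math.pi)
    return forward, reverse

# A horizontal stroke with one extra point above it, and the same
# drawing rotated by 90 degrees.
stroke = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)]
points = stroke + [(2, 3)]
rotated_stroke = [(0, 0), (0, 1), (0, 2), (0, 3), (0, 4)]
rotated_points = rotated_stroke + [(-3, 2)]

print(oriented_histograms(points, stroke, 2))
print(oriented_histograms(rotated_points, rotated_stroke, 2))
```

Both calls print the same pair of histograms, since the wedges are measured relative to the stroke rather than to the page. Drawing the stroke in the opposite order simply swaps the two histograms, which is why keeping both covers either drawing direction.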

The approach was evaluated by testing the isolated shape recognizer and the full sketch processor separately on sets of circuit symbols and shapes. The classifier was compared to one based on Zernike moments, and the proposed approach was found to outperform it.

Thoughts
The approach presented in this paper was very intriguing and raised some interesting questions about sketch recognition. Its most prominent feature is the use of visual properties for classification, as opposed to relying solely on stroke information or geometric properties. In effect it mixes offline and online properties: it builds point buckets for comparison while using temporal data to determine the direction of the strokes. The fact that it helps to eliminate noise is very useful, and it would be interesting to see whether this method performs better in some domains than in others. For example, the paper mentions that different match vector approaches work better for different domains, such as photographs vs. sketches.

There were also some questions about the method itself. Some parts of the paper seemed a bit vague, such as the resampling processes that were used. In addition, it would be interesting to compare the runtime of this system against other similar systems. Creating a new shape context feature every 5 pixels along the sketch strokes could require a large number of computations and slow down classification, so some data on this would be useful.
