Monday, 30 October 2017

Assignment 26: Reading 22 - Wolin ShortStraw

Bibliography:
Aaron Wolin, Brian Eoff, and Tracy Hammond. ShortStraw: a simple and effective corner finder for polylines. Proceedings of the Fifth Eurographics conference on Sketch-Based Interfaces and Modeling. pp. 33-40. 2008.

Summary:
This paper presents a simple corner-finding algorithm that is highly accurate on polylines. The authors cite its simplicity and high accuracy as the main reasons to use it, and they provide pseudocode for the algorithm in the appendix. ShortStraw involves resampling the stroke's points, computing a 'straw' distance for each point, and recognizing points with locally minimal straw distances as corners. Resampling is done at a fixed 'interspacing distance', computed as the length of the bounding-box diagonal divided by a constant; this constant was set to 40, determined empirically. After resampling, a straw distance is computed for every point using its neighbours a fixed number of steps before and after it. Corners tend to have small straw values.

After this initial pass, some higher-level processing identifies missing corners and removes false positives. To add missing corners, each pair of consecutive corners is checked to see whether the stroke between them forms a line; if not, the system looks for an additional corner in between, repeating until all consecutive corner pairs form lines. To eliminate false positives, each triplet of consecutive corners is checked for collinearity; if collinear, the middle corner is removed.

The algorithm was evaluated on two accuracy measures, 'correct corners found' and 'all-or-nothing' accuracy. ShortStraw scored highly on both compared to Sezgin's and Kim's corner finders.
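The resample-then-straw core can be sketched roughly as follows. This is a minimal illustration, not the authors' published pseudocode: the window size, the median-based threshold, and the local-minimum test are assumptions on my part, and the higher-level corner repair steps are omitted.

```python
import math

def shortstraw_corners(points, window=3, diagonal_divisor=40):
    """Sketch of the ShortStraw core idea: resample to evenly spaced
    points, compute each point's 'straw' (chord across a window of
    neighbours), and keep local minima below a threshold as corners."""
    # Interspacing distance: bounding-box diagonal / 40 (empirical constant).
    xs, ys = [p[0] for p in points], [p[1] for p in points]
    diagonal = math.hypot(max(xs) - min(xs), max(ys) - min(ys))
    spacing = diagonal / diagonal_divisor

    # Resample so consecutive points are ~spacing apart along the stroke.
    resampled = [points[0]]
    dist = 0.0
    for prev, cur in zip(points, points[1:]):
        d = math.hypot(cur[0] - prev[0], cur[1] - prev[1])
        while dist + d >= spacing and d > 0:
            t = (spacing - dist) / d
            nx = prev[0] + t * (cur[0] - prev[0])
            ny = prev[1] + t * (cur[1] - prev[1])
            resampled.append((nx, ny))
            prev, d, dist = (nx, ny), d - (spacing - dist), 0.0
        dist += d

    # Straw i = distance between the points `window` steps before/after i.
    straws = []
    for i in range(window, len(resampled) - window):
        a, b = resampled[i - window], resampled[i + window]
        straws.append(math.hypot(b[0] - a[0], b[1] - a[1]))
    threshold = sorted(straws)[len(straws) // 2] * 0.95  # median * 0.95

    # Local minima below the threshold are corner candidates.
    corners = []
    for j in range(1, len(straws) - 1):
        if straws[j] < threshold and straws[j] <= straws[j - 1] \
                and straws[j] <= straws[j + 1]:
            corners.append(resampled[j + window])
    return corners
```

On an L-shaped stroke, straight runs give long straws while points near the bend give short ones, so the bend falls out as a local straw minimum.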

Discussion:
I find ShortStraw to be a great example of simplicity and minimalism. It does one task and does it very well. It would be interesting to see ShortStraw used as a component in an algorithm such as Sezgin's, so that it can handle polylines in a more complex algorithm, and also be used as a feature for identifying complex shapes.

Assignment 25: Reading 21 - Yu Sketch

Bibliography:
Bo Yu and Shijie Cai. A Domain-Independent System for Sketch Recognition. Proceedings of the 1st international conference on Computer graphics and interactive techniques in Australasia and South East Asia. pp. 141-146. 2003.

Summary:
This paper describes a domain-independent, low-level sketch recognition system whose output is organized in a hierarchical structure containing semantic and syntactic information along with the raw data. The system handles curves and hybrid shapes as well as polylines. The authors aimed to solve two issues in existing systems: 1) allowing the user to sketch naturally, and 2) decomposing hybrid shapes and smooth curves into primitive shapes.

The system records the coordinates of the strokes along with time. The process is divided into two stages: imprecise stroke approximation and post-processing. Stroke approximation is done immediately (to lines, curves, circles, ellipses, or helixes), giving the user rapid feedback. Once the sketch is complete, post-processing produces a set of hierarchical outputs, the lowest level being the raw data and the highest a semantic-relations table.

The imprecise approximation is done by examining the direction and curvature graphs of the strokes, with feature area used as a guide during this phase. A notable feature of this work is that vertex identification is not a separate stage; it is combined with the approximation phase, allowing more information to be used. If a stroke cannot be approximated by a primitive shape, it is broken down at the point of highest curvature and each half is recursively checked (the process differs slightly for self-intersecting strokes). The system also provides some basic object recognition for objects common across all domains.
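The recursive breakdown can be sketched as below. This is my hedged reading of the idea, not the paper's algorithm: `fits_primitive` and `max_curvature_index` are hypothetical stand-ins for the paper's approximation and curvature tests, and the line-fit helpers here are illustrative (a chord-distance test rather than the paper's feature-area method).

```python
import math

def split_recursively(stroke, fits_primitive, max_curvature_index, min_len=5):
    """If a stroke cannot be approximated by a single primitive, split it
    at the point of highest curvature and recurse on both halves."""
    if len(stroke) < min_len or fits_primitive(stroke):
        return [stroke]
    k = max_curvature_index(stroke)
    k = max(1, min(k, len(stroke) - 2))  # keep both halves non-trivial
    return (split_recursively(stroke[:k + 1], fits_primitive,
                              max_curvature_index, min_len)
            + split_recursively(stroke[k:], fits_primitive,
                                max_curvature_index, min_len))

def fits_line(stroke, eps=1.0):
    """Illustrative primitive test: every point lies near the chord."""
    (x0, y0), (x1, y1) = stroke[0], stroke[-1]
    length = math.hypot(x1 - x0, y1 - y0) or 1.0
    return all(abs((y1 - y0) * (x - x0) - (x1 - x0) * (y - y0)) / length <= eps
               for x, y in stroke)

def farthest_from_chord(stroke):
    """Illustrative curvature proxy: point farthest from the chord."""
    (x0, y0), (x1, y1) = stroke[0], stroke[-1]
    length = math.hypot(x1 - x0, y1 - y0) or 1.0
    return max(range(len(stroke)),
               key=lambda i: abs((y1 - y0) * (stroke[i][0] - x0)
                                 - (x1 - x0) * (stroke[i][1] - y0)) / length)
```

Run on an L-shaped stroke, the top-level fit fails, the split lands at the bend, and both halves then pass the line test.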

The system was evaluated by ten randomly chosen students, all of whom found it very easy to use. It achieved 98% accuracy in identifying primitive shapes, about 70% on hybrid shapes with smooth curves, and 90% on other hybrid shapes.

Discussion:
The idea of recursively breaking down shapes and trying to create primitive approximations for them is very interesting. It's nice to see that curvature and direction data can get us a long way in primitive shape identification. One thing that was not clear to me was the rapid feedback. Won't the rapid feedback affect the flow of the user while sketching? If the system gets the recognition wrong, it's better to leave the original stroke as is (like LADDER does) than to recognize it incorrectly.

Wednesday, 25 October 2017

Assignment 24: Reading 20 - Sezgin Corner

Bibliography:
Tevfik Metin Sezgin, Thomas Stahovich, and Randall Davis. Sketch Based Interfaces: Early Processing for Sketch Understanding. Proceedings of the 2001 Workshop on Perceptive User Interfaces. pp. 1-8. 2001.

Summary:
The paper describes a method for early-stage processing of sketches. The motivation is to make sketching a natural way to communicate with computers, without having to deal with icons, menus, and tool selection. The system consists of three phases: 1) approximation, 2) beautification, and 3) recognition.

Approximation is done by first finding vertices, then detecting curved segments of the stroke. Vertices (or corners) are found using the direction, curvature, and speed data of the strokes. Extrema in curvature and speed are regarded as corners, but only after reliably removing extrema due to noise, which is done with average-based filtering: only values beyond a threshold are considered. For curvature, maxima are taken only above a certain threshold; similarly for speed, minima are taken only below a certain threshold. A hybrid fit of both speed and curvature is then used to select the final vertices from the candidate set obtained from the two kinds of extrema.
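The average-based filtering step can be sketched as follows. This is my reading of the idea, not the authors' pseudocode; in particular, using the signal mean as the threshold is an assumption (the paper derives its thresholds from the data).

```python
def filtered_extrema(values, mode="max"):
    """Average-based filtering sketch: keep only the regions of the
    signal beyond its mean and take one extremum per region. For
    curvature we keep maxima above the mean; for speed, minima below
    (implemented by negating and reusing the max path)."""
    vals = values if mode == "max" else [-v for v in values]
    mean = sum(vals) / len(vals)
    extrema, region = [], []
    for i, v in enumerate(vals):
        if v > mean:
            region.append(i)
        elif region:
            extrema.append(max(region, key=lambda j: vals[j]))
            region = []
    if region:
        extrema.append(max(region, key=lambda j: vals[j]))
    return extrema  # indices of candidate vertices
```

For example, a curvature signal with two isolated spikes yields exactly those two indices, while a speed signal's dip (a likely corner, since pen speed drops at corners) is returned in "min" mode.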

To detect curves, Bezier curves are fit to the polyline fits created by the hybrid model used for corner detection. For computational efficiency, the system uses a piecewise linear fit to the Bezier curve and computes the error of this approximation. If the error is above a given tolerance, the curve is subdivided in the middle and the process is repeated.

After approximation, the result is beautified to make it look as the user intended: the slopes of clustered line segments are weighted and averaged. Once beautification is done, basic objects are recognized from the low-level line segments and curves by template matching of simple figures.

The system was tested with 13 users, and had a high accuracy of 96 percent. Most users preferred using the system's interface over paper and tablets.

Discussion:
Identification of corners and curves (at corners) was something I was taking for granted in sketch recognition systems. The idea of using speed and curvature data for identifying vertices is pretty cool. I wonder how these methods compare to corner identification techniques from the computer vision domain. Is there a real advantage to using sketching information rather than just using imaging recognition techniques to do the same?

Assignment 23: Reading 19 - Herot Sketch

Bibliography:
Christopher F. Herot. Graphical input through machine recognition of sketches. Proceedings of the 3rd Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH '76). pp. 97-102. 1976.

Summary:
This paper talks about different approaches to inferring and identifying sketches. The author tries to explain the need for finding a balance between unwieldy fully automated systems and practical ones that rely heavily on the user. Three approaches to recognizing sketches are discussed.

The HUNCH system tries to identify sketches using only the syntax of drawing, without any semantic knowledge. The system is a collection of FORTRAN programs that identify sketches at different levels of interpretation. STRAIT is a program that identifies corners as a function of speed. CURVIT uses the output of STRAIT and identifies curves by fitting them to B-splines. Both STRAIT and CURVIT worked well for some users, but not all. The focus then shifted to improving STRAIT's latching algorithm by comparing candidate fits in 3D instead of 2D projections. Overtracing was another problem that had to be dealt with.

The next section focuses on using context to improve sketch recognition. The idea is that if some contextual information can be provided, it can be combined with the syntax of sketching to produce meaningful inferences about the sketch. This approach is top-down: starting from a contextual interpretation, moving down through the HUNCH system's interpretations, and looking for matches. The CONNIVER language was one of the earliest to try this approach, but it was very resource-heavy, which made it hard to construct the knowledge base.

Looking at the merits of the two approaches above, the author suggests that the field should move towards an 'interactive system' that relies on both automatic recognition and user interpretation. The system consists of a database and a set of programs to manipulate it. The database is dynamically updated as new interpretations are made. Three types of programs manipulate the database: 1) inference programs that interpret the data, 2) display programs that show the various interpretations on multiple devices, and 3) manipulation programs that allow the user to directly modify the database contents. A line/curve identifier (such as the one in HUNCH) built on this system does real-time recognition of sketches with context, but the user can also adjust the parameters of the programs in order to get more satisfactory results.

Discussion:
This paper gives a nice overview of why sketch recognition is hard without user intervention and contextual information. I think contextual information is key in sketch recognition systems, because we humans interpret similar sketches in very different ways based on context.

As seen in more recent papers, it seems research is moving towards interactive systems that combine user feedback with automated recognition.

Monday, 16 October 2017

Assignment 22: Reading 18 - Hammond LADDER

Bibliography
Tracy Hammond and Randall Davis. LADDER, A Sketching Language for User Interface Developers. Computers & Graphics. 29. 2005.

Summary
This paper discusses LADDER - a sketching language for user interface developers. The goal of the authors was to build a domain independent framework for developing sketch recognition interfaces, which can be used by developers to build a recognizer in their domain. The framework requires the developer to write a domain description, which is automatically translated into shape recognizers, editing recognizers and shape exhibitors, for the specific domain.

Shapes can be defined in LADDER using primitive shapes or previously defined shapes. The definition of a shape consists of components (the elements from which the shape is built), constraints (geometric constraints), aliases, editing behavior (like 'cross on arrow' for erase), and display methods (what to display when the object is recognized). This description is translated into shape recognizers, exhibitors, and editors that can be used in the domain's sketch interface. The system makes it easy to define shapes by providing abstract definitions for reusability and hierarchical definitions for simplicity. Shapes can be grouped using 'shape groups' to provide chaining effects while editing. LADDER also supports 'vectors' for defining shapes with arbitrary numbers of components (such as polylines).

One limitation of the system is that user-defined shapes must be described solely using the primitive constraints defined in LADDER, which makes it difficult to define shapes with a lot of irregularities. LADDER supports a vast array of constraints such as acute, obtuse, concentric, larger, near, and slope. Some constraints are defined for a given orientation, and the language also allows users to declare shapes rotationally invariant.

LADDER's correctness in identifying shapes depends heavily on getting the primitive shapes right. Once the primitives have been identified (there can be many), they are used by a higher-level recognizer (written as Jess rules) that matches the data against all shape combinations that satisfy the rules. The system may select the wrong higher-level shape, as a greedy approach is used to pick it (a trade-off made to maintain efficiency and real-time editing capability). A drawing gesture is short-circuited whenever an editing gesture is matched. Finally, a shape constraint solver is used to display a shape's ideal strokes.

LADDER domain descriptions have been tested in a variety of domains such as mechanical engineering, UML class diagrams, finite state machines, and flowcharts.

Discussion
LADDER is the first paper I have read that offers a full-blown generic sketch recognition system since GRANDMA (described in Rubine's paper). I particularly like the abstractions defined in LADDER, which make it very modular and reusable. I wonder if LADDER could be improved using feedback from its past decisions, through some kind of reinforcement learning. It would also be interesting to combine such a system with other recognition techniques that are not geometric.

Wednesday, 4 October 2017

Assignment 21: Reading 17 - Vatavu PDollar

Bibliography:
Radu-Daniel Vatavu, Lisa Anthony, and Jacob O. Wobbrock. Gestures as Point Clouds: A $P Recognizer for User Interface Prototypes. Proceedings of the ACM International Conference on Multimodal Interaction. 2012.

Summary:
This paper discusses $P, a new type of gesture recognition algorithm that extends the $ family of algorithms. The algorithm aims to support multistroke gestures (unlike $1 and Protractor), while avoiding the exponential complexity seen in $N and $N-protractor algorithms.

The $P algorithm removes the temporal information from a gesture and stores it as a point cloud, which is matched against the point cloud of each possible template gesture in the training data. Doing this matching efficiently is the key challenge. The problem can be converted into a form of the well-studied 'assignment problem' from graph theory. The Hungarian algorithm is the gold standard for assignment, but it is very slow despite being very accurate. The authors instead studied a set of 'Greedy-X' heuristics for template matching. The chosen greedy heuristic matches each point in the candidate with the closest unmatched point in the template cloud and combines the distances with weights. This was found to be the most efficient approach, with accuracy comparable to the Hungarian algorithm.
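The chosen greedy heuristic can be sketched as below. This is a simplified illustration of the idea, not the published $P pseudocode: the real algorithm also tries several starting points, matches in both directions, and assumes both clouds have been resampled to the same number of points (which this sketch assumes too).

```python
import math

def greedy_cloud_distance(candidate, template):
    """Pair each candidate point with its closest unmatched template
    point, weighting earlier (more confident) matches more heavily."""
    n = len(candidate)
    matched = [False] * len(template)
    total = 0.0
    for i, p in enumerate(candidate):
        best_j, best_d = -1, float("inf")
        for j, q in enumerate(template):
            if not matched[j]:
                d = math.hypot(p[0] - q[0], p[1] - q[1])
                if d < best_d:
                    best_j, best_d = j, d
        matched[best_j] = True
        weight = 1 - i / n  # later matches are forced, so count them less
        total += weight * best_d
    return total
```

A candidate matched against itself scores zero, and the score grows with how far the clouds are apart; classification just picks the template with the lowest score.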

$P was found to perform comparably to $1 on single strokes and better than the $N recognizer on multiple strokes. The authors also provide pseudocode for the algorithm.

Discussion:
For me, this is a surprisingly simple, elegant, and efficient algorithm for identifying gestures. The pseudocode provided by the authors makes it very accessible and easy for developers to implement in their own applications. Since the core of the algorithm is a greedy heuristic, I am curious what its failure cases are. As the paper's comparison table notes, the algorithm is not rotation invariant; I wonder if it can be combined with the polar-coordinate rotation invariance discussed in the previous paper.

Assignment 20: Reading 16 - Kara Symbol

Bibliography:
Levent Burak Kara and Thomas F. Stahovich. An Image-based, Trainable Symbol Recognizer for Hand-drawn Sketches. Computers and Graphics. Volume 29, Issue 4. August, 2005.

Summary:
The paper describes an image-based, trainable, multi-stroke, hand-drawn symbol recognizer. One advantage of an image-based recognizer is that it avoids the segmentation and feature-extraction problems commonly seen in recognizers that break a sketch into its constituent primitive lines and curves.

The recognizer can learn a new symbol definition from just one prototype image, allowing users to train the system with new symbols on the fly. The system consists of two main steps: 1) make the image rotation invariant by transforming it into polar coordinates and aligning it with the stored definitions; clearly dissimilar definitions can also be pruned at this stage. 2) Multiple classifiers analyze the remaining candidates using template matching and produce a list of definitions ranked by similarity. The template matching uses the modified Hausdorff distance, the Tanimoto similarity coefficient, and the Yule coefficient. Finally, the individual classification results are normalized and combined to make the final decision.
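The polar-coordinate screening step rests on a simple property, sketched below. This is my paraphrase of the idea, not the paper's procedure: in polar coordinates about the centroid, rotating a symbol only shifts the angle column, so the radius profile is unchanged and a rotation search becomes a cheap 1D shift search.

```python
import math

def to_polar(points):
    """Express each point as (radius, angle) about the centroid."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    return [(math.hypot(x - cx, y - cy), math.atan2(y - cy, x - cx))
            for x, y in points]

def rotate(points, theta):
    """Rotate points about the origin by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [(x * c - y * s, x * s + y * c) for x, y in points]
```

Because the radii survive rotation untouched, two symbols whose radius profiles differ a lot can be pruned before any expensive template matching.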

A user study was conducted and it was found that the system accurately classifies symbols with only a small amount of training data and works well in both user-dependent and independent studies.

Discussion:
This is the first time I am reading about a recognizer that tries to identify the entire image rather than its constituent primitives. While this works great for specific applications and as an independent system, I wonder if it can be combined with other systems to form a higher level system. That way, we can use the output of this system as a feature for another system.

Assignment 19: Reading 15 - Hammond Tahuti

Bibliography:
Tracy Hammond and Randall Davis. Tahuti: A Geometrical Sketch Recognition System for UML Class Diagrams. SIGGRAPH '06. 2006.

Summary:
The paper discusses Tahuti, a system that recognizes UML diagrams and thereby helps user sketches be leveraged in actual coding. A few systems existed for recognizing UML diagrams, such as Rational Rose and Ideogramic UML, but they were not very flexible in combining user sketches with recognition techniques. Other systems, such as ones based on Rubine's features, required training.

Tahuti is a multi-layer sketch recognition framework that leverages the geometric properties of sketches to recognize them. The system has four main steps, namely preprocessing, selection, recognition, and identification. Preprocessing produces a set of possible interpretations of the stroke data, each with a probability. Strokes are grouped into collections (and also pruned) using temporal and spatial data; experimentally, nine strokes was found to be a good threshold for a collection. The recognizer interprets strokes as either a 'viewable object' or an editing command. The eight possible viewable objects are general class, interface class, inheritance association, aggregation association, dependency association, interface association, text, and a collection of unrecognized strokes. The identification stage selects the final interpretation based on a set of rules that account for object movement, number of strokes, and correctness probability.

Experiments showed that users preferred Tahuti over Paint programs and UML tools such as Rational Rose as it combined the ease of drawing with ease of editing UML.

Discussion:
Though this paper was written in 2002, it still seems to be the state of the art in UML recognition tools! Having worked with UML diagrams and whiteboard drawing, I would love to use a tool like Tahuti for design discussions in a team. I wonder how one could combine this with gesture recognition techniques and maybe create a software project scaffold straight from UML? That would be interesting...

Monday, 2 October 2017

Assignment 18: Reading 14 - Bhat Entropy

Bibliography:
Akshay Bhat and Tracy Anne Hammond. Using Entropy to Identify Shape and Text in Hand Drawn Diagrams. Proceedings of the Twenty-First International Joint Conference on Artificial Intelligence. 2009.

Summary:
This paper focuses on differentiating between text and shapes in sketches. The motivation is that most existing sketch recognition systems perform well on either text or shapes, but not both. The paper proposes using entropy rate (specifically, the zero-order entropy rate) as the distinguishing feature.

Previously, shape vs. text classification was done using collections of specific features found by trial and error. This was very time-consuming, which motivated the authors to find a single, logically coherent feature. They observed that text has higher entropy than shapes when both are represented by a general set of coordinate equations, which led them to define an entropy measure over digital ink. First, a model alphabet is defined as a set of 7 symbols, each corresponding to a range of angles between 0 and pi. Each point in a stroke is assigned a symbol based on the angle it makes with its neighbouring points. 'Zero-order' refers to treating consecutive symbols as independent when determining their probabilities.
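The alphabet-then-entropy computation can be sketched as below. This is a minimal illustration of the stated idea, with assumed details: the paper's exact angle binning, resampling, and probability estimation differ.

```python
import math

def stroke_entropy(points, bins=7):
    """Map the angle at each interior point to one of 7 symbols, then
    compute the Shannon entropy of the symbol frequencies."""
    symbols = []
    for a, b, c in zip(points, points[1:], points[2:]):
        v1 = (a[0] - b[0], a[1] - b[1])   # vector to previous point
        v2 = (c[0] - b[0], c[1] - b[1])   # vector to next point
        dot = v1[0] * v2[0] + v1[1] * v2[1]
        n1 = math.hypot(*v1) or 1e-12
        n2 = math.hypot(*v2) or 1e-12
        angle = math.acos(max(-1.0, min(1.0, dot / (n1 * n2))))  # in [0, pi]
        symbols.append(min(int(angle / math.pi * bins), bins - 1))
    n = len(symbols)
    counts = {s: symbols.count(s) for s in set(symbols)}
    return -sum(c / n * math.log2(c / n) for c in counts.values())
```

A straight stroke uses a single symbol (entropy 0), while a wiggly, text-like stroke spreads over many symbols and scores high, which is exactly the separation the classifier exploits.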

User strokes were grouped based on thresholds over time and spatial coordinates. After resampling and preprocessing, the alphabet model was applied and the entropy of each group was calculated. The system classified the groups as text or shape, leaving those with entropy values near the boundary unclassified. The system achieved an accuracy of over 90 percent.

Discussion:
Entropy-based classification for text vs. shape seems very effective. The confidence value emitted by the system also makes it usable in other, higher-level systems (similar to the Paulson features paper). The alphabet model, however, seems fairly simple with only 7 symbols. I wonder how this would scale with a more complex alphabet and higher-level, non-primitive shapes.

Assignment 17: Reading 13 - Paulson PaleoSketch

Bibliography:
Brandon Paulson and Tracy Hammond. PaleoSketch: Accurate Primitive Sketch Recognition and Beautification. Proceedings of the 13th international conference on Intelligent user interfaces. 2008

Summary:
The paper presents a system - PaleoSketch, that can accurately identify eight low level primitive shapes (line, polyline, ellipse, circle, arc, curve, spiral and helix) and hierarchically construct higher-level shapes based on geometric constraints defined by the user. Additionally, it also beautifies the sketch, thus providing early feedback.

The general architecture of the system consists of three steps: pre-recognition, shape tests, and hierarchical interpretation. In addition to resampling and computing corner, speed, and direction metrics, pre-recognition calculates two new features: the normalized distance between direction extremes (NDDE) and the direction change ratio (DCR). These metrics are at the heart of the various shape tests. NDDE is computed by taking the stroke distance between the points with the highest and lowest direction values, normalized by the total stroke length. DCR is calculated by dividing the maximum change in direction by the average change in direction. Before performing the shape tests, the system also determines whether the stroke is overtraced and whether it is closed.
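The two features can be sketched as below. This is my implementation of the stated definitions, not the authors' code; in particular, taking segment angles as the direction signal and ignoring angle wrap-around are simplifying assumptions.

```python
import math

def ndde_dcr(points):
    """Compute NDDE and DCR for a stroke given as (x, y) points.
    Direction is the angle of each segment between consecutive points."""
    dirs = [math.atan2(b[1] - a[1], b[0] - a[0])
            for a, b in zip(points, points[1:])]
    seg_lens = [math.hypot(b[0] - a[0], b[1] - a[1])
                for a, b in zip(points, points[1:])]
    total_len = sum(seg_lens)

    # NDDE: stroke length between the direction extrema / total length.
    i_max = max(range(len(dirs)), key=lambda i: dirs[i])
    i_min = min(range(len(dirs)), key=lambda i: dirs[i])
    lo, hi = sorted((i_min, i_max))
    ndde = sum(seg_lens[lo:hi + 1]) / total_len

    # DCR: max direction change / average direction change.
    changes = [abs(d2 - d1) for d1, d2 in zip(dirs, dirs[1:])]
    dcr = max(changes) / ((sum(changes) / len(changes)) or 1e-12)
    return ndde, dcr
```

The intuition: an arc changes direction steadily, so its DCR stays near 1 and its NDDE near 1, while a polyline concentrates all its direction change at a few corners, giving a much higher DCR.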

A shape is classified as complex when none of the shape tests pass. The final part of the system, the hierarchy function, sorts the results in order of best fit. The system performed well on the given dataset and is comparable to the best low-level recognizers, which don't handle as many shapes.

Discussion:
PaleoSketch is a great example of a geometry-based recognizer, and shows how far a constraint-based sketch recognition system can go. It was nice to see simple metrics such as NDDE and DCR be so effective at identifying primitive shapes. I wonder how the authors arrived at the various thresholds used in the paper (mainly during preprocessing, when checking for overtracing and closure), as those seem crucial to the performance of the shape tests.

Assignment 16: Reading 12 - Paulson Features

Bibliography:
Brandon Paulson, Pankaj Rajan, Pedro Davalos, Ricardo Gutierrez-Osuna, and Tracy Hammond. What!?! No Rubine Features?: Using Geometric-based Features to Produce Normalized Confidence Values for Sketch Recognition. VL/HCC Workshop: Sketch Tools for Diagramming. 2008.

Summary:
This paper presents a hybrid sketch recognition system for recognizing lower-level primitives that combines features from both gesture and geometric recognition systems. The system aims to make recognition user independent and to allow interpretations with normalized confidence values.

The system uses a quadratic classifier over both geometric and gesture features. The feature set consists of 44 features: 31 geometric and 13 of Rubine's gesture-based features. Feature subset selection was done using a greedy, sequential forward selection (SFS) technique. Using this subset of features, the accuracy achieved was comparable to that of the PaleoSketch system.
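Sequential forward selection can be sketched generically as below. This is the textbook greedy procedure, not the paper's code: in the paper the scoring function is the classifier's accuracy, while here `evaluate` is a caller-supplied stand-in, and stopping after `k` features (rather than when the score plateaus) is a simplification.

```python
def sfs(features, evaluate, k):
    """Greedily add the feature that most improves the score of the
    selected subset, until k features have been chosen."""
    selected = []
    remaining = list(features)
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda f: evaluate(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected
```

With an additive toy score, the greedy choice simply picks features in decreasing order of individual value; with a real classifier, interactions between features make the order less obvious, which is why the procedure re-evaluates the whole subset at each step.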

The system is easier to code and performs faster classification than PaleoSketch, as fewer features need to be computed. Fourteen of the 31 geometric features were present in the optimal subset, including NDDE and DCR (see PaleoSketch), the least-squares line error, and the major-minor axis ratio. Only total rotation was chosen from the gesture-based features, mainly because the data was chosen to be user independent.

Discussion:
It was nice to see that geometric constraints play an important role in user-independent recognition systems. While gesture recognition has its place in the field, I see geometric recognition as a key player in bringing sketch recognition to mainstream mobile platforms. Thinking of the app developers who will be building on these gesture-based devices, it makes more sense for them to use recognition engines that are user independent.