Monday, 27 November 2017

Assignment 31: Reading 27 - LaViola MathPad2

Bibliography:
Joseph J. LaViola, Jr. and Robert C. Zeleznik. 2007. MathPad2: a system for the creation and exploration of mathematical sketches. In ACM SIGGRAPH 2007 courses (SIGGRAPH '07). ACM, New York, NY, USA, Article 46

Summary:
This paper introduces MathPad2, a prototype application for creating mathematical sketches. The application consists of a user interface, a sketch parser, and an animation engine.

The interface is a simple sketch pad that mimics paper, on which a user can draw freely. To prevent accidental gestures and to ease use, special gestures for erasing (such as a scribble) and other functions were defined, and users found these gestures easy to learn. The interface also supports drawing diagrams, though it does not recognize them. The application allows associations to be made between mathematical expressions and sketches.

In addition, MathPad2 supports computational functions for graphing, solving, simplifying, and factoring. The sketch parser consists of mathematical expression recognition, an association inferencing system, and drawing dimension definition and rectification. Finally, MathPad2 includes an animation system that animates any part of a sketch that is animatable. Though the system currently supports only closed-form expressions, the authors believe it can be extended into a powerful tool for formulating and visualizing mathematical concepts.

Discussion:
The system introduced here solves a fairly complex problem, one for which pen and paper still dominate. At a high level, the system aims to handle freehand sketch recognition for mathematical symbols, graphs, and text! It will be interesting to see how it works with multiple users and erratic sketches. Also, it is not clear how the recognition is performed or what algorithms are used. However, it does seem to work well on the given examples, and looks quite neat.

Assignment 30: Reading 26 - Dixon iCanDraw

Bibliography:
Daniel Dixon, Manoj Prasad, and Tracy Hammond. 2010. iCanDraw: using sketch recognition and corrective feedback to assist a user in drawing human faces. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '10). ACM, New York, NY, USA, 897-906.

Summary:
The paper describes a proof-of-concept application that provides step-by-step instructions and generated feedback to guide a person in sketching human faces. In order to provide helpful assistance to users, nine design principles were developed.

First, the paper elaborates on a popular theory of human visual perception based on left- versus right-brain modes. Based on this theory and present teaching methods, a step-by-step corrective teaching model was developed. The user interface consists of a drawing area, a reference image to draw, and an area that provides instructions to the user. The user can manually ask for feedback by pressing a button, and the application provides both textual and visual feedback.

Boundaries were added in later versions to help users manage facial features better. Among the design principles developed were: the accuracy of the master template is important; R-mode is important for a user's visual perception; feedback should be given only when asked for; corrective feedback must be clear and constant; 'erased' strokes should remain temporarily visible as a form of corrective feedback; freehand sketching is important; corrective feedback should adapt to mature sketches; and the application should be mindful of artistic affordances.

Five participants tested the application; while drawing a human face initially felt overwhelming, they found the step-by-step instructions useful.

Discussion:
The principles from the paper give me confidence to draw better and to see drawing as a skill that can be learned. Certainly, an application like this, which provides feedback and step-by-step instructions, would be very useful. It will be interesting to see this applied to shapes less complex than human faces.

Monday, 20 November 2017

Assignment 29: Reading 25 - Sharon Constellation

Bibliography:
D. Sharon and M. van de Panne. 2006. Constellation Models for Sketch Recognition. In EUROGRAPHICS Workshop on Sketch-Based Interfaces and Modeling (2006).

Summary:
This paper discusses an application of constellation models to build probabilistic models of object sketches from multiple example drawings. These models are then used to estimate the most likely part labels for a sketch. The constellation model described in this paper is designed to capture the structure of a particular class of object and is based on local features of each part together with pairwise features, such as distances to other parts.

The probabilistic model is first learned from a set of labelled sketches. The recognition algorithm then determines a maximum-likelihood labelling for an unlabelled sketch using a branch-and-bound search. The method allows considerable variability in the way sketches are drawn, but it does make two assumptions: 1) similar parts are drawn with similar strokes, and 2) mandatory parts of an object are drawn exactly once.

The constellation model uses two main kinds of features: 1) individual object part features and 2) pairwise features. To keep the model efficient, pairwise features are calculated only from mandatory individual parts. From the training examples, an object model is learned as a Gaussian with a diagonal covariance matrix. The quality of a particular matching is measured by a cost function, and a maximum-likelihood search finds the most plausible match. The search over all possible label assignments is carried out with a branch-and-bound search tree, whose branches are bounded using multipass thresholding. If no label assignment is found, the process is repeated with a weaker threshold until a match is found.
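To make the matching step concrete, here is a minimal Python sketch of a constellation-style labelling cost, assuming Gaussian part and pairwise models with diagonal covariance. The data structures and names are hypothetical, and the exhaustive search below is a stand-in for the paper's branch-and-bound with multipass thresholding:

```python
import itertools
import numpy as np

def gaussian_nll(x, mean, var):
    """Negative log-likelihood under a diagonal-covariance Gaussian."""
    x, mean, var = (np.asarray(a, dtype=float) for a in (x, mean, var))
    return 0.5 * float(np.sum((x - mean) ** 2 / var + np.log(2 * np.pi * var)))

def labelling_cost(strokes, labelling, part_models, pair_models):
    """Cost of assigning each stroke a part label (lower = more plausible)."""
    cost = sum(gaussian_nll(strokes[s], *part_models[p])
               for s, p in labelling.items())
    # Pairwise feature: distance between stroke feature vectors,
    # scored only for part pairs present in the model.
    for a, b in itertools.combinations(sorted(labelling), 2):
        pair = (labelling[a], labelling[b])
        if pair in pair_models:
            d = np.linalg.norm(np.asarray(strokes[a], dtype=float)
                               - np.asarray(strokes[b], dtype=float))
            cost += gaussian_nll([d], *pair_models[pair])
    return cost

def best_labelling(strokes, parts, part_models, pair_models):
    """Exhaustive maximum-likelihood search (stand-in for branch and bound)."""
    sids = sorted(strokes)
    return min((dict(zip(sids, assign))
                for assign in itertools.product(parts, repeat=len(sids))),
               key=lambda lab: labelling_cost(strokes, lab,
                                              part_models, pair_models))
```

The exhaustive search is exponential in the number of strokes, which is exactly why the paper prunes it with a branch-and-bound tree and multipass thresholding.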

The method was tested on five classes of objects, with 20-60 training examples each. Recognition time was under 2.5 seconds, with most of the time spent on initialization, and multipass thresholding significantly reduced the computation time.

Discussion:
Constellation models seem like a nice way to identify domain-specific complex shapes. I like that they do not depend on the smaller basic shapes that many gesture-based methods require. I am not sure how well the method would work when shapes are very similar. It will be interesting to see how the thresholds play out.

Wednesday, 8 November 2017

Assignment 28: Reading 24 - Rabiner HMM

Bibliography:
Lawrence R. Rabiner. A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. Proceedings of the IEEE, Vol. 77, No. 2. pp. 257-286. 1989.

Summary:
This paper discusses applications of hidden Markov models in the area of speech recognition. Signal models let us learn a great deal about real-world signal sources that are hard or expensive to capture and measure directly. These models broadly fall into deterministic and stochastic categories; this paper covers a specific type of stochastic model, the hidden Markov model.

A Markov model is a probabilistic model in which the current state depends only on a finite set of previous states. This allows us to define a state machine with transition probabilities between states. A hidden Markov model extends this by making the observation a probabilistic function of the state. The author illustrates this with "a simple coin tossing model" and "the urn and ball model".
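The coin-tossing idea can be sketched in a few lines of Python; the two-coin transition and emission probabilities below are made-up illustration values, not numbers from the paper:

```python
import random

# Hidden state: which (possibly biased) coin is currently being tossed.
# Observation: only the outcome, H or T -- the coin itself stays hidden.
TRANSITIONS = {"fair":   {"fair": 0.9, "biased": 0.1},
               "biased": {"fair": 0.2, "biased": 0.8}}
EMISSIONS   = {"fair":   {"H": 0.5, "T": 0.5},
               "biased": {"H": 0.9, "T": 0.1}}

def sample_sequence(n, start="fair", seed=0):
    """Sample n (hidden state, observation) pairs from the model."""
    rng = random.Random(seed)
    state, states, observations = start, [], []
    for _ in range(n):
        states.append(state)
        observations.append(rng.choices(list(EMISSIONS[state]),
                                        weights=list(EMISSIONS[state].values()))[0])
        state = rng.choices(list(TRANSITIONS[state]),
                            weights=list(TRANSITIONS[state].values()))[0]
    return states, observations
```

An observer sees only the H/T sequence; inferring which coin produced each toss is exactly the "hidden" part of the model.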

HMMs help us solve three canonical problems: 1) given an observation sequence and a model, how do we efficiently compute the probability of the observations given the model; 2) how do we choose a state sequence that optimally explains the observations; 3) how do we adjust the model parameters to maximize the probability of the observations given the model.
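The first problem is classically solved with the forward algorithm. A minimal sketch, using plain dictionaries for the model parameters (the names are mine, not Rabiner's notation):

```python
def forward(obs, states, start_p, trans_p, emit_p):
    """Forward algorithm: probability of the observation sequence
    given the model, summed over all hidden state paths."""
    # alpha[s] = P(observations so far, current state = s)
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    for o in obs[1:]:
        alpha = {s2: sum(alpha[s1] * trans_p[s1][s2] for s1 in states)
                     * emit_p[s2][o]
                 for s2 in states}
    return sum(alpha.values())
```

Each step costs O(N^2) for N states, versus the O(N^T) cost of enumerating every length-T state path directly, which is what makes evaluation tractable.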

Discussion:
A Markov process is a nice way to simplify Bayesian systems (or the theorem of total probability) when the current state depends only on a finite set of past states. It's interesting that HMMs take this one step further and let us predict the states when we can only observe some 'effect' of being in a state. This also shows the importance of identifying the direction of cause and effect correctly before modelling a system as an HMM.

Assignment 27: Reading 23 - Sezgin HMM

Bibliography:
Tevfik Metin Sezgin and Randall Davis. HMM-Based Efficient Sketch Recognition. Proceedings of the 10th international conference on Intelligent user interfaces. pp. 281-283. 2005.

Summary:
This paper discusses using hidden Markov models to recognize sketches. The technique applies when sketch data is collected incrementally as (x, y, time) coordinates, in contrast to traditional methods that treat sketches as images.

The method presented in the paper models the sketching style of individual users: rather than a generalized recognition system, the recognizer works well for a user with a specific style. The authors' user studies found that individual sketching styles persist across sketches. This structure was captured using hidden Markov models, which were trained on input sketch data partitioned by length, with the formulation done using graphs.

The system was evaluated in two parts: 1) evaluating the HMMs on real data, and 2) comparing the algorithm's performance to a baseline method. The HMMs achieved high accuracy, and performance improved with more training data. The baseline was a feature-based pattern-matching system without ordering information. The HMM-based system was found to scale well in running time as the number of objects in the scene increased.

Discussion:
It's interesting that the sketching style of users persists across sketches. This makes it possible to train feature-based recognizers on each user's sketches separately. One issue I see with this method is that it is hard for newly learned information to be relayed back to the system. (I think Dr. Hammond mentioned this in class.)