Monday, 27 November 2017

Assignment 31: Reading 27 - LaViola MathPad2

Bibliography:
Joseph J. LaViola, Jr. and Robert C. Zeleznik. 2007. MathPad2: a system for the creation and exploration of mathematical sketches. In ACM SIGGRAPH 2007 courses (SIGGRAPH '07). ACM, New York, NY, USA, Article 46

Summary:
This paper introduces MathPad2, a prototype application for creating mathematical sketches. The application consists of a user interface, a sketch parser and an animation engine.

The interface is a simple sketch pad that mimics paper, on which a user can draw freely. To prevent erratic gestures and to ease use, special gestures were defined for erasing (such as a scribble) and other functions, and users found these gestures easy to learn. The interface also supports drawing diagrams, though it does not recognize them; instead, the application allows associations to be made between mathematical expressions and diagrams.

In addition, MathPad2 supports computational functions for graphing, solving, simplifying and factoring. The sketch parser consists of mathematical expression recognition, an association inferencing system, and drawing dimension definition and rectification. Finally, MathPad2 includes an animation system that animates any part of a sketch that is animatable. Though the system currently supports only closed-form expressions, the authors believe it can be extended into a powerful tool for formulating and visualizing mathematical concepts.

Discussion:
The system introduced here solves a fairly complex problem, one for which pen and paper still dominate. At a high level, it aims to handle freehand sketch recognition for mathematical symbols, graphs and text! It'll be interesting to see how it works with multiple users and erratic sketches. It is also not clear how the recognition is performed or what algorithms are used. However, it does seem to work well for the given examples, and looks quite neat.

Assignment 30: Reading 26 - Dixon iCanDraw

Bibliography:
Daniel Dixon, Manoj Prasad, and Tracy Hammond. 2010. iCanDraw: using sketch recognition and corrective feedback to assist a user in drawing human faces. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '10). ACM, New York, NY, USA, 897-906.

Summary:
The paper describes a proof-of-concept application that provides step-by-step instructions and generated feedback to guide a person in sketching human faces. To provide helpful assistance to users, nine design principles were developed.

First, the paper elaborates on a popular theory of human visual perception based on left-brain vs. right-brain processing. Based on this observation and present teaching methods, a step-by-step corrective teaching model was developed. The user interface consists of a drawing area, a reference image to draw, and an area that provides instructions to the user. The user can manually ask for feedback by pressing a button. The application provides both textual and visual feedback.

Boundaries were added in later versions to help users manage the facial features better. The design principles developed include: the accuracy of the master template is important; R-mode is important for a user's visual perception; feedback should be given only when asked for; corrective feedback must be clear and constant; 'erased' strokes should remain temporarily visible as a form of corrective feedback; freehand sketching is important; corrective feedback should adapt to mature sketches; and the application should be mindful of artistic affordances.

Five participants tested the application; they felt overwhelmed when drawing a human face, but found the step-by-step instructions useful.

Discussion:
The principles from the paper give me confidence to draw better and to see drawing as a skill that can be learned. Certainly, an application like this, which provides feedback and step-by-step instructions, would be very useful. It'll be interesting to see this applied to shapes less complex than human faces.

Monday, 20 November 2017

Assignment 29: Reading 25 - Sharon Constellation

Bibliography:
D. Sharon and M. van de Panne. Constellation Models for Sketch Recognition. EUROGRAPHICS Workshop on Sketch-Based Interfaces and Modeling (2006).

Summary:
This paper discusses an application of constellation models to develop probabilistic models for object sketches, based on multiple example drawings. These models are applied to estimate 'most-likely' labels for a sketch. The constellation model described in this paper is designed to capture the structure of a particular class of object and is based on local features and pairwise features, such as distances to other parts.

The probabilistic model is first learned from a set of labelled sketches. The recognition algorithm then determines a maximum-likelihood labelling for an unlabelled sketch using a branch-and-bound algorithm. The method described in the paper allows considerable variability in the way sketches are drawn. However, the algorithm does make two assumptions about how sketches are drawn: 1) similar parts are drawn with similar strokes, and 2) mandatory parts of an object are drawn exactly once.

The constellation model consists of two main feature vectors: 1) individual object-part features and 2) pairwise features. To keep the model efficient, pairwise features are calculated only from mandatory individual features. From the training examples, an object model is learned as a diagonal covariance matrix. The quality of a particular matching is measured using a cost function, and a maximum-likelihood search is performed to find the most plausible match. The search over all possible label assignments is carried out by a branch-and-bound search tree, and branches are bound using multipass thresholding. Upon failure to find a label assignment, the process is repeated with a weaker threshold until a match is found.
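To make the cost idea concrete, here is my own minimal sketch (not the paper's exact formulation): with diagonal-covariance Gaussians learned per label, the cost of a candidate label assignment can be taken as the summed negative log-likelihood of the observed features. The 'wing' label and its feature values below are hypothetical.

```python
import math

def nll(x, mean, var):
    """Negative log-likelihood of feature vector x under a diagonal Gaussian."""
    return sum(0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def labeling_cost(assignment, model):
    """Cost of one candidate label assignment.
    assignment: {label: observed feature vector}
    model:      {label: (mean vector, variance vector)} learned from examples."""
    return sum(nll(x, *model[label]) for label, x in assignment.items())

# Toy model with one part described by (length, angle) features.
model = {'wing': ([2.0, 0.5], [0.1, 0.05])}
print(labeling_cost({'wing': [2.1, 0.4]}, model))
```

A branch-and-bound search would extend partial assignments part by part and prune any branch whose accumulated cost already exceeds the current threshold.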

The method was tested on 5 classes of objects, with 20-60 training examples each. The recognition time was found to be under 2.5 seconds, with most of the time spent on initialization. The multipass thresholding significantly reduced the computation time.

Discussion:
Constellation models seem like a nice way to identify complex, domain-specific shapes. I like the fact that they do not depend on the smaller basic shapes that a lot of gesture-based methods usually require. I am not sure how well this method would work when shapes are very similar. It'll be interesting to see how the thresholds play out.

Wednesday, 8 November 2017

Assignment 28: Reading 24 - Rabiner HMM

Bibliography:
Lawrence R. Rabiner. A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. Proceedings of the IEEE, Vol. 77, No. 2. pp. 257-286. 1989.

Summary:
This paper discusses the applications of hidden Markov models in the area of speech recognition. Signal models allow us to learn a great deal about real-world signal sources that are hard or expensive to capture and measure. These models broadly fall into the categories of deterministic and stochastic models. This paper covers applications of a specific type of stochastic model, the hidden Markov model.

A Markov model is a type of Bayesian model in which the current state depends only on a finite set of previous states. This allows us to define a state machine with transition probabilities between states. A hidden Markov model extends the Markov model so that the observation is a probabilistic function of the state. The author explains this with two examples: a simple coin-tossing model and the urn-and-ball model.

HMMs help us solve three types of problems: 1) Given an observation sequence and a model, how do we efficiently compute the probability of the observations given the model? 2) How do we choose a state sequence that optimally explains the observations? 3) How do we adjust the model parameters to maximize the probability of the observations given the model?
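Problem 1 is solved with the forward algorithm from the tutorial. A minimal sketch, using a toy two-state model whose numbers are made up for illustration:

```python
def forward(A, B, pi, obs):
    """Return P(obs | model) for an HMM with transition matrix A,
    emission matrix B and initial state distribution pi."""
    n = len(pi)                                       # number of hidden states
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]  # initialization step
    for t in range(1, len(obs)):                      # induction over the sequence
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                 for j in range(n)]
    return sum(alpha)                                 # termination step

A = [[0.7, 0.3], [0.4, 0.6]]   # state transition probabilities
B = [[0.9, 0.1], [0.2, 0.8]]   # observation probabilities per state
pi = [0.5, 0.5]                # initial state distribution
print(forward(A, B, pi, [0, 1, 0]))
```

The point of the forward recursion is that it computes the sum over all state paths in O(n²T) time instead of enumerating the exponentially many paths.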

Discussion:
A Markov process is a nice way to simplify Bayesian systems (or the theorem of total probability) when the current state depends only on a finite set of past states. It's interesting to see that HMMs take this one step further and let us make predictions about the states when we can only observe some 'effect' of being in a state. This also shows the importance of identifying the cause-and-effect direction correctly before modelling a system as an HMM.

Assignment 27: Reading 23 - Segzin HMM

Bibliography:
Tevfik Metin Sezgin and Randall Davis. HMM-Based Efficient Sketch Recognition. Proceedings of the 10th international conference on Intelligent user interfaces. pp. 281-283. 2005.

Summary:
This paper talks about using hidden Markov models to recognize sketches. The technique applies when sketch data is collected incrementally as (x, y, time) coordinates, unlike traditional methods that treat sketches as images.

The method presented in the paper identifies the sketching style of individual users. Thus, rather than being a generalized recognition system, the recognizer works well for a user with a specific style. From the authors' user studies, it was found that individual sketching styles persist across sketches. This structure was captured using hidden Markov models, which were trained on input sketch data partitioned by length; the formulation was done using graphs.

The system was evaluated in two parts: 1) evaluating the HMMs with real data, and 2) comparing the performance of the algorithm to a baseline method. The HMMs were found to have high accuracy, and performance improved with more training data. The baseline used for comparison was a feature-based pattern-matching system without ordering information. The HMM-based system was found to scale well (in time) as the number of objects in the scene increased.

Discussion:
It's interesting to see that users' sketching styles persist across sketches. This makes it possible to train feature-based techniques for each user separately. One issue I see with this method is that it's hard for newly learned information to be relayed back to the system. (I think Dr. Hammond mentioned this in class.)

Monday, 30 October 2017

Assignment 26: Reading 22 - Wolin ShortStraw

Bibliography:
Aaron Wolin, Brian Eoff, and Tracy Hammond. ShortStraw: a simple and effective corner finder for polylines. Proceedings of the Fifth Eurographics conference on Sketch-Based Interfaces and Modeling. pp. 33-40. 2008.

Summary:
This paper explains a simple corner-finding algorithm that is highly accurate for polyline corner finding. The authors cite its simplicity and high accuracy as motivating reasons to use it, and they provide pseudocode for the algorithm in the appendix.

ShortStraw involves resampling the points, finding the 'straw' distance for each point, and recognizing points with minimum straw distances as corners. The resampling is based on an 'interspacing distance' between points, found by dividing the length of the bounding-box diagonal by a constant. This constant was set to 40, determined empirically. After resampling, a 'straw' distance is computed for every point from its neighbours; corners tend to have a small straw distance.

After this initial pass to find corners, some higher-level processing is done to identify missing corners and remove false positives. To add missing corners, consecutive corners are checked to see if they form a line; if not, the system looks for an additional corner in between, repeating until all consecutive corners form lines. To eliminate false positives, three consecutive corners are checked for collinearity; if they are collinear, the middle one is removed.

The algorithm was tested on two accuracy measures, 'correct corners found' and 'all-or-nothing' accuracy. ShortStraw scored highly on both, compared to Sezgin's and Kim's corner finders.
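The core straw computation is small enough to sketch directly. This is my minimal version (the window size of 3 and the 0.95 × median threshold follow the paper; resampling is assumed to have been done already, and I treat the stroke endpoints as corners as the paper does):

```python
import math

def straws(points, w=3):
    """Straw of point i = segment length between its w-th neighbours."""
    return [math.hypot(points[i + w][0] - points[i - w][0],
                       points[i + w][1] - points[i - w][1])
            for i in range(w, len(points) - w)]

def shortstraw_corners(points, w=3):
    s = straws(points, w)
    t = 0.95 * sorted(s)[len(s) // 2]   # threshold: 0.95 * median straw length
    corners = [0]                       # endpoints always count as corners
    for j, v in enumerate(s):
        # a local minimum below the threshold marks a corner
        if v < t and v == min(s[max(0, j - w):j + w + 1]):
            corners.append(j + w)
    corners.append(len(points) - 1)
    return corners

# An 'L' shape resampled at unit spacing: the corner is at index 10.
pts = [(x, 0) for x in range(11)] + [(10, y) for y in range(1, 11)]
print(shortstraw_corners(pts))   # -> [0, 10, 20]
```

The higher-level line-collinearity passes from the paper are omitted here; this is just the bottom-up straw stage.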

Discussion:
I find ShortStraw to be a great example of simplicity and minimalism. It does one task and does it very well. It would be interesting to see ShortStraw used as a component in an algorithm such as Sezgin's, so that it can handle polylines in a more complex algorithm, and also be used as a feature for identifying complex shapes.

Assignment 25: Reading 21 - Yu Sketch

Bibliography:
Bo Yu and Shijie Cai. A Domain-Independent System for Sketch Recognition. Proceedings of the 1st international conference on Computer graphics and interactive techniques in Australasia and South East Asia. pp. 141-146. 2003.

Summary:
This paper talks about a domain-independent low-level sketch recognition system whose output is organized in a hierarchical structure, with semantic and syntactic information alongside the raw data. The system handles curves and hybrid shapes as well as it handles polylines. The authors aimed to solve two issues in existing systems: 1) letting the user sketch naturally, and 2) decomposing hybrid and smooth curves into primitive shapes.

The system records the coordinates of the strokes along with time. The process is divided into two stages: imprecise stroke approximation and post-processing. The stroke approximation is done immediately (to lines, curves, circles, ellipses or helices), giving the user rapid feedback. Once the sketch is done, post-processing produces a set of hierarchical outputs, the lowest level being raw data and the highest being a semantic relations table.

The imprecise approximation is done by examining the direction and curvature graphs of the strokes, with feature area used as a guide. One main feature of this paper is that vertices are not identified in a separate stage; vertex detection is combined with the approximation phase, allowing more information to be used. If a given stroke cannot be approximated by a primitive shape, it is broken at the point of highest curvature and each piece is recursively checked. (This process differs slightly for self-intersecting strokes.) The system also provides some basic object recognition for objects common across all domains.
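The recursive break-down can be sketched roughly like this (my own toy version: the line test, the turn-angle curvature proxy and the tolerance are all my assumptions, and only line primitives are checked, whereas the paper tests curves, circles, ellipses and helices too):

```python
import math

def line_fit_error(points):
    """Max distance from any point to the chord between the endpoints."""
    (ax, ay), (bx, by) = points[0], points[-1]
    L = math.hypot(bx - ax, by - ay) or 1.0   # guard closed strokes
    return max(abs((bx - ax) * (ay - y) - (ax - x) * (by - ay)) / L
               for x, y in points)

def curvature(points, i):
    """Turn angle at interior point i (a simple curvature proxy)."""
    (x0, y0), (x1, y1), (x2, y2) = points[i - 1], points[i], points[i + 1]
    a = math.atan2(y1 - y0, x1 - x0)
    b = math.atan2(y2 - y1, x2 - x1)
    return abs((b - a + math.pi) % (2 * math.pi) - math.pi)

def decompose(points, tol=1.0):
    """Return index ranges of line-like primitive segments, splitting
    at the highest-curvature point when the line test fails."""
    if len(points) < 3 or line_fit_error(points) <= tol:
        return [(0, len(points) - 1)]
    k = max(range(1, len(points) - 1), key=lambda i: curvature(points, i))
    left = decompose(points[:k + 1], tol)
    right = decompose(points[k:], tol)
    return left + [(i + k, j + k) for i, j in right]
```

On an L-shaped stroke this splits once at the elbow and returns the two straight segments.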

The system was evaluated by ten randomly chosen students, who all thought it was very easy to use. The system achieved 98% in correctly identifying primitive shapes, about 70% for hybrid shapes with smooth curves and 90% for other hybrid shapes.

Discussion:
The idea of recursively breaking down shapes and trying to create primitive approximations for them is very interesting. It's nice to see that curvature and direction data can get us a long way in primitive shape identification. One thing that was not clear to me was the rapid feedback: won't it affect the user's flow while sketching? If the system gets the recognition wrong, it's better to leave the original stroke as is (as LADDER does) than to recognize it incorrectly.

Wednesday, 25 October 2017

Assignment 24: Reading 20 - Sezgin Corner

Bibliography:
Tevfik Metin Sezgin, Thomas Stahovich, and Randall Davis. Sketch Based Interfaces: Early Processing for Sketch Understanding. Proceedings of the 2001 Workshop on Perceptive User Interfaces. pp. 1-8. 2001.

Summary:
The paper describes a method for early-stage processing of sketches. The motivation is to make sketching a natural way to communicate with computers, without having to deal with icons, menus and tool selection. The system consists of three phases: 1) approximation, 2) beautification and 3) recognition.

Approximation is done by first finding vertices and then detecting curved segments of the stroke. Vertices (or corners) are found using the direction, curvature and speed data of the strokes. Extrema in curvature and speed are regarded as corners only after extrema due to noise are reliably removed. This is done with average-based filtering: for curvature, maxima are kept only above a certain threshold, and for speed, minima are kept only below a certain threshold. A hybrid fit combining both speed and curvature is then used to select vertices from the final set of candidates obtained from both kinds of extrema.
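My reading of the average-based filtering step, as a minimal sketch (the exact thresholds are my assumptions; the curvature and speed series would come from the stroke data):

```python
def filtered_extrema(curvature, speed):
    """Corner candidates: curvature maxima above the mean curvature,
    plus speed minima below 90% of the mean speed."""
    ct = sum(curvature) / len(curvature)      # curvature threshold: mean
    st = 0.9 * sum(speed) / len(speed)        # speed threshold: 90% of mean
    corners = set()
    for i in range(1, len(curvature) - 1):
        if curvature[i] > ct and curvature[i] >= max(curvature[i - 1], curvature[i + 1]):
            corners.add(i)                    # local curvature maximum
        if speed[i] < st and speed[i] <= min(speed[i - 1], speed[i + 1]):
            corners.add(i)                    # local speed minimum
    return sorted(corners)
```

In the paper the two candidate sets are then merged by the hybrid fit rather than simply unioned as here.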

To detect curves, Bezier curves are fit over the segments of the polyline fit produced by the hybrid corner model. For computational efficiency, the system uses a piecewise linear approximation of the Bezier curve and computes the error of this approximation; if the error exceeds a given tolerance, the curve is subdivided in the middle and the process is repeated.
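The subdivision loop can be sketched as follows (my own minimal version: splitting at the parameter midpoint when the curve strays too far from its chord; the tolerance value and the chord-distance error measure are assumptions, since the paper measures error against the stroke points):

```python
import math

def bezier(p0, p1, p2, p3, t):
    """Evaluate a cubic Bezier at parameter t via de Casteljau's algorithm."""
    lerp = lambda a, b, u: (a[0] + u * (b[0] - a[0]), a[1] + u * (b[1] - a[1]))
    a, b, c = lerp(p0, p1, t), lerp(p1, p2, t), lerp(p2, p3, t)
    d, e = lerp(a, b, t), lerp(b, c, t)
    return lerp(d, e, t)

def point_line_dist(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (ax, ay), (bx, by), (px, py) = a, b, p
    return (abs((bx - ax) * (ay - py) - (ax - px) * (by - ay))
            / math.hypot(bx - ax, by - ay))

def flatten(ctrl, tol, t0=0.0, t1=1.0, out=None):
    """Piecewise-linear approximation: subdivide the parameter interval
    at its midpoint while the curve strays more than tol from the chord."""
    if out is None:
        out = [bezier(*ctrl, t0)]
    tm = 0.5 * (t0 + t1)
    a, b, m = bezier(*ctrl, t0), bezier(*ctrl, t1), bezier(*ctrl, tm)
    if point_line_dist(m, a, b) > tol:
        flatten(ctrl, tol, t0, tm, out)
        flatten(ctrl, tol, tm, t1, out)
    else:
        out.append(b)
    return out
```

A degenerate (straight) Bezier flattens to its two endpoints, while a genuinely curved one keeps subdividing until every chord is within tolerance.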

After approximation, the result is beautified to make it look as intended: the slopes of clusters of line segments are weighted and averaged. Once beautification is done, basic objects are recognized from the low-level line segments and curves via template matching for simple figures.

The system was tested with 13 users, and had a high accuracy of 96 percent. Most users preferred using the system's interface over paper and tablets.

Discussion:
Identification of corners and curves (at corners) was something I had been taking for granted in sketch recognition systems. The idea of using speed and curvature data for identifying vertices is pretty cool. I wonder how these methods compare to corner-identification techniques from the computer vision domain. Is there a real advantage to using sketching information rather than just using image recognition techniques to do the same?

Assignment 23: Reading 19 - Herot Sketch

Bibliography:
Christopher F. Herot. Graphical input through machine recognition of sketches. Proceedings of the 3rd annual conference on Computer graphics and interactive techniques (SIGGRAPH '76). pp. 97-102. 1976.

Summary:
This paper talks about different approaches to inferring and identifying sketches. The author tries to explain the need for finding a balance between unwieldy fully automated systems and practical ones that rely heavily on the user. Three approaches to recognizing sketches are discussed.

The HUNCH system tries to identify sketches using only the syntax of drawing, without any semantic knowledge. The system is a collection of FORTRAN programs that identify sketches at different levels of interpretation. STRAIT is a program that identifies corners as a function of speed. CURVIT uses the output of STRAIT and identifies curves by fitting them to B-splines. Both STRAIT and CURVIT worked well for some users, but not all. The focus then shifted to improving STRAIT's latching algorithm by comparing candidate fits in 3D instead of 2D projections. Overtracing was another problem that had to be dealt with.

The next section focuses on using context to improve sketch recognition. The idea is that, if some contextual information can be provided, the syntax of sketching can be combined with it to produce meaningful interpretations of the sketch. This approach was top-down, starting from a contextual interpretation and moving down through the HUNCH system's interpretations looking for matches. The CONNIVER language was one of the earliest to try this approach, but it was very resource-heavy, which made it hard to construct the knowledge base.

Looking at the merits of the two approaches above, the author suggests that the field should move towards an 'interactive system' that relies on both automatic recognition and user interpretation. The system consists of a database and a set of programs to manipulate it. The database is dynamically updated as new interpretations are made. Three types of programs manipulate the database: 1) inference programs that interpret the data, 2) display programs that show the various interpretations on multiple devices, and 3) manipulation programs that allow the user to directly edit the database contents. A line/curve identifier (such as the one in HUNCH) using this system does real-time recognition of sketches with context, while the user can also modify the parameters of the programs in order to get more satisfactory results.

Discussion:
This paper gives a nice overview on why sketch recognition is complex without user intervention and contextual information. I think contextual information is key in sketch recognition systems, because we humans interpret similar sketches in very different ways based on contextual information.

As seen in more recent papers, research seems to be moving towards interactive systems that take in both user feedback and automated recognition.

Monday, 16 October 2017

Assignment 22: Reading 18 - Hammond LADDER

Bibliography
Tracy Hammond and Randall Davis. LADDER, A Sketching Language for User Interface Developers. Computers & Graphics. 29. 2005.

Summary
This paper discusses LADDER - a sketching language for user interface developers. The goal of the authors was to build a domain independent framework for developing sketch recognition interfaces, which can be used by developers to build a recognizer in their domain. The framework requires the developer to write a domain description, which is automatically translated into shape recognizers, editing recognizers and shape exhibitors, for the specific domain.

Shapes can be defined in LADDER using primitive shapes or previously defined shapes. The definition of a shape consists of components (the elements from which the shape is built), constraints (geometric constraints), aliases, editing behaviors (like 'cross on arrow' for erase) and display methods (what to display when the object is recognized). This description is translated into shape recognizers, exhibitors and editors that can be used in the domain's sketch interface. The system makes it easy to define shapes by providing abstract definitions for reusability and hierarchical definitions for simplicity. Shapes can be grouped together using 'shape groups' to provide chaining effects while editing. LADDER also supports 'vectors' for defining shapes with an arbitrary number of components (such as polylines).
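From what I remember of the paper's examples, a shape definition looks roughly like the following (paraphrased from the arrow example; this is illustrative pseudocode, not exact LADDER syntax, and the constraint names here are approximations):

```
define shape Arrow
  components:
    Line shaft
    Line head1
    Line head2
  constraints:
    coincident shaft.p1 head1.p1
    coincident shaft.p1 head2.p1
    equalLength head1 head2
    acuteMeet head1 shaft
  display: original-strokes
```

The point is that everything is declarative: the recognizer, editor and exhibitor for Arrow are generated from this description automatically.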

One limitation of the system is that user-defined shapes must be described solely using the primitive constraints defined in LADDER, which makes it difficult to define shapes with a lot of irregularities. LADDER supports a vast array of constraints such as acute, obtuse, concentric, larger, near and slope. Some constraints are defined for a given orientation, and users can also mark shapes as rotationally invariant.

LADDER's correctness in identifying shapes depends heavily on getting the primitive shapes right. Once the primitive shapes have been identified (there can be many), they are used by a higher-level recognizer (written as Jess rules) that matches the data against all possible shape combinations satisfying the rules. The system may select the wrong higher-level shape, as a greedy approach is used for picking it (this is done to maintain efficiency and real-time editing capability). A drawing gesture is short-circuited whenever an editing gesture is matched. Finally, a shape constraint solver is used to display a shape's ideal strokes.

LADDER domain descriptions have been tested in a variety of domains such as mechanical engineering, UML class diagrams, finite state machines and flowcharts.

Discussion
LADDER is the first paper I have read that offers a full-blown generic sketch recognition system since GRANDMA (described in Rubine's paper). I particularly like the abstractions defined in LADDER, which make it look very modular and reusable. I wonder if LADDER could be improved using feedback from its past decisions, some kind of reinforcement learning. It would be interesting to combine such a system with other recognition techniques that may not be geometric.

Wednesday, 4 October 2017

Assignment 21: Reading 17 - Vatavu PDollar

Bibliography:
Radu-Daniel Vatavu, Lisa Anthony, and Jacob O. Wobbrock. Gestures as Point Clouds: A $P Recognizer for User Interface Prototypes. Proceedings of the ACM International Conference on Multimodal Interaction. 2012.

Summary:
This paper discusses $P, a new type of gesture recognition algorithm that extends the $ family of algorithms. The algorithm aims to support multistroke gestures (unlike $1 and Protractor), while avoiding the exponential complexity seen in $N and $N-protractor algorithms.

The $P algorithm removes the temporal information from a gesture and stores it as a point cloud, which is matched against the point cloud of each possible template gesture in the training data. Doing this efficiently is what makes $P work: the problem can be cast as a form of the well-studied 'assignment problem' from graph theory. The Hungarian algorithm is the gold standard for that problem, but it is very slow even though it is very accurate. The authors instead studied a set of Greedy-X heuristics for the template matching. The chosen heuristic matches each point in the candidate with the closest unmatched point in the template cloud and combines the differences with a weight. This was found to be the most efficient, with accuracy comparable to the Hungarian algorithm.
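The greedy matching step can be sketched in a few lines. This is a simplified version: $P first normalizes both clouds (resampling to n points, scaling, translating to the centroid), runs the match from several start points in both directions, and takes the minimum; here I show a single pass, with the weighting as I recall it from the paper:

```python
import math

def greedy_cloud_match(pts, tmpl):
    """Match each candidate point to its closest unmatched template point;
    earlier matches get a larger weight (w = 1 - i/n)."""
    n = len(pts)
    matched = [False] * len(tmpl)
    total = 0.0
    for i, (px, py) in enumerate(pts):
        best, best_d = -1, float('inf')
        for j, (qx, qy) in enumerate(tmpl):
            if not matched[j]:
                d = math.hypot(px - qx, py - qy)
                if d < best_d:
                    best, best_d = j, d
        matched[best] = True
        total += (1 - i / n) * best_d   # lower total = more similar
    return total
```

Because later matches are forced onto whatever template points remain, the weights reduce the damage a bad late pairing can do to the score.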

$P was found to perform comparably to $1 on single strokes and better than the $N recognizer on multiple strokes. The authors also provide pseudocode for the algorithm.

Discussion:
To me, this is a surprisingly simple, elegant and efficient algorithm for identifying gestures. The pseudocode provided by the authors makes it very accessible and easy for developers to implement in their own applications. Since the core of the algorithm is a greedy heuristic, I am curious what its failure cases are. As mentioned in the paper's comparison table, the algorithm is not rotationally invariant, so I wonder if it can be combined with the polar-coordinate rotation invariance discussed in the previous paper.

Assignment 20: Reading 16 - Kara Symbol

Bibliography:
Levent Burak Kara and Thomas F. Stahovich. An Image-based, Trainable Symbol Recognizer for Hand-drawn Sketches. Computers and Graphics. Volume 29, Issue 4. August, 2005.

Summary:
The paper talks about an image-based, trainable, multi-stroke hand-drawn symbol recognizer. One advantage of this image-based approach is that it avoids the segmentation and feature extraction problems commonly seen in recognizers that break a sketch into constituent primitive lines and curves.

The recognizer can learn a new symbol definition from just one prototype image, allowing users to train the system with new symbols on the fly. The system mainly consists of two steps: 1) Make the image rotation invariant. This is done by transforming the image into polar coordinates and matching it against the stored definitions under rotation; clearly dissimilar templates can also be pruned at this stage. 2) Multiple classifiers analyze the remaining candidates and produce a list of definitions ranked by similarity using template matching, computed with the modified Hausdorff distance, the Tanimoto similarity coefficient and the Yule coefficient. Finally, the individual classification results are normalized and combined to make the final decision.
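Of the three measures, the Tanimoto coefficient is the easiest to show. On binary images it is the ratio of shared filled pixels to filled pixels overall; a minimal sketch, representing each rasterized symbol as a set of filled pixel coordinates (the toy grids are made up):

```python
def tanimoto(a, b):
    """Tanimoto similarity of two binary images given as sets of filled
    pixel coordinates: T = |a ∩ b| / |a ∪ b|, in [0, 1]."""
    return len(a & b) / len(a | b)

# Two toy rasterized strokes sharing two of their pixels.
stroke_a = {(0, 0), (1, 1), (2, 2)}
stroke_b = {(0, 0), (1, 1), (2, 1)}
print(tanimoto(stroke_a, stroke_b))   # -> 0.5
```

Identical images score 1.0 and disjoint ones 0.0, which is why the paper can combine it with the other measures after normalization.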

A user study was conducted and it was found that the system accurately classifies symbols with only a small amount of training data and works well in both user-dependent and independent studies.

Discussion:
This is the first time I am reading about a recognizer that tries to identify the entire image rather than its constituent primitives. While this works great for specific applications and as an independent system, I wonder if it can be combined with other systems to form a higher level system. That way, we can use the output of this system as a feature for another system.

Assignment 19: Reading 15 - Hammond Tahuti

Bibliography:
Tracy Hammond and Randall Davis. Tahuti: A Geometrical Sketch Recognition System for UML Class Diagrams. SIGGRAPH '06. 2006.

Summary:
The paper discusses Tahuti, a system that recognizes UML diagrams, thereby allowing user sketches to be leveraged in actual coding. A few systems, such as Rational Rose and Ideogramic UML, existed for recognizing UML diagrams, but they were not very flexible in combining user sketches with recognition techniques. Other systems, such as ones based on Rubine's features, required training.

Tahuti is a multi-layer framework for sketch recognition that leverages the geometric properties of sketches. The system has four main stages, namely preprocessing, selection, recognition and identification. Preprocessing produces a set of possible interpretations of the stroke data along with their probabilities. Strokes are grouped into collections (and also pruned) using temporal and spatial data; experimentally, nine strokes was found to be a good threshold for a collection. The recognizer interprets strokes as either a 'viewable object' or an editing command. The eight possible viewable objects are general class, interface class, inheritance association, aggregate association, dependency association, interface association, text, or a collection of unrecognized strokes. The identification stage selects the final interpretation based on a set of rules that accounts for object movement, number of strokes and correctness probability.

Experiments showed that users preferred Tahuti over Paint programs and UML tools such as Rational Rose as it combined the ease of drawing with ease of editing UML.

Discussion:
Though this paper was written in 2002, it still seems to be the state of the art in UML recognition tools! Having worked with UML diagrams and whiteboard drawing, I would love to use a tool like Tahuti for design discussions in a team. I wonder how one could combine this with gesture recognition techniques and maybe create a software project scaffold just from UML? That would be interesting...

Monday, 2 October 2017

Assignment 18: Reading 14 - Bhat Entropy

Bibliography:
Akshay Bhat and Tracy Anne Hammond. Using Entropy to Identify Shape and Text in Hand Drawn Diagrams. Proceedings of the Twenty-First International Joint Conference on Artificial Intelligence. 2009.

Summary:
This paper focuses on differentiating between text and shape sketches. The motivation is that most existing sketch recognition systems perform well on either text or shapes, but not both. The paper suggests using the entropy rate (specifically, the zero-order entropy rate) as the distinguishing feature.

Previously, shape vs. text classification was done using specific features found by trial and error. This was very time-consuming, which motivated the authors to find a single, logically coherent feature. They observed that, when strokes are represented by a general set of coordinate equations, the entropy of text is higher than that of shapes. This led the authors to define an entropy measure over digital ink. First, a model alphabet is defined as a set of seven symbols, each corresponding to a range of angles between 0 and pi. Each point in a stroke is assigned a symbol based on the angle it makes with its neighbouring points. 'Zero-order' refers to treating consecutive symbols as independent when determining the probabilities.

User strokes were grouped based on thresholds over time and spatial coordinates. After resampling and preprocessing, the alphabet model was applied and the entropy of each group was calculated. The system classified the groups as text or shape, leaving those with entropy values at the boundary unclassified. The system had an accuracy of over 90 percent.
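A minimal sketch of the feature: assign each interior point one of seven symbols from the turn angle it makes with its neighbours, then take the Shannon entropy of the symbol frequencies. The exact angle binning is my assumption; the seven-symbol alphabet and zero-order entropy follow the paper.

```python
import math
from collections import Counter

def symbols(points, bins=7):
    """Map each interior point to an angle-bin symbol in [0, bins)."""
    out = []
    for i in range(1, len(points) - 1):
        (x0, y0), (x1, y1), (x2, y2) = points[i - 1], points[i], points[i + 1]
        a1 = math.atan2(y1 - y0, x1 - x0)
        a2 = math.atan2(y2 - y1, x2 - x1)
        turn = abs((a2 - a1 + math.pi) % (2 * math.pi) - math.pi)  # in [0, pi]
        out.append(min(int(turn / math.pi * bins), bins - 1))
    return out

def entropy(points):
    """Zero-order Shannon entropy (bits) of the symbol sequence."""
    syms = symbols(points)
    n = len(syms)
    return -sum(c / n * math.log2(c / n) for c in Counter(syms).values())
```

A straight line produces a single repeated symbol (entropy 0), while handwriting-like wiggles spread mass over several symbols and score higher, which is exactly the separation the classifier thresholds on.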

Discussion:
Entropy-based classification for text vs. shape seems very effective. Also, the confidence value emitted by the system makes it usable in other higher-level systems (similar to the Paulson features paper). The alphabet model, however, seems fairly simple with only seven symbols. I wonder how this would scale with a more complex alphabet and higher-level, non-primitive shapes.

Assignment 17: Reading 13 - Paulson PaleoSketch

Bibliography:
Brandon Paulson and Tracy Hammond. PaleoSketch: Accurate Primitive Sketch Recognition and Beautification. Proceedings of the 13th International Conference on Intelligent User Interfaces. 2008.

Summary:
The paper presents PaleoSketch, a system that can accurately identify eight low-level primitive shapes (line, polyline, ellipse, circle, arc, curve, spiral and helix) and hierarchically construct higher-level shapes based on geometric constraints defined by the user. Additionally, it beautifies the sketch, thus providing early feedback.

The general architecture of the system consists of three steps: pre-recognition, shape tests and hierarchical interpretation. Pre-recognition, in addition to resampling and computing corner, speed and direction metrics, calculates two new features: normalized distance between direction extremes (NDDE) and direction change ratio (DCR). These metrics are at the heart of the various shape tests. NDDE is computed by taking the stroke distance between the points with the highest and lowest direction values, normalized by the total stroke length. DCR is calculated by dividing the maximum change in direction by the average change in direction. Before performing the shape tests, the system also determines whether the shape is overtraced and whether it is closed.
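
NDDE and DCR are simple enough to write down directly. A rough sketch (assuming strokes of at least four points; PaleoSketch's exact handling of direction wrap-around may differ):

```python
import math

def directions(pts):
    """Direction of each point-to-point segment."""
    return [math.atan2(y1 - y0, x1 - x0)
            for (x0, y0), (x1, y1) in zip(pts, pts[1:])]

def stroke_length(pts):
    return sum(math.hypot(x1 - x0, y1 - y0)
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))

def ndde(pts):
    """Stroke distance between the points of max and min direction,
    normalized by total stroke length."""
    d = directions(pts)
    i, j = sorted((d.index(max(d)), d.index(min(d))))
    return stroke_length(pts[i:j + 1]) / stroke_length(pts)

def dcr(pts):
    """Max direction change divided by average direction change."""
    d = directions(pts)
    changes = [abs(b - a) for a, b in zip(d, d[1:])]
    mean = sum(changes) / len(changes)
    return max(changes) / mean if mean else 0.0
```

On an arc the direction extremes sit near the ends of the stroke, so NDDE approaches 1, while a polyline's sharp corners drive DCR up.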

A shape is classified as complex when none of the shape tests pass. The final part of the system, the hierarchy function, sorts the results in order of best fit. The system was found to perform well on the given dataset and is comparable to the best low-level recognizers, which do not handle as many shapes.

Discussion:
PaleoSketch is a great example of a geometry-based recognizer, and shows how far a constraint-based sketch recognition system can go. It was nice to see simple metrics such as NDDE and DCR be so effective in identifying primitive shapes. I wonder how the authors arrived at the various thresholds used in the paper (mainly during preprocessing, when checking for overtracing and closure), as those seem crucial to the performance of the shape tests.

Assignment 16: Reading 12 - Paulson Features

Bibliography:
Brandon Paulson, Pankaj Rajan, Pedro Davalos, Ricardo Gutierrez-Osuna, and Tracy Hammond. What!?! No Rubine Features?: Using Geometric-based Features to Produce Normalized Confidence Values for Sketch Recognition. VL/HCC Workshop: Sketch Tools for Diagramming. 2008.

Summary:
This paper presents a hybrid sketch recognition system for recognizing low-level primitives that combines features from both gesture and geometric recognition systems. The system aims to make recognition user independent and to allow interpretations with normalized confidence values.

The system uses a quadratic classifier over both geometric and gesture features. The feature set consists of 44 features: 31 geometric and 13 of Rubine's gesture-based features. Feature subset selection was done using a greedy sequential forward selection (SFS) technique. Using this subset of features, the accuracy achieved was comparable to that of the PaleoSketch system.
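
Greedy sequential forward selection is straightforward to express. A sketch, with `evaluate` standing in for whatever accuracy measure the classifier provides (the stopping rule here is my simplification):

```python
def sfs(features, evaluate):
    """Greedy sequential forward selection: repeatedly add the feature
    that most improves the score; stop once no addition helps.
    `evaluate(subset)` is assumed to return a classification accuracy."""
    selected, best_score = [], float("-inf")
    remaining = list(features)
    while remaining:
        score, best = max(((evaluate(selected + [f]), f) for f in remaining),
                          key=lambda pair: pair[0])
        if score <= best_score:
            break
        best_score = score
        selected.append(best)
        remaining.remove(best)
    return selected
```

With 44 candidate features this evaluates at most a few hundred subsets, which is what makes the greedy search tractable compared to exhaustive selection.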

The system is easier to code and performs classification faster than the PaleoSketch system, as fewer features need to be compared. 14 of the 31 geometric features were present in the optimal subset, including NDDE and DCR (see PaleoSketch), the least-squares line and the major-minor axis ratio. Only total rotation was chosen from the gesture-based features, mainly because the data was chosen to be user independent.

Discussion:
It was nice to see that geometric constraints play an important role in user-independent recognition systems. While gesture recognition has its place in the field of sketch recognition, I see geometric recognition as a key player in bringing sketch recognition systems to mainstream mobile platforms. Thinking of the app developers who will be building apps on these gesture-based devices, it makes more sense for them to use recognition engines that are user independent.

Wednesday, 27 September 2017

Assignment 15: Reading 11 - Hammond Chapter 2

Bibliography:
Tracy Hammond. Sketch Recognition, Chapter 2: Introduction to Gesture Recognition, 2017

Summary:
This chapter discusses a number of methods for performing gesture recognition using features. While gesture recognition can be used as a sketch recognition technique, it is important to note that it depends on the path of the pen and hence should be used with caution, perhaps as one of the steps in sketch recognition. Gesture recognition can be used in sketching when users can be taught to draw in a prescribed way, or when the system can be trained on a user's data and the user draws the same way every time.

Dean Rubine was one of the earliest to develop a gesture recognition method for sketches. He selected a set of 13 features from a stroke and built a linear classifier on top of them. His classifier was found to be very effective and accurate even with a small training set (15 examples). His features are computed from the stroke, defined by sampled (x, y, t) values, and can be grouped into the following categories: starting angle, bounding box, diagonal, start-end distance, rotation measures and time measures. The Rubine system is one of the most widely used and well-known gesture recognition methods.


Christopher Long created a system called Quill, which contains a gesture recognition system that uses a feature set with less reliance on time than Rubine's. The system uses a GUI to learn gestures and can be trained on the fly for a new user. Long used 22 features in his recognizer, the first 11 of which are Rubine's. The next 11 are combinations of the above, including aspect, density, curviness, ratios of some other Rubine features, the logarithm of the aspect, and area.

Discussion:
This chapter elaborates on the previously studied Long and Rubine features, and gives an intuitive sense of what the features mean. An interesting insight was the reason Long used logarithms and division in his features: such combinations could not be learned by a linear system. Even with these complex features present, it is interesting to see that the added complexity does not imply better performance.

Assignment 14: Reading 10 - Hammond Chapter 1

Bibliography:
Tracy Hammond. Sketch Recognition, Chapter 1: Stroke Basics, 2017.

Summary:
This reading discusses the basics of a stroke - a mathematical representation of a gesture as a list of points. A stroke is represented by a series of (x: x-coordinate, y: y-coordinate, t: epoch time) samples, obtained by sampling a gesture from pen-down to pen-up on a device; the sampling rate depends on the device. Spatially, a stroke can be represented by concatenating a series of point-to-point vectors, and the length of a stroke is defined as the sum of the lengths of those vectors.
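
As a minimal sketch of this representation (the names are mine, not the chapter's):

```python
import math

def path_length(stroke):
    """Stroke length: sum of the lengths of the point-to-point vectors.
    A stroke is a list of (x, y, t) samples from pen-down to pen-up."""
    return sum(math.hypot(x1 - x0, y1 - y0)
               for (x0, y0, _), (x1, y1, _) in zip(stroke, stroke[1:]))
```

For example, `path_length([(0, 0, 0), (3, 4, 10)])` is the length of a single 3-4-5 vector, 5.0.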

The chapter explains various useful trigonometric identities. The concepts of sine, cosine and tangent are explained as triangle identities using the mnemonic SOH CAH TOA, along with how they relate to the interior acute angle. Furthermore, in order to compute the angle between lines in a stroke, the chapter derives formulas using arctan and arccos. Though the computation using arctan is simple, it can become undefined in some cases. The arccos and law of cosines can also be used to compute the angle, but they are computationally less efficient as they involve taking square roots.
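
The two ways of computing the angle at a point b between its neighbours a and c can be compared directly; using atan2 rather than plain arctan of a slope sidesteps the undefined vertical-line case:

```python
import math

def angle_atan2(a, b, c):
    """Interior angle at b via atan2 of the two incident segments."""
    ang = abs(math.atan2(c[1] - b[1], c[0] - b[0])
              - math.atan2(a[1] - b[1], a[0] - b[0]))
    return 2 * math.pi - ang if ang > math.pi else ang

def angle_law_of_cosines(a, b, c):
    """Same angle via the law of cosines; needs square roots for the
    three side lengths, so it is a little more expensive."""
    ab, bc, ac = math.dist(a, b), math.dist(c, b), math.dist(a, c)
    return math.acos((ab**2 + bc**2 - ac**2) / (2 * ab * bc))
```

Both return pi/2 for a right angle such as a=(1,0), b=(0,0), c=(0,1).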


During preprocessing, the points are sometimes resampled to make them spatially equidistant or to even out point densities. The preferred way to do this is to base the spacing on the diagonal length of the stroke's bounding box. Other methods using a fixed point distance or a fixed number of points do not scale well for varying stroke sizes or point densities.
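
A rough resampling sketch in that spirit, spacing points at a fixed fraction of the bounding-box diagonal (the 5% figure is an illustrative choice, not the chapter's):

```python
import math

def resample(points, diagonal_fraction=0.05):
    """Resample a stroke so consecutive points are (roughly) equidistant,
    with spacing set to a fraction of the bounding-box diagonal."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    spacing = math.hypot(max(xs) - min(xs), max(ys) - min(ys)) * diagonal_fraction
    out = [points[0]]
    accumulated = 0.0          # path distance since the last emitted point
    prev = points[0]
    for p in points[1:]:
        d = math.dist(prev, p)
        while d > 0 and accumulated + d >= spacing:
            t = (spacing - accumulated) / d
            q = (prev[0] + t * (p[0] - prev[0]), prev[1] + t * (p[1] - prev[1]))
            out.append(q)      # emit an interpolated point, keep walking
            d -= spacing - accumulated
            accumulated = 0.0
            prev = q
        accumulated += d
        prev = p
    return out
```

Because the spacing scales with the stroke's own bounding box, a tiny and a huge drawing of the same shape end up with comparable point densities.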

Discussion:
This paper provides some required mathematical background for representing gestures and generating features from them. It is very useful for someone starting out with gesture recognition. The representation of gestures as samples (and a vector) allows for applying existing computational methods to this field.

Wednesday, 20 September 2017

Assignment 13: Reading 9 - Long Gestures

Bibliography:
A. Chris Long, Jr., James A. Landay, Lawrence A. Rowe, and Joseph Michiels. Visual Similarity of Pen Gestures, Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '00), 2000.

Summary:
The focus of this paper is on coming up with a computational model that helps measure the 'goodness' of a gesture. The 'goodness' is defined by a combination of similarity to other gestures, ease of learning, remembering etc. The authors hope this would help designers design better gestures, using feedback from a tool that gives the 'goodness'.

The paper first describes a set of pen-based devices such as the Apple Newton MessagePad and the 3Com PalmPilot. Pen gestures have been found to perform better than keyboard commands for a variety of desktop and other applications. Experiments in perceptual similarity revealed that the logarithm of quantitative metrics correlates with perceived similarity. The authors use a multidimensional scaling (MDS) system called INDSCAL to reduce the dimensionality of the data and identify patterns by viewing plots.

In their first experiment, the authors presented users with triads of figures and asked them to identify the one that was most different. Using MDS plots and regression analysis, the authors identified geometric properties that influenced perceived similarity and designed a model of gesture similarity that could predict how similar people would judge gestures to be. It was found that short and wide gestures were perceived as very similar to narrow and tall ones, and that different people perceived similarity using different features.

In their second experiment, the authors systematically varied features and observed how that affected perceived similarity. It was found that gestures whose lines were horizontal and vertical were perceived as more similar to each other than ones whose components were diagonal.

The authors tested the models developed from the two experiments against each other's data; model 1 was found to perform slightly better than model 2. The authors conclude that human perception is very complicated, but that a small number of features was enough to identify the three most salient dimensions. They hope their work will encourage research in gesture similarity, memorability and learnability.

Discussion:
This paper complements the Rubine features paper nicely. While Rubine's features concern what helps a gesture recognizer classify gestures, this paper focuses on what makes it easy for humans to learn and perceive gestures.

I've always thought that designers come up with gestures and UI capabilities based on their intuition and on expensive, extensive user studies. I feel that having a tool that can compute 'good gestures' would revolutionize the UI design field.

Since this is a fairly old paper, I wonder what the state of the art is in this area. Certainly, I have seen UI designers go with designs based only on intuition and A/B testing in recent days. I hope a system like this is already out there, or close to it!

Assignment 12: Reading 8 - Rubine Gestures

Bibliography:
Dean Rubine. Specifying Gestures by Example, Proceedings of the 18th Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH '91), 1991

Summary:
The paper talks about GRANDMA, a toolkit for adding gestures to direct manipulation interfaces, along with the trainable single-stroke recognizer it uses.

At the time of writing, existing gesture-based applications relied on hand-coded gesture recognizers. This made the systems complex and hard to generalize. GRANDMA aims to provide a generalized toolkit for automatically creating a gesture recognizer from a small amount of training data.

The paper describes GDP, a gesture-based drawing program that uses a single-stroke recognizer built using GRANDMA. The recognizer identifies simple gestures for operations like rectangle, line, rotate, delete and copy. Since it is a single-stroke recognizer, it avoids problems like segmentation, and it is intuitive from the user's tense-and-relax perspective while making the gesture.

The system uses a graphical interface where various gesture classes can be created. Each class is given a set of training examples that should reflect the variation of that gesture. Empirically, 15 was found to be a reasonable number of examples.

The single-stroke gesture recognizer works by extracting a set of features from the input gesture and using a linear machine to classify the example as one of the gesture classes defined by the designer. The author describes a set of 13 features chosen according to the following criteria: each feature should be incrementally computable in constant time per input point, a small change in the input should correspond to a small change in the feature, and there should be enough features to differentiate between all the expected gestures. The features include the sine and cosine of the initial angle, the length of the stroke, the bounding box, the sum of the angles at each mouse point, speed and duration.
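
A few of these features are easy to compute directly from the samples. A partial sketch (the details are my approximation of Rubine's definitions; strokes are assumed to have at least three points):

```python
import math

def rubine_features(stroke):
    """A few of Rubine's 13 features from (x, y, t) samples: cosine and
    sine of the initial angle, bounding-box diagonal, stroke length,
    total rotation, and duration."""
    xs = [p[0] for p in stroke]
    ys = [p[1] for p in stroke]
    dx0, dy0 = xs[2] - xs[0], ys[2] - ys[0]   # initial direction
    first = math.hypot(dx0, dy0) or 1.0       # guard a degenerate start
    bbox_diag = math.hypot(max(xs) - min(xs), max(ys) - min(ys))
    length = sum(math.hypot(x1 - x0, y1 - y0)
                 for (x0, y0, _), (x1, y1, _) in zip(stroke, stroke[1:]))
    dirs = [math.atan2(y1 - y0, x1 - x0)
            for (x0, y0, _), (x1, y1, _) in zip(stroke, stroke[1:])]
    # total rotation: sum of signed turn angles, each folded into [-pi, pi]
    rotation = sum(math.remainder(b - a, math.tau) for a, b in zip(dirs, dirs[1:]))
    duration = stroke[-1][2] - stroke[0][2]
    return [dx0 / first, dy0 / first, bbox_diag, length, rotation, duration]
```

Each of these is incrementally maintainable as points arrive, which is the first of Rubine's stated criteria.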

The gesture classification is done by taking a linear combination of these features, with the possibility of rejecting the gesture. The weight vectors are obtained by training the classifier on the given sample gestures. Rejection is done by thresholding a probability estimate for the winning class.
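
The classification and rejection step can be sketched as follows; the rejection estimate is Rubine's 1 / sum_j exp(s_j - s_best) form, with an illustrative threshold:

```python
import math

def classify(features, weights, reject_threshold=0.95):
    """Linear gesture classification: score each class as w0 + w . f and
    pick the best. weights maps class name -> [w0, w1, ..., wn].
    Returns (class, p) or (None, p) when the estimate falls below the
    threshold (the 0.95 value is illustrative, not Rubine's)."""
    scores = {c: w[0] + sum(wi * fi for wi, fi in zip(w[1:], features))
              for c, w in weights.items()}
    best = max(scores, key=scores.get)
    # Rubine's rejection estimate: P(best) = 1 / sum_j exp(s_j - s_best)
    p = 1.0 / sum(math.exp(s - scores[best]) for s in scores.values())
    return (best, p) if p >= reject_threshold else (None, p)
```

An ambiguous gesture gives two nearly equal scores, the estimate falls toward 0.5, and the gesture is rejected rather than misclassified.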

The single-stroke recognizer was found to perform well in practice, despite its simplicity. The author describes extensions to the system through eager recognition and support for multi-touch interfaces. Multi-touch support can be achieved by performing single-stroke recognition on each of the strokes and then combining the results using a decision tree.

Discussion:
This paper shows the importance of a field like gesture recognition (more generally, sketch recognition) to human-computer interaction. A toolkit for gesture recognition makes it much easier to develop this field, as hand-coding recognizers is one of the main blockers to developing effective sketch-based interfaces.

The features discussed in the paper give a nice overview of what to look for in a single stroke. As explained by Dr. Hammond, the importance of making sure "similar shapes have similar features" is further emphasized in this paper. Intuitively, that makes a lot of sense, as that is how we humans also think about differences in shapes.

I am really curious to see an extension of this algorithm that handles multi-touch gestures, and thereby interfaces. To me, multi-touch seems like an entirely different problem, mainly because it is hard to break multi-touch gestures down into composable single strokes. But maybe an effective decision tree handles that.

Monday, 18 September 2017

Assignment 11: Mechanix - Do's and Don'ts

Do's:

  1. The UX of the app is generally very intuitive: being able to see the steps/progress on the left, double-clicking for labels, browsing through all problems, questions on top and equations at the bottom, etc. The traffic signal recognizer is also robust in some ways; irrespective of where I start my stroke, it does a great job of recognizing it.
  2. I really like the feedback system; most of the time, it tells me exactly what is wrong with my sketch. For example, when I mistyped the reaction force at a particular node, it pointed out that the force at that node was wrong, which makes it very easy to 'debug' my sketch.
  3. The arrow recognizer is very robust. It recognizes arrows irrespective of where I start. It even allowed me to draw arrows with more than 3 strokes!
  4. The recognizer does a great job of recognizing the direction of an arrow. I drew 'Rax' slightly tilted towards the 'y' direction, and it handled it well. But when I made it ambiguous (very close to 45 degrees), it rightly said the direction was wrong.
  5. Handling of coordinate system directions. The equation box allowed me to select my own positive and negative directions for the coordinate system. For example, I was able to use either 'upwards' or 'downwards' as the positive y direction, as long as I was consistent within that problem.


Don'ts:

  1. Drawing the traffic light was interesting: when I used a single stroke, the recognizer accepted a fairly 'shabby' traffic light, but when I used more strokes, it didn't recognize it as a traffic light, even though the final diagram was closer to the original.
  2. It seems like the ordering of sides matters when drawing rectangle/square shapes. When I draw the parallel sides first, it fails. It would be nice if the square/rectangle recognizer were as robust as the arrow recognizer.
  3. The message dialog box disappears quickly. It could stay a little longer.
  4. The instructions for labeling forces were a bit confusing. For example, Problem 5 expects us to label the reaction force at B as 'Rby', but in the previous problem (Problem 4), we labeled the force as '10 lbs'. Maybe the question could be a little more explicit.
  5. The equation box accepts multiple force values such as '10 N + Rax'. But it also accepts the answer if I enter '5 N + 5 N + Rax'. Maybe the number of terms in the equation should equal the number of forces in that direction?


Saturday, 16 September 2017

Assignment 10: Reading 7 - Mechanix

Bibliography:
Randy Brooks, Jung In Koh, Seth Polsley, and Tracy Hammond. Score Improvement Distribution When Using Sketch Recognition Software (Mechanix) as a Tutor: Assessment of a High School Classroom Pilot, Frontiers in Pen and Touch, Chapter 11, Springer, 2017.

Summary:
This paper discusses the impact of deploying the Mechanix application in a high school, measured by student performance. While the 'flipped classroom' model has proven effective and encourages more peer-to-peer (and instructor) interaction, the rapid increase in the number of online courses has created a need for intelligent tutoring systems like Mechanix.

The Mechanix software is built on LADDER, a general sketch recognition technique. Mechanix runs on any machine with Java, and makes it easy to practice problems involving trusses, free body diagrams and vector analysis.

Mechanix was deployed at Lovejoy High School, in a Principles of Engineering course. This course was taken mostly by sophomores, but also included juniors and seniors. After an initial pre-Mechanix quiz, the instructor devised specific interventions to help students, solving problems in Mechanix being one of them.

After a week of practicing problems in Mechanix and a post-Mechanix quiz, an overall improvement in student performance was found. The scores were grouped into three categories based on the previous year's mathematics grades, and a significant improvement in scores was found in each category. The author concludes by discussing the positive impact of Mechanix, its ease of deployment and its applications in other STEM fields as well.

Discussion:
This article gives a nice example of the impact of sketch recognition in the field of education. It takes me back to the beginning of the 'learning through the lens of sketch' talk, where Dr. Hammond mentions how it's more intuitive for us humans to perceive and understand concepts using sketches.

Though the author clearly mentions that they were not interested in studying the effectiveness of Mechanix vs. traditional methods (due to both the lack of a control group and the existence of numerous studies on the effect of ITS), I am still curious how this compares to a week of traditional tutoring.

Also, since Mechanix was only 'one of' the interventions made by the instructor, I would like to know what the other interventions were and how they might have helped in the score improvement.

I personally believe that using ITS will definitely make education better and more accessible to students all around the world. Looking forward to a large scale deployment of Mechanix!

Wednesday, 13 September 2017

Assignment 8: Reading 5 - Chinese Room

Bibliography:
John R. Searle. John R. Searle's Chinese Room: A Case Study in the Philosophy of Mind and Cognitive Science. http://psych.utoronto.ca/users/reingold/courses/ai/cache/searle.html

Summary:
The article discusses John Searle's Chinese room argument. The argument tries to show that the 'strong artificial intelligence' position of the materialism doctrine is false. The idea behind strong artificial intelligence is that any machine running the right program would be 'mental'. The idea behind Searle's argument is to construct a machine that would be a zombie (not mental) with any program; if such a machine exists, then strong AI is false.

The thought experiment in the Chinese room argument has a human behave as a machine, implementing a program (the rulebook) with rules for constructing Chinese characters from given Chinese characters. Though this human can respond in Chinese, he does not understand it. Thus, there exists a machine (the human) which, despite running the right program, does not understand Chinese, and hence is not mental. This means that the strong AI position is false.

Some cognitive scientists criticize this argument and support strong AI. The most convincing counterargument against the Chinese room comes from differentiating the person from his brain. Though the person does not understand Chinese, he cannot say for sure whether his brain understands it; the person is an unreliable source of information about the understanding of his brain. Furthermore, the Chinese room is set up such that it is limited in its ability to understand Chinese the way other humans do. If the person in the Chinese room were freed, he could go and learn Chinese by talking to other people, just like any other human would. Thus, understanding could arise just by changing the causal relations between the man and his environment.

Discussion:
The Chinese room argument aims to disprove strong artificial intelligence. I like the argument from the cognitive scientists which calls this an 'intuition pump'. The reason I think so is that our understanding of what understanding is, is very intuitive. The Chinese room argument makes sense in some ways because we as humans think we know what it means to understand something. This is where the other-minds argument also makes sense.

I agree with the counterargument made by the cognitive scientists, as we still don't know what understanding means from our brain's perspective. But if strong AI is true, then I think technological singularity is not far away :).

Assignment 9: Reading 6 - Intelligence

Bibliography:
Randall Davis. What Are Intelligence? And Why? 1996 AAAI Presidential Address, AI Magazine, 19(1), 1998.

Summary:
This article discusses intelligence as something that has evolved over time. It argues that it may be incorrect to look for minimalism and basic principles in the way intelligence works; instead, intelligence is a product of evolution, just like humans are.

The author states that common traits of an intelligent being are the abilities to predict, respond to change, act intentionally with a goal in mind, and reason. Reasoning itself can be viewed from multiple schools of thought. The logical view describes intelligence as something that can be precisely and concisely expressed by a formal logic system. The psychological view describes intelligence as a complex piece of human behavior, similar to human anatomy and physiology. The societal view argues that intelligence is an aggregate phenomenon. The author claims that AI can be all of these views simultaneously: AI is an exploration of the 'design space of intelligences'.

The history of why intelligence came into being is very speculative and not well understood, which can be attributed to the lack of data. What we do know is that evolution played a major role in modern humans coming into being. The author describes evolution as a 'blind search' that sometimes works out. This has led to a lot of inefficiency in the anatomy, physiology and other traits of modern beings. Similarly, our intelligence can also be thought of as something that has evolved over time, with some inefficiencies and quirks in its design.

From fossil evidence, we know that our brain became very big very fast (explained using the encephalization quotient). We also know that the human brain is functionally lateralized, though it is anatomically symmetric. This asymmetry arose in hominids and is probably unique to them, but what led to this sudden lateralization is not clearly known. The author presents a number of theories involving tool making, throwing, climate, socialization, food sources and language. In order to show that human intelligence has evolved, and that parts of it can be seen in other animals, the author presents examples of rationalization and perceptual intelligence in animals such as birds, bees and primates.

The author finally presents a 'design space of intelligences' for AI researchers to explore - conceptualizing 'thinking as reliving'. There are multiple examples showing that humans solve a lot of problems by re-enacting and visually perceiving a situation in order to arrive at a solution. Another speculation is that there may be ways to combine the concreteness of reasoning with the power of abstract thinking.

Discussion:
This article presents a nice overview of intelligence as something that evolved over time. The history of how intelligence came into being is very interesting to me. Particularly, the sudden lateralization of the brain, a form of asymmetry not commonly seen in nature, is pretty cool. Also, the multiple views of intelligence, based on different schools of thought, show that there is still a lot to explore.

The author shows human intelligence as something that evolved over time, by giving multiple instances of intelligence seen in other animals. If that is the case, there is no reason for us to think of human intelligence as the ultimate form of intelligence. As we evolve, our intelligence can get 'better' in other ways, expanding the design space of intelligences.

Thinking of intelligence as 'reliving' our thoughts makes me wonder if all the different views we have of intelligence are just a result of us trying to relive what we know. The reason for so many views of intelligence may simply be researchers reliving their experiences in their respective areas (like mathematics, psychology, economics, etc.).

Monday, 11 September 2017

Assignment 6: Reading 4 - Sudoku

Bibliography:
Caio Monteiro, Meenakshi Narayanan, Seth Polsley, and Tracy Hammond. A Multilingual Sketch-Based Sudoku Game with Real-Time Recognition, Frontiers in Pen and Touch, Chapter 11, Springer, 2017.

Summary:
The paper is about a software application that recognizes and verifies Sudoku puzzles played by sketching. The current implementation is multilingual, supporting Hindi and Chinese, but the application uses a general-purpose algorithm so that it can be trained to support more languages. The game can be used as a captivating learning tool for learning numbers in a new language. Also, the system is capable of ignoring rough sketches; this allows the user to do rough work while playing the game, giving it more of a pen-and-paper feel.

The system is implemented as two Java applications. One is used to collect training data for a set of languages. This data is stored in an XML file and passed to the second application, which uses it to train and generate templates for each digit in each of the languages. The UI allows the user to check their partial results at any stage of the game.

The game allows the user to select a language and play. User sketches are compared to the template sketches using a one-sided Hausdorff distance between the sketch and each of the templates. The system then uses a k-NN classifier with a neighbourhood size of 5 to improve the accuracy, and k-fold cross validation for evaluation. It was found that the system has a high overall accuracy, but some very similar numbers (such as the Hindi two and three) had a lower accuracy. The authors hope to improve the UI and to allow the addition of language-specific rules for higher accuracy.
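
The matching step, as I understand it, amounts to something like this (the template format and tie-breaking are my assumptions):

```python
import math
from collections import Counter

def one_sided_hausdorff(sketch, template):
    """Max over the sketch's points of the distance to the nearest
    template point (one-sided: template-to-sketch is not measured)."""
    return max(min(math.dist(p, q) for q in template) for p in sketch)

def knn_classify(sketch, templates, k=5):
    """templates: list of (label, points). Vote among the k nearest
    templates by one-sided Hausdorff distance."""
    ranked = sorted((one_sided_hausdorff(sketch, pts), label)
                    for label, pts in templates)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

Because the distance only looks at point sets, adding a new language is just a matter of collecting new labeled templates, with no per-language code.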

Discussion:
The system is indeed a great way to learn a new language. The fact that it uses a general purpose template matching algorithm, makes it very easy to extend it to multiple languages.

I wonder if this template matching algorithm can be used for matching characters other than numbers as well. It seems like it can, as the training data for each language can be given to the first application. The idea of decoupling the training data from the actual game makes it very flexible to do this. This way, the system could be used to play more complex games, such as crosswords, as well.

Assignment 5: Reading 3 - Digital Circuits

Bibliography:
Shuo Ma, Yongbin Sun, Pengchen Lyu, Seth Polsley, and Tracy Hammond. DCSR: A Digital Circuit Sketch Recognition System for Education, Frontiers in Pen and Touch, Chapter 11, Springer, 2017.

Summary:
The paper describes an application of sketch recognition in the field of digital logic. DCSR (Digital Circuit Sketch Recognition) identifies circuit drawings and determines the truth value of the output.

Most existing systems, such as SketchySPICE, only permit using AND, OR and NOT gates, with no simulation. Other systems (like LogiPad) rely on drag-and-drop interaction. LogiSketch was the closest system to true freehand sketching.

DCSR uses a web interface where users draw circuit sketches on a canvas. The system recognizes the various components of the circuit, namely gates, I/O and wires. The primary algorithm used to identify gates is $P. At a high level, this algorithm converts the sketch into a point cloud and compares it with an existing set of templates, which makes it more efficient than other $-family algorithms. The authors added a decision tree layer on top of this algorithm in order to differentiate between two types of gates: ones which have a single straight line (like AND, NAND, NOT), and ones that don't (like XOR, OR, NOR). Wires were identified by checking the start and end points of wire strokes for proximity to gate pins. The system also calculates truth values recursively on each wire. Additionally, the hand-drawn sketches were beautified by the system in order to provide a neat workspace and input-output pins.
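
The heart of $P-style matching is a greedy point-cloud correspondence. A much-simplified sketch (the real $P also weights matches and tries multiple starting points):

```python
import math

def cloud_distance(a, b):
    """Greedy matching in the spirit of $P: pair each point of cloud a
    with its nearest still-unmatched point of cloud b and sum the
    distances. Clouds are assumed to be resampled to the same size."""
    unmatched = list(b)
    total = 0.0
    for p in a:
        nearest = min(unmatched, key=lambda u: math.dist(p, u))
        unmatched.remove(nearest)
        total += math.dist(p, nearest)
    return total

def recognize(points, templates):
    """templates: list of (gate_name, point_cloud); pick the best match."""
    return min(templates, key=lambda t: cloud_distance(points, t[1]))[0]
```

Because only point positions matter, the comparison is independent of stroke count and drawing order, which is exactly why $P suits multi-stroke gate symbols.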


DCSR was tested by electrical engineering students and was found to have a high overall accuracy. The decision tree layer played an important part in this, as it prevented misclassification between the two types of gates mentioned above. Some areas where DCSR could be improved relate to more freedom in drawing the gates (such as using a single stroke for type 1 gates) and the ordering of components. The authors plan to add more features and remove the drawing constraints in the future.

Discussion:
DCSR definitely seems to make learning and simulating digital circuits a pleasant experience. The truth-value calculation, for me, is a really useful feature when it comes to designing circuits. A student can gain confidence by verifying his design with a quick sketch.


Something I'd like to see is DCSR extended beyond digital logic; the same ideas could apply to simulating analog circuits, with resistors, capacitors and power sources. Support for simulation in analog circuits would be invaluable, as the math involved is much hairier (involving complex numbers).

Wednesday, 6 September 2017

Reading-2-bhat-gis

Bibliography:

Aqib Niaz Bhat, Girish Kasiviswanathan, Christy Maria Mathew, Seth Polsley, Erik Prout, Daniel W. Goldberg, and Tracy Hammond. "An Intelligent Sketching Interface for Education using Geographic Information Systems", Frontiers in Pen and Touch, Chapter 11, Springer, 2017.

Summary:
This paper is about a sketch recognition system that helps with
education in the field of geography. The motivation for such a system
is that geographical entities learnt by drawing on a map would aid in
better recall and comprehension of the various concepts. Current
lessons and grading in the field rely on marking on maps and
multiple-choice tests. The main idea is to combine shape and location
information from the sketch and compare that with an actual data set
of geographic features. The initial version of the system works by
allowing students to draw rivers on maps and returns a similarity
score between their sketch and the actual geographical data.

The trade-off between drawing freedom and ease of recognition makes it
challenging to design a robust recognizer. The recognizer has to be
stroke independent, receptive to messy sketches that still capture the
essential information, and able to identify different drawing styles.
The recognition method used by the system combines two techniques,
shape context and Hausdorff distance, which exploit the shape and
location features of the sketch. The system also includes a
preprocessing step that removes sketches that are 'very far' from the
actual data. This is done by applying stroke-length and location
thresholds to the sketch, in comparison to the river's geographical
data.
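The paper's actual threshold values aren't given above, so the pre-filter below is purely illustrative: the function name, the length-ratio bounds and the centroid-offset cutoff are all my own assumptions, chosen just to show the shape of the idea.

```python
import math

def passes_prefilter(stroke, river, length_ratio_bounds=(0.5, 2.0), max_offset=50.0):
    """Reject sketches 'very far' from the reference river polyline.

    A sketch passes only if its total stroke length is within a ratio
    band of the river's length AND its centroid lies close enough to the
    river's centroid. Thresholds here are illustrative, not the paper's.
    """
    def length(pts):
        # Total polyline length: sum of consecutive segment lengths.
        return sum(math.dist(p, q) for p, q in zip(pts, pts[1:]))

    def centroid(pts):
        xs, ys = zip(*pts)
        return (sum(xs) / len(xs), sum(ys) / len(ys))

    lo, hi = length_ratio_bounds
    ratio = length(stroke) / length(river)
    return lo <= ratio <= hi and math.dist(centroid(stroke), centroid(river)) <= max_offset
```

Cheap checks like this let the system skip the expensive shape-context matching for sketches that obviously don't correspond to the target river.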

The similarity measure is a weighted sum of shape similarity measured
using shape context, location similarity measured using Hausdorff
distance, and stroke length ratio. Shape context is calculated for
each point in a shape by measuring its relative distribution with
respect to the other points. A matching cost is computed between the
two shapes by pairwise matching of the shape contexts of individual
points. A modified version of Hausdorff distance is used for finding
the location similarity.
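The paper uses a modified Hausdorff variant whose exact definition isn't given above, but the standard symmetric Hausdorff distance it builds on is compact enough to write down:

```python
import math

def directed_hausdorff(A, B):
    """Largest distance from any point in A to its nearest point in B."""
    return max(min(math.dist(a, b) for b in B) for a in A)

def hausdorff(A, B):
    """Standard symmetric Hausdorff distance between two point sets.

    (The paper uses a modified variant; this is the textbook definition,
    shown only to make the location-similarity idea concrete.)
    """
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))
```

Intuitively, the distance is small only when every point of the student's stroke is near the river and every point of the river is near the stroke, which is why it works as a location-similarity term.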

The paper concludes by talking about the user study and accuracy of
the model for both similar and dissimilar cases. Users of the system
were satisfied with both the learning and testing mode of the UI.  The
authors look to expand their system to other geographic entities. They
are also looking to improve their classifier and build more
classifiers that include domain-specific heuristics.

Discussion:
This system uses technology to solve a problem that, in my opinion,
not many people would think about. Sketching is a great way to learn
geography, and getting immediate feedback on your sketches makes it
very useful to learn by doing. The paper does a great job of
explaining the limitations of existing methods and how the authors had
to come up with clever variations and new techniques in order to apply
them to geographical matching.

I would love to see this extended to other entities, not just in
geography, but also in fields like astronomy, history and archaeology.

Monday, 4 September 2017

Reading 1 - Learning through the Lens of Sketch

Bibliography:

Hammond, Tracy. "Learning Through the Lens of Sketch", Frontiers in Pen and Touch, Chapter 21, Springer, 2017.

Summary:

This talk gives an overview of Dr. Hammond's research and the evolution of the field of sketch recognition.  Her motivation for the field comes from trying to understand why certain tasks that are inherently simple for humans turn out to be difficult for computers to perform. By writing computer algorithms that mimic human activities, we may be able to better understand how the brain works. This in turn will facilitate better communication with, and through, technology.

Automated recognition of sketches is crucial to achieving this, as sketching is a very intuitive way for humans to perceive and understand things. Dr. Hammond categorises sketch recognition algorithms into three types: 1) appearance based, 2) gesture based and 3) geometry based.  As each method has its own drawbacks and advantages, the most effective system will use a combination of techniques.

Dr. Hammond's research on geometric algorithms led to defining a set of constraints that match what humans actually perceive. For example, humans can easily perceive horizontal and vertical lines, but are not very good at perceiving specific angles. Furthermore, this perception of constraints changes greatly based on the size and form of the shape.

While building constraint-based systems (such as Mechanix) to describe shapes, an interesting application emerged: using them to teach drawing and perception. Human subjects were able to improve their drawing and perception skills by getting immediate personalised feedback. These systems are further improved by taking into account factors such as drawn corners and the sound of the pen. Visualizing strokes as a function of speed led to interesting insights that differentiate novices from experts. Measuring oversteering at corners (using techniques such as NDDE and DCR), along with the above-mentioned factors, helped recognize perception primitives.

Another goal of her research was to differentiate between shapes and text. By looking at the Shannon entropy of both, it was found that text has significantly higher entropy than shapes, and this helped in distinguishing them with high accuracy.
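The talk doesn't detail how the entropy is computed, but one plausible reconstruction (my own, not necessarily the talk's exact feature) is to discretize the stroke's point-to-point directions into a small symbol alphabet and take the Shannon entropy of the symbol distribution; text strokes change direction often and so score higher than simple geometric shapes:

```python
import math
from collections import Counter

def direction_entropy(points, bins=8):
    """Shannon entropy of discretized stroke directions.

    Each consecutive point pair gives an angle; angles are binned into
    `bins` symbols, and the entropy of the symbol distribution is
    returned in bits. A straight line yields 0; wiggly text-like
    strokes yield higher values.
    """
    angles = [math.atan2(q[1] - p[1], q[0] - p[0])
              for p, q in zip(points, points[1:])]
    symbols = [int((a + math.pi) / (2 * math.pi) * bins) % bins for a in angles]
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in counts.values())
```

A simple threshold on a measure like this is enough to separate most shape strokes from text strokes, which matches the high accuracy reported in the talk.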

The talk concludes by looking at temporal data for recognising sketches, namely activity recognition. The research found that it was possible to gain information about a person just by looking at eye-tracking data such as saccades, entropy and shapelets.

Discussion:
The talk gives an excellent overview of the field of sketch recognition, the various algorithms that are being developed and the difficulty in trying to create computer algorithms that mimic humans. It also makes me think how evolution and natural selection have made us so remarkably efficient!

The field provides significant insight into how humans perceive things. To me, pedagogy seems to be the primary application of the field. I wonder how this relates back to making computers complement humans in day-to-day activities.

While most of the talk focussed on geometric algorithms (with some gesture-based methods at the end), I wonder how these techniques compare to ones that use appearance-based algorithms. It seems like appearance-based techniques will be less insightful, as there is no temporal data.

Extended Discussion:
The video helped me see the emphasis placed on relating sketch recognition to how the brain works. Indeed, how we perceive things does significantly reflect in our behavior and sketches. As explained by Dr. Hammond in the Q&A, these insights will help make sketch recognition much more valuable to people, rather than just replacing pen and paper.

It was cool to see the demonstration of how the LADDER system grew to recognize 923 shapes, just based on simple constraints. The speed of recognition was very impressive.

Regarding eye tracking, the graphs of saccades and Dr. Hammond's explanation show how they are unique to each individual, and how this relates to the usage of eye muscles.

Sunday, 3 September 2017

Introduction

Howdy!



My name is Siddharth (you can call me Sid) and I am a Master's student (starting Fall 2017) in Computer Science at Texas A&M University. Whoop! I come from Chennai, a city in South India, where the Texas summer is the norm :). I did my bachelor's there, but most of my schooling in various parts of India.

After completing my undergrad in 2013, I started working at System Insights, a Manufacturing Data Analytics startup based out of Berkeley. Ever since, I have loved building large-scale information processing systems. During my master's, I hope to delve deeper into this and related areas of computer science.

I took the Sketch Recognition course because of its novelty. It sounds really cool to be able to identify hand-drawn sketches, though I am a little skeptical about its applications in the real world. I also wonder how it relates to (or competes with?) computer vision and image processing techniques. Anyway, I'm looking to gain some new perspective there!