Wednesday, 27 September 2017

Assignment 15: Reading 11 - Hammond Chapter 2

Bibliography:
Tracy Hammond. Sketch Recognition, Chapter 2: Introduction to Gesture Recognition, 2017

Summary:
This chapter discusses a number of methods for performing gesture recognition using features. While gesture recognition can be used as a sketch recognition technique, it is important to note that it depends on the path of the pen, and hence it should be used with caution, perhaps as one step in a larger sketch recognition pipeline. Gesture recognition works well for sketching when the user can be taught to draw in a prescribed way, or when the system can be trained on a user's data and the user draws the same way every time.

Dean Rubine was one of the earliest to develop a gesture recognition method for sketches. He selected a set of 13 features from a stroke and built a linear classifier based on them. His classifier was found to be very effective and accurate even with a small training set (15 examples). His features are based on the stroke, defined by sampled values of (x, y, t), and can be grouped into the following categories: starting angle, bounding box, diagonal, start-end distance, rotation measures, and time measures. The Rubine system remains one of the most widely used and well-known gesture recognition methods.
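To make those feature categories concrete, here is a small Python sketch computing a handful of Rubine-style stroke features from (x, y, t) samples. The selection and exact formulas are illustrative of the categories (starting angle, bounding box, diagonal, start-end distance, duration) rather than Rubine's full set of 13.

```python
import math

def rubine_features(points):
    """Compute a few Rubine-style features from a stroke of (x, y, t) samples.
    Assumes at least three sample points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]

    # f1, f2: cosine and sine of the initial angle (uses the third point,
    # as Rubine does, to smooth out noise at the pen-down)
    dx, dy = points[2][0] - points[0][0], points[2][1] - points[0][1]
    d = math.hypot(dx, dy)
    f1, f2 = dx / d, dy / d

    # f3, f4: bounding-box diagonal length and its angle
    w, h = max(xs) - min(xs), max(ys) - min(ys)
    f3 = math.hypot(w, h)
    f4 = math.atan2(h, w)

    # f5: distance between the first and last points
    f5 = math.hypot(points[-1][0] - points[0][0], points[-1][1] - points[0][1])

    # f8: total stroke (path) length, the sum of segment lengths
    f8 = sum(math.hypot(points[i + 1][0] - points[i][0],
                        points[i + 1][1] - points[i][1])
             for i in range(len(points) - 1))

    # f13: duration of the gesture
    f13 = points[-1][2] - points[0][2]
    return [f1, f2, f3, f4, f5, f8, f13]
```

These values would then be fed, alongside the remaining features, into the linear classifier described below.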


Christopher Long created a system called Quill, which contains a gesture recognizer that uses a feature set with less reliance on time than Rubine's. The system uses a GUI to learn gestures and can be trained on the fly for a new user. Long used 22 features in his recognizer, the first 11 of which are Rubine's. The remaining 11 are combinations of the above, including aspect, density, curviness, ratios of other Rubine features, the logarithm of the aspect, and the logarithm of the area.
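A rough sketch of how some of Long's derived features could be built on top of the raw quantities. The formulas here are my illustrative reading of the chapter's description (aspect, density, log of aspect and area), not Long's exact definitions:

```python
import math

def long_derived_features(path_length, bbox_w, bbox_h, total_abs_angle):
    """Illustrative derived (non-linear) features computed from raw stroke
    measurements: path length, bounding-box width/height, and the summed
    absolute turning angle along the stroke."""
    diag = math.hypot(bbox_w, bbox_h)
    # Aspect: deviation of the bounding-box angle from 45 degrees (a square)
    aspect = abs(math.pi / 4 - math.atan2(bbox_h, bbox_w))
    # Density: ink per unit of bounding-box diagonal
    density = path_length / diag if diag else 0.0
    # Curviness: total absolute angle traversed along the stroke
    curviness = total_abs_angle
    # Log-scaled variants, which a linear classifier could not derive itself
    log_aspect = math.log(aspect) if aspect > 0 else float("-inf")
    area = bbox_w * bbox_h
    log_area = math.log(area) if area > 0 else float("-inf")
    return {"aspect": aspect, "density": density, "curviness": curviness,
            "log_aspect": log_aspect, "log_area": log_area}
```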

Discussion:
This chapter elaborates on the previously studied Long and Rubine features, and gives an intuitive sense of what the features mean. An interesting insight was the reason Long used logarithms and division in his features: these relationships could not be learned by a linear system. Even though these more complex features were present, it is interesting to see that the added complexity does not imply better performance.

Assignment 14: Reading 10 - Hammond Chapter 1

Bibliography:
Tracy Hammond. Sketch Recognition, Chapter 1: Stroke Basics, 2017.

Summary:
This reading discusses the basics of a stroke: a mathematical representation of a gesture as a list of points. A stroke is represented by a series of (x: x-coordinate, y: y-coordinate, t: epoch time) samples, obtained by sampling a gesture from pen-down to pen-up on a device; the sampling rate depends on the device. Spatially, a stroke can be represented by concatenating a series of vectors from point to point. The length of a stroke is defined as the sum of the lengths of all the individual vectors in the stroke.
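That length definition is easy to state in code; a minimal sketch assuming (x, y, t) samples:

```python
import math

def stroke_length(samples):
    """Length of a stroke: the sum of the lengths of the point-to-point
    vectors. `samples` is a list of (x, y, t) tuples from pen-down to pen-up."""
    return sum(math.hypot(samples[i + 1][0] - samples[i][0],
                          samples[i + 1][1] - samples[i][1])
               for i in range(len(samples) - 1))
```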

The chapter explains various useful trigonometric identities. The concepts of sine, cosine, and tangent are presented as triangle identities using the mnemonic SOH CAH TOA, along with how they relate to the interior acute angle. Furthermore, in order to compute the angle between lines in a stroke, the chapter gives formulas using arctan and arccos. Though the computation using arctan is simple, it can become undefined in some cases. The arccos and the law of cosines can also be used to compute the angle, but they are computationally less efficient, as they involve taking square roots.
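A quick illustration of the two approaches for the angle at a point p1 between segments p0→p1 and p1→p2: `atan2` avoids the undefined cases of a plain arctan of slopes (vertical segments), while the law-of-cosines route works everywhere but needs square roots. The function names are my own:

```python
import math

def angle_atan2(p0, p1, p2):
    """Turn angle at p1 using atan2. A plain arctan of slope differences is
    undefined for vertical segments; atan2 sidesteps that by taking dy and dx
    as separate arguments."""
    a1 = math.atan2(p1[1] - p0[1], p1[0] - p0[0])
    a2 = math.atan2(p2[1] - p1[1], p2[0] - p1[0])
    return a2 - a1

def angle_law_of_cosines(p0, p1, p2):
    """Interior angle at p1 via the law of cosines:
    c^2 = a^2 + b^2 - 2*a*b*cos(C). Always defined for non-degenerate
    triangles, but costlier since each side length needs a square root."""
    a = math.dist(p1, p2)
    b = math.dist(p0, p1)
    c = math.dist(p0, p2)
    return math.acos((a * a + b * b - c * c) / (2 * a * b))
```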


During preprocessing, the points are sometimes resampled to make them spatially equidistant or to even out point densities. The preferred way to do this is to base the resampling interval on the diagonal length of the stroke's bounding box. Other methods, using a fixed point distance or a fixed number of points, do not scale well across varying stroke sizes and point densities.
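A minimal resampling sketch along these lines, with the spacing tied to the bounding-box diagonal. The fraction of the diagonal used is an assumed illustrative value, and the time coordinate is dropped for clarity:

```python
import math

def resample(points, diagonal_fraction=0.05):
    """Resample a stroke of (x, y) points so consecutive points are roughly
    equidistant, with spacing set to a fraction of the bounding-box diagonal
    so that it scales with stroke size."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    interval = diagonal_fraction * math.hypot(max(xs) - min(xs),
                                              max(ys) - min(ys))
    new_points = [points[0]]
    pts = list(points)
    dist_accum = 0.0
    i = 1
    while i < len(pts):
        d = math.dist(pts[i - 1], pts[i])
        if dist_accum + d >= interval:
            # Interpolate a new point exactly `interval` along the path
            t = (interval - dist_accum) / d
            qx = pts[i - 1][0] + t * (pts[i][0] - pts[i - 1][0])
            qy = pts[i - 1][1] + t * (pts[i][1] - pts[i - 1][1])
            new_points.append((qx, qy))
            pts.insert(i, (qx, qy))  # continue measuring from the new point
            dist_accum = 0.0
        else:
            dist_accum += d
        i += 1
    return new_points
```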

Discussion:
This paper provides some required mathematical background for representing gestures and generating features from them. It is very useful for someone starting out with gesture recognition. The representation of gestures as samples (and a vector) allows for applying existing computational methods to this field.

Wednesday, 20 September 2017

Assignment 13: Reading 9 - Long Gestures

Bibliography:
A. Chris Long, Jr., James A. Landay, Lawrence A. Rowe, and Joseph Michiels. Visual Similarity of Pen Gestures, Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '00), 2000

Summary:
The focus of this paper is on coming up with a computational model that helps measure the 'goodness' of a gesture, where 'goodness' is defined by a combination of similarity to other gestures, ease of learning, memorability, and so on. The authors hope this will help designers create better gestures, using feedback from a tool that reports this 'goodness'.

The paper first describes a set of pen-based devices such as the Apple Newton MessagePad and the 3Com PalmPilot. Pen gestures have been found to outperform keyboard commands for a variety of desktop and other applications. Experiments in perceptual similarity revealed that the logarithm of quantitative metrics correlates with perceived similarity. The authors use a multidimensional scaling (MDS) system called INDSCAL to reduce the dimensionality of the data and identify patterns by inspecting plots.

In their first experiment, the authors presented users with triads of figures and asked them to identify the one that was most different. Using MDS plots and regression analysis, the authors identified geometric properties that influenced perceived similarity and designed a model of gesture similarity that could predict how similar people would perceive gestures to be. It was found that short, wide gestures were perceived to be very similar to narrow, tall ones. It was also found that different people judged similarity using different features.

In their second experiment, the authors systematically varied features and observed how that affected perceived similarity. It was found that gestures whose lines were horizontal and vertical were perceived as more similar to each other than ones whose components were diagonal.

The authors tested the models developed from trial 1 and trial 2 against data from the other trial; model 1 was found to perform slightly better than model 2. The authors conclude that human perception is very complicated, but that a small number of features was enough to identify the three most salient dimensions. They hope their work will encourage research in gesture similarity, memorability, and learnability.

Discussion:
This paper complements the Rubine features paper nicely. While Rubine's paper discusses which features help a gesture recognizer classify gestures, this paper focuses on what makes gestures easy for humans to learn and perceive.

I've always thought that designers come up with gestures and UI capabilities based on their intuition and on extensive (and expensive) user studies. I feel that having a tool that can compute 'good gestures' would revolutionize the UI design field.

Since this is a fairly old paper, I wonder what the state of the art is in this area. Certainly, I have seen UI designers go with designs based only on intuition and A/B testing in recent days. I hope a system like this is already out there, or close to it!

Assignment 12: Reading 8 - Rubine Gestures

Bibliography:
Dean Rubine. Specifying Gestures by Example, Proceedings of the 18th Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH '91), 1991

Summary:
The paper describes GRANDMA, a toolkit for adding gestures to direct manipulation interfaces. It also describes a trainable single-stroke recognizer used by GRANDMA.

At the time this paper was written, existing gesture-based applications relied on hand-coding the gesture recognizer. This made the systems complex and hard to generalize. GRANDMA aims to provide a generalized toolkit for automatically creating a gesture recognizer from a small amount of training data.

The paper describes GDP, a gesture-based drawing program that uses a single-stroke recognizer built using GRANDMA. The recognizer identifies simple gestures for operations like rectangle, line, rotate, delete, and copy. Since it is a single-stroke recognizer, it avoids problems like segmentation, and it feels intuitive from the user's tense-and-relax perspective while making a gesture.

The system uses a graphical interface where various gesture classes can be created. Each class is given a set of training examples that should reflect the expected variation of that gesture. Empirically, 15 examples is found to be a reasonable number.

The single-stroke gesture recognizer works by extracting a set of features from the input gesture and using a linear machine to classify the example as one of the gesture classes (defined by the designer in the system). The author describes a set of 13 features chosen according to the following criteria: each feature should be incrementally computable in constant time per input point, a small change in the input should correspond to a small change in the feature, and there should be enough features to differentiate between all the expected gestures. The features include the sine and cosine of the initial angle, the length of the stroke, the bounding box, the sum of the angles at each mouse point, speed, and duration.

Classification is done by taking a linear combination of these features, with the possibility of rejecting the gesture. The weight vector is obtained by training the classifier on the given sample gestures. Rejection is done by setting a threshold on a probability function for each class.
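The classification and rejection steps can be sketched roughly as follows. The weights would come from training; the probability used for rejection follows a common reading of Rubine's rejection rule, and `reject_margin` is an assumed value:

```python
import math

def classify(features, weights, reject_margin=0.95):
    """Rubine-style linear classification: each class c has a weight vector
    (w_c0, w_c1, ..., w_cF); its score is w_c0 + sum_i(w_ci * f_i) and the
    top-scoring class wins, unless the estimated probability of correctness
    falls below the rejection threshold."""
    scores = {c: w[0] + sum(wi * fi for wi, fi in zip(w[1:], features))
              for c, w in weights.items()}
    best = max(scores, key=scores.get)
    # Estimated probability that `best` is correct, given all class scores
    prob = 1.0 / sum(math.exp(s - scores[best]) for s in scores.values())
    return best if prob >= reject_margin else None
```

With well-separated scores the winner is returned; with nearly tied scores the probability hovers around 1/2 and the gesture is rejected.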

The single-stroke recognizer was found to perform well in practice, despite its simplicity. The author describes extensions to the system for eager recognition and multi-touch interfaces. Multi-touch support can be achieved by performing single-stroke identification on each of the strokes and then combining the results using a decision tree.

Discussion:
This paper shows the importance of a field like gesture recognition (and, more generally, sketch recognition) to Human-Computer Interaction. A toolkit for gesture recognition makes it much easier to advance the field, as hand-coding recognizers is one of the main blockers to developing effective sketch-based interfaces.

The features discussed in the paper give a nice overview of what to look for in a single stroke. As explained by Dr. Hammond, the importance of making sure that "similar shapes have similar features" is further emphasized in this paper. Intuitively, that makes a lot of sense, as that is also how we humans think about differences between shapes.

I am really curious to see an extension of this algorithm to handle multi-touch gestures, and thereby multi-touch interfaces. To me, multi-touch seems like an entirely different problem, mainly because it is hard to break multi-touch gestures down into composable single strokes. But maybe an effective decision tree handles that.

Monday, 18 September 2017

Assignment 11: Mechanix - Do's and Don'ts

Do's:

  1. The UX of the app is generally very intuitive: being able to see the steps/progress on the left, double-clicking for labels, browsing through all problems, questions on top and equations at the bottom, etc. The traffic signal recognizer is also robust in some ways; irrespective of where I start my stroke, it does a great job of recognizing it.
  2. I really like the feedback system; most of the time, it tells me exactly what is wrong with my sketch. For example, when I mistyped the reaction force at a particular node, it pointed out that the force at that node was wrong, which made it very easy to 'debug' my sketch.
  3. The arrow recognizer is very robust. It recognizes arrows irrespective of where I start. It even allowed me to draw arrows with more than 3 strokes!
  4. The recognizer does a great job of recognizing the direction of an arrow. I drew 'Rax' slightly tilted towards the 'y' direction, and it handled it well. But when I made it ambiguous (very close to 45 degrees), it rightly said the direction was wrong.
  5. Handling of coordinate system directions. The equation box allowed me to select my own positive and negative directions for the coordinate system. For example, I was able to use either 'upwards' or 'downwards' as the positive y direction, as long as I was consistent within that problem.


Don'ts:

  1. Drawing the traffic light was interesting: when I used a single stroke, the recognizer accepted a fairly 'shabby' traffic light, but when I used more strokes, it didn't recognize the drawing as a traffic light, even though the final diagram was closer to the original.
  2. It seems that the ordering of sides matters when drawing rectangle/square shapes. When I draw the parallel sides first, recognition fails. It would be nice if the square/rectangle recognizer were as robust as the arrow recognizer.
  3. The message dialog box disappears too fast. It could stay up a little longer.
  4. The instructions for labeling forces were a bit confusing. For example, Problem 5 expects us to label the reaction force at B as 'Rby', but in the previous problem (Problem 4), we labeled the force as '10 lbs'. Maybe the questions could be a little more explicit.
  5. The equation box accepts multiple force values such as '10 N + Rax'. But it also accepts the answer if I enter '5 N + 5 N + Rax'. Maybe the number of terms in the equation should equal the number of forces in that direction?


Saturday, 16 September 2017

Assignment 10: Reading 7 - Mechanix

Bibliography:
Randy Brooks, Jung In Koh, Seth Polsley, and Tracy Hammond. Score Improvement Distribution When Using Sketch Recognition Software (Mechanix) as a Tutor: Assessment of a High School Classroom Pilot, Frontiers in Pen and Touch, Chapter 11, Springer, 2017

Summary:
This paper discusses the impact of deploying the Mechanix application at a high school, measured by student performance and grades. While the 'flipped classroom' model was proving effective and encourages more peer-to-peer (and instructor) interaction, the rapid increase in the number of online courses has led to a need for intelligent tutoring systems like Mechanix.

The Mechanix software is built on LADDER, a general sketch recognition technique. Mechanix runs on any machine with Java, and makes it easy to practice problems involving trusses, free body diagrams, and vector analysis.

Mechanix was deployed at Lovejoy High School, in a Principles of Engineering course. This course was taken mostly by sophomores, but also included juniors and seniors. After an initial pre-Mechanix quiz, the instructor devised specific interventions to help students, solving problems in Mechanix being one of them.

After a week of practicing problems in Mechanix and a post-Mechanix quiz, an overall improvement in student performance was found. The scores were grouped into three categories based on the previous year's mathematics grades, and there was a significant improvement in scores in each of these categories. The authors conclude by discussing the positive impact of Mechanix, its ease of deployment, and its applications in other STEM fields.

Discussion:
This article gives a nice example of the impact of sketch recognition in the field of education. It takes me back to the beginning of the 'Learning through the Lens of Sketch' talk, where Dr. Hammond mentions how it is more intuitive for us humans to perceive and understand concepts using sketches.

Though the authors clearly mention that they were not interested in studying the effectiveness of Mechanix vs. traditional methods (due both to the lack of a control group and to the existence of numerous studies on the effect of ITSs), I am still curious how this compares to a week of traditional tutoring.

Also, Mechanix was only 'one of' the interventions used by the instructor; I would like to know what the other interventions were and how they might have helped with the score improvement.

I personally believe that using ITSs will definitely make education better and more accessible to students all around the world. Looking forward to a large-scale deployment of Mechanix!

Wednesday, 13 September 2017

Assignment 8: Reading 5 - Chinese Room

Bibliography:
John R. Searle. John R. Searle's Chinese Room: A Case Study in the Philosophy of Mind and Cognitive Science. http://psych.utoronto.ca/users/reingold/courses/ai/cache/searle.html

Summary:
The article discusses John Searle's Chinese room argument. The argument tries to show that the 'strong artificial intelligence' position of the materialism doctrine is false. The idea behind strong artificial intelligence is that any machine with the right program would be 'mental'. The idea behind Searle's argument is to construct a machine that would be a zombie (not mental) with any program. If such a machine exists, then strong AI is false.

The thought experiment in the Chinese room argument has a human behave as a machine that implements a program (the rulebook) with rules for constructing Chinese characters from given Chinese characters. Though this human can respond in Chinese, he does not understand it. Thus, there exists a machine (the human) which, given any program, does not understand Chinese, and hence is not mental. This means that the strong AI position is false.

Some cognitive scientists criticize this argument and support strong AI. The most convincing counterargument against the Chinese room comes from differentiating the person from his brain. Though the person does not understand Chinese, he cannot say for sure whether his brain understands it; the person is an unreliable source of information about the understanding of his brain. Furthermore, the Chinese room is set up such that it is limited in its ability to understand Chinese the way other humans do. If the person in the Chinese room were freed, he could go and learn Chinese by talking to other people, just like any other human would. Thus, just by changing the causal relations between the man and his environment, he could come to understand Chinese.

Discussion:
The Chinese room argument aims to disprove strong artificial intelligence. I like the argument from the cognitive scientists which calls this an 'intuition pump'. The reason I think so is that our understanding of what understanding is, is itself very intuitive. The Chinese room argument makes sense in some ways because we as humans think we know what it means to understand something. This is where the other-minds argument also makes sense.

I agree with the counterargument made by the cognitive scientists, as we still don't know what understanding means from our brain's perspective. But if strong AI really is true, then I think the technological singularity is not far away :).

Assignment 9: Reading 6 - Intelligence

Bibliography:
Randall Davis. What Are Intelligence? And Why? 1996 AAAI Presidential Address, AI Magazine, 19(1), 1998.

Summary:
This article talks about intelligence as something that has evolved over time. It argues that it may be incorrect to look for minimalism and basic principles in the way intelligence works. Instead, intelligence is a product of evolution, just like humans are.

The author states that common traits of an intelligent being are the abilities to predict, respond to change, act intentionally with a goal in mind, and reason. Reasoning itself can be viewed from multiple schools of thought. The logical view describes intelligence as something that can be precisely and concisely expressed by a formal logic system. The psychological view describes intelligence as a complex piece of human behavior, akin to human anatomy and physiology. The societal view argues that intelligence is an aggregate phenomenon. The author claims that AI can be all of these views simultaneously: AI is an exploration of the 'design space of intelligences'.

The history of why intelligence came into being is very speculative and not well understood, which can be attributed to the lack of data. What we do know is that evolution has played a major role in modern-day humans coming into being. The author describes evolution as a 'blind search' that sometimes works out. This has led to a lot of inefficiency in the anatomy, physiology, and other traits of modern-day beings. Similarly, our intelligence can also be thought of as something that has evolved over time, with some inefficiencies and quirks in its design.

From fossil evidence, what we do know is that our brain became very big very fast (explained using the encephalization quotient). We also know that the human brain is functionally lateralized, though it is anatomically symmetric. This asymmetry arose in hominids and is probably unique to them, but what led to this sudden lateralization is not clearly known. The author presents a number of theories involving tool making, throwing, climate, socialization, food sources, and language. To show that human intelligence has evolved, and that parts of it can be seen in other animals, the author presents examples of rationalization and perceptual intelligence in animals such as birds, bees, and primates.

The author finally presents a 'design space of intelligences' for AI researchers to explore: conceptualizing 'thinking as reliving'. There are multiple examples showing that humans solve a lot of problems by re-enacting and visually perceiving a situation in order to arrive at a solution. Another speculation is that there may be ways to combine the concreteness of reasoning with the power of abstract thinking.

Discussion:
This article presents a nice overview of the evolution of our modern-day understanding of intelligence. The history of how intelligence came into being is very interesting to me. In particular, the sudden lateralization of the brain, a form of asymmetry not commonly seen in nature, is pretty cool. Also, the multiple views of intelligence, based on different schools of thought, show that there is still a lot to explore.

The author presents human intelligence as something that evolved over time, giving multiple instances of intelligence seen in other animals. If that is the case, there is no reason for us to think of human intelligence as the ultimate form of intelligence. As we evolve, our intelligence can get 'better' in other ways, expanding the design space of intelligences.

Thinking of intelligence as 'reliving' our thoughts makes me wonder if all the different views we have of intelligence are just a result of us trying to relive what we know. Perhaps there are so many views of intelligence simply because researchers relive their experiences in their respective areas (like mathematics, psychology, and economics).

Monday, 11 September 2017

Assignment 6: Reading 4 - Sudoku

Bibliography:
Caio Monteiro, Meenakshi Narayanan, Seth Polsley, and Tracy Hammond. A Multilingual Sketch-Based Sudoku Game with Real-Time Recognition, Frontiers in Pen and Touch, Chapter 11, Springer, 2017.

Summary:
The paper is about a software application that recognizes and verifies a sudoku puzzle played by sketching. The current implementation is multilingual and supports Hindi and Chinese. The application, however, uses a general-purpose algorithm, so it can be trained to support more languages. The game can be used as a captivating learning tool for learning numbers in a new language. Also, the system is capable of ignoring rough sketches, which allows the user to do rough work while playing the game, giving it more of a pen-and-paper feel.

The system is implemented as two Java applications. One is used to collect training data for a set of languages; this data is stored in an XML file and passed to the second application, which uses it to train and generate templates for each digit in each of the languages. The UI allows the user to check their partial results at any stage of the game.

The game allows the user to select a language and play. User sketches are compared to the template sketches using a one-sided Hausdorff distance between the sketch and each of the templates. The system then uses a k-NN classifier with a neighbourhood size of 5 to improve accuracy, and k-fold cross-validation is used for evaluation. It was found that the system has a high overall accuracy, but some numbers that are very similar (such as the Hindi two and three) had lower accuracy. The authors hope to improve the UI and allow the addition of language-specific rules for higher accuracy.
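A hedged sketch of the matching step as described: a one-sided (directed) Hausdorff distance against labeled templates, followed by a k-NN vote with k = 5. A real system would normalize scale and position before comparing:

```python
import math
from collections import Counter

def one_sided_hausdorff(sketch, template):
    """Directed Hausdorff distance: for each sketch point, find the distance
    to its nearest template point; return the largest of those distances."""
    return max(min(math.dist(p, q) for q in template) for p in sketch)

def classify_digit(sketch, templates, k=5):
    """k-NN over labeled templates [(label, points), ...], ranked by the
    directed Hausdorff distance from the sketch to each template."""
    neighbors = sorted(templates,
                       key=lambda lt: one_sided_hausdorff(sketch, lt[1]))
    votes = Counter(label for label, _ in neighbors[:k])
    return votes.most_common(1)[0][0]
```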

Discussion:
The system is indeed a great way to learn a new language. The fact that it uses a general purpose template matching algorithm, makes it very easy to extend it to multiple languages.

I wonder if this template matching algorithm can be used for matching characters other than numbers as well. It seems like it can, as the training data for each language can be given to the first application. The decoupling of the training data from the actual game makes this very flexible. This way, the system could be used to play more complex games, such as crosswords, as well.

Assignment 5: Reading 3 - Digital Circuits

Bibliography:
Shuo Ma, Yongbin Sun, Pengchen Lyu, Seth Polsley, and Tracy Hammond. DCSR: A Digital Circuit Sketch Recognition System for Education, Frontiers in Pen and Touch, Chapter 11, Springer, 2017.

Summary:
The paper describes an application of sketch recognition, in the field of digital logic. DCSR (Digital Circuit Sketch Recognition), identifies circuit drawings, and determines the truth value of the output.

Most existing systems, such as SketchySPICE, only permit using AND, OR, and NOT gates, with no simulation. Other systems (like LogiPad) rely on drag-and-drop interaction. LogiSketch was the closest system to true freehand sketching.

DCSR uses a web interface where users draw circuit sketches on a canvas. The system recognises the various components of the circuit, namely gates, I/O, and wires. The primary algorithm used to identify gates is the $P algorithm. At a high level, this algorithm converts the sketch into a point cloud and compares it with an existing set of templates, which makes it more efficient than other $-family algorithms. The authors added a decision tree layer on top of this algorithm in order to differentiate between two types of gates: ones which have a single straight line (like AND, NAND, NOT), and ones that don't (like XOR, OR, NOR). Wires were identified by checking the start and end points of the wire strokes for proximity to gate pins. The system also calculates truth values recursively on each wire. Additionally, the hand-drawn sketches were beautified by the system, in order to provide a neat workspace and input/output pins.
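In the spirit of $P, a much-simplified point-cloud matcher: normalize both clouds, then greedily match each sketch point to its nearest unmatched template point. The real $P resamples both clouds to a fixed number of points, weights the matches, and tries several starting points; this hedged sketch assumes the clouds are already the same size:

```python
import math

def normalize(cloud):
    """Translate the centroid to the origin and scale to unit size
    (a simplification of $P's preprocessing)."""
    cx = sum(x for x, y in cloud) / len(cloud)
    cy = sum(y for x, y in cloud) / len(cloud)
    shifted = [(x - cx, y - cy) for x, y in cloud]
    scale = max(math.hypot(x, y) for x, y in shifted) or 1.0
    return [(x / scale, y / scale) for x, y in shifted]

def cloud_distance(a, b):
    """Greedy cloud matching: pair each point of `a` with its nearest
    still-unmatched point of `b` and sum the distances. Assumes
    len(b) >= len(a)."""
    a, b = normalize(a), normalize(b)
    unmatched = list(b)
    total = 0.0
    for p in a:
        q = min(unmatched, key=lambda q: math.dist(p, q))
        unmatched.remove(q)
        total += math.dist(p, q)
    return total

def recognize(sketch_cloud, templates):
    """Pick the template (label, cloud) with the smallest cloud distance."""
    return min(templates, key=lambda lt: cloud_distance(sketch_cloud, lt[1]))[0]
```

The decision-tree layer the authors describe would sit on top of this, disambiguating between the two gate families before (or after) template matching.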


DCSR was tested by electrical engineering students and was found to have high overall accuracy. The decision tree layer played an important part in this, as it prevented misclassification between the two types of gates mentioned above. The areas where DCSR could be improved relate to more freedom in drawing the gates (such as using a single stroke for type 1 gates) and the ordering of components. The authors plan to add more features and remove the drawing constraints in the future.

Discussion:
DCSR definitely seems to make learning and simulating digital circuits a pleasant experience. The truth-value calculation, for me, is a really useful feature when it comes to designing circuits. A student can gain confidence by verifying their design with a quick sketch.


Something I'd like to see is DCSR extended beyond digital logic; the same ideas could apply to simulating analog circuits, with resistors, capacitors, and power sources. Support for simulation of analog circuits would be invaluable, as the math involved is much hairier (involving complex numbers).

Wednesday, 6 September 2017

Reading 2 - Bhat GIS

Bibliography:

Aqib Niaz Bhat, Girish Kasiviswanathan, Christy Maria Mathew, Seth Polsley, Erik Prout, Daniel W. Goldberg, and Tracy Hammond. An Intelligent Sketching Interface for Education using Geographic Information Systems, Frontiers in Pen and Touch, Chapter 11, Springer, 2017.

Summary:
This paper is about a sketch recognition system that helps with education in the field of geography. The motivation for such a system is that geographical entities learnt by drawing on a map would aid in better recall and comprehension of the various concepts. Current lessons and grading in the field rely on marking on maps and multiple choice tests. The main idea is to combine shape and location information from the sketch and compare that with an actual data set of geographic features. The initial version of the system works by allowing students to draw rivers on maps and returns a similarity score between their sketch and the actual geographical data.

The trade-off between drawing freedom and ease of recognition makes it challenging to design a robust recognizer. The recognizer has to be stroke independent, receptive to messy sketches that capture the essential information, and able to handle different drawing styles. The recognition method used by the system combines two techniques, shape context and Hausdorff distance, which exploit the shape and location features of the sketch. The system also includes a preprocessing step that removes sketches that are 'very far' from the actual data. This is done by applying stroke-length and location thresholds to the sketch, in comparison to the river's geographical data.

The similarity measure is a weighted sum of shape similarity measured using shape context, location similarity measured using Hausdorff distance, and the stroke length ratio. The shape context is calculated for each point in a shape by measuring its relative distribution with respect to the other points. A matching cost between the two shapes is computed by pairwise matching of the shape contexts of individual points. A modified version of the Hausdorff distance is used for finding the location similarity.
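The weighted combination could look roughly like this. The weights, the mapping from distances to [0, 1] similarities, and the 'mean of nearest distances' variant of the Hausdorff distance are all assumptions for illustration, and `shape_cost` stands in for the output of a shape-context matcher:

```python
import math

def modified_hausdorff(a, b):
    """Mean (rather than max) of nearest-point distances: one common
    'modified Hausdorff' variant; the paper's exact modification may differ."""
    return sum(min(math.dist(p, q) for q in b) for p in a) / len(a)

def path_length(pts):
    """Total length of a polyline given as a list of (x, y) points."""
    return sum(math.dist(pts[i], pts[i + 1]) for i in range(len(pts) - 1))

def similarity(sketch, river, shape_cost, w_shape=0.4, w_loc=0.4, w_len=0.2):
    """Weighted similarity score over three components: shape (from a
    shape-context matching cost), location (modified Hausdorff), and the
    stroke-length ratio. Each component is mapped into [0, 1]."""
    shape_sim = 1.0 / (1.0 + shape_cost)
    loc_sim = 1.0 / (1.0 + modified_hausdorff(sketch, river))
    ls, lr = path_length(sketch), path_length(river)
    len_sim = min(ls, lr) / max(ls, lr)
    return w_shape * shape_sim + w_loc * loc_sim + w_len * len_sim
```

An identical sketch and river with zero shape cost score 1.0; any displacement, distortion, or length mismatch pulls the score down.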

The paper concludes by discussing the user study and the accuracy of the model for both similar and dissimilar cases. Users of the system were satisfied with both the learning and testing modes of the UI. The authors plan to expand their system to other geographic entities. They are also looking to improve their classifier and build more classifiers that include domain-specific heuristics.

Discussion:
This system uses technology to solve a problem that, in my opinion, not many people would think about. Sketching is a great way to learn geography, and getting immediate feedback on your sketches makes it very useful for learning by doing. The paper does a great job of explaining the limitations of existing methods and how the authors had to come up with clever variations and new techniques in order to apply them to geographical matching.

I would love to see this extended to other entities, not just in geography, but also in fields like astronomy, history, and archaeology.

Monday, 4 September 2017

Reading 1 - Learning through the Lens of Sketch

Bibliography:

Hammond, Tracy. "Learning Through the Lens of Sketch", Frontiers in Pen and Touch, Chapter 21, Springer, 2017.

Summary:

This talk gives an overview about Dr. Hammond's research and the evolution of the field of sketch recognition.  Her motivation for the field comes from trying to understand, why certain tasks that are inherently simple for humans, turn out to be difficult for computers to perform. By writing computer algorithms that mimic human activities, we may be able to better understand how the brain works. This in turn will facilitate better communication with/using technology.

Automated recognition of sketches is crucial to achieving this, as sketching is a very intuitive way for humans to perceive/understand things. Dr. Hammond categorises sketch recognition algorithms into 3 types: 1) Appearance based 2) Gesture based 3) Geometry based.  As each method has its own drawbacks and advantages, the most effective system will use a combination of techniques.

Dr. Hammond's research on geometric algorithms led to defining a set of constraints that match how humans perceive shapes. For example, humans can easily perceive horizontal and vertical lines, but are not very good at perceiving specific angles. Furthermore, this perception of constraints changes greatly with the size and form of the shape.

While building constraint-based systems (such as Mechanix) to describe shapes, an interesting application was using them to teach drawing and perception. Human subjects were able to improve their drawing and perception skills by getting immediate, personalised feedback. These systems are further improved by taking into account factors such as how corners are drawn and the sound of the pen. Visualizing strokes as a function of speed led to interesting insights that differentiate novices from experts. Measuring oversteering at corners (using techniques such as NDDE and DCR), along with the factors mentioned above, helped recognize perception primitives.
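Both NDDE (Normalized Distance between Direction Extremes) and DCR (Direction Change Ratio) can be computed directly from the sampled stroke points. The sketch below is my own rough reading of these measures, not code from the talk; the function names and the exact normalisation are my assumptions:

```python
import math

def directions(points):
    """Direction (angle) of each segment between consecutive points."""
    return [math.atan2(y1 - y0, x1 - x0)
            for (x0, y0), (x1, y1) in zip(points, points[1:])]

def arc_length(points):
    """Total length of the polyline through the points."""
    return sum(math.hypot(x1 - x0, y1 - y0)
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

def ndde(points):
    """Normalized Distance between Direction Extremes: stroke length
    between the segments with the highest and lowest direction values,
    divided by total stroke length. Smooth arcs score near 1; a stroke
    that turns sharply mid-way scores lower."""
    dirs = directions(points)
    i_max = max(range(len(dirs)), key=lambda i: dirs[i])
    i_min = min(range(len(dirs)), key=lambda i: dirs[i])
    lo, hi = sorted((i_max, i_min))
    return arc_length(points[lo:hi + 2]) / arc_length(points)

def dcr(points):
    """Direction Change Ratio: maximum direction change divided by the
    average direction change. A sharp corner produces a spike in
    direction change, so polylines score high and smooth curves low."""
    dirs = directions(points)
    # Wrap each difference into (-pi, pi] before taking its magnitude.
    changes = [abs((b - a + math.pi) % (2 * math.pi) - math.pi)
               for a, b in zip(dirs, dirs[1:])]
    avg = sum(changes) / len(changes)
    return max(changes) / avg if avg > 0 else 0.0
```

On an L-shaped polyline, all the direction change is concentrated at the corner, so DCR is high; on an evenly sampled arc, every segment turns by about the same amount, so DCR stays near 1 and NDDE stays near 1.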

Another goal of her research was to differentiate between shapes and text. It was found that text has significantly higher Shannon entropy than shapes, which helped distinguish the two with high accuracy.
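As a rough illustration of the idea (my own sketch, not the paper's exact feature), one can quantize the turning angle at each stroke point into a small alphabet and measure the Shannon entropy of the resulting symbol sequence; jittery handwriting mixes many symbols, while a smooth shape repeats only a few:

```python
import math
from collections import Counter

def shannon_entropy(symbols):
    """Shannon entropy H = -sum(p * log2(p)) over a symbol sequence."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def stroke_entropy(points, bins=8):
    """Quantize the turning angle at each interior point into `bins`
    symbols, then return the entropy of that symbol sequence. Text-like
    strokes tend to produce a richer symbol mix (higher entropy) than
    smooth geometric shapes."""
    symbols = []
    for (x0, y0), (x1, y1), (x2, y2) in zip(points, points[1:], points[2:]):
        a1 = math.atan2(y1 - y0, x1 - x0)
        a2 = math.atan2(y2 - y1, x2 - x1)
        # Wrap the turning angle into (-pi, pi] before binning.
        turn = (a2 - a1 + math.pi) % (2 * math.pi) - math.pi
        symbols.append(int((turn + math.pi) / (2 * math.pi) * bins) % bins)
    return shannon_entropy(symbols)
```

A straight line (one repeated symbol) gives entropy 0, while a zigzag stroke mixing left and right turns scores higher, which is the intuition behind the text-versus-shape split.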

The talk concludes by looking at temporal data for recognising sketches, namely activity recognition. The research found that it was possible to gain information about a person just from their eye-tracking data, using measures such as saccades, entropy, and shapelets.

Discussion:
The talk gives an excellent overview of the field of sketch recognition, the various algorithms being developed, and the difficulty of creating computer algorithms that mimic humans. It also makes me marvel at how evolution and natural selection have made us so remarkably efficient!

The field provides significant insight into how humans perceive things. To me, pedagogy seems to be the primary application of the field. I wonder how this relates back to making computers complement humans in day-to-day activities.

While most of the talk focussed on geometric algorithms (with some gesture-based methods at the end), I wonder how these techniques compare with appearance-based algorithms. It seems appearance-based techniques would be less insightful, as they use no temporal data.

Extended Discussion:
The video helped me see the emphasis placed on relating sketch recognition to how the brain works. Indeed, how we perceive things is significantly reflected in our behavior and sketches. As Dr. Hammond explained in the Q&A, these insights will help make sketch recognition much more valuable to people, rather than merely replacing pen and paper.

It was cool to see the demonstration of how the LADDER system grew to recognize 923 shapes, based on just simple constraints. The speed of recognition was very impressive.

Regarding eye tracking, the graphs of saccades and Dr. Hammond's explanation showed how they are unique to each individual and how they relate to the usage of the eye muscles.

Sunday, 3 September 2017

Introduction

Howdy!



My name is Siddharth (you can call me Sid) and I am a Master's student (starting Fall 2017) in Computer Science at Texas A&M University. Whoop! I come from Chennai, a city in South India, where the Texas summer is the norm :). I did my bachelor's there, but most of my schooling was in various parts of India.

After completing my undergrad in 2013, I started working at System Insights, a manufacturing data analytics startup based out of Berkeley. Ever since, I have loved building large-scale information processing systems. During my master's, I hope to delve deeper into this and related areas of computer science.

I took the Sketch Recognition course because of its novelty. It sounds really cool to be able to identify hand-drawn sketches, though I am a little skeptical about its real-world applications. I also wonder how it relates to (or competes with?) computer vision and image processing techniques. Anyway, I am looking to gain some new perspective there!