Understanding Freehand Diagrams: Combining Appearance and Context for Multi-Domain Sketch ... PDF

102 Pages·2012·12.5 MB·English
by Tom Yu Ouyang

Preview Understanding Freehand Diagrams: Combining Appearance and Context for Multi-Domain Sketch ...

Understanding Freehand Diagrams: Combining Appearance and Context for Multi-Domain Sketch Recognition

by Tom Yu Ouyang

Submitted to the Department of Electrical Engineering and Computer Science in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Science at the Massachusetts Institute of Technology, February 2012

© Massachusetts Institute of Technology 2012. All rights reserved.

Author: Department of Electrical Engineering and Computer Science, January 13, 2012
Certified by: Randall Davis, Professor, Thesis Supervisor
Accepted by: Leslie A. Kolodziejski, Chair, Committee on Graduate Students

Abstract

As our interaction with computing shifts away from the traditional desktop model (e.g., towards smartphones, tablets, touch-enabled displays), the technology that drives this interaction needs to evolve as well. Wouldn't it be great if we could talk, write, and draw to a computer just like we do with each other? This thesis addresses the drawing aspect of that vision: enabling computers to understand the meaning and semantics of freehand diagrams.

We present a novel framework for sketch recognition that seamlessly combines a rich representation of local visual appearance with a probabilistic graphical model for capturing higher level relationships. This joint model makes our system less sensitive to noise and drawing variations, improving accuracy and robustness. The result is a recognizer that is better able to handle the wide range of drawing styles found in messy freehand sketches. To preserve the fluid process of sketching on paper, our interface allows users to draw diagrams just as they would on paper, using the same notations and conventions.

For the isolated symbol recognition task our method exceeds state-of-the-art performance in three domains: handwritten digits, PowerPoint shapes, and electrical circuit symbols. For the complete diagram recognition task it was able to achieve excellent performance on both chemistry and circuit diagrams, improving on the best previous results. Furthermore, in an on-line study our new interface was on average over twice as fast as the existing CAD-based method for authoring chemical diagrams, even for novice users who had little or no experience using a tablet. This is one of the first direct comparisons that shows a sketch recognition interface significantly outperforming a professional industry-standard CAD-based tool.

Thesis Supervisor: Randall Davis
Title: Professor

Acknowledgments

I have learned and experienced so much during my time here at MIT, and I am so thankful to the many people who have helped me along the way. This thesis would not have been possible without them.

First and foremost, I would like to thank Randy Davis for being a wonderful mentor and advisor. His vision and guidance have helped shape this work from the very beginning, and our frequent intellectual discussions have been some of my most formative and valued moments here. I am constantly amazed at how easily he can distill the most complicated ideas down to their core issues, and communicate them in such a clear and concise manner. He has taught me how to always look at the big picture and ask the right research questions, to not shy away from the hard problems, and at the same time to avoid falling into the bottomless pits.

I want to thank my committee members Bill Freeman and Rob Miller. This work would not be what it is today without their input and encouragement, and the insights from their diverse backgrounds in machine learning and HCI. I also want to thank Seth Teller for his support and advice on my M.S. thesis, and helping me find the direction for my dissertation.

I am also very grateful to have had wonderful collaborators and fellow lab members over the years, many of whom have become close friends: Aaron Adler, Sonya Cates, Danica Chang, Chih-Yu Chao, Andrew Correa, Jacob Eisenstein, Tracy Hammond, James Oleinik, Mike Oltmans, Lakshman Sankar, Metin Sezgin, Jeremy Scott, Yale Song, and Ying Yin. A special thanks to Nira Manokharan, who was always there to help me and point me to the right resources. Thanks also to all of the great people in The Infrastructure Group (TIG) for making the lab run so smoothly.

I would also like to thank all of my friends here who have made this a truly unforgettable experience. I won't try to list them all because I know I will forget someone.

Finally, my biggest thanks to my family for their support, their patience, and their love. To my girlfriend Jennifer, who over the years has been my source of strength and inspiration. Thank you for sharing the journey with me.

Contents

1 Introduction
    1.1 Background
    1.2 Challenges
    1.3 Natural Sketch Interaction
    1.4 Chemistry and Analog Circuits
    1.5 Contributions

2 Isolated Symbol Recognition
    2.1 Symbol Recognition in Context
        2.1.1 Relation to Handwriting Recognition
    2.2 Visual Symbol Recognition
        2.2.1 Input Normalization
        2.2.2 Feature Representation
        2.2.3 Smoothing and Downsampling
        2.2.4 Classification
        2.2.5 Performance Optimizations
        2.2.6 Rotational Invariance
    2.3 Experimental Evaluation
        2.3.1 Pen Digits
        2.3.2 HHReco PowerPoint Shapes
        2.3.3 Electrical Circuits
        2.3.4 Runtime Performance
    2.4 Discussion

3 Complete Sketch Understanding
    3.1 Overview
    3.2 Segmentation
        3.2.1 Segment Features
    3.3 Candidates
    3.4 Feature Image Templates
    3.5 Candidate Selection using a Graphical Model
        3.5.1 Inference and Parameter Estimation
    3.6 Structure Generation
        3.6.1 Real-Time Recognition

4 Evaluation
    4.1 Off-Line Evaluation
        4.1.1 Corner Detection
        4.1.2 Symbol Detection
    4.2 On-line Comparative Evaluation
        4.2.1 Demographics
        4.2.2 Quantitative Analysis
        4.2.3 Qualitative Analysis
    4.3 Electrical Circuit Drawings
    4.4 Discussion

5 Related Work
    5.1 Single-Stroke Gestures
    5.2 Geometric Primitives and Shape Descriptions
    5.3 Domain-Specific Models
    5.4 Visual Appearance
    5.5 Chemistry
    5.6 Quantitative Comparisons to Related Work

6 Conclusion
    6.1 Contributions
    6.2 Limitations
    6.3 Future Work
        6.3.1 Natural Correction and Editing
        6.3.2 Parts-based Recognition
        6.3.3 Learning from Offline Structured Data
        6.3.4 Continuous Learning and Adaptation
        6.3.5 Learning New Domains
        6.3.6 Multimodal Interaction

A Drawing Notations
    A.1 Chemistry
    A.2 Analog Circuits

Description:
5-5 ChemPad sketch-based interface for stereochemistry visualization and ed- ... With the growing popularity of mobile devices such as smartphones (e.g., Android, iPhone), ... authoring tools in terms of usability and speed.
