
Toward Widely-Available and Usable Multimodal Conversational Interfaces
Alexander Gruenstein

166 Pages·2009·7.75 MB·English


Toward Widely-Available and Usable Multimodal Conversational Interfaces
by
Alexander Gruenstein
B.S., Stanford University (2003)
M.S., Stanford University (2003)
Submitted to the Department of Electrical Engineering and Computer Science in partial fulfillment of the requirements for the degree of Doctor of Philosophy at the MASSACHUSETTS INSTITUTE OF TECHNOLOGY, June 2009
© Massachusetts Institute of Technology 2009. All rights reserved.

Author: Department of Electrical Engineering and Computer Science, May 18, 2009
Certified by: Stephanie Seneff, Principal Research Scientist, Thesis Supervisor
Accepted by: Terry P. Orlando, Chairman, Department Committee on Graduate Students

Toward Widely-Available and Usable Multimodal Conversational Interfaces
by
Alexander Gruenstein
Submitted to the Department of Electrical Engineering and Computer Science on May 18, 2009, in partial fulfillment of the requirements for the degree of Doctor of Philosophy

Abstract

Multimodal conversational interfaces, which allow humans to interact with a computer using a combination of spoken natural language and a graphical interface, offer the potential to transform the manner by which humans communicate with computers. While researchers have developed myriad such interfaces, none have made the transition out of the laboratory and into the hands of a significant number of users. This thesis makes progress toward overcoming two intertwined barriers preventing more widespread adoption: availability and usability.

Toward addressing the problem of availability, this thesis introduces a new platform for building multimodal interfaces that makes it easy to deploy them to users via the World Wide Web.
One consequence of this work is City Browser, the first multimodal conversational interface made publicly available to anyone with a web browser and a microphone. City Browser serves as a proof-of-concept that significant amounts of usage data can be collected in this way, allowing a glimpse of how users interact with such interfaces outside of a laboratory environment.

City Browser, in turn, has served as the primary platform for deploying and evaluating three new strategies aimed at improving usability. The most pressing usability challenge for conversational interfaces is their limited ability to accurately transcribe and understand spoken natural language. The three strategies developed in this thesis – context-sensitive language modeling, response confidence scoring, and user behavior shaping – each attack the problem from a different angle, but they are linked in that each critically integrates information from the conversational context.

Thesis Supervisor: Stephanie Seneff
Title: Principal Research Scientist

To “Norbert”
I can’t wait to meet you

Acknowledgments

From the first day I met my advisor, Stephanie Seneff, I thought we would get along just fine. Luckily, I was right; more so than I could have imagined. Stephanie has been the best advisor, mentor, advocate, colleague, and friend I could have asked for. Thank you.

I am grateful to my committee members, Victor Zue and Randall Davis, for their comments. This thesis has improved a great deal based on their feedback.

It’s been a distinct pleasure to work with the staff of the Spoken Language Systems group over the last five years. Jim Glass, in particular, has provided mentorship, encouragement, advice, and support on a daily basis. Chao Wang taught me how to do so many things I’ve lost count. T.J. Hazen helped with the recognition confidence module, and a major part of this thesis would not have been possible without his help. Scott Cyphers and Lee Hetherington helped with innumerable technical challenges.
Finally, I am grateful to Marcia Davidson, who was always ready with a laugh to buy whatever random item I needed on very short notice.

Collaborating with Ian McGraw on all things WAMI has been amazingly fun and rewarding. Sean Liu has been a constant collaborator on City Browser, and I shudder to think of what the interface might have looked like without him.

I’d like to thank my officemates, Ali Mohammad, Harr Chen, Tara Sainath, and Yuan Shen, who have provided friendship, diversion, advice, and many fascinating discussions. I’ve also benefited from my interactions with other SLS students, including Ibrahim Badr, Brad Cater, Ghinwa Choueiter, Ed Filisko, Paul Hsu, John Lee, JingJing Liu, Gary Matthias, Liz Murnane, Alex Park, Mitchell Peabody, Ken Schutte, Han Shu, Yushi Xu, Brandon Yoshimoto, and Helen You.

It was only with the help of a number of collaborators that the automotive City Browser system could be created, and data collected. I’d like to thank in particular Jeff Zabel, Shannon Roberts, Jarrod Orszulak, and Bryan Reimer.

My interest in the field began at Stanford, under the mentorship of Stanley Peters and Oliver Lemon. I am particularly indebted to Oliver, both for his friendship, and for teaching me what it is to do research. If not for him, I have no idea what I would be doing today.

Another important experience over the last five years was meeting other young researchers in the field at the Young Researchers Roundtable on Spoken Dialogue Systems. My friendships with Verena Rieser and Mihai Rotaru, in particular, stand out, and I thank them for making so many trips much more fun and interesting.

I could never have written this thesis without the constant love and support of my family and friends. My parents, John and Carolyn, and my sisters, Cassie and Elizabeth, have always supported me and loved me unconditionally. Justin, who feels more like family than friend, is always challenging me, and making me laugh.
I would be lost without my wife Anna, who keeps me sane, happy, focused, and relaxed, and who somehow put up with five years of this.

This research is funded in part by the T-Party project, a joint research program between MIT and Quanta Computer Inc., Taiwan.

Contents

1 Introduction
1.1 Multimodal Interfaces on the Web
1.2 Data Collection and Annotation
1.3 Context-Sensitive Language Modeling
1.4 Context-Sensitive Confidence Scoring
1.5 Contextual User Utterance Shaping
1.6 Thesis Outline
2 City Browser: A Widely Available Multimodal Conversational Interface
2.1 User Interface
2.2 Architectural Overview
2.3 Web Availability
2.4 Graphical User Interface
2.4.1 Multimodal Error Correction
2.5 Natural Language Processing Pipeline
2.5.1 Speech Recognizer
2.5.2 Natural Language Parser
2.5.3 Discourse and Gesture Resolution
2.5.4 Dialogue Management
2.5.5 Natural Language Generation
2.5.6 Suggestions Module
2.5.7 Confidence Annotator
2.5.8 Application
2.5.9 Speech Synthesis
2.6 Database Creation via Web Crawling
2.7 Related Work
2.7.1 Web-Based Speech Interfaces
2.7.2 Multimodal Conversational Interfaces
2.7.3 Widely Available Multimodal Speech Interfaces
2.8 Summary
3 The WAMI Toolkit and Example Applications
3.1 Toolkit Configurations
3.1.1 Toolkit-Only
3.1.2 Toolkit+Portal
3.2 Lightweight Semantic Understanding
3.2.1 Incremental Understanding
3.3 WAMI Applications
3.3.1 SLS Applications
3.3.2 Student Applications
3.4 Conclusion
4 Corpora
4.1 Overview of Corpora
4.1.1 Tablet Corpus
4.1.2 Web Corpus
4.1.3 Car and Car-Pilot corpora
4.2 Comparison to Similar Corpora
4.3 Annotation Tools
4.4 Summary
5 Context-Sensitive Language Modeling
5.1 Background
5.1.1 n-gram Language Models
5.1.2 Probabilistic context-free grammars
5.1.3 Training Corpora
5.1.4 Word Error Rate
5.2 Related Work
5.2.1 Limitations
5.3 Contextualized Semantic Classes
5.3.1 Cues
5.3.2 Scalability and Flexibility
5.4 Conclusion
6 An Empirical Evaluation of Contextualized Semantic Classes
6.1 Experiments in the Flight Reservation Domain
6.1.1 Verbal Cues
6.1.2 Prompt cues
6.1.3 Experimental Conditions
6.1.4 Experimental Results
6.2 City Browser Experiments
6.2.1 Graphical and Implicit Cues
6.2.2 Experiments on the Tablet corpus
6.2.3 Experiments on the Car-Pilot and Car corpora
6.3 Conclusion
