ebook img

Natural Language Processing Outline of today's lecture NLP and linguistics Also note PDF

67 Pages·2012·1.28 MB·English
by  
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Natural Language Processing Outline of today's lecture NLP and linguistics Also note

NaturalLanguageProcessing NaturalLanguageProcessing Outline of today’s lecture Natural Language Processing Lecture 1: Introduction Overviewofthe course SimoneTeufel WhyNLPis hard ScopeofNLP ComputerLaboratory UniversityofCambridge Asample application: sentimentclassification MoreNLPapplications January2012 NLPcomponents Lecture Materialscreatedby AnnCopestake NaturalLanguageProcessing NaturalLanguageProcessing Lecture1:Introduction Lecture1:Introduction Overviewofthecourse Overviewofthecourse NLP and linguistics Also note: NLP:thecomputationalmodellingofhumanlanguage. 1. Morphology—the structure ofwords: lecture 2. 2. Syntax—the waywordsareused to form phrases: Exercises: pre-lecture andpost-lecture ◮ lectures 3,4and5. Glossary 3. Semantics ◮ RecommendedBook: Jurafsky andMartin (2008). Compositionalsemantics—theconstructionofmeaning ◮ ◮ basedonsyntax: lecture6. Lexicalsemantics—themeaningofindividualwords: ◮ lecture6. 4. Pragmatics—meaningin context: lecture 7. NaturalLanguageProcessing NaturalLanguageProcessing Lecture1:Introduction Lecture1:Introduction WhyNLPishard WhyNLPishard Querying a knowledge base Why is this difficult? Userquery: Similarstrings meandifferentthings,differentstrings meanthe Hasmyordernumber4291beenshippedyet? ◮ same thing: Database: 1. Howfast is theTZ? ORDER 2. Howfast willmyTZ arrive? Ordernumber Dateordered Dateshipped 3. Pleasetell mewhenIcan expectthe TZI ordered. 4290 2/2/09 2/2/09 Ambiguity: 4291 2/2/09 2/2/09 Doyou sell Sonylaptopsanddisk drives? 4292 2/2/09 ◮ Doyou sell (Sony(laptopsanddisk drives))? ◮ USER:Hasmyordernumber4291beenshippedyet? Doyou sell (Sonylaptops)anddisk drives)? ◮ DB QUERY:order(number=4291,date_shipped=?) RESPONSE:Ordernumber4291wasshippedon2/2/09 NaturalLanguageProcessing NaturalLanguageProcessing Lecture1:Introduction Lecture1:Introduction WhyNLPishard WhyNLPishard Why is this difficult? Why is this difficult? Similarstrings meandifferentthings,differentstrings meanthe Similarstrings meandifferentthings,differentstrings meanthe same thing: same thing: 1. Howfast is theTZ? 1. Howfast is theTZ? 2. Howfast willmyTZ arrive? 2. Howfast willmyTZ arrive? 3. Pleasetell mewhenIcan expectthe TZI ordered. 3. Pleasetell mewhenIcan expectthe TZI ordered. Ambiguity: Ambiguity: Doyou sell Sonylaptopsanddisk drives? Doyou sell Sonylaptopsanddisk drives? ◮ ◮ Doyou sell (Sony(laptopsanddisk drives))? Doyou sell (Sony(laptopsanddisk drives))? ◮ ◮ Doyou sell (Sonylaptops)anddisk drives)? Doyou sell (Sonylaptops)anddisk drives)? ◮ ◮ NaturalLanguageProcessing NaturalLanguageProcessing Lecture1:Introduction Lecture1:Introduction WhyNLPishard WhyNLPishard Why is this difficult? Why is this difficult? Similarstrings meandifferentthings,differentstrings meanthe Similarstrings meandifferentthings,differentstrings meanthe same thing: same thing: 1. Howfast is theTZ? 1. Howfast is theTZ? 2. Howfast willmyTZ arrive? 2. Howfast willmyTZ arrive? 3. Pleasetell mewhenIcan expectthe TZI ordered. 3. Pleasetell mewhenIcan expectthe TZI ordered. Ambiguity: Ambiguity: Doyou sell Sonylaptopsanddisk drives? Doyou sell Sonylaptopsanddisk drives? ◮ ◮ Doyou sell (Sony(laptopsanddisk drives))? Doyou sell (Sony(laptopsanddisk drives))? ◮ ◮ Doyou sell (Sonylaptops)anddisk drives)? Doyou sell (Sonylaptops)anddisk drives)? ◮ ◮ NaturalLanguageProcessing NaturalLanguageProcessing Lecture1:Introduction Lecture1:Introduction WhyNLPishard WhyNLPishard Why is this difficult? Wouldn’t it be better if ...? Similarstrings meandifferentthings,differentstrings meanthe Thepropertieswhich make naturallanguagedifficult to process same thing: are essentialto humancommunication: 1. Howfast is theTZ? Flexible ◮ 2. Howfast willmyTZ arrive? Learnablebutcompact ◮ 3. Pleasetell mewhenIcan expectthe TZI ordered. Emergent,evolvingsystems ◮ Ambiguity: Synonymyandambiguitygo alongwith these properties. Doyou sell Sonylaptopsanddisk drives? Naturallanguagecommunicationcan be indefinitelyprecise: ◮ ◮ Doyou sell (Sony(laptopsanddisk drives))? ◮ Ambiguityis mostly local(for humans) ◮ Doyou sell (Sonylaptops)anddisk drives)? ◮ Semi-formaladditionsandconventionsfor differentgenres NaturalLanguageProcessing NaturalLanguageProcessing Lecture1:Introduction Lecture1:Introduction WhyNLPishard ScopeofNLP Wouldn’t it be better if ...? Some NLP applications spelling andgrammar ◮ information extraction ◮ Thepropertieswhich make naturallanguagedifficult to process checking questionanswering ◮ are essentialto humancommunication: opticalcharacter ◮ summarization Flexible recognition(OCR) ◮ ◮ text segmentation ◮ Learnablebutcompact ◮ screen readers ◮ exam marking ◮ Emergent,evolvingsystems ◮ augmentativeand ◮ alternative communication ◮ reportgeneration Synonymyandambiguitygo alongwith these properties. Naturallanguagecommunicationcan be indefinitelyprecise: ◮ machineaidedtranslation ◮ machinetranslation ◮ Ambiguityis mostly local(for humans) ◮ lexicographers’tools ◮ naturallanguageinterfaces to databases ◮ Semi-formaladditionsandconventionsfor differentgenres ◮ information retrieval emailunderstanding documentclassification ◮ ◮ dialoguesystems documentclustering ◮ ◮ NaturalLanguageProcessing NaturalLanguageProcessing Lecture1:Introduction Lecture1:Introduction Asampleapplication:sentimentclassification Asampleapplication:sentimentclassification Sentiment classification: finding out what people think Motorola KRZR (from the Guardian) about you Motorolahasstruggledto come upwith a worthy successor to theRAZR,arguablythe mostinfluential ◮ Task: scan documentsforpositive andnegativeopinions phoneofthe pastfew years. Its latestattemptis the onpeople,productsetc. KRZR,which hasthe same clamshelldesignbuthas Findallreferencesto entity in somedocumentcollection: some additionalfeatures. It hasa striking bluefinish ◮ list as positive,negative(possibly with strength)orneutral. onthe frontandthe backofthe handsetis verytactile brushedrubber. Like its predecessors,the KRZRhas Summariesplustext snippets. ◮ a laser-etchedkeypad,butin this instance Motorola Fine-grainedclassification: ◮ hasincludedridgesto make iteasierto use. e.g.,forphone,opinionsabout: overalldesign,keypad, ...Overall there’s notmuch to dislike aboutthephone, camera. butits slightly quirky designmeansthatit probably Still oftendonebyhumans... ◮ won’tbeashugeorashotasthe RAZR. NaturalLanguageProcessing NaturalLanguageProcessing Lecture1:Introduction Lecture1:Introduction Asampleapplication:sentimentclassification Asampleapplication:sentimentclassification Sentiment classification: the research task IMDb: An American Werewolf in London (1981) Rating: 9/10 Ooooo. Scary. Fulltask: informationretrieval, cleaningup text structure, ◮ Theoldadageof thesimplest ideasbeingthe bestis namedentity recognition,identificationof relevantpartsof onceagaindemonstratedin this, oneofthemost text. Evaluationbyhumans. entertainingfilms ofthe early80’s, andalmost ◮ Researchtask: preclassified documents,topicknown, certainlyJon Landis’best workto date. Thescript is opinionin text alongwith some straightforwardly lightandwitty, the visuals are greatandthe extractable score. atmosphereis top class. Plusthere aresome great Moviereview corpus,with ratings. freeze-framemomentsto enjoyagainandagain. Not ◮ forgetting,ofcourse,the greattransformation scene which still impresses to thisday. In Summary: Topbanana NaturalLanguageProcessing NaturalLanguageProcessing Lecture1:Introduction Lecture1:Introduction Asampleapplication:sentimentclassification Asampleapplication:sentimentclassification Bag of words technique Sentiment words Treatthereviews ascollections ofindividualwords. ◮ Classify reviews according to positive ornegativewords. ◮ Coulduse wordlists preparedbyhumans,butmachine ◮ learningbasedona portionofthe corpus(training set) is thanks preferable. Use star rankingsfortraining andevaluation. ◮ Pangetal,2002: Chancesuccess is50%(moviedatabase ◮ wasartifically balanced),bag-of-wordsgives80%. NaturalLanguageProcessing NaturalLanguageProcessing Lecture1:Introduction Lecture1:Introduction Asampleapplication:sentimentclassification Asampleapplication:sentimentclassification Sentiment words Sentiment words thanks never from Potts andSchwarz(2008) NaturalLanguageProcessing NaturalLanguageProcessing Lecture1:Introduction Lecture1:Introduction Asampleapplication:sentimentclassification Asampleapplication:sentimentclassification Sentiment words Sentiment words never quite from Potts andSchwarz(2008) NaturalLanguageProcessing NaturalLanguageProcessing Lecture1:Introduction Lecture1:Introduction Asampleapplication:sentimentclassification Asampleapplication:sentimentclassification Sentiment words Sentiment words: ever quite ever from Potts andSchwarz(2008) NaturalLanguageProcessing NaturalLanguageProcessing Lecture1:Introduction Lecture1:Introduction Asampleapplication:sentimentclassification Asampleapplication:sentimentclassification Sentiment words: ever Some sources of errors for bag-of-words ever Negation: ◮ RidleyScotthasneverdirecteda badfilm. Overfitting the trainingdata: ◮ e.g.,if trainingset includesa lotoffilms from before2005, Ridleymaybe astrong positive indicator,butthenwe test onreviews for ‘KingdomofHeaven’? Comparisonsandcontrasts. ◮ from Potts andSchwarz(2008) NaturalLanguageProcessing NaturalLanguageProcessing Lecture1:Introduction Lecture1:Introduction Asampleapplication:sentimentclassification Asampleapplication:sentimentclassification Contrasts in the discourse More contrasts AN AMERICANWEREWOLFIN PARISis a failed attempt...Julie Delpyis fartoo goodfor this movie. SheimbuesSerafinewith spirit, spunk,andhumanity. Thisfilm shouldbebrilliant. It soundslike a greatplot, Thisisn’t necessarily agoodthing,since it preventsus the actorsare first grade,andthesupportingcast is from relaxing andenjoyingANAMERICAN goodas well,andStalloneisattemptingto delivera WEREWOLFIN PARISasa completelymindless, goodperformance. However,it can’tholdup. campyentertainmentexperience. Delpy’s injection of class into an otherwiseclassless productionraisesthe specterofwhatthisfilm could havebeenwith a better script anda bettercast ...She wasradiant, charismatic, andeffective ... NaturalLanguageProcessing NaturalLanguageProcessing Lecture1:Introduction Lecture1:Introduction Asampleapplication:sentimentclassification Asampleapplication:sentimentclassification Sample data Doing sentiment classification ‘properly’? Morphology,syntax andcompositionalsemantics: ◮ whois talking aboutwhat,whatterms are associated with what,tense... http://www.cl.cam.ac.uk/~sht25/sentiment/ Lexicalsemantics: (linked from ◮ are wordspositive ornegativein this context? Word http://www.cl.cam.ac.uk/~sht25/stuff.html) senses(e.g.,spirit)? Seetest datatexts in: http://www.cl.cam.ac.uk/~sht25/sentiment/test/ ◮ Pragmaticsanddiscourse structure: classified into positive/negative. whatis the topicofthis section oftext? Pronounsand definitereferences. Butgettingallthistoworkwellonarbitrarytextisveryhard. ◮ Ultimatelythe problemis AI-complete,butcan wedo well ◮ enoughforNLPto be useful? NaturalLanguageProcessing NaturalLanguageProcessing Lecture1:Introduction Lecture1:Introduction Asampleapplication:sentimentclassification MoreNLPapplications Doing sentiment classification ‘properly’? IR, IE and QA Morphology,syntax andcompositionalsemantics: ◮ whois talking aboutwhat,whatterms are associated with Information retrieval: return documentsin responseto a ◮ what,tense... userquery(Internet Searchis a specialcase) Lexicalsemantics: ◮ Information extraction: discoverspecific informationfrom a ◮ are wordspositive ornegativein this context? Word set ofdocuments(e.g. companyjointventures) senses(e.g.,spirit)? Questionanswering: answera specific user questionby ◮ Pragmaticsanddiscourse structure: ◮ returninga section ofa document: whatis the topicofthis section oftext? Pronounsand Whatis the capitalofFrance? definitereferences. Paris hasbeenthe Frenchcapitalfor manycenturies. Butgettingallthistoworkwellonarbitrarytextisveryhard. ◮ Much moreaboutthesein the IR course. Ultimatelythe problemis AI-complete,butcan wedo well ◮ enoughforNLPto be useful? NaturalLanguageProcessing NaturalLanguageProcessing Lecture1:Introduction Lecture1:Introduction MoreNLPapplications MoreNLPapplications MT Human translation? Earliest attemptedNLPapplication ◮ Qualitydependson restricting the domain ◮ Utility greatly increasedwith increase in availabilityof ◮ electronictext Goodapplicationsfor badMT ... ◮ Spokenlanguagetranslation is viable forlimited domains ◮ NaturalLanguageProcessing NaturalLanguageProcessing Lecture1:Introduction Lecture1:Introduction MoreNLPapplications MoreNLPapplications Human translation? Natural language interfaces and dialogue systems Allrely on alimited domain: LUNAR:classic exampleofa naturallanguageinterface to ◮ a database(NLID):1970–1975 SHRDLU:(text-based) dialoguesystem: 1973 ◮ Currentspoken dialoguesystems ◮ Limiteddomainallowsdisambiguation: e.g.,in LUNAR,rock hadonesense. I amnotin the office atthe moment. Pleasesendanyworkto be translated. NaturalLanguageProcessing NaturalLanguageProcessing Lecture1:Introduction Lecture1:Introduction MoreNLPapplications MoreNLPapplications Siri Example Dialogues Siri Example Dialogues Man: Whatdoesmyday looklike? Man(jogging): Movemymeetingwith Kelly Altekto 12. Siri: Nottoo bad,onlytwo meetings(showsthemonscreen) Siri: You alreadyhavea meetingaboutbudgetsat12. ShallI *** scheduleit anyway? Woman: DoI needanumbrellatonight? Man: Moveit to 2. ...Play myrunningmix. Siri: Thereis norain in the forecast fortonight. *** *** Woman: Iam locked out. Siri: I foundthree locksmiths fairly close to you (showsthemon Andmore requeststo Siri: screen) Man: Howdo Itie a bowtieagain? Child: Whatdoesa weasellooklike? Woman: Wehave a flattire.

Description:
NLP and linguistics. NLP: the computational modelling of human language. 1. Morphology .. http://www.cl.cam.ac.uk/~sht25/sentiment/test/ classified
See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.