
Computational Linguistics and Intelligent Text Processing: 15th International Conference, CICLing 2014, Kathmandu, Nepal, April 6-12, 2014, Proceedings, Part I

554 Pages·2014·11.25 MB·English


Alexander Gelbukh (Ed.)

Computational Linguistics and Intelligent Text Processing
15th International Conference, CICLing 2014
Kathmandu, Nepal, April 6–12, 2014
Proceedings, Part I

Lecture Notes in Computer Science 8403
Commenced Publication in 1973
Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen

Editorial Board
David Hutchison, Lancaster University, UK
Takeo Kanade, Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler, University of Surrey, Guildford, UK
Jon M. Kleinberg, Cornell University, Ithaca, NY, USA
Alfred Kobsa, University of California, Irvine, CA, USA
Friedemann Mattern, ETH Zurich, Switzerland
John C. Mitchell, Stanford University, CA, USA
Moni Naor, Weizmann Institute of Science, Rehovot, Israel
Oscar Nierstrasz, University of Bern, Switzerland
C. Pandu Rangan, Indian Institute of Technology, Madras, India
Bernhard Steffen, TU Dortmund University, Germany
Demetri Terzopoulos, University of California, Los Angeles, CA, USA
Doug Tygar, University of California, Berkeley, CA, USA
Gerhard Weikum, Max Planck Institute for Informatics, Saarbruecken, Germany

Volume Editor
Alexander Gelbukh
National Polytechnic Institute
Center for Computing Research
Av. Juan Dios Bátiz, Col. Nueva Industrial Vallejo
07738 Mexico D.F., Mexico
E-mail: [email protected]

ISSN 0302-9743
e-ISSN 1611-3349
ISBN 978-3-642-54905-2
e-ISBN 978-3-642-54906-9
DOI 10.1007/978-3-642-54906-9
Springer Heidelberg New York Dordrecht London
Library of Congress Control Number: 2014934305
LNCS Sublibrary: SL 1 – Theoretical Computer Science and General Issues

© Springer-Verlag Berlin Heidelberg 2014

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. Exempted from this legal reservation are brief excerpts in connection with reviews or scholarly analysis or material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Duplication of this publication or parts thereof is permitted only under the provisions of the Copyright Law of the Publisher’s location, in its current version, and permission for use must always be obtained from Springer. Permissions for use may be obtained through RightsLink at the Copyright Clearance Center. Violations are liable to prosecution under the respective Copyright Law.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.

Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India
Printed on acid-free paper
Springer is part of Springer Science+Business Media (www.springer.com)

Preface

CICLing 2014 was the 15th annual Conference on Intelligent Text Processing and Computational Linguistics. The CICLing conferences provide a wide-scope forum for discussion of the art and craft of natural language processing research as well as the best practices in its applications.

This set of two books contains four invited papers and a selection of regular papers accepted for presentation at the conference. Since 2001, the proceedings of the CICLing conferences have been published in Springer’s Lecture Notes in Computer Science series as volume numbers 2004, 2276, 2588, 2945, 3406, 3878, 4394, 4919, 5449, 6008, 6608, 6609, 7181, 7182, 7816, and 7817.

The set has been structured into 17 sections, representative of the current trends in research and applications of natural language processing:

– Lexical Resources
– Document Representation
– Morphology, POS-tagging, and Named Entity Recognition
– Syntax and Parsing
– Anaphora Resolution
– Recognizing Textual Entailment
– Semantics and Discourse
– Natural Language Generation
– Sentiment Analysis and Emotion Recognition
– Opinion Mining and Social Networks
– Machine Translation and Multilingualism
– Information Retrieval
– Text Classification and Clustering
– Plagiarism Detection
– Style and Spelling Checking
– Speech Processing
– Applications

The 2014 event received submissions from 57 countries, a record high number in the 15-year history of the CICLing series.
Exactly 300 papers (third highest number in the history of CICLing) by 639 authors were submitted for evaluation by the international Program Committee (see Figure 1 and Tables 1 and 2). This two-volume set contains revised versions of 85 regular papers selected for presentation; thus the acceptance rate for this set was 28.3%.

Table 1. Number of submissions and accepted papers by topic¹

Accepted  Submitted  % Accepted  Topic
      19         45          42  Semantics, pragmatics, discourse
      14         43          33  Lexical resources
      12         31          39  Machine translation and multilingualism
      12         33          36  Practical applications
      12         35          34  Emotions, sentiment analysis, opinion mining
      12         40          30  Clustering and categorization
      12         56          21  Text mining
      11         48          23  Information retrieval
      10         29          34  Underresourced languages
       8         26          31  Syntax and chunking
       7         44          16  Information extraction
       6         18          33  Social networks and microblogging
       5         16          31  Natural language generation
       4         11          36  Noisy text processing and cleaning
       4         16          25  Summarization
       3          4          75  Spelling and grammar checking
       3          9          33  Plagiarism detection
       3         12          25  Word sense disambiguation
       3         16          19  POS tagging
       2          5          40  Coreference resolution
       2          7          29  Computational terminology
       2          7          29  Other
       2          9          22  Textual entailment
       2         13          15  Formalisms and knowledge representation
       2         17          12  Named entity recognition
       2         20          10  Morphology
       1          6          17  Speech processing
       1         10          10  Natural language interfaces
       1         11           9  Question answering
       0          3           0  Computational humor

¹ As indicated by the authors. A paper may belong to several topics.

In addition to regular papers, the books feature invited papers by:

– Jerry Hobbs, ISI, USA
– Bing Liu, University of Illinois, USA
– Suresh Manandhar, University of York, UK
– Johanna D. Moore, University of Edinburgh, UK

These speakers presented excellent keynote lectures at the conference. Publication of full-text invited papers in the proceedings is a distinctive feature of the CICLing conferences.
Furthermore, in addition to presentation of their invited papers, the keynote speakers organized separate vivid informal events; this is also a distinctive feature of this conference series. In addition, Professor Jens Allwood of the University of Gothenburg was a special guest of the conference.

With this event we continued with our policy of giving preference to papers with verifiable and reproducible results. In addition to the verbal description of their findings given in the paper, we encouraged the authors to provide a proof of their claims in electronic form. If the paper claimed experimental results, we asked the authors to make available to the community all the input data necessary to verify and reproduce these results; if it claimed to introduce an algorithm, we encouraged the authors to make the algorithm itself, in a programming language, available to the public. This additional electronic material will be permanently stored on the CICLing server, www.CICLing.org, and will be available to the readers of the corresponding paper for download under a license that permits its free use for research purposes.

Table 2. Number of submitted and accepted papers by country or region

Country or region   Authors (subm.)   Papers² (subm.)   Papers² (accp.)
Afghanistan                       1              1                 –
Algeria                           2              0.67              –
Australia                         8              3                 1
Bangladesh                        9              3                 –
Belgium                           3              2                 –
Brazil                           18              6.17              2.17
Bulgaria                          1              1                 –
Canada                           13              7                 4
China                            57             21.1               7.35
Christmas Isl.                    1              0.2               0.2
Colombia                          3              1                 1
Croatia                           1              0.33              0.33
Czech Rep.                       20             11.4               3
Denmark                           3              0.38              –
Egypt                            12              7                 1
Ethiopia                          5              4                 2
Finland                           5              2                 2
France                           29             12.42              9.67
Germany                          19              7.33              4.33
Greece                            1              0.33              0.33
Hong Kong                         4              2                 1
Hungary                           3              1                 –
India                           136             75.1              10.33
Indonesia                         3              1                 –
Iran                              4              2                 –
Iraq                              0              1                 –
Ireland                           0              0.5               –
Israel                           14              7                 2
Italy                             6              2.5               1
Japan                            22              8.33              3
Jordan                           12              3.33              –
Kazakhstan                        6              1.67              1.67
Korea (South)                    12              3.5               0.50
Latvia                            6              2                 1
Malaysia                          4              1.67              –
Mexico                           19             12.42              2.67
Mongolia                          1              0.5               0.5
Morocco                           5              3                 –
Nepal                            12              6                 2
Norway                            1              0.2               –
Pakistan                          4              1.83              –
Poland                            2              2                 –
Portugal                          5              2.5               1
Romania                          10              5.67              –
Russia                            9              5.17              –
Singapore                         9              2.78              1.78
Slovenia                          2              0.67              0.67
Spain                            13              3.7               0.67
Sweden                            5              4                 1
Switzerland                       6              5                 2
Taiwan                            5              1                 –
Thailand                          2              1                 –
Tunisia                          20              8.83              1.83
Turkey                            3              2.83              1.5
UK                               10              3.83              3.33
USA                              48             21.48              7.17
Vietnam                           5              1.67              –
Total:                          639            300                85

² By the number of authors: e.g., a paper by two authors from the USA and one from the UK is counted as 0.67 for the USA and 0.33 for the UK.

Fig. 1. Submissions by country or region. The area of a circle represents the number of submitted papers.

In the long run, we expect that computational linguistics will have verifiability and clarity standards similar to those of mathematics. In mathematics, each claim is accompanied by a complete and verifiable proof (usually much longer than the claim itself); each theorem’s complete and precise proof, and not just a vague description of its general idea, is made available to the reader. Electronic media allow computational linguists to provide material analogous to the proofs and formulas in mathematics in full length (which can amount to megabytes or gigabytes of data), separately from a 12-page description published in the book. More information can be found on www.CICLing.org/why verify.htm.
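The per-country paper counts in Table 2 follow the fractional counting described in its footnote: each paper counts as one unit, split equally among its authors and credited to their countries. A minimal Python sketch of that arithmetic, together with the acceptance-rate calculation quoted earlier in the preface, might look as follows. The function names `country_shares` and `acceptance_rate` and the sample input are illustrative assumptions, not something taken from the proceedings.

```python
from collections import defaultdict
from fractions import Fraction


def country_shares(papers):
    """Credit each paper as one unit, split equally among its authors.

    `papers` is a list of papers; each paper is a list holding the country
    of every author (hypothetical input format chosen for this sketch).
    """
    totals = defaultdict(Fraction)  # exact arithmetic until the final rounding
    for author_countries in papers:
        share = Fraction(1, len(author_countries))
        for country in author_countries:
            totals[country] += share
    # Round only for display, to two decimals as in Table 2.
    return {country: round(float(total), 2) for country, total in totals.items()}


def acceptance_rate(accepted, submitted):
    """Acceptance rate in percent, rounded to one decimal place."""
    return round(100 * accepted / submitted, 1)


# The footnote's example: two authors from the USA and one from the UK.
print(country_shares([["USA", "USA", "UK"]]))  # {'USA': 0.67, 'UK': 0.33}

# The figures quoted in the preface: 85 accepted out of 300 submitted.
print(acceptance_rate(85, 300))  # 28.3
```

Keeping exact fractions until the final rounding avoids accumulating rounding error when many papers are summed for one country.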
To encourage providing algorithms and data along with the published papers, we selected a winner of our Verifiability, Reproducibility, and Working Description Award. The main factors in choosing the awarded submission were technical correctness and completeness, readability of the code and documentation, simplicity of installation and use, and exact correspondence to the claims of the paper. Unnecessary sophistication of the user interface was discouraged. Novelty and usefulness of the results were not evaluated for this award; they were evaluated for the paper itself and not for the data.

The following papers received the Best Paper Awards, the Best Student Paper Award,¹ as well as the Verifiability, Reproducibility, and Working Description Award, respectively:

1st Place: “A graph-based automatic plagiarism detection technique to handle artificial word reordering and paraphrasing,” by Niraj Kumar, India
2nd Place: “Dealing with function words in unsupervised dependency parsing,” by David Mareček, Zdeněk Žabokrtský, Czech Republic
3rd Place: “Extended CFG formalism for grammar checker and parser development,” by Daiga Deksne, Inguna Skadiņa, Raivis Skadiņš, Latvia, and “How preprocessing affects unsupervised keyphrase extraction,” by Rui Wang, Wei Liu, Chris McDonald, Australia
Student: “Iterative bilingual lexicon extraction from comparable corpora with topical and contextual knowledge,” by Chenhui Chu, Toshiaki Nakazawa, Sadao Kurohashi, Japan
Verifiability: “How document properties affect document relatedness measures,” by Jessica Perrie, Aminul Islam, Evangelos Milios, Canada

¹ The best student paper was selected among papers of which the first author was a full-time student, excluding the papers that received a Best Paper Award.

The authors of the awarded papers (except for the Verifiability award) were given extended time for their presentations. In addition, the Best Presentation Award and the Best Poster Award winners were selected by a ballot among the attendees of the conference.
Besides its high scientific level, one of the success factors of CICLing conferences is their excellent cultural program. The attendees of the conference had a chance to visit the wonderful historical and cultural attractions of the lesser-known country Nepal, the birthplace of the Buddha and the place where pagodas were invented before their spread to China and Japan to become an iconic image of East Asia. Of the world’s ten highest mountains, eight are in Nepal, including the highest one, Everest; the participants had a chance to see Everest during a tour of the Himalayas on a small airplane. They also attended the Seto Machindra Nath Chariot festival and visited three historical Durbar squares of the Kathmandu valley, a UNESCO world cultural heritage site. But probably the best of Nepal, after the Himalayas, are its Buddhist and Hindu temples and monasteries, of which the participants visited quite a few. Even the Organizing Committee secretary and author of one of the best evaluated papers published in this set was the hereditary Supreme Priest of an ancient Buddhist temple!

I would like to thank all those involved in the organization of this conference. Firstly, the authors of the papers that constitute this book: it is the excellence of their research work that gives value to the book and sense to the work of all the rest. I thank all those who served on the Program Committee, Software Reviewing Committee, Award Selection Committee, as well as the additional reviewers, for their hard and very professional work. Special thanks go to Pushpak Bhattacharyya, Samhaa El-Beltagy, Aminul Islam, Cerstin Mahlow, Dunja Mladenic, Constantin Orasan, and Grigori Sidorov for their invaluable support in the reviewing process.

I would like to thank the conference staff, volunteers, and the members of the local Organizing Committee headed by Professor Madhav Prasad Pokharel and advised by Professor Jai Raj Awasthi. In particular, I am very grateful to Mr. Sagun Dhakhwa, the secretary of the Organizing Committee, for his great effort in planning all the aspects of the conference. I want to thank Ms. Sahara Mishra for administrative support and Mr. Sushan Shrestha for the website design and technical support. I am deeply grateful to the administration of the Centre for Communication and Development Studies (CECODES) for their helpful support, warm hospitality, and in general for providing this wonderful opportunity to hold CICLing in Nepal. I acknowledge support from the project CONACYT Mexico – DST India 122030 “Answer Validation through Textual Entailment” and SIP-IPN grant 20144534.

The entire submission and reviewing process was supported for free by the EasyChair system (www.EasyChair.org). Last but not least, I deeply appreciate the patience and help of Springer staff in editing these volumes and getting them printed in a very short time; it is always a great pleasure to work with Springer.

February 2014
Alexander Gelbukh

Description:
This two-volume set, consisting of LNCS 8403 and LNCS 8404, constitutes the thoroughly refereed proceedings of the 15th International Conference on Intelligent Text Processing and Computational Linguistics, CICLing 2014, held in Kathmandu, Nepal, in April 2014. The 85 revised papers presented togeth
