
Human Language Technologies – The Baltic Perspective: Proceedings of the Fifth International Conference Baltic HLT 2012

312 pages, 2012


HUMAN LANGUAGE TECHNOLOGIES
THE BALTIC PERSPECTIVE

Frontiers in Artificial Intelligence and Applications

FAIA covers all aspects of theoretical and applied artificial intelligence research in the form of monographs, doctoral dissertations, textbooks, handbooks and proceedings volumes. The FAIA series contains several sub-series, including "Information Modelling and Knowledge Bases" and "Knowledge-Based Intelligent Engineering Systems". It also includes the proceedings volumes of ECAI, the biennial European Conference on Artificial Intelligence, and other publications sponsored by ECCAI, the European Coordinating Committee on Artificial Intelligence. An editorial panel of internationally well-known scholars is appointed to provide a high-quality selection.

Series Editors: J. Breuker, N. Guarino, J.N. Kok, J. Liu, R. López de Mántaras, R. Mizoguchi, M. Musen, S.K. Pal and N. Zhong

Volume 247

Recently published in this series
Vol. 246. H. Fujita and R. Revetria (Eds.), New Trends in Software Methodologies, Tools and Techniques – Proceedings of the Eleventh SoMeT_12
Vol. 245. B. Verheij, S. Szeider and S. Woltran (Eds.), Computational Models of Argument – Proceedings of COMMA 2012
Vol. 244. S. Scheider, Grounding Geographic Information in Perceptual Operations
Vol. 243. M. Graña, C. Toro, J. Posada, R.J. Howlett and L.C. Jain (Eds.), Advances in Knowledge-Based and Intelligent Information and Engineering Systems
Vol. 242. L. De Raedt, C. Bessiere, D. Dubois, P. Doherty, P. Frasconi, F. Heintz and P. Lucas (Eds.), ECAI 2012 – 20th European Conference on Artificial Intelligence
Vol. 241. K. Kersting and M. Toussaint (Eds.), STAIRS 2012 – Proceedings of the Sixth Starting AI Researchers' Symposium
Vol. 240. M. Virvou and S. Matsuura (Eds.), Knowledge-Based Software Engineering – Proceedings of the Tenth Joint Conference on Knowledge-Based Software Engineering
Vol. 239. M. Donnelly and G. Guizzardi (Eds.), Formal Ontology in Information Systems – Proceedings of the Seventh International Conference (FOIS 2012)
Vol. 238. A. Respício and F. Burstein (Eds.), Fusing Decision Support Systems into the Fabric of the Context
Vol. 237. J. Henno, Y. Kiyoki, T. Tokuda, H. Jaakkola and N. Yoshida (Eds.), Information Modelling and Knowledge Bases XXIII
Vol. 236. M.A. Biasiotti and S. Faro (Eds.), From Information to Knowledge – Online Access to Legal Information: Methodologies, Trends and Perspectives

ISSN 0922-6389 (print)
ISSN 1879-8314 (online)

Human Language Technologies
The Baltic Perspective
Proceedings of the Fifth International Conference Baltic HLT 2012

Edited by
Arvi Tavast, Institute of the Estonian Language
Kadri Muischnek, University of Tartu
and
Mare Koit, University of Tartu

Amsterdam • Berlin • Tokyo • Washington, DC

© 2012 The Authors and IOS Press.
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without prior written permission from the publisher.

ISBN 978-1-61499-132-8 (print)
ISBN 978-1-61499-133-5 (online)
Library of Congress Control Number: 2012947888

Publisher
IOS Press BV
Nieuwe Hemweg 6B
1013 BG Amsterdam
Netherlands
fax: +31 20 687 0019
e-mail: [email protected]

Distributor in the USA and Canada
IOS Press, Inc.
4502 Rachael Manor Drive
Fairfax, VA 22032
USA
fax: +1 703 323 3668
e-mail: [email protected]

LEGAL NOTICE
The publisher is not responsible for the use which might be made of the following information.

PRINTED IN THE NETHERLANDS

Preface

This volume contains papers presented at the Fifth International Conference "Human Language Technologies – The Baltic Perspective" (Baltic HLT 2012), held in Tartu, Estonia, on 4–5 October 2012.
Since its first edition in 2004, Baltic HLT has served as a dedicated venue for new and ongoing work in computational linguistics and related disciplines, both in the Baltic states and in a broader geographical perspective. The main aim of the conference is to provide a forum for sharing new ideas and recent advances in human language processing, and to promote cooperation between the computer science and linguistics research communities of the Baltic countries and the rest of the world. The conference brings together scientists, developers, providers and users to discuss the state of the art in human language technologies in the Baltic countries, to exchange information, to discuss problems, to find new synergies and to promote initiatives for international cooperation.

The call for papers for the fifth Baltic HLT laid special emphasis on multilinguality in language resources and on applications of human language technology, while also encouraging potential authors to submit papers on other subfields of computational linguistics and related disciplines.

A total of 51 submissions were received, and each submission was evaluated by at least two reviewers. The Programme Committee consisted of 25 members from 13 different countries. Based on the reviewers' scores and their comments on the content and quality of the papers, 20 long papers and 20 posters or demos were accepted for presentation and publication.

The accepted submissions cover a wide range of topics: morphological disambiguation, dependency syntax and valency, computational semantics, named entities, dialogue modeling, terminology extraction and management, machine translation, corpus and parallel corpus compilation, speech modeling and multimodal communication. A few papers give a general overview of the state of the art of human language technology and/or language resources in the Baltic states.
Completing the programme are the invited lectures by Lori Lamel, "Multilingual Speech Processing Activities in Quaero: Application to Multimedia Search in Unstructured Data", and Bente Maegaard, "A Multilingual Research Infrastructure".

We wish to express our gratitude to the members of the Programme Committee, who worked hard to review all submissions. We also want to thank the organizers and supporters of this conference: the Institute of Computer Science, University of Tartu, and the Estonian Ministry of Education and Research as funder of the National Programme for Estonian Language Technology. The conference is also supported by the CLARIN and META-NORD projects.

Arvi Tavast
Kadri Muischnek
Kadri Vider
Mare Koit

The publication of these Proceedings was supported by the European Regional Development Fund through the Estonian Center of Excellence in Computer Science, EXCS.

Programme Committee
Arvi Tavast (chair), Institute of the Estonian Language, Estonia
Tanel Alumäe, Institute of Cybernetics, Tallinn University of Technology, Estonia
Eckhard Bick, University of Southern Denmark, Denmark
Koenraad De Smedt, University of Bergen, Norway
Mark Fishel, University of Zurich, Switzerland
Markus Forsberg, University of Gothenburg, Sweden
Gintarė Grigonytė, Vytautas Magnus University, Lithuania
Karin Harbusch, University of Koblenz-Landau, Germany
Heiki-Jaan Kaalep, University of Tartu, Estonia
Kaarel Kaljurand, University of Zurich, Switzerland
Adam Kilgarriff, Lexical Computing Ltd, UK
Ramón López-Cózar Delgado, University of Granada, Spain
Bente Maegaard, University of Copenhagen, Denmark
Rūta Marcinkevičienė, Vytautas Magnus University, Lithuania
Beata Megyesi, Uppsala University, Sweden
Costanza Navarretta, University of Copenhagen, Denmark
Joakim Nivre, Uppsala University, Sweden
Bolette Sandford Pedersen, University of Copenhagen, Denmark
Fabio Rinaldi, University of Zurich, Switzerland
Eiríkur Rögnvaldsson, University of Iceland, Iceland
Inguna Skadiņa, IMCS/Tilde, Latvia
Jörg Tiedemann, Uppsala University, Sweden
Martin Volk, University of Zurich, Switzerland
Peter Wittenburg, Max Planck Institute for Psycholinguistics, Netherlands
Anssi Yli-Jyrä, University of Helsinki, Finland

Contents
Preface (p. v)
  Arvi Tavast, Kadri Muischnek, Kadri Vider and Mare Koit
Programme Committee (p. vii)
Multilingual Speech Processing Activities in Quaero: Application to Multimedia Search in Unstructured Data (p. 1)
  Lori Lamel
A Multilingual Research Infrastructure (p. 9)
  Bente Maegaard
Transcription System for Semi-Spontaneous Estonian Speech (p. 10)
  Tanel Alumäe
Towards the Automatic Extraction of Term-Defining Contexts in Lithuanian (p. 18)
  Agnė Bielinskienė, Loïc Boizou, Jolanta Kovalevskaitė and Andrius Utka
Automatic Inference of Base Forms for Multiword Terms in Lithuanian (p. 27)
  Loïc Boizou, Gintarė Grigonytė, Erika Rimkutė and Andrius Utka
Data Pre-Processing to Train a Better Lithuanian-English MT System (p. 36)
  Daiga Deksne and Raivis Skadiņš
Perception of Russian Vowels in Singing (p. 42)
  Karina Evgrafova and Vera Evdokimova
In-Domain Data FTW (p. 50)
  Mark Fishel
Improving SMT by Using Parallel Data of a Closely Related Language (p. 58)
  Petra Galuščáková and Ondřej Bojar
Terminology Extraction from Comparable Corpora for Latvian (p. 66)
  Tatiana Gornostay, Anita Ramm, Ulrich Heid, Emmanuel Morin, Rima Harastani and Emmanuel Planas
Change of Biomedical Domain Terminology Over Time (p. 74)
  Gintarė Grigonytė, Fabio Rinaldi and Martin Volk
A Trivial Method for Choosing the Right Lemma (p. 82)
  Heiki-Jaan Kaalep, Riin Kirt and Kadri Muischnek
Geoinformational Database of Lithuanian Toponyms (p. 90)
  Dalia Kačinaitė-Vrubliauskienė
Cross-Linking Experience of Estonian WordNet (p. 96)
  Neeme Kahusk, Heili Orav and Kadri Vare
Automatic Generation of Specialized Dictionaries Using the Dictionary Writing System EELex (p. 103)
  Jelena Kallas and Margit Langemets
Managing Word Form Variation of Text Retrieval in Practice – Why Five Character Truncation Takes It All? (p. 111)
  Kimmo Kettunen
Towards Automatic Recognition of the Structure of Estonian Directory Inquiries (p. 120)
  Mare Koit
Adaptation of Morpheme-Based Speech Recognition for Foreign Entity Names (p. 129)
  André Mansikkaniemi and Mikko Kurimo
Towards Audiovisual TTS in Estonian (p. 138)
  Einar Meister, Sascha Fagel and Rainer Metsvahi
Multimodal Corpus of Speech Production: Work in Progress (p. 146)
  Einar Meister and Lya Meister
Towards a Latvian Valency Lexicon (p. 154)
  Gunta Nešpore, Baiba Saulīte, Normunds Grūzītis and Ginta Garkāje
Creation of HMM-Based Speech Model for Estonian Text-to-Speech Synthesis (p. 162)
  Tõnis Nurk
Towards Named Entity Annotation of Latvian National Library Corpus (p. 169)
  Peteris Paikens, Ilze Auzina, Ginta Garkaje and Madara Paegle
MT Adaptation for Under-Resourced Domains – What Works and What Not (p. 176)
  Mārcis Pinnis and Raivis Skadiņš
Syntactic Issues Identified Developing the Latvian Treebank (p. 185)
  Lauma Pretkalniņa and Laura Rituma
How Does the Choice of Morphological Analyser Influence the Quality of Syntactical Analysis? (p. 193)
  Tiina Puolakainen
Knowledge Acquisition Tool for Dialogue Systems (p. 201)
  Raul Sirel
Dynamic User Interfaces for Synchronous Encoding and Linguistic Uniforming of Textual Clinical Data (p. 206)
  Raul Sirel
Noisy-Channel Spelling Correction Models for Estonian Learner Language Corpus Lemmatisation (p. 213)
  Kairit Sirts
Linguistically Motivated Evaluation of English-Latvian Statistical Machine Translation (p. 221)
  Inguna Skadiņa, Kristīne Levāne-Petrova and Guna Rābante
