Workshop Programme

3rd Workshop on the Representation and Processing of Sign Languages: Construction and Exploitation of Sign Language Corpora

08:45 – 09:00 Workshop opening & welcome
09:00 – 09:30 Diane Lillo-Martin, Deborah Chen Pichler: Development of sign language acquisition corpora
09:30 – 10:00 Onno Crasborn, Inge Zwitserlood: The Corpus NGT: an online corpus for professionals and laymen
10:00 – 10:30 Trevor Johnston: Corpus linguistics & signed languages: no lemmata, no corpus.
10:30 – 11:00 Coffee break
11:00 – 11:30 Lorraine Leeson, Brian Nolan: Digital Deployment of the Signs of Ireland Corpus in Elearning
11:30 – 12:00 Johanna Mesch, Lars Wallin: Use of sign language materials in teaching
12:00 – 13:30 Poster session 1
13:30 – 14:30 Lunch
14:30 – 16:00 Poster session 2
16:00 – 16:30 Coffee break
16:30 – 17:00 Onno Crasborn: Open Access to Sign Language Corpora
17:00 – 17:30 Adam Schembri: British Sign Language Corpus Project: Open Access Archives and the Observer’s Paradox
17:30 – 18:00 Cat Fung H-M, Scholastica Lam, Felix Sze, Gladys Tang: Simultaneity vs. Sequentiality: Developing a transcription system of Hong Kong Sign Language acquisition data
18:00 – 18:45 General discussion
18:45 – 19:00 Workshop closing

Workshop Organisers

Onno Crasborn, Radboud University Nijmegen, the Netherlands
Eleni Efthimiou, Institute for Language and Speech Processing, Athens, Greece
Thomas Hanke, University of Hamburg, Germany
Ernst D. Thoutenhoofd, Virtual Knowledge Studio for the Humanities & Social Sciences, Amsterdam, the Netherlands
Inge Zwitserlood, Radboud University Nijmegen, the Netherlands

Programme Committee

Penny Boyes Braem, Center for Sign Language Research, Basel, Switzerland
Annelies Braffort, LIMSI/CNRS, Orsay, France
Patrice Dalle, IRIT, Toulouse, France
Evita Fotinea, Institute for Language and Speech Processing, Athens, Greece
Jens Heßmann, University of Applied Sciences Magdeburg-Stendal, Germany
Trevor Johnston, Macquarie University, Sydney, Australia
Lorraine Leeson, Trinity College, Dublin, Ireland
Adam Schembri, University College London, UK
Graham Turner, Heriot-Watt University, Edinburgh, UK
Meike Vaupel, University of Applied Sciences Zwickau, Germany
Chiara Vettori, EURAC, Bolzano, Italy

Table of Contents

Patricia Álvarez Sánchez, Inmaculada C. Báez Montero, Ana Mª Fernández Soneira: Linguistic, sociological and technical difficulties in the development of a Spanish Sign Language (LSE) corpus (p. 9)
Louise de Beuzeville: Pointing and verb modification: the expression of semantic roles in the Auslan Corpus (p. 13)
Cat Fung H-M, Scholastica Lam, Joe Mak, Gladys Tang: Establishment of a corpus of Hong Kong Sign Language acquisition data: from ELAN to CLAN (p. 17)
Cat Fung H-M, Felix Sze, Scholastica Lam, Gladys Tang: Simultaneity vs. Sequentiality: Developing a transcription system of Hong Kong Sign Language acquisition data (p. 22)
Emilie Chételat-Pelé, Annelies Braffort, Jean Véronis: Annotation of Non Manual Gestures: Eyebrow movement description (p. 28)
Onno Crasborn: Open Access to Sign Language Corpora (p. 33)
Onno Crasborn, Han Sloetjes: Enhanced ELAN functionality for sign language corpora (p. 39)
Onno Crasborn, Inge Zwitserlood: The Corpus NGT: an online corpus for professionals and laymen (p. 44)
Philippe Dreuw, Hermann Ney: Towards Automatic Sign Language Annotation for the ELAN Tool (p. 50)
Paul Dudis, Kristin Mulrooney, Clifton Langdon, Cecily Whitworth: Annotating Real-Space Depiction (p. 54)
Eleni Efthimiou, Stavroula-Evita Fotinea: Annotation and Management of the Greek Sign Language Corpus (GSLC) (p. 58)
Thomas Hanke, Jakob Storz: iLex – A database tool integrating sign language corpus linguistics and sign language lexicography (p. 64)
Annika Herrmann: Sign language corpora and the problems with ELAN and the ECHO annotation conventions (p. 68)
Jens Heßmann, Meike Vaupel: Building up digital video resources for sign language interpreter training (p. 74)
Marek Hrúz, Pavel Campr, Miloš Železný: Semi-automatic Annotation of Sign Language Corpora (p. 78)
Trevor Johnston: Corpus linguistics & signed languages: no lemmata, no corpus (p. 82)
Jakub Kanis, Pavel Campr, Marek Hrúz, Zdeněk Krňoul, Miloš Železný: Interactive HamNoSys Notation Editor for Signed Speech Annotation (p. 88)
Lutz König, Susanne König, Reiner Konrad, Gabriele Langer: Corpus-based Sign Dictionaries of Technical Terms – Dictionary Projects at the IDGS in Hamburg (p. 94)
Markus Koskela, Jorma Laaksonen, Tommi Jantunen, Ritva Takkinen, Päivi Rainò, Antti Raike: Content-based video analysis and access for Finnish Sign Language – a multidisciplinary research project (p. 101)
Klaudia Krammer, Elisabeth Bergmeister, Silke Bornholdt, Franz Dotter, Christian Hausch, Marlene Hilzensauer, Anita Pirker, Andrea Skant, Natalie Unterberger: The Klagenfurt lexicon database for sign languages as a web application: LedaSila, a free sign language database for international use (p. 105)
Lorraine Leeson, Brian Nolan: Digital Deployment of the Signs of Ireland Corpus in Elearning (p. 112)
François Lefebvre-Albaret, Frederick Gianni, Patrice Dalle: Toward a computer-aided sign segmentation (p. 123)
Diane Lillo-Martin, Deborah Chen Pichler: Development of sign language acquisition corpora (p. 129)
Johanna Mesch, Lars Wallin: Use of sign language materials in teaching (p. 134)
Cédric Moreau, Bruno Mascret: LexiqueLSF (p. 138)
Yuji Nagashima, Mina Terauchi, Kaoru Nakazono: Construction of Japanese Sign Language Dialogue Corpus: KOSIGN (p. 141)
Victoria Nyst: Documenting an Endangered Language: Creating a Corpus of Langue des Signes Malienne (CLaSiMa) (p. 145)
Elena Antinoro Pizzuto, Isabella Chiari, Paolo Rossini: The Representation Issue and its Multifaceted Aspects in Constructing Sign Language Corpora: Questions, Answers, Further Problems (p. 150)
Siegmund Prillwitz, Thomas Hanke, Susanne König, Reiner Konrad, Gabriele Langer, Arvid Schwarz: DGS corpus project – Development of a corpus based electronic dictionary German Sign Language / German (p. 159)
Adam Schembri: British Sign Language Corpus Project: Open Access Archives and the Observer’s Paradox (p. 165)
Sandrine Schwartz: Tactile sign language corpora: capture and annotation issues (p. 170)
Jérémie Segouat, Annelies Braffort, Laurence Bolot, Annick Choisier, Michael Filhol, Cyril Verrecchia: Building 3D French Sign Language lexicon (p. 174)
Saori Tanaka, Yosuke Matsusaka, Kaoru Nakazono: Interface Development for Computer Assisted Sign Language Learning: Compact Version of CASLL (p. 178)
Inge Zwitserlood, Asli Özyürek, Pamela Perniss: Annotation of sign and gesture cross-linguistically (p. 185)

Author Index

Álvarez Sánchez, Patricia 9
Báez Montero, Inmaculada C. 9
Bergmeister, Elisabeth 105
Beuzeville, Louise de 13
Bolot, Laurence 174
Bornholdt, Silke 105
Braffort, Annelies 28, 174
Campr, Pavel 78, 88
Cat Fung, H-M 17, 22
Chen Pichler, Deborah 129
Chételat-Pelé, Emilie 28
Chiari, Isabella 150
Choisier, Annick 174
Crasborn, Onno 33, 39, 44
Dalle, Patrice 123
Dotter, Franz 105
Dreuw, Philippe 50
Dudis, Paul 54
Efthimiou, Eleni 58
Fernández Soneira, Ana Maria 9
Filhol, Michael 174
Fotinea, Stavroula-Evita 58
Gianni, Frederick 123
Hanke, Thomas 64, 159
Hausch, Christian 105
Herrmann, Annika 68
Heßmann, Jens 74
Hilzensauer, Marlene 105
Hrúz, Marek 78, 88
Jantunen, Tommi 101
Johnston, Trevor 82
Kanis, Jakub 88
König, Lutz 94
König, Susanne 94, 159
Konrad, Reiner 94, 159
Koskela, Markus 101
Krammer, Klaudia 105
Krňoul, Zdeněk 88
Laaksonen, Jorma 101
Lam, Scholastica 17, 22
Langdon, Clifton 54
Langer, Gabriele 94, 159
Leeson, Lorraine 112
Lefebvre-Albaret, François 123
Lillo-Martin, Diane 129
Mak, Joe 17
Mascret, Bruno 138
Matsusaka, Yosuke 178
Mesch, Johanna 134
Moreau, Cédric 138
Mulrooney, Kristin 54
Nagashima, Yuji 141
Nakazono, Kaoru 141, 178
Ney, Hermann 50
Nolan, Brian 112
Nyst, Victoria 145
Özyürek, Asli 185
Perniss, Pamela 185
Pirker, Anita 105
Pizzuto, Elena Antinoro 150
Prillwitz, Siegmund 159
Raike, Antti 101
Rainò, Päivi 101
Rossini, Paolo 150
Schembri, Adam 165
Schwartz, Sandrine 170
Schwarz, Arvid 159
Segouat, Jérémie 174
Skant, Andrea 105
Sloetjes, Han 39
Storz, Jakob 64
Sze, Felix 22
Takkinen, Ritva 101
Tanaka, Saori 178
Tang, Gladys 17, 22
Terauchi, Mina 141
Unterberger, Natalie 105
Vaupel, Meike 74
Véronis, Jean 28
Verrecchia, Cyril 174
Wallin, Lars 134
Whitworth, Cecily 54
Železný, Miloš 78, 88
Zwitserlood, Inge 44, 185

Editors’ Preface

This collection of papers stems from the third workshop in a series on “the representation and processing of sign languages”. The first took place in 2004 (Lisbon, Portugal), the second in 2006 (Genova, Italy).
All workshops were tied to Language Resources and Evaluation Conferences (LREC), the 2008 one taking place in Marrakech, Morocco. While there has been occasional attention to signed languages in the main LREC conference, the main focus there is on written and spoken forms of spoken languages. The wide field of language technology has been the focus of the LREC conferences, where academic and commercial research and applications meet.

It will be clear to every researcher that there is a wide gap between our knowledge of spoken languages and our knowledge of signed languages. This holds not only for language technology, where the difference in modality and the absence of commonly used writing systems for signed languages obviously pose new challenges, but also for the linguistic knowledge that can be used in language technologies. The domains addressed in the two previous sign language workshops have thus been fairly wide, and we see the same variety in the present proceedings volume. However, where the first and the second workshop had a strong focus on sign synthesis and automatic recognition, the theme of the third workshop concerns construction and exploitation of sign language corpora.

Recent technological developments allow sign language researchers to create relatively large video corpora of sign language use that were unimaginable ten years ago. Several national projects are currently underway, and more are planned. In the present volume, sign language linguistics researchers and researchers from the area of sign language technologies share their experiences from completed and ongoing efforts: what are the technical problems that were encountered and the solutions created, what are the linguistic decisions that were taken? At the same time, the contributions also look into the future. How can we establish standards for linguistic tagging and metadata, and how can we add sign language specifics to well-established or emerging best practices from the general language resource community?
How can we work towards (semi-)automatic annotation by computer recognition from video? These are all questions of interest to both linguists and language technology experts: the sign language corpora that are being created are needed for more reliable linguistic analyses, for studies on sociolinguistic variation, and for building tools that can recognize sign language use from video or generate animations of sign language use.

The contributions composing this volume are presented in alphabetical order by the first author. For the reader’s convenience, an author index is provided as well.

We would like to thank the programme committee that helped us review the abstracts for the workshop: Penny Boyes Braem; Annelies Braffort; Patrice Dalle; Evita Fotinea; Jens Heßmann; Trevor Johnston; Lorraine Leeson; Adam Schembri; Graham Turner; Meike Vaupel; Chiara Vettori.

Finally, we would like to point the reader to the proceedings of the previous two workshops, which form important resources in a growing field of research; both works were made available as PDF files for participants of the workshop.

O. Streiter & C. Vettori (2004, Eds.) From SignWriting to Image Processing. Information techniques and their implications for teaching, documentation and communication. [Proceedings of the Workshop on the Representation and Processing of Sign Languages. 4th International Conference on Language Resources and Evaluation, LREC 2004, Lisbon.] Paris: ELRA.

C. Vettori (2006, Ed.) Lexicographic Matters and Didactic Scenarios. [Proceedings of the 2nd Workshop on the Representation and Processing of Sign Languages. 5th International Conference on Language Resources and Evaluation, LREC 2006, Genova.] Paris: ELRA.

We hope the present volume will stimulate further research by making the presentations accessible for those who could not attend the workshop.
The Editors,

Onno Crasborn, Radboud University Nijmegen (NL)
Eleni Efthimiou, Institute for Language and Speech Processing (GR)
Thomas Hanke, University of Hamburg (DE)
Ernst D. Thoutenhoofd, Virtual Knowledge Studio for the Humanities & Social Sciences (NL)
Inge Zwitserlood, Radboud University Nijmegen (NL)

Workshop Papers

Linguistic, Sociological and Technical Difficulties in the Development of a Spanish Sign Language (LSE) Corpus

Patricia Álvarez Sánchez, Inmaculada C. Báez Montero, Ana Fernández Soneira
Universidad de Vigo – Research Group on Sign Languages [1]
Lagoas-Marcosende (36310) Vigo

Abstract

From 1995 to 2000, the creation of a Spanish Sign Language corpus was one of the main aims of our Sign Languages Research Group at the University of Vigo. This research is intended to help us describe LSE and develop tools for research: labeling, transcription, etc. We obtained language samples from 85 informants, and their analysis raised several difficulties, both technical and sociolinguistic. At this stage, with renewed energy, we have taken up our initial aims again, overcoming the technical, linguistic and sociological obstacles that had kept our proposal from being completed. In our panel we will present, apart from the difficulties that we have encountered, the new proposals for solving and overcoming them, thus finally reaching our initial aim: to develop a public Spanish Sign Language corpus that can be consulted online.
We will go into detail on the criteria of versatility and representativeness that condition the technical aspects; the sociolinguistic criteria for selecting types of discourse and informants; the labels for marking up the corpus; and the uses that we intend to give the corpus, centered not only on the use of linguistic data for quantitative and qualitative research on LSE, but also on its use for teaching.

[1] http://webs.uvigo.es/lenguadesignos/sordos

1. General Approach

The study of LSE should not be approached differently from that of any oral language: a textual corpus is mandatory. The production of a sign language is kinetic in nature and its reception is visual, so conversations in sign language have to be recorded in video formats.

Our contribution to the congress, in the form of a panel, is divided into three sections that correspond to the three stages of the development of our corpus. Each stage is marked by a general reflection.

The first stage covers our group's work from 1995 until 2000 and represents the beginning of the process. We will present, in turn, the aims set, the steps taken towards the actual conception of the corpus and the difficulties encountered.

The second stage runs from 2000 until 2007. It was marked by an analysis of the work done and a reconsideration of our foundations in light of the problems of the first stage. Here we will present the data obtained and the new goals that we set.

The third and last stage corresponds to the present. It is the time to show our advances and the decisions made on the linguistic, sociolinguistic and technical sides.

2. Initial Work

“Linguistic corpora have come to fill a privileged position because they constitute a valuable source of information for the creation of dictionaries, computational lexicon and grammars. (…) As a result, a new discipline appears: CORPUS LINGUISTICS, aimed at the processing and exploitation of this type of linguistic resource.” (A, Martí, 1999)

2.1. Aims

Our work focused on obtaining an LSE textual corpus of Galician signers from which to start research on LSE. These were our initial research aims:
a) Starting the description of LSE
b) Determining which are the relevant linguistic units in SL
c) Knowing the grammatical relational processes
d) Developing tools for research: labeling, transcription, etc.

2.2. Corpus features

We considered these the main features for creating a corpus:
- It must contain real data.
- It must constitute an irreplaceable basis for linguistic description.
- It must be complemented with computing support in order to make it easy to use.
- It must gather:
  a) Informants' data
  b) Different types of discourse samples
  c) A wide range of topics, depending on the type of discourse we want to obtain, etc.
- It must be transcribed in Spanish glosses (conventions adapted from Klima & Bellugi, 1979) and subtitled in written language.

2.3. Process stages

We have divided the process of creating our corpus into seven stages:
a) Tool design for the creation of a corpus
b) Criteria for the selection of informants
c) Creation of a database of informants’ details
d) Collection of language samples
e) Data storage
f) Data labeling and marking
g) Transcription and notation systems

2.4. Difficulties in the process

The difficulties that arose throughout the research process are:
a) The lack of a research tradition on sign languages in Corpus Linguistics forces us to solve problems from the very beginning:
- How to delimit units in sign languages.
- How to label the different formations for their later analysis.
- Other related issues.
b) Creation of social networks in the Deaf community, with the aim of preventing the social identity of our informants from being threatened.
c) Technical restrictions. We have to select appropriate equipment in order to avoid compatibility problems between the different devices (video cameras, video players, computers, software…).

3. Analysis and Reconsiderations

“(…) the paradox exists that once a system is available for its use, its technology becomes obsolete with regard to the one that is operative at that moment and in many cases, it must be reprogrammed” (A, Martí, 1999)

After these first steps, it was time to analyse the gathered data. For this purpose, we created a database of informants, which we present now.

3.1. Where did we collect our data?

We developed an interview record card in order to ascertain the social and linguistic profile of the Galician deaf people who were later recorded on these videotapes. These are the data gathered from our 85 informants:
a) Identification: name, address and phone (for future contacts);
b) Origin and social environment: place and date of birth, age at onset of deafness, degree of deafness, deaf/hearing family, job of closest family members;
c) School: degree and type of studies, special/ordinary school, use/absence of SL in school;
d) Linguistic skills: in LSE, oral Spanish, lip-reading, written Spanish;
e) Place of residence: in order to reflect and control linguistic variation.

Figure 1: Distribution of informants by age group. (Pie chart; legend: 21-34 years, 35-50 years, > 50 years; shares: 78%, 16%, 6%.)

Distribution of informants by age group:
From 21 to 35 years: 25
From 36 to 50 years: 5
Over 50 years: 2
Total: 32 interviews.

Figure 2: Distribution of language samples by genre. (Pie chart; guided monologue 5%, semiguided interview 64%, public discourse 31%.)

Distribution by genre:
Guided monologue - 23 minutes. The signer is asked for a description of his family, his house and a short anecdote.
Semiguided interviews - 271 minutes. Signers are interviewed on several topics, depending on their age, sex, preferences, etc. Thus, the discourse is more spontaneous.
Public discourse - 130 minutes. Conferences and round tables give us a more planned and formal style.

3.2. Reconsiderations

After the research, we had to reconsider certain issues for a better development of our corpus. In summary:
a) Revision of the projects carried out in other countries.
b) Creation of social networks:
Inside the Deaf community:
- Preparation of the members of the community for carrying out the interviews.
In the institutions:
- Participation in national research networks in order to contact the Deaf community all over Spain.
- Support of the LSE Standardization Center in the creation of the corpus.

4. For the time being

“If our research manages to correct mistaken or unsuitable information, we will have made a good service to linguistics; however, this type of study usually needs certain knowledge and experiences that do not correspond with the young researcher.” (López Morales, 1994, 25)

At this stage, with renewed energy, we have taken up again our initial aims, crossing the technical, linguistic and