Connectionist Natural Language Processing
Readings from Connection Science

Edited by Noel Sharkey
University of Exeter

SPRINGER-SCIENCE+BUSINESS MEDIA, B.V.

This compilation's Copyright © 1992 Springer Science+Business Media Dordrecht
Originally published by Kluwer Academic Publishers in 1992

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without written permission.

Consulting editor: Masoud Yazdani
Cover design: Mark Lewis

British Library Cataloguing in Publication Data
Sharkey, N. E. (Noel E.)
Connectionist natural language processing
I. Title
418

Library of Congress Cataloging-in-Publication Data
Connectionist natural language processing: readings from Connection science / edited by Noel Sharkey
p. cm.
Includes index
ISBN 978-94-010-5160-6
ISBN 978-94-011-2624-3 (eBook)
DOI 10.1007/978-94-011-2624-3
1. Natural language processing (Computer science) 2. Connection machines. I. Sharkey, N. E. (Noel E.) II. Connection science.
QA76.9.N38C66 1992
006.3'5--dc20 91-39077

Contents

Preface
Dedication
Introduction

1 Connectionism and Cognitive Linguistics
  Catherine L Harris
2 A Connectionist Model of Motion and Government in Chomsky's Government-binding Theory
  John Rager & George Berg
3 Syntactic Transformations on Distributed Representations
  David J Chalmers
4 Syntactic Neural Networks
  S M Lucas & R I Damper
5 Incremental Syntactic Tree Formation in Human Sentence Processing: a Cognitive Architecture Based on Activation Decay and Simulated Annealing
  Gerard Kempen & Theo Vosse
6 A Hybrid Symbolic/Connectionist Model for Noun Phrase Understanding
  Stefan Wermter & Wendy G Lehnert
7 Connectionism and Determinism in a Syntactic Parser
  Stan C Kwasny & Kanaan A Faisal
8 A Single Layer Higher Order Neural Net and its Application to Context Free Grammar Recognition
  Peter J Wyard & Charles Nightingale
9 Connectionist Language Users
  Robert B Allen
10 Script Recognition with Hierarchical Feature Maps
  Risto Miikkulainen
11 Learning Distributed Representations of Conceptual Knowledge and their Application to Script-based Story Processing
  Geunbae Lee, Margot Flowers & Michael Dyer
12 A Hybrid Model of Script Generation: or Getting the Best from Both Worlds
  Suzanne M Mannes & Stephanie M Doane
13 Identification of Topical Entities in Discourse: a Connectionist Approach to Attentional Mechanisms in Language
  Lorraine F R Karen
14 The Role of Similarity in Hungarian Vowel Harmony: a Connectionist Account
  Mary Hare
15 Representation and Recognition of Temporal Patterns
  Robert F Port
16 Networks that Learn about Phonological Feature Persistence
  Michael Gasser & Chan-Do Lee
17 Pronunciation of Digit Sequences in Text-to-Speech Systems
  W A Ainsworth & N P Warren

Index

Preface

This is a book of readings published in the journal Connection Science between 1989 and 1991.
In these first years in the life of the journal we received a lot of papers on Natural Language Processing and Cognition. All of the papers have gone through the normal rigorous journal reviewing process and thus represent much of the state of the art. There are seventeen papers in all. Eight of these are from a special issue on natural language and I have included the editorial from that issue as a general introduction. It describes the rapid rise of the subject and provides an historical bibliography up to 1990. The book is laid out roughly in the traditional categories of language research, starting with syntax and moving through question answering to knowledge application and speech processing.

I would like to thank the editorial board for the Natural Language special issue for all of their efforts (listed on the page after the editorial). I would also like to thank Jim Hendler, who edited the special issue on Hybrid Systems (where two of the papers appeared); Paul Day, who helped with the journal generally and has written the book index; David Green at Carfax, who has been an inspiration; and, of course, all of the anonymous referees who worked hard for little reward. Finally, I must acknowledge the industrious Lyn Shackleton (editorial assistant), without whom the whole enterprise would have been a much more laborious task.

Noel Sharkey

Dedication

To my four wonderful aunts: Madge Lundy, Thelma Pringle, Eileen Burns, and Alice Murray for being there.

Introduction

Connection Science and Natural Language: an Emerging Discipline

NOEL E. SHARKEY

The journal Connection Science is pleased to present this special issue on Connectionist Natural Language Processing (CNLP) to mark the coming of age of this new approach to natural language. CNLP has really only taken off in the last five years.¹ Before that, very little CNLP research was actually published. Connectionist parsing got under way with the localist work of Small et al.
(1982), and work on distributed propositional representations in semantic memory was started by Hinton (1981). The Hinton paper was very influential in pointing to issues on representation that were to be the motivation for later research (e.g. distributed v. localist representations, classical v. uniquely connectionist representation, type/token v. part/whole hierarchies).

However, it was not until 1985 that CNLP began to emerge as a field of enquiry in its own right. That year saw three papers on parsing using quite different techniques: Fanty (1985) applied localist techniques to context-free parsing; Selman (1985) utilized Boltzmann machine ideas for syntactic parsing; and Waltz & Pollack (1985) presented the first hybrid system, with a connectionist semantic net fronted by a 'symbolic' chart parser. Cottrell's (1985) thesis research, on word sense disambiguation, also explored the use of connectionist syntactic constraints.

The following year began the line of research inspired by AI theories of Natural Language Understanding (e.g. Golden, 1986; Lehnert, 1986; Sharkey et al., 1986). This was followed closely by the publication of the highly influential two-volume PDP books edited by Rumelhart & McClelland (1986a). These volumes contained a number of papers relating to aspects of natural language processing such as case assignment (McClelland & Kawamoto, 1986); learning the past tense of verbs (Rumelhart & McClelland, 1986b); and reading (McClelland, 1986). Moreover, the two volumes expanded on some of the representational issues discussed earlier by Hinton (1981).

Since 1986, many more CNLP papers have appeared than it is possible to mention here. Among these was further work on the application of world knowledge to language understanding (e.g. Dolan & Dyer, 1987; Chun & Mimo, 1987; Sharkey, 1989a; Miikkulainen & Dyer, 1987); and further research on various aspects of syntax and parsing (e.g. Hanson & Kegl, 1987; Howells, 1988; Benello et al., 1989).
In addition, we have begun to see a marked increase in the number of topics explored in CNLP: phrase generation (e.g. Kukich, 1987; Gasser, 1988), question answering (Allen, 1988), prepositional attachment (e.g. Cosic & Munro, 1988; Sharkey, 1990; Wermter & Lehnert, 1990; St John & McClelland, 1990), anaphora (Allen & Riecken, 1988), goals and plans (Sharkey, 1988), inference (Lange & Dyer, 1989), variable binding (Smolensky, 1987), and lexical processing (Kawamoto, 1989; Sharkey, 1989b).

Perhaps the biggest boost to CNLP research came unintentionally from a critique of the field by Fodor & Pylyshyn (1988). Their aim was to do the same sort of 'hatchet job' on connectionist language research as Chomsky had done on behaviourist language research in the 1950s. However, this time the criticisms have prompted an industrious research campaign to show that unique connectionist representations have the properties necessary to represent natural language in terms of functional compositionality (van Gelder, 1990); an ability to encode temporal structures (Elman, 1989); and an ability to encode distributed recursive representations (Pollack, 1990; Smolensky, 1990).

It is clear to those who work in CNLP that the area is expanding rapidly, both in terms of theory and applications. This is an exciting area, although it is difficult to keep abreast of the most recent work because it is often published in obscure conference proceedings. The number of submissions we received for this special issue shows that the field is very healthy, and some of the best recent work is contained herein. Nonetheless, we would like to see much more CNLP research published in Connection Science. The competition is tough but we wholeheartedly welcome research papers on any area of CNLP. We would particularly like to see more research and discussion on some of the new representational issues on which the fate of CNLP may rest in the nineties.

Note 1.
This is not counting research on word recognition.

References

Allen, R.B. (1988) Sequential connectionist networks for answering simple questions about a microworld. Proceedings of the 10th Annual Conference of the Cognitive Science Society, Montreal.
Allen, R.B. & Riecken, M.E. (1988) Anaphora and reference in connectionist language users. International Computer Science Conference, Hong Kong.
Benello, J., Mackie, A.W. & Anderson, J.A. (1989) Syntactic category disambiguation with neural networks. Computer Speech and Language, 3, 203-217.
Chun, H.W. & Mimo, A. (1987) A model of schema selection using marker parsing and connectionist spreading activation. Proceedings of the 9th Annual Conference of the Cognitive Science Society, Seattle, WA, pp. 887-896.
Cosic, C. & Munro, P. (1988) Learning to represent and understand locative prepositional phrases. TR LIS002/IS88002, School of Library and Information Service, University of Pittsburgh, PA.
Cottrell, G.W. (1985) A connectionist approach to word sense disambiguation. PhD thesis, TR154, Department of Computer Science, University of Rochester, NY.
Dolan, C.P. & Dyer, M.G. (1987) Symbolic schemata, role binding and the evolution of structure in connectionist memories. IEEE First International Conference on Neural Networks, San Diego, 21-24 June, II, pp. 287-298.
Elman, J.L. (1989) Representation and structure in connectionist models. TR 8903, CRL, University of California, San Diego, CA.
Fanty, M. (1985) Context-free parsing in connectionist networks. TR-174, Department of Computer Science, University of Rochester, NY.
Fodor, J.A. & Pylyshyn, Z.W. (1988) Connectionism and cognitive architecture: a critical analysis. Cognition, 28, 2-71.
Gasser, M.E. (1988) A connectionist model of sequence generation in a first and second language. TR UCLA AI-88-13, AI Lab, Computer Science Department, UCLA, July.
Gelder, T., van (1990) Compositionality: a connectionist variation on a classical theme.
Cognitive Science, 14.
Golden, R.M. (1986) Representing causal schemata in connectionist systems. Proceedings of the 8th Annual Conference of the Cognitive Science Society, pp. 13-21.
Hanson, S.J. & Kegl, J. (1987) PARSNIP: a connectionist network that learns natural language grammar from exposure to natural language sentences. Proceedings of the 9th Annual Conference of the Cognitive Science Society, Seattle, WA, pp. 106-119.
Hinton, G.E. (1981) Implementing semantic networks in parallel hardware. In G. E. Hinton & J. A. Anderson (Eds) Parallel Models of Associative Memory. Hillsdale, NJ: Lawrence Erlbaum.
Howells, T. (1988) VITAL, a connectionist parser. Proceedings of the 10th Annual Conference of the Cognitive Science Society, Montreal.
Kukich, K. (1987) Where do phrases come from: some preliminary experiments in connectionist phrase generation. In G. Kempen (Ed.) Natural Language Generation: New Results from Artificial Intelligence, Psychology and Linguistics. Dordrecht: Kluwer Academic, pp. 405-421.
Lange, T.E. & Dyer, M.G. (1989) High-level inferencing in a connectionist network. Connection Science, 1, 181-217.
Lehnert, W.G. (1986) Possible implications of connectionism. Theoretical Issues in Natural Language Processing, University of New Mexico, pp. 78-83.
Kawamoto, A.H. (1989) Distributed representations of ambiguous words and their resolution in a connectionist network. In S. L. Small, G. W. Cottrell & M. K. Tanenhaus (Eds) Lexical Ambiguity Resolution. San Mateo, CA: Morgan Kaufmann.
McClelland, J.L. (1986) Parallel distributed processing and role assigning constraints. Theoretical Issues in Natural Language Processing, University of New Mexico, pp. 72-77.
McClelland, J.L. & Kawamoto, A.H. (1986) Mechanisms of sentence processing: assigning roles to constituents. In J. L. McClelland & D. E. Rumelhart (Eds) Parallel Distributed Processing, Vol. 2. Cambridge, MA: MIT Press.
Miikkulainen, R. & Dyer, M.G.
(1987) Building distributed representations without microfeatures. Technical Report UCLA-AI-87-17, AI Laboratory, Computer Science Department, University of California at Los Angeles, CA.
Pollack, J.B. (1990) Recursive distributed representations. Artificial Intelligence (in press).
Rumelhart, D.E. & McClelland, J.L. (Eds) (1986a) Parallel Distributed Processing, Vols. 1 & 2. Cambridge, MA: MIT Press.
Rumelhart, D.E. & McClelland, J.L. (1986b) On learning the past tense of verbs. In D. E. Rumelhart & J. L. McClelland (Eds) Parallel Distributed Processing, Vol. 2, Psychological and Biological Models. Cambridge, MA: MIT Press, pp. 216-271.
St John, M.F. & McClelland, J.L. (1990) Learning and applying contextual constraints in sentence comprehension. In R. Reilly & N. E. Sharkey (Eds) Connectionist Approaches to Natural Language Processing. Hove: Lawrence Erlbaum (in press).
Selman, B. (1985) Rule-based processing in a connectionist system for natural language understanding. TR CSRI-168, Computer Systems Research Institute, University of Toronto.
Sharkey, N.E. (1988) A PDP system for goal-plan decisions. In R. Trappl (Ed.) Cybernetics and Systems. Dordrecht: Kluwer Academic, pp. 1031-1038.
Sharkey, N.E. (1989a) A PDP learning approach to natural language understanding. In I. Aleksander (Ed.) Neural Computing Architectures. London: North Oxford Academic.
Sharkey, N.E. (1989b) The lexical distance model and word priming. Proceedings of the Eleventh Cognitive Science Society Conference.
Sharkey, N.E. (1990) Implementing soft preferences for structural disambiguation. KONNAI (in press).
Sharkey, N.E., Sutcliffe, R.F.E. & Wobcke, W.R. (1986) Mixing binary and continuous connection schemes for knowledge access. Proceedings of the American Association for Artificial Intelligence.
Small, S.L., Cottrell, G.W. & Shastri, L. (1982) Towards connectionist parsing. In Proceedings of the National Conference on Artificial Intelligence, Pittsburgh, PA.
Smolensky, P.
(1987) On variable binding and the representation of symbolic structures in connectionist systems. TR CU-CS-355-87, Department of Computer Science, University of Colorado, Boulder, CO.
Smolensky, P. (1990) Tensor product variable binding and the representation of symbolic structures in connectionist systems. Artificial Intelligence (in press).
Waltz, D.L. & Pollack, J.B. (1985) Massively parallel parsing: a strongly interactive model of natural language interpretation. Cognitive Science, 9, 51-74.
Wermter, S. & Lehnert, W.G. (1990) Noun phrase analysis with connectionist networks. In R. Reilly & N. E. Sharkey (Eds) Connectionist Approaches to Natural Language Processing. Hove: Lawrence Erlbaum (in press).

Special Editorial Review Panel

Robert Allen, Bell Communication Research
Garrison W. Cottrell, University of California, San Diego
Michael G. Dyer, University of California, Los Angeles
Jeffrey L. Elman, University of California, San Diego
George Lakoff, University of California, Berkeley
Wendy G. Lehnert, University of Massachusetts, Amherst
Jordan Pollack, Ohio State University
Ronan Reilly, Beckman Institute, Illinois
Bart Selman, University of Toronto
Paul Smolensky, University of Colorado, Boulder

Chapter 1
Connectionism and Cognitive Linguistics

CATHERINE L. HARRIS

Cognitive linguists hypothesize that language is the product of general cognitive abilities. Semantic and functional motivations are sought for grammatical patterns, sentence meaning is viewed as the result of constraint satisfaction, and highly regular linguistic patterns are thought to be mediated by the same processes as irregular patterns. In this paper, recent cognitive linguistics arguments emphasizing the schematicity continuum, the non-autonomy of syntax, and the non-compositionality of semantics are presented and their amenability to connectionist modeling described.
Some of the conceptual matches between cognitive linguistics and connectionism are then illustrated by a back propagation model of the diverse meanings of the preposition over. The pattern set consisted of a distribution of form-meaning pairs that was meant to be evocative of English usage in that the regularities implicit in the distribution spanned the spectrum from rules to partial regularities to exceptions. Under pressure to encode these regularities with limited resources, the network used one hidden layer to recode the inputs into a set of abstract properties. The properties discovered by the network correspond closely to semantic features that linguists have proposed when giving an account of the meaning of over.

KEYWORDS: Connectionism, semantics, syntax, polysemy, lexicon, schemas.

Catherine L. Harris, Department of Cognitive Science, 0-015, University of California, La Jolla, CA 92093, USA; Email: [email protected]; Tel: (619) 534-4348. This work was supported in part by an NSF graduate fellowship to the author. The author thanks Farrel Ackerman, Ken Baldwin, Elizabeth Bates, George Lakoff, David Touretzky and Cyma Van Petten for assistance on this project.

1. Introduction

Over the past decade a small but growing number of papers have argued that solutions to enduring problems in semantics and grammar will require abandoning the theoretical framework that has dominated linguistic research in the last 25-30 years (Lakoff, 1987a, 1987b; Langacker, 1982, 1986, 1987a, 1988; Bates & MacWhinney, 1982, 1987; Fauconnier, 1985; Fillmore, 1988; Kuno, 1987; Talmy, 1975, 1983; Givon, 1979). While the proponents of this refocusing have emphasized different linguistic problems, they concur in rejecting the two major tenets of Chomskyan linguistics: the separateness and specialness of language (Chomsky's hypothesized 'innate mental organ'; Chomsky, 1980) and the modularity of different types of linguistic information (syntax, semantics, morphology, phonology).

In this new framework, language is viewed as a product of cognitive processes. Researchers in cognitive linguistics have sought to show that neither the form nor the