Machine Translation PDF

246 Pages·2015·2.352 MB·English

Preview Machine Translation

Machine Translation

Pushpak Bhattacharyya
Indian Institute of Technology Bombay
Mumbai, India

CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742

© 2015 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government works

Version Date: 20141121
International Standard Book Number-13: 978-1-4398-9719-5 (eBook - PDF)

This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com and the CRC Press Web site at http://www.crcpress.com

To My Mother

Contents

List of Figures
List of Tables
Preface
Acknowledgments
About the Author

1. Introduction
    1.1 A Feel for a Modern Approach to Machine Translation: Data-Driven MT
    1.2 MT Approaches: Vauquois Triangle
        1.2.1 Understanding Transfer over the Vauquois Triangle
        1.2.2 Understanding Ascending and Descending Transfer
            1.2.2.1 Descending Transfer
            1.2.2.2 Ascending Transfer
            1.2.2.3 Ascending Transfer due to Tool and Resource Disparity
    1.3 Language Divergence with Illustration between Hindi and English
        1.3.1 Syntactic Divergence
            1.3.1.1 Constituent Order Divergence
            1.3.1.2 Adjunction Divergence
            1.3.1.3 Preposition-Stranding Divergence
            1.3.1.4 Null Subject Divergence
            1.3.1.5 Pleonastic Divergence
        1.3.2 Lexical-Semantic Divergence
            1.3.2.1 Conflational Divergence
            1.3.2.2 Categorial Divergence
            1.3.2.3 Head-Swapping Divergence
            1.3.2.4 Lexical Divergence
    1.4 Three Major Paradigms of Machine Translation
    1.5 MT Evaluation
        1.5.1 Adequacy and Fluency
        1.5.2 Automatic Evaluation of MT Output
    1.6 Summary
    Further Reading

2. Learning Bilingual Word Mappings
    2.1 A Combinatorial Argument
        2.1.1 Necessary and Sufficient Conditions for Deterministic Alignment in Case of One-to-One Word Mapping
        2.1.2 A Naïve Estimate for Corpora Requirement
            2.1.2.1 One-Changed-Rest-Same
            2.1.2.2 One-Same-Rest-Changed
    2.2 Deeper Look at One-to-One Alignment
        2.2.1 Drawing Parallels with Part of Speech Tagging
    2.3 Heuristics-Based Computation of the V_E × V_F Table
    2.4 Iterative (EM-Based) Computation of the V_E × V_F Table
        2.4.1 Initialization and Iteration 1 of EM
        2.4.2 Iteration 2
        2.4.3 Iteration 3
    2.5 Mathematics of Alignment
        2.5.1 A Few Illustrative Problems to Clarify Application of EM
            2.5.1.1 Situation 1: Throw of a Single Coin
            2.5.1.2 Throw of Two Coins
            2.5.1.3 Generalization: Throw of More Than One “Something,” Where That “Something” Has More Than One Outcome
        2.5.2 Derivation of Alignment Probabilities
            2.5.2.1 Key Notations
            2.5.2.2 Hidden Variables (a; the alignment variables)
            2.5.2.3 Parameters (θ)
            2.5.2.4 Data Likelihood
            2.5.2.5 Data Likelihood L(D;θ), Marginalized over A
            2.5.2.6 Marginalized Data Log-Likelihood LL(D, A;θ)
            2.5.2.7 Expectation of Data Log-Likelihood E(LL(D; Θ))
        2.5.3 Expressing the E- and M-Steps in Count Form
    2.6 Complexity Considerations
        2.6.1 Storage
        2.6.2 Time
    2.7 EM: Study of Progress in Parameter Values
        2.7.1 Necessity of at Least Two Sentences
        2.7.2 One-Same-Rest-Changed Situation
        2.7.3 One-Changed-Rest-Same Situation
    2.8 Summary
    Further Reading

3. IBM Model of Alignment
    3.1 Factors Influencing P(f|e)
        3.1.1 Alignment Factor a
        3.1.2 Length Factor m
    3.2 IBM Model 1
        3.2.1 The Problem of Summation over Product in IBM Model 1
        3.2.2 EM for Computing P(f|e)
        3.2.3 Alignment in a New Input Sentence Pair
        3.2.4 Translating a New Sentence in IBM Model 1: Decoding
    3.3 IBM Model 2
        3.3.1 EM for Computing P(f|e) in IBM Model 2
        3.3.2 Justification for and Linguistic Viability of P(i|j,l,m)
    3.4 IBM Model 3
    3.5 Summary
    Further Reading

4. Phrase-Based Machine Translation
    4.1 Need for Phrase Alignment
        4.1.1 Case of Promotional/Demotional Divergence
        4.1.2 Case of Multiword (Includes Idioms)
        4.1.3 Phrases Are Not Necessarily Linguistic Phrases
    4.2 An Example to Illustrate Phrase Alignment Technique
        4.2.1 Two-Way Alignments
        4.2.2 Symmetrization
        4.2.3 Expansion of Aligned Words to Phrases
            4.2.3.1 Principles of Phrase Construction
    4.3 Phrase Table
    4.4 Mathematics of Phrase-Based SMT
        4.4.1 Understanding Phrase-Based Translation through an Example
        4.4.2 Deriving Translation Model and Calculating Translation and Distortion Probabilities
        4.4.3 Giving Different Weights to Model Parameters
        4.4.4 Fixing λ Values: Tuning
    4.5 Decoding
        4.5.1 Example to Illustrate Decoding
    4.6 Moses
        4.6.1 Installing Moses
        4.6.2 Workflow for Building a Phrase-Based SMT System
        4.6.3 Preprocessing for Moses
        4.6.4 Training Language Model
        4.6.5 Training Phrase Model
        4.6.6 Tuning
            4.6.6.1 MERT Tuning
        4.6.7 Decoding Test Data
        4.6.8 Evaluation Metric
        4.6.9 More on Moses
    4.7 Summary
    Further Reading
