SANDHI SPLITTER AND ANALYZER FOR SANSKRIT
(With Special Reference to aC Sandhi)

Dissertation submitted to Jawaharlal Nehru University in partial fulfillment of the requirements for the award of the degree of
MASTER OF PHILOSOPHY

SACHIN KUMAR

SPECIAL CENTRE FOR SANSKRIT STUDIES
JAWAHARLAL NEHRU UNIVERSITY
NEW DELHI-110067
INDIA
2007

SPECIAL CENTRE FOR SANSKRIT STUDIES
JAWAHARLAL NEHRU UNIVERSITY
NEW DELHI-110067

July 30, 2007

DECLARATION

I declare that the dissertation entitled “Sandhi Splitter and Analyzer for Sanskrit (with special reference to aC sandhi)” submitted by me for the award of the degree of Master of Philosophy is an original research work and has not been previously submitted for any other degree or diploma in any other institution/university.

(Sachin Kumar)

July 30, 2007

CERTIFICATE

This dissertation entitled “Sandhi Splitter and Analyzer for Sanskrit (with special reference to aC sandhi)” submitted by Sachin Kumar to the Special Centre for Sanskrit Studies, Jawaharlal Nehru University, New Delhi-110067, for the award of the degree of Master of Philosophy, is an original work and has not been submitted so far, in part or full, for any other degree or diploma of any University. This may be placed before the examiners for evaluation.

Dr. C. Upender Rao (Chairperson)
Dr. Girish Nath Jha (Supervisor)

To
MY LATE GRANDFATHER
SH. HEMRAJ MEHANDIRATTA

ACKNOWLEDGEMENT

I would like to express my heartfelt gratitude to all those without whom I would not have been able to complete this work. First and foremost, I am extremely thankful to my supervisor, Dr. Girish Nath Jha, who gave me his generous support, encouragement and inspiration throughout this work.
It is his impeccable and persuasive guidance and his constructive suggestions that have molded my work into its present shape. I express my deepest sense of gratitude to him.

I also express my sincere gratitude to the faculty members of my centre, Shashiprabha Ma’am, Upender Sir, Hari Ram Sir, Rajnish Sir, Ram Nath Sir and Santosh Sir, for their teaching and assistance. I also thank all the staff members for their cooperation and for extending the facilities I needed to complete my work. I duly acknowledge the University Grants Commission (UGC) for providing me with financial assistance. I also extend my special thanks to the library staff of my University, especially to Malik Sir, who generously helped me find the relevant material.

I am deeply indebted to my Dadi, Papa, Mummy, Chacha, Chachi, Sisters and Brothers for their prayers, affection and immense support. I am grateful to my Shyam Chacha, whose constant inspiration and never-ending love are a pillar of strength to me.

My special thanks go to Sudhir ji, Chandra ji, Subhash ji, Ainura ji, Narayan ji, Muktanand, Manji, Diwakar, Diwakar Mishra, Surjit, Vijendra and Mayank for helping me to systematize the ideas for my R&D. I extend my special appreciation to Alok, Mukesh and Bala for helping me with data entry. I must also acknowledge my friends and seniors, including Subhash Sir, Ajay Sir, Piyush Sir, Vijay Bhaiya, Devendra Sir, Vimal Sir, Yogesh Ji, Ved, Nandi, Ramanuj, Chander and many others, who gave me immense moral courage in the course of my writing. They were always eager to know about the progress of my work until I completed it.
Sachin Kumar

Contents

ACKNOWLEDGEMENT
CONTENTS
List of abbreviations used in the dissertation
List of Tables
Transliteration key used in the dissertation
Devanāgarī input mechanism according to Baraha software

Introduction

Chapter - I  Sanskrit sandhi and its computation
1.1 Introduction
    Forward Computation of sandhi
    Reverse Computation of sandhi
1.2 Computational Morpho-phonemics
1.2.1 Computational Phonology
1.2.2 Issues in Computational Phonology
1.2.3 Computational Morphology
    Complexity of word formation
    Morphological processes
    Morpheme combination
1.2.4 Issues in Computational Morphology
1.2.5 Morphophonemics or Morphophonology
1.2.6 Issues in Morphophonemics
1.2.7 Morphophonemics in Sanskrit
1.3 Need for the sandhi analyzer
1.4 Survey of R&D and available literature in this area
1.4.1 Work related to sandhi processing
1.4.2 Work related to NLP of Sanskrit and other Indian Languages
1.4.2.1 ASR, Melkote
1.4.2.2 The Sanskrit Heritage Site
1.4.2.3 CDAC, Bangalore
1.4.2.4 IIT, Kanpur
1.4.2.5 IIIT, Hyderabad
1.4.2.6 IIT, Bombay
1.4.2.7 Rashtriya Sanskrit Vidyapeetha (RSV), Tirupati
1.4.2.8 RCILTS – Utkal University
1.4.2.9 AU-KBC Research Centre
1.4.2.10 The Sanskrit Library
1.4.2.11 Sanskrit Studies Links and Information
1.4.2.12 Jawaharlal Nehru University (JNU)
1.4.2.13 Special Centre for Sanskrit Studies, JNU

Chapter - II  Sandhi formalism of Pāṇini
2.1 System of Pāṇini
2.1.1 Śiva sūtras or pratyāhāra sūtra
2.1.2 The Place and Manner (uccāraṇa sthāna and prayatna)
2.2 Sandhi
2.2.1 Sandhi: morphophonological or morpholexical alternation
2.2.2 External and Internal sandhi
2.2.3 Types of sandhi
2.3 Vowel sandhi
2.3.1 Types of vowel sandhi
    yaṇ sandhi
    ayādi sandhi
    guṇa sandhi
    vṛddhi sandhi
    dīrgha sandhi
    pūrvarūpa sandhi
    pararūpa sandhi
2.3.2 Exceptions of vowel sandhi

Chapter - III  Lexical Resources for Reverse Sandhi Analysis
3.1 Introduction
3.2 Viccheda patterns
3.2.1 Rule base for yaṇ sandhi
3.2.2 Rule base for ayādi sandhi
3.2.3 Rule base for guṇa sandhi
3.2.4 Rule base for vṛddhi sandhi
3.2.5 Rule base for dīrgha sandhi
3.2.6 Rule base for pūrvarūpa sandhi
3.2.7 Rule base for pararūpa sandhi
3.3 Sandhi Lexicon
3.4 Search corpus
3.4.1 Verb database
3.4.2 Avyaya database
3.4.3 Subanta corpus
3.4.4 Place Name database
3.4.5 Noun database
3.5 Example database
3.5.1 Vārttika list
3.5.2 Example List

Chapter - IV  Online Sandhi Analyzer System
4.1 Introduction
4.2 The web interface of Sandhi Analyzer for Sanskrit (SAS)
4.3 Viccheda Modules
4.3.1 Preprocessor
4.3.1.1 Check Punctuation
4.3.1.2 Check example base
4.3.2 Subanta Analyzer
4.3.3 Fixed List checking
4.3.4 Sandhi Analysis
4.3.4.1 Sandhi marking and pattern identification
4.3.4.2 Result generator
4.4 Illustration

Conclusion
Appendices
Bibliography
SAS CD Enclosed

List of Abbreviations

A.          Aṣṭādhyāyī
ASR         Academy of Sanskrit Research
JNU         Jawaharlal Nehru University
JSP         Java Server Pages
Kā. vt      Kāśikāvṛtti
LTRC        Language Technologies Research Centre
MAT         Machine Aided Translation
MT          Machine Translation
MTS         Machine Translation System
MWSDD       Monier Williams Sanskrit Digital Dictionary
NL          Natural Language
NLP         Natural Language Processing
OCR         Optical Character Recognition
POS         Part of Speech
R&D         Research and Development
RCILTS      Resource Centre for Indian Language Technology Solutions
RSV         Rashtriya Sanskrit Vidyapeetha
SAS         Sandhi Analyzer for Sanskrit
SCSS        Special Centre for Sanskrit Studies
Sid. Kau.   Siddhāntakaumudī
TDIL        Technology Development for Indian Languages

List of Tables

Table No.   Name of the Table
2.1         chart of place and manner of articulation
2.2         outline of forward yaṇ sandhi
2.3         outline of forward ayādi sandhi
2.4         outline of forward guṇa sandhi
2.5         outline of forward vṛddhi sandhi
2.6         outline of forward dīrgha sandhi
2.7         outline of forward pūrvarūpa sandhi
2.8         outline of forward pararūpa sandhi
3.1         outline of reverse yaṇ sandhi
3.2         outline of reverse ayādi sandhi
3.3         outline of extension of reverse ayādi sandhi
3.4         outline of reverse guṇa sandhi
3.5         outline of reverse guṇa sandhi - exception
3.6         outline of reverse vṛddhi sandhi
3.7         outline of reverse dīrgha sandhi
3.8         outline of reverse dīrgha sandhi exception
3.9         outline of reverse pūrvarūpa sandhi
3.10        outline of reverse pararūpa sandhi