Getting Started with Deep Learning for Natural Language Processing

Learn How to Build NLP Applications with Deep Learning

Sunil Patel

www.bpbonline.com

FIRST EDITION 2021
Copyright © BPB Publications, India
ISBN: 978-93-89898-11-8

All Rights Reserved. No part of this publication may be reproduced, distributed or transmitted in any form or by any means or stored in a database or retrieval system, without the prior written permission of the publisher, with the exception of the program listings, which may be entered, stored and executed in a computer system, but cannot be reproduced by means of publication, photocopy, recording, or by any electronic or mechanical means.

LIMITS OF LIABILITY AND DISCLAIMER OF WARRANTY

The information contained in this book is true and correct to the best of the author's and publisher's knowledge. The author has made every effort to ensure the accuracy of this publication, but the publisher cannot be held responsible for any loss or damage arising from any information in this book. All trademarks referred to in the book are acknowledged as properties of their respective owners, but BPB Publications cannot guarantee the accuracy of this information.

Distributors:

BPB PUBLICATIONS
20, Ansari Road, Darya Ganj, New Delhi-110002
Ph: 23254990/23254991

MICRO MEDIA
Shop No. 5, Mahendra Chambers, 150 DN Rd., Next to Capital Cinema, V.T. (C.S.T.) Station, MUMBAI-400 001
Ph: 22078296/22078297

DECCAN AGENCIES
4-3-329, Bank Street, Hyderabad-500195
Ph: 24756967/24756400

BPB BOOK CENTRE
376 Old Lajpat Rai Market, Delhi-110006
Ph: 23861747

Published by Manish Jain for BPB Publications, 20 Ansari Road, Darya Ganj, New Delhi-110002 and Printed by him at Repro India Ltd, Mumbai
www.bpbonline.com

Dedicated to
My family

About the Author

Sunil Patel completed his Master's in Information Technology at the Indian Institute of Information Technology-Allahabad, with a thesis focused on investigating 3D protein-protein interactions with deep learning.
Sunil worked with TCS Innovation Labs, Excelra, and Innoplexus before joining Nvidia. His main areas of research were deep learning and natural language processing in the banking and healthcare domains. Sunil started experimenting with deep learning by implementing the basic layers used in pipelines and then developing complex pipelines for real-life problems. Additionally, Sunil participated in CASP-2014 in collaboration with SCFBIO-IIT Delhi to efficiently predict possible protein multimer formation and its impact on diseases using deep learning. Currently, Sunil works as Data Scientist – III with Nvidia. At Nvidia, Sunil has expanded his area of interest to computer vision and simulated environments, and he works extensively in the banking, defense, and healthcare verticals. Sunil is currently focused on using GPUs for high-fidelity physics simulation. He has 3 pending US patents and 4 publications in the deep learning domain. To know more about his current research topics and interests, you can check out his LinkedIn profile:

About the Reviewer

Anurag Punia has 6 years of experience in data science and machine learning, with a special interest in topic modeling, information retrieval, and named entity recognition within the subfield of natural language processing. He has worked on and delivered several data science projects across industry verticals such as insurance, asset management, marketing, tourism, and real estate. Currently, he is part of the center of excellence of a leading logistics company in Dubai, UAE. Anurag has a research-focused BS-MS dual degree from IISER Bhopal with a major in physics. He can be reached at [email protected] or https://www.linkedin.com/in/anurag-punia-data-scientist/

Acknowledgements

First and foremost, I would like to thank God for giving me the courage to write this book. I would like to thank everyone at BPB Publications for helping me polish it and finally converting my writing to paperback.
I would also like to thank my parents, wife, and brother for their endless support and for helping me in numerous ways. Lastly, I would like to thank my critics. Without their criticism, I would never have been able to write this book.

Sunil Patel

Preface

"The world's most valuable resource is no longer oil but data". Today, the titans and most valued firms in the world, such as Amazon, Google, Apple, and Microsoft, prompt concerns similar to those raised about oil a century ago. Data is changing the way we live, and the amount of data generated in the past few years exceeds all the data generated in prior human history. The amount of data is expected to grow exponentially with the boom in connected devices, personal assistants, blockchain, and mobile devices. Conditions for storing data are becoming more favorable, as storage devices are getting 3X cheaper every 3 years. Hardware giants like Nvidia already claim to have broken Moore's law, which also indicates exponential growth in processing power. Today's world is highly favorable to the data-centric economy. And that's exactly why data is the next oil.

Unstructured and structured data are increasing at a similar rate. The former comes from a majority of sources, and algorithms are constantly being developed to store and assimilate such data. Unstructured data can be anything, for example, scientific literature, randomly clicked selfies, chat messages, sensor data from self-driving vehicles, and voice/video over the Internet. It is rich in information, but processing such data and training a machine on it is challenging. However, advances have been made in gaining a better understanding of unstructured data and in using networks pre-trained on it for supervised learning