Alexander Lancaster and Gordon Webster

Python for the Life Sciences
A Gentle Introduction to Python for Life Scientists

Alexander Lancaster
Amber Biology, Cambridge, MA, USA

Gordon Webster
Amber Biology, Cambridge, MA, USA

Any source code or other supplementary material referenced by the author in this book is available to readers on GitHub via the book’s product page, located at www.apress.com/978-1-4842-4522-4. For more detailed information, please visit http://www.apress.com/source-code.

ISBN 978-1-4842-4522-4
e-ISBN 978-1-4842-4523-1
https://doi.org/10.1007/978-1-4842-4523-1

© Alexander Lancaster and Gordon Webster 2019

Apress Standard

Trademarked names, logos, and images may appear in this book. Rather than use a trademark symbol with every occurrence of a trademarked name, logo, or image we use the names, logos, and images only in an editorial fashion and to the benefit of the trademark owner, with no intention of infringement of the trademark. The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.

Distributed to the book trade worldwide by Springer Science+Business Media New York, 233 Spring Street, 6th Floor, New York, NY 10013. Phone 1-800-SPRINGER, fax (201) 348-4505, e-mail orders-[email protected], or visit www.springeronline.com. Apress Media, LLC is a California LLC and the sole member (owner) is Springer Science + Business Media Finance Inc (SSBM Finance Inc). SSBM Finance Inc is a Delaware corporation.
To our families

Praise for Python for the Life Sciences

Fun, entertaining, witty, and darn useful. A magical portal to the big data revolution.
—Sandro Santagata, Assistant Professor in Pathology, Harvard Medical School

With Python for the Life Sciences, Lancaster and Webster have provided a comprehensive introduction to using Python for computational biology. Biologists use Python for data wrangling, statistical inference, and developing mathematical models, and PftLS provide guidance on all three of these application areas. Notably, Lancaster and Webster bring the sage advice of experienced software developers to the table – they share bits of trivia and historical context that make programming fun and make the occasional quirks of Python/Unix more understandable. This is not a superficial introduction, and careful readers will emerge with a deep understanding of Python, rather than as simple users. It is a lovely book with humor and perspective.
—John Novembre, Associate Professor of Human Genetics, University of Chicago and MacArthur Fellow

Alex and Gordon’s enthusiasm for Python is contagious. Their book is specifically written for those who understand they could greatly benefit from some training in computer programming. This addition to their academic research will be invaluable. The various chapters take you through a combined tour of Python and the multitude of biological issues it is relevant to. This is not just a “recipe” book for how to use Python nor a how to book on advanced tools but a way to jumpstart your imagination.
—Glenys Thomson, Professor of Integrative Biology, University of California, Berkeley

Informatics is a key component of modern biological science, and programming skills are essential for the modern life sciences researcher. Even if one does not write programs, the ability to read and understand code is becoming as important as being able to read a published paper.
Webster and Lancaster’s Python for the Life Sciences is an excellent tutorial for both programming novices and experienced coders who wish to learn Python.
—Steven J. Mack, Assistant Scientist, Children’s Hospital Oakland Research Institute

Down the Rabbit Hole

“Begin at the beginning,” the King said, very gravely, “and go on till you come to the end: then stop.”

Our aim in this book is to teach you the basics of Python using examples familiar to life scientists from the very first chapters. Are you ready to find out how to use Python to automate lab calculations, search for gene promoter sequences, rotate a molecular bond, drive a 96-well plate robot, build a cellular toggle switch, model animal coat pattern formation, grow a virtual plant, simulate a flu epidemic, or evolve populations? If so, you’ve come to the right place. Ready to go down the rabbit hole? Let’s begin…

Prologue

Welcome to the Kingdom of Nerdia

“But I want to write code,” declared Alice. “Being able to write code could help me enormously with my research. There’s only so much you can do with a hand calculator and a spreadsheet.”

“Pahh!” exclaimed the Mad Hatter. “You’re a biologist and everyone knows that real biologists don’t write code.”

Amid the uneasy silence that settled around the table, Alice appeared both angry and unconvinced. “Python!” she uttered suddenly. “I’m going to learn Python and there’s nothing you can say to talk me out of it!”

Who are you?

You are probably a life scientist working in an academic or commercial research environment, and your best friends in the lab (apart from your real friends, your iPhone, and the bobble-head Charles Darwin action figure that hangs above your bench) are probably your calculator and your Excel spreadsheets.[1] You’ve likely never written much, if any, computer code, but on many occasions you have wished that you knew how, since you know that it could help you enormously in your work.
Alas, computer programming was not a core component of the life science curriculum in your college and graduate school experience, and now that you’re already committed to your deep dive down the research rabbit hole, it’s hard to imagine having the time or the energy to learn computer programming at this point in your career (especially since all those pending grants, presentations, and research reports are just not going to write themselves).

This, at least, is who we hope you are, since that’s the kind of person who’s most likely to put their hand in their pocket and fork over some cash to buy our book.

Who are you not?

You are probably not already an experienced bioinformatician, computational biologist, computer scientist, or programmer. If you are, then your needs are likely already being met by the ton of great books that exist for duly anointed code gurus such as yourself. Our humble book is very unlikely to teach you anything you don’t already know about programming, or about Python for that matter.

You are also probably not a computer scientist or programmer looking to learn some biology through the programming paradigm you already know so well. If you were hoping for this, then we’re afraid you’re looking in the wrong place. This book assumes that you pretty much already know the biology and just need some help and encouragement to get going writing code that helps you with your life science research.

So, in summary:

If you’re already an experienced computer head or coder type looking to enhance your computer skills or learn some biology, move along, nothing to see here.

If you’re an experienced life scientist with little or no exposure to computer programming who wants a really fast and intuitive introduction to writing code, so that you can get up and running and using it in your research as soon as possible, well, pull up a seat, my friend, because you’ve come to the right place!

Why Python?
Python[2] is one of the most popular and most rapidly growing computer programming languages. You can use it for everything from the tiniest tasks, such as a simple script of a few lines of code for reading and processing a data file from a lab instrument, to large-scale research