Copyright © 2017 by Dr. Martin Jones

All rights reserved. This book or any portion thereof may not be reproduced or used in any manner whatsoever without the express written permission of the publisher except for the use of brief quotations in a book review.

ISBN-13: 978-1495244377
ISBN-10: 1495244377

http://pythonforbiologists.com

Set in PT Serif and Source Code Pro

About the author

Martin started his programming career by learning Perl during the course of his PhD in evolutionary biology, and started teaching other people to program soon after. Since then he has taught introductory programming to hundreds of biologists, from undergraduates to PIs, and has maintained a philosophy that programming courses must be friendly, approachable, and practical.

In his academic career, Martin mixed research and teaching at the University of Edinburgh, culminating in a two-year stint as Lecturer in Bioinformatics. He now runs programming courses for biological researchers as a full-time freelancer.

You can get in touch with Martin at [email protected]

Martin's other works include Python for Biologists, Effective Python development for Biologists and Python for complete beginners.

Table of Contents

1: Introduction
    About this book
    Why use Python's advanced features?
    How to use this book
    Exercises and solutions
    A note on setting up your environment
    Joined-up programming
    Getting in touch
2: Recursion and trees
    Recursively generating kmers
    Processing tree-like data
    Recap
    Exercise
    Solution
3: Complex data structures
    Tuples
    Sets
    Lists of lists
    Lists of dicts and lists of tuples
    Other complex structures
    Recap
    Exercises
    Solutions
4: Object oriented Python
    Introduction
    A simple DNA sequence class
    Constructors
    Inheritance
    Overriding
    Calling methods in the superclass
    Polymorphism
    Recap
    Exercise
    Solution
5: Functional Python
    Introduction
    State and mutability
    Side effects
    Functions as objects
    What is to be calculated
    Built in higher order functions
    map
    filter
    sorted
    reduce
    Writing higher order functions
    Recap
    Exercises
    Solutions
6: Iterators, comprehensions & generators
    Defining lists
    Lists and iterables
    List comprehensions
    Dictionary comprehensions
    Set comprehensions
    Iterators and generators
    Recap
    Exercises
    Solutions
7: Exception handling
    Catching exceptions
    Catching specific errors
    else blocks in exception handling
    finally blocks in exception handling
    Nested try/except blocks
    Exceptions bubble up
    Raising exceptions
    Custom exception types
    Recap
    Exercises
    Solutions

Chapter 1: Introduction

About this book

Welcome to Advanced Python for Biologists. As I was completing my first book Python for Biologists I realized that, although it covered all the core parts of the language, I had to leave out some of the most elegant and useful parts of Python, so I was already thinking about the sequel.
The purpose of this book is to continue the exploration of the Python language where the previous book left off, with the goal that between them, the two books will cover every useful part of the Python language. The overarching philosophy of this book is exactly the same as the previous one: to illustrate Python features using relevant biological examples which will be useful in real life. Just as before, the emphasis is on Python as a tool for practical problem-solving.

Why use Python's advanced features?

If you've read and understood Python for Biologists, or indeed any introductory Python programming book, then you probably have all the programming tools that you need to solve any given problem. Why, then, is it necessary to have an entire second book devoted to advanced features of Python?

One reason is that in order to understand code that you find in the wild, you need to have a thorough overview of the language. You may be able to get on perfectly well in your own programming career without ever writing a class, a lambda expression, or a list comprehension, but when you come across these techniques in other people's code (and you will), you'll need to know how they work.

A second, more persuasive reason is that all the features of Python that we are going to discuss in this book have been added to the language for good reasons – because they make code easier to write, easier to maintain, easier to test, faster, or more efficient. You don't have to use objects when modelling biological systems – but it will make development much easier. You don't have to use comprehensions when transforming data – but doing so will allow you to express your ideas much more concisely. You don't have to use recursive functions when processing tree-like data – but your code will be much more readable if you do.
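To make the point about comprehensions a little more concrete, here is a minimal sketch that counts G and C bases in a few sequences, first with a loop and then with a list comprehension. The sequences and variable names are made up for illustration and are not taken from the book's example files; comprehensions themselves are covered properly in a later chapter.

```python
# Hypothetical example data: three short DNA sequences (an assumption,
# not part of the book's example files).
sequences = ["ATGCGC", "ATATAT", "GGGCCC"]

# The way you might already know: build up the result list inside a loop.
gc_counts_loop = []
for seq in sequences:
    gc_counts_loop.append(seq.count("G") + seq.count("C"))

# The same transformation written as a list comprehension.
gc_counts = [seq.count("G") + seq.count("C") for seq in sequences]

print(gc_counts)
```

Both versions build the same list; the comprehension simply states the transformation in a single expression.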
Yet another reason is that knowing about features of Python opens up new approaches to programming, which will allow you to think about problems in a new light. For example, two large chapters in this book are devoted to object oriented programming and functional programming. The aim of these chapters is to introduce you not only to object oriented and functional features, but also to object oriented and functional approaches to tackling real life problems.

Hopefully, as you encounter new tools and techniques in this book the biological examples will convince you that they're useful things to know about. I have tried, for each new concept introduced, to point out why and in what circumstances it is a better way of doing things than the way that you might already know.

How to use this book

Picking the order of chapters in Python for Biologists was a straightforward affair, because the concepts and tools followed a natural progression. Picking the order of chapters for this book has been much trickier, because the features and techniques we are going to look at tend to be used together. In other words, whichever way I arrange the chapters, there are inevitably some cases where material from one chapter relies on material from a later chapter. I've tried to minimize such cases, and have added footnotes to point out connections between chapters whenever possible. If there's a particular chapter that sounds interesting then it's fine to jump in and start reading there; just be aware that you'll probably have to skip around in the book a bit to fill in any gaps in your current knowledge.

Chapters tend to follow a predictable structure. They generally start with a few paragraphs outlining the motivation behind the features that they will cover – why do they exist, what problems do they allow us to solve, and why are they useful in biology specifically?
These are followed by the main body of the chapter, in which we discuss the relevant features and how to use them. The length of the chapters varies quite a lot – sometimes we want to cover a topic briefly, other times we need more depth. Each chapter ends with a brief recap outlining what we have learned, followed by exercises and solutions (more on that topic below).

The book assumes that you're familiar with all the material in Python for Biologists. If you have some Python experience, but haven't read Python for Biologists, then it's probably worth downloading a free copy and at least looking over the chapter contents to make sure you're comfortable with them. I will sometimes refer in the text or in footnotes to sections of Python for Biologists – rather than repeating the URL where you can get a copy¹, I'll simply give it here:

http://pythonforbiologists.com

¹ If you're reading this book as an ebook (as opposed to a physical book) then you should have received a copy of Python for Biologists in your download.

Formatting

A book like this has lots of special types of text – we'll need to look at examples of Python code and output, the contents of files, and technical terms. Take a minute to note the typographic conventions we'll be using.

In the main text of this book, bold type is used to emphasize important points and italics for technical terms and filenames. Where code is mixed in with normal text it's written in a monospaced font like this with a grey background. Occasionally there are footnotes¹ to provide additional information that is interesting to know but not crucial to understanding, or to give links to web pages.
Example Python code is highlighted with a solid border and the name of the matching example file is written just underneath the example to the right:

    Some example code goes here

    example.py

Not every bit of code has a matching example file – much of the time we'll be building up a Python program bit by bit, in which case there will be a single example file containing the finished version of the program. The example files are in separate folders, one for each chapter, to make them easy to find.

Sometimes it's useful to refer to a specific line of code inside an example. For this, we'll use numbered circles like this ❶:

    a line of example code
    another line of example code
    this is the important line ❶
    here is another line

¹ Like this.