
Elegant SciPy: The Art of Scientific Python

277 Pages·2017·29.07 MB·English

Preview: Elegant SciPy: The Art of Scientific Python

Elegant SciPy
The Art of Scientific Python
Juan Nunez-Iglesias, Stéfan van der Walt, and Harriet Dashnow

Beijing · Boston · Farnham · Sebastopol · Tokyo

Elegant SciPy
by Juan Nunez-Iglesias, Stéfan van der Walt, and Harriet Dashnow

Copyright © 2017 Juan Nunez-Iglesias, Stéfan van der Walt, and Harriet Dashnow. All rights reserved.

Printed in the United States of America.

Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com/safari). For more information, contact our corporate/institutional sales department: 800-998-9938 or [email protected].

Editor: Nan Barber
Production Editor: Melanie Yarbrough
Copyeditor: Christina Edwards
Proofreader: Rachel Monaghan
Indexer: Judy McConville
Interior Designer: David Futato
Cover Designer: Karen Montgomery
Illustrator: Rebecca Demarest

August 2017: First Edition

Revision History for the First Edition
2017-08-10: First Release

See http://oreilly.com/catalog/errata.csp?isbn=9781491922873 for release details.

The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Elegant SciPy, the cover image, and related trade dress are trademarks of O’Reilly Media, Inc.

While the publisher and the authors have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the authors disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk.
If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.

978-1-491-92287-3
[LSI]

Table of Contents

Preface

1. Elegant NumPy: The Foundation of Scientific Python
  Introduction to the Data: What Is Gene Expression?
  NumPy N-Dimensional Arrays
  Why Use ndarrays Instead of Python Lists?
  Vectorization
  Broadcasting
  Exploring a Gene Expression Dataset
  Reading in the Data with pandas
  Normalization
  Between Samples
  Between Genes
  Normalizing Over Samples and Genes: RPKM
  Taking Stock

2. Quantile Normalization with NumPy and SciPy
  Getting the Data
  Gene Expression Distribution Differences Between Individuals
  Biclustering the Counts Data
  Visualizing Clusters
  Predicting Survival
  Further Work: Using the TCGA’s Patient Clusters
  Further Work: Reproducing the TCGA’s clusters

3. Networks of Image Regions with ndimage
  Images Are Just NumPy Arrays
  Exercise: Adding a Grid Overlay
  Filters in Signal Processing
  Filtering Images (2D Filters)
  Generic Filters: Arbitrary Functions of Neighborhood Values
  Exercise: Conway’s Game of Life
  Exercise: Sobel Gradient Magnitude
  Graphs and the NetworkX library
  Exercise: Curve Fitting with SciPy
  Region Adjacency Graphs
  Elegant ndimage: How to Build Graphs from Image Regions
  Putting It All Together: Mean Color Segmentation

4. Frequency and the Fast Fourier Transform
  Introducing Frequency
  Illustration: A Birdsong Spectrogram
  History
  Implementation
  Choosing the Length of the DFT
  More DFT Concepts
  Frequencies and Their Ordering
  Windowing
  Real-World Application: Analyzing Radar Data
  Signal Properties in the Frequency Domain
  Windowing, Applied
  Radar Images
  Further Applications of the FFT
  Further Reading
  Exercise: Image Convolution

5. Contingency Tables Using Sparse Coordinate Matrices
  Contingency Tables
  Exercise: Computational Complexity of Confusion Matrices
  Exercise: Alternative Algorithm to Compute the Confusion Matrix
  Exercise: Multiclass Confusion Matrix
  scipy.sparse Data Formats
  COO Format
  Exercise: COO Representation
  Compressed Sparse Row Format
  Applications of Sparse Matrices: Image Transformations
  Exercise: Image Rotation
  Back to Contingency Tables
  Exercise: Reducing the Memory Footprint
  Contingency Tables in Segmentation
  Information Theory in Brief
  Exercise: Computing Conditional Entropy
  Information Theory in Segmentation: Variation of Information
  Converting NumPy Array Code to Use Sparse Matrices
  Using Variation of Information
  Further Work: Segmentation in Practice

6. Linear Algebra in SciPy
  Linear Algebra Basics
  Laplacian Matrix of a Graph
  Exercise: Rotation Matrix
  Laplacians with Brain Data
  Exercise: Showing the Affinity View
  Exercise Challenge: Linear Algebra with Sparse Matrices
  PageRank: Linear Algebra for Reputation and Importance
  Exercise: Dealing with Dangling Nodes
  Exercise: Equivalence of Different Eigenvector Methods
  Concluding Remarks

7. Function Optimization in SciPy
  Optimization in SciPy: scipy.optimize
  An Example: Computing Optimal Image Shift
  Image Registration with Optimize
  Avoiding Local Minima with Basin Hopping
  Exercise: Modify the align Function
  “What Is Best?”: Choosing the Right Objective Function

8. Big Data in Little Laptop with Toolz
  Streaming with yield
  Introducing the Toolz Streaming Library
  k-mer Counting and Error Correction
  Currying: The Spice of Streaming
  Back to Counting k-mers
  Exercise: PCA of Streaming Data
  Markov Model from a Full Genome
  Exercise: Online Unzip

Epilogue
Appendix: Exercise Solutions
Index

Preface

Unlike the stereotypical wedding dress, it was – to use a technical term – elegant, like a computer algorithm that achieves an impressive outcome with just a few lines of code.
– Graeme Simsion, The Rosie Effect

Welcome to Elegant SciPy. We’re going to spend rather a lot of time focusing on the “SciPy” bit of the title, so let’s take a moment to reflect on the “Elegant” bit. There are plenty of manuals, tutorials, and documentation websites out there that describe the SciPy library. Elegant SciPy goes further. More than just teaching you how to write code that works, we will inspire you to write code that rocks!

In The Rosie Effect (hilarious book; go read its prequel The Rosie Project when you’re done with Elegant SciPy), Graeme Simsion twists the conventions of the word “elegant” around.
Most would use it to describe the visual simplicity, style, and grace of, say, the first iPhone. Instead Graeme Simsion’s hero, Don Tillman, uses a computer algorithm to define elegance. We hope that you will understand exactly what he means after reading this book; that you will read or write a piece of elegant code, and feel calmed in the glow of its beauty and grace. (Note: The authors may be prone to hyperbole.)

A good piece of code just feels right. When you look at it, its intent is clear, it is often concise (but not so concise as to be obscure), and it is efficient at executing the task at hand. For the authors, the joy of analyzing elegant code lies in the lessons hidden within, and the way it inspires us to be creative in how we approach new coding problems.

Ironically, creativity can also tempt us to show off cleverness at the expense of the reader, and write obtuse code that is hard to understand. PEP8 (the Python style guide) and PEP20 (the Zen of Python) remind us that “code is read much more often than it is written” and therefore “readability counts.”

The conciseness of elegant code comes through abstraction and the judicious use of functions, not just through packing in a bunch of nested function calls. It may take a minute or two to grok, but it should ultimately provide a crisp, “ah-ha!” moment of understanding. Once you know the various components of the code, its correctness should be obvious. This can be aided by clear variable and function names, and carefully crafted comments that explain the code, rather than merely describe it.

In the New York Times, software engineer J. Bradford Hipps recently argued that “to write better code, [one should] read Virginia Woolf”:

    As a practice, software development is far more creative than algorithmic. The developer stands before her source code editor in the same way the author confronts the blank page.
    […] They may also share a healthy impatience for the ways things “have always been done” and a generative desire to break conventions. When the module is finished or the pages complete, their quality is judged against many of the same standards: elegance, concision, cohesion; the discovery of symmetries where none were seen to exist. Yes, even beauty.

This is the position we take in this book.

Now that we’ve dealt with the “elegant” part of the title, let’s come back to the “SciPy.” Depending on context, “SciPy” can mean a software library, an ecosystem, or a community. Part of what makes SciPy great is that it has excellent online documentation and tutorials, rendering Just Another Reference book pointless; instead, Elegant SciPy wants to present the best code built with SciPy.

The code we have chosen highlights clever, elegant uses of advanced features of NumPy, SciPy, and related libraries. The beginning reader will learn to apply these libraries to real-world problems using beautiful code. And we use real scientific data to motivate our examples.

Like SciPy itself, we wanted Elegant SciPy to be driven by the community. We’ve taken many of our examples from working code found in the wider scientific Python ecosystem, selecting them for their illustration of the principles of elegant code we outlined above.

Who Is This Book For?

Elegant SciPy is intended to inspire you to take your Python to the next level. You will learn SciPy by example, from the very best code.

Before starting, you should at least have seen Python, and know about variables, functions, loops, and maybe a bit of NumPy. You might have even honed your Python skills with advanced material, such as Fluent Python. If this doesn’t describe you, you should start with some beginner Python tutorials, such as Software Carpentry, before continuing with this book.
But perhaps you don’t know whether the “SciPy stack” is a library or a menu item from the International House of Pancakes, and you aren’t sure about best practices. Perhaps you are a scientist who has read some Python tutorials online, and have downloaded some analysis scripts from another lab or a previous member of your own lab, and have fiddled with them. And you might think that you are more or less alone when you learn to code SciPy. You are not.

As we progress, we will teach you how to use the internet as your reference. And we will point you to the mailing lists, repositories, and conferences where you will meet like-minded scientists who are a little further in their journey than you.

This is a book that you will read once, but may return to for inspiration (and maybe to admire some elegant code snippets!).

Why SciPy?

The NumPy and SciPy libraries make up the core of the Scientific Python ecosystem. The SciPy software library implements a set of functions for processing scientific data, such as statistics, signal processing, image processing, and function optimization. SciPy is built on top of NumPy, the Python numerical array computation library. Building on NumPy and SciPy, an entire ecosystem of apps and libraries has grown dramatically over the past few years, spanning a broad spectrum of disciplines that includes astronomy, biology, meteorology and climate science, and materials science, among others.

This growth shows no sign of abating. In 2014, Thomas Robitaille and Chris Beaumont documented Python’s growing use in astronomy. Here’s what we found when we updated their plot in the second half of 2016:

[Plot not included in this preview.]
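
The "Why SciPy?" section describes SciPy as a set of functions for tasks such as statistics and function optimization, built on top of NumPy arrays. The following is a minimal sketch of that relationship, not an example taken from the book: the data are synthetic, and the names `x`, `y`, `squared_error`, `fit`, and `result` are hypothetical and chosen only for illustration.

```python
import numpy as np
from scipy import optimize, stats

# Hypothetical data (an assumption for this sketch): noisy measurements
# scattered around the line y = 2x + 1.
rng = np.random.default_rng(seed=0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.size)

# Statistics: a least-squares line fit that operates directly on NumPy arrays.
fit = stats.linregress(x, y)


def squared_error(params):
    """Sum of squared residuals for a candidate (slope, intercept) pair."""
    slope, intercept = params
    return np.sum((y - (slope * x + intercept)) ** 2)


# Function optimization: find the same line by minimizing the squared error
# numerically, starting from an arbitrary initial guess.
result = optimize.minimize(squared_error, x0=[0.0, 0.0])

print("linregress:", fit.slope, fit.intercept)
print("minimize:  ", result.x)
```

Both routes take plain NumPy arrays as input and should agree closely on the slope and intercept, which is the sense in which SciPy's higher-level routines sit on top of NumPy's array type.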
