Parallel and Concurrent Programming in Haskell Simon Marlow Parallel and Concurrent Programming in Haskell by Simon Marlow Copyright © 2013 Simon Marlow. All rights reserved. Printed in the United States of America. Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472. O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://my.safaribooksonline.com). For more information, contact our corporate/ institutional sales department: 800-998-9938 or [email protected]. Editors: Andy Oram and Maria Gulick Indexer: WordCo Indexing Services Production Editor: Melanie Yarbrough Cover Designer: Randy Comer Copyeditor: Gillian McGarvey Interior Designer: David Futato Proofreader: Julie Van Keuren Illustrator: Rebecca Demarest July 2013: First Edition Revision History for the First Edition: 2013-07-10: First release See http://oreilly.com/catalog/errata.csp?isbn=9781449335946 for release details. Nutshell Handbook, the Nutshell Handbook logo, and the O’Reilly logo are registered trademarks of O’Reilly Media, Inc. Parallel and Concurrent Programming in Haskell, the image of a scrawled butterflyfish, and related trade dress are trademarks of O’Reilly Media, Inc. Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and O’Reilly Media, Inc., was aware of a trade‐ mark claim, the designations have been printed in caps or initial caps. While every precaution has been taken in the preparation of this book, the publisher and author assume no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein. ISBN: 978-1-449-33594-6 [LSI] Table of Contents Preface. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix 1. Introduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 Terminology: Parallelism and Concurrency 2 Tools and Resources 3 Sample Code 4 Part I. Parallel Haskell 2. Basic Parallelism: The Eval Monad. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 Lazy Evaluation and Weak Head Normal Form 9 The Eval Monad, rpar, and rseq 15 Example: Parallelizing a Sudoku Solver 19 Deepseq 29 3. Evaluation Strategies. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31 Parameterized Strategies 32 A Strategy for Evaluating a List in Parallel 34 Example: The K-Means Problem 35 Parallelizing K-Means 40 Performance and Analysis 42 Visualizing Spark Activity 46 Granularity 47 GC’d Sparks and Speculative Parallelism 48 Parallelizing Lazy Streams with parBuffer 51 Chunking Strategies 54 The Identity Property 55 4. Dataflow Parallelism: The Par Monad. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57 iii Example: Shortest Paths in a Graph 61 Pipeline Parallelism 65 Rate-Limiting the Producer 68 Limitations of Pipeline Parallelism 69 Example: A Conference Timetable 70 Adding Parallelism 74 Example: A Parallel Type Inferencer 77 Using Different Schedulers 82 The Par Monad Compared to Strategies 82 5. Data Parallel Programming with Repa. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85 Arrays, Shapes, and Indices 86 Operations on Arrays 88 Example: Computing Shortest Paths 90 Parallelizing the Program 93 Folding and Shape-Polymorphism 95 Example: Image Rotation 97 Summary 101 6. GPU Programming with Accelerate. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103 Overview 104 Arrays and Indices 105 Running a Simple Accelerate Computation 106 Scalar Arrays 108 Indexing Arrays 108 Creating Arrays Inside Acc 109 Zipping Two Arrays 111 Constants 111 Example: Shortest Paths 112 Running on the GPU 115 Debugging the CUDA Backend 116 Example: A Mandelbrot Set Generator 116 Part II. Concurrent Haskell 7. Basic Concurrency: Threads and MVars. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125 A Simple Example: Reminders 126 Communication: MVars 128 MVar as a Simple Channel: A Logging Service 130 MVar as a Container for Shared State 133 MVar as a Building Block: Unbounded Channels 135 iv | Table of Contents Fairness 140 8. Overlapping Input/Output. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143 Exceptions in Haskell 146 Error Handling with Async 151 Merging 152 9. Cancellation and Timeouts. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155 Asynchronous Exceptions 156 Masking Asynchronous Exceptions 158 The bracket Operation 162 Asynchronous Exception Safety for Channels 162 Timeouts 164 Catching Asynchronous Exceptions 166 mask and forkIO 168 Asynchronous Exceptions: Discussion 170 10. Software Transactional Memory. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173 Running Example: Managing Windows 173 Blocking 177 Blocking Until Something Changes 179 Merging with STM 181 Async Revisited 182 Implementing Channels with STM 184 More Operations Are Possible 185 Composition of Blocking Operations 185 Asynchronous Exception Safety 186 An Alternative Channel Implementation 187 Bounded Channels 189 What Can We Not Do with STM? 191 Performance 193 Summary 195 11. Higher-Level Concurrency Abstractions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197 Avoiding Thread Leakage 197 Symmetric Concurrency Combinators 199 Timeouts Using race 201 Adding a Functor Instance 202 Summary: The Async API 203 12. Concurrent Network Servers. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205 A Trivial Server 205 Table of Contents | v Extending the Simple Server with State 209 Design One: One Giant Lock 209 Design Two: One Chan Per Server Thread 210 Design Three: Use a Broadcast Chan 211 Design Four: Use STM 212 The Implementation 213 A Chat Server 216 Architecture 217 Client Data 217 Server Data 218 The Server 219 Setting Up a New Client 219 Running the Client 222 Recap 223 13. Parallel Programming Using Threads. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225 How to Achieve Parallelism with Concurrency 225 Example: Searching for Files 226 Sequential Version 226 Parallel Version 228 Performance and Scaling 230 Limiting the Number of Threads with a Semaphore 231 The ParIO monad 237 14. Distributed Programming. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241 The Distributed-Process Family of Packages 242 Distributed Concurrency or Parallelism? 244 A First Example: Pings 244 Processes and the Process Monad 245 Defining a Message Type 245 The Ping Server Process 246 The Master Process 248 The main Function 249 Summing Up the Ping Example 250 Multi-Node Ping 251 Running with Multiple Nodes on One Machine 252 Running on Multiple Machines 253 Typed Channels 254 Merging Channels 257 Handling Failure 258 The Philosophy of Distributed Failure 261 A Distributed Chat Server 262 vi | Table of Contents Data Types 263 Sending Messages 265 Broadcasting 265 Distribution 266 Testing the Server 269 Failure and Adding/Removing Nodes 269 Exercise: A Distributed Key-Value Store 271 15. Debugging, Tuning, and Interfacing with Foreign Code. . . . . . . . . . . . . . . . . . . . . . . . . 275 Debugging Concurrent Programs 275 Inspecting the Status of a Thread 275 Event Logging and ThreadScope 276 Detecting Deadlock 278 Tuning Concurrent (and Parallel) Programs 280 Thread Creation and MVar Operations 281 Shared Concurrent Data Structures 283 RTS Options to Tweak 284 Concurrency and the Foreign Function Interface 286 Threads and Foreign Out-Calls 286 Asynchronous Exceptions and Foreign Calls 288 Threads and Foreign In-Calls 289 Index. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 291 Table of Contents | vii