RRoocchheesstteerr IInnssttiittuuttee ooff TTeecchhnnoollooggyy RRIITT SScchhoollaarr WWoorrkkss Theses 7-2016 PPaarrttiiaall AAbboorrttss ffoorr TTrraannssaaccttiioonnss vviiaa FFiirrsstt CCllaassss CCoonnttiinnuuaattiioonnss Matthew Le [email protected] Follow this and additional works at: https://scholarworks.rit.edu/theses RReeccoommmmeennddeedd CCiittaattiioonn Le, Matthew, "Partial Aborts for Transactions via First Class Continuations" (2016). Thesis. Rochester Institute of Technology. Accessed from This Thesis is brought to you for free and open access by RIT Scholar Works. It has been accepted for inclusion in Theses by an authorized administrator of RIT Scholar Works. For more information, please contact [email protected]. RochesterInstituteofTechnology PARTIALABORTSFORTRANSACTIONSVIAFIRSTCLASSCONTINUATIONS ATHESISSUBMITTEDTO THEFACULTYOFTHEGOLISANOCOLLEGEOFCOMPUTERANDINFORMATION SCIENCES INCANDIDACYFORTHEDEGREEOF MASTEROFSCIENCEINCOMPUTERSCIENCE DEPARTMENTOFCOMPUTERSCIENCE BY MATTHEWLE ROCHESTER,NEWYORK JULY2016 Partial Aborts for Transactions via First Class Continuations APPROVEDBY SUPERVISINGCOMMITTEE: Dr. MatthewFluet,Supervisor Dr. ArthurNunes-Harwitt,Reader Dr. ZackButler,Observer Table of Contents Acknowledgments vii Abstract viii CHAPTER 1 Introduction 1 2 BaselineSTMs 4 2.1 TL2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 2.2 TinySTM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 2.3 NOrec . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 3 Semantics 9 3.1 Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 3.2 PartialAbortOperationalSemantics . . . . . . . . . . . . . . . . . . . . . . . . . 10 3.3 LogValidation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 3.4 Equivalence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 4 Manticore 25 4.1 CompilerArchitecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 4.1.1 BOM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26 4.1.2 CPS,CFG,andHeap-AllocatedContinuations . . . . . . . . . . . . . . . . 27 iii 4.2 GarbageCollectionandHeapArchitecture . . . . . . . . . . . . . . . . . . . . . . 27 5 Implementation 29 5.1 Reads . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29 5.2 Writes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31 5.3 Commit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32 5.3.1 NOrecCommit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32 5.3.2 TL2Commit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32 5.3.3 TinySTMCommit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 5.4 ReadSetValidation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 5.5 RelationtoFormalModel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34 5.6 GarbageCollection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 5.7 BoundingContinuations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36 5.8 ChronologicallyOrderedReadSets . . . . . . . . . . . . . . . . . . . . . . . . . . 38 6 Evaluation 40 6.1 Benchmarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40 6.2 BenchmarkResults . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41 6.3 Throughput . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45 7 RelatedWork 47 8 Conclusion 50 References 52 APPENDIX A TL2andNOrecLogValidationRules 55 iv List of Figures 2.1 TL2Pseudocodeforputandget . . . . . . . . . . . . . . . . . . . . . . . . . . 5 2.2 TL2Pseudocodeforcommit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 2.3 TinySTMPseudocodeforputandget . . . . . . . . . . . . . . . . . . . . . . . 7 2.4 NOrecPseudocodeforputandget . . . . . . . . . . . . . . . . . . . . . . . . . 8 2.5 NOreccommit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 3.1 Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 3.2 OperationalSemantics( : commonrules) . . . . . . . . . . . . . . . . . 11 mode,stm ! 3.3 ReadRules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 3.4 WriteRules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 3.5 PartialandFullAbortSemantics . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 3.6 TinySTMTransactionalLogValidation . . . . . . . . . . . . . . . . . . . . . . . . 16 3.7 ThreadPoolWellFormedJudgement . . . . . . . . . . . . . . . . . . . . . . . . . 18 3.8 ReplaySemantics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 3.9 ThreadPoolAheadOfJudgement . . . . . . . . . . . . . . . . . . . . . . . . . . . 20 4.1 ManticoreHeapArchitecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28 5.1 LayoutofRead/Writesets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30 5.2 TL2LinkedListStats(FullAbortvs. PartialAbort) . . . . . . . . . . . . . . . . . 35 5.3 SkipListRepresentationofReadSet . . . . . . . . . . . . . . . . . . . . . . . . . 37 6.1 BenchmarkResults . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42 v 6.2 RedBlackTreePartialAbortPositions . . . . . . . . . . . . . . . . . . . . . . . . 44 6.3 HalfandHalfLinkedListBenchmark . . . . . . . . . . . . . . . . . . . . . . . . 45 7.1 CommonNestedTransactionalIdioms . . . . . . . . . . . . . . . . . . . . . . . . 48 A.1 NOrecTransactionalLogValidation . . . . . . . . . . . . . . . . . . . . . . . . . 55 A.2 TL2TransactionalLogValidation . . . . . . . . . . . . . . . . . . . . . . . . . . 56 vi ACKNOWLEDGMENTS I would like to express an enormous amount of gratitude towards my advisor, Dr. Matthew Fluet. If it was not for him, the work presented in this thesis would unquestionably not have come to fruition. I would also like to thank Dr. Arthur Nunes-Harwitt for serving as the reader for this thesisandDr. ZackButlerforservingastheobserver. Iamalsoeternallygratefulfortheloveand supportprovidedbyfamily,withoutthemIwouldmostcertainlynotbewhereIamtoday. vii ABSTRACT Softwaretransactionalmemory(STM)hasproventobeausefulabstractionfordevelopingconcur- rent applications where programmers denote transactions with an atomic construct that delimits acollectionofreadsandwritestosharedmutablereferences. Theruntimesystemthenguarantees that all transactions are observed to execute atomically with respect to each other. Traditionally, when the runtime system detects that one transaction conflicts with another, it aborts one of the transactions and restarts its execution from the beginning. This can lead to problems with both executiontimeandthroughput. This thesis presents a novel approach that uses first-class continuations to restart a conflicting transaction at the point of a conflict, avoiding the re-execution of any work from the beginning of the transaction that has not been compromised. In practice, this allows transactions to com- plete more quickly, decreasing execution time and increasing throughput. The ideas presented in this thesis have been implemented in the context of the Manticore project, an ML-family lan- guage with support for parallelism and concurrency. Crucially, this work relies on constant-time continuation capturing via a continuation-passing-style (CPS) transformation and heap-allocated continuations. The partial abort scheme has been implemented as a part of three modern STM implementations: TL2,TinySTM,andNOrec. Experimentalresultsshowthat,whilenobaseSTM implementation is universally best, each partial-abort implementation compares favorably to its full-abortcounterpart. In addition to an implementation, this thesis presents a formal semantics for partial aborts. A proof of correctness is given by relating the partial-abort semantics to an analogous full-abort semanticsviaasimulation. AllproofshavebeenformallyverifiedusingtheCoqTheoremProver. viii CHAPTER 1 INTRODUCTION Software transactional memory (STM) [40, 25] allows programmers to mark sections of code as transactional using an atomic language construct (or using suitable library support). The run- time system then guarantees that modifications of shared references within transactions happen atomically with respect to other concurrently running transactions. Using STM instead of other synchronization methods such as mutex locks substantially simplifies the development of concur- rentapplications,avoidingcommonpitfallssuchasdeadlocks. There are many different ways to enforce atomicity for STM. This work builds on multiple algorithms drawn from the STM literature. The general idea shared by all implementations is that when executing a transaction, the runtime system transparently maintains thread-local logs recording which references were read from and written to within a transaction. At the end of the transaction, the thread validates its log and if no conflicts are detected, it commits all of its writes totheglobalstore. Ifaconflictisdetected,thenitthrowsawaythelogsandrestartsthetransaction fromthebeginning. One issue that is under active research is that of fairness. Consider a situation where there are some threads executing long transactions and other threads that are executing short transactions that conflictwith thelong transactions. The threadsexecuting theshort transactions willcomplete sooner,givingthemahigherprobabilityofsuccessfullyvalidatingandcommitting. Thesecommits will then invalidate the long running transactions causing them to frequently abort. This issue has beenaddressedinthepastbyusingcontentionmanagers[42],butnotwithoutimposingsignificant overheads. In many compilers for functional languages, it is common to perform a continuation-passing- style (CPS) transformation to enable further optimizations. Additionally, it has been shown that continuations can be used to elegantly express concurrent programming [43, 41, 35] and serves as afundamentalcomponentoftheManticoreschedulinginfrastructure[19]. Thisworkmakesuseof 1
Description: