Springer Series in Advanced Microelectronics 55 B. Sharat Chandra Varma Kolin Paul M. Balakrishnan Architecture Exploration of FPGA Based Accelerators for BioInformatics Applications Springer Series in Advanced Microelectronics Volume 55 Series editors Kukjin Chun, Seoul, Korea, Republic of (South Korea) Kiyoo Itoh, Tokyo, Japan Thomas H. Lee, Stanford, CA, USA Rino Micheloni, Vimercate (MB), Italy Takayasu Sakurai, Tokyo, Japan Willy M.C. Sansen, Leuven, Belgium Doris Schmitt-Landsiedel, München, Germany TheSpringerSeriesinAdvancedMicroelectronicsprovidessystematicinformation on all the topics relevant for the design, processing, and manufacturing of microelectronic devices. The books, each prepared by leading researchers or engineers in their fields, cover the basic and advanced aspects of topics such as wafer processing, materials, device design, device technologies, circuit design, VLSI implementation, and subsystem technology. The series forms a bridge between physics and engineering and the volumes will appeal to practicing engineers as well as research scientists. More information about this series at http://www.springer.com/series/4076 B. Sharat Chandra Varma Kolin Paul M. Balakrishnan (cid:129) Architecture Exploration of FPGA Based Accelerators for BioInformatics Applications 123 B. SharatChandra Varma M.Balakrishnan Department ofElectrical andElectronic Department ofComputer Science Engineering andEngineering TheUniversity of HongKong Indian Institute of Technology Delhi Hong Kong NewDelhi, Delhi Hong Kong India KolinPaul Department ofComputer Science andEngineering Indian Institute of Technology Delhi NewDelhi, Delhi India ISSN 1437-0387 ISSN 2197-6643 (electronic) SpringerSeries inAdvancedMicroelectronics ISBN978-981-10-0589-3 ISBN978-981-10-0591-6 (eBook) DOI 10.1007/978-981-10-0591-6 LibraryofCongressControlNumber:2016931344 ©SpringerScience+BusinessMediaSingapore2016 Thisworkissubjecttocopyright.AllrightsarereservedbythePublisher,whetherthewholeorpart of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission orinformationstorageandretrieval,electronicadaptation,computersoftware,orbysimilarordissimilar methodologynowknownorhereafterdeveloped. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publicationdoesnotimply,evenintheabsenceofaspecificstatement,thatsuchnamesareexemptfrom therelevantprotectivelawsandregulationsandthereforefreeforgeneraluse. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authorsortheeditorsgiveawarranty,expressorimplied,withrespecttothematerialcontainedhereinor foranyerrorsoromissionsthatmayhavebeenmade. Printedonacid-freepaper ThisSpringerimprintispublishedbySpringerNature TheregisteredcompanyisSpringerScience+BusinessMediaSingaporePteLtd. To the Almighty To my family and friends Preface Rapid advances in VLSI technology have enabled fabrication of billions of tran- sistors on a single chip. Technology scaling has allowed more than one processor core to be integrated in a single chip. The current computing systems, including desktops, laptops, and mobile phones have many-core processor chips. Software applications are parallelized to execute on multiple processing units/cores concur- rently to reduce the overall execution time. Although sophisticated compute sys- tems have been developed for fast execution, software applications belonging to certain domains take a very long time (days to months) for producing results. Bioinformatics is one such domain in which applications take a long time to execute.Itisalsoamultidisciplinaryresearchfieldconsistingofcomputerscience, mathematics, and statistics, which deals with storing, analyzing, and interpreting largebiologicaldata.Therecentadvancementsinbiologicalresearchhasresultedin generation of large amounts of digital data. Most of the bioinformatics algorithms are both data intensive and compute intensive. Speedups can be obtained using specialized hardware along with the existing processor architectures. General-purposeprocessors areoften augmented with hardwareacceleratorsfor compute intensive application to reduce the computation time. Typical hardware accelerators consist of a large number of small execution units which facilitate parallelexecution.Veryoftentheyrepresenttransformationofloopsinasequential code(temporaliteration)tospatialunrollingoftheloop(spatialiteration)toreduce thecomputationtimethroughconcurrentexecution.Someofthepopularhardware accelerators are GPUs, FPGAs, and CELL processors. The accelerators are pro- grammableandhencecanbeusedforavarietyofrelatedapplications.FPGA-based accelerators are known to be effective in speeding up certain kinds of applications when compared to other accelerators. Since FPGAs are configurable, they can be customized to implement a variety of processing elements as accelerators. But speedup may not always be possible as FPGAs run at slower clock frequency vis-a-visprocessorsandtheresourcesavailableintheFPGAmightnotbesufficient fortheimplementationofasignificantnumberofcopiesoftheprocessingelements. FPGAs with heterogeneous mix of coarse grained hard blocks along with vii viii Preface programmablesoftlogic,canfacilitateimplementationofamuchlargernumberof processing elements and thus achieve higher speedups. FPGAs have evolved and the hardware blocks useful for many applications are being implemented within the FPGA fabric as Hard Embedded Blocks (HEBs). Introduction of HEBs in FPGA can significantly increase the performance of FPGA-basedaccelerators.ModernFPGAscontainspecializedembeddedunitslike memory units, array multipliers, DSP computation units, etc. In fact many FPGAs have embedded processor cores. Based on the application a matching FPGA with therightHEBsischosen.ForexampleinXilinxVirtex-4FPGAs,theSX-serieshas only LUTs, in the LX-series there are DSP units as HEBs and the FX-series have Power-PC embedded in them. The user can choose the best suited FPGA archi- tecture according to his/her needs. It is easy to predict that many more such hard blocks will be embedded into future FPGAs. The evaluation approach to identify and incorporate HEBs is complex as there are many parameters and constraints such as area, granularity routing resources, etc., which need to be considered in an integrated manner to get efficient imple- mentation. Incorporating HEBs in FPGA involves a clear tradeoff as this may occupyasignificantareaandhencemayreducetheconfigurablelogic.Further,they may not be usable for many applications. On the other hand, they may give very significant speedups for certain applications. Clearly, the challenge is to identify kernels that are useful for a class of applications and justify designing customized HEBs that are effective in significantly speeding up an important class of appli- cations. There is a need to develop a methodology to explore the FPGA fabric designspacetoevaluatethenatureandnumberofHEBsbasedonotherconstraints. This book presents an evaluation methodology to design future FPGA fabrics incorporating hard embedded blocks to accelerate applications. This methodology is useful for selection of blocks to be embedded into the fabric and for evaluating the performance gain that can be achieved by such an embedding. The use of the methodologyisillustratedbydesigningFPGA-basedaccelerationoftwoimportant bioinformatics applications: Protein docking and Genome assembly. This book explainshowtherespectiveHEBsaredesignedandhowhardwareimplementation of the application is done using these HEBs. The impact of use of HEBs on accelerating these two applications is shown. The methodology presented in this book may also be used for designing HEBs for accelerating software implemen- tations in other domains as well. We thank Prof. Dominique Lavenier for letting us collaborate with the SYMBIOSElab,IRISA,Rennes,France.Wearealsogratefultohimforproviding us guidance on this project. We thank Pierre Peterlongo for his valuable inputs on Mapsembler. Hong Kong B. Sharat Chandra Varma New Delhi, India Kolin Paul New Delhi, India M. Balakrishnan Contents 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.1.1 Protein Docking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 1.1.2 Genome Assembly . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 1.2 DSE of Application Acceleration Using FPGAs with Hard Embedded Blocks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 1.2.1 Challenges in Design Space Exploration. . . . . . . . . . . . . . 6 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 2.1 Accelerator Architectures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 2.1.1 Challenges in Designing Accelerators . . . . . . . . . . . . . . . 10 2.2 FPGA-Based Acceleration. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 2.2.1 FPGA Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 2.2.2 FPGA-Based Accelerators . . . . . . . . . . . . . . . . . . . . . . . 13 2.2.3 Hard Embedded Blocks in FPGAs. . . . . . . . . . . . . . . . . . 14 2.3 Bioinformatics. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 2.3.1 Protein Docking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 2.3.2 Genome Assembly . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 2.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 3 Methodology for Implementing Accelerators. . . . . . . . . . . . . . . . . . 29 3.1 Design Space Exploration. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29 3.2 Tools for Carrying Out Design Space Exploration . . . . . . . . . . . . 32 3.2.1 Profiling. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32 3.3 Design of FPGAs with Hard Embedded Blocks. . . . . . . . . . . . . . 34 3.3.1 VPR Methodology. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34 3.3.2 VEB Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34 ix x Contents 3.4 Methodology for Designing FPGAs with Custom Hard Embedded Blocks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 3.4.1 Performance Estimation of Application Acceleration . . . . . 37 3.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37 4 FPGA-Based Acceleration of Protein Docking. . . . . . . . . . . . . . . . . 39 4.1 FTDock Application. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 4.1.1 Shape Complementarity . . . . . . . . . . . . . . . . . . . . . . . . . 41 4.1.2 Electrostatic Complementarity. . . . . . . . . . . . . . . . . . . . . 43 4.2 Profiling Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45 4.2.1 Need to Speedup. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45 4.2.2 Earlier Attempts to Speedup. . . . . . . . . . . . . . . . . . . . . . 46 4.3 Choice of Single Precision . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46 4.4 FPGA Resource Mapping. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47 4.5 Estimation Results. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50 4.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54 5 FPGA-Based Acceleration of De Novo Genome Assembly . . . . . . . . 55 5.1 Application. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55 5.1.1 Related Work with FPGA-Based Acceleration. . . . . . . . . . 57 5.2 Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58 5.2.1 Algorithm. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59 5.2.2 Algorithm to Architecture. . . . . . . . . . . . . . . . . . . . . . . . 62 5.3 Hardware Implementation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65 5.3.1 FASTA File to Bit File Converter. . . . . . . . . . . . . . . . . . 67 5.3.2 Processing Element Design. . . . . . . . . . . . . . . . . . . . . . . 68 5.3.3 Prefilter Design. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69 5.3.4 Extender Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70 5.4 Results and Discussion. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72 5.4.1 Resource Utilization and Operating Frequency . . . . . . . . . 72 5.4.2 Speedups over Software. . . . . . . . . . . . . . . . . . . . . . . . . 74 5.4.3 Quality. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77 5.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79 6 Design of Accelerators with Hard Embedded Blocks . . . . . . . . . . . . 81 6.1 Acceleration of FTDock Using Hard Embedded Blocks . . . . . . . . 81 6.1.1 Related Work. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81 6.1.2 FPGA Resource Mapping. . . . . . . . . . . . . . . . . . . . . . . . 82 6.1.3 Hard Embedded Block Design Space Exploration . . . . . . . 84 6.1.4 Results and Discussion. . . . . . . . . . . . . . . . . . . . . . . . . . 86
Description: