
OpenCL framework for a CPU, GPU, and FPGA Platform by Taneem Ahmed A thesis submitted in ... PDF

85 Pages·2011·0.69 MB·English


OpenCL framework for a CPU, GPU, and FPGA Platform

by Taneem Ahmed

A thesis submitted in conformity with the requirements for the degree of Master of Applied Science, Graduate Department of Electrical and Computer Engineering, University of Toronto

Copyright © 2011 by Taneem Ahmed

Abstract

OpenCL framework for a CPU, GPU, and FPGA Platform
Taneem Ahmed
Master of Applied Science
Graduate Department of Electrical and Computer Engineering
University of Toronto
2011

With the availability of multi-core processors, high capacity FPGAs, and GPUs, a heterogeneous platform with tremendous raw computing capacity can be constructed consisting of any number of these computing elements. However, one of the major challenges for constructing such a platform is the lack of a standardized framework under which an application's computational task and data can be easily and effectively managed amongst the computing elements. In this thesis work such a framework is developed based on OpenCL (Open Computing Language). An OpenCL API and run time framework, called O4F, was implemented to incorporate FPGAs in a platform with CPUs and GPUs under the OpenCL framework. O4F helps explore the possibility of using OpenCL as the framework to incorporate FPGAs with CPUs and GPUs. This thesis details the findings of this first-generation implementation and provides recommendations for future work.

Dedication

To Mohsin - for all the inspiration

Acknowledgements

I would like to acknowledge all the support and guidance provided by my supervisor Prof. Paul Chow. His direction and feedback on this thesis have been invaluable. I also thank all the students in the program for their help, feedback, and friendship. Special thanks to my wife, my mother, and the rest of the family for all their support and patience. I greatly appreciate all the encouragement and guidance from Dr. Jason Anderson and Dr. Qiang Wang - the two great 'friend, philosopher and guide's I have been blessed with.
Contents

1 Introduction
  1.1 Motivation
  1.2 Research Contributions
  1.3 Thesis Outline
2 OpenCL Overview
  2.1 OpenCL Models
    2.1.1 Platform Model
    2.1.2 Execution Model
    2.1.3 Memory Model
    2.1.4 Programming Model
  2.2 OpenCL Application Flow
    2.2.1 Platform Layer
    2.2.2 Runtime Layer
3 Related Work
4 Heterogeneous Platforms Under the OpenCL Framework
  4.1 ICD Loader
  4.2 Application flow under ICD Loader
  4.3 Challenges of using OpenCL for Heterogeneous Platforms
    4.3.1 Synchronization
    4.3.2 Data Transfer
    4.3.3 Cluster Support
5 OpenCL For FPGA
  5.1 Application Flow using FPGAs
    5.1.1 OpenCL Code Compilation
  5.2 Flow Used in this Work
  5.3 Experimental Setup
  5.4 Software
    5.4.1 OpenCL API Library
    5.4.2 Device Driver
  5.5 Architecture for FPGAs
    5.5.1 Static Framework
    5.5.2 Kernel Organization
    5.5.3 Kernel Information
  5.6 Benefits of FPGAs
    5.6.1 Task Parallelism
    5.6.2 Data lifetime
    5.6.3 Resource Utilization
  5.7 Challenges of FPGAs
    5.7.1 FPGA Configuration
    5.7.2 FPGA Resource Estimation
6 Example Application
  6.1 Potential Application Types
    6.1.1 Iterative Process
    6.1.2 Task Parallel
    6.1.3 Other Considerations
  6.2 Example: Monte Carlo Simulation
    6.2.1 Reason for using Monte Carlo simulation
    6.2.2 Implementation
    6.2.3 Application Flow
  6.3 Analysis
    6.3.1 Observations
    6.3.2 Performance
7 Summary
  7.1 Future Work
A Implemented OpenCL API List
B Monte Carlo Kernel Execution
C Sobol Sequence Implementation
Bibliography

List of Tables

2.1 OpenCL Memory Model
5.1 BAR1 Offsets

List of Figures

1.1 OpenCL Framework Implementation
2.1 OpenCL Platform Model
2.2 An example 3D indexed kernel space
2.3 OpenCL Application Flow
4.1 Multiple OpenCL Implementations Under ICD Loader
4.2 Possible OpenCL Cluster
5.1 FPGA Design Overview
5.2 Actual Flow used in this work
5.3 UML Class Diagram of the API Library
5.4 Kernel Command
5.5 One Kernel Group with two Kernels
5.6 Two Kernel Groups with one Kernel each
6.1 Monte Carlo Simulation Flowchart
6.2 Components in Community Climate System Model
6.3 Possible application of the platform
6.4 Distribution of the Monte Carlo simulation tasks across three different architectures

Chapter 1

Introduction

The availability of multi-core CPUs, high capacity FPGAs and GPUs makes possible a heterogeneous platform with enormous computational capacity.
Previous research [4, 5, 8] has shown that each type of processor technology is ideally suited to implement specific types of functions. Thus an application with multiple compute-intensive segments would benefit from a heterogeneous platform consisting of different processor technologies. However, mass adoption of such platforms remains elusive due to the challenging task of programming for such heterogeneous platforms.

In the remainder of this chapter, Section 1.1 details the motivation of this research, Section 1.2 summarizes the contributions, and Section 1.3 provides the outline of this thesis.

1.1 Motivation

CPUs, GPUs, and FPGAs all have their own programming models that are very different from each other. Moving to a heterogeneous platform makes it even more difficult to present a unified programming model that works for all architectures. All of the existing heterogeneous platforms define their own programming paradigm and application development process. There is always a learning curve for the application developers to even

Description:
The OpenCL implementation packaged with the AMD-APP-SDK-v2.4-lnx64 is used.