Charles University in Prague
Faculty of Mathematics and Physics

MASTER THESIS

Martin Brehovský

GPU Accelerated Adversarial Search

Department of Software and Computer Science Education
Supervisor of the master thesis: Mgr. Branislav Bošanský
Study programme: Informatics
Specialization: Software Systems
Prague, 2011

This thesis would not have been possible without the continuous support of my parents, who have always supported me and given me the will to succeed. I would like to thank my supervisor Branislav Bošanský for his support and advice. Finally, I thank all my friends who so patiently put up with me while I worked on my thesis.

I declare that I carried out this master thesis independently, and only with the cited sources, literature and other professional sources. I understand that my work relates to the rights and obligations under Act No. 121/2000 Coll., the Copyright Act, as amended, in particular the fact that Charles University in Prague has the right to conclude a license agreement on the use of this work as a school work pursuant to Section 60, paragraph 1 of the Copyright Act.

In Prague, 14.04.2011

Title (in Czech): Akcelerace adversariálních algoritmů s využitím grafického procesoru
Author: Martin Brehovský
Department / Institute: Department of Software and Computer Science Education
Supervisor of the master thesis: Mgr. Branislav Bošanský, Department of Cybernetics, Faculty of Electrical Engineering, Czech Technical University in Prague
Abstract: Modern programmable graphics chips make it possible to substantially accelerate computationally demanding algorithms. This technology, capable of massively parallel computation, significantly increases the performance of a large class of algorithms. This thesis focuses on the use of graphics processors (GPUs) to accelerate game tree search algorithms. We investigate whether these algorithms are suitable for the SIMD (single instruction, multiple data) type of parallelism that the GPU provides. To this end, parallel GPU versions of selected algorithms were compared with algorithms running on the CPU. The obtained results show a significant speed improvement and demonstrate the applicability of GPU technology in the domain of game tree search.
Keywords: graphics processor, adversarial algorithms, SIMD parallelism, game tree search

Title: GPU Accelerated Adversarial Search
Author: Martin Brehovský
Department / Institute: Department of Software and Computer Science Education
Supervisor of the master thesis: Mgr. Branislav Bošanský, Department of Cybernetics, Faculty of Electrical Engineering, Czech Technical University in Prague
Abstract: General purpose graphical processing units have proven to be useful for accelerating computationally intensive algorithms. Their capability to perform massively parallel computation significantly improves the performance of many algorithms. This thesis focuses on using graphical processors (GPUs) to accelerate algorithms based on adversarial search. We investigate whether or not adversarial algorithms are suitable for the single instruction, multiple data (SIMD) type of parallelism that the GPU provides. Therefore, parallel versions of selected algorithms accelerated by the GPU were implemented and compared with the algorithms running on the CPU. The obtained results show a significant speed improvement and prove the applicability of GPU technology in the domain of adversarial search algorithms.
Keywords: GPU computing, adversarial search, SIMD parallelism, game tree search

Table of Contents

1. Introduction
   1.1 Graphical processing units in high performance computing
   1.2 Researched domain
   1.3 Outline
2. CPU architecture evolution, problems, limitations and comparison to GPU
   2.1 Evolution of the CPU over the last 20 years
   2.2 Changes in CPU architecture
   2.3 Problems influencing performance of CPU based systems
   2.4 Overview of GPU architecture
   2.5 Summary
3. OpenCL
   3.1 Platform model
   3.2 Execution model
       3.2.1 NDRange
       3.2.2 Synchronization
       3.2.3 OpenCL execution flow
   3.3 Memory model
       3.3.1 Host to device memory transfer
   3.4 OpenCL C99 structure
   3.5 Recommendations and optimizations
   3.6 Alternatives
4. Adversarial search
   4.1 Multi-agent environments
   4.2 Games
   4.3 MiniMax algorithm
       4.3.1 Optimizations and heuristics
   4.4 Approaches to solve well known games
       4.4.1 Ultra-weakly, weakly and strongly solved games
       4.4.2 Tree search techniques
       4.4.3 Solved games
   4.4 Parallel approaches on CPU based systems
       4.4.1 Static parallel evaluation
       4.4.2 Tree decomposition
   4.5 Summary
5. GPU accelerated adversarial search algorithms
   5.1 Testbed description
   5.2 Sequential algorithms
       5.1.1 Sequential algorithm setup
       5.1.2 Evaluation function complexity
       5.1.3 Rules of tested game Fox and Hounds
   5.2 Applying GPU architecture
       5.2.1 Investigated attributes
       5.2.2 Analysis and design of benchmark scenarios
   5.3 Analysis of empirical results
       5.3.1 Parallel adversarial search on synthetic trees
       5.3.2 Benchmarks for tested game Fox and Hounds
   5.4 Summary
Conclusion
Bibliography
List of Tables
Attachments

1. Introduction

An increasing demand for ever-greater processing power has been one of the greatest challenges facing modern-day science and industry in recent years. Computers are used in computational science, allowing scientists to perform a variety of experiments. Currently, these scientists are bound first by the processing performance of the hardware available to them, and consequently by budgetary constraints that may inhibit the desired evolution of the hardware model.

The computational challenges cannot be faced so easily with traditional laboratory approaches. The overall result of an experiment may be beyond the scope of a conventional setup due to the complexity of the evaluated elements and executed computations. Detailed, precise measurements are difficult to obtain in common laboratory conditions, and hence computers themselves are effectively becoming the new laboratories. Simulations optimized and broken down into a huge number of simpler computations put an enormous load on computer systems. They can fully utilize what traditional central processing unit (CPU) based computers have to offer, but CPUs organized in clusters are the only alternative for significant performance gains. The investments required to construct new supercomputers are enormous, and therefore access to high-performance computing is limited for common research.

As described in [1], a notable example is represented by microscopic models of the structure of surfaces at the nano-scale, which cannot yet be characterized experimentally using available imaging techniques. Conditions that cannot be created in laboratories are also simulated over different time scales: a simulated setup can monitor, for example, every nanosecond of an experiment, or the evolution of soil water over thousands of years in response to climate change. In many fields, computer simulations are essential and cannot be replaced by experiments.
In multiple physics, medical and chemistry disciplines, highly complex programs are at the cutting edge of new discoveries. The most difficult boundary they struggle against is the performance limitation of the available computer systems. The evolution of performance tuning has reached a new era in which graphical processing units (GPUs), previously used only to process graphics, are now widely used to support such problems in a parallel manner. Developments in the GPU world now allow easier access to computational resources for a broader range of users by providing a more user-friendly interface.

1.1 Graphical processing units in high performance computing

The performance of the central processing unit, which grows according to Moore's law, sets the upper bound for high performance computing (HPC). Computations in science use ever more sophisticated algorithms and models, so the demand for better performance from HPC systems easily outruns Moore's law. The previous generation of scalable supercomputers relied on vendor-specific systems using high-performance proprietary microprocessors, proprietary interconnects and vendor system software. Such customized hardware is usually very expensive. One of the promising ways to increase processing power is to increase the granularity of parallelism in applications; utilising simultaneous multi-threading can exploit thread-level parallelism. As discussed in Chapter 2, the CPU itself is not designed to fully address the problems that HPC struggles with, and so new technologies must be evaluated.

As mentioned, up until a few years ago graphical processing units were employed for graphics only. Programming them was very difficult, data had to be mapped into textures, and only highly experienced programmers were able to exploit their potential power. In recent years, attempts to unify access to the GPU led to proprietary interfaces from the manufacturer NVIDIA; CUDA is the first example that provides high-level access for the common programmer, and this interface revealed the GPU's potential for massive, general purpose, parallel computing. OpenCL is an open standard for writing programs that execute across heterogeneous platforms including CPUs and GPUs. Suitable cases for the GPU are computationally intensive tasks that require high floating point performance on multiple independent data sets. Data parallelism supported by the GPU has its limitations and further
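To make the SIMD-style data parallelism described in Section 1.1 concrete, the short OpenCL host program below is a minimal illustrative sketch, not code from the thesis (OpenCL itself is covered in Chapter 3); the kernel name scale, the array size N and the scaling factor are arbitrary choices made for this example. Every work-item executes the same instruction stream on a different element of an independent data set, which is the pattern the GPU handles well.

/*
 * Minimal OpenCL 1.x host program: scales N floats on the first available
 * GPU device. Illustrative sketch only; error handling is reduced to the
 * bare minimum. Compile e.g. with: gcc scale.c -lOpenCL
 */
#include <stdio.h>
#include <CL/cl.h>

#define N 1024

/* Kernel source: each work-item multiplies one array element by `factor`. */
static const char *src =
    "__kernel void scale(__global float *data, const float factor) {\n"
    "    size_t gid = get_global_id(0);\n"
    "    data[gid] *= factor;\n"
    "}\n";

int main(void) {
    cl_int err;
    cl_platform_id platform;
    cl_device_id device;

    clGetPlatformIDs(1, &platform, NULL);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);

    cl_context ctx = clCreateContext(NULL, 1, &device, NULL, NULL, &err);
    cl_command_queue queue = clCreateCommandQueue(ctx, device, 0, &err);

    /* Independent data set: N floats initialised on the host. */
    float data[N];
    for (int i = 0; i < N; ++i) data[i] = (float)i;

    cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR,
                                sizeof(data), data, &err);

    cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, &err);
    clBuildProgram(prog, 1, &device, NULL, NULL, NULL);
    cl_kernel kernel = clCreateKernel(prog, "scale", &err);

    float factor = 2.0f;
    clSetKernelArg(kernel, 0, sizeof(cl_mem), &buf);
    clSetKernelArg(kernel, 1, sizeof(float), &factor);

    /* One work-item per element: the same instruction stream is applied to
       every element of the data set in parallel (SIMD-style). */
    size_t global_size = N;
    clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global_size, NULL,
                           0, NULL, NULL);
    clEnqueueReadBuffer(queue, buf, CL_TRUE, 0, sizeof(data), data,
                        0, NULL, NULL);

    printf("data[10] = %.1f (expected 20.0)\n", data[10]);

    clReleaseMemObject(buf);
    clReleaseKernel(kernel);
    clReleaseProgram(prog);
    clReleaseCommandQueue(queue);
    clReleaseContext(ctx);
    return 0;
}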
