University of Kentucky UKnowledge

University of Kentucky Master's Theses
Graduate School
2006

CACHE OPTIMIZATION AND PERFORMANCE EVALUATION OF A STRUCTURED CFD CODE - GHOST

Anand B. Palki
University of Kentucky, [email protected]

Recommended Citation
Palki, Anand B., "CACHE OPTIMIZATION AND PERFORMANCE EVALUATION OF A STRUCTURED CFD CODE - GHOST" (2006). University of Kentucky Master's Theses. 363.
https://uknowledge.uky.edu/gradschool_theses/363

This Thesis is brought to you for free and open access by the Graduate School at UKnowledge. It has been accepted for inclusion in University of Kentucky Master's Theses by an authorized administrator of UKnowledge. For more information, please contact [email protected].

ABSTRACT OF THESIS

CACHE OPTIMIZATION AND PERFORMANCE EVALUATION OF A STRUCTURED CFD CODE - GHOST

This research focuses on evaluating and enhancing the performance of GHOST, an in-house, structured, 2D CFD code, on modern commodity clusters. The basic philosophy of this work is to optimize the cache performance of the code by splitting the grid into smaller blocks and carrying out the required calculations on these smaller blocks, which in turn improves code performance on commodity clusters. Accordingly, this work discusses and describes in detail two data-access optimization techniques: external blocking and internal blocking. These techniques have been tested on steady, unsteady, laminar, and turbulent test cases, and the results are presented. The critical hardware parameters that influenced the code performance were identified.
A detailed study investigating the effect of these parameters on the code performance was conducted, and the results are presented. The modified version of the code was also ported to current state-of-the-art architectures with successful results.

KEYWORDS: Cache Optimization, External Blocking, Internal Blocking, Structured CFD Code Optimization, Commodity Clusters

Anand B. Palki
12/15/2006

Copyright © Anand B. Palki, 2006

CACHE OPTIMIZATION AND PERFORMANCE EVALUATION OF A STRUCTURED CFD CODE - GHOST

By
Anand B. Palki

Dr. Raymond P. LeBeau
Director of Thesis

Dr. L. S. Stephens
Director of Graduate Studies

12/15/2005

RULES FOR THE USE OF THESIS

Unpublished theses submitted for the Master's degree and deposited in the University of Kentucky Library are as a rule open for inspection, but are to be used only with due regard to the rights of the authors. Bibliographical references may be noted, but quotations or summaries of parts may be published only with the permission of the author, and with the usual scholarly acknowledgements.

Extensive copying or publication of the thesis in whole or in part also requires the consent of the Dean of the Graduate School of the University of Kentucky.

A library that borrows this thesis for use by its patrons is expected to secure the signature of each user.

Name    Date

THESIS

Anand B. Palki

The Graduate School
University of Kentucky
2006

CACHE OPTIMIZATION AND PERFORMANCE EVALUATION OF A STRUCTURED CFD CODE - GHOST

THESIS

A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in Mechanical Engineering in the College of Engineering at the University of Kentucky

By
Anand B. Palki
Lexington, Kentucky

Director: Dr. Raymond P. LeBeau
Assistant Professor of Mechanical Engineering
University of Kentucky
Lexington, Kentucky
2006

To My Parents

ACKNOWLEDGEMENTS

The satisfaction and euphoria that accompany the successful completion of any task would be incomplete without mentioning the people who made it possible, whose constant encouragement and guidance crowned the effort with success.

I would like to express my sincere gratitude to my academic advisor, Dr. Raymond LeBeau, for his continuous support and encouragement throughout my work. This work would not have progressed without his guidance and numerous ingenious suggestions. I would like to thank Dr. P.G. Huang for helping me understand how a CFD code works. I would also like to thank my defense committee members, Dr. Jamey Jacob and Dr. M. Seigler, for taking the time to serve on my committee and evaluate my work.

I am grateful to my student colleagues, who made my stay at the University of Kentucky a memorable one. I am especially grateful to Abhishek T, Ajay Babu, Aditya C, Chaitanya Penugonda, Chetan Babu, Daniel R, Jacky Rhinehart, Karthik M, Narendra BK, Phanindra C, Radhika K, Sandeep B, Snehal P and Vijay N for their emotional support, entertainment, and care, and most importantly for being my surrogate family over the past two years.

Finally, I would like to thank my parents for their continuous encouragement and love, without which none of this would have been possible.
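The block-wise idea summarized in the abstract (splitting the grid into smaller blocks and carrying out the calculations block by block) can be illustrated with a generic loop-tiling sketch. The Python below is a minimal illustration under stated assumptions, not GHOST's external or internal blocking: it assumes a Jacobi-style 5-point averaging update on a uniform 2D array, and the names `sweep_full` and `sweep_blocked`, the block sizes `bi` and `bj`, and the test grid are all hypothetical. It shows only the change in traversal order (tile by tile instead of row by row) and that this reordering leaves the result unchanged. Interpreted Python will not exhibit the cache benefit itself, which appears in compiled code. GHOST appears to be a Fortran code (the compilers compared are G95 and IFC), and because Fortran stores arrays column-major, a Fortran version would keep the first index in the innermost loop.

```python
# Hypothetical sketch of cache blocking (loop tiling) on a 2D grid.
# Assumes a generic Jacobi-style 5-point average; this is NOT GHOST's solver.


def sweep_full(u):
    """One Jacobi-style sweep over all interior points, row by row."""
    ni, nj = len(u), len(u[0])
    unew = [row[:] for row in u]  # boundary values are copied unchanged
    for i in range(1, ni - 1):
        for j in range(1, nj - 1):
            unew[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1])
    return unew


def sweep_blocked(u, bi, bj):
    """The same sweep, but the interior is visited one bi x bj tile at a time."""
    ni, nj = len(u), len(u[0])
    unew = [row[:] for row in u]
    for ii in range(1, ni - 1, bi):  # first row of each tile
        for jj in range(1, nj - 1, bj):  # first column of each tile
            # Clip the last tile in each direction so block sizes need not divide the grid.
            for i in range(ii, min(ii + bi, ni - 1)):
                for j in range(jj, min(jj + bj, nj - 1)):
                    unew[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1])
    return unew


if __name__ == "__main__":
    grid = [[float((i * 31 + j * 17) % 101) for j in range(40)] for i in range(30)]
    same = sweep_full(grid) == sweep_blocked(grid, bi=8, bj=7)
    print("blocked result identical:", same)  # prints: blocked result identical: True
```

Because each updated value depends only on the previous sweep's array `u`, any tile size gives bit-identical results. With an in-place, Gauss-Seidel-type update, tiling would change the update order and therefore the intermediate values, so such a scheme would need separate treatment at the block interfaces.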
Table of Contents

Acknowledgements
List of Tables
List of Figures
List of Files
1. Introduction
  1.1 Overview
  1.2 Background - Memory Hierarchy
  1.3 Introduction to Problem
  1.4 Goals of Optimizing the Code
  1.5 Previous Work
    1.5.1 General Cache Optimization Techniques
      1.5.1.1 Techniques for Reducing Capacity Misses
      1.5.1.2 Techniques for Reducing Conflict Misses
      1.5.1.3 Techniques to Hide Effects of Cache Misses
      1.5.1.4 Techniques to Improve the Replacement Decisions by Cache
    1.5.2 Optimizations to CFD Codes
      1.5.2.1 Techniques to Improve Parallel Performance
      1.5.2.2 Techniques to Improve Single Node Performance
  1.6 External & Internal Blocking
2. Computational Tools
  2.1 General Description of GHOST
    2.1.1 GHOST Flowchart
    2.1.2 Governing Equations
    2.1.3 Calculation at Artificial Boundaries
  2.2 Grid File Data
    2.2.1 Finite Volume Method
    2.2.2 Generalized Coordinates
    2.2.3 Description of G.F90 Output
    2.2.4 Description of Input File
  2.3 Compilers & MPI Environment
  2.4 Valgrind [79]
  2.5 Kentucky Fluid Clusters
  2.6 Method Used to Measure Performance
  2.7 Summary
3. External Blocking Results
  3.1 Terminology
    3.1.1 Terms Related to Code Versions
    3.1.2 Terms Related to Performance Study Test Results Description
    3.1.3 Terms Related to Cache Behavior Study Test Results
  3.2 Test Case
  3.3 Types of Tests
  3.4 External Blocking
  3.5 Performance Test Results
    3.5.1 KFC4 Results
    3.5.2 KFC5 Results
    3.5.3 Rectangular Blocks
    3.5.4 Effects of Compiler Optimization Levels on Performance
    3.5.5 Effects of Different Compilers on Performance
    3.5.5 Effect of Different Hardware on Performance
    3.5.6 Steady Turbulent Case Performance Results
  3.6 Valgrind Results
    3.6.1 KFC4 Results
    3.6.2 Comparison between G95 & IFC
    3.6.3 Effect of Cache Thrashing
    3.6.4 Valgrind Results for Turbulent Case
  3.7 Accuracy Test Results
  3.8 Summary
4. Internal Blocking Results
  4.1 Basic Principle
  4.2 Implementation of Internal Blocking in GHOST
  4.3 Primary Tests
  4.4 Performance Test Results
    4.4.1 Comparison of Performance Test Results between External and Internal Blocking
  4.5 Valgrind Results
    4.5.1 Comparison of Valgrind Results between Internal and External Blocking
  4.6 Accuracy Test Results
  4.7 Summary
5. Unsteady Test Case Results
  5.1 Laminar Unsteady Test Case
  5.2 Performance Test Results
  5.3 Accuracy Test Results
  5.4 Summary
6. Conclusions and Future Work
  6.1 Summary and Conclusions
  6.2 Future Work
Appendix
  A.1 Steps to implement internal blocking to GHOST
References
Vita