MULTI-CORE EMBEDDED SYSTEMS Embedded Multi-Core Systems Series Editors Fayez Gebali and Haytham El Miligi University of Victoria Victoria, British Columbia Multi-Core Embedded Systems, Georgios Kornaros MULTI-CORE EMBEDDED SYSTEMS Edited by Georgios Kornaros Boca Raton London New York CRC Press is an imprint of the Taylor & Francis Group, an informa business MATLAB® and Simulink® are trademarks of The MathWorks, Inc. and are used with permission. The Math- Works does not warrant the accuracy of the text of exercises in this book. This book’s use or discussion of MATLAB® and Simulink® software or related products does not constitute endorsement or sponsorship by The MathWorks of a particular pedagogical approach or particular use of the MATLAB® and Simulink® software. CRC Press Taylor & Francis Group 6000 Broken Sound Parkway NW, Suite 300 Boca Raton, FL 33487-2742 © 2010 by Taylor and Francis Group, LLC CRC Press is an imprint of Taylor & Francis Group, an Informa business No claim to original U.S. Government works Printed in the United States of America on acid-free paper 10 9 8 7 6 5 4 3 2 1 International Standard Book Number: 978-1-4398-1161-0 (Hardback) This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint. Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmit- ted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers. For permission to photocopy or use material electronically from this work, please access www.copyright. com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged. Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe. Library of Congress Cataloging-in-Publication Data Multi-core embedded systems / editor, Georgios Kornaros. p. cm. -- (Embedded multi-core systems) “A CRC title.” Includes bibliographical references and index. ISBN 978-1-4398-1161-0 (hard back : alk. paper) 1. Embedded computer systems. 2. Multiprocessors. 3. Parallel processing (Electronic computers) I. Kornaros, Georgios. II. Title. III. Series. TK7895.E42M848 2010 004.16--dc22 2009051515 Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com and the CRC Press Web site at http://www.crcpress.com Contents List of Figures xiii List of Tables xxi Foreword xxiii Preface xxv 1 Multi-Core Architectures for Embedded Systems 1 C.P. Ravikumar 1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 1.1.1 What Makes Multiprocessor Solutions Attractive? . . 3 1.2 Architectural Considerations . . . . . . . . . . . . . . . . . . 9 1.3 Interconnection Networks . . . . . . . . . . . . . . . . . . . . 11 1.4 Software Optimizations . . . . . . . . . . . . . . . . . . . . . 13 1.5 Case Studies . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 1.5.1 HiBRID-SoC for Multimedia Signal Processing . . . . 14 1.5.2 VIPER Multiprocessor SoC . . . . . . . . . . . . . . . 16 1.5.3 Defect-Tolerant and Reconfigurable MPSoC . . . . . . 17 1.5.4 Homogeneous Multiprocessor for Embedded Printer Application . . . . . . . . . . . . . . . . . . . . . . . . 18 1.5.5 General Purpose Multiprocessor DSP . . . . . . . . . 20 1.5.6 Multiprocessor DSP for Mobile Applications. . . . . . 21 1.5.7 Multi-Core DSP Platforms . . . . . . . . . . . . . . . 23 1.6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 Review Questions . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 2 Application-Specific Customizable Embedded Systems 31 Georgios Kornaros 2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 32 2.2 Challenges and Opportunities . . . . . . . . . . . . . . . . . 34 2.2.1 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . 35 2.3 Categorization . . . . . . . . . . . . . . . . . . . . . . . . . . 37 2.3.1 Customized Application-Specific Processor Techniques 37 v vi Table of Contents 2.3.2 Customized Application-Specific On-Chip Interconnect Techniques . . . . . . . . . . . . . . . . . . . . . . . . 40 2.4 Configurable Processors and Instruction Set Synthesis . . . . 41 2.4.1 Design Methodology for Processor Customization . . . 43 2.4.2 Instruction Set Extension Techniques . . . . . . . . . 44 2.4.3 Application-Specific Memory-Aware Customization . . 48 2.4.4 Customizing On-Chip Communication Interconnect . 48 2.4.5 Customization of MPSoCs . . . . . . . . . . . . . . . . 49 2.5 Reconfigurable Instruction Set Processors . . . . . . . . . . . 52 2.5.1 Warp Processing . . . . . . . . . . . . . . . . . . . . . 53 2.6 Hardware/Software Codesign . . . . . . . . . . . . . . . . . . 54 2.7 Hardware Architecture Description Languages . . . . . . . . 55 2.7.1 LISATek Design Platform . . . . . . . . . . . . . . . . 57 2.8 Myths and Realities . . . . . . . . . . . . . . . . . . . . . . . 58 2.9 Case Study: Realizing Customizable Multi-Core Designs . . . 60 2.10 The Future: System Design with Customizable Architectures, Software, and Tools . . . . . . . . . . . . . . . . . . . . . . . 62 Review Questions . . . . . . . . . . . . . . . . . . . . . . . . . . . 63 Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63 3 Power Optimization in Multi-Core System-on-Chip 71 Massimo Conti, Simone Orcioni, Giovanni Vece and Stefano Gigli 3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 72 3.2 Low Power Design . . . . . . . . . . . . . . . . . . . . . . . . 74 3.2.1 Power Models . . . . . . . . . . . . . . . . . . . . . . . 75 3.2.2 Power Analysis Tools . . . . . . . . . . . . . . . . . . 80 3.3 PKtool . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82 3.3.1 Basic Features . . . . . . . . . . . . . . . . . . . . . . 82 3.3.2 Power Models . . . . . . . . . . . . . . . . . . . . . . . 83 3.3.3 Augmented Signals . . . . . . . . . . . . . . . . . . . . 84 3.3.4 Power States . . . . . . . . . . . . . . . . . . . . . . . 85 3.3.5 Application Examples . . . . . . . . . . . . . . . . . . 86 3.4 On-Chip Communication Architectures . . . . . . . . . . . . 87 3.5 NOCEXplore . . . . . . . . . . . . . . . . . . . . . . . . . . . 90 3.5.1 Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . 91 3.6 DPM and DVS in Multi-Core Systems . . . . . . . . . . . . . 95 3.7 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100 Review Questions . . . . . . . . . . . . . . . . . . . . . . . . . . . 101 Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102 4 Routing Algorithms for Irregular Mesh-Based Network-on- Chip 111 Shu-Yen Lin and An-Yeu (Andy) Wu 4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 112 4.2 An Overview of Irregular Mesh Topology . . . . . . . . . . . 113 Table of Contents vii 4.2.1 2D Mesh Topology . . . . . . . . . . . . . . . . . . . . 113 4.2.2 Irregular Mesh Topology . . . . . . . . . . . . . . . . . 113 4.3 Fault-Tolerant Routing Algorithms for 2D Meshes . . . . . . 115 4.3.1 Fault-Tolerant Routing Using Virtual Channels . . . . 116 4.3.2 Fault-Tolerant Routing with Turn Model . . . . . . . 117 4.4 Routing Algorithms for Irregular Mesh Topology . . . . . . . 126 4.4.1 Traffic-Balanced OAPR Routing Algorithm . . . . . . 127 4.4.2 Application-Specific Routing Algorithm . . . . . . . . 132 4.5 Placement for Irregular Mesh Topology . . . . . . . . . . . . 136 4.5.1 OIP Placements Based on Chen and Chiu’s Algorithm 137 4.5.2 OIP Placements Based on OAPR . . . . . . . . . . . . 140 4.6 Hardware Efficient Routing Algorithms . . . . . . . . . . . . 143 4.6.1 Turns-Table Routing (TT) . . . . . . . . . . . . . . . 146 4.6.2 XY-Deviation Table Routing (XYDT) . . . . . . . . . 147 4.6.3 Source Routing for Deviation Points (SRDP) . . . . . 147 4.6.4 Degree Priority Routing Algorithm . . . . . . . . . . . 148 4.7 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151 Review Questions . . . . . . . . . . . . . . . . . . . . . . . . . . . 151 Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151 5 Debugging Multi-Core Systems-on-Chip 155 Bart Vermeulen and Kees Goossens 5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 156 5.2 Why Debugging Is Difficult . . . . . . . . . . . . . . . . . . . 158 5.2.1 Limited Internal Observability . . . . . . . . . . . . . 158 5.2.2 Asynchronicity and Consistent Global States . . . . . 159 5.2.3 Non-Determinism and Multiple Traces . . . . . . . . . 161 5.3 Debugging an SoC . . . . . . . . . . . . . . . . . . . . . . . . 163 5.3.1 Errors . . . . . . . . . . . . . . . . . . . . . . . . . . . 164 5.3.2 Example Erroneous System . . . . . . . . . . . . . . . 165 5.3.3 Debug Process . . . . . . . . . . . . . . . . . . . . . . 166 5.4 Debug Methods . . . . . . . . . . . . . . . . . . . . . . . . . 169 5.4.1 Properties . . . . . . . . . . . . . . . . . . . . . . . . . 169 5.4.2 Comparing Existing Debug Methods . . . . . . . . . . 171 5.5 CSAR Debug Approach . . . . . . . . . . . . . . . . . . . . . 174 5.5.1 Communication-Centric Debug . . . . . . . . . . . . . 175 5.5.2 Scan-Based Debug . . . . . . . . . . . . . . . . . . . . 175 5.5.3 Run/Stop-Based Debug . . . . . . . . . . . . . . . . . 176 5.5.4 Abstraction-Based Debug . . . . . . . . . . . . . . . . 176 5.6 On-Chip Debug Infrastructure . . . . . . . . . . . . . . . . . 178 5.6.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . 178 5.6.2 Monitors . . . . . . . . . . . . . . . . . . . . . . . . . 178 5.6.3 Computation-Specific Instrument . . . . . . . . . . . . 180 5.6.4 Protocol-Specific Instrument . . . . . . . . . . . . . . 181 5.6.5 Event Distribution Interconnect. . . . . . . . . . . . . 182 viii Table of Contents 5.6.6 Debug Control Interconnect . . . . . . . . . . . . . . . 183 5.6.7 Debug Data Interconnect . . . . . . . . . . . . . . . . 183 5.7 Off-Chip Debug Infrastructure . . . . . . . . . . . . . . . . . 184 5.7.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . 184 5.7.2 Abstractions Used by Debugger Software . . . . . . . 184 5.8 Debug Example . . . . . . . . . . . . . . . . . . . . . . . . . 190 5.9 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193 Review Questions . . . . . . . . . . . . . . . . . . . . . . . . . . . 194 Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194 6 System-Level Tools for NoC-Based Multi-Core Design 201 Luciano Bononi, Nicola Concer, and Miltos Grammatikakis 6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 202 6.1.1 Related Work . . . . . . . . . . . . . . . . . . . . . . . 204 6.2 Synthetic Traffic Models . . . . . . . . . . . . . . . . . . . . 206 6.3 Graph Theoretical Analysis . . . . . . . . . . . . . . . . . . . 207 6.3.1 Generating Synthetic Graphs Using TGFF . . . . . . 209 6.4 Task Mapping for SoC Applications . . . . . . . . . . . . . . 210 6.4.1 Application Task Embedding and Quality Metrics . . 210 6.4.2 SCOTCH Partitioning Tool . . . . . . . . . . . . . . . 214 6.5 OMNeT++ Simulation Framework . . . . . . . . . . . . . . 216 6.6 A Case Study . . . . . . . . . . . . . . . . . . . . . . . . . . 217 6.6.1 Application Task Graphs . . . . . . . . . . . . . . . . 217 6.6.2 Prospective NoC Topology Models . . . . . . . . . . . 218 6.6.3 Spidergon Network on Chip . . . . . . . . . . . . . . . 219 6.6.4 Task Graph Embedding and Analysis . . . . . . . . . 221 6.6.5 Simulation Models for Proposed NoC Topologies . . . 223 6.6.6 Mpeg4: A Realistic Scenario . . . . . . . . . . . . . . . 227 6.7 Conclusions and Extensions . . . . . . . . . . . . . . . . . . . 231 Review Questions . . . . . . . . . . . . . . . . . . . . . . . . . . . 234 Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235 7 Compiler Techniques for Application Level Memory Optimization for MPSoC 243 Bruno Girodias, Youcef Bouchebaba, Pierre Paulin, Bruno Lavigueur, Gabriela Nicolescu, and El Mostapha Aboulhamid 7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 244 7.2 Loop Transformation for Single and Multiprocessors . . . . . 245 7.3 Program Transformation Concepts . . . . . . . . . . . . . . . 246 7.4 Memory Optimization Techniques . . . . . . . . . . . . . . . 248 7.4.1 Loop Fusion . . . . . . . . . . . . . . . . . . . . . . . . 249 7.4.2 Tiling . . . . . . . . . . . . . . . . . . . . . . . . . . . 249 7.4.3 Buffer Allocation . . . . . . . . . . . . . . . . . . . . . 249 7.5 MPSoC Memory Optimization Techniques . . . . . . . . . . 250 7.5.1 Loop Fusion . . . . . . . . . . . . . . . . . . . . . . . . 251 Table of Contents ix 7.5.2 Comparison of Lexicographically Positive and Positive Dependency . . . . . . . . . . . . . . . . . . . . . . . . 252 7.5.3 Tiling . . . . . . . . . . . . . . . . . . . . . . . . . . . 253 7.5.4 Buffer Allocation . . . . . . . . . . . . . . . . . . . . . 254 7.6 Technique Impacts . . . . . . . . . . . . . . . . . . . . . . . . 255 7.6.1 Computation Time . . . . . . . . . . . . . . . . . . . . 255 7.6.2 Code Size Increase . . . . . . . . . . . . . . . . . . . . 256 7.7 Improvement in Optimization Techniques . . . . . . . . . . . 256 7.7.1 Parallel Processing Area and Partitioning . . . . . . . 256 7.7.2 Modulo Operator Elimination . . . . . . . . . . . . . . 259 7.7.3 Unimodular Transformation . . . . . . . . . . . . . . . 260 7.8 Case Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . 261 7.8.1 Cache Ratio and Memory Space . . . . . . . . . . . . 262 7.8.2 Processing Time and Code Size . . . . . . . . . . . . . 263 7.9 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 263 7.10 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 264 Review Questions . . . . . . . . . . . . . . . . . . . . . . . . . . . 265 Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 266 8 Programming Models for Multi-Core Embedded Software 269 Bijoy A. Jose, Bin Xue, Sandeep K. Shukla and Jean-Pierre Talpin 8.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 270 8.2 Thread Libraries for Multi-Threaded Programming . . . . . 272 8.3 Protections for Data Integrity in a Multi-Threaded Environ- ment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 276 8.3.1 Mutual Exclusion Primitives for Deterministic Output 276 8.3.2 Transactional Memory . . . . . . . . . . . . . . . . . . 278 8.4 Programming Models for Shared Memory and Distributed Memory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 279 8.4.1 OpenMP . . . . . . . . . . . . . . . . . . . . . . . . . 279 8.4.2 Thread Building Blocks . . . . . . . . . . . . . . . . . 280 8.4.3 Message Passing Interface . . . . . . . . . . . . . . . . 281 8.5 Parallel Programming on Multiprocessors . . . . . . . . . . . 282 8.6 Parallel Programming Using Graphic Processors . . . . . . . 283 8.7 Model-Driven Code Generation for Multi-Core Systems . . . 284 8.7.1 StreamIt. . . . . . . . . . . . . . . . . . . . . . . . . . 285 8.8 Synchronous Programming Languages . . . . . . . . . . . . . 286 8.9 Imperative Synchronous Language: Esterel . . . . . . . . . . 288 8.9.1 Basic Concepts . . . . . . . . . . . . . . . . . . . . . . 288 8.9.2 Multi-Core Implementations and Their Compilation Schemes . . . . . . . . . . . . . . . . . . . . . . . . . . 289 8.10 Declarative Synchronous Language: LUSTRE . . . . . . . . . 290 8.10.1 Basic Concepts . . . . . . . . . . . . . . . . . . . . . . 291 8.10.2 Multi-Core Implementations from LUSTRE Specifications . . . . . . . . . . . . . . . . . . . . . . . 291
Description: