ARCHER Hardware Overview and Introduction

Slides contributed by Cray and EPCC

Reusing this material

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
http://creativecommons.org/licenses/by-nc-sa/4.0/deed.en_US

This means you are free to copy and redistribute the material, and to adapt and build on the material, under the following terms: you must give appropriate credit, provide a link to the license and indicate if changes were made. If you adapt or build on the material you must distribute your work under the same license as the original.

Note that this presentation contains images owned by others. Please seek their permission before reusing these images.

Nodes: The building blocks

The Cray XC30 is a Massively Parallel Processor (MPP) supercomputer design. It is a distributed-memory system built from thousands of individual shared-memory nodes.

There are two basic types of nodes in any Cray XC30:
• Compute nodes
  • These only do user computation, and are also referred to as the "back-end".
• Service/Login nodes
  • e.g. the ARCHER "front-end": login.archer.ac.uk
  • These provide all the additional services required for the system to function, and are given additional names depending on their individual task:
    • Login nodes – allow users to log in and perform interactive tasks
    • PBS MOM nodes – run and manage PBS batch scripts
    • Service Database node (SDB) – holds system configuration information
    • LNET routers – connect to the external filesystem
• There are usually many more compute nodes than service nodes.

Differences between Nodes

Service/Login nodes
• The nodes you access when you first log in to the system.
• Run a full version of the CLE Linux OS (all libraries and tools available).
• Used for editing files, compiling code, submitting jobs to the batch queue and other interactive tasks.
• Shared resources that may be used concurrently by multiple users.
• There may be many service nodes in any Cray XC30, and they can be used for various system services (login nodes, IO routers, daemon servers).

Compute nodes
• The nodes on which production jobs are executed.
• Run Compute Node Linux, a version of the Linux OS optimised for running batch workloads.
• Can only be accessed by submitting jobs to a batch management system (PBS Pro on ARCHER).
• Exclusive resources, allocated (by PBS) to a single user at a time.
• There are many more compute nodes in any Cray XC30 than login/service nodes.

ARCHER Layout

Compute node architecture and topology

Cray XC30 Intel® Xeon® Compute Node

The XC30 compute node features:
• 2 x Intel® Xeon® sockets/dies
  • 12-core Ivy Bridge
  • QPI interconnect
  • Forms 2 NUMA regions
• 8 x 1833 MHz DDR3
  • 8 GB per channel
  • 64/128 GB total
• 1 x Aries NIC
  • Connects to the shared Aries router and wider network
  • PCI-e 3.0

[Diagram: Cray XC30 compute node – two 12-core Intel® Xeon® dies (NUMA Node 0 and NUMA Node 1), each with 32 GB of DDR3, linked by QPI; the Aries NIC attaches via PCIe 3.0 to the Aries router and the network]
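As a rough, hedged illustration of this layout (not part of the original slides), the short C program below uses OpenMP plus the Linux-specific sched_getcpu() call to report how many logical CPUs the OS exposes on a node and which CPU each thread is currently running on. On an ARCHER compute node one would expect to see 24 CPUs spread across the two NUMA regions. The file name and compiler flag are only examples.

/* Sketch: report the CPUs visible on a node and where each OpenMP
 * thread runs.  sched_getcpu() is a GNU/Linux extension.           */
#define _GNU_SOURCE
#include <stdio.h>
#include <sched.h>
#include <omp.h>

int main(void)
{
    /* Number of logical CPUs the OS presents on this node. */
    printf("Logical CPUs on this node: %d\n", omp_get_num_procs());

    #pragma omp parallel
    {
        /* Each thread reports the CPU it is scheduled on; the two
         * 12-core dies would typically appear as CPUs 0-11 and 12-23. */
        printf("Thread %2d of %2d on CPU %2d\n",
               omp_get_thread_num(), omp_get_num_threads(), sched_getcpu());
    }
    return 0;
}

Compiled with OpenMP enabled (e.g. "cc -fopenmp where_threads.c" under the GNU programming environment) and run on a compute node, this makes the 24-core, two-die structure of the node visible from user code.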
Terminology

• A node corresponds to a single Linux OS
  • on ARCHER, two sockets each with a 12-core CPU
  • all cores on a node see the same shared memory space
• ARCHER comprises many 24-core shared-memory systems
  • i.e. the maximum extent of an OpenMP shared-memory program is a single node
• Packaging into compute nodes is visible to the user
  • the minimum quantum of resource allocation is a node
  • a user is given exclusive access to all cores on a node
  • ARCHER resources are requested in multiples of nodes
• Higher levels (blade/chassis/group) are not explicitly visible
  • but may have performance impacts in practice

XC30 Compute Blade

ARCHER System Building Blocks

• Compute Blade: 4 compute nodes
• Chassis: 16 compute blades (64 compute nodes); Rank 1 network – backplane, no cables
• Group: 6 chassis in 2 cabinets (384 compute nodes); Rank 2 network – passive electrical
• System: 12 groups (4920 compute nodes); Rank 3 network – active optical
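Because ARCHER is a distributed-memory machine allocated in whole nodes, a program that spans more than one node must use message passing. As a hedged sketch (not from the original slides), the MPI program below simply reports which node each rank runs on; on ARCHER it would typically be compiled with the cc wrapper and launched onto compute nodes from a PBS batch script with aprun (for example "aprun -n 48 ./where_am_i" for two 24-core nodes; the program name and options here are illustrative and depend on the job script).

/* Sketch: each MPI rank reports the node it is running on.  Ranks on
 * the same node share memory; ranks on different nodes communicate
 * over the Aries network.                                            */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size, namelen;
    char name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(name, &namelen);

    printf("Rank %d of %d running on node %s\n", rank, size, name);

    MPI_Finalize();
    return 0;
}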
Cray XC30 Dragonfly Topology + Aries

Description: