ARCHER CSE Service Quarterly Report
Quarter 1 2016

1. Executive Summary

This report covers the period: 1 January 2016 to 31 March 2016 inclusive.

• Centralised CSE Team:
  o Along with users and service partners, we have identified key strategic priorities for the technical work of the centralised CSE team over the coming year to maximise the benefit of the CSE service to the user community (more information in Section 7 below).
  o A Jenkins Continuous Integration (CI) server has been set up and configured, and a small set of initial compiler tests has been implemented.
  o We have exchanged application usage data with the NERSC HPC site and are analysing the data to understand where the similarities and differences lie. This should help us explore how unique the ARCHER workload is and where we can best share information.

• Training:
  o We delivered 19 days (408 student-days) of face-to-face training in the quarter at 6 different locations, with an average feedback score better than “very good”.
  o We delivered 3 virtual tutorials as live interactive webinars, with an average of 21 attendees per session.
  o The schedule of virtual tutorials was altered from the initial plan to ensure users were kept up to date with the new features of a major ARCHER software upgrade (in February) and the associated availability of OpenMP 4.0 (in March).
  o The first training course to use the Research Data Facility (RDF) and associated Data Analytic Cluster (DAC) was successfully run in March to promote better data management on ARCHER.
  o The third follow-up survey on the longer-term impact of training has been completed and the results analysed; a short report will be sent to the training panel and EPSRC in April. The survey indicated that training has a strong impact on users' knowledge and use of HPC systems in general.
  o 36 people successfully completed the ARCHER “driving test” in Q1, 29 of whom have subsequently applied for and received a user account.
  o The ARCHER training programme was highlighted at the ARCHER Champions meeting, and we discussed how attendees can access training material and collaborate to broaden the reach of HPC training.
  o We are using PRACE funding to offer travel bursaries to the week-long ARCHER Summer School training event in July.

• ARCHER Outreach Project:
  o The first ARCHER Champions face-to-face meeting was held on 16-17 March 2016 in Edinburgh. The event was considered extremely useful by those who attended. It covered a broad range of topics to facilitate discussion of what the next step is for the ARCHER Champions programme.
  o The ARCHER Outreach team staffed a booth at the Big Bang Fair (13-19 March). The booth showcased three interactive activities: Wee ARCHIE, designing a supercomputer, and parallel sorting. We had around 6000 interactions with children over the 4 days.
  o We have been successful in our proposals to provide Women in HPC workshops at ISC16 and SC16; planning has now begun for both events.

• eCSE:
  o Of 59 projects accepted for the first 5 calls, 54 have started and 29 of these have now completed. 14 final reports have been received; more final reports are expected during the next quarter, to be reviewed at the eCSE08 panel meeting in late June/early July.
  o The 5 projects which have not yet started are from the most recent closed call (eCSE07), and are all due to start within the next few months.
  o A call had previously gone out for early career researchers to attend an eCSE Panel meeting as observers, with the aim of giving such researchers a better insight into the selection mechanism to assist them in preparing funding proposals. We received 17 proposals and, at a selection meeting on 25 January 2016, nine candidates were selected to attend Panels over the next year. One such candidate attended the eCSE07 meeting and found this a very positive experience.

2. Collaborations and Outputs Summary

• Presentations:
  o 18 Feb 2016, Andy Turner, Monitoring Application Usage on HPC Facilities using XALT and D3.js, HPC-SIG Meeting, University of Sheffield.
  o 21-23 Mar 2016, Oliver Henrich, Poiseuille Flow of Cholesteric Liquid Crystals, Joint Conference of the British and German Liquid Crystal Societies, Edinburgh.

• Meetings:
  o 23-24 Feb 2016, Andy Turner, PRACE WP6 F2F Meeting, EPCC, Edinburgh.
  o 1 Mar 2016, Neelofer Banglawala, UKCTRF Management Meeting, Newcastle.
  o 11 Mar 2016, Alan Simpson, HPC SAC Meeting, Imperial College, London.
  o 14 Mar 2016, Andy Turner, National e-Infrastructure Project Directors’ Group Meeting, Farr Institute, London.
  o 17 Mar 2016, Andy Turner, EPSRC Consortia Chairs Meeting, MRC, London.
  o 23 Mar 2016, Andy Turner, ARCHER RAP Meeting, EPSRC, Swindon.
  o 24 Mar 2016, Gavin Pringle, eCSE F2F Meeting, Aeronautics Department, University of Glasgow.
  o 29 Mar 2016, Gavin Pringle, eCSE F2F Meeting, Aeronautics Department, University of Glasgow.

3. Forward Look

• Parallel I/O Benchmarking:
  o Parallel I/O performance was identified as one of the key technical areas for the CSE team to work on over the coming year. In particular, we want to understand how standard I/O benchmarks (such as IOR) relate to I/O patterns in real applications, so that we can both monitor the I/O performance of ARCHER file systems in a realistic way and provide input to future procurements on how to specify I/O benchmarks.
  o We will work with user groups to identify a set of I/O benchmarks (both synthetic and real applications) that can be run across a variety of systems to provide an overview of parallel I/O technology performance. We are collaborating with other HPC services (JASMINE, DiRAC, Met Office), vendors and user communities to make this work as useful as possible.
  o We will provide specific technical advice for ARCHER users on how to maximise their I/O performance on the service based on different usage modes.

• Training:
  o One course attendee gave a score of “very bad” for a recent course. From their more detailed feedback we believe this was due to a misunderstanding over the course content, although we have reviewed the course publicity and it was explicit about the content. Feedback is currently completely anonymous, so we cannot follow this up any further. As a result, we are now including the option for attendees to enter contact details on the feedback form.
  o The new “Object Oriented Programming with Fortran” course was run for the first time. Although it attracted a relatively small class of 6 attendees, half of them gave it the highest rating of “excellent”, and we are investigating running this course again in Q3.
  o An online Q&A session on user issues with MPI has been scheduled for Q2. The outcomes of this virtual tutorial will be used to inform the content of the new course on “Developing Scalable Scientific Applications with MPI”, to be run in Q4.

• ARCHER Outreach:
  o A best practice paper on “How to improve the representation of women at conferences” is currently in preparation.
  o We will hold Women in HPC workshops at ISC16, SC16 and in collaboration with EuroMPI 2016.
  o The second hands-on porting and optimisation workshop will run on 13 May at Imperial College, London.
  o The outreach material for the Teacher and Outreach Ambassadors pack is now being prepared, and we anticipate releasing it later in 2016.

• eCSE:
  o An analysis of available funds will be completed during the next quarter to determine if more person months are available for the next two calls, over and above the minimum of 672.
  o We are currently focused on using data provided in the eCSE final reports to demonstrate the impact and benefits of the eCSE and ARCHER service.
  o The Panel meeting for each of the remaining 2 calls will be attended by up to 3 early career researchers, who will observe the process.

4. Contractual Performance Report

This is the contractual performance report for the ARCHER CSE Service for the Reporting Periods: January 2016, February 2016 and March 2016.

The metrics were specified by EPSRC in Schedule 2.2 of the CSE Service Contract.

CSE Query Metrics

• QE1: The percentage of all queries notified to the Contractor by the Help Desk in a Quarter that the Contractor responds to, and agrees a work plan with, the relevant End User within 3 working hours of receiving the notification from the Help Desk. Service Threshold: 97%; Operating Service Level: 98%.
• QE2: The percentage of all queries notified by the Help Desk to the Contractor that have been satisfactorily resolved or otherwise completed by the Contractor within a 4-month period from the date it was first notified to the Contractor. Service Threshold: 80%; Operating Service Level: 90%.
• TA1: The percentage of all technical assessments of software proposals provided to the Contractor by the Help Desk in any Service Period that are successfully completed by the Contractor within 10 days of the technical assessment being provided to the Contractor by the Help Desk. Service Threshold: 85%; Operating Service Level: 90%.
• FB1: The percentage of End User satisfaction surveys for CSE queries carried out in accordance with the Performance Monitoring System by the Contractor showing the level of End User satisfaction to be “satisfactory”, “good” or “excellent”. Service Threshold: 30%; Operating Service Level: 50%.

Period    Jan-16         Feb-16         Mar-16         Q1 2016
Metric    Perf.   SP     Perf.   SP     Perf.   SP     Perf.   Total
QE1       100%    -2     100%    -2     100%    -2     100%    -6
QE2       100%    -2     100%    -2     100%    -2     100%    -6
TA1       100%    -1     100%    -1     100%    -1     100%    -3
FB1       100%    -2     100%    -2     100%    -2     100%    -6
Total             -7             -7             -7             -21

Pink – Below Service Threshold
Yellow – Below Operating Service Level
Green – At or above Operating Service Level

Of the ten feedback ratings received on In-Depth queries, there were seven ratings of “Excellent” and three ratings of “Good”.
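
The colour key above can be read as a simple banding rule applied to each metric's two levels. The following is a minimal sketch of that rule, assuming performance is compared as a plain percentage; the names `Metric`, `METRICS` and `band` are hypothetical and are not taken from the contract or from any ARCHER tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    """Hypothetical container for the two levels quoted for each metric."""
    service_threshold: float        # percent
    operating_service_level: float  # percent


# Levels as quoted in the metric definitions above.
METRICS = {
    "QE1": Metric(97, 98),
    "QE2": Metric(80, 90),
    "TA1": Metric(85, 90),
    "FB1": Metric(30, 50),
}


def band(metric: Metric, performance: float) -> str:
    """Map a performance percentage to the colour key used in the tables."""
    if performance < metric.service_threshold:
        return "Pink"    # below Service Threshold
    if performance < metric.operating_service_level:
        return "Yellow"  # below Operating Service Level
    return "Green"       # at or above Operating Service Level


# Q1 2016 performance figures from the CSE query metrics table.
q1_performance = {"QE1": 100, "QE2": 100, "TA1": 100, "FB1": 100}
q1_bands = {name: band(METRICS[name], perf) for name, perf in q1_performance.items()}

for name, colour in q1_bands.items():
    print(f"{name}: {q1_performance[name]}% -> {colour}")
```

With the Q1 2016 figures every metric falls in the Green band. The sketch does not attempt to derive the service points (SP), since the report does not state how they are calculated.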

Training Metrics

• FB2: The percentage of all training satisfaction surveys carried out in accordance with the Performance Monitoring System by the Contractor in each Quarter that are rated “good”, “very good” or “excellent”. Service Threshold: 70%; Operating Service Level: 80%.

Period    Jan-16         Feb-16         Mar-16         Q1 2016
Metric    Perf.   SP     Perf.   SP     Perf.   SP     Perf.   Total
FB2       100%    -1     100%    -1     98%     -1     100%    -3
Total             -1             -1             -1             -3

Pink – Below Service Threshold
Yellow – Below Operating Service Level
Green – At or above Operating Service Level

Service Credits

Period                  Jan-16   Feb-16   Mar-16
Total Service Points    -8       -8       -8

5. CSE Queries

Queries Resolved in Reporting Period

Metric Descriptions

In-Depth                            All technical queries passed to ARCHER CSE team
Course Registration                 Requests for registration on ARCHER training courses or enquiries about registration
Technical Assessment: <Category>    Request for Technical Assessments of applications for ARCHER time
eCSE Application                    Queries relating to eCSE applications

A total of 403 queries were resolved by the CSE service in the reporting period.

Metric                           Jan-16   Feb-16   Mar-16   Total   % Total
In-Depth                         5        13       13       31      8%
Course Registration              79       121      103      303     75%
Technical Assessment: Grant      4        2        8        14      3%
Technical Assessment: RAP        2        16       1        19      5%
Technical Assessment: Instant    2        0        2        4       1%
eCSE Application                 9        23       0        32      8%

10 query feedback responses were received on In-Depth queries in the reporting period. This represents a 32% return rate for feedback forms.

Resolved In-Depth queries fell into the following categories:

Category                         Number of Queries   % Queries
3rd Party Software               14                  45%
User Programs                    5                   16%
Compilers and system software    2                   7%
Batch System and Queues          1                   3%
Other                            9                   29%

In-Depth Query Highlights

A small number of In-Depth queries have been selected to illustrate the work of the centralised CSE team over the report period.

Q740265: Query - Job Activity

An experienced user and application developer was having issues with their application (CONQUEST, linear scaling DFT) not updating output files once a certain point in the simulation was reached. Debugging the issue was non-trivial because the application did not crash at the problem point but instead kept running, so it had to be stopped manually to try to identify where the problem was coming from. The same calculation also ran fine on another HPC system, so it was unclear whether the issue was with the application or with ARCHER software/hardware. The CSE team identified a subtle parallel bug in the code that was exposed only on ARCHER, and not on the other HPC system, due to differences in the parallel runtime. A fix was provided to the user and was incorporated into the application for all users on ARCHER and beyond.

Q741917: CP2K Segmentation Fault Error

A user was seeing a crash in CP2K when they used a specific combination of settings for geometry optimisation calculations using Auxiliary Density Matrix Methods (ADMM). Analysis by the CSE team revealed that the combination of options used in the calculations was invalid but that CP2K was not identifying it as such. The CP2K code was updated to catch this case and print a useful error message (rather than just crashing), and the user was advised on alternative functionality in CP2K that could be used for the calculations. The changes to CP2K were implemented in the version on ARCHER and fed back into the main source code to benefit all CP2K users worldwide.

Q743669: Install Fortran77 Code on ARCHER

A new user was having difficulty compiling their CFD application on ARCHER. After discussion with the user we resolved the issues they were having, which were primarily due to lack of experience with the ARCHER application development environment. We also improved the performance of the code by showing the user how to use system versions of key numerical libraries rather than their own versions. Finally, we provided information on how to future-proof the application with regard to the PETSc library, as they were using an obsolete mechanism for including the PETSc routines. Both additional pieces of advice will allow the application to be more portable and better performing on other HPC systems as well as on ARCHER.

In-Depth Query Analysis

The histogram below shows the time to resolution for In-Depth queries in the current reporting period. The median resolution time during this period is 2 weeks (median resolution time since 1 Jan 2014 is 2 weeks).

[Figure: Plot of the number of In-Depth queries received per quarter. x-axis: Quarter; y-axis: Number of Queries Received (0 to 120).]
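
The totals, percentage shares and feedback return rate quoted for the queries resolved in this section follow directly from the monthly counts. The short script below is an illustrative sketch for re-deriving them, assuming simple rounding to the nearest whole percent; the variable names are hypothetical and the script is not part of the CSE service's reporting tooling.

```python
# Monthly counts (Jan-16, Feb-16, Mar-16) copied from the resolved-queries table.
resolved = {
    "In-Depth": (5, 13, 13),
    "Course Registration": (79, 121, 103),
    "Technical Assessment: Grant": (4, 2, 8),
    "Technical Assessment: RAP": (2, 16, 1),
    "Technical Assessment: Instant": (2, 0, 2),
    "eCSE Application": (9, 23, 0),
}

# Quarterly total per query type and overall.
totals = {name: sum(counts) for name, counts in resolved.items()}
grand_total = sum(totals.values())

# Share of all resolved queries, rounded to the nearest whole percent
# (rounding rule is an assumption; the report does not state one).
shares = {name: round(100 * total / grand_total) for name, total in totals.items()}

# Feedback return rate: responses received relative to In-Depth queries resolved.
feedback_responses = 10
return_rate = round(100 * feedback_responses / totals["In-Depth"])

for name in resolved:
    print(f"{name:<32}{totals[name]:>5}{shares[name]:>5}%")
print(f"Total resolved: {grand_total}")
print(f"Feedback return rate: {return_rate}%")
```

Keeping the monthly counts as the single source of data means the "Total" and "% Total" columns cannot drift out of step with them when the table is updated for the next quarter.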