The Alan Turing Institute Programme on Data Centric Engineering's Response to the National Infrastructure Commission's Second Call for Evidence

1. Introduction

Background
The Alan Turing Institute is the national institute for data science, with a mission to make great leaps in data science research to change the world for the better. The programme on data-centric engineering (DCE) will develop critical data analytic capabilities to address the challenges of improving the performance and resilience of engineering systems and the national interdependent infrastructure nexus. The evidence presented in this document is based on the Turing programme for DCE, which focuses on three grand challenge areas:
1. Resilience: Resilient and Robust Infrastructures,
2. Monitoring: Monitoring Safety of Complex Engineering Systems,
3. Design: Data Driven Engineering Design under Uncertainty.

2. Better Asset Management

Summary and Response to Questions:
The application of new data analytic methods in critical infrastructure (CI) asset management can significantly improve efficiency and consumer experience while reducing operating costs. The immediate technology priorities for efficient algorithm scalability and integration into asset management methods and tools include: sparse data combining methods, high-dimensional statistical inference models, and automated data wrangling. Potential barriers to rollout include: a) differing attitudes and readiness to adopt new technology due to geographic or political segregation among certain CI operators (e.g. water management), b) complexity and uncertainty in integrating new methods into large-scale existing practices, and c) data privacy restrictions. The latter two are of particular concern for large CI operators.
To overcome these barriers, we hope the government can encourage the uptake of new data-driven solutions by raising awareness of successful projects and through targeted investment in joint academic-industry research grants, with a focus on breaking down CI silos and unlocking data access restrictions. Certainly, the development of a national digital twin can help to: (1) overcome these barriers by bridging geographic and sectoral divides through linking interdependencies via a common model, (2) provide a framework for determining sensor locations, and (3) serve as a technology demonstrator for new tools. As such, the digital twin must connect disparate CI sectors and be open to the demonstration of new data analytic tools.

National Importance:
UK infrastructure is ageing and requires ever-increasing investment in maintenance and upgrades to maintain existing performance levels. In addition, infrastructure assets are characterised by long lives and complex deterioration modes; knowledge about the way these assets deteriorate over time, and about how that deterioration affects risks and asset performance, is patchy. In summary, today's infrastructure faces familiar and seemingly insurmountable problems: too little money, too many assets and increasing complexity. We now give examples of projects underway that use data analytics to improve engineering assets in critical infrastructure sectors:

• Flood Risk in London Underground: Transport infrastructure assets are at risk from changing environmental conditions, driven in part by the climate change observed in recent decades. In the UK, increased precipitation is leading to rising groundwater levels, presenting transport operators with performance issues associated with flooding of rail tracks and ballast. A NERC-funded feasibility study (NE/M007987/1) derived a groundwater rise vulnerability model for the cut-and-cover tube tunnels bearing on terrace gravel deposits in London Underground (LU).
The study sets out the mechanics of the seepage problem and, through a deterministic approach, identified upper and lower risk boundaries for a given range of groundwater level fluctuation [1]. The autonomous systems developed will enable cost-efficient continuous monitoring strategies to be put in place, and are especially valuable at network scale, where interconnected assets require simultaneous inspection to fully understand the risks. Once analysed, the data will provide a probabilistic understanding of the risks posed to asset performance through robust hazard-structure interaction modelling. This data-driven approach to asset management will provide LU with high-resolution performance statistics that reflect the risk model in place, allowing optimisation of drainage, ballast and track maintenance through predictive strategies not currently achievable with traditional inspection regimes. This in turn empowers LU to shape and optimise the resilience of these assets for the future.

• Self-Organisation in 4G Networks: The ICT industry is a leading producer of digital data and has for decades used its own data to automate asset management. In recent years, there has been growing recognition of the need to combine ICT data with new forms of social media and mobile data to create a stronger user-centric understanding of consumer demand and consumer experience [2]. To this end, joint academic and industrial initiatives are underway (EU H2020 project 778305, InnovateUK project 010734) to transfer state-of-the-art heterogeneous big data analytics and machine learning tools into applied ICT automation algorithms in critical industries such as 4G/5G mobile networks.
The analytical techniques involve high-dimensional statistical models using Gaussian Processes and Deep Learning to forecast heterogeneous data demand, as well as stochastic multi-armed bandit algorithms with performance guarantees to drive a range of automated asset management tasks across time scales (from millisecond resource assignment to daily asset adjustments). These form important building blocks for virtualising asset management and reducing OPEX in current and future (5G) networks.

• Railway Infrastructures: Asset management in the rail industry is critical. For example, in the financial year 2009/10, whole-industry costs totalled £12.7bn, of which over half was spent on maintenance, renewals and enhancements. Network Rail owns around 30,000 railway bridges, and plans to instrument these bridges are under consideration, which would create a data storage and analysis bottleneck. What is required are on-the-fly procedures that can be employed without storing all the data. A collaboration has been formed between the DCE programme at The Alan Turing Institute, the Cambridge Centre for Smart Infrastructure and Construction at the University of Cambridge, and Imperial College London to develop intelligent digital twins for two railway bridges (in collaboration with Laing O'Rourke Plc as part of the Staffordshire Alliance Improvements Programme, SAIP). The instrumentation of bridges has extended the hands-on assessment of a bridge's behaviour to include statistical data analysis. A statistical model will give an understanding of the stochastic nature of bridges and lead to an efficient monitoring system for predictive maintenance. The combined use of statistical analyses, big data, physical modelling and numerical modelling constitutes the main features of the digital twin. The real-world SAIP self-sensing bridges are serving as the training ground for validating the intelligent digital twins.
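The "on-the-fly" requirement described above, that is summarising a sensor stream without retaining every sample, can be met with one-pass streaming algorithms such as Welford's method for running mean and variance. The sketch below is illustrative only; the strain-gauge framing and the class name are our assumptions, not part of the SAIP instrumentation:

```python
class RunningStats:
    """One-pass (Welford) running mean and variance for a sensor stream.

    Each reading updates the summary in O(1) time and O(1) memory,
    so raw samples never need to be stored or revisited.
    """

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        """Fold one new reading into the running summary."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        """Sample variance of everything seen so far."""
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0


# Hypothetical strain-gauge readings; only the summary is retained.
stats = RunningStats()
for reading in [101.2, 99.8, 100.5, 102.1, 98.9]:
    stats.update(reading)
print(stats.mean, stats.variance)
```

Run over a network of instrumented bridges, summaries like this (or richer streaming sketches) would let anomaly thresholds be checked continuously without the storage bottleneck that archiving raw sensor data would create.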
The approach, if employed over a system of assets, enables asset managers to (1) develop a novel whole-system asset monitoring and maintenance capability and (2) obtain appropriate, accessible asset information that enables timely and cost-effective decision-making at different stages of the assets' lifetimes.

3. Smart Traffic Management

Summary and Response to Questions:
Smart traffic management (STM) has already begun to be deployed. There are three decisive challenges STM systems may have to tackle:
1. STM systems will need to collect and process real-time, high-quality data.
2. Increased demand for individual transportation will need to be offset by improvements in traffic flow density.
3. It will be necessary to improve the efficiency of movement via increased ride sharing and interchangeability between modes (i.e., improved coordination).

Access to relevant data is a key contemporary challenge for urban policymakers as they deal with ever-growing demand on public infrastructures and considerable financial constraints. As part of its work on data-centric engineering, the Oxford Internet Institute (OII), in conjunction with The Alan Turing Institute, has been conducting research into the deployment of open and social media data for facilitating smart urban management. The key aim of this strand of work is to find ways of enabling an Internet-of-Things-style awareness of the surrounding urban environment without the up-front costs and difficulty of installing large sensor grids (which are out of reach of all but the largest cities): instead, we are exploring ways of repurposing existing data created by third parties and by government itself. This creates what we have described as a "lightweight" smart city [3].
We now give examples of projects underway that use data analytics to improve traffic management in different sectors:

• Data Bias from Public and Social Data: The research frontier in this area concerns plugging gaps in traffic data with repurposed data and accounting for the resulting biases. Open data has the potential to change the way we collect and process transport data. One of the most promising projects is the open data platform created in the city of Manila. Easy Taxi, Grab and Le.Taxi – three ridesharing companies – in partnership with the World Bank, are sharing their drivers' GPS streams with the public under an open data licence. This Open Transport Partnership makes it possible for transport agencies to make real-time, evidence-based decisions at relatively low cost. Examples of recent work by the OII include the use of OpenStreetMap data to understand the spatial availability of alcohol [4], and Twitter data to understand local high-resolution commuting patterns [5]. From this work, we highlight two key findings. First, there are biases in the demographic makeup of the groups which contribute to open and social media platforms. Second, we have found that these biases are not so severe that they impede the extraction of reliable proxies, which were obtained in the cases of both alcohol availability and local commuting patterns.

• Demand Mitigation using Autonomous Vehicles: Autonomous cars may lead to higher congestion due to demand effects from the forecast substantial drop in the monetary cost of travelling by car. To mitigate these demand externalities, traffic efficiency gains will have to be maximised. Depending on the level of automation, substantial gains could come from smart lights. Recent work shows that a simple smart light system with human drivers could increase traffic flow efficiency by up to 200%. Autonomous cars make it possible to move from a traffic-flow-based system to a vehicle-level system.
This could substantially increase capacity and significantly reduce delays at intersections. Another way to respond to increased demand will be through improved coordination, specifically via ride sharing and better interchangeability between modes. The technological innovation of smartphones and the decreasing cost of computing have made it possible to share rides efficiently. Researchers at MIT have created a model that predicts the potential for ride sharing in any city. This potential, measured by the compatibility of individual mobility patterns in space and time, is shown to be substantial, with important implications for demand management.

4. Big Data

Summary and Response to Questions:
The effective use of big data requires stronger standards to make data accessible and usable. Currently, data from numerous sources will be in various states of readiness, and combining datasets and extracting value from them is an arduous task. This would be made easier by having defined and widely accepted standards for data structures, data labelling, data cleanliness and data-sharing methods. The Alan Turing Institute is working with industry and public bodies on the development of standards for data science, and on defining Data Readiness Levels to improve how big datasets are managed. Also vital is the widespread acceptance of appropriate data security procedures. Many companies are failing to protect vital infrastructure data through reliance on lax procedures or outdated hardware and software. Solutions to these problems exist, but are not being adopted widely enough.

Open Data:
The benefits of open data in the modern age are unprecedented, especially where it impacts public services. Open data and accessible APIs can lead to greater public awareness of and engagement with infrastructure, new services, greater safety and gains in efficiency.
It also opens the sector to greater innovation from data science firms, especially the UK's wealth of start-ups and SMEs in this space. However, the sector as a whole is unwilling to share data openly, and even private data-sharing agreements (B2B, collaborations with academia, etc.) can be difficult to arrange. The unwillingness to share data openly is largely a cultural issue stemming from conservatism in many parts of the infrastructure business sector. There is a fear of the implications of sharing data openly, particularly around legal ramifications, security considerations and loss of IP. Many of these fears stem from a lack of knowledge and experience in operating with open data. Possible solutions to address these issues include:
• Government guidance on data-sharing methods, including standards for ensuring the security of data structures and advice on adhering to legal restrictions around data protection and other data-related legislation.
• Flagship schemes or pilot projects to show the value and potential of data-sharing initiatives. These could build on existing schemes, such as the use of APIs by TfL for tube and bus services, which has led to a range of improvements for customers travelling by public transport.
• Financial incentives for firms which engage in open data sharing.
• Regulatory incentives which can nudge companies towards sharing data.

Digital Twin:
A national digital twin is important for providing modelling and forecasting for the UK's ageing infrastructure. It should provide a platform for using data and data science to validate and reinforce existing mathematical models of complex engineering systems and to assist in the development of new models.
Certainly, its development can help to: (1) overcome the barriers above by bridging geographic and sectoral divides through linking interdependencies via a common model, (2) provide a framework for determining sensor locations, (3) identify abnormal behaviour using machine learning techniques that do not expose the twin to adversarial attacks [6], and (4) serve as a technology demonstrator for new tools. As such, the digital twin must connect disparate CI sectors and be open to the demonstration of new data analytic tools. Only by bridging data (collection, analysis) and engineering knowledge, working with engineers and knowledge stakeholders, can a national digital twin help to manage both the data and the infrastructure in an efficient and reliable way.

References
[1] Stephenson, V. & D'Ayala, D. 2017. Assessing the Vulnerability of Historic Rail Tunnel Linings to Groundwater Rise. Quarterly Journal of Engineering Geology and Hydrogeology.
[2] Fan, C. et al. 2017. Learning-based Spectrum Sharing and Spatial Reuse in mm-wave Ultra Dense Networks. IEEE Transactions on Vehicular Technology.
[3] Voigt, C. & Bright, J. 2016. The Lightweight Smart City and Biases in Repurposed Big Data. In: Proceedings of HUSO, The Second International Conference on Human and Social Analytics.
[4] Bright, J., De Sabbata, S. & Lee, S. 2017. Geodemographic Biases in Crowdsourced Knowledge Websites: Do Neighbours Fill in the Blanks? Forthcoming in GeoJournal.
[5] McNeill, G., Bright, J. & Hale, S. 2016. Estimating Local Commuting Patterns from Geolocated Twitter Data. arXiv preprint arXiv:1612.01785.
[6] Quiring, E., Arp, D. & Rieck, K. 2017. Fraternal Twins: Unifying Attacks on Machine Learning and Digital Watermarking. arXiv preprint arXiv:1703.05561.

The Alan Turing Institute Programme for Data Centric Engineering would be pleased to provide further detail on any of the issues raised above, either in writing or by way of oral evidence.
This response was initiated by [name.redacted] and coordinated by [names.redacted]

NIC New Technology Second Call for Evidence: BIG DATA
Anglian Water Services Ltd.

Anglian Water is delighted to have the opportunity to respond to the NIC's second call for evidence on New Technology, as a supplement to the material we have already shared on our innovation work. We would welcome the opportunity to discuss any of the issues raised.

BIG DATA
How can we support the effective deployment of innovative data-based technologies in infrastructure? What issues are there around the collection, management, and use of infrastructure data, and what are the barriers to sharing data? What can government do to address these issues? What data challenges would be presented by a national digital twin?

This case study will consider the legislative, regulatory and cultural landscape, the quality and interoperability of data, and methods for promoting the secure sharing of data, focussing in particular on the energy sector.

10. What governance arrangements are needed to manage the huge amount of data being generated and used in the infrastructure industry and to encourage the effective deployment of data-based technologies in the infrastructure industry (e.g. need for agreed APIs)?

To support our data and information capabilities, Anglian Water has developed four integrated information strategies to guide our business: Data, Content, Mobile and Business Intelligence. We are now developing an Integrated Technology Strategy to align all of the strategy roadmaps, ensuring we develop a coherent plan that is deliverable, affordable, and meets all business requirements. An organisational capability plan is being developed in parallel, ensuring that we have the right leadership, governance, people, skills and communication in place to make this a sustainable and business-as-usual way of working for years to come.
All of these strategies and plans can be shared and discussed with the NIC on request.

Big data is often described along three dimensions, the 3Vs: Volume, Velocity and Variety. Within Anglian Water, our main high-volume data set comes from operational site SCADA and regional telemetry. This data set is not currently considered to be of high velocity, though its frequency will need to increase over time. Managing these high volumes of data for use in analytics is a major challenge. Over the next 18 months, Anglian Water (in line with our Business Intelligence strategy) plans to:
• Upgrade our Enterprise Data Warehouse (this will encompass time series, geospatial and unstructured data)
• Implement data integration technologies to enable real-time data analytics
• Implement best-of-breed reporting technologies to enable self-service reporting and analytics
• Provide a data exploration and discovery platform for predictive analytics, modelling and new data insights through predictive modelling and data science
• Implement a real-time data historian for telemetry and IOT (Internet of Things) time series data.

The Anglian Water Smart Infrastructure program is an example of another current strategy developed by the business to build a resilient communications network and operational technology (OT) platform to host OT data. Communicating operational data is a key success factor for big data within infrastructure owners. Over the next 5-10 years, as the use of sensors increases, so will the amount of data produced, and thus there is a need to build a scalable data capture solution. This gives rise to a clear need for a strong security governance structure, as the program relies on complex data integration and sharing.
The Internet of Things (IOT) is being considered as a way to allow effective communication between IT systems; however, one of the main challenges is that there is currently no established IOT network, and only a very limited range of devices, for the water industry. Many of the sensor devices we are trialling in our innovation Shop Window are sold as a service with a cloud platform which includes specialist analytics. We are now identifying how we integrate these into our own OT platform in the future, to allow those data sets to be used in wider business analytics through our historian and enterprise data warehouse.

Additionally, by the end of the current AMP period (2015-2020) we will have delivered a modern BI platform that provides high-quality, inter-connected information that people can access whenever they need it, and self-serve without the need to hand off to the IT department. We will create an enterprise operational data store that people can use to store locally generated transactional data. Over time, we will ensure that self-serve information is the easiest and most trusted source for insight, and remove the need for future data silos to be created.

Anglian Water also faces a major challenge in dealing with and aligning the large variety of data that must be handled. Often this data must be used for numerous purposes while coming from a variety of sources. It also covers many formats: database records across many functional domains, such as enterprise resource planning systems (e.g. SAP), images, geographical information systems, computer-aided design (CAD) drawings and paper-based records (both digital images and physical hard copy). Historically, these different data areas have been managed separately. Given the complex needs of the organisation, a governance structure that enables multiple domains of specialism (customer service, asset management, regulatory affairs, etc.) to understand their differing needs from other parts of the organisation is imperative.
It is, however, important to distinguish between data management and software solutions at an early stage. The former is a methodology and process that relies on the use of technology, which may take the shape of software, but it is not in itself software or technology. Anglian Water has defined a set of data management principles that can be found in our Data Strategy. These principles are the foundation of all new projects, serving as a guide and helping to control behaviour across multi-disciplinary teams. They should also form the foundation of any governance arrangements being developed around the issue of data management.

Any such governance structure must have a number of basic characteristics: it must be effective in the way it can share data through complex integration methods (often restricted by limited performance); there must be a standardisation of approaches with a clear focus on avoiding duplication; and there must be the capability to forecast using external and internal data. Our solutions include:
• Enterprise data architecture management – data definition, data relationships, metadata management, etc. – through open database technologies for data integration. This will readily enable integration with other external systems via industry-standard APIs, on premises and in the cloud.
• Data ownership – with a clear understanding that Anglian Water architects will ensure standardisation of data across the organisation internally, and a responsibility to provide organisation-wide guidance on the location of existing data sets.
• Data quality management – definition, standards and measures of quality.
• Data security management – for personal and sensitive data protection.
• Master data management – through authoritative 'golden' data stores in a single, centralised location for shared business-critical data, to drive down fragmentation of master data.
• Data retention and archiving – driven by policy, particularly in relation to existing data that has not yet been digitised.

In addition, Anglian Water plans to have a common set of self-serve tools with an open architecture to enable specialist teams to leverage niche analytical and visualisation tools to drive competitive advantage. We want to support our culture of collaboration and leverage new social capabilities to share knowledge and promote feedback on new information sources, so everyone can find them and maximise their potential. Data governance will ensure that data and system overlaps are investigated and identified at the enterprise level at an early stage, and that any potential synergies are realised. However, for all these solutions to be executed effectively, there is an underlying need to change mindsets within Anglian Water and right across the supply chain, as well as a willingness to embrace new technologies and to increase funding for data capture.

To commence implementing our data strategy, Anglian Water has set up an Enterprise Data Architecture team as part of existing Enterprise Architecture capabilities, using best practices brought to us through our IS Alliance partners. This will facilitate and coordinate the development of necessary data management capabilities and support governance. To prove the value of enhanced data analytics and, eventually, 'big data', Anglian Water has also set up a collaborative Data Science function, using internal and external resources from our IS Alliance partners. This team tackles complex analytical challenges, primarily to improve our service offering to customers and to improve our operational effectiveness and management of assets. It is real business-case benefits that will support and inspire the massive cultural changes required in digital transformation.
To further encourage the effective use of data-based technologies across our company and partner base, we see the ability to support mobile working and digital communications across the whole supply chain, with appropriate connectivity coverage, speed and data volumes, as a key success factor. If all our people cannot access the information they need, when and where they need it, data and information quality will never be improved and maintained.

11. What barriers are there to sharing data internally within systems and organisations and externally (e.g. through making data sets open to realise indirect value)? What can the government do to support the secure sharing of data in the infrastructure industry?

Many of Anglian Water's current data challenges stem from overly project-centric, system-centric or siloed business unit strategies and funding practices, and from a lack of standardisation (definitions, languages and formats). These practices effectively perpetuate the creation and development of data silos, both internally and externally. As the IT world evolves rapidly, and with the phased adoption of technologies such as IOT, there is often a disconnect between data capture formats and a severe lack of standardisation. This issue is then amplified when looking across the infrastructure industry as a whole, where different organisations and sub-industries have developed, and are developing, at varying speeds.

Barriers to sharing internally will be removed through delivering the enterprise data governance capabilities described above across our whole supply chain. As a specific example, Anglian Water currently has an asset management framework (AMF) which defines the information that needs to be captured for each asset type. This information standard may be a barrier if not aligned with national asset standards; however, it could potentially form the nucleus of any proposed standard.
Anglian Water is looking to improve and upgrade our service-orientated architecture (SOA) infrastructure and related processes. There is a programme underway to deliver our next generation of middleware and to update management processes, enabling us to react to changes quickly and to integrate any internal or external systems.

The government can assist by providing guidance and a platform to help interested parties develop standards for commonly required needs. A central hub/database to enable secure sharing of information (similar to Water Industry Market Reform through the Market Operator solution) is one way in which the government can aid effective data management across the national infrastructure industry. This will, however, also require cooperation between organisations to define data standards and APIs, and has security implications that need to be addressed, with mitigations agreed upon.

Regarding barriers to sharing data externally, we recognise major challenges around:
• Legal issues relating to conformance with the General Data Protection Regulation and customer privacy. We think there may be merit in the sector developing guidance collectively so that we all have a common understanding – and don't all reinvent the wheel.
• Cultural barriers – the need for incentives, fears regarding misuse of shared data, and building trust in what we share and how it is subsequently used. We need to recognise that failing to share data can result in a loss of trust.
• Technical issues – common standards, definitions, technical interfaces and external-facing portals. We need standard data protocols to allow us to share data and to facilitate innovation. Again, we think that companies can work together to establish these standards.

Anglian Water would welcome the opportunity to discuss these areas in more detail as we plan how to tackle all of these challenges.
To support our transformations in data sharing, we have noted the recent Ofwat report "Unlocking the value of customer data", produced with the Open Data Institute, and are identifying what we can release and how to support those principles. To further develop our plans for data sharing, we are actively engaging in a number of collaborative national groups. In the area of asset information, one specific group is the BIM4Water task group, part of the National Building Information Modelling (BIM) group. This has been set up to deliver the objectives of the Government Construction Strategy and the requirement to strengthen the public sector's capability in BIM implementation. The BIM4Water group has four priorities for action:
1. Build an evidence base and library of best-practice case studies to help build a business case for BIM in water
2. Develop a 'plan of works' and associated data drops for the water sector
3. Work with BIM4Manufacturers and other representative groups to influence the creation of BIM product libraries for the water sector
4. Work with the BIM Task Group to influence the business case for BIM adoption with senior representatives from client organisations

As part of this work, we are actively sharing details about our asset data, and about how we define, categorise and structure hierarchies, inventories and catalogues. We are also working with the IP3 knowledge transfer network for Infrastructure Client Organisations, whose aims are to support innovation and the sharing of ideas and information. Anglian Water is also collaborating with the Infrastructure Client Group's "Project 13: From Transactions to Enterprises", set up to create a community of infrastructure owners and suppliers committed to change and to unpicking the UK's productivity knot through new models of working.
All these initiatives will support our understanding of what we need to share and why, and of how to share data, technically and culturally, across our supply chains, across sectors, and in open data environments.

12. How can a national digital twin help to manage infrastructure data as an asset?

We view a "Digital Twin" as a bridge between the physical and digital worlds: an exact virtual representation of a physical 'thing', as if the system or product were looking in a mirror. It is important to recognise that a digital twin is not just a physical mirror but a cross-discipline "virtual mirror": not just a mechanical/geometric representation, but one that also includes the electronics, sensors, wiring, software, firmware, telemetry, etc., and certainly not just computer-aided design (CAD). We believe that digital twins will allow analysis of data and monitoring of systems to head off problems before they occur, prevent downtime, develop new opportunities, rehearse delivery of work and train staff, and even