NASA/TM—2017-219408

Application of Machine Learning to Rotorcraft Health Monitoring

Tyler Cody
Glenn Research Center, Cleveland, Ohio

Paula J. Dempsey
Glenn Research Center, Cleveland, Ohio

January 2017
Acknowledgments

The lead author would like to acknowledge and thank Herb Schilling, from the Scientific Applications and Visualization Team at the NASA Glenn Research Center, for providing the opportunity to work in a new area and serving as a mentor along the way. Without his support and guidance, this paper would not have been possible.

Level of Review: This material has been technically reviewed by technical management.

Available from

NASA STI Program
Mail Stop 148
NASA Langley Research Center
Hampton, VA 23681-2199

National Technical Information Service
5285 Port Royal Road
Springfield, VA 22161
703-605-6000

This report is available in electronic form at http://www.sti.nasa.gov/ and http://ntrs.nasa.gov/

Application of Machine Learning to Rotorcraft Health Monitoring

Tyler Cody*
National Aeronautics and Space Administration
Glenn Research Center
Cleveland, Ohio 44135

Paula J. Dempsey
National Aeronautics and Space Administration
Glenn Research Center
Cleveland, Ohio 44135

Summary

Machine learning is a powerful tool for data exploration and model building with large data sets. This project aimed to use machine-learning techniques to explore the inherent structure of data from rotorcraft gear tests, to determine relationships between features and damage states, and to build a system for predicting gear health for future rotorcraft transmission applications. Classical machine-learning techniques are difficult to apply to time-series data because many of them assume independence between samples. Two techniques were used to overcome this difficulty: (1) hidden Markov models were used to create a binary classifier for identifying scuffing transitions and (2) recurrent neural networks were used to leverage long-distance relationships in predicting discrete damage states. When the two were combined in a workflow, with the binary classifier acting as a filter for the fatigue monitor, the system demonstrated accuracy in damage state prediction and scuffing identification.

The time-dependent nature of the data restricted this project to collecting and analyzing data from the model selection process. The limited amount of available data could not provide valuable information, and the division into training and testing sets tended to heavily influence the scores of models across combinations of features and hyperparameters. This work built a framework for tracking scuffing and fatigue on streaming data. It demonstrates that machine learning has much to offer rotorcraft health monitoring through the use of Bayesian learning and deep machine-learning methods that capture the time-dependent nature of the data.

1.0 Introduction

Machine learning involves techniques for exploring data and building models. Without explicit definitions of relationships, these techniques can identify and leverage patterns across data sets.
The goal of this project was to apply machine learning to gear health monitoring by using previously generated data sets related to fatigue and scuffing failure modes. The aim of this application was to find out what can be learned about the data, to build a system for tracking gear damage, and to inform future work and data generation in the research area.

Beyond informing future work, there are further motivations for applying machine learning to this field. Rotorcraft and gear health testing is a data-intensive process. Data have been, are being, and will continue to be generated in large amounts, and machine learning offers a nontraditional approach for interpreting these data. Furthermore, rotorcraft and gear health data are complex. There are many relationships, both known and unknown, between the features of the data set. Any technique employed to evaluate these relationships must also take into account the time-dependent nature of the data. Thus, rotorcraft health monitoring represents a significant and specific challenge for testing the capabilities of machine learning.

*Lewis’ Educational and Research Collaborative Internship Project (LERCIP) internship.

2.0 Background and Data Set Description

2.1 Data-Generation Process

The data used in this project spanned multiple spiral bevel gear set (pinion and gear) tests. Tests were performed in the Spiral Bevel Gear Fatigue Test Rig at the NASA Glenn Research Center. A detailed description of this test facility is provided in References 1 and 2. The Spiral Bevel Gear Fatigue Test Rig is illustrated with a cross-sectional view in Figure 1. The facility operates as a closed-loop torque regenerative system, where the drive motor only needs enough power to overcome the losses within the system. The load is locked into the loop via a split shaft and a thrust piston that forces a floating helical gear axially into the mesh.
The 100-hp drive motor supplies the test rig with rotation and overcomes loop losses via V-belts to the axially stationary helical gear. Two sets of spiral bevel gears, referenced as left and right when facing the gearboxes, are installed in the gearbox. The concave side of the pinion is always in contact with the convex side of the gear on both the left and right sides. However, the pinion drives the gear in the normal speed-reducer mode on the left side, while the pinion acts as a speed increaser on the right side.

Both gear sets are lubricated by jets of qualified helicopter transmission oil pumped from an oil reservoir. The oil drains from the gearbox, flows through an inductance-type in-line oil debris sensor, then flows past a magnetic chip detector. A strainer and a 3-μm filter capture any debris before the oil returns to the gearbox.

Facility operational parameters, torque, speed, and gearbox oil temperatures were collected every minute with a facility data acquisition (DAQ) system. A commercially available noncontact rotary transformer shaft-mounted torque sensor was used to measure torque during testing. Oil inlet, outlet, and fling-off temperatures were measured with thermocouples. The fling-off temperature was measured inside the gearbox where the oil was flung off the gears at the out-of-mesh position.

Vibration, oil debris, torque, and speed data were also collected once every minute with Glenn’s research DAQ system, the Mechanical Diagnostic System Software (MDSS). The MDSS acquires, digitizes, and processes the tachometer pulses and accelerometer data. A new experiment is set up when a new gear set is installed on the left side of the test rig.

Figure 1.—Spiral Bevel Gear Fatigue Test Rig.

Figure 2.—Location of Mechanical Diagnostic System Software (MDSS) accelerometers.

Oil debris data were collected from an inductance-type oil debris sensor and a magnetic chip detector.
The inductance-type oil debris sensor was used to measure the ferrous debris generated during fatigue damage to the gear teeth. The MDSS recorded the number of particles and their approximate size on the basis of user-defined particle size ranges, or bins. The user-defined average particle size for each bin was used to calculate the cumulative mass: each particle was assumed to be a sphere with the bin's average diameter, and the resulting volume was multiplied by the density of steel. Reference 3 has a detailed analysis of the oil debris data generated during testing.

Vibration data were measured with accelerometers installed on the right and left sides of the test rig pinion support housings, radially and vertically with respect to the pinion, as shown in Figure 2. The accelerometers were referenced in the MDSS as belonging to the left gear set (pinion and gear) or the right gear set (pinion and gear), as seen when facing the gearboxes. Speed was measured with optical tachometers mounted on the left pinion shaft and left gear shaft to produce a separate once-per-revolution tachometer pulse for the pinion and the gear. Reference 4 provides additional details on the vibration data collected during these tests.

2.2 Gear Design

The gears tested were designed to represent a rotorcraft drive system gear mesh. To minimize scuffing and force a failure on the left-side gear set, several gear sets were super-finished (a process that improves the gear surface and extends gear life) and installed on the right side of the gearbox (Ref. 5). Surface roughness improved by a factor of 4 on average after this process was applied.

2.3 Gear Set Failure Modes

The failure mode planned for investigation was surface contact fatigue. It occurs when small pieces of material break off from the gear surface because the surface has been exposed to forces “exceeding the endurance limit of the material,” which produce pits on the contacting surfaces because of “surface and subsurface stressors” (Ref. 6).
The failure mode for these tests, defined by American Gear Manufacturers Association (AGMA) standards, was identified as AGMA class (contact fatigue), general mode (macropitting), and degree (progressive), in which pits are observed in different shapes and sizes greater than 0.04 in. in diameter (Ref. 7). Gear sets were tested until progressive macropitting was observed on a significant area of two or more gear or pinion tooth surfaces.

An unanticipated failure mode, scuffing, was also observed on some teeth during testing. Scuffing causes metal to transfer from one tooth surface to another without any substantial debris generation. Figure 3 illustrates the concept of fatigue as a progressive failure mode and scuffing as an immediate failure mode. Representative photographs of the two failure modes observed on the gear teeth during testing are also shown in Figure 3.

Figure 3.—Gear set failure modes.

TABLE I.—DATA AVAILABLE FOR ANALYSIS

Operational parameters (left gearbox only)
    Runtime, min
    Torque, in.-lb
    Left oil inlet temperature (LOI)
    Left fling-off temperature (LFO)
    Left oil outlet temperature (LOO)

Condition indicators for left gear (GL) and pinion (PL)
    Debris, mg
    GL RMS CI    PL RMS CI
    GL FM4 CI    PL FM4 CI
    GL SI1 CI    PL SI1 CI
    GL SI3 CI    PL SI3 CI
    GL M8A CI    PL M8A CI

Damage state
    PL or GL damage state/scale

2.4 Data Set Description

The data sets consisted of observed damage states, condition indicators, and operational data. Three data sets exemplified fatigue failure, and three exemplified scuffing failure. Table I lists the operational and condition indicator data available for analysis. The operational parameters, which were measured throughout each test, reflect the condition of the environment. Table II summarizes the failure modes observed on the gear teeth at each inspection during testing. In addition to oil debris, vibration-based condition indicators were calculated.
More specifically, vibration data were collected at sample rates that provided sufficient vibration data for calculating time-synchronous-averaged (TSA) data: vibration signal data were averaged over several revolutions of the shaft, in the time domain, to improve the signal-to-noise ratio (Ref. 8). From the TSA data, several gear condition indicators were calculated for this analysis: figure of merit 4 (FM4), root mean square (RMS), sideband index (SI), and M8A (Ref. 8). FM4, RMS, and M8A are common time-domain, statistically based vibration algorithms used in commercial health and usage monitoring systems (HUMS) (Ref. 9).

These operational parameters and condition indicators are referred to as features within the context of machine learning. Also, the data sets varied in size depending on failure mode, from ~300 samples to ~9000 samples, where the sample rate was 1 sample/min.

TABLE II.—FAILURE MODES OBSERVED DURING TESTS
[---, no damage; macro, macropitting.]

Inspection  Inspection interval (min)  Left gear 45        Right gear 50  Left pinion 45              Right pinion 50
1           Pre-test                   ---                 ---            ---                         ---
2           1 to 76                    ---                 ---            ---                         ---
3           76 to 324                  ---                 ---            ---                         ---
4           324 to 1370                ---                 ---            ---                         ---
5           1370 to 2120               ---                 ---            Macro 1 tooth               ---
6           2120 to 2403               ---                 ---            Macro 2 teeth               ---
7           2403 to 2833               ---                 ---            Macro 2 teeth               ---

Inspection  Inspection interval (min)  Left gear 15        Right gear 50  Left pinion 15              Right pinion 50
1           Pre-test                   ---                 ---            ---                         ---
2           1 to 63                    ---                 ---            ---                         ---
3           63 to 705                  ---                 ---            Macro 1 tooth               ---
4           705 to 1022                ---                 ---            Macro 2 teeth               ---
5           1022 to 1291               ---                 ---            Macro 2 teeth               ---

Inspection  Inspection interval (min)  Left gear 30        Right gear 50  Left pinion 30              Right pinion 50
1           Pre-test                   ---                 ---            ---                         ---
2           1 to 70                    ---                 ---            ---                         ---
3           70 to 1784                 ---                 ---            Micropitting                ---
4           1784 to 3270               ---                 ---            Micropitting                ---
5           3270 to 4633               Macro 1 tooth       ---            Micropitting                ---
6           4633 to 5359               Macro 1 tooth       ---            Micropitting                ---
7           5359 to 5962               Macro 2 teeth       ---            Macro 1 tooth               ---
8           5962 to 6037               Macro 2 teeth       ---            Macro 1 tooth               ---

Inspection  Inspection interval (min)  Left gear 20        Right gear 50  Left pinion 20              Right pinion 50
1           Pre-test                   ---                 ---            ---                         ---
2           1 to 70                    ---                 ---            ---                         ---
3           70 to 217                  Scuffing all teeth  ---            Scuffing/pitting all teeth  ---

Inspection  Inspection interval (min)  Left gear 40        Right gear 50  Left pinion 40              Right pinion 50
1           Pre-test                   ---                 ---            ---                         ---
2           1 to 63                    ---                 ---            ---                         ---
3           63 to 370                  Scuffing all teeth                 Scuffing all teeth

Inspection  Inspection interval (min)  Left gear 21        Right gear 19  Left pinion 21              Right pinion 19
1           Pre-test                   ---                 ---            ---                         ---
2           1 to 127                   ---                 ---            ---                         ---
3           127 to 307                 Scuffing all teeth  ---            Scuffing all teeth          ---
4           307 to 1122                Scuffing all teeth  ---            Macro 5 teeth               Edge wear
5           1122 to 1393               Scuffing all teeth  ---            Macro 6 teeth               Edge wear
6           1393 to 1568               Scuffing all teeth  ---            Macro 8 teeth               Edge wear
7           1568 to 1905               Macro 4 teeth       ---            Macro 10 teeth              Edge wear
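The time-synchronous averaging and two of the statistical condition indicators described in Section 2.4 can be sketched as follows. This is a minimal illustration, not the MDSS implementation. It assumes the vibration signal has already been resampled to a fixed number of samples per shaft revolution, and it uses a synthetic signal with a hypothetical 20-cycle-per-revolution tone. FM4 is conventionally the normalized kurtosis of a difference signal (the TSA with the regular gear mesh components removed); that filtering step is not shown here, so `normalized_kurtosis` is applied to whatever signal it is given. All function and variable names are illustrative.

```python
import numpy as np


def time_synchronous_average(signal, samples_per_rev):
    """Average a vibration signal over whole shaft revolutions (time domain)."""
    x = np.asarray(signal, dtype=float)
    n_revs = x.size // samples_per_rev
    if n_revs < 1:
        raise ValueError("signal is shorter than one revolution")
    # Drop any partial revolution, then average corresponding samples.
    revs = x[: n_revs * samples_per_rev].reshape(n_revs, samples_per_rev)
    return revs.mean(axis=0)


def rms(x):
    """Root mean square of a signal."""
    x = np.asarray(x, dtype=float)
    return float(np.sqrt(np.mean(x ** 2)))


def normalized_kurtosis(d):
    """Normalized fourth moment: N * sum(dev^4) / (sum(dev^2))^2."""
    d = np.asarray(d, dtype=float)
    dev = d - d.mean()
    return float(d.size * np.sum(dev ** 4) / np.sum(dev ** 2) ** 2)


# Synthetic data (assumption): one clean revolution plus additive noise.
samples_per_rev = 256
n_revs = 50
angle = 2.0 * np.pi * np.arange(samples_per_rev) / samples_per_rev
clean_rev = np.sin(20 * angle)  # hypothetical periodic tone

rng = np.random.default_rng(0)
raw = np.tile(clean_rev, n_revs) + rng.normal(0.0, 0.5, n_revs * samples_per_rev)

tsa = time_synchronous_average(raw, samples_per_rev)

# Noise that is not synchronous with the shaft averages out in the TSA.
print("error of one raw revolution:", rms(raw[:samples_per_rev] - clean_rev))
print("error of the TSA:           ", rms(tsa - clean_rev))
print("RMS of the TSA:             ", rms(tsa))
print("normalized kurtosis of TSA: ", normalized_kurtosis(tsa))
```

Each one-minute sample in the data sets would carry one value per indicator, which is how the indicators become columns (features) alongside the operational parameters.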
3.0 Approach

3.1 Introduction to Machine Learning

Machine learning is used in data analysis to automate analytical model building. The machine uses algorithms to learn from data iteratively to discover trends, identify patterns, and make predictions. This learning can be divided into two broad groupings: (1) supervised learning and (2) unsupervised learning (Ref. 10). In the typical case, supervised learning methods are used for building models for prediction, whereas unsupervised learning methods are used for data exploration.

In supervised learning, algorithms try to predict an output from a given input vector; the task takes the form of either regression or classification. A training data set is selected that has labeled target output data for given input data, and a model is then trained to find the mapping function for this relationship. That is, supervised learning requires a labeled set of data.

In unsupervised learning, algorithms try to discover a good internal representation of the input. In some cases this is used to create a useful representation of the data for subsequent supervised learning tasks, for example by identifying key features or parameters in the data. However, unsupervised learning techniques are also standalone tools that can explore patterns in data when the patterns to look for have not been explicitly defined.

Supervised and unsupervised learning define two ends of a spectrum. Many methods fall into a semisupervised category somewhere between the two and leverage the benefits of both learning techniques.

3.2 Classical Machine Learning

Many classical machine-learning algorithms can be grouped into one of a small number of categories. The categories, which span supervised and unsupervised techniques, are described in the following list to broadly highlight common techniques used in machine learning.
• Regression algorithms iteratively refine a model of the relationships between variables. This refining is performed by minimizing a measure of error in the predictions made by the model. Common algorithms are linear, logistic, and stepwise regression.
• Regularization algorithms penalize models on the basis of their complexity, thus selecting features internally as part of the model-building process. They are typically extensions of regression methods, of which two popular types are ridge regression and lasso regression.
• Clustering algorithms explore inherent structures in the input data to group them by their commonalities. These usually involve some form of distance calculation between points; a prime example is the k-means algorithm.
• Decision tree algorithms base decision models on the values of attributes in the input data. A given input datum follows a path down the tree until a prediction is reached at a leaf node. Decision tree methods are quick to build and fast in prediction.
• Ensemble algorithms leverage independently trained weaker models by using them together to create a more powerful, robust model. Popular ensemble methods include random forests and gradient-boosted forests, which leverage the speed of decision trees by using large numbers of these trees together in modeling.

3.3 Machine Learning and Time-Series Analysis

From the outset, gear health monitoring was known to be a time-dependent problem. That is, successive samples in a data set cannot be assumed to be independent. However, many of the classical machine-learning techniques make this assumption. In evaluating a sequence of points, a given temperature at time-step 0 may have a completely different significance than the same temperature at time-step 100. The algorithms listed in Section 3.2 are unable to capture this nature of the data out of the box.
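The independence assumption discussed in Section 3.3 can be made concrete with a small sketch. It fits an ordinary least-squares line (one of the regression methods from Section 3.2) to synthetic data, then fits it again after shuffling the samples in time. The data, the variable names (`temperature`, `damage`), and the drift rate are all assumptions made for illustration and are not taken from the test rig data. Because the fit is identical for any ordering, the model cannot use the position of a sample in the sequence, and the time trend it ignores is left in the residuals.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
t = np.arange(n)  # time step, one sample per step

# Synthetic data (assumption): a noisy temperature-like feature and a
# damage-like target that drifts upward with time.
temperature = 150.0 + rng.normal(0.0, 2.0, size=n)
damage = 0.01 * t + 0.05 * (temperature - 150.0) + rng.normal(0.0, 0.1, size=n)


def fit_linear(x, y):
    """Ordinary least-squares fit of y = a * x + b; returns array [a, b]."""
    design = np.column_stack([x, np.ones_like(x)])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


# Fit once in time order and once with the samples shuffled.
coef_ordered = fit_linear(temperature, damage)
perm = rng.permutation(n)
coef_shuffled = fit_linear(temperature[perm], damage[perm])

# The two fits agree up to round-off: sample order never entered the model.
print("same fit after shuffling:", np.allclose(coef_ordered, coef_shuffled))

# What the time-blind model misses shows up as a trend in its residuals.
a, b = coef_ordered
residuals = damage - (a * temperature + b)
print("correlation of residuals with time:", np.corrcoef(t, residuals)[0, 1])
```

The same reading of `temperature` early and late in the run receives the same prediction from this model, which is the limitation that motivates sequence-aware methods.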