Health Management System Design: Development, Simulation and Cost/Benefit Optimization¹

Gregory J. Kacprzynski
Michael J. Roemer
Impact Technologies, LLC
125 Tech Park Drive
Rochester, NY 14623
585-424-1990
[email protected]

Andrew J. Hess
NAWCAD
22195 Elmer Road, Building 106, Suite 228
Patuxent River, MD 20670-1534

Abstract: This paper provides an update to [1] on the developments associated with a Prognostics and Health Management (PHM) system design tool that integrates a model-based FMECA methodology with state-of-the-art system simulation directly linked to downstream Life Cycle Costs (LCC). This design tool will seek out recommended PHM system designs based on a cost function that accurately represents key LCC variables such as system availability, maintainability, reliability, and failure mode observability. The tool will be capable of assessing PHM sensor requirement specifications at the component and subsystem levels, and will then allow for integration into a broader integrated system model. Tradeoff, sensitivity and “what if” analysis will then allow the designer/user to examine the cost/benefit relationship of either adding or removing sensors and algorithms under consideration for the PHM design. An interactive database of existing PHM technologies for specific applications will also be accessible within the design tool for suggesting sensors/algorithms for monitoring various system parameters. Finally, the approach introduces a collaborative, web-enabled environment for enhanced realization and virtual simulation of PHM system design. A simplified example of a Health Management system cost/benefit analysis on an aircraft electromechanical valve is provided for illustration of the concepts introduced.

TABLE OF CONTENTS

1. INTRODUCTION
2. ROLE OF FMECA IN HEALTH MANAGEMENT
3. APPROACH TO HEALTH MANAGEMENT DESIGN
4. FUNCTIONAL BLOCK DIAGRAM
5. ENHANCED FMECA
6. RESPONSE MODELS
7. HEALTH MANAGEMENT ATTRIBUTES
8. COST FUNCTION
9. COLLABORATIVE DESIGN ENVIRONMENT
10. HM DESIGN EXAMPLE
11. CONCLUSION

¹ 0-7803-7231-X/01/$10.00/© 2002 IEEE

1. INTRODUCTION

The application of “health” or “condition” monitoring systems serves to increase the overall reliability of a system through judicious application of intelligent monitoring technologies. A consistent health management philosophy integrates the results from the health monitoring system for the purposes of optimizing operations and maintenance practices through 1) prediction, with confidence bounds, of the Remaining Useful Life (RUL) of critical components, and 2) isolating the root cause of failures after the failure effects have been observed. If RUL predictions can be made, the allocation of replacement parts or refurbishment actions can be scheduled in an optimum fashion to reduce the overall operational and maintenance logistic footprints. Fault isolation is a critical component in maximizing system availability and minimizing downtime through more efficient troubleshooting efforts.

Aside from general exceedance warnings/alarms, health monitoring initiatives mostly take place after in-field failures (and substantial costs) have been incurred. To address this issue, this paper proposes the concept of a Health Management Virtual Test Bench, a software tool that is used not only for health monitoring system design but also for system validation, managing inevitable changes from in-field experiences, and evaluating system design tradeoffs (Figure 1).
Figure 1 - Health Management with System Design

Because an initial system FMECA is performed during the design stage, it is a perfect link between the critical overall system failure modes and the health management system designed to help mitigate those failure modes. Hence, a key aspect of the process presented links this traditional FMECA analysis with health management system design optimization based on failure mode coverage and life cycle cost analysis.

2. ROLE OF FMECA IN HEALTH MANAGEMENT

FMECAs historically contain three main pieces of information, as described below:

1. A list of failure modes for a particular component
2. The effects of each failure mode, ranging from a local level to the end effect
3. The criticality of the failure mode (I to IV), where (I) is the most critical

While this type of failure mode analysis is beneficial in getting an initial (though generally unsubstantiated) measure of system reliability and identifying candidates for redundancy, there are several areas where fundamental improvements can be made so that FMECAs can assist in health monitoring design. Four shortcomings of traditional FMECAs are:

1. Traditional FMECA does not address the precursors or symptoms to failure modes. To move maintenance from reactive to proactive, it is important to focus on both system and component level indications that the likelihood of a substantial failure mode has increased. Failure mode symptoms that occur prior to failure are these indications. An example of failure mode symptoms associated with a bearing would be an increase in spike energy or an increase in the oil particulate count.
2. Traditional FMECA does not address the sensors and sensor placement requirements to observe failure mode symptoms or effects.
3. Traditional FMECA does not address health management technologies for diagnosing and prognosing faults.
4. Traditional FMECA typically focuses on subsystems independently.

With these shortcomings in mind, a new approach has been developed that extends far beyond traditional FMECA capability and is used in the design of health monitoring and management systems.

3. APPROACH TO HEALTH MANAGEMENT DESIGN

Figure 2 provides an overview of the approach to health management system design optimization. A basic description of each block will be given first, then details associated with each block will follow. First, a Functional Block Diagram of the system must be created that models the energy flow relationships among components. This functional block diagram provides a clear vision of how components interact with each other across subsystems. On a parallel path, a tabular FMECA is created that corresponds to a traditional FMECA except that it contains failure mode symptoms, as well as sensors and diagnostic/prognostic technologies. Alternately, a system response model may be used for assessing sensor placements and observability of simulated failure modes, thus offsetting the manual burden of creating the FMECA. Finally, maintenance tasks that address failure modes are included.

The information from the Functional Block Diagram and the tabular FMECA is automatically combined to create a graphical health management environment that contains all of the failure mode attributes as well as health management technologies. The graphical health management environment is simply a sophisticated interface to a relational database. Once the graphical health management system has been developed, attributes are assigned to the failure modes, connections, sensors and diagnostic/prognostic technologies. The attributes are information such as historical failure rates (failures / 1E5 operating hours), replacement hardware costs, false alarm rates, etc., which are used to generate a fitness function for assessing the benefits of the health management system configuration. The “fitness” function criteria include system availability, reliability, and cost. Some of these attributes must be manually determined, if known, while others, related to the attributes of the diagnostic/prognostic technologies, can be determined from independent measures of performance and effectiveness tests or from pre-developed databases.

Finally, the health management configuration is automatically optimized from a cost/benefit standpoint using a genetic algorithm approach. The net result is a configuration that maintains the highest system reliability to cost/benefit ratio.

Figure 2 - Architecture of PHM Design tool

Figure 3 - Functional Block Diagram Layout

4. FUNCTIONAL BLOCK DIAGRAM

The Functional Block Diagram (FBD) contains an integrated representation of how components, subsystems and systems interact with one another. It is not a simulation, only a hierarchical map of physical energy flows (e.g., torque transfer, current, pressure). This energy flow map serves as the backbone for the health management design environment because it contains the failure mode symptoms and effects as well as capturing their temporal paths. Figure 3 shows an example of a functional flow diagram at a “system” level. One could select any of the components to reveal specific interactions between its associated subsystem components.

5. ENHANCED FMECA

As previously mentioned, with this approach, traditional FMECA analyses were enhanced with the addition of sensors, health monitoring technologies and failure symptoms. Figure 4 shows an example of an enhanced FMECA performed on a portion of a fuel system for an F-100 engine, created by Penn State ARL and Impact Technologies.

As with a traditional FMECA, the failure mode is provided along with its effects (ranked from top to bottom as primary, secondary, tertiary, etc.). The Criticality or Frequency of Occurrence of the failure mode is ranked from A to E where:

A = Frequent, B = Probable, C = Occasional, D = Remote, E = Improbable

Figure 4 - Tabular FMECA of an F-100 Fuel System

In practice, this Criticality letter would be associated with a specific probability of failure range. The Severity of the failure mode is ranked from I to IV where:

I = Catastrophic, II = Critical, III = Marginal, IV = Negligible

The first FMECA enhancement is that failure mode symptoms have been added to the “effects” column and are shaded in blue (or light gray). Failure mode symptoms are events that can be observed prior to the failure mode occurring or when the failure mode is in a very early stage of development. Subsequent effects may or may not be downstream failure modes. In the case where an effect is a downstream failure mode, the failure mode of focus could be considered a failure mode precursor.

The “Component” column identifies the component immediately affected by the failure mode, while “Module” is the subsystem in which the component resides. This functional relationship is cross-referenced with the functional block diagram. In a similar fashion, the “Sensor” column lists the sensor that can observe the symptom or effect, while “S_Module” is the subsystem in which the sensor resides and “S_Component” is the component it is linked to. All sensors in this example are required for control or safety purposes. Finally, “Diagnostics” and “Prognostics” columns have been added. The “Diagnostics” column describes whether there are any discrete diagnostic (Built-In Test (BIT)) or continuous processing algorithms that can observe the symptom or effect. The “Prognostics” column describes any prognostic algorithms that can be used to obtain an RUL prediction on the failure mode.

6. RESPONSE MODELS

In some cases, a model of a subsystem may be developed that can provide valuable insight into where sensors are likely to have the most observational quality on failure modes. This optional level of fidelity allows detailed, physics-based subsystem modeling to be used for examining PHM trade-offs. Such trade-offs at this level would include analyzing the number of sensors required, the location of the sensors and the associated algorithms. This type of model would be integrated in the overall HM design environment thus far discussed, where cross-system influences can be examined and accounted for (Figure 5).

Figure 5 - Response model integration in the overall HM model

One such system response model for a hydraulic system, developed by Dr. Jacek Stecki et al. of Monash University, is shown in Figure 6. This model illustrates how the system model may be perturbed to simulate how the effects of certain modes propagate in time and space. Sensor/algorithm combinations can be examined for their ability to detect the perturbations.

Figure 6 - Example of a detailed system response model (diagram labels: component, connector, fault, pressure; leakage flows Q_ldown, e.g. pipe leakage; Q_kdown, e.g. pump leakage; Q_mup, e.g. check valve leakage; Q_nup, e.g. relief valve open)

7. HEALTH MANAGEMENT ATTRIBUTES

To autonomously evaluate the cost/benefit of a HM system configuration, all aspects of the system must ultimately be assigned, or must modify, a dollar value so that a cost function can be generated and optimized. Some of these “attributes” are more easily derived than others. The attributes assigned within a HM system and their respective icons are linked to Failure modes (F/FM), Sensors (eye), Effects, Diagnostics (stoplight for discrete, x-y plot for continuous), Prognostics (stethoscope) and Maintenance Tasks (M). A short list of these attributes is shown in Figure 7. Some of the less obvious attributes are described next.

Figure 7 - Short list of HM attributes

Sensors

Sensors are defined in the model as components for measuring physical quantities such as temperatures, pressures and currents. The “Observational Quality” attribute of a particular sensor is a measure of the sensitivity with which it is able to pick up a physical signal linked to a particular failure mode. For example, an accelerometer stud mounted on top of a bearing casing may have a better observational quality than one magnetically mounted some distance away.

Diagnostic and Prognostic Attributes

Diagnostics can be either discrete or continuous. Discrete diagnostics are traditionally algorithms that produce 0 or 1 depending on whether a threshold has been exceeded. Many types of Built-In Tests (BITs) can be classified as discrete diagnostics. An example of a discrete diagnostic is an Exhaust Gas Temperature (EGT) reading that has exceeded a predetermined level.

Continuous diagnostics are algorithms designed to observe transitional effects and diagnose a failure mode based on the manner and rate in which the effect is changing. Continuous diagnostics are usually associated with observing the severity of failure mode symptoms. Examples of continuous diagnostics would be a spike energy monitor for identifying low levels of bearing race spalling or an A.I. classifier for diagnosing that a valve is sticking. The “Detection Confidence score (0-1) – (DDC)” and “% false positive score (0-1) – (DFP)” can be used to simultaneously account for true-negative and true-positive characteristics.

Finally, prognostic algorithms can use a combination of sensor data, a-priori knowledge of a failure mode and diagnostic information to predict the time to a failure or degraded condition with confidence bounds. Prognostic algorithms are linked directly to failure modes in the graphical FMECA model. Prognostics do not have an attribute associated with false alarms. The “Prognostic Accuracy” accounts for the early detection quality of the technology. A physical prognostic model (e.g., one based on an FE model) would ideally have a higher prognostic accuracy than an experience-based model (e.g., Weibull distributions of historical failure rates). More details on model fidelity are discussed in [2].

A valid concern is how the technical attributes of diagnostic and prognostic technologies can be determined. One method is addressed in [1], whereby algorithms are tested objectively from performance and effectiveness standpoints using transitional run-to-failure data. Of course, in the absence of this type of information, and with a new sensor/algorithm combination, an educated guess may be the only option.

8. COST FUNCTION

The health management design environment configuration and attributes contain a sufficient amount of information to generate and evaluate a “fitness” function. This fitness function is of the form:

For each Failure Mode FM(i):
Step 1) Probability of Failure * Severity * Consequential Cost of FM(i) + (Downstream Failure Mode Consequential Costs) * Probability of Propagation
Step 2) * HM risk reduction attributed to FM(i)
Step 3) + Cost associated with False Alarms on FM(i)
Step 4) + Total Cost of all HM technology

The Consequential Cost (CC) is the sum of the direct and indirect costs required to address a particular fault/failure mode (e.g., repair, replace, inspect), ranging from quantifiable repair and labor costs to less concrete costs such as the effect on system availability. Clearly, only a small subset of all the possible factors is addressed here and the issue is purposely left ambiguous. If the probability of failure multiplied by consequential costs is defined as risk, health monitoring reduces risk by providing a probability that a particular failure mode can be prevented by either 1) detecting an “upstream” fault/failure mode or 2) prognosing when a fault/failure mode will occur. Unfortunately, health monitoring also adds development and hardware costs as well as the potential for false alarms. At the system-wide level, the benefits of the health monitoring technologies in terms of risk reduction must offset the costs and risk of the technology addition.

Specifically, the formulation is as follows (using the acronyms defined in Figure 7):

Steps 1 and 2 =

\[
\sum_{FM_i}^{FM_N} \left\{ \left[ \prod_{D_{FM}} DC \cdot \frac{\sum_{S_D} OQ\,(1 - SP_f)}{N_{sensorsD}} \cdot \prod_{P_{FM}} PA \cdot \frac{\sum_{S_P} OQ\,(1 - SP_f)}{N_{sensorsP}} \right] \cdot \left[ \left( P_f \cdot S\,(CC - M) \cdot P_p \right) + \sum_{FM_{i+1}}^{FM_N} \mathrm{Rolled\_Up} \right] \right\} \tag{1}
\]

The “Rolled Up” costs =

\[
P_f \cdot S\,(CC) \cdot P_p \cdot \left( \prod_{D_{FM}} \left( 1 - \frac{\sum_{S_{FM}} OQ}{N_{sensorsD}} \right) \cdot DC \cdot \prod_{P_{FM}} \left( 1 - \frac{\sum_{S_{FM}} OQ}{N_{sensorsP}} \right) \cdot PA \right) \tag{2}
\]

Step 3 =

\[
+\,(1 - P_f) \cdot S \cdot \left[ 1 - \left[ \prod_{S_{FM}} (1 - SP_f) - \prod_{D_{FM}} (1 - FP) \right] \right] \cdot CC \tag{3}
\]

Finally, Step 4 =

\[
+ \sum_{S} AIC + \sum_{D} DAIC + \sum_{P} PAIC \tag{4}
\]

HM Design Optimization

The goal of the HM system optimization is to maximize the risk reduction provided by the design while minimizing costs. The optimization of the previously described cost function will operate between two boundaries: a “maximum” HM system configuration that includes the “wish list” of all potential sensors and associated algorithms that achieve complete failure mode coverage, and a “minimum” configuration that is necessary for safety and control. The optimization algorithm will examine random configuration variations and calculate the “fitness” or cost for each.

A genetic algorithm optimization scheme was chosen for the HM optimization because genetic algorithms are well suited to handling optimization problems with little regard for non-linearity, dimensionality or function complexity in general. Potential cost functions generated in the HM environment can include hundreds of independent variables, which makes it impractical to utilize traditional optimization techniques such as gradient descent or other derivative-based algorithms. While the details of the optimization are outside the scope of this paper, it is important to note that there will be no clear “winner”; rather, many different HM system configurations will be suggested that the designer can evaluate on the basis of additional criteria. More on this subject can be found in [7].

9. COLLABORATIVE DESIGN ENVIRONMENT

Before an example is given, it is important to address the design environment and associated architecture that enable the entire process. A collaborative work environment is being implemented in this program to allow a number of domain experts to operate applications from different locations, potentially on different operating systems, while sharing and maintaining the same data. For instance, the HM Design Tool will be used to perform advanced component simulation models, FMEA and Cost/Benefit Models simultaneously at various locations. By utilizing the Internet and standard data formats such as XML, data and applications will be accessible individually through web-based servers, and managed through an integration layer, which will control the communications protocol and access privileges (Figure 8).

Figure 8 - Design of Collaborative Work environment

10. HM DESIGN EXAMPLE

A simple, yet realistic example of a Health Management design evaluation is shown next. In this example, an electrically actuated control valve concept is addressed for an aerospace application. Recall that a HM design model has many hierarchies ranging from the component level to the system level. For brevity, this example will consider, but not illustrate, the far-reaching system effects of various valve failure modes. The cost function for this model should by no means be considered complete. The purpose of the example is only to introduce the HM design and optimization process.

The top portion of Figure 9 shows a Line Replaceable Unit (LRU) level functional model of a Load Control Valve (LCV) that is used to regulate discharge air from an Auxiliary Power Unit (APU). Compressed air from the APU is used for main engine starts, environmental control and several other functions. The “in” and “out” bars on the left and right of the model are used to propagate signals, flows, and effects between levels.

Figure 9 - Functional Model and HM design for LCV

The bottom portion of Figure 9 shows the unit level maintenance task (denoted by the “U”) to remove/replace the LCV. Also shown are the candidate health monitoring algorithms that have the potential to detect a valve degrading in performance and allow for proactive maintenance. Algorithm #1 trends the relationship between LCV command, motor current, and the actual actuator position. In this scenario, the LVDT used to monitor the actuator position is a candidate sensor. Algorithm #2 trends the APU’s exhaust gas temperature and speed with respect to the LCV command. All the sensors used for Algorithm #2 are available for “free” because they are required for control purposes.

Figure 10 shows the HM design at the torque motor level. Contained at this level are a failure mode of the torque motor, the effects of such a failure, and maintenance tasks on the motor. Also shown is an existing Built-In Test (BIT) based on the torque motor current. This BIT is either 0 or 1 and can neither provide prognostic capability nor truly isolate a failure mode.

Figure 10 - HM design at the Torque motor level

Figure 11 illustrates the HM design at the actuator, where the LVDT would physically exist. Note that, due to the cause and effect relationship, a failure of the actuator position function could be the result of a torque motor problem or an actuator failure mode.

Finally, Figure 12 is the HM design for the butterfly valve. Many upstream failure modes can cause it to malfunction, potentially creating more critical downstream failure modes such as insufficient avionics cooling, inability to start the main engines, etc. Clearly, such a model should continue through system interactions until end effects are reached.

Figure 11 - HM design for Actuator

Figure 12 - HM design for Butterfly valve

Figure 13 provides a concise illustration of some of the attributes assigned to the HM elements in Figures 9-12 that were used in evaluating the cost function. Other “expensive” fault/failure modes such as inability to start the main engines and inadequate avionics cooling were also included. For brevity, the details of the cost function analysis will not be given. In this simple study, the LVDT sensor and algorithm #1 were found not to provide enough risk reduction for the cost; rather, algorithm #2 should be implemented. There are, of course, a number of variables contributing to this result, the most dominant being the fact that algorithm #2 uses existing sensors, even though it provides lower diagnostic confidence and was assigned higher development costs.

Figure 13 - Costs and probabilities for the HM design

11. CONCLUSION

An approach has been presented that extends traditional FMECA and system modeling capabilities to aid in the design of complex health management systems. This approach utilizes a graphical and collaborative design environment where failure modes, failure mode symptoms/effects, sensors, and diagnostic/prognostic technologies are represented. The health management system configuration can be optimized from a cost/benefit standpoint through analysis of the fitness attributes on HM system building blocks. The ultimate objective of this approach was to form a methodology and environment which enables effective health management practices by mitigating or preventing failure modes while still keeping sensor and diagnostic/prognostic technology costs at a minimum.

ACKNOWLEDGMENTS

We would like to acknowledge the contributions of Carl Byington of Impact, Dr. Jacek Stecki of Monash University, Rob Campbell of Penn State ARL, and the support of Andy Hess and Dr. William Scheuren of DARPA in this ongoing project.

REFERENCES

[1] Kacprzynski, G., and Roemer, M., “Extending FMECA - Health Management Design Optimization for Aerospace applications”, Proceedings of the IEEE 2000

[2] Orsagh, R.F., and Roemer, M.J., “Development of Metrics for Mechanical Diagnostic Technique Qualification and Validation”, COMADEM Conference, Houston, TX, December 2000.

[3] Roemer, M.J., and Kacprzynski, G.J., “Advanced Diagnostics and Prognostics for Gas Turbine Engine Risk Assessment,” Paper 2000-GT-30, ASME and IGTI Turbo Expo 2000, Munich, Germany, May 2000.

[4] Lewis, E., Introduction to Reliability Engineering, John Wiley & Sons, New York, 1987.

[5] Roemer, M.J., and Atkinson, B., “Real-Time Engine Health Monitoring and Diagnostics for Gas Turbine Engines,” Paper 97-GT-30, ASME and IGTI Turbo Expo 1997, Orlando, Florida, June 1997.

[6] Brooks, R.R., and Iyengar, S.S., Multi-Sensor Fusion, Prentice Hall, Inc., Upper Saddle River, New Jersey 07458, 1998.

[7] Canada, J., and Sullivan, W., Capital Investment Analysis for Engineering and Management, Prentice Hall, 1996.

[7] Yukish, Michael, “Simulation Based Design and Lifecycle cost estimating”, 54th Proceedings of the Society for Machinery Failure Prevention Technology (MFPT), Virginia Beach, VA, May 2000.

Gregory J. Kacprzynski is a Project Manager at Impact Technologies with over 5 years of experience in the development of diagnostic/prognostic systems for compressors, pumps, power transmission components, gas and steam turbines. He has been involved in developing real-time, intelligent health monitoring systems for gas turbine engines for on-wing and test cell applications as well as for other air vehicle subsystems. Early in his career he developed stochastic life assessments of steam turbine components and performed failure analysis and vibration testing of various mechanical structures. Greg has published papers and developed technologies in the area of maintenance optimization, FMECAs, Life Cycle cost assessment, model-based prognostics and data fusion technologies. Greg has his MS and BS in Mechanical Engineering from Rochester Institute of Technology.

Dr. Michael J. Roemer is the Director of Engineering at Impact Technologies in Rochester, NY and Adjunct Professor of Mechanical Engineering at the Rochester Institute of Technology. He was formerly a Vice President of Engineering at STI Technologies prior to joining Impact Technologies. Mike has a Ph.D. in Mechanical Engineering, M.S. in Systems Engineering and B.S. in Electrical Engineering, all from the State University of New York at Buffalo. He has over 14 years of experience developing real-time, automated health management technologies for complex systems, including large steam and gas turbines, gas turbine engines, rotary/fixed-wing aircraft subsystems and naval propulsion systems. He has developed several diagnostic and prognostic capabilities for complex systems utilizing probabilistic methods that are directly linked to maintenance planning and system operation. He is the author or co-author of more than 50 technical papers in these subject areas. He is currently the Chairman of the Machinery Failure Prevention Technology (MFPT) Society, a Division of the Vibration Institute, and Prognostics Lead for the SAE-E32 Engine Condition Monitoring Committee.

Report Documentation Page: report date 2002; title: Health Management System Design: Development, Simulation and Cost/Benefit Optimization; performing organization: Impact Technologies, LLC, Rochester, New York 14623; distribution/availability: approved for public release, distribution unlimited; report, page and abstract classification: unclassified; limitation of abstract: UU; number of pages: 8.
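As a supplementary illustration of the four-step fitness function described in the COST FUNCTION section and the configuration comparison in the HM DESIGN EXAMPLE section, the following is a minimal Python sketch. It is not the formulation of equations (1)-(4): the residual-risk form used for Step 2 and the false alarm form used for Step 3 are simplifying assumptions, the names `FailureMode`, `Technology`, `total_cost` and `rank_configurations` are hypothetical, and every number is invented for illustration rather than taken from Figure 13. Because the example has only two candidate technologies, every configuration between the "minimum" and the "wish list" is simply enumerated; a genetic algorithm, as chosen in the paper, would take the place of the enumeration for realistic problem sizes.

```python
from dataclasses import dataclass
from itertools import combinations
from math import prod


@dataclass(frozen=True)
class FailureMode:
    name: str
    p_fail: float            # probability of failure over the period studied
    severity: float          # severity weighting (unitless)
    conseq_cost: float       # consequential cost, dollars
    downstream_cost: float   # consequential cost of downstream failure modes
    p_propagate: float       # probability the failure propagates downstream


@dataclass(frozen=True)
class Technology:
    name: str
    covers: frozenset        # names of the failure modes it observes
    coverage: float          # assumed fraction of risk it removes (0-1)
    p_false_alarm: float     # assumed false alarm probability (0-1)
    false_alarm_cost: float  # dollars per false alarm
    dev_cost: float          # algorithm development cost, dollars
    sensor_cost: float = 0.0 # extra hardware cost; 0 if sensors already exist


def total_cost(failure_modes, config):
    """Residual risk plus false alarm and technology costs (lower is better)."""
    cost = 0.0
    for fm in failure_modes:
        # Step 1: unmitigated risk of this failure mode.
        risk = (fm.p_fail * fm.severity * fm.conseq_cost
                + fm.downstream_cost * fm.p_propagate)
        watchers = [t for t in config if fm.name in t.covers]
        # Step 2 (assumed form): each technology independently removes
        # a fraction `coverage` of the risk that is still left.
        cost += risk * prod(1.0 - t.coverage for t in watchers)
        # Step 3 (assumed form): false alarms arise when no failure is present.
        cost += sum((1.0 - fm.p_fail) * t.p_false_alarm * t.false_alarm_cost
                    for t in watchers)
    # Step 4: cost of all HM technology in the configuration.
    cost += sum(t.dev_cost + t.sensor_cost for t in config)
    return cost


def rank_configurations(failure_modes, required, candidates):
    """Score every configuration between the minimum and the wish list."""
    scored = []
    for k in range(len(candidates) + 1):
        for extra in combinations(candidates, k):
            config = tuple(required) + extra
            names = tuple(t.name for t in extra)
            scored.append((total_cost(failure_modes, config), names))
    return sorted(scored)


# Hypothetical attribute values, loosely shaped like the LCV example.
lcv = FailureMode("LCV degraded", p_fail=0.05, severity=1.0,
                  conseq_cost=40_000.0, downstream_cost=10_000.0,
                  p_propagate=0.5)
alg1 = Technology("algorithm #1 + LVDT", frozenset({"LCV degraded"}),
                  coverage=0.9, p_false_alarm=0.02, false_alarm_cost=1_000.0,
                  dev_cost=1_000.0, sensor_cost=5_000.0)
alg2 = Technology("algorithm #2", frozenset({"LCV degraded"}),
                  coverage=0.7, p_false_alarm=0.05, false_alarm_cost=1_000.0,
                  dev_cost=2_000.0)

ranking = rank_configurations([lcv], required=[], candidates=[alg1, alg2])
for cost, names in ranking:
    print(f"{cost:10.2f}  {names or ('no added HM',)}")
```

With these invented values, the configuration using only algorithm #2 scores lowest, because it needs no new hardware even though its assumed coverage is lower and its development cost higher. This mirrors the qualitative outcome reported for the LCV study, but only because the numbers were chosen to do so.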