NASA Technical Reports Server (NTRS) 20040139600: 'Systemic Failures' and 'Human Error' in Canadian TSB Aviation Reports Between 1996 and 2002


‘Systemic Failures’ and ‘Human Error’ in Canadian TSB Aviation Reports Between 1996 and 2002

C.W. Johnson
Department of Computing Science, University of Glasgow, Glasgow, G12 9QQ, Scotland.
http://www.dcs.gla.ac.uk/~johnson

C.M. Holloway
NASA Langley Research Center, Hampton, VA 23681-2199, USA.
http://shemesh.larc.nasa.gov/people/cmh/

ABSTRACT
This paper describes the results of an independent analysis of the primary and contributory causes of aviation accidents in Canada between 1996 and 2002. The purpose of the study was to assess the comparative frequency of a range of causal factors in the reporting of these adverse events. Our results suggest that the majority of these high-consequence accidents were attributed to human error. A large number of reports also mentioned wider systemic issues, including the managerial and regulatory context of aviation operations. These issues are more likely to appear as contributory rather than primary causes in this set of accident reports.

Keywords
Human error, accident analysis, incident investigation

INTRODUCTION
Reason [1] has recently distinguished the ‘person’ from the ‘system’ approach to accident analysis. Each of these perspectives implies a radically different view of causation. The ‘person’ approach ‘focuses on the errors of individuals, blaming them for forgetfulness, inattention, or moral weakness’. In contrast, the system approach ‘concentrates on the conditions under which individuals work and tries to build defenses to avert errors or mitigate their effects’. Similarly, Cook and Woods [2] argue that accidents occur through the concatenation of multiple small failures. Each of these causes is necessary; however, none is sufficient to cause the failure unless it occurs in combination with other potential causes. Often these small failures have roots that extend well back from the moment when the accident is triggered. This analysis is careful to distinguish between the operators who often trigger an incident ‘at the sharp end’ and the managers and regulators who often create the latent conditions for a failure ‘at the blunt end’. In particular, managerial and regulatory problems often make it possible for combinations of these minor failures to build up over time and hence create the preconditions for failure.

There is considerable controversy over this systemic view of failure [3]. It can be difficult to identify precisely which factors play a significant role in the latent causes of an accident or incident. For example, the operational pressures of everyday tasks may influence operator behaviour. The causes of these pressures can be traced back to particular management decisions distributed throughout the tiers of responsibility within a company. Often the systemic causes of an adverse event will ultimately lead back to the regulatory authorities and certification bodies that help to create the environment in which a management board operates. The proponents of the ‘systemic’ view can reasonably argue that regulators must ultimately bear responsibility for accidents in the industries that they regulate. However, this ignores the legislative and political constraints that limit the regulators’ scope for intervention. Similarly, it is important to question whether upper levels of management can reasonably be expected to understand the detailed working practices that characterise the everyday operation of complex technology. In particular, previous studies of adverse events such as the Bristol Infirmary failures have shown that middle and junior levels of management often find it difficult to pass bad news to their more senior colleagues [3].

The regulations that govern the work of most accident investigation agencies seldom emphasize the importance of ‘systemic factors’. For example, under the Canadian Transportation Accident Investigation and Safety Board Act, 1989, c. 3, the Transportation Safety Board (TSB) must identify “causes and contributing factors” in order to identify “safety deficiencies as evidenced by transportation occurrences”. It is not the function of the Board “to assign fault or determine civil or criminal liability, but the Board shall not refrain from fully reporting on the causes and contributing factors merely because fault or liability might be inferred from the Board's findings.”

A number of further factors can prevent investigators from exploring the range of minor failures that together create the preconditions for adverse events. In particular, resource constraints limit the scope of many investigations. Most investigation agencies operate with a relatively small core staff and rely on external support to provide additional expertise. However, there are inevitable shortages of skilled personnel in several key areas, including software forensics. Further problems are created by the lack of recognized analytical techniques that might be used to guide and validate the ‘systemic’ analysis of adverse events. It can, therefore, be difficult to determine whether investigators have considered an adequate range of causal factors during any particular investigation.

A number of leading accident investigators have written on the importance of ‘systemic’ factors in the causes of adverse events. For example, Strauch [4] argues that the ‘transformation of error perspective’ from blaming the operator to identifying the contribution of system elements ‘has, I believe, led to profound changes in the way we investigate, consider, and respond to accidents’. Similarly, Ayeko [5] has argued that ‘to learn a lesson from an accident one must understand not only the immediate cause but also contributing factors and underlying conditions of the accident’. He goes on to state ‘it is my belief that, when we seek “cause” rather than “information about cause” in an investigation of an accident, the direction of the investigation often veers towards elements that are more likely to be linked to blame rather than the mitigation of risks’. We were concerned to determine whether these ‘systemic’ views of complex, technological failure have had a discernible impact on the work of accident investigation agencies.

It can be difficult to measure the impact that a particular view of accident causation has upon the working practices of an investigatory organization. For example, most investigatory organizations analyzed a range of causal factors well before authors such as Perrow [6] and Reason [1] articulated the ‘systems view’. It is likely, therefore, that the impact of ‘systemic’ ideas can only be measured in terms of a relative change in the scope of any analysis rather than a dramatic or sudden change in investigatory practices. It is also difficult to know what to measure in order to determine whether there has been any movement from the ‘person’ view to the ‘systems’ view of adverse events.

METHOD
The method adopted in this study involved the two co-authors performing an independent analysis of all of the major aviation accident reports published between 1996 and 2002 by the Canadian TSB. The investigators each had more than a decade’s experience in the development of safety-critical systems, and each has been active in the analysis of system failures for more than five years. The decision to focus on Canadian accident reports was justified because this forms one part of a larger international study; a companion paper describes the results of applying this technique to US NTSB investigations. The start date was determined by pragmatism: it was felt that this provided a sufficiently large sample to support our analysis within the time available to our study. This sample yielded a total of 27 accident reports. The most recent report available at the time of writing, February 2004, was published in 2002. The reports ranged from high-profile, multiple-fatality accidents such as the loss of Swissair Flight 111 through to less severe loss-of-separation incidents.

The heuristic that we adopted was to investigate every aviation report published between 1996 and 2002 that was composed of distinct numbered sections. We also substituted a number of the less structured reports for 2002: A02C0124, A02F0069, A02P0109 and A02Q0130. This decision was justified by the need to avoid a gap in our sample for the last two years. We were also unable to determine whether this less structured format will provide a standard for future TSB reports. Even with these additions, our sample is relatively small compared to the 1,812 accidents and 1,374 incidents that were reported to the Canadian TSB in 2002. A considerable process of filtering was used to select the most serious of these occurrences for investigation. In consequence, our sample focuses on higher-risk mishaps, including near misses, that were deemed serious enough to warrant a subsequent investigation and report.

The analysis progressed by extracting the causal and contributory factors that were identified in the aftermath of each investigation. Canadian TSB reports contain a section in their abstract that lists ‘Findings As To Causes and Contributing Factors’. Once these sections had been extracted, the two investigators performed their analysis independently. All subsequent stages were also performed in isolation from each other until the results were available for comparison.
The second stage of the analysis was to assign each of the probable causes and contributory factors to a number of common categories. We decided not to use any pre-defined taxonomy but to allow each of the investigators to independently assign their own terms to each of the ‘causes’.

The results of this process were then collated. There were some obvious differences in the terms used, but there were also some strong similarities. For instance, one analyst identified ‘human error’ while another distinguished between ‘aircrew error’, ‘ATM error’ and so on. Where such disagreements occurred, we used a process of discussion to agree on a common term to support comparisons between the classifications. For example, we agreed to use the more general term ‘human error’. The term ‘ATM failure’ was used instead of ‘ATM error’ because it was often unclear whether a particular cause or contributory factor could be associated with the manager’s actions or with design problems in their information systems. Distinctions were preserved between different terms where no agreement could be reached between the two analysts.

RESULTS
As mentioned, our sample reports included separate sections on “Findings As to Causes and Contributing Factors”. Our analysis was complicated, however, because the TSB does not distinguish probable causes from contributory factors in these sections. For instance, report A02F0069 contained the following list:

1. The pilot not flying (PNF) inadvertently entered an erroneous V speed into the MCDU. The error was not detected by either flight crew, despite numerous opportunities.
2. The PNF called "rotate" about 25 knots below the calculated and posted rotation speed.
3. The pilot flying (PF) initiated rotation 24 knots below the calculated and posted rotation speed and the tail of the aircraft struck the runway surface.
4. A glide path signal was most probably distorted by a taxiing aircraft and provided erroneous information to the autopilot, resulting in a pitch-up event. The pitch-up could have been minimized if the autopilot had been disconnected earlier by the PF.

As can be seen, there is no indication as to which of these items is a cause and which is a contributory factor. Each analyst, therefore, had to use his own judgment. Both analysts independently identified three causes relating to human error and one contributory factor relating to equipment failure. However, Analyst M argued that this sole contributory factor should be classified as a problem with equipment design.

This reliance on individual judgment created disagreement over causes and contributory factors. Analyst J found 53 causes and only 35 contributory factors. Analyst M found correspondingly fewer causes, 44, and more contributory factors, 71. A more formal method for distinguishing causes from contributory factors could have reduced this variance [3]. At the start of the study, we decided not to use a more formal approach because the development of appropriate root cause analysis techniques remains an active area for research. We were also keen to employ the subjective criteria that might be employed by the readers of these documents.

As mentioned, the 27 incidents yielded a total of 53 probable causes for the first analyst, with a mean of 1.9 probable causes per report and a standard deviation of 1.2. The second analyst identified 44 probable causes, with a mean of 1.6 and a standard deviation of 1.0. The first analyst identified 35 contributory factors, with a mean of 1.3 and a standard deviation of 2.5; the second analyst identified 71, with a mean of 2.5 and a standard deviation of 3.9. The mode over all probable causes was 1, while the mode for all contributory factors was 0.

The standard deviations associated with the mean results for both causes and contributory factors are relatively high. This can be explained by a small number of reports that were very different from the mode of one cause and zero contributory factors. In particular, both analysts identified two causes in report A97H0011. However, analyst J identified 13 contributory factors while M found 20 in this single incident. This report describes a loss of control on go-around under adverse weather conditions. Analyst J identified human error and problems in air traffic management as the main causes. Analyst M identified two instances of ‘human error’. The thirteen contributory factors identified by Analyst J included five instances of managerial failure, two human errors, two regulatory problems, two aircraft design issues, a maintenance failure and a problem relating to the operational environment in which the accident occurred. In contrast, analyst M identified three human errors, seven management issues, six regulatory failures, three environmental factors and one instance of equipment failure. A number of other atypical reports also helped to pull the standard deviation away from the mode. For example, both analysts identified five instances of human error causing the incident described in TSB report A99Q0151:

“The pilot flying did not establish a maximum performance climb profile, although required by the company's standard operating procedures (SOPs), when the ground proximity warning system (GPWS) "Terrain, Terrain" warning sounded during the descent, in cloud, to the non-directional beacon (NDB). The pilot flying did not fly a stabilized approach, although required by the company's SOPs. The crew did not carry out a go-around when it was clear that the approach was not stabilized. The crew descended the aircraft well below safe minimum altitude while in instrument meteorological conditions. Throughout the approach, even at 100 feet above ground level (agl), the captain asked the pilot flying to continue the descent without having established any visual contact with the runway environment. After the GPWS "Minimums, Minimums" voice activation at 100 feet agl, the aircraft's rate of descent continued at 850 feet per minute until impact. The crew planned and conducted, in cloud and low visibility, a user-defined global positioning system approach to Runway 31, contrary to regulations and safe practices.”

Tables 1 and 2 summarise the data from our study. Although there is some disagreement over individual incidents, there is considerable consensus across the sample. Both investigators identified human error as the most common causal factor across the TSB sample, at 56% for analyst J and 75% for analyst M. The relative difference between the proportions identified by the two analysts can be partly explained by the broader range of categories that were considered by analyst J. For example, analyst J also included ‘loadmaster error’ (2%) and ‘ATM error’ (6%), which were not included within the classification used by analyst M. With this caveat, the remaining results show considerable agreement: the two analysts fall within one or two percentage points of each other for environmental causes (9% J, 7% M), aircraft design (4% J, 5% M), equipment failure (9% J, 9% M), regulation (4% J, 2% M) and maintenance (2% J, 2% M).

Table 2 shows that human error plays a lesser role in the contributory factors identified by both analysts in the TSB sample: 22% for analyst J and 28% for analyst M. Again, there is considerable agreement in terms of the overall percentages for each category of contributory factor. Analyst J identified 28% of all contributory factors as being related to company management; analyst M found this in 27% of the factors in the accident reports. The agreement continued for regulation (9% J, 11% M), equipment failure (9% J, 7% M) and environmental factors (9% J, 8% M). There is a more noticeable disagreement over the role of aircraft design. Analyst J identified it in only 9% of these contributory factors, whereas analyst M identified design flaws in 15% of the factors in the TSB sample. This can be explained in terms of a cluster of incidents in 1998. Analyst M identified three aircraft design flaws in the contributory factors for A98H0003, two in A98H0002 and A98H0011 and one in A98C0173. In contrast, Analyst J identified management failure as a contributory factor behind these design flaws.

These statistics re-emphasize the importance of human error as a causal factor. We did not, however, identify any trend away from blaming the operator, as might be predicted given the popularity of ‘systemic’ theories of failure in recent years. The frequency of human error identified by analyst J is: 3 probable and 1 contributory (1996), 2 probable and 3 contributory (1997), 6 probable and 1 contributory (1998), 9 probable and 0 contributory (1999), 4 probable and 1 contributory (2000), 1 probable and 1 contributory (2001), 6 probable and 1 contributory (2002). The corresponding distribution for analyst M is: 3 probable and 2 contributory (1996), 4 probable and 5 contributory (1997), 6 probable and 6 contributory (1998), 10 probable and 0 contributory (1999), 4 probable and 3 contributory (2000), 1 probable and 1 contributory (2001), 5 probable and 2 contributory (2002). The peak in 1999 is due largely to A99Q0151, mentioned earlier. There are also relatively high levels of human error identified during 1998 and 2002. 2002 was similar to 1999, with a single incident, documented as A02F0069, producing several different forms of human error. In contrast, the rise in 1998 is explained by several different reports, each with a small number of operator ‘errors’: A98Q0192, A98P0303, A98H0011, A98H0003, A98H0002, A98C0173, A98A0191 and A98A0067. It is difficult to identify any trend that might characterize a shift from the ‘person’ approach towards the ‘systemic view’ of causal analysis, at least in terms of the distribution of human error between 1996 and 2002.

Table 1: Frequency of Probable Causes over Time, Analysts J & M. Parentheses represent number of different incidents, Canadian TSB.

                    1996          1997          1998          1999          2000          2001          2002          Total
Category            J      M      J      M      J      M      J      M      J      M      J      M      J      M      J      M
Human Error         3(3)   3(3)   2(2)   4(3)   6(4)   6(6)   9(3)   10(4)  4(2)   4(2)   1      1      6(3)   5(3)   31(18) 33(22)
ATM Failure         0      0      2(2)   0      1      0      0      0      0      0      0      0      0      0      3(3)   0
Maintenance Problem 0      0      0      0      0      0      0      0      0      0      1      1      0      0      1      1
Loading Error       0      0      0      0      1      0      0      0      0      0      0      0      0      0      1      0
Company Management  0      0      0      0      1      0      0      0      0      0      0      0      0      0      1      0
Regulation          0      0      0      0      2(2)   1      0      0      0      0      0      0      0      0      2(2)   1
Equipment Failure   1      1      1      2(2)   1      1      2(1)   0      0      0      0      0      0      0      5(4)   4(3)
Aircraft Design     0      0      0      0      2(1)   2(1)   0      0      0      0      0      0      0      0      2(1)   2(1)
Manufacturing       0      0      1      0      0      0      0      0      0      0      2(1)   0      0      0      3(2)   0
Environment         0      0      1      0      2(2)   2(2)   0      0      0      0      0      0      2(2)   1      5(5)   3(3)
Total               4(4)   4(4)   7(7)   6(5)   16(13) 12(9)  11(4)  10(4)  4(2)   4(2)   4(3)   2(2)   8(5)   6(4)   53     44

Table 2: Frequency of Contributory Causes over Time, Analysts J & M. Parentheses represent number of different incidents, Canadian TSB.

                    1996          1997          1998          1999          2000          2001          2002          Total
Category            J      M      J      M      J      M      J      M      J      M      J      M      J      M      J      M
Human Error         1      2(2)   3(2)   5(2)   1      6(4)   0      0      1      3(1)   1      1      1      2(1)   8(7)   19(11)
ATM Failure         0      0      2(2)   0      1      0      0      0      0      0      0      0      0      0      3(3)   0
Maintenance Problem 0      0      1      0      0      0      0      0      0      0      0      0      0      0      1(1)   0
Company Management  2(2)   2(2)   5(1)   12(3)  3(3)   5(4)   0      0      0      0      0      0      0      0      10(6)  19(9)
Regulation          0      0      3(2)   6(1)   0      2(2)   0      0      0      0      0      0      0      0      3(2)   8(3)
Equipment Failure   1      0      0      1      0      1      0      1      0      0      0      0      2(2)   2(2)   3(3)   5(5)
Aircraft Loading    0      0      0      0      1      0      0      0      0      0      0      0      0      0      1(1)   0
Aircraft Design     0      1      2(1)   3(2)   1      6(3)   0      1      0      0      0      0      0      0      3(2)   11(6)
Manufacturing       0      2(1)   0      1      0      0      0      0      0      0      0      0      0      0      0      3(2)
Environment         0      0      1      4(2)   0      0      0      0      1      1      0      0      1      1      3(3)   6(4)
Total               4(4)   7(6)   17(10) 30(12) 7(7)   20(14) 0      2(2)   2(2)   4(2)   1(1)   1(1)   4(4)   5(4)   35     71
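The per-year human-error counts given in the Results can be cross-checked against the Total columns of Tables 1 and 2. The same totals also give the pooled share of managerial and regulatory factors across both analysts. The Python sketch below does both. It only transcribes figures from the text and tables, using the grand totals as printed: 53 and 44 probable causes, 35 and 71 contributory factors. The function and variable names are illustrative and are not part of the study's method.

```python
# Consistency check on figures transcribed from the text and Tables 1 and 2.
# Every per-year list is ordered 1996, 1997, ..., 2002.
YEARS = list(range(1996, 2003))

# Human-error counts per year, as listed in the text for each analyst.
HUMAN_ERROR_BY_YEAR = {
    "J": {"probable": [3, 2, 6, 9, 4, 1, 6], "contributory": [1, 3, 1, 0, 1, 1, 1]},
    "M": {"probable": [3, 4, 6, 10, 4, 1, 5], "contributory": [2, 5, 6, 0, 3, 1, 2]},
}

# Human Error row of the Total columns: Table 1 (probable) and Table 2 (contributory).
HUMAN_ERROR_TOTALS = {
    "J": {"probable": 31, "contributory": 8},
    "M": {"probable": 33, "contributory": 19},
}

# Grand totals and selected category totals, as printed in the tables.
GRAND_TOTALS = {"probable": {"J": 53, "M": 44}, "contributory": {"J": 35, "M": 71}}
CATEGORY_TOTALS = {
    "Company Management": {"probable": {"J": 1, "M": 0}, "contributory": {"J": 10, "M": 19}},
    "Regulation": {"probable": {"J": 2, "M": 1}, "contributory": {"J": 3, "M": 8}},
}


def check_human_error_totals():
    """Return (analyst, kind) pairs whose per-year counts do not sum to the table total."""
    mismatches = []
    for analyst, series in HUMAN_ERROR_BY_YEAR.items():
        for kind, per_year in series.items():
            if len(per_year) != len(YEARS):
                raise ValueError(f"expected {len(YEARS)} years for {analyst}/{kind}")
            if sum(per_year) != HUMAN_ERROR_TOTALS[analyst][kind]:
                mismatches.append((analyst, kind))
    return mismatches


def pooled_share(category, kind):
    """Percentage of all factors of one kind assigned to a category, pooling J and M."""
    category_count = sum(CATEGORY_TOTALS[category][kind].values())
    grand_total = sum(GRAND_TOTALS[kind].values())
    return 100 * category_count / grand_total


if __name__ == "__main__":
    print("Mismatched totals:", check_human_error_totals())
    for category in CATEGORY_TOTALS:
        for kind in ("probable", "contributory"):
            print(f"{category} ({kind}): {pooled_share(category, kind):.0f}%")
```

Running it prints an empty list of mismatches, because every per-year series sums to its table total. It then prints 1% and 27% for company management and 3% and 10% for regulation (probable causes and contributory factors respectively).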
Although human error is still the most prominent causal and contributory factor in our study, it is important not to underestimate the frequency of managerial and regulatory failures. These issues play a greater role in the contributory factors than they do in probable causes. Managerial issues account for around 1% of all causes and 27% of contributory factors across both analysts. Regulatory issues account for 3% of all causes and 10% of all contributory factors. This not only provides insights into the practices and perspective of the TSB but also casts light on the two analysts who were involved in this exercise. Recall that the TSB reports do not distinguish between contributory factors and causes. Hence we were making qualitative judgements about the failures that should be assigned to each general classification. This analysis suggests that we were predisposed to view human error as a more salient probable cause than either managerial factors or regulatory failure.

Our results also provide insights into the distribution of systemic issues within the causal and contributory factors between 1996 and 2002. All of the documents that were classified as describing these potential sources of failure come before 1999. From that year to 2002, neither analyst was able to identify any causes or contributory factors in managerial and regulatory failures. They did continue to find human causes; for instance, analyst J found 6 instances of aircrew ‘error’ in 2002 while their colleague found 5. To summarise, it is difficult to discern any pattern that might indicate a rise in the ‘systemic view’ of failure. In contrast, the decline of managerial and regulatory issues in the TSB reports might indicate a decline in the prominence of this view.

CONCLUSIONS
When we began this analysis, we were keen to determine whether the ‘systems’ view of failure was having an impact on the output of accident investigations. Prominent investigators in both the Canadian TSB [5] and the US NTSB [4] have argued that these factors must be considered when identifying the causes of adverse events. Our results have shown that the TSB does consider a wide range of causal and contributory factors in its reports. In particular, it seems clear that the TSB has a long tradition of considering the regulatory and managerial precursors to adverse events. However, it does focus on the role of human error as a potential cause in most of the adverse events that it investigates. It also seems that the role of managerial and regulatory issues has declined in prominence in recent years. We would argue that it is inaccurate to assert, as some have, that: (1) the operator is always blamed, (2) most investigations stop as soon as they find someone to blame, or (3) organizational causes are usually ignored.

This paper has described an independent analysis of the primary and contributory causes of aviation accidents in Canada between 1996 and 2002. The purpose of the study was to assess the comparative frequency of a range of causal factors in the reporting of these adverse events. Our results suggest that the majority of these high-consequence accidents were attributed to human error. A large number of reports also mentioned wider systemic issues, including the managerial and regulatory context of aviation operations. These issues are more likely to appear as contributory rather than primary causes in this set of accident reports.

REFERENCES
1. J. Reason, Human Error: Models and Management, British Medical Journal, 320:768-770, 18 March 2000.
2. D. Woods and R.I. Cook, The New Look at Error, Safety, and Failure: A Primer, Technical Report, US Veterans Association, National Centre for Patient Safety, 2004.
3. C.W. Johnson, The Failure of Safety-Critical Systems: A Handbook of Accident and Incident Reporting, Glasgow University Press, Glasgow, 2003.
4. B. Strauch, Normal Accidents: Yesterday and Today. In C.W. Johnson (editor), Proceedings of the First Workshop on the Investigation and Reporting of Incidents and Accidents (IRIA 2002), Department of Computing Science, University of Glasgow, Scotland, 2002.
5. M. Ayeko, Integrated Safety Investigation Methodology (ISIM): Investigation for Risk Mitigation. In C.W. Johnson (editor), Proceedings of the First Workshop on the Investigation and Reporting of Incidents and Accidents (IRIA 2002), Department of Computing Science, University of Glasgow, Scotland, 2002.
6. C. Perrow, Normal Accidents: Living with High-Risk Technologies, Princeton University Press, Princeton, 1999.
