Effective system safety efforts require learning from failure. Lessons learned will be presented here, often illustrated through an accident or mishap. Note that in discussing these accidents, the intent is not to oversimplify the events and conditions that led to the accidents. Rarely is there only one identifiable cause leading to the accident. Accidents are usually the result of complex factors that include hardware, software, human interactions, and procedures. Readers are encouraged to review the full accident and mishap investigation reports referenced to understand the often complex conditions and chain of events that led to each accident discussed here.
Aircraft Accident in Michigan
(posted 5/24/2012)

On January 9, 1997, an Embraer (Empresa Brasileira de Aeronautica) EMB-120RT operated as Comair flight 3272 crashed near Monroe, Michigan, while flying from Covington, Kentucky, to Detroit. All 29 people on board died in the crash, and the airplane was destroyed. The NTSB determined that the probable cause of the accident was ice accumulation on the wings. It faulted the FAA for failing to establish adequate certification standards for flight in icing conditions, failing to ensure that deicing procedures were properly implemented, and failing to establish adequate minimum airspeeds for icing conditions.
The NTSB also stated that operation of the autopilot in icing conditions played a role in the accident. The pilots were flying on autopilot, and the airplane’s aerodynamic condition degraded as ice built up during the flight. The autopilot attempted to maintain the flight path until conditions exceeded its set limits, and then it disengaged. It appeared to the NTSB that the pilots were unaware of the degraded condition until the autopilot disengaged. The NTSB stated that the pilots might have been able to control the airplane better had they recognized the airplane’s degraded aerodynamic condition and disengaged the autopilot themselves before it shut down automatically. The NTSB also stated that use of the autopilot may have reduced the crew’s awareness of the airplane’s condition: “There is evidence that pilots’ previous experience with autopilot systems (the autopilot’s reliability and ability to consistently control the airplane as commanded) can result in an increased level of confidence in the autopilot system, which may lead to less vigilant flight crew monitoring of the airplane’s performance while operating with the autopilot engaged.” The NTSB recommended that, on future flights, the autopilot not be used when icing conditions are prevalent.
Lessons Learned: Research has found that increased use of automated controls can reduce the operator’s ability to maintain situational awareness. Operators who do not play some active role are less vigilant and more complacent. Therefore, efforts must be made to understand the unintended consequences of increased automation. More broadly, the human-software interface must be included as part of the hazard analysis and risk mitigation strategies.
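One way to make this lesson concrete is to ask what the operator can see while the automation is compensating for a worsening condition. The sketch below is a hypothetical illustration only, not the EMB-120 autopilot logic or an NTSB recommendation. It assumes the automation exposes its commanded control input and the limit at which it disengages (the names `AuthoritySample`, `authority_alerts`, and `warn_fraction` are invented for this example), and it flags samples where the automation is using most of its authority but has not yet disengaged.

```python
from dataclasses import dataclass


@dataclass
class AuthoritySample:
    """One autopilot sample; every field is a hypothetical for this sketch."""
    time_s: float     # seconds since monitoring started
    commanded: float  # control input the autopilot is currently commanding
    limit: float      # input magnitude at which the autopilot would disengage


def authority_alerts(samples, warn_fraction=0.6):
    """Return (time, fraction) for samples where the autopilot is using at
    least `warn_fraction` of its authority but is still engaged."""
    alerts = []
    for s in samples:
        fraction = abs(s.commanded) / s.limit
        if warn_fraction <= fraction < 1.0:
            alerts.append((s.time_s, round(fraction, 2)))
    return alerts


# Illustrative numbers only; this is not data from the accident flight.
samples = [
    AuthoritySample(0.0, 2.0, 10.0),
    AuthoritySample(10.0, 5.0, 10.0),
    AuthoritySample(20.0, 7.0, 10.0),
    AuthoritySample(30.0, 9.5, 10.0),
]

for time_s, fraction in authority_alerts(samples):
    print(f"t={time_s:.0f}s: autopilot using {fraction:.0%} of its authority")
```

The threshold value and the form of the alert are design choices that would themselves need to be examined in the hazard analysis, since a poorly chosen alert can add to operator workload rather than awareness.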
National Transportation Safety Board, “In-Flight Icing Encounter and Uncontrolled Collision with Terrain, Comair Flight 3272, Embraer EMB-120RT, N265CA, Monroe, Michigan, January 9, 1997,” NTSB/AAR-98/04, November 4, 1998.
Titan IV B Failure
(posted 5/17/2012)

On April 30, 1999, a Titan IV B vehicle (Titan IV B-32) with a Titan Centaur upper stage (TC-14) was launched from Space Launch Complex 40 in Florida. The mission was to place a Milstar satellite into geosynchronous orbit. The flight performance of the Titan solid rocket motors and core vehicle was nominal, and the Centaur upper stage separated properly from the Titan IV B. The vehicle began experiencing instability about the roll axis during the first Centaur burn. That instability was greatly magnified during the Centaur’s second main engine burn, resulting in uncontrolled vehicle tumbling. The Centaur tried to compensate for the attitude errors by using its Reaction Control System, and those attempts ultimately depleted the available propellant during the transfer orbit coast phase. The third engine burn ended early because of the tumbling vehicle motion. As a result of these anomalous events, the Milstar satellite was placed in a low elliptical final orbit instead of the intended geosynchronous orbit.
The Titan IV B Accident Investigation Board concluded that a failed software development, testing, and quality assurance process for the Centaur upper stage caused the failure of the Titan IV B-32 mission. That process neither detected nor corrected a human error in the manual entry of the roll rate filter constant in the Inertial Measurement System flight software file. Evidence of the incorrect constant appeared during launch processing and the launch countdown, but its impact was not sufficiently recognized or understood, so the error was not corrected before launch. The incorrect roll rate filter constant zeroed any roll rate data, resulting in the loss of control. The Board noted that the manually input values were never formally tested in any of the simulations before launch, and that simulator testing was not performed with the system configured as it was to be flown. The investigation report also noted that the original risk assessment rated such an error as low risk because similar problems had not been seen in the past, either in testing or in operation. The organization performing the risk assessment therefore significantly underestimated the risk of the event by relying on the lack of previous failures.
Lessons Learned: Risk assessment helps identify the significant problems and focus and prioritize the resources needed to fix them. When risk assessment is not rigorous or is performed improperly, decision makers may not fully understand the potential for harm or the likelihood of a catastrophic event. Therefore, every attempt should be made to validate analysis inputs and to allow for independent review of the results of any risk assessment. In addition, analyses alone should not be used for safety decisions. Analyses should be supported by testing, accepted industry standards, validated processes, and sufficient design margin to assure that the risk has been reduced.
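The kind of check that could catch a manual entry error like this one can be sketched in a few lines. The example below is a minimal illustration under stated assumptions, not the Centaur program's actual process: it assumes an independently derived set of reference constants is available, and the function name, constant names, and values are all hypothetical.

```python
import math


def check_entered_constants(entered, reference, rel_tol=1e-6):
    """Compare manually entered constants against independently derived
    reference values. Returns a list of (name, expected, actual) tuples for
    every constant that is missing or outside the relative tolerance."""
    mismatches = []
    for name, expected in reference.items():
        actual = entered.get(name)
        if actual is None or not math.isclose(actual, expected, rel_tol=rel_tol):
            mismatches.append((name, expected, actual))
    return mismatches


# Hypothetical values for illustration; not the actual flight constants.
reference = {"roll_rate_filter": -2.5, "pitch_rate_filter": 1.25}
entered = {"roll_rate_filter": -0.25, "pitch_rate_filter": 1.25}  # misplaced decimal

for name, expected, actual in check_entered_constants(entered, reference):
    print(f"MISMATCH {name}: expected {expected}, entered {actual}")
```

A comparison like this only helps if the reference values come from a source independent of the person doing the entry, and it complements rather than replaces testing the loaded file in simulation as it will be flown.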
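A short calculation shows why a lack of previous failures is weak evidence of low risk. The sketch below assumes independent, identical trials, which is an idealization and not something the investigation report claims. Under that assumption, observing zero failures in n trials only bounds the per-trial failure probability at roughly 3/n with 95% confidence (the statistical "rule of three"). The function name is illustrative.

```python
def zero_failure_upper_bound(n_trials, confidence=0.95):
    """Upper confidence bound on the per-trial failure probability after
    observing zero failures in n_trials independent, identical trials.

    Solves (1 - p) ** n_trials = 1 - confidence for p.
    """
    if n_trials <= 0:
        raise ValueError("n_trials must be positive")
    return 1.0 - (1.0 - confidence) ** (1.0 / n_trials)


for n in (10, 100, 1000):
    bound = zero_failure_upper_bound(n)
    print(f"{n:>5} failure-free trials: p could still be as high as {bound:.4f} "
          f"(rule of three: {3 / n:.4f})")
```

Even 100 failure-free trials leave room for a failure probability of about 3 percent per trial, which is far from negligible for a catastrophic outcome.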
Leveson, N., “The Role of Software in Spacecraft Accidents,” AIAA Journal of Spacecraft and Rockets, vol. 41, no. 4, 2004, pp. 564-575.
Pavlovich, J. Gregory. 1999. Formal Report of Investigation of the 30 April 1999 Titan IVB/Centaur TC-14/Milstar-3 (B-32) Space Launch Mishap. Washington, D.C.: U.S. Air Force.
Airplane Crash Near Nova Scotia
(posted 5/10/2012)

On September 2, 1998, Swissair flight 111, traveling from New York to Geneva, Switzerland, crashed in the Atlantic Ocean near Halifax, Nova Scotia. All 229 people on board, 215 passengers and 14 crew members, were killed. The Transportation Safety Board of Canada found in its investigation that a fire had started above the cockpit ceiling before the accident, likely ignited by arcing in the in-flight entertainment network wiring. The fire spread, ultimately causing loss of all navigation equipment and eventual loss of control of the aircraft. The fire was fueled by the acoustic insulation blankets used in the ceiling, which were constructed with a metallized polyethylene terephthalate (MPET) cover material. Later tests of MPET found that, once ignited, this material would be consumed by fire. Although the insulation blankets had passed the FAA’s flammability certification tests, the tests themselves were later found to be insufficient. The Transportation Safety Board also found that no smoke and fire detection or suppression devices were required in the ceiling area, and none were installed. Had such devices been installed, the pilots might have been alerted to the problem and the fire might not have spread as rapidly as it did. According to the Transportation Safety Board, the crew also had no procedures or training to assist in fire detection and firefighting. The board further found seven occurrences between 1993 and 1999 in which these insulation blankets had ignited and burned on aircraft. The Civil Aviation Administration of China investigated three of these incidents and provided the FAA with a report documenting the flammability of MPET material. This report alerted the FAA to the concern and recommended that the insulation material be replaced. The FAA stated in its response that it would perform its own investigations, but it did not require additional testing of the MPET material. Such additional testing might have uncovered the flammability issue.
On its own, aircraft manufacturer McDonnell Douglas had concluded that an expanded set of test conditions was needed to determine blanket flammability characteristics. Based on the results of its tests, McDonnell Douglas discontinued the use of MPET insulation blankets in production aircraft. However, use of these insulation blankets continued throughout the industry until after the Swissair accident, when the FAA issued a directive requiring the removal of such blankets from existing aircraft.
Lessons Learned: Analyses after accidents often show that clues existed before the mishap occurred. Such clues frequently take the form of anomalies observed during the life cycle of a project. An anomaly is an apparent problem or failure that occurs during verification or operation and affects a system, a subsystem, a process, support equipment, or facilities. Anomaly or problem reporting and corrective action can therefore play an important role in system safety analyses. An effective anomaly reporting and corrective action process not only allows problems to be reported, but also implements a closed-loop process for finding and fixing the root cause of each problem. In the case of this accident, if the earlier insulation blanket fires had been properly reported and analyzed, the accident might have been prevented.
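The "closed-loop" property described above can be sketched as a record that cannot be closed until the loop is complete. This is a minimal illustration with hypothetical field names, not a description of any agency's or manufacturer's reporting system. It assumes closure requires three things: an identified root cause, a corrective action, and verification that the action worked.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AnomalyReport:
    """A hypothetical anomaly record that enforces closed-loop closure."""
    description: str
    root_cause: Optional[str] = None
    corrective_action: Optional[str] = None
    fix_verified: bool = False
    status: str = "open"

    def close(self):
        """Close the report only if every step of the loop is complete."""
        missing = []
        if not self.root_cause:
            missing.append("root cause")
        if not self.corrective_action:
            missing.append("corrective action")
        if not self.fix_verified:
            missing.append("verification of the fix")
        if missing:
            raise ValueError("cannot close: missing " + ", ".join(missing))
        self.status = "closed"


# Illustrative usage with made-up content.
report = AnomalyReport("Insulation blanket ignited during maintenance")
try:
    report.close()
except ValueError as err:
    print(err)

report.root_cause = "Cover material propagates flame once ignited"
report.corrective_action = "Replace cover material; expand flammability tests"
report.fix_verified = True
report.close()
print(report.status)
```

Refusing closure with an explicit list of what is missing keeps a report from being filed and forgotten, which is the failure mode the lesson describes.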
Transportation Safety Board of Canada, “In-Flight Fire Leading to Collision with Water, Swissair Transport Limited, McDonnell Douglas MD-11 HB-IWF, Peggy’s Cove, Nova Scotia 5 nm SW, 2 September 1998,” Report Number A98H0003, September 2, 1998.
Visit this page weekly for a new system safety lesson learned.