Effective system safety and emergency management efforts require learning from both failure and success. Lessons learned will be presented here, often illustrated through an accident or incident. Note that in discussing these events, the intent is not to oversimplify the conditions that led to the incidents or to place blame on individuals and organizations. Rarely is there only one identifiable cause leading to an accident. Accidents and incidents are usually the result of complex factors that include hardware, software, human interactions, procedures, and organizational influences. Readers are encouraged to review the full investigation reports referenced to understand the often complex conditions that led to each accident discussed here.
Wharf Collision in Sydney
On March 4, 2005, the ferry Collaroy, operated by the Sydney Ferries Corporation, collided with a wharf in Sydney Cove, Australia. There were no passengers on board at the time, and the crew was not injured. The ferry received minor damage, but the backboards at the wharf were extensively damaged. The collision occurred when the master of the vessel was unable to stop the ferry. The primary control of the propulsion system failed, and the backup systems and warning systems were inoperative as well, leading to loss of control.
The Collaroy was equipped with a propulsion control system that relied on four Programmable Logic Controllers (PLCs). Two PLCs were assigned to each propeller, so that each propeller always had one main PLC and one backup. The system was designed so that if one of the PLCs failed, the control system automatically switched over to the backup unit. At the time of the loss of control, one of the PLCs had failed because of a failed electronic circuit in one of its logic cards. The system should then have reverted to the backup PLC. However, the accident investigation team from the Australian Office of Transport Safety Investigations discovered that the backup PLCs were not turned on. Because the backup systems were not turned on, there was no warning alarm of a PLC failure. Without a backup system, propulsion control was lost when the primary PLC failed.
The ferry company informed the investigators that before the accident it had experienced faults in the PLC electronic card circuitry. These faults occurred because of repeated cycling from being turned on and off, which heated up the circuits. The faults triggered loud warning alarms, which became a nuisance to the crew. An alternate PLC start-up procedure was therefore adopted, but this procedure may have fooled the crew into thinking that the backup units were on when they had not started.
The report called for improved risk assessments of the propulsion control system and additional crew training for emergency situations.
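The main/backup arrangement described above can be illustrated with a short sketch. This is a hypothetical model written for this discussion, not a reconstruction of the Collaroy's control system: the names `PlcState`, `PropellerControl`, and the alarm strings are all assumptions. It shows one design choice that follows from the account: treating an unavailable backup as an alarm condition in its own right, so that loss of redundancy is visible before the main unit fails.

```python
from dataclasses import dataclass
from enum import Enum


class PlcState(Enum):
    OFF = "off"
    RUNNING = "running"
    FAILED = "failed"


@dataclass
class PropellerControl:
    """Hypothetical model of one propeller's main/backup PLC pair."""
    main: PlcState
    backup: PlcState

    def active_unit(self):
        """Return which unit is in control, or None if control is lost."""
        if self.main is PlcState.RUNNING:
            return "main"
        if self.backup is PlcState.RUNNING:
            return "backup"
        return None

    def alarms(self):
        """List alarm conditions, including loss of redundancy."""
        raised = []
        if self.main is not PlcState.RUNNING:
            raised.append("MAIN_PLC_UNAVAILABLE")
        if self.backup is not PlcState.RUNNING:
            # Assumed design choice: a backup that is off is itself an alarm,
            # raised even while the main unit is still healthy.
            raised.append("BACKUP_PLC_UNAVAILABLE")
        if self.active_unit() is None:
            raised.append("PROPULSION_CONTROL_LOST")
        return raised


# Main unit running, backup never started.
pair = PropellerControl(main=PlcState.RUNNING, backup=PlcState.OFF)
print(pair.alarms())       # ['BACKUP_PLC_UNAVAILABLE']

# The main unit then fails with no backup to take over.
pair.main = PlcState.FAILED
print(pair.active_unit())  # None
```

In this sketch the missing backup is reported while the vessel is still controllable, which is the point at which a crew could act on it.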
Lessons Learned: As systems become operational, changes are often made in processes and procedures. Operational modifications may not only change the nature of the known hazards, but they can also introduce new hazards. The hardest to control are those procedural changes that are gradual. Small changes may be made to the operation as more is learned until the resulting processes (and associated hazards) are much different than those originally envisioned. If the reasons behind the original procedures are not documented, then changes may be made to processes or procedures without understanding the potential to increase risk. Note that operational changes can include changes to personnel, contractors, and management structure.
Office of Transport Safety Investigations, “Collision of the Manly Ferry Collaroy Number 3 West Wharf, Circular Quay, 4 March 2005,” OTSI File Ref: 03545, November 25, 2005.
Explosion in Ohio
On May 4, 2009, two employees were seriously injured and two others sustained minor injuries from an explosion at Veolia ES Technical Solutions, LLC, in West Carrollton, Ohio. The initial explosion was followed by multiple secondary explosions that damaged every structure on the company site. In addition, residences and businesses in the surrounding area were damaged by the explosions. Veolia provided hazardous waste services for industrial and municipal customers. This facility received waste products, both hazardous and non-hazardous, which were typically spent solvents from industrial generators. After the waste products were distilled, the clean solvent could be sold to other industrial users.
Immediately prior to the explosion, the operating crew had shut down a tetrahydrofuran (THF) solvent recovery process per procedure. After this process was completed, the pipes required cleaning. The crew performed this cleaning function by back-blowing nitrogen gas through the piping into the dirty tank.
The U.S. Chemical Safety and Hazard Investigation Board (CSB) investigated the accident. The CSB accident report stated that employees heard a loud vapor release just before the explosion, and they detected strong THF odors. The CSB found that the tanks were equipped with relief devices to protect them from overpressure. However, the relief devices vented directly to the atmosphere. This uncontrolled venting allowed highly flammable THF vapors to accumulate to explosive concentrations outside the process equipment. The THF should not have vented during normal cleaning operations, but the CSB stated that it was possible that the dirty tank was not manifolded properly, allowing overpressurization of the tank. Another possibility was that accumulated THF residue became active when exposed to oxygen. Two natural gas-fired boilers in a nearby lab/operations building likely served as the ignition source.
The CSB found no record of a process hazard analysis to evaluate the siting of a lab/operations building so close to the operating units. The Occupational Safety & Health Administration (OSHA) issued citations for numerous violations following this accident, including failure to conduct compliance audits every three years to ensure that policies and procedures for hazardous chemicals were being followed, worker training deficiencies, inadequate testing and inspections of piping and processes, and a lack of written standards for operating procedures. The CSB recommended that the company conduct a systematic process hazard analysis on all OSHA Process Safety Management covered processes to ensure all buildings and structures at the West Carrollton facility were located and designed in accordance with electrical classification and spacing as defined in appropriate standards.
Lessons Learned: Numerous accidents have illustrated that failing to implement a systematic approach to safety can lead to unidentified hazards, underestimation of risks, and ineffective mitigation measures that ultimately result in undesired consequences and poor responses to an emergency. A systematic approach means that the effort is planned, documented, and formalized using appropriate analysis tools and techniques.
U.S. Chemical Safety and Hazard Investigation Board, “Case Study, Explosion and Fire in West Carrollton, Ohio, Veolia Technical Solutions, LLC, West Carrollton, Ohio, May 4, 2009,” Report No. 2009-10-I-OH, July 21, 2010.
Spacecraft Inadvertent Firing
The Viking program was a NASA project to send two spacecraft to Mars in 1975. The first spacecraft was launched on August 20, 1975. While the second spacecraft was being prepared for launch, an anomaly caused the Reaction Control System (RCS) thrusters to fire during pre-launch testing. The inadvertent firing was traced to a software problem. The flight software contained a “safing” sequence that automatically enabled the RCS and its thrusters when an anomaly was detected, to assure that the spacecraft was put in a recoverable state. During testing, the Attitude Control System sensed the Earth’s rotation, which it interpreted as a problem because it differed from what was expected on orbit. The software therefore activated the RCS and fired the thrusters to compensate for the perceived problem. Test personnel were able to disable the thrusters; the spacecraft was not damaged, and no one was injured. The decision was made to continue with the flight, and the second spacecraft was launched successfully on September 9, 1975. However, the investigation team concluded that future spacecraft should consider the potential for automated stored commands to be exercised during system test or pre-launch phases.
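One way to account for stored commands being exercised on the ground is to make the automated response depend on the mission phase. The sketch below is purely illustrative and is not the Viking flight software: the `Phase` values, the `safing_response` function, and the action names are all assumptions introduced here. It shows a safing routine that actuates thrusters only in flight and otherwise reports the anomaly to test personnel.

```python
from enum import Enum


class Phase(Enum):
    GROUND_TEST = "ground_test"
    PRE_LAUNCH = "pre_launch"
    IN_FLIGHT = "in_flight"


# Assumption: only in flight may stored commands actuate hazardous hardware.
HAZARDOUS_COMMANDS_ALLOWED = {Phase.IN_FLIGHT}


def safing_response(phase, anomaly_detected):
    """Return the actions a hypothetical safing sequence would take."""
    if not anomaly_detected:
        return []
    if phase in HAZARDOUS_COMMANDS_ALLOWED:
        return ["ENABLE_RCS", "FIRE_THRUSTERS"]
    # On the ground, record the anomaly and alert people
    # instead of actuating the thrusters.
    return ["LOG_ANOMALY", "NOTIFY_TEST_CONDUCTOR"]


# An attitude anomaly sensed during ground testing does not fire thrusters.
print(safing_response(Phase.GROUND_TEST, anomaly_detected=True))
# The same anomaly in flight triggers the full safing response.
print(safing_response(Phase.IN_FLIGHT, anomaly_detected=True))
```

A guard of this kind adds its own hazard, since a wrongly set phase flag could inhibit safing in flight, so the guard would need to be covered by the hazard analysis as well.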
Lessons Learned: New hazards can be introduced in the verification process if the condition of the system is not properly understood. For example, power could still be supplied to a control panel that is being serviced, creating a shock hazard to personnel performing maintenance. Testing can cause an unexpected change in the operational configuration or can stress a component, leading to failures later. Hazards associated with testing, including those associated with automated test equipment, should be analyzed. Organizations should perform test readiness reviews for complex systems. Test readiness reviews assure that systems are indeed ready for testing and that the system being verified is of the proper configuration and safe for operation.
National Aeronautics and Space Administration, “Design Development Test and Evaluation (DDT&E) Considerations for Safe and Reliable Human Rated Spacecraft Systems,” Volume 2, NASA/TM-2008-215126, April 2008.