Effective system safety and emergency management efforts require learning from failure, and from success. Lessons learned will be presented here, often illustrated through an accident or incident. Note that in discussing these events, the intent is not to oversimplify the conditions that led to the incidents or to place blame on individuals and organizations. Rarely is there only one identifiable cause leading to the accident. Accidents and incidents are usually the result of complex factors that include hardware, software, human interactions, procedures, and organizational influences. Readers are encouraged to review the full investigation reports referenced to understand the often complex conditions that led to each accident discussed here.
Pipeline Rupture in Manitoba
On July 29, 1995, a natural gas pipeline belonging to TransCanada PipeLines Limited ruptured near Rapid City, Manitoba, leading to a fire. Later that day, a second rupture and fire occurred in an adjacent natural gas pipeline. The Transportation Safety Board of Canada (TSB) determined that the initial rupture was caused by a ductile overload fracture as a result of external stress corrosion cracking. The second line rupture was the result of heat overload from the initial fire and the delay in shutting down the first line when the rupture occurred. Immediately after the first rupture, the Winnipeg Regional Operations Controller attempted to initiate a computer-controlled emergency shutdown command to the Rapid City compressor station. Such an action would have isolated the flow of natural gas to the site and may have prevented the second pipe rupture. However, the controller was unable to isolate the flow, in part because wiring used to control and shut down the compressor station facilities and valves had sustained damage. In addition, the Supervisory Control and Data Acquisition (SCADA) system had not been designed or programmed to provide feedback to operations personnel on which lines had been isolated and which had not. The report also stated that the SCADA system was equipped with a feature that reopened valves after 15 minutes to allow restart of the line, and that the system did not include the capability to override this feature in an emergency. As a result, the controllers had to reissue commands every 15 minutes to keep the valves closed, adding to their workload during the emergency. Eventually, operations personnel were able to manually isolate the natural gas lines to extinguish the fire.
Lessons Learned: Those who operate the system, manage the system, or are at risk should be involved in the hazard analysis process so they will better understand the potential for harm. Operators and stakeholders may contribute valuable input to the process as well, providing suggestions that improve safety based on their real-world experience. In this example, operators could have provided input on which lines needed monitoring in an emergency, or on which overrides were critical.
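To make the two SCADA gaps in this incident concrete, the following is a minimal Python sketch of a valve model with a timed auto-reopen rule, an emergency hold that suppresses it, and a simple isolation report for the operator. It is an illustration only, not the logic of the actual system: the names `Valve`, `emergency_hold`, `release_hold`, `tick`, `isolation_report`, and `demo`, as well as the line names, are hypothetical. The only detail taken from the report is the 15-minute reopen interval. The sketch tracks commanded state only; a real system would also need confirmed valve position from the field, which matters when control wiring is damaged.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# The 15-minute reopen interval described in the report, in seconds.
AUTO_REOPEN_SECONDS = 15 * 60


@dataclass
class Valve:
    """Hypothetical model of one remotely operated isolation valve."""
    name: str
    closed: bool = False
    closed_at: Optional[float] = None  # time the auto-reopen timer started
    emergency_hold: bool = False       # when True, auto-reopen is suppressed

    def close(self, now: float, emergency: bool = False) -> None:
        """Close the valve; an emergency close latches the hold."""
        self.closed = True
        self.closed_at = now
        self.emergency_hold = emergency

    def release_hold(self, now: float) -> None:
        """Operator ends the emergency; the reopen timer restarts from now."""
        self.emergency_hold = False
        self.closed_at = now

    def tick(self, now: float) -> None:
        """Apply the auto-reopen rule unless an emergency hold is latched."""
        if not self.closed or self.emergency_hold:
            return
        if now - self.closed_at >= AUTO_REOPEN_SECONDS:
            self.closed = False
            self.closed_at = None


def isolation_report(valves: List[Valve]) -> Dict[str, str]:
    """Operator feedback: which lines are isolated and which are not."""
    return {v.name: ("isolated" if v.closed else "NOT isolated") for v in valves}


def demo() -> Dict[str, str]:
    """Close two hypothetical lines, only one with the emergency hold."""
    held = Valve("line-A")
    timed = Valve("line-B")
    held.close(now=0.0, emergency=True)
    timed.close(now=0.0)
    # Sixteen minutes later, only the valve without the hold has reopened.
    for valve in (held, timed):
        valve.tick(now=16 * 60)
    return isolation_report([held, timed])
```

In this sketch, `demo()` reports `line-A` as isolated and `line-B` as not isolated: the latched hold removes the need to reissue a close command every 15 minutes, and the report gives the operator a single view of isolation status. Whether such a hold should clear automatically or only by explicit operator action is a design decision that the hazard analysis, with operator input, would need to settle.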
Transportation Safety Board of Canada, “Natural Gas Pipeline Ruptures, TransCanada PipeLines Limited, Line 100-3, 914-millimetre (36-inch) Main Line Kilometre Post Main Line Valve 30-3 + 0.245 kilometres Line 100-4, 1,067-millimetre (42-inch) Main Line Kilometre Post Main Line Valve 30-4 + 0.220 kilometres Rapid City, Manitoba, 29 July 1995,” Report Number P95H0036, June 10, 1997.
Fire in Houston
On October 31, 2005, a fire erupted at a cooling tower at NASA Johnson Space Center in Houston, Texas, during repairs to the building. The fire destroyed the cooling tower and heavily damaged other facilities, although no one was injured in the blaze. The mishap investigation found that the fire was the result of inappropriate disposal of smoking materials, which ignited the wooden cooling tower. However, several root causes were identified in the report. First, the wooden cooling tower was not recognized as a fire hazard, and as a result “No Smoking” signs had not been posted. Second, although the cooling tower was equipped with an automatic fire suppression system, that system was inoperative at the time of the mishap. The fire suppression system was taken off line after a failure of a clapper valve in May 2005. That system was approximately 40 years old, and repair parts were not readily available. Repair parts arrived in August 2005, but installing them was not made a priority. The water to the fire suppression system remained off at the time of the mishap, and an alternate fire suppression system had not been implemented. Third, the response by the fire department was inadequate, with significant delays before firefighting personnel reached the fire. Inadequate supervision of the construction activities contributed to the accident. The report also discussed inadequate construction and safety inspections. As stated by the report, “The fact that the [Mishap Investigation Board] found evidence of smoking violations at the job site (cigarette and cigar butts) indicated that this was an item that should have been found by inspectors. The only conclusions that can be drawn are that either the inspectors never looked for evidence of past violations or that they are not trained in detecting these violations.” The investigation also found that the worksite setup inspection failed to identify the fire risks posed by the wooden cooling tower.
Lessons Learned: A number of accidents have resulted from a failure to regularly inspect equipment. Hardware and environments can change (or degrade), leading to conditions not expected when the hazard analysis was performed. Inspection criteria must be specified and objective, or hazards could be overlooked. Those criteria must also be adequate for the hazards envisioned. Incorrect or inadequate inspection criteria and procedures can lead to a misunderstanding and underestimation of the risk.
National Aeronautics and Space Administration, “Johnson Space Center Building 49 Cooling Tower Fire Mishap Investigation Board Final Report,” JSC Mishap # 06-0018, IRIS II # 2005-306-00006, December 23, 2005.
Explosion in Texas
On June 22, 1997, an explosion occurred at the Shell Chemical Company plant in Deer Park, Texas. The facility produced a number of petroleum intermediates by processing crude petroleum feed stocks. Although no one was killed in the explosion, several workers received minor injuries and the facility and nearby residences were extensively damaged. The U.S. Environmental Protection Agency (EPA) and U.S. Occupational Safety and Health Administration (OSHA) jointly investigated the accident. The EPA/OSHA team found that the cause of the accident was the failure of a check valve located on a high-pressure light hydrocarbon gas line. The check valve failure started a large flammable gas leak; the escaping gas then formed a vapor cloud which ignited. The report stated that the check valves had not been properly designed and manufactured for heavy duty service, and they were susceptible to failure during normal use. There were check valve failures prior to this accident, but the EPA/OSHA report stated that lessons from the prior failures had not been properly shared and implemented. These prior incidents were treated as maintenance actions and therefore no formal investigations were conducted to determine root cause. A process hazard analysis had been performed, but this analysis did not include failure of a check valve, and therefore mitigations were not implemented for such failures. Procedures were also found to be inadequate; those procedures did not instruct operators to verify the valve positions prior to restarting the process.
Lessons Learned: Analyses following accidents often show that clues existed before the mishap occurred. Such clues frequently take the form of anomalies, failures, and minor incidents observed during development of a new system or operation of an existing one. These may include not only technical failures but also issues associated with the response to natural or man-made events. Analyses of previous problems and incidents can play an important role in system safety analyses. A process should be in place to analyze the root cause of those problems and then factor the corrective and preventive actions back into the hazard analysis and into system design and operations.
U.S. Environmental Protection Agency/U.S. Occupational Safety and Health Administration, “Joint Chemical Accident Investigation Report: Shell Chemical Company, Deer Park, TX,” EPA 550-R-98-00, June 1998.
Visit this page regularly for a new system safety lesson learned.