Current issues, opinions, and mini-tutorials on system safety and emergency management.
Readings in System Safety and Upcoming Events are also provided on this page.
May 9, 2013
Georgian Express photo courtesy AirDisasters.com
There are many organizations that seem to confuse quality and safety. Quality and safety are certainly related, but they are distinct concepts. Quality is conformance to requirements that, when met, yield a product fit for its intended use. Quality assurance is the set of planned and systematic activities implemented in a system to ensure that requirements for a product or service will be fulfilled. Quality efforts therefore seek to assure that customer needs are met and that the system is free of defects. Quality assurance includes, but is not limited to, auditing, process verification, analysis of defects, statistical process control, sampling, analysis of variance, and design of experiments. Quality assurance certainly plays a role in safety, especially in verifying that requirements have been met. But a system can meet its customer’s needs, show no signs of defect, and still be unsafe. A process can be in statistical control, meaning it shows no signs of unusual variation, and yet be performing in an unsafe manner. And a system can meet all of its requirements and still do the wrong thing. Consider the following examples.
On January 17, 2004, Georgian Express flight 126 crashed on takeoff from Pelee Island, Ontario, on a flight to Windsor, Ontario. The Cessna 208B Caravan aircraft was carrying one pilot and nine passengers; all on board perished in the accident. The Transportation Safety Board of Canada (TSB) investigated the accident and found that the aircraft was contaminated with ice. In addition, at takeoff the aircraft was considerably overweight. The overweight condition, the ice accumulation, and the prevailing weather conditions meant that the aircraft was being flown outside its design envelope. In its investigation, the TSB found that the pilot had calculated the aircraft weight using standard passenger weights available in the Aeronautical Information Publication. However, as stated in the report, the standard weights “did not reflect the increased average weight of passengers and carry-on baggage resulting from changes in societal-wide lifestyles and in travelling trends.” The difference between the actual and calculated weights of the passengers was about 570 lb., and the aircraft was found to be a total of 1270 lb. overweight. The aircraft thus met its requirements, but the requirements themselves were flawed.
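To make the report’s arithmetic concrete, here is a minimal sketch in Python of the kind of weight calculation involved. Only the totals (the 570 lb. passenger-weight gap, the 1270 lb. overweight figure, and the nine passengers) come from the TSB report; the standard per-passenger weight used below is an illustrative assumption, not a figure from the report.

```python
# Illustrative weight arithmetic for Georgian Express flight 126.
# Only the totals (570 lb gap, 1270 lb overweight, 9 passengers) come
# from the TSB report; the standard per-passenger weight is assumed.

NUM_PASSENGERS = 9
STD_PAX_WEIGHT_LB = 200.0      # assumed standard passenger weight (illustrative)
PAX_WEIGHT_GAP_LB = 570.0      # TSB: actual vs. calculated passenger weight
TOTAL_OVERWEIGHT_LB = 1270.0   # TSB: total amount over maximum takeoff weight

calculated = NUM_PASSENGERS * STD_PAX_WEIGHT_LB   # what the pilot computed
actual = calculated + PAX_WEIGHT_GAP_LB           # what the passengers weighed

print(f"Calculated passenger weight: {calculated:.0f} lb")
print(f"Actual passenger weight:     {actual:.0f} lb")
print(f"Gap per passenger:           {PAX_WEIGHT_GAP_LB / NUM_PASSENGERS:.0f} lb")
print(f"Total overweight at takeoff: {TOTAL_OVERWEIGHT_LB:.0f} lb")
# Every step conforms to the published requirement (standard weights), yet
# the aircraft was 1270 lb overweight: the requirement itself was flawed.
```

The point of the sketch is that no audit of the calculation would have flagged the danger; the calculation conformed to its requirement, and the requirement was wrong.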
On December 19, 2007, the tug Flying Phantom sank while assisting the bulk carrier Red Jasmine in transit on the River Clyde in Scotland. Three Flying Phantom crew members died in the accident. The Marine Accident Investigation Branch (MAIB) report stated that just prior to the accident, Red Jasmine made a turn that the tug’s crew did not see because of fog. By the time they realized what was happening, Flying Phantom was being pulled over by its own tow rope, a situation known as “girting.” Flying Phantom was equipped with an emergency release mechanism for the towing winch, but the release did not occur quickly enough to prevent the tug from capsizing. In its testing after the accident, the MAIB found a delay in the release mechanism that was made worse by heavier loads. The report also stated that there were no procedures, training, or limits for towing in restricted visibility. In this case the fog was so thick that the crew of Flying Phantom could not see Red Jasmine. The operator of the tug did have a Safety Management System (SMS), but the information provided in the SMS documentation was generic and did not contain information specific to Flying Phantom. For example, the SMS did not cover key processes such as towing. The report also stated that the operator over-relied on its ISO 9001 quality management system to ensure that the SMS was appropriate. As stated in the report, “ISO9001 is a quality management system, the aim of which is to verify that a company or organisation is following its procedures correctly. In itself, ISO9001 does not necessarily check whether the procedures are correct or appropriate and, in this case, it did not provide a means of checking that the underpinning risk assessments were adequate or that all necessary procedures were in place.” The MAIB stated that the risk assessments were in fact “immature” and that the controls and safety measures were ineffective as a result. The MAIB recommended that the operator appoint someone with responsibility for safety to proactively evaluate risks and safety shortcomings.
System safety does not simply examine customer satisfaction, statistical control, and conformance to requirements; it also asks whether the system can cause harm and whether conditions exist that could lead to an accident. In the case of Georgian Express flight 126, the requirements were met, but they did not reflect reality; quality assurance methods to verify that requirements had been met would therefore have been no help in preventing the accident. In the case of Flying Phantom, a strong ISO 9001 quality system was incorrectly assumed to suffice for safety management, leading to a failure to adequately evaluate risk in a complex system. Poor quality can certainly increase risk, and a strong quality assurance effort supports safety, but the differences between quality and safety must be understood to avoid unexpected outcomes.
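To illustrate the “in control but unsafe” distinction numerically, here is a minimal Python sketch with invented numbers (none of these values come from either investigation): a process whose readings sit comfortably inside its own three-sigma control limits, so a control chart raises no alarm, while every reading exceeds a separately defined safety limit.

```python
# Minimal sketch: a process can be in statistical control yet unsafe.
# All numbers are invented for illustration.
import random
import statistics

random.seed(1)

SAFETY_LIMIT = 100.0  # hypothetical safety limit for the process variable

# A stable process running around 104 with little variation.
readings = [random.gauss(104.0, 0.5) for _ in range(50)]

mean = statistics.fmean(readings)
sigma = statistics.stdev(readings)
ucl, lcl = mean + 3 * sigma, mean - 3 * sigma  # Shewhart 3-sigma limits

out_of_control = sum(1 for x in readings if not (lcl <= x <= ucl))
unsafe = sum(1 for x in readings if x > SAFETY_LIMIT)

print(f"Control limits: [{lcl:.2f}, {ucl:.2f}]  Safety limit: {SAFETY_LIMIT}")
print(f"Readings outside control limits: {out_of_control}")            # likely 0
print(f"Readings above the safety limit: {unsafe} of {len(readings)}")  # all 50
```

The control chart sees only the process’s own variation; the safety limit is an independent judgment about harm. A quality audit of this process would find nothing wrong.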
Transportation Safety Board of Canada, Aviation Investigation Report, “Loss of Control, Georgian Express Ltd., Cessna 208B Caravan C-FAGA, Pelee Island, Ontario, 17 January 2004,” Report Number A04H0001, January 17, 2006.
U.K. Marine Accident Investigation Branch, “Report on the investigation of the loss of the tug Flying Phantom while towing Red Jasmine on the River Clyde on 19 December 2007 resulting in 3 fatalities and 1 injury,” Report No. 17/2008, September 2008.
April 18, 2013
In his book Mission Improbable: Using Fantasy Documents to Tame Disaster, Lee Clarke makes the point that organizations and experts often use emergency plans not for their intended purpose of assuring that an organization is prepared for disaster, but as tools designed to convince others that the organization has control over difficult, unknown, or even inherently uncontrollable processes and situations. These plans may have little value beyond persuading the organization’s critics, regulators, and other interested parties that the organization understands the problems and can solve them. Because the plans tend to be divorced from reality, Clarke argues that they can be called fantasy documents. Fantasy documents are often produced to address concerns arising from a recent major event, a new system with no precedent, or a system considerably scaled up from previous processes. Clarke uses oil spills, the threat of nuclear war, and other examples to illustrate such documents.
Clarke does not believe that these fantasy documents are the result of a conspiracy to dupe anyone; rather, they are born of ignorance of the risks and self-deception about the organization’s ability to address those risks. Organizations want to believe that they are in control, and managers often believe that accidents happen to other people, not to them. They may not even realize that the assumptions behind the plans are unrealistic and that the promises inherent in the plans cannot be kept. The real problem, of course, is that the gap between fantasy and reality is not evident until catastrophe strikes. Some organizations have so much faith in their emergency plans and their ability to respond to disaster that they fail to implement appropriate preventive measures. The public or employees may also become complacent when an emergency plan is presented to them by a reputable organization, so they too fail to prepare for a disaster. If an emergency plan has been developed for a new or scaled-up system, the organization may not have enough historical data or previous experience to provide a reality check, so no one really knows whether the plan’s key elements can be implemented until it is too late.
Recent examples of such fantasy documents come from the Deepwater Horizon explosion and oil spill in the Gulf of Mexico in 2010. Investigations into the accident found that almost all offshore drilling companies in the Gulf hired the same firm to prepare the disaster response plans they needed to meet state and federal regulations. This firm essentially prepared the same boilerplate plan for all the oil drillers, even though the operating approach and capabilities were different for each company. The plans also failed to include realistic assessments of capabilities and resources needed to respond to a large oil spill. The plans all said that the oil drillers could handle a spill much larger than the one that occurred in the Deepwater Horizon accident, even though that was clearly not the case.
Although the focus of Clarke’s book is on emergency planning, I believe that the author makes important points that can apply to other safety plans, including System Safety Program Plans, Process Safety Management plans, and Safety Management System plans. Some indicators from my own experience that a fantasy safety plan has been developed include the following:
Safety plans, including System Safety Program Plans, Process Safety Management plans, and emergency response plans, are important tools for defining how safety efforts will be implemented and for realistically assessing risk. But to be effective, these plans must reflect how business is really done, not unrealistic visions of how the organization would like it to be done. Organizations should take great care to assure that they are not developing fantasy documents.
Clarke, L., Mission Improbable: Using Fantasy Documents to Tame Disaster, University of Chicago Press, 2001.
Coll, S., Private Empire: ExxonMobil and American Power, Penguin Press, 2012.
Readings in System Safety

X-Events: The Collapse of Everything
William Morrow (2012)
In X-Events: The Collapse of Everything, author John Casti coins the term “X-event” to mean an unpredictable, rare occurrence with extreme consequences. Casti argues that X-events are occurring more frequently, in large part because of an increase in complexity. This complexity shows up in many forms, including high connectivity among infrastructures and increased layers of bureaucracy in organizations. Casti says that our conventional risk management approaches simply will not help us predict and control these X-events, for two reasons: there is no way to come up with meaningful probabilities for things that have never happened, and it is difficult to assess the damage from events we haven’t imagined. These extreme events include unexpected storms, a worldwide crash of the Internet, nuclear winter, pandemic viruses, the loss of major oil reserves, and so on. Because outlier events are so hard to predict, X-events are not usually factored into the design of systems.

As Casti states, humans are now more vulnerable than ever to X-events. He says, “The complex infrastructures we depend upon for everyday life – transportation, communication, food and water supply, electrical power, health care, to name a few – are fragile beyond belief, as we’re reminded when even a small glitch in the delivery system occurs.” The book shows how we have built systems that are highly efficient but, as a result, prone to upset.

But the book is not all gloom and doom. Casti states that we have the opportunity to solve, or at least learn to handle, these problems. Human-caused X-events are for the most part avoidable, or in the worst case their damage can be greatly reduced by human attention and preventive action. He proposes that we put effort into “designing our systems so that they work as a unified whole rather than as a collection of systems managed in isolation.” In addition, we should prepare for these events and design systems to be adaptive and resilient. In the world Casti describes, system safety, process safety management, and emergency management are needed more than ever. Casti does a terrific job of making a strong case for why system safety efforts must be strengthened to provide hope for the future.
Upcoming Events

6th International Association for the Advancement of Space Safety Conference
May 21-23, 2013
Australian System Safety Conference 2013
May 22-24, 2013
Adelaide, South Australia
ASSE Professional Development Conference & Exposition
June 24-27, 2013
Las Vegas, Nevada
31st International System Safety Conference
August 12-16, 2013
8th IET International System Safety Conference 2013
October 15-17, 2013
Safety-Critical Systems Club Symposium, SSS 2014
February 4-6, 2014
10th Global Congress on Process Safety
March 30-April 3, 2014
New Orleans, LA