The list of the 50 most interesting Wikipedia articles (and 50 more) has been blogged elsewhere. But while I was reading the second list, I stumbled across one of the coolest things I’ve seen on the Web in a while.
You may have noticed that a recurring theme of this blog is failure, mostly sports failures. That’s because failure is a sort of academic interest of mine. I gloat over the Detroit Lions going 0-16, but there’s more to it than that.
I want to know what allowed the network security breach. I want to know why the plane crashed. I want to understand what led up to the refinery explosion. I want to understand these things, because I want to know where the mistakes were made, and how to avoid repeating them. When I see the same patterns that led to the reactor meltdown repeating in my work environment, I want to be the person who steps up and says “Stop.” I want to learn from other people’s failures, so I don’t have to learn from my own.
I’m not the only person who feels that way. Books like Petroski’s To Engineer Is Human: The Role of Failure in Successful Design and Perrow’s Normal Accidents have been major influences on my thinking.
So I was very pleasantly surprised to discover, while reading the Wikipedia entry about the SL-1 incident, that NASA has something called the Process Based Mission Assurance Knowledge Management System. As best I can put it, this is where NASA manages its knowledge of failure and tries to learn from it.
One of the things you can find there is System Failure Case Studies. These are short PDF documents (about 4 pages) discussing various failures, and what lessons can be learned from them. (These documents include discussion questions, such as “Do you feel that the chronic pressure of aggressive schedules is adequately balanced with attention to safety and quality in your organization?”) Some examples:
- the loss of the USS Thresher, and the SUBSAFE program developed in response to that accident.
- the Nedelin rocket disaster.
- the loss of the R-101.
- the Mann Gulch fire. (Speaking of the Mann Gulch fire, if you haven’t read Young Men and Fire, you really need to.)
- the USS Forrestal.
- Apollo 1.
- the “Big Dig” tunnel failure.
- United 232.
- the Kansas City Hyatt Regency walkways.
- the Minneapolis bridge collapse.
- and, yes, the SL-1.
There’s also another series of documents, VITS, which appear to be PDF files of PowerPoint presentations. Some of the SFCS documents have VITS presentations that go along with them; other VITS presentations, such as this one on Hurricane Katrina, stand alone.
TJIC might argue with me (he’d probably suggest that private insurance companies have a strong motivation to do the same thing, and he’d have a pretty good point), but at some level, this is one of the things I want my government to do.
(Edited to add: And the Nets are now 0-17.)