Understanding human-caused downtime
In my thirty or so years of working in mission-critical facilities, I have studied and investigated many incidents involving human-caused downtime. Most of these incidents fall into five major groupings – all preventable.

Communication Errors

Spoken communication is tough. If you don’t believe it, just ask the people working on Siri, Dragon, or other speech-recognition software. Local slang, vernacular, pronunciations, and meanings can add confusion and misunderstanding. When I reported to my first submarine in the Navy, there were announcements being made over the boat’s PA system that I didn’t understand for a couple of weeks. Abbreviations, local designations, and “speed announcing” made them difficult to understand. Another problem I noticed: for those who had been there for some time, the announcements faded into the background noise, another very dangerous situation, especially since these were important safety announcements.

Have you ever listened to a song on the radio and then later realized the actual lyrics were something entirely different from what you thought? Our minds can play tricks on us. Oftentimes, we hear what we want to hear or expect to hear (Hearing What We Want to Hear, 4/1997, Chenausky). Add to that communications that are not clear, such as using similar-sounding letters like “C”, “B”, and “D” within spoken operational orders, and you start to appreciate the complexities that we introduce into our communications. How we communicate can add risk to our operations....
Training costs, but the lack of training costs more
In 1979, the Three Mile Island Nuclear Generating Station suffered a partial meltdown that caused billions of dollars in losses. One of the main contributors to the accident was a lack of training (Presidential Commission Report, 1979). In 2008, a powerful explosion at the Bayer CropScience plant in West Virginia killed two operators and injured eight others. This plant was producing methyl isocyanate (MIC), the same gas that killed over 4,000 in Bhopal, India. The explosion was caused in part by a lack of training (U.S. Chemical Safety Board, 2009). In a case less dramatic but possibly just as costly, Google’s reputation was tarnished and the company lost millions of dollars after an outage of all of its App Engine applications. One of the listed causes of the outage was a lack of training (Google, Inc., 2010). There are countless examples of why training is such an important investment for mission-critical facilities....
Beat Google!
When I worked for a large data center co-location provider, I had the chance to visit the headquarters of several large Internet companies. At the time, these companies were in fierce competition for the Internet search market. When I visited Yahoo, there were signs up that said “Beat Google!” At Microsoft, there were signs saying “Beat Google!” A few months later, I started work at Google and saw a sign on one of the cubicles that said “Beat Google!” Google was one of the most innovative companies I had ever worked for, and being in that environment made me realize that truly innovative companies never keep their focus on “the competition.” Instead, they focus on how they can advance their passions. Apple, under Steve Jobs, focused on his passion for elegance and simplicity – and that focus turned Apple into a company that helped define our culture. Google focuses on doing “cool” things, and in the process it changed the way we find things on the Internet and how we navigate this world. Jeff Bezos followed his passion for business – writing up the business plan for Amazon while driving from New York to Seattle – and revolutionized retail. I’m pretty sure that any business Jeff wanted to pursue would have been wildly successful. ...
Confusion in the facility…
An operator at a critical facility entered the electrical distribution room. He started to isolate part of the system for routine maintenance in accordance with an approved procedure. When he turned the switch to isolate the system, a major portion of the facility’s power was lost, taking down a large portion of the supported customers. It was determined afterward that the operator had entered the wrong distribution room and shut down power to the wrong part of the facility.

At a nuclear power plant, an operator following an approved procedure to perform maintenance on some instrumentation caused the reactor to scram, shutting down the plant. The operator mistakenly hooked up a test signal to the wrong instrumentation, so the protective circuitry saw a power spike and triggered the shutdown.

Each of these incidents was caused by confusing labeling: the first by two identical electrical distribution rooms next to each other with very small labels, the second by labels that made it difficult to tell which system a piece of instrumentation belonged to. I have seen companies spend enormous time and effort on design, procedures, and training, yet give almost no thought or effort to labeling as a way to eliminate risk. Labeling and system identification should be addressed during design and construction, but unfortunately many of our facilities were built without labeling being a priority. Luckily, labeling issues can be easily addressed after construction....
Teach someone something every day!!!
I have worked in the nuclear industry for over 20 years, and one of the major tenets of the industry is training. We train you when you first get here, we train you constantly while you’re here, and we train you whenever anybody learns anything relevant. Needless to say, all this formal training costs money. It is not uncommon for the training budget to be one quarter of the operational costs of the site. Training is mandated by law, regulations, and good practices. After all, who wants people operating nuclear power plants who are not properly trained? With all this training, I had the opportunity to work with some of the most highly trained and operationally ready people in the world. It was a wonderful experience. I believe that the consequences of failure demand this type of training routine and program. Most mission-critical facility organizations, however, cannot afford to support this type and level of training. That is the hard economic reality of our industry, so what can we do?...