Reliability-centered maintenance and data centers
Among the popular buzzwords bandied about in the data center industry today is Reliability-Centered Maintenance, or RCM. The term is now so prevalent in the industry’s marketing lexicon that it’s hard to tell which companies really understand what it is – or how to do it. In fact, many people in the industry are unaware that there is a standard by which RCM practices are measured, and a governing body that sets the standard. RCM is an engineering study conducted to determine the best course of action for maintaining a particular system or process. The Society of Automotive Engineers (SAE) defined the RCM process in its technical standard SAE JA1011, Evaluation Criteria for Reliability-Centered Maintenance (RCM) Processes (1999). The SAE standard sets out the minimum criteria that must be met before a process can legitimately be called RCM. The standard consists of seven questions that must be answered – and answered in this order – for the process to be called RCM:...
Emergency Procedures: Help or hindrance
Your fire alarm goes off. The sirens blare, the strobe lights flash, and some sort of mechanized voice keeps informing you that there is a fire and you must go to the nearest exit. Most of the people in the facility do exactly that – head for the nearest exit. But what about your facilities staff? What are they doing? I can see them now, calmly going to the emergency procedure manual, carefully reviewing the index to select the right procedure, then diligently reading and checking off each step in the procedure precisely as they were trained. Never mind that the two-page procedure incorporates a four-page checklist that would take an hour to complete, if it were actually up to date. (Well, maybe you could consider the procedure up to date if you include the five Post-It notes on various pages that add the few minor things that were left out of the original procedure – little reminders like “remember to check the power to the backup system” and the current facility manager’s correct cell phone number....
Understanding human-caused downtime
In my thirty or so years of working in mission critical facilities, I have studied and investigated many incidents involving human-caused downtime. Most of these incidents fall into five major groupings – all preventable.

Communication Errors

Spoken communication is tough. If you don’t believe it, just ask the people working on Siri, Dragon, or other speech-recognition software. Local slang, vernacular, pronunciations, and meanings can add confusion and misunderstanding. When I reported to my first submarine in the Navy, there were announcements being made over the boat’s PA system that I didn’t understand for a couple of weeks. Abbreviations, local designations, and “speed announcing” made them difficult to follow. Another problem I noticed: for those who had been there for some time, the announcements actually faded into the background noise…another very dangerous situation, especially since these were important safety announcements.

Have you ever listened to a song on the radio and then later realized the actual lyrics were something entirely different from what you thought? Our minds can play tricks on us. Oftentimes, we hear what we want to hear or expect to hear (Hearing What We Want to Hear, 4/1997, Chenausky). Add to that communications that are not clear…such as using similar-sounding letters like “C”, “B”, and “D” within spoken operational orders, and you start to appreciate the complexities that we introduce into our communications. How we communicate can add risk to our operations....