Reliability-centered maintenance and data centers

Posted on Jan 27, 2012 in Facilities Management, Incidents/Downtime

Among the popular buzzwords bandied about in the data center industry today is Reliability-Centered Maintenance or RCM.  The term is so prevalent in the industry’s marketing lexicon at this point that it’s hard to tell which companies really understand what it is – or how to do it.  In fact, many people in the industry are unaware that there is a standard by which RCM practices are measured, and a governing body that sets the standard.

Reliability-Centered Maintenance (RCM) is an engineering study conducted to determine the best course of action for maintaining a particular system or process.  The Society of Automotive Engineers (SAE) defined the RCM process in their technical standard SAE JA1011, Evaluation Criteria for Reliability-Centered Maintenance (RCM) Processes (1998).  The SAE standard sets out the minimum criteria that must be met before a process can legitimately be called RCM.

The standard consists of seven questions that must be answered – and answered in this order – for the process to be called RCM:

1.  What is the item supposed to do and what are its associated performance standards?

2.  In what ways can the item fail to provide the required functions?

3.  What are the events that cause each failure?

4.  What happens when each failure occurs?

5.  In what way does each failure matter?

6.  What systematic task(s) can be performed proactively to prevent, or to diminish to a satisfactory degree, the consequences of the failure?

7.  What must be done if a suitable preventive task cannot be found?
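One way to picture how the seven questions fit together is as a worksheet in which each answer hangs off the one before it. The sketch below is only an illustration, not anything defined by SAE JA1011: the class names, field names, and the sample chiller entries are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FailureMode:
    cause: str                            # Q3: event that causes the failure
    effect: str                           # Q4: what happens when it occurs
    consequence: str                      # Q5: in what way the failure matters
    proactive_task: Optional[str] = None  # Q6: task to prevent or reduce it
    default_action: Optional[str] = None  # Q7: fallback if no task is suitable


@dataclass
class FunctionalFailure:
    description: str                      # Q2: way the item fails its function
    modes: List[FailureMode] = field(default_factory=list)


@dataclass
class ItemFunction:
    function: str                         # Q1: what the item is supposed to do
    performance_standard: str             # Q1: how well it must do it
    failures: List[FunctionalFailure] = field(default_factory=list)


def unresolved_modes(functions: List[ItemFunction]) -> List[FailureMode]:
    """Return failure modes with no answer yet to question 6 or question 7."""
    return [
        mode
        for fn in functions
        for failure in fn.failures
        for mode in failure.modes
        if mode.proactive_task is None and mode.default_action is None
    ]


# Hypothetical entries, made up purely to show the shape of the record.
chiller = ItemFunction(
    function="Supply chilled water to the data hall",
    performance_standard="(site-specific supply temperature and flow)",
    failures=[
        FunctionalFailure(
            description="Supplies no chilled water",
            modes=[
                FailureMode(
                    cause="Compressor motor winding fails",
                    effect="Chiller trips offline",
                    consequence="Operational: loss of cooling capacity",
                    proactive_task="Periodic insulation resistance test",
                ),
                FailureMode(
                    cause="Control board fails",
                    effect="Chiller will not start",
                    consequence="Operational: loss of cooling capacity",
                ),
            ],
        )
    ],
)

open_items = unresolved_modes([chiller])
print(len(open_items), "failure mode(s) still need a task or a default action")
```

A structure like this makes it easy to see why the order matters: a failure mode cannot be recorded until the function and functional failure it belongs to have been written down.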

As you can tell from the questions, some of the answers require very complex engineering analysis.  Imagine answering these questions for the typical 500-ton chiller and you start to appreciate the complexity and level of detail required for a true RCM analysis.  Within the chiller itself there are probably over a hundred different failure modes that can prevent the machine from accomplishing its functions.

While I have not performed an RCM analysis on a 500-ton chiller, I have analyzed similar pieces of equipment.  I estimate that a complete RCM analysis of a 500-ton chiller would require approximately 100-140 hours.  At today’s engineering rates (assume $200/hr), the analysis alone would cost roughly $20,000 to $28,000.  Factor in the cost to write the associated procedures and maintenance plans for the machine and the total cost rises to somewhere in the neighborhood of $50,000.  One caution:  The analysis that is done for a chiller in Phoenix may not work for the exact same chiller in Toronto.  RCM is tied to the environment in which the equipment/system is operated.
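The arithmetic behind the analysis estimate is simple enough to check directly. The snippet below is a minimal sketch using only the figures quoted above (100-140 hours at an assumed $200/hr); the function name is hypothetical.

```python
def analysis_cost_range(hours_low, hours_high, rate_per_hour):
    """Return the (low, high) cost of the engineering analysis in dollars."""
    return hours_low * rate_per_hour, hours_high * rate_per_hour


low, high = analysis_cost_range(100, 140, 200)
print(f"Analysis cost: ${low:,} to ${high:,}")  # Analysis cost: $20,000 to $28,000
```

Swapping in your own hour estimate and local engineering rate gives a first-pass budget figure before procedures and maintenance plans are added.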

Now the good news:  For all the expense, if you are selective in your application of RCM, the effort can pay off handsomely in the form of increased system/equipment uptime or cost savings.  Each situation is different, so it’s hard to say how dramatic your results will be.  However, as an example, I had the occasion to participate in an RCM analysis on the filters for a particular facility.  We found that, instead of following the manufacturer’s recommended intervals for changing out pre-filters, in the Northwest we could extend their life by up to 100 percent.  In another circumstance, we had to shorten the recommended interval from 6 months to 1 month during certain times of the year due to the operations at a nearby farm.  The overall savings we achieved were close to $200K/yr for all the facilities, which paid for the cost of the analysis ($60K) in less than a year.

How can RCM be applied to data centers?  Deciding whether to use RCM can be an exercise in analysis in and of itself.  One thing is certain, though:  Even a fairly simple RCM analysis will cost tens of thousands of dollars, so it’s best to look for areas of maintenance where annual costs run in the hundreds of thousands of dollars.  If you can increase the life of something or reduce the likelihood that it will fail, especially when failure equates to losing hundreds of thousands of dollars, investing in RCM analysis can pay big dividends.  Some areas to investigate are chillers, filters, cooling towers, UPSs and batteries, dynamic UPSs, power plants, and the like.  My basic rule of thumb is that, if you can predict a simple return on investment (ROI) that can be expressed in months (not years), then going ahead with RCM is a pretty good bet.  Where life/safety is at risk, there is no question that RCM should always be performed.
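The rule of thumb can be turned into a quick screening calculation. The sketch below assumes savings accrue evenly through the year and treats "months, not years" as a payback of under 12 months; that cutoff and the function names are assumptions made for illustration, not part of any standard.

```python
def payback_months(analysis_cost, annual_savings):
    """Simple payback period in months, assuming savings accrue evenly."""
    if annual_savings <= 0:
        raise ValueError("annual_savings must be positive")
    return 12 * analysis_cost / annual_savings


def is_good_bet(analysis_cost, annual_savings, max_months=12):
    """Apply the rule of thumb: payback measured in months, not years."""
    return payback_months(analysis_cost, annual_savings) < max_months


# Figures from the filter example: $60K analysis, about $200K/yr in savings.
months = payback_months(60_000, 200_000)
print(f"Payback: {months:.1f} months")  # Payback: 3.6 months
print(is_good_bet(60_000, 200_000))     # True
```

By the same measure, a $50,000 analysis that saves only $20,000 a year would take 30 months to pay back and would not pass the screen.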

Keep in mind that a partial implementation of RCM can also be effective.  In a partial implementation, the analysis covers one part of a system rather than the entire system.  For example, you might evaluate only the filters of the air handling systems (the savings might run into the millions over the life of the facility).  Another approach is to analyze only selected components, chosen based on the design of your systems: for example, a what-if analysis of the power supply to the PLCs that control critical equipment, or of other components that support or are critical to your mission.

In any case, I would recommend that you talk to someone with expertise in RCM and see where you might gain from the process.  Your timely investment could provide significant cost savings or prevent costly downtime.
