Risky business – creating clarity around data center risk

Posted on Apr 3, 2014 in Facilities Management, Incidents/Downtime

What does it mean to your client when you place a static switch or UPS in bypass? Most clients won’t understand that it means their business is now at the mercy of the utility supplying power. Most clients won’t associate risk with the statement at all — let alone imagine their company on the front page of the Wall Street Journal because of an outage. When you communicate with your client about maintenance or your process, is there any way to know what they really understand?

Many years ago, when we were running one of our first data centers, we tried to come up with a way to relate what we did in facilities to our clients. Shawn Patrick came up with an idea that we started calling “Level of Readiness.” We rated the risk to our customers based on our equipment and process conditions. Ever since, I have used a very similar idea to show data center clients the level of risk our operations pose to their processes, systems, and business.

The logic is based on the different states in which the facilities equipment can exist.  From a facilities or engineering perspective, there are five basic states that critical equipment or systems can be in at any given point in time:

1. Off or secured: Not providing any service.

2. Degraded: Able to provide some part of its designed service.

3. Marginal: Able to provide the designed service, but sub-systems or backup systems are out of service.  Basically, a single point of failure or additional stress beyond the design parameters will cause the system or equipment to fail.

4. Fully Capable, but maintenance is being performed: Able to provide designed service and has backup systems that can take over if primary fails or is degraded.

5. Fully Capable, no work being performed: Able to provide designed service and has backup systems that can take over if primary fails or is degraded.

This is a good framework, but it could be more effective for clients if it were stated from their perspective:

Level 1 (red): Complete Loss – Complete loss of site and all associated services.

Level 2 (orange): Partial Loss – Loss of some services. May affect one or more entire business processes, may require the use of backup services or contingency plans, and requires evaluation to determine actions. Example: Loss of a Power Distribution Unit (PDU) or equipment rack(s).

Level 3 (yellow): Loss of Backup Protection – Condition allows for normal operations, but supporting sub-systems or backup systems are out of service. Basically, a single point of failure or additional stress beyond the design parameters can cause the system or equipment to fail, affecting the business processes being supported. Work at this level does not normally cause a loss of services but does cause a higher state of risk.

Level 4 (blue): Systems Fully Capable, Under Maintenance – Able to provide designed service and has backup systems that can take over if primary fails or is degraded.

Level 5 (green): Systems Fully Capable – No work is being performed on any supporting critical systems. Systems are able to provide designed service and have backup systems that can take over if primary fails or is degraded.

We rate each of our procedures in accordance with this scale so that our clients can associate a level of risk with each procedure. It helps them make better, more informed decisions about what backup processes are necessary to protect their business. While the system may not be perfect, it creates clarity between us on the subject of risk. It’s one way to take a complex, technical process and put it into terms the client can use to evaluate what level of risk is acceptable.
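For teams that track procedures in software, the scale can be sketched as a small data structure. This is only an illustration, not the tooling we actually use: the names `ReadinessLevel`, `Procedure`, `needs_client_review`, and `describe` are hypothetical, the two sample procedures and their ratings are my own assumptions, and the choice of Level 3 as the point where a client should review the work is an assumed default, not a rule from the scale itself.

```python
from dataclasses import dataclass
from enum import IntEnum


class ReadinessLevel(IntEnum):
    """Client-facing levels from the scale above; a lower number means higher risk."""
    COMPLETE_LOSS = 1
    PARTIAL_LOSS = 2
    LOSS_OF_BACKUP_PROTECTION = 3
    FULLY_CAPABLE_UNDER_MAINTENANCE = 4
    FULLY_CAPABLE = 5


# Color coding taken from the scale above.
LEVEL_COLORS = {
    ReadinessLevel.COMPLETE_LOSS: "red",
    ReadinessLevel.PARTIAL_LOSS: "orange",
    ReadinessLevel.LOSS_OF_BACKUP_PROTECTION: "yellow",
    ReadinessLevel.FULLY_CAPABLE_UNDER_MAINTENANCE: "blue",
    ReadinessLevel.FULLY_CAPABLE: "green",
}


@dataclass(frozen=True)
class Procedure:
    """A facilities procedure together with the level the site is in while it runs."""
    name: str
    level: ReadinessLevel


def needs_client_review(
    procedure: Procedure,
    threshold: ReadinessLevel = ReadinessLevel.LOSS_OF_BACKUP_PROTECTION,
) -> bool:
    """Return True when the procedure puts the site at or below the threshold level.

    The default threshold (Level 3) is an assumption made for this sketch.
    """
    return procedure.level <= threshold


def describe(procedure: Procedure) -> str:
    """Build a one-line summary a client could read in a maintenance notice."""
    level = procedure.level
    return f"{procedure.name}: Level {level.value} ({LEVEL_COLORS[level]})"


# Hypothetical procedures; the ratings are illustrative assumptions.
procedures = [
    Procedure("Place UPS in bypass", ReadinessLevel.LOSS_OF_BACKUP_PROTECTION),
    Procedure("Routine inspection", ReadinessLevel.FULLY_CAPABLE_UNDER_MAINTENANCE),
]

for p in procedures:
    flag = "review with client" if needs_client_review(p) else "notify only"
    print(f"{describe(p)} -> {flag}")
```

Using an ordered enumeration keeps the comparison simple: anything at or below the chosen level gets flagged, and the client can move that threshold to match the risk they are willing to accept.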

Just as tier levels have evolved and improved over the years, I see this scale evolving as our systems and processes change. I hope, by developing a meaningful standard that can quantify risk, we can help others have accurate discussions about risk in the facility. It’s important to communicate with clients about risk, but it’s even more important that they understand what you’re saying.
