Pumps & Systems, September 2007
When I was first asked to define MTBF, MTBR, and MTBPM, I wasn't sure why. Of all the myriad reliability metrics employed, I had to ask myself, "Why were these singled out?" It wasn't until I ran across the following definition from PIP that I understood:
Process Industry Practices (PIP) defines Mean Time Between Repairs as: "The most common measure of operating reliability typically stated as the average operating calendar time between required repairs for a particular piece of machinery, type of machinery, class of machinery, operating unit or plant. MTBR is not Mean Time Between: (a) Failures, (b) Planned Maintenance, or (c) any other categorization of shutdowns. MTBR calculations include Repairs due to (a) Failures, (b) Planned Maintenance, or (c) any other categorization of Repair events."
I was surprised to find all three reliability metrics of interest mentioned here. As I thought more about this PIP definition, I began to realize why these metrics are so important and why they need to be better understood.
Dealing with Dirty Data
Before I define the reliability terms in question, I want to provide some perspective on the people the PIP standard was written for: maintenance personnel. I have worked in maintenance organizations for over 20 years, so I feel somewhat qualified to present the maintenance perspective on maintenance data analysis.
Let's first consider a hypothetical pump timeline (see Figure 1).
We can see that this timeline is composed of various event types: failures, repairs, and PM events. Ideally, we would like to know how long a new or refurbished pump lasts before it fails. But there is always a trade-off between theory and practice. In reality, you are usually only able to determine the average, or mean, time between failures or repairs. Reliability theory tends to deal with failure data, while maintenance organizations deal with maintenance events. "But aren't failures and maintenance events the same thing?" you ask. My response is, "Not at all." Maintenance events fall into many categories, such as:
- Repairs to restore pumps to serviceable conditions
- Regular internal pump inspections
- Preventative maintenance events, such as oil changeouts
- Predictive maintenance events, such as data collections
- Preemptive repairs that are done before a pump actually fails
- Maintenance activities that are not associated with a pump but are credited to a pump's functional location due to the proximity of the work
Only the first category actually pertains to a known pump failure. It should be noted that the second category is deemed to be a repair by PIP if the inspection uncovers a failed or failing component.
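The categories above, together with the PIP rule on inspections, can be sketched as a small classification helper. This is only an illustration: the names `MaintenanceEvent` and `counts_as_repair` are hypothetical, and counting preemptive repairs as repairs is an assumption based on one reading of the PIP wording ("any other categorization of Repair events"), so it is exposed as a switch rather than hard-coded.

```python
from enum import Enum, auto


class MaintenanceEvent(Enum):
    """The six maintenance event categories listed above (names are illustrative)."""
    REPAIR = auto()             # restores the pump to serviceable condition
    INSPECTION = auto()         # regular internal pump inspection
    PREVENTATIVE = auto()       # e.g. oil changeout
    PREDICTIVE = auto()         # e.g. data collection
    PREEMPTIVE_REPAIR = auto()  # repair done before the pump actually fails
    UNRELATED = auto()          # work credited to the pump's location only


def pertains_to_known_failure(event: MaintenanceEvent) -> bool:
    """Only the first category pertains to a known pump failure."""
    return event is MaintenanceEvent.REPAIR


def counts_as_repair(event: MaintenanceEvent,
                     found_failed_component: bool = False,
                     count_preemptive: bool = True) -> bool:
    """Decide whether an event would be counted as a repair.

    An inspection counts only if it uncovered a failed or failing
    component. Treating preemptive repairs as repairs is an assumption,
    controlled by count_preemptive.
    """
    if event is MaintenanceEvent.REPAIR:
        return True
    if event is MaintenanceEvent.INSPECTION:
        return found_failed_component
    if event is MaintenanceEvent.PREEMPTIVE_REPAIR:
        return count_preemptive
    return False
```

For example, `counts_as_repair(MaintenanceEvent.INSPECTION, found_failed_component=True)` returns `True`, while a routine oil changeout never counts.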
To make matters more complicated, defining failures in real-world environments can sometimes be challenging. If you are running tests on light bulbs, it's easy to know when failure occurs. However, here are a few examples demonstrating the difficulties in determining what is and what is not a failure:
- You discover a seal has a one drip per hour leak. Is this a seal failure? When did it start leaking?
- During a planned pump inspection, you find the impeller has lost 50 percent of its thickness. Is it a failure? If so, when did it pass the threshold from acceptable to unacceptable?
- A pump's vibration level jumps from 0.11 ips to 0.25 ips from one pump inspection to the next. Management wants to repair the pump before things get worse. Is this a failure? When do you say it failed? Can you say it was 80 percent failed when it was removed?
One thing maintenance folks (and accountants) know for certain is when they have performed maintenance on a pump. In addition, they store their maintenance data to the point of information overload. Ask any maintenance engineer or specialist for pump maintenance data and he or she will present you with reams of it. The problem is that it's usually in a form we call "dirty data." Dirty data is an aggregate of predictive maintenance, preventative maintenance, repair and extraneous data that must be carefully culled before it is usable.
Let's look at some sample pump data in Table 1.
We have run a hypothetical report of completed work orders for Pump 101 over a 15-month period. Over that time, we see there have been 14 completed work orders (a completed work order is any work order that has been created and closed). Does this mean we have experienced 14 failures or have completed 14 repairs? Certainly not.
Any experienced maintenance person can look at Table 1 and determine which work orders represented real repairs, which ones were preventative or predictive maintenance activities, and those that were unrelated to the pump, such as the leaking suction valve. (By the way, the lack of details in Table 1 is typical of real-world data. We never have all the details required to make a fully-informed decision on the true nature of maintenance events.)
Note that I have highlighted two work orders in green that I believe represent actual repairs. So, instead of 14 repairs, we really have only two repairs that were required to return this pump to operative condition.
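To see how much the culling matters, here is a rough arithmetic sketch using only the figures quoted above (15 months, 14 completed work orders, two actual repairs). Dividing the observation period by the event count is a simplifying assumption for illustration, not necessarily the exact calculation PIP prescribes, and the function name `mean_time_between` is hypothetical.

```python
def mean_time_between(period_months: float, event_count: int) -> float:
    """Simple estimate: observation period divided by the number of events."""
    if event_count <= 0:
        raise ValueError("need at least one event to compute a mean interval")
    return period_months / event_count


PERIOD_MONTHS = 15           # length of the work order report
COMPLETED_WORK_ORDERS = 14   # every closed work order, dirty data included
ACTUAL_REPAIRS = 2           # work orders judged to be real repairs

# Treating every work order as a repair badly understates the interval.
naive = mean_time_between(PERIOD_MONTHS, COMPLETED_WORK_ORDERS)
culled = mean_time_between(PERIOD_MONTHS, ACTUAL_REPAIRS)

print(f"All work orders: {naive:.1f} months between events")   # 1.1
print(f"Repairs only:    {culled:.1f} months between repairs")  # 7.5
```

Under these assumptions, the uncleaned data suggests an event roughly every month, while the culled data gives about 7.5 months between repairs, a sevenfold difference from the same report.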
In summary, we can state that maintenance organizations: