As a cornerstone of the maintenance discipline, reliability-centered maintenance (RCM) can achieve benefits in a vast number of areas depending on where and how it is applied.
When properly implemented, RCM provides companies with a tool for achieving lowest asset Net Present Costs (NPC) for a given level of performance and risk.
This implies a cashable effect across a multitude of economic activities, covering both OPEX1 and CAPEX2.
However, RCM will also provide companies with a range of non-cashable advantages that will have a positive effect throughout the enterprise.
This document contains only a brief list of potential areas of benefit, not the entire range of potential uses of RCM. Beyond these areas, the author has previously used RCM:
- To support capital submissions in regulated industries
- To reduce the risk of legal ramifications in the management of environmental integrity
- To establish a tool for contract negotiations related to outsourced maintenance
- To reduce a company's carbon footprint
- As a means of developing troubleshooting guides
The information in this module is intended to alleviate some of the benefit anxiety that often surfaces in the early implementation stages of large-scale RCM projects, and to provide guidelines for trainee RCM analysts.
The cashable results of RCM
Direct cashable benefits from implementing RCM can emerge in every area where maintenance and operations have an effect.
This can include such disparate areas as increased uptime, decreased energy usage, reduced chemical utilization, and reductions in inventory holdings and routine maintenance spending.
Instead of trying to cover all the potential areas where the method can deliver financial results, this section will focus more on how RCM influences the profits and losses of an enterprise.
This is evident in two principal areas:
- Increases in potential revenue
- Direct cost reductions
Direct cost reductions
The main, noticeable result of RCM is a dramatic change to the maintenance regimes that are in place.
John Moubray, a pioneer in this field until his recent passing, regularly stated that RCM would achieve “a reduction of between 20% and 70% in routine maintenance where there is an existing scheduled maintenance program.”
Based on the experience of the author, this leads primarily to an increased level of cost-effectiveness of maintenance, particularly in industries that are very asset intensive.3
The team is able to claim benefits in these areas where there is a calculable reduction in the cost of labor, materials or consumables to perform maintenance4 during a reasonable amount of time (usually a year).
Logically, these are only potential benefits at the completion of the analysis, because it will take until the first omitted routine, or the first breakdown requiring reduced resources, before savings begin to accrue.
However, once implemented, they can easily be counted through direct calculation. For this to be accurate, the team needs to quantify both the routine maintenance costs and the corrective maintenance costs.
There are some real-world limitations on attempting to forecast cost reductions purely through accumulated data.
The first issue the team can face is that current maintenance regimes often do not exist in the company’s ERP or CMMS program, or are grouped there at a high level.
Data losses, poor ERP management and distrust of technology mean that experienced technicians often keep the knowledge of existing maintenance outside of corporate systems.
Further compounding the issue is the disparate way that maintenance routines are stored. At times they are held at an asset level, at other times at a maintainable item level, and at still other times at higher system or unit levels.
A second limitation is that on the occasions when RCM proposes a more rigorous policy, there is a tendency to overlook the change in reactive and corrective maintenance.i
Still, some direct cost-reduction cases are obvious and do not require a detailed activity analysis.
Every task in an RCM analysis must be both applicable, meaning it is physically possible to do the task, and effective, meaning it is worth doing in terms of cost and/or risk, before it can be selected as an adequate failure-management strategy.
When maintenance is developed using an unstructured method, there are common errors that can occur.
1) Ineffective Maintenance
One of the great misleading statistics in asset maintenance today is the calculation of the average life of bearings. This supports the outdated and almost mystical belief that there is a link between age and failure.
Based on this way of thinking, it is still common to find maintenance departments carrying out hard-time bearing replacement programs as a means of managing risk.
However, it has been the experience of the author that hard-time bearing replacement policies can increase, rather than decrease, the likelihood of failure while, at the same time, increasing direct maintenance costs.
This flies in the face of popular beliefs and is an example of how RCM thinking can drive reductions in routine maintenance levels.
The original Nowlan and Heap reportii specifically spoke about bearings when addressing failure in complex assets.
A complex item, as opposed to a simple item, is one that is subject to many failure modes. As a result, the failure processes might involve a dozen different stress and resistance considerations.
Even with complex items, failures related to age will concentrate about an average age for that mode. However, bearings have many failure modes.
Where there is no dominant failure mode5, as is the case in complex items such as most bearings, the average lives of the individual failure modes are widely dispersed along the entire exposure axis.iii Therefore, failure will be unrelated to operating age. This is a unique feature of complex items.
When deciding maintenance policies for bearings, this issue is further exacerbated by the provision of the L10 life by manufacturers. This number represents the point at which 10% of the items may have failed, meaning that 90% will have survived.
Lieblein and Zelen, in their seminal work on the subject of bearing lifeiv, found that the characteristic life, the point where statistically 63.2% of the items will have failed, was roughly five times the L10 life.
They also found that the “life” forecasts had a median Weibull beta (shape parameter) value of 1.4, indicating a near-constant probability of failure. This means that the likelihood of failure at any point in the life of the bearings in their study increased only marginally as the asset aged.
Other published analyses have quoted a beta of “1.3” for ball and roller bearings, and a beta of “one” for sleeve bearings.v
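The link between the L10 life and the characteristic life can be checked with a short calculation. The sketch below is illustrative only and is not taken from the cited study: it assumes a two-parameter Weibull model, and the function name is hypothetical. Under that assumption, a beta of 1.4 gives a characteristic life of roughly five times the L10 life, which agrees with the figures quoted above.

```python
import math


def characteristic_to_l10_ratio(beta: float) -> float:
    """Ratio of characteristic life (63.2% failed) to L10 life (10% failed).

    Assumes a two-parameter Weibull model: F(t) = 1 - exp(-(t / eta) ** beta).
    Setting F(L10) = 0.10 gives L10 = eta * (-ln 0.9) ** (1 / beta),
    so eta / L10 = (-ln 0.9) ** (-1 / beta).
    """
    return (-math.log(0.9)) ** (-1.0 / beta)


# Beta values quoted in the text: 1.0 (sleeve), 1.3 (ball and roller), 1.4 (median).
for beta in (1.0, 1.3, 1.4):
    ratio = characteristic_to_l10_ratio(beta)
    print(f"beta = {beta}: characteristic life is about {ratio:.1f} x L10")
```

The lower the beta, the further the characteristic life sits from the L10 life, so replacing bearings at the L10 life discards proportionally more useful life.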
In process manufacturing industries, we find contaminated oil to be one of the frequent reasons for early life failures. However, this is only one of the multitudes of stresses that bearings face as complex assets.
Others can include poor storage leading to false brinelling and early corrosion, excessive heat and pressure, overloading, exposure to vibration, abrasions and cracks. All of these could contribute to either early life failures or premature wearout.
Often, the L10 life is mistaken for an end-of-life point for bearings and is used as a reference interval for replacement tasks. However, as the information above shows, it is not the end of life; rather, it is the life that 90% of bearings are expected to reach or exceed under specific load conditions.
This is in line with Nowlan and Heap’s findings and shows that, in many cases, we are at best wasting a large portion of the bearings' useful life, making this an ineffective use of maintenance resources.6
Increased bearing life and decreased labor costs are not the only potential savings. By frequently replacing bearings on, say, motor shafts, we introduce the possibility of a range of additional failure modes.
For example, installation and frequent changeout failures include:
- Wear of the motor shaft, decreasing the adequacy of the interference fit and leading to bearings spinning on the shaft (a failure of the motor, not of the bearing)
- Overheating of the bearing, which leads to early life failures and distortion of the inner race
- Use of excessive force (e.g., hammers instead of bearing pullers), damaging the races of the bearings and leading to early life failures
- Bearing misalignment
- Wrong bearing selection
- Pre-failed bearings due to poor storage techniques
While we can manage some of these, others are a direct result of frequent bearing changes.
Therefore, if we use hard-time bearing replacement as a maintenance policy, we are:
a) Reducing the maximum used life of the bearing
b) Increasing the likelihood of failure through the introduction of several additional failure modes
In the RCM decision algorithmvi, a management policy for an Evident Operational and Non-Operational failure mode must comply with the following:
“Over a period of time, the failure-management policy must cost less than the cost of the operational consequences (if any) plus the total cost of repair.”
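The criterion quoted above can be expressed as a simple comparison. The following is a minimal sketch under stated assumptions, not the SAE JA1012 method itself: the function name, parameter names and all figures are hypothetical, costs are annualized, and the policy cost is assumed to include the cost of any failures that still occur under the policy.

```python
def policy_is_cost_effective(policy_cost_per_year: float,
                             failures_per_year_without_policy: float,
                             operational_cost_per_failure: float,
                             repair_cost_per_failure: float) -> bool:
    """Return True if the policy costs less than the failures it manages.

    Compares the annual cost of the failure-management policy against the
    annual cost of operational consequences plus repair if nothing is done.
    """
    cost_of_failures_per_year = failures_per_year_without_policy * (
        operational_cost_per_failure + repair_cost_per_failure
    )
    return policy_cost_per_year < cost_of_failures_per_year


# Hypothetical figures: one failure every five years, costing 30,000 in lost
# production and 5,000 in repair, so unmanaged failures cost 7,000 per year.
hard_time_ok = policy_is_cost_effective(12_000, 0.2, 30_000, 5_000)
monitoring_ok = policy_is_cost_effective(2_500, 0.2, 30_000, 5_000)
print(hard_time_ok, monitoring_ok)  # prints "False True"
```

With these invented numbers, the hard-time replacement policy costs more than the failures it is meant to prevent, while the cheaper condition-based task passes the test.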
Ineffective maintenance is more common than most professionals think. It can also include areas such as maintenance out of context, where maintenance regimes are unaligned with how the asset is used, or practices that decrease an asset's efficient operations.
Using the decision algorithm in RCM, the first option available to the team is predictive maintenance. Where this is both applicable and effective, it will increase the effectiveness of maintenance in a range of areas:
- Predictive maintenance detects the signs of the onset of failure, and it provides the capability to manage all failures, including random failures
- It can be done in situ and often without interfering with the normal operation of the process
- It will ensure that the asset utilizes all of its economically useful life, as opposed to hard-time replacements
2) Inapplicable Maintenance
The mistaken belief that there is always a relationship between age and failure leads maintenance departments to all sorts of policies that, in practice, achieve nothing.
Often, these occur during maintenance turnarounds. The opportunity to access items that are normally in a running state drives people to inspect items just in case a life-related failure mode has developed.
In particular, this, again, is a common activity in relation to bearing management.
For example, suppose a turbine turnaround occurs once every three years for other failure-management reasons.
The maintenance department has taken this opportunity to perform a dye penetrant check on the bearing to see if any cracks are starting to form, requiring them to take action.
On the face of it, this appears to be a perfectly valid, even wise, use of the opportunity. However, on applying the RCM logic a little closer, this perception changes dramatically.
For the sake of this example, we will say that the P-F interval is about three months. This means that once cracks become detectable in this particular bearing, we have around three months before functional failure.
If we test the bearing on a hard-time basis of every three years, and the P-F interval is three months, then the following logic applies.
a) The dye penetrant test is only useful if the bearing failure is occurring at the time of inspection.
b) This means it had to start developing less than three months before opening.
As we shut down every 36 months, the likelihood of this occurring (given the randomness of bearing failure) is around 1 in 12. Conversely, the likelihood of it not occurring is around 11 in 12. This task does not satisfy the RCM applicability criteria and is a waste of resources.
In addition, by opening the bearing housing and interfering with a bearing that is presumably operating fine, we again introduce the possibility of human error.vii
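The arithmetic behind this estimate can be sketched as follows. The function name is hypothetical, and the calculation assumes that the onset of a developing failure is equally likely at any point in the inspection interval, which is consistent with the random failure pattern described for bearings.

```python
from fractions import Fraction


def chance_inspection_catches_failure(pf_interval_months: int,
                                      inspection_interval_months: int) -> Fraction:
    """Chance that a developing failure is detectable at the inspection.

    Assumes failure onset is uniformly distributed over the inspection
    interval, so only onsets inside the final P-F window can be caught.
    """
    if pf_interval_months >= inspection_interval_months:
        return Fraction(1)  # inspecting at least once per P-F interval
    return Fraction(pf_interval_months, inspection_interval_months)


caught = chance_inspection_catches_failure(3, 36)
print(caught, 1 - caught)  # prints "1/12 11/12"
```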
It is difficult to categorize this maintenance practice directly; but the closest match in RCM is predictive maintenance (PTIVE).
In the RCM decision algorithm, this means the team needs to answer all of the following questions before this task is applicable:
- Is there a clear potential failure condition?
- What is it?
- What is the P-F interval?
- Is the interval long enough to take action to avoid or minimize the consequences of failure?
- Is the P-F interval reasonably consistent?
- Is it practical to do the task at intervals less than the P-F interval?
The team would be able to answer all of the above questions positively except for the last one. For the task of dye penetrant testing, it is not practical to do the task at intervals less than the P-F interval; therefore, the task is not applicable.
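The questions above can be treated as a checklist in which every answer must be positive. The sketch below is one possible way to record the answers; the class, field and function names are hypothetical, the one-month time to act is an invented figure, and the test for "long enough to take action" is simplified to a direct comparison with the P-F interval.

```python
from dataclasses import dataclass


@dataclass
class PredictiveTaskAnswers:
    """Answers to the applicability questions for one predictive task."""
    has_clear_potential_failure: bool
    pf_interval_months: float
    pf_interval_is_consistent: bool
    time_needed_to_act_months: float
    shortest_practical_task_interval_months: float


def is_applicable(answers: PredictiveTaskAnswers) -> bool:
    """A predictive task is applicable only if every question is satisfied."""
    return (
        answers.has_clear_potential_failure
        and answers.pf_interval_is_consistent
        and answers.pf_interval_months > answers.time_needed_to_act_months
        and answers.shortest_practical_task_interval_months
        < answers.pf_interval_months
    )


# Dye penetrant test from the example: cracks are a clear potential failure
# with a three-month P-F interval, but access is only practical every 36 months.
dye_penetrant = PredictiveTaskAnswers(
    has_clear_potential_failure=True,
    pf_interval_months=3,
    pf_interval_is_consistent=True,
    time_needed_to_act_months=1,  # hypothetical
    shortest_practical_task_interval_months=36,
)
print(is_applicable(dye_penetrant))  # prints "False"
```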
Inapplicable maintenance practices are widespread and, in the experience of the author, often reflect the underlying belief of a consistent relationship between age and failure.
Increases in revenue
There are two specific areas where an RCM team can claim savings.
- Where an asset, or system, has a history of failures leading to lost production opportunities. Principally, this refers to unplanned shutdowns, overrun turnarounds, and startup issues of an asset or system.
- Where an asset, or system, has a history of failures leading to reduced production output. This includes areas such as utilization, quality and reduced availability.
For example:
- Reduced turnaround times
- Increased yield (quality)
- Increased availability for full production rates
The RCM team can claim these savings only where they can prove they have isolated the cause of the lost or the reduced production and have recommended a strategy that will mitigate it or prevent it in the future.
These are potential benefits because it will take a reasonable amount of time, nominally one year, before effective measurement can prove reduced production losses.
However, it is often the case that there are noticeable increases in available uptime after implementing RCM maintenance policies.
Calculating benefits in this case requires estimating the value of additional uptime, throughput or yield, as well as the reduced costs of labor and materials.
Because these are historic failures, issues such as quantification of lost production, direct maintenance costs and the frequency of failure are relatively easy to find out.
However, an alternative is to use sophisticated forecasting techniques such as Crow-AMSAA. This is a time-proven method for forecasting failure rates, enabling the team to then calculate savings from the changes to asset maintenance. It is also a valid method for forecasting savings in direct costs.
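As a rough illustration of how a Crow-AMSAA forecast might be produced, the sketch below fits the power-law model N(t) = lambda * t^beta to cumulative failure times using the standard maximum-likelihood estimates for a time-terminated observation period. The function names and the failure data are hypothetical, and a real analysis would also check goodness of fit before relying on the forecast.

```python
import math


def fit_crow_amsaa(failure_times, observation_end):
    """Fit N(t) = lam * t ** beta by maximum likelihood (time-terminated).

    failure_times: cumulative operating times at which failures occurred.
    observation_end: total operating time observed (>= last failure time).
    """
    n = len(failure_times)
    beta = n / sum(math.log(observation_end / t) for t in failure_times)
    lam = n / observation_end ** beta
    return lam, beta


def expected_failures(lam, beta, start, end):
    """Expected number of failures between two cumulative operating times."""
    return lam * (end ** beta - start ** beta)


# Hypothetical failure history, in cumulative operating months.
failure_times = [2.0, 5.5, 9.0, 14.0, 20.5, 30.0]
lam, beta = fit_crow_amsaa(failure_times, observation_end=36.0)
forecast = expected_failures(lam, beta, start=36.0, end=48.0)
print(f"beta = {beta:.2f}, forecast failures in next 12 months = {forecast:.2f}")
```

A beta below one indicates a falling failure rate and a beta above one a rising rate. Multiplying the forecast number of failures by the cost per failure, before and after the policy change, gives one basis for a savings estimate.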
Other cashable benefits
It is the experience of the author that CAPEX, as opposed to OPEX, benefits often represent the largest cashable advantages to implementing RCM. These include:
- A delayed use of capital, compared to the pre-RCM scenario, allowing deployment elsewhere in the enterprise. This occurs through life-extension and through higher-confidence decision-making.
- A reduction in operating losses, over the life of the asset base, attributable to correct timing of capital refurbishment and replacement tasks
- A potential reduction in the cost of capital and the cost of insuring assets, due to the increased confidence in decision-making
- The incorporation of risk into the budgeting process. The benefits of this are difficult to calculate, as they depend on how the organization uses this information in the marketplace.
- A calculable reduction in inventory holdings based on the RCM approach.
There are other cashable benefits, but the above listed items represent the most common and the least debated among the reliability communities.
The non-cashable results of RCM
RCM will increase the team's awareness of the limitations and the operational requirements of the physical assets they study, often substantially. This results in the following intangible benefits:
- A reduction in the risk of safety and environmental integrity related failure modes
- Increased knowledge of the assets, their functions and their failures
- Increased ability to troubleshoot failed assets
- Changes to P&IDs specifically, and, at times, to other process drawings
- Changes to operation procedures, training, purchasing, work practices and other related areas
- A tangible increase in the quality and integrity of asset data because of the focus of RCM
However, it is often difficult, if not impossible, to measure the extent of these effects or to link them to changes in the profitability of the enterprise. At times, the effort to do this can actually distort or obscure the achievement itself.
Even so, it is possible to represent some non-cashable benefits in monetary terms. The most common of these is cost avoidance.
Risk mitigation
When the mitigated risk is economic, it is often termed cost avoidance.
Where the team has implemented a policy for a reasonably likely7 failure mode, where there was an inadequate existing strategy in place, the team is justified in claiming this as a potential benefit of RCM, even though the failure has not occurred previously.
These benefits count as non-cashable for a number of reasons:
- They will never appear as part of the profit and loss of any enterprise. Nor will they cause a change to maintenance budgets or revenues.
- The team requires estimates to calculate the cost-avoidance benefit. Some failure modes might have similar consequences, affect similar assets and have overlapping effects on production.
For example, RCM teams can find themselves presenting benefits of several times the value of the entire installation. If not explained correctly, this is a false representation, which can erode the credibility of RCM and of the team attempting to implement it.
They are nevertheless valid and important benefits for the RCM team to claim.
Note the emphasis on “an inadequate existing strategy.” RCM did not invent maintenance, and often there are adequate existing failure-management policies in place.
As an output, the team will find that some maintenance regimes will disappear, some will remain and they will add some new, more sophisticated regimes.
This occurs because some of the maintenance policies in place are redundant, some are either inapplicable or ineffective, yet others are adequate means of managing failure.
Thus, there is no justification for claiming benefits where there is an adequate existing strategy to manage the failure mode. Nor is there any justification for claiming benefits where failure modes are not reasonably likely.
Other areas of risk mitigation are failure modes that would affect either safety or environmental integrity.
In many cases, these will have direct economic consequences through regulatory penalties, or through secondary economic damages caused by the failure. Where this is the case, the team can calculate the value of the cost avoided in a similar way to failure modes with purely economic consequences.viii
Where the failure mode will not have significant economic consequences, the delta between the discovered risk and the managed risk can represent the benefit of risk mitigation.
The principal barrier to benefits realization
When taken together, these benefits are best represented by the value quadrant, a tool for representing value of reliability programs and for communicating the results.
As the focus drifts toward risk mitigation or knowledge increase, there is a tendency for the level of momentum to slow down. This is mainly due to the reduction in understanding.
The benefits of RCM are obvious to anybody who has studied it or to any maintenance practitioner who can relate to the concepts espoused in the method. All levels within the corporation generally see different advantages to RCM and there is rarely a lack of motivation for improvement.
Implementation problems commence due to fundamental misunderstandings about maintenance and the functions of physical asset managementix. This leads maintenance departments to see increased risk where it does not exist.
For example, a maintenance manager could face any of the following recommendations:
- Elimination of hard-time replacement policies where applicable and effective.
- Elimination of invasive “while we have the opportunity” inspections during planned turnarounds.
Reluctance to change comes from the perception that these recommendations are risky, and instead of implementing the policy changes, things stay as they are.
The result is more of the same.
- The risk of unplanned failure stays provably higher.
- The effectiveness of maintenance stays provably lower.
Moreover, resources remain tight because they are spent performing maintenance that is not required or repairing problems caused by the activities that are supposed to prevent them.
It is clear that before we can successfully implement the strategy outcomes of RCM, we first need to make sure that there is a deep understanding within the company about modern reliability principles.
The role of the RCM facilitator/analyst
In a time of continual change, the ability to implement is one of the most prized and sought-after skill sets.
When training RCM facilitators, I have always highlighted the importance of momentum and the vital role of benefit awareness in creating momentum.
RCM often requires the cooperation of a range of departments, including purchasing/stores, human resources/training, operations, maintenance and the engineering department.
In the experience of the author, initiatives are not successful over the medium to long term when companies try to order change. If you want to change the way an organization works fundamentally, then people have to want to change.
For this to happen, they need to understand the logic behind RCM, and they must understand what the benefits are to them in their present role. One of the useful tools for engaging people is a solid, fact-based benefits case for every analysis that is completed.
If it is to be effective, this task should commence during the analysis period itself, and the case should be presented before implementation.
1 OPEX — Operational Expenditure
2 CAPEX — Capital Expenditure
3 Asset-Intensive — Industries where asset maintenance and asset replacement form major parts of OPEX and CAPEX
4 Maintenance refers to both routine and corrective or reactive activities.
5 Dominant failure mode — the most common cause of failure
6 Over one machine, this appears to be a very small maintenance cost item. However, when applied throughout a plant, or on the so-called “critical” assets, it amounts to a significant maintenance cost.
7 What constitutes reasonably likely is specific to each company, and often to each RCM analysis. Methods for determining reasonableness are not included in this module.
i The issues surrounding RCM and WoL asset management are covered in more detail in the article “RCM and Whole-of-Life (WoL) Asset Management”
ii Reliability-centered Maintenance, F.S. Nowlan et al, United Airlines, San Francisco, December 1978
iii Reliability-centered Maintenance, F.S. Nowlan et al, United Airlines, San Francisco, December 1978
iv Statistical Investigation of the Fatigue Life of Deep Groove Bearings, J. Lieblein and M. Zelen, Journal of Research of the National Bureau of Standards, Vol 57, No 5, November 1956.
v Bloch, Heinz P. and Fred K. Geitner, 1994, Practical Machinery Management for Process Plants, Volume 2: Machinery Failure Analysis and Troubleshooting, 2nd Edition, Gulf Publishing Company, Houston, TX
vi The RCM Decision Algorithm is based on Figure 17 — A Second Decision Diagram Example, page 49, SAE JA1012, 2002-01
vii Human error is discussed in detail in the article “Introducing Human Error.”
viii Cost avoidance calculation methods are available in Handout RCM-DO-07a Calculating Costs Avoided, inspired by the work of Steve Soos on this subject.
ix The Role of the Maintenance Manager, Daryl Mather, 2008:
- Design effective maintenance policy
- Execute them as efficiently as possible
- Collect relevant data for higher confidence decisions in the future