Your 7-step guide to effective PM optimization
The harsh reality is that many organizations have poorly developed preventive and predictive maintenance (PM/PdM) programs. From RCM analyses, we often find that 40 to 60% of PMs add little value. Many tasks are knee-jerk reactions to past equipment failures. Commonly, tasks lack precision and do not address the likely failure modes. Most groups lament the need for additional maintenance technicians, yet they send people out to do PMs that add no value. When groups do work to optimize their PM program, they often take a haphazard approach. There is a much better way.
As a reliability-centered maintenance (RCM2 and RCM3) practitioner, I find tremendous benefit in leveraging RCM2 concepts as a framework for PM optimization. Before rolling your eyes and calling it overkill, understand my logic. RCM2 is a process to determine what must be done to ensure that equipment continues to do what its users want it to do in its present operating context. Competency in the RCM methodology makes it easy to apply the concepts to a lesser approach, such as failure modes and effects analysis (FMEA) or basic PM optimization. And applying RCM concepts frequently reduces the number of existing PMs. For example, following an FMEA led by an individual competent in RCM concepts, more than 1,000 PMs generated over one year were eliminated across three packaging machines.
People trained in an introductory RCM2 or RCM3 course learn how equipment fails, how to understand an asset’s operating context, and how to choose proactive tasks that address the likely failure modes: tasks that are both “technically feasible” and “worth doing,” a point often missed without this education. They understand why they choose one type of task over another. And when a proactive task is not appropriate, they elect to do no scheduled maintenance, change how the maintainer or operator runs the asset through procedures or training, or reengineer the equipment to address the concerns. Trained individuals also view spare parts stocking decisions and the design of new assets or systems differently.
From John Moubray’s RCM2 book, and later written into the SAE JA1011 standard, come the seven RCM2 questions. From these seven simple questions, you can build a checklist template to use as a PM optimization framework:
- What are the functions and associated performance standards of the asset in its present operating context? (Function)
- In what ways does it fail to fulfill its functions? (Functional failure)
- What causes each functional failure? (Failure modes: cause and mechanism)
- What happens when each failure occurs? (Failure effects)
- In what ways does each failure mode matter? (Failure consequences)
- What can be done to predict or prevent each failure? (Proactive tasks)
- What should be done if a suitable proactive task cannot be found? (Default actions, e.g., reengineering)
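The seven questions translate directly into a checklist template. The Python sketch below is one hypothetical way to hold such a template; the class names `ChecklistItem` and `PMChecklist` are illustrative only and do not come from any particular CMMS or RCM software. It simply tracks which questions still lack an answer for a given asset.

```python
from dataclasses import dataclass, field

# The seven RCM2 questions above, paired with the short labels in parentheses.
SEVEN_QUESTIONS = [
    ("Function", "What are the functions and associated performance standards "
                 "of the asset in its present operating context?"),
    ("Functional failure", "In what ways does it fail to fulfill its functions?"),
    ("Failure modes", "What causes each functional failure?"),
    ("Failure effects", "What happens when each failure occurs?"),
    ("Failure consequences", "In what ways does each failure mode matter?"),
    ("Proactive tasks", "What can be done to predict or prevent each failure?"),
    ("Default actions", "What should be done if a suitable proactive task cannot be found?"),
]


@dataclass
class ChecklistItem:
    label: str
    question: str
    answer: str = ""  # filled in by the PM optimization team

    @property
    def complete(self) -> bool:
        return bool(self.answer.strip())


@dataclass
class PMChecklist:
    asset: str
    # Each new checklist starts with all seven questions unanswered.
    items: list[ChecklistItem] = field(
        default_factory=lambda: [ChecklistItem(label, q) for label, q in SEVEN_QUESTIONS]
    )

    def answer(self, label: str, text: str) -> None:
        for item in self.items:
            if item.label == label:
                item.answer = text
                return
        raise KeyError(f"no checklist item labelled {label!r}")

    def open_items(self) -> list[str]:
        """Labels of questions that still have no answer."""
        return [item.label for item in self.items if not item.complete]


checklist = PMChecklist("Slurry transfer pump")
checklist.answer("Function", "To transfer slurry from Tank A to Tank B at a minimum rate of 100 gpm.")
print(checklist.open_items())  # the six remaining labels
```

Keeping the question wording in one list means the template can be reused unchanged across assets, with only the answers varying.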
To illustrate how to answer the seven questions, consider a centrifugal pump located between two tanks (Tank A and Tank B). The process requires 100 gallons per minute (gpm) from Tank B, and the design capability of the pump is 120 gpm. Beginning with the first question: Why was the asset purchased? What are the performance expectations? A function statement for the pump reads:
“To transfer slurry from Tank A to Tank B at a minimum rate of 100 gpm.” (Primary function)
Define secondary functions using the components of the acronym ESCAPES, where the “C” represents control, containment, or comfort. No doubt the safety officer would be unhappy if the pump seal or piping leaked and caused a slip or fall, so a containment secondary function exists:
“To contain all of the slurry.”
With the pump’s purpose known, I understand what I am trying to maintain: keeping the pump flow between 100 and 120 gpm without leaks. Answering the second question determines how the pump can fail (functional failure or failed state):
Total failure, “Unable to transfer slurry at all.”
Partial failure, “Unable to transfer slurry from Tank A to Tank B at a minimum rate of 100gpm.”
Containment, “Unable to contain all of the slurry.”
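The function and functional-failure statements above can be captured as plain data and checked against an observation. The sketch below assumes a single flow reading and a yes/no leak observation, which is a simplification; `failed_states` and `MIN_FLOW_GPM` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Function:
    statement: str
    kind: str  # "primary" or "secondary"


@dataclass(frozen=True)
class FunctionalFailure:
    function: Function
    statement: str


TRANSFER = Function("To transfer slurry from Tank A to Tank B at a minimum rate of 100 gpm.", "primary")
CONTAIN = Function("To contain all of the slurry.", "secondary")

FAILURES = {
    "total": FunctionalFailure(TRANSFER, "Unable to transfer slurry at all."),
    "partial": FunctionalFailure(
        TRANSFER, "Unable to transfer slurry from Tank A to Tank B at a minimum rate of 100 gpm."),
    "containment": FunctionalFailure(CONTAIN, "Unable to contain all of the slurry."),
}

MIN_FLOW_GPM = 100.0  # performance standard required by the process (design capability is 120 gpm)


def failed_states(flow_gpm: float, leaking: bool) -> list[FunctionalFailure]:
    """Return the functional failures implied by one hypothetical observation."""
    states = []
    if flow_gpm <= 0:
        states.append(FAILURES["total"])
    elif flow_gpm < MIN_FLOW_GPM:
        states.append(FAILURES["partial"])
    if leaking:
        states.append(FAILURES["containment"])
    return states


for failure in failed_states(95.0, leaking=True):
    print(failure.statement)
```

Note that the failed state is defined by the 100 gpm the process needs, not the 120 gpm the pump was designed for.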
Unfortunately, function and functional failure are typically omitted from PM optimization and, in some cases, from FMEA too. Omitting them can lead to too few or, more often, too many failure modes identified from brainstorming sessions that lack a framework. Identifying the functions and functional failures first makes it easy to correctly identify the reasonably likely failure modes (cause and mechanism), which answers the third question. At a minimum, this is the essential requirement to begin PM optimization properly. For the pump, the failure modes include:
For the total failure, “Bearing seized due to improper lubrication.”
Partial failure, “Impeller wears over time due to slurry abrasion.”
Containment, “Pump seal packing leaks following a maintenance intervention.”
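One benefit of starting from functional failures is traceability: every failure mode should point to a failed state, and every failed state should have at least one failure mode. A minimal sketch of that check follows, assuming hypothetical worksheet-style ids such as `1A` and `2A` and a hypothetical helper `trace_gaps`.

```python
# Functional failures from the pump example, keyed by hypothetical ids.
FUNCTIONAL_FAILURES = {
    "1A": "Unable to transfer slurry at all.",
    "1B": "Unable to transfer slurry from Tank A to Tank B at a minimum rate of 100 gpm.",
    "2A": "Unable to contain all of the slurry.",
}

# Failure modes (cause and mechanism), each tied to the failed state it causes.
FAILURE_MODES = [
    ("1A", "Bearing seized due to improper lubrication."),
    ("1B", "Impeller wears over time due to slurry abrasion."),
    ("2A", "Pump seal packing leaks following a maintenance intervention."),
]


def trace_gaps(functional_failures, failure_modes):
    """Return (modes with no matching failed state, failed states with no mode)."""
    known = set(functional_failures)
    orphans = [mode for ff_id, mode in failure_modes if ff_id not in known]
    covered = {ff_id for ff_id, _ in failure_modes}
    uncovered = sorted(known - covered)
    return orphans, uncovered


print(trace_gaps(FUNCTIONAL_FAILURES, FAILURE_MODES))  # ([], [])
```

A brainstormed mode that cannot be tied to any failed state shows up as an orphan, and a failed state with no modes shows up as uncovered; either is a prompt to revisit the analysis.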
The fourth RCM2 question, the failure effect, explains what happens when the failure mode occurs and nothing is done to prevent it. For the bearing failure, the failure effect is written as follows:
“The bearing seizes, the pump stops, and the level in Tank B drops. When the tank level reaches 250 gallons, an alarm sounds in the control room. It takes four hours to repair the pump (the spare is stocked), and the tank holds 2.5 runtime hours of slurry before it runs dry. Downtime cost (in addition to the repair cost) is $10k per hour, and 1.5 hours are lost ($15k). The cost of repair is $4k. Maintenance history shows this has happened twice in four years.”
Note the spare parts, estimated cost of the repair and downtime, job duration, and past failure history. In typical PM optimization efforts that lack a framework, these details are often left out of the decision process. Many effects, such as downtime cost, are similar across failure modes, which allows a cut-and-paste approach with minor changes. Even when documented at a basic level, this information helps answer the “technically feasible?” and “worth doing?” questions when choosing a proactive maintenance strategy. Simple bullets on a checklist can prompt a PM optimization effort to address the effect components, e.g., spares (y/n), downtime (hours, $), and repair (hours, $).
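The bearing effect statement carries enough numbers to estimate what the failure mode costs, which feeds the “worth doing?” test. A minimal sketch, assuming (as the statement does) that lost production is the repair time minus the 2.5-hour tank buffer, and additionally assuming the failure rate can be annualized as a simple average of two failures in four years; the `FailureEffect` class is hypothetical, with fields mirroring the checklist bullets.

```python
from dataclasses import dataclass


@dataclass
class FailureEffect:
    spare_stocked: bool
    repair_hours: float            # time to repair once the failure occurs
    buffer_hours: float            # runtime left in the downstream tank
    downtime_cost_per_hour: float  # in addition to the repair cost
    repair_cost: float
    failures: int                  # occurrences in the maintenance history
    history_years: float

    @property
    def downtime_hours(self) -> float:
        # Only the part of the repair that outlasts the buffer stops production.
        return max(0.0, self.repair_hours - self.buffer_hours)

    @property
    def cost_per_failure(self) -> float:
        return self.downtime_hours * self.downtime_cost_per_hour + self.repair_cost

    @property
    def annual_cost(self) -> float:
        # Simple average over the history window (an assumption, not from the text).
        return self.cost_per_failure * self.failures / self.history_years


bearing = FailureEffect(
    spare_stocked=True, repair_hours=4.0, buffer_hours=2.5,
    downtime_cost_per_hour=10_000, repair_cost=4_000,
    failures=2, history_years=4,
)
print(bearing.downtime_hours)    # 1.5
print(bearing.cost_per_failure)  # 19000.0
print(bearing.annual_cost)       # 9500.0
```

Under these assumptions, a proactive task for the bearing would need to cost less than roughly $9,500 per year, and actually reduce the failures, to look worth doing.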
The fifth question, failure consequence, categorizes each failure mode’s consequences as hidden, safety or environmental, operational, or non-operational, which guides the choice between proactive tasks and default actions. With failure effects written at a basic level, the consequences can be determined and business decisions made regarding the potential loss.
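The consequence categories can be assigned with a short chain of yes/no questions. The sketch below follows the usual RCM ordering (hidden first, then safety or environmental, then operational, otherwise non-operational); the function name `classify` and its parameter names are hypothetical, and the inputs for the pump examples are judgment calls.

```python
from enum import Enum


class Consequence(Enum):
    HIDDEN = "hidden"
    SAFETY_ENVIRONMENTAL = "safety/environmental"
    OPERATIONAL = "operational"
    NON_OPERATIONAL = "non-operational"


def classify(evident_to_operators: bool,
             safety_or_environmental: bool,
             affects_operations: bool) -> Consequence:
    """Assign one consequence category to a failure mode."""
    if not evident_to_operators:
        # Not evident under normal circumstances: hidden, whatever else it affects.
        return Consequence.HIDDEN
    if safety_or_environmental:
        return Consequence.SAFETY_ENVIRONMENTAL
    if affects_operations:
        # e.g., lost output or downtime cost beyond the repair itself.
        return Consequence.OPERATIONAL
    return Consequence.NON_OPERATIONAL


# Bearing seizure: evident (control room alarm), no safety issue, $15k of downtime.
print(classify(True, False, True).value)  # operational
```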
Answering the sixth question provides proactive maintenance strategies when the tasks meet both the “technically feasible” and “worth doing” conditions. Test on-condition (inspection/PdM) tasks against the two conditions first, then time-based scheduled restoration or scheduled discard tasks, or, in the case of safety or environmental consequences, a combination of tasks if no single task is appropriate. For hidden failures, where the failure would not be evident to the operating crew under normal circumstances, a failure-finding task may be more appropriate.
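The selection order can be sketched as a loop that returns the first task type passing both tests. This is a simplified illustration under stated assumptions: the names `Candidate` and `select_task` are hypothetical, the “combination of tasks” case for safety consequences is not modeled, and failure-finding is only considered, last, when the failure is hidden.

```python
from dataclasses import dataclass
from typing import Optional

# Candidate proactive tasks, in the order the text says to test them.
TASK_ORDER = ["on-condition", "scheduled restoration", "scheduled discard"]


@dataclass
class Candidate:
    kind: str
    technically_feasible: bool
    worth_doing: bool


def select_task(candidates: list[Candidate], hidden: bool = False) -> Optional[str]:
    """Return the first task kind that passes both tests, or None if none does."""
    order = TASK_ORDER + (["failure-finding"] if hidden else [])
    by_kind = {c.kind: c for c in candidates}
    for kind in order:
        c = by_kind.get(kind)
        if c is not None and c.technically_feasible and c.worth_doing:
            return kind
    return None  # no suitable proactive task: move on to the seventh question


# Hypothetical evaluation for the bearing failure mode.
bearing_candidates = [
    Candidate("on-condition", technically_feasible=True, worth_doing=True),
    Candidate("scheduled restoration", technically_feasible=True, worth_doing=False),
]
print(select_task(bearing_candidates))  # on-condition
```

Returning `None` rather than a default task keeps the hand-off to the seventh question explicit.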
If no proactive task from the sixth question is appropriate, a default action (seventh question) is required. The consequences determine whether to perform no scheduled maintenance, change the operation or maintenance of the asset with training or procedures, or reengineer it. A common error in PM optimization is failing to apply the “technically feasible” and “worth doing” criteria to task selection. Take the containment failure mode above: the pump seal leak could result from improperly packing the seal or using the incorrect packing. Selecting a proactive on-condition or time-based task would be inappropriate because the failure follows a maintenance intervention. A one-time change to the procedure so the work is performed properly would be appropriate.
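A default action can be sketched in the same style. This is a simplified illustration, not a full RCM decision diagram: the function `default_action` and its flags are hypothetical, hidden failures are deliberately out of scope, and it assumes that running to failure is unacceptable for safety or environmental consequences. The seal example is treated as a safety consequence because of the slip-and-fall risk.

```python
EVIDENT_CONSEQUENCES = {"safety/environmental", "operational", "non-operational"}


def default_action(consequence: str,
                   induced_by_work_method: bool,
                   reengineering_pays_off: bool = False) -> str:
    """Pick a default action once no proactive task is feasible and worth doing."""
    if consequence not in EVIDENT_CONSEQUENCES:
        # Hidden failures are handled separately (failure-finding) in this sketch.
        raise ValueError(f"unsupported consequence category: {consequence!r}")
    if induced_by_work_method:
        # The failure is introduced by how the work is done: fix the method once.
        return "change procedure/training"
    if consequence == "safety/environmental":
        # Assumption: running to failure is not acceptable here.
        return "reengineer"
    if reengineering_pays_off:
        return "reengineer"
    return "no scheduled maintenance"


# Pump seal: containment failure following a maintenance intervention.
print(default_action("safety/environmental", induced_by_work_method=True))
# change procedure/training
```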
A final challenge for many is implementing the checklist results, be they proactive tasks, reengineering, procedures, or training. Write tasks with precision in mind. For example, a failure mode on a cartoner might be “chain stretched due to carton jams.” Measured across several cartoner bucket pitches, the chain may be 72 inches when newly installed and is known to be in the failed state at 72.38 inches. Next, determine the inspection frequency. We can inspect at one-half the P-F interval and, when the chain measures 72.25 inches, create a notification to replace the chain proactively.
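The chain example reduces to a few lines of arithmetic. The article gives the lengths but not how fast the chain stretches, so the sketch below assumes a hypothetical, constant stretch rate (0.01 inches per week) to turn the gap between the notification point (72.25 in) and the failed state (72.38 in) into a P-F interval in time. The function names are illustrative; real rates would come from trended measurements.

```python
NEW_LENGTH_IN = 72.00         # measured over several bucket pitches when new
POTENTIAL_FAILURE_IN = 72.25  # P: create a replacement notification
FAILED_IN = 72.38             # F: chain is in the failed state


def pf_interval_weeks(stretch_in_per_week: float) -> float:
    """P-F interval, assuming stretch grows at a constant (hypothetical) rate."""
    return (FAILED_IN - POTENTIAL_FAILURE_IN) / stretch_in_per_week


def inspection_interval_weeks(pf_weeks: float) -> float:
    return pf_weeks / 2  # inspect at one-half the P-F interval


def check_chain(measured_in: float) -> str:
    """Translate one precise measurement into an action."""
    if measured_in >= FAILED_IN:
        return "failed"
    if measured_in >= POTENTIAL_FAILURE_IN:
        return "create replacement notification"
    return "ok"


pf = pf_interval_weeks(0.01)  # hypothetical stretch rate
print(f"P-F interval: {pf:.1f} weeks")                             # P-F interval: 13.0 weeks
print(f"Inspect every {inspection_interval_weeks(pf):.1f} weeks")  # Inspect every 6.5 weeks
print(check_chain(72.30))  # create replacement notification
```

Inspecting at half the P-F interval means at least one inspection should fall between the point where stretch is detectable and the point of failure, leaving time to plan the replacement.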
Invest in RCM2 or RCM3 training to build competency in your workforce. Based on your systems, develop and implement a checklist template that uses simple bullet points to prompt people to address the seven RCM2 questions, including the “technically feasible” and “worth doing” questions. For on-condition tasks, add the P-F interval. Include the precision aspects and the frequency of the tasks. While you may never intend to conduct a single RCM2 or RCM3 analysis, gaining competency and applying the RCM framework through a checklist approach will drastically improve PM optimization efforts. Be sure to capture the completed checklists in software, or save them as Excel or Word files, for later reference. Ideally, revisit them when a failure occurs or every 18 months to adjust task frequencies as needed.
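For teams saving checklists outside a CMMS, a CSV file opens directly in Excel. The sketch below, using only Python’s standard library, writes completed checklist rows with a next-review date 18 months out and flags a checklist for review after a failure or once that date arrives. The column names, the helper functions, and the sample row are hypothetical placeholders.

```python
import calendar
import csv
import io
from datetime import date

REVIEW_MONTHS = 18  # review cycle suggested in the text


def add_months(d: date, months: int) -> date:
    """Shift a date by whole months, clamping to the last day of the target month."""
    index = d.month - 1 + months
    year, month = d.year + index // 12, index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def review_due(today: date, completed: date, failure_since_completed: bool) -> bool:
    """A checklist is due for review after a failure or once 18 months have passed."""
    return failure_since_completed or today >= add_months(completed, REVIEW_MONTHS)


FIELDS = ["asset", "failure_mode", "task", "pf_interval", "frequency",
          "completed", "next_review"]


def write_checklists(rows, out) -> None:
    """Write completed checklist rows as CSV."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for row in rows:
        record = dict(row)
        record["completed"] = row["completed"].isoformat()
        record["next_review"] = add_months(row["completed"], REVIEW_MONTHS).isoformat()
        writer.writerow(record)


# Hypothetical row; the P-F interval is a placeholder until it is known.
rows = [{
    "asset": "Cartoner",
    "failure_mode": "Chain stretched due to carton jams",
    "task": "Measure chain over several bucket pitches; notify at 72.25 in",
    "pf_interval": "TBD",
    "frequency": "1/2 of P-F interval",
    "completed": date(2024, 1, 15),
}]
buffer = io.StringIO()
write_checklists(rows, buffer)
print(buffer.getvalue())
```

To write a real file, pass `open("pm_checklists.csv", "w", newline="")` instead of the in-memory buffer; the `newline=""` argument is what the `csv` module documentation recommends for files it writes.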
Jeff Shiver, CMRP, ARP, CPMM, CRL, MLT-I, People and Processes Inc., guides people and organizations to implement effective reliability solutions in industry and facilities environments. Connect with Jeff on LinkedIn at https://www.linkedin.com/in/jeffshiver.