Understanding asset criticality and assessing the value of your plant's assets
Assets Anonymous is a 12-step podcast series designed to help you get grounded in reliability basics and create a culture of continuous improvement with your team. This series will feature interviews with George Williams and Joe Anderson of ReliabilityX. ReliabilityX aims to bridge the gap between operations and maintenance through holistic reliability focused on plant performance. In this episode, George and Joe explore how a criticality assessment can help pivot your plant from reactive to proactive work.
PS: When people are looking to determine what their critical assets are, what should they look for? Maybe we can start by defining what a critical asset is, when you guys start these kinds of assessments.
GW: Well, that really depends on the value the asset derives, right? So first, for us, is defining what an asset is, and there's two approaches to that. One is your typical ISO standard approach, so ISO 55000 says if it adds value, realistic or potential, it's an asset.
However, that's not a great definition for “does it sit inside my CMMS?” Because there's lots of things that add value, like a parking space, but you don't necessarily build it as a piece of equipment in the CMMS. So for intents and purposes of CMMS talk and defining criticality and getting down to failure modes, we usually try to start with defining what an asset is.
The standard canned answer I like to give is, if I intend to replace parts on it, it's an asset. If I plan on throwing it out and just replacing it, it's a spare part. Now, that's a high level definition and there's some caveats to that, regarding if it is required from a regulatory agency or if it's a calibrated instrument, then it's still an asset and goes in the system. But generally speaking, we start with that definition.
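The rule of thumb George describes can be written down as a small decision function. This is a minimal sketch, not a standard: the `Item` fields and the `classify` name are hypothetical, and the two caveats (regulatory requirement, calibrated instrument) are treated as simple overrides.

```python
from dataclasses import dataclass


@dataclass
class Item:
    name: str
    parts_replaced: bool                 # do we intend to replace parts on it?
    regulatory_required: bool = False    # tracked because an agency requires it
    calibrated_instrument: bool = False  # calibrated instruments stay in the system


def classify(item: Item) -> str:
    """Apply the high-level asset vs. spare-part rule, caveats first."""
    if item.regulatory_required or item.calibrated_instrument:
        return "asset"
    return "asset" if item.parts_replaced else "spare part"


print(classify(Item("gearbox", parts_replaced=True)))
print(classify(Item("v-belt", parts_replaced=False)))
print(classify(Item("pressure gauge", parts_replaced=False, calibrated_instrument=True)))
```

Checking the caveats first keeps a throwaway item such as a calibrated gauge from being classified as a spare part.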
From there we move into defining criticality, and criticality is based on severity factors and likelihood-of-failure factors, or occurrence factors. Severity is everything that sits inside the business. If you go to www.(insert your company name here).com/about-us, they're going to have a whole bunch of stuff there that they commit to. We are going to be financially beneficial, right? We're going to make a profit. Or if they're publicly traded, to our shareholders, we have a commitment. They have a commitment to safety, a commitment to the environment, a commitment to the community. All those things are ways in which the company provides value to the world.
Any asset that can impact any of those goals has a severity factor. And so when you're going through severity, you're looking at all those types of factors. Can it impact quality? Can it impact production output? Can it impact financially or safety? And you identify those impacts.
Those impacts end up with an average score across those things (there are lots and lots of ways to do that), and then ultimately you end up at an occurrence score, which is: how often does this asset fail? That can use either history you already have or a general mean time to failure that may exist in industry. That's a product score, and you end up with some criticality factor. And then that criticality factor typically has a much broader number range than what sits inside your CMMS. So then you convert it to either A-B-C or 1-2-3, or 1-2-3-4-5 based on how it sits inside your CMMS. Yay! Criticality!
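One way to picture the scoring described above is sketched here. It assumes "product score" means the average severity multiplied by the occurrence score. The 1-10 scales, the factor names, and the A-B-C cutoffs are all illustrative assumptions, not values from the interview.

```python
from statistics import mean

# Assumed scales: every severity factor and the occurrence score run 1-10,
# so the raw criticality score runs 1-100.


def criticality_score(severity: dict[str, int], occurrence: int) -> float:
    """Average the severity factors, then multiply by the occurrence score."""
    return float(mean(severity.values())) * occurrence


def to_cmms_class(score: float) -> str:
    """Collapse the raw score into an A-B-C field (cutoffs are assumed)."""
    if score >= 40:
        return "A"
    if score >= 15:
        return "B"
    return "C"


air_compressor = criticality_score(
    {"safety": 4, "quality": 6, "production": 10, "financial": 8}, occurrence=6
)
warehouse_fan = criticality_score(
    {"safety": 1, "quality": 1, "production": 1, "financial": 2}, occurrence=3
)
print(air_compressor, to_cmms_class(air_compressor))
print(warehouse_fan, to_cmms_class(warehouse_fan))
```

A plain average can dilute one very high factor, such as safety. A team could agree on a weighted average or take the maximum factor instead, as long as the same rule is applied to every asset.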
PS: It occurs to me that this kind of assessment is really the fundamental step that separates people who are focused on reactive maintenance and reliability versus proactive activity, because we've talked a lot about various things that we can do to help drive the culture towards becoming more proactive. But here you are actually taking a look at what you own in the plant and sorting it into different classes, and then figuring out how and where to record the work that's being done. How do I assess the value of these assets? I'm curious, do you agree or disagree that this is a key step to transition out of reactive mode and get more proactive?
JA: The first thing you want to do is your criticality analysis, and that'll help drive the determination of your equipment maintenance strategy. So, yeah, it's definitely one of the first steps. Like George said, it's what do we own? So you got to make sure all your assets are in the system, or you have it documented what all assets you have. But then that next step is to rank those assets, because it's going to be your driving factor behind how to determine how robust your strategy is going to be per piece of equipment.
GW: What's interesting is every day people do risk management in their head. If you look at the basic manual of your car, it tells you to walk around it before you ever drive it, check all your tire pressures, check the fluids. It tells you to do all that before you ever operate the vehicle every day, and people don't do that on a daily basis. Because internally they've done exactly what the next several episodes lay out. How critical is it? What is the failure mode? Is the failure mode likely to occur? In their head they've determined the risk factor is not great enough for them to walk around the car and check the blinkers, because there's things like detectability, right? The blinker blinks faster inside the driver's seat in front of you and gives you an indicator that one of your lamps is out.
In your head, you're doing this every day. And even if it's the honey-do list, right? I'm going to fix a hole in the roof before I hang a picture on the wall, usually, if I've made choices. So every day we do these risk analysis approaches to prioritizing our work. Yet we come to the plant and a pump is a pump is a pump is a pump, and they all get PM’ed the exact same way, and it makes no sense. But the way we tend to operate at work is not always aligned with what we know to be common sense, right?
PS: Right, and when it comes to a pump is a pump is a pump, when you visited plants, how often would you say that attitude is out there? Is it out there like half the time? A little more than half the time?
JA: Most of the time, because they don't have a criticality analysis. The example I use is an exhaust fan. The question is, you have this exhaust fan in, say, a warehouse where its sole function is just to evacuate heat for comfort. What's the consequence to the business if that were to fail, versus you have an exhaust fan in your ammonia compressor room that has to operate in case there's a pop-off? It has to shut down, lock the vapor into a room so that you're not shooting a vapor cloud into the air, and risking possibly killing an entire population. The consequence to the business of that exhaust fan versus the one in the warehouse is completely different. One you can choose to run to failure where there's no consequence to the business. The other one, you had better have one of the most robust strategies to make sure that thing works when you need it to because of the consequence.
So when you're looking at your assets, you've got to understand what is the consequence to the business. You could have a pump that's outside that might pump runoff water for no reason, just because people thought it was really cool to put in at one time. And you have the same exact pump that's pumping oil to your food manufacturing process in order to make a recipe. It could be the same two pumps, but the consequences to the business are completely different.
And what they end up doing, typically on the lower end, is using the OEM manual to derive their PMs, so they'll do the same maintenance on both pieces of equipment. They're doing too much maintenance on the one, and they're not doing enough on the other. Understanding that, and knowing how critical it is to the process, is very important.
GW: I think there's a misunderstanding of what that can do for people that are maintenance managers and maintenance supervisors. It's seen as work, right? So you come in and you assess a plant, you say, “hey, you've got to get your criticality analysis done.” They just see that as an exercise to flag something in the CMMS, and so it looks like work to senior leadership. They don't understand the benefit of it, and the benefit of this is in several areas: (a) it drives how you build that strategy, but (b) you're all screaming for resources: “I need resources.” Stop PMing your non-critical assets that don't require a PM.
You still have to evaluate the cost of that replacement versus the cost of lubricating it, because the process still dictates that you go through some cost analysis and that'll be in further episodes. But generally speaking that bottom 25%, you're going to run to fail. Well, that's resources that currently are running around executing preventive maintenance that you immediately get back into your shop that focus on your more critical assets. So it's really about risk identification, risk mitigation strategy, and an execution of that strategy is really what all of this boils down to. And a benefit of it is focusing your resources, your limited resources, on the most important work.
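The bottom-quartile idea, together with the cost check George mentions, might be sketched as follows. Everything here is an assumption for illustration: the `Asset` fields, the dollar figures, and the simplification that the cost of a failure is only the replacement cost (reasonable only where downtime carries little consequence).

```python
from typing import NamedTuple


class Asset(NamedTuple):
    name: str
    score: float              # criticality score from the ranking exercise
    annual_pm_cost: float     # labor and materials spent on PMs per year
    replacement_cost: float   # cost to replace the asset after it fails
    failures_per_year: float  # expected failure rate with no PM at all


def run_to_failure_candidates(assets: list[Asset], fraction: float = 0.25) -> list[str]:
    """Return bottom-ranked assets whose PMs cost more than just replacing them."""
    ranked = sorted(assets, key=lambda a: a.score)
    bottom = ranked[: int(len(ranked) * fraction)]
    return [
        a.name
        for a in bottom
        if a.annual_pm_cost > a.replacement_cost * a.failures_per_year
    ]


# Hypothetical plant data, not taken from the interview.
plant = [
    Asset("warehouse exhaust fan", 3.75, 400, 600, 0.2),
    Asset("runoff pump", 5.0, 150, 4000, 0.5),
    Asset("office HVAC unit", 12.0, 900, 8000, 0.1),
    Asset("single conveyor", 18.0, 1200, 15000, 0.3),
    Asset("palletizing robot", 27.0, 3000, 90000, 0.2),
    Asset("packaging line 1", 30.0, 5000, 250000, 0.1),
    Asset("air compressor", 42.0, 6000, 120000, 0.2),
    Asset("ammonia room exhaust fan", 60.0, 2500, 5000, 0.2),
]
print(run_to_failure_candidates(plant))
```

In this made-up data the runoff pump sits in the bottom quartile but stays on a PM, because its PM is cheaper than the expected cost of replacing it.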
PS: I'm really struck by how much of breaking out of the reactive mode can depend on this. Like you said, it touches both the asset care, but then also it opens up resources that you might not otherwise have. You're also mentioning the CMMS as a critical tool to recording this information. People can make a list of critical assets and non-critical assets, but if it's on pen and paper, it's easily lost. Could you talk a little bit more about the importance of the CMMS in this process?
GW: You mentioned the CMMS to help identify the definition of “an asset” because the definitions are different. However, the necessity of a CMMS to properly figure out what's critical or not, I think, is not as important. They can be on paper and be very successful. Unless you are extracting the data, analyzing the data, and taking action from the data, the CMMS is a filing cabinet, so that's a whole different 12 step program!
JA: I mean, at the least, it's an added convenience because all the data's in one place, but if you're not utilizing it, then there's no value in it.
PS: Okay, and since it's all in one place, you can give permission to various folks to share it across their laptops, PCs, phones too. So it's easily shareable.
JA: But again, if you're not utilizing it, there's no value in it. It's like buying an Ultraprobe and putting it on your shelf and saying that you do predictive maintenance. You're not getting any value out of it if you don't use it.
GW: Buying it doesn't count.
PS: You mean the layer of dust on top of the case isn't an indication of how powerful the tool is?
JA: You'd be surprised.
GW: Pass the dirt. Don't remove.
PS: Well, let's say someone's starting out from scratch on this. Do you have a rule of thumb on how many assets normally end up being critical at a plant versus non-critical? Or is it too variable from plant to plant?
JA: The rule of thumb is typically the top 20% of your assets are critical. Now, that's a rule of thumb, and there's a lot of industries that have entire processes where the whole thing is a single point of failure, and so it makes everything critical. It depends on the industry, it depends on the plant, it depends on the design, there's a lot of factors there.
GW: Let's just look at two really quick examples, right? So you've got a manufacturing plant that's got a half a dozen packaging lines. The Ops Manager believes they're all critical because they run every day. But when compared to the air compressor, they're not really that critical because one of those goes down and you’re down one packaging line, but if the air compressor goes down, you're down six packaging lines. So having everyone understand criticality is just as important as going through the exercise. If you end up with 50% of your assets being critical, then there was probably a misunderstanding of what criticality means.
On the other side of that is, as Joe mentioned, if you have a linear process and no redundancy, then you've got a lot of individualized criticality there. When you map out a reliability block diagram, even if every one of those is 99.9% reliable, if you have two dozen of them in a linear fashion, you're not in good shape. I mean, you're probably by that point at like 60-65% reliability. I mean, I didn't do the math and somebody will probably comment on this, but, you know, it's not going to be 90%.
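For readers who want to do the math: in a reliability block diagram with no redundancy, the blocks are in series, and if failures are assumed to be independent the system reliability is the product of the block reliabilities. Under that assumption, two dozen blocks at 99.9% each come out at roughly 97.6%, so the off-the-cuff 60-65% figure corresponds to blocks that are each closer to 98% reliable. The function name below is hypothetical.

```python
def series_reliability(block_reliability: float, blocks: int) -> float:
    """Reliability of identical, independent blocks arranged in series."""
    return block_reliability ** blocks


for r in (0.999, 0.99, 0.98):
    system = series_reliability(r, 24)
    print(f"24 blocks at {r:.1%} each -> system at {system:.1%}")
```

The point of the example still holds: small drops in per-block reliability compound quickly across a linear process.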
PS: I remember talking to someone from Eli Lilly, and when they were exploring which assets to run prescriptive maintenance pilot programs on, they eventually settled on the chiller, because the chiller was a key piece of equipment to maintain temperature throughout the facility, to ensure that the facility stayed in regulatory compliance, etc. It wasn't the production line itself. It wasn't an element of the line. It was the chiller supporting the room in the plant.
JA: Well, number one, if you don't have electricity, how many packages are you going to produce? No one really thinks about the utilities being the most critical. And then you need electricity to run the air compressor, so which one's more important, right? Your electrical power grid is one, like if you have substations or unless you're fed off the city, it kind of depends. Then air compressor is typically number two, because most of the equipment runs off of compressed air, at least in a manufacturing facility. It's a little different, you know, by industry, but no one is thinking about that type of stuff. All they're saying is my line is the most important.
GW: It's usually, you know, as Joe mentioned, electrical and then life safety stuff, so the system he was mentioning earlier with the ammonia, life safety systems, then utilities, right? Critical utilities are typically third and they are the last thing anyone in the company is thinking about, because all they think about is the widget they make. And those systems are not prevalent and right in your face when you're thinking about what do we produce as a business.
PS: I’ve got two more questions in my mind for this podcast. We'll get back to the issue of, “hey, it's my line, it's critical” for the second question. The first one is, let's say someone's starting out on a criticality analysis and they've listened to our podcast, they've talked to a few people in the field who have done it. When they look inside the plant for folks who can help them prioritize, should they look to the EH&S team, for example, or the safety folks to understand, okay, what are some systems that are helping support the whole plant that I might not be aware of, but which might actually be critical?
GW: Nah, we don't care about safety. (laughs) Yes…yes, yes, yes. Like when you go to that About Us section of your website, safety's going to be listed there, so they are a critical, supporting function for the plant to run effectively, as are many other business units, and they should be involved in that.
If you're not sure what the cost of downtime is, then you have to talk to finance, right? Because if you can't derive what the cost of the air compressor going down is versus the cost of a single conveyor, then you're just throwing darts at a board and saying, “this one is going to cost more than this one.” Which may be great if you have five assets, but if you have 10,000 assets, it's not a great delineation between what's important and not. So you definitely need a cross-functional group in order to do this well.
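Once finance supplies a cost per line-hour, the air compressor versus conveyor comparison becomes simple arithmetic. The sketch below assumes one conveyor stops one packaging line while the compressor stops all six, as in the earlier example. The dollar rate and outage length are placeholders, and `downtime_cost` is a hypothetical helper.

```python
def downtime_cost(lines_stopped: int, cost_per_line_hour: float, hours: float) -> float:
    """Lost production value for an outage that stops the given number of lines."""
    return lines_stopped * cost_per_line_hour * hours


COST_PER_LINE_HOUR = 2_000.0  # placeholder; the real figure comes from finance
OUTAGE_HOURS = 4.0            # placeholder outage length

conveyor = downtime_cost(1, COST_PER_LINE_HOUR, OUTAGE_HOURS)
compressor = downtime_cost(6, COST_PER_LINE_HOUR, OUTAGE_HOURS)
print(f"conveyor outage:   ${conveyor:,.0f}")
print(f"compressor outage: ${compressor:,.0f}")
```

With real numbers in place of the placeholders, the ranking across thousands of assets rests on an agreed cost basis instead of a guess.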
JA: And it depends on your established criteria, so you have to establish a certain set of criteria as you go to ranking your assets. When I was at one food manufacturer, legal was a part of their criticality analysis because of compliance and some other issues from a legality standpoint. So we even had to pull in the legal team, and so it kind of depends on what the established criteria are, what your regulations are.
I mean, there's a lot of variability with that cross-functional team, but typically it's safety, quality, operations, maintenance, engineering. Those will typically be your cross-functional groups. And then if there's any outliers, like I said, legal was one for us due to some of the regs that we had.
GW: That's why it's good to term this more along the lines of risk mitigation, risk identification and mitigation, versus criticality analysis. I think Suzane Greeman does it really well because she calls it risk, right? It is risk, that's what you're looking at. Because if a palletizing robot goes down, but you can't staff doing it manually, then you have just as much risk as the conveyor that feeds those robots. And if you can staff, then the risk mitigation strategy is different. “Well, yeah, we can survive it going down because we've got five people waiting just in case it goes down.” Not that that would be the case, but it's really about business continuity as well. It's not just what can the business lose, but is there a continuity plan in place that mitigates that risk?
PS: This feeds into the last question I was going to ask, which was, let's say you do have that one person who says, “but hey, it's my line, why is it not critical?” It sounds like it's important for that team to come together and identify together what the criteria are for criticality and that way it doesn't get personal when it comes to what is and what isn't critical. How do you get past the personal connection between a person and their asset who would not feel so good if their line or part of their line wasn't in that criticality list?
JA: Once you agree on criteria, it's now objective. It's no longer subjective.
GW: Joe answered it when he said you have to have the criteria set in place, whether you use Excel or some other tool to do this. That same cross-functional team should be defining that categorization and scoring mechanism so that it becomes a process, right? No one individual dictates it. Now, of course, if the plant manager's sitting in a room, they may or may not try to take that over, but once you've agreed to those established criteria, then the process should take over, and there shouldn't be opinions of any of it.