Are you taking the right steps when performing an FMEA?
What is an FMEA? When should you use it? Why is it an important step in helping maintenance teams move from a break-fix maintenance state to one that is more proactive? In this episode of Great Question: A Manufacturing Podcast, Plant Services editor in chief Thomas Wilk spoke with a specialist in the reliability field, Brian Hronchek, to start answering these questions and more about failure modes and effects analyses. Brian draws on his experience as a reliability engineer for U.S. Steel, a maintenance manager for Exxon Mobil, and a 16-year veteran of the Marine Corps, in addition to his current work as a principal trainer and consultant at Eruditio.
Listen to Brian Hronchek on Great Question: A Manufacturing Podcast
PS: For someone who's never seen one before, if they were to look at the conclusion section of an FMEA or the final bit, what would they see? I understand there's usually an RPN and a bunch of recommended actions. What core data do you want to get there so you can move to the next step in the plan, which is the equipment maintenance plan or an RCA?
BH: You're trying to build an equipment maintenance plan. So as you're looking at each one of these failure modes and everything associated with it – the functional failure, the criticality RPN score, and then the recommended actions – you're going to summarize all those recommended actions into the equipment maintenance plan document, and it's going to be partially complete at that point because it's going to have the recommended actions. You still need a planner to say, “hey, the recommended action is to do a monthly inspection on the tread depth on my tires.” You still have to add: What technical skill? What's your craft? How many hours? Is there any other equipment that's going to be involved? And are there any consumable parts that are going to be used? Because that equipment maintenance plan is a document that goes in both directions. (1) It will turn into your CMMS PM plan and PM schedule, and (2) it will communicate to all of your bosses the real cost of keeping your equipment running the way that they keep telling you it has to.
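Editor's sketch: the hand-off Brian describes, from FMEA rows to a partially complete equipment maintenance plan, can be illustrated roughly as follows. This is a minimal sketch and not Brian's or any CMMS vendor's format. The class names, the field names, the 1-10 scoring scales, the second failure mode, and all of the scores are assumptions made for illustration. The RPN is computed in the conventional way, as severity times occurrence times detection.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FailureMode:
    """One FMEA row (hypothetical layout; 1-10 scales are an assumed convention)."""
    description: str
    severity: int
    occurrence: int
    detection: int
    recommended_action: str

    @property
    def rpn(self) -> int:
        # Conventional risk priority number: severity x occurrence x detection.
        return self.severity * self.occurrence * self.detection


@dataclass
class PlanLine:
    """One equipment maintenance plan line; the planner fields start out empty."""
    action: str
    rpn: int
    craft: Optional[str] = None
    hours: Optional[float] = None
    other_equipment: List[str] = field(default_factory=list)
    consumables: List[str] = field(default_factory=list)

    def is_complete(self) -> bool:
        # Minimal completeness check: the planner has supplied craft and hours.
        return self.craft is not None and self.hours is not None


def draft_plan(failure_modes: List[FailureMode]) -> List[PlanLine]:
    """Carry recommended actions into a partially complete plan, highest RPN first."""
    ranked = sorted(failure_modes, key=lambda fm: fm.rpn, reverse=True)
    return [PlanLine(action=fm.recommended_action, rpn=fm.rpn) for fm in ranked]


# Illustrative data only: the scores below are made up for the example.
modes = [
    FailureMode("Tire tread worn below limit", 7, 4, 3,
                "Monthly tread depth inspection"),
    FailureMode("Wheel bearing seizes", 8, 2, 5,
                "Quarterly bearing vibration check"),
]
plan = draft_plan(modes)

# The planner then adds what the FMEA cannot supply: craft, hours, parts.
plan[0].craft = "Mechanic"
plan[0].hours = 0.5

for line in plan:
    status = "ready for CMMS" if line.is_complete() else "needs planner input"
    print(f"RPN {line.rpn}: {line.action} ({status})")
```

Once every line carries craft and hours, the same records could be totaled to show the labor cost of the plan, which is the second direction Brian mentions.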
PS: What do you do with the FMEA once it's complete? Do you keep these in a library like a job plan? Do you revisit it once a year? Who should see these things in the plant?
BH: So yeah, it’s useless by itself. It's only useful based on the outputs and getting closer to putting your hands on the equipment. Go through the failure modes and effects criticality analysis (FMECA), go through that and once you turn that into your equipment maintenance plan, store it away, keep it for reference later, right? There's going to be times when your strategy has to change. There's going to be times when the business changes, the market explodes and now you have to do more with less, or the market shrinks or the competition changes something, or something happens and it affects the criticality of your equipment, which means you have to come back and redo your FMEA.
You know, the goal of keeping criticality as a theoretical measure of your business (money / safety / customers) is so that it doesn't change very much. If it's based on performance, it might swing up and down as you fix things, so keep it theoretical. And again, that's my recommendation. A lot of people will disagree, and that's OK, but you keep it theoretical so that you don't have to come back to this very often.
But when you do come back to it, that record is there. You have a better starting point because you're starting from, “man, we put all these hours into it. Here it is. What needs to be modified? Well, you know what? Our particular asset, the business changed, which is going to bring our severity of if this goes down, the severity is coming down.” Well, that drops it much further out of, maybe we don't have to do PDM on this anymore. We can save a little money. It's just not that important anymore. Or the opposite, and we come back and we realize that something now is super critical because the company just sold the other asset that was the backup. Now, what do we do? We have to maintain this a whole lot different. Now, this cannot fail!
So there are times to come back, but generally just keep it in the folder, keep it saved. Come back to it later. But the real value in those documents is downstream.
PS: I've always wondered too, not ever having worked as a reliability engineer myself, when it comes to FMEA libraries in a given facility, are FMEAs at a facility specific to that one location, where you wouldn't want to borrow FMEAs for the same equipment from other plants? Or is it the case where you can take a look at other FMEA libraries if they're out there? I don't know how much the location can change the output of the FMEA.
BH: It can totally change it, right? So the FMEA, and again let’s clarify, let’s go back and quit being generic. The FMEA which could be provided by an OEM is generic: “a motor can fail these different ways.” You can provide an FMEA and send it out to everybody and say, hey, this is the FMEA for a motor. Perfect, great, everybody has that. But when you add criticality to it, you're adding the context to it. So take a motor that drives a pump that is the single point of failure for, let's say, a refining or production process. This pump takes this from here to there, and that is the only line that moves that oil to that location. If that thing fails, that oil stops moving.
Versus, that exact same motor, exact same model and size, that is in a pump bank of 20 different motors and pumps where you only need 10 of them running at a time in order to move your hydraulic fluid. They both have the same ways they can fail, but one of them you can let burn to the ground, write a work order, and replace. The other one, if it goes down it stops production, so how I handle that is going to be different. That one is going to get all of my predictive technologies, all of my inspections, everything. The other one, I may just decide to run to failure, and when it dies, write it up, we'll replace it, it's not a big deal. We've got 19 other pumps to do what only 10 pumps need to do. You know, 10 motors.
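Editor's sketch: the contrast between the two identical motors can be reduced to a toy rule, in which the failure modes are the same and only the operating context decides the strategy. This is a deliberately simplified illustration and not a real criticality method. The class, the function, and the "no spare units means protect it" rule are assumptions; a real analysis would also weigh money, safety, and customer impact, as discussed earlier in the interview.

```python
from dataclasses import dataclass


@dataclass
class AssetContext:
    """Operating context for a motor/pump position (hypothetical structure)."""
    name: str
    installed_units: int  # identical units available in this service
    required_units: int   # units that must run to keep the process going

    @property
    def spare_units(self) -> int:
        return self.installed_units - self.required_units


def choose_strategy(ctx: AssetContext) -> str:
    """Toy rule: a position with no spare capacity gets full attention."""
    if ctx.required_units > ctx.installed_units:
        raise ValueError("required_units cannot exceed installed_units")
    if ctx.spare_units == 0:
        # Single point of failure: any failure stops the process.
        return "predictive technologies and inspections"
    # Redundant service: a failed unit can wait for a work order.
    return "run to failure"


# Same motor model in both cases; only the context differs.
transfer_line = AssetContext("Oil transfer pump motor", installed_units=1, required_units=1)
pump_bank = AssetContext("Hydraulic pump bank motor", installed_units=20, required_units=10)

for ctx in (transfer_line, pump_bank):
    print(f"{ctx.name}: {ctx.spare_units} spare -> {choose_strategy(ctx)}")
```

The two-way split is the simplification here: a position with only one or two spares would probably deserve something between the two extremes.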
PS: OK, you've really captured for me the difference between a generic FMEA that comes down from someone like an OEM and then what happens at the facility. You see a lot of people in the field – how many plants would you say are at the maturity where they have a handle on this kind of stuff and they've got a strategy in place which includes correct criticality analysis and FMEAs?
BH: It's hard to find them, and I can't tell you that I've ever seen one myself, with the exception of the military experience. If anybody out there wants to know why the F-18 costs so much, it's because we buy the FMEAs, we buy the maintenance plans, we buy the updates to the maintenance plans, we buy somebody to sit on the other side and continue to keep these things updated and keep our pubs updated. And you have to take the book with you to do the job on the aircraft. So there's a lot that goes into that, whereas in our procurement, they're like, “oh, cool, let's buy this asset, we don't need the maintenance plan, we don't need the FMEAs now, those are just added expenses, we’ve got good people to do that stuff.” And so they chop those costs out of it, so you get it at a discount. And then you realize, “oh my god, we got this at a discount.” So it causes some frustration. Out here in industry, it's really hard to find it where it currently exists. And we want to help people build that, and we do, but finding that maturity right now, it's a struggle.
PS: It goes back to that issue of there being such a gap in available people who can do the jobs themselves that sometimes strategy may seem like a luxury, but in the end, you know, my feeling is that if you can do it, you have to do it, because it's going to cost you more in the long term.
BH: Yeah, life cycle costing is a real thing. Pay less up front and you'll pay for it later.
PS: One last question for you: given that generative AI tools like ChatGPT are helping plants save time by at least getting a start on a rough draft of things like equipment maintenance plans, how do you see tools like that affecting the ability of people to do this whole sequence of strategic planning – criticality analysis, FMEAs, RCAs? Is it helping? Is it hurting? Is it confusing the market?
BH: Well, right now, I'll give you my opinion and I'll tell you, my boss would argue me a different direction and some of my peers would stand with me, some would stand against me. It's OK, right now we're still so new in this that I think it's OK to have different opinions and tease this whole thing out.
What I'm seeing is very similar to what we see with predictive technologies and other things, right? The challenge is, we've got maintenance problems, but what do we do about it? Well, I hear predictive technology is great, right? Well, we just got to throw predictive technologies at it. So, but if you don't have a way to manage that data with your planning and scheduling, you know, you can have the predictive technologies and still use it reactively and still waste a lot of money.
With that same type of context, well, AI can take care of it, right? Well, OK, AI is going to get you something. But number one, is it right? Is it accurate? Is it good enough? In my experience, what we've seen so far is that because everybody talks about it, we try and use it, and we read through a couple lines like, “oh, this is good.” But then when you get into the details, it doesn't understand your operating context. It doesn't understand the nuances of your business, which would take paragraphs and paragraphs and days.
If you want to ask it to do something for you, you have to type it into a chat. Well, you better get ready and upload a whole lot of business documents and context and everything else and the culture of your business, because it's going to give you something without all that context. And it might even tell you to change the belts on your production machine every 80,000 miles, and it's like, “my production machine doesn't run, doesn't drive! 80,000 miles, what does that mean?” You know, so there is quite a bit of refinement that has to be done, even if that's the way that you choose to go.
The other challenge I think we have with that is that, you know, while it can give you something, the more that we put our hands on it and touch it ourselves, the more familiar we are with it, the more it's ours, the more we believe in it and the more we're actually going to do it. So at this point, if I were to do it, I would say hey, go put your hands on it, go do it, right? You'll get more out of it.
PS: Right. There's no substitute for that.
BH: Yeah, bring your team in and build it together.