Podcast: FMEA 101 – How to conduct a failure modes and effects analysis at your plant
What is an FMEA? When should you use it? Why is it an important step in helping maintenance teams move from a break-fix maintenance state to one that is more proactive? In this episode of Great Question: A Manufacturing Podcast, Plant Services editor in chief Thomas Wilk spoke with a specialist in the reliability field, Brian Hronchek, to start answering these questions and more about failure modes and effects analyses. Brian draws on his experience as a reliability engineer for U.S. Steel, a maintenance manager for Exxon Mobil, and a 16-year veteran of the Marine Corps, in addition to his current work as a principal trainer and consultant at Eruditio.
Below is an excerpt from the podcast:
PS: Brian and I touched base about a week and a half ago at the MARCON event at the University of Tennessee's Reliability and Maintainability Center. We got talking about FMEAs and he was telling me about a couple of occasions where he'd been with some clients and he had been able to work with them to demystify the whole process of what an FMEA was. And for a lot of listeners, you might already be familiar with FMEAs (we know we do serve the reliability community), but Brian and I thought, hey, you know what? Why not capture this conversation in a podcast to share with everyone? So Brian, thank you especially for being here to talk about this topic.
BH: Not a problem, not a problem. This is just so much fun. You know, we love to see that light bulb moment when someone realizes, like, “ohh, it's so much easier, I got it” and then it becomes functional. So excited to share this today.
PS: Excellent, let's start with FMEA 101. For those for whom this is the first time encountering this term, what is an FMEA?
BH: FMEA stands for failure modes and effects analysis, and there's a modification of the tool, and this is really what we're going to talk about, which is the failure modes and effects criticality analysis, or FMECA. A failure modes and effects analysis is really useful for an OEM when they don't understand your operating context. But once you bring it into the operating context, then you can start to calculate or evaluate the impact on the business and determine which failure modes are more important than others. So we're going to talk failure modes and effects criticality analysis.
PS: When teams conduct an FMEA, when they use this tool, what is the eventual output of the tool? Are they looking for a specific number or is there a specific set of recommended actions? What should people expect to get once they conduct this exercise?
BH: Yeah, I know it's funny because we do so many things as reliability engineers, and somebody said we have to do this, and we do it, and then we put it on the shelf and we get absolutely no value out of it. So remember, everything that we do has to be taken to the next step. Eventually it has to be turned into value for the business, and the value for the business comes when people put their hands on the equipment and do something to it, right?
I hate to break this to everybody out there who has arrived at their executive position, their management position, their supervisor position. None of us have any value unless the operator and the mechanic put their hands on the machines, so the output of this is a strategy that can be turned into your maintenance plan in your CMMS, so that they can touch the machine.
PS: So it's more than a diagnostic tool, but there's more work to be done once you do complete it.
BH: Right.
PS: What would trigger an FMEA? Is it a machine that's constantly breaking down in a certain way? Is it something that you would do after you perform a criticality analysis?
BH: Oh, perfect, we're going to have so many light bulb moments today. I'm going to tell you right now we're going to have so many moments. So let’s back up just a minute. The value of a reliability engineer, again, is when hands are put on the equipment. So what do we start with? We start with building a hierarchy. Why? So we can do criticality. Why? So we can build an asset strategy, which includes FMEAs and, you know, PM optimization and part problem cause, all those things. Why? So we can build an equipment maintenance plan. Why? So it can be handed off to your planners to input into the CMMS so that the technicians can touch the machines in the right way at the right time.
There’s a string of things that have to happen, but you said something: when do we use it? Is it when something breaks down a lot or not? Or is it some other sort of analysis? Let's go back to the criticality tool and I'm going to tie together the criticality, the failure modes and effects analysis and your root cause analysis tool. What are the categories that are important to your business? What are they? What would they be?
PS: Conventionally I would think it’s uptime of the machines and reducing unplanned downtime. Everyone knows the cost of a minute of unplanned downtime.
BH: OK, so you've got downtime, which is really associated with money, right? So we've got our money component, we have a safety component, and we have a customer component.
The money component is, can we make money? And if you're not running, you can't make money. So we have some sort of internal process component. Then we have the safety component: if we're not safe, people aren't going to want to work here. And if they don't want to work here, we can't make money. And then we have the customer. You know, we do everything else right, but we insult all of our customers or turn them away. Nobody's coming to buy stuff. It doesn't matter if we can make things, because nobody's going to buy it.
So there's really three big categories, and when we look at those three big categories – money, safety and customers – if we look at them from a theoretical point of view, what potentially could be the worst thing that could happen? That's when we're over in criticality. We're looking at the equipment and we're saying “for this equipment, what's the worst that could happen in those three categories with that piece of equipment?” That's how we calculate criticality. And people will argue that criticality might have more to do with elements of performance and things, and it's OK if you want to include performance in the criticality, that's fine. My suggestion is, let's keep this theoretical. Let's take those same three categories, and let's look at the failure modes and effects analysis and the severity (we're going to come back to that in a little bit). Let's go downstream: root cause analysis, which is a measure of actual performance.
In those same three categories – money, safety and customers – let's use those as the triggers for root cause analysis. Why would you do a root cause analysis if it doesn't affect one of those things? So criticality is a theoretical indicator of importance in the asset, and root cause analysis is a performance based problem solving tool. So we're going to say well, “but actually this one's causing us the most problems right now. Let's fix that one.” And once you fix it, it’s good.
So coming back to your question, when to use failure modes and effects analysis, once you've done that criticality and you figure out which one theoretically, which of your assets is at the top of that list, those are the ones that you want to throw the most money at, the most time at, because if they go down it costs you the most. You're going to pick a top tier in your criticality and say “we're willing to spend more time on this, more money on this, let's do failure modes and effects criticality analysis (FMECA) on those top assets.” Everything below that, there will be different tiers and different strategies on the way down, but let's take those top tiers of theoretical importance of that asset, and let's put those assets through failure modes and effects analysis.
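As a rough illustration of the tiering Brian describes, the selection step could be sketched in Python as below. This is only a sketch under stated assumptions: the 1-to-5 scoring scale, the simple sum used to combine the money, safety and customer scores, the `top_n` cutoff, and the example assets are all hypothetical and are not taken from the conversation. A real plant would define its own scales and tiers.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    """One asset with hypothetical worst-case scores, 1 (minor) to 5 (severe)."""
    name: str
    money: int     # worst-case impact on the ability to make money
    safety: int    # worst-case impact on safety
    customer: int  # worst-case impact on customers

    @property
    def criticality(self) -> int:
        # Assumption: criticality is the plain sum of the three scores.
        # The podcast does not specify how the categories are combined.
        return self.money + self.safety + self.customer


def select_for_fmeca(assets: list[Asset], top_n: int = 2) -> list[Asset]:
    """Rank assets by theoretical criticality and return the top tier."""
    ranked = sorted(assets, key=lambda a: a.criticality, reverse=True)
    return ranked[:top_n]


# Hypothetical example assets, for illustration only.
assets = [
    Asset("Packaging conveyor", money=3, safety=2, customer=4),
    Asset("Boiler feed pump", money=5, safety=4, customer=4),
    Asset("Shop floor fan", money=1, safety=1, customer=1),
    Asset("Air compressor", money=4, safety=3, customer=3),
]

for asset in select_for_fmeca(assets, top_n=2):
    print(asset.name, asset.criticality)
```

Assets that fall below the cutoff are not ignored in this picture; they would simply get a lighter-weight strategy than a full FMECA, in line with the "different tiers and different strategies" mentioned above.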
PS: Do you find that plants, when they do implement these strategies, Brian, are they in reactive mode generally where they're like, oh yeah, that thing keeps breaking down so we'll do an FMEA? Or do you get a sense that the maturity is all over the board, where some people do it reactively, some people do it more proactively?
BH: Yeah, I think it's difficult to find a couple of different places where you can get the same understanding of what it's for and when it's used, right? So everybody's going to tell you something different, or it's something that somebody said we should do, and it helps but they don't really know how, so they're going to do an FMEA and then it goes on the shelf. We’ve seen that before. There is a lot of difference in understanding, and what we really want to do is break it down in a way that's simple, so you understand, this is this part, this is this step, and it leads to that step and eventually it leads to something else. So we've seen it used reactively, we've seen it used proactively but never reach its end goal, and we've seen it used well sometimes.
About the Podcast
Great Question: A Manufacturing Podcast offers news and information for the people who make, store and move things and those who manage and maintain the facilities where that work gets done. Manufacturers from chemical producers to automakers to machine shops can listen for critical insights into the technologies, economic conditions and best practices that can influence how to best run facilities to reach operational excellence.