Root cause analysis helps engineers analyze asset performance and identify the source of machine failure. But how many RCAs are enough for your maintenance program, and how can you use them to change behaviors instead of just fixing the assets? Shon Isenhour and Brian Hronchek of Eruditio join us for a discussion on how to optimize your time spent doing RCAs so they have maximum positive effect on your plant.
Listen to Shon Isenhour and Brian Hronchek on Great Question: A Manufacturing Podcast
PS: Let's talk about some success stories to wrap up the podcast. What are some examples, maybe one from each of you, of a success story you’ve seen in industry where someone performed an RCA and found a true root cause that was addressed?
BH: In my previous life, the one that always sticks out was a bunch of furnace fans. It was a real simple setup: a motor, a belt with sheaves/pulleys, and a fan on a shaft. For two years the organization was chasing down the failure; they were experiencing 35 failures per year on 15 different fans. It was over $1,000,000 in lost production over two years, and the investigation was very superficial, which is why we were chasing it for so long. We were finding all sorts of evidence and chasing down components that weren't actually the root cause.
So when we finally did find the component that was the root cause, which was a flange bearing, we found that it was false brinelling, because it was installed with a hammer instead of appropriately. Using a hammer to install a bearing on a shaft creates the little divots that ended up inducing a failure, which is why, with every rebuild, we were doomed to failure from the very first moment that it was reinstalled.
As soon as we figured that out, and we finally found the evidence – which again, you don't have all the evidence the first time you go look into it, you have to investigate it a little bit – once we found that, I went and asked the mechanics, “hey, how are you guys installing this?” and they just straight out said “we beat it on with a hammer.”
SI: So it was true brinelling, they actually had impact damage.
BH: Right, true brinelling. So they said “we beat it on with the hammer,” and I was like, oh, gosh, OK. Then they got very emotional, and they started yelling about how “management is stupid, that we don't listen to them,” and that they've been telling us for years. “If that's the part you want, that's the part you get, so we will put on the crap that you give us.”
OK, well, we’ve gone to the physical cause: it's brinelling, it's a bad bearing. We've got to the human cause: it was installed wrong. But now we're understanding there's something a little deeper.
So we asked them, what do you mean? They said “it’s not the bearing that it's supposed to be.” I said, yeah, it is, it's the Dodge bearing, right? We went to the warehouse and we looked on the shelf, we found Dodge bearings in the system and we found the one and went to it (on the shelf), and it was not a Dodge bearing. So now there's something in the system that's missing. We started making phone calls and we found out that our procurement department had given one of our suppliers carte blanche to replace any part that they wanted with a like or similar part as long as it would provide savings.
We saved $7,000 buying 70 bearings that we shouldn’t have had to buy over two years, and lost over a million dollars doing it.
PS: That's some savings.
BH: When you go through that, you have the effect, which is the fan failures. You have the physical cause, which is the bad bearings. The human cause, which is the installation. You have the systemic cause, which is purchasing parts that are not right. And then you have the latent cause, which is leadership focusing more on savings than they're focusing on doing things right.
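The five levels Brian lists can be written down as a simple chain, which makes it easy to see where an investigation stopped. This is only an illustrative sketch: the level names and descriptions paraphrase the furnace fan story, while the `Cause` class and the `missing_levels` helper are hypothetical names invented for this example.

```python
from dataclasses import dataclass

# Cause levels in the order Brian describes them, from symptom to deepest cause.
LEVELS = ["effect", "physical", "human", "systemic", "latent"]


@dataclass(frozen=True)
class Cause:
    level: str
    description: str


# Descriptions paraphrase the furnace fan story; the structure is an assumption.
FAN_CAUSE_CHAIN = [
    Cause("effect", "furnace fan failures"),
    Cause("physical", "brinelled flange bearing"),
    Cause("human", "bearing installed with a hammer"),
    Cause("systemic", "supplier allowed to substitute parts without review"),
    Cause("latent", "leadership focused more on savings than on doing things right"),
]


def missing_levels(chain):
    """Return the levels an investigation has not reached yet, in order."""
    found = {cause.level for cause in chain}
    return [level for level in LEVELS if level not in found]


for cause in FAN_CAUSE_CHAIN:
    print(f"{cause.level:>8}: {cause.description}")

# An investigation that stops at the bad bearing has three levels left to dig.
print(missing_levels(FAN_CAUSE_CHAIN[:2]))
```

An investigation that only replaces the bearing corresponds to the first two entries; the remaining levels are where the recurring failures were actually coming from.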
When we made the phone call to procurement, we said, “hey, this is what you guys are causing. We're not saying don't do it. We're just saying tell us ahead of time, give us a chance to accept or reject any recommended changes” and it solved a lot of problems.
PS: I would think that last step you just talked about: don't point the finger at procurement. Let's talk about ways to address the situation, and the solution was not to point fingers. A solution was, give us a chance to accept or reject.
BH: Sometimes it's the right thing, but the supplier made a mistake and they actually gave us the wrong bearing. They could have given us the right one, but they messed it up.
PS: That reminds me of the story I heard years ago where a replacement set screw took down bearings for the exact same reason. The vendor was given carte blanche to replace equal or equivalent parts, and didn't tell anybody on the installation side, so something similar happened.
BH: Exactly the same thing, yeah.
PS: Wow, that's quite a story. Shon, can you think of one?
SI: Well, I would say he used that story to reinforce the levels of digging down into that RCA and getting to those systemic and latent levels. When we teach the RCA classes, we actually have people bring problems to the class, so they get to work through them in front of us. And because of that, we get to witness some real aha moments as they're going through.
One in particular allows me to tell you a little bit about the other side of that story, where he told you about drilling down into the problem. What we also know is you need to broaden the problem and you do that through two words or two phrases. You look for the actions that happen instantaneously, and you look for the conditions that existed over time. Because there will always be an action and a condition at every level of a fault tree, as an example.
So as you're drilling down, you're looking to see, OK, what all had to happen. And what's really cool about that is if you do it right, it helps you get rid of gremlins. Gremlins are those problems that have just kept recurring for years, and no one knows why they happen; they're just gremlins. And my experience has been, especially when we bring those problems into the classroom and start working through them, we list out those actions and conditions – and they may list like two actions and one condition, or they may list three actions and one condition.
But when we go back and we read it from the bottom up and we say, OK, if you have these three actions and that condition, do you get the thing above it every single time? And if you don't get the thing above it every single time, you're missing something. What that allows us to do is to force folks to think about, OK, what all had to come together for this to happen.
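The bottom-up check Shon describes can be sketched as a small test against recorded occurrences: whenever every listed action and condition was present, did the event above it happen? The function name, factor names, and history below are hypothetical and only illustrate the idea; they are not data from the class.

```python
def is_sufficient(factors, observations):
    """Return True if the effect occurred every time all listed factors were present.

    observations is a list of (factors_present, effect_occurred) pairs.
    Note: if no observation contains all the factors, this returns True vacuously,
    so it only means something when the history covers that combination.
    """
    required = set(factors)
    for present, effect_occurred in observations:
        if required <= set(present) and not effect_occurred:
            return False  # all listed factors present, yet no effect: something is missing
    return True


# Hypothetical history: three actions (a1..a3) and two conditions (c1, c2).
history = [
    ({"a1", "a2", "a3", "c1"}, False),
    ({"a1", "a2", "a3", "c1", "c2"}, True),
    ({"a1", "a2", "c1", "c2"}, False),
    ({"a1", "a2", "a3", "c1", "c2"}, True),
]

listed = ["a1", "a2", "a3", "c1"]   # three actions and one condition
complete = listed + ["c2"]          # adding the missing condition

print(is_sufficient(listed, history))
print(is_sufficient(complete, history))
```

With only the three actions and one condition listed, the first record shows them all present without the effect, so the list is incomplete; adding the second condition makes the set hold for every record.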
Years ago I was teaching an RCA class in an automotive facility, and in that facility they had one of these gremlins, and to make a long story short, we sat there in the class and we worked through it, and finally we found out that there were five things that had to happen at the same time, for this issue to happen. But, every time those five things happened, you got the thing above it – every single time.
So that was just a Eureka moment for them, right? Because now they were able to say, OK, now let's look at those five and do something else that a lot of people don't do, and that's build a business case for which one to eliminate. Because if one of them goes away, the problem goes away, or at least you reduce the risk or likelihood.
So now I get to do a financial analysis to say, which is the right one? The way I say it is, which one lowers the risk to an acceptable level for the money I'm willing to spend? I say that over and over in the class because I want people to bring in business case thinking to the way they solve problems, and that is not what happens in a lot of organizations today. That story was a huge success – they were able to identify what the problem was, identify the five things that were causing it, and then they were able to choose the one that reduced the risk to an acceptable level for the money they were willing to spend.
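Shon's question, "which one lowers the risk to an acceptable level for the money I'm willing to spend?", can be sketched as a filter over the candidate causes. Everything below is an assumption made for illustration: the `Option` class, the `choose_option` helper, the cause names, and all dollar figures are hypothetical, and picking the cheapest viable option is just one reasonable tie-breaking rule.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Option:
    cause: str            # the contributing cause this option eliminates
    cost: float           # what it costs to eliminate that cause
    residual_risk: float  # expected annual loss that remains afterwards


def choose_option(options: List[Option], acceptable_risk: float,
                  budget: float) -> Optional[Option]:
    """Pick the cheapest option that fits the budget and meets the risk target."""
    viable = [o for o in options
              if o.cost <= budget and o.residual_risk <= acceptable_risk]
    return min(viable, key=lambda o: o.cost) if viable else None


# Hypothetical figures for three of the contributing causes.
options = [
    Option("operator override", cost=5_000, residual_risk=20_000),
    Option("worn sensor", cost=12_000, residual_risk=4_000),
    Option("line speed setting", cost=30_000, residual_risk=1_000),
]

best = choose_option(options, acceptable_risk=5_000, budget=25_000)
print(best.cause if best else "no option meets the risk target within budget")
```

If nothing qualifies, the function returns `None`, which is the cue to revisit either the budget or the level of risk the organization is willing to accept.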
PS: This is where the finance team can sometimes be your best friend, because they're in the business of risk mitigation. They'll immediately speak that language of what is the best risk/reward scenario for this analysis. So, folks looking to make that business case, reach out to your CFO or reach out to the Financial Office because they will be able to help you understand, what will the impact be of the actions you take? For example, will it have a positive effect on the insurance rates for the company? If you can reduce risk by X percent, will that reduce your premium too? It’s a secondary quick win.
SI: Absolutely, yeah.
PS: Well, guys, thank you for this quick survey of RCAs and also for the stories on what went right. I'm curious to know, do you have a story of what went wrong?
SI: You want to go first, Brian?
BH: I'm going to leave that one for you, boss!
SI: Alright, man. So you know what goes wrong? I think it's a combination of some of the things we started talking about that trip people up. They end up flagging too many RCAs for RCA implementation, so they start shortcutting them, or they start doing a simple, quick, dirty 5Y so that they can go to that morning meeting and say, hey, the reason we were down last night was because the gearbox locked up. So they replace the gearbox and then wonder why it fails again two weeks later.
Those would probably be the things that I would call maybe not successes, because they didn't take the time to understand the problems well enough. The problems reoccurred, and then you know, as I said earlier, it just makes everybody involved look a little silly. It's still going to happen. You're still going to miss things. You can't be perfect at RCA every time and nobody should think that they can be, because they would be paralyzed and not be able to get anything done. But I think at the end of the day, by using good, effective tools from a good, effective toolbox, you can understand the problem well enough to accept the risk and make the comments that you need to make to get those things fixed.