Last Updated on 19 September 2023
FMEA stands for Failure Mode and Effects Analysis. It is used to measure the risks in a system so that you can prioritize which ones need to be tackled most urgently. I could never remember the term until I split it in my mind into its two parts: ‘Failure Modes’ and ‘Effects Analysis’.
- Failure Modes are the ways, or ‘modes’, in which an operation can fail
- Effects Analysis is analyzing the effects that any failures would have
FMEA is therefore ‘analyzing the effects’ of the ‘modes (ways) a system can fail’. It is a powerful method of assessing and then reducing the risks in your processes.
Why do you need to do FMEA?
Surely you want to remove all issues, so why not just work through solving them all, using a Pareto chart to make sure you're solving the most common issues first?
Not all issues are created equal.
For simplicity, take the example of a company that makes electric saws, where only two things can go wrong in the manufacturing process: 10% of the issues are the guard not being fitted to the saw, and 90% are the saw not being painted properly. If you used a Pareto chart to prioritize your issues, you'd be merrily working your hardest on the paint problem, as solving it would remove 90% of the issues in one go. A missing guard, though, is a far more severe issue, as a customer could cut off a hand, whereas they may not even notice the paint issue.
FMEA addresses this by factoring in not only how common an issue is, but also how severe its consequences are and how likely it is to slip through to the customer (an issue the customer finds is much worse than one you find, as the saw guard example shows).
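To make the saw example concrete, here is a minimal Python sketch comparing the two rankings, using the Risk Priority Number (SEV × OCC × DET) defined in the steps below. The 10%/90% split comes from the example; the SEV, OCC and DET scores are hypothetical values chosen only for illustration, not ratings from a real FMEA.

```python
# The 10%/90% split is from the saw example; the SEV, OCC and DET scores
# are hypothetical values chosen only to illustrate the comparison.
issues = [
    # (issue, share of all defects, SEV, OCC, DET)
    ("Guard not fitted", 0.10, 10, 3, 4),
    ("Paint not applied properly", 0.90, 2, 8, 2),
]

# Pareto view: rank purely by how often each issue occurs.
pareto_order = [issue[0] for issue in sorted(issues, key=lambda i: i[1], reverse=True)]

# FMEA view: rank by Risk Priority Number = SEV x OCC x DET.
rpn = {name: sev * occ * det for name, _share, sev, occ, det in issues}
fmea_order = sorted(rpn, key=rpn.get, reverse=True)

print("Pareto order:", pareto_order)
print("FMEA order:  ", fmea_order)
print("RPN:", rpn)
```

With these assumed scores, the Pareto view puts the paint problem first, while the RPN puts the missing guard first (120 against 32).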
How do you perform FMEA?
Failure Mode and Effects Analysis is a simple but powerful tool that is useful whenever you need to minimize risk. The step-by-step guide is:
- Identify all the core risks in the system you're looking at (a Cause and Effect, or Fishbone, diagram is good for this, or you could use brainstorming)
- Put all your risks into a table. Common columns are:
  - Part or process being analyzed
  - Risk of failure (the failure mode)
  - Consequences of the risk occurring
  - Causes of the failure
  - Methods of detection for the failure
  N.B. there will be several risks of failure per part number or process, and likely several causes per failure.
- Rate each row on a scale of 1 to 10 (10 being worst) for:
  - Severity of the consequences (SEV)
  - Likelihood of occurrence of the failure (OCC)
  - Detectability of the failure (DET)
- Calculate the Risk Priority Number (RPN) by multiplying the three scores together (RPN = SEV × OCC × DET); this is your measure of how risky each failure is
- Sort your table by RPN, highest first; you now have a prioritized list for tackling your risks
- Assign actions to improve the system, starting with the highest RPNs
- Re-rate the system once the actions are in place
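The table, scoring, sorting and re-rating steps can be sketched in a few lines of Python. The `FailureMode` class, its field names and the example rows are hypothetical: they mirror the columns listed above but are not taken from any standard or from the Excel template, and every score is assumed.

```python
from dataclasses import dataclass, replace


@dataclass
class FailureMode:
    """One row of a hypothetical FMEA table (field names are illustrative)."""

    part_or_process: str
    failure: str
    consequence: str
    cause: str
    detection_method: str
    sev: int  # severity, 1-10 (10 = worst)
    occ: int  # likelihood of occurrence, 1-10 (10 = most likely)
    det: int  # detectability, 1-10 (10 = least likely to be detected)

    def __post_init__(self) -> None:
        # Reject scores outside the 1-10 rating scale.
        for field_name in ("sev", "occ", "det"):
            value = getattr(self, field_name)
            if not 1 <= value <= 10:
                raise ValueError(f"{field_name} must be 1-10, got {value}")

    @property
    def rpn(self) -> int:
        # Risk Priority Number = SEV x OCC x DET, so it ranges from 1 to 1000.
        return self.sev * self.occ * self.det


def prioritize(rows: list[FailureMode]) -> list[FailureMode]:
    """Return the rows sorted by RPN, highest (riskiest) first."""
    return sorted(rows, key=lambda row: row.rpn, reverse=True)


# Hypothetical rows for the saw example; all scores are assumed.
table = [
    FailureMode("Assembly", "Guard not fitted", "Operator injury",
                "Step skipped", "Visual check", sev=10, occ=3, det=4),
    FailureMode("Paint line", "Uneven paint", "Cosmetic complaint",
                "Clogged nozzle", "Visual check", sev=2, occ=8, det=2),
    FailureMode("Motor", "Overheating", "Saw stops mid-cut",
                "Undersized wiring", "Load test", sev=7, occ=2, det=3),
]

print("Before re-rating:")
for row in prioritize(table):
    print(f"{row.rpn:>4}  {row.part_or_process}: {row.failure}")

# Re-rate after an action, e.g. an assumed fixture that will not release a saw
# without its guard: occurrence and detection both improve. replace() builds a
# new row, so __post_init__ re-validates the scores.
table[0] = replace(table[0], occ=1, det=1)

print("After re-rating:")
for row in prioritize(table):
    print(f"{row.rpn:>4}  {row.part_or_process}: {row.failure}")
```

Before re-rating, the guard row leads with an RPN of 120 (10 × 3 × 4); after the assumed fix it drops to 10, and the overheating motor (42) becomes the top priority.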
Now that you have your prioritized list, you can create a Pareto chart using RPN as the variable, to make sure you have effectively covered all the core risks.
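A Pareto chart of RPNs shows the bars in descending order with a cumulative-percentage line. The sketch below computes only the data behind such a chart and leaves the plotting out; `rpn_pareto` and the example RPNs are hypothetical.

```python
def rpn_pareto(rpns: dict[str, int]) -> list[tuple[str, int, float]]:
    """Return (failure mode, RPN, cumulative % of total RPN), highest RPN first."""
    total = sum(rpns.values())
    running = 0
    rows = []
    for name, value in sorted(rpns.items(), key=lambda item: item[1], reverse=True):
        running += value
        rows.append((name, value, 100 * running / total))
    return rows


# Hypothetical RPNs for illustration only.
example = {"Guard not fitted": 120, "Overheating": 42, "Uneven paint": 32, "Loose blade nut": 6}

for name, value, cumulative in rpn_pareto(example):
    print(f"{name:<18} RPN {value:>4}  cumulative {cumulative:5.1f}%")
```

Here the total RPN is 200, and the top two failure modes account for 81% of it, which is the kind of cut-off a Pareto chart makes easy to see.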
Severity (SEV)
Severity is how bad the effects of the defect would be if it did happen, primarily looking at the impact on the customer.
Rating: Criteria
- 1: Customer unlikely to notice, even if they receive a product with this defect.
- 2: Variation may cause slight annoyance to the customer.
- 3: Variation will cause a small amount of annoyance to the customer and a small reduction in performance.
- 4: Customer will be upset with the defect, and it is likely to cause issues for them.
- 5: Medium impact on the customer; may affect future sales to the customer.
- 6: Moderate to high impact on the customer that could lead to production issues.
- 7: Fairly large impact on the customer that could lead to a reduction in production, sales or reputation.
- 8: Large impact on the customer that could lead to a major reduction in production, sales or reputation.
- 9: Very large impact on the customer that could lead to a large loss of reputation for the customer, and to safety concerns.
- 10: Huge impact on the customer, with potentially organization-ending consequences for both the organization and the customer.
Occurrence (OCC)
Occurrence is the likelihood that the defect will occur: the higher the score, the more likely the defect is to occur. Rate it independently of detection.
Rating (risk): Criteria
- 1 (virtually zero): Process is in control.
- 2 (very low): Defect very unlikely to occur.
- 3 (low): Process unlikely to fail unless conditions change.
- 4 (low): Process fails infrequently.
- 5 (moderate): Small number of previous failures.
- 6 (moderate): Moderate number of previous failures.
- 7 (high): Relatively high number of previous failures.
- 8 (high): Process has failed many times before.
- 9 (very high): Process is out of control and fails regularly.
- 10 (virtually certain): Process is out of control and failure will occur.
Detection (DET)
Detection is how likely it is that the defect will be noticed before it reaches the customer. Because 10 is the worst score, a high DET rating means a low chance of detection. A guide for rating detection is:
Rating (chance of detection): Description
- 1 (99% to 100%): Near 100% chance of detection.
- 2 (95% to 99%): Very strong controls; the defect is very likely to be detected.
- 3 (90% to 95%): Strong controls; the vast majority of defects will be detected.
- 4 (85% to 90%): Relatively strong controls, with a high likelihood that the defect will be detected before being passed to the customer.
- 5 (80% to 85%): Quite strong controls; defects are likely to be detected.
- 6 (70% to 80%): Moderate controls with a good chance of detection.
- 7 (60% to 70%): Relatively weak controls, with a moderate chance that the defect will be noticed, either by controls or by observation, before it reaches the customer.
- 8 (50% to 60%): Weak controls, with a roughly even chance that the defect will be noticed, either by controls or by observation, before it reaches the customer.
- 9 (1% to 50%): Controls are very weak, and it is highly likely that the customer will receive product with the defect if it occurs.
- 10 (0% to 1%): There are no controls that would detect the issue.
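If your team estimates the chance of detection as a percentage, the guide above can be turned into a simple lookup. The `detection_score` function below is a hypothetical helper, not part of any FMEA standard. Because the guide's ranges share their end points (99% appears in both of the top two bands, for example), it assumes a boundary value gets the better, lower score.

```python
# Lower bound (in %) of each band in the detection guide, paired with its DET
# score, from the strongest controls down. Assumption: a value sitting exactly
# on a shared boundary (e.g. 99% or 95%) gets the better (lower) score.
DETECTION_BANDS = [
    (99, 1),
    (95, 2),
    (90, 3),
    (85, 4),
    (80, 5),
    (70, 6),
    (60, 7),
    (50, 8),
    (1, 9),
]


def detection_score(chance_pct: float) -> int:
    """Map an estimated chance of detection (0-100%) to a DET score of 1-10."""
    if not 0 <= chance_pct <= 100:
        raise ValueError("chance of detection must be between 0 and 100")
    for lower_bound, score in DETECTION_BANDS:
        if chance_pct >= lower_bound:
            return score
    return 10  # below 1%: effectively no controls that would detect the issue


for pct in (100, 97, 75, 30, 0.5):
    print(f"{pct}% chance of detection -> DET {detection_score(pct)}")
```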
FMEA Excel Template
To get you started, I've included a Microsoft Excel FMEA template.