Last Updated on 9 September 2023
Mean Time Between Failures (MTBF) is a very helpful metric for measuring issues that don’t occur very often, especially in plant maintenance. When you are working towards Six Sigma, you should aim to reduce issues such as defects, downtime, spills and safety incidents until they occur very rarely.
A graph of how often these issues occur wouldn’t tell you much: it would sit at zero for long periods, broken by the occasional spike of height one. MTBF helps by giving you a measurable quantity for how serious the issue is, which you can monitor and improve over time.
MTBF Definition
The mean time between failure definition is:
The average time that a device or product functions before failing
The average used is the arithmetic mean. It is essentially the average life expectancy of the product before repair.
MTBF vs MTTF
A very similar metric that is often confused with Mean Time Between Failures is Mean Time To Failure (MTTF). The key differences are:
- Mean Time Between Failures is calculated for items that are repairable. All measurements in the calculation will be for the same item.
- Mean Time To Failure is calculated for items that can’t be repaired (the most common example is light bulbs). The measurements are therefore taken across a large number of assets.
They are both used for a similar purpose, i.e. measuring the reliability of the asset through how long it lasts before breaking.
When do you use MTBF?
MTBF is useful in a huge number of applications, because it applies to any rare event that you are trying to reduce as far as possible, such as:
- Defect rate
- Downtime
- Safety issues
- Machine breakdown
- Battery life before recharge
- Product reliability in real customer use
It is usually used for repairable systems, i.e. all the measurements are for the same machine or product.
Mean Time Between Failure formula
Mean time between failure (MTBF) can be calculated by:
MTBF = Length of period / Number of Failures in a period
You can use whatever units make most sense for the period, but days are probably the most common. To track the metric over time, pick periods of equal length, such as one calendar year each.
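As a minimal Python sketch of this formula, assuming the period length and the failure count have already been collected, the calculation is a single division. The function name `mtbf` and the monthly figures below are hypothetical, and raising an error for a period with no failures is a design choice of this sketch, not something the formula prescribes.

```python
def mtbf(period_length: float, failures: int) -> float:
    """MTBF = length of period / number of failures in that period.

    The result is in the same units as period_length (e.g. days).
    """
    if failures <= 0:
        # With zero failures the ratio is undefined, so refuse to guess.
        raise ValueError("MTBF is undefined for a period with no failures")
    return period_length / failures


# Hypothetical figures: two 30-day months with 3 and 2 failures.
print(mtbf(30, 3))  # 10.0 days per failure
print(mtbf(30, 2))  # 15.0 days per failure
```

Because both months have the same length, the two results can be compared directly, which is why the periods should be of equal length.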
Long downtimes
If the item is out of action for a long time while it is being repaired, you need to take this into account, because MTBF measures the time the item spends working between failures. The formula then becomes:
MTBF = (Length of period – total downtime) / Number of Failures in a period
It is important that all failures are included, even really short ones, or issues that are causing waste will be missed.
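A short Python sketch of the downtime-adjusted formula follows. The function name and the example figures (a 720-hour month with 4 failures and 48 hours of repair time in total) are hypothetical.

```python
def mtbf_excluding_downtime(period_length: float, total_downtime: float, failures: int) -> float:
    """MTBF = (length of period - total downtime) / number of failures.

    period_length and total_downtime must be in the same units (e.g. hours).
    """
    if failures <= 0:
        raise ValueError("MTBF is undefined for a period with no failures")
    if not 0 <= total_downtime <= period_length:
        raise ValueError("downtime must lie between 0 and the period length")
    return (period_length - total_downtime) / failures


# Hypothetical month: 720 hours, 4 failures, 48 hours spent on repairs.
print(mtbf_excluding_downtime(720, 48, 4))  # 168.0 hours
```

Ignoring the downtime would give 720 / 4 = 180 hours, which overstates how long the machine actually runs between failures.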
Calculation from uptime
You can also calculate it by averaging all the periods of uptime:
MTBF = Total uptime / Number of failures = Σ(start of downtime – start of uptime) / Number of failures
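The uptime version can be sketched in Python from a log of timestamps. This assumes a hypothetical log format in which each entry is a pair (start of uptime, start of the downtime that ended it), so each pair represents one failure; the function name and the timestamps are illustrative.

```python
from datetime import datetime


def mtbf_from_uptime(uptime_periods: list[tuple[datetime, datetime]]) -> float:
    """Mean uptime in hours: sum of (start of downtime - start of uptime) / number of failures.

    Each tuple is (start of uptime, start of the downtime that ended it),
    so the number of tuples is the number of failures.
    """
    if not uptime_periods:
        raise ValueError("need at least one failure")
    total_hours = sum((down - up).total_seconds() / 3600 for up, down in uptime_periods)
    return total_hours / len(uptime_periods)


# Hypothetical log: the machine restarts after each repair.
log = [
    (datetime(2023, 1, 1, 6, 0), datetime(2023, 1, 3, 6, 0)),    # 48 h up
    (datetime(2023, 1, 3, 10, 0), datetime(2023, 1, 4, 22, 0)),  # 36 h up
    (datetime(2023, 1, 5, 2, 0), datetime(2023, 1, 6, 2, 0)),    # 24 h up
]
print(mtbf_from_uptime(log))  # 36.0 hours
```

The repair gaps between entries (e.g. 06:00 to 10:00 on 3 January) never enter the sum, so downtime is excluded automatically.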
MTBF and MTTR
These two metrics are two of the most important KPIs when you’re trying to improve the reliability of your machinery. They are both useful in themselves but become especially powerful when used together.
• Mean Time Between Failures is how long your machinery runs before it breaks
• Mean Time To Repair is how fast you can get back up and running when it does break
Mean Time Between Failures in TPM
The two combine to give an important third metric, availability, which is the percentage of time that your machine is available for production:
Availability = MTBF / (MTBF + MTTR)
Availability is essentially the probability of a machine being available at any given time, and increasing it is a great way for you to reduce the wastes from downtime. It is part of the Overall Equipment Effectiveness calculation, a key metric of Total Productive Maintenance.
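A minimal Python sketch of the availability formula is shown below, assuming MTBF and MTTR are expressed in the same units. The machine figures (168 hours between failures, 12 hours to repair) are hypothetical.

```python
def availability(mtbf: float, mttr: float) -> float:
    """Availability = MTBF / (MTBF + MTTR), as a fraction between 0 and 1.

    mtbf and mttr must use the same time units.
    """
    if mtbf < 0 or mttr < 0 or mtbf + mttr == 0:
        raise ValueError("MTBF and MTTR must be non-negative and not both zero")
    return mtbf / (mtbf + mttr)


# Hypothetical machine: 168 hours between failures, 12 hours to repair.
print(f"{availability(168, 12):.1%}")  # 93.3%
# Halving the repair time raises availability without touching MTBF.
print(f"{availability(168, 6):.1%}")   # 96.6%
```

The second line illustrates why the two metrics are powerful together: availability can be improved either by breaking down less often (higher MTBF) or by getting back up and running faster (lower MTTR).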
A major aim of the ‘Planned Maintenance’ pillar of TPM is to improve Mean Time Between Failures, as this influences how much planned maintenance needs to be performed.
Mean time between failure example
For example, suppose a company is trying to cut down on the number of reportable safety incidents it has, a number you would expect to be low in most companies. Its incidents are:
- 2019: January 2, April 5, April 20, May 1, June 25, June 28, August 15, October 29, December 3; 9 incidents
- 2020: January 30, March 6, June 16, August 20, November 5; 5 incidents
There are 365 days in 2019 and 366 in 2020, so the formulas become:
- 2019 MTBF = 365 days / 9 incidents = 40.56 days per failure
- 2020 MTBF = 366 days / 5 incidents = 73.2 days per failure
An increase in mean time between failures is an improvement, so the company’s safety record appears to be improving.
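The yearly figures above can be reproduced with a short Python sketch that counts the listed incident dates per calendar year and divides the number of days in that year by the count. The helper name `yearly_mtbf` is hypothetical; `calendar.isleap` handles the 366-day year.

```python
import calendar
from datetime import date

# The incident dates from the example above.
incidents = [
    date(2019, 1, 2), date(2019, 4, 5), date(2019, 4, 20), date(2019, 5, 1),
    date(2019, 6, 25), date(2019, 6, 28), date(2019, 8, 15), date(2019, 10, 29),
    date(2019, 12, 3),
    date(2020, 1, 30), date(2020, 3, 6), date(2020, 6, 16), date(2020, 8, 20),
    date(2020, 11, 5),
]


def yearly_mtbf(dates: list[date], year: int) -> float:
    """MTBF in days for one calendar year: days in the year / incidents in that year."""
    days_in_year = 366 if calendar.isleap(year) else 365
    count = sum(1 for d in dates if d.year == year)
    if count == 0:
        raise ValueError(f"no incidents recorded in {year}")
    return days_in_year / count


for year in (2019, 2020):
    print(year, round(yearly_mtbf(incidents, year), 2))
# 2019 40.56
# 2020 73.2
```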
MTBF and failure rate
It is often helpful to convert MTBF into a failure rate, which is measured in failures per unit of time rather than in time. The simplest way to do this is to take the inverse, giving the formula:
Failure rate = 1 / MTBF
or
MTBF = 1 / Failure rate
The units will then be, e.g., failures per hour or failures per day. To keep the numbers easy to read, you may want to convert to a longer period: e.g. if MTBF is measured in days, you may want to use failures per year (failures per day × 365).
In the above example, this gives a failure rate of 0.0247 failures per day in 2019 and 0.0137 failures per day in 2020 (9 and 5 failures per year respectively). It is easier to quantify improvements using failure rate: this is a 45% decrease ((0.0137 – 0.0247) / 0.0247 ≈ –0.45).
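As a quick check of these numbers, the Python sketch below converts the two yearly MTBF values from the example into failure rates and computes the percentage change. The function name `failure_rate` is illustrative.

```python
def failure_rate(mtbf: float) -> float:
    """Failure rate = 1 / MTBF, in failures per unit of the time MTBF is measured in."""
    return 1 / mtbf


rate_2019 = failure_rate(365 / 9)  # failures per day
rate_2020 = failure_rate(366 / 5)  # failures per day

change = (rate_2020 - rate_2019) / rate_2019
print(round(rate_2019, 4), round(rate_2020, 4))  # 0.0247 0.0137
print(f"{change:.0%}")                           # -45%
print(round(rate_2019 * 365, 1))                 # 9.0 failures per year
```

Using the unrounded rates gives a change of about –44.6%, which rounds to the same 45% decrease.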
The bathtub curve
Failure rates, and therefore MTBF, are rarely constant over time, even if no changes are made to operating methods. They usually follow what’s known as the ‘bathtub curve’: a U-shaped curve with a very high failure rate in the early life and end of life stages, but a low failure rate in the middle ‘useful life’ stage.

- At the early life stage, weaker units can break beyond repair, and installation problems cause failures.
- As time goes by, these problems are fixed and the new unit runs productively with few failures. This is the useful life stage.
- As the product ages, parts start to wear out, so they break more easily. This is the wear-out stage, also known as the end of life stage.
When you are monitoring MTBF over time to improve the system, you need to verify that any improvement is due to the changes you have made. If the improvement is only because, e.g., the early life issues have been fixed, those issues will likely recur with the next product.
Because MTBF is the inverse of failure rate, it follows an n-shaped rather than a U-shaped curve: MTBF is high during the useful life stage, but low at the start (early life) and end (wear-out stage).
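To illustrate the U shape and its n-shaped inverse, the Python sketch below models a bathtub curve as the sum of three Weibull hazard (instantaneous failure rate) functions: one with shape below 1 (falling rate, early life), one with shape 1 (constant rate, useful life) and one with shape above 1 (rising rate, wear-out). This is a common modelling choice but an assumption here, not the article’s method, and all shape and scale values are hypothetical, chosen only to produce a visible U. The reciprocal `1 / rate` stands in for MTBF in the same sense as the article’s “MTBF = 1 / Failure rate”; it is not the mean life of the distribution.

```python
def weibull_hazard(t: float, shape: float, scale: float) -> float:
    """Instantaneous failure rate of a Weibull distribution at time t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def bathtub_rate(t: float) -> float:
    """Illustrative bathtub curve: early-life + random + wear-out hazards.

    All shape/scale values are hypothetical, chosen only to give a U shape.
    """
    early_life = weibull_hazard(t, shape=0.5, scale=10)        # falling rate
    random_failures = weibull_hazard(t, shape=1.0, scale=100)  # constant rate
    wear_out = weibull_hazard(t, shape=5.0, scale=50)          # rising rate
    return early_life + random_failures + wear_out


for t in (0.1, 20, 60):
    rate = bathtub_rate(t)
    print(f"t={t}: failure rate {rate:.3f}, 1/rate {1 / rate:.1f}")
```

The failure rate is high at t=0.1 (early life) and t=60 (wear-out) and lowest at t=20 (useful life), while `1 / rate` does the opposite and peaks in the middle.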