Mean Time Between Failures (MTBF) is the average time a product or system operates correctly between two consecutive failures. It indicates how long a product or system can typically run before it fails.
MTBF is a measure of product reliability, used particularly for electrical and electronic products, and is usually expressed in hours.
Calculation Method
With the widespread use of servers, higher reliability is demanded of them. "Reliability" refers to a product's ability to perform its specified functions under given conditions and within a specified time; conversely, a failure occurs when a product or part cannot perform as intended. In short, the fewer failures a product has, the higher its reliability. The ratio of the total number of product failures to the total number of life units (for example, cumulative operating hours) is called the "failure rate," commonly denoted λ. When a product's lifetime follows an exponential distribution, its failure rate is constant and the reciprocal of that rate is the Mean Time Between Failures: MTBF = 1/λ.
For example, the WD Caviar RE2 7200 RPM hard drive, designed for servers, is rated at an MTBF of 1.2 million hours and comes with a 5-year warranty. 1.2 million hours is approximately 137 years, but this does not mean that each such drive can run for 137 years without failure. From MTBF = 1/λ, it follows that λ = 1/MTBF = 1/(137 years), so the drive's average annual failure rate is about 0.7%: on average, about 7 out of every 1,000 drives will fail within a year.
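The hard-drive arithmetic above can be checked with a short Python sketch. It assumes an exponential lifetime (constant λ) and a 365-day year of continuous operation, and it uses the exact one-year failure probability 1 - exp(-λt) rather than the linear approximation λt; the function name `annual_failure_rate` is illustrative.

```python
import math

HOURS_PER_YEAR = 8760  # 365 days x 24 hours, ignoring leap years


def annual_failure_rate(mtbf_hours: float) -> float:
    """Expected fraction of units that fail within one year of continuous
    operation, assuming an exponential lifetime with lambda = 1 / MTBF."""
    lam = 1.0 / mtbf_hours  # failure rate in failures per hour
    # Probability of failing within t hours: 1 - exp(-lambda * t)
    return 1.0 - math.exp(-lam * HOURS_PER_YEAR)


mtbf = 1_200_000  # hours, the drive's rated MTBF
years = mtbf / HOURS_PER_YEAR
afr = annual_failure_rate(mtbf)
print(f"MTBF = {years:.0f} years")
print(f"annual failure rate = {afr:.2%}")
print(f"expected failures per 1000 drives per year = {afr * 1000:.1f}")
```

For a small λt the exact and linear forms agree closely: both give roughly 0.73% per year, or about 7 failures per 1,000 drives.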
Suppose a repairable product experiences N0 failures during use, and after each failure it is repaired and returned to service as good as new. If its working times between failures are T1, T2, T3, ..., T_N0, then its mean time between failures, i.e., its average lifespan Q, is:
$$Q = \frac{1}{N_0}\sum_{i=1}^{N_0} T_i$$
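A minimal Python sketch of this formula, assuming the working times T1 ... T_N0 are already known in hours; the function name and the interval values are hypothetical.

```python
def mtbf_from_intervals(working_hours: list[float]) -> float:
    """Mean time between failures of a repairable item: total operating time
    divided by the number of failures N0. working_hours[i] is the time the
    item ran before its (i + 1)-th failure."""
    if not working_hours:
        raise ValueError("need at least one failure interval")
    n0 = len(working_hours)      # N0: number of failures observed
    total = sum(working_hours)   # T1 + T2 + ... + T_N0
    return total / n0


# Hypothetical operating intervals (hours) before five successive failures
intervals = [1200.0, 950.0, 1430.0, 1100.0, 1320.0]
print(mtbf_from_intervals(intervals))
```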
The MTBF value is usually stated in a product's manual or on its packaging, for example 8,000 hours or 20,000 hours. But how is it calculated? If a computer has an MTBF of 30,000 hours, does that mean it was tested continuously for 30,000 hours? No: 30,000 hours is more than three years, and testing products that way would take years or even decades. Instead, MTBF values are predicted using standard methods such as MIL-HDBK-217, GJB/Z299B, and Bellcore. MIL-HDBK-217 was proposed by the Reliability Analysis Center and Rome Laboratory and has become an industry standard; GJB/Z299B is the Chinese standard for product MTBF calculation; and Bellcore, proposed by AT&T Bell Laboratories, is the industry standard for calculating MTBF values of commercial electronic products.
MTBF calculation is based mainly on the failure rate of each component in the product. Component failure rates, however, vary considerably with the environment and operating conditions: the same product will have a different reliability value in a laboratory than on an offshore platform, and the failure rate of a capacitor rated at 16V differs markedly between actual operating voltages of 25V and 5V. Reliability calculations must therefore account for all of these factors. This is nearly impossible to do by hand, but software such as MTBFcal, with its extensive parameter library, makes determining the MTBF value much easier.
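The idea of summing component failure rates under stress and environment factors can be sketched in Python. This is a simplified series model in the spirit of a parts-count prediction, not the MIL-HDBK-217, GJB/Z299B, or Bellcore procedure, whose factor tables are far more detailed. It assumes constant failure rates and that the system fails if any part fails; the part list, base rates, stress factors, and environment multipliers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Part:
    name: str
    quantity: int
    base_rate: float            # hypothetical base failure rate, failures per 10^6 hours
    stress_factor: float = 1.0  # hypothetical multiplier for voltage/temperature stress


# Hypothetical environment multipliers (illustrative only, not from any handbook)
ENVIRONMENT_FACTOR = {"lab": 1.0, "offshore": 6.0}


def system_mtbf_hours(parts: list[Part], environment: str) -> float:
    """Series-system prediction: the part failure rates add up, and the
    system MTBF is the reciprocal of the total rate."""
    env = ENVIRONMENT_FACTOR[environment]
    total_per_million = sum(
        p.quantity * p.base_rate * p.stress_factor * env for p in parts
    )
    return 1e6 / total_per_million


bom = [
    Part("capacitor", 40, 0.002, stress_factor=1.5),  # e.g. run close to its rated voltage
    Part("resistor", 120, 0.001),
    Part("IC", 8, 0.05),
    Part("connector", 4, 0.01),
]
for env in ("lab", "offshore"):
    print(env, round(system_mtbf_hours(bom, env)))
```

The same bill of materials yields a much lower predicted MTBF in the harsher environment, which is the effect the paragraph above describes.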
Synonym Distinctions
The Differences Between MTTF, MTBF, and MTTR
Reliability is a measure of the probability that a system operates effectively over a specific period of time; assessing it requires the system to remain in normal operation throughout that period.
A widely used reliability metric is MTTF (mean time to failure), defined as the expected value of the random variable "time to failure." MTTF is often misinterpreted as a "guaranteed minimum lifespan," which it is not. MTTF typically describes the product's useful operating period and excludes wear-out (aging) failures.
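To see why MTTF is not a guaranteed minimum lifespan, assume (as is common for the useful operating period) an exponential lifetime, so the probability of surviving to time t is R(t) = exp(-t/MTTF). The Python sketch below uses a hypothetical MTTF of 50,000 hours.

```python
import math


def reliability(t: float, mttf: float) -> float:
    """Probability of surviving to time t under an assumed exponential lifetime."""
    return math.exp(-t / mttf)


mttf = 50_000.0  # hypothetical MTTF in hours
for t in (1_000.0, 10_000.0, mttf):
    print(f"R({t:>8.0f} h) = {reliability(t, mttf):.3f}")
```

Under this assumption only about 36.8% of units (e^-1) are still working at t = MTTF, so many units fail well before reaching it.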
MTTR (Mean Time to Restoration) derives from "mean time to repair" in the IEC 61508 standard; the name makes clearer which time interval the term refers to. MTTR is the expected value of the random variable "restoration time." It covers the time needed to confirm that a failure has occurred and the time needed for repair, and it must also include the time to obtain parts, the response time of the repair team, the time to document all tasks, and the time to put the equipment back into service.
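The following Python sketch computes MTTR as the average of complete restoration times, where each restoration is the sum of the phases listed above. The `Restoration` class, its field names, and the durations are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Restoration:
    """Durations (hours) of the phases of one restoration; field names are hypothetical."""
    detect: float             # confirming that a failure has occurred
    respond: float            # repair team response time
    get_parts: float          # obtaining spare parts
    repair: float             # the maintenance work itself
    document: float           # documenting the tasks
    return_to_service: float  # putting the equipment back into service

    def total(self) -> float:
        return (self.detect + self.respond + self.get_parts
                + self.repair + self.document + self.return_to_service)


def mttr(records: list[Restoration]) -> float:
    """Mean time to restoration: the average of the full restoration times."""
    return mean(r.total() for r in records)


log = [
    Restoration(0.5, 1.0, 2.0, 3.0, 0.5, 1.0),      # 8.0 h in total
    Restoration(0.25, 0.75, 0.0, 1.5, 0.25, 0.25),  # 3.0 h in total
    Restoration(1.0, 2.0, 4.0, 5.0, 0.5, 0.5),      # 13.0 h in total
]
print(mttr(log))
```

Counting only the repair phase would give a noticeably smaller figure, which is why the definition insists on including every phase.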
MTBF (Mean Time Between Failures) is the average time between two consecutive failures or repairs; it includes both the operating time up to the failure and the time taken to detect the failure and repair the equipment. For a simple repairable component, MTBF = MTTF + MTTR. Because MTTR is typically much smaller than MTTF, MTBF is approximately equal to MTTF, and MTTF is often used in its place. MTBF is applied to both repairable and non-repairable systems.
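A quick numeric check of MTBF = MTTF + MTTR, using hypothetical values, shows how small the error of substituting MTTF for MTBF typically is when MTTR is short.

```python
def mtbf_from_parts(mttf_hours: float, mttr_hours: float) -> float:
    """For a simple repairable component: MTBF = MTTF + MTTR."""
    return mttf_hours + mttr_hours


mttf, mttr_h = 2000.0, 8.0  # hypothetical values in hours
mtbf = mtbf_from_parts(mttf, mttr_h)
error = (mtbf - mttf) / mtbf  # relative error of approximating MTBF by MTTF
print(f"MTBF = {mtbf} h; using MTTF instead is off by {error:.2%}")
```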
Analysis Objectives
1) Identify key countermeasures for frequently failing parts and establish a technical basis for extending part life.
2) Calculate component life cycles and develop maintenance plans.
3) Select inspection targets and items, set inspection standards, and improve them.
4) Guide the allocation of maintenance work between internal staff and outside contractors. Based on an evaluation of the company's own equipment repair capabilities, determine the risks to maintenance quality and equipment efficiency for each type of work done in-house; this serves as a key reference when deciding what to outsource.
5) Establish spare-parts benchmarks. Determine which mechanical and electrical spare parts to keep in reserve, and their base stock quantities, from analysis of MTBF records, so that inventory stays at an economically sound level.
6) Serve as a reference for selecting maintenance methods to improve key areas. To improve equipment availability, lengthy maintenance work, engineering adjustments, and changeovers that cause downtime must be shortened. The maintenance methods in use should therefore be reviewed, with the items to review and their priorities chosen on the basis of MTBF analysis records.
7) Establish standard estimated working times for equipment, and standards for selecting maintenance operations and their times. Setting estimated-time standards for maintenance plans and choosing maintenance operations must take into account the difference between the standard time values (or maintenance repetition cycles) and the actual maintenance time, as well as the characteristics of each maintenance operation. An MTBF analysis table is therefore essential.
8) Serve as a reference for organizing drawings and re-selecting key equipment or parts. The MTBF analysis record table documents equipment and part modifications, wear and deterioration information, and drawing changes or trial production; analyzing and reviewing it regularly supports engineering drawing management and ranking by importance.
9) Establish and revise operating standards, and assign responsibilities for equipment maintenance work.
10) Provide technical documentation on equipment reliability and maintainability. Drawing on MTBF analysis sheets, maintenance engineers gather technical information on the reliability and maintainability design of equipment for the design department to reference when designing new equipment.
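Objective 5 (spare-parts benchmarks) can be made concrete with one common stocking model, which is an assumption here rather than a method given in this article. If failures of identical installed units occur independently at a constant rate of 1/MTBF per unit, the number of failures during the spares-replenishment lead time is Poisson-distributed, and the stock level can be chosen to cover that demand with a target probability. The function names, the 95% service level, and the example figures are hypothetical.

```python
import math


def poisson_cdf(k: int, mean: float) -> float:
    """P(X <= k) for a Poisson random variable with the given mean."""
    term = math.exp(-mean)  # P(X = 0)
    total = term
    for i in range(1, k + 1):
        term *= mean / i    # P(X = i) from P(X = i - 1)
        total += term
    return total


def spare_stock(units_installed: int, mtbf_hours: float,
                lead_time_hours: float, service_level: float = 0.95) -> int:
    """Smallest stock s such that demand during the replenishment lead time
    is at most s with probability >= service_level (Poisson assumption)."""
    expected_failures = units_installed * lead_time_hours / mtbf_hours
    s = 0
    while poisson_cdf(s, expected_failures) < service_level:
        s += 1
    return s


# Hypothetical: 20 identical motors, MTBF 8,000 h each, 720 h (30-day) restocking lead time
print(spare_stock(20, 8_000, 720))
```

Here 1.8 failures are expected per lead time; stocking the mean alone would cover demand only about 73% of the time, which is why the model adds a safety margin.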
Analysis Applications
1) The maintenance department often finds it hard to see how maintenance activities relate to product quality. When fixing malfunctions, for example, the focus is usually on restoring function, without confirming how much the work has improved product quality. Linking product quality to maintenance activities is crucial, and the MTBF analysis chart is a valuable reference for doing so.
2) PM (Predictive Maintenance) centers on equipment diagnostics, yet concrete topics to work on are often hard to identify. Deriving equipment-diagnostics development topics from the MTBF (Mean Time Between Failures) analysis table is a highly effective approach.
3) Training materials for production equipment: Developing personnel who are proficient with equipment is a key issue for PM. However, most training is based on commercially available books, which do not adequately address the specific problems and conditions of the company or unit, and whose background assumptions often differ from the actual situation. Using MTBF analysis to teach the structure, functions, weak points, and precautions of the company's own production equipment is a more targeted approach.
4) Material for understanding and studying equipment Life Cycle Cost (LCC): The MTBF analysis sheet is a comprehensive record, compiled over a long period and centered on the equipment. It therefore supports understanding the equipment's life cycle cost by analyzing maintenance activities, costs, spare parts, and losses, which is a crucial foundation for managing the equipment's entire life cycle. In summary, MTBF analysis is not only a way of keeping repair records but also provides the original data for guiding maintenance, management, and technical activities, which gives it significant value [3].
Analysis Sheet
Step 1: Identify the equipment object to be analyzed – typically, prioritize key equipment for recording, or opt to document groups of similar equipment or specific critical areas of the equipment.
Step 2: Fault Data Collection – Collect equipment fault data covering the past 3 to 5 years, or at least 30 failure incidents.
Step 3: Drawing the Fault Distribution Map – Draw an overall diagram of the equipment and, using the data from Step 2, mark the fault locations on it.
Step 4: Compile the MTBF Analysis Sheet.
Design the analysis sheet to hold about one year of data.
Record the faults from Step 3 in order of part category and date of occurrence.
Use graphics, color coding, and symbols to make the records as readable as possible.
Keep logging continuously until the goal of "zero sudden equipment failures" is reached.
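The sorting and recording in Step 4 can be sketched in Python: order the fault log by part category and date, then average the gaps between consecutive failures of each part category to get a per-part MTBF. This sketch assumes the equipment runs continuously, so calendar days stand in for operating time; the log entries and the function name are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical fault log entries: (part category, date of occurrence)
fault_log = [
    ("bearing", date(2023, 1, 10)),
    ("sensor", date(2023, 2, 3)),
    ("bearing", date(2023, 4, 10)),
    ("bearing", date(2023, 7, 9)),
    ("sensor", date(2023, 8, 1)),
]


def mtbf_by_part(log):
    """Sort faults by part category and date, then average the gaps (in days)
    between consecutive failures of the same part category."""
    by_part = defaultdict(list)
    for part, day in sorted(log):  # order by part, then by date
        by_part[part].append(day)
    result = {}
    for part, days in by_part.items():
        gaps = [(later - earlier).days for earlier, later in zip(days, days[1:])]
        if gaps:  # at least two failures are needed to measure an interval
            result[part] = sum(gaps) / len(gaps)
    return result


print(mtbf_by_part(fault_log))
```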
Step 5: Verification of Fault Analysis and Countermeasures.
Use the MTBF analysis sheet to analyze failure causes and verify countermeasures; common methods include the Pareto chart, the cause-and-effect (fishbone) diagram, and fault tree analysis.
Countermeasures should be easy to understand and genuinely feasible.
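A Pareto analysis of failure causes, one of the methods named in Step 5, can be sketched in Python: rank causes by frequency, compute the cumulative share, and pick the "vital few" causes that account for most failures. The 80% cutoff is a common rule of thumb assumed here, and the 30 cause records are hypothetical.

```python
from collections import Counter


def pareto(causes: list[str], cutoff: float = 0.8):
    """Rank failure causes by frequency with their cumulative share, and return
    the 'vital few' causes needed to reach the cutoff share of all failures."""
    counts = Counter(causes).most_common()  # sorted by count, descending
    total = sum(n for _, n in counts)
    table, vital, cumulative = [], [], 0
    for cause, n in counts:
        if cumulative / total < cutoff:  # cutoff not yet reached: still a vital cause
            vital.append(cause)
        cumulative += n
        table.append((cause, n, cumulative / total))
    return table, vital


# Hypothetical causes recorded for 30 failures
causes = (["wear"] * 13 + ["loose bolt"] * 8 + ["misalignment"] * 5
          + ["contamination"] * 3 + ["operator error"] * 1)
table, vital = pareto(causes)
for cause, n, share in table:
    print(f"{cause:<15}{n:>3}{share:>8.1%}")
print("vital few:", vital)
```

Countermeasures can then be prioritized for the vital-few causes and checked against the cause-and-effect diagram or fault tree.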


