Applying Failure Mode, Effects And Criticality Analysis (FMECA) for Ensuring Mission Reliability of Equipment

Mahendra Prasad
Lt. Col. Mahendra Prasad is Research Fellow at the Institute for Defence Studies and Analyses, New Delhi.

Abstract

Reliability of equipment is a major factor contributing towards the success of military missions. Military equipment therefore needs to be maintained in the most judicious, scientific and meticulous manner. Over-maintenance and under-maintenance are both detrimental to the reliability of equipment. In addition to being expensive, over-maintenance may lead to long downtime, while under-maintenance considerably increases the failure rate and may prevent optimal utilisation of otherwise inherently reliable equipment. Thus, there is a need to evolve optimal maintenance practices. Failure mode, effects and criticality analysis (FMECA) of equipment is an effective scientific tool for identifying the assemblies, sub-assemblies and components that are critical for the satisfactory performance of equipment. Once these critical items are identified, the most suitable and economical maintenance philosophy and practices can be applied to them to ensure the reliable performance of equipment for the duration of the mission.

Introduction

Defence equipment is required to operate for the duration of the mission in which it is deployed, without failing or with minimal failures. This means that such equipment should have a maintenance free operating period (MFOP)1 and must not fail for the duration of the mission. In other words, the equipment deployed for a mission should display a very high level of reliability. The existing maintenance tasks on the equipment therefore need to be evaluated against the optimal or near-optimal tasks required for it to perform reliably. The basic input for finding the optimal maintenance tasks comes from failure mode, effects and criticality analysis (FMECA) of the equipment. FMECA can also be used for many other purposes, such as logistics support analysis, test planning, defining inspection and checkout requirements, identifying maintainability design features that require corrective action, and providing an input for Reliability Centred Maintenance (RCM).

As no in-service data is available for a product at the design stage, extensive qualitative FMECA is carried out at that stage and applied to improve the design. However, for an in-service product, where failure data is available and can be collected, a quantitative FMECA procedure can be applied and the results used to improve the existing maintenance procedures, and thereby the overall reliability of the equipment. Equipment reliability increases the confidence of the soldiers using the equipment and helps in the achievement of military objectives. Preventive maintenance is essential for achieving equipment reliability, and FMECA is a scientific tool for achieving this goal.

Suggested Methodology

The purpose of carrying out an FMECA of equipment is to identify its critical systems from failure data, then to identify the assemblies and components, or maintenance significant items (MSI), that contribute most to the failure of the equipment, and finally to assign maintenance tasks to these MSI. The complete exercise is therefore a four-step process involving the following activities: –

  • Failure data collection.
  • Identification of the highly critical systems through preliminary data analysis and equipment-FMECA up to the indenture level of the system.
  • Identification of dominant failure modes and MSI contributing to them by system FMECA up to the desired indenture level (sub-system, assembly or component).
  • Assignment of maintenance tasks to the MSI to prevent occurrence of the dominant failure modes.

Failure data in respect of any equipment in operation may be collected in the following two ways: –

  • By controlled testing, in which the equipment under test is subjected to operational stresses at accelerated rates until failure occurs. This method suffers from the drawback that the actual field conditions in which the equipment is designed to operate cannot be perfectly simulated in the laboratory. The method is, however, used for reliability improvement at the design stage in the typical “Test-Fix” cycle.2
  • By collecting data on failed equipment from documents such as equipment history sheets and/or equipment logbooks. This method also has a serious drawback: the failure data include the subjectivity of the user, or of the person maintaining the documents, in interpreting the seriousness of a defect. A lot of effort is therefore required to sort the data before it can be used for FMECA studies. However, the principal advantage of field failure data is that it is associated with the actual usage environment. This method is used for developing maintenance plans for in-service equipment.3

Sorting the Data4

Failure data collected from the field contains a lot of noise. The basic document from which this data is collected is the equipment logbook, which is generally maintained by the craft technicians of the repair workshops. They undertake the repair work once the equipment is brought to them after a snag develops. Since no standard method of ‘wording’ defects is available to them, different craft technicians attending to the equipment may record identical defects in different words. This needs to be taken care of, and only an expert on the type of equipment under study can identify and remove such data noise.
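As a rough illustration of how such wording noise might be handled once an expert has decided which phrases describe the same defect, the Python sketch below maps free-text logbook entries onto standard defect names. The synonym table `SYNONYMS` and the helpers `normalise` and `standardise` are hypothetical, the table entries are invented examples, and any entry not found in the table is set aside for the expert rather than guessed.

```python
import re

# Hypothetical synonym table. In practice an expert on the equipment would
# compile it by reviewing logbook entries; these entries are invented.
SYNONYMS = {
    "engine not starting": "ENGINE FAILS TO START",
    "engine fails to start": "ENGINE FAILS TO START",
    "starting trouble": "ENGINE FAILS TO START",
    "clutch slipping": "CLUTCH SLIP",
    "clutch slip": "CLUTCH SLIP",
}


def normalise(text: str) -> str:
    """Lower-case the entry, replace punctuation with spaces, collapse whitespace."""
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())


def standardise(entries: list[str]) -> tuple[list[str], list[str]]:
    """Map logbook entries to standard defect names.

    Returns (standardised, unresolved). Unresolved entries are kept aside
    for the expert to classify instead of being guessed.
    """
    standardised: list[str] = []
    unresolved: list[str] = []
    for entry in entries:
        key = normalise(entry)
        if key in SYNONYMS:
            standardised.append(SYNONYMS[key])
        else:
            unresolved.append(entry)
    return standardised, unresolved


if __name__ == "__main__":
    log = ["Engine not starting.", "STARTING TROUBLE", "Clutch slipping", "Noise from rear axle"]
    std, todo = standardise(log)
    print(std)   # ['ENGINE FAILS TO START', 'ENGINE FAILS TO START', 'CLUTCH SLIP']
    print(todo)  # ['Noise from rear axle']
```

Keeping unresolved entries separate reflects the point that only an expert on the equipment can decide whether two differently worded entries describe the same defect.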

Another source of noise in the data is the time between failures, which depends upon the intensity of exploitation of the equipment. During lean periods, the exploitation rate, or intensity of usage, of equipment may decline considerably, increasing the calendar time between failures. Recording the kilometres, running hours or days of operation between failures, as appropriate to the equipment, can eliminate this type of noise.

While collecting data, one will also come across failures that do not adversely affect the primary function of the equipment and can be regarded as minor snags, e.g. failure of the pilot lamp, turn indicators or speedometer cable in vehicles. These snags do not take the equipment out of the mission and hence should be eliminated during data sorting.
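The two remedies described above, measuring intervals in usage rather than calendar time and discarding minor snags, can be sketched in Python for a single vehicle whose odometer reading is logged with each failure. `FailureRecord`, `MINOR_SNAGS` and `km_between_failures` are hypothetical names; the minor-snag set simply reuses the vehicle examples given in the text, whereas a real study would take that list from an equipment expert.

```python
from dataclasses import dataclass

# Minor snags quoted in the text for vehicles. They do not take the
# vehicle out of the mission, so they are dropped before analysis.
MINOR_SNAGS = {"PILOT LAMP", "TURN INDICATORS", "SPEEDOMETER CABLE"}


@dataclass
class FailureRecord:
    odometer_km: float  # odometer reading when the failure was logged
    defect: str         # standardised defect name


def km_between_failures(records: list[FailureRecord], start_km: float = 0.0) -> list[float]:
    """Kilometres run between successive mission-affecting failures.

    Measuring in kilometres rather than calendar days keeps lean periods,
    when the vehicle is hardly used, from inflating the intervals.
    """
    relevant = sorted(
        (r for r in records if r.defect not in MINOR_SNAGS),
        key=lambda r: r.odometer_km,
    )
    intervals: list[float] = []
    last_km = start_km
    for r in relevant:
        intervals.append(r.odometer_km - last_km)
        last_km = r.odometer_km
    return intervals


if __name__ == "__main__":
    log = [
        FailureRecord(1200.0, "CLUTCH SLIP"),
        FailureRecord(1500.0, "PILOT LAMP"),
        FailureRecord(3400.0, "ENGINE FAILS TO START"),
    ]
    print(km_between_failures(log))  # [1200.0, 2200.0]
```

The pilot-lamp failure at 1,500 km is ignored, so the second interval runs from the clutch failure to the engine failure.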

Figure 1: FMECA Format

FMECA Worksheet5

A sample FMECA worksheet format is given in Figure 1. The worksheet is in tabular form to foster a systematic approach to the analysis. The various column headings are explained in succeeding paragraphs.

  • Identification number (Id. No.): A serial number or other reference designation assigned for easy traceability is entered first on the worksheet.

  • Nomenclature: The name of the system is entered if the FMECA is being carried out at system level (to identify the critical systems of the main equipment); the name of the subsystem or assembly is entered if the FMECA is being carried out down to subsystem or assembly level.

  • Function: A concise statement of the function performed by the item entered in the ‘Nomenclature’ column. For a system, this includes both its inherent function and its relationship to interfacing system(s).

  • Failure mode and causes: A failure mode is the manner in which a failure is observed. Only those failure modes that have occurred, or that there is sufficient reason to believe are likely to occur, should be listed. The observed and likely causes of each failure mode are also listed.

  • Mission phase/operational mode: A concise statement of the mission phase and operational mode in which the failure occurs, including whether the mission fails or succeeds because of that failure mode.

  • Failure effects: These are recorded at three levels:

    • Local effects: Local effects concentrate specifically on the impact a failure mode has on the operation and function of the item in the indenture level under consideration.
    • Next higher level: Here the focus is on the effect of the failure on operation and function of the systems/items at the next higher indenture level.
    • End effect: Here the effect of the failure on the operation, function or status of the uppermost level, i.e. the system, is recorded.
  • Failure detection method: A description of the methods by which the operator detects the occurrence of the failure mode, as per the existing design.

  • Compensating provision: Either design provisions or operator actions that can circumvent or mitigate the effect of the failure mode.

  • Severity level (Si): The severity of the effect ascribed to each failure mode is entered in this column. Severity levels are classified as follows: –

    • Negligible failures (Level 1): These do not affect the acceptable system performance.
    • Marginal failures (Level 2): System is degraded with partial loss of performance.
    • Critical failures (Level 3): These cause a major system failure, or a drop in performance below the acceptable level, and can cause injury.
    • Catastrophic failures (Level 4): These result in total system failure, loss of life and total damage to the main equipment.

  • Probability level (Pi): The failure probability of the failure mode, calculated as the ratio of the number of failures attributed to the failure mode to the total number of failures in the system under scrutiny. Failures are classified by probability level as follows:

    • Remote: Failure probability between 0.001 and 0.01.
    • Occasional: Failure probability between 0.01 and 0.1.
    • Probable: Failure probability between 0.1 and 0.2.
    • Frequent: Failure probability greater than or equal to 0.2.
  • Criticality index (Ckmi): The product of the probability level and the severity level for a specific failure cause, when working with quantitative data.

  • Overall criticality index (Ckr): The sum of the criticality indices of all failure causes contributing to a failure mode.

  • Remarks: Comments on the failure mode, its effects and its criticality, including any recommended action(s).
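The probability, criticality-index and overall-criticality columns can be computed mechanically once failure counts have been attributed to causes. The Python sketch below assumes that each worksheet line records a failure mode, one cause, its severity level Si and the number of failures attributed to that cause, and that these lines together cover all failures of the system under scrutiny. It uses the numerical probability Pi, not its class, in Ckmi = Pi × Si, which is one reading of the quantitative case described above. `CauseRow`, `probability_class` and `criticality` are hypothetical names, and the failure counts are made up.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class CauseRow:
    """One worksheet line: a single cause of a failure mode (hypothetical layout)."""
    failure_mode: str
    cause: str
    severity: int  # Si: 1 negligible, 2 marginal, 3 critical, 4 catastrophic
    failures: int  # number of recorded failures attributed to this cause


def probability_class(p: float) -> str:
    """Classify a failure probability into the bands given in the text.

    Each band is treated as half-open [lower, upper), which is an assumption;
    the text states explicitly only that Frequent includes 0.2.
    """
    if p >= 0.2:
        return "Frequent"
    if p >= 0.1:
        return "Probable"
    if p >= 0.01:
        return "Occasional"
    if p >= 0.001:
        return "Remote"
    return "Below Remote"  # the text gives no class below 0.001


def criticality(rows: list[CauseRow]) -> dict[str, float]:
    """Overall criticality index Ckr per failure mode.

    Pi = failures of the cause / all failures of the system,
    Ckmi = Pi * Si, and Ckr = sum of Ckmi over the causes of a mode.
    """
    total = sum(r.failures for r in rows)
    if total == 0:
        return {}
    ckr: dict[str, float] = defaultdict(float)
    for r in rows:
        p_i = r.failures / total
        ckr[r.failure_mode] += p_i * r.severity
    return dict(ckr)


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    rows = [
        CauseRow("Engine fails to start", "Battery discharged", 3, 10),
        CauseRow("Engine fails to start", "Fuel filter choked", 3, 5),
        CauseRow("Clutch slip", "Worn clutch plate", 2, 25),
        CauseRow("Overheating", "Radiator leak", 3, 10),
    ]
    total = sum(r.failures for r in rows)
    for r in rows:
        p_i = r.failures / total
        print(f"{r.cause}: Pi = {p_i:.2f} ({probability_class(p_i)})")
    # Rank failure modes; the highest Ckr values mark the dominant failure modes.
    ranked = sorted(criticality(rows).items(), key=lambda kv: kv[1], reverse=True)
    for mode, value in ranked:
        print(f"{mode}: Ckr = {value:.2f}")
```

With these made-up counts, clutch slip ranks first (Ckr = 1.00) ahead of engine fails to start (0.90), even though its severity is lower, because it accounts for half of all recorded failures.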

Maintenance Task Identification6

The result of FMECA is the identification of the critical sub-systems and their failure modes. The critical failure modes are referred to as the dominant failure modes. These dominant failure modes are passed through the decision tree logic given in Figure 2 to identify a suitable maintenance task. The maintenance tasks in Figure 2 are explained below: –

  • Corrective maintenance: The equipment is allowed to run until it breaks down and is then repaired and put back into operation. The maintenance tasks are reactive to the breakdown, and the focus is on how quickly the equipment is returned to service.

  • Condition monitoring: This is part of condition-based maintenance, in which the equipment is maintained when condition-monitoring measurements indicate an incipient failure.

  • Scheduled replacement of life-timed components: This is part of the preventive maintenance tasks. It applies to items or components whose failure would either endanger personnel and/or equipment or reduce the operational availability of the equipment below the minimum acceptable level, and whose failure modes are not suitable for condition assessment.

  • Scheduled rework, adjustments, servicing: These also form part of the preventive maintenance tasks. They are performed periodically to ensure that the equipment remains in proper operating condition, and are always preceded by an inspection of the equipment to determine the extent of work required.

  • Evaluation in relation to risk: The following actions are considered when the failure is evident during normal operation but functional degradation is not detectable and the failure rate does not increase with age.

    • Default decision
    • Combined tasks
    • Corrective maintenance
    • Re-design
  • Scheduled functional verification by tests: This is normally carried out for hidden functions, i.e. functions not exercised during normal operation of the equipment, e.g. fire-fighting equipment. The sub-system or item performing the hidden function must be exercised periodically to verify its functional capability.

Figure 2: Decision Tree for Maintenance Task Assignment
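The following Python sketch encodes only the conditions stated in the task descriptions above; it is a simplified, hypothetical stand-in and not a transcription of the decision tree in Figure 2. The `Task` enumeration, the `select_task` function and its boolean inputs are assumptions. In particular, the choice between rework and replacement on the basis of `restorable_by_rework` is borrowed from common RCM practice rather than taken from the text, and the consequence screening that the text attaches to scheduled replacement is not modelled.

```python
from enum import Enum


class Task(Enum):
    FUNCTIONAL_VERIFICATION = "Scheduled functional verification by tests"
    CONDITION_MONITORING = "Condition monitoring"
    SCHEDULED_REPLACEMENT = "Scheduled replacement of life-timed components"
    SCHEDULED_REWORK = "Scheduled rework, adjustments, servicing"
    RISK_EVALUATION = "Evaluation in relation to risk"


def select_task(
    evident: bool,
    degradation_detectable: bool,
    rate_increases_with_age: bool,
    restorable_by_rework: bool,
) -> Task:
    """Pick a maintenance task for one dominant failure mode.

    Simplified reading of the task descriptions, NOT a copy of Figure 2.
    The rework-versus-replacement split is an assumption from RCM practice.
    """
    if not evident:
        # Hidden function: exercise it periodically to confirm it still works.
        return Task.FUNCTIONAL_VERIFICATION
    if degradation_detectable:
        # Measurements can reveal an incipient failure before it occurs.
        return Task.CONDITION_MONITORING
    if rate_increases_with_age:
        # Wear-out behaviour without a measurable warning: act on a schedule.
        return Task.SCHEDULED_REWORK if restorable_by_rework else Task.SCHEDULED_REPLACEMENT
    # Evident failure, no detectable degradation, no wear-out trend:
    # choose among default decision, combined tasks, corrective
    # maintenance or re-design.
    return Task.RISK_EVALUATION
```

For example, `select_task(True, False, False, False)` returns `Task.RISK_EVALUATION`, matching the case in which a failure is evident but shows neither detectable degradation nor a failure rate that increases with age.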

FMECA Centres

Quantitative FMECA of any equipment requires not only in-depth knowledge of the equipment but also a substantial amount of its failure data. Other factors that have a considerable effect on the failure modes of equipment are the terrain and climatic conditions prevalent in a particular area. Maintenance practices evolved from FMECA studies of failure data collected from the exploitation of equipment at high altitude and in extreme cold therefore cannot be applied to the same equipment when it is deployed in deserts or coastal areas. For this reason, the data collection centres for FMECA studies should be terrain specific. Since all the equipment experts required for FMECA analysis cannot be placed at data collection centres, an economical and viable solution would be to undertake such studies at the equipment training establishments as project work by students undergoing equipment courses. Field failure data of equipment can be made freely available to them for this purpose.

Conclusion

Army workshops used to maintain a “System Fault Analysis Register” for various equipment. It was referred to whenever a rare snag was noticed in the equipment and its exact cause proved difficult to diagnose. The register was excellent reference material that spared technicians the detailed diagnostics otherwise needed to rectify such snags, and it preserved the knowledge base of highly skilled and experienced craft technicians even after they were posted out of a particular workshop. Being more systematic, more elaborate and centred on MSI, FMECA worksheets of equipment should yield better maintenance schedules, help prevent rare snags and eliminate irrelevant maintenance, if any.

Although exhaustive and cumbersome, the Failure Mode, Effects and Criticality Analysis procedure will provide very good input for the officers commanding field workshops and EME Battalions to enhance the reliability of equipment. The exercise is therefore worth undertaking, as it may reduce maintenance costs that escalate because of downtime caused by poor reliability and because of the inability to exploit the inherent reliability of equipment. The aim is to significantly reduce the duration for which equipment is out of action and to improve its availability.

  • 1. U.D. Kumar, J. Crocker and J. Knezevic, “Maintenance free operating period – an alternative to MTBF and failure rate for specifying reliability,” Reliability Engineering and System Safety, 1999, pp. 127–31.
  • 2. W.D. Coit and A.K. Dey, “Analysis of grouped data from field failure reporting systems,” Reliability Engineering and System Safety, 1999, pp. 95–101.
  • 3. Ibid.
  • 4. Ibid.
  • 5. A. Sols and J.A. Nachlas, “Availability of multifunctional systems,” Reliability Engineering and System Safety, 1995, pp. 69–74; Failure Mode, Effect and Criticality Analysis, MIL-STD-1629A (U.S. Department of Defense, 1984).
  • 6. K.S. Wang, Y.T. Tsai and C.H. Lin, “A study of replacement policy for components in a mechanical system,” Reliability Engineering and System Safety, 1997, pp. 191–99.