Carefully following a proven process, team knowledge and management support are critical for success.
by Mark A. Latino, Reliability Center, Inc.

Root cause analysis is a core objective in many facilities, yet problems continue to hinder significant improvements. The difficulty could be the sheer number of problems that must be solved, root cause analyses that are not thorough enough to keep problems from recurring, or both. This article covers the wide range of views on what root cause analysis is and how it should be performed. It also discusses the systems required for root cause analysis to be a productive business process in an organization.

Problem Solving

Many methods of problem solving are available. They range from home-grown processes developed by the facility to vendor-provided methods. The methods may be easy to use or somewhat complex. They also vary in their ability to eliminate the recurrence of problems or equipment failures.

Some methods are called problem solving and some are called root cause analysis (RCA). These terms require definitions. Without them, the same method could be labeled troubleshooting, problem solving or RCA interchangeably, and inaccurately. The following definitions help clarify the methods:

  • Troubleshooting is a process of elimination (trial and error): eliminating the potential causes of a problem.
  • Problem solving is a systematic search for the source of a problem so that it can be solved.
  • RCA examines problems down to their latent root causes, which may include deficiencies in management systems and restraining cultural norms that allowed the failure to occur.

Troubleshooting is neither problem solving nor root cause analysis because it is not a systematic approach. Troubleshooting is a form of trial and error, and its ability to solve problems is solely dependent on the troubleshooter’s skill and experience with the problem.

Problem solving fits the blueprint for continuous improvement but leaves the depth or shallowness of the investigation up to the user. All too often, problem solving stops too early in the investigation process to eliminate a problem’s entire failure mechanism. In many cases, problem solving identifies the physical root causes of a problem, but it is not designed to uncover latent system issues.

RCA investigates a problem to a depth at which the physical, human and system deficiencies are exposed for resolution. Investigating to this depth is what eliminates a problem’s recurrence, and the corrections can be leveraged in other areas where the same system problems exist (see Figure 1).

Figure 1. Root cause analysis versus other problem solving methods

Most organizations do not compile a mission statement for the problem solving process because they believe the mission should be apparent. Organizations that do compile a mission usually wrap it into another program requirement, such as continuous improvement.

Continuous improvement is a term used often in problem investigation as a part of the mission or in some cases as the mission. As with many other terms, continuous improvement has many interpretations. When a problem is solved, it is often considered to have met the continuous improvement requirements. However, how often does the same problem repeat at some later date? If the problem recurs, does the solution still qualify as continuous improvement?

The answer depends on the definition of problem solving used by the organization. Many problem solving methods meet continuous improvement interpretations by simply returning operations to an uninterrupted work process and postponing the return of the problem to another time.

Should incremental repairs count as continuous improvement, or does labeling them this way encourage employees to reduce problems only slightly and settle for mediocrity?

Problem elimination is a more in-depth problem solving mission. The failure mechanism must be identified and eliminated so that there is little to no chance of recurrence. This also meets the continuous improvement requirements, but as a quantum improvement rather than an incremental one.

Both definitions of continuous improvement are important when problems are divided into a two-track approach for failure avoidance. A two-track approach is proactive because an opportunity analysis tool separates problems into two categories: significant few and random many.

The “significant few” problems are the 20 percent of issues that account for 80 percent of the losses. The “random many” problems are the remaining 80 percent of issues, which account for only 20 percent of the losses (see Figure 2).

Figure 2. Problem events
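
The split into significant few and random many can be sketched in a few lines of Python. This is only a minimal illustration of a Pareto-style ranking, not the opportunity analysis tool referred to above. The names `ProblemEvent` and `split_significant_few`, the 80 percent threshold parameter and all of the loss figures are assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class ProblemEvent:
    name: str           # hypothetical label for a recurring problem
    annual_loss: float  # hypothetical yearly loss, in any consistent currency


def split_significant_few(events, loss_share=0.80):
    """Split events into (significant_few, random_many).

    Events are ranked by loss, largest first, and added to the significant
    few until their combined loss reaches `loss_share` of the total loss.
    Everything after that point lands in the random many.
    """
    ranked = sorted(events, key=lambda e: e.annual_loss, reverse=True)
    target = loss_share * sum(e.annual_loss for e in ranked)

    significant_few, random_many = [], []
    running_loss = 0.0
    for event in ranked:
        if running_loss < target:
            significant_few.append(event)
            running_loss += event.annual_loss
        else:
            random_many.append(event)
    return significant_few, random_many


# Invented figures for illustration only; they do not come from the article.
events = [
    ProblemEvent("conveyor jam", 40),
    ProblemEvent("pump seal failure", 500),
    ProblemEvent("sensor drift", 15),
    ProblemEvent("gearbox overheating", 320),
    ProblemEvent("valve leak", 35),
    ProblemEvent("motor trip", 30),
    ProblemEvent("belt misalignment", 25),
    ProblemEvent("filter clog", 20),
    ProblemEvent("label misprint", 10),
    ProblemEvent("loose guard", 5),
]

significant_few, random_many = split_significant_few(events)
print("Solve for elimination (full RCA):", [e.name for e in significant_few])
print("Solve for incremental gains:", [e.name for e in random_many])
```

With these made-up numbers, two of the ten events carry most of the total loss and would be routed to an in-depth RCA, while the other eight would be handled on the incremental track. Real data rarely splits at exactly 20 and 80 percent, so the threshold is best treated as a tunable starting point.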

Significant issues should always be solved for elimination, and all other problems solved for incremental continuous improvement gains. What cannot be eliminated should be prevented and/or made fail-safe. Often, problem mechanisms can be eliminated, but only if operators take the time to conduct an in-depth RCA investigation.