
Root Cause Analysis

August 21, 2017

From a tester's perspective, one of the best examples of reverse engineering is the Root Cause Analysis (RCA) performed on customer issues. It comes into play after the product or application has gone live and real users run into issues and report them.

Root Cause Analysis is the process of finding out at which stage a bug was introduced, or why it was missed during the testing phase, once the application or product has been released to production. All issues that come up after the production launch go through Root Cause Analysis, and testers, test leads, and test managers are responsible for digging into them. These issues are usually called post-production issues.

Reasons for Post-Production Issues to occur

A post-production issue can occur for any of the reasons below:

  • Unclear, missing, or incorrect requirements – These should be caught during the requirement study phase so that they do not cause issues later. Brainstorming sessions help to avoid this kind of issue
  • Incorrect design or coding
  • Insufficient testing – This may be due to tight deadlines, frequent build updates, resource allocation problems, lack of domain knowledge, or a lack of skilled testers
  • Environment issues – Setup, server, or connectivity issues

Possible Types of Issues

When a post-production issue is raised, the first thing to analyze is its type: real bug, environment issue, data issue, out of scope, or testing error. The analysis varies by type, as described below:

⇒ Real Bug

  • Is this a known issue? If Yes, is the logged bug Open or Closed? If Open, provide a reference (i.e., BugID) to the bug that is already logged. If Closed, analyze why the issue has recurred.
  • If this is not a known issue (i.e., it is a new issue), check the following:
    • Why was it not found during the testing phase (Smoke / Regression / Adhoc / UAT)?
    • Is it caused by any particular settings of the product/application?
    • Which build is likely to have introduced the issue?
    • Is it a side effect of another bug fix? If Yes, go through the bug whose fix could have caused this issue and check which impacts were tested when that bug was fixed.
    • Why was this not covered in testing?
  • Check whether there is a test case that covers this issue. If Yes, check the execution result of that test case. If No, add a test case for the issue to improve coverage.
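
The Real Bug checks above amount to a small decision flow. Here is a minimal Python sketch of that flow, assuming a hypothetical `ProductionIssue` record; the field names, status values, and IDs are illustrative and are not taken from any particular bug tracking tool.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ProductionIssue:
    """Hypothetical record for a post-production issue (illustrative fields)."""
    title: str
    known_bug_id: Optional[str] = None        # ID of an already logged bug, if any
    known_bug_status: Optional[str] = None    # assumed values: "Open" or "Closed"
    covering_test_case: Optional[str] = None  # ID of a test case covering the issue


def real_bug_next_steps(issue: ProductionIssue) -> List[str]:
    """Return the follow-up actions suggested by the Real Bug checklist."""
    steps = []
    if issue.known_bug_id is not None:
        if issue.known_bug_status == "Open":
            steps.append(f"Reference existing bug {issue.known_bug_id}")
        else:
            steps.append(f"Analyze why closed bug {issue.known_bug_id} recurred")
    else:
        # New issue: look at builds, settings, and side effects of other fixes
        steps.append("Identify the build and settings that introduced the issue")
        steps.append("Check whether another bug fix caused it")
    if issue.covering_test_case is not None:
        steps.append(f"Review execution result of {issue.covering_test_case}")
    else:
        steps.append("Add a test case covering the issue")
    return steps
```

For example, calling `real_bug_next_steps(ProductionIssue("Login fails", known_bug_id="BUG-101", known_bug_status="Open"))` suggests referencing the existing bug and adding a test case, since no covering test case is recorded.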

⇒ Environment Issue

  • Was this issue found earlier in any testing or development environment? If Yes, provide the reference (i.e., EnvironmentID) of the issue that is already logged, along with the resolution that was applied.
  • If the issue was not seen earlier, check the following:
    • Was it caused by server maintenance?
    • What were the application's requests and responses under the server condition that caused the issue? (Any tool that captures requests and responses can be used here, for example Fiddler or Charles.)
  • Analyze whether a server restart is required
  • Analyze whether the issue is due to a delay in the server response
  • Analyze whether any other server component is causing the issue

⇒ Data Issue

  • Which particular data set caused the issue?
  • Is it specific to the customer's data only?
  • Check all settings, data types, and data logic that the customer is using
  • Can we create the same data set and reproduce the issue in the testing environment?
  • Why was this kind of data not tested during the testing phase?

⇒ Out of Scope

  • Is the issue out of scope for the testing team? If Yes, identify the team responsible for handling this kind of issue and hand it over to them. (Typical examples are the contents of help pages, the customer support process such as live chat and email handling, the meaning of localized text, Terms & Conditions content, and End-User License Agreement content.)

⇒ Testing Error

  • Were the steps used to identify the issue incorrect? If so, document the correct steps to perform the action
  • Was the data used to trigger the issue incorrect?
  • The issue is not a bug. The feature works as designed and expected, but the user was not aware of the behavior

How does the Root Cause Analysis happen?

If the post-production issues are valid and were missed during the testing phase, Root Cause Analysis can be carried out in many ways, since different projects follow different processes. A common process goes through the steps below, one after the other:

The Root Cause Analysis meeting is initiated from the customer side, which provides the list of post-production issues that require RCA along with the date and time of the meeting.

The testing team then goes through the list of issues and performs Root Cause Analysis on each of them as follows:

  • Go through the production launch smoke result for the feature the issue belongs to
  • Go through the Pre-Prod smoke result (usually the UAT / Staging environment)
  • Go through the testing environment smoke result
  • If the issue is not part of smoke test validation, check the following:
    • From the regression testing phase – In the latest regression test cycle, find the test case that covers the issue and look at its execution result. If the test case Passed, check the previous regression test cycles. If the test case Failed, check the status of the bug that is logged against or linked to it and analyze:
      • whether the bug was not retested properly
      • whether the impacts of the fix were not analyzed
      • whether the bug was reintroduced in a later build
    • From the ad-hoc testing phase – Check all the ad-hoc issues that were logged for the feature.
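
The regression-cycle lookup described above can be sketched in Python. This is only an illustration under stated assumptions: the `Cycle` structure, the result strings "Passed" and "Failed", and the IDs in the usage note are hypothetical, and a real project would read this data from its test management tool.

```python
from typing import Dict, List, Optional, Tuple

# Assumed shape: each cycle maps a test case ID to (result, linked bug ID or None).
Cycle = Dict[str, Tuple[str, Optional[str]]]


def trace_test_case(cycles: List[Cycle], test_case_id: str) -> Optional[Tuple[int, str]]:
    """Walk the cycles from latest to oldest and return (cycle index, bug ID)
    for the most recent failure that has a linked bug, or None if there is none."""
    for index in range(len(cycles) - 1, -1, -1):
        entry = cycles[index].get(test_case_id)
        if entry is None:
            continue  # test case was not executed in this cycle
        result, bug_id = entry
        if result == "Failed" and bug_id is not None:
            return index, bug_id
        # Passed (or failed without a linked bug): keep looking in earlier cycles
    return None
```

With `cycles` ordered oldest first, a test case that failed in the first cycle with bug "BUG-88" and passed afterwards yields `(0, "BUG-88")`, which points the analysis at how that bug was retested and whether it was reintroduced.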

If the issue was missed during testing, check:

  • Why there was no test case covering this issue
  • Why the test case review did not reveal this issue
  • Why the impacts of existing bugs that might have caused this new issue were not analyzed
  • Why ad-hoc testing did not reveal this issue
  • Why the requirement study did not reveal this issue

Prepare action items

  • Add a test case to cover the issue, with a reference to it
  • If the issue can be part of smoke validation, include a verification point for it in the smoke test for the feature
  • Update the documentation wherever needed (traceability matrix, user documents, guidelines, etc.)

Document the analysis of each issue in sufficient detail and share it with all the stakeholders who will attend the meeting.

In the meeting, the issues are discussed on the basis of the Root Cause Analysis provided, and the action items identified by the testing team are finalized.

The stakeholders then decide whether to take up each issue for a fix, mark it as a change request, or defer it from the current release. This is known as a bug triage meeting.

Once the decision is made, the issue is logged as a bug in the bug tracking tool, and the maintenance team is responsible for further testing on it.

The entire Root Cause Analysis process should be completed as quickly as possible and should not take more than a week. The final decision taken here affects the next release's schedule, estimates, and budget.

Root Cause Analysis Tools

Use these tools when you want to conduct root cause analysis for a problem or situation:

  • Fishbone (Ishikawa) diagram: identifies many possible causes for an effect or problem and sorts ideas into useful categories.
  • Pareto chart: shows on a bar graph which factors are the most significant.
  • Scatter diagram: graphs pairs of numerical data, with one variable on each axis, to help you look for a relationship.
  • 5 Whys: repeatedly asking "why" is one of the easiest ways to work back to the root cause of a problem, and it also helps reveal the relationship between the different root causes of a problem.
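
As an illustration of the Pareto chart, the data behind it can be computed in a few lines of Python. This is a rough sketch: the function name `pareto_table` and the sample root-cause labels in the usage note are made up for the example.

```python
from collections import Counter
from typing import List, Tuple


def pareto_table(root_causes: List[str]) -> List[Tuple[str, int, float]]:
    """Return (cause, count, cumulative percentage) rows, most frequent first."""
    counts = Counter(root_causes)
    total = sum(counts.values())
    rows = []
    running = 0
    for cause, count in counts.most_common():
        running += count
        # Cumulative share of all issues explained by the causes seen so far
        rows.append((cause, count, round(100.0 * running / total, 1)))
    return rows
```

Given five issues tagged "Insufficient testing", three tagged "Unclear requirements", and two tagged "Environment issue", the cumulative percentages come out as 50.0, 80.0, and 100.0, which shows that the first two causes account for most of the issues.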

Root Cause Analysis in Post-harvest meeting

Root Cause Analysis is also one of the important agenda items in the Post-harvest meeting held after the production launch of a release. In that meeting, what went well, what went badly, what could have been better, the test reports, and the Root Cause Analysis are presented to the management team.

Conclusion

Root Cause Analysis feeds into the team's Defect Removal Efficiency (DRE), which is the percentage of all defects that were found and removed before release, as opposed to those that leaked to production because of gaps in testing. Customers typically expect DRE to be above 95% when deciding whether to continue the project.

Root Cause Analysis also helps identify which phase of the Software Testing Life Cycle has the problem so that it can be corrected in future releases. It puts the emphasis on improving the process followed in the project.
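
Defect Removal Efficiency is commonly calculated as the defects found before release divided by all defects found before and after release, expressed as a percentage. The Python sketch below shows that calculation; the function name and the numbers in the usage note are illustrative only.

```python
def defect_removal_efficiency(found_before_release: int, found_after_release: int) -> float:
    """Return DRE as a percentage of all defects that were caught before release."""
    total = found_before_release + found_after_release
    if total == 0:
        raise ValueError("no defects recorded")
    return 100.0 * found_before_release / total
```

For instance, if testing found 190 defects and customers later reported 10, the DRE is 95.0 and the defect leakage to production is the remaining 5%.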

About the author

Nandini KS editor

Hello, this is Nandini.
