How to interpret A/B test results

Learn about the terminology and methodology behind Nosto A/B tests and how to interpret test results

Written by Lari Lehtonen
Updated over a month ago

Introduction

Many factors influence the results of any A/B test, from the intrinsic performance of each variation in a test to the stability of those variations over time. This can make it challenging to interpret the results of a test and to decide which variation should remain live at the end of it. At Nosto, we have built a set of functionality that makes it easy to understand what state a test is in at any given time, how its variations are performing, and why they are performing as they are.

In this article, we will cover how to interpret the results of Nosto A/B tests using the reporting view, based on three typical outcomes: positive, negative and neutral, along with other related details:

  1. Key Principles

  2. Dashboard Terminology (quick glossary)

  3. Positive improvement

  4. Negative impact

  5. Neutrality

  6. Conclusion

Key Principles

In any Nosto test, the performance of each variation is compared to the performance of a base variation, which serves as the control or baseline. It's important to remember that any variation can be chosen as the base variation at setup time using the Nosto testing wizard.

For the sake of simplicity and consistency, we will illustrate how to interpret results focusing on Conversion Rate optimized tests where two variations are tested against each other. However, the same analysis methodology applies to tests where more than two variations are tested against each other, and to tests with other optimization goals.

Dashboard Terminology

  • Visits: Overall size of the audience who were targeted with the variation

  • Converted visits: Overall size of the audience that converted into buyers after being targeted with the variation

  • Conversion rate: Conversion rate percentage of the audience who were targeted with the variation

  • Improvement: Expected improvement, presented as a boxplot revealing the lower bound, the upper bound and the expected median improvement

  • Winner probability: Calculated probability that the variation wins the experiment. When the probability of winning is higher than 95% and the regret is lower than 1%, the variation is considered the winner (see the sketch after this glossary).

  • Regret: Expressed as a percentage. The upper limit (95th percentile) of the potential loss if the variation were chosen over the other variations.

  • Current split: Traffic allocation per variation

  • AOV: Average order value of the audience who were targeted with the variation

  • AVV: Average visit value, also known as revenue per visit, of the audience who were targeted with the variation

  • Bounce rate: Share of the audience who were targeted with the variation and exited immediately (bounced)

  • CTR: Click-through rate of the campaign among the audience who were targeted with the variation. Note: If a test optimizes for average order value, average visit value or conversion rate as its objective, clicks, and consequently CTR, are assessed anywhere on the page. If a test uses CTR as its objective, Nosto assesses only clicks on campaigns.

  • Sales: Overall sales generated by the audience who were targeted with the variation

  • Shows: Total number of times the variation has been shown to the target audience

  • Clicks: Total number of times the variation has been clicked by the target audience (see the CTR note above)
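
To make these definitions concrete, here is a minimal sketch of how winner probability and the improvement range could be estimated for a two-variation, conversion-optimized test. It assumes a common Bayesian approach (a Beta posterior over each conversion rate and Monte Carlo sampling) with hypothetical visit and conversion counts; it illustrates the concepts above, not Nosto's actual implementation.

    import numpy as np

    rng = np.random.default_rng(42)
    N = 100_000  # number of posterior draws

    def posterior_samples(visits, converted):
        # Beta posterior over the true conversion rate, with a uniform prior.
        return rng.beta(1 + converted, 1 + visits - converted, size=N)

    # Hypothetical counts, for illustration only.
    cr_a = posterior_samples(visits=20_000, converted=1_000)  # Variation A (base)
    cr_b = posterior_samples(visits=20_000, converted=1_080)  # Variation B

    # Winner probability: the share of posterior draws where B beats A.
    winner_prob_b = (cr_b > cr_a).mean()

    # Relative improvement of B over A, summarized like the boxplot.
    improvement = (cr_b - cr_a) / cr_a * 100
    lower, median, upper = np.percentile(improvement, [2.5, 50, 97.5])

    print(f"P(B wins) = {winner_prob_b:.2%}")
    print(f"Improvement: {lower:+.2f}% / {median:+.2f}% (median) / {upper:+.2f}%")

These are the three figures the boxplot surfaces: the lower bound, the expected median and the upper bound. The choice of the 2.5th and 97.5th percentiles for the bounds here is an assumption.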

Positive improvement

First, let's explore the results of a significant test where a variation is winning over the base variation and yields a positive improvement on Conversion Rate, the optimization goal selected in this particular use case.

In this test, there is a 98.63% probability that Variation B is winning over Variation A (the base variation) on Conversion Rate. Selecting Variation B can yield an improvement over the base variation of between +0.49% and +7.85%, as indicated by the boxplot.

If you hover over the boxplot, you can explore the full range of expected improvements, looking at the lower bound, the upper bound and the expected median improvement:

In this example, it's safe to say that the expected improvement is around +4.08%, with an equal probability that it falls between +0.49% and +4.08% (the most pessimistic scenario) or between +4.08% and +7.85% (the most optimistic scenario). In either case, the improvement over Variation A (the base variation) would still be positive, which is a very positive signal for choosing this variation!

Another way to interpret the results: if the same test were run 10,000 times under the exact same conditions, Variation B would win over Variation A about 9,863 times, with an improvement in Conversion Rate that would most likely fall between +0.49% and +7.85%.

If, for any given reason, Variation A were selected, the potential loss of relative Conversion Rate would be 7.23%, which can be interpreted as the cost of making the wrong decision. In any test, the lower the regret (the loss of conversion opportunity) for a variation, the better. However, regret can also go above 100%, depending on the performance of each variation in a test.

Note: An alternative interpretation of regret is: how much better could another variation perform, in the worst case, based on the performance observed in a given test. A sketch of how such a figure could be computed follows.
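
Continuing the sketch above, one plausible reading of the regret figure is the 95th percentile of the relative loss incurred by picking a variation, mirroring the glossary definition; the exact formula Nosto uses may differ.

    def regret(chosen, others):
        # Relative loss versus the best competing variation in each draw,
        # floored at zero (no loss in draws where the chosen variation is best).
        best_other = np.maximum.reduce(others)
        loss = np.maximum(best_other - chosen, 0) / chosen * 100
        # Report the 95th percentile: the upper limit of the potential loss.
        return np.percentile(loss, 95)

    print(f"Regret(A) = {regret(cr_a, [cr_b]):.2f}%")
    print(f"Regret(B) = {regret(cr_b, [cr_a]):.2f}%")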


Negative impact

Let's now take a closer look at the results of a significant test where Variation B doesn't yield a positive improvement but rather negatively impacts the Conversion Rate compared to Variation A (the base variation).

In this test, there is a 100% probability that Variation A (the base variation) is winning over Variation B on Conversion Rate. With Variation B, a negative impact on the Conversion Rate of the base variation of between -3.64% and -2.21% can be expected.

If you hover over the boxplot, you can explore the full range of improvements, looking at the lower bound, the upper bound and the expected median improvement:

In this example, it's safe to say that the expected improvement (or rather, negative impact in this case!) is around -2.94%, with an equal probability that it falls between -3.64% and -2.94% (the most pessimistic scenario) or between -2.94% and -2.21% (the most optimistic scenario). In either case, the improvement over Variation A (the base variation) would not be positive, which is a very strong signal for not choosing this variation and enabling the base variation instead.

If, for any given reason, Variation B were selected, the potential loss of relative Conversion Rate would be 3.65%, which can be interpreted as the cost of making the wrong decision. In any test, the lower the regret (the loss of conversion opportunity) for a variation, the better.

Neutrality

Finally, the third scenario is focused on a non-significant test. In the test below, none of the variations has a high enough probability of winning, in other words, above 95%.

In this test, there is only an 81.99% probability that Variation B is winning over Variation A (the base variation). With Variation B, an improvement on the Conversion Rate of the base variation of between -0.24% and +0.66% can be expected. In other words, the improvement can be either positive or negative.

If you hover over the boxplot, you can explore the full range of improvements, looking at the lower bound, the upper bound and the expected median improvement.

In this example, it's safe to say that the expected improvement is around +0.21%, with an equal probability that it falls between -0.24% and +0.21% (the most pessimistic scenario) or between +0.21% and +0.66% (the most optimistic scenario).

It can be challenging to know whether choosing Variation B is the right call. However, in this case, if it were selected, the potential loss of relative Conversion Rate would be only 0.17%. It's also important to notice that Variation A has a regret rate below 1% as well, at exactly 0.58%. If Variation A were selected, the potential loss would also be very minimal. The winner rule from the glossary can be applied directly here, as sketched below.
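
For completeness, here is a sketch of the winner rule described in the glossary, applied to this neutral test. The thresholds (95% winner probability, 1% regret) come from the article; treating Variation A's winner probability as the complement of Variation B's is an assumption that only holds for two-variation tests.

    def is_winner(winner_probability, regret_pct):
        # A variation wins only when both glossary conditions hold:
        # winner probability above 95% and regret below 1%.
        return winner_probability > 0.95 and regret_pct < 1.0

    print(is_winner(0.8199, 0.17))      # Variation B -> False (probability too low)
    print(is_winner(1 - 0.8199, 0.58))  # Variation A -> False (probability too low)

Neither variation qualifies, which is exactly why this test is non-significant: both regrets are below 1%, but no variation clears the 95% probability bar.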

You can read more about the next steps when a test is not significant in this article.

Conclusion

In general, relying purely on data in the world of A/B testing is a safe bet, but it only tells half of the story, as it never explains why variations are winning or losing. Beyond the intrinsic results, there can be a myriad of reasons why customers behave in a certain way and shop more when exposed to a given variation.

For example, would one still confidently declare a variation a loser knowing that a majority of the products users ended up viewing were temporarily discontinued because of a deficiency in the buying process or some inventory issue?

It's always recommended to analyze further and really get to the bottom of the results by backing up number-based test analysis with merchandising-level insights. Nosto's layer of Merchandizing Insights helps unveil and identify these opportunities. You can learn more about Merchandizing Insights here.

If you want to learn more about how the attribution works for any Nosto A/B test, you can find a dedicated article here.
