Test states
Written by Lari Lehtonen
Updated over a week ago

With Nosto's A/B Testing & Optimization, retailers can use different test types and methodologies to optimize their personalization strategies. A core feature, Continuous Optimization, automatically drives site traffic to the highest-performing variations of a test, which limits exposure to poorly performing variations and keeps optimizing over time. Although this process is fully automated and doesn't require day-to-day monitoring, every test still passes through several lifecycle stages. From data exploration through data exploitation to results, each test moves through a series of states.
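
This article doesn't say which algorithm Continuous Optimization uses. Purely as an illustration, the Python sketch below shows Thompson sampling, one common technique for automatically shifting traffic toward better-performing variations. The Beta posterior, the function names (`thompson_pick`, `simulate_allocation`), and the conversion counts are all assumptions, not Nosto's implementation.

```python
import random


def thompson_pick(stats, rng):
    """Choose the variation to show the next visitor (illustrative sketch).

    Assumption: each variation's conversion rate has a Beta(1 + conversions,
    1 + non-conversions) posterior (uniform prior). One value is drawn from
    each posterior, and the variation with the highest draw is shown.
    """
    best_name, best_draw = None, -1.0
    for name, (conversions, visitors) in stats.items():
        draw = rng.betavariate(1 + conversions, 1 + visitors - conversions)
        if draw > best_draw:
            best_name, best_draw = name, draw
    return best_name


def simulate_allocation(stats, n_visitors, seed=0):
    """Count how many of the next n_visitors each variation would receive,
    holding the observed stats fixed (a live system would keep updating them)."""
    rng = random.Random(seed)
    counts = {name: 0 for name in stats}
    for _ in range(n_visitors):
        counts[thompson_pick(stats, rng)] += 1
    return counts


# Hypothetical (conversions, visitors) per variation; illustrative only.
stats = {"A": (120, 4000), "B": (150, 4000), "C": (118, 4000)}
print(simulate_allocation(stats, 10_000))
```

Because variations are chosen in proportion to how likely they are to be best, a clearly leading variation ends up with most of the traffic, while weaker ones still receive a small share in case the picture changes.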

In this article, you'll learn about the different states achieved by Nosto tests as well as how to interpret them.

1. Data Gathering

The data gathering state lasts for the first 14 days after a test is activated. During this period, you can already explore the performance of any variation in the test using the variations table. However, we recommend waiting at least 14 days before analyzing a test's results, so that the data covers at least two full weekly cycles.


2. Test is significant

After 14 days, the data gathering phase is complete, and a test can be conclusive and significant, with one variation identified as a clear winner.

Nosto considers a test significant when one variation has a high probability (above 95%) of winning against all other variations and a regret below 1%. In that case, you can safely end the test, or wait for another weekly cycle to complete to confirm that the trend remains stable over time.
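
The article doesn't define how the win probability and regret are computed. The Python sketch below shows one common Bayesian approach, assuming a Beta posterior on each variation's conversion rate and treating "regret" as the expected shortfall versus the best variation, relative to the best variation's conversion rate. Both assumptions, the function names, and the data are hypothetical; only the 95% and 1% thresholds come from the article.

```python
import random


def win_probability_and_regret(stats, n_samples=20_000, seed=0):
    """Monte Carlo estimate, per variation, of (probability of being best,
    relative regret).

    Assumptions (not from Nosto): each variation's conversion rate has a
    Beta(1 + conversions, 1 + non-conversions) posterior, and regret is the
    expected shortfall versus the best variation in each draw, divided by
    the expected conversion rate of the best variation.
    """
    rng = random.Random(seed)
    wins = {name: 0 for name in stats}
    shortfall = {name: 0.0 for name in stats}
    best_total = 0.0
    for _ in range(n_samples):
        draws = {
            name: rng.betavariate(1 + c, 1 + v - c)
            for name, (c, v) in stats.items()
        }
        best = max(draws, key=draws.get)
        wins[best] += 1
        best_total += draws[best]
        for name, draw in draws.items():
            shortfall[name] += draws[best] - draw
    mean_best = best_total / n_samples
    return {
        name: (wins[name] / n_samples, shortfall[name] / n_samples / mean_best)
        for name in stats
    }


def is_significant(results, p_win_min=0.95, regret_max=0.01):
    """True if a single variation clears both thresholds from the article."""
    return any(p > p_win_min and r < regret_max for p, r in results.values())


# Hypothetical (conversions, visitors) per variation; illustrative only.
results = win_probability_and_regret({"A": (120, 4000), "B": (170, 4000)})
print(results, is_significant(results))
```

Note that both conditions must hold for the same variation: a high win probability alone is not enough if the expected cost of being wrong (the regret) is still above 1%.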


3. Test is not significant 

After 14 days, the data gathering phase is complete, and a test can be conclusive without being significant: one or more variations have a regret below 1%, but no variation is winning with a probability above 95%. In this case, the test is marked as not significant.

To learn more about what to do when a test is not significant, check out this article.


4. Test is not yet conclusive

If, after 14 days, it is still too early to analyze the results, the test is marked as not yet conclusive. No variation has a regret below 1% yet, but the estimated time to reach that threshold is less than 60 days.

5. There is not enough traffic 

Based on the trend observed during the first 14 days of the test, Nosto estimates how long the test will need to reach a conclusive state. If that estimate is more than 60 days, the test is flagged as not having enough traffic to reach a result at a reasonable pace.

6. Equivalently Performing Variations

After 14 days, the data collected so far has not produced a clear leader among the tested variations. Two or more variations are estimated to perform equivalently, and this is unlikely to change. We recommend ending the test and, for example, testing other variations or changing the optimization goal.

7. Indeterminate

The traffic in the test is too unpredictable to provide a conclusive and reliable performance assessment. Running the test longer to collect more data may lead to a more definitive result. For example, a test can become indeterminate when traffic is very limited and visitor behavior is erratic, without any clear pattern; this typically happens when the audience exposed to the test is small. Keeping the test active for a few more days might resolve the state. If it doesn't, we recommend reviewing your test configuration and starting a new test with different parameters.
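
The numeric rules behind states 1 to 5 can be summarized as a small decision function. The Python sketch below is illustrative only: `classify`, `State`, and the `est_days_to_conclusive` input are hypothetical names, and how Nosto projects the time to a conclusive state is not described in this article. States 6 and 7 depend on judgments the article doesn't quantify, so they are left out.

```python
from enum import Enum


class State(Enum):
    DATA_GATHERING = "Data Gathering"
    SIGNIFICANT = "Test is significant"
    NOT_SIGNIFICANT = "Test is not significant"
    NOT_YET_CONCLUSIVE = "Test is not yet conclusive"
    NOT_ENOUGH_TRAFFIC = "There is not enough traffic"


def classify(days_active, results, est_days_to_conclusive):
    """Map a test to one of states 1-5 (illustrative sketch).

    results: variation -> (probability of winning, regret as a fraction).
    est_days_to_conclusive: hypothetical projected number of days until some
    variation's regret drops below 1%; how Nosto projects this is not stated.
    """
    if days_active < 14:
        return State.DATA_GATHERING
    # Win probabilities of variations that already have regret below 1%.
    low_regret_p = [p for p, regret in results.values() if regret < 0.01]
    if low_regret_p:
        if any(p > 0.95 for p in low_regret_p):
            return State.SIGNIFICANT
        return State.NOT_SIGNIFICANT
    # No variation is below 1% regret yet: keep waiting, or flag low traffic.
    if est_days_to_conclusive > 60:
        return State.NOT_ENOUGH_TRAFFIC
    return State.NOT_YET_CONCLUSIVE


print(classify(21, {"A": (0.97, 0.002), "B": (0.03, 0.04)}, 0))
```

The article doesn't say which state applies at exactly 60 days; this sketch treats 60 days as not yet conclusive.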
