A / B TESTING

Jan 5, 2021

Let’s start with the history of A/B testing. It can be traced back to 1996, but the first tools appeared in 2003. Before then, cookies were not used: experts dug through website log files to understand real behavior, made a few changes, restarted the machine, and checked whether anything had really changed. Split testing was truly democratized around 2010, when VWO and Optimizely became popular.

The value of A/B testing is summed up by this quote from Jeff Bezos: “Our success at Amazon is a function of how many experiments we do per year, per month, per week, per day.” The point is that running enough experiments is key to optimization success.

Key scenarios where split testing is valuable:

1. Research: looking for impact, not a winner

Leave out elements on the web page and identify which removals show a positive, negative, or neutral signal, and therefore which elements are important.
Fly-ins: research whether social proof notifications (“x people bought in the last 24 hours”) have an impact.

2. Optimize: similar to a deployment done by marketing (usually client-side) that will hopefully be implemented permanently later. In this situation, we are only looking for wins.

3. Deploy: instead of pushing a new feature or other change live directly, we shift traffic to the change and check whether it has a positive or neutral impact. In both cases, you should go live with the change, because the goal is to confirm it does no harm, not to find a winner.

A/B Mastery course from the CXL Institute Growth Marketing Minidegree

Planning A/B tests

The planning phase starts with one question: Do you have enough data to conduct A/B tests?

ROAR model (Risk, Optimization, Automation, Re-think): if you have fewer than 1,000 conversions per month (transactions, leads, clicks), it will be hard to identify a winner. If the conversions are not there, you can still run the A/B test as research, as mentioned in a previous chapter of the Minidegree.
Statistical power: the probability that an experiment detects an effect when there really is one. It depends on sample size, effect size, and significance level.
Use calculators to determine whether you have enough data.
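To see what such a calculator does under the hood, here is a minimal Python sketch of the common normal-approximation sample size formula for comparing two conversion rates with a two-sided test. The function name and the defaults (alpha = 0.05, power = 0.80) are illustrative assumptions, and dedicated calculators may use slightly different formulas, so treat the result as a ballpark figure.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_variant(baseline_rate: float, relative_uplift: float,
                            alpha: float = 0.05, power: float = 0.80) -> int:
    """Visitors needed in EACH variant for a two-sided two-proportion z-test.

    baseline_rate   -- conversion rate of the control, e.g. 0.03 for 3%
    relative_uplift -- smallest relative effect worth detecting, e.g. 0.10 for +10%
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_uplift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value, two-sided
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    p_bar = (p1 + p2) / 2                          # average rate under the null
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p2 - p1) ** 2)
```

For a 3% baseline conversion rate and a +10% relative uplift, this gives roughly 53,000 visitors per variant. With 1,000 conversions per month at that rate (about 33,000 visitors, or about 16,700 per variant in a 50/50 split), collecting that sample would take more than three months, which illustrates why low conversion volumes make finding a winner hard.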

The second phase of planning A/B tests is deciding on the KPI.

Most of the time, rely on surveys and cohort analysis before expert opinions. The 6V research model can be used to generate insights into user behavior:

Value (CRO specialist, Data): What company values are important and relevant? What focus delivers the most business impact in the short and long term?
Versus (CRO specialist): What competitor analysis and market best practices can be found?
View (Data & Psy): What insights can be found from web analytics and web behavior data?
Validated: What insights are validated in previous experiments or analyses?
Verified: What scientific research, insights, and models are available?
Voice (Psy & UX): What insights can be taken from the voice of customer data such as surveys, feedback, and service contact?

The next step in the planning pillar is formulating a hypothesis. To get everyone aligned, describe the problem, propose a solution, and predict the outcome. This usually saves time in later discussions.

The hierarchy of metrics discussed in the course, from the least to the most important KPI to measure, is as follows:

Clicks: easy to move, since you can get a significant uplift just by asking people to click something
Behavior: shifting behavior is something you can optimize
Transactions: a really good driver to optimize
Revenue per user: a better goal to optimize than transactions. It can be achieved by lowering the cost of the product, but it is hard to run an A/B test on
Potential lifetime value: the golden metric

Some guidelines for choosing the right metric:

The KPI needs to be binary: each visitor has a definitive outcome, either converted or not.
The sampling distribution of the metric used for A/B testing should be approximately normal; for a conversion rate this holds once the sample is large enough.
The KPI therefore can’t be average order value, average satisfaction, number of pageviews, etc., because these are not binary: they are averages of values over a certain period.
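Why binary metrics suit this: each visitor’s outcome is 0 or 1, so the conversion rate is a sample proportion. For a large enough sample its sampling distribution is approximately normal (central limit theorem), with a standard error that depends only on the rate and the sample size. Here x is the number of converting visitors and n the number of unique visitors; the second line is the standard pooled z-statistic for comparing variants A and B.

```latex
\hat{p} = \frac{x}{n},
\qquad
\mathrm{SE}(\hat{p}) = \sqrt{\frac{\hat{p}\,(1 - \hat{p})}{n}}

z = \frac{\hat{p}_B - \hat{p}_A}
         {\sqrt{\bar{p}\,(1 - \bar{p})\left(\frac{1}{n_A} + \frac{1}{n_B}\right)}},
\qquad
\bar{p} = \frac{x_A + x_B}{n_A + n_B}
```

An average such as order value has no comparable formula based on its mean alone: its variance has to be estimated from the data, and skewed revenue data needs larger samples before the normal approximation holds.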

The final step is to prioritize A/B tests. On top of the frameworks I already mentioned in a previous review, I learned about determining power (when using the PIPE framework), about the importance of counting unique visitors, and that those visitors must have seen the test page before they converted.
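A minimal sketch of that counting rule, assuming you can export two event streams (test page views and conversions), each with a visitor ID and a timestamp; the function and argument names are hypothetical. Every unique visitor who saw the test page becomes a single 0/1 outcome, and a conversion only counts if it happened at or after that visitor’s first exposure.

```python
def binary_conversions(exposures, conversions):
    """Return {visitor_id: 0 or 1} for every unique visitor who saw the test page.

    exposures   -- iterable of (visitor_id, timestamp) for test page views
    conversions -- iterable of (visitor_id, timestamp) for conversions
    Timestamps can be anything comparable (datetime objects, epoch seconds).
    """
    # First time each visitor saw the test page.
    first_seen = {}
    for visitor_id, ts in exposures:
        if visitor_id not in first_seen or ts < first_seen[visitor_id]:
            first_seen[visitor_id] = ts

    # One binary outcome per unique exposed visitor, defaulting to "not converted".
    outcome = dict.fromkeys(first_seen, 0)
    for visitor_id, ts in conversions:
        # Ignore visitors never exposed and conversions before the first exposure.
        if visitor_id in first_seen and ts >= first_seen[visitor_id]:
            outcome[visitor_id] = 1  # repeated conversions still count once
    return outcome


def conversion_rate(outcome):
    """Share of unique exposed visitors who converted."""
    return sum(outcome.values()) / len(outcome)
```

A visitor who converted before ever reaching the test page stays at 0, and a converter who never saw the test page does not enter the data at all.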

Split test execution

Designing, developing, and QA-ing your A/B test is the core of executing split tests. Each step looks different depending on the size of the team, but as far as I understand, the key is to just do it and keep an eye on the data streams.

I was happy to see that the chapter on configuring the A/B testing tool uses Google Optimize. Here is what you need to configure to run a proper A/B test:

Create a variant called Default (identical to the Original) and one called Challenger, so that both groups get the same kind of experience, since we don’t control how the tool serves the Original.
Add the JS script and send a Google Analytics (GA) event with the respective variant information so it can be tracked.
Run the experiment with 100% of the traffic on the Original for two weeks.
Next, set the Original to 0% of the traffic and split it 50/50 between Default and Challenger. The two weeks on the Original act as a pre-test selection: the tool keeps visitors in the variant they were first assigned, so visitors from that period stay in the Original and only visitors new to the experiment are compared in Default vs. Challenger. When the test is finished and you have enough data, move 100% of the traffic back to the Original (the post-test).
This is a better solution than filtering on cookies, i.e. including only visitors without cookies (new visitors) or with a cookie value later than the test start time.
Finally, analyze the results in the analytics solution already implemented in the company, not in the tool’s own analytics.
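The pre-test selection can be illustrated with a small simulation. It assumes, as the procedure relies on, that the tool keeps a returning visitor in whatever variant they were first assigned (modelled here by a dict standing in for the cookie). The day boundaries, the function names, and the hash-based coin flip are hypothetical stand-ins, not how Google Optimize is implemented internally.

```python
import hashlib

PRE_TEST_DAYS = 14  # hypothetical: two weeks with 100% of traffic on the Original
TEST_END_DAY = 42   # hypothetical: day on which traffic moves back to the Original


def coin_flip(visitor_id: str) -> str:
    """Hypothetical stand-in for the tool's own 50/50 randomization."""
    first_byte = hashlib.sha256(visitor_id.encode("utf-8")).digest()[0]
    return "Default" if first_byte % 2 == 0 else "Challenger"


def assign(visitor_id: str, day: int, assignments: dict) -> str:
    """Return the variant a visitor sees on `day` (0 = experiment launch).

    `assignments` plays the role of the cookie: once a visitor has a variant,
    they keep it, whatever the current traffic weights are.
    """
    if visitor_id in assignments:        # returning visitor: sticky bucket
        return assignments[visitor_id]
    if day < PRE_TEST_DAYS or day >= TEST_END_DAY:
        variant = "Original"             # pre-test and post-test: 100% Original
    else:
        variant = coin_flip(visitor_id)  # test phase: 0% Original, 50/50 split
    assignments[visitor_id] = variant
    return variant


def analysis_population(assignments: dict) -> dict:
    """Only Default and Challenger visitors enter the comparison."""
    return {vid: var for vid, var in assignments.items() if var != "Original"}
```

A visitor first seen on day 3 stays in the Original even when returning on day 20, so they never enter the Default vs. Challenger comparison, while a visitor first seen on day 20 lands in Default or Challenger and keeps that variant after the post-test switch.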

As the final step for executing A/B tests, monitoring is important, and I wonder whether having live chat on the experiment pages would help with that.
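One check that fits this monitoring step is a sample ratio mismatch (SRM) test: if the tool is configured for a 50/50 split but the visitor counts per variant drift apart more than chance allows, something in the setup (tracking, redirects, bucketing) is likely broken. Below is a minimal sketch of a chi-square goodness-of-fit test with one degree of freedom, using only the standard library; the function name and the threshold mentioned in the docstring are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist


def srm_p_value(visitors_a: int, visitors_b: int, expected_share_a: float = 0.5) -> float:
    """Chi-square goodness-of-fit p-value for an observed two-variant traffic split.

    A very small p-value (many teams use a strict threshold such as 0.001)
    suggests the observed split differs from the configured one.
    """
    total = visitors_a + visitors_b
    expected_a = total * expected_share_a
    expected_b = total - expected_a
    chi2 = ((visitors_a - expected_a) ** 2 / expected_a
            + (visitors_b - expected_b) ** 2 / expected_b)
    # With 1 degree of freedom, chi2 = z**2, so P(X > chi2) = 2 * (1 - Phi(sqrt(chi2))).
    return 2 * (1 - NormalDist().cdf(sqrt(chi2)))
```

A split of 5,050 vs. 4,950 visitors is well within chance, while 5,000 vs. 5,400 on a 50/50 configuration is a red flag worth investigating before reading any results.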

Results of A/B testing

A/B test outcomes are closely tied to statistics and to deciding what to present to stakeholders. In my experience, data can be the weapon that shifts decisions, but present it in a confusing way and everything goes down the toilet.

Deciding what to present and what to leave out is tricky because statistics are involved, but managers almost always care about how to make more money, so that should be a key element when presenting the learnings. There are also calculators that help build the business case for an A/B testing program, and doing a proper business case calculation of the program is important to show its value.
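As a back-of-the-envelope version of such a business case calculation, the sketch below projects the extra revenue from a winning variant and subtracts the program’s cost. The function name and all inputs are hypothetical; it assumes constant traffic, order value, and uplift, and uses revenue rather than margin, so it overstates the real value.

```python
def net_program_value(monthly_visitors: int, baseline_rate: float,
                      relative_uplift: float, average_order_value: float,
                      program_cost: float, months: int = 12) -> float:
    """Projected extra revenue from a winning variant minus the program's cost.

    Simplifying assumptions: traffic, average order value and the uplift stay
    constant for `months`, and every conversion is one order.
    """
    extra_orders_per_month = monthly_visitors * baseline_rate * relative_uplift
    extra_revenue = extra_orders_per_month * average_order_value * months
    return extra_revenue - program_cost
```

For example, 100,000 monthly visitors, a 2% conversion rate, a 5% relative uplift, and an average order value of 60 give 100 extra orders per month, or 72,000 in extra revenue over 12 months; against a hypothetical program cost of 50,000, that is a net value of about 22,000.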

A/B testing and statistical significance

Knowing basic statistics is important for evaluating test results or even case studies of A/B testing. So here are a few statistical concepts every CRO specialist should know:

Sampling: Populations, Parameters, & Statistics
Mean, Variance, and Confidence intervals
What statistical significance (p-value) is and isn’t
Statistical Power
Sample size and how to calculate it
Regression To The Mean & Sampling Error
4 Statistics Traps to Look Out For
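To make the p-value and confidence interval concrete, here is a minimal sketch of the standard pooled two-proportion z-test on unique visitors and binary conversions, using only the Python standard library. The function name and return shape are our own choices; testing tools may use other methods (for example Bayesian or sequential approaches).

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_test(conv_a: int, n_a: int, conv_b: int, n_b: int,
                        confidence: float = 0.95):
    """Two-sided z-test for the difference in conversion rate (B minus A).

    Returns (p_value, (ci_low, ci_high)); the confidence interval for the
    absolute difference uses the unpooled standard error.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    diff = p_b - p_a

    # Pooled rate under the null hypothesis that A and B convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se_pool = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = diff / se_pool
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # Unpooled standard error for the interval around the observed difference.
    se_diff = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return p_value, (diff - z_crit * se_diff, diff + z_crit * se_diff)
```

For instance, 1,000 conversions out of 50,000 visitors in A against 1,100 out of 50,000 in B gives a p-value of roughly 0.03 and a 95% interval for the difference of roughly +0.02 to +0.38 percentage points. The p-value is the probability of seeing a difference at least this large if A and B truly converted equally; it is not the probability that B is better, and the width of the interval shows how uncertain the size of the uplift still is.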