A/B testing and statistical significance [CXL Review Week 6]

I succeeded because I failed more than you.

Seth Godin

A/B testing and statistical significance are topics underrated by digital marketers. Now that I have finished these chapters in the Growth Marketing Minidegree, I understand why: these concepts are difficult to grasp. At the same time, I can see how powerful they are once I master them.

Let’s start with the history of A/B testing. It can be traced back to 1996, but the first tools appeared in 2003. Before then, cookies were not used; instead, logs were analyzed. The real democratization of split testing came in 2010, when VWO and Optimizely hit the market.

In 2020, the value of A/B testing is that it builds evidence of what is effective. This is similar to the health industry, where randomized controlled trials (RCTs) are used to make decisions. Even better is a systematic review of many RCTs, because a single experiment does not prove much.

[Figure: Understanding the position of A/B testing within the hierarchy of evidence.]

Now we can all agree that A/B testing is valuable, but when to use it and, more importantly, how to apply it is a different game. Let’s look at the three situations where split testing is valuable:

  1. research – we are not looking for winners, but for the elements that make an impact:
    • leaving out elements on the web page to identify which ones show positive, negative, or neutral signals, and therefore which ones are important.
    • fly-ins: for example, researching whether social proof notifications (x people bought in the last 24 hours) have an impact.
  2. optimize – similar to a deployment done by marketing (usually client-side) that will hopefully be implemented. In this situation, we are just looking for wins.
  3. deploy – instead of pushing a new feature or other change live to everyone, we can shift traffic to the change and check whether it has a positive or neutral impact. In both cases, you should go live with the change.

Planning A/B tests

The planning phase starts with one question: Do you have enough data to conduct A/B tests?

  • ROAR model (Risk, Optimization, Automation, Re-think): if you have fewer than 1,000 conversions per month (transactions, leads, clicks), it will be hard to identify a winner. Still, if the conversions are not there, as mentioned in a previous chapter of the Minidegree, you can run the A/B test as research.
  • statistical power: the likelihood that an experiment will detect an effect when there is an effect to be detected. It depends on sample size, effect size, and significance level.
  • Use calculators to determine whether you have enough data.
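A common approach behind such calculators is the sample size formula for a two-sided two-proportion z-test, which combines exactly the three inputs listed above (effect size, significance level, power). The sketch below uses only Python's standard library; the function name and the 3% baseline / 10% minimum detectable effect in the usage line are hypothetical values for illustration, not numbers from the course.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_variant(baseline_rate: float, relative_mde: float,
                            alpha: float = 0.05, power: float = 0.80) -> int:
    """Visitors needed in EACH variant for a two-sided two-proportion z-test.

    baseline_rate: conversion rate of the control (0.03 means 3%).
    relative_mde: smallest relative uplift worth detecting (0.10 means +10%).
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_mde)
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # significance level (two-sided)
    z_beta = NormalDist().inv_cdf(power)           # desired statistical power
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


# Hypothetical inputs: 3% baseline conversion rate, +10% relative uplift.
print(sample_size_per_variant(0.03, 0.10))
```

With these hypothetical inputs the result is roughly 53,000 visitors per variant, which illustrates why a site with few conversions struggles to detect small winners.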

The second phase of planning A/B tests is deciding on the KPI.

The KPI cannot be AOV, average satisfaction, or number of pageviews.

Another big part of planning is the research that provides the insights to build A/B tests on. I already know that doubling down on user research is 80% of the process. Most of the time, use surveys and cohort analysis before expert opinions. The 6V research model can be used to generate user behavior insights:

  • Value (CRO specialist, Data): What company values are important and relevant? What focus delivers the most business impact in the short and long term?
  • Versus (CRO specialist): What competitor analysis and market best practices can be found?
  • View (Data & Psy): What insights can be found from web analytics and web behavior data?
  • Validated: What insights are validated in previous experiments or analyses?
  • Verified: What scientific research, insights, and models are available?
  • Voice (Psy & UX): What insights can be taken from the voice of customer data such as surveys, feedback, and service contact?

The next step in the planning pillar is setting a hypothesis. To get everyone aligned, you need to describe a problem, propose a solution, and predict the outcome. This usually saves time that would otherwise go into discussions.

Here is how to write a proper hypothesis: If {I apply this}, then {this behavior change} will happen, among {this group}, because of {reason}.
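If hypotheses live in a backlog or spreadsheet, storing the four parts of the template separately makes it harder to skip one of them. A minimal sketch; the `Hypothesis` class, its field names, and the example values are hypothetical and only mirror the template above.

```python
from dataclasses import dataclass, fields


@dataclass
class Hypothesis:
    change: str           # {I apply this}: the proposed solution
    behavior_change: str  # {this behavior change}: the predicted outcome
    audience: str         # {this group}: who is targeted
    reason: str           # {reason}: the problem or insight behind the change

    def __post_init__(self) -> None:
        # Reject incomplete hypotheses so every part is agreed on up front.
        for f in fields(self):
            if not getattr(self, f.name).strip():
                raise ValueError(f"hypothesis field '{f.name}' is empty")

    def __str__(self) -> str:
        return (f"If {self.change}, then {self.behavior_change} will happen, "
                f"among {self.audience}, because of {self.reason}.")


print(Hypothesis(
    change="we add a social proof notification to the product page",
    behavior_change="a higher add-to-cart rate",
    audience="first-time visitors",
    reason="the reassurance that other people bought recently",
))
```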

The final step is to prioritize A/B tests. On top of the frameworks I mentioned in a previous review, I learned about power determination (when using the PIPE framework), the importance of counting unique visitors, and the requirement that these visitors must have seen the test page before they converted.
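Counting only unique visitors who converted after seeing the test page can be sketched as below. The data shapes (a dict of first exposure times per visitor and a list of conversion events) and the function name are hypothetical; in practice this filtering would run on an export from the company's analytics tool.

```python
from datetime import datetime


def count_valid_conversions(exposures: dict[str, datetime],
                            conversions: list[tuple[str, datetime]]) -> tuple[int, int]:
    """Return (unique exposed visitors, unique visitors converting after exposure).

    exposures: visitor_id -> first time the visitor saw the test page.
    conversions: (visitor_id, conversion time) events, possibly several per visitor.
    """
    converted = {
        visitor_id
        for visitor_id, converted_at in conversions
        # Ignore visitors who never saw the test page, and conversions that
        # happened before the visitor was exposed to the test.
        if visitor_id in exposures and converted_at > exposures[visitor_id]
    }
    return len(exposures), len(converted)


exposures = {"a": datetime(2020, 5, 1, 10), "b": datetime(2020, 5, 1, 11),
             "c": datetime(2020, 5, 2, 9)}
conversions = [("a", datetime(2020, 5, 1, 10, 5)), ("a", datetime(2020, 5, 3)),
               ("b", datetime(2020, 5, 1, 8)), ("d", datetime(2020, 5, 1, 12))]
print(count_valid_conversions(exposures, conversions))
```

Visitor "a" converts twice but is counted once, "b" converted before exposure and "d" never saw the test page, so only one valid converter remains out of three exposed visitors.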

Split test execution

Designing, developing, and QA-ing your A/B test is the core of executing split tests. Each step differs depending on the size of the team, but as far as I understand, the key is to just do it and keep an eye on the data streams.

I was happy to see that the A/B testing tool used in the configuration chapter is Google Optimize. Here is what you need to configure to run a proper A/B test:

  • create a variant called Default and one named Challenger, so that both groups get the same experience from the tool, since we don’t control how the tool serves the original.
  • next, add the JS script and send a GA event with the respective variant information so it can be tracked.
  • run the experiment with the original variant’s traffic set to 100% for 2 weeks.
  • next, set the original to 0% of the traffic and split it 50-50 between Default and Challenger. This is a pre-test selection. After you finish the test and have enough data, in the post-test phase move 100% of the traffic back to the original.
  • This is a better solution than including only visitors without cookies (new ones) or with a cookie value later than the test start time.
  • finally, use the analytics solution already implemented in the company, not the tool’s built-in analytics.
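Under my reading of these steps, the pre-test selection works because the tool keeps each visitor in the variant they were first assigned to. The simulation below is a minimal sketch of that idea, not Google Optimize’s actual logic: the sticky assignment, the visitor IDs, and the traffic numbers are all assumptions for illustration.

```python
import random
from collections import Counter

# Assumption: as with cookie-based tools, a visitor keeps the first variant
# they were bucketed into, even after the traffic weights change.


def assign(visitor_id: str, weights: dict[str, float],
           assignments: dict[str, str], rng: random.Random) -> str:
    """Sticky bucketing: only visitors seen for the first time get a new draw."""
    if visitor_id not in assignments:
        variants = list(weights)
        draw = rng.choices(variants, weights=[weights[v] for v in variants])[0]
        assignments[visitor_id] = draw
    return assignments[visitor_id]


def simulate(seed: int = 42) -> dict[str, str]:
    rng = random.Random(seed)
    assignments: dict[str, str] = {}
    pre_test = {"Original": 1.0, "Default": 0.0, "Challenger": 0.0}
    test = {"Original": 0.0, "Default": 0.5, "Challenger": 0.5}
    # Pre-test (e.g. 2 weeks): visitors v0..v999 are all bucketed into Original.
    for i in range(1000):
        assign(f"v{i}", pre_test, assignments, rng)
    # Test phase: v500..v999 return (and stay in Original); v1000..v2999 are new.
    for i in range(500, 3000):
        assign(f"v{i}", test, assignments, rng)
    return assignments


def analysis_population(assignments: dict[str, str]) -> Counter:
    """Only Default vs. Challenger is analyzed; Original holds pre-test visitors."""
    return Counter(v for v in assignments.values() if v != "Original")


print(analysis_population(simulate()))
```

Every visitor who arrived during the pre-test stays in Original, so the Default vs. Challenger comparison contains only visitors who entered after the 50-50 split, and both groups were served by the tool in the same way.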

User behavior differs between weekdays and weekends, so always measure the length of an experiment in whole weeks. Somewhere between 1 and 4 weeks is enough to include full business cycles while avoiding dilution. Exception: when you run an experiment on logged-in users, you can better control who sees what, so you can run it as long as you want.

As the final step in executing A/B tests, monitoring is important, and I am wondering whether having live chat on the experiment pages would help.

Results of A/B testing

A/B test outcomes are closely tied to statistics and to deciding what to present to stakeholders. In my experience, data can be the weapon that shifts decisions, but present it in a weird manner and everything goes down the toilet.

Therefore, deciding what to present and what to leave out is tricky: statistics are involved, but managers almost always care about how to make more money, so that is a key element in presenting the learnings. There are also calculators that can help build the business case for an A/B testing program, and a proper business case calculation is important to show the program’s value.
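A very rough way to translate a winning test into money for managers is to project the measured uplift over a year of traffic. This is a simplified sketch, not one of the calculators the course refers to: it assumes the uplift holds after rollout (it often shrinks, which is where regression to the mean comes in) and that traffic, conversion rate, and average order value stay constant. The function name and all numbers are hypothetical.

```python
def extra_annual_revenue(weekly_visitors: int, baseline_cr: float,
                         relative_uplift: float, avg_order_value: float,
                         weeks: int = 52) -> float:
    """Rough extra revenue if the measured relative uplift held for a full year."""
    baseline_orders = weekly_visitors * baseline_cr * weeks
    extra_orders = baseline_orders * relative_uplift
    return extra_orders * avg_order_value


# Hypothetical: 40,000 visitors/week, 3% conversion rate, +5% uplift, AOV of 60.
print(round(extra_annual_revenue(40_000, 0.03, 0.05, 60.0)))
```

Running the same projection with the lower and upper bounds of the uplift’s confidence interval gives stakeholders a range instead of a single, overconfident number.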

If you don’t know basic statistics, you can’t properly evaluate test results or even A/B testing case studies. That is why the last chapter in the module is an overview of the statistical concepts that every digital marketer, and certainly every CRO specialist, should know:

  • Sampling – Populations, Parameters, & Statistics
  • Mean, Variance, and Confidence intervals
  • What statistical significance (p-value) is and isn’t
  • Statistical Power
  • Sample size and how to calculate it
  • Regression To The Mean & Sampling Error
  • 4 Statistics Traps to Look Out For
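Several of these concepts (mean, variance, confidence intervals, p-values) come together when evaluating a finished test. Below is a minimal sketch of a two-sided two-proportion z-test using only Python’s standard library; it assumes the sample size was fixed in advance and that nobody peeked and stopped early. The function name and example numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def ab_test_summary(visitors_a: int, conversions_a: int,
                    visitors_b: int, conversions_b: int,
                    confidence: float = 0.95) -> dict[str, float]:
    """Two-sided two-proportion z-test plus a confidence interval for the difference."""
    p_a = conversions_a / visitors_a
    p_b = conversions_b / visitors_b
    diff = p_b - p_a
    # Pooled standard error under H0 (no difference between A and B).
    p_pool = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    se_pooled = sqrt(p_pool * (1 - p_pool) * (1 / visitors_a + 1 / visitors_b))
    z = diff / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Unpooled standard error for the confidence interval of the difference.
    se = sqrt(p_a * (1 - p_a) / visitors_a + p_b * (1 - p_b) / visitors_b)
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return {
        "rate_a": p_a, "rate_b": p_b, "relative_uplift": diff / p_a,
        "z": z, "p_value": p_value,
        "ci_low": diff - z_crit * se, "ci_high": diff + z_crit * se,
    }


# Hypothetical test: 10,000 visitors per variant, 300 vs. 360 conversions.
print(ab_test_summary(10_000, 300, 10_000, 360))
```

For these numbers the p-value is about 0.02 and the 95% interval for the difference stays above zero. A p-value below 0.05 means a difference this large would be unlikely if there were no real difference; it is not the probability that the Challenger is better.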

Conclusions

I really enjoyed the intro and outro of the A/B testing mastery chapter. Having context before diving into the core content helped me understand how to position A/B testing with a client or inside a company. It is just like data: without context, it might not make much sense.

This article is the sixth in a series of 12 reviews of my studies in the Growth Marketing Minidegree at CXL Institute. Follow the minidegree tag for the entire series.

Stay In Touch.

  • contact [at] dascalescu.com

CD.

Let's Build Something.