The Joy of Proving Yourself Wrong: A Case Study in A/B Experimentation


Getting A/B experimentation to work for you is a science that goes beyond purchasing a tool. It’s about creating a culture of learning and celebrating failed hypotheses alongside the winning ones. It’s about putting processes in place to make sure you’re following a rigorous testing regimen. And along the way, it’s about making a lot of mistakes.

In this interactive session, we’ll walk through a case study of a recent experiment my team ran, and you’ll have a chance to set the variables up yourself. We’ll learn about how complicated these experiments can get and how to discover if you’ve made an error in your test setup. We’ll look at what you should do to build on the momentum of a failed experiment. Finally, I’ll share operational processes I’ve put in place on my team to reduce the likelihood of making these mistakes in the future.

A/B testing is complicated, and even the best teams make mistakes. This session is aimed at experienced practitioners who want to improve team processes, learn how to design tests so that they provide accurate data, and recover from mistakes when they do happen. It’s about finding the joy in a failed experiment and making sure each test is set up so you’re guaranteed to learn something. A strong experimentation regimen can take you from amateur tinkerer to data-driven guru, where every failure makes you smile.

Learning Outcomes

  • How to structure test hypotheses when you plan to do multiple rounds of testing
  • When to do single-variation versus multivariate tests
  • Why you should always plan on iterating on your winning variation
  • What channels of communication you need to set up between engineering, marketing, design, and leadership to iterate on A/B experimentation
  • How to use qualitative research to drive quantitative research agendas

Additional Details

About the case study

The content for this session will largely be communicated via a case study that gets more complicated as the presentation goes on. We’ll start with setting up what appears to be a very basic A/B experiment, but as we “run” the experiment in the session, we’ll learn more and more pieces of information that complicate the project. The session will end with the audience starting to think about the second variation on the test to learn about why the first test failed, but we will probably not have enough time to actually begin testing the second variation.

Throughout the examples, I’ll also share different steps and processes I have put in place on my team to reduce the likelihood of these surprises coming up mid-experiment.


As a workshop, this session works best at 75 to 120 minutes. It can also be done as an interactive session. It’s hard to provide an exact timeline, as we’ll be teaching the lessons through the questions and comments people bring up, rather than through a set, pre-programmed schedule. This is really designed to be a discussion-led activity. There is no Q&A at the end because the session is designed for questions all along the way.

  • Refresher on the basics of A/B testing
  • Background on this particular case
  • Setting up the initial experiment
  • Complications – an undiscovered bug
  • Complications – confounding data
  • Complications – cyclical patterns
  • Wrap up – what does the next iteration look like?

Main topics to cover include

  • Why you should test everything: The power of measuring the success and failure of everything you release.
  • Hypothesis framing: How to use qualitative research to set up your hypotheses for A/B experimentation
  • Different kinds of tests: What multivariate and single-variation experiment structures are, and when to use each
  • Iterative experimentation: Why you should set up iterative experiments, where you run one test and follow it immediately with a new variation, and how that should change the way you design your tests
  • All the things that can go wrong: The unexpected ways poor data can ruin your experiment, and why an experiment only really fails when its data is inconclusive, not when a variation loses.