My first post ever is about the A/B testing lessons I picked up over years of optimization testing at various companies.
I started out my A/B testing career at Travelocity with a bunch of ideas and a couple of excellent books by Bryan and Jeffrey Eisenberg. While it was a good start, only after running inconsequential and incorrect tests did I actually formulate my golden rules for running a winning A/B testing program. This post is not about the 10 easiest A/B tests you can run, and it is certainly not about selling you on the virtues of A/B testing. This post assumes you have experience with optimization and provides guidelines for bettering your A/B testing program. I tried cramming all my lessons into a single post, and realized that there was too much information to share! So here is part 1 of 2 on my top 10 rules for robust A/B testing:
Rule 1: Always write a testable hypothesis.
Sounds simple, no? Actually, this is the cornerstone of a good testing program, and one that people often spend very little time on. Promise yourself that you will not run a single test without creating a testable hypothesis. It is not enough to write a one-line summary of the test and call it a hypothesis! For example, ‘Test a free shipping message in the shopping cart page’ is a very poorly written hypothesis. A robust hypothesis clearly states both cause and effect. The cause is simply what you intend to change, and the effect is what you expect to happen as a result of that change, stated in measurable terms. Here is a more measurable hypothesis: ‘Emphasizing free domestic shipping in the cart page will increase overall conversion’. Aha: stating this in measurable terms now exposes how weak this test really is!
Clearly stating what you are going to test is a great way to validate your assumptions and set expectations about the scope of the test. Three things can happen as a result of writing a good testable hypothesis:
- You realize the test idea is actually pretty weak and the scope needs to be re-visited
- You realize that the primary metric for measuring the test is incorrect (hint: think RPV and not just conversion!)
- You realize that the target for the test needs to be further narrowed (e.g. repeat vs new visitors)
In the example above, a possible test hypothesis would be: ‘Emphasizing free domestic shipping in cart and checkout pages will increase RPV for first-time visitors’. Now we’re getting somewhere. You have noted exactly what you plan to change, what your primary metric will be, and who your target audience is. Now sniff it: does it feel like a big test? If yes, great, move on to the next step. If no, back to the drawing board!
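To make the cause/metric/audience discipline concrete, here is a minimal sketch of a hypothesis builder. The helper and its field names are hypothetical (not from any real testing tool); the point is simply that a blank ingredient should stop you before the test starts.

```python
# Hypothetical helper: forces every test idea to state its cause,
# effect metric, expected direction, and target audience.

def build_hypothesis(cause, metric, direction, audience):
    """Assemble a testable hypothesis, refusing any blank ingredient."""
    parts = {"cause": cause, "metric": metric,
             "direction": direction, "audience": audience}
    for name, value in parts.items():
        if not value:
            raise ValueError(f"missing {name} -- back to the drawing board!")
    return f"{cause} will {direction} {metric} for {audience}."

print(build_hypothesis(
    cause="Emphasizing free domestic shipping in cart and checkout pages",
    metric="RPV",
    direction="increase",
    audience="first-time visitors",
))
# -> Emphasizing free domestic shipping in cart and checkout pages
#    will increase RPV for first-time visitors.
```

If you cannot fill in all four fields, you do not have a hypothesis yet; you have a test idea.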
Rule 2: Figure out exactly how you will measure the outcome.
It is not enough to say, “I’ll measure it in Google Analytics or the A/B testing tool.” Ask yourself: “Where is the data I need to measure the outcome of the test? Is it neatly aggregated in a single location?” Unless you have killer integration across all your points of sale, chances are you are going to need to pull data from a variety of systems to accurately measure the change. For example, your visitors may abandon the cart online, pick up the phone, and place their order with a customer service rep. Or the same customer might walk into your brick-and-mortar store and place the order through your internal order processing system. But but but, you say, this is an A/B test! So if I don’t measure the end point in the control, I don’t need to measure it in the test variant either, right? Not quite.
In the above test example, if the message does not address what the delivery timeframe for free shipping is, the test will drive more calls to the call center. And a test that would have actually been a win would become a ‘learning experience’ because you didn’t actually think through how you were going to measure the outcome.
There are other considerations here. If you are going to pull data from the call center, now you need to include the cost of a call in your calculations as well! If you want to measure new visitors only, make sure your analytics tool can do that. You see how this can get pretty complicated? Don’t rely on your outsourced vendor’s A/B testing suite to tell you the results; only you have the best handle on all your sources of data. Create a model of the raw data you will use to calculate the change in Excel (or whatever other spreadsheet software you use). Below is an example of what I would construct for the test above:
This table tells me that I need to talk to the call center to ensure they can distinguish between variants. I also need to talk to finance to get the margin info, and to get the average cost of a call. Finally, I also need to make sure I can measure shoppers that get to the cart page in this case.
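The kind of spreadsheet model described above can also be sketched in a few lines of code. All the numbers below are hypothetical, as are the 30% margin and $5-per-call figures; the point is that a variant's online revenue lift can evaporate once call-center volume and cost are counted.

```python
# Sketch of a blended-measurement model (hypothetical numbers throughout):
# net margin per visitor across online and call-center orders.

def net_rpv(visitors, online_revenue, call_orders, call_revenue,
            margin=0.30, cost_per_call=5.00):
    """Blend online and phone revenue, subtract call-handling cost,
    and return net margin dollars per visitor."""
    gross_margin = (online_revenue + call_revenue) * margin
    net = gross_margin - call_orders * cost_per_call
    return net / visitors

# The variant shows higher online revenue, but drives many more calls.
control = net_rpv(visitors=10_000, online_revenue=250_000,
                  call_orders=400, call_revenue=30_000)
variant = net_rpv(visitors=10_000, online_revenue=258_000,
                  call_orders=1_200, call_revenue=32_000)
print(f"control net RPV: ${control:.2f}, variant net RPV: ${variant:.2f}")
# control net RPV: $8.20, variant net RPV: $8.10
# The apparent online win is a net loss once calls are counted.
```

With these made-up inputs the variant wins on online revenue alone, but loses once the extra 800 calls and their cost are in the model, which is exactly the trap described above.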
Here are three things you should expect to happen after this exercise:
- You realize that the hypothesis you wanted to test needs to be tightened up further
- You change the definition of your control and test variants and change the time needed to test
- You become well respected among your peers as a forward thinking individual
Rule 3: Calculate the test duration yourself.
Now you might say that the A/B testing software you use gives a running estimate of how long until statistical significance will be achieved, so why in the world would you need to calculate the test duration yourself?
Simply put, you need to know the opportunity cost of running the test. In other words, because you plan to run this test, what other test will not get done, and is it worth that cost? In the table above, I am able to figure out how much time is required to detect a certain % change at a given confidence level. For example, looking at the table above, I saw that measuring a 2% change in the primary metric was going to take a minimum of 3.4 weeks at 95% confidence. That helped me make a call about the timing of the test, and we deferred it in favor of another, shorter test.
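The duration math itself is a standard power calculation. The sketch below uses the usual normal-approximation sample-size formula for a two-proportion test; the baseline rate, lift, and traffic numbers are made up for illustration, not the figures from my test.

```python
import math

def weeks_to_significance(baseline_rate, relative_lift,
                          weekly_visitors_per_variant,
                          z_alpha=1.96, z_beta=0.84):
    """Approximate weeks per variant needed to detect a relative lift
    in a conversion rate (two-sided alpha=0.05, power=0.80 by default),
    using the normal-approximation two-proportion sample-size formula."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    p_bar = (p1 + p2) / 2
    n = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
         + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2 / (p2 - p1) ** 2
    return n / weekly_visitors_per_variant

# Made-up example: a 5% relative lift on a 5% baseline with 50,000
# visitors per variant per week takes roughly two and a half weeks.
print(f"{weeks_to_significance(0.05, 0.05, 50_000):.1f} weeks")
```

Note how quickly the duration balloons as the detectable lift shrinks: halving the lift roughly quadruples the required sample, which is why small expected changes often are not worth the calendar time.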
If you want to know how to calculate the duration for an A/B test, contact me.
I hope you enjoyed reading my top 3 rules for robust A/B testing. Because these are fairly heavy to digest, I will cover rules 4 through 10 in my next post. See you then!