Lessons in A/B testing – Part 2


In my previous post, I covered my top 3 rules for A/B testing.  I’ll continue where I left off and wrap up the remaining 7 rules in this post.

This rule will save your job.  Nobody likes surprises – not your CEO and certainly not the customer service representative who has her hands full handling customer issues.  So make sure you socialize the heck out of your A/B testing program.  It isn’t enough to send out an email announcing a test launch – that is the lazy optimizer’s approach, and it is sure to get you kicked out of A/B testing eventually.  Socialization actually happens well before the test is even built.  Here’s a high-level process that worked really well for me:

1. Invite the entire company to submit test ideas to you.  Don’t set up meetings! Just make the process efficient and collect ideas through remote means.

2. Curate the list of ideas and rank them by benefit, execution effort, and time (a minimal scoring sketch follows after this list).  Your prioritization process should not be a public event, but the outcome needs to be very public.  People like to know what happened to the idea they submitted.  Even if all you say is that you considered it, they will be happy that they were not ignored.

3. Review the best ideas that you have selected with a group of stakeholders. Remember, the goal of the meeting is not to ask permission, but to address any concerns people have.  Hopefully, you have done an excellent job of collaborating before this so the meeting is more of a formality.

4. When the test is built, involve the key stakeholders in QA.  If the stakeholder is too high up in the chain of command, grab somebody from their team for the signoff.  This is not a CYA exercise; it ensures that process intricacies are accounted for when testing.

5. Notify the entire company when tests are launched (don’t just send to stakeholders and expect them to forward your email).  Let people know exactly what the test variants are and how long you expect to run the test.  Some A/B testing platform providers recommend that internal traffic be excluded from the test.  I beg to differ!  Unless your internal traffic makes up a big chunk of all traffic, you shouldn’t have to worry about it diluting test results.  Your internal customers are your best friends who will let you know should something go wrong.

6. Notify the company when you end the tests as well.  During the testing process, it is not unusual to have issues reported to you from across the company.  Expect this to happen and embrace it.  You may have to make decisions on stopping, fixing, and re-launching tests, and that is OK.  It is better than coming to an incorrect conclusion.

7. Socialize the results.  Do A/B testing roadshows and talk about wins and non-wins (or “lessons,” for the politically correct).
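
To make step 2 concrete, here is a minimal sketch of one way to score ideas on benefit, execution effort, and time.  The scales, the scoring formula, and the example ideas are all illustrative assumptions, not a prescribed methodology; use whatever scheme your team actually trusts.

```python
# Hypothetical scoring sketch: rank test ideas by expected benefit
# relative to the effort and time they require.  Scales and formula
# are illustrative assumptions, not a prescribed methodology.
from dataclasses import dataclass

@dataclass
class TestIdea:
    name: str
    benefit: int   # expected impact, 1 (low) to 5 (high)
    effort: int    # build/QA effort, 1 (low) to 5 (high)
    weeks: int     # estimated time to launch

    @property
    def score(self) -> float:
        # Favor high benefit, penalize effort and long timelines.
        return self.benefit / (self.effort * self.weeks)

ideas = [
    TestIdea("Simplify checkout form", benefit=5, effort=3, weeks=4),
    TestIdea("New hero banner copy", benefit=2, effort=1, weeks=1),
    TestIdea("Reorder product filters", benefit=3, effort=2, weeks=2),
]

for idea in sorted(ideas, key=lambda i: i.score, reverse=True):
    print(f"{idea.name}: {idea.score:.2f}")
```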

 

This is by far the worst part of any A/B tester’s job.  QA is boring; I’d rather take a nap than QA any day!  But QA is a necessary pillar of conversion optimization.

 


 

Think of the test as a mini-project and go through all the test scenarios for the part you are A/B testing.  Work with your organization’s QA team to create test cases.  However, do not leave it to the company’s QA team to find all the issues.  You are the one closest to the test, so you should personally QA the A/B test.

Don’t forget to QA across all major browsers.  IE still holds a significant enough share of the browser market to be a pain, so don’t ignore it.  It is very common to have to code specifically for IE, so either exclude IE from your test altogether or code for it.  Just don’t ignore it.

And oh, don’t forget to test for the major tablets and mobile devices.  Did I mention how much I hate QA?

At the end of your QA, you can expect one of two things to happen:

  1. You send the test back to the development team for bug fixing
  2. You change the scope of the test to exclude a certain segment of the population because your test code can’t handle it
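
If you do end up narrowing the scope (the second outcome above), the exclusion usually lives in your A/B testing tool’s targeting rules rather than in code you write yourself.  The snippet below is only a rough sketch of the underlying bucketing logic, with a made-up assign_variant helper and an IE user-agent check standing in for whatever segment your variant code can’t handle.

```python
# Rough sketch of segment exclusion in test assignment.  The helper and
# the user-agent markers are illustrative, not any particular tool's API.
import hashlib
from typing import Optional

EXCLUDED_MARKERS = ("MSIE", "Trident/")  # e.g., IE, if the variant code breaks there

def assign_variant(visitor_id: str, user_agent: str) -> Optional[str]:
    """Excluded visitors never enter the experiment."""
    if any(marker in user_agent for marker in EXCLUDED_MARKERS):
        return None  # serve the default experience and do not log an exposure
    # Deterministic 50/50 split on a hash of the visitor id.
    bucket = int(hashlib.md5(visitor_id.encode()).hexdigest(), 16) % 100
    return "variant" if bucket < 50 else "control"

print(assign_variant("visitor-123", "Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0)"))
print(assign_variant("visitor-123", "Mozilla/5.0 (Macintosh) AppleWebKit/537.36"))
```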

 

An agency optimizer once delivered a one-hour monologue on why exclusively testing different colors and font sizes is so important, and why you really have to run a full factorial or Taguchi test to determine the color and size combination with the highest benefit.  A few months later, when I looked back at the incident, I realized just how wrong this approach was.  Let me stick to the point though, and I’ll tell you the rest of the story in another rule.

Once the test is done, your A/B testing tool will report the lift your variant generated.  Validate the percentage lift as mentioned in Rule #8, then run your own calculation on the estimated benefit of the test.  Third-party tools tend to report an annual benefit, and that benefit may not be seasonally adjusted.  The assumption that the lift you saw will last a full year is hopelessly wrong.  It is so wrong that your chances of winning in Vegas are higher.  So be conservative about the time duration of the benefit and estimate it at 3-6 months.  Why?  Because competitor changes and site changes outside your control are bound to impact test results.
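
Here is a minimal sketch of that back-of-the-envelope calculation, assuming you know your monthly traffic, baseline conversion rate, and average order value.  All of the inputs below are made-up examples; plug in your own numbers.

```python
# Back-of-the-envelope benefit estimate with a conservative window.
# All inputs are illustrative assumptions, not benchmarks.
monthly_visitors = 200_000
baseline_cr = 0.025          # 2.5% baseline conversion rate
avg_order_value = 80.0       # dollars
observed_lift = 0.06         # 6% relative lift reported by the tool

baseline_orders = monthly_visitors * baseline_cr
incremental_revenue_per_month = baseline_orders * observed_lift * avg_order_value

for months in (3, 6, 12):
    print(f"{months:>2}-month benefit: ${incremental_revenue_per_month * months:,.0f}")
# The 12-month figure is what vendor tools tend to quote; the 3-6 month
# figures are the conservative range recommended above.
```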

And then there’s the sneaking suspicion you have had all along that your visitors will eventually tire of the shiny new thing and the conversion rate will go back to what it was.  There, I said it – the A/B testing wins don’t last as long as the A/B testing gurus and vendors want you to believe they do.  In the color and text size example above, the win lasted about 3 months.  There were other problems with the color and size test, though.  It isn’t the color that matters, but the contrast of the section compared to the rest of the page, so it can catch your visitors’ attention.

Internet marketing has got to be one of the most incestuous industries out there.  Once a new idea is out, a dozen sites will clone the idea overnight.  Competitors regularly copy each other’s site experience in the hopes of not being left behind. That’s why once you A/B test an idea, make sure you re-test it within a few months.  You’ll then be able to explain to your CEO why the benefit of your biggest A/B test didn’t move the bottom line as much as he thought it would.  See Rule #6 for more on this.

In the color and font size test, we re-tested the hypothesis in 3 months, and guess what? There was no measurable lift between the new control and the new test variant!

Pages change, code breaks, analytics lies.  S**t happens.  So make sure you have a second source of truth to validate your test results.  The best A/B testing tools integrate with a wide variety of analytics tools, so validating your A/B testing tool’s results against another analytics package such as Google Analytics or Adobe Analytics shouldn’t be hard.
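
As a rough sketch of what that validation can look like, assuming you can export variant-level visitor and conversion counts from both tools, you can compute the lift and a simple two-proportion z-test from each tool’s numbers and check that they point the same way.  The counts below are made up.

```python
# Hedged sketch: compare lift and significance from two data sources.
from math import sqrt
from statistics import NormalDist

def lift_and_p_value(control_visitors, control_conv, variant_visitors, variant_conv):
    """Relative lift and two-sided p-value from a two-proportion z-test."""
    p_c = control_conv / control_visitors
    p_v = variant_conv / variant_visitors
    pooled = (control_conv + variant_conv) / (control_visitors + variant_visitors)
    se = sqrt(pooled * (1 - pooled) * (1 / control_visitors + 1 / variant_visitors))
    z = (p_v - p_c) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return (p_v - p_c) / p_c, p_value

# Made-up counts: one line per data source.
print(lift_and_p_value(50_000, 1_250, 50_000, 1_330))   # A/B testing tool
print(lift_and_p_value(48_200, 1_190, 48_900, 1_270))   # analytics package
```

If both report a positive lift of roughly similar size, you are directionally aligned, even if the exact percentages differ.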

Now don’t try to exactly match up analytics with results from an A/B testing tool.  You can spend months spinning your wheels and get nowhere.  All you are looking for is whether the two tools directionally say the same thing.  The way each tool measures a visit, a visitor, or even a transaction is slightly different, so you will never really get the exact same lift or drop from both tools.  But wait!  What if the lift in one tool is statistically significant while the lift in the other is not?  This brings us to rule #9!

At any given time, you should be running at least one A/A test.  An A/A test is nothing but running the same experience as a two-variant test.  The goal of the test is to measure the level of noise in your system/website/A/B testing tool.  Now, there is some debate about the usefulness of an A/A test.  To that I say – are you freaking kidding me???  An A/A test is THE most accurate indication of how good your tools are, and of the minimum threshold of change you need to target.  For example, let’s say the noise in your system is 0.6%.  That means, given two random samples, your key metric is likely to differ by 0.6% for no apparent reason!  With this knowledge, you now know that your minimum threshold for lift should be > 0.6%.  A/A tests are a must – run one at least periodically, if not all the time.
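
Here is a minimal sketch of reading out an A/A test, with made-up counts chosen to land near the 0.6% figure in the example above.  The observed relative difference is your noise floor, and a significant p-value on an A/A split is itself a red flag worth investigating.

```python
# Hedged sketch: read the noise level out of an A/A test.
from math import sqrt
from statistics import NormalDist

def aa_readout(visitors_a, conv_a, visitors_b, conv_b):
    """Observed relative difference (noise) and whether it exceeds chance."""
    rate_a, rate_b = conv_a / visitors_a, conv_b / visitors_b
    noise = abs(rate_b - rate_a) / rate_a
    pooled = (conv_a + conv_b) / (visitors_a + visitors_b)
    se = sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    p_value = 2 * (1 - NormalDist().cdf(abs(rate_b - rate_a) / se))
    return noise, p_value

noise, p = aa_readout(400_000, 10_050, 400_000, 10_110)  # illustrative counts
print(f"Observed A/A noise: {noise:.2%} (p = {p:.2f})")
# A small p-value here would be a warning sign: identical experiences should
# not differ significantly, so suspect bucketing or instrumentation issues.
```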

Many a time I have heard people cite the number of tests they are running at any given time as a badge of tremendous success.

Here’s the problem.  The number of true A/B tests one can run depends on the amount of traffic and the number of pages on a site.  There is a finite limit.  If you work for Amazon, then you probably have enough traffic, enough pages, and enough resources to run 30 tests.  Or maybe you work at Dell and have a whole bunch of sites to test on.  If not, you are better off running a small number of quality tests rather than a large number of low-value tests.
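
A rough way to gut-check that limit is a standard sample size estimate: given your baseline conversion rate and the smallest lift you care about detecting, how many visitors does one test need, and how many such tests can your traffic actually support?  The sketch below uses the usual normal-approximation formula and made-up traffic numbers.

```python
# Hedged sketch: how many non-overlapping tests can your traffic support?
from math import ceil, sqrt
from statistics import NormalDist

def visitors_per_variant(baseline_cr, relative_mde, alpha=0.05, power=0.8):
    """Normal-approximation sample size for detecting a relative lift (MDE)."""
    p1 = baseline_cr
    p2 = baseline_cr * (1 + relative_mde)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    pooled = (p1 + p2) / 2
    n = ((z_alpha * sqrt(2 * pooled * (1 - pooled))
          + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2) / (p2 - p1) ** 2
    return ceil(n)

n = visitors_per_variant(baseline_cr=0.025, relative_mde=0.10)  # detect a 10% relative lift
monthly_funnel_traffic = 300_000                                # illustrative figure
concurrent_tests = monthly_funnel_traffic // (2 * n)            # two variants per test
print(f"{n:,} visitors per variant -> about {concurrent_tests} non-overlapping tests per month")
```

Run it with your own numbers; for most sites the honest answer is one or two quality tests at a time, not thirty.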

The other problem with running a lot of tests is the sheer amount of resources needed to manage the changes.  As great as your A/B testing vendor might be, the fact is that you need to socialize the tests, write up requirements for them, create the hypotheses, QA all of them, and measure the impact of the changes.

So the next time someone mentions the large number of tests a competitor is running, explain to them the practical impossibility of running 30 tests on a site that has a small number of pages in the main funnel.  And remind them that running 3x the tests does not result in 3x the benefit.

This brings me to the conclusion of my top lessons learned when running an A/B testing program.  What have your experiences been?
