Colin McFarland: A/B testing
Colin McFarland was talking about A/B testing at DiBi; these are my notes from his talk.
When you run lots of experiments, running them well becomes more difficult.
Skyscanner use an in-house A/B testing framework.
A/B testing shouldn't just be for data scientists - can you reduce the barrier so that anyone can run a test? Understanding the pitfalls is key when you let anyone run tests.
- Confirmation bias - we have a tendency to focus on data that confirms our beliefs.
- HARKing: Hypothesising After the Results are Known. It's very easy to get false positives this way.
- Cherry picking, often from not understanding null hypothesis testing. You can get different results on different days, so stopping on the day that suits you inflates false positives (see the sketch after this list).
- If you get a big result, ask where the errors might be.
- Have a hypothesis you are looking to validate.
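
Here's a quick simulation (mine, not from the talk) of the cherry-picking point: run an A/A test where both variants are identical, check significance every day, and stop the first time it "looks significant". The traffic, conversion rate and thresholds are made-up numbers, but the false positive rate comes out well above the nominal 5%.

```python
# Sketch: peeking at an A/A test every day and stopping at the first
# "significant" result inflates false positives. Numbers are illustrative.
import random
from statistics import NormalDist

def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for the difference between two conversion rates."""
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = (p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b)) ** 0.5
    if se == 0:
        return 1.0
    z = (conv_a / n_a - conv_b / n_b) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def peeking_false_positive_rate(days=14, visitors_per_day=200,
                                base_rate=0.05, runs=1000):
    """Fraction of A/A tests declared 'significant' when peeking daily."""
    false_positives = 0
    for _ in range(runs):
        conv_a = conv_b = n_a = n_b = 0
        for _ in range(days):
            n_a += visitors_per_day
            n_b += visitors_per_day
            conv_a += sum(random.random() < base_rate for _ in range(visitors_per_day))
            conv_b += sum(random.random() < base_rate for _ in range(visitors_per_day))
            if two_proportion_p_value(conv_a, n_a, conv_b, n_b) < 0.05:
                false_positives += 1
                break  # stopping as soon as it "looks significant" is the cherry pick
    return false_positives / runs

if __name__ == "__main__":
    print(f"False positive rate with daily peeking: {peeking_false_positive_rate():.0%}")
```

Even though there is no real difference between the variants, many runs get flagged as winners simply because we looked repeatedly.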
Test like you're wrong - assume nothing interesting is happening. Design like you're right - try and make something interesting happen.
Default to rejecting the change if the result isn't significant. Run fast experiments; you might need to extend them or run a bigger experiment. Think about what's measurable. Check out the Hypothesis Kit from Craig Sullivan - it might not work for you without modification.
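
To make the "default to reject" rule concrete, here's a rough sketch (my own, not the talk's): put a confidence interval around the lift and only call a winner when the whole interval clears zero. The 95% level and the example numbers are assumptions.

```python
# Sketch of "default to reject": only ship when the confidence interval on
# the lift clearly excludes zero. Thresholds and numbers are assumptions.
from statistics import NormalDist

def lift_confidence_interval(conv_a, n_a, conv_b, n_b, confidence=0.95):
    """CI for the absolute difference in conversion rate (variant minus control)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    se = (p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b) ** 0.5
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = p_b - p_a
    return diff - z * se, diff + z * se

def decide(conv_control, n_control, conv_variant, n_variant):
    low, high = lift_confidence_interval(conv_control, n_control, conv_variant, n_variant)
    if low > 0:
        return "ship"       # whole interval is a positive lift
    if high < 0:
        return "roll back"  # whole interval is a negative lift
    return "reject"         # not significant: default to rejecting the change

# Example: 500/10000 control conversions vs 540/10000 variant conversions.
# The lift is positive but not significant at this sample size, so it prints "reject".
print(decide(500, 10_000, 540, 10_000))
```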
When we use A/B tests we find out that the things we love often don't work - 5% (or less) of tests are successful. You need to change how you work: change track or iterate. The IKEA effect: you are more attached to something you create yourself. He also mentioned an experiment calculator.
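
I assume the experiment calculator is a sample-size/duration tool; here's roughly what one computes, using the standard two-proportion sample size approximation. The baseline rate, minimum detectable lift and traffic figures are invented for illustration.

```python
# Rough sketch of an experiment calculator: estimate how many visitors per
# variant are needed to detect a given lift, and so how long the test runs.
from math import ceil
from statistics import NormalDist

def sample_size_per_variant(baseline_rate, min_detectable_lift,
                            alpha=0.05, power=0.8):
    """Visitors needed in each variant to detect an absolute lift of
    `min_detectable_lift` over `baseline_rate` (two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    p1 = baseline_rate
    p2 = baseline_rate + min_detectable_lift
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = variance * ((z_alpha + z_power) / min_detectable_lift) ** 2
    return ceil(n)

if __name__ == "__main__":
    n = sample_size_per_variant(baseline_rate=0.05, min_detectable_lift=0.005)
    daily_visitors_per_variant = 2_000  # assumed traffic per variant
    print(f"{n} visitors per variant, roughly {ceil(n / daily_visitors_per_variant)} days")
```

Small lifts on small baselines need a lot of traffic, which is why "run fast experiments" often means picking a bigger, more measurable change.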
Things may not move in the direction you want - they will move in the direction that users want. We build products based on truth. We can give ideas that would normally be rejected a chance; they get assessed by users, not by internal stakeholders.
Run A/B tests to learn; they don't have to be entirely focussed on complete features. Create a button and see if people use it - if they do, you can build out what happens afterwards.
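
This isn't Skyscanner's framework, just a minimal sketch of how a stub-button test like that could work: hash the user id with the experiment name so the same person always sees the same variant, then count exposures and clicks. The names and the 50/50 split are assumptions.

```python
# Minimal sketch (not the framework from the talk) of a stub-button test:
# deterministic assignment plus simple counters for exposures and clicks.
import hashlib
from collections import Counter

def assign_variant(user_id: str, experiment: str, variants=("control", "button")) -> str:
    """Deterministically bucket a user: same user + experiment -> same variant."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]

exposures = Counter()
clicks = Counter()

def on_page_view(user_id: str) -> bool:
    """Return True if this user should see the stub button, and log the exposure."""
    variant = assign_variant(user_id, "stub-button-test")
    exposures[variant] += 1
    return variant == "button"

def on_button_click(user_id: str) -> None:
    clicks[assign_variant(user_id, "stub-button-test")] += 1

# Example: if enough people in the "button" group click, build what comes next.
for uid in ("alice", "bob", "carol", "dave"):
    if on_page_view(uid):
        on_button_click(uid)  # pretend everyone shown the button clicks it
print(exposures, clicks)
```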
Don't take one experiment as the truth; run variations to validate the core principle.
You can use A/B tests to ensure that changes don't have a negative effect. This means we don't have to be purely data driven; there is room for experimentation.