This is a summary of Jason Cohen’s Business of Software 2012 presentation.
Metrics – You’re Doing it Wrong
The way you’re approaching metrics in your business is wrong. How you’re using data is wrong, because the tools you’re using are wrong.
When asked to guess how many jelly beans are in a jar, individual answers are off by 67%. However, if you average all the answers together, the average is only off by 3%. People suck at guessing, but averaging guesses across people is a nearly perfect predictor of truth…sometimes.
The wisdom of the crowd only works for certain things. When asked to vote for the funniest joke, the crowd didn’t pick it. Crowds are wise when there’s a correct answer. Crowds are useful in objectivity, and destructive in creativity. They are not just neutral toward creativity; they actively make it worse.
Take A/B tests as an example. A/B tests are usually not done right. Picking B over A because it is beating A by a little bit can be very destructive. Sparefoot, an Austin startup, ran an A/A test using Google Website Optimizer, and found that Google Website Optimizer had one A beating itself!
When what you are testing for is rare, the results are overwhelmingly wrong.
Jason has a great (and super cute!) article on his blog on easy significance testing (and hamsters).
The way you determine whether an A/B test shows a statistically significant difference is:
- Define N as “the number of trials.” N = A + B = total number of conversions
- Define D as “half the difference between the ‘winner’ and the ‘loser’.” D = (A – B) / 2
- The test result is statistically significant if D^2 is bigger than N.
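As a minimal sketch, the three steps above fit in a few lines of Python. The function name `is_significant` and the example counts are illustrative assumptions, not something from the talk; only the D^2 > N rule comes from the text.

```python
def is_significant(a: int, b: int) -> bool:
    """Apply the rule of thumb above to two conversion counts.

    a, b: number of conversions observed for variant A and variant B.
    """
    n = a + b            # N: total number of conversions
    d = (a - b) / 2      # D: half the difference between winner and loser
    return d * d > n     # significant only if D^2 is bigger than N


# Hypothetical counts, chosen only to show both outcomes.
print(is_significant(8, 4))    # N = 12, D^2 = 4, so not significant
print(is_significant(60, 30))  # N = 90, D^2 = 225, so significant
```

Because D is squared, it does not matter which variant is passed first.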
Seek large outcomes from more traffic (hundreds of thousands of data points, not tens of thousands), especially if you are a small company. Go for the big stuff and shy away from the small stuff. A great example of this is Google’s famous 41 shades of blue test. Google ran this experiment to determine which shade of blue received the most clicks. Running this test with 2 shades of blue and picking one winner at a 95% confidence level leads to a 5% chance of a false positive. Running it with 41 shades of blue leads to an 88% chance of at least one false positive.
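The jump from 5% to 88% follows from compounding: if each comparison has a 5% chance of a false positive, the chance that at least one of many comparisons produces one grows quickly. The sketch below assumes the comparisons are independent and counts 41 of them, which reproduces the figure quoted above; the function name is a hypothetical choice.

```python
def chance_of_false_positive(comparisons: int, alpha: float = 0.05) -> float:
    """Probability that at least one of `comparisons` independent tests
    yields a false positive, when each has false-positive rate `alpha`."""
    return 1.0 - (1.0 - alpha) ** comparisons


print(f"{chance_of_false_positive(1):.0%}")   # a single comparison
print(f"{chance_of_false_positive(41):.0%}")  # 41 comparisons
```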
Test theories, not headlines. Don’t spitball headlines. First form a theory about why a change would be better, then test it. If the theory turns out to be invalid, think about what other assumptions could be wrong. Invalidating a theory gives you an opportunity to think deeper, and come up with another theory. Examples of theories:
- People from the UK like to see a lot of vowels in the word color.
- On this page people are ready to buy.
- At this point, the user is ready to talk to customer service.
- People who land on this page by searching Google with the keyword “security” would like to see a testimonial about security.
Which metrics matter?
Which metrics actually matter?
- Growth rate?
- Cancellation rate?
- Conversion rate?
- Total revenue?
- Support costs?
Which variables should I care about? The ones that have the biggest impact on growth, revenue and cash.
Let’s take a hypothetical affiliate program for a SaaS product as an example, and figure out what’s important. Affiliate program parameters:
- 20 sales/month
- $100 average affiliate one-time payout
- $20/month average new monthly recurring revenue (MRR)
Using a simple model in a spreadsheet, it looks like we will break even in about four to five months. Now let’s add 15%/month growth. That 15% growth causes 50% more costs. But that doesn’t count cancellation rates, or affiliate customers being lower quality (i.e. canceling more). The end result: you’re dead (with a 10%/month cancellation rate). If we then increase the price by $10/month (50% more MRR), we’re back to breaking even at 4 – 5 months.
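The spreadsheet itself is not reproduced here, so the following is only a rough sketch of what such a model might look like. It assumes "break even" means the first month in which recurring revenue from affiliate customers covers that month's payouts, that new customers pay in the month they sign up, and that cancellations apply to the previous month's customers. The function name and the timing conventions are assumptions, so the exact months can differ from the figures quoted above.

```python
from typing import Optional


def break_even_month(
    sales_per_month: float = 20.0,  # new affiliate sales in month 1
    payout: float = 100.0,          # one-time payout per sale
    mrr: float = 20.0,              # monthly recurring revenue per customer
    growth: float = 0.0,            # monthly growth in affiliate sales
    churn: float = 0.0,             # monthly cancellation rate
    horizon: int = 60,              # months to simulate
) -> Optional[int]:
    """Return the first month in which recurring revenue covers that
    month's affiliate payouts, or None if it never does within `horizon`."""
    active = 0.0
    sales = sales_per_month
    for month in range(1, horizon + 1):
        # Surviving customers plus this month's new sales.
        active = active * (1.0 - churn) + sales
        revenue = active * mrr
        cost = sales * payout
        if revenue >= cost:
            return month
        sales *= 1.0 + growth
    return None


print(break_even_month())                                    # baseline
print(break_even_month(growth=0.15, churn=0.10))             # growth plus churn
print(break_even_month(mrr=30.0, growth=0.15, churn=0.10))   # price raised by $10
```

Under these assumptions the baseline breaks even in month 5, the growth-plus-cancellation case never does within the simulated horizon, and raising the price restores a break-even point.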
Affiliate program optimization priorities:
1. Increase MRR.
2. Start with small payouts (threshold). Don’t try to optimize the hell out of it, just make sure to stay within the box.
3. Prevent cancellations by affiliate customers.
4. Growth is bad until 1 – 3 are solid.
- Pick one or two key metrics to optimize.
- Pick a few thresholds to watch via simple sensitivity analysis.
- Fold real data back into the model.
- The wrong process is worse than nothing.
- Test theories, not “see what sticks.”
- Use real significance tests (hamster test), and seek large effects.
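One way to find a threshold worth watching, as suggested in the list above, is to ask how high the cancellation rate can go before the affiliate program can no longer pay for itself. The sketch below is an assumption-laden illustration, not the talk's method: it assumes sales grow at a constant monthly rate, customers cancel at a constant monthly rate, and the program "works" if recurring revenue eventually covers each month's payouts. The function name is hypothetical.

```python
def max_sustainable_churn(mrr: float, payout: float, growth: float) -> float:
    """Cancellation rate below which monthly recurring revenue eventually
    covers monthly affiliate payouts, under constant growth and churn.

    In the long run, customers per new sale approach
    1 / (1 - (1 - churn) / (1 + growth)); requiring mrr times that value
    to reach the payout and solving for churn gives the expression below.
    """
    return 1.0 - (1.0 + growth) * (1.0 - mrr / payout)


# Sweep the price while holding the $100 payout and 15%/month growth fixed.
for price in (20.0, 25.0, 30.0):
    limit = max_sustainable_churn(mrr=price, payout=100.0, growth=0.15)
    print(f"${price:.0f}/month MRR: churn must stay below {limit:.1%}")
```

With a $20/month MRR the limit comes out at 8%, so a 10%/month cancellation rate sinks the program, while at $30/month the limit rises to 19.5%. That makes the cancellation rate a natural threshold to monitor.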