Why Data, Statistics and Numbers Can Make You do the Wrong Thing | Jason Cohen, WP Engine | BoS USA 2012

Jason Cohen, CEO of WP Engine, has been worried that too many people take A/B Testing too seriously, or at least take it seriously without understanding what it can do for their business. He is worried that not enough people know when to use AB Testing and when they should do something else.

Video & Transcript below

He makes an excellent case to focus on the things that make the biggest difference in a startup, before you have enough data to make data-driven decisions. He rightly points out the fallacy of focusing on small incremental improvements using AB Testing and the like without enough focus on getting things right.

“When you’re a small company you need to seek very large outcomes from any optimization because any subtle things like a 10% increase or whatever you literally can’t measure them”.

In this talk, Jason Cohen covers, amongst other things:

  • The mathematics prove the problem with AB Testing
  • Common AB Testing mistakes
  • 41 Shades of Blue – the Google debacle
  • What are the most important things for companies to think about optimizing?
  • Workshopping optimization for an affiliate programme to understand the key value drivers and therefore what you should focus on optimizing.

Video

Transcript

Jason Cohen: Hello, everybody. Yeah, if you were here last time you know why I have to start off with a picture of my kid. I was at a conference once and someone on Twitter said, ‘How come all these geeks are putting pictures of their kids in their presentations? Are they trying to prove they are not virgins?’ [Audience laughs] Well, because I’m a parent I’m also a student of Dr. Seuss. And Dr. Seuss accidentally summed up exactly what it’s like to be an entrepreneur, and how I feel about this conference, what it’s for and why I love it.

“At the fork of a road
In the Vale of Va-Vode
Five foot-weary salesmen have laid down their load.
All day they’ve raced round in the heat, at top speeds,
Unsuccessfully trying to sell Zizzer-Zoof Seeds
Which nobody wants because nobody needs.

“Tomorrow will come.
They’ll go back to their chore.
They’ll start on the road, Zizzer-Zoofing once more
But tonight they’ve forgotten their feet are so sore.
And that’s what the wonderful night time is for.”

Dr Seuss, Sleep Book.

That’s how I feel most of the time anyway, just crazed and exhausted trying to get people to buy stuff. But what I feel is that the Business of Software Conference is the time to stop running for just a little bit. Get your head out of email, and tickets, and the terminal, and SMS, and all that. And just relax for a little bit and get inspired, and look around: there are 350 of your peers here. That never happens. Everyone here is doing something.

That’s cool. I’ve made friends here that I’ve kept for years. I think the hallway talk here is often better than the speakers, and a lot of people, I think, agree. That’s what makes this conference so amazing, actually, so I implore you to use that. And from the speakers, of course, inspiration and ideas. I know I literally filled up one of these pages here with ideas for myself just listening to the nameless first speaker. And I hope to give you some ideas too.

And in particular, I think that for most of you the way you’re approaching metrics and optimization in your company is wrong. And the reason it’s wrong is because the tools you’re using tell you wrong things. And many of the things you read on the internet are wrong. But I have an education in statistics, and in the past fifteen years I’ve started four profitable tech startups. And in particular I’m running WP Engine, and in the last two years we’ve made millions of dollars based on my perspective on data and my technique of optimization. And so I want to show you all of those things too, so you can use exactly those things in your own companies as well. So, let’s just get started.

Three statisticians are in a forest hunting deer with bow and arrow. And they see one, and so the first statistician aims and shoots and misses left. And the second one aims and shoots and misses right, and the third one says: “We got it!” [Audience laughs] So it doesn’t work for that, but that exact technique actually does work in many other cases. There’s this neat study by a guy named Jack Treynor (and if you’re a Wall Street person you’ve probably heard of the Treynor Ratio, and that is the same guy). What he did in his university class was put a jar of jellybeans on the desk and ask people to estimate how many jellybeans there were. And they’d write their guesses down, pass the slips forward, and he’d look at them. And the first thing that you learn is that people are crap at guessing about jellybeans. In this particular real example the lowest guess was about 250 and the highest about 4,100. The real number of jellybeans in this particular example was about 1,100. And on average an individual guess was about 67% off; in other words, everyone was crap.

But here’s the interesting thing: if you averaged the numbers on all the slips of paper, the average was almost identical to the truth. In fact the average was only 3% off, better than all but one guess by a student. So it’s kind of the statistician thing. This phenomenon is called ‘The Wisdom of the Crowd’, and it’s neat, and it basically means that the errors cancel out in a group and you can get to the truth. And how powerful is this? You start thinking, wait a minute, so I can use UserVoice to try to select features for my product, and AdWords and AB testing to determine the best marketing messages for my product. I can use crowdsourcing to get design and marketing assistance and ideas and all this.
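The error-cancelling effect described here can be illustrated with a small simulation. This is only a sketch, not Treynor's data: the class size, the spread of the guesses and the random seed are all assumptions made up for illustration, and only the true count of 1,100 comes from the talk.

```python
import random
import statistics

TRUE_COUNT = 1100     # real number of jellybeans quoted in the talk
NUM_STUDENTS = 56     # hypothetical class size (not from the talk)
GUESS_SPREAD = 700    # hypothetical standard deviation of individual guesses


def simulate_guesses(rng):
    """Each guess is the truth plus independent noise, floored at one bean."""
    return [max(1.0, rng.gauss(TRUE_COUNT, GUESS_SPREAD))
            for _ in range(NUM_STUDENTS)]


def percent_off(estimate):
    """Absolute error of an estimate, as a percentage of the true count."""
    return abs(estimate - TRUE_COUNT) / TRUE_COUNT * 100


rng = random.Random(2012)  # fixed seed so the run is repeatable
guesses = simulate_guesses(rng)

typical_individual_error = statistics.mean(percent_off(g) for g in guesses)
crowd_error = percent_off(statistics.mean(guesses))

print(f"average individual guess is off by {typical_individual_error:.0f}%")
print(f"average of all guesses is off by   {crowd_error:.0f}%")
```

The averaged guess can never be further off than the average individual error, and when the individual errors point in different directions it is usually much closer.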

Like, oh my God, there are so many tools, especially today, to take advantage of this. Except that this doesn’t work all of the time. For example, in the UK they did this project over a year called ‘Laugh Lab’ where they tried to find the funniest joke in the English language. And so there were tens of thousands of jokes and millions of votes, and you want to hear what the funniest joke is? I’m going to read it just to make sure that I get it exactly right. This is the funniest joke.

“Two hunters are out in the woods when one of them collapses, and he doesn’t seem to be breathing and his eyes are glazed. The other guy whips out his phone and calls emergency services (you can tell that this is in the UK, by the way; in the US we’d just say 911, in the UK it’s ‘emergency services’. Good one. That’s not part of the joke). [Audience laughs] He gasps, ‘My friend is dead! I don’t know what to do!’ The operator says, ‘Calm down, I can help. Let’s make sure he is dead.’ There is a silence, and then a shot is heard. [Audience laughs] Back on the phone, the guy says, ‘OK! Now what?’”

It’s pretty good, it’s a pretty good joke. Best joke you’ve ever heard anybody? Top 10? Not even Top 10? It’s the funniest joke.

It’s not bad, but the Wisdom of the Crowd did not produce the funniest joke. And that’s interesting because it did produce the correct number of jellybeans. The way to understand the difference, to me, a good example would be ordering pizza for a crowd like this. I like pepperoni and olives, but then this person doesn’t like olives, and then there is a vegetarian so we can’t have pepperoni, and then someone is lactose intolerant so we can’t have cheese, so we have to get soy cheese, but someone else can’t eat soy cheese. So by the time we are done, it’s some kind of plain, horrible pizza that nobody likes, because it’s not even close to anybody’s favorite pizza. So when it comes to matters of style and perspective and creative work, we find the Wisdom of the Crowd does exactly the opposite. It cuts away the interesting edges, leaving you with something bad.

And so that is how I’d summarize this particular technique: when there is a single objective, correct answer, then this can be a useful tool for zeroing in on what that might be. But when it’s creative, when it’s style, when it’s perspective, then I feel like it cuts that away. And in fact it’s not just that; it actively finds a bad answer. It’s destructive. It’s worse than almost any other choice you can make when it’s something creative. And this is an important point, because a lot of times with metrics and optimization techniques and stuff like this, we say: well, I’ll just use whatever comes out of this technique. It may or may not be statistically significantly better, but it’s better than what I can think of, so I might as well pick it. And this is a good example of why ‘I might as well pick it’ can actually lead to a worse decision. It’s not good enough to say ‘I might as well pick it’; there has to be something systematic about it. In particular this is true of AB tests, where you often don’t see a particular difference. Screw it, I’ll pick the one I like. Or screw it, B is beating A by a little, so I’ll pick it. This is really destructive, and I want to get into AB testing more because obviously this is primarily the way we do many optimizations and metrics in our companies.

Like getting paid, conversions and all kinds of important things. So here is a real AB test done with Google Website Optimizer, which I think is no longer in existence. This comes from a company in Austin called SpareFoot, which I emphasize is a really smart company, extremely metrics driven; they constantly drive their revenues up and costs down, and they are really awesome at this. This is one of their AB tests, and it has all the good things about an AB test: it’s very clear that orange is beating blue consistently, with a nice thick margin there. It’s a lot of ‘n’: 26,000 is a lot of visits. So there’s no problem here with the number of visits, right? And Google says ‘85% chance to beat’, which is that particular tool’s way of saying there is an 85% chance that the difference is systematic and real and not just due to chance. Which sounds like a lot; 85% is a good bet.

Except this is not actually an AB test. This is an AA test: a test of a page literally against itself.

And yet I’m sure everyone in here was ready to pick orange and think that we had learned something about whatever the difference was. So it’s important now for me to explain exactly why this data is not conclusive, so that you can avoid being similarly fooled, not only by AB tests but in general by this kind of stuff. So I want to show you exactly why.

And I’m going to do it with a drug testing example. Suppose that I’m the athletic director of a race with 105 athletes, and I suspect some of them are juicing; some of them are taking steroids and that’s not cool. So I’m going to test them. The truth is that out of the 105 athletes, 5 of them are in fact juicing. I don’t know that, but this will help us understand what happens. And I have a test that is known to be 95% accurate. So what actually happens when we test the athletes? Well, first of all, of the 100 that are clean, the test will be accurate in about 95% of the cases. So 95 of them will show clean, and of course 5 of them will incorrectly show up as juicing, because it’s a 5% error. And then for the five that are juicing, it’ll probably correctly identify those, with none missed. Of course in reality there might be an extra one over here and one over there.

Now, we know this, but as the athletic director I don’t. This is all I know: 10 athletes have been identified as juicing, and my test is 95% accurate, which I trust. So throw ’em out. But with a test which is, quote-unquote, nominally 95% accurate, half of them are not juicing. Half of them. And the reason is that the thing I’m trying to detect is rare. When the thing I’m trying to detect is rare, the normal error from the vast majority overwhelms the real positives. That’s what happened here, and this example is incredibly relevant, because this is a conversion rate thing. Look at the numbers: 2%, 3%, 84%. In the juicing example it was a 5% thing, and that was sufficiently rare that being 95% sure was literally flipping a coin. And this is even rarer; this is 2 and 3%. So if 95% is flipping a coin at 5%, this is worse. You’re going to need 95%, maybe even more, because the thing you’re looking at, a conversion rate, is rare.
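The athlete arithmetic can be written out as a short calculation. This is a minimal sketch: it assumes the test is 95% accurate on both clean and juicing athletes, and the function name and the second, rarer scenario are made up for illustration.

```python
def share_of_flagged_who_are_guilty(population, actual_positives, accuracy):
    """Of everyone the test flags, what fraction is truly positive?

    Assumes the same accuracy on positive and negative cases.
    """
    actual_negatives = population - actual_positives
    true_positives = actual_positives * accuracy          # juicers correctly flagged
    false_positives = actual_negatives * (1 - accuracy)   # clean athletes wrongly flagged
    return true_positives / (true_positives + false_positives)


# The talk's example: 105 athletes, 5 juicing, a 95% accurate test.
athletes = share_of_flagged_who_are_guilty(105, 5, 0.95)
print(f"5 in 105 juicing: {athletes:.0%} of flagged athletes are really juicing")

# A rarer event, closer to a 2% conversion rate (hypothetical numbers).
rarer = share_of_flagged_who_are_guilty(1000, 20, 0.95)
print(f"20 in 1000:       {rarer:.0%} of flagged cases are real")
```

With the talk's numbers roughly half of the flagged athletes are clean, and the share of real positives drops further as the thing being detected gets rarer.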

And that’s fundamentally why this didn’t work out. The reason they even did this test is because for a year they had been doing all these AB tests and picking winners and proceeding, believing they were learning about messaging and improving their stuff. And then they looked back over the year and said, wait a minute, our conversion rates are the same now as at the beginning of the year, even after we’ve had all these successful tests. What the heck? [chuckles]

And so they started saying the testing software itself must be faulty. So to test the testing software they ran this test, and they still concluded that the testing software was faulty. And [chuckles] it’s not. As proof, they let this test continue to run. And oh, I’ll show you that in a second, and I’ll show you the test they ran as proof. First I want to show you how to do this correctly, since 85% is not enough. I wrote a blog post about this a year ago in which there was a hamster choosing.

We were seeing whether the hamster liked organic or non-organic food better. And it was the same thing: he chose one more than the other, but really there was no difference. So I made this easy and put the post at bitly.com/abhamster, and you can get the details and the formula and stuff. I’ll show you the formula right now, because I made a super-duper simple formula that is still statistically accurate; that’s at the bottom of the post if you care about such things. And so we will run it real quick to see how simple it is, so that you have zero excuse not to use this.

So first you compute N, the total number of conversions that happened, which in this case is 700 and change plus 600 and change; that is the A and B tests together, so that’s easy. And then you compute D, half the difference between those. Also very easy, right? And then this is it, see how simple it is: if D squared is bigger than N, it’s statistically significant (at 96%, by the way; that’s what makes the math really easy). And so in this case it isn’t, so it’s not significant, and that’s it. B is actually not significant, so it’s cool. They didn’t believe me, so they kept running the test, and sure enough, after a month of running the test, eventually the two lines converged like they should, because there isn’t a difference. But meanwhile they wasted a month, because they wouldn’t believe the hamster test.
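The rule just described fits in a few lines. This is a sketch under stated assumptions: the function name is made up, the conversion counts in the example are hypothetical rather than the figures from the slide, and the rule presumes both variants received about the same amount of traffic. Requiring D² > N is the same as requiring (A − B)² / (A + B) > 4, which is a one-degree-of-freedom chi-square check at a little over 95% confidence.

```python
def hamster_test(conversions_a, conversions_b):
    """Return True if the difference between A and B passes the D^2 > N rule.

    N is the total number of conversions across both variants.
    D is half the difference between the two conversion counts.
    Assumes both variants were shown to roughly the same number of visitors.
    """
    n = conversions_a + conversions_b
    d = abs(conversions_a - conversions_b) / 2
    return d * d > n


# Hypothetical counts, not the numbers from the talk's slide.
print(hamster_test(720, 660))  # N = 1380, D = 30, D^2 = 900: not significant
print(hamster_test(800, 600))  # N = 1400, D = 100, D^2 = 10000: significant
```

For rare events, the talk goes on to suggest asking for an even bigger gap between D² and N than this rule requires.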

But actually even the hamster test may not be a good enough test for something so rare; you may want D squared to be bigger than that, or run your own test. But as a counterexample, there’s another company in town called OtherInbox, and they did use the hamster test (not the whole time) because they read the article.

This is literally all the data for something like a year and a half of them doing AB split tests on a certain conversion page. The blue line is the A test, the normal, current known-good page, and the orange one is their B test. It’s interesting already, because look at how much natural variation there is in just the normal blue test. It’s a lot. In fact it’s more variation than you’d think there should be, which is the same point again, and there are all these points along the way where they might have thought B had won, but it hadn’t. They knew they weren’t learning and that they weren’t improving things, so they kept trying new things and radically different things to achieve a big difference. And they did: as you can see, they have this one huge increase where they increased the conversion rate by 50%, and they probably would not have found that had they thought they were making incremental progress along the way.

And by the way, this is a sidebar: it’s a landing page with a 60% conversion rate. This is incredibly badass if you want to… So here is what it looks like, and I’ll let you draw your own conclusions about whether you like it or not. I’m not going to make a big point about landing pages; I just thought that was interesting. So first of all, you’ve got to trust real statistics, not just the 85% that the tools report. Sound good?

Also, when you’re a small company you need to seek very large outcomes from any optimization you do, because any subtle things like a 10% increase or whatever, you literally can’t measure them. Look, they had 24,000 data points and still couldn’t measure a difference between the two. If you have fewer data points, obviously you can’t see small increases either; seeing smaller differences takes more data points, not fewer. Besides, if you’re small, who cares if you have a 10% increase? If you’re only getting 1 or 2 sign-ups a day and that goes up 15%, how would you know? And it doesn’t change anything. Can you quit your day job now? It fundamentally doesn’t change anything. The smaller the company, or the earlier its stage, the more the only valuable changes are big changes: double the conversion rate, or three times as much traffic in the first place for you to go optimize. Not only because you can’t tell the difference in an AB test for something more subtle, but because that is the only thing that’s going to change your life and the company. And it’s not until you get really big that you go after subtle things.
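To get a feel for how much traffic a subtle change needs, here is a rough sample-size estimate. It is not from the talk: it uses the standard normal-approximation formula for comparing two proportions, and the 2% baseline conversion rate, 95% confidence and 80% power are all assumptions chosen for illustration.

```python
import math
from statistics import NormalDist


def visitors_needed_per_variant(base_rate, relative_lift,
                                confidence=0.95, power=0.80):
    """Approximate visitors per variant needed to detect a relative lift.

    Standard two-proportion normal approximation; a planning estimate,
    not an exact figure.
    """
    p1 = base_rate
    p2 = base_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided test
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


subtle = visitors_needed_per_variant(0.02, 0.10)   # 2.0% -> 2.2%
doubled = visitors_needed_per_variant(0.02, 1.00)  # 2.0% -> 4.0%
print(f"10% lift:  about {subtle:,} visitors per variant")
print(f"doubling:  about {doubled:,} visitors per variant")
```

Under these assumptions a 10% lift needs tens of thousands of visitors per variant, while a doubling shows up with only a thousand or so, which is the argument for chasing big changes while traffic is small.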

Don’t get excited when 37 Signals says, well, we spitballed some headlines and they made a big difference. That’s fine eventually, when you have enough people going to the site and that kind of stuff; then it makes an impact on the business and it’s measurable. But at first I would shy away from all that and go for big stuff. However, even knowing all that, and even doing all of that correctly, it’s still not going to work right, and I’m going to show you exactly why.

Speaking of big companies with tons of data that are able to look at subtle things: Google has maybe some of the biggest sets of data that exist. Does anyone remember the 41 shades of blue controversy? Some people do? So what happened here is there is this designer named Doug Bowman. And he was Google’s chief designer, maybe chief designer architect, or whatever kind of title means that you get to do the design work for Google. Which is pretty sweet, actually, and he designed things like Gmail, the original Gmail. And while he was doing Gmail, his boss, who you’ve also probably heard of, Marissa Mayer, said: oh, you know what we need to do? Get people to click those ads in the corner more. Because who here clicked those ads while using Gmail? You know, I’ve never met anyone who did, never. So I guess that’s why they wanted people to click more; certainly a lot more page views, right? So they said, OK, here’s what we will do: we’ll test 41 different shades of blue for the link color for those ads and see which shade of blue causes people to click more. And Bowman said, oh, OK, and he quit. [Audience laughs]

The reason he quit as chief designer of Google is, he said: I get that you guys are data driven and all that, and I’m OK with that. But I’m a designer, and there are much more important things about design than trying to control the color there, even if the goal is to get more clicks. And I can’t and won’t operate in a company that is so driven this way. And so good for him. He left Google and went to Twitter. You know the new Twitter face? That’s Doug Bowman too, where he’s not driven that way. So I talked to Doug about this and I said, hey Doug, I’ve got something really cool to tell you. You were right, but not just because you’re an artist and you get to say things like that and it’s cool. You’re actually right mathematically too. He was totally uninterested in this argument. He was like, I don’t care. [Audience laughs]

But to be fair, he’s probably gotten so much crap and stuff that he’s probably sick of talking about it. But I want to show you guys why mathematically Google was wrong, because once again this directly impacts you when you run AB tests, so you can avoid these problems just like the other one. So suppose you did the test with just 2 shades of blue, and indeed one was the winner, with a 95% confidence level; 95% meaning we are 95% sure that this is a real thing and not a false positive. So there is only a 5% chance of a false positive, and that’s pretty good, so we are sure who the winner is, and rightly so. But if I run another test, so now I have two tests, either one could have a false positive. So the chance of a false positive goes up to 9%. And if I run 10 of these, how many false positives does that give me? There’s a 40% chance I get a false positive. And if there are 41 shades of blue, it is almost certain that one of them will show up as a false positive, which is exactly what happened. I mean, I don’t know if it really was a false positive, but probably. So this is always true: whenever you run a lot of tests there is always error. You are adding false positives not only within one test but across a series of tests too.

And the reason that I’ve told you all of this is because what is an AB test program in your business other than: test this window, it wasn’t significant; test this headline, no, it didn’t work; how about this headline; and then on the seventh try you’re like, WE FOUND ONE! Except how do you know it’s not 41 shades of blue? How do you know you’ve found one? Especially because usually it’s spitballing headlines, right? Let’s use this word, let’s use that word, let’s be more intentional, let’s use an exclamation mark. And then one works, and you’re like, shoot, I don’t know why the headline worked, but it does.
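The false-positive arithmetic behind the 41 shades of blue example can be checked directly. This sketch assumes every test is independent and is run at the same 95% confidence level; the function name is made up for illustration.

```python
def chance_of_any_false_positive(num_tests, confidence=0.95):
    """Probability that at least one of several independent tests
    reports a winner purely by chance."""
    return 1 - confidence ** num_tests


# 1, 2 and 10 tests as in the talk, 7 headline tries, and 41 shades of blue.
for tests in (1, 2, 7, 10, 41):
    chance = chance_of_any_false_positive(tests)
    print(f"{tests:>2} tests: {chance:.1%} chance of at least one false positive")
```

Two tests give just under 10%, ten tests give about 40%, and 41 tests come out close to 90%, so a long series of spitballed tests is very likely to produce at least one spurious winner.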

Even those awesome 37 Signals people say, we’re not actually sure why. Maybe in retrospect we come up with some theory, but I mean, we don’t really know why. It’s amazing! You should do it! Right? So there is a fix for this. It’s a mathematical fix, but it also makes your AB program more valuable to you anyway, and I’ll explain how. The idea is that you’re going to form theories about why you’re doing a particular test, instead of spitballing crap.

When you form a theory there are actually more valuable things that come out of it, and I’m going to explain a bit more now.

So let me give you some very concrete examples. One of them is: I think that at this point, when the customer gets to the pricing page, they are ready for a hard sell, so I should change the headline to, for example, ‘Buy Now’, and not ‘Look at all of our great pricing plans to pick from’. Or I want to funnel them into a certain action or a certain button, so I’d remove a certain sidebar, because I want to focus them. That’s an example of making a theory. Or maybe I think people who come from this traffic source may be interested in seeing ‘x’ next. People who have just searched for security might want to see a landing page that has a testimonial from a customer about security, or a big list of stuff that we do about security. Or I think that people from the UK would like a landing page where we use too many vowels in the word color.

Something like that. Depending on where they are, or why they came, maybe they’d like to see something; that’s a theory: I think _____ wants to see something next. Another theory would be: I think that at this point the person would like to chat with a human. And again, that’s a tricky one, right? On a homepage, if a chat box pops right up immediately, is that good or bad? It might piss people off, but it might engage more people.

That’s a good question. Or maybe not on the homepage, but this one (these are all interesting theories, not spitballs): I think that once they are on the pricing page, they are ready to talk to a friendly person who might encourage them to proceed, because they made it to the pricing page. Here’s another theory that my company diligently believed. I’ll tell you some of the results of this one, and this is a really common one: we think people would want to watch a movie on the homepage instead of just reading text. And the ones who do will be more engaged and understand the value of our products and benefits better, and therefore be more likely to buy, and certainly more likely to look at more pages and get more involved in our website. Good theory, right? So I’ll get a little deeper on that one since we did it. We put up a video and AB tested having a video, and almost no one watched the video. And those who did, did not buy more frequently.

And then we thought, well, we’ve got to get more people to look at the video, so we put a big play button on it. And that helped, and they did watch the video more, and still the people who watched the video were no more likely to buy. But what we did see is that they spent more time on the website. Except when we got data about how long they watched the movie, the extra time they spent on the site was exactly the length of the movie. So they did technically, I suppose, stick around to watch the movie, but it didn’t influence anything interesting afterwards.

And we tried other variations and whatnot, and we never got it to work. That doesn’t mean that it could never work, but it does give us counter-evidence against the idea that a movie will help people buy, in our particular case. But this is why having the theory is so powerful. First I formed the theory that people who watch the video are more likely to be interested, or they learn something they didn’t know before, or they’re more interested in buying, all this stuff. Since it’s invalid, what else does that mean? Are there assumptions we’ve been making around people’s engagement? About what they might have been seeing on other pages, and what gets them to buy? Or what’s important about us? Because I had a theory, and we were not just spitballing video or no video, it allows me to think more deeply and have other ideas of what might be better for the homepage.

By the way, we did come up with some other language for the homepage, with other theories about what people might want to see. And we got the bounce rate to 20% with just AdWords, which is pretty cool. In other words, because we had a theory, we knew we were invalidating something, and so we knew to come up with other theories. Now, on the pricing page, we came up with a theory about when it is a good time to start funneling people in. So suppose you had that theory and you funneled people with stronger headline language, you know, ‘Buy Now’, and suppose it worked. Again, how do you know it’s not 41 shades of blue? Again, you don’t quite know yet, but since you’ve started to validate a theory you can just go further with that theory. Well, if they want to be squeezed, let’s continue to squeeze them.

And if you’re right about the theory, then very rapidly you can start to make a lot of additional progress on that pricing page, because you’re taking that theory all the way in. And that’s way more valuable than spitballing, where you haven’t learned anything. In the land of startups all they talk about is learning. Well, here it is in practice, in a real way. You haven’t learned anything from spitballing, but now you have with this practice. And on the other hand, if the theory turns out to be wrong you’ll also find out, because you’ll try to do all this stuff and it won’t work. So that’s how you get around the math, but it’s also a much more valuable process for you to do anyway. One more slightly different example of the same thing is in Google Analytics. Another thing we all collectively do is log into Google Analytics and start making charts.

And, hey, people from Bulgaria seem to like this page; I don’t know why, but it seems kind of cool, maybe we should advertise in Bulgaria or something. And here are people who spend a lot of time on the site, which might mean they are interested, or that they are confused and can’t find the information they want. We see all this stuff and we start researching it, right? The thing to notice is that this is exactly the same thing again as the 41 shades of blue thing: you’re taking tons of data and simply mixing and matching, and in that way you will find things that are significant in one sense but spitballed and made up in another.

So that’s OK; you just treat that exploration as an idea generator, a theory generator, and not as facts. Hopefully many of the things are valuable and true, and you’ll find that out because you’ll go and make theories out of them and test them, instead of just accepting things as true. OK, so all that specific stuff is about the things that you do when optimizing, and looking at numbers and using numbers. But perhaps an even more important question is: which numbers should I be looking at?

What are the most important things for my company to be focusing on?

Should I be focusing on optimizing the growth rate to be as high as possible? What if cancellations increase too? Is that OK? Or should I be focusing on cancellations instead? If people stick around, that implies that my company is delivering a good product or service, so that’s valuable. Or maybe other things, such as conversion rates and so on. How do I know, for the company or for a particular product inside my company, which variables to spend my time and effort measuring and optimizing? I can’t optimize 17 things at once; I’m going to have to pick which ones are the most important. So I want to give you the technique that we use for pretty much everything at WP Engine, from the entire company all the way down to specific projects. I’m going to tell you exactly what we do.

And we’re going to workshop that against a specific example so you can see exactly how it works. The example is an affiliate program for a SaaS business. I imagine a lot of the companies here are SaaS businesses, and a lot of you could potentially use an affiliate program, so I realize this is directly relevant to a lot of you. The process is very easy and can be applied anywhere. First, these are the starting assumptions we are going to make about the affiliate program. Obviously, since the program hasn’t started yet, we’re making these up, but we are just trying to understand how the program works, so it’s OK if these numbers are made up as long as they are in the ballpark. We’ll assume that we make 20 sales per month through the affiliate program, so this is a small company and a small affiliate program. And we pay $100.00 to an affiliate when someone signs up, and that person has signed up for a plan that is $20.00 per month, so every sign-up represents $20.00 a month in recurring revenue. So the first thing you can see right away is that because I pay $100.00 up front and only get back $20.00 a month, it’s going to take me 5 months just to make up for the 100 bucks. It’s either four or five depending on when you charge for that first month; I think that’s a brain teaser for some interview questions, right?

So the first thing I do is make a very, very simple spreadsheet. You can call this a model or whatever, but it’s super, super duper simple. It’s just: this is how many sign-ups there are going to be, so this is how much I’m going to pay out. This is how much new monthly recurring revenue I got in that month, so this is all the recurring revenue I’ve built up so far by that month. Super simple. And of course you can graph some of this stuff, so here is me graphing exactly what you’ve seen on the first slide. You can see we are paying 2 grand a month, because it’s 20 sign-ups times a hundred bucks each. The revenue starts lower, because I’m gaining those customers over time, and then the revenue keeps adding up. And then there’s the total cash used, which is the two combined, that is, the payouts subtracted from the revenue, right? So I’m losing money at first, and then it starts to turn around at some point. And I can see, and this is just a useful point of reference, that right at about four months is where I actually start to make money on this damn thing. And I’m going to be four grand in the hole when that happens, so that’s kind of interesting to know.
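The spreadsheet described above can be sketched in a few lines of Python. This is an illustrative reconstruction, not the speaker’s actual spreadsheet: the function name `affiliate_model` and the timing convention (each customer is billed in the month they sign up) are assumptions made here.

```python
def affiliate_model(months=12, signups_per_month=20, payout=100.0, price=20.0):
    """Return one row per month: (month, payout total, revenue, cumulative cash).

    Assumption (not from the talk): customers are billed in the month they
    sign up, and nobody ever cancels.
    """
    rows = []
    customers = 0
    cumulative_cash = 0.0
    for month in range(1, months + 1):
        customers += signups_per_month            # everyone stays forever
        payout_total = signups_per_month * payout  # one-time affiliate payouts
        revenue = customers * price                # recurring revenue so far
        cumulative_cash += revenue - payout_total
        rows.append((month, payout_total, revenue, cumulative_cash))
    return rows


rows = affiliate_model()
for month, paid, revenue, cash in rows:
    print(f"month {month:2d}  paid out ${paid:,.0f}  "
          f"revenue ${revenue:,.0f}  cumulative cash ${cash:,.0f}")

# The most negative cumulative cash position is the "hole" you need to fund.
deepest_hole = min(cash for _, _, _, cash in rows)
print(f"deepest hole: ${-deepest_hole:,.0f}")
```

Under this billing assumption the cumulative cash bottoms out at four thousand dollars around month four to five, in line with the figures quoted in the talk. Billing a month later would shift the turnaround point.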

Maybe if you’re a small company it’s actually good to know that this is going to cost some real money, and maybe I should have it in my pocket by then. But supposing this works, maybe I can actually get more affiliates, and it’s not out of the question that you can grow it by a certain amount a month: the first month is 20, the next one it’s 23, the next month 26. That’s not crazy. So what happens when you add 15% monthly growth in affiliate count? It just takes longer to catch up, because I’m spending more money, and faster. In particular it’s going to cost me 6 grand now and take me 6 months to catch up. But already we are seeing an interesting effect, and this is why it’s so helpful. I only added 15% growth, and there is a 50% increase in how much I’m in the hole and how long it takes me to get it back. In other words, a small change in growth, just adding three or four more a month, has, at least to me, kind of changed the nature of all this. But so far nothing outrageous. The trouble is, this model is wrong.

It’s wrong because it assumes that we never, ever lose customers, and that’s just not true. Furthermore, the customers we get from affiliates are often lower quality and have a higher cancellation rate than normal customers. So let’s just say for the sake of argument that we have a 10% per month cancellation rate. Now what does it look like? Uh-oh, we’re dead. We never make money, because even a 10% cancellation rate is enough to kill the whole thing.

So now we have a metric that is really important. We’ve got our first clear thing: oh, this is something I need to track, and I probably need to optimize. And you start running through more things in your head, like, OK, I probably need to track it per affiliate, because of course some affiliates are good and some affiliates are crappy, and I’ll want to kill the crappy ones because obviously they aren’t profitable. OK, I know I’m going to use this to build stuff; we have an important metric. So let’s continue. And again, the name for all this stuff is sensitivity analysis, and there are models and tools you can play with if you want to. Of course you don’t need to; you can just make a spreadsheet just like this. So, next thing: we keep all of this, the growth and the cancellation rate, but now we say, you know, $20.00 a month isn’t enough. And now we see if there are things we can do to increase the amount we get from customers, all kinds of things. One of the most common things that a small business hears from an adviser or mentor is: raise your prices. Right, Patrick?

Yes, raise your prices. Double them or something, haha. There’s a company I was talking to at Capital Factory in Austin, which is sort of a combination of co-working and accelerator and all kinds of exciting stuff. I was talking to a company there, and I forget what their prices were, but I think it was $16.00 to do it. And I said, that’s really small, try doubling your prices and see what happens. And of course they weren’t in touch with me for a couple of weeks. Finally they did it, just doubled their prices, and they came back in a week and I said, what happened? And they said, well, we doubled our prices and the sign-ups were absolutely unchanged. No one said a word, nothing changed, we just doubled our prices. I said, no one complained? They said no one complained and nothing changed, except we are making more profit, about double the profit.

I said, good, what are you going to do next? They said, I don’t know, this is really cool, maybe we are going to buy some AdWords with all that money? [Audience laughs]

No! You just said no one complained, so double your prices again. [Audience laughs]

Do that until somebody complains about it, of course. Until the sign-ups go down a little, or make some tiers or something, right? OK, so the point is maybe you can just increase prices, maybe you can have different tiers, maybe you can reorganize what the tiers even are, maybe you can incentivize the affiliates, give them a little something extra or a coupon, to get people to sign up for a bigger tier. Lots and lots of ways. But the point is, let’s suppose you can add ten dollars of monthly recurring revenue through various techniques, optimization and testing and these things. If you focus on that, maybe you can raise the average by ten dollars.

So that’s a 50% price increase, from twenty dollars to thirty, so ten dollars is definitely not out of the question; you can most likely do that. By the way, our prices went from $19.35 to $1300.00 through a series of price increases. Of course we added features and stuff, it wasn’t just for nothing. But it’s not crazy at all to start marking up. At WP Engine, also, in January we changed our prices, what the tiers were and what was in the tiers, and lots of things, which of course we did through customer development.

So we started figuring out what we wanted to do there. And we doubled, doubled, the average amount of money a person gave us. And sign-ups went up, so that was especially nice. We did customer development projects for two months ahead of that to try to understand what we were going to do. Not for nothing, but still, the point is, imagine this company can get $10.00 more a month in recurring revenue, and all the rest stays the same. Then what happens? It’s amazing: we are right back to where we started, only 4k in the hole, and it only takes about 4 months to start paying it back. More monthly recurring revenue, even a little bit more, fixes everything. It’s awesome. You can’t do better, pretty much. So we have a new winner for most important metric: get more revenue per sign-up per month. And that in itself doesn’t sound terribly surprising. Maybe the surprise is how insane the effect is, how much more it matters than doubling the number of people or than cancellation. Those things might be much more surprising. And it might be different for different companies, with your numbers and different projects at your companies. For example, at our scale, which is quite different from and larger than your scale, our cancellation rate is much, much more important to us than it is in this model.

A 0.1% change in cancellation rate, because we have so many customers, has a real impact on the company. We can literally hire a couple more people on 0.1%. That’s crazy. That’s what I mean: it’s obvious, but it’s not necessarily obvious, and yet the model doesn’t have to be so complicated either. So we have a new winner for most important KPI, and now we get all cocky and we’re like, sweet, now that I have this all dialled in, and the numbers on here are bigger, we are also making more money on here.

So you get cocky and you’re like, I’ve got to get more affiliates here, I’ve got to throw gas on this fire. Of course affiliates are given offers all the time, and the bigger ones are given even bigger offers, so let’s pay them $150.00 a sign-up instead of $100.00, since we’re so much better at finding these guys. So you increase that by 50% also, and you’re back to dead again. For some reason going from 100 to 150 just blows the model apart again. It kind of surprised me that a one-time cost did that, but there you go.
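The sequence of what-ifs walked through so far (add growth, add cancellations, raise the price, raise the payout) can be replayed with a small script. This is a hedged sketch, not the model used in the talk: the function `simulate`, the 36-month horizon, and the convention that customers are billed in their sign-up month are all assumptions made for illustration, and the 15% growth is applied to monthly sign-ups.

```python
def simulate(growth, churn, price, payout, first_month_signups=20.0, months=36):
    """Return (deepest cumulative cash hole, first month with positive net cash).

    The second value is None if monthly net cash never turns positive within
    the horizon. Assumptions (not from the talk): billing starts in the
    sign-up month, churn applies to last month's customers, sign-ups are
    fractional rather than rounded.
    """
    signups = first_month_signups
    customers = 0.0
    cash = 0.0
    deepest_hole = 0.0
    first_profitable_month = None
    for month in range(1, months + 1):
        customers = customers * (1.0 - churn) + signups
        net = customers * price - signups * payout
        cash += net
        deepest_hole = min(deepest_hole, cash)
        if net > 0 and first_profitable_month is None:
            first_profitable_month = month
        signups *= 1.0 + growth
    return deepest_hole, first_profitable_month


# Each scenario changes one input relative to the previous one.
scenarios = [
    ("flat, no cancellations", dict(growth=0.0, churn=0.0, price=20.0, payout=100.0)),
    ("15% growth", dict(growth=0.15, churn=0.0, price=20.0, payout=100.0)),
    ("+ 10% cancellations", dict(growth=0.15, churn=0.10, price=20.0, payout=100.0)),
    ("+ $30 plan", dict(growth=0.15, churn=0.10, price=30.0, payout=100.0)),
    ("+ $150 payout", dict(growth=0.15, churn=0.10, price=30.0, payout=150.0)),
]

results = {}
for name, params in scenarios:
    hole, month = simulate(**params)
    results[name] = (hole, month)
    when = f"month {month}" if month is not None else "never within the horizon"
    print(f"{name:24s} deepest hole ${-hole:,.0f}, monthly net turns positive: {when}")
```

The exact dollar figures depend on timing and rounding conventions, so they need not match the slides to the dollar. The qualitative pattern is what matters: under these assumptions the 10% cancellation scenario and the $150 payout scenario never reach a month where revenue exceeds payouts, while the $30 plan does.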

So here’s how I take all that information and synthesize it into what we are actually going to do. First, monthly revenue, all rolled into one thing. There’s no way this is only beneficial to the affiliate program; surely this just makes my whole company a lot more money. So that’s going to be my number one priority, clearly. The second thing is I have to be careful about those payouts, because it’s hard to reduce payouts to affiliates after setting them up. No one wants to see that go down. So be careful: there is some line between 100 and 150 bucks, but I don’t even want to be close to that line, because again, it’s hard to change. And that’s another thing: not every metric is one where you are tracking it and optimizing it continuously.
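A rough way to see where that line between 100 and 150 bucks might sit, as a sketch under assumptions that are not stated in the talk: let monthly sign-ups $s_t$ grow by a fraction $g$ each month, let a fraction $c$ of active customers cancel each month, let each customer pay $p$ per month starting in the sign-up month, and let each sign-up cost a one-time payout $P$. The symbols are introduced here for illustration only. The number of active customers $A_t$ then follows

$$
A_t = (1-c)\,A_{t-1} + s_t, \qquad s_t = (1+g)\,s_{t-1},
$$

so the ratio $A_t / s_t$ starts at 1 and rises toward $\frac{1+g}{g+c}$ (for $g + c > 0$). Monthly revenue can eventually exceed monthly payouts only if

$$
p \cdot \frac{1+g}{g+c} > P .
$$

With $g = 0.15$ and $c = 0.10$ the factor is $1.15 / 0.25 = 4.6$. A $20 plan supports a payout of at most $92, which is why the $100 payout never pays back. A $30 plan supports up to $138, which covers $100 but not $150. Under these assumptions the line would sit near $138.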

With the pricing, because there is so much to be gained there, it is quite possible to do that continuously.

But with payouts, it’s not something you’re constantly changing by a dollar and optimizing by some small amount. It’s something I think of as a threshold: it’s important that you track it and know what that number is, but really what you think is, I’m going to set that thing and watch it, and make sure it doesn’t go out of the box. Otherwise I’m going to try to improve my business by optimizing those other metrics and so on.

So, an example: for most companies in our industry, there’s not a single person I’ve met with cancellation rates under 2%. It seems like a magical floor, where people are generally churning their projects and stuff. OK, so if our cancellation rate is 2.2%, I know there is not much more room there, and I suppose I could try to increase our conversion rate instead. But if that cancellation rate comes out of a certain box, like 2.5%, then all of a sudden it does become more and more destructive, and it’s not worth extra growth if I have a cancellation rate like that. So how I think of cancellation for us is as a threshold: as long as it stays in a box, I’m not trying to optimize the crap out of it, but if it suddenly came out of the box then it would become more important. So, thresholds. And in the affiliate model they are important, don’t forget, so that would be my number two thing I’d watch. My number one thing would be the money, the income, and then my number two would be my cancellations. And just a meta-observation about this: if I grow the affiliate program faster, whatever happens gets exacerbated. The bad gets exacerbated, and that’s fine, that’s just good to know. Therefore I’m going to make sure I start small, get these numbers right, and make sure they are working before I grow it, or else I’m going to blow myself away.

And this is one of those areas where, if you’re funded, you can afford to go all out right away, and if we lose money on that, that’s fine. Because we’ll trade that: if it is successful, we’ll get there faster and so forth. It might be a good trade if you have money to spend, and of course if you don’t have money to spend it’s a terrible trade, and you want to start slow and watch where you go.

One more example of this I’ll throw out there. I want to show you all the charts and stuff, because that kind of thing is, like, key. At WP Engine we have a large support staff; in fact more than half of our 30 employees are in support. And so it’s an interesting question how to make the most of support, because that is where most of the people are, and obviously people are the most expensive thing. You want to not make that bigger if you can help it, so we model it like crazy. We look at stuff like the number of tickets a new customer creates in the first 30 days, because that is much higher than what a regular customer creates. We track both of those separately, and then we track how many tickets a support person can do and still have time to do a good job. We want to not be underwater when someone goes on vacation or gets sick or something. So when does it break?

Unfortunately, earlier this year we didn’t have enough people and we did break, so we kind of know where that threshold is. Unfortunately, we know now where that is. So we can make a model of this, which is important because we hire a couple of people a month, 2 or 3 or so, and we need to hire ahead of demand. Otherwise you can get behind by 6 or even 10 people real fast, in just a couple of months, if you don’t know that very well. So in the scaling phase you often need that stuff so that you don’t get behind. You don’t need that stuff earlier on.

But again, the question is what should we optimize? Again, how many tickets a month do people do? Because if it’s more, then we don’t need that many people, that’s pretty obvious. Well, what’s interesting is if you twiddle that number, let’s suppose they can do 50 to 100 tickets a day, it basically doesn’t change the hiring plan. You still need to hire about 1 or 2 people a month; it basically doesn’t materially impact stuff. So you shouldn’t go to someone and say, hey, maybe you can knock off a few more tickets a day. Besides being a crappy work environment, it also turns out to be not beneficial for the business, though I thought it would have been. It turns out that the number of tickets our existing customers create per month is the interesting one, and our hiring plan changes a bit depending on whether it’s .1 or .5 per month. In fact our hiring plan changes a lot, and it also changes with the growth rate of the company, because new customers create so many tickets. That too is more important than how many tickets a human being handles per month, and that leads to more thoughts, like, oh, maybe for the new customers I can have a knowledge base I can send them to. That’ll hopefully make certain tickets close faster, and hopefully educate them enough to go back there next time. In other words, sometimes it produces totally different ideas for what we should be doing, instead of flogging people in hopes of them doing more tickets per day, which we should NOT do. And fortunately, I think, for all of us, because that would be a crappy culture anyway.

So again, the high-level strategy is: find 1-2 key metrics that you’re actually going to obsess and watch over, and maybe a few more that are thresholds, that you’re not going to obsess over but will watch. You’re going to use something like a simple Excel model and play with the numbers to determine what they should be. And the final thing, which I haven’t mentioned before, is that as you get real data, as you actually run the affiliate program or whatever, of course you want to put that data back into the model and rethink your model. Our model I rethink every month. We put in new numbers: are they trending that way or are they heading somewhere else, are they still sustainable, and all that kind of stuff. Of course putting real data back into the model is infinitely better than keeping what you had in there in the first place. And then you continue to do the same analysis and make sure you’re doing the thing that is indeed the most important thing.

Don’t trust the tool that says 85%, primarily. Or if it is 85%, that’s fine, but now you know it may not be enough, so you’ll probably have to make a model about that. Especially when you’re small, but even with data sets of 24,000, or even with data as big as we have, you’re always trying to seek large results. And maybe a test is showing a 5% difference. Well, who even cares? It’s probably wrong, it’s probably not true, and it probably doesn’t matter anyway. So it’s just not useful to seek little things. And mostly, when you read something online, or you see a tool, or someone tells you something you’ve just heard regurgitated by a bunch of people who probably never really ran a company and who probably don’t know much about statistics, you can ignore that and know that you’re doing the right thing. Thanks. [Audience applauds]

