How to make sense of big data.
At BoS Europe 2015, Vince draws on 20 years of experience solving complex data problems. As founder of Eurobios, Head of Data Analytics at Ocado and, most recently, Chief Scientist at King.com (Candy Crush etc.), Vince has built and runs a team of over 100 data scientists managing over 160 Hadoop Nodes – Big Data by any measure.
He talks about how you can use data to turn knowledge into real, practical benefits to businesses, that can be implemented next week, not next year.
Vince Darley, King Games: So it’s very nice for me to be back in Cambridge, I’ve hardly returned since my undergraduate years, far too long ago. So it brings back lots of good memories.
What I want to do today is take you through a little bit about our approach at King to the data we get, how we try and make sense of it, and some of the really kind of interesting challenges and problems with that.
So maybe just to set the scene, bring you into our world a little bit, quick show of hands, who’s played Candy Crush Saga? About three quarters of you, that’s great, thank you. So Candy Crush Saga is our biggest and most famous game, there’s a whole host of others on all the app stores these days.
We actually have a pretty long history. The company was set up 12 years ago, back in 2003, and for the first eight or nine years of its existence we were a gaming portal on the internet, where people would come and play our games; at one point we were in the Alexa Top 100 sites. And that portal still exists today, but it’s now a rounding error on our business, though it’s still important for a few reasons that I might get into. But we transitioned the business when Facebook’s platform exploded and became a real kind of gaming platform in its own right. We started to move some of our games from our own portal, where they were played very competitively, and we changed the way in which they were played and moved them across onto Facebook, and people played them in a much more casual, more collaborative way, still with some small elements of competition. And that, after a couple of tries with Bubble Saga, was our first game there, which started to do amazingly well, and then we released something called Bubble Witch Saga, which did ten times better and we were really excited, and then not long after that we released Candy Crush Saga on Facebook and that did ten times better still, and then when we moved that across to mobile, at the end of 2012, that really just took over the world.
So across all of these games, we’ve now got a network of 364 million monthly players, so it’s an incredible scale, with just under 160 million people playing each day. And just to give you also a little bit of a sense, the games industry is just a very, very interesting industry from a financial point of view as well: depending exactly who you ask, it’s bigger than the movie industry and the music industry combined in terms of annual revenues, so crazy, crazy scale. And more and more of that is moving away from the traditional consoles that people have in their homes towards games that are on mobile and tablet. And so we are a small part of that.
Also what’s really interesting, this is some data from a report released by BCG just a few months back.
We seem to hold the record for reaching 100 million people in the shortest possible period of time. So one year, three months it took Candy Crush Saga to get to 100 million people, and I’m sure someone’ll beat this soon enough, so we won’t have the record for very long.
So you see there’s a massive transformation happening in all this area, and so it’s been very exciting for me to be a part of it. And the particular bit that I love is the, in some sense, the kind of scientific search for truth in the data, trying to understand what’s going on with our players and our games, what can we learn from that, what should we change in the game, so it’s really, really exciting.
Let me just back up for a moment and give you a sense of how we approach these things, cause you know, I love data. I spend all my time dealing with data, but that only gives you half of the picture, and it’s really, really important not to forget the other half. So, just to get a three dimensional view of the world you need two eyes, to have binocular vision, the approach that we have at King, is you know, we need the data and the metrics, and all those sorts of things, and we look at retention and engagement and monetisation and conversion and virality and all these sorts of things, very data driven perspective, but at the same time we need to look more at the arts perspective and understand peoples’ behaviours and their motivations and frustrations and fun and challenge and all of those sorts of things, because those are the things that really drive people to play the games and to enjoy them and to stick with them for a long time. And so, the view we have of data is that data is the way in which we can bridge that scientific world of metrics and that human world of behaviours and bring them together in a deeper understanding.
So, I’d like to take you through a first simple example. This is Candy Crush level 65. It’s fairly infamous level, if you Google Candy Crush 65 you get a few million hits, there’s lots of people writing about it. I’d just like to read you a short quote from one of our fan pages by a lady who is playing on this level for quite some time. So she writes, “After two months on 65, I finally cleared them all. So excited, I screamed at my hubby to come see. Then it said I failed. I never noticed besides clearing the board you needed at least 150 thousand points, too. Since the game ended, I’ve no idea how close I was to that total. I cried myself to sleep. It’s been another week and I’m still not off level 65. Ugh, so frustrating.” So, people feel pretty strongly about these things. Other fans expressed their feelings a little bit differently.
So, it turns out that Candy Crush level 65 was, and still is, one of the hard levels in the game.
And the way that we measure level difficulty is simply in the pass rate, or the average number of attempts that players make on a level before they get through or eventually give up on the game. And so an obvious question is, what is the average number of attempts on this level? So that turns out to be around 130. And that’s an average, and you know, one of the lessons, which I think we picked up a bit on yesterday in Des’s talk and which I’ll certainly dive into more today, is that averages don’t necessarily tell you that much. And so if you imagine there’s a distribution around that average, and these distributions tend to be quite long-tailed, that means there’s very large numbers of people making 300, 400 attempts on a single level, which is very impressive dedication.
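To get a rough feel for how long that tail can be, here is a minimal sketch. It assumes, purely for illustration, that every attempt passes independently with the same probability and that nobody quits, so the number of attempts follows a geometric distribution whose mean is one over the pass rate. The talk does not describe the real distribution, and the function name is hypothetical.

```python
def share_needing_more_than(attempts: int, mean_attempts: float) -> float:
    """Share of players still stuck after `attempts` tries.

    Assumption (not from the talk): a constant, independent per-attempt
    pass probability p = 1 / mean_attempts, and no player gives up.
    """
    p = 1.0 / mean_attempts
    return (1.0 - p) ** attempts


# With an average of about 130 attempts, how many players are still
# on the level after 130, 300 and 400 tries under this simple model?
for n in (130, 300, 400):
    print(n, round(share_needing_more_than(n, 130.0), 3))
```

Even under this simple model, roughly one player in ten would still be on the level after 300 attempts, which is why the average alone says so little.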
So the other obvious metric, when we dived into the data a bit and looked at this particular level, was the churn rate: how many customers are we actually losing on this level? So that turned out to be 50%. So 50% of the people who got to level 65 and started playing on it eventually just gave up the game on that level. So that was alarmingly high. So the obvious question here was, well, should we make this level easier? And from the numbers I’ve given you so far, it seems obvious we should make it easier. However, there’s also a financial picture in there, and it turned out that this level was a very strong driver of revenue within the game, and a strong driver of conversion, which for us is people spending money for the very first time. So it wasn’t quite so obvious.
And so the way we approached things is to run experiments, A/B tests, in the game: we give different versions of this level to different players for a short period of time, and observe what happens. And sure enough, it was pretty clear from that that indeed the correct thing to do was to make it easier. And the primary reason for that was the long-term effect. Of course, if you make it easier, the revenues from this particular level will drop. However, with people sticking around into all those levels beyond level 65, and in fact Candy Crush is now more than 1000 levels, the revenues and other benefits of people progressing later into the game far outweigh whatever reduction in revenue there is on this particular level.
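As a sketch of how two variants in such an experiment might be compared, the snippet below runs a standard two-proportion z-test on a long-term retention count. The talk does not say which statistical test King used, so the choice of test is an assumption, and the player counts are invented for illustration.

```python
import math


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Return (z, two-sided p-value) for the difference in two proportions."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    # Pooled proportion under the null hypothesis of no difference.
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical numbers: players still active some weeks after reaching the level.
# Group A saw the original hard level, group B saw the easier version.
z, p = two_proportion_z_test(success_a=5000, n_a=10000, success_b=5400, n_b=10000)
print(f"z = {z:.2f}, p = {p:.2g}")
```

The important design choice is the metric rather than the test: measuring retention well after the level, not revenue on the level itself, is what captures the long-term effect described above.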
And so, it’s really these five areas of metrics that we care about here, so it’s worthwhile considering this level 65 example from this five-pointed perspective. So from a retention point of view, obviously if we make it easier, more people are going to get through. There’s a question mark as to whether they’re going to continue into the future of the game as well as those people who were getting through when it was hard, and you often see surprising behaviours emerging when you change quite simple things in the game. But it turns out retention is dramatically improved if we make it a bit easier.
Engagement is an interesting one because actually, as you saw with that lady on our fan page, for many players, they get super engaged dealing with these difficult challenges in the game and then when they finally get past them, it’s super exciting and that really commits them to the future of the game. So there’s a danger if you make things a bit too easy, a bit too kind of blah: there’s no challenge, and people’s engagement levels drop over time. But it turned out in this case that, you know, it’s not like we took it from here to here, we just reduced it a little bit, and that had very little effect on engagement.
Monetisation, as I said, is obviously worse in the short term, but the long term is what we care about and that’s better. Conversion, again, is ambiguous: you’re gonna lose out on short-term conversion but it turns out the long-term conversion is actually pretty healthy. And virality is a very interesting area as well, where, obviously, with these sorts of hard levels, as I mentioned, there’s millions of hits on Google, people writing about this level. Virality in the short term might be impacted, but in the long run having more people active in our network of players is far more valuable from a social interaction point of view. So most of these metrics were very positive.
So that’s a short example of how we approach these sorts of things. And I want to now dive into a bit more detail, in various angles of this, and maybe try and connect it a bit with some of the world of software that some of you might be dealing with.
So what I’d like to talk about now is the early experience in the game. So a big problem with software of all kinds is how to teach people how to use it. People don’t like manuals, tutorials kind of suck, how does one get past this problem? And we have the same problem with our games, it’s not immediately obvious when you pick up a game how you’re gonna play. The games get progressively more, there’s more and more features get added as you progress, and so those need to be somehow introduced in a sensible way. And we have no tutorials of any kind in our game, so the idea is we need to design them so that you learn how to play the game as you go along. And there’s obviously a question as to how well people do learn how to play the game, and I’ll come to that later on.
So this particular example now is when we released, when we designed in play testing the sister title to Candy Crush, so Candy Crush Soda Saga, that we released late last year. So for all of last year we were doing lots and lots of work on that game. And obviously there’s a whole exciting story about how do you build something that is kind of familiar to existing Candy Crush Saga players but is sufficiently different that they’re gonna enjoy playing it and view it as a different experience. There’s lots of good stuff there. But once we cracked that, which took quite some time, then it’s all about the levels. And in particular, the sorts of early levels.
Here’s the first 60 levels of that game. We tend to view the first 20ish levels, give or take, as the on-boarding experience. That’s something that an engaged player might get through in their first day of playing, most people will take a bit longer than that. The question is how, at what stage should you introduce complications in there, so that they understand the barriers they gotta get through, the strategies they need to develop, in order to learn the game. So we experiment lots and lots and lots with this. And so we have in this particular case, we had a couple of game designers who were focusing on level design and we had a couple of data scientists. Those four people worked together very, very closely, over a six month period, designing these levels, iterating, testing, rebuilding new sequences of the first 20, 30 levels, testing them on players, and just lots and lots of iteration.
And that was all proceeding pretty well, and then quite late on in the process, someone had the bright idea of maybe testing some more difficult things. So there’s a question of all of our games have kind of spikes in difficulty, level 65 that we just saw is one such spike, but the question is, how much should those be spikes and how much should the general difficulty evolve over time. And so the particular on-boarding sequence had a few such spikes but the general difficulty was actually pretty low, although comparable with most of our games. And so someone had the bright idea of, let’s test some things that are harder. And so that team of four people created some new level sequences with some extra difficulties in them, and put them live for various populations of a small player base, and started to see, not too surprisingly, that the short-term retention, so the number we tend to look at there is second day retention, so people who installed the game today, how many of them come back tomorrow, that started to go down. That’s kind of a bad sign. All these people who want to play a game are not coming back to play it.
However, we kept this test running, and what we saw actually was that when you looked at one week, two week retention, those numbers were actually higher for these difficult, more difficult progressions. And that’s kind of surprising. So what it meant was that when the game was easier, more people got through to, levels two, three, four, five, six, seven, eight, nine, ten, and came back the next day, they were kind of having fun, but there wasn’t enough depth to the game, and they lost interest within the next week or two, or at least many of them did. Whereas by making the game harder, we frightened off more people in those first few days, but actually, a week or two down the line, there was a much larger, more engaged player base, quite significantly larger. And so that’s one of those weird, kind of like, counter-intuitive things.
What is your audience?
Our audience is those people that are going to be playing this game for weeks and months and years on end, and we shouldn’t mistake those people for people who’re going to play it just for a few days. Obviously it’s very hard to distinguish between them, but we need to design the game in a way that is suitable for those people who are going to be devoting many hours to the game over many months. And so that’s something to think about in the software industry, is make sure that your on-boarding, your tutorial and everything else, whatever it may be, is very suited to the people who you’re hoping to stick with the game, or will stick with your software over the long-run, and don’t get a distorted view of what other people might happen to be using your software, because they’ll give you just the wrong picture.
So, my next topic, I want to digress for a moment and just talk about the data for a little bit, and how we handle the data. So we get fairly small amounts of data from lots and lots of people all over the world, and for those of you who care about systems and things, this is maybe a case lesson in how to evolve one’s systems over time. So just to set the scene, this is time, this is whatever four or so years ago, four and a half years ago, this is the beginning of this year, and this is how much data we pull into our data warehouse every day.
And it’s now very close to a terabyte of compressed data every day from all the people playing our games. So it’s quite large volumes.
Now there’s obviously a few things to point out here. I mean, this is still quite large amounts of data in here, I should say. At this stage we used a tool called Qlikview, which is not a data warehouse at all, it’s a business intelligence reporting tool, but we abused it because it was nice for reporting purposes, and actually it could handle quite a lot of data, so we were like, oh, we don’t need a data warehouse, we’ll just shove all the data in there, and that worked pretty well. And then at one point it was just bulging at the seams and giving up the ghost.
So we were like, alright, ok, let’s try this other thing, an Infobright database. It was quite a nice thing, it needed a slightly bigger server to run it on, but that worked pretty well for a short period of time. And then around here obviously things started to really take off. A lot of things happened here: one of course is that we launched Candy Crush around that time, and the other is that I joined King around that time. So I’ll let you work out the correlation causation conundrum there. So we started to experiment with Hadoop, put in a small Hadoop cluster, which actually worked very well for us, a very cost-efficient way to store very large amounts of data, and we’ve been expanding that ever since, as you can see. In fact, this week or next week we’re going to be up to about 250 nodes in that cluster.
And that’s working very well, however, another thing that’s worth pointing out is that there really isn’t a one size fits all in this space. There’s a need to organise the data in good ways. It’s very usual that the data you pull in from customers is not structured in a super fantastic way for actually understanding what’s going on. So this is the raw data. It’s a bit of a mess. And so we have a large data warehouse team who organise the data and structure it and tame it, rather like these escaping animals need taming. And the key to that is giving a nice structure to the data which is aligned with the business questions you’re trying to ask. When we try and ask the interesting business questions that we want to ask about, say, level 65 or whatever, of this raw data, we have to write incredibly complicated SQL queries that are very hard to understand, easy to make mistakes in, and take forever to run. Really cumbersome process.
Whereas if you structure your data in a way that makes sense for the business questions that you’re trying to ask, which in this case might be structuring it around a player’s progression through the game, at what stage did they attempt what level, and whether they progressed, succeeded, failed, that kind of stuff, then you can write very simple queries to get the insights that you really need. And so that effort needed, to go from raw data to structured, nice, well-organised data, is massively worth it.
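To make the point about structure concrete, here is a small sketch using an in-memory SQLite database. The table name `level_attempts`, its columns and the sample rows are all hypothetical; the talk does not describe King’s actual schema. The idea is simply that once there is one row per player attempt on a level, the pass rate question becomes a short query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical progression-oriented table: one row per attempt on a level.
conn.execute(
    """
    CREATE TABLE level_attempts (
        player_id  INTEGER,
        level      INTEGER,
        attempt_no INTEGER,
        passed     INTEGER  -- 1 if the attempt cleared the level, else 0
    )
    """
)

# Invented sample data for illustration only.
rows = [
    (1, 65, 1, 0), (1, 65, 2, 0), (1, 65, 3, 1),
    (2, 65, 1, 0), (2, 65, 2, 1),
    (3, 65, 1, 0),
    (1, 66, 1, 1),
    (2, 66, 1, 0), (2, 66, 2, 1),
]
conn.executemany("INSERT INTO level_attempts VALUES (?, ?, ?, ?)", rows)

# Pass rate per level: passes divided by attempts.
query = """
SELECT level,
       COUNT(*)                        AS attempts,
       SUM(passed)                     AS passes,
       1.0 * SUM(passed) / COUNT(*)    AS pass_rate
FROM level_attempts
GROUP BY level
ORDER BY level
"""
results = conn.execute(query).fetchall()
for row in results:
    print(row)
```

Answering the same question from raw event logs would typically mean reconstructing attempts from individual events first, which is the kind of complicated query the structured table avoids.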
I’ve already mentioned the amount of data that we pull in each day. The total data in our data warehouse is two and a half petabytes now, and the actual number of events we get from the players is around 14 billion. And the biggest tables in our database are now more than a trillion records. We recently hit the all-time milestone of more than a trillion levels played in our games by players world-wide, and so that obviously equates more or less directly to having a large table with a trillion or so rows in it.
So that causes some difficulties, and the key point that I think is worth making is that what we found over time is that we need different tools for different purposes. You know, for us, Hadoop is great for storing lots of data very cheaply. It’s not all that great for manipulating the data and organising it in a nice structured way. So we actually use a separate system, EXASOL, a fast analytics database. It’s much more expensive to store lots of data in it, so we don’t store lots, but it’s far more speedy at processing the data. So we move a bit of data across into EXASOL, process it very fast, and put the data back in Hadoop, and that works very well for us. And we also have other tools in the stack: we use Kafka and Cassandra for a certain amount of real-time processing of data.
And another thing to think about, perhaps not when you’re a small start-up, but at some stage, particularly now that we’re a public company, is that you have very different uses for data. Your finance side of the business wants totally governed, reliable, accurate stuff, and that’s great for them, but that puts so many complications and procedures and processes around what you do that you need something else for the people who actually want to just get insight from the data. And you maybe need something else for people who need a real-time view of the data, whether it’s people or systems; in our case we have various systems that need a much more real-time view. And so what I think is a good lesson to learn is you’re not going to be able to put one thing in and have it serve all of these different types of requirements in a good way. And so we’ve learned over time that we need a variety of systems in there, and we’re continuing to explore new ones.
I want to move to the next, let’s move away from how we store the data and now back to some of the insights we get from it. And I think one of the key things that we’ve learned, which in some sense is obvious, but games are about fun. And that means all those wonderful metrics I had up there earlier of retention and engagement and conversion and monetisation and virality, those are all great, they’re things that we measure in various ways, report on every day. I get emails on my phone each morning with how our metrics are doing, but those things don’t measure fun.
And fun kind of trumps everything.
And so to give you an example of this, sometimes you have an opportunity to run experiments due to just happenstance. And we had an opportunity of that kind, because the code base for Candy Crush on the Facebook platform, which is all in Flash, is a different code base to the one used for the mobile apps on iOS and Android, which are largely written in C++. So there’s two different code bases. As is usual when you write things two times, there are subtle differences in how things happen. In our case, there were subtle differences in the mechanics of how the candies fell down and under what circumstances they might turn into one of these nice power-up candies. And we noticed in the metrics that certain levels were harder or easier in one version of the game or the other. And eventually we decided that it would be worthwhile just digging into the numbers, and it looked like the mechanics on the mobile side, which produced a bit more of these things, were the ones where the metrics looked better.
And so we ran an experiment of making some small changes to the Flash, to the Facebook platform game, to bring it a bit closer to the mobile one. And so that meant that more of these sorts of candies just happened to appear when stuff fell down, just kind of by chance, really, and the result was that levels became easier. Because you just get more of these things which helped you with your levels. Retention improved, which was interesting. And more surprising still, actually monetisation went up, as well. And so you might think from what I was saying earlier about level 65, if you make, if the game gets easier, surely monetisation gets worse. Well, the end result really was that actually, having more of these sorts of things in the game just made the game more fun. And lots and lots of people are spending money in the games, not because they’re stuck at some really hard thing, there’s some blocker, it’s just they’re having fun. They’re playing the game for however many minutes it is every day, and when you spend a bit of money you get more good stuff happening in the game, and so they were just enjoying themselves more.
So even though many of the metrics looked like things should get worse from a financial perspective, actually they got better. And so the fun of the game actually trumps all of those metrics, but of course that’s hard to plan in advance. But it’s a really important thing, we’ve encountered that in a few other situations, as well. That really, if you can just find a way to add something extra to the game, that you shouldn’t worry too much about what it does to your metrics. If it’s something that really just makes the whole experience better, that’s gonna trump everything else.
So, we started talking about level difficulty
So there’s a nice question that we’ve wrestled with for a while and have finally come up with an answer to, which is, how hard is Candy Crush? How much skill is required to play the game, and our other games, too? We’ve learned some really valuable lessons from this. So on the one hand you’ve got roulette here, that’s obviously basically a skill-free game, and then on the right-hand side here you’ve got chess, no luck involved at all, deterministic, strategic game. And, well Candy Crush presumably somewhere in between. I guess you could argue that it’s entirely luck, could argue there’s a bit of skill in there, and the question is how much? Is it closer to there or closer to there? And what can we as a business actually do, learn from that, and do to make this or other games better?
So the obvious thing to look at to try and begin to delve into how hard is this game is the pass rate that I talked about earlier. So here’s an example of a pass rate through time. This is a fairly hard level actually in a different game, from Farm Heroes. As you can see, the pass rate is just below 1%, so the average number of attempts is more than 100 on this level. Not quite as bad as that Candy Crush example, but it’s a very hard level. And the first thing you can see is, well, this is actually going down over time. Seems a bit weird, surely the pass rate should be pretty constant. Anyone care to hazard a guess as to why pass rate might go down over time? People are trying more and more? And why would that make a pass rate lower, though? But why aren’t they succeeding more quickly, why is the success rate somehow getting worse? Different profiles? Yup, alright, so we’ve got lots of different ingredients there.
So one key, there’s actually several explanations for this which I’m going to dive into. One key thing is, it turns out, that the profile of players changed over time, so late adopters are less skilful at the game. And this is actually really very important in many ways. If I just go back to that example of those early levels in Candy Crush Soda Saga, we put so much effort in and actually made it harder. One of the reasons why it was useful to make that harder was because actually the kinds of players who were going into that game were obviously already familiar with Candy Crush, the majority of them, and so they were, they already had a certain amount of expertise in how to play that kind of game. And so they could start at a kind of higher baseline of difficulty. However, that again needed to be diminished over time, so we started with the levels harder cause that’s what players wanted, and then over time we’ve made those levels a little bit easier because the later players into the game are less skilful, and if you give them really hard barriers, there’s just no way they’re going to stick with the game at all.
So this is one part to the story, but this doesn’t explain the whole thing about that previous graph. So the other thing is that this, a problem with, what this shows us obviously is that looking at the population as a whole is a bad idea. If there’s a profile change through time, then this thing, which is just looking at my whole population, what’s the pass rate on this particular day. That’s a blend of all sorts of people who install the game at all sorts of moments in time, and that blend is not a stunningly helpful thing.
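A small sketch of why the blended number drifts even when nothing about the level changes: below, two groups of players each have a perfectly constant pass rate, and only the mix of who is attempting the level shifts over time. The two rates and the shares are invented for illustration, and the function name is hypothetical.

```python
def blended_pass_rate(share_late, early_rate=0.012, late_rate=0.008):
    """Whole-population pass rate when `share_late` of attempts come from
    late adopters. Both per-group rates are invented, constant values."""
    return (1 - share_late) * early_rate + share_late * late_rate


# As late adopters make up more of the attempts, the blended rate falls,
# although neither group's own pass rate has changed.
for share in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"late-adopter share {share:.2f}: {blended_pass_rate(share):.4f}")
```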
So now let’s remove that blend from the picture and just focus on one cohort. So we look at a group of people who installed the game in a particular week, or whatever, and we look at them at a particular level, and now we suddenly see two trends. Something different is happening. So again, this is the pass rate, which, I should say, you’d basically expect to be pretty flat, all else being equal. But we see this pass rate improving after a few attempts and then there’s a general decline. And this is for the same cohort, so the profile of players is hopefully more constant. But clearly something’s happening that we don’t understand.
It turns out that the first up slope is actually fairly easy to explain, this one is that each level has its particular trick, or knack, or strategy that you need to do to crack that level, and when you first try it, it’s not clear to you, oh should I try and do this first or that first, how do I get what is the strategy? And so it takes you a few attempts before you actually learn what the right approach is to dealing with that level, and so this is just a learning curve. And that’s actually part of what makes the games really fun. If the games have lots and lots of learning curves, people love learning, and so we can use that knowledge in our games to look at, hey what levels have good learning curves, what ones have like no learning curve at all? Maybe we should change those ones, people are probably not enjoying them so much?
But we still haven’t explained this down slope here. So that turns out to be the fact that even though we picked one group of players, there’s still diversity amongst that group in their skill in the game. And so what happens here is, here’s our whole population of players, and as they make their attempts, the more skilled players succeed at the level, therefore their data doesn’t show up here any more, and you’re left with a population of ever less and less skilful players. So obviously the measurement that you make of their pass rate is going to be worse. So the people who are still there, plugging away after 50 or 100 attempts, are less skilled than the ones who were there beforehand. So what you need to do is actually slice and dice your population more.
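This filtering effect can be reproduced with a few lines of arithmetic. The sketch below assumes a cohort made of two skill groups, each with its own constant per-attempt pass probability; the group sizes and probabilities are invented, and the learning curve and players quitting are ignored. It computes the pass rate you would measure at each attempt number among the players who are still on the level.

```python
def observed_pass_rate(attempt_no, groups):
    """Pass rate measured at a given attempt number.

    groups: list of (initial_share, per_attempt_pass_probability).
    Players who already passed have left, so each group's remaining share
    shrinks by a factor (1 - p) per attempt.
    """
    remaining = [share * (1 - p) ** (attempt_no - 1) for share, p in groups]
    passing = [r * p for r, (_, p) in zip(remaining, groups)]
    return sum(passing) / sum(remaining)


# Hypothetical cohort: half higher-skilled (3% per attempt), half lower-skilled (1%).
cohort = [(0.5, 0.03), (0.5, 0.01)]
for k in (1, 25, 50, 100, 200):
    print(k, round(observed_pass_rate(k, cohort), 4))
```

The measured rate starts at the cohort average and slides towards the lower-skilled group’s rate, even though no individual player’s chances have changed. Calling the same function with a single group returns a flat line, which is what segmenting by skill looks like.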
So let’s slice them by skill. So if we do that, now we’ve segmented our players by skill, here’s the low-skilled players. Their pass rate is pretty constant through time after a short learning curve. Take these ones here, these are medium skilled players, again, pretty constant through time, and the higher skilled players here. And you can see there’s actually a huge difference, so there’s a factor of two or three difference in the number of attempts that a very skilled player will make versus the less skilled ones. So lots and lots of good stuff in there, and I suppose that takes us back to one of the lessons that we need to learn from this, which is that looking at populations as a whole and looking at averages is mostly pretty unhelpful. You need to drill down to the cohort and segment and understand your players. I think it’s the point that Des made yesterday morning, that the users of your software will spend most of their time on some particular feature. They don’t spread their time across all the features, their time is concentrated, there’s always massive imbalances, and so you need to look separately at different populations of people and what they’re doing in your software, in your game. And that’s super important. And then the other is just that there’s an enormous diversity there of skill and we have to decide, well, what do we want to do about that? And there’s some things that we’d like to do and others that we feel wouldn’t be appreciated by the players and that we shouldn’t do.
So that brings me to this general point about diversity in general, and this huge diversity, whether it’s in a massive population of players like we have or even much smaller ones. In all of the businesses I’ve been through, the amount of variation is absolutely huge in all of the customer bases that I’ve looked at and it’s just so easy to draw the wrong conclusions unless you dig into understanding those different players, different types of individuals and segmenting. And so we spend a lot of time doing that. And then all of this shifts through time as well, which means it’s just a constant battle to revisit it.
And a good example of that, actually, another thing that came out of all of the Candy Crush Soda Saga play-testing and then launch was that during the play-test we, our typical thing that we’ll do is we’ll cross-promote a few players from our existing games to a game that is in play-test, to get a few people into it, it’s a very easy way, you get a few thousand people into a game very quickly and at no cost. And we did that, but, that self-selected group of individuals who chose to click on this thing, hey, new game from King, do you want to install it, that self-selected group of individuals were not average in any way at all. And so, and that’s, we know these effects, but in the Soda case it was just totally extreme. And so all of the metrics that we observed in the early play-test were just kind of off the charts. Fortunately, they were so far off the charts that we realised we were just kind of fooling ourselves and we needed to get some other group of players into the game so we could actually measure how a more normal group of players would actually, their kind of retention and engagement and monetisation. And so actually what we had to do was we just did some normal paid digital advertising in a couple of markets just to get people who are the normal kind of reluctant people who are, occasionally will click on an advert eventually, and so you get, a much more normal population into the games, and that gave us figures that were much more helpful in understanding how well is this doing compared to our expectations compared to our other titles and so on. And so those kind of biases are just huge.
So the last topic I wanted to dive into quickly before opening things up for questions again comes back to the data and how you make sense of it, and it’s a little bit more of a technical thing, but we run lots of experiments in our game, lots of AB tests, that’s really the primary way in which you understand something new.
So here’s an AB test. Is the red pill or the blue pill better? So here’s our population of players, we’ll slice them up, A and B, 50% on the left get experience A, 50% on the right get experience B, and now we measure lots of stuff. And let’s say we measure that oh, the revenue per install is 4% higher on the right-hand side. That sounds fantastic, experience B must be much better, this is great. However, let’s say we dig in a bit, let’s see where’s that extra 4% coming from? And then we notice that there are these two people coming from population B. And then we think, well, hey, maybe that 4% isn’t that good, actually, since there are none of those people on the left-hand side. Our numbers, certainly in the games industry and also in many of the other things I’ve been involved in, are like this: 80% of your revenue comes from 20% of the players, all this stuff. Every metric we look at is massively skewed. And so it’s quite easy, even with player bases of our crazy hundreds of millions of people, it’s quite easy for you to get the wrong conclusion because your samples are biased. And so you have to find out a way to deal with that.
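The "two people" scenario above can be reproduced with a small, entirely made-up data set. The sketch below is one possible diagnostic (dropping the top few spenders from each arm and recomputing the lift); the talk does not say how King handles this, so the helper names and the trimming approach are assumptions for illustration only.

```python
def revenue_per_install(revenues):
    """Average revenue per player in one arm of the test."""
    return sum(revenues) / len(revenues)


def lift(control, variant):
    """Relative difference in revenue per install, variant versus control."""
    return revenue_per_install(variant) / revenue_per_install(control) - 1


def drop_top_spenders(revenues, k):
    """Return the arm with its k largest spenders removed."""
    return sorted(revenues)[: len(revenues) - k]


# Hypothetical arms of 10,000 players each: most spend nothing, 500 spend 5.
# Arm B additionally happens to contain two players who spent 50 each.
group_a = [0] * 9500 + [5] * 500
group_b = [0] * 9498 + [5] * 500 + [50] * 2

raw_lift = lift(group_a, group_b)
trimmed_lift = lift(drop_top_spenders(group_a, 2), drop_top_spenders(group_b, 2))

print(f"raw lift: {raw_lift:.2%}")
print(f"lift without top 2 spenders per arm: {trimmed_lift:.2%}")
```

In this toy data the headline lift is 4%, but once the two largest spenders are removed from each arm it falls to roughly 0.4%, which suggests the difference comes from two individuals rather than from the experience itself.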
And I won’t really go into detail of how we do that, but I think that an obvious thing that’s worth remembering is to understand what your underlying distributions look like. And there’s two typical kinds of cases. So this case in the middle with all the solid lines is typical, it’s not quite a normal distribution, but it’s pretty close to a bell curve. And all of your different definitions of an average, a mode, a median, a mean, they’re all pretty similar, and it actually makes sense to talk about an average. Some significant fraction of my population is within a small margin of error of the average. So when I talk about the average, I’m actually talking about lots and lots of people, so that’s something I can reason about well.
Now if I look at this very skewed example here in the dashed line, the mode, the median, the mean are miles apart, and whichever one of them I happen to decide I want to use, when I look at some error bound around it and reason about that, I’m talking about some teeny population of players, so any average you look at is not representative of your population as a whole. And so if you start to make conclusions from that average, you are very likely to make conclusions which are not correct for the population as a whole, and you’ll make the wrong decisions. So that’s an inevitable problem of almost all real-world data with real people. And you just gotta find ways to identify it and then solve it. And I think at that point I will wrap up. Yes, thank you.
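The contrast between the two kinds of distribution can be checked on toy data with Python's standard `statistics` module. The two samples below are invented for illustration; `summarise` and its `tolerance` parameter are hypothetical helpers, not anything from the talk.

```python
import statistics


def summarise(values, tolerance):
    """Report the three averages and the share of values within tolerance of the mean."""
    mean = statistics.mean(values)
    near_mean = sum(1 for v in values if abs(v - mean) <= tolerance)
    return {
        "mode": statistics.mode(values),
        "median": statistics.median(values),
        "mean": mean,
        "share_near_mean": near_mean / len(values),
    }


# Roughly bell-shaped sample: the three averages coincide.
bell_shaped = [1, 2, 2, 3, 3, 3, 4, 4, 5]

# Heavily skewed sample (think spend per player): one value dominates the mean.
skewed = [0, 0, 0, 0, 0, 0, 1, 1, 2, 96]

print(summarise(bell_shaped, tolerance=1))
print(summarise(skewed, tolerance=1))
```

In the bell-shaped sample most values sit within one unit of the mean, so the average describes many players. In the skewed sample the mean is 10 while the median and mode are 0, and nobody at all is within one unit of the mean.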
Audience Question: I’m just wondering if you can tell us a little bit about your approach to collecting data. Do you just collect all the things or, what have, how have you decided how you’re actually going to instrument?
Vince Darley, King Games: So we actually collect a surprisingly small amount of data from the players. The vast majority of our data is people just playing levels, so it’s this player with you know this ID on this device at this time-stamp attempted level 65 and they succeeded, they failed, they got this many points, this is what happened in the level. So it’s those single line items that make up the vast majority of our data. We obviously collect data on the transactions that happen, and also the transaction success and failure and all that stuff. We collect a bit on the social interactions that happen within the game, through Facebook or through other platforms, and that’s pretty much it. So there’s more that we might like to gather, and in fact, there’s more we have access to, at least for players who Facebook connect in our games, and through Facebook’s APIs we can grab other stuff, but at least so far we’re kind of busy enough just dealing with that. But I’m sure over time there’ll be gaps that we’ll want to try to fill in in one way or another, to learn even more about how we should be treating players better.
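A level-attempt record of the kind described above might look something like the following sketch. The field names, types and the `pass_rate` helper are all hypothetical; the talk only says that each item carries a player ID, a device, a time-stamp, the level, the outcome and the points.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class LevelAttempt:
    """One line item: a single attempt at a single level (hypothetical schema)."""
    player_id: str
    device_id: str
    timestamp: datetime
    level: int
    succeeded: bool
    score: int


def pass_rate(events, level):
    """Share of attempts at a level that succeeded, or None if there were none."""
    attempts = [e for e in events if e.level == level]
    if not attempts:
        return None
    return sum(e.succeeded for e in attempts) / len(attempts)


now = datetime(2015, 6, 1, 12, 0, tzinfo=timezone.utc)
events = [
    LevelAttempt("player-1", "device-a", now, 65, False, 12000),
    LevelAttempt("player-1", "device-a", now, 65, True, 48000),
    LevelAttempt("player-2", "device-b", now, 66, False, 9000),
]

print(pass_rate(events, 65))
```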
Audience Question: Hi, that was really, really interesting. Does a guy that’s skinny and does ultra-marathons play Candy Crush?
Vince Darley, King Games: I do, I’m actually up, beyond level 1000, so. It’s a great game, it’s an astonishing thing how it’s taken over the world. So it’s a great privilege to be part of.
Audience Question: So you talked about what you measure and the importance of fun. How do you quantify fun, I mean, how do you measure that? Is it a function of engagement or how’re you looking at that to kind of maximise it?
Vince Darley, King Games: So there’s, we have put a little bit of effort from time to time into trying to see can we pull together different metrics and just kind of come up with one number that is the “fun,” and we failed to do that. So it’s a mixture of things there’s a bit of retention, there’s a bit of engagement, there’s a bit of how much do people, are people willing to, how many attempts are people willing to put in a level before they start dropping out in large numbers. There’s lots of different individual things, but unfortunately there is no “fun” metric that we’ve managed to put our hands on yet.
Audience Question: Ok so have you tested changing the game for the individual player based on the spending behaviour?
Vince Darley, King Games: We made a decision not to do that. And so there’s lots of things that we will change in the game, and obviously for experimentation purposes we’ll change all sorts of stuff to try and understand, but actually we view that having, say, a level, the difficulty of a level be something that is dynamically adjusted for different players, particularly based upon their spend would just not be acceptable to the players, and so we shouldn’t do it. And you know, it’s the age of the internet, people will discover anything that you might do, so we should make sure that we do stuff that people will actually think is good.
Audience Question: I’m afraid my question was similar. My son unfortunately has spent probably 10,000 hours in another game, but when it came with a new release, the challenge was just not high enough, and he said he had passed the levels in but a couple of hours, and he was no longer considering ever playing the game because it hadn’t adapted to his skills. But you mention here that you want to keep it democratic so that everyone has the same fair chance, what about these players that actually get bored?
Vince Darley, King Games: Yeah, that’s a really good question. So we, well today we don’t do anything about that, but we certainly have lots of ideas for stuff that we might want to do so maybe in those first 10, 20, 30 levels you should identify, here’s a player who’s just finding it too easy and you know, I should, if I don’t want to change the game itself, maybe I should just give him an opportunity to jump ahead to where there are harder levels, that kind of thing. So there’s different ways we feel we could try and deal with that, but yeah today we haven’t tackled that problem yet. But it’s genuinely a real problem.
Audience Question: Hi, I was interested in your inference process. So you have your, for example, going to the number of attempt curve, you have the beginning and then you have, it slopes down and you can actually see, maybe infer, that it is a sample issue causing variance. Is that how you get there or you just try to imagine why that could be?
Vince Darley, King Games: In that specific case, I wasn’t the person very directly involved in it so I’m not sure how that inference process worked. But you’re right, yeah, you can see by the variance that there’s clearly some sampling issue on the extreme there. And I suppose the other thing is we’ve got a general mind-set that a lot of things are well-explained by having a diverse population, and so that when we see trends, those trends are very often just the sum of a number of distinct populations which are themselves fairly fixed. So it was probably a bit of both, but I don’t actually know the specifics.
Audience Question: I was wondering if, when you do your AB testing, whether people detect that they are being experimented on and whether you get any pushback from that?
Vince Darley, King Games: So on occasion, yes people do. I mean mostly the experiments run for fairly short periods of time, just a few weeks, that kind of thing, so there isn’t too much likelihood of people discovering. And certainly there are plenty of people who would like us to change our game in various ways. So there’s lots of forums out there, some of which we host, some of which are just out on the internet, of people saying hey, this level’s no good, change it please, I mean all this stuff. And there’s a general acceptance that the game you know, should change over time, however, one of the things that we’ve been doing quite a lot more of this year, which is a very interesting learning experience which I haven’t touched on, is having special events in the games. So hey, come this weekend and play, and if you can match 1000 red candies we’ll give you some special prize.
And so we’ve been experimenting with these sorts of things to give players not just the kind of progression goal, you know just keep going through the levels, but to give them something else, which is kind of aligned with progression but is slightly different. And so we’re learning a lot about what kinds of those events work and don’t work. But part of that, obviously, we need to run experiments, and those things are a bit more visible. There’s a special event, you know sometimes we might want to support it with information on our fan page and stuff like that. So there’s a fuzzy line to tread there. And I’m sure that some people are noticing that they’ve not been part of this weekend’s thing that their friend has, and I suppose what we’d like to do is just, once we understand what kinds of things really work is make sure that all of our players are experiencing and offered some of those kinds of events but perhaps not all of them, and that should maybe solve that problem.
Audience Question: Since you’ve taken over the world, I wondered if there was any countries that are particularly resistant, and, to your sweetness, and also any that just love it more than everyone else?
Vince Darley, King Games: You know it’s pretty universal, actually. The difficulties that there are in some markets are more to do with the platform fragmentation than anything else. So China, from an Android perspective, is complicated for everyone because there isn’t one big Google Play Android store, there’s 200 separate stores. And so it’s more of a go-to-market strategy that might need to be different in different countries depending upon what platforms exist there. But the actual game, we’ve found, is pretty universal. So yeah, we do have discussions, should we, you know obviously the game is localised from just a language point of view for most countries in the world, but should we go beyond that, and actually tailor some aspect of the game? And so far we’ve not seen a great need for that.
Audience Question: So a question about monetisation. Do you have a rough idea, I mean a good idea, probably, about your most profitable segments? You know, where does most of the revenue comes from, I’m assuming it’s a 20-80 rule.
Vince Darley, King Games: So there’s obviously a broad swathe of players in all freemium games, who don’t spend any money, and then there’s the group that are spending some money, and within that there’s a great, great variation. And both, obviously within the players but also the kinds of games they like and where they are in those games. So I wouldn’t say there’s a particular sweet spot. Even at the very end of these games, last time we looked, the players who were in the last episode of Candy Crush, 75% of them had never spent any money, so there’s no particular narrow thing we need to aim for. It’s a bit more complicated.
Audience Question: Are you mining the data set for correlations and are you just looking for correlations there, and if so how do you avoid the problem of obscure correlations in such a massive data set?
Vince Darley, King Games: So we, we’re mostly not mining the data set for correlations, we kind of tend to trigger everything. I suppose there’s too many big important business questions that we have that need answers and so those things drive all the analysis that we do. And so there’s very, yeah I’d say there’s almost none just mining the data set.
I suppose the closest thing I can think of is when we do lots of experiments on the levels in the games, and so at any given time, I mean it varies, it has its peaks and troughs, but at any given time there might be half a dozen or a dozen or so, sometimes even more, levels in the game that we have experiments running on to change their difficulty. And then you look at all the metrics that come out of that and you want to make some decisions. You know, which one of these should I change, which should I not? And obviously if you’re running lots of experiments in parallel, you’re gonna expect some of those to show as significant, you know using a 0.05 p-value, if you’re running 20 experiments, one of them is going to come out true just by randomness.
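The arithmetic behind the "one in 20" remark can be written out as a short sketch. It assumes the conventional 0.05 significance threshold, that the experiments are independent, and that none of the changes has any real effect; the function names are hypothetical.

```python
def expected_false_positives(num_tests, alpha):
    """Expected number of tests that look significant by chance alone."""
    return num_tests * alpha


def prob_at_least_one_false_positive(num_tests, alpha):
    """Chance that at least one of the independent null tests looks significant."""
    return 1 - (1 - alpha) ** num_tests


# Assumed setup: 20 parallel experiments, each judged at a 0.05 threshold.
print(expected_false_positives(20, 0.05))
print(round(prob_at_least_one_false_positive(20, 0.05), 3))
```

Under those assumptions you would expect about one spurious "winner" among 20 experiments, and the chance of seeing at least one is roughly 64%.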
So the approach we’ve adopted there actually is just not to worry about that too much, and to take risk-adjusted decisions. So rather than looking at a binary outcome, we say how confident are we that this change has made this level better, and if that confidence is sufficiently high, then we’ll just take that decision, and if we’re making 20 of those decisions, and a few of them turn out to be wrong, then that’s totally fine.
On aggregate, the overall situation will be better. Cause the only other thing you can do is run your experiments for longer and longer and longer, and we prefer to spend that time on other stuff. So this approach, of taking a kind of risk-adjusted view of the outcome of an experiment, instead of a binary outcome, is actually a very helpful way of doing something repetitive. Obviously sometimes we’re doing an experiment where you absolutely do need a black or white answer, and then you need to be very, very careful.
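One way to read "risk-adjusted" here is as an expected-value rule: ship a change when the confidence-weighted gain outweighs the confidence-weighted loss. The sketch below is only an interpretation under that assumption; the talk does not give King's actual decision rule, and the function names, confidence level and payoff units are invented.

```python
def expected_value(p_better, gain_if_better, loss_if_worse):
    """Expected payoff of shipping a change we are p_better confident in."""
    return p_better * gain_if_better - (1 - p_better) * loss_if_worse


def should_ship(p_better, gain_if_better, loss_if_worse):
    """Ship when the expected payoff is positive."""
    return expected_value(p_better, gain_if_better, loss_if_worse) > 0


# Hypothetical portfolio: 20 level tweaks, each 80% likely to be an improvement,
# where being right gains one unit and being wrong loses one unit.
num_decisions = 20
confidence = 0.8
expected_wrong = num_decisions * (1 - confidence)
expected_net = num_decisions * expected_value(confidence, 1.0, 1.0)

print(round(expected_wrong, 1), round(expected_net, 1))
print(should_ship(0.8, 1.0, 1.0), should_ship(0.8, 1.0, 10.0))
```

With these made-up numbers about four of the 20 decisions are expected to be wrong, yet the portfolio still comes out well ahead. The same 80% confidence stops being enough when a mistake costs ten times what a success gains, which corresponds to the black or white case that needs much more care.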
Audience Question: And you’ve got so much data, I mean I have to run my test in weeks, I guess you do yours in 10 minutes?
Vince Darley, King Games: Well you know, surprisingly not, in many cases. So we, I mean it depends what you’re, what outcome you’re trying to change. If your goal is to modify say a click-through rate on some pop-up, then yes, you could run that super fast. But most of the time, our goal in these experiments is some longer-term goal. And in that case, you’ve got little choice but to run the experiment at least for quite, for a few weeks. So, yeah, so it’s surprising, even with the data we’ve got, that actually sometimes our data seems too small to us.
Mark Littlewood: So I’ve got a question. In the 60s, people said, some of the finest minds on the planet were focused on putting a man on the moon. Some of the finest minds on the planet now are optimising fun in Candy Crush, you’ve got 80 data scientists working with you. People can walk out of computer science or maths, go into all sorts of ad-tech businesses, so they’re getting people to click on pop-ups. How do we turn that round so that we’re solving great problems, not revenue-generating problems, for corporations?
Vince Darley, King Games: The big question of the day. So I think that there are loads of big complicated problems, which are, which cut across lots and lots of industries in this whole big data space. And there’s lots of companies out there trying to tackle those in different ways. They’re obviously all kind of biased by their particular perspective. Google’s very search focused. Facebook has its hoards of very diverse data, trying to understand something very, very different. But I think those, I see a lot of really interesting stuff emerging from the boundaries of what happens there. And it’s good to see that a lot of these big companies are actually quite open about some of the things that they do. Perhaps only a few years after the fact, but you know, there’s a lot of this deep learning stuff that’s going on and that Google and others have dived into. It’s a surprising amount that’s actually publicly available from that. So in that sense there’s I think a lot of these big problems are being tackled.
Audience Question: With the five main KPIs, to what extent were those actually in tension, cause obviously, so retention and conversion earlier were directly in competition, but to what extent are those five actually in tension across that graph?
Vince Darley, King Games: There’s certainly quite a lot, and I would say those aren’t even really, those are five main KPI areas, but as we saw, retention is not one number, there’s second day retention and two week retention were actually in conflict. So, there’s lots and lots of tension between those. I suppose if you were to boil it all down to a, I guess the thing that King cares about, if you want to boil all that together is, super long-term customer lifetime value across all of our games. But that’s kind of hard to measure. And, but sometimes we try and do that, I mean so the effect, the value of a player in a game today, we try, do try to quantify what’s the potential future value of that player in other games. How likely are they, we measure how likely are they to install one more game, two more games, over time, and what revenue stream might we get from that in the future, how long will they be around. And everything that we’ve observed really tells us that that long-term stuff is super, super important, and you’ve gotta take account of it when you make your decisions.
Audience Question: So do you have any, do you guys track anything on your demographics or have different segments, like, commuters versus weekend players or Americans versus Brazilians, or anything like that?
Vince Darley, King Games: So, we do, I mean certainly we look at different countries separately from time to time, and we do see some differences between them. We do try and obviously understand the player, we don’t really have the data within our games, except on the Facebook platform, to give us very much guidance on the demographics, but we do a fair amount of external surveying and things like that to understand what kinds of people are playing the games, male, female, what age range, all those kinds of things. So we do make good use of that, but it’s less kind of raw data from the raw data upwards, and it’s more conceptual stuff that we do to just understand the world out there and our players. You know, and how much of their time are they spending in our games versus someone else’s games and what’s the opportunity there. So there’s lots of stuff there that we do try and understand.
Mark Littlewood: Cool. Vince, thank you very much, indeed.
Vince Darley, King Games: Thank you.