How to Measure Whales

"$10.00 LTV am I right?"
“$10.00 LTV am I right?”

You’ve soft launched your game, done a UA push, and a string of hope appears. Against all odds, a dominant cohorted ARPU curve emerges! Is this an anomaly, or have you caught a whale?

The first way to examine this is to perform cointegration tests between the cohorted ARPU curves, testing for statistical significance. It may be true that the difference between the curves is real, but that doesn’t tell you whether you’ve caught a whale.

In 1905, Max O. Lorenz developed a method for measuring the relative inequality of a wealth distribution, known as the Lorenz curve.

Just keep saying what % of the population owns what % of the wealth and it’ll make sense.

The F2P application is to define wealth as revenue (either at the daily or the whole-game level) and the player base as the population. By measuring how far inward a cohort’s Lorenz curve bends relative to other cohorts’ curves, we can measure the ‘whali-ness’™ of different cohorts. Even better, this reduces to a single metric: the Gini coefficient. A Gini coefficient of zero indicates a perfectly equal distribution of income: 10% of the population owns 10% of the wealth, 20% of the population owns 20% of the wealth, and so on. A Gini coefficient of 1 is the exact opposite: a single person owns 100% of the wealth.
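As a minimal sketch (plain Python/NumPy, with a made-up cohort of per-player revenues), the Lorenz curve and its Gini coefficient can be computed directly from sorted cumulative revenue shares:

```python
import numpy as np

def gini(revenues):
    """Gini coefficient of a cohort's per-player revenue.

    0 means everyone spends the same; values near 1 mean a single player
    accounts for nearly all of the revenue.
    """
    r = np.sort(np.asarray(revenues, dtype=float))
    n = r.size
    if n == 0 or r.sum() == 0:
        return 0.0
    # Lorenz curve: cumulative share of revenue held by the "poorest" i players
    cum_share = np.cumsum(r) / r.sum()
    # Gini = 1 - 2 * area under the Lorenz curve (trapezoid approximation)
    lorenz_area = (cum_share.sum() - cum_share[-1] / 2.0) / n
    return 1.0 - 2.0 * lorenz_area

# Hypothetical cohort: 90 non-spenders, a handful of dolphins, one whale
cohort = [0] * 90 + [1, 1, 2, 2, 5, 5, 10, 20, 50, 400]
print(round(gini(cohort), 2))  # high value -> whale-driven cohort
```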

This translates to what % of players are responsible for what % of the revenue. Measuring Gini coefficients across games rather than cohorts gives more insight into how a particular game monetizes: whether it’s whale-, dolphin-, or minnow-driven.

Actionable insights might include how effective introducing ads could be. A high Gini coefficient (very few players are responsible for most of the revenue) might mean there’s a more fertile base of non-spending players to monetize with ads.

The main insight, however, is further understanding. It’s clear that success can come about in drastically different ways in free-to-play games, and the Gini coefficient is a simple way to measure that.

Get more life out of your Lifetime Value Model! A discussion of methods.


Predicting the average cumulative spending behavior, or Lifetime Value (LTV), of players is incredibly valuable. Being able to do so helps determine how much to spend on User Acquisition (UA). If a cohort of players has an LTV of $1.90 and took $1 to acquire, then we’ve made money! This also helps evaluate how effective particular advertising channels are, since we’d expect different cohorts of players to have different values. Someone acquired via Facebook may be worth more than someone acquired via Adcolony.

But wait there’s more!

My argument in this post is that LTV has a great deal of value outside of marketing. In fact, the parts of an LTV model might be more valuable than the whole. Predicting LTV can be done with numerous approaches, and each approach has its own benefits. Remember, there doesn’t have to be just one LTV model!

Consider four requirements we’d want out of an LTV model:

1. Accuracy

The LTV predicted should be the LTV realized. Figuring out the upward or downward bias in your coefficients is important here. This gives insight into the maximum or the minimum to spend on UA, depending on the direction you suspect your coefficients are biased towards.1

2. Portability

Creating models is labor intensive, and even more so when doing it for multiple games. There is a class of LTV models that sweeps this aside: Pareto/Negative Binomial Distribution (Pareto/NBD) models. Since they’re based only on the number of transactions and transaction recency, they don’t require game-specific information. This means you can apply them anywhere!

3. Interpretability

This one’s big and perhaps the most overlooked. Consider the Linear * Survival Analysis approach to LTV. The first part is predicting when a particular player will churn. By including variables like rank, frustration rate (attempts on a particular level), or social engagement, we gain insight into what’s retaining players. This type of information is incredibly valuable.

4. Scalability

If the game is F2P then there are going to be hundreds of thousands to millions of players (you hope). I’ve seen some LTV approaches that would take eons to apply to a player pool of this size; our LTV model should scale easily.

So how do the different approaches stack against one another?

Model                          Accuracy   Portability   Interpretability   Scalability
Pareto/NBD2                    /          x                                x
ARPDAU * Retention3                       x                                x
Linear * Survival Analysis4    x                         x                 x
Wooga + Excel5                                           x
Hazard Model6                  x                         x                 x

Pareto/NBD is great, but it’s hard to incorporate a spend feature (it just predicts the number of transactions).7 A small standard deviation in transaction value gives this model a great deal of value and something to benchmark against, since expected transactions times the average transaction value becomes a reasonable LTV. This model also makes sense if data science labor is few and far between.
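As a rough sketch of that idea, the snippet below uses the open-source lifetimes package to fit a Pareto/NBD model on transaction frequency and recency, then multiplies the expected number of future transactions by a flat average transaction value (the file name, column names, and horizon are hypothetical):

```python
import pandas as pd
from lifetimes import ParetoNBDFitter

# Hypothetical per-player summary: frequency (# of repeat transactions),
# recency (player age at last transaction), T (total player age in days),
# and avg_txn_value (that player's average transaction value).
summary = pd.read_csv("player_rfm_summary.csv")

pnbd = ParetoNBDFitter(penalizer_coef=0.01)
pnbd.fit(summary["frequency"], summary["recency"], summary["T"])

# Expected number of transactions over the next 180 days, per player
horizon_days = 180
expected_txns = pnbd.conditional_expected_number_of_purchases_up_to_time(
    horizon_days, summary["frequency"], summary["recency"], summary["T"]
)

# With a small standard deviation in transaction value, a flat average
# is a fair stand-in for spend per transaction.
summary["ltv_180d"] = expected_txns * summary["avg_txn_value"].mean()
print(summary["ltv_180d"].describe())
```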

ARPDAU * Retention is probably the approach you’re using; it’s a great starter LTV. If marketing or player behavior becomes more important, the gains from moving to an approach beyond this start to make more sense.
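A minimal sketch of this approach, assuming a simple power-law fit to observed day-N retention (the retention points and ARPDAU below are made up):

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical observed retention for a cohort: day -> fraction of installs active
days = np.array([1, 3, 7, 14, 30], dtype=float)
retention = np.array([0.40, 0.25, 0.18, 0.12, 0.08])

# Fit a simple power-law retention curve: r(d) = a * d^-b
def power_law(d, a, b):
    return a * d ** -b

(a, b), _ = curve_fit(power_law, days, retention)

# Expected active days per install over a 180-day horizon (day 0 counts everyone)
horizon = np.arange(1, 181, dtype=float)
expected_active_days = 1.0 + power_law(horizon, a, b).sum()

arpdau = 0.05  # hypothetical average revenue per daily active user
ltv_180d = arpdau * expected_active_days
print(f"fitted a={a:.2f}, b={b:.2f}, LTV(180d) ~ ${ltv_180d:.2f}")
```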

Wooga + Excel just doesn’t scale, which kills its viability, but it’s conceptually useful to understand.

Linear * Survival Analysis gives a great deal of interpretability and also sub-predicts customer churn time. This means testing whether the purchase of a particular item or mode increases churn time can be done within the model. The interpretability of linear models also means it’s easy to see different LTV values for variables like country or device.
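For the survival half, here’s a hedged sketch using the lifelines package; the column and feature names (lifetime_days, churned, rank, frustration_rate, social_engagement) are hypothetical stand-ins for whatever your game actually tracks:

```python
import pandas as pd
from lifelines import CoxPHFitter

# Hypothetical per-player table: observed lifetime in days, whether the player
# has churned (1) or is still active/censored (0), plus numeric behavioural features.
players = pd.read_csv("player_lifetimes.csv")
# columns: lifetime_days, churned, rank, frustration_rate, social_engagement

cph = CoxPHFitter()
cph.fit(players, duration_col="lifetime_days", event_col="churned")

# Hazard ratios show which features are shortening or extending player lifetimes
cph.print_summary()

# Predicted median lifetimes feed the "Survival" half of a Linear * Survival LTV
covariates = players.drop(columns=["lifetime_days", "churned"])
median_lifetime = cph.predict_median(covariates)
```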

There are many, many approaches beyond what’s been laid out here. Don’t settle on using just one model; each has costs and benefits that shouldn’t be ignored.