Predicting customer lifetime value using the Pareto/NBD model and Gamma-Gamma model

In this blog post, I am going to show you how to combine the Pareto/NBD model (which predicts the number of future transactions) with the Gamma-Gamma model (which predicts the value of future transactions) to estimate customer lifetime value (CLV).

I described how to build the Pareto/NBD model in the previous article, so I am going to assume that we have already built that model. If this is not the case, please take a look at my previous blog post.

As in the Pareto/NBD article, we begin with a file containing the transaction log of a customer cohort.

import pandas as pd
import matplotlib.pyplot as plt
import lifetimes

data = pd.read_csv("data.csv", header = 0)
data['date'] = pd.to_datetime(data['date'])
data.head()

We must create a summary dataset which contains information about every customer. We can use the summary_data_from_transaction_data function to generate a summary data frame.

The result contains four columns:

  • recency — the time between the first and the last transaction

  • frequency — the number of purchases beyond the initial one

  • T — the time between the first purchase and the end of the calibration period

  • monetary value — the arithmetic mean of customer’s transactions in the calibration period

summary = lifetimes.utils.summary_data_from_transaction_data(data, 'cust', 'date', 'sales')
summary = summary.reset_index()
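
To make the four columns more concrete, here is a minimal sketch that computes them by hand for a single customer, using only the standard library. The transaction list and the `observation_end` date are made up for illustration. The sketch also assumes that the first purchase is left out of the mean, which is my understanding of the library's default behaviour; the library itself may differ in details, such as how it groups purchases made on the same day.

```python
from datetime import date

# Hypothetical transaction log of one customer: (purchase date, amount).
transactions = [
    (date(2019, 1, 1), 20.0),
    (date(2019, 1, 11), 30.0),
    (date(2019, 1, 31), 50.0),
]
# Assumed end of the calibration period.
observation_end = date(2019, 3, 2)

ordered = sorted(transactions)
first_purchase = ordered[0][0]
last_purchase = ordered[-1][0]

# frequency: the number of purchases beyond the initial one
frequency = len(ordered) - 1
# recency: days between the first and the last transaction
recency = (last_purchase - first_purchase).days
# T: days between the first purchase and the end of the calibration period
T = (observation_end - first_purchase).days
# monetary value: mean of the repeat purchases (assumption: first purchase excluded)
repeat_amounts = [amount for _, amount in ordered[1:]]
monetary_value = sum(repeat_amounts) / len(repeat_amounts) if repeat_amounts else 0.0

print(frequency, recency, T, monetary_value)
```

A customer with only one purchase ends up with a frequency of zero and, in this sketch, a monetary value of zero.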

Note that the Gamma-Gamma model is based on the assumption that the number of transactions does not depend on their monetary value.
We can check that assumption by computing the correlation between the two columns: if the coefficient is close to zero, frequency and monetary value are not correlated. Let’s look at the result.

summary[['monetary_value', 'frequency']].corr()

It seems that we can use the summary to estimate CLV.
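
If you want to see what the correlation check measures, the following is a minimal sketch of the Pearson correlation coefficient written in plain Python. As far as I know, this is the coefficient pandas computes by default in `corr()`. The `pearson` helper and the sample numbers are hypothetical and serve only as an illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Made-up frequency and monetary value columns for five customers.
frequency = [1, 2, 3, 4, 5]
monetary_value = [30.0, 10.0, 50.0, 20.0, 40.0]

print(pearson(frequency, monetary_value))
```

A result close to zero supports the independence assumption, while a value close to 1 or -1 means that the Gamma-Gamma model may not fit the data well.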

The Gamma-Gamma model needs monetary data to forecast CLV. If some customers did not buy anything in the calibration period and their monetary value in the summary dataset is zero, we must remove them. After that, we can build the model.

from lifetimes import GammaGammaFitter

# Keep only the customers with a positive monetary value.
summary = summary[summary['monetary_value'] > 0]

gg_model = GammaGammaFitter()
gg_model.fit(summary['frequency'], summary['monetary_value'])
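
To give an idea of what the fitted model does with these two columns, here is a minimal sketch of the conditional expected average transaction value from the Gamma-Gamma model: a weighted average of the population mean and the customer's own observed mean. The function name and the values of `p`, `q`, and `v` are hypothetical; in practice the parameters come from the fitted model, and the library provides `conditional_expected_average_profit` for this calculation.

```python
def expected_average_value(frequency, monetary_value, p, q, v):
    """Weighted average of the population mean and the customer's own mean."""
    population_mean = v * p / (q - 1)
    # The more repeat purchases we have seen, the more we trust the customer's own mean.
    individual_weight = p * frequency / (p * frequency + q - 1)
    return (1 - individual_weight) * population_mean + individual_weight * monetary_value


# Hypothetical model parameters, NOT the result of a real fit.
p, q, v = 6.0, 4.0, 15.0

few_purchases = expected_average_value(1, 100.0, p, q, v)
many_purchases = expected_average_value(10, 100.0, p, q, v)
print(few_purchases, many_purchases)
```

With these made-up parameters the population mean is 30, so a customer who spent 100 on average is pulled towards 30, and the pull gets weaker as the number of repeat purchases grows.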

Now, we can calculate customer lifetime value.

gg_model.customer_lifetime_value(
    pareto_nbd_model,
    summary['frequency'],
    summary['recency'],
    summary['T'],
    summary['monetary_value'],
    time=30,  # months: lifetimes measures the CLV horizon in months
    freq='D',  # unit of T: days, the default 'freq' of summary_data_from_transaction_data
    discount_rate=0.003  # monthly discount rate
)

Let’s have a look at the “freq” parameter. It tells the function in which unit “T” was measured. I set it to days because I did not specify it at all when I was generating the summary data frame, which means that the default value (days) was used.

If you change it, it is crucial to remember to set it in every function that needs such a parameter. Note that, according to the lifetimes documentation, the “time” parameter is expressed in months and “discount_rate” is a monthly rate regardless of “freq”, so a 30-day horizon corresponds to time=1.
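
To show why the discount rate matters, here is a simplified sketch of a discounted cash flow calculation: the expected spend of each future period is divided by a growing discount factor and the results are summed. This is only an illustration of the idea, not the library's implementation. The `discounted_clv` helper and all the numbers are hypothetical.

```python
def discounted_clv(expected_purchases_per_period, average_value, discount_rate):
    """Sum the expected spend of every period, discounted back to today."""
    return sum(
        average_value * purchases / (1 + discount_rate) ** period
        for period, purchases in enumerate(expected_purchases_per_period, start=1)
    )


# Made-up forecast: expected number of purchases in each of three future periods.
expected_purchases = [0.5, 0.4, 0.3]

print(discounted_clv(expected_purchases, average_value=100.0, discount_rate=0.0))
print(discounted_clv(expected_purchases, average_value=100.0, discount_rate=0.01))
```

With a discount rate of zero, the result is just the expected number of purchases multiplied by the average value. Any positive rate makes later periods count for less.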



Bartosz Mikulski
Bartosz Mikulski * data scientist / software engineer * conference speaker * organizer of School of A.I. meetups in Poznań * co-founder of Software Craftsmanship Poznan & Poznan Scala User Group