Predicting customer lifetime value using the Pareto/NBD model and GammaGamma model
In this blog post, I am going to show you how to combine the Pareto/NBD model (which predicts the number of future transactions) with the GammaGamma model (which predicts the value of future transactions) to estimate customer lifetime value (CLV).
Because I described how to build the Pareto/NBD model in the previous article, I am going to assume that we have already built that model. If this is not the case, please take a look at my previous blog post.
As in the Pareto/NBD article, we begin with a file containing the transaction log of a customer cohort.
import pandas as pd
import matplotlib.pyplot as plt
import lifetimes
data = pd.read_csv("data.csv", header = 0)
data['date'] = pd.to_datetime(data['date'])
data.head()
We must create a summary dataset which contains information about every customer. We can use the summary_data_from_transaction_data function to generate a summary data frame.
The result contains four columns:

recency — the time between the first and the last transaction

frequency — the number of purchases beyond the initial one

T — the time between the first purchase and the end of the calibration period

monetary value — the arithmetic mean of customer’s transactions in the calibration period
summary = lifetimes.utils.summary_data_from_transaction_data(data, 'cust', 'date', 'sales')
summary = summary.reset_index()
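To make the column definitions above more concrete, here is a minimal sketch that computes frequency, recency, and T by hand with plain pandas. The tiny transaction log is made up for illustration (only the column names match the ones used above), and I assume that the calibration period ends on the last date in the log. I left out the monetary value on purpose; this is not the library's implementation, only a way to check your intuition.

```python
import pandas as pd

# Hypothetical transaction log; only the column names mirror data.csv.
transactions = pd.DataFrame({
    "cust": [1, 1, 1, 2, 3, 3],
    "date": pd.to_datetime([
        "2020-01-01", "2020-01-11", "2020-01-21",
        "2020-01-05",
        "2020-01-10", "2020-01-30",
    ]),
    "sales": [10.0, 20.0, 30.0, 15.0, 8.0, 12.0],
})

# Assumption: the calibration period ends on the last date in the log.
period_end = transactions["date"].max()

grouped = transactions.groupby("cust")
first_purchase = grouped["date"].min()
last_purchase = grouped["date"].max()

manual_summary = pd.DataFrame({
    # number of distinct purchase days beyond the initial one
    "frequency": grouped["date"].nunique() - 1,
    # days between the first and the last purchase
    "recency": (last_purchase - first_purchase).dt.days,
    # days between the first purchase and the end of the calibration period
    "T": (period_end - first_purchase).dt.days,
})

print(manual_summary)
```

A customer with a single purchase (customer 2 in this toy log) ends up with frequency and recency equal to zero.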
Note that the GammaGamma model is based on the assumption that the number of transactions does not depend on their monetary value.
To verify it, we compute the Pearson correlation between the two columns. The frequency and monetary value are not correlated if the coefficient is close to zero. Let’s look at the result.
summary[['monetary_value', 'frequency']].corr()
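If you prefer an explicit check over reading the matrix, the sketch below pulls out the single off-diagonal coefficient and compares it with a cut-off. The toy data and the 0.3 threshold are my assumptions for illustration; neither the model nor the library prescribes a specific limit.

```python
import pandas as pd

# Hypothetical summary rows, for illustration only.
toy_summary = pd.DataFrame({
    "frequency": [1, 2, 3, 4, 5, 6],
    "monetary_value": [20.0, 30.0, 25.0, 25.0, 30.0, 20.0],
})

corr_matrix = toy_summary[["monetary_value", "frequency"]].corr()

# The off-diagonal entry is the Pearson correlation between the two columns.
r = corr_matrix.loc["monetary_value", "frequency"]

# Assumption: an arbitrary cut-off, pick one that suits your data.
THRESHOLD = 0.3
weakly_correlated = abs(r) < THRESHOLD

print(f"correlation = {r:.3f}, weakly correlated: {weakly_correlated}")
```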
It seems that we can use the summary to estimate CLV.
The GammaGamma model can be fitted only on customers with a positive monetary value. If the summary dataset contains customers whose monetary value is zero (in practice, these are the customers with no repeat purchases in the calibration period), we must remove them. After that, we can build the model.
summary = summary[summary['monetary_value'] > 0]
from lifetimes import GammaGammaFitter
gg_model = GammaGammaFitter()
gg_model.fit(summary['frequency'], summary['monetary_value'])
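To show what the fitted model gives us, here is a small sketch of the conditional expected average transaction value from the Gamma-Gamma model, written in plain Python. The parameter values p, q, and v are made up; a fitted model would supply its own. In lifetimes, the corresponding method on the fitted model is conditional_expected_average_profit, so treat the function below as an illustration of the formula, not as the library's code.

```python
def expected_average_value(frequency, monetary_value, p, q, v):
    """Weighted average of the population mean and the customer's observed mean."""
    # The more repeat purchases we observed, the more we trust the customer's own mean.
    individual_weight = p * frequency / (p * frequency + q - 1)
    # Expected transaction value across the whole population (requires q > 1).
    population_mean = v * p / (q - 1)
    return (1 - individual_weight) * population_mean + individual_weight * monetary_value


# Hypothetical parameters, chosen only to keep the arithmetic easy to follow.
P, Q, V = 6.0, 4.0, 15.0

# Two repeat purchases averaging 50: the estimate is pulled towards the population mean.
print(expected_average_value(2, 50.0, P, Q, V))

# No repeat purchases: the estimate falls back to the population mean.
print(expected_average_value(0, 0.0, P, Q, V))
```

This shrinkage towards the population mean is why a customer with one large repeat purchase does not immediately get a very high predicted value.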
Now, we can calculate customer lifetime value.
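gg_model.customer_lifetime_value(
    pareto_nbd_model,  # the fitted Pareto/NBD model from the previous post
    summary['frequency'],
    summary['recency'],
    summary['T'],
    summary['monetary_value'],
    time=30,  # forecast horizon; the lifetimes documentation defines it in months
    freq='D',  # unit of T: days, the default 'freq' of summary_data_from_transaction_data
    discount_rate=0.003  # the lifetimes documentation defines it as a monthly rate
)
Let’s have a look at the “freq” parameter. It describes the time unit used for T and recency. I set it to days because I did not specify it at all when I was generating the summary data frame, which means that the default value (days) was used.
If you change it, it is crucial to remember to set it in every function that needs such a parameter. Note that, according to the lifetimes documentation, the “time” parameter is expressed in months and “discount_rate” is a monthly rate regardless of “freq”, so the call above covers 30 months. For a horizon of about 30 days, use time=1.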
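To see how the horizon and the discount rate interact, here is a simplified sketch of a discounted cash flow calculation done month by month. It reflects my understanding of the general approach, not the library's exact code. The constant_rate function is a hypothetical stand-in for a fitted transaction model, and the 30-day month is an assumption.

```python
def discounted_clv(expected_purchases_up_to, avg_value, months,
                   monthly_discount_rate, days_per_month=30):
    """Sum the discounted expected revenue of every month in the horizon.

    expected_purchases_up_to(t) must return the cumulative expected number
    of purchases up to day t.
    """
    clv = 0.0
    for month in range(1, months + 1):
        end = month * days_per_month
        start = end - days_per_month
        # The prediction is cumulative, so subtract the previous periods.
        purchases = expected_purchases_up_to(end) - expected_purchases_up_to(start)
        clv += avg_value * purchases / (1 + monthly_discount_rate) ** month
    return clv


def monthly_rate_from_daily(daily_rate, days_per_month=30):
    """Compound a daily discount rate into the equivalent monthly rate."""
    return (1 + daily_rate) ** days_per_month - 1


# Hypothetical stand-in for a fitted model: 0.1 expected purchases per day.
def constant_rate(t_days):
    return 0.1 * t_days


clv_no_discount = discounted_clv(constant_rate, avg_value=20.0, months=3,
                                 monthly_discount_rate=0.0)
clv_discounted = discounted_clv(constant_rate, avg_value=20.0, months=3,
                                monthly_discount_rate=0.01)

print(clv_no_discount, clv_discounted)
print(monthly_rate_from_daily(0.003))
```

The helper at the end is useful when you only know a daily rate: a daily rate of 0.003 compounds to a monthly rate of roughly 0.094, which is much higher than it looks at first glance.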
You can also follow me on Twitter: @mikulskibartosz