Predicting customer lifetime value using the Pareto/NBD model and GammaGamma model
In this blog post, I am going to show you how to combine the Pareto/NBD model (which predicts the number of future transactions) with the GammaGamma model (which predicts the value of future transactions) to estimate customer lifetime value.
In the previous article, I described how to build the Pareto/NBD model, so I am going to assume that we have already built it. If this is not the case, please take a look at my previous blog post.
As in the Pareto/NBD article, we begin with a file containing the transaction log of a customer cohort.
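import pandas as pd
import matplotlib.pyplot as plt
import lifetimes
# The transaction log: one row per purchase, with the customer id, the date, and the amount
data = pd.read_csv("data.csv", header=0)
# Parse the dates so that lifetimes can compute the time between purchases
data['date'] = pd.to_datetime(data['date'])
data.head()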
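The rest of this post relies on three columns of the transaction log: cust, date, and sales (the names passed to summary_data_from_transaction_data later on). Below is a minimal sketch of a sanity check for the loaded file, assuming those column names; validate_transactions is a hypothetical helper, not part of lifetimes, and the tiny log is made-up data.

```python
import pandas as pd

# Column names assumed throughout this post; they match the arguments
# passed to summary_data_from_transaction_data.
REQUIRED_COLUMNS = ["cust", "date", "sales"]


def validate_transactions(data: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy of the transaction log, or raise if columns are missing."""
    missing = [col for col in REQUIRED_COLUMNS if col not in data.columns]
    if missing:
        raise ValueError(f"transaction log is missing columns: {missing}")
    # Rows without a customer, a date, or an amount cannot be summarized
    clean = data.dropna(subset=REQUIRED_COLUMNS).copy()
    clean["date"] = pd.to_datetime(clean["date"])
    return clean


# Made-up example: the last row has no customer id and gets dropped
log = pd.DataFrame({
    "cust": ["a", "a", None],
    "date": ["2020-01-01", "2020-01-05", "2020-01-07"],
    "sales": [10.0, 12.5, 3.0],
})
print(validate_transactions(log))
```

Failing loudly on missing columns is a deliberate choice: a typo in a column name would otherwise surface much later as a confusing error inside lifetimes.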
First, we must create a summary dataset that contains one row per customer. The summary_data_from_transaction_data function generates such a data frame.
The result contains four columns:

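recency: the time between the first and the last transaction

frequency: the number of repeat purchases, i.e., purchases beyond the initial one (lifetimes counts at most one purchase per time period, a day by default)

T: the time between the first purchase and the end of the calibration period

monetary_value: the arithmetic mean of the customer’s repeat transactions in the calibration period (by default, lifetimes excludes the first purchase, so customers without repeat purchases get 0)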
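# One row per customer: frequency, recency, T, and monetary_value (time unit: days)
summary = lifetimes.utils.summary_data_from_transaction_data(data, 'cust', 'date', 'sales')
# Turn the customer id index into a regular 'cust' column
summary = summary.reset_index()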
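To see what the four columns mean in practice, here is a rough pandas sketch that computes them by hand. It reflects my reading of the lifetimes source with the default settings (daily periods, purchases on the same day merged, first purchase excluded from the monetary value), so treat it as an approximation rather than a replacement for the library; summarize is a hypothetical helper and the tiny transaction log is made-up data.

```python
import pandas as pd


def summarize(transactions: pd.DataFrame, observation_end=None) -> pd.DataFrame:
    """Approximate the lifetimes summary columns by hand, measured in days.

    Assumes the columns 'cust', 'date', and 'sales' used in this post.
    """
    df = transactions[["cust", "date", "sales"]].copy()
    df["date"] = pd.to_datetime(df["date"]).dt.normalize()
    # Several purchases on the same day count as one daily transaction
    daily = df.groupby(["cust", "date"], as_index=False)["sales"].sum()
    end = pd.Timestamp(observation_end) if observation_end is not None else daily["date"].max()
    daily = daily[daily["date"] <= end].sort_values(["cust", "date"])

    grouped = daily.groupby("cust")
    first = grouped["date"].min()
    last = grouped["date"].max()
    summary = pd.DataFrame({
        "frequency": grouped.size() - 1,                  # repeat purchase days
        "recency": (last - first).dt.days.astype(float),  # first -> last purchase
        "T": (end - first).dt.days.astype(float),         # first purchase -> end
    })
    # Average value of the repeat purchases only; 0 when there are none
    repeats = daily[daily.groupby("cust").cumcount() > 0]
    summary["monetary_value"] = repeats.groupby("cust")["sales"].mean()
    summary["monetary_value"] = summary["monetary_value"].fillna(0.0)
    return summary


log = pd.DataFrame({
    "cust": ["a", "a", "a", "b"],
    "date": ["2020-01-01", "2020-01-01", "2020-01-11", "2020-01-21"],
    "sales": [10.0, 5.0, 20.0, 7.0],
})
print(summarize(log))
```

Customer a bought twice on 1 January (merged into one day worth 15) and once on 11 January, so the sketch reports one repeat purchase with an average value of 20. Customer b has no repeat purchases and therefore a monetary value of 0.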
Note that the GammaGamma model is based on the assumption that the number of transactions does not depend on their monetary value.
We can check this by computing the correlation between frequency and monetary value: if it is close to zero, the two are not (linearly) correlated. Let’s look at the result.
summary[['monetary_value', 'frequency']].corr()
It seems that we can use the summary to estimate CLV.
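Customers without repeat purchases have a monetary value of 0 in the summary, and those rows can distort the correlation. The lifetimes documentation, as far as I remember, runs this check on returning customers only (frequency > 0). A minimal sketch of that variant; frequency_value_correlation is a hypothetical helper and the toy summary is made-up data.

```python
import pandas as pd


def frequency_value_correlation(summary: pd.DataFrame) -> float:
    """Pearson correlation between frequency and monetary value for repeat customers."""
    # Customers with frequency 0 have monetary_value 0, which would skew the result
    returning = summary[summary["frequency"] > 0]
    return returning["frequency"].corr(returning["monetary_value"])


toy = pd.DataFrame({
    "frequency": [0, 1, 2, 3],
    "monetary_value": [0.0, 10.0, 10.5, 9.5],
})
print(round(frequency_value_correlation(toy), 3))
print(round(toy["frequency"].corr(toy["monetary_value"]), 3))
```

On this toy summary, the returning customers give a correlation of about -0.5, while including the zero row flips it to about 0.75, which shows how much those rows can change the conclusion.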
The GammaGamma model requires strictly positive monetary values. Customers who made no repeat purchases in the calibration period have a monetary value of zero in the summary dataset, so we must remove them. After that, we can build the model.
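# The GammaGamma model accepts only positive monetary values
summary = summary[summary['monetary_value'] > 0]
from lifetimes import GammaGammaFitter
# Fit the spending model on the repeat purchase frequency and the average transaction value
gg_model = GammaGammaFitter()
gg_model.fit(summary['frequency'], summary['monetary_value'])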
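What does the fitted model estimate? As I understand Fader and Hardie's Gamma-Gamma formulation, the expected average transaction value of a customer is a weighted average of the population mean and the customer's own observed mean, and the customer's own mean gets more weight the more repeat purchases they made. To my knowledge, lifetimes stores the fitted parameters p, q, and v in gg_model.params_ and computes this value in gg_model.conditional_expected_average_profit(frequency, monetary_value). The sketch below re-implements the formula in plain Python for one customer; the parameter values in the example are made up, not fitted.

```python
def expected_average_value(p: float, q: float, v: float,
                           frequency: int, monetary_value: float) -> float:
    """Gamma-Gamma expected average transaction value for one customer.

    frequency is the number of repeat purchases and monetary_value their mean.
    The population mean v * p / (q - 1) only exists when q > 1.
    """
    if q <= 1:
        raise ValueError("q must be greater than 1")
    population_mean = v * p / (q - 1)
    # The weight of the customer's own average grows with the number of repeat purchases
    individual_weight = p * frequency / (p * frequency + q - 1)
    return (1 - individual_weight) * population_mean + individual_weight * monetary_value


# Made-up parameters: the population mean is 10 * 2 / (3 - 1) = 10
for x in (0, 1, 4):
    print(x, expected_average_value(p=2.0, q=3.0, v=10.0, frequency=x, monetary_value=30.0))
```

A customer who spent 30 on average but has only one repeat purchase ends up halfway between the population mean (10) and their own mean (30); with four repeat purchases, the estimate moves to about 26.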
Now, we can calculate customer lifetime value.
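gg_model.customer_lifetime_value(
    pareto_nbd_model,  # the Pareto/NBD model fitted in the previous article
    summary['frequency'],
    summary['recency'],
    summary['T'],
    summary['monetary_value'],
    time=1,  # months; lifetimes measures the horizon in months (about 30 days here)
    freq='D',  # days, because we used the default 'freq' for summary_data_from_transaction_data
    discount_rate=0.094  # monthly rate, roughly a 0.3% daily rate compounded over 30 days
)
Let’s have a look at the freq parameter. It tells lifetimes which time unit the recency and T values are expressed in. I set it to days because I did not specify it at all when I was generating the summary data frame, which means that the default value (days) was used.
If you change it, it is crucial to set it in every function that accepts such a parameter. The time and discount_rate parameters, on the other hand, do not follow freq: lifetimes interprets time as a number of months and applies discount_rate once per month, so express both in months.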
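To make the combination of the two models more concrete, here is a minimal sketch of the discounted sum that, as far as I can tell from the lifetimes source, customer_lifetime_value computes for each customer: for every month, it takes the expected number of purchases in that month (the difference between two cumulative Pareto/NBD predictions), multiplies it by the GammaGamma expected transaction value, and divides the result by (1 + discount_rate) raised to the month number. lifetimes appears to count 30 days per month when freq='D'. The function expected_purchases_up_to and all the numbers below are hypothetical stand-ins; in lifetimes, the cumulative prediction would come from the Pareto/NBD model's predict method. The compound_rate helper converts a discount rate between time units, for example from per day to per month.

```python
from typing import Callable


def compound_rate(rate_per_period: float, periods: float) -> float:
    """Convert a per-period discount rate into a rate over `periods` periods.

    A fractional `periods` goes the other way (e.g. 1/30 converts monthly to daily).
    """
    return (1 + rate_per_period) ** periods - 1


def discounted_clv(
    expected_purchases_up_to: Callable[[float], float],
    expected_value: float,
    months: int,
    monthly_discount_rate: float,
    days_per_month: float = 30.0,
) -> float:
    """Sum of discounted expected revenue for one customer, one term per month.

    expected_purchases_up_to(t) is a hypothetical stand-in for the cumulative
    Pareto/NBD prediction, with t measured in days.
    """
    clv = 0.0
    for month in range(1, months + 1):
        t = month * days_per_month
        # Predictions are cumulative, so subtract the previous month to get this month's purchases
        purchases = expected_purchases_up_to(t) - expected_purchases_up_to(t - days_per_month)
        clv += expected_value * purchases / (1 + monthly_discount_rate) ** month
    return clv


# Hypothetical customer: one expected purchase per 30 days, worth 100 on average
monthly_rate = compound_rate(0.003, 30)  # a 0.3% daily rate is roughly 9.4% per month
print(round(monthly_rate, 4))
print(round(discounted_clv(lambda t: t / 30.0, 100.0, months=3,
                           monthly_discount_rate=monthly_rate), 2))
```

Working with monthly steps keeps the discounting simple: each month's expected revenue is discounted once, which is why the time horizon and the discount rate both have to be expressed in months.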
Bartosz Mikulski
 Data/MLOps engineer by day
 DevRel/copywriter by night
 Python and data engineering trainer
 Conference speaker
 Contributed a chapter to the book "97 Things Every Data Engineer Should Know"
 Twitter: @mikulskibartosz