Predicting customer lifetime value using the Pareto/NBD model and Gamma-Gamma model
In this blog post, I am going to show you how to combine the Pareto/NBD model (which predicts the number of future transactions) with the Gamma-Gamma model (which predicts the average value of those transactions) to estimate customer lifetime value.
I described how to build the Pareto/NBD model in the previous article, so I am going to assume that we have already built it. If you have not, please take a look at that blog post first.
As in the Pareto/NBD article, we begin with a file containing the transaction log of a cohort of customers.
```python
import pandas as pd
import matplotlib.pyplot as plt
import lifetimes

# Load the transaction log: one row per purchase (columns: cust, date, sales)
data = pd.read_csv("data.csv", header=0)
data['date'] = pd.to_datetime(data['date'])
data.head()
```
We must create a summary dataset that contains one row per customer. The summary_data_from_transaction_data function generates such a data frame.
The result contains four columns:
- recency: the time between the customer's first and last purchase
- frequency: the number of repeat purchases, that is, purchases beyond the initial one
- T: the time between the customer's first purchase and the end of the calibration period
- monetary_value: the arithmetic mean of the customer's repeat purchases in the calibration period (the first purchase is excluded, and customers without repeat purchases get 0)
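```python
# Build one row per customer with frequency, recency, T and monetary_value
summary = lifetimes.utils.summary_data_from_transaction_data(data, 'cust', 'date', 'sales')
summary = summary.reset_index()  # turn the customer id index into a regular column
```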
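The sketch below makes the four definitions concrete by computing the same columns by hand, in plain Python, for a tiny made-up transaction log. It assumes daily periods and reflects my reading of how lifetimes builds the summary: purchases made on the same day are merged, and the first purchase is left out of monetary_value. The summarize function and the sample data are hypothetical. In practice, summary_data_from_transaction_data is the function to use.

```python
from collections import defaultdict
from datetime import date


def summarize(transactions, observation_end):
    """Hypothetical helper: frequency, recency, T and monetary_value per customer.

    transactions: iterable of (customer_id, date, amount) tuples.
    Assumes daily periods: same-day purchases are summed into one purchase,
    and the first purchase is excluded from monetary_value.
    """
    # Aggregate purchase amounts per customer per day
    daily = defaultdict(lambda: defaultdict(float))
    for customer, day, amount in transactions:
        daily[customer][day] += amount

    summary = {}
    for customer, by_day in daily.items():
        days = sorted(by_day)
        first, last = days[0], days[-1]
        repeat_values = [by_day[d] for d in days[1:]]  # everything after the first purchase
        summary[customer] = {
            "frequency": len(repeat_values),  # number of repeat purchase days
            "recency": (last - first).days,  # first to last purchase
            "T": (observation_end - first).days,  # first purchase to end of period
            "monetary_value": (
                sum(repeat_values) / len(repeat_values) if repeat_values else 0.0
            ),
        }
    return summary


# Made-up log: customer "a" buys twice on Jan 5, customer "b" buys only once
log = [
    ("a", date(2020, 1, 1), 10.0),
    ("a", date(2020, 1, 5), 20.0),
    ("a", date(2020, 1, 5), 5.0),
    ("a", date(2020, 1, 11), 30.0),
    ("b", date(2020, 1, 3), 15.0),
]
result = summarize(log, observation_end=date(2020, 1, 31))
print(result["a"])  # {'frequency': 2, 'recency': 10, 'T': 30, 'monetary_value': 27.5}
print(result["b"])  # {'frequency': 0, 'recency': 0, 'T': 28, 'monetary_value': 0.0}
```

Customer "b" illustrates why one-time buyers end up with a monetary value of zero.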
Note that the Gamma-Gamma model assumes that the monetary value of a customer's transactions does not depend on how often the customer buys.
We can check this assumption by calculating the correlation between frequency and monetary value. If the result is close to zero, the two are not correlated. Let's compute it.
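```python
# Check only returning customers: one-time buyers have monetary_value == 0,
# which would distort the correlation
returning = summary[summary['frequency'] > 0]
returning[['monetary_value', 'frequency']].corr()
```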
It seems that we can use the summary to estimate CLV.
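By default, DataFrame.corr() calculates the Pearson correlation coefficient. The hand-written version below runs on hypothetical numbers and shows what a value near zero means: the covariance term cancels out when spending does not rise or fall with the number of purchases. Pearson's coefficient captures only linear relationships, so a result near zero does not rule out every kind of dependence.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical returning customers: frequency and average spend
frequency = [1, 2, 3, 4]
unrelated_spend = [20.0, 10.0, 10.0, 20.0]  # spend does not move with frequency
growing_spend = [10.0, 20.0, 30.0, 40.0]  # spend grows linearly with frequency

print(round(pearson(frequency, unrelated_spend), 6))  # 0.0
print(round(pearson(frequency, growing_spend), 6))  # 1.0
```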
The Gamma-Gamma model can be fitted only on customers who made at least one repeat purchase. Customers who did not buy anything after their first purchase in the calibration period have a monetary value of zero in the summary dataset, so we must remove them. After that, we can build the model.
```python
from lifetimes import GammaGammaFitter

# Keep only customers with repeat purchases (non-zero monetary value)
summary = summary[summary['monetary_value'] > 0]

gg_model = GammaGammaFitter()
gg_model.fit(summary['frequency'], summary['monetary_value'])
```
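The fitted Gamma-Gamma model does not simply reuse each customer's observed average. As I understand the model (lifetimes exposes the estimate as gg_model.conditional_expected_average_profit), it returns a weighted average of the population mean and the customer's own mean. Customers with more repeat purchases get more weight on their own data. The sketch below writes out that formula. Here p, q and v stand for the three parameters the fitter estimates, and the values used are made up rather than taken from a fitted model.

```python
def expected_average_value(p, q, v, frequency, monetary_value):
    """Gamma-Gamma estimate of a customer's expected average transaction value.

    p, q, v: Gamma-Gamma parameters (hypothetical values below; requires q > 1).
    frequency: number of repeat purchases; monetary_value: their observed mean.
    """
    population_mean = v * p / (q - 1)  # expected spend for a customer we know nothing about
    weight = p * frequency / (p * frequency + q - 1)  # grows towards 1 with more purchases
    return (1 - weight) * population_mean + weight * monetary_value


# Made-up parameters: population mean = 15 * 6 / (4 - 1) = 30
p, q, v = 6.0, 4.0, 15.0
print(round(expected_average_value(p, q, v, 0, 0.0), 2))  # 30.0
print(round(expected_average_value(p, q, v, 1, 60.0), 2))  # 50.0
print(round(expected_average_value(p, q, v, 100, 60.0), 2))  # 59.85
```

A single purchase of 60 only pulls the estimate part of the way from 30 towards 60. After 100 purchases, the customer's own average dominates.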
Now, we can calculate customer lifetime value.
```python
# pareto_nbd_model is the Pareto/NBD model built in the previous article
gg_model.customer_lifetime_value(
    pareto_nbd_model,
    summary['frequency'],
    summary['recency'],
    summary['T'],
    summary['monetary_value'],
    time=30,  # forecast horizon; lifetimes measures it in months
    freq='D',  # days, because we used the default 'freq' for summary_data_from_transaction_data
    discount_rate=0.003  # lifetimes treats this as a monthly discount rate
)
```
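In the lifetimes versions I have looked at, customer_lifetime_value works in monthly steps. For each month up to time, it asks the transaction model how many purchases to expect in that month. It multiplies that number by the Gamma-Gamma estimate of the average value, then discounts the result by (1 + discount_rate) raised to the month number. The function below is a hypothetical re-implementation of that idea in plain Python. A constant purchase rate stands in for the fitted Pareto/NBD model, and 30 days per month is an assumption matching freq='D'.

```python
def customer_lifetime_value(expected_purchases, average_value, months,
                            monthly_discount_rate, days_per_month=30):
    """Hypothetical sketch of a discounted, month-by-month CLV sum.

    expected_purchases(t) returns the cumulative expected number of purchases
    in the next t days (the role a fitted Pareto/NBD model's prediction plays).
    """
    clv = 0.0
    for month in range(1, months + 1):
        end = month * days_per_month
        # Predictions are cumulative, so subtract the previous month to get this month only
        purchases = expected_purchases(end) - expected_purchases(end - days_per_month)
        clv += average_value * purchases / (1 + monthly_discount_rate) ** month
    return clv


def constant_rate(days):
    """Made-up stand-in model: 0.1 expected purchases per day."""
    return 0.1 * days


value = customer_lifetime_value(constant_rate, 50.0, months=2, monthly_discount_rate=0.01)
print(round(value, 2))  # 295.56
```

Each of the two months contributes 3 purchases worth 50 each. The second month is discounted slightly more than the first.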
Let’s have a look at the “freq” parameter. I set it to days because I did not specify it at all when I was generating the summary data frame, so the default value (days) was used.
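If you change it, remember to pass the same value to every function that accepts such a parameter. Keep in mind that lifetimes expresses the “time” and “discount_rate” parameters in months regardless of “freq”. The library uses “freq” only to convert months into the time unit of the summary data, so daily horizons and daily discount rates must be converted to monthly ones.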
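A daily horizon or daily discount rate has to be converted before it is passed to customer_lifetime_value. The helpers below are hypothetical. The periods-per-month table reflects the factors I believe lifetimes uses for each freq value, and the daily rate is compounded over a 30-day month.

```python
# Assumed number of summary periods per month for each freq value
PERIODS_PER_MONTH = {"H": 30 * 24, "D": 30, "W": 4.345, "M": 1.0}


def monthly_rate_from_daily(daily_rate, days_per_month=PERIODS_PER_MONTH["D"]):
    """Compound a daily discount rate into the equivalent monthly rate."""
    return (1 + daily_rate) ** days_per_month - 1


def horizon_in_months(periods, freq="D"):
    """Convert a horizon expressed in summary periods into months."""
    return periods / PERIODS_PER_MONTH[freq]


print(round(monthly_rate_from_daily(0.003), 4))  # 0.094
print(horizon_in_months(30, "D"))  # 1.0
print(horizon_in_months(90, "D"))  # 3.0
```

Under these assumptions, a 30-day horizon with a 0.3% daily discount rate corresponds to time=1 and a monthly discount_rate of roughly 0.094.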
- Data/MLOps engineer by day
- DevRel/copywriter by night
- Python and data engineering trainer
- Conference speaker
- Contributed a chapter to the book "97 Things Every Data Engineer Should Know"
- Twitter: @mikulskibartosz