Predicting customer lifetime value using the Pareto/NBD model and Gamma-Gamma model

In this blog post, I am going to show you how to combine the Pareto/NBD model (which predicts the number of future transactions) with the Gamma-Gamma model (which predicts the value of future transactions) to estimate customer lifetime value (CLV).

Because I described how to build the Pareto/NBD model in the previous article, I am going to assume that we have already built that model (it is called pareto_nbd_model in the code below). If this is not the case, please take a look at my previous blog post.

As in the Pareto/NBD article, we begin with a file containing the transaction log of a customer cohort.

import pandas as pd
import matplotlib.pyplot as plt
import lifetimes

data = pd.read_csv("data.csv", header = 0)
data['date'] = pd.to_datetime(data['date'])
data.head()

We must create a summary dataset which contains information about every customer. We can use the summary_data_from_transaction_data function to generate a summary data frame.

The result contains four columns:

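  • recency: the time between the customer's first and last transaction

  • frequency: the number of repeat purchases, that is, purchases beyond the initial one

  • T: the time between the first purchase and the end of the calibration period

  • monetary_value: the arithmetic mean of the customer's transactions in the calibration period (by default, lifetimes leaves the first purchase out of this mean, so customers without repeat purchases get zero)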
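
To make the four definitions concrete, here is a minimal sketch in plain Python that computes them for a made-up transaction log. It does not use lifetimes; the data, the summarize function, and the observation end date are all hypothetical. It assumes at most one purchase per customer per day and, following my understanding of the library default, leaves the first purchase out of the monetary value.

```python
from datetime import date

# Hypothetical transaction log: (customer id, purchase date, amount).
transactions = [
    ("a", date(2020, 1, 1), 10.0),
    ("a", date(2020, 1, 11), 20.0),
    ("a", date(2020, 1, 21), 40.0),
    ("b", date(2020, 1, 5), 15.0),
]
observation_end = date(2020, 1, 31)  # assumed end of the calibration period


def summarize(transactions, observation_end):
    """Build a per-customer summary from (customer, date, amount) tuples."""
    by_customer = {}
    for customer, day, amount in transactions:
        by_customer.setdefault(customer, []).append((day, amount))

    result = {}
    for customer, purchases in by_customer.items():
        purchases.sort()  # oldest purchase first
        first_day = purchases[0][0]
        last_day = purchases[-1][0]
        # Everything after the first purchase counts as a repeat purchase.
        repeat_amounts = [amount for _, amount in purchases[1:]]
        result[customer] = {
            "frequency": len(repeat_amounts),
            "recency": (last_day - first_day).days,
            "T": (observation_end - first_day).days,
            # Mean of the repeat purchases; zero when there are none.
            "monetary_value": (
                sum(repeat_amounts) / len(repeat_amounts) if repeat_amounts else 0.0
            ),
        }
    return result


toy_summary = summarize(transactions, observation_end)
```

Customer "b" bought only once, so both the frequency and the monetary value are zero. Such rows matter later, when we prepare the data for the Gamma-Gamma model.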

summary = lifetimes.utils.summary_data_from_transaction_data(data, 'cust', 'date', 'sales')
summary = summary.reset_index()

Note that the Gamma-Gamma model is based on the assumption that the number of transactions does not depend on their monetary value.
To check it, we can calculate the Pearson correlation between the two columns. If the coefficient is close to zero, the frequency and the monetary value are not linearly correlated. Let’s look at the result.

summary[['monetary_value', 'frequency']].corr()

It seems that we can use the summary to estimate CLV.
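
If you wonder what the number returned by corr() measures, here is a small sketch of the Pearson coefficient written in plain Python. The pearson function and both toy columns are made up for illustration; pandas computes the same quantity for every pair of columns.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance_x = sum((x - mean_x) ** 2 for x in xs)
    variance_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(variance_x * variance_y)


# Hypothetical columns: the values do not grow or shrink with the frequency.
toy_frequency = [1, 2, 3, 4]
toy_unrelated_value = [20.0, 10.0, 10.0, 20.0]
# Hypothetical columns: the value grows in lockstep with the frequency.
toy_related_value = [10.0, 20.0, 30.0, 40.0]

unrelated = pearson(toy_frequency, toy_unrelated_value)
related = pearson(toy_frequency, toy_related_value)
```

The first pair gives a coefficient of zero, and the second gives one. A result near one (or minus one) in the real data would be a warning that the independence assumption does not hold.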

The Gamma-Gamma model needs observed transaction values to forecast CLV. Customers whose monetary value in the summary dataset is zero (in practice, customers without repeat purchases in the calibration period) give the model nothing to learn from, so we must remove them. After that we can build the model.

from lifetimes import GammaGammaFitter

# keep only the customers with a positive monetary value
summary = summary[summary['monetary_value'] > 0]

gg_model = GammaGammaFitter()
gg_model.fit(summary['frequency'], summary['monetary_value'])
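
To give an intuition of what the fitted model does with those two columns, here is a sketch of the Gamma-Gamma conditional expectation as I understand it from Fader and Hardie's description of the model. The predicted average transaction value is a weighted average of the population mean and the customer's own observed mean. The function name and the parameter values p, q, and v below are made up for illustration; they are not fitted to the data above, and the formula requires q greater than 1.

```python
def expected_average_value(frequency, monetary_value, p, q, v):
    """Expected average transaction value of one customer (requires q > 1)."""
    # The more repeat purchases we have seen, the more we trust the customer's own mean.
    individual_weight = p * frequency / (p * frequency + q - 1)
    # Average transaction value across the whole population.
    population_mean = v * p / (q - 1)
    return (1 - individual_weight) * population_mean + individual_weight * monetary_value


# Hypothetical parameters; the population mean is 15 * 6 / (4 - 1) = 30.
p, q, v = 6.0, 4.0, 15.0

no_history = expected_average_value(0, 0.0, p, q, v)
one_purchase = expected_average_value(1, 50.0, p, q, v)
ten_purchases = expected_average_value(10, 50.0, p, q, v)
```

A customer with one repeat purchase worth 50 gets a prediction between the population mean and 50. With ten such purchases, the prediction moves much closer to 50.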

Now, we can calculate customer lifetime value.

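gg_model.customer_lifetime_value(
    pareto_nbd_model,  # the fitted Pareto/NBD model from the previous article
    summary['frequency'],
    summary['recency'],
    summary['T'],
    summary['monetary_value'],
    time=30,  # forecast horizon; the lifetimes documentation defines it in months
    freq='D',  # unit of T and recency: days, the default of summary_data_from_transaction_data
    discount_rate=0.003  # the lifetimes documentation defines it as a monthly rate
)

Let’s have a look at the “freq” parameter. It tells the model which time unit the T and recency columns use. I set it to days because I did not specify it at all when I was generating the summary data frame, which means that the default value (days) was used.

If you change it, it is crucial to remember to set it in every function that needs such a parameter. Be careful with the “time” and “discount_rate” parameters, though. According to the lifetimes documentation, “time” is the forecast horizon in months and “discount_rate” is a monthly rate, regardless of “freq”. The call above therefore covers 30 months. For a horizon of roughly 30 days, use time=1, and convert a daily discount rate into a monthly one.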
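
As a rough illustration of how a discount rate enters a CLV estimate, and of what happens to a rate when the length of the period changes, consider the sketch below. It is not the library's code. Both functions, the 30-day month, and the numbers are assumptions made up for this example.

```python
def monthly_rate_from_daily(daily_rate, days_per_month=30):
    """Compound a daily discount rate into a monthly one (assumes 30-day months)."""
    return (1 + daily_rate) ** days_per_month - 1


def discounted_value(expected_purchases_per_month, average_value, monthly_rate):
    """Sum the expected revenue of every future month, discounted to today."""
    total = 0.0
    for month, purchases in enumerate(expected_purchases_per_month, start=1):
        total += purchases * average_value / (1 + monthly_rate) ** month
    return total


# Hypothetical forecast: 2 purchases next month, 1 the month after, 10 per purchase.
forecast = [2.0, 1.0]
undiscounted = discounted_value(forecast, 10.0, monthly_rate=0.0)
discounted = discounted_value(forecast, 10.0, monthly_rate=0.1)

# A daily rate of 0.003 compounds to a much larger monthly rate.
monthly_rate = monthly_rate_from_daily(0.003)
```

Without discounting, the value is simply 30. With a 10% monthly rate, it drops because revenue that arrives later is worth less today. The last line shows why a daily rate cannot be passed where a monthly rate is expected: 0.003 per day compounds to roughly 9% per month.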

Bartosz Mikulski

  • Data/MLOps engineer by day
  • DevRel/copywriter by night
  • Python and data engineering trainer
  • Conference speaker
  • Contributed a chapter to the book "97 Things Every Data Engineer Should Know"
  • Twitter: @mikulskibartosz