Understanding uncertainty intervals generated by Prophet

When we forecast a value with Prophet, we get not only the point estimate (the “yhat” column) but also the lower and upper bounds of the uncertainty interval (“yhat_lower” and “yhat_upper”). To use these bounds well, we should dig a little into the details and see how Prophet produces them.

Let’s look at some examples. In all of them, I use the same input dataset: Bitcoin prices between 2018.10.01 and 2018.11.11. It looks like this.

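import pandas as pd
from fbprophet import Prophet  # the package was renamed to "prophet" in version 1.0

# Load the 1-minute Bitstamp price history and convert the Unix timestamps to datetimes.
data = pd.read_csv('../input/bitstampUSD_1-min_data_2012-01-01_to_2018-11-11.csv')
data['date'] = pd.to_datetime(data['Timestamp'], unit="s")

# Prophet expects a "ds" (timestamp) column and a "y" (value) column.
input_data = data[["date", "Close"]]
input_data = input_data.rename(columns={"date": "ds", "Close": "y"})

# Keep only the observations from 2018-10-01 onwards.
subset = input_data[input_data["ds"] >= "2018-10-01"]
subset.plot(x = "ds", y = "y")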
Bitcoin price (in USD) between 2018.10.01 and 2018.11.11

In all cases, I am going to generate predictions of the Bitcoin price for the next 72 hours, which for this dataset means the days between 2018.11.12 and 2018.11.14.

If we check the real Bitcoin prices on those days, we see that my predictions are very, very wrong. That is okay, because this blog post is about tweaking uncertainty intervals, not about improving the accuracy of the estimates ;)

In the first example, I am going to generate predictions using the default parameters.

m = Prophet()
m.fit(subset)
future = m.make_future_dataframe(periods=72, freq="H")
forecast = m.predict(future)
fig1 = m.plot(forecast)
Forecast plot generated using default settings.

Prophet estimates the uncertainty intervals using Monte Carlo simulation. The “uncertainty_samples” parameter controls the simulation. It is the number of samples used to estimate the uncertainty interval (by default 1000).

We can reduce that number to speed up Prophet, but the bounds of the interval are then estimated from fewer samples, so they have a higher variance.

Let’s see what happens when I reduce the number of samples from 1000 to 100. Look at the shape of the plotted interval.

m = Prophet(uncertainty_samples = 100)
m.fit(subset)
future = m.make_future_dataframe(periods=72, freq="H")
forecast = m.predict(future)
fig1 = m.plot(forecast)
uncertainty_samples = 100

The interval is not as smooth as the one generated using 1000 samples, and its bounds vary more than in the previous example.
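
The effect of the sample count can be reproduced without Prophet. The sketch below is a toy simulation, not Prophet's implementation. It assumes that every simulated future value is a point forecast plus Gaussian noise (the numbers 6400 and 150 are made up), takes the 10th and 90th percentiles of the samples as the interval bounds, and repeats the estimate many times to measure how much the upper bound moves between runs. All function names are hypothetical.

```python
import random
import statistics


def percentile(sorted_values, p):
    """Percentile with linear interpolation; p is between 0 and 100."""
    k = (len(sorted_values) - 1) * p / 100.0
    lo = int(k)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (k - lo)


def simulated_interval(rng, n_samples, point_forecast=6400.0, noise_sd=150.0):
    """Estimate an 80% interval from n_samples simulated values (toy model)."""
    samples = sorted(rng.gauss(point_forecast, noise_sd) for _ in range(n_samples))
    return percentile(samples, 10), percentile(samples, 90)


def upper_bound_spread(n_samples, repeats=200, seed=42):
    """Standard deviation of the estimated upper bound across repeated runs."""
    rng = random.Random(seed)
    uppers = [simulated_interval(rng, n_samples)[1] for _ in range(repeats)]
    return statistics.stdev(uppers)


spread_100 = upper_bound_spread(100)
spread_1000 = upper_bound_spread(1000)
print(f"spread of the upper bound with 100 samples:  {spread_100:.1f}")
print(f"spread of the upper bound with 1000 samples: {spread_1000:.1f}")
```

In this toy setting the bound estimated from 100 samples jumps around noticeably more than the one estimated from 1000 samples, which is the same kind of jaggedness we see in the plot.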

By default, the uncertainty interval covers 80% of the samples generated by the Monte Carlo simulation. For me, it was counterintuitive because I expected a 95% uncertainty interval.

We can set the width of the interval using the “interval_width” parameter. If we set it to 0.95, the generated uncertainty interval is going to be enormous ;)

m = Prophet(uncertainty_samples = 100, interval_width = 0.95)
m.fit(subset)
future = m.make_future_dataframe(periods=72, freq="H")
forecast = m.predict(future)
fig1 = m.plot(forecast)
uncertainty_samples = 100 and interval_width = 0.95
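
To see why the wider setting produces a much larger interval, here is a small sketch in plain Python. It is an illustration under stated assumptions, not Prophet's code: the helper `interval_from_samples` is hypothetical, the simulated values are Gaussian noise around a made-up forecast, and a central interval of width w is assumed to lie between the (1 - w) / 2 and (1 + w) / 2 quantiles of the samples.

```python
import random


def interval_from_samples(samples, interval_width):
    """Central interval covering `interval_width` of the samples (hypothetical helper)."""
    ordered = sorted(samples)
    lower_q = (1.0 - interval_width) / 2.0
    upper_q = (1.0 + interval_width) / 2.0

    def quantile(q):
        # Linear interpolation between the two nearest ordered samples.
        k = (len(ordered) - 1) * q
        lo = int(k)
        hi = min(lo + 1, len(ordered) - 1)
        return ordered[lo] + (ordered[hi] - ordered[lo]) * (k - lo)

    return quantile(lower_q), quantile(upper_q)


# Toy samples: a made-up point forecast of 6400 with Gaussian noise.
rng = random.Random(0)
samples = [rng.gauss(6400.0, 150.0) for _ in range(1000)]

low80, high80 = interval_from_samples(samples, 0.80)
low95, high95 = interval_from_samples(samples, 0.95)
print(f"80% interval width: {high80 - low80:.0f}")
print(f"95% interval width: {high95 - low95:.0f}")
```

The 95% interval has to reach into the tails of the simulated distribution, so it always contains the 80% interval. For normally distributed samples it is roughly one and a half times as wide.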

What else affects the uncertainty interval?
Seasonality. By default, Prophet does not estimate the uncertainty of the seasonality components. It does so only when we set the “mcmc_samples” parameter to a value greater than zero, which makes Prophet fit the model with full Bayesian sampling (and makes fitting noticeably slower). We see the difference in both the forecast plot and the plot of the components.

m = Prophet(uncertainty_samples = 100, mcmc_samples=100)
m.fit(subset)
future = m.make_future_dataframe(periods=72, freq="H")
forecast = m.predict(future)
fig1 = m.plot(forecast)
uncertainty_samples = 100, mcmc_samples=100
fig2 = m.plot_components(forecast)
uncertainty_samples = 100, mcmc_samples=100
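
Why does the seasonality get an interval only when the model is sampled? A rough intuition, sketched with a toy model rather than Prophet's code: with a single point estimate of the seasonal coefficient there is only one seasonal curve, so its lower and upper bounds coincide, whereas a set of sampled coefficients yields many curves whose percentiles form an interval. The sinusoidal daily seasonality, the coefficient values, and all function names below are assumptions made for illustration.

```python
import math
import random


def seasonal_effect(hour, amplitude):
    """Toy daily seasonality: a single sine wave with a 24-hour period."""
    return amplitude * math.sin(2.0 * math.pi * hour / 24.0)


def seasonal_bounds(hour, amplitudes, lower_q=0.10, upper_q=0.90):
    """Lower and upper bounds of the seasonal effect over a set of coefficients.

    Uses the nearest ordered value instead of interpolation to keep the sketch short.
    """
    effects = sorted(seasonal_effect(hour, a) for a in amplitudes)
    last = len(effects) - 1
    return effects[round(last * lower_q)], effects[round(last * upper_q)]


# One point estimate of the coefficient: a single seasonal curve.
point_estimate = [20.0]

# Many draws of the coefficient, as a sampler would produce (toy values).
rng = random.Random(1)
sampled_coefficients = [rng.gauss(20.0, 5.0) for _ in range(100)]

hour = 6  # the peak of the toy sine wave
point_bounds = seasonal_bounds(hour, point_estimate)
sampled_bounds = seasonal_bounds(hour, sampled_coefficients)
print("point estimate:      ", point_bounds)
print("sampled coefficients:", sampled_bounds)
```

With the single estimate both bounds are the same number, while the sampled coefficients give a band around the seasonal curve, similar to the one that appears in the components plot above.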

Bartosz Mikulski

  • MLOps engineer by day
  • AI and data engineering consultant by night
  • Python and data engineering trainer
  • Conference speaker
  • Contributed a chapter to the book "97 Things Every Data Engineer Should Know"
  • Twitter: @mikulskibartosz
  • Mastodon: @mikulskibartosz@mathstodon.xyz