Understanding uncertainty intervals generated by Prophet

When we forecast a value with Prophet, we get not only the point estimate but also the lower and upper bounds of an uncertainty interval. To make those bounds more useful, we should dig a little into the details and see how Prophet produces them.

Let’s look at the examples. In all of them, I use the same dataset as the input. This dataset is a collection of Bitcoin prices between 2018.10.01 and 2018.11.11. It looks like this.

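import pandas as pd
# Newer releases ship this package as "prophet": from prophet import Prophet
from fbprophet import Prophet

data = pd.read_csv('../input/bitstampUSD_1-min_data_2012-01-01_to_2018-11-11.csv')
# Convert Unix timestamps (seconds) to datetimes
data['date'] = pd.to_datetime(data['Timestamp'], unit="s")
# Prophet expects the columns "ds" (timestamp) and "y" (value)
input_data = data[["date", "Close"]].rename(columns={"date": "ds", "Close": "y"})
# Keep only the rows from 2018-10-01 onwards
subset = input_data[input_data["ds"] >= "2018-10-01"]
subset.plot(x="ds", y="y")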
Bitcoin price (in USD) between 2018.10.01 and 2018.11.11

In all cases, I am going to generate predictions of the Bitcoin price for the next 72 hours. For this dataset, that means the days between 2018.11.12 and 2018.11.14.

If we check the real Bitcoin prices on those days, we see that my predictions are very, very wrong. That is okay because this blog post is about tweaking uncertainty intervals, not about improving the accuracy of the estimates ;)

In the first example, I am going to generate predictions using the default parameters.

m = Prophet()
m.fit(subset)
future = m.make_future_dataframe(periods=72, freq="H")
forecast = m.predict(future)
fig1 = m.plot(forecast)
Forecast plot generated using default settings.

Prophet estimates the uncertainty intervals using Monte Carlo simulation. The “uncertainty_samples” parameter controls the simulation. It is the number of samples used to estimate the uncertainty interval (by default 1000).
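To make the idea concrete, here is a minimal sketch of how an interval can be read off a set of simulated values. It is not Prophet's actual code: the helper names (`percentile`, `simulated_interval`, `toy_forecast`) and the numbers 6400 and 50 are made up for illustration, and the "forecast" is just a point estimate plus Gaussian noise.

```python
import random

def percentile(sorted_values, q):
    """Linearly interpolated percentile (q in [0, 100]) of a sorted list."""
    if not sorted_values:
        raise ValueError("need at least one value")
    position = (len(sorted_values) - 1) * q / 100.0
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    weight = position - lower
    return sorted_values[lower] * (1 - weight) + sorted_values[upper] * weight

def simulated_interval(simulate_once, n_samples=1000, interval_width=0.8, seed=0):
    """Run simulate_once(rng) n_samples times and return (lower, upper) bounds."""
    rng = random.Random(seed)
    samples = sorted(simulate_once(rng) for _ in range(n_samples))
    lower_q = 100 * (1 - interval_width) / 2
    upper_q = 100 * (1 + interval_width) / 2
    return percentile(samples, lower_q), percentile(samples, upper_q)

# Hypothetical toy "forecast": a point estimate of 6400 plus Gaussian noise.
def toy_forecast(rng):
    return 6400 + rng.gauss(0, 50)

lower, upper = simulated_interval(toy_forecast)
print(f"80% interval: {lower:.1f} .. {upper:.1f}")
```

The interval is simply the range that contains the central share of the simulated values, so both the number of samples and the requested width change what we get back.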

We can reduce that number to speed up Prophet, but with fewer samples the estimated bounds of the interval have a higher variance.

Let’s see what happens when I reduce the number of samples from 1000 to 100. Look at the shape of the plotted interval.

m = Prophet(uncertainty_samples = 100)
m.fit(subset)
future = m.make_future_dataframe(periods=72, freq="H")
forecast = m.predict(future)
fig1 = m.plot(forecast)
uncertainty_samples = 100

The interval is not as smooth as the one generated using 1000 samples, because its bounds are estimated from fewer samples and therefore vary more than in the previous example.
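The effect of the sample count can be checked without Prophet. The sketch below is only an illustration under a simplifying assumption: the simulated values are standard Gaussian draws, and `upper_bound` is a hypothetical helper, not a Prophet function. It repeats the whole simulation many times and measures how much the estimated upper bound moves from run to run.

```python
import random
import statistics

def upper_bound(n_samples, rng, interval_width=0.8):
    """Estimate the upper bound of the interval from n_samples Gaussian draws."""
    samples = sorted(rng.gauss(0, 1) for _ in range(n_samples))
    index = round((n_samples - 1) * (1 + interval_width) / 2)
    return samples[index]

rng = random.Random(42)
# Repeat the whole simulation 300 times for each sample size.
spread_100 = statistics.stdev([upper_bound(100, rng) for _ in range(300)])
spread_1000 = statistics.stdev([upper_bound(1000, rng) for _ in range(300)])
print(f"spread of the bound with 100 samples:  {spread_100:.3f}")
print(f"spread of the bound with 1000 samples: {spread_1000:.3f}")
```

The bound estimated from 100 samples jumps around noticeably more than the one estimated from 1000 samples, which is what shows up as a jagged interval in the plot.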

By default, the uncertainty interval covers 80% of the samples generated by the Monte Carlo simulation. For me, that was counterintuitive because I expected a 95% uncertainty interval.

We can set the width of the interval using the “interval_width” parameter. If we set it to 0.95, the generated uncertainty interval is going to be enormous ;)

m = Prophet(uncertainty_samples = 100, interval_width = 0.95)
m.fit(subset)
future = m.make_future_dataframe(periods=72, freq="H")
forecast = m.predict(future)
fig1 = m.plot(forecast)
uncertainty_samples = 100 and interval_width = 0.95
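A bit of arithmetic shows why the wider setting grows the interval so much. The sketch below assumes the simulated values are normally distributed, which Prophet's samples need not be, so treat the numbers as a rough guide only. The helper names are hypothetical.

```python
from statistics import NormalDist

def interval_quantiles(interval_width):
    """Return the lower and upper quantile levels of a central interval."""
    lower = (1 - interval_width) / 2
    return lower, 1 - lower

def half_width_in_sigmas(interval_width):
    """Half-width of the central interval of a standard normal distribution."""
    _, upper = interval_quantiles(interval_width)
    return NormalDist().inv_cdf(upper)

for width in (0.80, 0.95):
    low, high = interval_quantiles(width)
    print(f"interval_width={width}: quantiles {low:.3f} and {high:.3f}, "
          f"half-width {half_width_in_sigmas(width):.2f} sigma")

ratio = half_width_in_sigmas(0.95) / half_width_in_sigmas(0.80)
print(f"the 95% interval is about {ratio:.2f} times wider")
```

Under that assumption, moving from 0.8 to 0.95 cuts at the 2.5th and 97.5th percentiles instead of the 10th and 90th, and the interval becomes roughly one and a half times as wide.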

What else affects the uncertainty interval?
Seasonality. By default, the uncertainty of the estimated seasonality is not taken into account, but Prophet includes it when we set the “mcmc_samples” parameter to a value greater than zero, which switches the fit to full Bayesian sampling. We see the difference in both the forecast plot and the plot of the components.

m = Prophet(uncertainty_samples = 100, mcmc_samples=100)
m.fit(subset)
future = m.make_future_dataframe(periods=72, freq="H")
forecast = m.predict(future)
fig1 = m.plot(forecast)
uncertainty_samples = 100, mcmc_samples=100
fig2 = m.plot_components(forecast)
uncertainty_samples = 100, mcmc_samples=100
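Why does sampling make a band appear around the seasonal component? The toy sketch below is my own illustration, not Prophet's model: it assumes a seasonality made of a single sine wave scaled by one coefficient, and the posterior draws are simply made-up Gaussian values. With one fitted value of the coefficient there is nothing to spread, whereas many draws give a range of possible components.

```python
import math
import random

def seasonal_component(beta, hour):
    """Toy daily seasonality: one sine wave scaled by the coefficient beta."""
    return beta * math.sin(2 * math.pi * hour / 24)

def component_interval(betas, hour, interval_width=0.8):
    """Central interval of the component at `hour` across coefficient draws."""
    values = sorted(seasonal_component(b, hour) for b in betas)
    lower_index = round((len(values) - 1) * (1 - interval_width) / 2)
    upper_index = round((len(values) - 1) * (1 + interval_width) / 2)
    return values[lower_index], values[upper_index]

rng = random.Random(0)
point_estimate = [20.0]  # a single fitted coefficient (hypothetical value)
posterior_draws = [rng.gauss(20.0, 5.0) for _ in range(100)]  # hypothetical draws

map_low, map_high = component_interval(point_estimate, hour=6)
mcmc_low, mcmc_high = component_interval(posterior_draws, hour=6)
print(f"single estimate: {map_low:.1f} .. {map_high:.1f}")
print(f"100 draws:       {mcmc_low:.1f} .. {mcmc_high:.1f}")
```

The single estimate produces an interval of zero width, while the draws produce a visible band, which mirrors the difference between the two component plots.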

Bartosz Mikulski