Prophet plot explained
When I looked at a plot generated by Prophet for the first time, I was lost. I started looking for an explanation in the documentation. There was none. Great, so what now? I googled it. Now I was sure that I was the only person who did not understand the plot, because I could not find anything (not even a StackOverflow question or someone asking for an explanation).
I had two options: give up or start digging in the source code. Fortunately, once you look at the source code of Prophet’s plot function, everything becomes obvious.
Let’s begin at the beginning ;) The documentation uses a time series of the log of daily page views of the Wikipedia page for Peyton Manning as the input dataset.
In the first step, I am going to download the dataset and draw it as a line plot.
```python
import fbprophet  # Prophet >= 1.0 is installed and imported as `prophet` instead
import matplotlib.pyplot as plt
import pandas as pd

# Download the example dataset (in a notebook, `!curl -O <url>` works too)
url = "https://raw.githubusercontent.com/facebook/prophet/master/examples/example_wp_log_peyton_manning.csv"
data = pd.read_csv(url)
data["ds"] = pd.to_datetime(data["ds"])

# Plot the raw time series as a line plot
fig = plt.figure(facecolor='w', figsize=(10, 6))
plt.plot(data.ds, data.y)
```
In the picture, I cannot spot the individual data points. All I have is a weird broad blue line. It is not an error! It looks like this because the dataset contains thousands of daily observations, and they get plotted so close to each other that the line segments overlap. That observation is going to be important later ;)
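A quick way to convince yourself is to compare the number of points with the width of the plotting area in pixels. The sketch below uses a made-up daily series as a stand-in for the Peyton Manning data (the values are invented; only the number of rows matters) and the same 10x6 inch figure. When several points share one horizontal pixel, the segments of the line pile on top of each other and merge into a band.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the dataset: eight years of daily values (made-up numbers)
rng = np.random.default_rng(0)
ds = pd.date_range("2008-01-01", "2015-12-31", freq="D")
seasonal = np.sin(2 * np.pi * ds.dayofyear.to_numpy() / 365.25)
data = pd.DataFrame({"ds": ds, "y": 8 + seasonal + rng.normal(0, 0.5, len(ds))})

fig = plt.figure(facecolor="w", figsize=(10, 6))
ax = fig.add_subplot(111)
ax.plot(data.ds, data.y)

# Width of the plotting area in pixels vs. the number of points drawn across it
width_px = ax.get_window_extent().width
points_per_pixel = len(data) / width_px
print(f"{len(data)} points over ~{width_px:.0f} px: {points_per_pixel:.1f} points per pixel")
```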
In the second step, I am going to fit a Prophet model to the data and generate the prediction. When the prediction is ready, I will plot it using Prophet’s plot function:
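```python
model = fbprophet.Prophet()
model.fit(data)  # the training data is kept in model.history

# Training dates plus 365 future days
future = model.make_future_dataframe(periods=365)
forecast = model.predict(future)

plot = model.plot(forecast)  # returns a Matplotlib figure
```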
When I looked at that for the first time, I could not understand anything. What is the dark blue area? Is it the uncertainty interval? What is the light blue area? Why do I see the black dots on the plot?
Now it is time to look at the source code and run the function. Here is the source code of the plot function:
```python
from matplotlib import pyplot as plt


def plot(
    m, fcst, ax=None, uncertainty=True, plot_cap=True, xlabel='ds', ylabel='y',
    figsize=(10, 6)
):
    # Create a new figure unless the caller passed existing axes
    if ax is None:
        fig = plt.figure(facecolor='w', figsize=figsize)
        ax = fig.add_subplot(111)
    else:
        fig = ax.get_figure()
    fcst_t = fcst['ds'].dt.to_pydatetime()
    # Training data as black dots
    ax.plot(m.history['ds'].dt.to_pydatetime(), m.history['y'], 'k.')
    # Point forecast as a blue line
    ax.plot(fcst_t, fcst['yhat'], ls='-', c='#0072B2')
    # Dashed cap/floor lines, only if those columns exist
    if 'cap' in fcst and plot_cap:
        ax.plot(fcst_t, fcst['cap'], ls='--', c='k')
    if m.logistic_floor and 'floor' in fcst and plot_cap:
        ax.plot(fcst_t, fcst['floor'], ls='--', c='k')
    # Uncertainty interval as a translucent band
    if uncertainty:
        ax.fill_between(fcst_t, fcst['yhat_lower'], fcst['yhat_upper'],
                        color='#0072B2', alpha=0.2)
    ax.grid(True, which='major', c='gray', ls='-', lw=1, alpha=0.2)
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)
    fig.tight_layout()
    return fig
```
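The function reads only a handful of columns: ‘ds’ and ‘y’ from the history, and ‘ds’, ‘yhat’, ‘yhat_lower’ and ‘yhat_upper’ from the forecast. So the same picture can be rebuilt without Prophet. The sketch below is a simplified re-implementation under that assumption, not Prophet’s code: ‘plot_forecast’ is a hypothetical name, it skips the cap/floor lines, it feeds made-up data, and the added ‘label’ arguments let a legend name each element.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_forecast(history: pd.DataFrame, fcst: pd.DataFrame, figsize=(10, 6)):
    """Hypothetical, simplified version of Prophet's plot (no cap/floor lines)."""
    fig = plt.figure(facecolor="w", figsize=figsize)
    ax = fig.add_subplot(111)
    fcst_t = fcst["ds"].to_numpy()
    # Black dots: the observations used for training
    ax.plot(history["ds"].to_numpy(), history["y"], "k.", label="observed")
    # Blue line: the point forecast
    ax.plot(fcst_t, fcst["yhat"], ls="-", c="#0072B2", label="yhat")
    # Translucent band: the uncertainty interval
    ax.fill_between(fcst_t, fcst["yhat_lower"], fcst["yhat_upper"],
                    color="#0072B2", alpha=0.2, label="yhat_lower to yhat_upper")
    ax.grid(True, which="major", c="gray", ls="-", lw=1, alpha=0.2)
    ax.set_xlabel("ds")
    ax.set_ylabel("y")
    ax.legend()
    fig.tight_layout()
    return fig


# Made-up data: 60 observed days, forecast covering them plus 30 future days
rng = np.random.default_rng(1)
ds = pd.date_range("2020-01-01", periods=90, freq="D")
yhat = np.linspace(5, 6, len(ds))
fcst = pd.DataFrame({"ds": ds, "yhat": yhat,
                     "yhat_lower": yhat - 0.5, "yhat_upper": yhat + 0.5})
history = pd.DataFrame({"ds": ds[:60], "y": yhat[:60] + rng.normal(0, 0.3, 60)})
fig = plot_forecast(history, fcst)
```

With only 90 days, the dots, the line and the band are all clearly separate, which is exactly what gets lost when thousands of days are drawn.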
Let’s run it step by step. I have not specified the ‘ax’ parameter, so the function creates a new figure with a single subplot:
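```python
# Default parameter values of the plot function
figsize = (10, 6)
xlabel = 'ds'
ylabel = 'y'

# No `ax` was passed, so create a new figure with one subplot
fig = plt.figure(facecolor='w', figsize=figsize)
ax = fig.add_subplot(111)
```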
In the next step, it plots black dots, which are the data points used to train the model (stored in the model’s ‘history’ attribute).
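```python
# `fcst` is the plot function's name for our forecast DataFrame
fcst = forecast
fcst_t = fcst['ds'].dt.to_pydatetime()

# Black dots: the observed data stored in model.history
ax.plot(model.history['ds'].dt.to_pydatetime(), model.history['y'], 'k.')
```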
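The third argument, ‘k.’, is a Matplotlib format string: ‘k’ means black and ‘.’ means a point marker. Because it contains no line style, no connecting line is drawn, so the training data appear as loose dots. A small check on made-up values, with the equivalent keyword form for comparison:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line in a notebook
import matplotlib.colors as mcolors
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
# 'k.' = black color + point marker; no line style given, so no connecting line
(dots,) = ax.plot([1, 2, 3], [2.0, 1.5, 2.5], "k.")
# The same style written with keyword arguments
(same,) = ax.plot([1, 2, 3], [2.0, 1.5, 2.5], color="k", marker=".", linestyle="None")

print(mcolors.to_hex(dots.get_color()), dots.get_marker(), dots.get_linestyle())
```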
The next line plots the prediction (‘yhat’) as a blue line.
```python
ax.plot(fcst_t, fcst['yhat'], ls='-', c='#0072B2')
```
Once again, it was supposed to be a line plot, but it looks like a weird wide blue area.
At the beginning of this blog post, I displayed a plot of the input data. When I scroll back and compare the two plots, it is apparent that the forecast plot looks like this for the same reason: there are so many data points.
It is a line plot, but thousands of points are squeezed into a narrow plot area, so neighboring line segments overlap and merge into a solid band. The dark blue area is the prediction, not the uncertainty interval!
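To check that the dark blue area really is a single line, zoom into a short window with ‘set_xlim’. Since Prophet’s plot function returns the Matplotlib figure, the same call should work on the axes of the figure returned by ‘model.plot(forecast)’ (for example ‘plot.gca().set_xlim(...)’). The sketch below uses a made-up daily forecast with an assumed weekly wiggle, so it runs without Prophet.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up dense forecast: four years of daily values with a yearly and a weekly wiggle
ds = pd.date_range("2012-01-01", "2015-12-31", freq="D")
yearly = np.sin(2 * np.pi * ds.dayofyear.to_numpy() / 365.25)
weekly = 0.3 * np.sin(2 * np.pi * ds.dayofweek.to_numpy() / 7)
yhat = 8 + yearly + weekly

fig, (ax_all, ax_zoom) = plt.subplots(2, 1, facecolor="w", figsize=(10, 6))
for ax in (ax_all, ax_zoom):
    ax.plot(ds, yhat, ls="-", c="#0072B2")

# Limit the lower panel to one month: the "band" turns back into a zigzag line
start, end = pd.Timestamp("2015-06-01"), pd.Timestamp("2015-06-30")
ax_zoom.set_xlim(start, end)
visible = int(((ds >= start) & (ds <= end)).sum())
print(f"{len(ds)} points in the top panel, {visible} inside the zoomed window")
fig.tight_layout()
```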
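What happens after that? I have not specified the ‘cap’ and the ‘floor’ (which are used for logistic growth), so the function skips the dashed black lines and plots the uncertainty interval: the light blue area between ‘yhat_lower’ and ‘yhat_upper’. By default, Prophet computes it as an 80% interval.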
```python
ax.fill_between(fcst_t, fcst['yhat_lower'], fcst['yhat_upper'],
                color='#0072B2', alpha=0.2)
```
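‘fill_between’ shades the vertical gap between two curves. The band uses the same color as the prediction line, but ‘alpha=0.2’ makes it only 20% opaque, which is why it looks light blue next to the opaque, dense line. A minimal sketch on made-up bounds (the fixed width of 0.5 is an arbitrary assumption, not how Prophet computes its interval):

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line in a notebook
import matplotlib.colors as mcolors
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(30)
yhat = 0.1 * x  # made-up point forecast
yhat_lower, yhat_upper = yhat - 0.5, yhat + 0.5  # made-up interval bounds

fig, ax = plt.subplots()
ax.plot(x, yhat, ls="-", c="#0072B2")  # opaque line
band = ax.fill_between(x, yhat_lower, yhat_upper, color="#0072B2", alpha=0.2)  # translucent band

# The band keeps the line's RGB color but is only 20% opaque
r, g, b, a = band.get_facecolor()[0]
print(mcolors.to_hex((r, g, b)), a)
```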
Finally, it draws the grid and labels the axes:
```python
ax.grid(True, which='major', c='gray', ls='-', lw=1, alpha=0.2)
ax.set_xlabel(xlabel)
ax.set_ylabel(ylabel)
fig.tight_layout()
```
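Like most plots, the Prophet forecast plot gets easier to read when you look at its parts separately: the black dots are the observed data, the dark blue line is the prediction, and the light blue band is the uncertainty interval ;)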