Using Exponentially Weighted Moving Average for anomaly detection

In this article, I am going to describe how to use an exponentially weighted moving average for anomaly detection. It certainly is one of the dullest methods to do it, but in some cases, the moving average may be enough.

The method works well if we can make two assumptions about data:

  • The values are Gaussian distributed around the mean.

  • There is no seasonality.

In my example, I used the daily number of visitors on my website. I had to split the data into workday and weekend data because of the huge difference in values.

First, I have to choose the length of the moving window. A longer window is less susceptible to changes in values and may detect more outliers. Possibly some of them may be false positives.

A shorter moving window adapts quickly to changing values, which in consequence may cause missed alarms afterward.

I decided to use a three-day window. Now, I have to calculate the moving average and moving standard deviation. In addition to that, I should determine what constitutes to be an outlier. In this example, an outlier is a value that differs by more than one standard deviation from the mean.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
import pandas as pd
import matplotlib.pyplot as plt

window_size = 3
mean = work_days['Users'].ewm(window_size).mean()
std = work_days['Users'].ewm(window_size).std()
std[0] = 0 #the first value turns into NaN because of no data

mean_plus_std = mean + std
mean_minus_std = mean - std

is_outlier = (work_days['Users'] > mean_plus_std) | (work_days['Users'] < mean_minus_std)
outliers = work_days[is_outlier]

plt.plot(work_days['Users'], c = 'b', label = 'Actual Values')
plt.plot(mean, c = 'r', label = 'Exponentially Weighted Moving Average')
plt.plot(mean_plus_std, 'k--', label = 'Prediction Bounds')
plt.plot(mean_minus_std, 'k--')
plt.scatter(outliers.index, outliers['Users'], c = 'r', marker = 'o', s = 120, label = 'Outliers')
plt.legend()

Did you enjoy reading this article?
Would you like to learn more about leveraging AI to drive growth and innovation, software craft in data engineering, and MLOps?

Subscribe to the newsletter or add this blog to your RSS reader (does anyone still use them?) to get a notification when I publish a new essay!

Newsletter

Do you enjoy reading my articles?
Subscribe to the newsletter if you don't want to miss the new content, business offers, and free training materials.

Bartosz Mikulski

Bartosz Mikulski

  • MLOps engineer by day
  • AI and data engineering consultant by night
  • Python and data engineering trainer
  • Conference speaker
  • Contributed a chapter to the book "97 Things Every Data Engineer Should Know"
  • Twitter: @mikulskibartosz
  • Mastodon: @mikulskibartosz@mathstodon.xyz
Newsletter

Do you enjoy reading my articles?
Subscribe to the newsletter if you don't want to miss the new content, business offers, and free training materials.