Box and whiskers plot

We can effortlessly visualize the dispersion and skewness of data using the box and whiskers plot.

1
2
3
4
5
6
7
8
9
import seaborn as sns
data = sns.load_dataset('titanic')
data = data.dropna()

from matplotlib.pyplot import boxplot
import matplotlib.pyplot as plt

boxplot(data['age'], labels = ['age'])
plt.title("Titanic passenger's age - bars and whiskers")

The plot consists of 3 elements:

  • The line inside the rectangle indicates the median of data.

  • The rectangle shows the interquartile range (IQR). Its lower edge is placed at the 25% percentile (1st quartile). The upper edge is at the 75% percentile (3rd quartile).

  • The T-shaped lines are the whiskers. Normally the range of the whiskers shows values which are between the 1st quartile (Q1) and a number (Q1 — IQR1.5). The upper whisker ends at the value = Q3 + IQR1.5.

In case of this plot, the whiskers end at the minimal and the maximal values.

Outliers

If we limit the whiskers range to 1*IQR we will see another part of the plot. The circles indicate outliers.

1
2
3
4
5
from matplotlib.pyplot import boxplot
import matplotlib.pyplot as plt

boxplot(data['age'], whis = 1, labels = ['age'])
plt.title("Titanic passenger's age - bars and whiskers")

We can also limit the whiskers to given percentiles. The plot will display value lower than the n-th percentile and larger than k-th percentile as outliers.

1
2
3
4
5
from matplotlib.pyplot import boxplot
import matplotlib.pyplot as plt

boxplot(data['age'], whis = [5, 95], labels = ['age'])
plt.title("Titanic passenger's age - bars and whiskers")

Did you enjoy reading this article?
Would you like to learn more about software craft in data engineering and MLOps?

Subscribe to the newsletter or add this blog to your RSS reader (does anyone still use them?) to get a notification when I publish a new essay!

Newsletter

Do you enjoy reading my articles?
Subscribe to the newsletter if you don't want to miss the new content, business offers, and free training materials.

Bartosz Mikulski

Bartosz Mikulski

  • Data/MLOps engineer by day
  • DevRel/copywriter by night
  • Python and data engineering trainer
  • Conference speaker
  • Contributed a chapter to the book "97 Things Every Data Engineer Should Know"
  • Twitter: @mikulskibartosz
Newsletter

Do you enjoy reading my articles?
Subscribe to the newsletter if you don't want to miss the new content, business offers, and free training materials.