Box and whiskers plot
A box and whiskers plot shows the dispersion and skewness of data at a glance.
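import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.pyplot import boxplot

# Load the Titanic dataset and drop every row that has a missing value
data = sns.load_dataset('titanic')
data = data.dropna()

# Draw a single box for the age column with the default whisker settings
boxplot(data['age'], labels = ['age'])
plt.title("Titanic passengers' age: box and whiskers")
plt.show()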
The plot consists of 3 elements:

The line inside the rectangle indicates the median of the data.

The rectangle shows the interquartile range (IQR). Its lower edge is placed at the 25th percentile (1st quartile, Q1). The upper edge is at the 75th percentile (3rd quartile, Q3).

The T-shaped lines are the whiskers. By default, the lower whisker runs from Q1 down to the smallest value that is not below Q1 - 1.5 * IQR, and the upper whisker runs from Q3 up to the largest value that is not above Q3 + 1.5 * IQR.
In this plot, the whiskers end at the minimum and maximum values, because no age falls outside those limits.
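The whisker rule can be checked by hand. The following is a minimal sketch, assuming NumPy is installed; the ages array holds made-up values rather than the Titanic data, and whisker_limits is a hypothetical helper name. It computes the quartiles, the IQR, and the points where the whiskers would end for a given factor (1.5 by default).

```python
import numpy as np

def whisker_limits(values, whis=1.5):
    """Return the (lower, upper) whisker ends for the given IQR factor."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    # Fences: the furthest a whisker is allowed to reach
    low_fence = q1 - whis * iqr
    high_fence = q3 + whis * iqr
    # A whisker stops at the last actual data point inside the fence
    inside = values[(values >= low_fence) & (values <= high_fence)]
    return inside.min(), inside.max()

# Made-up ages, used only for illustration
ages = np.array([2, 19, 24, 29, 35, 36, 41, 48, 54, 80])
lower, upper = whisker_limits(ages)
outliers = ages[(ages < lower) | (ages > upper)]
print(lower, upper, outliers)
```

With these sample values the upper whisker stops before the largest value, so that value would be drawn as an outlier.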
Outliers
If we shorten the whiskers to 1 * IQR by passing whis = 1, a new element appears in the plot: circles that mark the outliers, which are the values beyond the whiskers.
from matplotlib.pyplot import boxplot
import matplotlib.pyplot as plt
# whis = 1 limits each whisker to 1 * IQR beyond the quartiles
boxplot(data['age'], whis = 1, labels = ['age'])
plt.title("Titanic passengers' age: box and whiskers")
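To see which values end up as circles without reading them off the plot, we can ask matplotlib for the numbers it uses to draw the box. This is a minimal sketch, assuming made-up ages instead of the Titanic data; boxplot_stats from matplotlib.cbook accepts the same whis argument as boxplot.

```python
import numpy as np
from matplotlib.cbook import boxplot_stats

# Made-up ages, used only for illustration
ages = np.array([2, 19, 24, 29, 35, 36, 41, 48, 54, 80], dtype=float)

# boxplot_stats returns one dictionary per dataset
stats = boxplot_stats(ages, whis=1)[0]

print("whiskers:", stats['whislo'], stats['whishi'])
print("outliers:", stats['fliers'])
```

The fliers entry holds the values that the plot draws as circles.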
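We can also limit the whiskers to given percentiles by passing a pair of numbers as whis. The plot then displays values lower than the first percentile or larger than the second one as outliers. For example, whis = [5, 95] marks values below the 5th percentile and above the 95th percentile.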
from matplotlib.pyplot import boxplot
import matplotlib.pyplot as plt
# whis = [5, 95] places the whisker limits at the 5th and 95th percentiles
boxplot(data['age'], whis = [5, 95], labels = ['age'])
plt.title("Titanic passengers' age: box and whiskers")
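The percentile limits are easy to reproduce with NumPy. This is a minimal sketch, assuming made-up ages instead of the Titanic data; it lists the values that fall outside the 5th to 95th percentile range and would therefore be shown as outliers.

```python
import numpy as np

# Made-up ages, used only for illustration
ages = np.array([2, 19, 24, 29, 35, 36, 41, 48, 54, 80], dtype=float)

# The same pair of percentiles that was passed as whis
low, high = np.percentile(ages, [5, 95])

# Everything outside the percentile range counts as an outlier
outliers = ages[(ages < low) | (ages > high)]
print(low, high, outliers)
```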
You can also follow me on Twitter: @mikulskibartosz