Minkowski Distance Explained: Understanding Distance Metrics in Machine Learning

Sometimes we want to measure how much things are similar to each other or how different they are. It happens not only when we use algorithms like k-NN classification or clustering.

Table of Contents

Minkowski distance
1. Normed vector space
Minkowski distance - requirements
Minkowski distance types

When we measure the performance of any other machine learning algorithm or neural network which returns a complex value that can be “partially correct.” In those cases, we want to know how close the result is to the correct answer.

In this article, I am going to explain a few distance metrics. First, I am going to start with metrics based on Minkowski distance because we all understand them intuitively. In the upcoming articles, I will also show you how to measure the “distance” between sets of values and distance between sequences.

Minkowski distance

When we think about distance, we usually imagine distances between cities. That is the most intuitive understanding of the distance concept. Fortunately, this example is perfect for explaining the constraints of Minkowski distances.

Normed vector space

We can calculate Minkowski distance only in a normed vector space, which is a fancy way of saying: “in a space where distances can be represented as a vector that has a length.”

Let’s start by proving that a map is a vector space. If we take a map, we see that distances between cities are normed vector space because we can draw a vector that connects two cities on the map. We can combine multiple vectors to create a route that connects more than two cities. Now, the adjective “normed.” It means that the vector has its length and no vector has a negative length. That constraint is met too because if we draw a line between cities on the map, we can measure its length.

Minkowski distance - requirements

The zero vector, 0, has zero length; every other vector has a positive length. If we look at a map, it is obvious. The distance from a city to the same city is zero because we don’t need to travel at all. The distance from a city to any other city is positive because we can’t travel -20 km.
Multiplying a vector by a positive number changes its length without changing its direction We traveled 50 km North. If we travel 50 km more in the same direction, we will end up 100 km North. The direction does not change. Easy, isn’t it?
The shortest distance between any two points is a straight line (this is called Triangle inequality). I believe it is self-explanatory.

Minkowski distance types

There is only one equation for Minkowski distance, but we can parameterize it to get slightly different results.

\[D\left(X,Y\right)=\left(\sum_{i=1}^n |x_i-y_i|^p\right)^{1/p}\]

Manhattan distance

It is the sum of absolute differences of all coordinates. It is a perfect distance measure for our example. When we can use a map of a city, we can give direction by telling people that they should walk/drive two city blocks North, then turn left and travel another three city blocks. In total they will travel five city blocks, that is the Manhattan distance between the starting point and their destination.

\[D\left(X,Y\right)=\sum_{i=1}^n |x_i-y_i|\]

Euclidean distance

If we look again at the city block example used to explain the Manhattan distance, we see that the traveled path consists of two straight lines. When we draw another straight line that connects the starting point and the destination, we end up with a triangle. In this case, the distance between the points can be calculated using the Pythagorean theorem.

\[D\left(X,Y\right)=\sqrt{\sum_{i=1}^n (x_i-y_i)^2}\]

Chebyshev distance

It is the extreme case of Minkowski distance. When we use infinity as the value of the parameter p, we end up with a metric that defines distance as the maximal absolute difference between coordinates:

\[D_{\rm Chebyshev}(x,y) := \max_i(|x_i -y_i|)\]

I wondered how it is used in practice and I found one example. In a warehouse, the distance between locations can be represented as Chebyshev distance if an overhead crane is used because the crane moves on both axes at the same time with the same speed.