All about Manhattan Distance
In Machine Learning Algorithms, we use distance metrics such as Euclidean, Manhattan, Minkowski, and Hamming.
In this article, we will briefly discuss one such metric, i.e., Manhattan Distance.
Distance metrics are the foundation of many machine-learning algorithms, whether supervised (k-nearest neighbors) or unsupervised (k-means clustering).
Table of Contents
- Types of Distance Metrics in Machine Learning
- Manhattan Distance
- Example
- Properties of Manhattan Distance
- Conclusion
In k-means clustering and the k-nearest neighbors algorithm, you have to choose the value of k and decide which data points are close enough to be grouped together or treated as nearest neighbors. To measure that closeness, we use distance metrics such as Euclidean, Manhattan, Minkowski, or Hamming.
Types of Distance Metrics in Machine Learning
Four distance metrics are mainly used in Machine Learning.
Euclidean
It is one of the most common distance metrics in machine learning and calculates the distance between two real-valued vectors. It is the length of the straight line joining two points, i.e., the shortest distance between them.
Mathematical Formula
- The formula for Euclidean Distance (2-D):
$$d = \left[(x_1 - y_1)^2 + (x_2 - y_2)^2\right]^{1/2}$$
- General Formula (n-D):
$$d = \left[(x_1 - y_1)^2 + (x_2 - y_2)^2 + (x_3 - y_3)^2 + \dots + (x_n - y_n)^2\right]^{1/2}$$
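As a minimal sketch of the n-dimensional formula above, the helper below sums the squared coordinate differences and takes the square root. The function name `euclidean` and the sample points are assumptions made for illustration.

```python
from math import sqrt

def euclidean(a, b):
    """Euclidean distance between two equal-length sequences of numbers."""
    if len(a) != len(b):
        raise ValueError("both points must have the same number of dimensions")
    # square each coordinate difference, add them up, then take the square root
    return sqrt(sum((v1 - v2) ** 2 for v1, v2 in zip(a, b)))

# illustrative points (not from the article)
print(euclidean([0, 0], [3, 4]))  # 5.0
```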
Manhattan Distance
It is the sum of the absolute differences between the coordinates of two points across all dimensions.
- It calculates the distance between real-valued vectors.
- It is also called Taxicab distance or City Block Distance.
Mathematical Formula
- 2-D
$$d = |x_1 - y_1| + |x_2 - y_2|$$
- General Formula (n-D)
$$d = |x_1 - y_1| + |x_2 - y_2| + |x_3 - y_3| + |x_4 - y_4| + \dots + |x_n - y_n|$$
Minkowski Distance
It is the generalization of Euclidean and Manhattan Distance.
Mathematical Formula
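$$d = \left[|x_1 - y_1|^p + |x_2 - y_2|^p + |x_3 - y_3|^p + \dots + |x_n - y_n|^p\right]^{1/p}$$
where p is the order of the norm. Setting p = 1 gives the Manhattan distance, and p = 2 gives the Euclidean distance.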
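To see how the order p changes the result, here is a small sketch of the Minkowski formula. The function name `minkowski` and the check for p >= 1 are assumptions made for illustration, not part of any library.

```python
def minkowski(a, b, p):
    """Minkowski distance of order p (p >= 1) between two equal-length sequences."""
    if p < 1:
        raise ValueError("p must be at least 1")
    # raise each absolute difference to the power p, sum, then take the p-th root
    return sum(abs(v1 - v2) ** p for v1, v2 in zip(a, b)) ** (1 / p)

X = [1, 2, 3, 4, 5]
Y = [6, 7, 8, 9, 10]

print(minkowski(X, Y, 1))  # 25.0, the Manhattan distance
print(minkowski(X, Y, 2))  # about 11.18, the Euclidean distance
```

With p = 1 the root has no effect, so the result is the plain sum of absolute differences.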
Hamming Distance
The Hamming distance between two strings of equal length is the number of positions at which the corresponding characters or symbols differ.
- In simple terms, it is the number of substitutions required to change one string into the other.
Example:
Let there be two strings, “Naukri” and “Pujari”.
Since both strings have the same length, we can calculate the Hamming distance.
The first four positions differ (N/P, a/u, u/j, k/a), and the last two positions hold the same characters (r and i).
Hence, the Hamming distance here is 4.
Note: A larger Hamming distance implies greater dissimilarity between the two strings, and vice versa.
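The example above can be checked with a few lines of Python. This is a minimal sketch; the function name `hamming` is an assumption, and it rejects strings of unequal length because the distance is only defined for equal lengths.

```python
def hamming(s1, s2):
    """Number of positions at which two equal-length strings differ."""
    if len(s1) != len(s2):
        raise ValueError("Hamming distance requires strings of equal length")
    # compare the strings position by position and count the mismatches
    return sum(c1 != c2 for c1, c2 in zip(s1, s2))

print(hamming("Naukri", "Pujari"))  # 4
```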
Now, we will briefly discuss Manhattan Distance.
Manhattan Distance
The Manhattan distance between two points X = (x1, x2, x3, ..., xn) and Y = (y1, y2, y3, ..., yn) in n-dimensional space is the sum of the absolute differences of their coordinates in each dimension.
It is called the Manhattan distance because it is the distance a car would drive in a city (e.g., Manhattan), where the buildings are laid out in square blocks, and the straight streets intersect at right angles.
Now you also know why it is called the taxicab or city block distance.
Manhattan Distance using Python:
Calculating the Manhattan distance by defining a function
# define a function that sums the absolute coordinate differences
def manhattan(a, b):
    return sum(abs(v1 - v2) for v1, v2 in zip(a, b))

# define the points
X = [1, 2, 3, 4, 5]
Y = [6, 7, 8, 9, 10]

# calculate the distance
print(manhattan(X, Y))
Output
25
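Note: You can also calculate the Manhattan distance using the scikit-learn library of Python. Its manhattan_distances function computes pairwise distances between two sets of points, so it expects 2-D inputs; for example, manhattan_distances([X], [Y]) returns a 1 × 1 array containing 25.
from sklearn.metrics.pairwise import manhattan_distances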
Properties of Manhattan Distance
- On a grid, there are finitely many shortest paths between two points, and the length of each equals the Manhattan distance.
- In two dimensions, all points at a given Manhattan distance from a given point lie on a square centered at that point, with its diagonals along the axes.
- A grid path with a length equal to the Manhattan distance has only two permitted moves:
- Horizontal
- Vertical
- Manhattan distance is a particular case of Minkowski Distance
- For p = 1, Manhattan Distance = Minkowski Distance
- Manhattan distance is often preferred over Euclidean distance when the data is high-dimensional.
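The first two properties in the list can be checked numerically. The sketch below assumes points with integer coordinates on a 2-D grid and unit-length moves; the function names `shortest_grid_paths` and `points_at_distance` are hypothetical helpers written for this illustration.

```python
from math import comb

def shortest_grid_paths(a, b):
    """Count the shortest horizontal/vertical paths between two 2-D integer points."""
    dx = abs(a[0] - b[0])
    dy = abs(a[1] - b[1])
    # each shortest path is an ordering of dx horizontal and dy vertical unit moves
    return comb(dx + dy, dx)

def points_at_distance(center, r):
    """Integer points whose Manhattan distance from center is exactly r (r >= 0)."""
    cx, cy = center
    points = set()
    for dx in range(-r, r + 1):
        dy = r - abs(dx)  # the remaining distance must be covered vertically
        points.add((cx + dx, cy + dy))
        points.add((cx + dx, cy - dy))
    return sorted(points)

print(shortest_grid_paths((0, 0), (2, 2)))  # 6
print(points_at_distance((0, 0), 1))        # [(-1, 0), (0, -1), (0, 1), (1, 0)]
```

The points returned by the second function are the corners and edges of a square whose diagonals lie along the axes.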
Conclusion
In this article, we discussed the different types of distance metrics used in Machine Learning. We also covered Manhattan Distance in detail, with its properties and an example in Python.
Hope you liked the article.
Keep Learning!!
Keep Sharing!!
Vikram has a Postgraduate degree in Applied Mathematics, with a keen interest in Data Science and Machine Learning. He has experience of 2+ years in content creation in Mathematics, Statistics, Data Science, and Mac...