Matching Strings: An Introduction to Knuth-Morris-Pratt (KMP) Algorithm

Q: How does the KMP algorithm work?

The KMP algorithm works by precomputing a "failure function" or "longest prefix suffix" array for the pattern string. This array helps determine the number of characters to skip when a mismatch occurs during the matching process. By utilizing this information, the algorithm avoids rechecking previously matched characters.

Q: What is the time complexity of the KMP algorithm?

The time complexity of the KMP algorithm is O(n + m), where n is the length of the text string and m is the length of the pattern string. It achieves linear time complexity by avoiding unnecessary character comparisons during the matching process.

Q: Are there any limitations to the KMP algorithm?

The KMP algorithm is primarily designed for exact string matching. It does not handle approximate or fuzzy matching scenarios, where slight variations in the pattern are allowed. For approximate matching, other algorithms like the Boyer-Moore algorithm or the Rabin-Karp algorithm may be more suitable.

6 mins read2.4K Views Comment

Download as PDF

Shiksha Online

Updated on Jul 7, 2023 13:13 IST

Knuth-Morris-Pratt or KMP is a linear string matching algorithm with low time complexity of O(n+m), where n is the string length and m is the pattern length. In this article, we will what KMP algorithm is, how it works, algorithm for LPP table.

In this article, we will understand the working of the KMP string-matching algorithm. We will also solve an example using the KMP algorithm. Before starting this article, we will first understand the need for the KMP algorithm and what were the problems of the naïve string-matching algorithm.

Table of Content

Introduction
What is the Need for the KMP Algorithm
What is KMP Algorithm
Working of KMP Algorithm with an Example
Algorithm for the LLP Table
Step-by-Step KMP Algorithm

Introduction

KMP is a linear string-matching algorithm with low time complexity. It is the solution to the drawbacks of previous text-matching algorithms. You probably have used the advantages of the string-matching algorithm in your life. For example, while working on Microsoft applications like Microsoft Word, you might use the “Find” option to identify similar words. A pattern-matching algorithm supports the functionality of this Microsoft Word feature. In this article, we will understand the working of the KMP string-matching algorithm. We will also solve an example using the KMP algorithm.

To begin with, we should first understand why the KMP algorithm was needed and what the naive string-matching algorithms had problems with.

Know more about Data Structure and Algorithms

Must Check: Data Structure and Algorithms Online Courses and Certificates

Recommended online courses

Best-suited Data Structures and Algorithms courses for you

Learn Data Structures and Algorithms with these high-rated online courses

Python Foundation with Data Structures & Algorithms

Coding NinjasCertificate

4.6

Total Fees

– / –

Duration

4 months

Foundation of Data Structures

IIT BombayCertificate

5.0

Total Fees

– / –

Duration

16 weeks

Parallel Algorithm

IIT KanpurCertificate

Total Fees

Free

Duration

– / –

Programming and Data Structures

IIT KharagpurCertificate

4.4

Total Fees

– / –

Duration

– / –

DSA to Development A Complete Guide

GeeksforGeeksCertificate

Total Fees

– / –

Duration

150 hours

Data Structures & Algorithm

Coding NinjasCertificate

5.0

Total Fees

– / –

Duration

6 months

Foundation for Competitive Programming

Coding NinjasCertificate

5.0

Total Fees

– / –

Duration

4 months

Numerical Methods

NPTELCertificate

Total Fees

– / –

Duration

8 weeks

Programming for Problem Solving (KCS101/KCS201) (M)

IIT KanpurCertificate

Total Fees

₹4.24 K

Duration

6 weeks

Data Structures

Online Cell- Centre for Distance EducationCertificate

Total Fees

– / –

Duration

12 weeks

What is the Need for the KMP Algorithm?

To match a pattern and a string an application use string-matching algorithms. The simplest algorithm is the basic or naïve string-matching algorithm.

The basic algorithms have some drawbacks and are not effective enough to be used for string-matching applications.

The basic string-matching algorithms consume more time.
They use back-tracking whenever there is a mismatch in the string character and a matching pattern character.
With the increasing length of the sample pattern and original string, it is not feasible to repeat the matching process whenever there is a mismatch.
The repeated process consumes more memory and time.
Their time complexity is O(n*m), where n is the length of the text or string and m is the length of the pattern.

To overcome the problems with the naïve algorithm. There was a need for a fruitful and effective string-matching algorithm. This requirement gives birth to the KMP algorithm.

Adjacency Matrix For Graphs

An Adjacency Matrix is a method of representing graphs in matrix form. The adjacency matrix plays a vital role in describing finite graphs, making them easier to understand and compact...read more

Read Later

Understanding Adjacency List

In this article, we will discuss an adjacency list representation of its implementation with an example; we will also discuss applications of an adjacency list along with its advantages and...read more

Read Later

Difference between Stack and Queue

This article compares stacks and queues in programming. Stacks use LIFO to undo mechanisms and function calls, while queues use FIFO to manage processes, such as printing jobs or CPU...read more

Read Later

What is KMP Algorithm?

KMP is the fastest string-matching algorithm. Its full form is Knuth Morris Pratt Algorithm. The KMP algorithm got this name from the name of its inventors. In 1970, James H Morris, Vaughan Pratt, and Donald Knuth invented this linear string-matching algorithm.

The KMP is the only string-matching algorithm with a time complexity of O(n+m), where n is the string length and m is the pattern length.

In string-matching algorithms, there are two terminologies: sting or text and pattern.

String or text is the original string that is used for matching.
The pattern is the sample text which is to be matched.

All About Kruskal’s Algorithm

Kruskal’s algorithm is based on a greedy approach, whose goal is to find the shortest path in a graph with a minimum cost.

Read Later

Understanding Dijkstra’s Algorithm

Dijkstra’s algorithm or Dijkstra’s shortest path algorithm or single source shortest path algorithm is used in identifying the shortest path from starting node to a destination node in a weighted...read more

Read Later

Introduction to Bellman Ford Algorithm

Bellman Ford Algorithm is a Dynamic programming-based single source shortest path algorithm. Bellman Ford begins with a vertex and finds the shortest distance between other vertices.

Read Later

Working of KMP Algorithm with an Example

In this section, we will discuss the working of the KMP algorithm with an example. In the KMP algorithm, an index number is used to track each character of the string and pattern. A variable is used to trace the index numbers.

The matching of each character of text and pattern occurs between two strings.

There is a pie table or Longest Proper Prefix (LPP) table. For making an NPP table, there are two terms known as prefixes and suffixes in a string.

There is a pie table or Longest Proper Prefix (LPP) table. For making an NPP table, there are two terms prefixes, and suffixes in a string.

For example, in a string:

a b a b

The prefix for the above string can be: a, ab, or aba. The prefix excludes the last string character.
The suffix for the above string can be: b, ba, or bab. It excludes the first character of the string.

Consider an example to make an LPP table.

String = ababcb

And the LPP table is shown below:

Index value: 1 2 3 4 5

a	b	a	b	d
0	0	1	2	0

Sample string

LLP values

We check that, in the string the first “a” comes for the first time so its value is 0.
“b” comes the first time in the string so its value is 0.
The next character in the string is “a” it is already present with an index value of 1 so its value will be 1.
The next character “b” is repeating and it is present at index value 2, so its value will be 2.
String character “c” comes the first time in the string, so its value will be 0.

This is how you can make an LPP table by using the same index values for repeating characters.

All About Floyd-Warshall Algorithm

Floyd Warshall is a dynamic programming algorithm used to solve all pair shortest path problems. Dynamic Programming is an approach used in data structure and algorithms for solving problems efficiently,...read more

Read Later

Introduction to Greedy Algorithm

The greedy algorithm in data science is an approach to problem-solving that chooses the best choice based on the circumstances at hand. In this article, we will briefly discuss Greedy...read more

Read Later

Breadth-First Search Algorithm

A graph traversal algorithm called breadth-first search investigates every node in the graph starting with the root node. In this article, we will briefly discuss BFS algorithm.

Read Later

Working of the Knuth Morris Pratt Algorithm

String = ababcabcab

We take two variables “i” and “j” respectively for the index values of the string and pattern. The value of variable j is 0. With every value of variable i, the next value of j will be used for comparison.

When i = 1, j= 0.

When i =2, j = 1.

When i = 3, j = 2.

When i = 4, j = 3.

When i = 5, j = 4.

So, the value of variable i = 5 and j at 5 denotes the value 0 as per LLP table value of character d. The value of variable i will increase and it does not start from the starting point like in a naïve string-matching algorithm.
Now, i = 6, j = 0

When i = 7, j = 1.

When i = 8, j = 2.

It is a mismatch at j = 2. Looking at its value in the LLP table. The value of j = 0 and i will increase to the next value.
When i = 9, j = 0.

Comparing the next value with the “i” value.

When i =10, j = 1.

In this way, we can match strings.

Algorithm for the LLP Table

Take two variables i and j to trace the pattern.
Initially, i <- 0, and j <- 2.
Set LLP [0] = 0. The LLP value of the first character is always 0.
if pattern [i] == pattern [j+1]

LLP [q] <- j+1

if pattern [i]! = pattern[j+1]

LLP [q] <- 0

Step by Step Knuth Morris Pratt Algorithm

Take two variables x and y to trace the index values of the string and pattern respectively.
x <- 1, y <- 0, len <- length of the string.
While x < string[len]

If string [x] == pattern [y + 1]

It is a match.

x <- x+1 and y <- y+1.

else If string [x]!= pattern [y+1]

It is a mismatch.

x <- x+1 and y <- LLP[y]

Conclusion

The KMP algorithm is used in plagiarism, DNA sequencing, checking word spelling, and many more. It is the fastest string-matching algorithm with an O(m+n) time complexity. The KMP algorithm is a little harder to understand than the naïve algorithm.

Contributed By: Sonal Meenu Singh

FAQs

How does the KMP algorithm work?

The KMP algorithm works by precomputing a "failure function" or "longest prefix suffix" array for the pattern string. This array helps determine the number of characters to skip when a mismatch occurs during the matching process. By utilizing this information, the algorithm avoids rechecking previously matched characters.

What is the time complexity of the KMP algorithm?

The time complexity of the KMP algorithm is O(n + m), where n is the length of the text string and m is the length of the pattern string. It achieves linear time complexity by avoiding unnecessary character comparisons during the matching process.

Are there any limitations to the KMP algorithm?

The KMP algorithm is primarily designed for exact string matching. It does not handle approximate or fuzzy matching scenarios, where slight variations in the pattern are allowed. For approximate matching, other algorithms like the Boyer-Moore algorithm or the Rabin-Karp algorithm may be more suitable.

About the Author

Shiksha Online

This is a collection of insightful articles from domain experts in the fields of Cloud Computing, DevOps, AWS, Data Science, Machine Learning, AI, and Natural Language Processing. The range of topics caters to upski... Read Full Bio

Matching Strings: An Introduction to Knuth-Morris-Pratt (KMP) Algorithm

Introduction

Best-suited Data Structures and Algorithms courses for you

Python Foundation with Data Structures & Algorithms

Foundation of Data Structures

Parallel Algorithm

Programming and Data Structures

DSA to Development A Complete Guide

Data Structures & Algorithm

Foundation for Competitive Programming

Numerical Methods

Programming for Problem Solving (KCS101/KCS201) (M)

Data Structures

What is the Need for the KMP Algorithm?

What is KMP Algorithm?

Working of KMP Algorithm with an Example

Working of the Knuth Morris Pratt Algorithm

Algorithm for the LLP Table

Step by Step Knuth Morris Pratt Algorithm

Conclusion

FAQs

Top Picks & New Arrivals