Knuth–Morris–Pratt (KMP) Pattern Matching Substring Search - First Occurrence Of Substring

preview_player
Показать описание
📹 Intuitive Video Explanations
🏃 Run Code As You Learn
💾 Save Progress
❓New Unseen Questions
🔎 Get All Solutions

Question: Given a string s and a pattern p, determine if the pattern exists in the string. Return the index of where the first occurrence starts.

The Brute Force

The naive approach to solving this is looking in s for the first character in p.

If a match is found we begin a search from that index, call it i (for intersect).

We compare the 2nd character of p with index i + 1 of s
We compare the 3rd character of p with index i + 2 of s

...and so on until we have matched to all of p without either having overrun s or having found a mismatch between characters being compared.

We can then return i as our answer.

It doesn’t work well in cases where we see many matching characters followed by a mismatching character.

Complexities:

Time: O( len(s) * len(p) )

In a simple worst case we can have
s = "aaaaaab"
p = "aaab"

The problem is that for each first character match we have the potential to naively go into a search on a string that would never yield a correct answer repeatedly.

Other Algorithms

There are three linear time string matching algorithms: KMP (nuth–Morris–Pratt), Boyer-Moore, and Rabin-Karp.

Of these, Rabin-Karp is by far the simplest to understand and implement

Analysis

The time complexity of the KMP algorithm is O(len(s) + len(p)) "linear" in the worst case.

The key behind KMP is that it takes advantage of the succesful character checks during an unsuccessful pattern comparison subroutine.

We may have a series of many comparisons that succeed and then even if one fails at the end, we should not repeat the comparison work done since we already saw that a series matched.

What we will do is very similar to the naive algorithm, it is just that we save comparisons by tracking the longest propert prefixes of pattern that are also suffixes.

The key is that everytime we have a mismatch we try our best to prevent going backwards in s and repeating comparisons.

Algorithm Steps

We will preprocess the pattern string and create an array that indicates the longest proper prefix which is also suffix at each point in the pattern string.

A proper prefix does not include the original string.

For example, prefixes of “ABC” are “”, “A”, “AB” and “ABC”. Proper prefixes are “”, “A” and “AB”.

For example, suffixes of "ABC" are, "", "C", "BC", and "ABC". Proper prefixes are "", "C", and "BC".

Why do we care about these??

We know all characters behind our mismatch character match.

If we can find the length of the longest prefix that matches a suffix to that point, we can skip len(prefix) comparisons at the beginning.

The key reason we care about the prefix to suffix is because we want to "teleport" back to as early in the string to the point that we still know that there is a match.

Our goal is to minimize going backwards in our search string.

Complexities:

Time: O( len(p) + len(s) )

We spend len(p) time to build the prefix-suffix table and we spend len(s) time for the traversal on average.

Space: O( len(p) )

Our prefix-suffix table is going to be the length of the pattern string.

++++++++++++++++++++++++++++++++++++++++++++++++++

++++++++++++++++++++++++++++++++++++++++++++++++++

This question is number 7.13 in "Elements of Programming Interviews" by Adnan Aziz (they do Rabin-Karp but same problem, different algorithm)
Рекомендации по теме
Комментарии
Автор

Table of Contents: (I'm literally screaming. I'm sorry. I want to redo this video to make it better.)

Introducing The Creators The KMP Algorithm* 0:00 - 0:15
Problem Introduction With The Naive Approach 0:15 - 2:30
Why The Naive Approach Is Not Good 2:30 - 2:45
Walkthrough How The Algorithm Works 2:45 - 8:14
Building The Suffix-Proper-Prefix Table 8:14 - 12:34
Using The Suffix-Proper-Prefix Table In Traversal Walkthrough 12:34 - 15:08
Time & Space For The Table Build Step 15:08 - 15:51
Time & Space For The Traversal Step 15:51 - 16:08
Overall Time & Space 16:08 - 16:25
Summarizing Our Learnings 16:25 - 16:40
Wrap Up (Begging For More Subs) 16:40 - 17:05

*All 3 of the creators published the final paper together.

BackToBackSWE
Автор

An algorithm invented by 3 people and they expect us to come up with this in 45 min. Makes sense

ahmedsyed
Автор

BRO STOP YELLING AT ME I ALREADY SUBBED I KNOW YOU'RE GOOD

dimejifaluyi
Автор

You're the best explainer of algorithms I've ever seen. The extra very visible colored text on the screen while you're explaining also makes a huge difference.

nhk_kakin_futuremvrcreator
Автор

I never knew I needed this kind of teaching which makes me feel like im scolded in order to understand why a box is incremented.

shinra
Автор

I trust This dude's Teaching skills with my career, He's so CLEAR, To the Point and explains in Simple Language that no YouTube Channel Does that. Absolutely Loved it.

akaraze
Автор

people commenting "stop yelling" and all I see is the passion with which you are trying to teach us in the simplest way possible. I came here after watching 2-3 vids but your video cleared all my doubts. Thank you for making this video.

tanmaymalhotra
Автор

Now that's a cool explanation of a seemingly difficult concept. The thing is you cannot watch this video without some sort of preparation. You have to understand the limitations of the naive approach and then think of a logical method/alternative to overcome that. Only after that, when you watch the video and implement the algo, would it become clear to you...


Good job with the video, guys!!

rahulsaxena
Автор

after lots of videos about Knuth-Morris-Pratt algorithm i just understood how it works...when we have dismatch we just go one step behind and we look at the value of that step and thats it!this is where the i goes(index)..pfff at YOU BOY

bryanparis
Автор

Thanks, man!
This is the Best ever explanation on YouTube. I was stuck on this algorithm for hours

khushdevpandit
Автор

I had a hard time understanding KMP but you broke it down so well and I get it now. This goes for all the videos I've watch from you, thank you!

BroodWarEver
Автор

I really like your passion and felt very involved during the video
So far it’s the most approachable KMP video I’ve seen.
I vote for keeping the speed and voice as is, they allow the video to stand out in a good way

nono-ipfv
Автор

After reading a bit on Internet and then coming to your video. I got it straight into my head. Really well explained but people really need to have some background before watching such algorithms to just grasp it at one go.
Thanks man

pulkitjain
Автор

Thanks to you I'm one less algoritm away towards achieving my dream. So thanks for the good work and keep it up ;))

lamihh
Автор

Your explanations are amazing! This is the first time I fully understand the kmp algorithm. Thanks a lot! Keep up the good work ;)

sanskrititomar
Автор

if there is any topic i am searching on utube nd u have a video on it. i just simply watch it first, and then continue my research(that most of the time is not even needed, as u clear all the related concepts). thanks

anshu
Автор

Thank you from my heart.
I was about to give up on this algorithm, until I found this video, I appreciate this work so much.

hussainalramadan
Автор

Awesome explanation! Don't worry, the way you're explaining is not screaming. It is rather the enthusiasm with which you explain a problem and I have seen that enthusiasm in almost every video of yours.

Looking forward to more of your explanations. Kudos to you!

jalanshmunshi
Автор

Your explanation is the best among many YouTubers.

EdwardLawPhD
Автор

Didn't understand this during college... but you sir made me understand this within minutes... Great explanation :)

deepakbhoria