KMP Algo
Problem Statement
Given a text string and a pattern string, find all occurrences of the pattern in the text using the KMP (Knuth-Morris-Pratt) algorithm. This algorithm uses the information about previously matched characters to avoid redundant comparisons.
Example:
- Input:
text = "AABAACAADAABAAABAA",pattern = "AABA" - Output:
[0, 9, 13](pattern found at indices 0, 9, and 13)
Algorithm Overview
The KMP algorithm preprocesses the pattern to create a Longest Proper Prefix which is also Suffix (LPS) array. This array helps us skip characters in the text that we know will match, avoiding backtracking.
Key Insight
When a mismatch occurs at position j in the pattern, instead of starting over from position 0, we can use the LPS array to know the longest prefix of the pattern that is also a suffix of the matched portion. We can continue matching from there.
Understanding LPS Array
The LPS array stores, for each position i in the pattern, the length of the longest proper prefix that is also a suffix of pattern[0..i].
Example: Pattern = "AABAACAABAA"
Index: 0 1 2 3 4 5 6 7 8 9 10
Pattern: A A B A A C A A B A A
LPS: 0 1 0 1 2 0 1 2 3 4 5
- LPS[0] = 0 (single char, no proper prefix)
- LPS[1] = 1 ("AA" → "A" is both prefix and suffix)
- LPS[2] = 0 ("AAB" → no match)
- LPS[4] = 2 ("AABAA" → "AA" is both prefix and suffix)
Approach 1: Naive Pattern Matching
Algorithm
For each position in text, try to match the entire pattern.
Complexity
- Time: O(n × m)
- Space: O(1)
Approach 2: KMP Algorithm (Optimal)
Step 1: Build LPS Array
public int[] computeLPS(String pattern) {
int m = pattern.length();
int[] lps = new int[m];
int len = 0;
int i = 1;
while (i < m) {
if (pattern.charAt(i) == pattern.charAt(len)) {
len++;
lps[i] = len;
i++;
} else {
if (len != 0) {
len = lps[len - 1];
} else {
lps[i] = 0;
i++;
}
}
}
return lps;
}Step 2: KMP Search
import java.util.*;
public class Solution {
public List<Integer> kmpSearch(String text, String pattern) {
List<Integer> result = new ArrayList<>();
int n = text.length();
int m = pattern.length();
if (m == 0 || m > n) return result;
int[] lps = computeLPS(pattern);
int i = 0; // Index for text
int j = 0; // Index for pattern
while (i < n) {
if (text.charAt(i) == pattern.charAt(j)) {
i++;
j++;
}
if (j == m) {
result.add(i - j);
j = lps[j - 1];
} else if (i < n && text.charAt(i) != pattern.charAt(j)) {
if (j != 0) {
j = lps[j - 1];
} else {
i++;
}
}
}
return result;
}
}Complexity Analysis
- Time Complexity: O(n + m) - O(m) for LPS computation + O(n) for search
- Space Complexity: O(m) - For the LPS array
Visual Example
Text: "AABAACAADAABAAABAA", Pattern: "AABA"
LPS for "AABA": [0, 1, 0, 1]
Step 1: Match at position 0
Text: AABAACAADAABAAABAA
Pattern: AABA
^^^^
Match found at index 0
Step 2: Using LPS, j = lps[3] = 1, continue from position 4
Text: AABAACAADAABAAABAA
Pattern: AABA
^^^^
No match, j = lps[0] = 0
... (continue)
Found at indices: 0, 9, 13
Complete Implementation
Applications of KMP
1. Check if String is Rotation of Another
2. Find Shortest Palindrome (Add characters to front)
3. Count Occurrences of Pattern
Key Takeaways
- LPS Array is the key preprocessing step
- KMP guarantees O(n + m) time complexity
- No backtracking in the text - the index
inever decreases - The pattern index
jcan decrease, but total movements are bounded - KMP is optimal for single pattern matching
- Understanding LPS is crucial - it captures the structure of the pattern
- Used in text editors, DNA sequence matching, plagiarism detection
Comparison with Other Algorithms
| Algorithm | Preprocessing | Search | Best Use Case | |-----------|--------------|--------|---------------| | Naive | O(1) | O(n × m) | Small inputs | | KMP | O(m) | O(n) | Single pattern | | Rabin-Karp | O(m) | O(n) avg | Multiple patterns | | Boyer-Moore | O(m + σ) | O(n/m) | Large alphabet | | Z-Algorithm | O(n + m) | O(n) | Alternative to KMP |