StringProblem 19 of 43

KMP Algo

Brute Force
Time: O(n × m)
Space: O(1)
Optimal
Time: O(n + m)
Space: O(m)

Problem Statement

Given a text string and a pattern string, find all occurrences of the pattern in the text using the KMP (Knuth-Morris-Pratt) algorithm. This algorithm uses the information about previously matched characters to avoid redundant comparisons.

Example:

  • Input: text = "AABAACAADAABAAABAA", pattern = "AABA"
  • Output: [0, 9, 13] (pattern found at indices 0, 9, and 13)

Algorithm Overview

The KMP algorithm preprocesses the pattern to create a Longest Proper Prefix which is also Suffix (LPS) array. This array helps us skip characters in the text that we know will match, avoiding backtracking.

Key Insight

When a mismatch occurs at position j in the pattern, instead of starting over from position 0, we can use the LPS array to know the longest prefix of the pattern that is also a suffix of the matched portion. We can continue matching from there.


Understanding LPS Array

The LPS array stores, for each position i in the pattern, the length of the longest proper prefix that is also a suffix of pattern[0..i].

Example: Pattern = "AABAACAABAA"

Index: 0 1 2 3 4 5 6 7 8 9 10 Pattern: A A B A A C A A B A A LPS: 0 1 0 1 2 0 1 2 3 4 5
  • LPS[0] = 0 (single char, no proper prefix)
  • LPS[1] = 1 ("AA" → "A" is both prefix and suffix)
  • LPS[2] = 0 ("AAB" → no match)
  • LPS[4] = 2 ("AABAA" → "AA" is both prefix and suffix)

Approach 1: Naive Pattern Matching

Algorithm

For each position in text, try to match the entire pattern.

Complexity

  • Time: O(n × m)
  • Space: O(1)

Approach 2: KMP Algorithm (Optimal)

Step 1: Build LPS Array

java
public int[] computeLPS(String pattern) {
    int m = pattern.length();
    int[] lps = new int[m];
    
    int len = 0;
    int i = 1;
    
    while (i < m) {
        if (pattern.charAt(i) == pattern.charAt(len)) {
            len++;
            lps[i] = len;
            i++;
        } else {
            if (len != 0) {
                len = lps[len - 1];
            } else {
                lps[i] = 0;
                i++;
            }
        }
    }
    
    return lps;
}

Step 2: KMP Search

java
import java.util.*;

public class Solution {
    public List<Integer> kmpSearch(String text, String pattern) {
        List<Integer> result = new ArrayList<>();
        int n = text.length();
        int m = pattern.length();
        
        if (m == 0 || m > n) return result;
        
        int[] lps = computeLPS(pattern);
        
        int i = 0;  // Index for text
        int j = 0;  // Index for pattern
        
        while (i < n) {
            if (text.charAt(i) == pattern.charAt(j)) {
                i++;
                j++;
            }
            
            if (j == m) {
                result.add(i - j);
                j = lps[j - 1];
            } else if (i < n && text.charAt(i) != pattern.charAt(j)) {
                if (j != 0) {
                    j = lps[j - 1];
                } else {
                    i++;
                }
            }
        }
        
        return result;
    }
}

Complexity Analysis

  • Time Complexity: O(n + m) - O(m) for LPS computation + O(n) for search
  • Space Complexity: O(m) - For the LPS array

Visual Example

Text: "AABAACAADAABAAABAA", Pattern: "AABA"

LPS for "AABA": [0, 1, 0, 1]

Step 1: Match at position 0 Text: AABAACAADAABAAABAA Pattern: AABA ^^^^ Match found at index 0 Step 2: Using LPS, j = lps[3] = 1, continue from position 4 Text: AABAACAADAABAAABAA Pattern: AABA ^^^^ No match, j = lps[0] = 0 ... (continue) Found at indices: 0, 9, 13

Complete Implementation


Applications of KMP

1. Check if String is Rotation of Another

2. Find Shortest Palindrome (Add characters to front)

3. Count Occurrences of Pattern


Key Takeaways

  1. LPS Array is the key preprocessing step
  2. KMP guarantees O(n + m) time complexity
  3. No backtracking in the text - the index i never decreases
  4. The pattern index j can decrease, but total movements are bounded
  5. KMP is optimal for single pattern matching
  6. Understanding LPS is crucial - it captures the structure of the pattern
  7. Used in text editors, DNA sequence matching, plagiarism detection

Comparison with Other Algorithms

| Algorithm | Preprocessing | Search | Best Use Case | |-----------|--------------|--------|---------------| | Naive | O(1) | O(n × m) | Small inputs | | KMP | O(m) | O(n) | Single pattern | | Rabin-Karp | O(m) | O(n) avg | Multiple patterns | | Boyer-Moore | O(m + σ) | O(n/m) | Large alphabet | | Z-Algorithm | O(n + m) | O(n) | Alternative to KMP |