DSA 450 Notes

Problem Statement

Given a text string and a pattern string, find all occurrences of the pattern in the text using the KMP (Knuth-Morris-Pratt) algorithm. This algorithm uses the information about previously matched characters to avoid redundant comparisons.

Example:

Input: text = "AABAACAADAABAAABAA", pattern = "AABA"
Output: [0, 9, 13] (pattern found at indices 0, 9, and 13)

Algorithm Overview

The KMP algorithm preprocesses the pattern to create a Longest Proper Prefix which is also Suffix (LPS) array. This array helps us skip characters in the text that we know will match, avoiding backtracking.

Key Insight

When a mismatch occurs at position j in the pattern, instead of starting over from position 0, we can use the LPS array to know the longest prefix of the pattern that is also a suffix of the matched portion. We can continue matching from there.

Understanding LPS Array

The LPS array stores, for each position i in the pattern, the length of the longest proper prefix that is also a suffix of pattern[0..i].

Example: Pattern = "AABAACAABAA"

Index:    0 1 2 3 4 5 6 7 8 9 10
Pattern:  A A B A A C A A B A A
LPS:      0 1 0 1 2 0 1 2 3 4 5

LPS[0] = 0 (single char, no proper prefix)
LPS[1] = 1 ("AA" → "A" is both prefix and suffix)
LPS[2] = 0 ("AAB" → no match)
LPS[4] = 2 ("AABAA" → "AA" is both prefix and suffix)

Approach 1: Naive Pattern Matching

Algorithm

For each position in text, try to match the entire pattern.

Complexity

Time: O(n × m)
Space: O(1)

Approach 2: KMP Algorithm (Optimal)

Step 1: Build LPS Array

java

public int[] computeLPS(String pattern) {
    int m = pattern.length();
    int[] lps = new int[m];
    
    int len = 0;
    int i = 1;
    
    while (i < m) {
        if (pattern.charAt(i) == pattern.charAt(len)) {
            len++;
            lps[i] = len;
            i++;
        } else {
            if (len != 0) {
                len = lps[len - 1];
            } else {
                lps[i] = 0;
                i++;
            }
        }
    }
    
    return lps;
}

Step 2: KMP Search

java

import java.util.*;

public class Solution {
    public List<Integer> kmpSearch(String text, String pattern) {
        List<Integer> result = new ArrayList<>();
        int n = text.length();
        int m = pattern.length();
        
        if (m == 0 || m > n) return result;
        
        int[] lps = computeLPS(pattern);
        
        int i = 0;  // Index for text
        int j = 0;  // Index for pattern
        
        while (i < n) {
            if (text.charAt(i) == pattern.charAt(j)) {
                i++;
                j++;
            }
            
            if (j == m) {
                result.add(i - j);
                j = lps[j - 1];
            } else if (i < n && text.charAt(i) != pattern.charAt(j)) {
                if (j != 0) {
                    j = lps[j - 1];
                } else {
                    i++;
                }
            }
        }
        
        return result;
    }
}

Complexity Analysis

Time Complexity: O(n + m) - O(m) for LPS computation + O(n) for search
Space Complexity: O(m) - For the LPS array

Visual Example

Text: "AABAACAADAABAAABAA", Pattern: "AABA"

LPS for "AABA": [0, 1, 0, 1]

Step 1: Match at position 0
Text:    AABAACAADAABAAABAA
Pattern: AABA
         ^^^^
         Match found at index 0

Step 2: Using LPS, j = lps[3] = 1, continue from position 4
Text:    AABAACAADAABAAABAA
Pattern:  AABA
          ^^^^
          No match, j = lps[0] = 0

... (continue)

Found at indices: 0, 9, 13

Complete Implementation

Applications of KMP

1. Check if String is Rotation of Another

2. Find Shortest Palindrome (Add characters to front)

3. Count Occurrences of Pattern

Key Takeaways

LPS Array is the key preprocessing step
KMP guarantees O(n + m) time complexity
No backtracking in the text - the index i never decreases
The pattern index j can decrease, but total movements are bounded
KMP is optimal for single pattern matching
Understanding LPS is crucial - it captures the structure of the pattern
Used in text editors, DNA sequence matching, plagiarism detection

Comparison with Other Algorithms

| Algorithm | Preprocessing | Search | Best Use Case | |-----------|--------------|--------|---------------| | Naive | O(1) | O(n × m) | Small inputs | | KMP | O(m) | O(n) | Single pattern | | Rabin-Karp | O(m) | O(n) avg | Multiple patterns | | Boyer-Moore | O(m + σ) | O(n/m) | Large alphabet | | Z-Algorithm | O(n + m) | O(n) | Alternative to KMP |