StringProblem 18 of 43

Rabin Karp Algo

Brute Force
Time: O(n × m)
Space: O(m)
Optimal
Time: O(n + m)
Space: O(m)

Problem Statement

Given a text string and a pattern string, find all occurrences of the pattern in the text using the Rabin-Karp algorithm. This algorithm uses hashing to find pattern matches efficiently.

Example:

  • Input: text = "AABAACAADAABAAABAA", pattern = "AABA"
  • Output: [0, 9, 13] (pattern found at indices 0, 9, and 13)

Algorithm Overview

The Rabin-Karp algorithm uses a rolling hash function to compare the hash of the pattern with the hash of substrings of the text. If hashes match, we verify character by character to avoid false positives (hash collisions).

Key Concepts

  1. Hash Function: Convert string to a numeric value
  2. Rolling Hash: Efficiently compute hash of next window using previous hash
  3. Modular Arithmetic: Prevent integer overflow

Approach 1: Basic Implementation

Algorithm

  1. Calculate hash of pattern
  2. Calculate hash of first window in text
  3. Slide the window, updating hash using rolling hash formula
  4. When hashes match, verify character by character
java
import java.util.*;

public class Solution {
    private static final int d = 256;  // Number of characters
    private static final int q = 101;  // Prime number
    
    public List<Integer> rabinKarp(String text, String pattern) {
        List<Integer> result = new ArrayList<>();
        int n = text.length();
        int m = pattern.length();
        
        if (m > n) return result;
        
        int patternHash = 0;
        int textHash = 0;
        int h = 1;
        
        // Calculate h = pow(d, m-1) % q
        for (int i = 0; i < m - 1; i++) {
            h = (h * d) % q;
        }
        
        // Calculate initial hash values
        for (int i = 0; i < m; i++) {
            patternHash = (d * patternHash + pattern.charAt(i)) % q;
            textHash = (d * textHash + text.charAt(i)) % q;
        }
        
        // Slide pattern over text
        for (int i = 0; i <= n - m; i++) {
            if (patternHash == textHash) {
                // Verify character by character
                boolean match = true;
                for (int j = 0; j < m; j++) {
                    if (text.charAt(i + j) != pattern.charAt(j)) {
                        match = false;
                        break;
                    }
                }
                if (match) {
                    result.add(i);
                }
            }
            
            // Calculate hash for next window
            if (i < n - m) {
                textHash = (d * (textHash - text.charAt(i) * h) + text.charAt(i + m)) % q;
                if (textHash < 0) {
                    textHash += q;
                }
            }
        }
        
        return result;
    }
}

Complexity Analysis

  • Time Complexity: O(n + m) average, O(n × m) worst case (many hash collisions)
  • Space Complexity: O(1) - excluding result storage

Rolling Hash Explanation

The hash of a string s[0..m-1] is computed as:

hash(s) = (s[0] * d^(m-1) + s[1] * d^(m-2) + ... + s[m-1] * d^0) % q

To compute hash of next window s[1..m] from s[0..m-1]:

hash(s[1..m]) = (d * (hash(s[0..m-1]) - s[0] * d^(m-1)) + s[m]) % q

This allows O(1) hash computation for each window after the first.


Approach 2: Multiple Pattern Matching

Intuition

Rabin-Karp excels when searching for multiple patterns simultaneously.


Approach 3: Using Polynomial Hash with Double Hashing

Intuition

Use two different hash functions to reduce collision probability.


Key Takeaways

  1. Rolling Hash enables O(1) hash computation per window
  2. Prime modulus helps distribute hash values uniformly
  3. Spurious hits (false positives) require character-by-character verification
  4. Multiple pattern matching is where Rabin-Karp shines
  5. Double hashing reduces collision probability significantly
  6. Average case O(n + m), worst case O(n × m) with many collisions
  7. More practical than KMP for plagiarism detection and DNA sequence matching
  8. Choice of base d and prime q affects collision rate

Comparison with Other Algorithms

| Algorithm | Preprocessing | Search | Use Case | |-----------|--------------|--------|----------| | Naive | O(1) | O(n × m) | Small texts | | Rabin-Karp | O(m) | O(n + m) avg | Multiple patterns | | KMP | O(m) | O(n) | Single pattern | | Boyer-Moore | O(m + σ) | O(n/m) best | Large alphabets |