DSA 450 Notes

Problem Statement

Given a text string and a pattern string, find all occurrences of the pattern in the text using the Rabin-Karp algorithm. This algorithm uses hashing to find pattern matches efficiently.

Example:

Input: text = "AABAACAADAABAAABAA", pattern = "AABA"
Output: [0, 9, 13] (pattern found at indices 0, 9, and 13)

Algorithm Overview

The Rabin-Karp algorithm uses a rolling hash function to compare the hash of the pattern with the hash of substrings of the text. If hashes match, we verify character by character to avoid false positives (hash collisions).

Key Concepts

Hash Function: Convert string to a numeric value
Rolling Hash: Efficiently compute hash of next window using previous hash
Modular Arithmetic: Prevent integer overflow

Approach 1: Basic Implementation

Algorithm

Calculate hash of pattern
Calculate hash of first window in text
Slide the window, updating hash using rolling hash formula
When hashes match, verify character by character

java

import java.util.*;

public class Solution {
    private static final int d = 256;  // Number of characters
    private static final int q = 101;  // Prime number
    
    public List<Integer> rabinKarp(String text, String pattern) {
        List<Integer> result = new ArrayList<>();
        int n = text.length();
        int m = pattern.length();
        
        if (m > n) return result;
        
        int patternHash = 0;
        int textHash = 0;
        int h = 1;
        
        // Calculate h = pow(d, m-1) % q
        for (int i = 0; i < m - 1; i++) {
            h = (h * d) % q;
        }
        
        // Calculate initial hash values
        for (int i = 0; i < m; i++) {
            patternHash = (d * patternHash + pattern.charAt(i)) % q;
            textHash = (d * textHash + text.charAt(i)) % q;
        }
        
        // Slide pattern over text
        for (int i = 0; i <= n - m; i++) {
            if (patternHash == textHash) {
                // Verify character by character
                boolean match = true;
                for (int j = 0; j < m; j++) {
                    if (text.charAt(i + j) != pattern.charAt(j)) {
                        match = false;
                        break;
                    }
                }
                if (match) {
                    result.add(i);
                }
            }
            
            // Calculate hash for next window
            if (i < n - m) {
                textHash = (d * (textHash - text.charAt(i) * h) + text.charAt(i + m)) % q;
                if (textHash < 0) {
                    textHash += q;
                }
            }
        }
        
        return result;
    }
}

Complexity Analysis

Time Complexity: O(n + m) average, O(n × m) worst case (many hash collisions)
Space Complexity: O(1) - excluding result storage

Rolling Hash Explanation

The hash of a string s[0..m-1] is computed as:

hash(s) = (s[0] * d^(m-1) + s[1] * d^(m-2) + ... + s[m-1] * d^0) % q

To compute hash of next window s[1..m] from s[0..m-1]:

hash(s[1..m]) = (d * (hash(s[0..m-1]) - s[0] * d^(m-1)) + s[m]) % q

This allows O(1) hash computation for each window after the first.

Approach 2: Multiple Pattern Matching

Intuition

Rabin-Karp excels when searching for multiple patterns simultaneously.

Approach 3: Using Polynomial Hash with Double Hashing

Intuition

Use two different hash functions to reduce collision probability.

Key Takeaways

Rolling Hash enables O(1) hash computation per window
Prime modulus helps distribute hash values uniformly
Spurious hits (false positives) require character-by-character verification
Multiple pattern matching is where Rabin-Karp shines
Double hashing reduces collision probability significantly
Average case O(n + m), worst case O(n × m) with many collisions
More practical than KMP for plagiarism detection and DNA sequence matching
Choice of base d and prime q affects collision rate

Comparison with Other Algorithms

| Algorithm | Preprocessing | Search | Use Case | |-----------|--------------|--------|----------| | Naive | O(1) | O(n × m) | Small texts | | Rabin-Karp | O(m) | O(n + m) avg | Multiple patterns | | KMP | O(m) | O(n) | Single pattern | | Boyer-Moore | O(m + σ) | O(n/m) best | Large alphabets |

Rabin Karp Algo

Problem Statement

Algorithm Overview

Key Concepts

Approach 1: Basic Implementation

Algorithm

Complexity Analysis

Rolling Hash Explanation

Approach 2: Multiple Pattern Matching

Intuition

Approach 3: Using Polynomial Hash with Double Hashing

Intuition

Key Takeaways

Comparison with Other Algorithms