Rabin Karp Algo
Problem Statement
Given a text string and a pattern string, find all occurrences of the pattern in the text using the Rabin-Karp algorithm. This algorithm uses hashing to find pattern matches efficiently.
Example:
- Input:
text = "AABAACAADAABAAABAA",pattern = "AABA" - Output:
[0, 9, 13](pattern found at indices 0, 9, and 13)
Algorithm Overview
The Rabin-Karp algorithm uses a rolling hash function to compare the hash of the pattern with the hash of substrings of the text. If hashes match, we verify character by character to avoid false positives (hash collisions).
Key Concepts
- Hash Function: Convert string to a numeric value
- Rolling Hash: Efficiently compute hash of next window using previous hash
- Modular Arithmetic: Prevent integer overflow
Approach 1: Basic Implementation
Algorithm
- Calculate hash of pattern
- Calculate hash of first window in text
- Slide the window, updating hash using rolling hash formula
- When hashes match, verify character by character
import java.util.*;
public class Solution {
private static final int d = 256; // Number of characters
private static final int q = 101; // Prime number
public List<Integer> rabinKarp(String text, String pattern) {
List<Integer> result = new ArrayList<>();
int n = text.length();
int m = pattern.length();
if (m > n) return result;
int patternHash = 0;
int textHash = 0;
int h = 1;
// Calculate h = pow(d, m-1) % q
for (int i = 0; i < m - 1; i++) {
h = (h * d) % q;
}
// Calculate initial hash values
for (int i = 0; i < m; i++) {
patternHash = (d * patternHash + pattern.charAt(i)) % q;
textHash = (d * textHash + text.charAt(i)) % q;
}
// Slide pattern over text
for (int i = 0; i <= n - m; i++) {
if (patternHash == textHash) {
// Verify character by character
boolean match = true;
for (int j = 0; j < m; j++) {
if (text.charAt(i + j) != pattern.charAt(j)) {
match = false;
break;
}
}
if (match) {
result.add(i);
}
}
// Calculate hash for next window
if (i < n - m) {
textHash = (d * (textHash - text.charAt(i) * h) + text.charAt(i + m)) % q;
if (textHash < 0) {
textHash += q;
}
}
}
return result;
}
}Complexity Analysis
- Time Complexity: O(n + m) average, O(n × m) worst case (many hash collisions)
- Space Complexity: O(1) - excluding result storage
Rolling Hash Explanation
The hash of a string s[0..m-1] is computed as:
hash(s) = (s[0] * d^(m-1) + s[1] * d^(m-2) + ... + s[m-1] * d^0) % q
To compute hash of next window s[1..m] from s[0..m-1]:
hash(s[1..m]) = (d * (hash(s[0..m-1]) - s[0] * d^(m-1)) + s[m]) % q
This allows O(1) hash computation for each window after the first.
Approach 2: Multiple Pattern Matching
Intuition
Rabin-Karp excels when searching for multiple patterns simultaneously.
Approach 3: Using Polynomial Hash with Double Hashing
Intuition
Use two different hash functions to reduce collision probability.
Key Takeaways
- Rolling Hash enables O(1) hash computation per window
- Prime modulus helps distribute hash values uniformly
- Spurious hits (false positives) require character-by-character verification
- Multiple pattern matching is where Rabin-Karp shines
- Double hashing reduces collision probability significantly
- Average case O(n + m), worst case O(n × m) with many collisions
- More practical than KMP for plagiarism detection and DNA sequence matching
- Choice of base
dand primeqaffects collision rate
Comparison with Other Algorithms
| Algorithm | Preprocessing | Search | Use Case | |-----------|--------------|--------|----------| | Naive | O(1) | O(n × m) | Small texts | | Rabin-Karp | O(m) | O(n + m) avg | Multiple patterns | | KMP | O(m) | O(n) | Single pattern | | Boyer-Moore | O(m + σ) | O(n/m) best | Large alphabets |