DSA 450 Notes

Problem Statement

Given a text string and a pattern string, find all occurrences of the pattern in the text using the Boyer-Moore algorithm. This algorithm is one of the most efficient string matching algorithms for practical use cases.

Example:

Input: text = "ABAAABCDABC", pattern = "ABC"
Output: Pattern found at indices [4, 8]
Explanation: "ABC" appears at positions 4 and 8 in the text

Approach 1: Brute Force (Naive Pattern Matching)

Intuition

Slide the pattern over the text one character at a time and check for a match at each position.

Algorithm

For each position i from 0 to n-m in text
Check if pattern matches starting at position i
If all characters match, record the position
Move to next position

java

import java.util.*;

public class Solution {
    public List<Integer> naiveSearch(String text, String pattern) {
        List<Integer> result = new ArrayList<>();
        int n = text.length();
        int m = pattern.length();
        
        for (int i = 0; i <= n - m; i++) {
            int j;
            for (j = 0; j < m; j++) {
                if (text.charAt(i + j) != pattern.charAt(j)) {
                    break;
                }
            }
            if (j == m) {
                result.add(i);
            }
        }
        
        return result;
    }
}

Complexity Analysis

Time Complexity: O(n * m) - For each position, we may compare up to m characters
Space Complexity: O(1) - No extra space needed

Approach 2: Boyer-Moore Algorithm (Optimal)

Intuition

Boyer-Moore works by comparing the pattern from right to left and using two heuristics to skip alignments:

Bad Character Rule: When a mismatch occurs, shift the pattern to align the bad character in text with its rightmost occurrence in the pattern
Good Suffix Rule: When a mismatch occurs, use information about the matched suffix to determine the shift

Algorithm

Preprocess pattern for bad character heuristic
Preprocess pattern for good suffix heuristic
Align pattern at the beginning of text
Compare from right to left
On mismatch, shift by max of both heuristics
Repeat until pattern goes past text end

java

import java.util.*;

public class BoyerMoore {
    private static final int ALPHABET_SIZE = 256;
    
    private int[] preprocessBadCharacter(String pattern) {
        int m = pattern.length();
        int[] badChar = new int[ALPHABET_SIZE];
        
        Arrays.fill(badChar, -1);
        
        for (int i = 0; i < m; i++) {
            badChar[pattern.charAt(i)] = i;
        }
        
        return badChar;
    }
    
    private int[] preprocessGoodSuffix(String pattern) {
        int m = pattern.length();
        int[] shift = new int[m + 1];
        int[] borderPos = new int[m + 1];
        
        int i = m, j = m + 1;
        borderPos[i] = j;
        
        while (i > 0) {
            while (j <= m && pattern.charAt(i - 1) != pattern.charAt(j - 1)) {
                if (shift[j] == 0) {
                    shift[j] = j - i;
                }
                j = borderPos[j];
            }
            i--;
            j--;
            borderPos[i] = j;
        }
        
        // Case 2: Prefix matches suffix
        j = borderPos[0];
        for (i = 0; i <= m; i++) {
            if (shift[i] == 0) {
                shift[i] = j;
            }
            if (i == j) {
                j = borderPos[j];
            }
        }
        
        return shift;
    }
    
    public List<Integer> search(String text, String pattern) {
        List<Integer> result = new ArrayList<>();
        int n = text.length();
        int m = pattern.length();
        
        if (m == 0 || n < m) return result;
        
        int[] badChar = preprocessBadCharacter(pattern);
        int[] shift = preprocessGoodSuffix(pattern);
        
        int s = 0;
        
        while (s <= n - m) {
            int j = m - 1;
            
            while (j >= 0 && pattern.charAt(j) == text.charAt(s + j)) {
                j--;
            }
            
            if (j < 0) {
                result.add(s);
                s += shift[0];
            } else {
                int badCharShift = j - badChar[text.charAt(s + j)];
                int goodSuffixShift = shift[j + 1];
                s += Math.max(badCharShift, goodSuffixShift);
            }
        }
        
        return result;
    }
}

Complexity Analysis

Time Complexity:
- Average case: O(n/m) - Sublinear due to skipping
- Worst case: O(n * m) - When pattern has many repeated characters
Space Complexity: O(m + σ) - Where σ is alphabet size (256 for ASCII)

Boyer-Moore: Why It Works

Bad Character Rule

When we find a mismatch at position j in pattern:

If the mismatched character doesn't exist in pattern, we can shift past it entirely
If it exists, we align it with its rightmost occurrence

Text:    ...XABCDEF...
Pattern:    ABXABC
                ^mismatch at 'D' vs 'B'
                
'D' not in pattern, shift entire pattern past 'D'

Good Suffix Rule

When a suffix of the pattern has matched but there's a mismatch:

Look for another occurrence of the matched suffix in the pattern
Align that occurrence with the matched text

Text:    ...CABAB...
Pattern:   BABAB
              ^^matched "AB"
              
Shift to align another "AB" in pattern

Key Takeaways

Boyer-Moore is often the fastest string matching algorithm in practice
Right-to-left comparison allows larger shifts on mismatch
Two heuristics complement each other: Bad character for random text, good suffix for repetitive patterns
Preprocessing is O(m + σ) but pays off for long texts
Used in text editors, search tools (grep), and databases
For single pattern search in large texts, Boyer-Moore often outperforms KMP

Boyer Moore Algorithm for Pattern Searching

Problem Statement

Approach 1: Brute Force (Naive Pattern Matching)

Intuition

Algorithm

Complexity Analysis

Approach 2: Boyer-Moore Algorithm (Optimal)

Intuition

Algorithm

Complexity Analysis

Boyer-Moore: Why It Works

Bad Character Rule

Good Suffix Rule

Key Takeaways