GreedyProblem 3 of 35

Huffman Coding

Brute Force
Time: O(n^2)
Space: O(n)
Optimal
Time: O(n log n)
Space: O(n)

Problem Statement

Given a set of characters and their frequencies, build a Huffman Tree and generate prefix-free binary codes for each character such that the total number of bits used is minimized.

Huffman coding is a lossless data compression algorithm that assigns variable-length codes to characters based on their frequencies - more frequent characters get shorter codes.

Constraints:

  • 1 ≤ n ≤ 10^5
  • 1 ≤ frequency[i] ≤ 10^6

Example:

  • Input: chars = ['a', 'b', 'c', 'd', 'e', 'f'], freq = [5, 9, 12, 13, 16, 45]
  • Output: Codes like f=0, c=100, d=101, a=1100, b=1101, e=111
  • Explanation: Total bits = 5×4 + 9×4 + 12×3 + 13×3 + 16×3 + 45×1 = 224 bits

Approach 1: Brute Force (Linear Search for Minimum)

Intuition

Instead of using a heap, use a list and find the two minimum elements by linear search each time.

Algorithm

  1. Create leaf nodes for all characters
  2. Repeatedly find two nodes with minimum frequency using linear search
  3. Create a new internal node with these two as children
  4. Continue until only one node remains (root)
java
import java.util.*;

class Node {
    char ch;
    int freq;
    Node left, right;
    
    Node(char ch, int freq) {
        this.ch = ch;
        this.freq = freq;
        this.left = null;
        this.right = null;
    }
}

public class HuffmanBrute {
    private void generateCodes(Node root, String code, Map<Character, String> codes) {
        if (root == null) return;
        
        if (root.left == null && root.right == null) {
            codes.put(root.ch, code.isEmpty() ? "0" : code);
            return;
        }
        
        generateCodes(root.left, code + "0", codes);
        generateCodes(root.right, code + "1", codes);
    }
    
    public Map<Character, String> buildHuffman(char[] chars, int[] freq) {
        int n = chars.length;
        
        List<Node> nodes = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            nodes.add(new Node(chars[i], freq[i]));
        }
        
        while (nodes.size() > 1) {
            // Sort to get minimum two
            Collections.sort(nodes, (a, b) -> a.freq - b.freq);
            
            Node left = nodes.remove(0);
            Node right = nodes.remove(0);
            
            Node newNode = new Node('\0', left.freq + right.freq);
            newNode.left = left;
            newNode.right = right;
            
            nodes.add(newNode);
        }
        
        Map<Character, String> codes = new HashMap<>();
        generateCodes(nodes.get(0), "", codes);
        return codes;
    }
}

Complexity Analysis

  • Time Complexity: O(n^2) - Finding minimum takes O(n), done n times
  • Space Complexity: O(n) - For storing nodes

Approach 2: Using Min Heap (Optimal)

Intuition

Use a min-heap (priority queue) to efficiently get the two minimum frequency nodes in O(log n) time.

Algorithm

  1. Create a min-heap and insert all characters with frequencies
  2. Extract two minimum nodes
  3. Create an internal node with combined frequency
  4. Insert the new node back into heap
  5. Repeat until one node remains
  6. Traverse tree to generate codes
java
import java.util.*;

class HuffmanNode implements Comparable<HuffmanNode> {
    char ch;
    int freq;
    HuffmanNode left, right;
    
    HuffmanNode(char ch, int freq) {
        this.ch = ch;
        this.freq = freq;
    }
    
    @Override
    public int compareTo(HuffmanNode other) {
        return this.freq - other.freq;
    }
}

public class HuffmanCoding {
    private HuffmanNode root;
    
    public HuffmanNode buildTree(char[] chars, int[] freq) {
        PriorityQueue<HuffmanNode> minHeap = new PriorityQueue<>();
        
        // Create leaf nodes
        for (int i = 0; i < chars.length; i++) {
            minHeap.offer(new HuffmanNode(chars[i], freq[i]));
        }
        
        // Build Huffman tree
        while (minHeap.size() > 1) {
            HuffmanNode left = minHeap.poll();
            HuffmanNode right = minHeap.poll();
            
            // Create internal node
            HuffmanNode internal = new HuffmanNode('\0', left.freq + right.freq);
            internal.left = left;
            internal.right = right;
            
            minHeap.offer(internal);
        }
        
        root = minHeap.poll();
        return root;
    }
    
    private void generateCodes(HuffmanNode node, String code, 
                               Map<Character, String> codes) {
        if (node == null) return;
        
        // Leaf node
        if (node.left == null && node.right == null) {
            codes.put(node.ch, code.isEmpty() ? "0" : code);
            return;
        }
        
        generateCodes(node.left, code + "0", codes);
        generateCodes(node.right, code + "1", codes);
    }
    
    public Map<Character, String> getHuffmanCodes(char[] chars, int[] freq) {
        buildTree(chars, freq);
        Map<Character, String> codes = new HashMap<>();
        generateCodes(root, "", codes);
        return codes;
    }
    
    public String encode(String text, Map<Character, String> codes) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            sb.append(codes.get(c));
        }
        return sb.toString();
    }
    
    public String decode(String encoded) {
        StringBuilder decoded = new StringBuilder();
        HuffmanNode curr = root;
        
        for (char bit : encoded.toCharArray()) {
            curr = (bit == '0') ? curr.left : curr.right;
            
            // Leaf node
            if (curr.left == null && curr.right == null) {
                decoded.append(curr.ch);
                curr = root;
            }
        }
        
        return decoded.toString();
    }
    
    public int totalBits(char[] chars, int[] freq, Map<Character, String> codes) {
        int total = 0;
        for (int i = 0; i < chars.length; i++) {
            total += freq[i] * codes.get(chars[i]).length();
        }
        return total;
    }
    
    public static void main(String[] args) {
        HuffmanCoding hc = new HuffmanCoding();
        char[] chars = {'a', 'b', 'c', 'd', 'e', 'f'};
        int[] freq = {5, 9, 12, 13, 16, 45};
        
        Map<Character, String> codes = hc.getHuffmanCodes(chars, freq);
        
        System.out.println("Huffman Codes:");
        for (char c : chars) {
            System.out.println(c + ": " + codes.get(c));
        }
        
        System.out.println("\nTotal bits: " + hc.totalBits(chars, freq, codes));
    }
}

Dry Run Example

chars = ['a', 'b', 'c', 'd', 'e', 'f'] freq = [ 5, 9, 12, 13, 16, 45] Initial heap: [(a,5), (b,9), (c,12), (d,13), (e,16), (f,45)] Step 1: Extract (a,5) and (b,9) Create internal node (ab,14) Heap: [(c,12), (d,13), (ab,14), (e,16), (f,45)] Step 2: Extract (c,12) and (d,13) Create internal node (cd,25) Heap: [(ab,14), (e,16), (cd,25), (f,45)] Step 3: Extract (ab,14) and (e,16) Create internal node (abe,30) Heap: [(cd,25), (abe,30), (f,45)] Step 4: Extract (cd,25) and (abe,30) Create internal node (cdabe,55) Heap: [(f,45), (cdabe,55)] Step 5: Extract (f,45) and (cdabe,55) Create root node (100) Heap: [(root,100)] Final Tree: (100) / \ (f,45) (55) / \ (25) (30) / \ / \ (c,12)(d,13)(14)(e,16) / \ (a,5)(b,9) Codes: f = 0 c = 100 d = 101 a = 1100 b = 1101 e = 111 Total bits = 45×1 + 12×3 + 13×3 + 5×4 + 9×4 + 16×3 = 45 + 36 + 39 + 20 + 36 + 48 = 224 bits

Complexity Analysis

  • Time Complexity: O(n log n)
    • Building initial heap: O(n)
    • n-1 iterations, each with 2 extract and 1 insert: O(log n) each
    • Total: O(n log n)
  • Space Complexity: O(n) - For heap and tree nodes

Properties of Huffman Coding

  1. Prefix-Free: No code is a prefix of another code
  2. Optimal: Produces minimum weighted path length for given frequencies
  3. Greedy: Always combines two smallest frequencies
  4. Variable Length: More frequent characters get shorter codes

Key Takeaways

  1. Greedy Choice: Always combine two minimum frequency nodes
  2. Min Heap: Essential for efficient O(log n) extraction
  3. Prefix-Free Codes: Tree structure guarantees no code is prefix of another
  4. Optimality: Proven optimal for symbol-by-symbol encoding
  5. Applications: File compression (ZIP, GZIP), JPEG, MP3
  6. Encoding: Traverse tree, 0 for left, 1 for right
  7. Decoding: Follow bits in tree until leaf node
  8. Edge Case: Single character needs at least "0" as code