GreedyProblem 3 of 35
Huffman Coding
Problem Statement
Given a set of characters and their frequencies, build a Huffman Tree and generate prefix-free binary codes for each character such that the total number of bits used is minimized.
Huffman coding is a lossless data compression algorithm that assigns variable-length codes to characters based on their frequencies - more frequent characters get shorter codes.
Constraints:
- 1 ≤ n ≤ 10^5
- 1 ≤ frequency[i] ≤ 10^6
Example:
- Input:
chars = ['a', 'b', 'c', 'd', 'e', 'f'],freq = [5, 9, 12, 13, 16, 45] - Output: Codes like
f=0, c=100, d=101, a=1100, b=1101, e=111 - Explanation: Total bits = 5×4 + 9×4 + 12×3 + 13×3 + 16×3 + 45×1 = 224 bits
Approach 1: Brute Force (Linear Search for Minimum)
Intuition
Instead of using a heap, use a list and find the two minimum elements by linear search each time.
Algorithm
- Create leaf nodes for all characters
- Repeatedly find two nodes with minimum frequency using linear search
- Create a new internal node with these two as children
- Continue until only one node remains (root)
java
import java.util.*;
class Node {
char ch;
int freq;
Node left, right;
Node(char ch, int freq) {
this.ch = ch;
this.freq = freq;
this.left = null;
this.right = null;
}
}
public class HuffmanBrute {
private void generateCodes(Node root, String code, Map<Character, String> codes) {
if (root == null) return;
if (root.left == null && root.right == null) {
codes.put(root.ch, code.isEmpty() ? "0" : code);
return;
}
generateCodes(root.left, code + "0", codes);
generateCodes(root.right, code + "1", codes);
}
public Map<Character, String> buildHuffman(char[] chars, int[] freq) {
int n = chars.length;
List<Node> nodes = new ArrayList<>();
for (int i = 0; i < n; i++) {
nodes.add(new Node(chars[i], freq[i]));
}
while (nodes.size() > 1) {
// Sort to get minimum two
Collections.sort(nodes, (a, b) -> a.freq - b.freq);
Node left = nodes.remove(0);
Node right = nodes.remove(0);
Node newNode = new Node('\0', left.freq + right.freq);
newNode.left = left;
newNode.right = right;
nodes.add(newNode);
}
Map<Character, String> codes = new HashMap<>();
generateCodes(nodes.get(0), "", codes);
return codes;
}
}Complexity Analysis
- Time Complexity: O(n^2) - Finding minimum takes O(n), done n times
- Space Complexity: O(n) - For storing nodes
Approach 2: Using Min Heap (Optimal)
Intuition
Use a min-heap (priority queue) to efficiently get the two minimum frequency nodes in O(log n) time.
Algorithm
- Create a min-heap and insert all characters with frequencies
- Extract two minimum nodes
- Create an internal node with combined frequency
- Insert the new node back into heap
- Repeat until one node remains
- Traverse tree to generate codes
java
import java.util.*;
class HuffmanNode implements Comparable<HuffmanNode> {
char ch;
int freq;
HuffmanNode left, right;
HuffmanNode(char ch, int freq) {
this.ch = ch;
this.freq = freq;
}
@Override
public int compareTo(HuffmanNode other) {
return this.freq - other.freq;
}
}
public class HuffmanCoding {
private HuffmanNode root;
public HuffmanNode buildTree(char[] chars, int[] freq) {
PriorityQueue<HuffmanNode> minHeap = new PriorityQueue<>();
// Create leaf nodes
for (int i = 0; i < chars.length; i++) {
minHeap.offer(new HuffmanNode(chars[i], freq[i]));
}
// Build Huffman tree
while (minHeap.size() > 1) {
HuffmanNode left = minHeap.poll();
HuffmanNode right = minHeap.poll();
// Create internal node
HuffmanNode internal = new HuffmanNode('\0', left.freq + right.freq);
internal.left = left;
internal.right = right;
minHeap.offer(internal);
}
root = minHeap.poll();
return root;
}
private void generateCodes(HuffmanNode node, String code,
Map<Character, String> codes) {
if (node == null) return;
// Leaf node
if (node.left == null && node.right == null) {
codes.put(node.ch, code.isEmpty() ? "0" : code);
return;
}
generateCodes(node.left, code + "0", codes);
generateCodes(node.right, code + "1", codes);
}
public Map<Character, String> getHuffmanCodes(char[] chars, int[] freq) {
buildTree(chars, freq);
Map<Character, String> codes = new HashMap<>();
generateCodes(root, "", codes);
return codes;
}
public String encode(String text, Map<Character, String> codes) {
StringBuilder sb = new StringBuilder();
for (char c : text.toCharArray()) {
sb.append(codes.get(c));
}
return sb.toString();
}
public String decode(String encoded) {
StringBuilder decoded = new StringBuilder();
HuffmanNode curr = root;
for (char bit : encoded.toCharArray()) {
curr = (bit == '0') ? curr.left : curr.right;
// Leaf node
if (curr.left == null && curr.right == null) {
decoded.append(curr.ch);
curr = root;
}
}
return decoded.toString();
}
public int totalBits(char[] chars, int[] freq, Map<Character, String> codes) {
int total = 0;
for (int i = 0; i < chars.length; i++) {
total += freq[i] * codes.get(chars[i]).length();
}
return total;
}
public static void main(String[] args) {
HuffmanCoding hc = new HuffmanCoding();
char[] chars = {'a', 'b', 'c', 'd', 'e', 'f'};
int[] freq = {5, 9, 12, 13, 16, 45};
Map<Character, String> codes = hc.getHuffmanCodes(chars, freq);
System.out.println("Huffman Codes:");
for (char c : chars) {
System.out.println(c + ": " + codes.get(c));
}
System.out.println("\nTotal bits: " + hc.totalBits(chars, freq, codes));
}
}Dry Run Example
chars = ['a', 'b', 'c', 'd', 'e', 'f']
freq = [ 5, 9, 12, 13, 16, 45]
Initial heap: [(a,5), (b,9), (c,12), (d,13), (e,16), (f,45)]
Step 1: Extract (a,5) and (b,9)
Create internal node (ab,14)
Heap: [(c,12), (d,13), (ab,14), (e,16), (f,45)]
Step 2: Extract (c,12) and (d,13)
Create internal node (cd,25)
Heap: [(ab,14), (e,16), (cd,25), (f,45)]
Step 3: Extract (ab,14) and (e,16)
Create internal node (abe,30)
Heap: [(cd,25), (abe,30), (f,45)]
Step 4: Extract (cd,25) and (abe,30)
Create internal node (cdabe,55)
Heap: [(f,45), (cdabe,55)]
Step 5: Extract (f,45) and (cdabe,55)
Create root node (100)
Heap: [(root,100)]
Final Tree:
(100)
/ \
(f,45) (55)
/ \
(25) (30)
/ \ / \
(c,12)(d,13)(14)(e,16)
/ \
(a,5)(b,9)
Codes:
f = 0
c = 100
d = 101
a = 1100
b = 1101
e = 111
Total bits = 45×1 + 12×3 + 13×3 + 5×4 + 9×4 + 16×3
= 45 + 36 + 39 + 20 + 36 + 48
= 224 bits
Complexity Analysis
- Time Complexity: O(n log n)
- Building initial heap: O(n)
- n-1 iterations, each with 2 extract and 1 insert: O(log n) each
- Total: O(n log n)
- Space Complexity: O(n) - For heap and tree nodes
Properties of Huffman Coding
- Prefix-Free: No code is a prefix of another code
- Optimal: Produces minimum weighted path length for given frequencies
- Greedy: Always combines two smallest frequencies
- Variable Length: More frequent characters get shorter codes
Key Takeaways
- Greedy Choice: Always combine two minimum frequency nodes
- Min Heap: Essential for efficient O(log n) extraction
- Prefix-Free Codes: Tree structure guarantees no code is prefix of another
- Optimality: Proven optimal for symbol-by-symbol encoding
- Applications: File compression (ZIP, GZIP), JPEG, MP3
- Encoding: Traverse tree, 0 for left, 1 for right
- Decoding: Follow bits in tree until leaf node
- Edge Case: Single character needs at least "0" as code