Stacks & QueuesProblem 36 of 38

Minimum sum of squares of character counts in a given string after removing k characters

Brute Force
Time: O(n * k)
Space: O(n)
Optimal
Time: O(n log n)
Space: O(n)

Problem Statement

Given a string of lowercase characters, find the minimum sum of squares of character counts after removing k characters from the string. The goal is to remove characters strategically to minimize the final sum of squares.

Example:

Input: str = "abccc", k = 1 Character counts: a=1, b=1, c=3 Sum of squares = 1² + 1² + 3² = 1 + 1 + 9 = 11 After removing one 'c': a=1, b=1, c=2 Sum of squares = 1² + 1² + 2² = 1 + 1 + 4 = 6 Output: 6

Approach 1: Brute Force (Greedy with Linear Search)

Intuition

To minimize the sum of squares, we should remove characters with the highest frequency. Each removal reduces the square term significantly. Removing one occurrence from frequency f reduces sum by f² - (f-1)² = 2f - 1.

Algorithm

  1. Count frequency of each character
  2. Repeat k times:
    • Find the character with maximum frequency
    • Decrement its count by 1
  3. Calculate sum of squares of final frequencies
java
public class MinSumSquares {
    
    public static int minSumSquaresBruteForce(String str, int k) {
        // Count frequencies
        int[] freq = new int[26];
        for (char c : str.toCharArray()) {
            freq[c - 'a']++;
        }
        
        // Remove k characters
        for (int i = 0; i < k; i++) {
            // Find max frequency
            int maxIdx = 0;
            for (int j = 1; j < 26; j++) {
                if (freq[j] > freq[maxIdx]) {
                    maxIdx = j;
                }
            }
            
            if (freq[maxIdx] > 0) {
                freq[maxIdx]--;
            }
        }
        
        // Calculate sum of squares
        int sum = 0;
        for (int f : freq) {
            sum += f * f;
        }
        
        return sum;
    }
    
    public static void main(String[] args) {
        System.out.println("Minimum sum of squares: " + minSumSquaresBruteForce("abccc", 1));  // 6
        System.out.println("Minimum sum of squares: " + minSumSquaresBruteForce("aaab", 2));   // 2
    }
}

Complexity Analysis

  • Time Complexity: O(n + k * 26) = O(n * k) in worst case
  • Space Complexity: O(26) = O(1) for frequency array

Approach 2: Optimal (Using Max Heap / Priority Queue)

Intuition

Instead of linear search for maximum each time, use a max heap. This reduces the time to find and update the maximum frequency from O(26) to O(log 26).

Algorithm

  1. Count frequencies
  2. Add all non-zero frequencies to max heap
  3. Repeat k times:
    • Pop maximum
    • Decrement and push back if still positive
  4. Calculate sum of squares from heap
java
import java.util.*;

public class MinSumSquares {
    
    public static int minSumSquares(String str, int k) {
        // Count frequencies
        int[] freq = new int[26];
        for (char c : str.toCharArray()) {
            freq[c - 'a']++;
        }
        
        // Add to max heap
        PriorityQueue<Integer> maxHeap = new PriorityQueue<>(Collections.reverseOrder());
        for (int f : freq) {
            if (f > 0) {
                maxHeap.offer(f);
            }
        }
        
        // Remove k characters
        for (int i = 0; i < k && !maxHeap.isEmpty(); i++) {
            int maxFreq = maxHeap.poll();
            maxFreq--;
            
            if (maxFreq > 0) {
                maxHeap.offer(maxFreq);
            }
        }
        
        // Calculate sum of squares
        int sum = 0;
        while (!maxHeap.isEmpty()) {
            int f = maxHeap.poll();
            sum += f * f;
        }
        
        return sum;
    }
    
    public static void main(String[] args) {
        System.out.println("abccc, k=1: " + minSumSquares("abccc", 1));  // 6
        System.out.println("aaab, k=2: " + minSumSquares("aaab", 2));    // 2
        System.out.println("aaaaabbbbcc, k=3: " + minSumSquares("aaaaabbbbcc", 3));  // 22
    }
}

Complexity Analysis

  • Time Complexity: O(n + k log m) where m is number of distinct characters (at most 26)
  • Space Complexity: O(m) for heap

Approach 3: Optimized with Frequency Count

Intuition

Since frequencies are bounded, we can use a frequency of frequencies approach. Count how many characters have each frequency. Then greedily reduce from the highest frequency.

Algorithm

  1. Count character frequencies
  2. Count frequency of frequencies
  3. Process from highest frequency down:
    • Remove as many as possible from current frequency
    • Move those counts to frequency-1
java
public class MinSumSquares {
    
    public static int minSumSquaresOptimized(String str, int k) {
        int n = str.length();
        
        // Count character frequencies
        int[] charFreq = new int[26];
        for (char c : str.toCharArray()) {
            charFreq[c - 'a']++;
        }
        
        // Count frequency of frequencies
        int[] freqCount = new int[n + 1];
        for (int f : charFreq) {
            if (f > 0) {
                freqCount[f]++;
            }
        }
        
        // Remove k characters starting from highest frequency
        for (int f = n; f >= 1 && k > 0; f--) {
            if (freqCount[f] == 0) continue;
            
            int remove = Math.min(k, freqCount[f]);
            k -= remove;
            
            freqCount[f] -= remove;
            freqCount[f - 1] += remove;
        }
        
        // Calculate sum of squares
        int sum = 0;
        for (int f = 1; f <= n; f++) {
            sum += freqCount[f] * f * f;
        }
        
        return sum;
    }
    
    public static void main(String[] args) {
        System.out.println("abccc, k=1: " + minSumSquaresOptimized("abccc", 1));  // 6
        System.out.println("aaab, k=2: " + minSumSquaresOptimized("aaab", 2));    // 2
    }
}

Complexity Analysis

  • Time Complexity: O(n) for counting + O(n) for processing = O(n)
  • Space Complexity: O(n) for frequency count array

Key Takeaways

  1. Greedy Strategy: Always remove from highest frequency to minimize squares
  2. Math Insight: Removing from freq f reduces sum by 2f-1, so higher f gives bigger reduction
  3. Priority Queue: Efficient for repeated max finding
  4. Frequency of Frequencies: Advanced optimization technique
  5. Bounded Alphabet: 26 letters allows O(1) instead of O(log n) approaches
  6. Similar pattern in: Reorganize String, Task Scheduler problems