Minimum sum of squares of character counts in a given string after removing k characters
Problem Statement
Given a string of lowercase characters, find the minimum sum of squares of character counts after removing k characters from the string. The goal is to remove characters strategically to minimize the final sum of squares.
Example:
Input: str = "abccc", k = 1
Character counts: a=1, b=1, c=3
Sum of squares = 1² + 1² + 3² = 1 + 1 + 9 = 11
After removing one 'c': a=1, b=1, c=2
Sum of squares = 1² + 1² + 2² = 1 + 1 + 4 = 6
Output: 6
Approach 1: Brute Force (Greedy with Linear Search)
Intuition
To minimize the sum of squares, we should remove characters with the highest frequency. Each removal reduces the square term significantly. Removing one occurrence from frequency f reduces sum by f² - (f-1)² = 2f - 1.
Algorithm
- Count frequency of each character
- Repeat k times:
- Find the character with maximum frequency
- Decrement its count by 1
- Calculate sum of squares of final frequencies
public class MinSumSquares {
public static int minSumSquaresBruteForce(String str, int k) {
// Count frequencies
int[] freq = new int[26];
for (char c : str.toCharArray()) {
freq[c - 'a']++;
}
// Remove k characters
for (int i = 0; i < k; i++) {
// Find max frequency
int maxIdx = 0;
for (int j = 1; j < 26; j++) {
if (freq[j] > freq[maxIdx]) {
maxIdx = j;
}
}
if (freq[maxIdx] > 0) {
freq[maxIdx]--;
}
}
// Calculate sum of squares
int sum = 0;
for (int f : freq) {
sum += f * f;
}
return sum;
}
public static void main(String[] args) {
System.out.println("Minimum sum of squares: " + minSumSquaresBruteForce("abccc", 1)); // 6
System.out.println("Minimum sum of squares: " + minSumSquaresBruteForce("aaab", 2)); // 2
}
}Complexity Analysis
- Time Complexity: O(n + k * 26) = O(n * k) in worst case
- Space Complexity: O(26) = O(1) for frequency array
Approach 2: Optimal (Using Max Heap / Priority Queue)
Intuition
Instead of linear search for maximum each time, use a max heap. This reduces the time to find and update the maximum frequency from O(26) to O(log 26).
Algorithm
- Count frequencies
- Add all non-zero frequencies to max heap
- Repeat k times:
- Pop maximum
- Decrement and push back if still positive
- Calculate sum of squares from heap
import java.util.*;
public class MinSumSquares {
public static int minSumSquares(String str, int k) {
// Count frequencies
int[] freq = new int[26];
for (char c : str.toCharArray()) {
freq[c - 'a']++;
}
// Add to max heap
PriorityQueue<Integer> maxHeap = new PriorityQueue<>(Collections.reverseOrder());
for (int f : freq) {
if (f > 0) {
maxHeap.offer(f);
}
}
// Remove k characters
for (int i = 0; i < k && !maxHeap.isEmpty(); i++) {
int maxFreq = maxHeap.poll();
maxFreq--;
if (maxFreq > 0) {
maxHeap.offer(maxFreq);
}
}
// Calculate sum of squares
int sum = 0;
while (!maxHeap.isEmpty()) {
int f = maxHeap.poll();
sum += f * f;
}
return sum;
}
public static void main(String[] args) {
System.out.println("abccc, k=1: " + minSumSquares("abccc", 1)); // 6
System.out.println("aaab, k=2: " + minSumSquares("aaab", 2)); // 2
System.out.println("aaaaabbbbcc, k=3: " + minSumSquares("aaaaabbbbcc", 3)); // 22
}
}Complexity Analysis
- Time Complexity: O(n + k log m) where m is number of distinct characters (at most 26)
- Space Complexity: O(m) for heap
Approach 3: Optimized with Frequency Count
Intuition
Since frequencies are bounded, we can use a frequency of frequencies approach. Count how many characters have each frequency. Then greedily reduce from the highest frequency.
Algorithm
- Count character frequencies
- Count frequency of frequencies
- Process from highest frequency down:
- Remove as many as possible from current frequency
- Move those counts to frequency-1
public class MinSumSquares {
public static int minSumSquaresOptimized(String str, int k) {
int n = str.length();
// Count character frequencies
int[] charFreq = new int[26];
for (char c : str.toCharArray()) {
charFreq[c - 'a']++;
}
// Count frequency of frequencies
int[] freqCount = new int[n + 1];
for (int f : charFreq) {
if (f > 0) {
freqCount[f]++;
}
}
// Remove k characters starting from highest frequency
for (int f = n; f >= 1 && k > 0; f--) {
if (freqCount[f] == 0) continue;
int remove = Math.min(k, freqCount[f]);
k -= remove;
freqCount[f] -= remove;
freqCount[f - 1] += remove;
}
// Calculate sum of squares
int sum = 0;
for (int f = 1; f <= n; f++) {
sum += freqCount[f] * f * f;
}
return sum;
}
public static void main(String[] args) {
System.out.println("abccc, k=1: " + minSumSquaresOptimized("abccc", 1)); // 6
System.out.println("aaab, k=2: " + minSumSquaresOptimized("aaab", 2)); // 2
}
}Complexity Analysis
- Time Complexity: O(n) for counting + O(n) for processing = O(n)
- Space Complexity: O(n) for frequency count array
Key Takeaways
- Greedy Strategy: Always remove from highest frequency to minimize squares
- Math Insight: Removing from freq f reduces sum by 2f-1, so higher f gives bigger reduction
- Priority Queue: Efficient for repeated max finding
- Frequency of Frequencies: Advanced optimization technique
- Bounded Alphabet: 26 letters allows O(1) instead of O(log n) approaches
- Similar pattern in: Reorganize String, Task Scheduler problems