Pick K letters to build as many strings as possible

263 views Asked by At

I stumbled upon this question and I'm unable to solve it. Any help is much appreciated.

You are given an array S made of N strings and an integer K. Choose at most K letters from the alphabet that will allow you to build as many strings from array S as possible. Any of the chosen letters can be used multiple times when building the strings. What is the maximum number of strings that can be built?

Example: S = ["a","aa","ab","bb","bc","bd"] K = 1 answer = 2; pick letter 'a' to build strings ["a","aa"]

I seriously have no idea how to solve this...

2

There are 2 answers

0
Matt Timmermans On

I think this problem would be NP-hard, except that there are only 26 letters. That bound makes the problem easy.

The first step is to pick either the problem or its complement to solve. Let's say there are U letters that are actually used in strings.

If K <= U/2, then just try all C(U,K) ways to choose K of U letters and find the largest number of words you can build. There are at most 10,600,400 combinations.

If K > U/2. then try all C(U,U-K) ways to choose U-K letters, and find the combination that is used in the fewest number of words. The remaining words are the ones that can be built with the letters you don't choose. Again there are at most 10,600,400 combinations to try.

This takes linear time. There's a constant factor of 10 million, but with careful optimization you can make this quite fast.

I would probably transform each word into a bit-mask of the letters that it uses, make an array of bit masks for each chosen combination, and then match each word against each mask. If I'm trying to build words, I add 1 for every mask where (word&mask)==word. If I'm counting hits, I add 1 for every mask where (word&mask)!=0. Precalculating some mask subsets can speed this up.

0
Cary Swoveland On

This can be formulated as integer programming (IP) optimization. If it be solved in a reasonable amount of time--which depends on the number and size of words and the IP solver used--it will provide an optimal solution.

There are two types of variables, both of which are binary-valued, meaning each variable can only take on the value zero or one.

li = 1 if letter i is used; else 0

wj = 1 if word j can be formed; else 0

The objective is:

max Σjwj

The constraints are:

Σili <= k

wj <= li for each word j and each unique letter i in word j

The first constraint limits the number of letters chosen for use to be no more than k, which is given.

There is one constraint of the latter type for each unique letter in each word. They prevent word j from being formed unless li = 1 for all letters i in the word.