Calculate Nth multiset combination (with repetition) based only on index

2.2k views Asked by At

How can i calculate the Nth combo based only on it's index. There should be (n+k-1)!/(k!(n-1)!) combinations with repetitions.

with n=2, k=5 you get:

0|{0,0,0,0,0}
1|{0,0,0,0,1}
2|{0,0,0,1,1}
3|{0,0,1,1,1}
4|{0,1,1,1,1}
5|{1,1,1,1,1}

So black_magic_function(3) should produce {0,0,1,1,1}.

This will be going into a GPU shader, so i want each work-group/thread to be able to figure out their subset of permutations without having to store the sequence globally.

with n=3, k=5 you get:
i=0, {0,0,0,0,0}
i=1, {0,0,0,0,1}
i=2, {0,0,0,0,2}
i=3, {0,0,0,1,1}
i=4, {0,0,0,1,2}
i=5, {0,0,0,2,2}
i=6, {0,0,1,1,1}
i=7, {0,0,1,1,2}
i=8, {0,0,1,2,2}
i=9, {0,0,2,2,2}
i=10, {0,1,1,1,1}
i=11, {0,1,1,1,2}
i=12, {0,1,1,2,2}
i=13, {0,1,2,2,2}
i=14, {0,2,2,2,2}
i=15, {1,1,1,1,1}
i=16, {1,1,1,1,2}
i=17, {1,1,1,2,2}
i=18, {1,1,2,2,2}
i=19, {1,2,2,2,2}
i=20, {2,2,2,2,2}

The algorithm for generating it can be seen as MBnext_multicombination at http://www.martinbroadhurst.com/combinatorial-algorithms.html

Update:

So i thought i'd replace the binomial coefficient in pascals triangle with (n+k-1)!/(k!(n-1)!) to see how it looks.

(* Mathematica code to display pascal and other triangle *)
t1 = Table[Binomial[n, k], {n, 0, 8}, {k, 0, n}];
t2 = Table[(n + k - 1)!/(k! (n - 1)!), {n, 0, 8}, {k, 0, n}];
(*display*)
{Row[#, "\t"]} & /@ t1 // Grid
{Row[#, "\t"]} & /@ t2 // Grid

T1:

                1
              1   1
            1   2   1
          1   3   3   1
        1   4   6   4   1
      1   5   10  10  5   1
    1   6   15  20  15  6   1
  1   7   21  35  35  21  7   1
1   8   28  56  70  56  28  8   1

T2:

           Indeterminate
               1   1
             1   2   3
           1   3   6   10
         1   4   10  20   35
       1   5   15  35   70   126
     1   6   21  56  126   252  462
   1   7   28  84  210   462  924   1716
1   8   36  120  330   792  1716  3432  6435

Comparing with the n=3,k=5 console output at the start of this post: the third diagonal {3,6,10,15,21,28,36} gives the index of each roll-over point {0,0,0,1,1} -> {0,0,1,1,1} -> {0,1,1,1,1}, etc. And the diagonal to the left of it seems to show how many values are contained in the previous block (diagonal[2][i] == diagonal[3][i] - diagonal[3][i-1])). And if you read the 5th row of the pyramid horizontally you get the max amount of combinations for increasing values of N in (n+k-1)!/(k!(n-1)!) where K=5.

There is probably a way to use this information to determine the exact combo for an arbitrary index, without enumerating the whole set, but i'm not sure if i need to go that far. The original problem was just to decompose the full combo space into equal subsets, that can be generated locally, and worked on in parallel by the GPU. So the triangle above gives us the starting index of every block, of which the combo can be trivially derived, and all its successive elements incrementally enumerated. It also gives us the block size, and how many total combinations we have. So now it becomes a packing problem of how to fit unevenly sized blocks into groups of equal work load across X amount of threads.

3

There are 3 answers

1
Jean-Baptiste On BEST ANSWER

See the example at: https://en.wikipedia.org/wiki/Combinatorial_number_system#Finding_the_k-combination_for_a_given_number

Just replace the binomial Coefficient with (n+k-1)!/(k!(n-1)!).


Assuming n=3,k=5, let's say we want to calculate the 19th combination (id=19).

 id=0, {0,0,0,0,0}
 id=1, {0,0,0,0,1}
 id=2, {0,0,0,0,2}
 ...
 id=16, {1,1,1,1,2}
 id=17, {1,1,1,2,2}
 id=18, {1,1,2,2,2}
 id=19, {1,2,2,2,2}
 id=20, {2,2,2,2,2}

The result we're looking for is {1,2,2,2,2}.

Examining our 'T2' triangle: n=3,k=5 points to 21, being the 5th number (top to bottom) of the third diagonal (left to right).

            Indeterminate
                1   1
              1   2   3
            1   3   6   10
          1   4   10  20   35
        1   5   15  35   70   126
      1   6   21  56  126   252  462
    1   7   28  84  210   462  924   1716
 1   8   36  120  330   792  1716  3432  6435

We need to find the largest number in this row (horizontally, not diagonally) that does not exceed our id=19 value. So moving left from 21 we arrive at 6 (this operation is performed by the largest function below). Since 6 is the 2nd number in this row it corresponds to n==2 (or g[2,5] == 6 from the code below).

Now that we've found the 5th number in the combination, we move up a floor in the pyramid, so k-1=4. We also subtract the 6 we encountered below from id, so id=19-6=13. Repeating the entire process we find 5 (n==2 again) to be the largest number less than 13 in this row.

Next: 13-5=8, Largest is 4 in this row (n==2 yet again).

Next: 8-4=4, Largest is 3 in this row (n==2 one more time).

Next: 4-3=1, Largest is 1 in this row (n==1)

So collecting the indices at each stage we get {1,2,2,2,2}


The following Mathematica code does the job:

g[n_, k_] := (n + k - 1)!/(k! (n - 1)!)

largest[i_, nn_, kk_] := With[
    {x = g[nn, kk]}, 
    If[x > i, largest[i, nn-1, kk], {nn,x}]
]

id2combo[id_, n_, 0]  := {}
id2combo[id_, n_, k_] := Module[
    {val, offset}, 
    {val, offset} = largest[id, n, k];
    Append[id2combo[id-offset, n, k-1], val]
]

Update: The order that the combinations were being generated by MBnext_multicombination wasn't matching id2combo, so i don't think they were lexicographic. The function below generates them in the same order as id2combo and matches the order of mathematica's Sort[]function on a list of lists.

void next_combo(unsigned int *ar, unsigned int n, unsigned int k)
{
    unsigned int i, lowest_i;

    for (i=lowest_i=0; i < k; ++i)
        lowest_i = (ar[i] < ar[lowest_i]) ? i : lowest_i;

    ++ar[lowest_i];

    i = (ar[lowest_i] >= n) 
        ? 0           // 0 -> all combinations have been exhausted, reset to first combination.
        : lowest_i+1; // _ -> base incremented. digits to the right of it are now zero.

    for (; i<k; ++i)
        ar[i] = 0;  
}
0
Bob Bryan On

I have done some preliminary analysis on the problem. Before I talk about the inefficient solution I found, let me give you a link to a paper I wrote on how to translate the k-indexes (or combination) to the rank or lexigraphic index to the combinations associated with the binomial coefficient:

http://tablizingthebinomialcoeff.wordpress.com/

I started out the same way in trying to solve this problem. I came up with the following code that uses one loop for each value of k in the formula (n+k-1)!/k!(n-1)! when k = 5. As written, this code will generate all combinations for the case of n choose 5:

private static void GetCombos(int nElements)
{
   // This code shows how to generate all the k-indexes or combinations for any number of elements when k = 5.
   int k1, k2, k3, k4, k5;
   int n = nElements;
   int i = 0;
   for (k5 = 0; k5 < n; k5++)
   {
      for (k4 = k5; k4 < n; k4++)
      {
         for (k3 = k4; k3 < n; k3++)
         {
            for (k2 = k3; k2 < n; k2++)
            {
               for (k1 = k2; k1 < n; k1++)
               {
                  Console.WriteLine("i = " + i.ToString() + ", " + k5.ToString() + " " + k4.ToString() + 
                     " " + k3.ToString() + " " + k2.ToString() + " " + k1.ToString() + " ");
                  i++;
               }
            }
         }
      }
   }
}

The output from this method is:

i = 0, 0 0 0 0 0
i = 1, 0 0 0 0 1
i = 2, 0 0 0 0 2
i = 3, 0 0 0 1 1
i = 4, 0 0 0 1 2
i = 5, 0 0 0 2 2
i = 6, 0 0 1 1 1
i = 7, 0 0 1 1 2
i = 8, 0 0 1 2 2
i = 9, 0 0 2 2 2
i = 10, 0 1 1 1 1
i = 11, 0 1 1 1 2
i = 12, 0 1 1 2 2
i = 13, 0 1 2 2 2
i = 14, 0 2 2 2 2
i = 15, 1 1 1 1 1
i = 16, 1 1 1 1 2
i = 17, 1 1 1 2 2
i = 18, 1 1 2 2 2
i = 19, 1 2 2 2 2
i = 20, 2 2 2 2 2

This is the same values as you gave in your edited answer. I also have tried it with 4 choose 5 as well, and it looks like it generates the correct combinations as well.

I wrote this in C#, but you should be able to use it with other languages like C/C++, Java, or Python without too many edits.

One idea for a somewhat inefficient solution is to modify GetCombos to accept k as an input as well. Since k is limited to 6, it would then be possible to put in a test for k. So the code to generate all possible combinations for an n choose k case would then look like this:

private static void GetCombos(int k, int nElements)
{
   // This code shows how to generate all the k-indexes or combinations for any n choose k, where k <= 6.
   //
   int k1, k2, k3, k4, k5, k6;
   int n = nElements;
   int i = 0;
   if (k == 6)
   {
      for (k6 = 0; k6 < n; k6++)
      {
         for (k5 = 0; k5 < n; k5++)
         {
            for (k4 = k5; k4 < n; k4++)
            {
               for (k3 = k4; k3 < n; k3++)
               {
                  for (k2 = k3; k2 < n; k2++)
                  {
                     for (k1 = k2; k1 < n; k1++)
                     {
                        Console.WriteLine("i = " + i.ToString() + ", " + k5.ToString() + " " + k4.ToString() +
                           " " + k3.ToString() + " " + k2.ToString() + " " + k1.ToString() + " ");
                        i++;
                     }
                  }
               }
            }
         }
      }
   }
   else if (k == 5)
   {
      for (k5 = 0; k5 < n; k5++)
      {
         for (k4 = k5; k4 < n; k4++)
         {
            for (k3 = k4; k3 < n; k3++)
            {
               for (k2 = k3; k2 < n; k2++)
               {
                  for (k1 = k2; k1 < n; k1++)
                  {
                     Console.WriteLine("i = " + i.ToString() + ", " + k5.ToString() + " " + k4.ToString() +
                        " " + k3.ToString() + " " + k2.ToString() + " " + k1.ToString() + " ");
                     i++;
                  }
               }
            }
         }
      }
   }
   else if (k == 4)
   {
      // One less loop than k = 5.
   }
   else if (k == 3)
   {
      // One less loop than k = 4.
   }
   else if (k == 2)
   {
      // One less loop than k = 3.
   }
   else
   {
      // k = 1 - error?
   }
}

So, we now have a method that will generate all the combinations of interest. But, the problem is to obtain a specific combination from the lexigraphic order or rank of where that combination lies within the set. So, this can accomplished by a simple count and then returning the proper combination when it hits the specified value. So, to accommodate this an extra parameter that represents the rank needs to be added to the method. So, a new function to do this looks like this:

private static int[] GetComboOfRank(int k, int nElements, int Rank)
{
   // Gets the combination for the rank using the formula (n+k-1)!/k!(n-1)! where k <= 6.
   int k1, k2, k3, k4, k5, k6;
   int n = nElements;
   int i = 0;
   int[] ReturnArray = new int[k];
   if (k == 6)
   {
      for (k6 = 0; k6 < n; k6++)
      {
         for (k5 = 0; k5 < n; k5++)
         {
            for (k4 = k5; k4 < n; k4++)
            {
               for (k3 = k4; k3 < n; k3++)
               {
                  for (k2 = k3; k2 < n; k2++)
                  {
                     for (k1 = k2; k1 < n; k1++)
                     {
                        if (i == Rank)
                        {
                           ReturnArray[0] = k1;
                           ReturnArray[1] = k2;
                           ReturnArray[2] = k3;
                           ReturnArray[3] = k4;
                           ReturnArray[4] = k5;
                           ReturnArray[5] = k6;
                           return ReturnArray;
                        }
                        i++;
                     }
                  }
               }
            }
         }
      }
   }
   else if (k == 5)
   {
      for (k5 = 0; k5 < n; k5++)
      {
         for (k4 = k5; k4 < n; k4++)
         {
            for (k3 = k4; k3 < n; k3++)
            {
               for (k2 = k3; k2 < n; k2++)
               {
                  for (k1 = k2; k1 < n; k1++)
                  {
                     if (i == Rank)
                     {
                        ReturnArray[0] = k1;
                        ReturnArray[1] = k2;
                        ReturnArray[2] = k3;
                        ReturnArray[3] = k4;
                        ReturnArray[4] = k5;
                        return ReturnArray;
                     }
                     i++;
                  }
               }
            }
         }
      }
   }
   else if (k == 4)
   {
      // Same code as in the other cases, but with one less loop than k = 5.
   }
   else if (k == 3)
   {
      // Same code as in the other cases, but with one less loop than k = 4.
   }
   else if (k == 2)
   {
      // Same code as in the other cases, but with one less loop than k = 3.
   }
   else
   {
      // k = 1 - error?
   }
   // Should not ever get here.  If we do - it is some sort of error.
   throw ("GetComboOfRank - did not find rank");
}

ReturnArray returns the combination associated with the rank. So, this code should work for you. However, it will be much slower than what could be achieved if a table lookup was done. The problem with 300 choose 6 is that:

300 choose 6 = 305! / (6!(299!) = 305*304*303*302*301*300 / 6! = 1,064,089,721,800

That is probably way too much data to store in memory. So, if you could get n down to 20, through preprocessing then you would be looking at a total of:

20 choose 6 = 25! / (6!(19!)) = 25*24*23*22*21*20 / 6! = 177,100
20 choose 5 = 24! / (5!(19!)) = 24*23*22*21,20 / 5!    =  42,504
20 choose 4 = 23! / (4!(19!)) = 23*22*21*20 / 4!       =   8,855
20 choose 3 = 22! / (3!(19!)) = 22*21*20 / 3!          =   1,540
20 choose 2 = 21! / (2!(19!)) = 22*21 / 2!             =     231
                                                         =======
                                                         230,230

If one byte is used for each value of the combination, then the total number of bytes used to store a table (via a jagged array or perhaps 5 separate tables) in memory could be calculated as:

177,100 * 6 = 1,062,600
 42,504 * 5 =   212,520
  8,855 * 4 =    35,420
  1,540 * 3 =     4,620
    231 * 2 =       462
              =========
              1,315,622

It depends on the target machine and how much memory is available, but 1,315,622 bytes is not that much memory when many machines today have gigabytes of memory available.

0
Vladimir Panteleev On

Here is a combinatorial number system implementation which handles combinations with and without repetition (i.e multiset), and can optionally produce lexicographic ordering.

/// Combinatorial number system encoder/decoder
/// https://en.wikipedia.org/wiki/Combinatorial_number_system
struct CNS(
    /// Type for packed representation
    P,
    /// Type for one position in unpacked representation
    U,
    /// Number of positions in unpacked representation
    size_t N,
    /// Cardinality (maximum value plus one) of one position in unpacked representation
    U unpackedCard,
    /// Produce lexicographic ordering?
    bool lexicographic,
    /// Are repetitions representable? (multiset support)
    bool multiset,
)
{
static:
    /// Cardinality (maximum value plus one) of the packed representation
    static if (multiset)
        enum P packedCard = multisetCoefficient(unpackedCard, N);
    else
        enum P packedCard = binomialCoefficient(unpackedCard, N);
    alias Index = P;

    private P summand(U value, Index i)
    {
        static if (lexicographic)
        {
            value = cast(U)(unpackedCard-1 - value);
            i = cast(Index)(N-1 - i);
        }
        static if (multiset)
            value += i;
        return binomialCoefficient(value, i + 1);
    }

    P pack(U[N] values)
    {
        P packed = 0;
        foreach (Index i, value; values)
        {
            static if (!multiset)
                assert(i == 0 || value > values[i-1]);
            else
                assert(i == 0 || value >= values[i-1]);

            packed += summand(value, i);
        }

        static if (lexicographic)
            packed = packedCard-1 - packed;
        return packed;
    }

    U[N] unpack(P packed)
    {
        static if (lexicographic)
            packed = packedCard-1 - packed;

        void unpackOne(Index i, ref U r)
        {
            bool checkValue(U value, U nextValue)
            {
                if (summand(nextValue, i) > packed)
                {
                    r = value;
                    packed -= summand(value, i);
                    return true;
                }
                return false;
            }

            // TODO optimize: (rolling product / binary search / precomputed tables)
            // TODO optimize: don't check below N-i
            static if (lexicographic)
            {
                foreach_reverse (U value; 0 .. unpackedCard)
                    if (checkValue(value, cast(U)(value - 1)))
                        break;
            }
            else
            {
                foreach         (U value; 0 .. unpackedCard)
                    if (checkValue(value, cast(U)(value + 1)))
                        break;
            }
        }

        U[N] values;
        static if (lexicographic)
            foreach         (Index i, ref r; values)
                unpackOne(i, r);
        else
            foreach_reverse (Index i, ref r; values)
                unpackOne(i, r);

        return values;
    }
}

Full code: https://gist.github.com/CyberShadow/67da819b78c5fd16d266a1a3b4154203