The set of all combinations of vectors in Python

583 views Asked by At

I'm having an issue with regards to creating what can be seen as the cartesian product of an array of vectors in Python. I have a code that gives the all possible partitions of a number n over r variables, and returns it as a numpy array. What I would like to do is to be able to call that code an arbitrary number of times, and then produce a set of all possible combinations of the arrays.

So to give an example, I might call the partition code and each successive call (for a varying parameter set)

array([[2,0],[1,1],[2,0]])
array([[1,0],[0,1]])
array([[0,0]])

What I'm looking for is to be able to return the set

array([[2,0],[1,0],[0,0]])
array([[2,0],[0,1],[0,0]])
array([[1,1],[1,0],[0,0]])
.....

either as an overall array, or returning it line by line (due to the obvious memory issues as the size of the number being partitioned grows).

Previously I had solved this problem using itertools.product, and run the code under PyPy. However, I have had to switch from PyPy to standard python due to the need for Numpy in other parts of the project, and I'm trying to replicate the speeds of the PyPy code through the use of Numpy. I've managed to get this working really roughly, but the code spent so much time changing between data types in order to try and bootstrap a solution together that it's impractical for implementation.

I was wondering if anybody would be able to help me out with providing a little guidance as to how I should progress with this in Python.

Thanks

1

There are 1 answers

4
Daniel On BEST ANSWER

This should get you started:

import numpy as np
import itertools as it

def row_product(*arrays):
    lengths = np.array([x.shape[0] for x in arrays])
    positions = np.cumsum(lengths)

    ranges = np.arange(positions[-1])
    ranges = np.split(ranges,positions[:-1])

    total = np.concatenate((arrays),axis=0)

    inds = np.fromiter(it.chain.from_iterable(it.product(*ranges)), np.int)
    inds = inds.reshape(-1, len(arrays))

    return np.take(total, inds, axis=0)

The last dimension(s) must be the same.

Showing the results:

a=np.array([[2,0],[1,1],[2,0]])
b=np.array([[1,0],[0,1]])
c=np.array([[0,0]])

print row_product(a,b,c)

[[[2 0]
  [1 0]
  [0 0]]

 [[2 0]
  [0 1]
  [0 0]]

 [[1 1]
  [1 0]
  [0 0]]

 [[1 1]
  [0 1]
  [0 0]]

 [[2 0]
  [1 0]
  [0 0]]

 [[2 0]
  [0 1]
  [0 0]]]

This is a 3D array where the unique combinations are in the last two axes. Seems to be reasonably fast, 1M unique combinations takes about 1/6 of a second.