I want to generate a NumPy array that represents potential species assemblages (each row is one assemblage) under the following conditions: each assemblage comprises a maximum of 5 species, with a maximum of 50 individuals per species (in steps of 10). Thus, the final array should have 5 columns (one per species), and each cell can take a value in {0, 10, 20, 30, 40, 50}. All possible combinations should be represented in the array.

I managed to do that quite simply in R, but because of further processing (of an even larger dataset, n > 35*10^6 rows) I would like to switch to Python to improve calculation times (using some scipy-biodiversity functions).

Here is the rather simple R code that I want to translate into a Python equivalent:

Assemblage_generated <- expand.grid(seq(0,50,10),seq(0,50,10),seq(0,50,10),seq(0,50,10),seq(0,50,10))

Is there a Python function dedicated to this?

2 Answers

Jacques Gaudin On Best Solutions

You have the choice of two methods to do this:

  1. With numpy.meshgrid (I think it's the fastest)
from itertools import repeat
import numpy as np

val = list(range(0, 60, 10))

res = np.stack(np.meshgrid(*repeat(val, 5)), -1).reshape(-1, 5)
  2. With itertools.product
from itertools import product, repeat
import numpy as np

val = list(range(0, 60, 10))

res = np.array(list(product(*repeat(val, 5))), dtype='int32')
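
As a quick sanity check on the two methods above, the following sketch builds both arrays and compares them. The helper `sort_rows` and the variable names `grid` and `prod` are made up for this illustration. It assumes that `np.meshgrid` is called with its default `indexing='xy'`, which swaps the first two axes, so the rows may come out in a different order than with `itertools.product` even though the set of rows is the same.

```python
from itertools import product, repeat
import numpy as np

val = list(range(0, 60, 10))

# Method 1: meshgrid (default indexing='xy')
grid = np.stack(np.meshgrid(*repeat(val, 5)), -1).reshape(-1, 5)

# Method 2: itertools.product
prod = np.array(list(product(val, repeat=5)), dtype='int32')

# 6 possible values for each of 5 species -> 6**5 = 7776 assemblages
assert grid.shape == prod.shape == (6**5, 5)

def sort_rows(a):
    """Hypothetical helper: sort rows lexicographically (column 0 first)."""
    return a[np.lexsort(a.T[::-1])]

# Same set of rows once both arrays are sorted ...
same_rows = np.array_equal(sort_rows(grid), sort_rows(prod))
# ... but not necessarily the same row order as generated
rows_in_same_order = np.array_equal(grid, prod)

print(same_rows, rows_in_same_order)
```

If the row order matters, passing `indexing='ij'` to `np.meshgrid` should give the same order as `itertools.product` (last column varying fastest).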
Sigve Karolius On

I believe what you are looking for is the Cartesian product (I see that "jdehesa" proposed this before me). As shown in the link, this can be achieved in a number of ways, depending on the speed requirements.

A quick and dirty solution specific to your problem could be the following:

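import itertools
import numpy as np

# Six abundance levels per species: 0, 10, ..., 50
levels = np.arange(0, 60, 10, dtype='int')

# Cartesian product over 5 species; each tuple is one assemblage
lst = list(itertools.product(levels, repeat=5))

# Compare with output from R:
for i, li in enumerate(lst):
    print(i, li)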
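
When comparing against R, note that `expand.grid` varies the first column fastest, whereas `itertools.product` varies the last position fastest. A minimal sketch of one way to get a NumPy array in the same row order as the R output is to reverse the columns. The name `assemblages` and the choice of `np.uint8` are assumptions made for this example; `uint8` is used only because the values never exceed 50, which keeps memory low if the array grows large.

```python
from itertools import product
import numpy as np

val = range(0, 60, 10)

# product() varies the LAST position fastest; reversing the columns makes
# the FIRST column vary fastest, as expand.grid does in R.
# dtype=np.uint8 is an assumption: values 0..50 fit in one byte.
assemblages = np.array(list(product(val, repeat=5)), dtype=np.uint8)[:, ::-1]

print(assemblages.shape)
print(assemblages[:7])
```

The first rows then step through the first species (0, 10, ..., 50) before the second species changes, which is how the rows of the R data frame are ordered.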