Aggregate values of one column by classes in second column using numpy


I have a numpy array with shape (N, 2) and N > 10000. In the first column I have e.g. 6 class values (0.0, 0.2, 0.4, 0.6, 0.8, 1.0); in the second column I have float values. I want to calculate the average of the second column for each distinct class in the first column, resulting in 6 averages, one per class.

Is there a numpy way to do this that avoids manual loops, especially when N is very large?


There are 2 answers

Jaime On

In pure numpy you would do something like:

import numpy as np

# arr has shape (N, 2): class values in column 0, floats in column 1.
unq, idx, cnt = np.unique(arr[:, 0], return_inverse=True,
                          return_counts=True)
avg = np.bincount(idx, weights=arr[:, 1]) / cnt
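
To make the roles of `unq`, `idx` and `cnt` concrete, here is a small worked sketch of the same approach. The sample array is made up for illustration and is not from the question; only the two numpy calls come from the answer above.

```python
import numpy as np

# Hypothetical sample data: column 0 holds class values, column 1 holds floats.
arr = np.array([
    [0.0, 1.0],
    [0.2, 2.0],
    [0.0, 3.0],
    [0.4, 10.0],
    [0.2, 4.0],
])

# unq: sorted distinct classes
# idx: for each row, the position of its class in unq
# cnt: number of rows belonging to each class
unq, idx, cnt = np.unique(arr[:, 0], return_inverse=True, return_counts=True)

# bincount with weights sums column 1 per class; dividing by cnt gives the means.
avg = np.bincount(idx, weights=arr[:, 1]) / cnt

for cls, mean in zip(unq, avg):
    print(cls, mean)
```

Because `np.unique` returns the classes sorted, `avg[i]` is the mean for class `unq[i]`. Every class in `unq` occurs at least once, so the division never hits a zero count.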
Michael Hecht On

I copied Warren's answer here, since it solves my problem best and I want to mark the question as solved:

This is a "groupby/aggregation" operation. The question is this close to being a duplicate of "getting median of particular rows of array based on index". ... You could also use scipy.ndimage.labeled_comprehension as suggested there, but you would have to convert the first column to integers (e.g. idx = (5*data[:, 0]).astype(int)).

I did exactly this.
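
A minimal sketch of the integer-label conversion described above, assuming the six classes are exactly 0.0, 0.2, ..., 1.0. The sample `data`, the name `n_classes`, and the use of `np.rint` before the cast are my own additions: rounding first guards against a product such as 2.9999999 being truncated to 2. For simplicity the sketch feeds the labels to `np.bincount` rather than to scipy.ndimage.labeled_comprehension; the same `labels` array is the kind of integer labelling that function expects.

```python
import numpy as np

# Hypothetical sample data: classes in column 0, values in column 1.
data = np.array([
    [0.0, 1.0],
    [0.2, 2.0],
    [0.6, 5.0],
    [1.0, 7.0],
    [0.6, 9.0],
])

n_classes = 6  # assumed: classes 0.0, 0.2, ..., 1.0 map to labels 0..5

# Round before casting so floating-point error cannot shift a label down by one.
labels = np.rint(5 * data[:, 0]).astype(int)

# Per-label sums and counts; minlength keeps a slot for classes with no rows.
sums = np.bincount(labels, weights=data[:, 1], minlength=n_classes)
counts = np.bincount(labels, minlength=n_classes)

# Mean per label; labels that never occur stay NaN instead of dividing by zero.
means = np.full(n_classes, np.nan)
np.divide(sums, counts, out=means, where=counts > 0)

print(labels)
print(means)
```

Unlike the `np.unique` approach, this keeps a fixed slot for each of the six classes, so a class missing from the data shows up as NaN rather than being dropped from the result.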