Major vote by column?


I have a 20x20 2D array, and for every column I want the value that occurs most often (excluding zeros), i.e. the value that receives the major vote.

I can do that for a single column like this:

>>> np.unique(p[:,0][p[:,0] != 0], return_counts=True)
(array([ 3, 21], dtype=int16), array([1, 3]))

>>> nums, cnts = np.unique(p[:,0][p[:,0] != 0], return_counts=True)
>>> nums[cnts.argmax()]
21

Just for completeness, the single-column method above extends to a loop-based solution for 2D arrays -

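import numpy as np

# p is the 2D input array
output_per_col = np.empty(p.shape[1], dtype=p.dtype)
for i in range(p.shape[1]):
    col = p[:, i]
    nums, cnts = np.unique(col[col != 0], return_counts=True)
    output_per_col[i] = nums[cnts.argmax()]  # most frequent nonzero value in column i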

How do I do that for all columns without using a for loop?

Answer by Divakar:

We can use bincount2D_vectorized to get binned counts per column, with one bin per integer value. Then slice from the second bin onwards (the first bin holds the count of zeros), take the argmax along each row of counts, and add 1 to compensate for the slicing. That's our desired output.

Here is the solution as a sample run -

In [116]: p # input array
Out[116]: 
array([[4, 3, 4, 1, 1, 0, 2, 0],
       [4, 0, 0, 0, 0, 0, 4, 0],
       [3, 1, 3, 4, 3, 1, 4, 3],
       [4, 4, 3, 3, 1, 1, 3, 2],
       [3, 0, 3, 0, 4, 4, 4, 0],
       [3, 0, 0, 3, 2, 0, 1, 4],
       [4, 0, 3, 1, 3, 3, 2, 0],
       [3, 3, 0, 0, 2, 1, 3, 1],
       [2, 4, 0, 0, 2, 3, 4, 2],
       [0, 2, 4, 2, 0, 2, 2, 4]])

In [117]: bincount2D_vectorized(p.T)[:,1:].argmax(1)+1
Out[117]: array([3, 3, 3, 1, 2, 1, 4, 2])

The transpose is needed because bincount2D_vectorized computes bincounts per row. Thus, for the alternative problem of getting the major vote per row, simply skip the transpose.

Also, feel free to explore other options in that linked Q&A to get 2D-bincounts.
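
Since the helper itself is not reproduced in this answer, here is a minimal sketch of how such a 2D bincount might be written, assuming p contains only non-negative integers (np.bincount requires that). The function name mirrors the one used above, but its body is an assumption and not necessarily the linked implementation. major_vote_per_column is a hypothetical wrapper name introduced only for illustration.

```python
import numpy as np

def bincount2D_vectorized(a):
    # Assumed sketch: per-row bincount for a 2D array of non-negative ints.
    # Each row is shifted into its own block of N bins, so a single 1D
    # np.bincount call counts all rows at once.
    a = np.asarray(a)
    N = int(a.max()) + 1
    a_offs = a + np.arange(a.shape[0])[:, None] * N
    return np.bincount(a_offs.ravel(), minlength=a.shape[0] * N).reshape(-1, N)

def major_vote_per_column(p):
    # Hypothetical wrapper: transpose so that columns become rows, drop the
    # bin for 0, and add 1 to map the argmax back to the actual value.
    counts = bincount2D_vectorized(np.asarray(p).T)
    return counts[:, 1:].argmax(axis=1) + 1

# Sample array from the run above
p = np.array([[4, 3, 4, 1, 1, 0, 2, 0],
              [4, 0, 0, 0, 0, 0, 4, 0],
              [3, 1, 3, 4, 3, 1, 4, 3],
              [4, 4, 3, 3, 1, 1, 3, 2],
              [3, 0, 3, 0, 4, 4, 4, 0],
              [3, 0, 0, 3, 2, 0, 1, 4],
              [4, 0, 3, 1, 3, 3, 2, 0],
              [3, 3, 0, 0, 2, 1, 3, 1],
              [2, 4, 0, 0, 2, 3, 4, 2],
              [0, 2, 4, 2, 0, 2, 2, 4]])

result = major_vote_per_column(p)
print(result)  # [3 3 3 1 2 1 4 2]
```

Two caveats with this sketch: on ties, argmax picks the smallest of the tied values (the loop-based np.unique version behaves the same way), and a column that contains only zeros would be reported as 1, so such columns need separate handling if they can occur.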