How to calculate the row-wise average of only the non-zero elements of a 2-D NumPy array and subtract it from those elements


I am building a recommendation system. I am using NumPy arrays to represent users, where each row is a user and each column is a movie. I want to normalize each row by subtracting the row's average, taken over its non-zero elements only, from those same non-zero elements (i.e., only from the entries for which the user provided a rating). I have tried many ways to do this using np.where, np.nonzero() and so on, but could not achieve exactly what I want. For example, I have the matrix

x = np.array([[0,0,1,2,3],[0,0,2,3,4],[0,0,3,4,5.0]])

I want to achieve the same effect as this

for i in range(len(x)):
    y = np.mean(x[i][x[i].nonzero()])
    x[i][x[i].nonzero()] -= y

which outputs:

[[ 0.  0. -1.  0.  1.]
 [ 0.  0. -1.  0.  1.]
 [ 0.  0. -1.  0.  1.]]

but in a vectorized way, without using loop.

I have tried

mask = x != 0
t = np.where(mask, x - x.mean(axis=1).reshape(-1, 1), x)

But this takes the average over the entire row (zeros included), whereas I want to average over only the non-zero elements.
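To see concretely why the attempt above falls short, here is a small sketch (assuming the intended function is np.where) that prints the full-row means: x.mean(axis=1) divides each row sum by all 5 columns, so the two zeros in every row pull the mean down and the centered values come out shifted.

```python
import numpy as np

x = np.array([[0, 0, 1, 2, 3], [0, 0, 2, 3, 4], [0, 0, 3, 4, 5.0]])
mask = x != 0

# Full-row mean: each row sum is divided by 5, zeros included.
row_mean = x.mean(axis=1)
print(row_mean)  # [1.2 1.8 2.4], not the non-zero means 2, 3, 4

# Subtracting these means from the non-zero entries gives shifted values,
# e.g. the first row becomes roughly [0, 0, -0.2, 0.8, 1.8] instead of [0, 0, -1, 0, 1].
t = np.where(mask, x - row_mean.reshape(-1, 1), x)
print(t)
```

The fix is therefore only in how the mean is computed; the np.where step itself is fine.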

Answer by Valdi_Bo:

First, compute the condition array for the where parameter (which elements to operate on):

wh = x != 0

Then compute the mean of the "wanted" elements in each row:

mn = np.mean(x, axis=1, where=wh)[:, np.newaxis]

Note that [:, np.newaxis] converts the 1-D result into a 2-D array with a single column, so that it broadcasts against each row of x.
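The shape change matters for broadcasting. The sketch below (NumPy 1.20 or later assumed) prints the shapes involved and also shows keepdims=True, an existing np.mean argument that is not used in the answer but produces the same column-shaped result directly.

```python
import numpy as np

x = np.array([[0, 0, 1, 2, 3], [0, 0, 2, 3, 4], [0, 0, 3, 4, 5.0]])
wh = x != 0

mn_1d = np.mean(x, axis=1, where=wh)       # one mean per row, shape (3,)
mn_col = mn_1d[:, np.newaxis]              # same means as a column, shape (3, 1)
mn_keep = np.mean(x, axis=1, where=wh, keepdims=True)  # also shape (3, 1)

print(mn_1d.shape, mn_col.shape, mn_keep.shape)  # (3,) (3, 1) (3, 1)

# x (3, 5) minus mn_col (3, 1): each row's mean is stretched across its 5 columns.
# x (3, 5) minus mn_1d (3,) would instead try to line the 3 means up with
# the 5 columns and raise a ValueError.
centered = x - mn_col
```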

Caution: the where parameter of np.mean was introduced in NumPy 1.20.0. If you have an older version, upgrade.
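If upgrading is not an option, one possible workaround (my own suggestion, not part of the answer) relies on the fact that zeros add nothing to a row sum: the mean of the non-zero entries is the row sum divided by np.count_nonzero along that row. The function name subtract_nonzero_row_mean is hypothetical, and treating an all-zero row's mean as 0 is an assumption.

```python
import numpy as np


def subtract_nonzero_row_mean(x):
    """Hypothetical helper: subtract each row's mean of non-zero entries
    from those entries, without np.mean(..., where=...)."""
    x = np.asarray(x, dtype=float)
    counts = np.count_nonzero(x, axis=1)   # number of ratings per row
    sums = x.sum(axis=1)                   # zeros contribute nothing to the sum
    # Assumption: an all-zero row gets mean 0 (avoids dividing by zero).
    means = sums / np.maximum(counts, 1)
    # Column-shaped means broadcast across each row; zeros are left untouched.
    return np.where(x != 0, x - means[:, np.newaxis], x)


x = np.array([[0, 0, 1, 2, 3], [0, 0, 2, 3, 4], [0, 0, 3, 4, 5.0]])
print(subtract_nonzero_row_mean(x))
```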

And to get the result, run:

result = np.where(wh, x - mn, x)

The result is:

array([[ 0.,  0., -1.,  0.,  1.],
       [ 0.,  0., -1.,  0.,  1.],
       [ 0.,  0., -1.,  0.,  1.]])
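Putting the answer's steps together, here is a self-contained sketch wrapped in a function. The name center_nonzero and the warning suppression are my own additions: for a row with no ratings at all, np.mean over an empty selection returns nan and emits a RuntimeWarning, but np.where never uses that nan because every entry in such a row is zero.

```python
import warnings

import numpy as np


def center_nonzero(x):
    """Hypothetical helper: subtract each row's mean of non-zero entries
    from those entries. Needs NumPy >= 1.20 for np.mean(..., where=...)."""
    x = np.asarray(x, dtype=float)
    wh = x != 0                                    # which elements to operate on
    with warnings.catch_warnings():
        # All-zero rows give an empty selection, so their mean is nan and
        # NumPy warns; the nan is discarded by np.where below.
        warnings.simplefilter("ignore", category=RuntimeWarning)
        mn = np.mean(x, axis=1, where=wh)[:, np.newaxis]
    return np.where(wh, x - mn, x)


x = np.array([[0, 0, 1, 2, 3], [0, 0, 2, 3, 4], [0, 0, 3, 4, 5.0]])
result = center_nonzero(x)
```

Unlike the loop in the question, this returns a new array and leaves x unchanged.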