I am building a recommendation system. I am using NumPy arrays to represent users, where each row is a user and each column is a movie. I want to normalize each row by subtracting the row's average from its non-zero elements (i.e., from only those entries for which the user provided a rating). I have tried many ways to do this using np.where, np.nonzero() and so on, but could not achieve exactly what I want. E.g., I have the matrix
import numpy as np

x = np.array([[0, 0, 1, 2, 3], [0, 0, 2, 3, 4], [0, 0, 3, 4, 5.0]])
I want to achieve the same effect as this loop:
for i in range(len(x)):
    y = np.mean(x[i][x[i].nonzero()])
    x[i][x[i].nonzero()] -= y
which outputs:
[[ 0.  0. -1.  0.  1.]
 [ 0.  0. -1.  0.  1.]
 [ 0.  0. -1.  0.  1.]]
but in a vectorized way, without using a loop.
I have tried
mask = x != 0
t = np.where(mask, x - x.mean(axis=1).reshape(-1, 1), x)
but this takes the average over the entire row, whereas I want to average over only the non-zero elements.
First compute the where condition array (which elements to operate on):
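A minimal sketch, using the x above (the names cond, means, and result in these snippets are illustrative):

cond = x != 0  # True exactly where the user provided a rating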
Then compute the mean of the "wanted" elements:
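means = np.mean(x, axis=1, where=cond)[:, np.newaxis]  # mean of each row, taken over the rated entries only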
Note that [:, np.newaxis] converts the 1-D result of np.mean into a 2-D array with a single column, so it broadcasts against x. Caution: the where parameter of np.mean was introduced in NumPy version 1.20.0. If you have an older version, upgrade.
And to get the result, run:
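result = np.where(cond, x - means, x)  # subtract each row's mean only at the rated positions; zeros stay zero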
The result is:
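[[ 0.  0. -1.  0.  1.]
 [ 0.  0. -1.  0.  1.]
 [ 0.  0. -1.  0.  1.]]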