The problem is:
A set of 5 independent users were asked to rate 50 products given to them. Each user has used all 50 products at some point in time. Some users are biased towards certain products. One user did not genuinely complete the survey and gave random values. Users are not required to rate every product. Now, given a sample dataset, rank the products based on their ratings.
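One way to look for the user who answered at random is to check how well each user's ratings agree with everyone else's. The sketch below is an assumption, not a method given in the problem: it standardizes each user's ratings (which also removes a constant per-user offset), then computes the Pearson correlation between each user and the mean of the other users on the products they share. The names `zscore_by_user`, `agreement_scores` and `min_overlap` are hypothetical. A score near zero or negative marks a candidate for the random responder; with only a few co-rated products per user these correlations are very noisy, so treat the result as a hint rather than a verdict.

```python
import numpy as np


def zscore_by_user(R):
    """Standardize each user's (column's) observed ratings; NaN marks a missing rating."""
    mu = np.nanmean(R, axis=0)
    sd = np.nanstd(R, axis=0)
    sd[~(sd > 0)] = 1.0  # users with a single (or constant) rating get z = 0
    return (R - mu) / sd


def agreement_scores(R, min_overlap=3):
    """Pearson correlation of each user with the mean of the other users.

    R is a products x users array with np.nan for missing ratings. Users with
    fewer than `min_overlap` co-rated products get np.nan.
    """
    Z = zscore_by_user(np.asarray(R, dtype=float))
    n_users = Z.shape[1]
    scores = np.full(n_users, np.nan)
    for u in range(n_users):
        others = np.delete(Z, u, axis=1)
        # Products this user rated that at least one other user also rated.
        shared = ~np.isnan(Z[:, u]) & ~np.all(np.isnan(others), axis=1)
        if shared.sum() < min_overlap:
            continue
        x = Z[shared, u]
        consensus = np.nanmean(others[shared], axis=1)
        if np.std(x) == 0 or np.std(consensus) == 0:
            continue  # correlation is undefined for constant vectors
        scores[u] = np.corrcoef(x, consensus)[0, 1]
    return scores
```

For a products x users array `R`, `int(np.nanargmin(agreement_scores(R)))` gives the index of the least consistent user (NumPy raises `ValueError` if no user has enough overlap to be scored).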
Dataset:
product  #user1  #user2  #user3  #user4  #user5
0        29      -       10      90      12
1        -       -       -       -       7
2        -       -       95      6       1
3        -       -       -       -       2
4        -       -       -       -       50
5        -       35      21      13      -
6        -       -       -       -       5
7        4       -       -       30      -
8        11      -       -       -       14
...
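Before any modelling, the table has to become a numeric matrix in which `-` means "not rated". A minimal sketch, assuming the whitespace-separated layout shown above and using NumPy with `np.nan` as the missing-value marker; only the nine visible rows are included (the elided rows are not reproduced), and `parse_ratings` is a hypothetical helper name.

```python
import numpy as np

SAMPLE = """\
product #user1 #user2 #user3 #user4 #user5
0 29 - 10 90 12
1 - - - - 7
2 - - 95 6 1
3 - - - - 2
4 - - - - 50
5 - 35 21 13 -
6 - - - - 5
7 4 - - 30 -
8 11 - - - 14
"""


def parse_ratings(text):
    """Return (product_ids, user_names, R); R[i, j] is NaN if user j did not rate product i."""
    lines = [ln.split() for ln in text.strip().splitlines()]
    header, rows = lines[0], lines[1:]
    users = header[1:]
    products = [int(r[0]) for r in rows]
    R = np.array([[np.nan if v == "-" else float(v) for v in r[1:]] for r in rows])
    return products, users, R


products, users, R = parse_ratings(SAMPLE)
print(R.shape)            # (9, 5)
print(np.isnan(R).sum())  # 27 missing cells out of 45
```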
How can I come up with a ranking of the products?
This is a remodeled version of the problem that stays very close to the original.
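A simple baseline for the ranking, sketched here as an assumption rather than a known answer: standardize each user's ratings to remove that user's overall offset and spread (this handles users who rate everything high or low, but not a bias towards specific products), average the standardized scores per product, and divide by `count + prior_weight` so that products rated by only one user are pulled toward the average. `rank_products`, `exclude_users` and `prior_weight` are hypothetical names; `exclude_users` is where the random user would be dropped once identified.

```python
import numpy as np


def rank_products(R, exclude_users=(), prior_weight=1.0):
    """Rank products (rows of R) by a bias-corrected average rating.

    R is a products x users array with np.nan for missing ratings.
    Returns (order, score): product indices from best to worst, and their scores.
    """
    R = np.asarray(R, dtype=float)
    excluded = set(exclude_users)
    R = R[:, [j for j in range(R.shape[1]) if j not in excluded]]
    # Remove each user's overall offset and spread by standardizing their ratings.
    mu = np.nanmean(R, axis=0)
    sd = np.nanstd(R, axis=0)
    sd[~(sd > 0)] = 1.0
    Z = (R - mu) / sd
    # Damped mean: sum of z-scores / (count + prior_weight). Products with few
    # ratings are shrunk toward 0, i.e. toward an average product.
    count = np.sum(~np.isnan(Z), axis=1)
    total = np.nansum(Z, axis=1)
    score = np.where(count > 0, total / np.maximum(count + prior_weight, 1e-12), np.nan)
    order = np.argsort(-score, kind="stable")  # NaN scores (unrated products) sort last
    return order, score


nan = np.nan
# The nine sample rows from the question (products 0-8, users 1-5).
R = np.array([
    [29, nan, 10, 90, 12],
    [nan, nan, nan, nan, 7],
    [nan, nan, 95, 6, 1],
    [nan, nan, nan, nan, 2],
    [nan, nan, nan, nan, 50],
    [nan, 35, 21, 13, nan],
    [nan, nan, nan, nan, 5],
    [4, nan, nan, 30, nan],
    [11, nan, nan, nan, 14],
])
order, score = rank_products(R)
print(order)
```

Setting `prior_weight=0` gives a plain mean of z-scores; larger values trust sparsely rated products less. A user who rated only one product contributes a z-score of 0 under this scheme.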
What I tried: I cleaned the data, filled in the missing values using PCA, and then applied NMF, but I'm not sure about the solution.
Any help would be deeply appreciated.
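As an alternative to filling the gaps with PCA first and then running NMF, NMF can be fitted directly to the observed ratings only, by masking the missing cells out of the loss. The sketch below uses the standard weighted (masked) multiplicative-update rules; `masked_nmf`, `impute`, the rank of 2 and the 500 iterations are illustrative assumptions, not tuned values, and the ratings are assumed to be non-negative (true for the sample).

```python
import numpy as np


def masked_nmf(R, rank=2, n_iter=500, seed=0, eps=1e-9):
    """Fit R ~ U @ V with U, V >= 0 using only the observed (non-NaN) entries.

    Weighted multiplicative updates: the mask M removes missing cells from
    the squared-error loss, so they do not pull the factors toward 0.
    """
    R = np.asarray(R, dtype=float)
    M = ~np.isnan(R)           # True where a rating exists
    X = np.where(M, R, 0.0)    # missing cells set to 0 (ignored via M)
    m, n = X.shape
    rng = np.random.default_rng(seed)
    scale = np.sqrt(X[M].mean() / rank)
    U = rng.random((m, rank)) * scale + eps
    V = rng.random((rank, n)) * scale + eps
    for _ in range(n_iter):
        U *= (X @ V.T) / ((M * (U @ V)) @ V.T + eps)
        V *= (U.T @ X) / (U.T @ (M * (U @ V)) + eps)
    return U, V


def impute(R, **kwargs):
    """Keep observed ratings and fill missing ones from the NMF reconstruction."""
    R = np.asarray(R, dtype=float)
    U, V = masked_nmf(R, **kwargs)
    return np.where(np.isnan(R), U @ V, R)
```

Given a products x users array `R`, `scores = impute(R).mean(axis=1)` followed by `np.argsort(-scores)` yields one possible ranking. The completed matrix still carries each user's bias and the random user's noise, so standardizing or excluding users may still be needed, and the rank should stay well below the number of users.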
In this case, two imputation methods can be used:
I think the second method is better suited to this dataset, since most users rate more than one product.
Also, if you have other datasets involving the same users, you can use them as well to predict the missing values in this dataset.