I am trying to implement a scholarly paper recommendation system. The first part of the project is to build a profile for each junior researcher from the paper they have published and the papers referenced in it. The mathematical notation for this is:
where P1 is the researcher's paper, f is a feature vector, W is the weight assigned to each referenced paper's vector to give it appropriate importance, and ref is a reference paper.
The data for each paper, both the researcher's own paper and its references, is given as words and their term frequencies. For example:
For the individual files I have no problem in constructing the feature vector. I use this code:
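    from collections import defaultdict
    from pandas import DataFrame

    def create_fvector_p(file_name):
        # Each line of the file holds a word followed by its term frequency.
        feature_dict = defaultdict(float)
        with open(file_name, 'r') as file:
            for line in file:
                feature = line.split()
                # Store the frequency as a number, not as a string.
                feature_dict[feature[0]] = float(feature[1])
        # One-row DataFrame with one column per word.
        feature_vector = DataFrame([feature_dict])
        return feature_vector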
Now, when it comes to doing operations on these vectors, I am lost. I don't know how to manipulate this vector space model so that I can fit it into those equations. What am I doing wrong, and how should I fix it?
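To make the goal concrete, here is a minimal sketch of the operation I think I need, written on plain dicts instead of DataFrames. It assumes the profile is the paper's own feature vector plus the weighted sum of its reference vectors, which is my reading of the description above. The function name `combine_vectors`, the example words, and the weights are all made up for illustration.

```python
from collections import defaultdict

def combine_vectors(paper_vec, ref_vecs, weights):
    """Add the paper's vector to the weighted reference vectors.

    Each vector is a dict mapping word -> term frequency, so words
    missing from a vector simply contribute 0.
    """
    profile = defaultdict(float)
    # Start from the researcher's own paper.
    for word, tf in paper_vec.items():
        profile[word] += tf
    # Add each reference vector scaled by its weight.
    for ref_vec, w in zip(ref_vecs, weights):
        for word, tf in ref_vec.items():
            profile[word] += w * tf
    return dict(profile)

# Hypothetical data: one paper and two references with assumed weights.
p1 = {"recommend": 2.0, "paper": 1.0}
refs = [{"paper": 4.0, "citation": 2.0}, {"recommend": 2.0}]
weights = [0.5, 0.25]

profile = combine_vectors(p1, refs, weights)
print(profile)
```

What I can't work out is how to do the equivalent with the one-row DataFrames that `create_fvector_p` returns, where each file produces a different set of columns.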