How to create feature vectors from documents of words and perform operations on them?


I am currently trying to implement a scholarly paper recommendation system. The first part of this project is to create a profile for each junior researcher using the research paper they have published and the papers referenced in that paper. The mathematical notation for this is:

[Image: equation for the feature vector of a user]

Here P1 is the researcher's paper, f is a feature vector, W is the weight assigned to each referenced paper's vector to give that paper appropriate importance, and ref is a referenced paper.

The data for each paper, both the researcher's own paper and the referenced papers, is given as words and their term frequencies. For example:

[Image: sample of the word and term-frequency data]

For the individual files I have no problem in constructing the feature vector. I use this code:

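from collections import defaultdict
from pandas import DataFrame

def create_fvector_p(file_name):
    # Map each word to its term frequency; absent words default to 0.0
    feature_dict = defaultdict(float)
    with open(file_name, 'r') as file:
        for line in file:
            feature = line.split()
            if len(feature) < 2:
                continue  # skip blank or malformed lines
            # Store the frequency as a number, not a string
            feature_dict[feature[0]] = float(feature[1])
    # One-row DataFrame with one column per word
    return DataFrame([feature_dict])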

When it comes to performing operations on these vectors, however, I am lost. I don't know how to manipulate this vector space model so that I can fit it into those equations. What am I doing wrong, and how should I fix it?
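For concreteness, here is a minimal sketch of the kind of operation in question. It assumes the profile equation has the form f(P1) + sum over references of W_ref * f(ref), which is a reading of the description above rather than something confirmed by the equation image. The helper name `weighted_sum`, the toy vectors, and the weights are all hypothetical. Each vector is kept as a plain dict of word to frequency, so a word missing from one paper counts as zero.

```python
from collections import defaultdict

def weighted_sum(paper_vec, ref_vecs, ref_weights):
    """Hypothetical helper: f(P1) + sum_i W_i * f(ref_i) over sparse dict vectors."""
    profile = defaultdict(float)
    # Start from the researcher's own paper
    for word, freq in paper_vec.items():
        profile[word] += freq
    # Add each referenced paper, scaled by its (assumed) weight
    for vec, weight in zip(ref_vecs, ref_weights):
        for word, freq in vec.items():
            # Words absent from a vector contribute nothing (implicit zero)
            profile[word] += weight * freq
    return dict(profile)

# Made-up toy data, for illustration only
p1 = {"graph": 2.0, "model": 1.0}
ref_a = {"model": 4.0, "topic": 2.0}
ref_b = {"graph": 1.0}

profile = weighted_sum(p1, [ref_a, ref_b], [0.5, 1.0])
print(profile)
```

With pandas objects the same alignment issue shows up differently: adding two frames or series with `+` aligns on labels and yields NaN for words that appear in only one of them, whereas `add(other, fill_value=0)` treats the missing words as zero.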
