How to form sentence embeddings from word embeddings using GloVe on dataframe-trained tensors?

I am working with a dataset containing snippets of event information. My dataframe looks similar to:

index| event_description
----------------------
1    | concert with thousands of people
2    | people gathering 
3    | there was an event in the city and it was so much fun
...
8000 | very boring gathering

My job is to cluster these events based on their meanings. I do not know how many clusters there should be; determining that is the job of the unsupervised learning.

In order to proceed with DBSCAN clustering, I have embedded all words in my dataframe into vectors using GloVe (rather than doc2vec, etc.).

How do you convert word vectors into sentence vectors, to proceed to clustering?

I have read this article as well as some other posts and papers, but they use other sentence embedding algorithms, not GloVe word embeddings. Also, some repos like InferSent and Google's Universal Sentence Encoder are pretty good; however, they use pretrained tensors.

Given these constraints (I must use GloVe, and the tensors must be trained on my dataframe rather than pretrained), how can I form sentence vectors from word vectors?
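To make the question concrete, here is a minimal sketch of one common baseline: averaging the word vectors of each description (mean pooling). It is not necessarily the best approach for this data. The names `sentence_vector` and `word_vectors` are hypothetical, the 3-dimensional vectors are toy values standing in for GloVe vectors trained on the dataframe, and whitespace tokenization is an assumption.

```python
import numpy as np

def sentence_vector(sentence, word_vectors, dim):
    """Average the vectors of the tokens that are in the vocabulary."""
    tokens = sentence.lower().split()  # assumption: simple whitespace tokenization
    vectors = [word_vectors[t] for t in tokens if t in word_vectors]
    if not vectors:
        # No known words: fall back to a zero vector of the right size.
        return np.zeros(dim)
    return np.mean(vectors, axis=0)

# Hypothetical toy vectors; in practice these would be the GloVe
# vectors trained on the event descriptions.
word_vectors = {
    "people": np.array([1.0, 0.0, 0.0]),
    "gathering": np.array([0.0, 1.0, 0.0]),
    "concert": np.array([0.0, 0.0, 1.0]),
}

descriptions = [
    "people gathering",
    "concert with thousands of people",
    "unknown words only",
]

# One row per description, ready to be passed to a clustering algorithm.
sentence_matrix = np.vstack(
    [sentence_vector(d, word_vectors, dim=3) for d in descriptions]
)
```

With a dataframe, the same function could be applied to the `event_description` column, and the resulting matrix passed to DBSCAN. Rows that fall back to the zero vector may need to be dropped or handled separately before clustering, especially if a cosine distance is used.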

There are 0 answers