Tracing terms in topic models to their full-text version in R


How does one retrieve full-text examples of the terms making up a topic model? The goal is to see more of the context in which an ngram occurs, to help assign topic labels. To achieve this, an ngram relating to a topic, such as 'manufactur_method', needs to be traced back to its full-text occurrences within the dataset. In my opinion code is irrelevant here: this is a general question about how to do it, not a coding problem.

I'm very thankful for any help!

I have added the full text as a document variable in the dfm. I thought that if I could get the dfm to display the ngrams and the full text in their respective columns, I could search for the ngram I want to look up and see the texts containing it. Exporting to Excel doesn't work due to the dfm's size (21 GB as character).


There is 1 answer

Connor On

Since no one has provided any help, I have moved to a manual solution, which I am sharing for any poor soul having the same problem in the future. Instead of trying to save or convert my dfm, which wasn't possible due to its size and the computational power required, I now do it manually in Excel using the original, unprocessed dataset. I take the bigram terms I have and add an asterisk (*) as a wildcard before and after each of the two grams, then search the full text with the resulting pattern. This gives me exactly what I need. For a large number of terms, I could imagine exporting the terms and writing an automated function, but for my case that's not necessary.
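For anyone who does need to automate this, here is a minimal sketch in base R of the same wildcard idea. It is not what I actually ran. The helper names `ngram_to_regex` and `find_ngram_texts` and the toy `texts` vector are made up for illustration, and the sketch assumes that the ngrams are joined with `_`, that the stems contain only letters and digits, and that each stem is a literal prefix of the original word. The pattern is also somewhat stricter than a plain `*stem*` wildcard, because it anchors each stem at the start of a word.

```r
# Hypothetical helper: turn a stemmed n-gram such as "manufactur_method"
# into a regular expression. Each stem must start a word and may be
# followed by any word characters (the suffix removed by stemming).
# Assumes the stems contain no regex metacharacters.
ngram_to_regex <- function(ngram, sep = "_") {
  stems <- strsplit(ngram, sep, fixed = TRUE)[[1]]
  paste0("\\b", paste0(stems, "\\w*", collapse = "\\s+"))
}

# Hypothetical helper: return the original texts that contain the n-gram,
# ignoring case.
find_ngram_texts <- function(ngram, texts) {
  pattern <- ngram_to_regex(ngram)
  texts[grepl(pattern, texts, ignore.case = TRUE, perl = TRUE)]
}

# Toy data standing in for the original, unprocessed dataset.
texts <- c(
  "A new manufacturing method for steel parts.",
  "The method was unrelated to production.",
  "Several Manufacturer methods were compared."
)

hits <- find_ngram_texts("manufactur_method", texts)
print(hits)
```

Two caveats: stems that change the word ending (for example "studi" for "study") will not match this way, and if stopwords were removed before the ngrams were built, the two words may not be adjacent in the raw text. If the texts are already in a quanteda corpus, `kwic()` on the unstemmed tokens with a `phrase()` pattern may be another option.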