I have a corpus of text in a CSV file in which each line uniquely specifies a "topic" I am interested in. If I run a topic model on this corpus with LDA, fit either by VEM or by Gibbs sampling, using the topicmodels package or the lda package, then as expected I get multiple topics per "document" (a line of text in my CSV, which I have defined a priori to be one unique topic of interest). I understand that this follows from how the model works: LDA treats each document as a bag of words drawn from a mixture of topics.
What I am curious about, however, is this:
1) Is there a ready-made R package that lets the user specify the topics from the empirical word distribution? That is, I don't want the topics to be estimated; I want to tell R what the topics are. I suppose I could run a topic model with the correct number of topics, reuse the structure of the resulting object, and then overwrite its contents. I was just hoping there was an easier or more obvious way that I'm not seeing at this point.
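One way to picture "telling the model what the topics are" is to build each topic directly from the empirical word counts of its line, then estimate only each document's topic proportions while holding those topics fixed. The Python sketch below does this with additive smoothing and a few EM iterations. It is a simplified maximum-likelihood fold-in (no Dirichlet prior on the proportions), not the procedure used by topicmodels or lda, and the function names and toy topics are hypothetical.

```python
from collections import Counter

def build_topics(labeled_lines, smoothing=1.0):
    """Turn each labeled line into a fixed topic: a smoothed word distribution.

    `labeled_lines` maps a (hypothetical) topic name to the text of its CSV line.
    Additive smoothing keeps words unseen in a line from getting zero probability.
    """
    counts = {name: Counter(text.lower().split()) for name, text in labeled_lines.items()}
    vocab = set().union(*counts.values())
    topics = {}
    for name, c in counts.items():
        total = sum(c.values()) + smoothing * len(vocab)
        topics[name] = {w: (c[w] + smoothing) / total for w in vocab}
    return topics, vocab

def topic_proportions(doc, topics, vocab, iters=100):
    """Estimate one document's topic mixture by EM, holding the topics fixed."""
    names = list(topics)
    theta = {k: 1.0 / len(names) for k in names}  # start from a uniform mixture
    words = [w for w in doc.lower().split() if w in vocab]  # drop unseen words
    if not words:
        return theta
    for _ in range(iters):
        expected = {k: 0.0 for k in names}
        for w in words:
            # E-step: how responsible each topic is for this word token
            weights = {k: theta[k] * topics[k][w] for k in names}
            z = sum(weights.values())
            for k in names:
                expected[k] += weights[k] / z
        # M-step: new proportions are the normalized expected token counts
        theta = {k: expected[k] / len(words) for k in names}
    return theta

topics, vocab = build_topics({
    "sports": "goal match team score player",
    "finance": "stock market price bank interest",
})
print(topic_proportions("team score goal bank", topics, vocab))
```

Because only the proportions are updated, the topics stay exactly as you defined them; the mostly-sports toy document ends up with nearly all of its weight on "sports".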
Thoughts?
Edit: It occurred to me that in LDA the alpha and beta parameters (the Dirichlet priors on the document-topic and topic-term distributions, respectively) control those distributions. What settings might force the model to find only one topic per document, or is there a setting that would allow that to happen?
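To see what alpha does, recall that LDA draws each document's topic proportions from a symmetric Dirichlet(alpha). The standard-library Python sketch below samples such proportions and reports the average share of the largest topic; as alpha shrinks, one topic increasingly dominates each document. A small alpha therefore pushes toward one topic per document, but it is a prior rather than a hard constraint, and it does not make the fitted topics match predefined ones. The helper names and the values k=5 and n=2000 are arbitrary choices for illustration.

```python
import random

def sample_dirichlet(alpha, k, rng):
    """Draw one k-dimensional symmetric Dirichlet(alpha) vector via normalized gammas."""
    while True:
        draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
        total = sum(draws)
        if total > 0:  # a tiny alpha can underflow every draw to 0.0; redraw if so
            return [d / total for d in draws]

def mean_top_share(alpha, k=5, n=2000, seed=0):
    """Average weight of the single largest topic across n simulated documents."""
    rng = random.Random(seed)
    return sum(max(sample_dirichlet(alpha, k, rng)) for _ in range(n)) / n

for a in (0.01, 0.1, 1.0, 10.0):
    print(f"alpha={a:>5}: mean largest topic share = {mean_top_share(a):.3f}")
```

With alpha well below 1 the average largest share should come out close to 1, while alpha = 10 spreads the weight fairly evenly across the five topics.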
If these seem like silly questions, I understand; I'm quite new to this particular field and am finding it fascinating.
What are you trying to accomplish with this approach? If you want to tell R what the topics are so that it can predict the topics of other lines or documents, then RTextTools, an R package for supervised text classification, may be helpful.
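The supervised idea behind that suggestion is to train on text with known labels and then assign each new line to the single most likely label. The sketch below does not use RTextTools or mirror its API; it is a minimal multinomial naive Bayes in Python, with hypothetical labels and example sentences. In your setup each CSV line would be its own label, so each label would have a single training example.

```python
import math
from collections import Counter

def train_nb(examples, alpha=1.0):
    """Fit a multinomial naive Bayes model from (label, text) pairs."""
    word_counts = {}
    doc_counts = Counter()
    for label, text in examples:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    vocab = set().union(*word_counts.values())
    n_docs = sum(doc_counts.values())
    model = {}
    for label, counts in word_counts.items():
        total = sum(counts.values()) + alpha * len(vocab)
        model[label] = (
            math.log(doc_counts[label] / n_docs),  # log prior of the label
            {w: math.log((counts[w] + alpha) / total) for w in vocab},  # smoothed log likelihoods
        )
    return model, vocab

def predict_nb(model, vocab, text):
    """Return the label with the highest log posterior; unseen words are skipped."""
    words = [w for w in text.lower().split() if w in vocab]

    def score(label):
        log_prior, log_lik = model[label]
        return log_prior + sum(log_lik[w] for w in words)

    return max(model, key=score)

model, vocab = train_nb([
    ("sports", "the team scored a late goal"),
    ("sports", "the player missed the match"),
    ("finance", "the bank raised interest rates"),
    ("finance", "stock prices fell on the market"),
])
print(predict_nb(model, vocab, "a goal in the final match"))  # prints "sports"
```

Unlike a topic model, this returns one label per line, which matches the "one topic per document" assumption directly.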