Convert processed format with stm into dtm (Structural topic modeling)

Question

237 views · Asked by Dario Lacan on 06 May 2022 at 12:35

I have used the textProcessor and prepDocuments functions from the stm package to clean a corpus. Now I would like to convert the resulting object (a list of term indices and counts, plus a vocabulary) into a standard document-term matrix (or a quanteda document-feature matrix) so that I can run the LDA function from topicmodels and compare the resulting topics with those from stm.

processed <- textProcessor(poliblog5k.docs,
                           metadata = poliblog5k.meta,
                           language = "en")

prepped <- prepDocuments(processed$documents,
                         processed$vocab,
                         processed$meta,
                         lower.thresh = 20)

LDA(processed)
LDA(prepped)

> Error in x != vector(typeof(x), 1L)

LDA(processed$documents)
LDA(prepped$documents)

> Error in !all.equal(x$v, as.integer(x$v))

There is 1 answer

**Ignacio Toledo** · Accepted Answer · 2022-07-15T09:56:26+00:00

I had the same problem. What I did was transform the output of prepDocuments into a one-term-per-document-per-row format and then apply the cast_dtm function from the {tidytext} package (cast_dfm works the same way if you want a quanteda document-feature matrix instead).

library(topicmodels)
library(tidyverse)
library(tidytext)
library(magrittr)
library(stm)

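# Convert the list returned by prepDocuments() into a DocumentTermMatrix.
stm_to_dtm <- function(out){
  # Each document is a 2 x n matrix (row 1: vocab index, row 2: count);
  # transpose it so that each row is one (term, n) pair.
  tibble(out_doc = out$documents %>% map(t)) %>%
    mutate(out_doc = out_doc %>% map(set_colnames, c("term", "n"))) %>%
    mutate(out_doc = out_doc %>% map(as_tibble)) %>%
    # Use the row position as the document id, then flatten to long format.
    rownames_to_column(var = "document") %>%
    unnest(cols = out_doc) %>%
    # Replace vocab indices with the actual terms before casting.
    mutate(term = out$vocab[term]) %>%
    cast_dtm(document, term, n)
}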

temp <- textProcessor(documents = gadarian$open.ended.response, metadata = gadarian)
meta <- temp$meta
vocab <- temp$vocab
docs <- temp$documents
out <- prepDocuments(docs, vocab, meta)

prepped <- stm_to_dtm(out)

> prepped
<<DocumentTermMatrix (documents: 341, terms: 462)>>
Non-/sparse entries: 3149/154393
Sparsity           : 98%
Maximal term length: 11
Weighting          : term frequency (tf)

> LDA(prepped, k = 5)
A LDA_VEM topic model with 5 topics.
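
To make the reshaping step easier to follow, here is a minimal base-R sketch of the same idea on a hand-made object. The `toy` list, its three-word vocabulary, and the counts are invented for illustration; they only mimic the layout that prepDocuments returns (one 2-row integer matrix per document, with vocabulary indices in row 1 and counts in row 2). The sketch builds a dense table, which is fine for a toy case but not a replacement for the sparse matrix that cast_dtm produces.

```r
# Hypothetical stand-in for the output of prepDocuments(): values are made up.
toy <- list(
  documents = list(
    matrix(c(1L, 2L, 3L, 1L), nrow = 2),  # doc 1: term 1 twice, term 3 once
    matrix(c(2L, 1L), nrow = 2)           # doc 2: term 2 once
  ),
  vocab = c("econom", "health", "tax")
)

# Step 1: one row per (document, term) pair, with indices mapped to terms.
long <- do.call(rbind, lapply(seq_along(toy$documents), function(d) {
  m <- toy$documents[[d]]
  data.frame(
    document = d,
    term     = factor(toy$vocab[m[1, ]], levels = toy$vocab),
    n        = m[2, ]
  )
}))

# Step 2: pivot the long table into a documents-by-terms count table.
counts <- xtabs(n ~ document + term, data = long)

print(long)
print(counts)
```

The long table has three rows (one per non-zero document/term pair), and the pivoted table has one row per document and one column per vocabulary entry. This is the same shape that stm_to_dtm hands to cast_dtm, only without the sparse storage.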
