I am working through text2vec's example on this page. However, whenever I try to inspect what vocab_vectorizer returned, the console just prints the source of a function. In all my years of R coding, I've never seen this before, and it feels odd enough that it may not be specific to this function. Any pointers?
> library(text2vec)
> library(data.table)
> data("movie_review")
> setDT(movie_review)
> setkey(movie_review, id)
> set.seed(2016L)
> all_ids <- movie_review$id
> train_ids <- sample(all_ids, 4000)
> test_ids <- setdiff(all_ids, train_ids)
> train <- movie_review[J(train_ids)]
> test <- movie_review[J(test_ids)]
>
> prep_fun <- tolower
> tok_fun <- word_tokenizer
>
> it_train <- itoken(train$review,
+                    preprocessor = prep_fun,
+                    tokenizer = tok_fun,
+                    ids = train$id,
+                    progressbar = FALSE)
> vocabulary <- create_vocabulary(it_train)
>
> vec <- text2vec::vocab_vectorizer(vocabulary = vocabulary)
> vec
function (iterator, grow_dtm, skip_grams_window_context, window_size,
    weights, binary_cooccurence = FALSE)
{
    vocab_corpus_ptr = cpp_vocabulary_corpus_create(vocabulary$term,
        attr(vocabulary, "ngram")[[1]], attr(vocabulary, "ngram")[[2]],
        attr(vocabulary, "stopwords"), attr(vocabulary, "sep_ngram"))
    setattr(vocab_corpus_ptr, "ids", character(0))
    setattr(vocab_corpus_ptr, "class", "VocabCorpus")
    corpus_insert(vocab_corpus_ptr, iterator, grow_dtm, skip_grams_window_context,
        window_size, weights, binary_cooccurence)
}
<bytecode: 0x7f9c2e3f7380>
<environment: 0x7f9c18970970>
>
The output of vocab_vectorizer is supposed to be a function, so what you are seeing is expected. I ran the example from the documentation, and vocab_vectorizer returned a function there as well.
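To see why printing the result shows source code and an environment, here is a minimal base-R sketch of a function that returns another function. The names make_scaler and double_it are made up for illustration and have nothing to do with text2vec; the point is only that the printed object is a closure that remembers the arguments it was created with.

```r
# A "function factory": calling it returns a new function (a closure)
# that remembers the value of `factor` it was created with.
make_scaler <- function(factor) {
  function(x) x * factor
}

double_it <- make_scaler(2)

# The returned object is itself a function ...
is.function(double_it)

# ... so printing it shows its source and an <environment: ...> line,
# much like printing the result of vocab_vectorizer does.
print(double_it)

# It only produces a value once it is called with data.
double_it(21)
```

vocab_vectorizer appears to follow the same pattern: the vocabulary is captured in the returned function's environment, and nothing is computed until that function is handed some data.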
The documentation says of the returned function: "It supposed to be used only as argument to create_dtm, create_tcm, create_vocabulary". In other words, you are not meant to inspect or call it yourself.
Finally, when I ran create_dtm(it, vectorizer), I got the document-term matrix as output rather than a function.
I hope this answers your question.
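As a rough sketch of that last step, assuming a text2vec version in which create_dtm(it, vectorizer) is the calling convention (the same generation as the function printed in the question), the vectorizer is passed on to create_dtm instead of being inspected. Using only the first 100 reviews is my own choice, to keep the example quick.

```r
library(text2vec)
data("movie_review")

# Use a small subset so the sketch runs quickly (an arbitrary choice).
reviews <- movie_review$review[1:100]
ids     <- movie_review$id[1:100]

it <- itoken(reviews,
             preprocessor = tolower,
             tokenizer = word_tokenizer,
             ids = ids,
             progressbar = FALSE)

vocabulary <- create_vocabulary(it)
vectorizer <- vocab_vectorizer(vocabulary)

# The vectorizer is a function, which is why printing it shows source code.
is.function(vectorizer)

# Passing it to create_dtm is what actually builds the matrix.
dtm <- create_dtm(it, vectorizer)
dim(dtm)  # one row per document, one column per vocabulary term
```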