How to apply a custom function to a quanteda corpus

Question

Asked by Doug Fir on 30 August 2017 at 06:10

I'm trying to migrate a script from tm to quanteda. According to the quanteda documentation, the philosophy is to apply changes "downstream" so that the original corpus remains unchanged. OK.

I previously wrote a script to find spelling mistakes in our tm corpus, and our team helped build a manual lookup table. So I have a CSV file with two columns: the first column holds the misspelt term and the second holds the correct version of that term.

Previously, using the tm package, I did this:

# Write a custom function to pass to tm_map
# "spellingdoc" is the 2-column CSV, read in as a data frame
library(stringr)
library(stringi)
library(tm)
stringi_spelling_update <- content_transformer(function(x, lut = spellingdoc) {
    # Replace each misspelt term (column 1) with its correction (column 2),
    # matching whole words only
    stri_replace_all_regex(str = x,
                           pattern = paste0("\\b", lut[,1], "\\b"),
                           replacement = lut[,2],
                           vectorize_all = FALSE)
})

Then within my tm corpus transformations I did this:

mycorpus <- tm_map(mycorpus, function(i) stringi_spelling_update(i, spellingdoc))

What is the equivalent way to apply this custom function to my quanteda corpus?

There are 2 answers

Doug Fir On 30 August 2017 at 08:49

I think I found an indirect answer over here.

texts(myCorpus) <- myFunction(myCorpus)

**Ken Benoit** · Accepted Answer · 2017-08-30T16:05:30+00:00

It's impossible to tell from your example whether that will work, since it leaves some parts out, but in general:

If you want to access texts in a quanteda corpus, you can use texts(), and to replace those texts, texts()<-.

So in your case, assuming that mycorpus is a tm corpus, you could do this:

library("quanteda")
stringi_spelling_update2 <- function(x, lut = spellingdoc) {
    stringi::stri_replace_all_regex(str = x, 
                                    pattern = paste0("\\b", lut[,1], "\\b"), 
                                    replacement = lut[,2], 
                                    vectorize_all = FALSE)
}

# Convert the tm corpus to a quanteda corpus, then replace its texts
myquantedacorpus <- corpus(mycorpus)
texts(myquantedacorpus) <- stringi_spelling_update2(texts(myquantedacorpus), spellingdoc)
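
To check the replacement function on its own before touching a corpus, something like the following may help. It is only a sketch: the lookup table, its column names (`wrong`, `right`), and the sample texts are made up for illustration and stand in for the real CSV, and the function name `spelling_update` is hypothetical. It needs nothing but stringi and works on a plain character vector, which is the kind of input the function gets from texts().

```r
library(stringi)

# Hypothetical lookup table standing in for the two-column CSV
# (column 1 = misspelt term, column 2 = corrected term)
spellingdoc <- data.frame(
  wrong = c("teh", "recieve"),
  right = c("the", "receive"),
  stringsAsFactors = FALSE
)

# Same idea as the function above: apply every pattern/replacement pair
# in turn to each text, matching whole words only
spelling_update <- function(x, lut = spellingdoc) {
  stri_replace_all_regex(
    str = x,
    pattern = paste0("\\b", lut[[1]], "\\b"),
    replacement = lut[[2]],
    vectorize_all = FALSE
  )
}

# Made-up sample texts
txt <- c("I did not recieve teh parcel.", "Nothing to fix here.")
fixed <- spelling_update(txt)
print(fixed)
```

With vectorize_all = FALSE, every row of the lookup table is applied to every text, rather than row i being paired with text i. The misspelt terms are treated as regular expressions, so a term containing characters such as "." or "(" would need escaping first.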
