I am analyzing brand mentions in text to compute KPIs such as ad recognition. However, brand names that contain special characters are broken apart by my code so far.
library(qdap)
library(stringr)
test <- c("H&M", "C&A", "Zalando", "Zalando", "Amazon", "Sportscheck")
wfm(test)
This is the output:
            all
a             1
amazon        1
c             1
h             1
m             1
sportscheck   1
zalando       2
Is there a package or method to achieve that H&M becomes h&m, rather than "h" and "m" as if it were two brands?
Edit: The wfm function has a ... argument, which should allow me to use the strip function:
wfm(test, ... = strip(test, char.keep = "&"))
Unfortunately, this does not work.
 
                        
I would suggest something like this. The udpipe package has a function, document_term_frequencies, where you can specify the split; it turns the data into a data.frame with the frequency counts. If no id column is specified, it generates one. The object returned by document_term_frequencies is a data.table.
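A minimal sketch of that approach, applied to the test vector from the question. It assumes that lowercasing beforehand and splitting on whitespace only is acceptable for your data, and that the result has the columns doc_id, term and freq; the names dtf and counts are only illustrative.

```r
library(udpipe)

test <- c("H&M", "C&A", "Zalando", "Zalando", "Amazon", "Sportscheck")

# Split on whitespace only (an assumption for this example), so that
# "&" is not treated as a separator and "h&m" stays a single term.
dtf <- document_term_frequencies(x = tolower(test), split = "[[:space:]]+")

# dtf holds one row per document/term pair; sum over documents
# to get the overall frequency of each brand.
counts <- aggregate(freq ~ term, data = dtf, FUN = sum)
counts
```

Because each element of test is treated as its own document here, the final aggregate step is what collapses the two "zalando" rows into a single count.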