Combine two words in a corpus with R

Question

512 views Asked by florian joly At 23 December 2019 at 23:52

So here is my code

ny <- read.csv2("nyt.csv", sep = "\t", header = T)
ny_texte <- as.vector(ny)

iterator <- itoken(ny_texte,
                   preprocessor=tolower, 
                   tokenizer=word_tokenizer, 
                   progressbar=FALSE)

vocabulary <- create_vocabulary(iterator)

My .csv contains articles from the New York Times. I would like to combine words like "new york", "south africa", and "ellis island" into single tokens in the vocabulary, rather than getting separate tokens like "new", "york", etc.

How can I do this?

Thank you.

For more precision: I'm using these libraries:

library(text2vec)
library(stopwords)
library(tm)
library(dplyr)
library(readr)

And here is an example of what I get:

ny[1]

1 " LEAD Governor Cuomo with possible Presidential campaign waiting the wings took the oath office New Year Eve for second term New York chief executive LEAD Governor Cuomo with possible Presidential campaign waiting the wings ...

vocabulary: (image not available)

There is 1 answer

**user697473** · Accepted Answer · 2019-12-25T00:38:51+00:00

It's still a little hard to answer your question: we can't run your code because we don't have "nyt.csv." But it seems that gsub() will do what you want:

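ny <- read.csv2("nyt.csv", sep = "\t", header = TRUE)
# gsub() needs a character vector, not a data frame
# (assumption: the article text is in the first column)
ny_texte <- as.character(ny[[1]])
ny_texte <- gsub("new york", "newyork", ny_texte, ignore.case = TRUE)
ny_texte <- gsub("south africa", "southafrica", ny_texte, ignore.case = TRUE)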

(And then run the itoken() and create_vocabulary() commands from your example.)
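
If there are many phrases to merge, the same gsub() idea can be wrapped in a loop. The following is only a minimal sketch in base R: merge_phrases() is a hypothetical helper name, docs is toy data standing in for the real article text, and strsplit() is used here as a simple stand-in for text2vec's word_tokenizer so the example runs without "nyt.csv".

```r
# Hypothetical helper: collapse each multi-word phrase into a single token.
merge_phrases <- function(texts, phrases) {
  for (p in phrases) {
    pattern <- paste0("\\b", p, "\\b")         # match the phrase on word boundaries
    joined  <- gsub(" ", "", p, fixed = TRUE)  # "new york" -> "newyork"
    texts   <- gsub(pattern, joined, texts, ignore.case = TRUE, perl = TRUE)
  }
  texts
}

# Toy data (assumption: stands in for the article text column of nyt.csv)
docs <- c("Second term as New York chief executive",
          "A trip from Ellis Island to South Africa")
phrases <- c("new york", "south africa", "ellis island")

# Merge the phrases, lowercase, then split on anything that is not a letter or digit
merged <- tolower(merge_phrases(docs, phrases))
tokens <- strsplit(merged, "[^a-z0-9]+")
print(tokens)
```

With the real data, the merged character vector would be passed to itoken() in place of ny_texte, so that "newyork" ends up as a single term in the vocabulary.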
