How to extract text within brackets in Excel .CSV file in R?

Question

Asked by CogNeuro123 on 06 February 2023 at 23:02

I have an Excel .CSV file in which one column has the transcription of a conversation. Whenever the speaker uses Spanish, the Spanish is written within brackets.

so [usualmente] maybe [me levanto como a las nueve y media] like I exercise and the I like either go to class online or in person like it depends on the day

Ideally, I'd like to extract the English and Spanish separately, so one file would contain all the Spanish words, and another would contain all the English words.

Any ideas on how to do this? Or which function/package to use?

Edited to add: there are about 100 cells that contain text in this Excel sheet. I guess what confuses me is how to treat this entire CSV as a "string".

**jpsmith** · Answer 1 · 2023-02-07T00:00:43+00:00

You could do this by vectorizing the seq function with Vectorize(), using it to build the word indices of each bracketed span, and then using stringr::word to extract the whole words at those indices:

Code

strng <- "so [usualmente] maybe [me levanto como a las nueve y media] like I exercise and the I like either go to class online or in person like it depends on the day"

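# Vectorized seq(): builds one index sequence per (from, to) pair
vecSeq <- Vectorize(seq.default, vectorize.args = c("to", "from"))

# Split on spaces once; each element is one space-delimited word
words <- unlist(strsplit(strng, " "))
ixstart <- grep("\\[", words)  # words that open a bracketed span
ixend <- grep("\\]", words)    # words that close a bracketed span
spanish_ix <- unlist(vecSeq(ixstart, ixend, 1))
# Everything outside the bracketed spans is English
english_ix <- setdiff(seq_along(words), spanish_ix)

spanish <- paste(stringr::word(strng, spanish_ix), collapse = " ")
english <- paste(stringr::word(strng, english_ix), collapse = " ")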

#> spanish
#[1] "[usualmente] [me levanto como a las nueve y media]"
#> english
#[1] "so maybe like I exercise and the I like either go to class online or in person like it depends on the day"
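To see what the vectorized seq contributes, here is a small sketch with made-up positions: one bracketed span covering only word 2 and another covering words 4 through 11 (the positions the brackets occupy in the example string). The names starts, ends, per_span and all_ix are just illustrative.

```r
# Same helper as in the answer: seq() vectorized over `from` and `to`.
vecSeq <- Vectorize(seq.default, vectorize.args = c("to", "from"))

# Illustrative positions: one span covering word 2 only, another covering words 4 to 11.
starts <- c(2, 4)
ends <- c(2, 11)

per_span <- vecSeq(starts, ends, 1)  # one index vector per (start, end) pair
all_ix <- unlist(per_span)           # flattened into a single vector of word positions
print(all_ix)
```

Because the two spans have different lengths, the result comes back as a list, which is why the answer wraps the call in unlist().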

Note: to remove the pesky brackets, just run spanish <- gsub("\\]|\\[", "", spanish)
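On the edit about treating the whole CSV as a "string": you probably don't need to. Once the file is read into a data frame, the transcription column is already a character vector, and most string functions work on every cell at once. Below is a minimal sketch of a different, regex-based route (base R regmatches/gregexpr rather than the word-index approach above). The data frame, the column name transcript, and the output file names are all assumptions standing in for your real file, and it assumes brackets are never nested.

```r
# Stand-in for something like read.csv("your_file.csv"); the column name
# `transcript` and the rows below are made up for illustration.
df <- data.frame(
  transcript = c(
    "so [usualmente] maybe [me levanto como a las nueve y media] like I exercise",
    "it depends on the day",
    "[bueno] I guess so"
  ),
  stringsAsFactors = FALSE
)

# Pattern for one bracketed span: "[", anything that is not "]", then "]".
bracketed <- "\\[[^]]*\\]"

# Pull every bracketed span out of each cell (a list with one vector per cell).
spans <- regmatches(df$transcript, gregexpr(bracketed, df$transcript))

# Spanish: strip the brackets and join the spans of each cell with spaces.
df$spanish <- vapply(
  spans,
  function(m) paste(gsub("\\[|\\]", "", m), collapse = " "),
  character(1)
)

# English: delete the bracketed spans, then collapse the leftover double spaces.
english_raw <- gsub(bracketed, "", df$transcript)
df$english <- trimws(gsub(" +", " ", english_raw))

# Write one file per language (paths are placeholders in a temp directory).
writeLines(df$spanish, file.path(tempdir(), "spanish.txt"))
writeLines(df$english, file.path(tempdir(), "english.txt"))
```

Cells with no Spanish end up as an empty string in the spanish column, so the two output files keep one line per original cell and stay aligned row for row.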
