I have an Excel .CSV file in which one column has the transcription of a conversation. Whenever the speaker uses Spanish, the Spanish is written within brackets.
so [usualmente] maybe [me levanto como a las nueve y media] like I exercise and the I like either go to class online or in person like it depends on the day
Ideally, I'd like to extract the English and Spanish separately, so one file would contain all the Spanish words, and another would contain all the English words.
Any ideas on how to do this? Or which function/package to use?
Edited to add: there are about 100 cells that contain text in this Excel sheet. I guess where I'm confused is: how do I treat this entire CSV as a "string"?
You could do this by applying Vectorize to the seq function and indexing, then using stringr::word to extract the whole words at the indices.
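The code for this step did not survive in the post, so here is a minimal sketch of what that description could look like, not necessarily the original code. It assumes words are separated by single spaces and that every opening bracket has a matching closing bracket; the names string, words, starts, ends and vseq are illustrative. Because seq itself is a generic whose only formal argument is `...`, the sketch vectorizes seq.default instead.

```r
library(stringr)

# Example string (shortened from the question).
string <- "so [usualmente] maybe [me levanto como a las nueve y media] like I exercise"

# Split on single spaces so positions line up with stringr::word()'s default separator.
words <- strsplit(string, " ", fixed = TRUE)[[1]]

# Positions of the words that open and close each bracketed span.
starts <- grep("[", words, fixed = TRUE)
ends   <- grep("]", words, fixed = TRUE)

# Vectorize seq.default over `from` and `to`;
# SIMPLIFY = FALSE keeps one index vector per span.
vseq <- Vectorize(seq.default, vectorize.args = c("from", "to"), SIMPLIFY = FALSE)
spanish_idx <- unlist(vseq(starts, ends))
english_idx <- setdiff(seq_along(words), spanish_idx)

# Extract whole words at those positions.
spanish <- word(string, spanish_idx)
english <- word(string, english_idx)
```

At this point spanish still carries the brackets on the first and last word of each span.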
Note: to remove the pesky brackets, just do:
spanish <- gsub("\\]|\\[", "", spanish)
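On the edit about treating the whole CSV as a "string": you probably don't need to, since R's regex functions work on a whole column at once. Below is a rough sketch using a different, base R technique (regmatches with gregexpr) rather than the word-index approach above. The data frame, the column name text, and the output file names are all made up for illustration; in practice the data would come from something like read.csv on your file.

```r
# Hypothetical stand-in for the real data; in practice something like
# df <- read.csv("transcript.csv", stringsAsFactors = FALSE)
df <- data.frame(
  text = c(
    "so [usualmente] maybe [me levanto como a las nueve y media] like I exercise",
    "it depends [en la noche] really"
  ),
  stringsAsFactors = FALSE
)

# Pull out every bracketed span in each cell, drop the brackets,
# and join the spans of one cell with spaces.
spans <- regmatches(df$text, gregexpr("\\[[^]]*\\]", df$text))
spanish <- vapply(
  spans,
  function(s) paste(gsub("\\[|\\]", "", s), collapse = " "),
  character(1)
)

# Remove the bracketed spans to leave the English, then tidy the spacing.
english <- gsub("\\[[^]]*\\]", " ", df$text)
english <- trimws(gsub("\\s+", " ", english))

# One line per cell in each output file (written to a temp directory here).
writeLines(spanish, file.path(tempdir(), "spanish.txt"))
writeLines(english, file.path(tempdir(), "english.txt"))
```

Cells with no Spanish come out as empty strings in spanish, so the two files stay aligned row by row.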