Strange behavior in dplyr when mapping language vector on tm::stopwords

I want to extract stop words for several languages in one dplyr pipeline using this code:

    library(tidyverse)
    library(qdap)
    library(tm)
    map_dfr(tibble(language=c("english", "italian")), tm::stopwords)

This gives me an uninformative error message:

    Error in file(con, "r") : invalid 'description' argument
    In addition: Warning message:
    In if (is.na(resolved)) kind else if (identical(resolved, "porter")) "english" else resolved :
      the condition has length > 1 and only the first element will be used

Can someone explain this and suggest a workaround? I would like a tibble in which each row holds a language name and the corresponding vector of stop words.

There is 1 answer.

Answered by akrun:

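map_dfr is not looping as intended. A tibble is a list of columns, so the unit of iteration here is the single language column, and tm::stopwords receives the whole vector c("english", "italian") in one call instead of one language at a time. We need to extract the column and loop over its elements: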

    library(tidyverse)
    out <- map(tibble(language=c("english", "italian"))$language, ~ tm::stopwords(.x))
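The column-versus-element distinction can be checked without tm. The following is a minimal sketch in which the name langs is purely illustrative and length stands in for tm::stopwords, so that the call count is visible:

```r
library(purrr)
library(tibble)

# Illustrative tibble; `langs` is a name chosen for this sketch.
langs <- tibble(language = c("english", "italian"))

# Mapping over the tibble calls the function once per COLUMN,
# so the function sees the whole two-element vector.
per_column <- map_int(langs, length)

# Mapping over the extracted column calls the function once per ELEMENT.
per_element <- map_int(langs$language, length)

print(per_column)
print(per_element)
```

Here per_column has a single entry (one call that received both languages), while per_element has two entries (one call per language), which is the behaviour the fix relies on.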

Another option is to vectorize stopwords and call it inside mutate:

    # tm::stopwords is namespace-qualified so this runs with only tidyverse attached
    tibble(language=c("english", "italian")) %>%
       mutate(stop_words = Vectorize(tm::stopwords)(language))
    # A tibble: 2 x 2
    #   language stop_words
    #  <chr>    <named list>
    #1 english  <chr [174]>
    #2 italian  <chr [279]>
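A variant of the same idea, offered only as a sketch and not as part of the original answer, is to build the list-column with purrr::map inside mutate. The names stop_tbl and stop_long are illustrative, and the example assumes the tm, dplyr, purrr, tidyr and tibble packages are installed:

```r
library(dplyr)
library(purrr)
library(tidyr)
library(tibble)

# One row per language, with the stop words stored as a list-column.
stop_tbl <- tibble(language = c("english", "italian")) %>%
  mutate(stop_words = map(language, tm::stopwords))

# Optional long format: one row per (language, stop word) pair.
stop_long <- stop_tbl %>%
  unnest(stop_words)

print(stop_tbl)
```

Inside mutate, language is already the column vector, so map iterates over its elements and tm::stopwords is called once per language. The unnest step is only needed if a long table is more convenient, for example for an anti_join against tokenized text.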