hunspell: Error in FUN(X[[i]], ...) : subscript out of bounds

I am using a function written by another user here to spell-check a text column and replace each misspelled word with the first hunspell suggestion (the function resolved this same error for that user). It does not work for me: the code runs on a small subset of my data, but when I split the full dataset into chunks I still get the error in the title. My dataset is large (100,000+ rows) and has multiple columns, so inspecting rows one by one to find the problem is not practical.

More about the dataset: captions_tidy$caption is a column of Instagram captions from which I have already removed special characters, stop words, etc. There are two other columns, username and link; only the link is unique to each caption.

I'm new to Stack Overflow, so please be patient with me :)

library(hunspell)

# Small reproducible sample of the data
captions_tidy <- data.frame(
  "username" = c("_666rotten", "_666rotten", "_666rotten"),
  "link" = c("https://www.instagram.com/p/CAeJt6RHtLX/",
             "https://www.instagram.com/p/CDc_qDrnseK/",
             "https://www.instagram.com/p/CDrdAsjH6-e/"),
  "caption" = c("miss guys",
                "colors dis magical paints art page paintingz fo sale",
                "swipe 12 pinks purples mint greenish blue black cell activator")
)

cleantext = function(x){

  sapply(1:length(x),function(y){
    bad = hunspell(x[y])[[1]]
    good = unlist(lapply(hunspell_suggest(bad),`[[`,1))

    if (length(bad)){
      for (i in 1:length(bad)){
        x[y] <<- gsub(bad[i],good[i],x[y])
      }}})
  x
}
captions_tidy$caption <- cleantext(captions_tidy$caption)
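One possible cause, offered as a guess rather than a confirmed diagnosis: hunspell_suggest() returns an empty character vector for a word it has no suggestions for, and `[[`(character(0), 1) then fails with "subscript out of bounds". That would explain why a small subset can run cleanly while the full data does not. Below is a minimal sketch of a guarded variant. The names first_suggestion and cleantext_safe are hypothetical (they are not part of hunspell), and the sketch assumes the hunspell package is installed with its default dictionary.

```r
# Hypothetical helper: pick the first suggestion for each word and fall back
# to the word itself when the suggestion vector is empty.
first_suggestion <- function(words, suggestions) {
  mapply(function(word, sugg) {
    if (length(sugg) > 0) sugg[[1]] else word
  }, words, suggestions, USE.NAMES = FALSE)
}

# Hypothetical guarded variant of cleantext(). hunspell is called through
# its namespace, so the package is only needed when this function runs.
cleantext_safe <- function(x) {
  x <- as.character(x)  # the column may have been stored as a factor
  vapply(x, function(txt) {
    if (is.na(txt)) return(NA_character_)
    bad <- hunspell::hunspell(txt)[[1]]   # misspelled words in this caption
    if (length(bad) == 0) return(txt)     # nothing to replace
    good <- first_suggestion(bad, hunspell::hunspell_suggest(bad))
    for (i in seq_along(bad)) {
      # fixed = TRUE treats the misspelled word as literal text, not a regex;
      # like the original, this also replaces matches inside longer words
      txt <- gsub(bad[i], good[i], txt, fixed = TRUE)
    }
    txt
  }, character(1), USE.NAMES = FALSE)
}
```

It would be called the same way as the original, e.g. captions_tidy$caption <- cleantext_safe(captions_tidy$caption). Words without any suggestion are left unchanged instead of raising an error, so it may be worth checking afterwards which words were skipped.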