Error from udpipe_annotate when running sentiment analysis on Google News headlines


Here is my code so far:

pacman::p_load(dplyr, ggplot2, stringr, udpipe, lattice)
gnewsheadlines <- read.csv(file.choose(), stringsAsFactors = F)

udmodel_english <- udpipe_load_model(file = "C:/Users/Palam/Documents/english-ewt-ud-2.5-191206.udpipe")

# Step 2: count the total number of headlines by date and plot the results to examine them

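# Assumes date is stored as m/d/Y text; compare as Dates, because string comparison is alphabetical, not chronological.
headlinegoogle <- gnewsheadlines %>% filter(between(as.Date(date, "%m/%d/%Y"), as.Date("2022-03-31"), as.Date("2022-04-03")))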

s <- udpipe_annotate(udmodel_english, x = headlinegoogle$headline)
x <- as.data.frame(s)

This is the error I got when running udpipe_annotate:

Error in `[.data.table`(out, , `:=`(c("token_id", "token", "lemma", "upos",  :
Supplied 10 columns to be assigned an empty list (which may be an empty data.table or data.frame since they are lists too). To delete multiple columns use NULL instead. To add multiple empty list columns, use list(list()).

In addition: Warning message:

In strsplit(x$conllu, "\\n", fixed = TRUE) : input string 1 is invalid UTF-8

There is 1 answer

AudioBubble

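It looks like headlinegoogle$headline is not UTF-8 encoded, which is what the warning "input string 1 is invalid UTF-8" points to. udpipe_annotate expects its text input in UTF-8, so the column should be converted (for example with iconv) before annotating. See https://cran.r-project.org/web/packages/udpipe/vignettes/udpipe-tryitout.html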
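As a rough illustration of that conversion, here is a minimal base-R sketch. The sample headlines are made up, and the source encoding is assumed to be latin1; the real CSV may use a different one (Windows-1252 is common for files exported from Excel), so treat `from = "latin1"` as a guess to verify against your data.

```r
# Hypothetical headlines; the second contains a raw latin1 byte (0xE9, "é"),
# which is not valid UTF-8 on its own.
headlines <- c("Markets rally", "Caf\xe9 chain expands")

# validUTF8() reports which elements are valid UTF-8.
bad <- !validUTF8(headlines)

# Convert only the invalid elements, so strings that are already UTF-8
# are not re-encoded. Assumption: the invalid ones are latin1.
headlines_utf8 <- headlines
headlines_utf8[bad] <- iconv(headlines[bad], from = "latin1", to = "UTF-8")

# Every element should now be valid UTF-8.
stopifnot(all(validUTF8(headlines_utf8)))
```

In the question's code the same steps would be applied to headlinegoogle$headline before it is passed to udpipe_annotate. If the whole file is in one known encoding, passing fileEncoding to read.csv (for example fileEncoding = "latin1") may be a simpler alternative.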