Edgar Package | Issue with .txt files


Thank you in advance for your input.

My issue is with the getSentimentCount(word.frq, words.list) function of the edgar package. The function is supposed to read words.list, which is a .txt file, and compare its content with another existing list (word.frq), also a .txt file.

This works for one of the .txt files: it appears to have no spaces between the words, yet R still reads it as separate words (count > 1). The other file is also read as separate words (count > 1) when each word is on its own line, but getSentimentCount(word.frq, words.list) then fails with an error. If I strip the newlines from that file and put all the words on one line (as in the first file), R reads only a single word (all the words run together) and count = 1.

Are there different types of .txt files that R distinguishes?

LINK to both .txt files. negwords.txt works, litwords.txt results in an error.

I am grateful for any input.


There are 2 answers

Kim Sa On BEST ANSWER

SOLVED: The function getSentimentCount(word.frq, words.list) only reads .txt (MS-DOS) files.
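The answer above does not say which property of the MS-DOS text format matters. Two common differences between .txt files are the line endings (CRLF vs. LF) and a leading byte-order mark, so the following is only a sketch of how one might inspect a word list for both in base R. The temporary file stands in for the real word list, and the variable names are made up for illustration.

```r
# Stand-in for a word list saved with Windows (CRLF) line endings.
path <- tempfile(fileext = ".txt")
con <- file(path, open = "wb")          # binary mode: bytes are written as given
writeLines(c("loss", "risk"), con, sep = "\r\n")
close(con)

# Read the raw bytes so nothing is translated on the way in.
bytes <- readBin(path, what = "raw", n = file.info(path)$size)

# CRLF: a carriage return (0x0d) immediately followed by a line feed (0x0a).
has_crlf <- any(bytes[-length(bytes)] == as.raw(0x0d) &
                bytes[-1] == as.raw(0x0a))

# UTF-8 byte-order mark: the file starts with EF BB BF.
has_bom <- length(bytes) >= 3 &&
  identical(bytes[1:3], as.raw(c(0xef, 0xbb, 0xbf)))

# readLines() accepts LF, CRLF, or CR as the end-of-line marker.
words <- readLines(path)
```

Running the same checks on negwords.txt and litwords.txt should show whether the two files actually differ in either respect.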

Colin FAY On

It is normal that R reads only one word from the txt file without spaces: as far as R is concerned, it is a single character string with no separator.
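To illustrate the point about separators, here is a small base R sketch. The three words and the temporary files are invented for the example and are not taken from the linked files.

```r
# The same three words in three layouts.
one_per_line <- tempfile(fileext = ".txt")
run_together <- tempfile(fileext = ".txt")
spaced       <- tempfile(fileext = ".txt")
writeLines(c("loss", "risk", "claim"), one_per_line)
writeLines("lossriskclaim", run_together)
writeLines("loss risk claim", spaced)

# readLines() returns one element per line of the file.
words_a <- readLines(one_per_line)   # three lines, so three elements
words_b <- readLines(run_together)   # one line, so a single string

# scan() with the default separator splits on any whitespace.
words_c <- scan(spaced, what = character(), quiet = TRUE)

length(words_a)
length(words_b)
length(words_c)
```

With no newline or space between the words there is nothing for either function to split on, which is why the count drops to 1.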

I don't have any issue reading the other docs:

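library(edgar)
# Build the word-frequency table from the text file
wf <- getWordfrquency("R/litwords_space.txt")
# Read the sentiment word list, one word per line
neg <- readLines("R/negwords.txt")
# Count the words in wf that appear in the word list
wgs <- getSentimentCount(word.frq = wf, words.list = neg)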

For now, your list contains only words that appear once, so every frequency in the table will be one.

If you are into text mining and sentiment analysis, I strongly advise you to switch to the tidytext package.

Colin