When using the stemDocument function from the tm (text mining) R package, the word "already" is converted to "alreadi".
For example:
I am analyzing a number of tweets stored in a corpus.
One of the tweets shows the following before the command is executed:
inspect(myCorpus[98])
<<VCorpus (documents: 1, metadata (corpus/indexed): 0/0)>>
[[1]]
<<PlainTextDocument (metadata: 7)>>
select member jeffroky attending sqlsat true already eventdt httptcoquyndcgs sqlpass
After executing the following line of code:
myCorpus <- tm_map(myCorpus, stemDocument, language = "english")
inspect(myCorpus[98])
I obtain the following result:
[[1]]
PlainTextDocument (metadata: 7)
select member jeffroki attend sqlsat true alreadi eventdt httptcoquyndcg sqlpass
Please note that the word "already" has changed to "alreadi". Can someone shed some light on this behaviour?
Thanks! Luis
You need to use the stemCompletion function. Try:
stemCompletion("alreadi", dictionary = myCorpus)
Refer to this post: https://stackoverflow.com/a/25391686/2748373
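As for why this happens: the English Snowball stemmer rewrites a final "y" to "i" when it follows a consonant, which is why "already" becomes "alreadi" and "jeffroky" becomes "jeffroki". The sketch below reproduces this on a small made-up corpus and then tries stemCompletion. It assumes the tm and SnowballC packages are installed; the names docs, original and stemmed are illustrative and not from the question. Two caveats, as far as I understand the function: the dictionary should hold the unstemmed words, so keep a copy of the corpus from before stemming, and completion works by prefix matching, so a stem such as "alreadi" may not be recovered because "already" does not start with it.

```r
library(tm)         # VCorpus, tm_map, stemDocument, stemCompletion
library(SnowballC)  # wordStem, the stemmer behind stemDocument

# Stem a few words from the tweet directly.
words <- c("already", "attending", "jeffroky")
stems <- wordStem(words, language = "english")
print(stems)  # the question's output shows: alreadi, attend, jeffroki

# Hypothetical one-document corpus; keep the unstemmed version as the dictionary.
docs <- c("select member attending sqlsat true already")
original <- VCorpus(VectorSource(docs))
stemmed <- tm_map(original, stemDocument, language = "english")
print(content(stemmed[[1]]))

# Complete the stems against the unstemmed corpus.
# "attend" is a prefix of "attending", so it can be completed;
# "alreadi" is not a prefix of "already", so check what comes back for it.
completed <- stemCompletion(c("attend", "alreadi"), dictionary = original)
print(completed)
```

If exact words matter more than grouping word forms, another option is to skip stemming for this corpus, or to map the few affected stems back by hand after inspecting the output.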