Unexpected result using the stemDocument function from the tm (text mining) R package

When I use the stemDocument function from the tm (text mining) R package, the word "already" is converted to "alreadi".

For example:

I am analyzing a number of tweets in a corpus document.

One of the tweets looks like this before stemming:

inspect(myCorpus[98])
<<VCorpus (documents: 1, metadata (corpus/indexed): 0/0)>>

[[1]]
<<PlainTextDocument (metadata: 7)>>
select   member  jeffroky  attending sqlsat   true  already eventdt httptcoquyndcgs sqlpass

After executing the following line of code:

myCorpus <- tm_map(myCorpus, stemDocument, language = "english")
inspect(myCorpus[98])

I obtain the following result:

[[1]] 
PlainTextDocument (metadata: 7) 
select   member  jeffroki  attend sqlsat   true alreadi eventdt   httptcoquyndcg sqlpass

Please note the change of the word "already" to "alreadi". Can someone shed some light on this behaviour?

Thanks! Luis

There is 1 answer

Answer by Prashant Mishra:

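This is expected: the stemmer rewrites a trailing "y" as "i" (the same thing happened to "jeffroky"), so the stem is not meant to be a real word. To map stems back to full words, use the stemCompletion function with the corpus as it was before stemming as the dictionary. Try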

stemCompletion("alreadi", dictionary = myCorpus)

Refer to this post: https://stackoverflow.com/a/25391686/2748373
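
As a complement, here is a minimal sketch of another way to get readable words back, assuming you still have the unstemmed words available. It calls wordStem from the SnowballC package directly (as far as I know, this is what stemDocument relies on) and builds a lookup table from each stem to its original word. The names original_words, stem_lookup and restore_word are hypothetical and only serve the illustration. This may be useful because stemCompletion appears to complete a stem by searching for dictionary words that begin with it, so a stem such as "alreadi", which is not a prefix of "already", may not be completed.

```r
library(SnowballC)  # provides wordStem()

# Words as they appear before stemming (taken from the tweet in the question)
original_words <- c("select", "attending", "already")

# Stem each word with the English Snowball stemmer
stems <- wordStem(original_words, language = "english")

# Hypothetical lookup table: names are stems, values are the original words
stem_lookup <- setNames(original_words, stems)

# Hypothetical helper: return the original word for a stem, or the stem
# itself when it is not in the lookup table
restore_word <- function(stem) {
  if (stem %in% names(stem_lookup)) stem_lookup[[stem]] else stem
}

print(stems)
print(restore_word("alreadi"))
```

If several original words share one stem, this sketch returns only the first of them, so a real corpus would need a rule for choosing among candidates (for example, the most frequent word).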