TextBlob and sentiment analysis: how to refine a dictionary?


Many people use TextBlob for sentiment analysis on text. I am probably missing something about the approach and how to use it, but the results I am getting from my analysis do not make sense.

This is an example of data that I have:

     Top                                                Text                                               label  sentiment                                    polarity
51   CVD-Grown Carbon Nanotube Branches on Black Si...  silicon-carbon nanotube (bSi-CNT) hybrid struc...     -1  (-0.16666666666666666, 0.43333333333333335)  -0.166667
69   Navy postpones its largest-ever Milan exercise...  Navy on Tuesday postponed a multi-nation mega ...     -1  (-0.125, 0.375)                              -0.125000
81   Malaysia rings alarm bell on fake Covid...         The United Nations International Children's Em...     -1  (-0.5, 1.0)                                  -0.500000
82   Poison Not Transmitted By Air...                   it falls on the fabric remains 9 hours, so was...     -1  (-0.2, 0.0)                                  -0.200000
87   A WhatsApp rumor is spreading that is allegedl...  strict about unsourced speculation than other ...     -1  (-0.1, 0.1)                                  -0.100000
90   Dumb Whatsapp Forwards - Page 2 - Cricket Web      as the ones that say like or share this pictur...     -1  (-0.375, 0.5)                                -0.375000
144  malaysia | Unicef Malaysia rings alarm b...        such messages claiming to be from us,” #Milan...      -1  (-0.5, 1.0)                                  -0.500000
134  False and unverified claims are being...           Soccer was not issued by the U...                     -1  (-0.4000000000000001, 0.6)                   -0.400000
123  Truth behind the Viral message about Co...         number of stories ever since the wave of misin...     -1  (-0.4, 0.7)                                  -0.400000
166  In India, Fake WhatsApp Forwards on Coronaviru...  of confirmed cases of rises rapidl...                 -1  (-0.5, 1.0)                                  -0.500000

I used the following code:

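import numpy as np
import pandas as pd
from textblob import TextBlob

# Score each entry of the 'Top' column; .sentiment is a (polarity, subjectivity) pair
df['sentiment'] = df['Top'].apply(lambda Tweet: TextBlob(Tweet).sentiment)

# Expand the sentiment pairs into separate 'polarity' and 'subjectivity' columns
df1 = pd.DataFrame(df['sentiment'].tolist(), index=df.index)

# Note: df_new is the same object as df, not a copy
df_new = df
df_new['polarity'] = df1['polarity']
df_new.polarity = df1.polarity.astype(float)
df_new['subjectivity'] = df1['subjectivity']
df_new.subjectivity = df1.polarity.astype(float)
# print(df_new)

# Map the sign of the polarity to a label
conditionList = [
    df_new['polarity'] == 0,
    df_new['polarity'] > 0,
    df_new['polarity'] < 0]
choiceList = ['neutral', 'not_fake', 'fake']
df_new['label'] = np.select(conditionList, choiceList, default='no_label')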

But as you can see, all of these messages come from fact-checking sources, so they are not fake. How could I improve the results, maybe by removing some specific words? I can see that if the text contains false, unverified, viral, or fake, it is tagged as negative, and this makes the results even worse.
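
To make the word-removal idea concrete, this is roughly what I have in mind. It is only a sketch: the word list and the helper name strip_domain_words are placeholders I made up, not part of TextBlob, and I have not checked how much it changes the scores.

```python
import re

# Hypothetical list of fact-checking vocabulary to drop before scoring
DOMAIN_WORDS = {"fake", "false", "unverified", "viral"}

def strip_domain_words(text, words=DOMAIN_WORDS):
    """Remove whole-word, case-insensitive matches of `words` from `text`."""
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(w) for w in sorted(words)) + r")\b",
        re.IGNORECASE,
    )
    cleaned = pattern.sub(" ", text)
    # Collapse the whitespace left behind by the removed words
    return re.sub(r"\s+", " ", cleaned).strip()

print(strip_domain_words("False and unverified claims are being..."))
```

The cleaned string could then be passed to TextBlob(...).sentiment instead of the raw headline.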


There is 1 answer

Stripedbass On

All of your texts have negative polarity, so they get labeled fake by your code.
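
A minimal sketch of that mapping, using only the polarity values copied from the table in the question (TextBlob is not re-run here, and the first value is the rounded one shown in the polarity column):

```python
import numpy as np
import pandas as pd

# Polarity values as printed in the question's table
df_new = pd.DataFrame({
    "polarity": [-0.166667, -0.125, -0.5, -0.2, -0.1,
                 -0.375, -0.5, -0.4, -0.4, -0.5]
})

# Same sign-based rule as in the question
conditionList = [
    df_new["polarity"] == 0,
    df_new["polarity"] > 0,
    df_new["polarity"] < 0,
]
choiceList = ["neutral", "not_fake", "fake"]
df_new["label"] = np.select(conditionList, choiceList, default="no_label")

print(df_new["label"].value_counts())
```

Since every value is below zero, only the third condition ever matches, so the label depends entirely on the sign of the polarity and not on where the text came from.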

There is no indication of how that polarity field was determined; it appears to be precalculated in the source file. If it comes from TextBlob's default polarity algorithm, which text was it run against?

(Also, there may be a typo: df_new.subjectivity is being assigned the float cast of polarity.)
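
If that is indeed a typo, the assignment would presumably look like the sketch below. The (polarity, subjectivity) pairs are copied from the sentiment column in the question so the snippet runs without TextBlob. The explicit column names and the df.copy() call are my own choices, not something from the original code.

```python
import pandas as pd

# (polarity, subjectivity) pairs copied from the question's 'sentiment' column
pairs = [(-0.125, 0.375), (-0.5, 1.0), (-0.2, 0.0), (-0.375, 0.5)]
df = pd.DataFrame({"sentiment": pairs})

# Expand the pairs into two named columns
df1 = pd.DataFrame(df["sentiment"].tolist(), index=df.index,
                   columns=["polarity", "subjectivity"])

df_new = df.copy()  # work on a copy so df itself is left unchanged
df_new["polarity"] = df1["polarity"].astype(float)
# The question assigned df1.polarity here; subjectivity is the second element
df_new["subjectivity"] = df1["subjectivity"].astype(float)

print(df_new[["polarity", "subjectivity"]])
```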