I am working on my economics thesis and am trying to scrape tweets between two dates for a list of users. My program works fine for a single user, but it breaks and throws the error below when I loop it over the followers of an influencer. Does anyone have suggestions?
Once that is fixed, I will also need to keep only the tweets between two dates. (I was going to download a massive number of tweets and filter them later in SPSS, but there must be a better way.) Does anyone know how to do this? I tried the approach from "tweepy get tweets between two dates", but it didn't work and gave me very irregular results. Also, if anyone knows how to keep this from tripping the rate limits, that would be great, because I think that will be the next problem. :)
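To make the date question concrete, here is a minimal sketch of the kind of filter I have in mind, written against stand-in objects rather than real API results. The helper name `tweets_between` and the sample data are made up for illustration. The only thing assumed about each tweet is a `created_at` attribute holding a `datetime`, as in my code below. The bounds have to be the same kind of datetime as `created_at`, since Python refuses to compare naive and timezone-aware values.

```python
from datetime import datetime
from types import SimpleNamespace


def tweets_between(tweets, start, end):
    """Keep tweets whose created_at falls in the half-open range [start, end)."""
    return [t for t in tweets if start <= t.created_at < end]


# Stand-in objects (hypothetical data); only full_text and created_at are used.
sample = [
    SimpleNamespace(full_text="too early", created_at=datetime(2020, 12, 31, 23, 59)),
    SimpleNamespace(full_text="in range", created_at=datetime(2021, 1, 15, 12, 0)),
    SimpleNamespace(full_text="too late", created_at=datetime(2021, 2, 1, 0, 0)),
]

kept = tweets_between(sample, datetime(2021, 1, 1), datetime(2021, 2, 1))
print([t.full_text for t in kept])  # ['in range']
```

This only filters what has already been downloaded, so it would not reduce the number of API calls.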
Sorry if the code is a little messy; this is my first time coding.
The error (I am working in Spyder, so it's a bit long):
Traceback (most recent call last):
  File "C:\Users\XPS.ipython\OG + BUILD UP FROM SCRACH.py", line 91, in <module>
    extract_followers(user)
  File "C:\Users\XPS.ipython\OG + BUILD UP FROM SCRACH.py", line 66, in extract_followers
    posts = api.user_timeline(screen_name = user, count = 100, language = "en", tweet_mode="extended", include_rts = True)
  File "C:\Users\XPS\Python\lib\site-packages\tweepy\binder.py", line 252, in _call
    return method.execute()
  File "C:\Users\XPS\Python\lib\site-packages\tweepy\binder.py", line 238, in execute
    result = self.parser.parse(self, resp.text, return_cursors=self.return_cursors)
  File "C:\Users\XPS\Python\lib\site-packages\tweepy\parsers.py", line 98, in parse
    result = model.parse_list(method.api, json)
  File "C:\Users\XPS\Python\lib\site-packages\tweepy\models.py", line 75, in parse_list
    results.append(cls.parse(api, obj))
  File "C:\Users\XPS\Python\lib\site-packages\tweepy\models.py", line 89, in parse
    for k, v in json.items():
AttributeError: 'str' object has no attribute 'items'
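As a stopgap while looking for the cause, the loop could record which users fail instead of stopping at the first one. This is only a sketch: `collect_timelines` and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for the real `api.user_timeline` call so the example runs without credentials. The `except` clause is deliberately broad because the traceback above ends in a plain `AttributeError` rather than a tweepy exception.

```python
def collect_timelines(users, fetch):
    """Call fetch(user) for each user, recording failures instead of stopping."""
    timelines, failures = {}, {}
    for user in users:
        try:
            timelines[user] = fetch(user)
        except Exception as exc:  # broad on purpose, see the note above
            failures[user] = repr(exc)
    return timelines, failures


def fake_fetch(user):
    # Hypothetical stand-in for api.user_timeline(screen_name=user, ...)
    if user == "broken_user":
        raise AttributeError("'str' object has no attribute 'items'")
    return [f"tweet from {user}"]


timelines, failures = collect_timelines(["alice", "broken_user", "bob"], fake_fetch)
print(sorted(timelines))  # users whose timelines were fetched
print(failures)           # users that raised, with the error text
```

With the real API, `fetch` could be something like `lambda u: api.user_timeline(screen_name=u, count=100, tweet_mode="extended", include_rts=True)`. The `failures` dictionary would then show whether the error is tied to particular accounts.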
My code:
# Import the libraries
import tweepy
from textblob import TextBlob
import pandas as pd
import re
import matplotlib.pyplot as plt
import csv
plt.style.use('fivethirtyeight')

# Twitter API credentials
consumerkey = 'a'
consumersecret = 'a'
bearer = 'a'
token = 'a'
tokensecret = 'a'

# Create the authentication object
authenticate = tweepy.OAuthHandler(consumerkey, consumersecret)
# Set the access token
authenticate.set_access_token(token, tokensecret)
# Create the API object while passing in the auth info
api = tweepy.API(authenticate, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)

# Create a function to clean the tweets
def cleanTxt(text):
    text = re.sub(r'@[A-Za-z0-9]+', '', text)  # Remove @mentions
    text = re.sub(r'#', '', text)  # Remove the '#' hashtag symbol
    text = re.sub(r'RT\s+', '', text)  # Remove the RT marker
    text = re.sub(r'https?://\S+', '', text)  # Remove hyperlinks
    return text

# Create a function to get the subjectivity
def getSubjectivity(text):
    return TextBlob(text).sentiment.subjectivity

# Create a function to get the polarity (how positive or negative the text is)
def getPolarity(text):
    return TextBlob(text).sentiment.polarity

# List of followers
name_list = ['_prashantnair', 'urxnlc', 'Gurmeet1018', 'arpit8691yahooc', 'bnirmaljain', 'anoldschoolboy', 'rrpatange']
# In some versions I just read the list from an Excel file. The full list is a few thousand followers per influencer.

# Create a function to extract 100 tweets, with dates, for one user
def extract_followers(user):
    results = []
    posts = api.user_timeline(screen_name = user, count = 100, language = "en", tweet_mode="extended", include_rts = True)
    for tweet in posts:
        data = (
            tweet.full_text,
            tweet.created_at,
            tweet.user.screen_name)
        results.append(data)
    cols = "Tweets Date screen_name".split()
    global df
    df = pd.DataFrame(results, columns=cols)
    print("df original")
    print(df)
    for tweet in posts:
        cleaned_text = cleanTxt(tweet.full_text)
        with open('influencer.csv', 'a', newline='') as f:
            worksheet = csv.writer(f)
            worksheet.writerow([str(tweet.user.screen_name), str(tweet.created_at), str(getSubjectivity(cleaned_text)), str(getPolarity(cleaned_text))])
            print("Tweet Added")

# Call the extract function for each user in the list
for user in name_list:
    extract_followers(user)

# Clean the tweets
df['Tweets'] = df['Tweets'].apply(cleanTxt)
# Show the cleaned tweets
print('df cleaned')
print(df)
# Create two new columns 'Subjectivity' & 'Polarity'
df['Subjectivity'] = df['Tweets'].apply(getSubjectivity)
df['Polarity'] = df['Tweets'].apply(getPolarity)
# Show the new dataframe with columns 'Subjectivity' & 'Polarity'
print("df with subjectivity")
print(df)
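
A side note on the post-processing above: `df` is rebuilt on every call to `extract_followers`, so after the loop it seems to hold only the last user's tweets. Below is a minimal sketch of collecting the rows for all users first. The data is made up, `rows_for_user` is a hypothetical helper, and no API calls are involved.

```python
from datetime import datetime


def rows_for_user(screen_name, posts):
    """Build (text, date, screen_name) rows from one user's (text, date) pairs."""
    return [(text, date, screen_name) for text, date in posts]


# Made-up sample data standing in for the timelines returned by the API
fake_timelines = {
    "user_a": [("first tweet", datetime(2021, 1, 1)), ("second tweet", datetime(2021, 1, 2))],
    "user_b": [("only tweet", datetime(2021, 1, 3))],
}

all_rows = []
for screen_name, posts in fake_timelines.items():
    all_rows.extend(rows_for_user(screen_name, posts))  # accumulate, do not overwrite

print(len(all_rows))                          # 3
print(sorted({row[2] for row in all_rows}))   # ['user_a', 'user_b']
```

A single `pd.DataFrame(all_rows, columns=["Tweets", "Date", "screen_name"])` built after the loop would then cover every user rather than just the last one.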