I used tweepy to download tweets in Spanish and then wrote them to a CSV file. I used the code below to do this:
while True:
    try:
        for tweet in tweets:
            print tweet.created_at, tweet.text.encode('utf-8')
            csvWriter.writerow([tweet.created_at, tweet.id_str, tweet.author.name.encode('utf8'), tweet.author.screen_name.encode('utf8'),
                                tweet.user.location.encode('utf-8'), tweet.coordinates, tweet.text.encode('utf-8'), tweet.retweet_count, tweet.favorite_count])
    except tweepy.TweepError:
Now, the row containing the tweet text contains weird characters, for example: México, D.F. appears as México, D.F. I tried exporting the file as UTF-8 in Numbers, but this changes the same string to: Mí©xico, D.F.
For other tweets I also get something like this: RT @taniarin: _ôÖ‰_ôÖ‰_ôÖ‰_ôÖ‰ #UberSeQueda.
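For reference, one common way this kind of garbling arises is that UTF-8 bytes get decoded with a single-byte codec such as Latin-1 somewhere along the way. The sketch below assumes Python 3 and uses two hypothetical helper names, `garble` and `repair`. It only illustrates the mechanism and is not a confirmed diagnosis of what happened to the file above.

```python
def garble(text):
    """Simulate mojibake: encode as UTF-8, then wrongly decode as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def repair(garbled):
    """Undo garble(): recover the original bytes, then decode them as UTF-8.

    This only works if the text was garbled exactly once with Latin-1;
    other codecs (for example cp1252 or Mac Roman) would need their own name here.
    """
    return garbled.encode("latin-1").decode("utf-8")


original = "M\u00e9xico, D.F."   # "México, D.F."
broken = garble(original)        # "é" becomes the two characters "Ã©"
fixed = repair(broken)           # round-trips back to the original string
```

If the strings in the CSV show the same two-character pattern in place of each accented letter, that would suggest the text was decoded with the wrong codec at some point before or after it was written.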
I am using pandas to read the file with this:
pd.read_csv("uber_dataFULL_utf8.csv", encoding='utf-8')
but it doesn't seem to work.
I don't know exactly what the problem is. I ran chardet on the file and it detects the text as UTF-8.
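One caveat about the chardet result: text that was garbled first and then saved as UTF-8 is still valid UTF-8, so a detector reporting UTF-8 does not rule out earlier damage. The sketch below, assuming Python 3 and a hypothetical helper name `looks_double_encoded`, is one rough heuristic for spotting that case in raw bytes. It assumes Latin-1 was the codec involved, which may not match what actually happened.

```python
def looks_double_encoded(raw):
    """Heuristic: True if `raw` is valid UTF-8 whose text, re-encoded as
    Latin-1, is again valid UTF-8 and decodes to something different."""
    try:
        text = raw.decode("utf-8")
        # Fails for characters outside Latin-1 and for correctly encoded
        # accented text, because a lone byte like 0xE9 is not valid UTF-8.
        candidate = text.encode("latin-1").decode("utf-8")
    except UnicodeError:
        return False
    return candidate != text


clean = "M\u00e9xico".encode("utf-8")
damaged = clean.decode("latin-1").encode("utf-8")  # garbled once, then saved as UTF-8

clean_result = looks_double_encoded(clean)      # False
damaged_result = looks_double_encoded(damaged)  # True
```

To try this on the actual file, the bytes could be read with `open("uber_dataFULL_utf8.csv", "rb").read()` and passed to the helper, ideally line by line, since a single emoji or other non-Latin-1 character makes the check return False for the whole input.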
Thank you!