I'm trying to stream Twitter data using the Python library Tweepy. I have set up a working environment and googled around, but I don't understand how the pieces fit together. I want to use Spark Streaming (DStream - batch processing) with Python (Tweepy). I have gone through at least the following links:
- How to get tweets of a particular hashtag in a location in a tweepy?
- http://spark.apache.org/docs/latest/streaming-programming-guide.html
- http://docs.tweepy.org/en/v3.5.0/streaming_how_to.html
- Retrieving Twitter data using Tweepy
- http://www.awesomestats.in/spark-twitter-stream
The following Tweepy code works fine for me:
import tweepy
consumer_key = ''
consumer_secret = ''
access_token = ''
access_secret = ''
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)
api = tweepy.API(auth)
politicsTweets = tweepy.Cursor(api.search, q='#GONAWAZGO').items(100)
for tweet in politicsTweets:
    print(tweet.created_at, tweet.text, tweet.lang)
However, it doesn't use Spark Streaming. How should I update the code above to use Spark Streaming? I also don't understand why I need two separate files. Overall, I'm trying to do the following:
- Get the top 10 hashtags from 1st May, 2017. (The Tweepy search function accepts a 'since_id' parameter, but I don't understand how to use it [http://docs.tweepy.org/en/latest/api.html#help-methods ].)
- Count how many times #GONAWAZGO has been used since 11th May, 2013.
- Count how many #gonawazgo tweets were posted by people outside Pakistan. (There is no date limit here. The Tweepy cursor method accepts a geocode, but I want tweets from locations other than the provided geocode.)
- Observe the trend around the French elections on Twitter.
- Find the most recent tweets posted by the [https://twitter.com/imrankhanpti ] Twitter account. (The Tweepy search method accepts a user ID; how can I get that?)
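As a rough illustration of the first item, here is a minimal sketch of counting hashtags once the tweet texts have been collected. It uses only the standard library and is independent of how the tweets are fetched. The function name `top_hashtags`, the regular expression, and the sample texts are assumptions made up for this example; they are not part of Tweepy or Spark.

```python
import re
from collections import Counter

# Matches a '#' followed by word characters (letters, digits, underscore).
HASHTAG_RE = re.compile(r"#(\w+)")


def top_hashtags(texts, n=10):
    """Return the n most common hashtags across the given tweet texts.

    Hashtags are lower-cased so '#GONAWAZGO' and '#gonawazgo' count together.
    """
    counts = Counter()
    for text in texts:
        counts.update(tag.lower() for tag in HASHTAG_RE.findall(text))
    return counts.most_common(n)


# Hypothetical sample texts standing in for tweet.text values.
sample = [
    "Rally today #GONAWAZGO #Pakistan",
    "#gonawazgo trending again",
    "Election night #France",
]
print(top_hashtags(sample, n=2))
```

The same function could be fed the `tweet.text` values from the Tweepy loop above. Whether lower-casing is the right normalization depends on how the hashtags should be grouped.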
Above all, I'm a bit confused about when to use the Twitter REST API versus the Streaming API. I think the REST API should be used for the 1st and 2nd points, since they process past data up to the present, and the Streaming API should be used for the rest.
The Twitter search API has a 7-day limit, which means you cannot get any data older than 7 days. Here is a link to the Twitter search API documentation; have a look at the description of the "until" parameter:
https://developer.twitter.com/en/docs/tweets/search/api-reference/get-search-tweets.html
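To make the consequence of the 7-day limit concrete, here is a small sketch that clamps a requested start date to the oldest date the search can still cover. The helper name `clamp_since` and the constant `SEARCH_WINDOW_DAYS` are assumptions for illustration only; nothing here calls Twitter, and the exact cutoff the API applies may differ slightly from a plain 7-day subtraction.

```python
from datetime import date, timedelta

SEARCH_WINDOW_DAYS = 7  # the limit described above (assumed to be exactly 7 days)


def clamp_since(requested, today=None):
    """Return the earliest start date the search can realistically cover.

    If `requested` is older than the 7-day window, the start of the window
    is returned instead; otherwise `requested` is returned unchanged.
    """
    today = today or date.today()
    earliest = today - timedelta(days=SEARCH_WINDOW_DAYS)
    return max(requested, earliest)


# Example: asking on 20 May 2017 for tweets from 1 May 2017 onwards.
start = clamp_since(date(2017, 5, 1), today=date(2017, 5, 20))
print(start.isoformat())  # 2017-05-13
```

In other words, a search for tweets from 1st May, 2017 or 11th May, 2013 would only return results from the last week, so counts over longer periods cannot come from this endpoint alone.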
I hope that helps!