Spark Streaming using Tweepy


I'm trying to stream Twitter data using the Python library Tweepy. I have set up a working environment and googled around, but I don't understand how things fit together. I want to use Spark Streaming (DStream, batch processing) with Python (Tweepy). I have gone through at least the following links:

The following Tweepy code works fine for me:

import tweepy

# Credentials from your Twitter app (left blank here)
consumer_key = ''
consumer_secret = ''
access_token = ''
access_secret = ''

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)
api = tweepy.API(auth)

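# Fetch up to 100 tweets matching the hashtag (Tweepy 3.x; in Tweepy 4.x
# this method is named api.search_tweets)
politicsTweets = tweepy.Cursor(api.search, q='#GONAWAZGO').items(100)

for tweet in politicsTweets:
    print(tweet.created_at, tweet.text, tweet.lang)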

However, it doesn't use Spark Streaming. How should I update the code above to use Spark Streaming? I also don't understand why I need two separate files. Overall, I'm trying to do the following:
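On the two files: in a common setup, one script uses Tweepy to receive tweets and writes each one as a line of text to a TCP socket, and the Spark application connects to that socket with StreamingContext.socketTextStream(host, port), which turns the incoming lines into a DStream cut into batches. The data source and the Spark job run as separate processes, hence two files. The sketch below simulates only that handoff using the standard library, assuming one JSON record per line; the sample tweets are hypothetical placeholders for Tweepy output, and both roles run in one process here purely so the demo is self-contained.

```python
import json
import socket
import threading

# Hypothetical sample tweets standing in for what a Tweepy stream listener
# would receive; in a real setup these come from Twitter.
SAMPLE_TWEETS = [
    {"text": "Rally today #GoNawazGo #Pakistan", "lang": "en"},
    {"text": "Counting votes #Presidentielle2017", "lang": "fr"},
]


def producer(server: socket.socket) -> None:
    """'File 1' role: accept one client and write one JSON tweet per line."""
    conn, _ = server.accept()
    with conn:
        for tweet in SAMPLE_TWEETS:
            conn.sendall((json.dumps(tweet) + "\n").encode("utf-8"))


def consumer(host: str, port: int) -> list:
    """'File 2' role: read newline-delimited records, as socketTextStream does."""
    with socket.create_connection((host, port)) as sock:
        with sock.makefile("r", encoding="utf-8") as reader:
            return [json.loads(line) for line in reader]


def run_demo() -> list:
    # Bind to an ephemeral localhost port so the demo never clashes with others.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    port = server.getsockname()[1]
    thread = threading.Thread(target=producer, args=(server,))
    thread.start()
    try:
        return consumer("127.0.0.1", port)
    finally:
        thread.join()
        server.close()


if __name__ == "__main__":
    for tweet in run_demo():
        print(tweet["text"])
```

In the real arrangement the consumer side is replaced by the Spark job, which receives each line as one element of the DStream and parses it there.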

  1. Get the top 10 hashtags since 1st May 2017. (The Tweepy search function accepts a 'since_id' parameter, but I don't understand how to use it: http://docs.tweepy.org/en/latest/api.html#help-methods)
  2. Count how many times #GONAWAZGO has appeared since 11th May 2013.
  3. Count how many #gonawazgo tweets were posted by people outside Pakistan, without any date limit. (The Tweepy cursor method accepts a geocode, but I want tweets from locations other than the provided geocode.)
  4. Observe the trend around the French elections on Twitter.
  5. Find the most recent tweets posted by the https://twitter.com/imrankhanpti account. (The Tweepy search method accepts a user ID; how can I get that?)
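Regarding 'since_id' in point 1: it takes a tweet ID, not a date. Tweet IDs are Snowflake IDs whose upper bits encode a millisecond timestamp, so a date can be converted into a lower-bound ID. A sketch, assuming the standard Snowflake layout (timestamp shifted left by 22 bits, counted from the epoch 1288834974657 ms); since_id_for and tweet_time are hypothetical helper names, not Tweepy functions.

```python
from datetime import datetime, timezone

# Assumed Twitter Snowflake epoch (2010-11-04T01:42:54.657Z) in milliseconds.
TWITTER_EPOCH_MS = 1288834974657


def since_id_for(dt: datetime) -> int:
    """Hypothetical helper: Snowflake ID whose embedded timestamp equals dt.

    The search API's since_id returns tweets with an ID greater than this,
    i.e. (approximately) tweets created after dt.
    """
    ms = int(dt.timestamp() * 1000)
    return (ms - TWITTER_EPOCH_MS) << 22


def tweet_time(tweet_id: int) -> datetime:
    """Hypothetical helper: creation time (UTC) embedded in a Snowflake ID."""
    ms = (tweet_id >> 22) + TWITTER_EPOCH_MS
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


if __name__ == "__main__":
    start = datetime(2017, 5, 1, tzinfo=timezone.utc)
    print(since_id_for(start), tweet_time(since_id_for(start)))
```

The result could then be passed along, e.g. tweepy.Cursor(api.search, q='#GONAWAZGO', since_id=since_id_for(start)), but this only narrows the results; it does not extend how far back the search API reaches.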

Above all, I'm confused about when to use the Twitter REST API versus the Streaming API. I think the REST API should be used for points 1 and 2, since they process past data up to the present, and the Streaming API for the rest.
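Whichever API delivers the tweets, the counting for points 1 and 3 is the same on each batch: the DStream chain flatMap (split into words), filter (keep '#' words), map (pair each with 1), reduceByKey (sum) performs a per-batch word count restricted to hashtags. The plain-Python sketch below shows that per-batch logic, assuming tweets arrive as parsed JSON dicts whose "place" field, when present, carries a "country_code"; the helper names are hypothetical. Only geotagged tweets have a place, so the outside-Pakistan count covers just those tweets.

```python
from collections import Counter


def hashtags(text):
    """Lower-cased hashtags in a tweet, with trailing punctuation stripped."""
    tags = []
    for word in text.split():
        tag = word.rstrip(".,!?:;").lower()
        if tag.startswith("#") and len(tag) > 1:
            tags.append(tag)
    return tags


def top_hashtags(tweets, n=10):
    """Top-n hashtags in one batch (the reduceByKey step, done with Counter)."""
    counts = Counter(tag for t in tweets for tag in hashtags(t["text"]))
    return counts.most_common(n)


def count_outside(tweets, tag, country_code="PK"):
    """Tweets containing tag whose place is known and not in country_code."""
    return sum(
        1
        for t in tweets
        if tag in hashtags(t["text"])
        and t.get("place")
        and t["place"].get("country_code") != country_code
    )


if __name__ == "__main__":
    # Hypothetical batch for illustration only.
    batch = [
        {"text": "Rally #GoNawazGo #Pakistan", "place": {"country_code": "PK"}},
        {"text": "#gonawazgo from London", "place": {"country_code": "GB"}},
        {"text": "Watching #GONAWAZGO!", "place": None},
    ]
    print(top_hashtags(batch))
    print(count_outside(batch, "#gonawazgo"))
```

In Spark Streaming, updateStateByKey or a windowed reduce would carry these per-batch counts across batches.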


Answer by harsh pamnani:

The Twitter Search API has a 7-day limit, which means you cannot get any tweets older than 7 days through it; a count going back to 2013, as in point 2, is out of its reach. Here is a link to the Twitter Search API documentation; have a look at the description of the "until" parameter:

https://developer.twitter.com/en/docs/tweets/search/api-reference/get-search-tweets.html
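To make the limit concrete, the sketch below checks whether a start date is still inside the search window and computes the oldest "until" value (format YYYY-MM-DD) worth sending. It assumes a flat 7-day window measured in whole days, which is an approximation; the helper names are hypothetical.

```python
from datetime import date, timedelta

# Assumption: the standard search index covers roughly the last 7 days.
SEARCH_WINDOW_DAYS = 7


def reachable_by_search(since: date, today: date) -> bool:
    """True if tweets from `since` could still be in the search index."""
    return today - since <= timedelta(days=SEARCH_WINDOW_DAYS)


def earliest_until(today: date) -> str:
    """Oldest 'until' date (YYYY-MM-DD) that can still return any tweets."""
    return (today - timedelta(days=SEARCH_WINDOW_DAYS)).isoformat()


if __name__ == "__main__":
    today = date.today()
    print(reachable_by_search(date(2013, 5, 11), today), earliest_until(today))
```

Anything older than that window would need a source other than the standard search endpoint.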

I hope that helps!