How to get a large list of followers with Tweepy

13.6k views

I'm trying to use Tweepy to get the full list of followers of an account with about 500k followers. My code returns the usernames for small accounts (under 100 followers), but it fails for anything larger, even an account with only 110 followers. Any help making it work for larger accounts is greatly appreciated!

Here's the code I have right now:

import tweepy
import time

key1 = "..."
key2 = "..."
key3 = "..."
key4 = "..."

accountvar = raw_input("Account name: ")

auth = tweepy.OAuthHandler(key1, key2)
auth.set_access_token(key3, key4)

api = tweepy.API(auth)

ids = []
for page in tweepy.Cursor(api.followers_ids, screen_name=accountvar).pages():
    ids.extend(page)
    time.sleep(60)

users = api.lookup_users(user_ids=ids)
for u in users:
    print u.screen_name

The error I keep getting is:

Traceback (most recent call last):
  File "test.py", line 24, in <module>
    users = api.lookup_users(user_ids=ids)
  File "/Library/Python/2.7/site-packages/tweepy/api.py", line 321, in lookup_users
    return self._lookup_users(post_data=post_data)
  File "/Library/Python/2.7/site-packages/tweepy/binder.py", line 239, in _call
    return method.execute()
  File "/Library/Python/2.7/site-packages/tweepy/binder.py", line 223, in execute
    raise TweepError(error_msg, resp)
tweepy.error.TweepError: [{u'message': u'Too many terms specified in query.', u'code': 18}]

I've looked at a bunch of similar questions, but none of them had a solution that worked for me. If someone has a link to one, please send it to me!


There are 4 answers

Answer by Leb (score 0)

The Twitter API only allows 100 users to be searched for at a time, which is why the lookup fails as soon as you pass it more than 100 IDs. followers_ids is giving you the correct number of users, but you're being limited by GET users/lookup.

What you'll need to do is iterate through each 100 users but staying within the rate limit.
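A minimal sketch of that batching, assuming Tweepy 3.x as in the question, where `lookup_users` takes a `user_ids` keyword (newer Tweepy releases renamed several of these methods and parameters, so check your installed version). The helper names `chunks` and `lookup_in_batches` are hypothetical. The ID list is split into slices of at most 100, and each slice gets its own users/lookup request:

```python
def chunks(seq, size=100):
    """Yield consecutive slices of `seq`, each at most `size` items long."""
    for start in range(0, len(seq), size):
        yield seq[start:start + size]


def lookup_in_batches(api, ids, batch_size=100):
    """Return user objects for `ids`, sending at most `batch_size` IDs per request.

    `api` is assumed to be a Tweepy 3.x `tweepy.API` instance, or any object
    with a compatible `lookup_users(user_ids=...)` method.
    """
    users = []
    for batch in chunks(ids, batch_size):
        # users/lookup rejects more than 100 IDs in a single request
        users.extend(api.lookup_users(user_ids=batch))
    return users
```

In the question's code, `users = lookup_in_batches(api, ids)` would replace the single `api.lookup_users(user_ids=ids)` call. Each batch still counts against the users/lookup rate limit, so creating the API with `wait_on_rate_limit=True` lets Tweepy pause when a window is used up.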

Answer by mataxu (score 1)

I actually figured it out, so I'll post the solution here just for reference.

import tweepy
import time

key1 = "..."
key2 = "..."
key3 = "..."
key4 = "..."

accountvar = raw_input("Account name: ")

auth = tweepy.OAuthHandler(key1, key2)
auth.set_access_token(key3, key4)

api = tweepy.API(auth)

users = tweepy.Cursor(api.followers, screen_name=accountvar).items()

while True:
    try:
        user = next(users)
    except tweepy.TweepError:
        # Most likely the rate limit: wait out the 15-minute window, then try again
        time.sleep(60*15)
        continue
    except StopIteration:
        break
    print "@" + user.screen_name

This pauses for 15 minutes after every 300 names and then continues, so the rate-limit error no longer stops the program. It will obviously take ages for large accounts, but as Leb mentioned:

The twitter API only allows 100 users to be searched for at a time...[so] what you'll need to do is iterate through each 100 users but staying within the rate limit.

You basically just have to leave the program running to get the next set. I'm not sure why mine returns 300 at a time instead of 100; as I mentioned, my earlier program topped out at 100.

Hope this helps anyone else that had the same problem as me, and shoutout to Leb for reminding me to focus on the rate limit.

Answer by Alec (score 0)

To extend upon this:

You can harvest 3,000 users per 15 minutes by adding a count parameter:

users = tweepy.Cursor(api.followers, screen_name=accountvar, count=200).items()

This still makes 15 API calls per 15-minute window, as in your version, but instead of the default count=20, each call returns up to 200 users (i.e. you get 3,000 per window rather than 300).
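A rough sketch of the arithmetic behind these numbers, assuming the limits quoted in this thread (15 followers/list requests per 15-minute window, 20 users per page by default, 200 at most); the helper names are hypothetical and Twitter's limits may have changed since. The same formula gives the 300 names per pause in mataxu's answer (15 × 20):

```python
import math

WINDOW_MINUTES = 15
REQUESTS_PER_WINDOW = 15  # followers/list requests per window, as quoted above


def followers_per_window(page_size, requests_per_window=REQUESTS_PER_WINDOW):
    """Users returned per 15-minute window when each request returns `page_size` users."""
    return page_size * requests_per_window


def hours_to_fetch(total_followers, page_size):
    """Rough upper bound on wall-clock hours to page through `total_followers`.

    Counts whole rate-limit windows and ignores request latency.
    """
    windows = math.ceil(total_followers / followers_per_window(page_size))
    return windows * WINDOW_MINUTES / 60
```

For the question's 500k-follower account, `hours_to_fetch(500000, 20)` gives 416.75 hours (over 17 days), while `hours_to_fetch(500000, 200)` gives 41.75 hours, which is the tenfold gain from `count=200`.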

Answer by Himanshu Punetha (score 6)

Twitter provides two ways to fetch followers:

  1. Fetching the full follower list (followers/list in the Twitter API, or api.followers in Tweepy). Alec and mataxu describe this approach in their answers. With this endpoint you can get at most 200 * 15 = 3,000 followers per 15-minute window.
  2. A two-stage approach:
    a) First fetch only the follower IDs (followers/ids in the Twitter API, or api.followers_ids in Tweepy). You can get 5,000 * 15 = 75K follower IDs per 15-minute window.
    b) Then look up their usernames or other data (users/lookup in the Twitter API, or api.lookup_users in Tweepy). This is limited to about 100 * 180 = 18K users per 15-minute window.
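To see how these limits combine, here is a rough back-of-the-envelope sketch. The helper names are hypothetical, and the limits are the figures quoted in this answer, which may have changed since. Per window, the two-stage approach is capped by its slower step (users/lookup, about 18K users, versus 3K for followers/list); if the stages run one after the other, the time spent fetching IDs adds on top:

```python
import math


def windows_needed(total, per_request, requests_per_window):
    """15-minute rate-limit windows needed to fetch `total` items."""
    return math.ceil(total / (per_request * requests_per_window))


def approach_windows(total_followers):
    """Return (followers/list windows, followers/ids + users/lookup windows).

    Uses the limits quoted above and assumes the two stages of the second
    approach run sequentially: all IDs first, then all lookups.
    """
    list_approach = windows_needed(total_followers, 200, 15)  # followers/list
    ids_stage = windows_needed(total_followers, 5000, 15)     # followers/ids
    lookup_stage = windows_needed(total_followers, 100, 180)  # users/lookup
    return list_approach, ids_stage + lookup_stage
```

For a 500k-follower account, `approach_windows(500000)` returns `(167, 35)`: roughly 42 hours versus under 9 hours. With the ID stage included, the end-to-end gain is nearer 4.8× than 6×. Since each endpoint has its own rate-limit window, interleaving the two stages could in principle bring it closer to the per-window figure.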

Given these rate limits, the second approach returns follower data about 6 times faster than the first (18K vs. 3K users per window, since users/lookup is the slower of its two stages). Below is code for the second approach:

import traceback
import tweepy

# First, make sure wait_on_rate_limit is True when connecting through Tweepy.
# auth and accountvar are set up as in the question. wait_on_rate_limit_notify
# is a Tweepy 3.x option; newer releases dropped it.
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)

# Request 5000 follower IDs per call: 15 calls per window gives 75K IDs every 15 minutes
followerids = []
for follower_id in tweepy.Cursor(api.followers_ids, screen_name=accountvar, count=5000).items():
    followerids.append(follower_id)
print(len(followerids))

# Look up the IDs 100 at a time (the users/lookup maximum): about 18K users every 15 minutes
def get_usernames(userids, api):
    fullusers = []
    u_count = len(userids)
    print(u_count)
    try:
        # Stepping by 100 never produces an empty final batch
        for start in range(0, u_count, 100):
            fullusers.extend(api.lookup_users(user_ids=userids[start:start + 100]))
    except tweepy.TweepError:
        traceback.print_exc()
        print('Something went wrong, returning the users fetched so far...')
    return fullusers

# Call the function with the list of follower IDs and the Tweepy API connection
fullusers = get_usernames(followerids, api)

Hope this helps. A similar approach can be used to fetch friends' details by using api.friends_ids in place of api.followers_ids.

For more resources on the rate-limit comparison and the second approach, see the links below: