Python: Too many requests

I have written a Python program that parses the subreddits page and builds a list of them. The problem is that whenever I run it, the reddit server returns error 429, 'too many requests'.

How can I bring down the number of requests made, so that I am not rate limited?

from bs4 import BeautifulSoup as bs
from time import sleep
import requests as req

html = req.get('http://www.reddit.com/')
print html
soup = bs(html.text)

# http://www.reddit.com/subreddits/
link_to_sub_reddits = soup.find('a',id='sr-more-link')['href']

print link_to_sub_reddits

L=[]

for navigate_the_pages in xrange(1):

        res = req.get(link_to_sub_reddits)

        soup = bs(res.text)
        # soup created
        print soup.text

        div = soup.body.find('div', class_=lambda(class_):class_ and class_=='content')
        div = div.find('div', id= lambda(id):id and id=='siteTable')

        cnt=0

        for iterator in div:

            div_thing = div.contents[cnt]

            if not div_thing=='' and div_thing.name=='div' and 'thing' in div_thing['class']:

                div_entry = div_thing.find('a',class_=lambda(class_):class_ and 'entry' in class_)
                # div with class='entry......'

                link = div_entry.find('a')['href']
                # link of the subreddit
                name_of_sub = link.split('/')[-2]
                # http://www.reddit.com/subreddits/
                # ['http:', '', 'www.reddit.com', 'subreddits', '']

                description = div_entry.find('strong').text
                # something about the community

                p_tagline = div_entry.find('p',class_='tagline')
                subscribers = p_tagline.find('span',class_='number').text

                L.append((name_of_sub, link, description, subscribers))

            elif not div_thing=='' and div_thing.name=='div' and 'nav-buttons' in div_thing['class']:
                # case when we find 'nav' button

                link_to_sub_reddits = div_thing.find('a')['href']
                break

            cnt = cnt + 1
            sleep(10)

        sleep(10)

Edit: To everyone downvoting, I don't know what grave error I made by posting this question (feedback is appreciated). If it helps, I am a 3-day-old 'Pythoner', so basically I am trying to learn Python. Maybe what I am asking is way too obvious for you, but it's not for me. This question could help some other noob like me who is trying to learn Python, but thanks to the downvotes it will get lost somewhere.

There are 3 answers

0
theprikshit On BEST ANSWER

One possible reason is that reddit checks the User-Agent header. Since you are not setting one, requests sends its default User-Agent, and reddit may be flagging the request as coming from a bot, which would explain the error. Try adding a User-Agent header to the request.
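As a rough illustration of that suggestion, here is a minimal sketch that attaches a descriptive User-Agent to a request. It uses the standard library's urllib.request so it runs without extra packages and does not touch the network. The user-agent string, app name and username are placeholders I made up, not values reddit requires.

```python
import urllib.request


def build_request(url, user_agent):
    """Create a request object that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


# Hypothetical identifier: replace the app name and username with your own.
USER_AGENT = "python:subreddit-lister:v0.1 (by /u/your_username)"

request = build_request("https://www.reddit.com/subreddits/", USER_AGENT)

# urllib stores header names capitalized, hence "User-agent" here.
print(request.get_header("User-agent"))
```

With the requests library used in the question, the same header dictionary can be passed as req.get(url, headers={'User-Agent': USER_AGENT}).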

0
Anshul Goyal On

This is the normal rate limiting that reddit does. Your only options are to make fewer requests, or to make requests from multiple servers with different IPs (in which case your throughput scales with the number of servers).

From the Wikipedia description of HTTP status code 429:

429 Too Many Requests (RFC 6585):

The user has sent too many requests in a given amount of time. Intended for use with rate limiting schemes.
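One way a client could react to a 429 is sketched below. RFC 6585 allows the server to include a Retry-After header with the response; whether reddit sends it is not something this answer confirms, so the sketch falls back to exponential backoff when the header is missing. The function name, base_delay and max_delay are my own illustrative choices.

```python
def seconds_to_wait(status_code, headers, attempt, base_delay=2.0, max_delay=60.0):
    """Return how long to pause before retrying, or None if no retry is needed.

    `headers` is any mapping of response headers. A plain dict is
    case-sensitive; the headers object from `requests` is not.
    `attempt` counts retries so far, starting at 0.
    """
    if status_code != 429:
        return None

    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        try:
            return float(retry_after)
        except ValueError:
            # Retry-After may also be an HTTP date; this sketch does not
            # parse dates and uses backoff instead.
            pass

    # Exponential backoff: base, 2x base, 4x base, ... capped at max_delay.
    return min(base_delay * (2 ** attempt), max_delay)
```

A caller would sleep for the returned number of seconds, then retry the same request with attempt incremented by one.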

0
camz On

First try to find out how often you are allowed to send requests and compare it to the maximum rate you are sending requests.

When you find the point where you make requests too often, add something simple such as time.sleep(interval) between each request to make sure you wait enough time between them.

If you want to be clever, you could write something to time how long it has been since your last request, or count how many you have made in the recent time period. You can then use this information to decide how long to sleep for.
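The "time since your last request" idea could look roughly like the sketch below. The Throttle class and its parameter names are hypothetical, and the right min_interval depends on whatever limit the site actually enforces. The clock and sleep functions are injectable only so the class can be exercised without real waiting.

```python
import time


class Throttle:
    """Enforce a minimum number of seconds between consecutive requests."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, None before the first

    def wait(self):
        """Sleep only for the part of the interval that has not yet elapsed.

        Returns the number of seconds slept.
        """
        delay = 0.0
        if self._last is not None:
            elapsed = self._clock() - self._last
            delay = max(0.0, self.min_interval - elapsed)
        if delay > 0:
            self._sleep(delay)
        self._last = self._clock()
        return delay
```

Calling throttle.wait() immediately before each req.get(...) means time spent parsing the previous page counts toward the interval, unlike a fixed sleep(10).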

EDIT: In fact looking at the rules page: https://github.com/reddit/reddit/wiki/API#rules

Monitor the following response headers to ensure that you're not exceeding the limits:
  X-Ratelimit-Used: Approximate number of requests used in this period
  X-Ratelimit-Remaining: Approximate number of requests left to use
  X-Ratelimit-Reset: Approximate number of seconds to end of period
Clients connecting via OAuth2 may make up to 60 requests per minute.

It seems that they tell you in the response how many requests you can make, and how long you have to wait until you get more. When you have no remaining requests to use, sleep until the end of the minute.
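A minimal sketch of that last step, assuming the three headers quoted above are present and hold plain numbers: the helper name is made up, and it returns 0 whenever the headers are missing or unparsable, so it is a courtesy check rather than a guarantee.

```python
def pause_from_headers(headers):
    """Return how many seconds to sleep based on reddit's rate-limit headers.

    `headers` is any mapping of response headers (a plain dict is
    case-sensitive; the headers object from `requests` is not).
    """
    remaining = headers.get("X-Ratelimit-Remaining")
    reset = headers.get("X-Ratelimit-Reset")
    if remaining is None or reset is None:
        return 0.0  # no rate-limit information in this response

    try:
        remaining = float(remaining)
        reset = float(reset)
    except ValueError:
        return 0.0  # unexpected format, do not guess

    # No requests left in this period: wait until the period resets.
    if remaining < 1:
        return max(reset, 0.0)
    return 0.0
```

After each response, something like sleep(pause_from_headers(res.headers)) would pause only when the quota for the current period is used up.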