Edit: after much fiddling, it seems urlgrabber succeeds where urllib2 fails, even when telling it to close the connection after each file. It looks like there is something wrong either with the way urllib2 handles proxies, or with the way I use it! Anyway, here is the simplest possible code to retrieve files in a loop:
import urlgrabber

for i in range(1, 100):
    url = "http://www.iana.org/domains/example/"
    urlgrabber.urlgrab(url,
                       proxies={'http': 'http://<user>:<password>@<proxy url>:<proxy port>'},
                       keepalive=1, close_connection=1, throttle=0)
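If each file needs to end up at a specific path, as in my original script below, urlgrab also accepts the destination filename as its second argument — a quick sketch reusing the same proxy string, not tested beyond the loop above:

import urlgrabber

proxies = {'http': 'http://<user>:<password>@<proxy url>:<proxy port>'}
for i in range(1, 100):
    url = "http://www.iana.org/domains/example/"
    # second positional argument is the local file to write the download to
    urlgrabber.urlgrab(url, "e:/tmp/images/tst{}.htm".format(i),
                       proxies=proxies, keepalive=1, close_connection=1, throttle=0)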
Hello all!
I am trying to write a very simple python script to grab a bunch of files via urllib2.
This script needs to work through the proxy at work (my issue does not exist if grabbing files on the intranet, i.e. without the proxy).
Said script fails after a couple of requests with "HTTPError: HTTP Error 401: basic auth failed". Any idea why that might be? It seems the proxy is rejecting my authentication, but why? The first couple of urlopen requests went through correctly!
Edit: Adding a 10-second sleep between requests, to rule out any throttling the proxy might be doing, did not change the results.
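For reference, the delay was simply added inside the loop — a minimal sketch of that attempt, assuming the proxy opener from the script below is already installed:

import time
import urllib2

for i in range(100):
    f = urllib2.urlopen("http://www.iana.org/domains/example/")
    # pause between requests to rule out proxy-side throttling; made no difference
    time.sleep(10)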
Here is a simplified version of my script (with identifying information stripped, obviously):
import urllib2

# Register the proxy credentials with a password manager
passmgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
passmgr.add_password(None, '<proxy url>:<proxy port>', '<my user name>', '<my password>')
authinfo = urllib2.ProxyBasicAuthHandler(passmgr)

# Route HTTP requests through the proxy and install the opener globally
proxy_support = urllib2.ProxyHandler({"http": "<proxy http address>"})
opener = urllib2.build_opener(authinfo, proxy_support)
urllib2.install_opener(opener)

# Fetch the same page repeatedly; fails with HTTP 401 after a few iterations
for i in range(100):
    with open("e:/tmp/images/tst{}.htm".format(i), "w") as outfile:
        f = urllib2.urlopen("http://www.iana.org/domains/example/")
        outfile.write(f.read())
Thanks in advance!
You can minimize the number of connections by using the keepalive handler from the urlgrabber module.
I am unsure that this will work correctly with your proxy setup. You may have to hack the keepalive module.
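Something along these lines — just a sketch, assuming urlgrabber's keepalive.HTTPHandler can be chained with the proxy and auth handlers you already build (I have not verified this against an authenticating proxy):

import urllib2
from urlgrabber.keepalive import HTTPHandler  # keepalive handler shipped with urlgrabber

# Same proxy/auth setup as in the question, plus the keepalive handler
passmgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
passmgr.add_password(None, '<proxy url>:<proxy port>', '<my user name>', '<my password>')
authinfo = urllib2.ProxyBasicAuthHandler(passmgr)
proxy_support = urllib2.ProxyHandler({"http": "<proxy http address>"})

keepalive_handler = HTTPHandler()  # reuses the underlying HTTP connection between requests
opener = urllib2.build_opener(keepalive_handler, authinfo, proxy_support)
urllib2.install_opener(opener)

for i in range(100):
    f = urllib2.urlopen("http://www.iana.org/domains/example/")
    with open("e:/tmp/images/tst{}.htm".format(i), "w") as outfile:
        outfile.write(f.read())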