How do I access the original response headers that contain a redirect when using urllib2.urlopen?

I'm trying to parse the Location header of an HTTP response returned by urllib2.urlopen, but the only response headers I receive are those of the redirect target, not those of the original response that contains the Location header.

I have followed other questions on Stack Overflow that suggest subclassing urllib2.HTTPRedirectHandler, but I still don't understand how to access the original response whose redirect urlopen ends up following.

Here's an example of the problem:

import urllib2

req = urllib2.urlopen("http://wp.me")

print req.info()

The output of print contains the response headers of the redirect target. I would like to see the headers of the original response instead.

Any help would be appreciated.

Answer by Senthil Kumaran (best answer)

urllib2 follows redirects transparently, but, as you said, you can subclass HTTPRedirectHandler and install it in an opener to get the values you need.

import urllib2

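class SmartRedirectHandler(urllib2.HTTPRedirectHandler):
    def http_error_302(self, req, fp, code, msg, headers):
        # Let the base class follow the redirect and fetch the final response.
        result = urllib2.HTTPRedirectHandler.http_error_302(self, req, fp,
                                                            code, msg,
                                                            headers)
        # Attach the status code and headers of the redirecting response.
        result.status = code
        result.headers = headers
        return result

    # The base class aliases these to its own http_error_302, so repeat the
    # aliases here to intercept 301, 303 and 307 redirects as well.
    http_error_301 = http_error_303 = http_error_307 = http_error_302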

request = urllib2.Request("http://wp.me")
opener = urllib2.build_opener(SmartRedirectHandler())
obj = opener.open(request)
print 'The original headers were', obj.headers
print 'The Redirect Code was', obj.status

Any other details you want to keep, such as values taken from req, can be attached to result inside SmartRedirectHandler in the same way and read from the response object afterwards.
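
If the URL goes through more than one redirect, it may help to keep every intermediate response rather than only one. The following is a minimal sketch of that idea, not part of the original answer: the names RecordingRedirectHandler, open_with_history and redirect_history are made up for illustration, and the import fallback is there only so the same code also runs on Python 3, where these classes live in urllib.request.

```python
try:
    import urllib2  # Python 2, as used in the answer
except ImportError:
    import urllib.request as urllib2  # Python 3 keeps the same classes here


class RecordingRedirectHandler(urllib2.HTTPRedirectHandler):
    """Follow redirects as usual, but remember each redirecting response."""

    def http_error_302(self, req, fp, code, msg, headers):
        # Describe the response that asked for the redirect.
        hop = {"code": code, "url": req.get_full_url(), "headers": headers}
        # The base class follows the redirect; later hops pass through this
        # same method again before the final response comes back here.
        result = urllib2.HTTPRedirectHandler.http_error_302(
            self, req, fp, code, msg, headers)
        if result is not None:
            # Put this hop in front of any hops recorded further down the chain.
            later_hops = getattr(result, "redirect_history", [])
            result.redirect_history = [hop] + later_hops
        return result

    # Route the other redirect codes through the same method.
    http_error_301 = http_error_303 = http_error_307 = http_error_302


def open_with_history(url):
    """Open url and return the final response (hypothetical helper)."""
    opener = urllib2.build_opener(RecordingRedirectHandler())
    return opener.open(url)
```

Calling open_with_history("http://wp.me") should then return the final response; if any redirect was followed, its redirect_history attribute lists the hops in order, so redirect_history[0]["headers"].get("Location") would be the Location header of the very first response. If no redirect occurs, the attribute is never set, so read it with getattr and a default.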