I think I've discovered a problem with the Requests library's handling of redirects when using HTTPS through a proxy. As far as I can tell, it only happens when the server redirects the Requests client to another HTTPS resource.
I know the proxy supports HTTPS and the CONNECT method, because it works fine with a browser. I'm using version 2.1.0 of the Requests library, which uses version 1.7.1 of the urllib3 library.
I watched the traffic in Wireshark and I can see the first transaction for https://www.paypal.com/, but nothing for https://www.paypal.com/home. The request for /home that should result from the redirect never appears, so the failure must occur in the client code before anything is sent to the proxy. Stepping deeper into the stack with my debugger keeps ending in timeouts, so I don't know where to go from here.
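Since stepping through with a debugger keeps timing out, one alternative is to turn on debug logging before making the request. urllib3 reports through Python's standard logging module, so this may show how far the redirected request gets. This is only a minimal sketch; the helper name enable_http_debug_logging is hypothetical and not part of Requests or urllib3.

```python
import logging


def enable_http_debug_logging():
    """Route DEBUG-level records from every logger to stderr.

    Hypothetical helper for illustration; it relies only on the
    standard logging module.
    """
    root = logging.getLogger()
    root.setLevel(logging.DEBUG)
    # Add a handler only if none is configured, to avoid duplicate output.
    if not root.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
        )
        root.addHandler(handler)
    return root
```

Calling enable_http_debug_logging() once before requests.get(...) should be enough; what exactly gets logged depends on the library version.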
I want to know if this truly is a bug or if I am doing something wrong. It is really easy to reproduce so long as you have access to a proxy that you can send traffic through. See the code below:
import requests
proxiesDict = {
    'http': "http://127.0.0.1:8080",
    'https': "http://127.0.0.1:8080"
}
# This fails with "requests.exceptions.ProxyError: Cannot connect to proxy. Socket error: [Errno 111] Connection refused." when it tries to follow the redirect to /home
r = requests.get("https://www.paypal.com/", proxies=proxiesDict)
# This succeeds.
r = requests.get("https://www.paypal.com/home", proxies=proxiesDict)
This also happens when using urllib3 directly, so it is probably a urllib3 bug (Requests uses urllib3 under the hood), although I ran into it through the higher-level Requests library. See below:
import urllib3
proxy = urllib3.proxy_from_url('http://127.0.0.1:8080/')
# This fails with the same error as above.
res = proxy.urlopen('GET', 'https://www.paypal.com/')
# This succeeds.
res = proxy.urlopen('GET', 'https://www.paypal.com/home')
Here is the traceback when using Requests:
Traceback (most recent call last):
  File "tests/downloader_tests.py", line 22, in test_proxy_https_request
    r = requests.get("https://www.paypal.com/", proxies=proxiesDict)
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 55, in get
    return request('get', url, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 44, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 382, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 505, in send
    history = [resp for resp in gen] if allow_redirects else []
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 167, in resolve_redirects
    allow_redirects=False,
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 485, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/adapters.py", line 375, in send
    raise ProxyError(e)
requests.exceptions.ProxyError: Cannot connect to proxy. Socket error: [Errno 111] Connection refused.
Update:
The problem only seems to happen with a 302 (Found) redirect, not with a 301 (Moved Permanently) redirect. I also noticed that PayPal doesn't return a redirect to the Chrome browser, yet I do see the redirect when using Requests, even though I'm borrowing Chrome's User-Agent for this experiment. I'm looking for more URLs that return a 302 in order to get more data points.
I need this to work for all URLs or at least understand why I'm seeing this behavior.
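To collect those data points, and possibly to sidestep the failure in the meantime, the redirects can be followed one request at a time with automatic redirect handling switched off. This is only a sketch: trace_redirects is a hypothetical helper, and it assumes that a fresh request per hop avoids the failing code path, which the successful direct request to /home suggests but does not prove.

```python
try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin  # Python 2.7, as in the traceback above

REDIRECT_CODES = (301, 302, 303, 307, 308)


def trace_redirects(url, get, proxies=None, max_hops=10):
    """Follow redirects manually and return [(status_code, url), ...].

    `get` is any callable that accepts the same arguments as
    requests.get, so requests.get itself can be passed in.
    """
    hops = []
    for _ in range(max_hops):
        # Each hop is a separate request with redirects disabled.
        resp = get(url, proxies=proxies, allow_redirects=False)
        hops.append((resp.status_code, url))
        location = resp.headers.get("Location")
        if resp.status_code not in REDIRECT_CODES or not location:
            break
        # The Location header may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
    return hops
```

For example, trace_redirects("https://www.paypal.com/", requests.get, proxies=proxiesDict) would return one (status, URL) pair per hop, which shows at a glance whether a given site answers with a 301 or a 302.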
This is a bug in urllib3. We're tracking it as urllib3 issue #295.