I'm trying to get user identities from the Zendesk API for a few hundred thousand user ids, using Python 3.4.3 and the requests library. It works for many user ids, and then my program receives a bad response from the Zendesk API.
Below is the relevant Python function:
def get_user_identities(user_id):
    url = config.zendesk_api_url + '/api/v2/users/' + user_id + '/identities.json'
    session = requests.Session()
    session.auth = config.credentials
    response = ''

    while True:
        try:
            response = session.get(url)
        except requests.ConnectionError as error:
            logger.error("ConnectionError: {0}".format(error))
            num_seconds = 30
            logger.info("Sleeping for {} seconds...".format(num_seconds))
            time.sleep(num_seconds)
        else:
            break

    while True:
        response = session.get(url)
        if response.status_code == 429:
            logger.info('Rate limited! Waiting for {} seconds'.format(response.headers['retry-after']))
            time.sleep(int(response.headers['retry-after']))
        else:
            break

    if response.status_code != 200:
        logger.error('Error with status code {}'.format(response.status_code))
        exit()

    data = response.json()
This function is called within a loop and retrieves the user identity for thousands of users without any problems, but then it exits with a BadStatusLine error (an empty HTTP status line):
Traceback (most recent call last):
  File "/usr/local/lib/python3.4/dist-packages/urllib3/connectionpool.py", line 595, in urlopen
    chunked=chunked)
  File "/usr/local/lib/python3.4/dist-packages/urllib3/connectionpool.py", line 393, in _make_request
    six.raise_from(e, None)
  File "<string>", line 2, in raise_from
  File "/usr/local/lib/python3.4/dist-packages/urllib3/connectionpool.py", line 389, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/lib/python3.4/http/client.py", line 1171, in getresponse
    response.begin()
  File "/usr/lib/python3.4/http/client.py", line 351, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python3.4/http/client.py", line 321, in _read_status
    raise BadStatusLine(line)
http.client.BadStatusLine: ''

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/requests/adapters.py", line 330, in send
    timeout=timeout
  File "/usr/local/lib/python3.4/dist-packages/urllib3/connectionpool.py", line 640, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "/usr/local/lib/python3.4/dist-packages/urllib3/util/retry.py", line 287, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='companyname.zendesk.com', port=443): Max retries exceeded with url: /api/v2/users/1608220001/identities.json (Caused by ProtocolError('Connection aborted.', BadStatusLine("''",)))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/emre.sevinc/code/company-zendesk/get_user_identities.py", line 72, in <module>
    get_user_identities(user_id)
  File "/home/emre.sevinc/code/company-zendesk/get_user_identities.py", line 42, in get_user_identities
    response = session.get(url)
  File "/usr/lib/python3/dist-packages/requests/sessions.py", line 467, in get
    return self.request('GET', url, **kwargs)
  File "/usr/lib/python3/dist-packages/requests/sessions.py", line 455, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/lib/python3/dist-packages/requests/sessions.py", line 558, in send
    r = adapter.send(request, **kwargs)
  File "/usr/lib/python3/dist-packages/requests/adapters.py", line 378, in send
    raise ConnectionError(e)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='companyname.zendesk.com', port=443): Max retries exceeded with url: /api/v2/users/1608220001/identities.json (Caused by ProtocolError('Connection aborted.', BadStatusLine("''",)))
But when I test the same URL to get the user identity using HTTPie, it works just fine:
$ http -a [email protected]:password https://companyname.zendesk.com/api/v2/users/1608220001/identities.json
HTTP/1.1 200 OK
Cache-Control: must-revalidate, private, max-age=0
Connection: keep-alive
Content-Encoding: gzip
Content-Type: application/json; charset=UTF-8
Date: Tue, 12 Sep 2017 15:11:39 GMT
ETag: W/"8135d41f9068e1c2b45d0f307c6431d4"
Last-Modified: Mon, 09 Nov 2015 20:55:44 GMT
Server: nginx
Strict-Transport-Security: max-age=31536000;
Transfer-Encoding: chunked
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
X-Rack-Cache: miss
X-Rate-Limit: 700
X-Rate-Limit-Remaining: 416
X-Request-Id: f1320883-caf0-4d33-cd94-a0369f4368f8
X-Runtime: 0.381444
X-UA-Compatible: IE=Edge,chrome=1
X-Zendesk-API-Version: v2
X-Zendesk-Application-Version: v40.20
X-Zendesk-Origin-Server: app15.pod3.dub1.zdsys.com
X-Zendesk-Request-Id: a0606a3ae1d043968f53
{
    "count": 1,
    "identities": [
        {
            "created_at": "2015-11-09T20:55:44Z",
            "deliverable_state": "deliverable",
            "id": 1020870341,
            "primary": true,
            "type": "email",
            ...
Could it be that the Zendesk REST API endpoint 'thinks' I'm trying to "scrape" it and is deliberately disconnecting, as suggested at https://stackoverflow.com/a/33226080/236007?
Or is it something else, and do you have any suggestions to make it work (other than faking the user agent)?
Apparently, the code has to catch one more exception, urllib3.exceptions.MaxRetryError, and handle one more HTTP status code as well (BAD_GATEWAY_ERROR = 502), to work around what the Zendesk REST API endpoint throws at it. After these changes, it was able to successfully retrieve more than 700,000 records from the Zendesk REST API endpoint.
The issues I encountered seem to be simply how Zendesk's servers behave in such cases.
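The changes described above might look roughly like the sketch below. It is not the original fix, only an illustration: the helper name get_with_retries, its parameters, the attempt limit, and the injectable sleep function are all assumptions made for this example. The two exception types and the 429 and 502 status codes are the ones mentioned in this post. The key point is that every session.get call sits inside the try block, so a dropped connection is always retried instead of escaping.

```python
import time

import requests
import urllib3

TOO_MANY_REQUESTS = 429
BAD_GATEWAY_ERROR = 502  # extra status code mentioned in the answer


def get_with_retries(session, url, max_attempts=10, wait_seconds=30,
                     sleep=time.sleep):
    """Fetch url, retrying on dropped connections, 429 and 502 responses.

    Hypothetical helper: the name, the parameters and the attempt limit
    are assumptions for illustration, not the original code.
    """
    for _ in range(max_attempts):
        try:
            response = session.get(url)
        except (requests.ConnectionError,
                urllib3.exceptions.MaxRetryError):
            # Connection dropped (e.g. BadStatusLine): wait, then retry.
            sleep(wait_seconds)
            continue
        if response.status_code == TOO_MANY_REQUESTS:
            # Rate limited: wait as long as the Retry-After header asks,
            # falling back to wait_seconds if the header is missing.
            sleep(int(response.headers.get('retry-after', wait_seconds)))
            continue
        if response.status_code == BAD_GATEWAY_ERROR:
            # Transient gateway error: wait, then retry.
            sleep(wait_seconds)
            continue
        return response
    raise RuntimeError(
        'Giving up on {} after {} attempts'.format(url, max_attempts))
```

Inside get_user_identities, both while loops could then be replaced by a single call such as response = get_with_retries(session, url), followed by the existing check for a status code other than 200. Bounding the number of attempts is a design choice of this sketch, so that a permanently failing URL cannot stall the whole run.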