Why do I get a bad HTTP status with ProtocolError('Connection aborted.', BadStatusLine("''",)) when getting data from the Zendesk API?

I'm trying to get user identities from the Zendesk API for a few hundred thousand user ids, using Python 3.4.3 and the requests library. It works for many user ids, and then my program receives a bad response from the Zendesk API.

Below is the relevant Python function:

def get_user_identities(user_id):
  url = config.zendesk_api_url + '/api/v2/users/' + user_id + '/identities.json'

  session = requests.Session()
  session.auth = config.credentials

  response = ''

  while True:
    try:
      response = session.get(url)
    except requests.ConnectionError as error:
      logger.error("ConnectionError: {0}".format(error))
      num_seconds = 30
      logger.info("Sleeping for {} seconds...".format(num_seconds))
      time.sleep(num_seconds)
    else:
      break

  while True:
    response = session.get(url)
    if response.status_code == 429:
      logger.info('Rate limited! Waiting for {} seconds'.format(response.headers['retry-after']))
      time.sleep(int(response.headers['retry-after']))
    else:
      break

  if response.status_code != 200:
    logger.error('Error with status code {}'.format(response.status_code))
    exit()

  data = response.json()

This function is called within a loop, retrieving the user identity for thousands of users without any problems, but then it exits because of a bad HTTP response status:

Traceback (most recent call last):
  File "/usr/local/lib/python3.4/dist-packages/urllib3/connectionpool.py", line 595, in urlopen
    chunked=chunked)
  File "/usr/local/lib/python3.4/dist-packages/urllib3/connectionpool.py", line 393, in _make_request
    six.raise_from(e, None)
  File "<string>", line 2, in raise_from
  File "/usr/local/lib/python3.4/dist-packages/urllib3/connectionpool.py", line 389, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/lib/python3.4/http/client.py", line 1171, in getresponse
    response.begin()
  File "/usr/lib/python3.4/http/client.py", line 351, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python3.4/http/client.py", line 321, in _read_status
    raise BadStatusLine(line)
http.client.BadStatusLine: ''

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/requests/adapters.py", line 330, in send
    timeout=timeout
  File "/usr/local/lib/python3.4/dist-packages/urllib3/connectionpool.py", line 640, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "/usr/local/lib/python3.4/dist-packages/urllib3/util/retry.py", line 287, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='companyname.zendesk.com', port=443): Max retries exceeded with url: /api/v2/users/1608220001/identities.json (Caused by ProtocolError('Connection aborted.', BadStatusLine("''",)))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/emre.sevinc/code/company-zendesk/get_user_identities.py", line 72, in <module>
    get_user_identities(user_id)
  File "/home/emre.sevinc/code/company-zendesk/get_user_identities.py", line 42, in get_user_identities
    response = session.get(url)
  File "/usr/lib/python3/dist-packages/requests/sessions.py", line 467, in get
    return self.request('GET', url, **kwargs)
  File "/usr/lib/python3/dist-packages/requests/sessions.py", line 455, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/lib/python3/dist-packages/requests/sessions.py", line 558, in send
    r = adapter.send(request, **kwargs)
  File "/usr/lib/python3/dist-packages/requests/adapters.py", line 378, in send
    raise ConnectionError(e)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='companyname.zendesk.com', port=443): Max retries exceeded with url: /api/v2/users/1608220001/identities.json (Caused by ProtocolError('Connection aborted.', BadStatusLine("''",)))

But when I test the same URL to get the user identity using HTTPie, it works just fine:

$ http -a [email protected]:password https://companyname.zendesk.com/api/v2/users/1608220001/identities.json

HTTP/1.1 200 OK
Cache-Control: must-revalidate, private, max-age=0
Connection: keep-alive
Content-Encoding: gzip
Content-Type: application/json; charset=UTF-8
Date: Tue, 12 Sep 2017 15:11:39 GMT
ETag: W/"8135d41f9068e1c2b45d0f307c6431d4"
Last-Modified: Mon, 09 Nov 2015 20:55:44 GMT
Server: nginx
Strict-Transport-Security: max-age=31536000;
Transfer-Encoding: chunked
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
X-Rack-Cache: miss
X-Rate-Limit: 700
X-Rate-Limit-Remaining: 416
X-Request-Id: f1320883-caf0-4d33-cd94-a0369f4368f8
X-Runtime: 0.381444
X-UA-Compatible: IE=Edge,chrome=1
X-Zendesk-API-Version: v2
X-Zendesk-Application-Version: v40.20
X-Zendesk-Origin-Server: app15.pod3.dub1.zdsys.com
X-Zendesk-Request-Id: a0606a3ae1d043968f53

{
    "count": 1, 
    "identities": [
        {
            "created_at": "2015-11-09T20:55:44Z", 
            "deliverable_state": "deliverable", 
            "id": 1020870341, 
            "primary": true, 
            "type": "email", 
...

Could it be that the Zendesk REST API endpoint 'thinks' I'm trying to "scrape" it and is deliberately disconnecting, as suggested at https://stackoverflow.com/a/33226080/236007?

Or is it something else? Do you have any suggestions to make it work (other than faking the user agent)?

There is 1 answer

Best answer, by Emre Sevinç

Apparently, the code has to handle one more exception, urllib3.exceptions.MaxRetryError, as well as one more HTTP status code, 502 (Bad Gateway), to work around what the Zendesk REST API endpoint throws at it:

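import time

import requests
import urllib3

# config, logger and get_script_path come from the rest of the script (not shown).

BAD_GATEWAY_ERROR = 502
RATE_LIMITED_ERROR = 429
MAX_NUM_SECONDS_TO_SLEEP = 30
MAX_NUM_OF_ALLOWED_RETRIES = 10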


def get_user_identities(user_id):
  url = config.zendesk_api_url + '/api/v2/users/' + user_id + '/identities.json'

  session = requests.Session()
  session.auth = config.credentials

  script_path = get_script_path()

  num_retries = 0
  response = ''

  while True:
    if num_retries > MAX_NUM_OF_ALLOWED_RETRIES:
      logger.error('Tried more than {} times without success. Skipping the user id {} .'
                   .format(MAX_NUM_OF_ALLOWED_RETRIES, user_id))
      return

    try:
      response = session.get(url)

      if response.status_code == RATE_LIMITED_ERROR:
        logger.info('Rate limited! Waiting for {} seconds and will try again.'
                    .format(response.headers['retry-after']))
        time.sleep(int(response.headers['retry-after']))
        num_retries += 1
        continue

      if response.status_code == BAD_GATEWAY_ERROR:
        logger.info('Bad Gateway Error. Waiting for {} seconds and will try again.'
                    .format(str(MAX_NUM_SECONDS_TO_SLEEP)))
        time.sleep(MAX_NUM_SECONDS_TO_SLEEP)
        num_retries += 1
        continue

      if response.status_code != 200:
        logger.error('Error with status code {}. Skipping the user id {}'
                     .format(response.status_code, user_id))
        return

    except (requests.ConnectionError, urllib3.exceptions.MaxRetryError) as error:
      logger.error("ConnectionError: {0}".format(error))
      logger.info("Sleeping for {} seconds...".format(MAX_NUM_SECONDS_TO_SLEEP))
      time.sleep(MAX_NUM_SECONDS_TO_SLEEP)
      num_retries += 1
    else:
      break

  data = response.json()

After the changes above, the script was able to successfully retrieve more than 700,000 records from the Zendesk REST API endpoint.
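
As a side note, the retry loop in the function above could probably be factored out into a small helper so that the sleeping and counting live in one place. The following is only a minimal sketch, not the code that was actually run: RetryableError and call_with_retries are hypothetical names, and it swaps the fixed 30-second sleep for exponential backoff capped at 30 seconds, which is an assumption on my part rather than anything Zendesk requires.

```python
import time


class RetryableError(Exception):
    """Hypothetical marker exception for failures that are worth retrying."""


def call_with_retries(fetch, max_retries=10, base_delay=1.0, max_delay=30.0,
                      sleep=time.sleep):
    """Call fetch() until it succeeds or max_retries retries have been used.

    fetch is any zero-argument callable that either returns a result or
    raises RetryableError. Returns the result, or None if every attempt failed.
    """
    for attempt in range(max_retries + 1):
        try:
            return fetch()
        except RetryableError:
            if attempt == max_retries:
                return None
            # Exponential backoff: 1s, 2s, 4s, ... capped at max_delay.
            sleep(min(max_delay, base_delay * 2 ** attempt))


# Hypothetical usage with requests (assumes session and url exist; not run here):
#
# def fetch():
#     try:
#         response = session.get(url)
#     except requests.ConnectionError as error:
#         raise RetryableError(error)
#     if response.status_code in (429, 502):
#         raise RetryableError(response.status_code)
#     return response
#
# response = call_with_retries(fetch)
```

The sleep parameter is injectable only so the helper can be exercised without actually waiting. Another option, if the installed versions support it, is to mount a requests HTTPAdapter configured with a urllib3 Retry object on the session and let the library do the retrying.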

The errors I encountered (aborted connections and 502 responses) appear to be simply how Zendesk's servers behave when they receive this many requests in a row.
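
One spot in the function above that may still be fragile is int(response.headers['retry-after']): it raises if the header is missing, and the HTTP specification also allows Retry-After to be an HTTP date instead of a number of seconds. I have not seen Zendesk send either of those, so the sketch below is purely defensive. retry_after_seconds is a hypothetical helper name, and the default of 30 seconds is an assumption borrowed from MAX_NUM_SECONDS_TO_SLEEP.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(headers, default=30, now=None):
    """Return how many seconds to wait according to a Retry-After header.

    headers is any mapping that answers to the key 'retry-after'
    (response.headers from requests is case-insensitive; a plain dict
    needs the lower-case key). Falls back to default when the header is
    missing or cannot be parsed.
    """
    value = headers.get('retry-after')
    if value is None:
        return default
    value = value.strip()
    if value.isdigit():
        # Delay given directly in seconds, e.g. "42".
        return int(value)
    try:
        # Delay given as an HTTP date, e.g. "Tue, 12 Sep 2017 15:12:39 GMT".
        retry_at = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return default
    if retry_at.tzinfo is None:
        retry_at = retry_at.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # Never return a negative wait if the date is already in the past.
    return max(0, int((retry_at - now).total_seconds()))
```

In the function above this would replace the two uses of response.headers['retry-after'], for example time.sleep(retry_after_seconds(response.headers)).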