Python Elasticseach indexing error

2.6k views Asked by At

Elasticsearch was working well and fine before today.

Issue:

Some documents which are failing to index with error:

u'Limit of total fields [1000] in index [mintegral_incent] has been exceeded' 

Error:

"BulkIndexError: (u'14 document(s) failed to index.', [{u'index': {u'status': 400, u'_type': u'mintegral_incent', u'_id': u'168108082', u'error': {u'reason': u'Limit of total fields [1000] in index [mintegral_incent] has been exceeded', u'type': u'illegal_argument_exception'}

Using Amazon Elastic service

Elasticsearch Version 5.1

ES setup:

from elasticsearch import Elasticsearch
from elasticsearch import helpers
es_repo = Elasticsearch(hosts=[settings.ES_INDEX_URL],
                        verify_certs=True)

Code giving issue:

def bulk_index_offers(index_name, id_field, docs):
    actions = []
    for doc in docs:
        action = {
            "_index": index_name,
            "_type": index_name,
            "_id": doc.get(id_field),
            "_source": doc
        }
        actions.append(action)
    # Error at this following line.
    resp = helpers.bulk(es_repo, actions)
    return resp

What I have tried:

I have tried setting chunks to smaller and increased read_timeout to 30 from default 10 like this : resp = helpers.bulk(es_repo, actions, chunks=500, read_timeout=30)

But still facing same issue.

Please help.

2

There are 2 answers

9
Val On BEST ANSWER

By default, a mapping type is only allowed to contain up to 1000 fields and it seems you are exceeding that limit. In order to increase that threashold you can run this command:

PUT mintegral_incent/_settings
{ 
  "index": {
    "mapping": {
      "total_fields": {
        "limit": "2000"
      }
    }
  }
}

Using curl, it'd look like this:

curl -XPUT http://<your.amazon.host>/mintegral_incent/_settings -d '{ 
  "index": {
    "mapping": {
      "total_fields": {
        "limit": "2000"
      }
    }
  }
}'

Then you can run your bulk script again and it should work.

0
Pablo On

In case you want to work from Python, try:

import requests

headers = {
    'Content-Type': 'application/json',
}

resp = requests.put('http://localhost:9200/your_index/_settings',
                    headers=headers,
                    data='{"index": {"mapping": {"total_fields": {"limit": "2000"}}}}')

print(f'\nHTTP code: {resp.status_code} -- response: {resp}\n')
print(f'Response text\n{resp.text}')

You may also use a terminal like indicated above, although I had to add a header, -H'Content-Type: application/json'

curl -XPUT http://localhost:9200/your_index/_settings -d '{"index": {"mapping": {"total_fields": {"limit": "2000"}}}}' -H'Content-Type: application/json'

If you need to use curl requests (get, put, post) from Python, this guide is very helpful (it is the source for my answer), and even provides code for a nice method to handle this.