Partially updating an Elasticsearch list field value using Python


The purpose of this question is to ask the community how to partially update a field without removing any of its other contents.

There are many examples on Stack Overflow of partially updating Elasticsearch _source fields using Python, curl, etc. The elasticsearch Python library ships with an elasticsearch.helpers module containing functions such as parallel_bulk, streaming_bulk and bulk, which make it easy for developers to update documents in bulk.
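As a minimal sketch of the kind of generator those helpers consume, the function below turns rows into partial-update actions (`_op_type: "update"` plus a `doc` body). It assumes each row is a plain dict, for example from `df.to_dict("records")`, and that the document id lives in a column called `id`; the function name `update_actions` and the `id` column are hypothetical.

```python
from typing import Any, Dict, Iterable, Iterator


def update_actions(
    rows: Iterable[Dict[str, Any]], index: str, id_field: str = "id"
) -> Iterator[Dict[str, Any]]:
    """Yield one partial-update action per row, in the dict format that the
    elasticsearch.helpers bulk functions accept."""
    for row in rows:
        # Every column except the (hypothetical) id column becomes the partial doc.
        doc = {k: v for k, v in row.items() if k != id_field}
        yield {
            "_op_type": "update",  # partial update rather than a full re-index
            "_index": index,
            "_id": row[id_field],
            "doc": doc,
        }


rows = [{"id": "19010239", "name": "josephine drwaler"}]
actions = list(update_actions(rows, index="docobot"))
print(actions[0]["_id"], actions[0]["doc"])
```

With a client `es`, the generator would be passed straight through, e.g. `helpers.bulk(es, update_actions(rows, index="docobot"))`. Note that a `doc` update still replaces any list it contains, which is exactly the problem described next.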

If the data is in a pandas DataFrame, one can easily iterate over the rows to build a generator that creates or updates documents in Elasticsearch. Elasticsearch documents are immutable: when an update occurs, Elasticsearch merges the information being passed with the existing document to create a new document and increments the document's version. If a document has a list field and the update request contains a single value for that field, the entire list is replaced with that new value (many Stack Overflow Q&As cover this). I do not want to replace the whole list with the new value; instead, I want to change a single value in the list to a new value.
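The replacement behaviour can be illustrated with a simplified, pure-Python model of how a partial `doc` update combines with the stored `_source`: nested objects are merged key by key, while every other value, lists included, overwrites the old one. This is an approximation written for illustration (`apply_partial_doc` is a hypothetical name), not Elasticsearch's own code.

```python
from typing import Any, Dict


def apply_partial_doc(source: Dict[str, Any], doc: Dict[str, Any]) -> Dict[str, Any]:
    """Simplified model of a partial `doc` update: objects are merged
    recursively, all other values (lists included) are replaced wholesale."""
    merged = dict(source)  # shallow copy, so the original _source is untouched
    for key, value in doc.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = apply_partial_doc(merged[key], value)  # recurse into objects
        else:
            merged[key] = value  # scalars and lists overwrite the old value
    return merged


source = {"name": "josephine drwaler",
          "address": ["101 country drive", "35 park drive", "277 thunderroad belway"]}
print(apply_partial_doc(source, {"address": ["1001 country drive"]}))
# {'name': 'josephine drwaler', 'address': ['1001 country drive']}
```

Sending only the corrected address therefore loses the other two entries.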

For example, in my _source I have an address field containing ['101 country drive', '35 park drive', '277 thunderroad belway']. This field has three values, but suppose we realize that this document is incorrect and we need to update '101 country drive' to '1001 country drive'.

I do not want to delete the other values in the list; I only want to replace the matching value with the new one.

Do I need to write a Painless script to perform this action, or is there another way to do it?

For example, I want to update the document from:

{'took': 176,
 'timed_out': False,
 '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0},
 'hits': {'total': {'value': 0, 'relation': 'eq'},
  'max_score': None,
  'hits': [{'_index': 'docobot',
    '_type': '_doc',
    '_id': '19010239',
    '_source': {'name': 'josephine drwaler',
     'address': ['101 country drive', '35 park drive', '277 thunderroad belway']}}]}}

to

{'took': 176,
 'timed_out': False,
 '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0},
 'hits': {'total': {'value': 0, 'relation': 'eq'},
  'max_score': None,
  'hits': [{'_index': 'docobot',
    '_type': '_doc',
    '_id': '19010239',
    '_source': {'name': 'josephine drwaler',
     'address': ['1001 country drive', '35 park drive', '277 thunderroad belway']}}]}}

Notice that only the first element of address changes, but the element's position should not be a factor: the update should find '101 country drive' by value, wherever it appears in the list.
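Stated in plain Python, the desired operation is a match-by-value replacement that leaves every other element, and the order of the list, untouched. A minimal sketch (the name `replace_value` is hypothetical); note that it replaces every occurrence of the old value, not just the first:

```python
from typing import List


def replace_value(values: List[str], old: str, new: str) -> List[str]:
    """Return a copy of `values` with each element equal to `old` swapped for
    `new`; matching is by value, so the element's position does not matter."""
    return [new if item == old else item for item in values]


address = ["35 park drive", "101 country drive", "277 thunderroad belway"]
print(replace_value(address, "101 country drive", "1001 country drive"))
# ['35 park drive', '1001 country drive', '277 thunderroad belway']
```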

What is the most efficient and pythonic way to go about partially updating documents in elasticsearch while keeping the integrity of the remaining values in that field?


There are 2 answers

warkolm

The _source is what is passed to Elasticsearch in the API request; it is not a "field" in the same sense that address is.

That said, you need to replace the entire address field with what you want, not just the value you want corrected. Elasticsearch assumes that what you pass in is the entire value of the field and overwrites the field with what it gets.
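Following this answer, one way to send the whole corrected list is a client-side read-modify-write. The sketch below builds keyword arguments for the update call from a previously fetched document and guards the write with `if_seq_no`/`if_primary_term` (optimistic concurrency control), so the update is rejected if the document changed in between. It assumes the 8.x elasticsearch-py client, where `update()` takes `doc=` directly (older clients wrap it as `body={"doc": ...}`); `full_list_update` is a hypothetical helper and the sequence numbers below are made-up sample values.

```python
from typing import Any, Dict


def full_list_update(hit: Dict[str, Any], field: str, old: Any, new: Any) -> Dict[str, Any]:
    """Build keyword arguments for es.update() that send back the *whole*
    list with one value changed, guarded by optimistic concurrency control."""
    current = hit["_source"][field]
    return {
        "index": hit["_index"],
        "id": hit["_id"],
        "doc": {field: [new if v == old else v for v in current]},
        # Reject the write if the document changed since it was read.
        "if_seq_no": hit["_seq_no"],
        "if_primary_term": hit["_primary_term"],
    }


# Shape of a get response; _seq_no and _primary_term are made-up sample values.
hit = {"_index": "docobot", "_id": "19010239", "_seq_no": 3, "_primary_term": 1,
       "_source": {"address": ["101 country drive", "35 park drive"]}}
kwargs = full_list_update(hit, "address", "101 country drive", "1001 country drive")
print(kwargs["doc"])
```

Against a live cluster this would read roughly `hit = es.get(index="docobot", id="19010239")` followed by `es.update(**full_list_update(hit, "address", old, new))`; on a version conflict, re-read the document and retry.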

Jenobi

You need to write a Painless script to do the update. Keep in mind that inside the script you can access any field in _source through ctx._source, for example:

ctx._source.address = ['1001 country drive', '35 park drive', '277 thunderroad belway']

But hardcoding the whole list like this doesn't solve the problem...

The field is a list, so we need to iterate through it. The Painless script below loops through each item and compares it to the search parameter: if it matches, the replacement value is added to the new list; otherwise the original item is kept.

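// Build a new list, swapping the matching value for the replacement
def upd_address = [];
for (def item : ctx._source.address) {
  if (item == params.search_id) {
    upd_address.add(params.answer);   // matched: use the new value
  } else {
    upd_address.add(item);            // no match: keep the original value
  }
}
ctx._source.address = upd_address;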
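Because the question already knows the document's `_id`, the same script can also run through the single-document update API instead of update-by-query. A minimal sketch assuming the 8.x elasticsearch-py client, whose `update()` accepts a `script=` keyword; `scripted_update` and `REPLACE_IN_LIST` are hypothetical names. The old and new values travel as `params` rather than being spliced into the script text, so the compiled script can be cached and reused.

```python
from typing import Any, Dict

# The Painless loop from this answer, stored as a single string.
REPLACE_IN_LIST = """
def upd_address = [];
for (def item : ctx._source.address) {
  if (item == params.search_id) {
    upd_address.add(params.answer);
  } else {
    upd_address.add(item);
  }
}
ctx._source.address = upd_address;
"""


def scripted_update(index: str, doc_id: str, old: str, new: str) -> Dict[str, Any]:
    """Keyword arguments for es.update() that run the script server-side on
    one document, so the client never sends the full list."""
    return {
        "index": index,
        "id": doc_id,
        "script": {
            "source": REPLACE_IN_LIST,
            "lang": "painless",
            "params": {"search_id": old, "answer": new},  # values stay out of the script text
        },
    }


kwargs = scripted_update("docobot", "19010239", "101 country drive", "1001 country drive")
print(kwargs["script"]["params"])
```

With a client `es`, the call would be `es.update(**kwargs)`.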

You can run the script above with elasticsearch_dsl's UpdateByQuery:

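from elasticsearch_dsl import UpdateByQuery

# es: your Elasticsearch connection; painless_script: the script above as a string
# addrss: the value to find; upd_addrss: the value to replace it with
ubq = UpdateByQuery(using=es, index='your_index')
ubq = ubq.script(source=painless_script, params={'search_id': addrss, 'answer': upd_addrss})
res = ubq.execute()
print(res, type(res))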

The script loops through each item in the list: if the item equals search_id, it is replaced by answer; otherwise the item is kept unchanged.
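Without a query, update-by-query runs the script against every document in the index, rewriting even those that contain no match. A sketch of a request body that restricts it to documents whose address array contains the old value, using a compact ternary form of the same loop. It assumes the default dynamic mapping, which adds an exact-match `address.keyword` subfield (adjust the field name to your mapping), and the 8.x elasticsearch-py client, whose `update_by_query()` accepts `query=` and `script=` keywords; `update_by_query_body` is a hypothetical helper.

```python
from typing import Any, Dict

# Compact equivalent of the loop in the answer, written with a ternary.
SCRIPT = (
    "def upd = []; "
    "for (def item : ctx._source.address) { "
    "upd.add(item == params.search_id ? params.answer : item); "
    "} "
    "ctx._source.address = upd;"
)


def update_by_query_body(old: str, new: str) -> Dict[str, Any]:
    """Keyword arguments for es.update_by_query() that only touch documents
    whose address list contains `old` (assumes an address.keyword subfield)."""
    return {
        # A term query on an array field matches if any element equals the value.
        "query": {"term": {"address.keyword": old}},
        "script": {"source": SCRIPT, "lang": "painless",
                   "params": {"search_id": old, "answer": new}},
    }


body = update_by_query_body("101 country drive", "1001 country drive")
print(body["query"])
```

This would be sent as `es.update_by_query(index="docobot", conflicts="proceed", **body)`. With elasticsearch_dsl, the equivalent restriction should be `ubq = ubq.query("term", **{"address.keyword": addrss})` before calling `.script()`.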