The purpose of this question is to ask the community how to go about partially updating a field without removing any other contents of that field.
There are many examples on StackOverflow of partially updating Elasticsearch _source fields using Python, curl, etc. The elasticsearch Python library comes with an elasticsearch.helpers module whose functions (parallel_bulk, streaming_bulk, and bulk) allow developers to easily update documents.
If the data is in a pandas dataframe, one can easily iterate over the rows to create a generator that updates or creates documents in Elasticsearch. Elasticsearch documents are immutable, so when an update occurs Elasticsearch uses the information being passed to create a new document, incrementing the document's version and changing only what needs to be updated. If a field holds a list and the update request carries a single value for that field, the entire list is replaced with that new value (many SO Q&As cover this). I do not want to replace the whole list with the new value; I want to change a single value in the list to a new value.
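To make the setup concrete, here is a minimal sketch of such a generator. The helper name update_actions and the "id" column are assumptions for illustration; the rows are plain dicts, such as those produced by df.to_dict("records"). It also shows the problem described above: the "doc" payload replaces the whole address field.

```python
from typing import Any, Dict, Iterable, Iterator


def update_actions(rows: Iterable[Dict[str, Any]], index: str) -> Iterator[Dict[str, Any]]:
    """Yield one bulk 'update' action per row (hypothetical helper)."""
    for row in rows:
        fields = dict(row)         # copy so the caller's row is not mutated
        doc_id = fields.pop("id")  # assumed: an "id" column holds the document _id
        yield {
            "_op_type": "update",  # partial update rather than index/create
            "_index": index,
            "_id": doc_id,
            "doc": fields,         # each field listed here is overwritten as a whole
        }


rows = [{"id": "19010239", "address": "1001 country drive"}]
actions = list(update_actions(rows, "docobot"))

# With a live cluster, the actions would be handed to the bulk helper, e.g.:
#   from elasticsearch import Elasticsearch, helpers
#   helpers.bulk(Elasticsearch(...), update_actions(rows, "docobot"))
```

Sent as is, this action would turn address into the single string '1001 country drive' instead of changing one element of the list.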
For example, in my _source I have a field as ['101 country drive', '35 park drive', '277 thunderroad belway']. This field has three values, but let's say we realize that this document is incorrect and we need to update '101 country drive' to '1001 country drive'.
I do not want to delete the other values in the list; I only want to replace that one element with the new value.
Do I need to write a painless script to perform this action, or is there another method to perform this action?
Example: I want to update the document from
{'took': 176,
'timed_out': False,
'_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0},
'hits': {'total': {'value': 0, 'relation': 'eq'},
'max_score': None,
'hits': [{'_index': 'docobot', '_type': '_doc', '_id': '19010239',
'_source': {'name': 'josephine drwaler', 'address': ['101 country drive', '35 park drive', '277 thunderroad belway']
}}]}}
to
{'took': 176,
'timed_out': False,
'_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0},
'hits': {'total': {'value': 0, 'relation': 'eq'},
'max_score': None,
'hits': [{'_index': 'docobot', '_type': '_doc', '_id': '19010239',
'_source': {'name': 'josephine drwaler', 'address': ['1001 country drive', '35 park drive', '277 thunderroad belway']
}}]}}
Notice that only the first element of address is updated here, but the position of the element in the list should not be a factor when updating a value of address in _source.
What is the most efficient and pythonic way to go about partially updating documents in elasticsearch while keeping the integrity of the remaining values in that field?
The _source is what is passed to Elasticsearch in the API request; it's not a "field" in the same sense that address is.
That said, you need to replace the entire address field with what you want, not just the value you want corrected. Elasticsearch assumes that what you pass in is what the entirety of the field's value should be and will overwrite that field with what it gets.
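A minimal sketch of that approach, assuming the current _source has already been fetched: build the corrected list in Python and send the whole list back as the field's value. The helper names (replace_value, full_field_update, scripted_update) are hypothetical. The second body is an alternative not mentioned above, a scripted update that edits the list on the server; its Painless source is an untested assumption about how such a script could look.

```python
from typing import Any, Dict, List


def replace_value(values: List[str], old: str, new: str) -> List[str]:
    """Return a copy of the list with every occurrence of old swapped for new."""
    return [new if value == old else value for value in values]


def full_field_update(source: Dict[str, Any], field: str, old: str, new: str) -> Dict[str, Any]:
    """Build an update body that carries the complete, corrected list."""
    return {"doc": {field: replace_value(source[field], old, new)}}


def scripted_update(old: str, new: str) -> Dict[str, Any]:
    """Alternative (assumption): let a Painless script change one element in place."""
    return {
        "script": {
            "lang": "painless",
            "source": (
                "int i = ctx._source.address.indexOf(params.old_value); "
                "if (i >= 0) { ctx._source.address.set(i, params.new_value); }"
            ),
            "params": {"old_value": old, "new_value": new},
        }
    }


source = {
    "name": "josephine drwaler",
    "address": ["101 country drive", "35 park drive", "277 thunderroad belway"],
}
body = full_field_update(source, "address", "101 country drive", "1001 country drive")
script_body = scripted_update("101 country drive", "1001 country drive")

# With a live cluster and a 7.x client, either body could be sent with something like:
#   es.update(index="docobot", id="19010239", body=body)
```

The first variant needs a read before the write but keeps all logic in Python; the scripted variant avoids the extra read and does not depend on the element's position in the list.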