I have an index that contains one document. The document has a field1 with value "B" and a field2 with value "C, D, E" (that is, the value in field2 is comma separated and can have variable length). I want to create a new index that contains the following three documents:

field1: "B" and field2: "C"
field1: "B" and field2: "D"
field1: "B" and field2: "E"

I was thinking about using a watcher to reindex the existing documents and create the new field at the same time, but I'm not sure how to do this, nor whether this is the correct approach.
How to create index with new fields, parsing values from field in existing index?
48 views, asked by gbs

There are 2 answers
You can use an ingest pipeline with split and script processors.
Add dummy data:
POST _bulk
{"index":{"_index":"source_index","_id":"1"}}
{"field1": "B", "field2": "C,D,E,F,G"}
Create the ingest pipeline:
PUT _ingest/pipeline/split-pipeline
{
  "description": "Split values in field2 dynamically",
  "processors": [
    {
      "split": {
        "field": "field2",
        "separator": ","
      }
    },
    {
      "script": {
        "source": """
          for (int i = 0; i < ctx.field2.size(); i++) {
            ctx["field2_" + i] = ctx.field2[i];
          }
        """
      }
    },
    {
      "remove": {
        "field": "field2"
      }
    }
  ]
}
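Before reindexing, you can check the pipeline's behavior with Elasticsearch's simulate endpoint; the document below is a made-up sample, not one pulled from the source index:

```
POST _ingest/pipeline/split-pipeline/_simulate
{
  "docs": [
    {
      "_source": {
        "field1": "B",
        "field2": "C,D,E"
      }
    }
  ]
}
```

The response shows the transformed document (field2_0, field2_1, ...) without writing anything, which makes it easy to iterate on the script before running the reindex.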
Reindex the data:
POST _reindex
{
  "source": {
    "index": "source_index"
  },
  "dest": {
    "index": "destination_index",
    "pipeline": "split-pipeline"
  }
}
Search the new data:
GET destination_index/_search
OUTPUT:
"hits": [
  {
    "_index": "destination_index",
    "_id": "1",
    "_score": 1,
    "_source": {
      "field1": "B",
      "field2_4": "G",
      "field2_3": "F",
      "field2_2": "E",
      "field2_1": "D",
      "field2_0": "C"
    }
  }
]
Note: For real-time data flow, you can set the ingest pipeline as the default pipeline in your source_index settings.
PUT source_index/_settings
{
  "index.default_pipeline": "split-pipeline"
}
For the existing data you can use the _update_by_query API call.
POST source_index/_update_by_query?pipeline=split-pipeline

I managed to do this with a version of the following Python script:
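The original script was not included in the post; the following is a hypothetical sketch of what such a script might look like, assuming the official elasticsearch Python client and the index and field names from the question:

```python
# Hypothetical sketch: split each source document into several destination
# documents, one per comma-separated value in field2. Index names, field
# names, and the cluster URL are assumptions, not from the original post.

def split_document(doc):
    """Turn {'field1': 'B', 'field2': 'C, D, E'} into one doc per value."""
    return [
        {"field1": doc["field1"], "field2": value.strip()}
        for value in doc["field2"].split(",")
    ]

# With the elasticsearch client installed, the full flow could look like
# this -- commented out here because it needs a live cluster:
#
# from elasticsearch import Elasticsearch, helpers
#
# es = Elasticsearch("http://localhost:9200")
# actions = (
#     {"_index": "destination_index", "_source": new_doc}
#     for hit in helpers.scan(es, index="source_index")
#     for new_doc in split_document(hit["_source"])
# )
# helpers.bulk(es, actions)
```

Unlike the ingest-pipeline answer, this produces separate documents (field1/field2 pairs) rather than one document with numbered fields, which matches what the question asked for.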