What are you trying to do?
I have location data for a set of sensors, and I want to run geo-spatial queries to find which sensors are in a specific area (query by polygon, bounding box, etc.). The location data (lat/lon) for these sensors may change in the future: I should be able to drop JSON files in ndjson format into a watched folder and have the new location data overwrite the existing data for each sensor. An example of the kind of query I have in mind is sketched below.
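For illustration, a bounding-box query against the resulting index, assuming fields.location is mapped as geo_point in the sensor-* template (index name and coordinates are placeholders):

GET sensor-location/_search
{
  "query": {
    "geo_bounding_box": {
      "fields.location": {
        "top_left":     { "lat": 20.0, "lon": 19.0 },
        "bottom_right": { "lat": 18.0, "lon": 21.0 }
      }
    }
  }
}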
I also have another filestream input for indexing the logs of these sensors.
I went through the docs for deduplication and for the filestream input's ndjson parser (linked under References below) and followed them exactly.
Show me your configs.
# ============================== Filebeat inputs ===============================
filebeat.inputs:
- type: filestream
  id: "log"
  enabled: true
  paths:
    - D:\EFK\Data\Log\*.json
  parsers:
    - ndjson:
        keys_under_root: true
        add_error_key: true
  fields.doctype: "log"

- type: filestream
  id: "loc"
  enabled: true
  paths:
    - D:\EFK\Data\Location\*.json
  parsers:
    - ndjson:
        keys_under_root: true
        add_error_key: true
        document_id: "Id" # Not working as expected.
  fields.doctype: "location"
  processors:
    - copy_fields:
        fields:
          - from: "Lat"
            to: "fields.location.lat"
        fail_on_error: false
        ignore_missing: true
    - copy_fields:
        fields:
          - from: "Long"
            to: "fields.location.lon"
        fail_on_error: false
        ignore_missing: true

# ---------------------------- Elasticsearch Output ----------------------------
output.elasticsearch:
  hosts: ["localhost:9200"]
  index: "sensor-%{[fields.doctype]}"

setup.ilm.enabled: false
setup.template:
  name: "sensor_template"
  pattern: "sensor-*"

# ------------------------------ Global Processors --------------------------
processors:
  - drop_fields:
      fields: ["agent", "ecs", "input", "log", "host"]
What does your input file look like?
{"Id":1,"Lat":19.000000,"Long":20.00000,"key1":"value1"}
{"Id":2,"Lat":19.000000,"Long":20.00000,"key1":"value1"}
{"Id":3,"Lat":19.000000,"Long":20.00000,"key1":"value1"}
It's the 'Id' field here that I want to use for deduplication: new documents with the same Id should overwrite the old ones.
Update 10/05/22: I have also tried working with:

- json.document_id: "Id"

  filebeat.inputs:
  - type: filestream
    id: "loc"
    enabled: true
    paths:
      - D:\EFK\Data\Location\*.json
    json.document_id: "Id"

- ndjson.document_id: "Id"

  filebeat.inputs:
  - type: filestream
    id: "loc"
    enabled: true
    paths:
      - D:\EFK\Data\Location\*.json
    ndjson.document_id: "Id"

- Straight up document_id: "Id"

  filebeat.inputs:
  - type: filestream
    id: "loc"
    enabled: true
    paths:
      - D:\EFK\Data\Location\*.json
    document_id: "Id"

- Trying to overwrite _id using copy_fields

  processors:
    - copy_fields:
        fields:
          - from: "Id"
            to: "@metadata_id"
        fail_on_error: false
        ignore_missing: true
The Elasticsearch config has nothing special other than security being disabled, and it's all running on localhost.
Version used for Elasticsearch, Kibana and Filebeat: 8.1.3
Please do comment if you need more info :)
References:
- Deduplication in Filebeat: https://www.elastic.co/guide/en/beats/filebeat/8.2/filebeat-deduplication.html#_how_can_i_avoid_duplicates
- Filebeat ndjson input: https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-input-filestream.html#_ndjson
- Copy_fields in Filebeat: https://www.elastic.co/guide/en/beats/filebeat/current/copy-fields.html#copy-fields
TL;DR
I believe you are close to the solution. There are indeed a few ways to set @metadata._id, and they are documented (see the deduplication reference above).
Solution
Set up:
I have this document:
data.ndjson
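Something like the following, where the first two lines carry an Id and the last one does not (values are illustrative):

{"Id":1,"Lat":19.0,"Long":20.0,"key1":"value1"}
{"Id":2,"Lat":19.0,"Long":20.0,"key1":"value1"}
{"Lat":19.0,"Long":20.0,"key1":"value1"}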
And these settings:
filebeat.yml
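A minimal sketch of such a test config, reading the file above and printing events to the console (the path and the console output are assumptions for the test; the important part is document_id nested under the ndjson parser):

filebeat.inputs:
- type: filestream
  id: "loc"
  enabled: true
  paths:
    - /tmp/data.ndjson        # illustrative path
  parsers:
    - ndjson:
        document_id: "Id"     # the field whose value becomes @metadata._id

output.console:
  pretty: true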
Results
When I run those, I get:
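Abridged events of the kind the console shows (version string and field ordering are illustrative; note that Id is removed from the event body and stored in @metadata._id):

{ "@metadata": { "beat": "filebeat", "type": "_doc", "version": "8.1.3", "_id": "1" }, "Lat": 19.0, "Long": 20.0, "key1": "value1", ... }
{ "@metadata": { "beat": "filebeat", "type": "_doc", "version": "8.1.3", "_id": "2" }, "Lat": 19.0, "Long": 20.0, "key1": "value1", ... }
{ "@metadata": { "beat": "filebeat", "type": "_doc", "version": "8.1.3" }, "Lat": 19.0, "Long": 20.0, "key1": "value1", ... }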
As you can see, the first two documents have an _id; the last one does not, since it has no Id field to take it from (Elasticsearch will auto-generate an id for it at index time).
Conclusion
I expect you have to modify your config so that document_id: "Id" is picked up by the ndjson parser of the location input. Note that keys_under_root is an option of the old log input's json settings; the filestream ndjson parser uses target: "" instead, so that line may be what keeps your parser config from applying. A sketch of the relevant part, with everything else left as you have it:
filebeat.yml
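filebeat.inputs:
- type: filestream
  id: "loc"
  enabled: true
  paths:
    - D:\EFK\Data\Location\*.json
  parsers:
    - ndjson:
        target: ""            # filestream's equivalent of keys_under_root: true
        add_error_key: true
        document_id: "Id"     # value is stored in @metadata._id; Id is removed from the body
  fields.doctype: "location"

The rest of your config (the copy_fields processors, the Elasticsearch output, and the template settings) can stay as it is.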