EDITED I'm trying to find out how to delete data from Elasticsearch that matches certain criteria. I know that older versions of Elasticsearch had a Delete By Query feature, but it had serious performance issues, so it was removed. I also know that there is a Java plugin for delete by query:
org.elasticsearch.plugin:delete-by-query:2.2.0
But I don't know whether the plugin has a better-performing implementation of delete or whether it's the same as the old one.
Also, someone suggested using scroll to remove data, but I only know how to retrieve data with scroll, not how to use it to delete!
Does anyone have an idea? The number of documents to remove in one call would be huge (over 50k documents).
Thanks in advance!
Finally used this guy's third option
You are correct that you want to use the scroll/scan. Here are the steps:
So you don't actually delete with the scroll/scan itself; you use it only as a tool to collect the IDs of the records you want to delete, and then delete those IDs in batches. This way you delete only N records at a time instead of all 50,000 in one chunk (which would cause you all kinds of problems).
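The approach described above can be sketched roughly as follows. This is only a minimal illustration, not the plugin's implementation: it assumes `es` behaves like an elasticsearch-py client exposing `search`, `scroll`, `bulk` and `clear_scroll` with a `body=` argument (newer client versions name some of these parameters differently), and the function name `delete_by_scroll`, the batch size and the keep-alive value are made up for the example.

```python
def delete_by_scroll(es, index, query, batch_size=1000, keep_alive="2m"):
    """Delete all documents matching `query`, one scroll page at a time.

    `es` is assumed to be an elasticsearch-py style client (hypothetical
    setup, e.g. es = Elasticsearch(["localhost:9200"])).
    Returns the number of delete actions sent.
    """
    # Open a scroll and fetch only IDs (no _source) for the first page.
    page = es.search(
        index=index,
        scroll=keep_alive,
        size=batch_size,
        body={"query": query, "_source": False},
    )
    scroll_id = page["_scroll_id"]
    sent = 0
    try:
        while page["hits"]["hits"]:
            actions = []
            for hit in page["hits"]["hits"]:
                meta = {"_index": hit["_index"], "_id": hit["_id"]}
                # Older clusters (such as 2.x) also report a mapping type.
                if "_type" in hit:
                    meta["_type"] = hit["_type"]
                actions.append({"delete": meta})
            # One bulk request per page, so only N documents go at a time.
            es.bulk(body=actions)
            sent += len(actions)
            # Fetch the next page of IDs from the same scroll.
            page = es.scroll(scroll_id=scroll_id, scroll=keep_alive)
            scroll_id = page["_scroll_id"]
    finally:
        # Release the scroll context on the server.
        es.clear_scroll(scroll_id=scroll_id)
    return sent
```

In real use you would probably also inspect each bulk response for per-item errors, and tune `batch_size` so that a single bulk request stays comfortably small.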