How to delete data from Elasticsearch through the Java API

I'm trying to find out how to delete data from Elasticsearch according to a criterion. I know that older versions of Elasticsearch had a Delete By Query feature, but it had serious performance issues, so it was removed. I also know that there is a Java plugin for delete by query:

org.elasticsearch.plugin:delete-by-query:2.2.0

But I don't know whether its delete implementation performs better or is the same as the old one.

Also, someone suggested using scroll to remove data, but I only know how to retrieve data with scroll, not how to use it to remove data!

Does anyone have an idea? The number of documents to remove in a call would be huge: over 50k documents.

Thanks in advance!

EDIT: I finally used this guy's third option.

There is 1 answer

jhilden (best answer)

You are correct that you want to use the scroll/scan. Here are the steps:

  1. Begin a new scroll/scan.
  2. Get the next N records.
  3. Take the IDs from each record and do a BulkDelete of those IDs.
  4. Go back to step 2 and repeat until the scroll returns no more records.

So you don't exactly delete using the scroll/scan; you just use it as a tool to get all the IDs of the records that you want to delete. This way you're only deleting N records at a time and not all 50,000 in one chunk (which would cause you all kinds of problems).
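
To make the loop concrete, here is a minimal sketch of the four steps in plain Java. It does not talk to a real cluster: `InMemoryIndex`, `openScroll`, `bulkDelete`, and `deleteMatching` are hypothetical names made up for illustration, and the in-memory map only stands in for an index. With the 2.x Java client, the corresponding calls should be `client.prepareSearch(...).setScroll(...)`, `client.prepareSearchScroll(scrollId)`, and `client.prepareBulk().add(client.prepareDelete(index, type, id))`, but check them against the documentation for your version.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;

// Sketch of the scroll-then-bulk-delete loop. InMemoryIndex is a hypothetical
// stand-in for the cluster so the example runs without Elasticsearch.
class ScrollDeleteSketch {

    /** Hypothetical stand-in for an index: document ID -> field value. */
    static class InMemoryIndex {
        final Map<String, String> docs = new TreeMap<>();

        // Step 1: "open a scroll" by snapshotting the IDs that match the criterion.
        List<String> openScroll(Predicate<String> criterion) {
            List<String> ids = new ArrayList<>();
            for (Map.Entry<String, String> e : docs.entrySet()) {
                if (criterion.test(e.getValue())) {
                    ids.add(e.getKey());
                }
            }
            return ids;
        }

        // Step 3: one "bulk request" that deletes a batch of IDs.
        void bulkDelete(List<String> ids) {
            for (String id : ids) {
                docs.remove(id);
            }
        }
    }

    /** Deletes every matching document, batchSize IDs per bulk call; returns the number of bulk calls made. */
    static int deleteMatching(InMemoryIndex index, Predicate<String> criterion, int batchSize) {
        List<String> snapshot = index.openScroll(criterion);      // step 1
        int bulkCalls = 0;
        for (int from = 0; from < snapshot.size(); from += batchSize) {
            int to = Math.min(from + batchSize, snapshot.size());
            List<String> page = snapshot.subList(from, to);       // step 2: next N records
            index.bulkDelete(page);                               // step 3: delete only this page
            bulkCalls++;
        }                                                         // step 4: loop until no pages remain
        return bulkCalls;
    }
}
```

Calling `ScrollDeleteSketch.deleteMatching(index, v -> v.equals("stale"), 1000)` would remove the matching documents 1,000 IDs at a time. The batch size is the N from the steps above, so it is the knob to tune if bulk requests turn out too large or too slow.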