Elasticsearch Reindex: Wait for completion


I'm trying to reindex 2695140 documents using NEST (C#). I need to measure how long it takes to reindex all the documents, so I've added logging. But after running for 1 minute, my code returns an invalid (failed) response, even though the documents are being indexed correctly, because the Reindex endpoint of Elasticsearch has already been triggered.

I want my code to wait until the reindex operation is completed so that I can calculate the total time taken to reindex. Below is the code I'm using:

return await Client.ReindexOnServerAsync(selector => selector
    .Source(src => src
        .Index(_config.SomeIndex))
    .Destination(dest => dest
        .Index(newIndexName)
        .OpType(OpType.Index))
    .WaitForCompletion(true));

Thanks in advance.

2

There are 2 answers

4
Sahil Gupta On

"I want my code to wait until the reindex operation is completed"

I don't know which programming language you are using, but for languages that follow the "one thread per request" model it is unwise to wait for the reindex operation. The time the operation takes grows with the number of documents to reindex, and waiting blocks the thread (consuming resources) until the operation is complete.

Instead, you should:

  1. Reindex without waiting for completion, e.g.:
POST _reindex?wait_for_completion=false
{
  "source": {
    "index": "book"
  },
  "dest": {
    "index": "book_new1"
  }
}

The response contains a task ID (the "task" field, in the form node_id:task_number).

  2. Use the Tasks API (GET _tasks/<task_id>) to track the completion of the task. The result also shows whether the request succeeded and how long the operation took. A sample Tasks API response looks like this:
{
  "completed" : true,
  "task" : {
    "node" : "jF8smI1eR1mwwNxl8_7z2A",
    "id" : 2427911,
    "description" : "reindex from [book] to [book_new1][_doc]",
    "start_time_in_millis" : 1600335207787,
    "running_time_in_nanos" : 640430472,
    "cancellable" : true,
    "headers" : { }
  },
  "response" : {
    "took" : 634,  // <====== Time taken by the operation, in milliseconds
    "timed_out" : false,
    "total" : 3,
    "updated" : 0,
    "created" : 3,
    "deleted" : 0,
    "batches" : 1,
    "version_conflicts" : 0,
    "noops" : 0
  }
}
  3. Check periodically (using cron, a scheduler, etc.) until the task completes, then take the required action.
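
The steps above can be sketched in Python as follows. This is only an illustration using the standard library, not the NEST client from the question: the node address in ES_URL, the absence of authentication, and the helper names http_json, start_reindex and wait_for_task are all assumptions made for this example. The loop has no upper bound on waiting time, so a real job would probably want one.

```python
import json
import time
import urllib.request

ES_URL = "http://127.0.0.1:9200"  # assumption: local, unauthenticated node


def http_json(method, path, body=None):
    """Send a request to Elasticsearch and decode the JSON reply."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(
        ES_URL + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def start_reindex(source, dest, request=http_json):
    """Step 1: start the reindex in the background and return its task ID."""
    reply = request(
        "POST",
        "/_reindex?wait_for_completion=false",
        {"source": {"index": source}, "dest": {"index": dest}},
    )
    return reply["task"]


def wait_for_task(task_id, request=http_json, poll_seconds=5.0, sleep=time.sleep):
    """Steps 2 and 3: poll the Tasks API until the task reports completion."""
    while True:
        info = request("GET", "/_tasks/" + task_id)
        if info.get("completed"):
            return info
        sleep(poll_seconds)
```

With a reachable cluster, wait_for_task(start_reindex("book", "book_new1")) would return the final task document, and the "took" value inside its "response" object is the elapsed time in milliseconds. Each poll is a short request, so no single call has to stay open for the whole reindex.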
0
Tomasz Hławiczka On

In addition to @sahil-gupta's answer, take a look at the wait_for_completion option of the Tasks API: after starting an asynchronous task, you can wait for a specific task, or even for all running tasks, to complete using another request:

curl "http://127.0.0.1:9200/_tasks/?wait_for_completion=true&timeout=100s"

Please note that, besides the standard timeout response (JSON, governed by the timeout query parameter), the request may end with a client-side error like this:

context deadline exceeded (Client.Timeout exceeded while awaiting headers)

so potentially long-running tasks need to be handled with a simple loop of such requests.
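
Such a loop might look like the Python sketch below. It waits on one specific task rather than on all tasks as the curl command above does. The names wait_once and wait_until_done, the node address, and the retry limits are assumptions chosen for illustration. Treating every transport or HTTP error as "not finished yet" is also a simplification: a real client would probably want to tell a timeout apart from, say, an unknown task ID.

```python
import json
import time
import urllib.request

ES_URL = "http://127.0.0.1:9200"  # assumption: same local node as the curl example


def wait_once(task_id, server_timeout="100s", client_timeout=120.0):
    """Issue one blocking GET _tasks/<task_id>?wait_for_completion=true request."""
    url = "%s/_tasks/%s?wait_for_completion=true&timeout=%s" % (
        ES_URL, task_id, server_timeout)
    # Keep the client timeout above the server-side one so the server answers first.
    with urllib.request.urlopen(url, timeout=client_timeout) as resp:
        return json.load(resp)


def wait_until_done(task_id, attempt=wait_once, max_attempts=50,
                    pause=1.0, sleep=time.sleep):
    """Repeat the blocking request until the task reports completed."""
    last_error = None
    for _ in range(max_attempts):
        try:
            info = attempt(task_id)
        except OSError as exc:
            # URLError, HTTPError and socket timeouts all derive from OSError.
            last_error = exc
            sleep(pause)
            continue
        if info.get("completed"):
            return info
    raise TimeoutError(
        "task %s not completed after %d attempts" % (task_id, max_attempts)
    ) from last_error
```

Each request blocks for at most the server-side timeout, so the loop spends most of its time waiting inside Elasticsearch rather than polling, and max_attempts bounds the total wait.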