I'm providing a sample application to reproduce the issue: https://github.com/PSHREYASHOLLA/SamplebatchApplication.
It's a Maven project, so you can run `mvn install`, which creates `target/SamplebatchApplication-0.0.1-SNAPSHOT.jar`. You can then start it like any Spring Boot app (Liquibase enabled): `java -jar SamplebatchApplication-0.0.1-SNAPSHOT.jar`.
As you can see in the application.properties file, we point the app to a PostgreSQL database. All our batch configuration is in https://github.com/PSHREYASHOLLA/SamplebatchApplication/blob/main/src/main/java/com/example/postgresql/model/FEBPDBConfig.java.
Please start the batch process by calling the REST POST API http://localhost:8080/batch/trigger-or-resume-application-batch-job with the JSON body `{ "appRestartJobExecutionId": "" }`. If we call this with an empty appRestartJobExecutionId, the flow is: com.example.postgresql.controller.BatchController.triggerOrResumeApplicationBatchJobByB2E() ---> com.example.postgresql.model.FebpApplicationJobServiceImpl.triggerApplicationBatchJob() ---> JobLauncher.run(). The job reads 50 records from febp_emp_detail_test in the reader and writes the updated records to febp_emp_tax_detail_test in the writer. This is the happy flow.
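For context, here is a minimal sketch of what that trigger path typically looks like in Spring Batch (class, field, and job names below are assumptions for illustration, not the exact code from the repository):

```java
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;

// Hypothetical sketch of the trigger path in FebpApplicationJobServiceImpl.
public class JobTriggerSketch {

    private final JobLauncher jobLauncher;
    private final Job employeeTaxJob;

    public JobTriggerSketch(JobLauncher jobLauncher, Job employeeTaxJob) {
        this.jobLauncher = jobLauncher;
        this.employeeTaxJob = employeeTaxJob;
    }

    public Long trigger() throws Exception {
        // Unique parameters so each trigger creates a new JobInstance
        JobParameters params = new JobParametersBuilder()
                .addLong("run.id", System.currentTimeMillis())
                .toJobParameters();
        JobExecution execution = jobLauncher.run(employeeTaxJob, params);
        // The execution id the client can pass back later as appRestartJobExecutionId
        return execution.getId();
    }
}
```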
Now, if you call the above API and kill the server after, say, 5 seconds, only a partial commit happens into febp_emp_tax_detail_test and the batch status remains in the STARTED state. As you can see in https://github.com/PSHREYASHOLLA/SamplebatchApplication/blob/main/src/main/java/com/example/postgresql/model/EmployeeTaxCalculationProcessor.java:
```java
if (item.getEmpId().equals("emp0032")) {
    TimeUnit.SECONDS.sleep(30);
}
```
The records emp0031 to emp0035 are not committed to the DB, because I kill the server during that sleep. Now, say I restart the server and call the same POST API with the failed job execution ID. It now calls com.example.postgresql.controller.BatchController.triggerOrResumeApplicationBatchJobByB2E() ---> com.example.postgresql.model.FebpApplicationJobServiceImpl.resumeApplicationBatchJob() ---> jobOperator.restart(failedBatchExecutionId).
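For completeness, the resume path boils down to something like this sketch (names are assumed; the real code is in FebpApplicationJobServiceImpl):

```java
import org.springframework.batch.core.launch.JobOperator;

// Hypothetical sketch of the resume path.
public class JobResumeSketch {

    private final JobOperator jobOperator;

    public JobResumeSketch(JobOperator jobOperator) {
        this.jobOperator = jobOperator;
    }

    public Long resume(long failedBatchExecutionId) throws Exception {
        // Creates a new JobExecution for the same JobInstance; the step is
        // expected to resume from the last committed chunk recorded in its
        // ExecutionContext.
        return jobOperator.restart(failedBatchExecutionId);
    }
}
```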
The job changes state to COMPLETED, but the pending records are not processed again and no entries are written to the DB.
@Mahmoud Ben Hassine I removed the throttleLimit as well. Please check the latest code.
The step you shared is a multi-threaded step, as it is configured with a task executor here.
Concurrency in chunk-oriented steps is not compatible with restartability. That's why the restart feature is not working in that case. If you remove the task executor configuration, the restart should work as expected.
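For illustration, here is a sketch of a chunk-oriented step without the task executor (bean, type, and step names are placeholders, not the exact ones from the sample application):

```java
import org.springframework.batch.core.Step;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.core.step.builder.StepBuilder;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.transaction.PlatformTransactionManager;

@Configuration
public class StepConfigSketch {

    // Placeholder item types standing in for the sample app's entities.
    static class EmpDetail { }
    static class EmpTaxDetail { }

    @Bean
    public Step employeeTaxStep(JobRepository jobRepository,
                                PlatformTransactionManager transactionManager,
                                ItemReader<EmpDetail> reader,
                                ItemProcessor<EmpDetail, EmpTaxDetail> processor,
                                ItemWriter<EmpTaxDetail> writer) {
        return new StepBuilder("employeeTaxStep", jobRepository)
                .<EmpDetail, EmpTaxDetail>chunk(10, transactionManager)
                .reader(reader)
                .processor(processor)
                .writer(writer)
                // no .taskExecutor(...) here: a single-threaded chunk-oriented step
                // records its progress in the ExecutionContext and is restartable
                .build();
    }
}
```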
If you want to scale your job while preserving restartability, you can use other concurrency/parallelism techniques like a partitioned step.
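For example, here is a rough sketch of a locally partitioned step (all names and the record count are illustrative assumptions; see the Spring Batch reference documentation on partitioning for the full picture):

```java
import java.util.HashMap;
import java.util.Map;

import org.springframework.batch.core.Step;
import org.springframework.batch.core.partition.support.Partitioner;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.core.step.builder.StepBuilder;
import org.springframework.batch.item.ExecutionContext;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.task.SimpleAsyncTaskExecutor;

@Configuration
public class PartitionedStepSketch {

    // Simple id-range partitioner: each partition gets a [minId, maxId] slice.
    @Bean
    public Partitioner idRangePartitioner() {
        return gridSize -> {
            Map<String, ExecutionContext> partitions = new HashMap<>();
            int total = 50; // e.g. the 50 employee records in the sample
            int rangeSize = total / gridSize;
            for (int i = 0; i < gridSize; i++) {
                ExecutionContext context = new ExecutionContext();
                context.putInt("minId", i * rangeSize + 1);
                context.putInt("maxId", (i == gridSize - 1) ? total : (i + 1) * rangeSize);
                partitions.put("partition" + i, context);
            }
            return partitions;
        };
    }

    // The manager step fans out workerStep executions, one per partition.
    // Each worker execution tracks its own progress, so restart works per partition.
    @Bean
    public Step managerStep(JobRepository jobRepository, Step workerStep, Partitioner partitioner) {
        return new StepBuilder("managerStep", jobRepository)
                .partitioner("workerStep", partitioner)
                .step(workerStep)
                .gridSize(4)
                .taskExecutor(new SimpleAsyncTaskExecutor())
                .build();
    }
}
```

The worker step's reader can then pick up its slice by binding minId/maxId from the step execution context, e.g. a @StepScope reader with `#{stepExecutionContext['minId']}`.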