We are using Spring Cloud Data Flow running on Kubernetes. Sometimes when we run a large number of partitions (pods), some partitions fail. We want to restart only those failed tasks.
Based on the suggestion in this Stack Overflow post: Spring cloud task in dataflow dashboard is complete but one partition of spring batch failed, we added the flags below to the command-line arguments of the DeployerPartitionHandler.
List<String> commandLineArgs = new ArrayList<>();
commandLineArgs.add("--spring.profiles.active=worker");
commandLineArgs.add("--spring.cloud.task.initialize-enabled=false");
commandLineArgs.add("--spring.batch.initializer.enabled=false");
commandLineArgs.add("--spring.cloud.task.closecontextEnabled=true");
commandLineArgs.add("--spring.cloud.task.batch.failOnJobFailure=true");
commandLineArgs.add("--spring.batch.job.enabled=false");
Our job configuration is very similar to this class: https://github.com/spring-cloud/spring-cloud-task/blob/main/spring-cloud-task-samples/partitioned-batch-job/src/main/java/io/spring/JobConfiguration.java#LL100-L101C67.
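For completeness, here is a sketch of the manager-side step and job, following the builder-factory style used in that sample (bean and step names here are illustrative, not our exact ones):

import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.partition.PartitionHandler;
import org.springframework.batch.core.partition.support.Partitioner;
import org.springframework.context.annotation.Bean;

@Bean
public Step managerStep(StepBuilderFactory stepBuilderFactory, Partitioner partitioner,
        Step workerStep, PartitionHandler partitionHandler) {
    // Manager step fans the partitions out to worker pods via the partition handler
    return stepBuilderFactory.get("managerStep")
            .partitioner(workerStep.getName(), partitioner)
            .step(workerStep)
            .partitionHandler(partitionHandler)
            .build();
}

@Bean
public Job partitionedJob(JobBuilderFactory jobBuilderFactory, Step managerStep) {
    return jobBuilderFactory.get("partitionedJob")
            .start(managerStep)
            .build();
}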
However, when we restart the job from the SCDF Dashboard, it restarts all of the partitions again. We want to start only the failed partitions. Are we missing some configuration in the JobConfiguration class? How can we restart just the failed partitions using the job number?
Thanks in advance.