JSR 352 batch with a retryable and skippable exception may process items many times


I have a batch job implemented with JSR-352 (using JBeret on WildFly).

I have a chunk with item-count 15, and java.lang.Exception is configured as both a retryable and a skippable exception.

When there are many exceptions, most of the items are processed multiple times. Take the extreme case where every item throws an exception in the writer:

  • First 15 items are read
  • Exception occurs on first item
  • Chunk is rolled back and configured with item-count = 1
  • First item is read
  • Exception occurs again, item is skipped
  • Proceed with the other 14 items, exception may occur on every item, every item is skipped
  • After the first 15 items the chunk is back with item-count = 15
  • Items 16-30 are read
  • Exception occurs again
  • Reader is rolled back to latest checkpoint

At this point there is still no checkpoint, because no item has been processed successfully yet. Hence the reader starts with the first item again, and all 30 items are processed with item-count = 1. The same happens after every following chunk.

If there are many such failures the batch would process all items again and again.

I think the checkpoint also needs to be set for skipped items, because a skipped item should not be processed again.
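To put numbers on the difference, here is a toy model of the sequence described above. It is only a sketch of the behavior as I understand it, not JBeret code: the class name, the method `countReads`, and the flag `checkpointOnSkip` are made up for illustration, and it assumes every write fails with an exception that is both retryable and skippable.

```java
public class SkipCheckpointSimulation {

    /**
     * Counts how many times the reader is called when every chunk fails in the writer.
     *
     * @param totalItems       number of items in the input
     * @param itemCount        configured chunk size (item-count)
     * @param checkpointOnSkip true if a checkpoint is taken after each skipped item
     */
    public static int countReads(int totalItems, int itemCount, boolean checkpointOnSkip) {
        int reads = 0;
        int checkpoint = 0; // index of the first item not covered by a checkpoint
        int position = 0;   // index of the next item to read in normal mode

        while (position < totalItems) {
            int chunkEnd = Math.min(position + itemCount, totalItems);

            // Normal chunk: the items are read, then the write fails.
            reads += chunkEnd - position;

            // Rollback: the reader goes back to the last checkpoint and the
            // items are re-driven one at a time; each one fails and is skipped.
            for (int i = checkpoint; i < chunkEnd; i++) {
                reads++;
                if (checkpointOnSkip) {
                    checkpoint = i + 1;
                }
            }
            position = chunkEnd;
        }
        return reads;
    }

    public static void main(String[] args) {
        // 45 items, item-count 15, every write fails
        System.out.println(countReads(45, 15, false)); // prints 135
        System.out.println(countReads(45, 15, true));  // prints 90
    }
}
```

Without a checkpoint after a skip, the number of reads grows quadratically with the number of failed chunks (15+15, 15+30, 15+45, ...). With one, each item is read exactly twice.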

I think this is a bug in the specification, so I have already opened an issue there (https://github.com/WASdev/standards.jsr352.batch-spec/issues/15). Or am I wrong, and have I misunderstood the implementation?

How is this implemented in Spring Batch?

Answer by Scott Kurz (best answer)

I think the specification is clear enough, which suggests this could be a JBeret bug (assuming it's not an application issue).

In the spec (an unofficial version here), the section:

8.2.1.4.3 Retry and Skip the Same Exception

says that during a retry with rollback, the items are processed one at a time (in one-item chunks), and that skip takes precedence during retry.

So if a skippable exception occurs during retry, that item would just be skipped, and an updated checkpoint should be persisted. This is how WebSphere Liberty Batch, the JSR 352 implementation I work on, does it.
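As a rough illustration of that reading, the retry pass could be sketched like this. This is not code from Liberty or JBeret: the class `RetrySkipSketch`, the method `retryOneByOne`, and the `write` stub (which fails for odd values) are all hypothetical, and the "checkpoint" is just an index instead of persisted reader state.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RetrySkipSketch {

    /** Outcome of re-driving a rolled-back chunk one item at a time. */
    public static final class Result {
        public final List<Integer> written = new ArrayList<>();
        public final List<Integer> skipped = new ArrayList<>();
        public int checkpoint; // index of the next item to read after a rollback or restart
    }

    /** Hypothetical writer stub: fails for odd item values. */
    static void write(int item) throws Exception {
        if (item % 2 != 0) {
            throw new Exception("cannot write item " + item);
        }
    }

    /** Re-drives items[checkpoint..chunkEnd) in one-item chunks. */
    public static Result retryOneByOne(List<Integer> items, int checkpoint, int chunkEnd) {
        Result result = new Result();
        result.checkpoint = checkpoint;
        for (int i = checkpoint; i < chunkEnd; i++) {
            int item = items.get(i);
            try {
                write(item);
                result.written.add(item);
            } catch (Exception e) {
                // The exception is both retryable and skippable;
                // during retry, skip takes precedence, so the item is dropped.
                result.skipped.add(item);
            }
            // The one-item chunk ends here whether the item was written or skipped,
            // so the checkpoint moves past it and the item is never read again.
            result.checkpoint = i + 1;
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> items = Arrays.asList(1, 2, 3, 4, 5);
        Result r = retryOneByOne(items, 0, items.size());
        System.out.println("written=" + r.written
                + " skipped=" + r.skipped
                + " checkpoint=" + r.checkpoint);
    }
}
```

The point of the sketch is the last statement in the loop: the checkpoint advances after a skip as well as after a successful write, so a later rollback never goes back past an item that was already skipped.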

So I'd suggest putting together a small project that reproduces the problem and opening a JBeret issue if it still looks like a bug. At this point, I don't see a spec issue.