What does Spring Batch do behind the scenes with JPA batch-update chunk processing?


I think Spring Batch does so much behind the scenes that it is very hard to understand.

    For instance, what if I want to work with and save 2 objects (parent and children) in 1 processor?
    How do chunks accumulate DB/JPA statements such as a save/insert or a delete if multiple
    objects are saved in the processor? If the chunk size is 50, does it gather up 50 processor
    calls' worth of actions and then execute them all in the writer?
    

I have this scenario:

    I have a person object, which has item objects.
    Each item has a part and a type.

    Say I get these records:
    person-A  Item-11,header
    person-A  Item-22,body
    person-B  Item-33,title
    person-B  Item-44,index
    

My processor expects to receive a fileInfo and return a person.

    In my processor I am currently trying to do all the processing I need,
    so I might update the person and call personRepo.save( myperson );
    I might also need to insert items, like ItemRepo.save( new Item(..;.) );
    My writer does not actually do any saves.
    

Questions:

    Will these get grouped into a batch update SQL statement?
    When will they get persisted to the DB? In the writer only, and not in the processor?
    Does the batch update also group the ItemRepo saves and hold them until the writer?
    There might be a person.save, an item.save and an item2.save.
    How does that work? Can I call repo.save() anywhere, and when do those saves happen?
    
    Then for chunking:
    What if the next person, B, needs to do a lookup on person A?
    If A had changes, will the lookup miss those changes because A is in the same chunk
    and has not been saved to the DB yet?
    If person A got deleted, will the lookup still find person A or not?

    Will I need to set the chunk size to 1 so that everything gets saved to the DB
    after each write and I can see all previous changes?

There is 1 answer

johnnyutts On

Will these get grouped into a BatchUpdate SQL?

No. Statements are not batched automatically; you will have to configure batching yourself in your JPA provider.
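As a rough illustration of what "configure this yourself" can look like, here is a minimal sketch that assumes Hibernate is the JPA provider (the question does not say which provider is in use). The three property keys are standard Hibernate settings; the class and method names are hypothetical, and the batch size of 50 simply mirrors the chunk size mentioned in the question.

```java
import java.util.Properties;

// Hypothetical helper: collects the Hibernate settings that turn on JDBC batching.
// Assumption: Hibernate is the JPA provider behind the repositories.
class JdbcBatchingSettings {

    static Properties hibernateBatchProperties(int batchSize) {
        Properties props = new Properties();
        // Number of statements Hibernate groups into one JDBC batch.
        props.setProperty("hibernate.jdbc.batch_size", Integer.toString(batchSize));
        // Order statements by entity type so person and item statements
        // are not interleaved, which would otherwise break up the batches.
        props.setProperty("hibernate.order_inserts", "true");
        props.setProperty("hibernate.order_updates", "true");
        return props;
    }

    public static void main(String[] args) {
        // Print the settings as key=value pairs.
        hibernateBatchProperties(50)
                .forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

These properties could be handed to the entity manager factory as JPA properties, or in a Spring Boot application set with the spring.jpa.properties. prefix. Note that Hibernate cannot batch inserts for entities that use IDENTITY id generation.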

when will they get persisted to the db? in the writer only, and not the processor?

If you use JpaItemWriter, it will only call persist or merge on items that are not already managed (that is, not already in the persistence context). The default behaviour of RepositoryItemWriter is to call saveAll on the list of items in the current chunk. Nothing is committed to the database until the chunk's transaction commits, after the writer completes. If you are doing all your CRUD operations in the processor, the writer becomes redundant, so you'll likely just want to create an empty writer.
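To make the ordering concrete, here is a simplified model of a chunk-oriented step written in plain Java. It is only a sketch of the flow, not Spring Batch's actual classes: the class name, the log messages and the upper-casing "processor" are all made up for illustration. It shows that each chunk is read and processed item by item, the writer is called once with the whole chunk, and the commit happens once at the end.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of chunk-oriented processing; NOT Spring Batch's real implementation.
class ChunkLoopSketch {

    // Returns a log of events so the order of operations can be inspected.
    static List<String> run(List<String> input, int chunkSize) {
        List<String> log = new ArrayList<>();
        for (int start = 0; start < input.size(); start += chunkSize) {
            log.add("begin tx");                       // one transaction per chunk
            int end = Math.min(start + chunkSize, input.size());

            // Read phase: collect up to chunkSize items.
            List<String> readItems = new ArrayList<>();
            for (String record : input.subList(start, end)) {
                log.add("read " + record);
                readItems.add(record);
            }

            // Process phase: the processor sees one item at a time.
            List<String> processed = new ArrayList<>();
            for (String item : readItems) {
                log.add("process " + item);
                processed.add(item.toUpperCase());     // stand-in for real processing
            }

            // Write phase: the writer receives the whole chunk in a single call.
            log.add("write " + processed);
            log.add("commit");                         // changes become durable here
        }
        return log;
    }

    public static void main(String[] args) {
        run(List.of("a", "b", "c"), 2).forEach(System.out::println);
    }
}
```

In this model, any save issued during the process phase belongs to the same transaction as the write phase, which is why it is not committed before the end of the chunk.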

Does the batch update group in the ItemRepo saves also and not do them until the writer?
it might have person.save and item.save and item2.save
How does that work?  Can I do random repo.save() and when do those happen?

You can do what you want in the processor with regard to saving different items using different repositories, but the changes won't be committed to the DB until the chunk is committed.

Then for chunking: What if the next person B needs to do a lookup on Person A; if A had changes, will the lookup not see the changes because it is in the same chunk and it is not saved to the db yet? if person A got deleted, will the lookup still find person A or not? Will I need to make chunksize = 1 so that everything gets saved to the db after each write so I can see all previous changes?

Using a chunk size of 1 is inefficient. If an entity is already managed, a lookup will return it from the persistence context, including any changes made earlier in the same chunk. If this is not sufficient and you have queries that bypass the persistence context, you can always flush your saves to the DB by calling flush() on the EntityManager.
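The difference between a managed lookup and a query that bypasses the persistence context can be pictured with a toy model. This is a sketch in plain Java that only imitates the idea; it does not use the JPA API, and every name in it is hypothetical.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only: imitates the idea of a persistence context, not the JPA API.
class PersistenceContextSketch {
    private final Map<String, String> database = new HashMap<>();      // rows that SQL can see
    private final Map<String, String> managed = new HashMap<>();       // entities held in the context
    private final Map<String, String> pending = new LinkedHashMap<>(); // changes not flushed yet

    // Saving makes the entity managed and queues the change for the next flush.
    void save(String id, String value) {
        managed.put(id, value);
        pending.put(id, value);
    }

    // Lookup by id: answered from the context when the entity is already managed.
    String find(String id) {
        if (managed.containsKey(id)) {
            return managed.get(id);
        }
        String row = database.get(id);
        if (row != null) {
            managed.put(id, row);
        }
        return row;
    }

    // Stand-in for a query that bypasses the context, such as plain JDBC.
    String directQuery(String id) {
        return database.get(id);
    }

    // Flushing pushes the queued changes to the database.
    void flush() {
        database.putAll(pending);
        pending.clear();
    }

    public static void main(String[] args) {
        PersistenceContextSketch ctx = new PersistenceContextSketch();
        ctx.save("person-A", "updated");
        System.out.println("find before flush:        " + ctx.find("person-A"));
        System.out.println("directQuery before flush: " + ctx.directQuery("person-A"));
        ctx.flush();
        System.out.println("directQuery after flush:  " + ctx.directQuery("person-A"));
    }
}
```

In a real application the flush step corresponds to calling flush() on the injected EntityManager inside the processor, which sends the pending SQL within the current chunk transaction without committing it.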