Recently I came across an interesting article on how to perform batch operations on a database using a Spring Data repository - http://knes1.github.io/blog/2015/2015-10-19-streaming-mysql-results-using-java8-streams-and-spring-data.html - and I've implemented the "Solution Using Paging" it describes.
Everything works fine, however I don't understand how clearing the EntityManager can impact other operations that may hit the database while the batch is running.
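For context, the paging approach boils down to roughly this (a simplified sketch using the autowired entityManager and myRepository fields shown further down; MyEntity, process(...) and the page size are just placeholders for my actual code):

    // Simplified sketch of the paging batch loop; MyEntity, MyRepository and
    // process(...) stand in for the real code.
    public void runBatch() {
        Pageable pageable = new PageRequest(0, 1000); // page size picked arbitrarily
        Page<MyEntity> page;
        do {
            page = myRepository.findAll(pageable);
            for (MyEntity entity : page) {
                process(entity); // per-row work
            }
            // flush pending changes, then detach everything so the persistence
            // context does not keep growing across pages
            entityManager.flush();
            entityManager.clear();
            pageable = pageable.next();
        } while (page.hasNext());
    }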
- What is the relation between the EntityManager and a Spring Data repository? Why does clearing the EntityManager affect memory usage when I only perform operations on the Spring Data repository?
- How can clearing the EntityManager affect other read/write operations that occur during batch processing?
- How can I create dedicated instances of the EntityManager and a Spring Data repository? At the moment I'm using basic autowiring:
    @PersistenceContext
    private EntityManager entityManager;

    @Autowired
    private MyRepository myRepository;
- Does creating separate instances of the EntityManager and the Spring Data repository for batch processing make any sense?
Thanks for your help
A Spring Data JpaRepository has a reference to an EntityManager and uses it to implement its methods. For the fixed methods that make up the JpaRepository interface you can find the implementations in SimpleJpaRepository.
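A small illustration of that relationship (just a sketch, assuming the injected entityManager and myRepository from the question, a MyEntity with id 42, and that the whole method runs in one transaction):

    @Transactional
    public void illustrateTracking() {
        MyEntity entity = myRepository.findOne(42L);            // loaded through the repository, i.e. via its EntityManager
        boolean managed = entityManager.contains(entity);       // true: the EntityManager now tracks this entity
        entityManager.clear();                                   // detaches every entity it tracks
        boolean stillManaged = entityManager.contains(entity);  // false: the entity is no longer managed
    }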
The EntityManager is a JPA construct. All managed entities are connected to one EntityManager. This is how the JPA implementation learns when an entity gets modified and therefore needs saving on the next flush event. In order to do that, the EntityManager has to keep track of all the entities it encounters. With a typical web application this means: a request loads or creates a couple of entities, and the EntityManager gets thrown away at the end of the request. Since only a couple of entities are involved, holding a reference to all of them is not much of a problem.
In a batch setting things often work quite differently. Without special thought the natural thing would be to process everything in a single transaction, and therefore with a single EntityManager. Thus you keep many thousand or even millions of references to entities that in many cases won't get touched anymore. This puts pressure on memory and also hurts performance whenever the EntityManager has to go through its list of entity references, for example when it flushes and has to dirty-check every managed entity.
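For contrast, that "natural" batch shape would look something like this (sketch only; process(...) is a placeholder):

    // Everything runs in one transaction with one EntityManager, so every loaded
    // entity stays managed until the very end of the batch.
    @Transactional
    public void naiveBatch() {
        for (MyEntity entity : myRepository.findAll()) { // loads and tracks every row
            process(entity);
        }
        // only when the transaction ends are all those tracked entities released
    }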
But there are also benefits to the EntityManager and its 1st level cache (the map of entities it keeps): entities only get loaded once. If your batch references some already existing entities (typical "master data") over and over again, they don't get loaded over and over again, but just once per EntityManager. So you probably also don't want to flush and clear your EntityManager after every processed row.

As for creating dedicated instances of the EntityManager and the repository: you don't. There are ways to create dedicated instances, after all it is just Java code, but you really don't want to do that. Instead just use the single repository instance that gets generated per repository interface. It gets injected with a single proxy for the EntityManager. The actual EntityManager lives for one transaction and gets swapped in at the beginning of the transaction inside the proxy mentioned above. That is part of Spring's scope magic.

What you need to do is annotate your methods with @Transactional in such a way that the transactions stay within a reasonable size.

And does a separate EntityManager and repository instance for batch processing make sense? No, I don't think so. It just makes things complicated.
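To illustrate that last point: one common way to keep transactions reasonably sized is to drive the batch from a non-transactional loop that calls a @Transactional method per chunk. The sketch below assumes the MyRepository/MyEntity from the question; the chunk method sits in a separate bean so the @Transactional proxy actually kicks in (self-invocation within one class would bypass it):

    // ChunkProcessor.java
    @Service
    public class ChunkProcessor {

        @Autowired
        private MyRepository myRepository;

        // One transaction per chunk: the transaction-scoped EntityManager and all
        // entities loaded in this method are released when the method returns.
        @Transactional
        public void processChunk(int pageNumber, int pageSize) {
            Page<MyEntity> page = myRepository.findAll(new PageRequest(pageNumber, pageSize));
            for (MyEntity entity : page) {
                // per-row work goes here
            }
        }
    }

    // BatchRunner.java
    @Service
    public class BatchRunner {

        @Autowired
        private ChunkProcessor chunkProcessor;

        // No transaction here: each chunk commits (or rolls back) on its own.
        public void runBatch(int totalPages, int pageSize) {
            for (int pageNumber = 0; pageNumber < totalPages; pageNumber++) {
                chunkProcessor.processChunk(pageNumber, pageSize);
            }
        }
    }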