Recently I came across an interesting article on how to perform batch operations on a database using a Spring repository - http://knes1.github.io/blog/2015/2015-10-19-streaming-mysql-results-using-java8-streams-and-spring-data.html - and I've implemented the solution using paging.

Everything works fine. However, I don't understand how clearing the entity manager can impact other operations that may occur on the database while the batch is being processed.
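For reference, a simplified version of what I implemented (`MyEntity`/`MyRepository` are placeholders for my actual types; `PageRequest.of` requires Spring Data 2.x, older versions use `new PageRequest(...)`):

```java
import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.domain.Page;
import org.springframework.data.domain.PageRequest;
import org.springframework.data.domain.Pageable;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
public class BatchProcessor {

    @PersistenceContext
    private EntityManager entityManager;

    @Autowired
    private MyRepository myRepository;

    @Transactional
    public void processAll() {
        Pageable pageable = PageRequest.of(0, 1000);
        Page<MyEntity> page;
        do {
            page = myRepository.findAll(pageable);
            page.forEach(this::process);
            // Push pending changes and detach everything processed so far,
            // so the persistence context doesn't grow page after page.
            entityManager.flush();
            entityManager.clear();
            pageable = pageable.next();
        } while (page.hasNext());
    }

    private void process(MyEntity entity) {
        // ... per-row work ...
    }
}
```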
- What is the relation between the Spring entity manager and a Spring Data repository? Why does clearing the entity manager impact used memory if I only perform operations on the Spring Data repository?
- How can clearing the entity manager affect other read/write operations that occur during batch processing?
- How do I create dedicated instances of the entity manager and a Spring Data repository? Right now I'm using basic autowiring:

  ```java
  @PersistenceContext
  private EntityManager entityManager;

  @Autowired
  private MyRepository myRepository;
  ```
- Does creating separate instances of the entity manager and the repository for batch processing make any sense?
Thanks for your help
A Spring Data `JpaRepository` has a reference to an `EntityManager` and uses it to implement its methods. For the fixed methods that make up the `JpaRepository` interface you can find the implementations in `SimpleJpaRepository`.
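For illustration, the `save` method in `SimpleJpaRepository` is essentially a thin wrapper around the `EntityManager` (paraphrased, not the exact source):

```java
// Paraphrased sketch of SimpleJpaRepository.save(..): the repository
// delegates straight to the EntityManager it was created with.
@Transactional
public <S extends T> S save(S entity) {
    if (entityInformation.isNew(entity)) {
        em.persist(entity);      // new entity: make it managed
        return entity;
    } else {
        return em.merge(entity); // existing entity: merge its state into the persistence context
    }
}
```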
The `EntityManager` is a JPA construct. All managed entities are connected to one `EntityManager`. This is how the JPA implementation learns when an entity gets modified and therefore needs saving on the next flush event. In order to do that, the `EntityManager` has to keep track of all the entities it encounters. With a typical web application this means:

- an `EntityManager` gets created at the beginning of the request,
- a handful of entities get loaded or modified,
- everything gets flushed and the `EntityManager` gets closed at the end of the request.

Since only a couple of entities are involved, holding a reference to all of them is not much of a problem.
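As a small illustration of that tracking (a hypothetical `Customer` entity and service, not from the question): a change to a managed entity is persisted on flush without any explicit save call.

```java
import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;

import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
public class CustomerService {

    @PersistenceContext
    private EntityManager entityManager;

    @Transactional
    public void rename(long customerId, String newName) {
        // find() returns a *managed* entity: this EntityManager now keeps
        // a reference to it (and a snapshot of its state).
        Customer customer = entityManager.find(Customer.class, customerId);
        customer.setName(newName);
        // No explicit save needed: on commit the EntityManager flushes,
        // notices the changed name during dirty checking and issues an UPDATE.
    }
}
```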
In a batch setting, things often work quite differently. Without special thought, the natural thing would be:

- one transaction, and with it one `EntityManager`, for the complete batch job,
- every row processed through that single `EntityManager`.

Thus you keep many thousand or even millions of references to entities that in many cases won't get touched anymore. This puts pressure on memory and also affects performance whenever the `EntityManager` accesses its list of entity references. The prime example of such an operation is flushing the `EntityManager`, which has to dirty-check every single managed entity.
But there are also benefits to the `EntityManager` and its 1st level cache (the map of entities it keeps): entities get loaded only once. If your batch references some already existing entities (typical "master data") over and over again, they don't get loaded over and over again, but just once per `EntityManager`. So you probably also don't want to flush and clear your `EntityManager` after every processed row.
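A quick sketch of that cache behavior, assuming a hypothetical `MasterData` entity and an injected `EntityManager` as in the snippets above:

```java
@Transactional
public void loadMasterDataTwice(long id) {
    MasterData first = entityManager.find(MasterData.class, id);  // SELECT against the database
    MasterData second = entityManager.find(MasterData.class, id); // answered from the 1st level cache
    // Within one persistence context both calls return the very same instance:
    assert first == second;

    entityManager.clear();
    MasterData third = entityManager.find(MasterData.class, id);  // cache cleared: SELECT again
    assert third != first; // the old instance is now detached
}
```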
As for how to create dedicated instances of the `EntityManager` and repository: you don't. There are ways to do it; after all, it is just Java code. But you really don't want to. Instead, just use the single repository instance that gets generated per type. It gets injected with a single proxy for the `EntityManager`. The actual `EntityManager` lives for a transaction and gets swapped in at the beginning of the transaction inside the proxy mentioned above; that is part of the Spring scope magic. What you need to do is annotate your methods with `@Transactional` in such a way that the transactions stay within a reasonable size (see the sketch below).

And does creating a separate instance for batch processing make sense? No, I don't think so. It just makes things complicated.
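To illustrate the `@Transactional` advice above, a minimal sketch of chunked processing, with hypothetical bean and entity names. Note that `@Transactional` is proxy-based, so the transactional method sits on a separate bean to make sure the proxy is actually invoked; `findAllById` assumes Spring Data 2.x.

```java
// Two beans (separate files in a real project). Each chunk runs in its
// own transaction, and therefore its own EntityManager.
@Service
public class BatchDriver {

    @Autowired
    private ChunkWorker chunkWorker;

    public void runBatch(List<Long> allIds) {
        int chunkSize = 1_000;
        for (int start = 0; start < allIds.size(); start += chunkSize) {
            List<Long> chunk = allIds.subList(start, Math.min(start + chunkSize, allIds.size()));
            // At most one chunk's worth of entities is tracked at any time.
            chunkWorker.processChunk(chunk);
        }
    }
}

@Service
public class ChunkWorker {

    @Autowired
    private MyRepository myRepository;

    @Transactional // persistence context begins and ends with this method
    public void processChunk(List<Long> ids) {
        for (MyEntity entity : myRepository.findAllById(ids)) {
            // ... modify entity; dirty checking saves it on commit ...
        }
    }
}
```

With this shape a failure also only rolls back the current chunk, and memory usage stays bounded by the chunk size instead of growing with the whole job.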