What is the relation between Spring Entity Manager and Spring Data Repository?


Recently I came across an interesting article on how to perform batch operations on a database using a Spring repository - http://knes1.github.io/blog/2015/2015-10-19-streaming-mysql-results-using-java8-streams-and-spring-data.html - and I've implemented its "Solution Using Paging".

Everything works fine, but I don't understand how clearing the EntityManager can affect other operations that may occur on the database while the batch is being processed.

  • What is the relation between Spring Entity Manager and Spring Data Repository? Why does clearing the Spring Entity Manager affect memory usage if I perform my operations on the Spring Data Repository?
  • How may clearing the Spring Entity Manager affect other read/write operations that occur during batch processing?
  • How do I create dedicated instances of Spring Entity Manager and Spring Data Repository? Right now I'm using basic autowiring:
    @PersistenceContext
    private EntityManager entityManager;
    @Autowired
    private MyRepository myRepository;
  • Does creating separate instances of Spring Entity Manager and Spring Data Repository for batch processing make any sense?

Thanks for your help

Answer by Jens Schauder (best answer)

What is the relation between Spring Entity Manager and Spring Data Repository? Why clearing Spring Entity Manager impacts used memory if I perform operations on Spring Data Repository?

A Spring Data JpaRepository has a reference to an EntityManager and uses it to implement its methods. For the predefined methods that make up the JpaRepository interface, you can find the implementations in SimpleJpaRepository.

How [does] clearing the EntityManager affect other read/write operations that may occur during batch processing?

The EntityManager is a JPA construct. Every managed entity is attached to exactly one EntityManager. This is how the JPA implementation learns that an entity has been modified and therefore needs saving on the next flush event. In order to do that, the EntityManager has to keep track of all the entities it encounters. With a typical web application this means:

  • you load some entities.
  • modify them.
  • maybe add or delete some entities
  • flush the lot and close the EntityManager at the end of the request.

Since only a few entities are involved, holding a reference to all of them is not much of a problem.

In a batch setting things often work quite differently. Without special thought, the natural thing would be:

  1. you load some entities.
  2. modify them.
  3. maybe add or delete some entities.
  4. repeat thousands or millions of times.
  5. at the end of the batch flush the lot and close the EntityManager.

Thus you keep many thousands or even millions of references to entities that in many cases won't get touched anymore. This puts pressure on memory and also slows down every operation in which the EntityManager has to go through its list of entity references. Such operations are:

  • Checking for each loaded and persisted entity whether it is already referenced in the EntityManager.
  • Checking which entities need saving during the flush event and putting all the resulting statements into the right order for execution.

But there are also benefits to the EntityManager and its first-level cache (the map of entities it keeps): entities get loaded only once. If your batch references some already existing entities (typical "master data") over and over again, they don't get loaded over and over again, but just once per EntityManager. So you probably also don't want to flush and clear your EntityManager after every processed row.
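The trade-off described above can be illustrated with a toy model. This is only a sketch: `ToyPersistenceContext` and `BatchDemo` are hypothetical names invented for the illustration, the "entity" is a plain string, and nothing here is the JPA API. It merely mimics a map of managed entities, counts how many loads happen, and records how large the map gets, depending on how often it is cleared.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongFunction;

/** Toy stand-in for a persistence context (hypothetical, NOT the JPA EntityManager). */
final class ToyPersistenceContext<T> {
    private final Map<Long, T> managed = new HashMap<>(); // the "first-level cache"
    private final LongFunction<T> loader;                  // simulates a database read
    private int loadCount;
    private int peakManaged;

    ToyPersistenceContext(LongFunction<T> loader) {
        this.loader = loader;
    }

    /** Returns the managed instance, loading it only if it is not tracked yet. */
    T find(long id) {
        T entity = managed.get(id);
        if (entity == null) {
            entity = loader.apply(id);
            loadCount++;
            managed.put(id, entity);
            peakManaged = Math.max(peakManaged, managed.size());
        }
        return entity;
    }

    /** Forgets every tracked entity, like detaching everything at once. */
    void clear() {
        managed.clear();
    }

    int loadCount() { return loadCount; }
    int peakManagedCount() { return peakManaged; }
}

final class BatchDemo {
    /** Processes ids 1..rows; each row also touches one shared "master data" entity (id 0). */
    static ToyPersistenceContext<String> run(int rows, int chunkSize) {
        ToyPersistenceContext<String> ctx = new ToyPersistenceContext<>(id -> "row-" + id);
        for (long id = 1; id <= rows; id++) {
            ctx.find(id);  // the row being processed
            ctx.find(0L);  // master data referenced by every row
            if (chunkSize > 0 && id % chunkSize == 0) {
                ctx.clear(); // chunkSize == 0 means "never clear"
            }
        }
        return ctx;
    }

    public static void main(String[] args) {
        for (int chunk : new int[] {0, 100, 1}) {
            ToyPersistenceContext<String> ctx = run(10_000, chunk);
            System.out.println("chunk=" + chunk
                    + " loads=" + ctx.loadCount()
                    + " peakManaged=" + ctx.peakManagedCount());
        }
    }
}
```

In this model, never clearing keeps all 10,001 objects tracked at once, clearing after every row reloads the master data for every row (20,000 loads), and clearing every 100 rows caps the tracked set at 101 objects at the cost of only 100 extra loads.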

How to create dedicated instances of Spring Entity Manager and Spring Data Repository? Now I'm using basic auto-wiring.

You don't. There are ways to create dedicated instances; after all, it is just Java code. But you really don't want to do that. Instead, just use the single repository instance that gets generated per repository type. It gets injected with a single proxy for the EntityManager. The actual EntityManager lives for the duration of a transaction and gets swapped in behind that proxy at the beginning of each transaction. That is part of Spring's scope magic.

What you need to do is annotate your methods with @Transactional in such a way that the transactions stay within a reasonable size.
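To make the "one shared proxy, one real context per transaction" idea more concrete, here is a plain-Java sketch. It is not Spring code: `ToyContext`, `ToyTransactions`, `inTransaction` and `ChunkedBatch` are hypothetical names made up for this illustration, and `inTransaction` only plays the role that a @Transactional method boundary plays in a real application. The assumption modelled here is simply that a fresh context is bound when a transaction starts and dropped when it ends.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical, minimal view of a persistence context (NOT the JPA EntityManager). */
interface ToyContext {
    int contextId();
    void track(long entityId);
    int trackedCount();
}

final class ToyTransactions {
    private final ThreadLocal<ToyContext> current = new ThreadLocal<>();
    private int nextId;

    /** The single object that gets "injected" everywhere; it only delegates. */
    private final ToyContext shared = new ToyContext() {
        private ToyContext target() {
            ToyContext t = current.get();
            if (t == null) {
                throw new IllegalStateException("no active transaction");
            }
            return t;
        }
        @Override public int contextId() { return target().contextId(); }
        @Override public void track(long entityId) { target().track(entityId); }
        @Override public int trackedCount() { return target().trackedCount(); }
    };

    ToyContext sharedProxy() {
        return shared;
    }

    /** Stand-in for a transaction boundary: binds a fresh context, runs the work, unbinds. */
    void inTransaction(Runnable work) {
        final int id = ++nextId;
        final Set<Long> tracked = new HashSet<>();
        current.set(new ToyContext() {
            @Override public int contextId() { return id; }
            @Override public void track(long entityId) { tracked.add(entityId); }
            @Override public int trackedCount() { return tracked.size(); }
        });
        try {
            work.run();
        } finally {
            current.remove(); // the real context ends with the transaction
        }
    }
}

final class ChunkedBatch {
    /** Returns {number of distinct contexts used, largest tracked count seen}. */
    static int[] run(int rows, int chunkSize) {
        ToyTransactions tx = new ToyTransactions();
        ToyContext shared = tx.sharedProxy(); // same reference for the whole batch
        Set<Integer> contextIds = new HashSet<>();
        int[] maxTracked = {0};
        for (long start = 0; start < rows; start += chunkSize) {
            final long from = start;
            final long to = Math.min(start + chunkSize, rows);
            tx.inTransaction(() -> {          // one "transaction" per chunk
                for (long id = from; id < to; id++) {
                    shared.track(id);
                }
                contextIds.add(shared.contextId());
                maxTracked[0] = Math.max(maxTracked[0], shared.trackedCount());
            });
        }
        return new int[] {contextIds.size(), maxTracked[0]};
    }
}
```

In this sketch, `ChunkedBatch.run(1000, 100)` goes through ten separate contexts and never tracks more than 100 entities at a time, even though the batch code only ever holds the one shared reference. Running the same rows as a single chunk tracks all 1000 in one context, which is the situation that transaction sizing is meant to avoid.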

Does creating separate instance of Spring Entity Manager and Spring Data Repository for batch processing make any sense?

No, I don't think so. It just makes things complicated.