How to tell Hibernate's 2nd Level Cache to use the proper Id?


In a Spring Boot 3 application using Hibernate 6 and Ehcache 3, I ran into a weird problem. My entities have an id property whose name is prefixed with the entity name; for example, a Display entity has an id named displayId.

The entity with the cache annotation looks like this:

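import jakarta.persistence.Access;
import jakarta.persistence.AccessType;
import jakarta.persistence.Column;
import jakarta.persistence.Entity;
import jakarta.persistence.GeneratedValue;
import jakarta.persistence.GenerationType;
import jakarta.persistence.Id;
import jakarta.persistence.Table;
import org.hibernate.annotations.CacheConcurrencyStrategy;

@Entity
@Access(AccessType.FIELD)
@org.hibernate.annotations.Cache(region = "display-cache",
                                 usage = CacheConcurrencyStrategy.NONSTRICT_READ_WRITE)
@Table(name = "display")
public class Display {

    @Id
    @Column(name = "display_id")
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long displayId;

    @Column(name = "description")
    private String description;

    //...
}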

Now, as long as I query for a display using the built-in findById() method of the JpaRepository, everything is fine and the display gets cached as expected. But when I query by the id property displayId directly, the cache is not used, even though the parameter is the entity's id:

The JpaRepository looks like this:

import java.util.Optional;
import org.springframework.data.jpa.repository.JpaRepository;

public interface DisplayRepository extends JpaRepository<Display, Long> {

    Optional<Display> findByDisplayId(Long displayId);

}

And the calls look like this:

displayRepository.findByDisplayId(id);  // cache is not working

// just for comparison:
displayRepository.findById(id);         // cache is working

So my question is:

How can I tell Hibernate that displayId is the entity's id, so that Hibernate caches the lookup as expected?


There are 2 answers

Andrey B. Panfilov (best answer)

Implementing the 2nd-level cache in Hibernate is not as straightforward as some blog posts suggest, and the rule of thumb is this: if you are able to implement caching at the service/business level, do that without looking back and stay clear of the 2nd-level cache.
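To make "caching at the service/business level" concrete, here is a minimal plain-Java sketch under stated assumptions. It caches a hypothetical immutable snapshot (the DisplayView record) instead of the mutable entity, and a hypothetical loader function stands in for the repository call. The service is assumed to call evict() after it changes a display.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical immutable snapshot of the fields callers need; caching this
// instead of the JPA entity means no caller can mutate the cached value.
record DisplayView(long displayId, String description) {}

// Hypothetical service-level cache; the loader stands in for a repository call.
class DisplayViewCache {
    private final Map<Long, DisplayView> cache = new ConcurrentHashMap<>();
    private final Function<Long, DisplayView> loader;

    DisplayViewCache(Function<Long, DisplayView> loader) {
        this.loader = loader;
    }

    // Returns the cached snapshot, invoking the loader only on a miss.
    DisplayView get(long displayId) {
        return cache.computeIfAbsent(displayId, loader);
    }

    // The service calls this after it has changed the display.
    void evict(long displayId) {
        cache.remove(displayId);
    }
}

class DisplayViewCacheDemo {
    public static void main(String[] args) {
        AtomicInteger loads = new AtomicInteger();
        DisplayViewCache cache = new DisplayViewCache(id -> {
            loads.incrementAndGet();
            return new DisplayView(id, "display " + id);
        });
        cache.get(1L);
        cache.get(1L);   // served from the map
        cache.evict(1L);
        cache.get(1L);   // loaded again after eviction
        System.out.println(loads.get()); // prints 2
    }
}
```

With this design, the service decides which data to cache and when it becomes stale. That decision stays in the application rather than in entity annotations.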

Below are some of my thoughts, based on my previous research:

I. The implementation is buggy and no one is going to fix it: the 2nd-level cache does not work as expected. It is worth noting that I discovered those issues only through integration tests, so I have no idea why this functionality is not covered by tests in the Hibernate project.

II. You need to think about how you are going to deal with stale data in the cache: you may either overwrite current data with stale data or make wrong decisions based on stale data, and neither situation looks good. The most straightforward remedy is to enable version stamp checks via @Version fields; however, "version checking" is a completely different universe, and you may face challenges you have never faced before (on the other hand, I can't understand how anyone uses JPA without version checking).
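The idea behind a version stamp check can be sketched in plain Java. This is a toy model with hypothetical class names, not Hibernate code. With a @Version field, Hibernate adds the version to the WHERE clause of the UPDATE and increments it. When no row matches, it reports an optimistic-locking failure instead of silently overwriting newer data with a stale copy.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of a version-checked UPDATE (hypothetical names, not Hibernate code).
class VersionedTable {
    record Row(String description, long version) {}

    private final Map<Long, Row> rows = new HashMap<>();

    void insert(long id, String description) {
        rows.put(id, new Row(description, 0));
    }

    Row read(long id) {
        return rows.get(id);
    }

    // Mirrors "UPDATE ... SET ..., version = version + 1 WHERE id = ? AND version = ?":
    // returns false (no row matched) when the caller's version is stale.
    boolean update(long id, long expectedVersion, String newDescription) {
        Row current = rows.get(id);
        if (current == null || current.version() != expectedVersion) {
            return false;
        }
        rows.put(id, new Row(newDescription, expectedVersion + 1));
        return true;
    }
}

class VersionedTableDemo {
    public static void main(String[] args) {
        VersionedTable table = new VersionedTable();
        table.insert(1L, "lobby");
        VersionedTable.Row cached = table.read(1L);             // e.g. a copy kept in a cache
        table.update(1L, table.read(1L).version(), "hall");     // someone else writes first
        boolean ok = table.update(1L, cached.version(), "entrance"); // write based on the stale copy
        System.out.println(ok); // prints false
    }
}
```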

III. Do not use Spring's caching capabilities together with JPA repositories: Spring caching is not designed for mutable data, and JPA entities are mutable by design, so instead of performance improvements you will get wrong data in the DB.

IV. If the application modifies entities via bulk updates (@Modifying together with an update/delete @Query in JPA repositories), those operations invalidate caches, so avoid such patterns.

V. Query caching does not work at all, for the following reasons:

  1. If you are retrieving entities, HBN caches only the corresponding identifiers, and each subsequent cache hit retrieves the entities by those cached ids one by one (slow-by-slow)
  2. Modifying entities from the same query space (i.e. entities backed by the same tables as those involved in the cached query) invalidates the query cache: HBN is unable to figure out whether the updated/deleted entity affects the query result or not, so it invalidates everything.
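Both points can be illustrated with a toy model in plain Java. The names are hypothetical and this is not Hibernate's implementation, which checks per-table update timestamps rather than deleting entries. In the model, a cached query result holds only ids, and a hit resolves those ids one lookup at a time. A write to any table in the query space drops the result, regardless of which row changed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model of the query-cache behaviour described above (hypothetical names).
class QueryCacheModel {
    private final Map<Long, String> entities = new HashMap<>();           // entity cache / DB stand-in
    private final Map<String, List<Long>> cachedIds = new HashMap<>();    // query -> cached ids only
    private final Map<String, Set<String>> querySpaces = new HashMap<>(); // query -> tables it reads
    int entityLookups = 0;

    void putEntity(long id, String entity) {
        entities.put(id, entity);
    }

    void cacheResult(String query, Set<String> tables, List<Long> ids) {
        cachedIds.put(query, List.copyOf(ids));
        querySpaces.put(query, Set.copyOf(tables));
    }

    // A hit yields ids, and every id is resolved by its own lookup;
    // null means a miss, i.e. the SQL query would have to run.
    List<String> lookup(String query) {
        List<Long> ids = cachedIds.get(query);
        if (ids == null) {
            return null;
        }
        List<String> result = new ArrayList<>();
        for (Long id : ids) {
            entityLookups++;
            result.add(entities.get(id));
        }
        return result;
    }

    // A write to a table invalidates every cached query that reads it,
    // whether or not the changed row belongs to that query's result.
    void onWrite(String table) {
        List<String> stale = new ArrayList<>();
        for (Map.Entry<String, Set<String>> entry : querySpaces.entrySet()) {
            if (entry.getValue().contains(table)) {
                stale.add(entry.getKey());
            }
        }
        for (String query : stale) {
            querySpaces.remove(query);
            cachedIds.remove(query);
        }
    }
}

class QueryCacheDemo {
    public static void main(String[] args) {
        QueryCacheModel model = new QueryCacheModel();
        model.putEntity(1L, "lobby");
        model.putEntity(2L, "hall");
        model.cacheResult("displays in building A", Set.of("display"), List.of(1L, 2L));
        System.out.println(model.lookup("displays in building A")); // prints [lobby, hall]
        System.out.println(model.entityLookups);                    // prints 2
        model.onWrite("display"); // e.g. an unrelated display was saved
        System.out.println(model.lookup("displays in building A")); // prints null
    }
}
```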

VI. Global/distributed caches seem to be useless, while local caches that accept remote invalidation messages seem to be OK. The problem is that if an entity does not have "a lot of" associations, retrieving it from the DB via a single query should not be slower than retrieving it from a remote cache, so from the user-experience perspective a global/distributed cache improves nothing.
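The "local cache plus remote invalidation" setup can be sketched as a toy model in plain Java. All class names are hypothetical. The message bus here is synchronous, whereas a real deployment would deliver invalidation messages asynchronously over some messaging channel.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model (hypothetical names): a synchronous stand-in for the channel that
// carries invalidation messages between application nodes.
class InvalidationBus {
    private final List<LocalCacheNode> nodes = new ArrayList<>();

    void register(LocalCacheNode node) {
        nodes.add(node);
    }

    void publishEvict(LocalCacheNode sender, long id) {
        for (LocalCacheNode node : nodes) {
            if (node != sender) {
                node.evictLocal(id);
            }
        }
    }
}

// One application node with its own in-memory cache in front of a shared DB.
class LocalCacheNode {
    private final Map<Long, String> local = new HashMap<>();
    private final Map<Long, String> database;
    private final InvalidationBus bus;
    int dbReads = 0;

    LocalCacheNode(Map<Long, String> database, InvalidationBus bus) {
        this.database = database;
        this.bus = bus;
        bus.register(this);
    }

    String get(long id) {
        String cached = local.get(id);
        if (cached != null) {
            return cached;
        }
        dbReads++;
        String row = database.get(id);
        if (row != null) {
            local.put(id, row);
        }
        return row;
    }

    // Write to the DB, refresh this node's copy, and tell the other nodes to drop theirs.
    void put(long id, String value) {
        database.put(id, value);
        local.put(id, value);
        bus.publishEvict(this, id);
    }

    void evictLocal(long id) {
        local.remove(id);
    }
}
```

Reads stay local and cheap. Only the small invalidation message crosses the network.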

VII. I believe the idea of controlling cache behaviour via annotations on entity classes is completely wrong. The point is this: entities just define data, whereas assumptions about possible optimisations and data consistency are the responsibility of a particular application. So, in my opinion, the best way to set up caching is to take advantage of org.hibernate.integrator.spi.Integrator, for example:

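import org.hibernate.boot.Metadata;
import org.hibernate.cache.spi.access.AccessType; // Hibernate's AccessType, not jakarta.persistence.AccessType
import org.hibernate.engine.spi.SessionFactoryImplementor;
import org.hibernate.integrator.spi.Integrator;
import org.hibernate.mapping.PersistentClass;
import org.hibernate.mapping.RootClass;
import org.hibernate.service.spi.SessionFactoryServiceRegistry;

public class CacheSettingsIntegrator implements Integrator { // illustrative class name
    @Override
    public void integrate(Metadata metadata, SessionFactoryImplementor sessionFactory, SessionFactoryServiceRegistry serviceRegistry) {
        for (PersistentClass persistentClass : metadata.getEntityBindings()) {
            // cache settings live on the root class; entity names are usually fully-qualified class names
            if (persistentClass instanceof RootClass rootClass && "myentity".equals(rootClass.getEntityName())) {
                rootClass.setCached(true);
                rootClass.setCacheRegionName("myregion");
                rootClass.setCacheConcurrencyStrategy(AccessType.NONSTRICT_READ_WRITE.getExternalName());
            }
        }
    }

    @Override
    public void disintegrate(SessionFactoryImplementor sessionFactory, SessionFactoryServiceRegistry serviceRegistry) {}
}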

VIII. The safest way of implementing the 2nd-level cache in Hibernate is the following:

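  1. First, let HBN populate the 2nd-level cache without reading from it by default:
import jakarta.persistence.CacheRetrieveMode;
import jakarta.persistence.CacheStoreMode;
import org.hibernate.cfg.AvailableSettings;
import org.springframework.boot.autoconfigure.orm.jpa.HibernatePropertiesCustomizer;
import org.springframework.context.annotation.Bean;

@Bean
public HibernatePropertiesCustomizer hibernateSecondLevelCacheCustomizer() {
    return map -> {
        // read entities from the DB by default, but still put them into the cache
        map.put(AvailableSettings.JAKARTA_SHARED_CACHE_RETRIEVE_MODE, CacheRetrieveMode.BYPASS);
        map.put(AvailableSettings.JAKARTA_SHARED_CACHE_STORE_MODE, CacheStoreMode.USE);
    };
}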

  2. After that, you may call EntityManager#find(Class<T>, Object, Map<String, Object>) with the AvailableSettings#JAKARTA_SHARED_CACHE_RETRIEVE_MODE property (jakarta.persistence.cache.retrieveMode) set to CacheRetrieveMode#USE wherever you think it is appropriate
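The effect of combining store mode USE with a default retrieve mode of BYPASS can be shown with a small plain-Java model. It is not Hibernate code. The RetrieveMode enum is a hypothetical stand-in for jakarta.persistence.CacheRetrieveMode. The model only covers CacheStoreMode.USE, which puts entities read from the database into the cache without forcing a refresh of entries that are already cached.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for jakarta.persistence.CacheRetrieveMode.
enum RetrieveMode { USE, BYPASS }

// Toy model (not Hibernate code) of a 2nd-level cache used with store mode USE:
// entities read from the DB are put into the cache, but entries already
// present are not refreshed by such reads.
class SharedCacheModel {
    private final Map<Long, String> database = new HashMap<>();
    private final Map<Long, String> secondLevelCache = new HashMap<>();
    int dbReads = 0;

    SharedCacheModel(Map<Long, String> rows) {
        database.putAll(rows);
    }

    String find(long id, RetrieveMode retrieveMode) {
        if (retrieveMode == RetrieveMode.USE) {
            String cached = secondLevelCache.get(id);
            if (cached != null) {
                return cached; // served from the cache, no SQL
            }
        }
        dbReads++;
        String row = database.get(id);
        if (row != null) {
            secondLevelCache.putIfAbsent(id, row); // store mode USE feeds the cache
        }
        return row;
    }
}

class SharedCacheDemo {
    public static void main(String[] args) {
        SharedCacheModel model = new SharedCacheModel(Map.of(1L, "lobby"));
        model.find(1L, RetrieveMode.BYPASS); // default: read from the DB, fill the cache
        model.find(1L, RetrieveMode.USE);    // explicit opt-in: cache hit
        System.out.println(model.dbReads);   // prints 1
    }
}
```

In real code, the opt-in read corresponds to a call such as entityManager.find(Display.class, id, Map.of("jakarta.persistence.cache.retrieveMode", CacheRetrieveMode.USE)).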

As regards your problem...

There are a couple of options to retrieve an entity by id in Hibernate:

  1. EntityManager#find - the most common one; it respects both the 2nd- and 1st-level caches
  2. EntityManager#getReference - instead of retrieving an entity, it creates a proxy object. There are some scenarios where this is useful, however the HBN implementation seems to be broken: a subsequent call of EntityManager#find returns the proxy object instead of a fully functional entity
  3. Session#byMultipleIds - retrieves entities of the same type in batches and respects both the 2nd- and 1st-level caches; unfortunately, it is not supported by JPA repositories
  4. via a JPQL query like select e from Entity e where e.id = :id - the most bizarre way to do a simple thing:
    • when auto flush is enabled (which is actually a reasonable default for most Hibernate applications), Hibernate tends to keep the DB state in sync with the persistence context. In turn, before executing any JPQL query, Hibernate checks whether the persistence context contains dirty entities (which takes some time) and flushes those dirty entities to the DB.
    • if the entity to be retrieved is already present in the persistence context, Hibernate won't refresh its state from the DB data; such behaviour seems weird

From the JPA repository perspective, every declared method that is not a default method and is not implemented by the base repository (SimpleJpaRepository in most cases) or by a fragment is backed by a JPQL query. Such methods may therefore not work as intended in some corner cases.
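The difference between the two lookup paths can be sketched as a toy model in plain Java. The names are hypothetical, entities are plain strings, and auto flush is left out. The model only mirrors the behaviour described in this answer. find() consults the persistence context and the 2nd-level cache before running SQL. A query always runs SQL and then prefers an instance that is already managed.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Toy model (hypothetical names) of find() versus a query on the id column.
class LookupPathsModel {
    final Map<Long, String> persistenceContext = new HashMap<>(); // 1st-level cache
    final Map<Long, String> secondLevelCache = new HashMap<>();
    final Map<Long, String> database = new HashMap<>();
    int sqlStatements = 0;

    // Like EntityManager#find, which backs JpaRepository#findById.
    Optional<String> find(long id) {
        String entity = persistenceContext.get(id);
        if (entity == null) {
            entity = secondLevelCache.get(id);
        }
        if (entity == null) {
            sqlStatements++;
            entity = database.get(id);
            if (entity != null) {
                secondLevelCache.put(id, entity);
            }
        }
        if (entity != null) {
            persistenceContext.putIfAbsent(id, entity);
        }
        return Optional.ofNullable(entity);
    }

    // Like "select d from Display d where d.displayId = :id" (e.g. a derived
    // findByDisplayId): SQL always runs first. The loaded row may be stored in
    // the 2nd-level cache, but the cache is never consulted beforehand, and an
    // instance already in the persistence context is returned unchanged.
    Optional<String> query(long id) {
        sqlStatements++;
        String row = database.get(id);
        if (row == null) {
            return Optional.empty();
        }
        secondLevelCache.putIfAbsent(id, row);
        return Optional.of(persistenceContext.computeIfAbsent(id, key -> row));
    }
}
```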

So the best option for this particular case is to give up the naming convention that causes the performance issues. If that is not possible, you can take advantage of default methods:

default Optional<Display> findByDisplayId(Long displayId) {
   return findById(displayId);
}
VonC

When you query by displayId with findByDisplayId(id), the 2nd-level cache is not consulted, even though displayId is the entity's identifier; findById(id), by contrast, works as expected.

A possible solution would be to use Hibernate's @NaturalId annotation, also mentioned here.
A natural ID is an immutable business key which is unique within the scope of a particular database table. In your case, the displayId is acting as this business key.
But since displayId is already annotated with @Id, marking it as the entity's primary key, overloading the identity concept with a natural ID on the same field could lead to confusion and unexpected behavior.

Another possibility: Creating a custom query method that leverages Hibernate's 2nd Level Cache explicitly, using @Query annotation with the @Cacheable annotation to make sure caching is utilized (illustrated in "Caching in Hibernate" by Himani Prasad).

However:

The second approach is the classic query-cache approach I use for some special cases. I could work around the problem this way, but that would mean having two caches, and as far as I know the query cache is a bit slower than a dedicated entity cache alone.

Another option is using the @EntityGraph annotation to hint Hibernate to fetch the entity in a cache-friendly manner, possibly combined with @QueryHints (mentioned in "11 JPA and Hibernate query hints every developer should know" by Thorben Janssen).

But that does not work here. The goal would be to ensure that Hibernate's 2nd Level Cache is used when querying by displayId, without resorting to the query cache.

Andrey B. Panfilov suggests in the comments:

default Optional<Display> findByDisplayId(Long displayId) { return findById(displayId); }

When HBN caches queries, it caches ids only, so despite any hints you provide it will always call em.find(). Moreover, such "caches" get invalidated whenever any entity of the corresponding type is saved, which makes them completely useless.

That would indeed be a straightforward way to use Hibernate's 2nd Level Cache: delegate the findByDisplayId method to the built-in findById method of the JpaRepository. That ensures the cache-aware em.find() method is used when querying by displayId.

public interface DisplayRepository extends JpaRepository<Display, Long> {

    default Optional<Display> findByDisplayId(Long displayId) {
        return findById(displayId);
    }

}

When caching queries, Hibernate caches entity identifiers only, not the entities themselves, and those cached results are invalidated whenever any entity of the corresponding type is saved.

Andrey detailed that approach in "JPA Query with several different @Id columns", with:

default Optional<T> findByIdAndType(K id, String type) {
    return findById(id)
            .filter(e -> Objects.equals(e.getType(), type));
}
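For the snippet above to compile, T has to expose getType(). The following self-contained sketch shows one way to express that in plain Java. The Typed interface, the Device record and the in-memory repository are hypothetical and exist only to exercise the default method. In Spring Data, the default method would live in a repository interface extending JpaRepository, so findById would be the cache-aware built-in method.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;

// Hypothetical contract for entities that expose a "type" column.
interface Typed {
    String getType();
}

// Plain-Java stand-in for a repository: findById is the real lookup,
// and the default method only post-filters its result.
interface TypedRepository<T extends Typed, K> {
    Optional<T> findById(K id);

    default Optional<T> findByIdAndType(K id, String type) {
        return findById(id)
                .filter(e -> Objects.equals(e.getType(), type));
    }
}

// Hypothetical entity and in-memory repository, used only to exercise the default method.
record Device(long id, String type) implements Typed {
    @Override
    public String getType() {
        return type;
    }
}

class InMemoryDeviceRepository implements TypedRepository<Device, Long> {
    private final Map<Long, Device> rows = new HashMap<>();

    void save(Device device) {
        rows.put(device.id(), device);
    }

    @Override
    public Optional<Device> findById(Long id) {
        return Optional.ofNullable(rows.get(id));
    }
}
```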

The EntityManager#find method would be the most efficient way to retrieve entities by ID: it backs the CrudRepository#findById method in Spring Data JPA, which is designed to use Hibernate's 2nd Level Cache effectively.

From the comments: any database call that is not made via entityManager.find() is not automatically served from Hibernate's 1st- or 2nd-level cache and is treated as a custom query call.
These include derived JPA query methods and custom JPA methods, even when they use the @Id-annotated primary key.

On the Spring Data side, 2nd-level cache lookups by id go through the findById() method of JPA repositories, which delegates to EntityManager#find. This is not an "issue" with Hibernate but rather the way spring-data-jpa is designed.

While annotations cannot directly influence Hibernate to treat custom query methods as cacheable find operations, using a default method that delegates to findById() within your repository interface is an idiomatic way to achieve the desired caching behavior with Hibernate and Spring Data JPA.