I need to do a massive insert using EJB 3, Hibernate, Spring Data and Oracle. Originally, I am using Spring Data and code is below:
talaoAITDAO.save(taloes);
Where talaoAITDAO is a Spring Data JpaRepository subclass and taloes is a Collection of TalaoAIT entity. In this entity, Its respective ID has this form:
@Id
@Column(name = "ID_TALAO_AIT")
@SequenceGenerator(name = "SQ_TALAO_AIT", sequenceName = "SQ_TALAO_AIT", allocationSize = 1000)
@GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "SQ_TALAO_AIT")
private Long id;
Also this entity has no related entities to do cascade insert.
My problem here, is that all entities are individually inserted (such as INSERT INTO TABLE(col1, col2) VALUES (val1, val2)
). Occasionally, it can cause a timeout and all insertions will be rolled back. I would want convert these individual inserts in batch inserts (such as INSERT INTO TABLE(col1, col2) VALUES (val11, val12), (val21, val22), (val31, val32), ...
).
Studying alternatives to improve performance, I found this page in hibernate documentation, beyond Hibernate batch size confusion and this other page. Based on them, I wrote this code:
Session session = super.getEntityManager().unwrap(Session.class);
int batchSize = 1000;
for (int i = 0; i < taloes.size(); i++) {
TalaoAIT talaoAIT = taloes.get(i);
session.save(talaoAIT);
if(i % batchSize == 0) {
session.flush();
session.clear();
}
taloes.add(talaoAIT);
}
session.flush();
session.clear();
Also, in peristence.xml, I added these properties:
<property name="hibernate.jdbc.batch_size" value="1000" />
<property name="order_inserts" value="true" />
However, although in my tests I had perceived a subtle difference (mainly with big collections and big batch sizes), it was not so big as desirable. In logging console, I saw that Hibernate continued to do individual inserts, not replacing them for massive insert. As in my entity, I am using a Sequence generator I believe that it is not problem (according Hibernate documentation, I would had problem if I was using Identity generator).
So, my question is what can be missing here. Some configuration? Some method not used?
Thanks,
Rafael Afonso.
I recently found a promising small library for batching inserts with Hibernate and Postgresql. It is called pedal-dialect and uses the Postgresql - command
COPY
which is claimed by many people to be much faster than batched inserts (references: Postgresql manual, Postgresql Insert Strategies - Performance Test, How does copy work and why is it so much faster than insert?). pedal-dialect allows to useCOPY
without fully losing the ease of use of Hibernate. You still get automatic mapping of entities and rows and don't have to implement it on your own.