The application I'm working on is a Java-based ETL process that loads data into multiple tables. The DBMS is Infobright (a MYSQL-based DBMS geared for data warehousing).
The data loading should be done atomically; however, for performance reasons, I want to load data into multiple tables at the same time (using a LOAD DATA INFILE
command). This means I need to open multiple connections.
Is there any solution that allows me to do the loads atomically and in parallel? (I'm guessing the answer might depend on the engine for the tables I load into; most of them are Brighthouse, which allows Transactions, but no XA and no Savepoints).
To further clarify, I want to avoid a situation where let's say:
- I load data into 5 tables
- I commit the loads for the first 4 tables
- The commit for the 5th table fails
In this situation, I can't rollback the first 4 loads, because they are already commited.
Intro
As I've promised I've hacked up a complete example. I've used MySQL and created three tables like the following:
test2
contains a single row initially.(I've posted the full code to http://pastebin.com.)
The following example does several things.
threads
to3
which determines how many jobs are going to be run in parallel.threads
number of connections.a
for every table).threads
number of jobs to be run and loads them with data.threads
number of threads and waits for their completion (successful or not).(Note, that I've used Java 7's automatic resource management feature in
SQLTask.call()
.)Logic
Data
Tasks
Run
Result
Given the default data returned by
getTableData(...)
and the fact that
test2
already containsa
(and thedata
column is unique) the second job will fail and throw an exception, thus every connection will be rolled back.If instead of
a
s you returnb
s, then the connections will be committed safely.This can be done similarly with
LOAD DATA
.After OP's response on my answer I realized that what she/he wants to do isn't possible to do in a simple and clear manner.
Basically the problem is that after a successful commit the data that was committed can't be rolled-back, because the operation is atomic. Given multiple commits are needed in the case given, rolling-back everything isn't possible unless one tracks all data (in all of the transactions) and if somethings happens deletes everything that was successfully committed.
There is a nice answer relating to the issue of commits and rollbacks.