c3p0 DataSource monitor deadlock - all threads hang - how to fix

3.1k views Asked by At

We have a Spring based application and recently we went into production. We are using Spring @Controller that ultimately hit DAOs that use JDBCTemplate. It is using c3p0's ComboPooledDataSource

On an increased load (something like 150 concurrent users), the application hangs for all users - the DataSource gets locked by something - on a thread dump, there are like 200 threads that say - obviously the DataSource is deadlocked.

"http-bio-8080-exec-440" - Thread t@878
java.lang.Thread.State: WAITING
at java.lang.Object.wait(Native Method)
- waiting on <146d984e> (a com.mchange.v2.resourcepool.BasicResourcePool)
at com.mchange.v2.resourcepool.BasicResourcePool.awaitAvailable(BasicResourcePool.java:1418)
at com.mchange.v2.resourcepool.BasicResourcePool.prelimCheckoutResource(BasicResourcePool.java:606)
at com.mchange.v2.resourcepool.BasicResourcePool.checkoutResource(BasicResourcePool.java:526)
at com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool.checkoutAndMarkConnectionInUse(C3P0PooledConnectionPool.java:756)
at com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool.checkoutPooledConnection(C3P0PooledConnectionPool.java:683)
at com.mchange.v2.c3p0.impl.AbstractPoolBackedDataSource.getConnection(AbstractPoolBackedDataSource.java:140)
at org.springframework.jdbc.datasource.DataSourceUtils.doGetConnection(DataSourceUtils.java:111)
at org.springframework.jdbc.datasource.DataSourceUtils.getConnection(DataSourceUtils.java:77)
at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:573)
at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:637)
at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:666)
at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:674)
at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:718)

After that point the application becomes unusable unless restarted. When this happened DBA team didn't observe any load on the database.

At that time c3p0 was configured like that:

app_en.driverClass=com.mysql.jdbc.Driver
app_en.user=tapp_en
app_en.password=tapp_en
app_en.jdbcUrl=jdbc:mysql://10.10.0.102:3306/tapp_en?useUnicode=true&characterEncoding=utf-8&autoReconnect=true

app_en.acquireIncrement=5
app_en.maxIdleTime=3600
app_en.maxIdleTimeExcessConnections=300
app_en.unreturnedConnectionTimeout=3600
app_en.numHelperThreads=6
app_en.minPoolSize=20
app_en.maxPoolSize=100
app_en.idleConnectionTestPeriod=120
app_en.testConnectionOnCheckin=true

After that, I changed c3p0's configuration as follows - and enabled DEBUG logging for com.mchange.v2.c3p0 package:

app_en.driverClass=com.mysql.jdbc.Driver
app_en.user=tapp_en
app_en.password=tapp_en
app_en.jdbcUrl=jdbc:mysql://10.10.0.102:3306/tapp_en?    useUnicode=true&characterEncoding=utf-8&autoReconnect=true

app_en.acquireIncrement=5
app_en.maxIdleTime=180
app_en.maxIdleTimeExcessConnections=60
app_en.unreturnedConnectionTimeout=30
app_en.checkoutTimeout=10000
app_en.numHelperThreads=12
app_en.debugUnreturnedConnectionStackTraces=true
app_en.initialPoolSize=10
app_en.maxPoolSize=100
app_en.idleConnectionTestPeriod=120
app_en.preferredTestQuery="select 1 from tbl_users"

With this configuration in place, I again ran load tests and the application still hanged... although the threads recover after they are unable to obtain connection to the database. Even though, the game hanged for too many users even though the threads recovered unlike the previous configuration - so they had to restart their clients. Although all logging was enabled, the c3p0 logs don't log any deadlock messages. The error messages I see are just that:

[06/24/2015 12:20:54] [C3P0PooledConnectionPoolManager[identityToken->1oed6dl9a9ak8qsgqfvdu|4d6145af]-HelperThread-#10] DEBUG NewPooledConnection  - com.mchange.v2.c3p0.impl.NewPooledConnection@7f0bc55a closed by a client.
java.lang.Exception: DEBUG -- CLOSE BY CLIENT STACK TRACE
at com.mchange.v2.c3p0.impl.NewPooledConnection.close(NewPooledConnection.java:659)
at com.mchange.v2.c3p0.impl.NewPooledConnection.closeMaybeCheckedOut(NewPooledConnection.java:255)
at com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool$1PooledConnectionResourcePoolManager.destroyResource(C3P0PooledConnectionPool.java:621)
at com.mchange.v2.resourcepool.BasicResourcePool$1DestroyResourceTask.run(BasicResourcePool.java:1024)
at com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread.run(ThreadPoolAsynchronousRunner.java:696)

There aren't any transactions in the application made, nor are we using any TransactionManager or TransactionTemplate. I wonder if this may be some kind of bug in the frameworks used, or misconfiguration. These are the relevant frameworks used:

c3p0-0.9.5-pre8
mysql-connector-java-5.1.24
spring-core-3.2.1.RELEASE
spring-web-3.2.1.RELEASE
mchange-commons-java-0.2.7

We really appreciate any help because this is blocking our efforts to release our product.

P.S. EDIT: Here is the configuration of the DataSource:

<bean id="app_en_DataSource" class="com.mchange.v2.c3p0.ComboPooledDataSource"
    destroy-method="close">
    <property name="driverClass" value="${app_en.driverClass}" />
    <property name="jdbcUrl" value="${app_en.jdbcUrl}" />
    <property name="user" value="${app_en.user}" />
    <property name="password" value="${app_en.password}" />

    <property name="acquireIncrement" value="${app_en.acquireIncrement}"></property>
    <property name="maxIdleTime" value="${app_en.maxIdleTime}"></property>
    <property name="maxIdleTimeExcessConnections" value="${app_en.maxIdleTimeExcessConnections}"></property>
    <property name="unreturnedConnectionTimeout" value="${app_en.unreturnedConnectionTimeout}"></property>
    <property name="checkoutTimeout" value="${app_en.checkoutTimeout}"></property>
    <property name="numHelperThreads" value="${app_en.numHelperThreads}"></property>
    <property name="debugUnreturnedConnectionStackTraces" value="${app_en.debugUnreturnedConnectionStackTraces}"></property>
    <property name="initialPoolSize" value="${app_en.initialPoolSize}"></property>
    <property name="maxPoolSize" value="${app_en.maxPoolSize}"></property>
    <property name="idleConnectionTestPeriod" value="${app_en.idleConnectionTestPeriod}"></property>
    <property name="preferredTestQuery" value="${app_en.preferredTestQuery}"></property>
</bean>

And here is some code inside the application that isn't using the jdbcTemplate directly. There is nothing else that does that, everything else is jdbcTemplate.update, jdbcTemplate.query:

    Connection conn = null;
    ResultSet getItemsRS = null;

    try {
        JdbcTemplate jdbcTemplate = getJdbcTemplate(database);

        conn = jdbcTemplate.getDataSource().getConnection();

        UserItems items;

        if (!action.areItemsNew()) {

            conn.setAutoCommit(false);
            conn.setTransactionIsolation(Connection.TRANSACTION_SERIALIZABLE);

            PreparedStatement getItemsPS = conn.prepareStatement("select * from tbl_items where ownerId = ? for update",
                    ResultSet.TYPE_FORWARD_ONLY,
                    ResultSet.CONCUR_UPDATABLE);
            getItemsPS.setLong(1, userId);

            getItemsRS = getItemsPS.executeQuery();
            getItemsRS.next();

            items = new UserItemsRowMapper().mapRow(getItemsRS, getItemsRS.getRow());
        } else {
            items = new UserItems();
        }

        action.doUserItemsAction(items);

        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(baos);
        oos.writeObject(items.getItemContainers());
        oos.close();
        byte[] data = baos.toByteArray();
        Blob blob = conn.createBlob();
        blob.setBytes(1, data);

        if (!action.areItemsNew()) {
            getItemsRS.updateBlob("data", blob);
            getItemsRS.updateRow();
        } else {
            jdbcTemplate.update("insert into tbl_items(ownerId,data) values(?,?)", userId, data);
        }

    } catch (Exception e) {
        logger.error(e);
        throw new RuntimeException(e);
    } finally {
        if (!action.areItemsNew()) {
            try {
                conn.commit();
                conn.close();
            } catch (SQLException e) {
                logger.error(e);
                throw new RuntimeException(e);
            }
        }
    }

The reason for this code is that I would like to block reading/writing to the user's items before they are updated by this operation action.doUserItemsAction(items) as written above.

2

There are 2 answers

15
M. Deinum On BEST ANSWER

The code you have is potentially dangerous and has a connection leak, when checking out a connection yourself you should always close it, there might be a case it fails to close the connection.

Instead I strongly suggest using Spring to manage your transactions and connections.

First annotate your method with @Transactional(isolation=SERIALIZABLE). Next add a DataSourceTransactionManager and <tx:annotation-driven /> to your configuration. After these changes rewrite the data access code you have.

JdbcTemplate jdbcTemplate = getJdbcTemplate(database);
final UserItems items;
if (!action.areItemsNew()) {
    items = jdbcTemplate.queryForObject("select * from tbl_items where ownerId = ? for update", userId, new UserItemsRowMapper());
} else {
    items = new UserItems();
}

action.doUserItemsAction(items);

String query = !action.areItemsNew() ? "update tbl_items set data=? where ownerId=?" : "insert into tbl_items(data,ownerId) values(?,?)";

byte[] data = SerializationUtils.serialize(items.getItemContainers());
jdbcTemplate.update(query, new SqlLobValue(data), userId);

Something like that (together with the aformentioned modification should work). (This was more or less from the top of my head, so it might need some tweaking). The use of proper transaction management ensures that everything is reusing the same single connection instead of multiple connections, it also ensures that a connection is returned to the pool when finished or when something goes wrong.

I would still suggest a different datasource as C3P0 is pretty old.

12
Steve Waldman On

So, a few things.

1) The "error" messages that you see are not errors, when c3p0 logs an Exception whose message begins with DEBUG, that means you are logging at DEBUG levels and c3p0 has generated the Exception just to capture the stack trace. (c3p0 is an old library; Thread.getStackTrace() didn't exist back in the day, creating an Exception was a convenient way to capture and dump the stack.) You are just logging the expected destruction of pooled Connections due to expiration or test failures. In general, c3p0 expects to log at INFO, it will be very verbose at DEBUG levels.

2) You are not deadlocking c3p0's Thread pool. If you were, you'd see APPARENT DEADLOCK messages and then recoveries. You are experiencing a condition of pool exhaustion: clients are awaiting Connections, but the pool is at maxPoolSize and unable to acquire them.

3) The usual cause of pool exhaustion is a Connection leak: Somewhere in your application's code path, under some (probably Exceptional) circumstances, Connections are acquired and then never close()ed. You need to be very careful to ensure that Connections are reliably close()ed in finally blocks in ways that cannot be skipped due to prior failures within the finally block. In Java 7+, use try-with-resources. In older versions, use the reliable resource cleanup idiom.

4) To test whether a Connection leak is the issue, set the c3p0 config params unreturnedConnectionTimeout and debugUnreturnedConnectionStackTraces. unreturnedConnectionTimeout will work around the problem, but yuck. More importantly, debugUnreturnedConnectionStackTraces will show you where the problem is so that you can fix it, logging the stack trace that opened the unclosed Exception at INFO. (You must set unreturnedConnectionTimeout for debugUnreturnedConnectionStackTraces to have any effect; the stack trace is logged when a Connection times out as abandoned.)

5) Although 0.9.5-pre8 is probably ok, the current production version of c3p0 is c3p0-0.9.5.1 (which depends upon mchange-commons-java v.0.2.10). You might think about using that. I don't think it has anything at all to do with your issue, but still.

I hope this helps!

Update: Since you've now posted code that shows the likely Connection leak, here's a suggestion for how to fix it. Replace your finally block with:

} finally {
    if ( conn != null ) {
        try { if (!action.areItemsNew()) conn.commit(); }
        catch (SQLException e) {
           logger.error(e);
           throw new RuntimeException(e);
        } finally {
           conn.close()
        }
    }
}

Update 2: The redone finally block above will solve the Connection leak, but if I were you I'd also change the logic of this code regarding commit(). Here's a suggested revision:

Connection conn = null;
ResultSet getItemsRS = null;

try {
    JdbcTemplate jdbcTemplate = getJdbcTemplate(database);

    conn = jdbcTemplate.getDataSource().getConnection();

    UserItems items;

    if (!action.areItemsNew()) {

        conn.setAutoCommit(false);
        conn.setTransactionIsolation(Connection.TRANSACTION_SERIALIZABLE);

        PreparedStatement getItemsPS = conn.prepareStatement("select * from tbl_items where ownerId = ? for update",
                ResultSet.TYPE_FORWARD_ONLY,
                ResultSet.CONCUR_UPDATABLE);
        getItemsPS.setLong(1, userId);

        getItemsRS = getItemsPS.executeQuery();
        getItemsRS.next();

        items = new UserItemsRowMapper().mapRow(getItemsRS, getItemsRS.getRow());
    } else {
        items = new UserItems();
    }

    action.doUserItemsAction(items);

    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    ObjectOutputStream oos = new ObjectOutputStream(baos);
    oos.writeObject(items.getItemContainers());
    oos.close();
    byte[] data = baos.toByteArray();
    Blob blob = conn.createBlob();
    blob.setBytes(1, data);

    if (!action.areItemsNew()) {
        getItemsRS.updateBlob("data", blob);
        getItemsRS.updateRow();
        conn.commit();
    } else {
        jdbcTemplate.update("insert into tbl_items(ownerId,data) values(?,?)", userId, data);
    }
} catch (Exception e) {
    logger.error(e);
    throw new RuntimeException(e);
} finally {
    try { if ( conn != null ) conn.close(); }
    catch ( Exception e )
      { logger.error(e); }
}

Now commit() will get called only if (!action.areItemsNew()) AND all expected operations have succeeded. Before commit() would get called even if something went wrong. The resource cleanup code is much simpler and cleaner too. Note that in the suggested version, if there's an Exception on close() it's logged, but it is not wrapped and rethrown as a RuntimeException. Usually if there's an Exception on close(), there will have been a more informative Exception prior and that's the one you want to see. If the only place an Exception occurs is on close(), it means that all database operations have succeeded so your application can proceed correctly despite the fault. (If there are lots of Exceptions on close(), eventually you'd exhaust the Connection pool, but in practice that would happen only if there's something badly wrong with your database or network.)