I have IBM WebSphere Application Server 8.5 running with Db2 11.1, and it has worked fine for 2 years. For about a month, the application server has been hanging: the Db2 CPU drops to 0%, the application server CPU climbs above 80%, and then the server hangs. After nearly 24 hours the same problem repeats, every day. These are the relevant logs:
db2diag.log errors from today:

2020-12-09-10.03.24.732486+120 I1234525159E610 LEVEL: Error
PID : 5737 TID : 139739072030464 PROC : db2sysc 0
INSTANCE: db2inst1 NODE : 000 DB : WPJCR
APPHDL : 0-38161 APPID: ::ffff:x.42258.201209075007
UOWID : 199 ACTID: 1
AUTHID : DB2INST1 HOSTNAME: ERTUWCMDB1Az
EDUID : 1760 EDUNAME: db2agent (WPJCR) 0
FUNCTION: DB2 UDB, common communication, sqlcctest, probe:50
MESSAGE : sqlcctest RC
DATA #1 : Hexdump, 2 bytes
0x00007F1789BFCDE0 : 3600 6.

2020-12-09-10.03.24.732661+120 I1234525770E601 LEVEL: Error
PID : 5737 TID : 139739072030464 PROC : db2sysc 0
INSTANCE: db2inst1 NODE : 000 DB : WPJCR
APPHDL : 0-38161 APPID: ::ffff:x.42258.201209075007
UOWID : 199 ACTID: 1
AUTHID : DB2INST1 HOSTNAME: ERTUWCMDB1Az
EDUID : 1760 EDUNAME: db2agent (WPJCR) 0
FUNCTION: DB2 UDB, base sys utilities, sqeAgent::AgentBreathingPoint, probe:10
CALLED : DB2 UDB, common communication, sqlcctest
RETCODE : ZRC=0x00000036=54

[11/3/20 6:42:13:596 EET] 000006ad XATransaction E J2CA0027E: An exception occurred while invoking rollback on an XA Resource Adapter from DataSource jdbc/wpjcrdbDS, within transaction ID {XidImpl: formatId(57415344), gtrid_length(36), bqual_length(54),
data(000001758c648aa7000000082a775800f8c220c5f6bdab92156eae0be31e28ea7605ade8000001758c648aa7000000082a775800f8c220c5f6bdab92156eae0be31e28ea7605ade8000000010000000000000000000000000001)} : com.ibm.db2.jcc.am.XaException: [jcc][t4][2041][12326][4.25.13] Error executing XAResource.rollback(). Server returned XAER_NOTA. ERRORCODE=-4203, SQLSTATE=null
To summarize: after a while the Db2 CPU drops to 0%, the application server CPU climbs above 80%, the server hangs, and the cycle repeats roughly every 24 hours.
Is this a deadlock, or a lock timeout caused by data corruption?
Without seeing any other app server logs, the combination of you noting that the problem repeats after nearly 24 hours and the connection-related errors in the logs you posted (sqlcctest in db2diag.log, XAER_NOTA on rollback) would lead me to look for a change in your network where a connection timeout has recently been set, closing connections after 24 hours. This can happen when a router is replaced or its firmware is upgraded and the new settings differ. Does this occur at about the same time every day, and if so, does it occur as the app goes from a quiet state (like overnight) to a busy state (like the start of a workday)?

Based on your answer, it sounds like the entire connection pool is becoming "stale" overnight: the connections are not being used, and a network timeout is disconnecting them from the db server. You can try changing the WAS data source connection pool settings, setting "Minimum connections" to 0 and "Unused timeout" to perhaps 12 hours (the value is entered in seconds, so 43200). This allows the connection pool to drain overnight as server traffic quiesces. As the app load picks up in the morning, new connections will be obtained, avoiding the errors. If your "Maximum connections" setting is very large, you may experience some slowness while the pool is being refilled.
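To illustrate why "Minimum connections" matters here, below is a minimal Python sketch of an unused-timeout reaper. It is a simplified, hypothetical model (the names `PooledConnection` and `reap` are made up for illustration), not WebSphere's actual pool manager, and it assumes every connection in the pool sits idle overnight. It shows that with a minimum of 1, one connection survives the quiet period (and is exposed to a network idle timeout), while a minimum of 0 lets the pool drain completely.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class PooledConnection:
    """Hypothetical model: an idle pooled connection and when it was last returned."""
    conn_id: int
    last_used: datetime


def reap(idle, now, unused_timeout, min_connections):
    """Discard connections idle longer than unused_timeout, oldest first,
    but never shrink the pool below min_connections.

    Simplified illustration only; assumes all connections are idle and this
    is NOT WebSphere's real implementation.
    """
    kept = sorted(idle, key=lambda c: c.last_used)  # oldest idle first
    removable = len(kept) - min_connections        # how many may be discarded
    result = []
    for c in kept:
        if removable > 0 and now - c.last_used > unused_timeout:
            removable -= 1
            continue  # discarded: idle longer than the unused timeout
        result.append(c)
    return result


if __name__ == "__main__":
    evening = datetime(2020, 12, 8, 18, 0)
    next_morning = evening + timedelta(hours=14)   # 14 h of overnight idleness
    pool = [PooledConnection(i, evening) for i in range(5)]
    unused_timeout = timedelta(hours=12)           # 43200 seconds in the WAS console
    for min_conns in (1, 0):
        left = reap(pool, next_morning, unused_timeout, min_conns)
        print(f"minConnections={min_conns}: {len(left)} idle connection(s) survive the night")
```

With these inputs the first line reports 1 surviving connection and the second reports 0. The survivor in the first case is exactly the kind of connection that a network device could silently close, which is why the answer suggests a minimum of 0.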