We have 2 production environments say XX and XY. Both are on Oracle 11g Release 11.2.0.2.0. Both the environments have a queue table and a job queue to en-queue/de-queue jobs and process them. The application server is Apache Tomcat Version 6.0.32.
The problem I am facing is that in XX environment, the Oracle scheduler gets "hanged" or "stuck" almost daily at 4.30pm local time, whereas in XY environment it works perfectly fine. The entries for the scheduled jobs start accumulating in the queue table causing alerts to trigger off. Someone then has to re-start the application server after which the whole thing works perfectly fine and the pending jobs are processed without any issues. The DBA always says no issues in the database, but I am not really convinced.
Searching on google led me to this link : DBMS_SCHEDULER jobs stuck after upgrade to 11.2 And based on the explanation I checked and compared the DB parameter job_queue_processes. It has got the same values/details, so I am at a dead-end here.
Once while re-starting the application I got the below exception on the output/display but not in the logs:
Mar 18, 2014 11:44:19 AM org.apache.catalina.startup.Catalina stopServer
SEVERE: Catalina.stop:
java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213)
at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
at java.net.Socket.connect(Socket.java:529)
at java.net.Socket.connect(Socket.java:478)
at java.net.Socket.<init>(Socket.java:375)
at java.net.Socket.<init>(Socket.java:189)
at org.apache.catalina.startup.Catalina.stopServer(Catalina.java:422)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.catalina.startup.Bootstrap.stopServer(Bootstrap.java:338)
at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:416)
So now I am wondering,
- Is it the case that while jobs are getting enqueued, the connection to DB is getting lost in between, causing the Oracle scheduler to "hang" in between?
- Is there any other database parameter I need to verify/validate? Is this is a known bug in Oracle? If yes is it documented anywhere? Sorry I may have missed out reading.
- Is there any other information that I need to ask the DBA to check?
Thanks in advance for any help.
EDIT: Link for the output of the query => select * from DBA_SCHEDULER_JOBS
is given in the excel at the link: Job_output.xlsx
Thanks!
Update as on 20/Mar/2014
The problem is not resolved as yet. Hence some more output from queries is posted below in the question.
SQL> select queue,msg_state,expiration_reason,count(*)
2 from AQ$JOB_QTBL
3 group by queue,msg_state,expiration_reason;
QUEUE MSG_STATE EXPIRATION_REASON COUNT(*)
------------------------------ ---------------- ------------------ ----------
JOB_Q READY 4
SQL> select owner, name, queue_type, max_retries, retry_delay
2 from dba_queues
3 where queue_table = 'JOB_QTBL'
4 and owner = 'ENV_XX';
OWNER NAME QUEUE_TYPE MAX_RETRIES RETRY_DELAY
------------------------------ ------------------------------ -------------------- ----------- -----------
ENV_XX AQ$_JOB_QTBL_E EXCEPTION_QUEUE 0 0
ENV_XX JOB_Q NORMAL_QUEUE 5 0
SQL>