Oracle Job Scheduler "Stuck"

5.3k views Asked by At

We have 2 production environments say XX and XY. Both are on Oracle 11g Release 11.2.0.2.0. Both the environments have a queue table and a job queue to en-queue/de-queue jobs and process them. The application server is Apache Tomcat Version 6.0.32.

The problem I am facing is that in XX environment, the Oracle scheduler gets "hanged" or "stuck" almost daily at 4.30pm local time, whereas in XY environment it works perfectly fine. The entries for the scheduled jobs start accumulating in the queue table causing alerts to trigger off. Someone then has to re-start the application server after which the whole thing works perfectly fine and the pending jobs are processed without any issues. The DBA always says no issues in the database, but I am not really convinced.

Searching on google led me to this link : DBMS_SCHEDULER jobs stuck after upgrade to 11.2 And based on the explanation I checked and compared the DB parameter job_queue_processes. It has got the same values/details, so I am at a dead-end here.

Values for job_queue_processes

Once while re-starting the application I got the below exception on the output/display but not in the logs:

Mar 18, 2014 11:44:19 AM org.apache.catalina.startup.Catalina stopServer
SEVERE: Catalina.stop:
java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213)
at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
at java.net.Socket.connect(Socket.java:529)
at java.net.Socket.connect(Socket.java:478)
at java.net.Socket.<init>(Socket.java:375)
at java.net.Socket.<init>(Socket.java:189)
at org.apache.catalina.startup.Catalina.stopServer(Catalina.java:422)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.catalina.startup.Bootstrap.stopServer(Bootstrap.java:338)
at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:416)

So now I am wondering,

  • Is it the case that while jobs are getting enqueued, the connection to DB is getting lost in between, causing the Oracle scheduler to "hang" in between?
  • Is there any other database parameter I need to verify/validate? Is this is a known bug in Oracle? If yes is it documented anywhere? Sorry I may have missed out reading.
  • Is there any other information that I need to ask the DBA to check?

Thanks in advance for any help.

EDIT: Link for the output of the query => select * from DBA_SCHEDULER_JOBS is given in the excel at the link: Job_output.xlsx

Thanks!

Update as on 20/Mar/2014

The problem is not resolved as yet. Hence some more output from queries is posted below in the question.

SQL> select queue,msg_state,expiration_reason,count(*)
  2  from AQ$JOB_QTBL
  3  group by queue,msg_state,expiration_reason;

QUEUE                          MSG_STATE        EXPIRATION_REASON    COUNT(*)
------------------------------ ---------------- ------------------ ----------
JOB_Q                          READY                                        4

SQL> select owner, name, queue_type, max_retries, retry_delay
  2    from dba_queues
  3   where queue_table = 'JOB_QTBL'
  4   and owner = 'ENV_XX';

OWNER                          NAME                           QUEUE_TYPE         MAX_RETRIES RETRY_DELAY
------------------------------ ------------------------------ -------------------- ----------- -----------
ENV_XX                        AQ$_JOB_QTBL_E                 EXCEPTION_QUEUE                0           0
ENV_XX                        JOB_Q                          NORMAL_QUEUE                   5           0

SQL> 
0

There are 0 answers