Java Service Wrapper parent process is hung


I'm running the Tanuki Wrapper (and have been for a very, very, very long time). In production it has been working great, but over the past few weeks I've been getting reports that the wrapper process (the C code) is hung and won't die, which is causing production issues.

When I'm alerted and take a look, this is what I see:

1) The child Java process was killed with SIGKILL (9) a few hours earlier:

STATUS | wrapper | 2016/02/08 03:49:20 | JVM received a signal SIGKILL (9).

2) Then I see that a wrapper.sh stop was issued by my custom-built internal watcher process to reset it, but that stop enters an infinite loop in this part of the script:

stopit() {
    [snip]
            kill $pid  
            [snip]

        # MY NOTE: It never gets out of this loop; the kill doesn't work.

        # We can not predict how long it will take for the wrapper to
        #  actually stop as it depends on settings in wrapper.conf.
        #  Loop until it does.
        savepid=$pid
        CNT=0
        TOTCNT=0
        while [ "X$pid" != "X" ]
        do
            # Show a waiting message every 5 seconds.
            if [ "$CNT" -lt "5" ]
            then
                CNT=`expr $CNT + 1`
            else
                eval echo `gettext 'Waiting for $APP_LONG_NAME to exit...'`
                CNT=0
            fi
            TOTCNT=`expr $TOTCNT + 1`

            sleep 1

            testpid
        done

        [snip]
    fi
}

3) I then log onto the box, find the wrapper process PID (remember, the JVM is long dead), issue a direct kill $pid, and wait... nothing. (Possibly relevant code?)

4) Finally, I give up and issue kill -9 $pid. That kills it, everything cleans up, and the service comes back up.
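Steps 2 to 4 amount to a manual "SIGTERM, wait, then SIGKILL" policy. Because the stopit() loop above deliberately waits until the Wrapper exits, one option is for the watcher to impose its own time limit. This is a minimal sketch, assuming Linux and a watcher running as the same user as the Wrapper (or root); `stop_with_timeout`, `is_alive`, and the 30-second default are hypothetical names and values, not part of wrapper.sh:

```sh
#!/bin/sh
# Hypothetical watcher helpers (NOT part of wrapper.sh): stop a process with
# SIGTERM, wait a bounded time, then escalate to SIGKILL. Linux-only, because
# is_alive reads /proc to tell a zombie from a live process.

# is_alive PID -> exit 0 if PID exists and is not a zombie
is_alive() {
    kill -0 "$1" 2>/dev/null || return 1
    # kill -0 also succeeds on an unreaped zombie, so check the state letter
    _state=$(awk '/^State:/ {print $2}' "/proc/$1/status" 2>/dev/null)
    case $_state in
        Z | "") return 1 ;;
    esac
    return 0
}

# stop_with_timeout PID [SECONDS]
#   returns 0 if the process exited after SIGTERM, 1 if SIGKILL was needed
stop_with_timeout() {
    _pid=$1
    _limit=${2:-30}   # assumed default; make it longer than wrapper.conf's own timeouts
    kill -TERM "$_pid" 2>/dev/null || return 0   # already gone (or not ours to signal)
    _waited=0
    while is_alive "$_pid"; do
        if [ "$_waited" -ge "$_limit" ]; then
            echo "PID $_pid still alive after ${_limit}s; sending SIGKILL" >&2
            kill -KILL "$_pid" 2>/dev/null
            return 1
        fi
        sleep 1
        _waited=$((_waited + 1))
    done
    return 0
}
```

This only papers over the hang rather than explaining it, and SIGKILL skips any cleanup the Wrapper would normally do, so it is a safety net for the watcher, not a fix.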

QUESTIONS:

How do I troubleshoot an app where kill $pid (SIGTERM/15) does not work? This has worked great for YEARS and still does for many other processes, but on just a few it is failing.
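SIGKILL cannot be caught, blocked, or ignored, which is why kill -9 works when a plain kill does not. SIGTERM can be all three, so a first check is how the hung Wrapper is disposed toward it. This is a minimal sketch, assuming Linux; `signal_disposition` is a hypothetical helper that decodes the signal masks in /proc/PID/status:

```sh
#!/bin/sh
# Hypothetical diagnostic (Linux only): show how a process treats one signal
# by decoding the hex signal masks in /proc/PID/status. In each mask,
# bit (N-1) stands for signal N. Only signals 1-32 are handled here.
#   SigPnd / ShdPnd : pending (per thread / whole process)
#   SigBlk          : blocked (main thread's mask)
#   SigIgn          : ignored
#   SigCgt          : caught by an installed handler
signal_disposition() {
    _pid=$1
    _sig=${2:-15}                       # default: SIGTERM
    _status=/proc/$_pid/status
    [ -r "$_status" ] || { echo "no such process: $_pid" >&2; return 1; }
    for _field in SigPnd ShdPnd SigBlk SigIgn SigCgt; do
        _mask=$(awk -v f="$_field:" '$1 == f {print $2}' "$_status")
        [ -n "$_mask" ] || continue
        # Keep only the low 8 hex digits (signals 1-32) so the value
        # fits comfortably in shell arithmetic.
        _low=${_mask#"${_mask%????????}"}
        echo "$_field $(( (0x$_low >> (_sig - 1)) & 1 ))"
    done
}
```

Run it as `signal_disposition <wrapper-pid> 15`. As a rough reading, not a diagnosis: `SigIgn 1` means SIGTERM is ignored outright; `SigBlk 1` together with `ShdPnd 1` means the signal arrived but is stuck pending because it is blocked (other threads' masks are in /proc/PID/task/TID/status); `SigCgt 1` with nothing pending suggests a handler ran, so the Wrapper's shutdown path itself is the more likely place it is stuck.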

Of course, most of the questions and documentation on Tanuki are about how to manipulate or interrogate the child JVM, but the problem I'm seeing is in what I assume is the C code, and I'm not sure how to interrogate the hung PID to make it give up its secrets. Maybe something in /proc/$pid can tell me what it is hung on?
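/proc can answer part of that. This is a minimal sketch, assuming Linux (some entries, such as the kernel stack, are readable only by root); `inspect_hung_pid` is a hypothetical helper that prints the process state, the kernel wait channel, each thread's state, the kernel stack, and the open file descriptors:

```sh
#!/bin/sh
# Hypothetical helper (Linux only): dump what the kernel knows about where a
# possibly hung process is waiting. Run as root to see /proc/PID/stack.
inspect_hung_pid() {
    _pid=$1
    _p=/proc/$_pid
    [ -d "$_p" ] || { echo "no such process: $_pid" >&2; return 1; }
    echo "== Name / state / thread count"
    grep -E '^(Name|State|Threads):' "$_p/status"
    echo "== Kernel wait channel (wchan)"
    printf '%s\n' "$(cat "$_p/wchan" 2>/dev/null)"
    echo "== Per-thread state (TID state)"
    for _t in "$_p"/task/*; do
        printf '%s %s\n' "${_t##*/}" "$(awk '/^State:/ {print $2, $3}' "$_t/status" 2>/dev/null)"
    done
    echo "== Kernel stack (root only)"
    cat "$_p/stack" 2>/dev/null || echo "(not readable)"
    echo "== Open file descriptors"
    ls -l "$_p/fd" 2>/dev/null | tail -n +2
}
```

A thread in state D (uninterruptible sleep) would not react even to SIGKILL, so since kill -9 worked here, a user-space wait (state S) is the more likely picture. To see where in user space, `gdb -p <pid> -batch -ex 'thread apply all bt'` prints a backtrace of every thread (how useful it is depends on the symbols in the Wrapper binary), and `strace -p <pid>` shows whether the process is spinning or parked in a system call.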

Help me, Obi-Wan Kenobi, you're my only hope...

Answer by Leif Mortenson:

Leif from Tanuki Software

The most likely cause of the JVM being unexpectedly killed with a SIGKILL is that the OS ran out of resources and killed the process. When this happens, Java is often the biggest user of memory, so it gets nailed. Please check the syslog, as there should be an entry at the same time if that is the cause.
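A minimal sketch of that syslog check, assuming a Linux kernel whose OOM killer logs lines containing "Out of memory" and/or "Killed process"; `find_oom_kills` and its default log paths are assumptions that vary by distribution:

```sh
#!/bin/sh
# Hypothetical helper: print kernel OOM-killer entries from the given log
# files, or from a few common (assumed) default locations if none are given.
find_oom_kills() {
    _pattern='out of memory|killed process'
    if [ "$#" -eq 0 ]; then
        # Assumed defaults; adjust for your distribution.
        set -- /var/log/syslog /var/log/messages /var/log/kern.log
    fi
    for _log in "$@"; do
        if [ -r "$_log" ]; then
            grep -iE "$_pattern" "$_log"
        fi
    done
    return 0
}
```

Compare the timestamps it prints with the Wrapper's "JVM received a signal SIGKILL (9)" line. On systems without those files, `dmesg` or `journalctl -k` output can be piped through the same `grep -iE 'out of memory|killed process'`.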

Even if that happens, however, the Wrapper should handle it correctly and restart your JVM. It sounds like the Wrapper has gotten itself into an unexpected state and is not responding to normal signals itself. Which version of the Wrapper are you using? I double-checked the release notes and don't think we have seen this exact problem before: http://wrapper.tanukisoftware.com/doc/english/release-notes.html

Please let me know what you find in your syslog at the time the JVM was killed.