Java process is frozen(?) on Linux

This is my first question on S.O.
I have a very odd problem, described below.

I wrote a very simple method that writes some text to a file.
It works fine on my machine (XP, 4 CPUs, jdk1.5.0_17 [Sun]),
but it sometimes freezes on the production server
(Linux Accounting240 2.4.20-8smp, 4 CPUs, jdk1.5.0_22 [Sun]).

kill -3 doesn't work.
ctrl + \ doesn't work.

So I can't show you a thread dump.

When it freezes, it freezes completely. Adding a Thread.sleep(XX) to this method seemed to make the problem go away,
but that workaround did not hold: it happened again today even with Thread.sleep(XX) in place.

Have you seen this problem before? Do you know of a solution? Thanks. :-)

P.S.
linux distribution: Red Hat Linux 3.2.2-5
command: java -cp . T

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStreamWriter;
import java.text.SimpleDateFormat;
import java.util.Date;

public class T {
private BufferedWriter writer = null;

private void log(String log) {
    try {
        if (writer == null) {
            File logFile = new File("test.log");
            writer = new BufferedWriter(new OutputStreamWriter(
                    new FileOutputStream(logFile, true)));
        }
        writer.write(new SimpleDateFormat("[yyyy-MM-dd HH:mm:ss] ")
                .format(new Date()));
        writer.write("[" + log + "]" + "\n");
        writer.flush();

        /*
         * this is an ad hoc solution ???
         */
        //Thread.sleep(10);
    } catch (Exception e) {
        e.printStackTrace();
    } finally {         
    }

}

public void test() {
    long startTime = System.currentTimeMillis();

    while (true) {
        log(String.valueOf(System.currentTimeMillis()));
        System.out.println(System.currentTimeMillis());
        try {
            //Thread.sleep((int) (Math.random() * 100));
        } catch (Exception e) {
            break;
        }

        if (System.currentTimeMillis() - startTime > 1000 * 5) {
            break;
        }
    }

    if (writer != null) {
        try {
            writer.close();
        } catch (Exception e) {
        }
    }
    System.out.println("OK");
}

public static void main(String[] args) {
    new T().test();
}
}
There are 3 answers

0
Thorbjørn Ravn Andersen (best answer)

If the JVM does not respond to kill -3, then it is the JVM itself that is failing, not your program. That is serious and would warrant a bug report to Sun.

I noticed you are running a 2.4.20-8smp kernel. This is not a typical kernel for a current open source Linux distribution, so I would suggest you have a look at http://java.sun.com/j2se/1.5.0/system-configurations.html to see if you are deploying to a supported configuration. If not, you should let the responsible people know this!
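
To compare the server against that list, it helps to record exactly what the JVM itself reports. The following is only a minimal sketch: the class name EnvironmentReport is made up for illustration, and it uses nothing beyond standard system properties, so it should also compile on Java 5.

```java
class EnvironmentReport {

    /** Collects the JVM and OS details relevant to a supported-configuration check. */
    static String describe() {
        // Standard system property keys defined by the Java platform.
        String[] keys = {
            "java.version", "java.vendor", "java.vm.name", "java.vm.version",
            "os.name", "os.version", "os.arch"
        };
        StringBuilder sb = new StringBuilder();
        for (String key : keys) {
            sb.append(key).append(" = ").append(System.getProperty(key)).append('\n');
        }
        sb.append("availableProcessors = ")
          .append(Runtime.getRuntime().availableProcessors()).append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe());
    }
}
```

Running it on both the XP machine and the Linux server gives two reports that can be checked line by line against the supported configurations page, and attached to a bug report if one is filed.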

1
Andrzej Doyle

The first step is to get a thread dump showing where the program is when it "freezes". On Java 6 you could attach JVisualVM or JConsole without any extra setup and read the stack traces of all the threads from there. Since this is Java 5, you should be able to use the jstack command to get a thread dump (or you could enable JMX with a command-line option and attach the aforementioned tools, but I don't think it's worth it in this case). In all cases, pressing Ctrl-Break (Ctrl-\ on Linux) in the console that launched the application may also produce a thread dump, depending on the environment.

Do this several times, a few seconds apart, and then compare the thread dumps. If they're always identical, your application is probably deadlocked, and the top frame of each thread's stack will show exactly where the threads are blocking (which, when you look at that line of the code, gives a very good clue about which resources they're blocked on).

On the other hand, if the thread dumps change from one sample to the next, the program is not strictly deadlocked but is probably running in an infinite loop; perhaps one of your loop conditions is wrong, so the threads never exit, or something of that sort. Again, look at the set of thread dumps to see which area of code each thread is looping around in, which will point you to the loop condition that never evaluates to an exit.

If the issue isn't obvious from this analysis, post the dumps back here, as they will help people debug the code above.
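
If the external tools are not available, the "sample several dumps and compare" idea can also be approximated from inside the process. This is a rough sketch rather than a replacement for jstack: the class name ThreadDumpSampler and the two-second interval are illustrative choices, it relies only on Thread.getAllStackTraces() and ThreadMXBean (both present since Java 5), and if the JVM itself is hung (as the failed kill -3 suggests) this code may never get to run either.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.Map;

class ThreadDumpSampler {

    /** Renders the name, state and stack of every live thread as text. */
    static String dumpAllThreads() {
        StringBuilder sb = new StringBuilder();
        Map<Thread, StackTraceElement[]> traces = Thread.getAllStackTraces();
        for (Map.Entry<Thread, StackTraceElement[]> entry : traces.entrySet()) {
            Thread t = entry.getKey();
            sb.append('"').append(t.getName()).append("\" state=")
              .append(t.getState()).append('\n');
            for (StackTraceElement frame : entry.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    /** Number of threads the JVM reports as deadlocked on object monitors. */
    static int deadlockedThreadCount() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        long[] ids = bean.findMonitorDeadlockedThreads(); // null when there are none
        return ids == null ? 0 : ids.length;
    }

    public static void main(String[] args) throws InterruptedException {
        // Three samples, two seconds apart: identical stacks hint at blocking,
        // changing stacks hint at a loop that never exits.
        for (int i = 1; i <= 3; i++) {
            System.err.println("=== sample " + i + " ===");
            System.err.print(dumpAllThreads());
            System.err.println("deadlocked threads: " + deadlockedThreadCount());
            Thread.sleep(2000);
        }
    }
}
```

In a real application the sampling loop would run in a daemon thread started next to the code under suspicion, so the samples show that code's threads rather than only the sampler's own.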

0
lorenzog

I think this is a race condition. The while(true) loop forces the JVM on Linux to write and flush continuously, and the Linux kernel VM will try to intercept those calls and buffer the writes. This makes the process spin while waiting for the syscall to complete; at the same time, the scheduler may pick it up and assign it to another CPU (I might be wrong here, though). The new CPU then tries to acquire a lock on the resource, and everything ends in a deadlock.

This might be a sign of other issues to come. I suggest:

  • first of all, for clarity's sake: move the file creation outside of the log() method. That's what constructors are for.

  • secondly, why are you trying to write to a file like that? Are you sure your program logic makes sense in the first place? Would you not rather write your log messages to a container (say, an ArrayList) and every XX seconds dump that to disk in a separate thread? Right now you're limiting your logging ability to your disk speed: something you might want to avoid.
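
Both suggestions can be sketched together as follows. This is one possible shape, not the only one: the class name QueuedFileLogger is hypothetical, the file is opened in the constructor, and a BlockingQueue drained continuously by a single writer thread stands in for the "ArrayList dumped every XX seconds" idea, since the queue handles the hand-off between threads safely. The syntax is kept to what Java 5 accepts.

```java
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class QueuedFileLogger {
    // Sentinel compared by identity; tells the writer thread to stop.
    private static final String STOP = new String("STOP");

    private final BlockingQueue<String> queue = new LinkedBlockingQueue<String>();
    private final BufferedWriter writer;
    private final Thread worker;

    /** Opens the log file once, in append mode, and starts the writer thread. */
    QueuedFileLogger(File logFile) throws IOException {
        writer = new BufferedWriter(new OutputStreamWriter(
                new FileOutputStream(logFile, true)));
        worker = new Thread(new Runnable() {
            public void run() {
                drain();
            }
        }, "log-writer");
        worker.start();
    }

    /** Called by application threads: only enqueues, never touches the disk. */
    void log(String message) {
        String stamp = new SimpleDateFormat("[yyyy-MM-dd HH:mm:ss] ").format(new Date());
        queue.add(stamp + "[" + message + "]");
    }

    /** Runs on the writer thread: writes queued lines, flushing when the queue is empty. */
    private void drain() {
        try {
            while (true) {
                String line = queue.take();
                if (line == STOP) {
                    break;
                }
                writer.write(line);
                writer.newLine();
                if (queue.isEmpty()) {
                    writer.flush(); // one flush per batch instead of one per message
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    /** Writes everything queued so far, then closes the file. */
    void close() throws IOException, InterruptedException {
        queue.add(STOP);
        worker.join();
        writer.close();
    }
}
```

With this shape the test() loop from the question would call logger.log(...) in place of the private log(...) method and logger.close() at the end. Whether that avoids the freeze is untested; it only takes the write-and-flush out of the tight loop.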