Multiple threads stuck in native calls (Java)

Ask Time:2008-09-01T15:01:34         Author:David Resnick

I have a problem with an application running on Fedora Core 6 with JDK 1.5.0_08.

After some amount of uptime (usually some days) threads begin getting stuck in native methods.

The threads are locked in something like this:

"pool-2-thread-2571" prio=1 tid=0x08dd0b28 nid=0x319e waiting for monitor entry [0xb91fe000..0xb91ff7d4]
at java.lang.Class.getDeclaredConstructors0(Native Method)


"pool-2-thread-2547" prio=1 tid=0x75641620 nid=0x1745 waiting for monitor entry [0xbc7fe000..0xbc7ff554]
at sun.misc.Unsafe.defineClass(Native Method)

Especially puzzling to me is this one:

"HealthMonitor-10" daemon prio=1 tid=0x0868d1c0 nid=0x2b72 waiting for monitor entry [0xbe5ff000..0xbe5ff4d4]
at java.lang.Thread.dumpThreads(Native Method)
at java.lang.Thread.getStackTrace(Thread.java:1383)

The threads remain stuck until the VM is restarted.

Can anyone give me an idea as to what is happening here, what might be causing the native methods to block? The monitor entry address range at the top of each stuck thread is different. How can I figure out what is holding this monitor?

Author: David Resnick. Reproduced under the CC 4.0 BY-SA copyright license with a link to the original source and this disclaimer.
Link to original article:https://stackoverflow.com/questions/37551/multiple-threads-stuck-in-native-calls-java
VoidPointer :

My initial suspicion would be that you are experiencing some sort of class-loader-related deadlock. I imagine that class loading needs to be synchronized at some level, because class information becomes available to the entire VM, not just to the thread that initially loaded it.

The fact that the methods on top of the stack are native methods seems to be pure coincidence, since part of the class-loading mechanism happens to be implemented that way.

I would investigate further what is going on class-loading-wise. Maybe some thread uses the class loader to load a class from a network location that is slow or unavailable and thus blocks for a really long time, not yielding the monitor to other threads that want to load a class. Investigating the output when starting the JVM with -verbose:class might be one thing to try.
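If -verbose:class output is too noisy, one possible complement is to time the loads yourself. The following is only a sketch under assumptions: TimingClassLoader, slowLoads and tryLoad are hypothetical names, not JDK classes, and it is written for a current JDK rather than 1.5. It only sees loads requested through this loader directly; classes that the parent loader pulls in as dependencies do not pass through it.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical helper (not part of the JDK): records loads that take
// at least thresholdMillis, together with the requesting thread's name.
class TimingClassLoader extends ClassLoader {
    private final long thresholdMillis;
    private final List<String> slowLoads = new CopyOnWriteArrayList<>();

    TimingClassLoader(ClassLoader parent, long thresholdMillis) {
        super(parent);
        this.thresholdMillis = thresholdMillis;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        long start = System.nanoTime();
        try {
            // Normal parent-first delegation; we only measure how long it takes.
            return super.loadClass(name, resolve);
        } finally {
            // Runs for failed loads too, so a hanging or failing lookup is also recorded.
            long elapsedMillis = (System.nanoTime() - start) / 1_000_000L;
            if (elapsedMillis >= thresholdMillis) {
                slowLoads.add(name + " took " + elapsedMillis + " ms on "
                        + Thread.currentThread().getName());
            }
        }
    }

    List<String> slowLoads() {
        return slowLoads;
    }

    // Convenience wrapper: returns null instead of throwing when the class is missing.
    Class<?> tryLoad(String name) {
        try {
            return loadClass(name);
        } catch (ClassNotFoundException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Threshold 0 records every load; a real run would use something like 500 ms.
        TimingClassLoader loader = new TimingClassLoader(ClassLoader.getSystemClassLoader(), 0);
        loader.tryLoad("java.util.ArrayList");
        for (String line : loader.slowLoads()) {
            System.out.println(line);
        }
    }
}
```

A load that never returns will not show up in the list, of course, but the last entries before the hang may hint at which loader or location was slow.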
David Smith :

I was having similar problems a few months ago and found the jstack utility to be invaluable. You give it the process ID of your Java application and it dumps the entire stack of each thread in your process.

From the jstack output, I could see that one thread was trying to obtain a lock after having entered a monitor, and another thread was trying to enter the monitor after obtaining the lock. A recipe for deadlock.

I was also wondering whether your application is running into a garbage collection issue. You say it runs for a couple of days before it stops like this. How long have you let it sit in the stuck state to see whether the GC ever finishes?
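The opposite-order locking described above can also be detected from inside the VM. Here is a minimal sketch, assuming Java 6 or later, where ThreadMXBean.findDeadlockedThreads() is available (on 1.5 only findMonitorDeadlockedThreads() exists). DeadlockProbe and its method names are made up for illustration. It deliberately deadlocks two daemon threads and then asks the VM which threads are stuck.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.Arrays;
import java.util.concurrent.CountDownLatch;

// Hypothetical demo class: creates a lock-ordering deadlock and detects it.
class DeadlockProbe {

    // Sorted names of threads the VM currently reports as deadlocked (empty if none).
    static String[] deadlockedThreadNames() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = mx.findDeadlockedThreads(); // null when there is no deadlock
        if (ids == null) {
            return new String[0];
        }
        ThreadInfo[] infos = mx.getThreadInfo(ids);
        String[] names = new String[infos.length];
        for (int i = 0; i < infos.length; i++) {
            names[i] = infos[i] == null ? "?" : infos[i].getThreadName();
        }
        Arrays.sort(names);
        return names;
    }

    // Starts two daemon threads that take the same two monitors in opposite order.
    static void startDeadlock() {
        Object a = new Object();
        Object b = new Object();
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        startLocker("locker-1", a, b, bothHoldFirst);
        startLocker("locker-2", b, a, bothHoldFirst);
    }

    private static void startLocker(String name, Object first, Object second,
                                    CountDownLatch bothHoldFirst) {
        Thread t = new Thread(() -> {
            synchronized (first) {
                bothHoldFirst.countDown();
                try {
                    bothHoldFirst.await(); // wait until the other thread holds its first lock
                } catch (InterruptedException e) {
                    return;
                }
                synchronized (second) {
                    // never reached: the other thread holds this monitor
                }
            }
        }, name);
        t.setDaemon(true); // do not keep the JVM alive because of the stuck threads
        t.start();
    }

    // Polls until a deadlock is reported or the timeout expires.
    static String[] awaitDeadlock(long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        String[] names = deadlockedThreadNames();
        while (names.length == 0 && System.currentTimeMillis() < deadline) {
            try {
                Thread.sleep(20);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
            names = deadlockedThreadNames();
        }
        return names;
    }

    public static void main(String[] args) {
        startDeadlock();
        System.out.println(Arrays.toString(awaitDeadlock(5000)));
    }
}
```

A health-monitor thread could call something like deadlockedThreadNames() periodically and log the result. Note that this only finds deadlocks on Java-level monitors and java.util.concurrent locks; it may report nothing if the threads are blocked on a lock internal to the VM.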
VoidPointer :

Can you find out which thread is actually synchronizing on the monitor on which the native method is waiting?
At least the thread dump you get from the VM when you send it a SIGQUIT (kill -3) should show this information, as in

"Thread-0" prio=5 tid=0x0100b060 nid=0x84c000 waiting for monitor entry [0xb0c8a000..0xb0c8ad90]
 at Deadlock$1.run(Deadlock.java:8)
 - waiting to lock <0x255e5b38> (a java.lang.Object)
...
"main" prio=5 tid=0x01001350 nid=0xb0801000 waiting on condition [0xb07ff000..0xb0800148]
 at java.lang.Thread.sleep(Native Method)
 at Deadlock.main(Deadlock.java:21)
 - locked <0x255e5b38> (a java.lang.Object)

In the dumps you've posted so far, I can't see any thread that is actually waiting to lock a specific monitor...
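The same "waiting to lock" / "locked" relationship can be queried programmatically through java.lang.management, which has been available since Java 5. The following is a rough sketch, not a definitive tool: MonitorOwnerReport, lockOwnerOf, blockedReport and demo are names invented here, and the code uses lambdas, so it needs a newer JDK than the one in the question. For threads blocked on a VM-internal lock, as the dumps in the question may indicate, the owner may simply come back as null.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

// Hypothetical helper: reports which thread owns the monitor a blocked thread wants.
class MonitorOwnerReport {

    // Name of the thread owning the lock that 't' is blocked on, or null if unknown.
    static String lockOwnerOf(Thread t) {
        ThreadInfo info = ManagementFactory.getThreadMXBean().getThreadInfo(t.getId());
        return info == null ? null : info.getLockOwnerName();
    }

    // One line per BLOCKED thread: who is waiting, on what, and who holds it.
    static String blockedReport() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();
        for (ThreadInfo info : mx.getThreadInfo(mx.getAllThreadIds(), 0)) {
            if (info == null || info.getThreadState() != Thread.State.BLOCKED) {
                continue; // thread already died, or it is not waiting for a monitor
            }
            sb.append(info.getThreadName())
              .append(" waiting to lock ").append(info.getLockName())
              .append(" held by ").append(info.getLockOwnerName())
              .append('\n');
        }
        return sb.toString();
    }

    // Small self-contained scenario: "holder" keeps a monitor, "waiter" blocks on it.
    static String demo() {
        Object lock = new Object();
        CountDownLatch held = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread holder = new Thread(() -> {
            synchronized (lock) {
                held.countDown();
                try {
                    release.await(); // keep the monitor until the demo is done
                } catch (InterruptedException ignored) {
                    // fall through and release the monitor
                }
            }
        }, "holder");
        Thread waiter = new Thread(() -> {
            synchronized (lock) {
                // entered only after the holder lets go
            }
        }, "waiter");
        holder.setDaemon(true);
        waiter.setDaemon(true);
        holder.start();
        try {
            held.await();
            waiter.start();
            while (waiter.getState() != Thread.State.BLOCKED) {
                Thread.sleep(10);
            }
            return lockOwnerOf(waiter);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        } finally {
            release.countDown(); // let both threads finish
        }
    }

    public static void main(String[] args) {
        System.out.println("waiter is blocked by: " + demo());
        System.out.print(blockedReport());
    }
}
```

Calling blockedReport() from a monitoring thread while the application is stuck gives roughly the information the kill -3 dump shows, in a form you can log.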
John Smithers :

Maybe you should use another JDK version.
For your "puzzling one", there is a bug entry for 1.5.0_08. A memory leak is reported (I do not know whether it is related to your problem):
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6469701

You could also get the source code and look at what happens at line 1383. On the other hand, it could just be the stack dump being taken after the original error occurred.