Sunday, August 11, 2013

TroubleShoot to Find High CPU Thread on Solaris JVM

Unexpected High CPU Usage with Weblogic Server

On Solaris


  1. Run the prstat UNIX command on the Java process.

$ prstat -L -p 9499 1 1
  PID  USERNAME   SIZE  RSS  STATE  PRI NICE   TIME    CPU  PROCESS/LWPID
  9499 usera      153M  100M sleep   58    0   0:00.22 0.6% java/8
  9499 usera      153M  100M sleep   58    0   0:00.10 0.2% java/10
  9499 usera      153M  100M sleep   58    0   0:00.11 0.1% java/9
  9499 usera      153M  100M sleep   58    0   0:00.03 0.0% java/5
  9499 usera      153M  100M sleep   58    0   0:01.01 0.0% java/1
  9499 usera      153M  100M sleep   58    0   0:00.00 0.0% java/12
  9499 usera      153M  100M sleep   58    0   0:00.00 0.0% java/11
  9499 usera      153M  100M sleep   58    0   0:00.00 0.0% java/14
  9499 usera      153M  100M sleep   58    0   0:00.00 0.0% java/13
  9499 usera      153M  100M sleep   59    0   0:00.07 0.0% java/7
  9499 usera      153M  100M sleep   59    0   0:00.00 0.0% java/6
  9499 usera      153M  100M sleep   59    0   0:00.00 0.0% java/4
  9499 usera      153M  100M sleep   58    0   0:00.11 0.0% java/3
  9499 usera      153M  100M sleep   58    0   0:00.00 0.0% java/2

  1. Run the pstack command.
Example: pstack 9499 and redirect the output to a file.
If you use regular thread libraries in Solaris (i.e., you do not have /usr/lib/lwp in the LD_LIBRARY_PATH), a LWP is not mapped to OS thread directly, so you have to take pstack from process (so check to see if you are using the alternate thread library).
The example above shows the "java/8" process at the top of prstat.
  1. Examine the pstack output for "lwp# 8".
You would find that this maps to "thread# 76" from the pstack output, as below.


-----------------  lwp# 8 / thread# 76  --------------------
 ff29d190 poll     (e2e81548, 0, bb8)
 ff24d154 select   (0, 0, 0, e2e81548, ff2bf1b4, e2e81548) + 348
 ff36b134 select   (0, bb8, 7fffffff, fe4c8000, 0, bb8) + 34
 fe0f62e4 __1cCosFsleep6FpnGThread_xl_i_ (0, bb8, fe4c8000, 1, 0, 1e2fd8) + 234
 fe23f050 JVM_Sleep (2, 0, bb8, fe4de978, fe4c8000, 1e2fd8) + 22c
 0008f7ac ???????? (e2e818d4, bb8, 1e2fd8, 984a4, 0, 109a0)
 0008c914 ???????? (e2e8194c, 1, fe4d6a80, 98564, 8, e2e81868)
 fe5324e8 __1cMStubRoutinesG_code1_ (e2e819d8, e2e81c10, a, f6cb5000, 4, e2e818f0) + 3ec
 fe0cbe94 __1cJJavaCallsLcall_helper6FpnJJavaValue_pnMmethodHandle_pnRJavaCallArguments_pnGThread__v_ (e2e81c08,fe4c8000, e2e81b54, 1e2fd8, 8e764, e2e81c10) +308
 fe1f6dbc __1cJJavaCallsMcall_virtual6FpnJJavaValue_nLKlassHandle_nMsymbolHandlee81c08, e2e81b54) + 150pnGThread__v_(f6cb64b8, e2e81b40, e2e81b44, fe4c8000, e2d8) + 60e_5pnGThread__v_ (e2e81c08, e2e81c04, e2e81c00,e2e81bf4, e2e81bec, 1e2f8000, e2e81d10, 1e, e) + 120FpnKJavaThread_pnGThread__v_ (f6817ff8, 1e2fd8, fe4c
 7fd70) + 3d8cKJavaThreadDrun6M_v_ (e2e02000, fe4d3e34, fe4c8000, 7fd70, 1e2fd8,
 fe213ec8 _start   (fe4c8000, fe625d10, 0, 5, 1, fe401000) + 20
 ff36b728 _thread_start (1e2fd8, 0, 0, 0, 0, 0) + 40

  1. Get a thread dump of the server by doing:
kill -3 <PID> on the Java Process.
  1. Since lwp# 8 mapped to thread #76, you can calculate the hex value of 76 to be 4c.
This maps to nid=0x4c in the JVM thread dump:

"Thread-6" prio=5 tid=0x1e2fd8 nid=0x4c waiting on monitor [0xe2e81000..0xe2e819d8]
        at java.lang.Thread.sleep(Native Method)
        at weblogic.management.deploy.GenericAppPoller.run(GenericAppPoller.java:139)

In this example, the thread using the most CPU is actually sleeping.  The application poller runs when the server is started in Development Mode. It runs every 30 seconds, so it is apparent that the thread dump was not captured in time to see activity in this thread.
Ideally, all three steps should be done quickly and in a row to capture the data as close together in time as possible. This can be done via a simple shell script like below:

#
# Takes an argument (PID of the WLS process) and loops three times.  This will append the prstat information to a file called dump_high_cpu.txt.  The thread dump information will either be in file where stdout was redirected or printed on the screen.
#

for loopnum in 1 2 3
do
   prstat -L -p $1 1 1 >> dump_high_cpu.txt
   pstack $1  >> dump_high_cpu.txt
   kill -3  $1
   echo "prstat, pstack, and thread dump done. #" $loopnum
   sleep 1
   echo "Done sleeping."
done

  1. Then you can then examine that thread to determine what it is doing and if there is a problem.

No comments:

Post a Comment