Java VM thread dumps

I have a love-hate relationship with Java. It is important to understand that my inclinations are focused on system administration engineering and not on development. It seems that everywhere I work, Java plays a major role. There are lots of talented developers who write great Java code, but Java is not the answer to all needs. On one occasion a developer had written a Java tool to build a report. It took hours to run and sometimes would not finish in time to be useful. So I rewrote it in awk, sed, and grep, and it ran in half an hour.
Java runs in a VM, which offers limited visibility, especially into system calls and the like. There are tools that can help (jvisualvm, jmap, jstat, JTop, etc.) and of course the application logs. Typically the logs are semi-controllable through property files (most developers use log4j), but the bottom line is that the coders are ultimately in control of what gets written into the logs. There is no trace or strace for what happens inside the VM, so you must rely on the JDK tools above to troubleshoot it.
In that line of thinking, I use a script to dump threads and to capture jmap histograms and, where available, prstat output.
The jmap histograms can help the developers gain some insight into memory usage and possibly identify memory leaks.
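One way to put the histograms to work is to compare two snapshots taken some time apart and see which classes grew. What follows is only a minimal sketch and not part of my script: histo_growth is a made-up helper name, and it assumes each snapshot was saved to its own file in the usual jmap -histo layout (rank, instance count, byte count, class name).

```shell
#!/bin/bash
# histo_growth OLD NEW  (hypothetical helper, not part of the script in this post)
# Prints "bytes_grown class" for every class whose byte total is larger in NEW
# than in OLD, biggest growth first. Assumes jmap -histo rows such as:
#    1:         12345        6789012  java.lang.String
histo_growth() {
  awk '
    # first file: remember the byte count of each class
    NR == FNR { if ($1 ~ /^[0-9]+:$/) before[$4] = $3; next }
    # second file: report classes that grew (unseen classes start from zero)
    $1 ~ /^[0-9]+:$/ {
      delta = $3 - before[$4]
      if (delta > 0) printf "%.0f %s\n", delta, $4
    }
  ' "$1" "$2" | sort -rn
}
```

Something like "histo_growth early.out late.out | head" then gives a short list of suspects to hand to the developers. A class that keeps growing across several pairs of snapshots is a better leak candidate than one that grew once.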
The thread dumps are useful for seeing locked threads that may be holding up other threads.
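A quick way to spot that kind of pile-up is to list which threads are waiting on which lock. This is a rough sketch under a few assumptions: lock_waiters is a made-up helper name, the dump has been saved to a file, and it is in the HotSpot style where each thread starts with a header line holding the quoted thread name and a blocked thread has a "- waiting to lock <address>" line in its stack.

```shell
#!/bin/bash
# lock_waiters DUMPFILE  (hypothetical helper, not part of the script in this post)
# Prints "<lock address> thread name" for each thread that is waiting to lock
# a monitor, sorted so threads stuck on the same lock end up next to each other.
lock_waiters() {
  awk '
    # thread header: the name is the text between the first pair of quotes
    /^"/ { split($0, parts, "\""); name = parts[2] }
    # blocked frame: pull out the <address> of the monitor being waited on
    /- waiting to lock </ {
      if (match($0, /<[^>]+>/)) print substr($0, RSTART, RLENGTH), name
    }
  ' "$1" | sort
}
```

If many threads show up against the same address, searching the dump for "- locked" with that address should point at the thread that is holding everyone else up.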
Sometimes performance issues (on Solaris) can be analyzed by taking the ID of the busiest thread in the prstat output and comparing it with the thread identified in the thread dumps. One issue with this is that prstat prints thread IDs as decimal numbers and the thread dumps print them as hex numbers, so you have to convert one or the other to find its equivalent.
This script adds a column to prstat output that converts the decimal PID into hex to facilitate analysis.
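The conversion itself needs nothing more than printf, which works in both directions. A small sketch, where dec2hex and hex2dec are just names I made up for illustration:

```shell
#!/bin/bash
# Hypothetical helper names; printf does the real work.
dec2hex() { printf '0x%x\n' "$1"; }   # decimal -> hex with a 0x prefix
hex2dec() { printf '%d\n' "$1"; }     # expects the 0x prefix, prints decimal

dec2hex 4242     # prints 0x1092
hex2dec 0x1092   # prints 4242
```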
Here is the script. Modify it as you need; in particular, change the string searched for in the ps listing so that it uniquely identifies your Java VM.

#!/bin/bash
# script: tdumos.sh
# quick-n-dirty to dump threads and histos
# 20110131 added prstat output to a prstat.out file with time stamps
# redirect output to a file if desired
# modify for your use.
# 20110413 made this a bit more generic and added additional comments
# also added the HEX column in the prstat output

TDATE=`date +%Y%m%d-%H%M`
AWK_CMD=`type -p nawk`

TDUMP_CMD=" kill -3 "
#  and or
JHISTO_CMD=" jmap -histo "

VM_STRING="UNIQUE_STRING_HERE" #string to search for - change this as needed
#echo DEBUG: VM_STRING=$VM_STRING

# quote the search string so spaces or shell metacharacters in it do not break grep
VM_PID=`ps -ef | grep "$VM_STRING" | grep -v grep | awk '{print $2}'`
#echo DEBUG: VM_PID=$VM_PID

ITERATIONS=6
#echo DEBUG: ITERATIONS=$ITERATIONS
SLEEP=10

i=0
while [ $i -lt $ITERATIONS ]; do
  echo ==== count=$i `date +%Y%m%d-%H%M` ====
  #echo DEBUG: Now running: $TDUMP_CMD $VM_PID
  # kill -3 sends SIGQUIT, which makes the VM print a thread dump and keep running
  # comment out the kill line below until you are SURE it is going to do what you want
  # this command assumes the thread dump output will go into a log
  # you may have to redirect this if the output goes to the console
  $TDUMP_CMD $VM_PID
  #echo DEBUG: Now running: $JHISTO_CMD $VM_PID
  # comment out the next lines until you are SURE they are going to do what you want
  echo `date +%Y%m%d-%H%M`>>histos-$TDATE.out
  # this will only grab the top 65 histos and that is probably what you want
  $JHISTO_CMD $VM_PID | head -65  >>histos-$TDATE.out
  echo "We are only capturing the top 65 rows of the histo output on each run">>histos-$TDATE.out
  # or you can run the full histo dump - but this is a lot of output
  #$JHISTO_CMD $VM_PID  >>histos-$TDATE.out

  # this only runs if we are on a Sun OS platform - but it can be very helpful in analysis
  # as you can get the busiest thread from the prstat output
  # convert the thread number from decimal to hex and find it in the
  # thread dumps - that can tell you what VM thread may be a performance hog
  if [ "$(uname -s)" = "SunOS" ] ; then
     # only works in Solaris
     # this next line is needed to throw the timestamp into the output file
     echo `date +%Y%m%d-%H%M`>>prstat-$TDATE.out
     # this should just grab the top 20 rows and add a column that translates the thread PID to HEX
     echo "We are only capturing the top 20 rows of the prstat output">>prstat-$TDATE.out
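     # the HEX value is only added to rows whose first field is a number, followed by a space
     prstat -Lm 1 1 | head -20|$AWK_CMD '{if($1~/PID/){printf "HEX ";}else if($1~/^[0-9]+$/){printf "%X ",$1}print $0}'>>prstat-$TDATE.out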
     # or you can use this next one which does not add the HEX column and gets all the threads
     #prstat -Lm 1 1  >> prstat-$TDATE.out
  fi
  echo Sleeping for $SLEEP seconds...
  sleep $SLEEP
  ((i++)) # or i=$(( i + 1 ))
done
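
Once the output is collected, matching a busy thread to its stack is a lookup by ID. The sketch below is not part of the script above, and find_thread is a made-up helper name. It rests on two assumptions worth checking on your own system: that the decimal thread (LWP) ID is the number after the slash in the last column of the prstat -L output, and that the thread dump header lines carry the same ID in hex as nid=0x..., which is how HotSpot VMs print it.

```shell
#!/bin/bash
# find_thread LWPID DUMPFILE  (hypothetical helper)
# Converts a decimal thread ID to hex and prints the matching thread header
# lines from a saved thread dump. Assumes headers contain "nid=0x<hex> ".
find_thread() {
  local hex
  hex=$(printf '%x' "$1")        # e.g. 26 -> 1a
  grep "nid=0x${hex} " "$2"      # trailing space avoids matching nid=0x1a2
}
```

For example, if prstat shows java/26 at the top, "find_thread 26 threaddump.log" should print the header of the thread to look at first.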