As I deduced in the previous post, the root of the problem was lying somewhere in our code. The peculiar thing about it was that it occurred only on our production server, never on our local development / test machines! This made my life difficult, and I finally decided to remote-debug the application using JPDA and Eclipse.
Remote Debugging On Tomcat + Eclipse
After searching through Google / Bing, I came across a few pages on how to set up Tomcat for remote debugging. I spent quite a few hours following the various instructions, with many failed attempts. In the end I figured out how to enable remote debugging by reading catalina.sh (the shell script that starts and stops Tomcat).
Essentially, Java has standardized its debugging infrastructure under the name JPDA (Java Platform Debugger Architecture), and most Java based application servers and IDEs support it to make a poor developer's life easy. To enable remote debugging in Tomcat, you just need to start it with the following command,
$TOMCAT_HOME/bin/catalina.sh jpda start
Generally we use startup.sh to start and shutdown.sh to stop the Tomcat server.
Note the first command line argument to catalina.sh (i.e. jpda): it instructs catalina.sh to set up remote debugging and bind the JPDA listener to TCP port 8000. You can always change this port, but then you may have to set the JPDA_ADDRESS environment variable.
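For the curious, the jpda argument simply makes catalina.sh pass the standard JDWP agent option to the JVM. Here is a minimal sketch of what that boils down to, plus the variables catalina.sh reads if you want different settings (the variable names are from the stock catalina.sh, so verify them against your Tomcat version; MyApp is a placeholder class name):

# What "catalina.sh jpda start" effectively adds, for any Java process:
java -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=8000 MyApp

# Overriding the defaults before starting Tomcat:
export JPDA_TRANSPORT=dt_socket
export JPDA_ADDRESS=8001   # listen on 8001 instead of the default 8000
$TOMCAT_HOME/bin/catalina.sh jpda start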
Before starting the Tomcat server, please make sure port 8000 is open in the firewall for inbound connections. Now let us see what to do in Eclipse (it's fairly simple),
- Open the code workspace.
- Go to Run → Debug Configurations.
- Create a new Remote Java Application configuration.
- Give a valid project name.
- In Connection Properties, enter your host server's name / IP and set the port to 8000 (or whatever you chose).
- The Debug button should now be enabled (mine was disabled because I had given a wrong project name :) ).
- Click Debug to start debugging.
Replicating the issue while debugging
With remote debugging on, I could simulate the issue by opening the application simultaneously in two different browsers (Firefox and IE). I initiated a blank search request first in Firefox and then in IE, with breakpoints set in the DAO layer to see exactly where things got stuck. The request thread initiated from Firefox stopped at the breakpoint and, after stepping over, successfully fetched 4000 records and returned to the web layer. The request thread initiated via IE, however, reached the DAO layer, went off to fetch the records, and after some time simply died without any trace: the debug pointer just went away, with nothing in the Stack Trace or Console windows. This was unique; I had never anticipated that a request thread would die like this.
After this first face-off, I studied the logs: record fetching had stopped at roughly the same point as in the earlier attempts. I was getting nowhere with this issue, when it suddenly struck me: what about profiling tools? After a few rounds of searching, I settled on two nifty tools bundled with the JDK, i.e. jmap (Java Memory Map) and jhat (Java Heap Analysis Tool).
Using Profiling Tools (First breakthrough…)
A very short introduction to these tools: they form a pair, like a data generator and a report presentation front-end. You guessed it right, jmap is the data generator, whereas jhat is the report presenter.
jmap dumps all heap objects into a binary file along with their reference information. In other words, it dumps the entire memory allocated and used by the JVM to store the runtime object structures.
jhat, on the other hand, reads the binary file created by jmap and works as a parsing and presentation front-end. It has a built-in HTTP server, which listens on TCP port 7000 by default, and provides an HTML interface for browsing the complex object graphs present in the dump file. It also supports the Object Query Language (however, I don't know how to use that :) ).
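For completeness, an OQL query typed into jhat's web interface looks roughly like this (it is the standard example from jhat's built-in OQL help, and lists all strings of 100 characters or more):

select s from java.lang.String s where s.count >= 100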
So I restarted Tomcat and replayed the same scenario with remote debugging on. When the second request got stuck, I ran jmap against the Tomcat process and it generated the dump file. I executed the following command on the server,
jmap -dump:file=webappheap.dmp 42961
Here 42961 is the process id of the Tomcat process. You may execute the following Unix command to get it,
ps -xf | grep java
I downloaded the file to my local machine and fed it to jhat using the following command,
jhat webappheap.dmp
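One caveat I should mention: for a large dump, jhat itself may run out of memory while parsing. You can give it a bigger heap and a different listen port; both -J (pass a flag to jhat's own JVM) and -port are documented jhat options:

jhat -J-Xmx512m -port 7001 webappheap.dmp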
Going through the object graphs presented by jhat, I found the objects allocated during the search. Walking the full graph of one such object, I calculated its size to be roughly 4 KB, and around 8000 of these objects were being created, which comes to roughly 32 MB of memory. This figure jolted me: far too much memory for just two search requests. Now the clouds of confusion cleared and I could deduce the reason for the second thread's termination. It was garbage collection, which would kick in because of the very limited memory available to the server, while the request required at least 16 MB. In order to make future provision for free memory, the JVM was invoking the GC, and this was halting the current thread and ultimately causing the timeout!!
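If you want to verify such a theory on your own setup, one option is to enable GC logging for the Tomcat JVM with the standard HotSpot flags of this era, so every collection and its pause time shows up in the Tomcat logs (a sketch, assuming JAVA_OPTS is honored as described later):

export JAVA_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"
$TOMCAT_HOME/bin/catalina.sh start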
I checked the -Xmx parameter for Tomcat and it was just 64 MB!! And my two requests were eating up 32 MB of memory, 50% of the entire Tomcat heap.
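Incidentally, you can read the configured heap ceiling (and the current usage) straight off the running process with jmap's -heap option:

jmap -heap 42961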
Quick Fix
This is definitely down to our short-sightedness and a bit of bad programming. The easiest quick fix was to increase the -Xmx value from 64 MB to at least 192 MB, which would accommodate around 9 to 10 simultaneous full search requests without the GC kicking in.
Setting the -Xmx parameter for Tomcat
To set a custom -Xmx for Tomcat, you may set the JAVA_OPTS environment variable, or you can modify catalina.sh and set JAVA_OPTS there, like,
JAVA_OPTS="-Xmx192m"
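A slightly cleaner variant, if your catalina.sh supports it (the stock script sources this file on every start; check yours before relying on it), is to keep the setting out of catalina.sh in a separate file:

# $TOMCAT_HOME/bin/setenv.sh -- sourced by catalina.sh if present
export JAVA_OPTS="-Xmx192m"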
However, this will not work if you start Tomcat as a service on CentOS / Ubuntu. For example, if you run the following command to start the Tomcat server,
/sbin/service tomcat6 start
then it will neither take the JAVA_OPTS environment variable into consideration nor use catalina.sh to start the server. It uses its own application wrapper to start / stop the server; this wrapper script lives at /etc/init.d/tomcat6.
In such cases you have to change the Tomcat configuration file at /etc/tomcat6/tomcat.conf. The following property needs to be changed,
# Maximum Java Heap Size (in MB)
wrapper.java.maxmemory=192
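After changing the property, restart through the same service wrapper so the new limit is picked up:

/sbin/service tomcat6 restart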
The server will now allocate heap memory up to 192 MB. You may execute
ps -xf -U tomcat | grep java
to check the java process running under the tomcat user. The output would look like,
3557 ? Sl 1:24 \_ /usr/java/j2sdk/bin/java -server -Dorg.apache.jasper.compiler.Parser.STRICT_QUOTE_ESCAPING=false -Dcatalina.base=/opt/tomcat6 -Dcatalina.home=/opt/tomcat6 -Xms64m -Xmx192m -Djava.library.path=/opt/tomcat6/lib
Current Status
Currently the application works fine even with larger search requests, but we still need to fix the code so that it does not exhaust the server's memory. Memory is a precious resource (at least in our server environment) and should not be wasted like that.
In the next (and last) part, I will show the optimizations we made to the code to reduce the memory load.