Tuesday, 29 December 2009

Case Study : Java Web Application Production Issue (Part-III)

As we saw in the 2nd part of this series, the real problem was fetching large no. of records, which in turn forcing JVM GC to kick in to free up the heap memory.

In order to fix this problem, we decided to implement pagination in search methods of the system. Till that point of time, the pagination was there only on UI side but from database all the applicable records were being fetched !

Pagination using iBatis

We have been lucky to use iBatis as our Data Access component. It provides very transparent and easy pagination functionality via its queryForList overloaded methods of SqlMapClient.

So we specifically used,

public java.util.List queryForList(java.lang.String id,
                                   int skip,
                                   int max)
                            throws java.sql.SQLException

to pass the pagination related information to database while selecting the records for specific search criteria.

Two important parameters for pagination are calculated as,

skip : (pageNo – 1) * records_per_page

max : records_per_page

This change drastically reduced the no. of records being fetched from around 4000 to 16 (records per page). However this solution came with a minor overhead.

The overhead was to fire one query with same search criteria to compute the total no. of records that could be returned by this criteria. This count is required to calculate the total no. of pages, which will later be shown in data grid. However it is not a great overhead and database work quite fast to return the count.

This optimization reduced the need of memory by great amount and system continues to have ample heap memory for new objects.

Learning

This debugging exercise came as a great learning to me. It took around 3 days for finding the problem, understanding the reason, and fixing it.

Technically, I learnt quite a few new things,

  • Remote debugging of web application, using JPDA and Eclipse
  • jmap and jhat profiling tools.
  • pagination with iBatis

This case study will always encourage me to do large data set testing of whatever application we would develop in future. Now I believe the fact that large no. of data records may reveal some hidden bugs and such testing would ensure the removal of such bugs and ultimately lowering the maintenance hassles and in turn the cost.

Monday, 28 December 2009

Case Study : Java Web Application Production Issue (Part-II)

As I deduced in previous post, the root problem was lying somewhere in our code. The typicality of this problem was that it was happening only on our production server but not on our local development / test machines ! This made my life difficult and finally I decided to do remote debugging of the application using JPDA and Eclipse.

Remote Debugging On Tomcat + Eclipse

After searching through Google / Bing, I came across a few pages about how to setup Tomcat for remote debugging. I spent quite a few hours following the various instructions and with many failed attempts. Finally I could find out how to enable remote debugging by reading catalina.sh (Tomcat invocation / termination shellscript).

Essentially, the Java has standardized the debugging infrastructure in name of JPDA. So most of the Java based application servers and IDEs support this infrastructure to make poor developer’s life easy. In order to enable Tomcat for remote debugging, you just need to start Tomcat via following command,

$TOMCAT_HOME/bin/catalina.sh jpda start

Generally we use startup.sh to start and shutdown.sh to stop the tomcat server.

Here observe the first command line arg to catalina.sh (i.e. jpda), this will instruct catalina.sh to setup the remote debugging and bind the JPDA listener at 8000 TCP port. You can always change this 8000 port but then you might have to setup JPDA_ADDRESS environment variable.

Before starting Tomcat server, please make sure that 8000 is open in firewall for inbound connections. Now we will see what to do in Eclipse (it’s fairly simple),

  • Open the code workspace
  • Go to Run Debug Configurations
  • Create new Remote Java Application configuration, as shown below,

    image
  • Give valid project name.
  • In Connection Properties, enter your host server’s name / IP and port is 8000 (or whatever you set)
  • Your Debug button should be enabled (mine is disabled because I have given wrong Project Name :) )
  • Click on Debug to start debugging.
Replicating the issue in debugging

After having remote debugging on, I could simulate the issue by opening the application simultaneously in two different browsers (i.e. Firefox & IE). I initiated blank search request first on Firefox and then on IE. The breakpoints were kept in DAO layer to see exactly where it is getting stuck. The request thread initiated from Firefox stopped at breakpoint and after doing Step Over debugging, the request successfully fetched 4000 records and returned to web layer. Whereas the request thread initiated via IE came to DAO layer and went for fetching the records and after some time the thread just died without any trace. The debugging pointer just went away and no trace within Stack Trace / Console window. This was unique, I had never anticipated that the request thread would die like this.

After this first faceoff, I studied the logs and the record fetching stopped at roughly same point where it stopped in earlier attempts. I was going nowhere with this issue and suddenly it struck to my mind that what about using profiling tools. So after a few rounds of searching, I settled on two nifty profiling tools (bundled with JSDK), i.e. jmap (Java Memory Map) and jhat (Java Heap Analysis Tool).

Using Profiling Tools (First breakthrough…)

I will give a very short introduction of these tools. These profiling tools are forming a pair like data generator and report presentation tools. You guessed it right, jmap is a data generator tool whereas jhat is report presentation tool.

jmap dumps the entire heap memory objects into binary file with all the reference information. In other words it dumps entire memory allocated and used by JVM to store the run time data structures of objects.

jhat on other hand, reads the binary file created by jmap and works as a parsing and presentation front-end. It has inbuilt HTTP server and it generally listens to 7000 TCP port. jhat provides HTML based interface to access the information about the complex data structures of objects (Object graphs) present in dump file. It also supports Object Query Language (however I don’t know how to use that :) ).

So I restarted Tomcat and replayed the same scenario with remote debugging on. When the second request stuck, I ran jmap against the Tomcat Process and it generated the dump file. I executed following command on server,

jmap –dump:file=webappheap.dmp 42961

Here 42961 is the process id of Tomcat process. You may execute following Unix command to get the process id of Tomcat,

ps –xf | grep java

I download the file on my local machine and supplied it to jhat using following command,

jhat webappheap.dmp

image

Browser View:

image

After going through the object graphs presented, I found the objects allocated during search. After going through the entire object graph of any particular object, I roughly calculated the size of object was around 4Kb and there were around 8000 objects were being created which would roughly equal to 32 Mb of memory. This figure jolted me. It was too much of memory for just two search requests. Now the clouds of confusion were quite clear. I deduced the reason of the second thread’s termination. It was Garbage Collection which would kick-in because of very limited memory available to server and the request required atleast 16 Mb. So in order to make the future provision for free memory, JVM was invoking GC to free up the memory and this was causing the current thread to go to halt state and ultimately timeout !!

I checked the –Xmx parameter for Tomcat and it was just 64 Mb !! and my two requests were eating up 32 Mb memory (50% of entire Tomcat server).

Quick Fix

This is definitely because of our short sightedness and bit of bad programming. In order to make quick fix to this problem, the easiest way was to increase the –Xmx parameter’s value from 64Mb to at least 192 Mb. This will accommodate at least around 9 to 10 simultaneous full search requests without GC invocation.

Setting –Xmx parameter for Tomcat

In order to set the custom –Xmx parameter for Tomcat, you may set JAVA_OPTS environment variable or you can modify catalina.sh and set JAVA_OPTS like,

JAVA_OPTS=”-Xmx192m”

However this will not work if you are starting your tomcat through services in CentOS / Ubuntu. For example, if you run following command to start Tomcat server,

/sbin/service tomcat6 start

then it will neither take JAVA_OPTS environment variable into consideration nor will use catalina.sh to start the server. It uses its own application wrapper to start / stop the server. This application wrapper script would be lying at /etc/init.d/tomcat6

So for such cases you would have to change the tomcat configuration file at /etc/tomcat6/tomcat.conf. The following property needs to be changed,

# Maximum Java Heap Size (in MB)
wrapper.java.maxmemory=192

Restart the server and it will start allocating heap memory up to 192 Mb. You may execute

ps –xf –U tomcat | grep java

to check the java process executed under tomcat user. The output would look like,

3557 ?        Sl     1:24  \_ /usr/java/j2sdk/bin/java -server -Dorg.apache.jasper.compiler.Parser.STRICT_QUOTE_ESCAPING=false -Dcatalina.base=/opt/tomcat6 -Dcatalina.home=/opt/tomcat6 -Xms64m -Xmx192m -Djava.library.path=/opt/tomcat6/lib

Current Status

Currently the application is working fine even with larger search requests but we need to fix the code so that it would not exploit the server memory. The memory is a precious resource (at least in our server environment), it should not be wasted like that.

In next (and last) part, I will show what optimizations we made to code to reduce the memory load.

Friday, 25 December 2009

Case Study : Java Web Application Production Issue (Part-I)

This is the case study about the Java based Web application, which we have had developed for one of our clients. This is a business application and used by them to maintain their day to day transactions and other reference data.

Some days back they reported a weird issue wherein system was failing to respond in case of search screen. After observing what they were doing, we came to conclusion that system used to stop responding once the user made such search request, which fetched large no. of records ( > 4000 records). Although the application displays the records in paginated grid, the records are fetched upfront and kept in a Session until user closes the search screen or makes a new request.

Strangely, the large search was working fine but the subsequent search requests were failing miserably. In order to debug this problem we progressed step-by-step. Although many of our steps and assumptions were proven wrong ! it was a good learning.

Before going ahead a summary of the production environment,

Production Server Environment

Components

Description

Server Type Shared Private Hosting (VPS)
Server Config 256 MB RAM burstable to 333 MB
Server OS Cent OS 32-Bit
JRE Java SE 6
Tomcat 6.1.18
MySQL 5.1
Blame it on DBMS

The application is using MySQL as backend database and iBatis 2.3.3 as a Data access component. This combination served us a very good purpose of easy and flexible data access design.

In order to start debugging, first thing I did was to turn on the detailed logging to track down the exact failure point. After setting up the DEBUG level logging, I re-ran the scenario and carefully observed the logs.

The search requests I made were,

  • Blank search (with no criteria)
  • Like search on code field (where code like ‘%A%’)

Since the no. of records were quite high, and logging level was more granular system started taking more time (because it was logging each and every record into log file). After careful observation, I could make out that while serving second request, the system stopped fetched the records through java.sql.ResultSet. In other words, I could see that the last record, which was logged into the file was around 2000 ! On front-end the system just stopped loading the records in grid (it was an Ajax request). For information, we have used fabulous jqGrid to render data-grid on the pages.

I assumed that it should be a problem with MySQL which just gets mad for search like where code like ‘%A%’.

First thing I did, was putting up a question on StackOverflow. In a few moments people replied and the common tone of all the responses was don’t use ‘%A%’ but use ‘A%’ or ‘%A’ as criteria. Those answers made my doubt more strong on MySQL.

So I finally thought to test these SQLs using phpMyAdmin, which I have installed on the server for remote access to db. In my amazement, phpMyAdmin took around 100 - 300 millisecons to get the results ! This really shook my doubt on MySQL. Although phpMyAdmin was putting LIMIT / TOP keywords, the same query executed through MySQL prompt did not take much longer time !

I was completely wrong and stupid in underestimating the strength of MySQL therefore my next target became MySQL jdbc driver and iBatis ! So to resolve the doubt, I took the most recent versions of both of them and put them into application, but they did not solve the problem. Thus they were also out of the “criminal list”.

Now I was sure that there was something wrong with our code or some problem with the environment.

Friday, 11 December 2009

Welcome Intel Core i7 and Windows 7 !!!

If I take DATEDIFF of the current post and the last post on this blog, it would come around 488 days. I know, it has been long silence partly due to my carelessness attitude towards maintaining the blog and some work reasons too.

So I am dedicating this blog post to lovely thing called Windows 7. I just bought Dell XPS Studio 16 (with Intel Core i7 processor and ATI Radion 4670). The installed OS was Windows 7. After using it for around a day, I must say this is the best thing came out from MS camp after .NET.

Do you see some rhythm here !! Core i7 and Windows 7 :). I just don’t see, I can feel it too… Intel made fabulous advances in processor design. Just imagine this tiny thing has 4 cores and each core is capable to run two threads (Hyper-Threading tech.). See the below snap of the task manager,

image

If you are hardware junkie like me, you will feel the heart pounding at seeing 8 CPUs phew !!!

It took me around total 20 to 25 mins for all the tasks listed below,

  1. Booting PC with Windows 7 CD
  2. Starting fresh installation of Windows 7
  3. Dropping the current partition and creating 2 different partitions
  4. Installing OS
  5. Selecting localization information.
  6. Installing necessary drivers
  7. Formatting the unused space, which was 100 GB. You won’t believe, it just took 30-40 seconds to format 100 GB with NTFS !!!

One diversion here,

If you have Dell XPS Studio 1645 and you have reinstalled the OS then you might get warning for unrecognized “base system device” in device manager of Windows 7 control panel. The reason is Ricoh Card Reader. You need to install device driver from Dell Utilities CD for Ricoh Card Reader and that warning gone.

See the beauty of Windows 7,

image

Somebody said, beauty lies in person’s eye. I would say for Windows 7 it is not correct, it is really a beauty.