Performance tuning case study

Recently I had the chance to help one of our clients (company XXX) analyze and detect performance issues. Below are the main issues raised during the consultation session and the recommended course of action for each problem. I am presenting them here for the benefit of other employees who may need to do the same type of consulting.

General background: company XXX reported several problems with their product, mainly related to high memory consumption and thread exhaustion, at the web site of one of their largest clients.

Hardware: company XXX runs a 64-bit machine with eight cores and 32GB of physical RAM. (I have not collected data about the actual OS swap configuration; it might or might not be optimal.)

Software: the OS is Red Hat Linux, the JVM is 1.5.10, the HTTP server is Apache 2, the application server is JBoss 4.01, the servlet container is Tomcat, and the database is mostly an in-memory Oracle server.

Reported issues: There seemed to be an issue with this version of the JDK's jmap command (for generating memory dumps).
Findings: jmap hung while inspecting the JVM, leaving cryptic output that could not be interpreted.
Action items: consider upgrading to the latest JDK 5 release (the version I have was tested on a dual-core Linux machine with no problems).
Alternative courses of action:
Run the UNIX command kill -3 <pid>. Note that on its own this prints a thread dump to the JVM's standard output rather than a memory dump.
OR run the JVM with the HPROF agent on (java -agentlib:hprof) and then kill the process from the command line; this writes the memory dump to the local directory (text format by default). Note that this considerably slows down the JVM process and is not suitable for production use.
OR run jstat for several minutes to monitor the JVM heap and GC trends on the fly (an excellent option that does not slow down the JVM).
The heap dumps generated this way should be viewable with the HAT utility or the VisualVM utility (shipped with JDK 1.6).
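When jmap is unusable, the heap and per-generation figures that jstat reports can also be read from inside the JVM through the java.lang.management API (part of the JDK since Java 5). The following is a minimal sketch only: the class name HeapPools is hypothetical, and it reports on the JVM it runs in, so in practice it would have to be called from within the application being inspected.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

// Hypothetical helper, not part of company XXX's code base.
final class HeapPools {
    private static final long MB = 1024L * 1024L;

    /** Total heap currently in use, in bytes. */
    static long usedHeapBytes() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed();
    }

    /** One line per heap pool (young, survivor, old, ...) with used and max sizes in MB. */
    static String describeHeapPools() {
        StringBuilder out = new StringBuilder();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // skip non-heap pools such as the code cache
            }
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            String max = usage.getMax() < 0 ? "undefined" : (usage.getMax() / MB) + "MB";
            out.append(pool.getName())
               .append(": used=").append(usage.getUsed() / MB).append("MB")
               .append(" max=").append(max).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) throws InterruptedException {
        // Print five samples, one second apart, to see the trend.
        for (int i = 0; i < 5; i++) {
            System.out.println("heap used: " + usedHeapBytes() / MB + "MB");
            System.out.print(describeHeapPools());
            Thread.sleep(1000);
        }
    }
}
```

The pool names printed depend on the garbage collector in use, so they should be read as labels rather than matched against fixed strings.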
Server configuration: 
Apache is currently the HTTP front end and uses mod_jk to connect to JBoss.
JBoss also connects to an RMI server slice and hence uses distributed garbage collection. Currently there are 4 JVMs running on this machine (the list may be viewed at any time by running the command "jps"). They include:
JBoss instance
RMI instance (or slice ...)
Unknown Java application which belongs to company XXX.
Reported issues: There were issues with the maximum number of threads running on both Tomcat and Apache. One issue was resolved (according to company XXX it had to do with blocking on a thread), but there are still problems with the Apache threads. They also reported that a burst of HTTP requests from a single IP address resulted in a server crash (this was logged in the Apache error log). In addition, they reported seeing a very large Java heap size when inspecting the process with the Linux top command. Another issue raised was the use of the standard Java HashSet for loading 4 million records into memory on the fly. Company XXX suspected that there might be a memory leak in the program.
Findings:
Apache: there is no distinction between dynamic and static requests, no cache is configured (such as mod_cache), and the thread and process configurations are suboptimal for a multi-core CPU.
JBoss: JBoss uses Hibernate to connect to the Oracle DB, however NO entity cache and NO collections cache are used AFAIK. In addition, the JDBC connection pool is used with the default pool size (20).
The JVM parameters are suboptimal: a max heap size of 6GB is set, but no initial heap size; a maximum for the permanent generation is set, but no initial permanent generation size. In addition, there is no setting for the new generation and the default is used. On the endorsed classpath, there are JARs written by company XXX which might use JNI. Similar settings are used for the RMI slice machine. On the JBoss instance, garbage collection is triggered every minute, resulting in a full garbage collection each time.
HashSet issue: with a modest 512MB heap and 4 million records, the JVM crashed with an OOM error.
Action items:
Apache: I would start by tweaking the thread and process configurations of Apache and Tomcat in tandem.
There is also a distinction between the Apache prefork and Apache worker modules which has to be taken into consideration. I would suggest testing the current state of Apache using the outstanding ab tool (ApacheBench, which ships with Apache) in an isolated environment, and then tweaking until optimal performance is reached. The separation of static and dynamic content and the cache configuration are IMHO minor at the moment.
JBoss: Enable the entity and collection caches using EhCache (non-distributed cache). This has to be a long-term action item. Increase the number of JDBC connections to 50 or 100; I suspect 20 is very low for such a high-end machine. Note however that each JDBC connection will consume approximately 1MB of memory.
As a starting point, alter the JVM parameters as follows:
-Xms3600m -Xmx3600m -XX:PermSize=512m -XX:MaxPermSize=512m -XX:NewSize=768m -XX:MaxNewSize=768m
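The point of the flags above is that each initial size equals its maximum, so the JVM never has to resize the heap, the permanent generation, or the new generation at run time. One way to confirm that a running JVM actually received such pairs is to read its input arguments through RuntimeMXBean. This is a minimal sketch under assumptions: the class name JvmFlagCheck is hypothetical, and the comparison is purely textual (it treats "5g" and "5120m" as different).

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper for checking that initial and maximum sizes match.
final class JvmFlagCheck {
    /** Returns the text after the given prefix in the first matching argument, or null. */
    static String flagValue(List<String> args, String prefix) {
        for (String arg : args) {
            if (arg.startsWith(prefix)) {
                return arg.substring(prefix.length());
            }
        }
        return null;
    }

    /** True when both flags are present and carry the same textual value. */
    static boolean pinned(List<String> args, String initialPrefix, String maxPrefix) {
        String initial = flagValue(args, initialPrefix);
        String max = flagValue(args, maxPrefix);
        return initial != null && initial.equalsIgnoreCase(max);
    }

    public static void main(String[] unused) {
        // Arguments passed to this JVM on its command line.
        List<String> args = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("heap pinned:     " + pinned(args, "-Xms", "-Xmx"));
        System.out.println("perm gen pinned: " + pinned(args, "-XX:PermSize=", "-XX:MaxPermSize="));
        System.out.println("new gen pinned:  " + pinned(args, "-XX:NewSize=", "-XX:MaxNewSize="));
        System.out.println("max heap (MB):   " + Runtime.getRuntime().maxMemory() / (1024 * 1024));
    }
}
```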
With respect to the company XXX JARs in the endorsed directory (I think there are 3 such JARs), make sure they are NOT using JNI. To avoid the full garbage collections resulting from the RMI registry, increase the GC interval to 10 or 15 minutes and test again. Company XXX reported they had previously done that on the RMI slice and had benefited performance-wise.
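The notes do not record which setting was used to change the interval. On Sun JVMs, the RMI distributed GC interval is normally controlled by the system properties sun.rmi.dgc.client.gcInterval and sun.rmi.dgc.server.gcInterval, both expressed in milliseconds, so the sketch below assumes those are the ones meant. The class name DgcInterval is hypothetical; it only converts minutes into the matching command-line flags.

```java
// Hypothetical helper: builds the -D flags for a given RMI DGC interval.
final class DgcInterval {
    /** Returns the client and server DGC interval flags for the given number of minutes. */
    static String flags(int minutes) {
        long millis = minutes * 60L * 1000L; // the properties expect milliseconds
        return "-Dsun.rmi.dgc.client.gcInterval=" + millis
             + " -Dsun.rmi.dgc.server.gcInterval=" + millis;
    }

    public static void main(String[] args) {
        System.out.println(flags(10));
        System.out.println(flags(15));
    }
}
```

The resulting flags would be appended to the JVM options of both the JBoss instance and the RMI slice, then tested as described above.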

HashSet issue:
Using a small Java program, I tested the regular HashSet, a generic HashSet<String>, a generic java.util.concurrent.ConcurrentHashMap and a simple generic ArrayList<String>. My conclusion is that there is no memory issue at all: 4 million String records loaded into memory SHOULD generate an OOM error on a small heap.
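The original test program is not reproduced here. A minimal sketch of this kind of measurement could look as follows, assuming short synthetic records of the form "record-N" (the real record contents are not known) and a hypothetical class name HashSetFootprint. The heap figures come from Runtime and are only approximate, since System.gc() is a hint rather than a guarantee.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical reproduction of the HashSet experiment, not the original program.
final class HashSetFootprint {
    /** Fills a HashSet with the given number of distinct synthetic records. */
    static Set<String> fill(int count) {
        Set<String> set = new HashSet<String>();
        for (int i = 0; i < count; i++) {
            set.add("record-" + i); // assumed record shape
        }
        return set;
    }

    /** Approximate heap currently in use, in bytes. */
    static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        // Record count comes from the command line; the default is kept small.
        int count = args.length > 0 ? Integer.parseInt(args[0]) : 100000;
        System.gc();
        long before = usedHeap();
        Set<String> set = fill(count);
        System.gc();
        long after = usedHeap();
        long delta = after - before;
        System.out.println("records:         " + set.size());
        System.out.println("heap growth:     " + delta / (1024 * 1024) + "MB");
        System.out.println("bytes per entry: " + delta / Math.max(1, set.size()));
    }
}
```

Running it as java -Xmx512m HashSetFootprint 4000000 approximates the reported scenario. Running a smaller count first gives a bytes-per-entry figure from which the heap needed for 4 million records can be extrapolated.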

Final summary:
Inspection of the JVM using jstat, JProfiler and jconsole proved that the JVM parameters were suboptimal, with the process spending a lot of time on GC. The statistics showed that there wasn't much space left in the "young generation" even with an allocation of 756MB. In addition, even with only 15 connected users, the "old generation" reached its maximum (5GB) within 5 minutes. The application seemed to consume as much memory as it could, without however producing an OOM error. JProfiler showed several places in the code allocating 600K char[], however company XXX was not willing to inspect that further. The following JVM parameters are the ones in use after several hours of tweaking and testing:

-XX:PermSize=512m -Xms5g -Xmx5g -XX:NewSize=756m -XX:MaxNewSize=756m -XX:SurvivorRatio=6 -XX:GCTimeRatio=2 -XX:ParallelGCThreads=8 -XX:+UseParNewGC -XX:MaxGCPauseMillis=2000 -XX:+DisableExplicitGC
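To put a number on "a lot of time on GC", the accumulated collection time can be compared with JVM uptime through the GarbageCollectorMXBean API. As documented for the throughput collector, -XX:GCTimeRatio=N expresses a goal of spending at most 1/(1+N) of total time in GC, so the value 2 above corresponds to roughly one third. The sketch below is illustrative only: the class name GcOverhead is hypothetical, and it does not check whether the selected collector honors that goal.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper comparing observed GC time with the GCTimeRatio goal.
final class GcOverhead {
    /** Fraction of total time that -XX:GCTimeRatio=N allows for GC: 1 / (1 + N). */
    static double targetGcFraction(int gcTimeRatio) {
        return 1.0 / (1 + gcTimeRatio);
    }

    /** Accumulated GC time of all collectors divided by JVM uptime. */
    static double observedGcFraction() {
        long gcMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long time = gc.getCollectionTime(); // -1 when the collector does not report it
            if (time > 0) {
                gcMillis += time;
            }
        }
        long uptimeMillis = ManagementFactory.getRuntimeMXBean().getUptime();
        return uptimeMillis > 0 ? (double) gcMillis / uptimeMillis : 0.0;
    }

    public static void main(String[] args) {
        System.out.println("goal for GCTimeRatio=2: " + targetGcFraction(2));
        System.out.println("observed so far:        " + observedGcFraction());
    }
}
```

Like any in-process measurement, this reports on the JVM it runs in, so it would need to be invoked from inside the application server to be meaningful.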