We are running a Java application that sometimes "freezes" because a thread is consuming almost all of the heap. Even though the JVM performs Full GCs that last more than 60 seconds, the application never dies with an OutOfMemoryError.
I read in the Java documentation that:
The throughput collector will throw an out-of-memory exception if too much time is being spent doing garbage collection. For example, if the JVM is spending more than 98% of the total time doing garbage collection and is recovering less than 2% of the heap, it will throw an out-of-memory exception.
I would like more information about what this 98% of the time means (what is the time frame?), and whether it is possible to lower this value, i.e. throw an OOME if the application is spending 90% of the time in GC and cannot free more than 10% of the heap.
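For scale: with our 2048 MB heap, recovering less than 2% means the collector is reclaiming under roughly 41 MB per collection cycle.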
The goal is to make sure the application dies with an OOME (instead of just running GC endlessly) so we can generate a heap dump on OOME.
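For the dump itself we would rely on the standard HotSpot flags (assuming our JVM version supports them; the path below is just an example):

-XX:+HeapDumpOnOutOfMemoryError \
-XX:HeapDumpPath=/path/to/dumps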
Here are the memory and GC settings we use (OS is Solaris):
-Xms2048m -Xmx2048m \
-Xmn512m \
-XX:PermSize=256m \
-XX:MaxPermSize=256m \
-XX:+UseParNewGC \
-XX:ParallelGCThreads=16 \
-XX:+UseConcMarkSweepGC \
-XX:+CMSParallelRemarkEnabled \
-XX:+DisableExplicitGC \
-XX:+PrintGC \
-XX:+PrintGCDetails \
-XX:+PrintGCTimeStamps \
-XX:+PrintClassHistogram \
-Xloggc:/gcmonitor.log \
-XX:+HandlePromotionFailure \
-XX:SurvivorRatio=4 \
-XX:TargetSurvivorRatio=90 \
-XX:MaxTenuringThreshold=10 \
-XX:+UseTLAB \
-XX:TLABSize=32k \
-XX:+ResizeTLAB \
-XX:+UseMPSS
I would like more information about what this 98% of the time means (what is the time frame?)
The answer to this question: "GC overhead limit exceeded" suggests the time frame is 1 minute.
is it possible to lower this value
Looking again at the question mentioned above, it looks like you can use the GCTimeLimit and GCHeapFreeLimit parameters.
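For example, to match the 90%/10% thresholds you asked about (a sketch, assuming a HotSpot JVM that honours these flags with your collector; the defaults are GCTimeLimit=98 and GCHeapFreeLimit=2):

-XX:GCTimeLimit=90 \
-XX:GCHeapFreeLimit=10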
If you are only forcing an OOME for the side benefit of the heap dump, you can now take one from a running Java process at any time:
Find the process:
jps -v
Force a dump:
jmap -dump:format=b,file=heap.bin <pid>
Then analyze heap.bin in your tool of choice.
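For illustration, assuming jps reported a pid of 12345 (a made-up value), the dump command would be:

jmap -dump:format=b,file=heap.bin 12345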
Taking a heap dump interactively on OOME or with jmap can cause the JVM to pause for minutes. It's generally more effective to use gcore to create a core dump manually, then use jmap to take the heap dump from the core.
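A sketch of that approach (again assuming a pid of 12345 for illustration; on Solaris, gcore writes core.12345 by default, and jmap can then read the core file when given the java executable alongside it):

gcore 12345
jmap -dump:format=b,file=heap.bin $JAVA_HOME/bin/java core.12345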
I'd allocate more heap and see if that helps mitigate the problem. Also be careful about excessive GC tuning: the collectors generally have excellent defaults, and I'd only recommend the options after -Xloggc if you've determined that they significantly improve GC performance for your application's object allocation/retention patterns. The parallel GC thread count might also be too high, depending on the number of hardware threads available. A trimmed-down starting point is sketched below.
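Something like this (a sketch only: it keeps your sizing, collector, and logging flags and drops the rest until they prove their worth):

-Xms2048m -Xmx2048m \
-Xmn512m \
-XX:PermSize=256m \
-XX:MaxPermSize=256m \
-XX:+UseConcMarkSweepGC \
-XX:+PrintGCDetails \
-XX:+PrintGCTimeStamps \
-XX:+PrintClassHistogram \
-Xloggc:/gcmonitor.log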
You should be able to determine the heap-usage pattern from the GC logs: is this rapid consumption by a single thread performing an operation that quickly exhausts the heap, or a slower 'leak' pattern where many objects are promoted over time, leaving the tenured generation contended with few objects eligible for collection? The class histograms will also help.
All that said, focusing on a heap dump is definitely the way to go. Eclipse MAT is the best analysis tool IMO. Here's a great place to start if you haven't used it before:
http://kohlerm.blogspot.com/2009/07/eclipse-memory-analyzer-10-useful.html