Object Count Impact on Garbage Collection Performance

来源:百度文库 编辑:神马文学网 时间:2024/04/27 21:55:18

Object Count Impact on Garbage Collection Performance

When developers complain about poor Java application performance they most often cite garbage collection as too slow and taking too much time. Moreover, longer garbage collection pauses cause serious problems to any interactive application, especially where screen content changes at a high frequency (games). The Java defenders usually reply that garbage collection algorithms are very efficient, sometimes even faster than manual memory management in C/C++. Unfortunately, it is only half the truth. The other half is that we, Java programmers can be real pigs and allocate excessive numbers of objects that can overwhelm any garbage collection algorithm, regardless of how fast it is.

The particular problem I am concerned with is when a large number of small objects is used long term. Note that a smaller number of large objects is much better, since garbage collection performance is proportional to the number of objects and not their sizes (for more details on garbage collection algorithms see Sun‘s documentation). In particular, the garbage collection phase that must suspend all threads needs to scan objects and their references, so minimizing the number of objects should minimize this pause. It is especially critical to reduce the number of long lived objects as they are scanned over and over again.

I was performing analysis of a server software that was experiencing very long garbage collection pauses, several seconds at a time. This application had a several hundreds of megabytes heap, so clearly allocated a lot of objects. It didn‘t take much effort to identify the culprit:
http://localhost:7002/showInstanceCounts/includePlatform/
475988 instances of class java.util.HashMap$Entry
150358 instances of class java.lang.Integer
134060 instances of class java.lang.String
128203 instances of class java.util.HashMap$Entry[]
128173 instances of class java.util.HashMap
126484 instances of class char[]
51877 instances of class java.lang.Boolean
...

HashMap related class is at the very top. As an aside, note the huge number of integer wrapper objects. Worse yet, I found 51877 Boolean instances. While we are told not to bother pooling small objects such as primitive wrappers, that doesn‘t mean we should always allocate them on heap using new. In fact, the wrapper classes implement a simple pooling mechanism when appropriate that can significantly reduce number of instances allocated. As I said before, even though integer and boolean objects are tiny and don‘t use much memory, it is the high object count that degrades garbage collection performance. After replacing new Integer(... & new Boolean(... with Integer.valueOf(... & Boolean.valueOf(... I had:

...
50853 instances of class java.lang.Integer
...
24 instances of class java.lang.Boolean
...

This is a very simple technique, but easily forgotten. Note that allocating short term objects using new is fine as they don‘t live long enough to burden garbage collection. For long lived objects make sure to use the valueOf factory methods. Back to our list of top object counts:

http://localhost:7002/showInstanceCounts/includePlatform/
476039 instances of class java.util.HashMap$Entry
138654 instances of class java.lang.String
132946 instances of class char[]
128197 instances of class java.util.HashMap$Entry[]
128175 instances of class java.util.HashMap
...

So we have a lot of string and characters, which is typical of most server software. Also, we have a large number of hash map objects. Let‘s break it down:
About 128K hash maps and about the same number of hash map entry arrays. Makes sense, as each map has a single array. Then we have 476K entry objects, which is about 476 / 128 = 3.7 entries per map. At this point an alarm is blaring, "Why so many small maps?!?!?!"

Map is one of the most useful and most frequently used data structures. It is defined by the java.util.Map interface. java.util.HashMap is the most popular implementation. Map is used to associate things whenever you cannot index something. Moreover we often are too lazy to define a new class and end up using a map instead. For example:

class PersonString name =null;int age =null;int weight =0f;int height =0f;...
How many objects? 1 Person + 1 string (for name) = 2 objects total used by a single person object (inclusive). However, I just don‘t feel like creating another class,... Hey, I‘ll just whip up a nice little map:

Map person = new HashMap();person.put("name", "Joe Doe");person.put("age", Integer.valueOf(22));person.put("weight", Integer.valueOf(80));person.put("height", Integer.valueOf(170));
Object count is bit tricky. The keys are likely to be shared by all maps, so in effect they don‘t count. The integer objects are probably shared as well, so we‘ll estimate that for 3 integer objects due to sharing we only allocate 2 new instances. Therefore we have: 1 Map + 1 entry array + 4 entries (1 entry/mapping) + 1 string (for name) + 2 integers = 9 objects total used by a single person map (inclusive).

We should have used only 2 objects for a person, yet ended up using 9!!! To make things worse most programmers simply don‘t pay attention to such nasty little side effects of using a map. Most other map implementations use just as many objects if not more. As we use maps often in many places extra 7 objects quickly becomes a major problem when you reach extra half a million objects in the server software!!!

One way to approach this problem is simply to declare appropriate classes and try to use primitives as much as possible. At the same time, there is no denying that maps are handy tools and we don‘t want to punish programmers for using them. Therefore, I decided to create another map implementation that would use the minimum number of objects internally. That way we could keep using maps without creating a huge number of objects. In particular, if I could eliminate the entry objects, the person map above would use 4 less objects and the difference would not be as bad.

Next time I will start the series of articles about the MocMap - Minimum Object Count Map. I must warn you that they may not be fun (compared to Swing posts and other cool stuff). In fact, some of you may criticize it as much effort for a problem that doesn‘t exist. Please keep in mind that this class is relevant only in certain contexts. In other cases java.util.* maps are just fine and my implementation is not applicable. Therefore, any time you discuss merits of a piece of code make sure that the context is well defined and clear. In my defense, I‘ll note that small details such as number of objects used by a data structure, may be what distinguishes a slow Java pig from a Java application that fulfills the potential of Java platform and pushes the performance to the limits.

 留言:
[Trackback] Slodoban Celenkovic published a good article in his blog, about the impact of the cantity ob objects over garbage collection. The post is centered on Java, but is applicable to C# too, and for any language that uses garbage collection....

{0}发表于 Dark Side Programming on 2005年09月28日, 10:59 上午 EDT
站点: http://www.darksideprogramming.org/archives/2005/09/object_count_an.html #

You should also take a look at Javolution: http://www.javolution.org.

It has lots of classes and utilities for creating low garbage applications. If you write everything using Javalution you can almost have no garbage collection at all.

In the apps I‘m working on XML parsing is one of the biggest generators of garbage. We use a 3rd party xml to object mapper which is very very wasteful.

{0}发表于 Dave Tauzell (65.29.10.141) on 2005年09月28日, 11:58 上午 EDT #

I found this article interesting and useful. Probably I would consider cost of Map‘s implementation next time when I create those.

{0}发表于 Tapshil on 2005年09月28日, 12:01 下午 EDT #

Your point about Boolean is really good. I‘m not certain how the other wrappers (Integer/Long) are pooling though.

The JDK 1.4.2 code for Integer.valueOf(...) looks like

public static Integer valueOf(String s, int radix) throws NumberFormatException {
return new Integer(parseInt(s,radix));
}

{0}发表于 Prashant on 2005年09月28日, 12:54 下午 EDT #

Interesting and Ok for the wrappers but before creating your own hashmap class or introduce another library, <> that some constructors can have a parameter to specify the initial size and some kind of load factor or extension size.
This is the case for ArrayList, Vector, Hashmap... and the automatic resizing is not so costly.
Also for an Hashmap (if this is the best storage choice)
- read the hints about the best initial size
- make sure that the hashcode() of the key is properly distributed or all you objects could end with the same hash and as a consequence would be in the same Hashmap entrybucket. This would ruin your other efforts.

And read your "effective Java", Joshua Bloch

{0}发表于 Olivier Dupuy (198.103.167.20) on 2005年09月28日, 12:55 下午 EDT #

PMD can help catch some of this sort of thing, i.e.:

http://pmd.sourceforge.net/rules/basic.html#BooleanInstantiation
http://pmd.sourceforge.net/rules/basic.html#UnnecessaryConversionTemporary

Yours,

Tom

{0}发表于 Tom Copeland on 2005年09月28日, 01:37 下午 EDT
站点: http://pmd.sf.net #

[OT] is that a picture of the montreal ville-marie tunnel you have in your header?

{0}发表于 192.197.69.29 on 2005年09月28日, 02:42 下午 EDT #

Hello, Slobodan.

Have you looked at Trove4j?

http://trove4j.sourceforge.net/

I am not completely sure about their object allocation numbers/strategy, but in the last 2 projects where I used it, just switching from build-in ones to this improved performance quite well.

Just checked on their site: they do claim to consume much less memory then java Collections

{0}发表于 Eduard Letifov on 2005年09月28日, 04:01 下午 EDT #

Dave,
Haven‘t seen Javolution library yet. I‘ll check it out. thanks

{0}发表于 Slobodan (72.1.195.198) on 2005年09月29日, 10:28 上午 EDT #

Prashant,
You are looking at the wrong method version. It is overloaded. The one in question is:

public static Integer valueOf(int i) {
final int offset = 128;
if (i >= -128 && i <= 127) { // must cache
return IntegerCache.cache[i + offset];
}
return new Integer(i);
}

{0}发表于 Slobodan (72.1.195.198) on 2005年09月29日, 10:31 上午 EDT #

Olivier,
I am not concerned with resize calls as much as with the entry objects. Map capacity doesn‘t affect the number of entry objects allocated. Look at my next post, the analysis, for details.

{0}发表于 Slobodan (72.1.195.198) on 2005年09月29日, 10:32 上午 EDT #

Eduard,
Thanks for the tip. I don‘t remember seeing trover before. I‘ll check it out.

thanks

{0}发表于 Slobodan (72.1.195.198) on 2005年09月29日, 10:33 上午 EDT #

You didn‘t metion String.intern(). Have you used this to reduce the number of Strings?

Does the use of String.intern() also apply here?

I‘ve had good success with performance improvements replacing string.equals() tests with simpler == test between the interned strings.

{0}发表于 Steven F. Lott on 2005年09月29日, 04:23 下午 EDT
站点: http://homepage.mac.com/s_lott/iblog/architecture/index.html #

Steven,
String.intern() is most of the times utilized by JVM behind the scenes. In general we don‘t create new string instances explicitly, but implicitly by converting ints to strings and concatenating literals. Therefore intern is usually automatic and need not be invoked explicitly.
Reference comparison (==) as a replacment for String.equals() usually works but is dangerous. You could end up with very difficult to find bugs. It is only safe when you can guarantee that both string values are interned, which is not always possible to determine.

In any case you do have a good point about having extra String objects that are not essential. That is yet another topic; I may tackle it in the future.

thanks

{0}发表于 Slobodan (209.71.200.54) on 2005年09月29日, 09:16 下午 EDT #

It‘s an excelent article about objects alocation. I‘m having several problems about this issue. The heap goes from 500Mb to 1Gb in few seconds at system‘s hush time.

{0}发表于 Givanildo on 2005年09月30日, 11:09 上午 EDT
站点: http://www.webintegrator.com.br #

The Integer.valueOf method you mention only exists since jdk1.5. That‘s why "Phrasant" cannot find it.

{0}发表于 DZ (82.214.1.110) on 2005年09月30日, 06:26 下午 EDT
站点: http://www.comers.se #

I did not understand why you have taken 2 integer objects instead of 3 in the example. Plz a bit more explanation. Looking ahead how you have reduced 9 objects to less than 4 obejcts in your next article.

{0}发表于 Sabyasachi on 2005年10月05日, 12:32 上午 EDT #

Object Count Impact on Garbage Collection Performance Object Count Impact on Garbage Collection Performance Garbage Collection: Automatic Memory Management in the Microsoft .NET Framework Online Advertisers Bet on Video but Count on Display Garbage Collection--Part 2: Automatic Memory Management in the Microsoft .NET Framework DVR Use Not Having Huge Impact on Ratings [object] collection... Impact of guidance on the problem-solving efforts of instructional design novices mysql count(*) count(val) count(1)比较 Hypothesized performance on complex tasks as a function of scaled instructional strategies optimal performance in production environments was not found on the java.library.path impact test 快乐collection google collection Collection 对象 happiness collection object - HTML元素 什么是object指向? 什么是object指向? [object]北京故宫博物院 HTML 标签 Impact Crusher - Liming Heavy Industry Hangzhou aims for global impact