Task #3248

Investigate GC settings

Added by Artur Hefczyc TigaseTeam over 3 years ago. Updated over 2 years ago.

Status: Closed
Priority: Normal
Target version:
Start date: 2015-06-25
Due date: 2016-06-29
% Done: 100%
Estimated time:
Database: n/a

Description

It looks like the concurrent GC that we use by default is no longer as effective as it used to be. Even on our own production public XMPP service, garbage collections are not very effective: memory usage grows steadily, followed by large collections from time to time.

This causes service delays during long collections and even OutOfMemoryErrors (OOM) on some systems.

We need to find better GC settings that actually give us concurrent, steady collections while the server is running.
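A minimal sketch for comparing GC settings between runs, assuming Java 8 or later: it reads the standard `java.lang.management` collector beans and reports, per collector, the number of collections and the accumulated collection time. The class name `GcSnapshot` is hypothetical and not part of Tigase. The reported time is approximate and, for a concurrent collector such as CMS, may include concurrent phases, so treat it as a rough trend indicator rather than an exact pause measurement.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Tigase): summarizes GC activity of the current JVM.
public final class GcSnapshot {

    private GcSnapshot() {}

    /** Average time per collection in ms; 0 when the collector has not run or reports -1. */
    public static double averageMs(long count, long timeMs) {
        if (count <= 0 || timeMs < 0) {
            return 0.0;
        }
        return (double) timeMs / count;
    }

    /** One line per collector, e.g. "ParNew: 12 collections, 340 ms total, ...". */
    public static List<String> describe() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 if undefined for this collector
            long timeMs = gc.getCollectionTime(); // approximate accumulated time, -1 if undefined
            lines.add(String.format("%s: %d collections, %d ms total, %.2f ms avg",
                    gc.getName(), count, timeMs, averageMs(count, timeMs)));
        }
        return lines;
    }

    public static void main(String[] args) {
        // Standalone, this only reports its own (short-lived) JVM.
        describe().forEach(System.out::println);
    }
}
```

Called periodically from inside the server process, the difference between two snapshots shows how often, and roughly how long, each collector ran in that interval.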


Related issues

Related to Tigase XMPP Server - Bug #4003: Using OldGen for memory reporting (Closed, 2016-03-07 to 2016-09-23)

Related to Tigase XMPP Server - Bug #4256: Improve Statistics memory usage (Closed, 2016-06-17 to 2016-06-24)

Associated revisions

Revision 0c551ddd (diff)
Added by Wojciech Kapcia TigaseTeam almost 3 years ago

Improve recommended JVM/GC settings; #3248

Revision c831d279 (diff)
Added by Wojciech Kapcia TigaseTeam almost 3 years ago

Documentation about JVM/GC settings; #3248

Revision f6d26af6 (diff)
Added by Wojciech Kapcia TigaseTeam almost 3 years ago

Explicit memory usage values in addition to percentage; #3248

Revision 53220581 (diff)
Added by Wojciech Kapcia TigaseTeam over 2 years ago

Include recommended settings for JVM/GC settings for particular installations; #3248

Revision b4e7740a (diff)
Added by Wojciech Kapcia TigaseTeam over 2 years ago

Add info about -XX:+UseCMSInitiatingOccupancyOnly JVM flag; #3248

History

#1 Updated by Artur Hefczyc TigaseTeam over 3 years ago

  • Due date changed from 2015-07-17 to 2015-10-31

#3 Updated by Artur Hefczyc TigaseTeam over 3 years ago

Testing and experimenting with different GC settings is extremely time-consuming and expensive. I think we should start by reviewing recent GC changes in JVM 8 and propose the settings that make the most sense for a typical high-load Tigase installation. Then, if we have time and resources, we can experiment with different settings.

#4 Updated by Wojciech Kapcia TigaseTeam over 3 years ago

Agreed.

I think it would also be prudent to include a short description of each GC setting we put in tigase.conf, so that only the intended settings can be enabled selectively (by uncommenting lines).

#6 Updated by Wojciech Kapcia TigaseTeam over 3 years ago

  • Due date changed from 2015-10-31 to 2015-12-31
  • Assignee changed from Wojciech Kapcia to Eric Dziewa

Could you provide me with details about the Tsung installation used for our daily load tests? I would like to use it for this task. As I remember, Tigase runs on the cXX machines, correct? And the cron job runs sometime in (my) morning, between 1am and 6am CET?

#7 Updated by Eric Dziewa over 3 years ago

  • Assignee changed from Eric Dziewa to Wojciech Kapcia

Cron jobs start at 2:50am and end at 7:10am in your time zone. c400.xmpp-test.net is the controller.

You might prefer using the hardware machines, which include backup.tigase.org, hw2.xmpp-test.net, and v33.tigase.org. They can generate a bigger load, and faster, than the VMs.

The backup runs from 6am to 8am your time; this machine is idle the rest of the time. Log in to tigase@hw2.xmpp-test.net and use comparison6.xmpp-test.net-ssl-tsung.xml as a template for a two-machine setup.

If you want to use all three, use tigase@v33.tigase.org as the controller with comparison-72K-2HW.xml. v33 hosts the Tsung cluster c40x, so don't use it during the cron jobs (2:50am to 8:00am).

#8 Updated by Wojciech Kapcia TigaseTeam about 3 years ago

  • Due date changed from 2015-12-31 to 2016-01-31

#9 Updated by Wojciech Kapcia TigaseTeam about 3 years ago

  • Status changed from New to In Progress

#11 Updated by Artur Hefczyc TigaseTeam about 3 years ago

  • Due date changed from 2016-01-31 to 2016-02-29

#12 Updated by Wojciech Kapcia TigaseTeam about 3 years ago

  • Due date changed from 2016-02-29 to 2016-03-07

#14 Updated by Wojciech Kapcia TigaseTeam about 3 years ago

  • Related to Bug #4003: Using OldGen for memory reporting added

#15 Updated by Wojciech Kapcia TigaseTeam about 3 years ago

  • Due date changed from 2016-03-07 to 2016-03-18

More tests

#16 Updated by Wojciech Kapcia TigaseTeam about 3 years ago

  • Due date changed from 2016-03-18 to 2016-03-25

#18 Updated by Wojciech Kapcia TigaseTeam almost 3 years ago

  • Due date changed from 2016-03-25 to 2016-05-15

#19 Updated by Wojciech Kapcia TigaseTeam almost 3 years ago

  • Due date changed from 2016-05-15 to 2016-06-09
  • % Done changed from 0 to 60

A couple of general, short remarks from the tests:

  • it's not only that the default ratio between TenuredSpace and YoungGen (2:1) is bad; the ratio can also be adjusted automatically and, as a result, usually ends up far worse (for example, 150M when the whole heap was configured as 6G!);

  • CMS seems to be slightly better on our installations while G1GC results in more and longer pauses;

  • our statistics don't play well with the GC, especially when history is enabled (I will create a separate ticket for this later on):

    • statistics labels are not interned, resulting in thousands of duplicated strings:

      String                                                          Count   Size [B]
      <All duplicate strings>                                         3259070 263459992
      "IN_QUEUE processed IQ http://jabber.org/protocol/disco#info"   28080   4492640
      "OUT_QUEUE processed IQ http://jabber.org/protocol/disco#info"  23760   3801440
      "OUT_QUEUE processed IQ http://jabber.org/protocol/disco#items" 21600   3628632

    • numeric values are held as strings (different data set):

      String                  Count  Size [B]
      <All duplicate strings> 412460 30871224
      "2"                     40025  1921152
      "1"                     36089  1732224
      "4"                     22008  1056336
      "3"                     16561  794880
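The two heap-dump findings above suggest a straightforward fix, sketched here under assumptions: share one instance per distinct label (via `String.intern()`) and keep numeric samples as primitive `long` values instead of strings. `StatHistory` is a hypothetical class, not Tigase's actual statistics API; it keeps a fixed-size history in a `long[]` ring buffer, so retaining history adds no per-sample objects.

```java
// Hypothetical sketch (not Tigase's statistics API): one metric with a bounded history.
public final class StatHistory {

    private final String label;  // interned: one shared instance per distinct label
    private final long[] values; // primitive ring buffer instead of a list of Strings
    private int next = 0;        // slot to overwrite next
    private int size = 0;        // number of valid entries, at most values.length

    public StatHistory(String label, int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.label = label.intern();
        this.values = new long[capacity];
    }

    /** Records a sample, overwriting the oldest one once the buffer is full. */
    public void add(long value) {
        values[next] = value;
        next = (next + 1) % values.length;
        if (size < values.length) {
            size++;
        }
    }

    /** Returns the retained samples, oldest first. */
    public long[] snapshot() {
        long[] out = new long[size];
        int start = (next - size + values.length) % values.length;
        for (int i = 0; i < size; i++) {
            out[i] = values[(start + i) % values.length];
        }
        return out;
    }

    public String label() {
        return label;
    }
}
```

A private `ConcurrentHashMap` used as a canonicalizing cache is a common alternative to `String.intern()` when the set of labels should be scoped or cleared independently of the JVM string table.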

To be done:

  • more tests with different scenarios;

  • process and share results;

  • update etc/tigase.conf to reflect conclusions;

  • finish Admin guide related to recommended JVM settings.

#20 Updated by Artur Hefczyc TigaseTeam almost 3 years ago

Wojciech Kapcia wrote:

  • statistics labels are not interned, resulting in thousands of duplicated strings:

  • numeric values are held as strings (different data set):

That's a good point and, actually, relatively easy to fix.

#21 Updated by Wojciech Kapcia TigaseTeam almost 3 years ago

  • Related to Bug #4256: Improve Statistics memory usage added

#22 Updated by Wojciech Kapcia TigaseTeam almost 3 years ago

  • Due date changed from 2016-06-09 to 2016-06-17
  • Status changed from In Progress to Resolved
  • Assignee changed from Wojciech Kapcia to Artur Hefczyc
  • % Done changed from 60 to 100

Artur Hefczyc wrote:

Wojciech Kapcia wrote:

  • statistics labels are not interned, resulting in thousands of duplicated strings:

  • numeric values are held as strings (different data set):

That's a good point and, actually, relatively easy to fix.

I've created a task for that (#4256) and assigned it to version 7.1.0.

The results of this ticket are a test report (https://projects.tigase.org/documents/60) and the updated "JVM settings and recommendations" documentation page in the Admin guide.

#23 Updated by Artur Hefczyc TigaseTeam almost 3 years ago

  • Status changed from Resolved to Closed

#24 Updated by Artur Hefczyc TigaseTeam almost 3 years ago

  • Status changed from Closed to Feedback
  • Assignee changed from Artur Hefczyc to Wojciech Kapcia

One last bit is missing. We need concrete JVM settings that we recommend for our installations and for our customers.

#25 Updated by Wojciech Kapcia TigaseTeam almost 3 years ago

  • Due date changed from 2016-06-17 to 2016-06-24
  • Assignee changed from Wojciech Kapcia to Artur Hefczyc

Artur Hefczyc wrote:

One last bit is missing. We need concrete JVM settings that we recommend for our installations and for our customers.

As concluded (and described in "JVM settings and recommendations"), there is no single setting that is ideal for all installations. For the majority of medium-to-large installations, CMS with an enforced NewRatio=2 (or a ratio adjusted to the particular usage pattern) should be the best choice. In addition, while describing this in the linked guide, I also updated the settings in etc/tigase.conf with those recommendations. Should it be made even more explicit?
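To check whether an enforced `NewRatio` actually took effect (rather than the automatically adjusted young generation described in note #19 above), the committed sizes of the heap memory pools can be compared at runtime. A minimal sketch, assuming a HotSpot JVM: pool names depend on the collector (for example `Par Eden Space`, `Par Survivor Space` and `CMS Old Gen` with CMS/ParNew), so the name matching below is an assumption, and because a survivor pool may report only one of the two survivor spaces, the computed old/young ratio is approximate. `GenerationSizes` is a hypothetical name.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

// Hypothetical diagnostic (not part of Tigase): prints heap pools and the old/young split.
public final class GenerationSizes {

    private GenerationSizes() {}

    /** Old-to-young ratio; NaN when the young size is unknown or zero. */
    public static double ratio(long oldBytes, long youngBytes) {
        return youngBytes > 0 ? (double) oldBytes / youngBytes : Double.NaN;
    }

    /** Assumption: HotSpot names the old generation "... Old Gen" or "Tenured Gen". */
    public static boolean isOldGen(String poolName) {
        return poolName.contains("Old Gen") || poolName.contains("Tenured");
    }

    public static void main(String[] args) {
        long young = 0;
        long old = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // skip Metaspace, Code Cache, etc.
            }
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // pool no longer valid
            }
            if (isOldGen(pool.getName())) {
                old += usage.getCommitted();
            } else {
                young += usage.getCommitted(); // Eden and Survivor pools
            }
            System.out.printf("%-20s used=%,d committed=%,d max=%,d%n",
                    pool.getName(), usage.getUsed(), usage.getCommitted(), usage.getMax());
        }
        System.out.printf("old/young ratio (committed): %.2f%n", ratio(old, young));
    }
}
```

With `-XX:NewRatio=2` in effect, the printed ratio should come out roughly around 2; a much larger value would suggest the young generation is being sized by other heuristics.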

#26 Updated by Artur Hefczyc TigaseTeam over 2 years ago

  • Assignee changed from Artur Hefczyc to Wojciech Kapcia

The top of the page still contains the old GC settings, so it should be updated.

The rest of the documentation is very good because it gives some default settings and suggestions on how to tweak them further.

However, what I would like to see is example GC settings for a few use cases:

  1. The most typical deployment used by our customers (server-class machine, at least 24GB RAM, 8-core CPU)

  2. A less typical deployment (VM with 16GB RAM, 4-core CPU, or whatever this is)

  3. A small, single installation with up to 10k users and, let's say, no more than 4GB RAM and a 4-core CPU

  4. Something else, if you have any ideas

The thing is that most customers/users do not have enough knowledge and understanding of the JVM and GC to tweak settings on their own, so we have to provide them with ready-to-use defaults that are good starting points.

#27 Updated by Wojciech Kapcia TigaseTeam over 2 years ago

  • Assignee changed from Wojciech Kapcia to Artur Hefczyc

Artur Hefczyc wrote:

The top of the page still contains the old GC settings, so it should be updated.

Done.

However, what I would like to see is example GC settings for a few use cases:

  1. The most typical deployment used by our customers (server-class machine, at least 24GB RAM, 8-core CPU)

  2. A less typical deployment (VM with 16GB RAM, 4-core CPU, or whatever this is)

  3. A small, single installation with up to 10k users and, let's say, no more than 4GB RAM and a 4-core CPU

  4. Something else, if you have any ideas

The thing is that most customers/users do not have enough knowledge and understanding of the JVM and GC to tweak settings on their own, so we have to provide them with ready-to-use defaults that are good starting points.

This may be related in a way to #3254.

At any rate, I've updated the documentation with settings for particular setups, which should be a good starting point.

#28 Updated by Artur Hefczyc TigaseTeam over 2 years ago

  • Assignee changed from Artur Hefczyc to Wojciech Kapcia

I can still see the following at the beginning of the page:

GC="-XX:+UseBiasedLocking -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:ParallelCMSThreads=2 -XX:-ReduceInitialCardMarks"

Is this what we recommend?

#29 Updated by Wojciech Kapcia TigaseTeam over 2 years ago

  • Due date changed from 2016-06-24 to 2016-06-29
  • Assignee changed from Wojciech Kapcia to Artur Hefczyc

Where exactly? I've checked http://docs.tigase.org/tigase-server/snapshot/Administration_Guide/html_chunk/jvm_settings.html and it says:

#GC="-XX:+UseBiasedLocking -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:NewRatio=2 -XX:+CMSIncrementalMode -XX:-ReduceInitialCardMarks -XX:CMSInitiatingOccupancyFraction=70"

And this was done in the latest commit 5 days ago: https://projects.tigase.org/projects/tigase-server/repository/revisions/53220581636a58b07ebfa300eb42a77e4ad01eb4/diff/modules/documentation/adminguide/asciidoc/text/Admin_Guide_13_-_Configuration_-_E_-_JVM_settings.asciidoc
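The exchange above shows how easy it is to lose track of which GC string is current. On a live system, the options the JVM was actually started with can be listed in-process. A minimal sketch, assuming it is invoked from inside the server process (the standalone `main` only reports its own JVM); from outside, the JDK's `jcmd <pid> VM.flags` command reports the effective flags. `JvmFlags` and `gcRelated` are hypothetical names.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Tigase): filters JVM start-up options relevant to GC tuning.
public final class JvmFlags {

    private JvmFlags() {}

    /** Keeps -XX: options and heap sizing options (-Xms, -Xmx, -Xmn), preserving order. */
    public static List<String> gcRelated(List<String> jvmArgs) {
        List<String> result = new ArrayList<>();
        for (String arg : jvmArgs) {
            if (arg.startsWith("-XX:") || arg.startsWith("-Xms")
                    || arg.startsWith("-Xmx") || arg.startsWith("-Xmn")) {
                result.add(arg);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Options passed to this JVM (not the program arguments in args)
        List<String> input = ManagementFactory.getRuntimeMXBean().getInputArguments();
        gcRelated(input).forEach(System.out::println);
    }
}
```

Comparing this output with the GC line in etc/tigase.conf confirms whether the intended settings (for example `-XX:NewRatio=2` and `-XX:CMSInitiatingOccupancyFraction=70`) actually reached the JVM.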

#30 Updated by Artur Hefczyc TigaseTeam over 2 years ago

  • Status changed from Feedback to Closed

OK, I can now see the updated settings. I was probably looking at the page before it was updated on our website.
