XMPP Service sharding - Tigase on Intel ATOMs

Artur Hefczyc TigaseTeam

![Test overview chart, up to 53k user connections](/files/image/xmpp-sharding/all_53k_10sec.png)

Since FOSDEM 2008, when I first met Florian Jensen, I have wondered whether XMPP service sharding is even possible or feasible. There he presented his FlexiServ machines: tiny servers with no hard drive and very limited CPU power. He asked how Tigase would work on this kind of hardware and whether running a cluster of the Tigase server on FlexiServ machines to provide an XMPP service makes sense.

Of course, for small XMPP installations the Tigase server would work wonderfully on these tiny machines. However, I was quite reluctant to recommend setting up an XMPP cluster on them. To be honest, though, I never stopped thinking about it. I was always very keen on implementing clustering in Tigase that would work well on small machines in a sharding configuration, where ideally you could add more nodes (shards) to the cluster as you need them.

I have been working on and improving the clustering code in the Tigase server for over a year. It seems to work pretty well now, so I decided to publish the results from tests run on my development installation. The goal of the tests was to determine whether clustering gives you any benefit beyond an HA service: whether adding more nodes to the cluster helps spread the load and improves the service throughput. This article presents all the details from the tests I ran.

I know many people are quite busy and prefer to get to the point as soon as possible, so I am starting with the conclusion and providing all the details later on. The table below presents CPU usage depending on the number of user connections to the cluster. RAM usage is shown for the point when CPU usage was above 90% but not yet saturated.

I ran 5 tests, each putting the load on a selected subset of the cluster nodes only: 1 node in the first test, 2 nodes in the second, and so on, up to all 5 nodes.

Results summary

| | 1 node | 2 nodes | 3 nodes | 4 nodes | 5 nodes |
|---|---|---|---|---|---|
| CPU usage <80% | <12k | <18k | <30k | <36k | <42k |
| CPU usage >90% | 18k-24k | 18k-24k | 30k-36k | 36k-42k | 42k-50k |
| CPU saturation | >24k | >26k | >36k | >42k | >50k |
| RAM usage on node | 80% | 50% | 45% | 40% | 40% |

Please note that even though the table shows the number of user connections, this is not what really caused the load. The number of connections is presented here as a kind of symbolic factor. Each connection generates some load on the server by sending messages and changing presence status, and the traffic generated by all the connections is what causes the load. Of course, the more connections, the more traffic, so the number of connections is an indirect measure of the total load.
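
To make the scaling trend in the table easier to see at a glance, here is a small sketch (plain Python; the numbers are copied from the "CPU saturation" row above, and the script is my own illustration, not part of the test setup) that computes how much extra capacity each added node bought at the saturation point.

```python
# Scaling sketch based on the "CPU saturation" row of the summary table.
# Values are approximate total connection counts (in thousands) at which
# each configuration saturated its CPUs.

saturation_k = {1: 24, 2: 26, 3: 36, 4: 42, 5: 50}

previous = None
for nodes, total in sorted(saturation_k.items()):
    per_node = total / nodes
    gain = "" if previous is None else f", +{total - previous}k vs. previous"
    print(f"{nodes} node(s): ~{total}k total, ~{per_node:.1f}k per node{gain}")
    previous = total
```

Going from 1 to 2 nodes buys only about 2k connections of extra headroom, while each node from the third onwards adds roughly 6k-10k, which matches the per-configuration discussion below.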

If you are interested in all the details, please continue reading...

Hardware used for tests

The Tigase cluster is set up on 5 Intel ATOM N270 machines with 2GB of RAM and 16GB SSD drives each. The CPU is a 32-bit chip with Hyper-Threading running at 1.6GHz, and 1.1GB out of the total 2GB of RAM is assigned to the Tigase server.

Specifically, the no-longer-available Dell Mini9s with a slightly enhanced configuration are used for the tests.

The MySQL 5.1.31 database is installed on a 6th machine, an IBM Thinkpad.

Everything is connected through a Netgear ProSafe 8-port Gigabit switch (GS108). The Dell Mini9s use 100Mbit Ethernet connections only, and the Thinkpad uses a 1Gbit Ethernet connection.

Software

A development version of the Tigase server, MySQL 5.1.31 as the database, and Tsung 1.3.0 to generate the client load. Ubuntu 9.04 Server runs on the Thinkpad (the database server) and Ubuntu 9.04 Netbook Remix on the Dell Mini9s.

Testing environment

  1. In all tests, all 5 cluster nodes were connected together.
  2. A database with 10 million registered users is used.
  3. Each user has a roster with 150 contacts.
  4. 20% (30) of the contacts are online during the test.
  5. New user connections are made at a rate of 10 per second.
  6. Each connection sends a message every 100 seconds.
  7. Each connection changes presence status every 10 minutes; this causes a significant jump in the traffic every 10 minutes, that is, every 6k new user connections (a rough load model is sketched after this list).
  8. The user session length is 110 minutes.
  9. User distribution across the cluster nodes is even.
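
As a rough sanity check of the numbers above, the sketch below (plain Python, my own back-of-envelope model, not part of the Tsung scenario) estimates the client-generated stanza rates for a given number of connections. It only counts stanzas as the clients send them; the per-node packet rates in the result tables are higher, presumably because the server also fans each presence change out to the roughly 30 online roster contacts and handles additional traffic such as initial presence on login.

```python
# Back-of-envelope model of the client-side load in this test scenario.
# Parameters are taken from the list above: 10 new connections per second,
# one message per connection every 100 s, one presence change per connection
# every 10 minutes, and about 30 online contacts per roster.

RAMP_RATE = 10           # new connections per second
MSG_INTERVAL = 100       # seconds between messages, per connection
PRESENCE_INTERVAL = 600  # seconds between presence changes, per connection
ONLINE_CONTACTS = 30     # online roster contacts receiving each presence

def offered_load(connections: int) -> dict:
    """Estimate client-generated stanza rates at a given connection count."""
    presence_changes = connections / PRESENCE_INTERVAL
    return {
        "messages_per_sec": connections / MSG_INTERVAL,
        "presence_changes_per_sec": presence_changes,
        # each presence change is broadcast to the online contacts
        "presence_stanzas_fanned_out_per_sec": presence_changes * ONLINE_CONTACTS,
        # a new presence wave hits every PRESENCE_INTERVAL seconds, i.e.
        # every RAMP_RATE * PRESENCE_INTERVAL = 6000 new connections
        "connections_between_presence_waves": RAMP_RATE * PRESENCE_INTERVAL,
    }

if __name__ == "__main__":
    for n in (12_000, 24_000, 36_000, 50_000):
        print(n, offered_load(n))
```

At 24k connections, for example, this gives about 240 messages and 40 presence changes per second offered by the clients, with a new presence wave arriving every 6k new connections, which is exactly where the load spikes discussed below occur.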

Results

This section presents all the test results with a detailed discussion and conclusions, including the weaknesses and strengths of each configuration.

Due to the nature of the test, the critical points always fall at roughly 6k-connection boundaries. This is because a presence status change is triggered every 10 minutes, which at 10 new connections per second corresponds to every 6k new user connections (10 connections per second over 600 seconds gives 6,000 connections). A message sent every 100 seconds also causes some load, but it is much less significant than the presence change.

The memory consumption is mainly caused by the roster caching for all online users.
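
To put a very rough number on that, here is a back-of-envelope sketch (plain Python, my own estimate rather than a measurement). It assumes that the 90% memory usage reported for the one-node test refers to the 1.1GB heap assigned to Tigase and that the roster cache dominates that usage; since the heap also holds everything else, the real per-entry cost is somewhat lower.

```python
# Back-of-envelope estimate of per-session memory cost (an assumption-laden
# sketch, not a measurement): assumes the reported 90% memory usage at ~24k
# connections refers to the 1.1GB heap assigned to the Tigase server and
# that the roster cache dominates it.

HEAP_BYTES = 1.1 * 1024**3   # heap assigned to the Tigase server
USAGE_AT_PEAK = 0.90         # reported memory usage at about 24k connections
CONNECTIONS = 24_000         # online sessions at that point
ROSTER_SIZE = 150            # contacts per roster in the test database

used = HEAP_BYTES * USAGE_AT_PEAK
per_session = used / CONNECTIONS
per_roster_entry = per_session / ROSTER_SIZE

print(f"~{per_session / 1024:.0f} KB per online session")
print(f"~{per_roster_entry:.0f} bytes per cached roster entry")
```

That works out to roughly 43KB per online session, or on the order of 300 bytes per cached roster entry, which explains why spreading connections over more nodes also spreads the memory usage, as the per-node tables below show.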

The Intel ATOM N270, even though it offers very good Hyper-Threading, is not a very powerful CPU, so this is the main limitation in all the tests. Each test ended when the machines could not keep up with the load: the CPU was saturated, internal queues started to grow, and the service response time exceeded 2 seconds.

One node

![One node: test run up to 24k user connections](/files/image/xmpp-sharding/pink_24k_10sec.png)

| Metric | Value |
|---|---|
| Total max connections | 24k |
| Max connections per node | 24k |
| Optimal total connections | <12k |
| Optimal connections per node | <12k |
| Memory usage / node | 90% |
| Cluster packets traffic / node | 80/sec |
| SM packets traffic / node | 6k/sec |
| Total messages per second | 700 |
| Total presences per second | 5k |

When the traffic was directed to a single node only, it could handle up to a maximum of 24k connections. At this point the presence change was triggered again and the total load on the server was greater than the machine could handle.

Most of the load was CPU bound, but memory usage was also very high, almost matching the CPU usage of 90% at peak time.

Cluster traffic was kept to a minimum, which is understandable because all the user connections stayed on a single node and there was no need to send packets to other cluster nodes.

The most active threads on the machine were those handling client-to-server traffic and the roster/presence packet processors.

Two nodes

![Two nodes: test run up to 27k user connections](/files/image/xmpp-sharding/blue_green_27k_10sec.png)

| Metric | Value |
|---|---|
| Total max connections | 27k |
| Max connections per node | 13.5k |
| Optimal total connections | <18k |
| Optimal connections per node | <9k |
| Memory usage / node | 50% |
| Cluster packets traffic / node | 1k/sec |
| SM packets traffic / node | 4k/sec |
| Total messages per second | 900 |
| Total presences per second | 5k |

Surprisingly, the difference between 1 and 2 nodes in the cluster was not very significant. The total maximum number of user connections handled was just a little greater than for a single node. Even though I put <18k as the optimal maximum number of user connections, the load at that point was only a little lower than for a single-node installation, so <12k is probably a more accurate optimal number of user connections.

Memory usage per node was certainly significantly lower than in the single-node test, because the user connections were spread over 2 nodes and thus the roster caching was spread too.

The results were so similar that I ran the tests a couple of times to make sure they were accurate and correct. Every time I got the same results.

The only explanation that comes to my mind is that in a single-node installation there is no cluster traffic, so it causes no load on the machine. In a 2-node installation part of the session manager load is relieved, but instead there is significant cluster traffic, which puts load on the CPU.

The most active threads on the machines were those responsible for session manager processing, cluster traffic, and network I/O.

The conclusion is: a cluster of 2 nodes can be used mainly to get an HA service, not really to handle significantly more load than a single node. For installations with an average roster size significantly bigger than 150, but with a similar number of online contacts (around 30), using 2 nodes instead of 1 has a big advantage, as it allows the memory usage to be spread over 2 machines. If that were the only reason to use 2 nodes, though, a cheaper option would probably be to add more RAM to a single node.

Three nodes

![Three nodes: test run up to 37k user connections](/files/image/xmpp-sharding/blue_green_pink_37k_10sec.png)

| Metric | Value |
|---|---|
| Total max connections | 37k |
| Max connections per node | 12k |
| Optimal total connections | <24k |
| Optimal connections per node | <8k |
| Memory usage / node | 45% |
| Cluster packets traffic / node | 1k/sec |
| SM packets traffic / node | 3.5k/sec |
| Total messages per second | 1.2k |
| Total presences per second | 6k |

The three-node installation, in turn, behaves much better, significantly better than the 2-node cluster. It could handle up to 36k user connections in total, and at this point a new wave of presence packets caused a load on the machines which was impossible to handle.

The optimal load is below 24k total user connections, or below 8k connections per node.

Memory consumption was a little lower than for the 2-node installation, but in turn there were many more user connections in the whole cluster and slightly fewer connections per cluster node. As a result, the memory usage pattern is consistent with the service load.

As in the previous test, the most active threads on the machines were the session manager, cluster traffic, and network I/O threads.

Cluster traffic was still at an acceptable level of about 1k packets/second. The maximum throughput of a cluster connection was measured at about 5k-7k packets per second, so there was still plenty of headroom for occasional peak times.

This installation is certainly recommended for both HA and LB services. If you can guarantee an even user distribution, the service can easily handle quite a high load with occasional traffic spikes.

Four nodes

![Four nodes: test run up to 43k user connections](/files/image/xmpp-sharding/blue_green_pink_white_43k_10sec.png)

| Metric | Value |
|---|---|
| Total max connections | 43k |
| Max connections per node | 10.9k |
| Optimal total connections | <30k |
| Optimal connections per node | <7.5k |
| Memory usage / node | 40% |
| Cluster packets traffic / node | 1k/sec |
| SM packets traffic / node | 3.1k/sec |
| Total messages per second | 1.4k |
| Total presences per second | 7k |

The four-node installation offers a consistent increase in throughput. The maximum number of user connections grew by another 6k, with the breaking point at 42k connections, when a new wave of presences was triggered.

The installation could cope with a load of up to 42k total connections, with the optimal load below 30k total user connections, or 7.5k connections per cluster node.

Memory consumption was again similar to the previous test, and the service was handling a similar number of user connections per node, so the memory usage is consistent and expected.

As in the previous test, the most active threads on the machines were the session manager, cluster traffic, and network I/O threads.

Cluster traffic was still at an acceptable level of about 1k packets/second, with plenty of headroom for occasional peak times.

This installation is certainly recommended for both HA and LB services. If you can guarantee an even user distribution, the service can easily handle quite a high load with occasional traffic spikes.

Five nodes

![Five nodes: test run up to 53k user connections](/files/image/xmpp-sharding/all_53k_10sec.png)

| Metric | Value |
|---|---|
| Total max connections | 53k |
| Max connections per node | 10.7k |
| Optimal total connections | <36k |
| Optimal connections per node | <7.2k |
| Memory usage / node | 40% |
| Cluster packets traffic / node | 1k/sec |
| SM packets traffic / node | 3k/sec |
| Total messages per second | 1.8k |
| Total presences per second | 7.8k |

The five-node installation worked slightly better than expected based on the previous tests. I expected it to fail at about 48k-49k total user connections, as this was the point of a new presence wave in the test. To my surprise, the installation handled this quite well and continued to work fine until about 50k total user connections. Above 50k it was still working, but it had difficulty handling the load.

Still, the total increase in system performance and in the ability to handle a higher load was consistent and very good. The optimal load again increased by another 6k user connections for the whole cluster, with about 7.2k connections per cluster node.

Memory consumption was the same as in the previous test, and the average number of user connections per cluster node was also the same, so the memory usage was consistent and expected.

As in the previous test, the most active threads on the machines were the session manager, cluster traffic, and network I/O threads.

Cluster traffic was still at an acceptable level of about 1k packets/second, with plenty of headroom for occasional peak times.

An interesting observation is that if you look at the charts for all the tests, you can easily see the traffic increase every 6k new user connections, caused by a new presence change wave. However, the more cluster nodes you have, the more difficult it is to notice the steps. For an installation with more nodes the traffic increase is much smoother, and it looks like traffic spikes are less likely to overload the service.

This installation is certainly recommended for both HA and LB services. If you can guarantee an even user distribution, the service can easily handle quite a high load with occasional traffic spikes.
