Load tests season


Artur Hefczyc TigaseTeam
Added almost 5 years ago

It looks like it is load test season. A lot of people have been contacting me asking for help and advice on running load tests.

Most of these cases are about comparing different server implementations, and I am not going to talk about comparisons. Comparing different server implementations is very difficult: many factors must be taken into consideration for the comparison to be fair. Even generating exactly the same database for each server is quite a difficult task.

I would like to give a few hints about running load tests against the Tigase server: what needs to be considered, which settings are important, how to tweak the system to get the best possible performance, and how to find system bottlenecks and deal with them.

These suggestions are in no particular order, and I am writing them off the top of my head. Please don't hesitate to send me your comments. I will try to update this guide over time.

  1. Load generators. Most people use Tsung for load tests. It is a great tool and I use it too. However, there are a few things to take into consideration:
    • Tsung doesn't validate data. For example, if a user has a roster with 100 contacts and 50 of them are online, Tsung won't tell you whether all 50 presences were really sent by the server and received by Tsung.
    • Tsung needs plenty of resources. You really need to monitor the machines running Tsung during tests. In many cases, strange load test results come from the load-generating machines being overloaded.
  2. Logs. Logs are useful during the initial phase of setting things up. During load tests, however, they ruin your results: the server can easily become overloaded simply because logging slows it down so much. Keep logging to a minimum while the tests run.
  3. Database. Make sure the database is not the bottleneck. Do not use the embedded Derby database for load tests! Whatever you think about MySQL, it is the best choice for load tests because it is much faster than the others. Have a look at the MySQL config file I used for my tests: my.conf. If you can optimize it for speed even further, let me know. If you use the 'tigase-auth' authentication connector, be aware that it uses stored procedures for user authentication which perform updates. You might want to remove all the update calls from these stored procedures to speed up database access even further.
  4. Monitor the server during tests to find bottlenecks. You can connect to the Tigase server using JConsole. This gives you insight into CPU and memory usage, which is very basic information and certainly not enough to find out what is really going on. Tigase server statistics can give you very detailed information about what is going on inside the server.
    To view them, you have to use the Psi client:
    • Browse the server's service discovery (disco).
    • Find the 'Server statistics' entry.
    • You can double-click on this entry to get basic server statistics.
    • You can expand this entry to see a list of the server components.
    • Double-click on the 'sess-man' entry.
    • Change 'Stats level:' to FINEST and click 'Finish'.
    • The statistics will be refreshed with a long list of detailed elements.
    • Scroll down to the 'sess-man/Processor:' items. Each item contains detailed information, for example:
      "Queue: 0, AvTime: 1, Runs: 183708, Lost: 0"
      This is what matters most:
      • Queue: 0 means the current size of the processor's internal queue is 0. This is good. If it starts growing, the processor can't cope with the load and you have likely found a bottleneck.
        There may be short peaks; as long as they are eventually processed, this is fine.
      • AvTime: 1 means the processor's average processing time is 1 ms. This is good. If this time is high for some processors, you have likely found a bottleneck.
      • Runs: 183708 is simply a counter of how many packets the processor has handled.
      • Lost: 0 is the lost-packet counter. To prevent out-of-memory errors, the processor's internal queue has a maximum size. If the processor cannot cope with the load, the queue fills up and no more packets can be inserted into it. Any packet that cannot be inserted is lost, and this counter shows how many packets were lost. If this counter is greater than 0, you have a problem with your installation,
        and this is likely the bottleneck.
  5. If you find a bottleneck in one of the packet processors, you can try increasing the number of threads for that processor. Tigase is a highly multi-threaded server, but it adjusts the number of threads to the number of CPUs/cores on the machine. On a dual-core machine the default settings might not be good enough.
    Have a look at your logs/tigase-console.log. There you can find a list of the loaded plugins and the number of threads assigned to each plugin:
    Loading plugin: jabber:iq:register=2 ...
    Loading plugin: jabber:iq:auth=2 ...
    Loading plugin: urn:ietf:params:xml:ns:xmpp-sasl=2 ...
    Loading plugin: urn:ietf:params:xml:ns:xmpp-bind=2 ...
    Loading plugin: urn:ietf:params:xml:ns:xmpp-session=2 ...
    Loading plugin: roster-presence=4 ...
    Loading plugin: jabber:iq:privacy=2 ...
    Loading plugin: jabber:iq:version=2 ...
    Loading plugin: ...
    Loading plugin: starttls=2 ...
    Loading plugin: msgoffline=2 ...
    Loading plugin: vcard-temp=2 ...
    Loading plugin: ...
    Loading plugin: jabber:iq:private=2 ...
    Loading plugin: urn:xmpp:ping=2
    You may find that the default of 2 threads for plugins using the database is too few. You can increase the number of threads. Here is a guide on how to do it.
  6. If you increase the number of threads for plugins using the database, it also makes sense to increase the database connection pool size.
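The checks from points 4 and 5 can be sketched as a small Java helper. This sketch assumes you have copied the 'sess-man/Processor:' statistics values from Psi and the 'Loading plugin:' lines from logs/tigase-console.log as plain strings. The class and method names (LoadTestCheck, checkProcessor, pluginThreads) and the maxQueue/maxAvTimeMs thresholds are hypothetical illustrations and not part of Tigase. Choose thresholds that fit your own installation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for reading Tigase statistics and startup logs during a load test.
final class LoadTestCheck {

    // Matches a processor statistics value such as
    // "Queue: 0, AvTime: 1, Runs: 183708, Lost: 0"
    private static final Pattern STATS = Pattern.compile(
            "Queue:\\s*(\\d+),\\s*AvTime:\\s*(\\d+),\\s*Runs:\\s*(\\d+),\\s*Lost:\\s*(\\d+)");

    // Matches a startup log line such as "Loading plugin: jabber:iq:auth=2 ..."
    private static final Pattern PLUGIN = Pattern.compile(
            "Loading plugin:\\s*([^=\\s]+)=(\\d+)");

    private LoadTestCheck() {}

    /**
     * Returns warnings for one processor statistics line (empty if it looks healthy).
     * maxQueue and maxAvTimeMs are assumed thresholds. A single snapshot cannot tell
     * a short peak from a steadily growing queue, so sample the statistics repeatedly.
     */
    static List<String> checkProcessor(String stats, long maxQueue, long maxAvTimeMs) {
        Matcher m = STATS.matcher(stats);
        if (!m.find()) {
            throw new IllegalArgumentException("Unrecognized statistics line: " + stats);
        }
        long queue = Long.parseLong(m.group(1));
        long avTime = Long.parseLong(m.group(2));
        long lost = Long.parseLong(m.group(4));

        List<String> warnings = new ArrayList<>();
        if (lost > 0) {
            // Packets were dropped because the bounded queue was full.
            warnings.add("lost packets: " + lost);
        }
        if (queue > maxQueue) {
            warnings.add("queue size: " + queue);
        }
        if (avTime > maxAvTimeMs) {
            warnings.add("average processing time: " + avTime + " ms");
        }
        return warnings;
    }

    /** Maps each plugin id to its thread count, skipping lines without "=threads". */
    static Map<String, Integer> pluginThreads(List<String> logLines) {
        Map<String, Integer> threads = new LinkedHashMap<>();
        for (String line : logLines) {
            Matcher m = PLUGIN.matcher(line);
            if (m.find()) {
                threads.put(m.group(1), Integer.parseInt(m.group(2)));
            }
        }
        return threads;
    }

    public static void main(String[] args) {
        // The first line is the example from point 4; the second uses illustrative values, not measurements.
        System.out.println(checkProcessor("Queue: 0, AvTime: 1, Runs: 183708, Lost: 0", 100, 10));
        System.out.println(checkProcessor("Queue: 5400, AvTime: 35, Runs: 90211, Lost: 12", 100, 10));
        System.out.println(pluginThreads(List.of(
                "Loading plugin: jabber:iq:auth=2 ...",
                "Loading plugin: roster-presence=4 ...",
                "Loading plugin: ...",
                "Loading plugin: urn:xmpp:ping=2")));
    }
}
```

On Java 11 or later you can run it directly with `java LoadTestCheck.java`. The healthy example line produces no warnings. The thread map only tells you how many threads each plugin has; deciding which plugins use the database, and how far to raise their thread counts and the connection pool, is still up to you.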