Performance issue: message-router and sess-man have a high "in queue wait"

cena cena
Added over 3 years ago

Hi, we run a single Tigase server that normally has around 1500 active clients. After running for several hours, however, its performance degrades badly: logging in or receiving messages takes a long time, and eventually clients cannot log in at all.

Using Psi's service discovery, we found that the "in queue wait" is very high, even though there are only about 300 packets per second and CPU, memory, and I/O usage are all low.

Below is our environment:

Tigase version: 5.2.1, 5.2.3, and 7.0.2 (we tried all of these releases)

JDK: Oracle jdk-7u45-linux-x64

Hardware/OS: 8-core CPU, 16 GB RAM, Ubuntu 12.04

Our init.properties:

config-type=--gen-config-def
--admins=admin@tt.com 
--virt-hosts =tt.com 
--sm-plugins=session-close=16,session-open=16,default-handler=8,jabber:iq:auth=24,urn:xmpp:ping=20,presence=24,-msgoffline,-amp,+message=16
--new-connections-throttling=5222:5000
--user-db=mysql
--user-db-uri=jdbc:mysql://172.168.6.203:3306/tigasedb?user=root&password=root&useUnicode=true&characterEncoding=UTF-8
--cm-traffic-throttling=xmpp:0:0:disc,bin:0:0:disc
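For readers unfamiliar with the `--sm-plugins` syntax: each comma-separated entry names a session-manager plugin, a leading `-` disables it, a leading `+` enables it, and `=N` appears to set the number of processing threads for that plugin (which is how `jabber:iq:auth=24` yields the 24 threads mentioned later in the thread). The Java sketch below parses the value under that assumed syntax. `SmPluginsConfig` is a hypothetical helper, not part of Tigase, and the syntax may differ between releases.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: turns a --sm-plugins value into "plugin name -> thread count".
// ASSUMED syntax (check the docs for your Tigase release):
//   "-name"   plugin disabled
//   "+name"   plugin enabled with the default number of threads
//   "name=N"  plugin enabled with N processing threads
final class SmPluginsConfig {

    /** Marker for plugins that are explicitly disabled with a leading '-'. */
    static final int DISABLED = -1;
    /** Marker for plugins enabled without an explicit thread count. */
    static final int DEFAULT_THREADS = 0;

    static Map<String, Integer> parse(String value) {
        Map<String, Integer> result = new LinkedHashMap<>();
        for (String raw : value.split(",")) {
            String entry = raw.trim();
            if (entry.isEmpty()) {
                continue;
            }
            boolean disabled = entry.startsWith("-");
            if (disabled || entry.startsWith("+")) {
                entry = entry.substring(1);
            }
            // Names such as "jabber:iq:auth" contain ':' but never '=',
            // so the last '=' separates the name from the thread count.
            int eq = entry.lastIndexOf('=');
            String name = eq >= 0 ? entry.substring(0, eq) : entry;
            int threads = eq >= 0 ? Integer.parseInt(entry.substring(eq + 1)) : DEFAULT_THREADS;
            result.put(name, disabled ? DISABLED : threads);
        }
        return result;
    }

    public static void main(String[] args) {
        String smPlugins = "session-close=16,session-open=16,default-handler=8,"
                + "jabber:iq:auth=24,urn:xmpp:ping=20,presence=24,-msgoffline,-amp,+message=16";
        parse(smPlugins).forEach((name, threads) -> System.out.println(name + " -> " + threads));
    }
}
```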

The JVM configuration in tigase.conf:

OSGI=false
ENC="-Dfile.encoding=UTF-8 -Dsun.jnu.encoding=UTF-8 -Djava.nio.channels.spi.SelectorProvider=sun.nio.ch.EPollSelectorProvider"
DRV="-Djdbc.drivers=com.mysql.jdbc.Driver"
GC="-XX:+UseBiasedLocking -XX:+UseConcMarkSweepGC -XX:+UseCMSCompactAtFullCollection -XX:CMSMaxAbortablePrecleanTime=500 -XX:+CMSPermGenSweepingEnabled -XX:+CMSClassUnloadingEnabled -XX:ParallelCMSThreads=8 -XX:-ReduceInitialCardMarks"
EX="-XX:+OptimizeStringConcat -XX:+DoEscapeAnalysis -XX:+UseNUMA"
JAVA_HOME="${JAVA_HOME}"
JAVA_OPTIONS="${GC} ${EX} ${ENC} ${DRV} -server -Xms8192M -Xmx8192M -XX:PermSize=256m -XX:MaxPermSize=256m -XX:MaxDirectMemorySize=128m "

Replies (8)

Added by Wojciech Kapcia TigaseTeam over 3 years ago

What is your use-case / workflow? From the statistics:

sess-man/Processor: session-close = , Queue: 106266, AvTime: 2, Runs: 494847, Lost: 81057
sess-man/Processor: jabber:iq:auth = , Queue: 164990, AvTime: 3, Runs: 870933, Lost: 49737
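These two statistics lines are easier to interpret once the counters are pulled out. The sketch below parses a `sess-man/Processor` line and derives two rough figures: the fraction of packets lost, assuming `Runs` counts processed packets and `Lost` counts packets dropped because the plugin queue was full, and how long the current queue would take to drain if the plugin's threads shared it evenly. Both interpretations are assumptions, and `ProcessorStat` is a hypothetical helper.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: extracts the counters from a "sess-man/Processor" statistics line.
// ASSUMPTION: "Runs" = packets processed by the plugin, "Lost" = packets
// dropped because the plugin queue was full. Verify against your release.
final class ProcessorStat {
    private static final Pattern LINE = Pattern.compile(
            "Processor:\\s*(\\S+)\\s*=.*?Queue:\\s*(\\d+),\\s*AvTime:\\s*(\\d+),"
            + "\\s*Runs:\\s*(\\d+),\\s*Lost:\\s*(\\d+)");

    final String plugin;
    final long queue;
    final long avTimeMs;
    final long runs;
    final long lost;

    private ProcessorStat(String plugin, long queue, long avTimeMs, long runs, long lost) {
        this.plugin = plugin;
        this.queue = queue;
        this.avTimeMs = avTimeMs;
        this.runs = runs;
        this.lost = lost;
    }

    static ProcessorStat parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException("Unrecognised stats line: " + line);
        }
        return new ProcessorStat(m.group(1), Long.parseLong(m.group(2)),
                Long.parseLong(m.group(3)), Long.parseLong(m.group(4)), Long.parseLong(m.group(5)));
    }

    /** lost / (runs + lost), under the counter interpretation assumed above. */
    double lostFraction() {
        long total = runs + lost;
        return total == 0 ? 0.0 : (double) lost / total;
    }

    /** Seconds to drain the current queue if `threads` workers share it evenly. */
    double drainSeconds(int threads) {
        return queue * avTimeMs / (threads * 1000.0);
    }

    public static void main(String[] args) {
        String[] lines = {
            "sess-man/Processor: session-close = , Queue: 106266, AvTime: 2, Runs: 494847, Lost: 81057",
            "sess-man/Processor: jabber:iq:auth = , Queue: 164990, AvTime: 3, Runs: 870933, Lost: 49737"
        };
        int[] threads = {16, 24}; // from --sm-plugins: session-close=16, jabber:iq:auth=24
        for (int i = 0; i < lines.length; i++) {
            ProcessorStat s = parse(lines[i]);
            System.out.printf("%s: lost %.1f%%, ~%.0f s to drain with %d threads%n",
                    s.plugin, 100 * s.lostFraction(), s.drainSeconds(threads[i]), threads[i]);
        }
    }
}
```

With the thread counts from the posted `--sm-plugins` (16 for `session-close`, 24 for `jabber:iq:auth`), this reading gives roughly 14% and 5.4% lost, and about 13 s and 21 s of backlog even in the best case of perfectly even sharing.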

Have you measured the performance of the database?
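One way to answer the database question is to time the same kind of call the server makes during login, outside Tigase, and look at tail latency rather than the average. The sketch below is a generic timing harness only. `LatencyProbe` is hypothetical, and the placeholder workload in `main` would have to be replaced with a real JDBC round trip against the `tigasedb` schema; which query to time is left open here.

```java
import java.util.Arrays;
import java.util.concurrent.Callable;

// Sketch: runs an operation repeatedly and reports latency percentiles.
// Intended (hypothetical) use: put a real database round trip in the Callable.
final class LatencyProbe {

    /** Runs `op` `iterations` times; returns each duration in microseconds, sorted ascending. */
    static long[] sample(Callable<?> op, int iterations) throws Exception {
        long[] micros = new long[iterations];
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            op.call();
            micros[i] = (System.nanoTime() - start) / 1_000;
        }
        Arrays.sort(micros);
        return micros;
    }

    /** Nearest-rank percentile, p in (0, 100], of an ascending-sorted array. */
    static long percentile(long[] sorted, double p) {
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(0, rank - 1)];
    }

    public static void main(String[] args) throws Exception {
        // Placeholder workload; replace with a real database call to measure it.
        long[] samples = sample(() -> Math.sqrt(System.nanoTime()), 1_000);
        System.out.printf("p50=%d us, p99=%d us, max=%d us%n",
                percentile(samples, 50), percentile(samples, 99), samples[samples.length - 1]);
    }
}
```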

Added by cena cena over 3 years ago

We use XMPP for remote control of smart devices. Because the devices' hardware differs, some clients use Smack, while others use our own XMPP implementation written in C. Clients may also disconnect and reconnect from time to time because of the software's connection mechanism, the hardware, or an unstable network.

The database performance is fine while the "in queue wait" is high.

Added by cena cena over 3 years ago

One thing confuses me: AvTime is 3 ms and the number of threads is 24. How can the queue grow so large when there are only around 300 packets per second?
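The arithmetic behind this question can be made explicit. At an average of 3 ms per packet, one thread handles about 333 packets per second, so 24 threads sharing the load evenly could handle about 8000. If most packets instead end up on a single thread's queue, the plugin saturates near 333 packets per second, which is close to the observed 300. The sketch assumes each packet is placed on exactly one per-thread queue (for example by hashing a JID); that assignment rule is an assumption, not something the statistics show. The 0.85 value in `main` is only illustrative. `QueueCapacity` is a hypothetical helper.

```java
// Back-of-the-envelope capacity estimate for a pool of processing threads.
// ASSUMPTION: every packet goes to exactly one per-thread queue, so a queue
// is drained by a single thread (the assignment rule itself is not modeled).
final class QueueCapacity {

    /** Packets per second one thread can process at the given average time. */
    static double perThreadRate(double avTimeMs) {
        return 1000.0 / avTimeMs;
    }

    /** Packets per second for the whole pool when load is perfectly even. */
    static double evenPoolRate(int threads, double avTimeMs) {
        return threads * perThreadRate(avTimeMs);
    }

    /**
     * Maximum total packets per second when a fraction {@code hotShare} of all
     * traffic lands on one queue and the rest is spread over the other threads.
     * Whichever side saturates first sets the limit.
     */
    static double skewedPoolRate(int threads, double avTimeMs, double hotShare) {
        double perThread = perThreadRate(avTimeMs);
        if (threads == 1 || hotShare >= 1.0) {
            return perThread; // everything is served by one thread
        }
        double hotLimit = perThread / hotShare;
        double restLimit = (threads - 1) * perThread / (1.0 - hotShare);
        return Math.min(hotLimit, restLimit);
    }

    public static void main(String[] args) {
        double avTimeMs = 3.0; // AvTime reported for jabber:iq:auth
        int threads = 24;      // jabber:iq:auth=24 in --sm-plugins
        System.out.printf("one thread:        %.0f packets/s%n", perThreadRate(avTimeMs));
        System.out.printf("24 threads, even:  %.0f packets/s%n", evenPoolRate(threads, avTimeMs));
        System.out.printf("85%% on one queue:  %.0f packets/s%n", skewedPoolRate(threads, avTimeMs, 0.85));
        System.out.printf("all on one queue:  %.0f packets/s%n", skewedPoolRate(threads, avTimeMs, 1.0));
    }
}
```

Under this model, the number of threads stops mattering once the traffic is concentrated on one queue; only the per-packet time of that single thread does.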

Added by cena cena over 3 years ago

Could somebody give me some advice? Thanks very much!

Added by Steffen Larsen over 3 years ago

Just a comment: I am running more or less the same specs as yours, and I have around 80k users online per server.

My config does not have the --sm-plugins and --cm-traffic-throttling flags that yours has, and I use a web service for authentication.

Added by Steffen Larsen over 3 years ago

Have you checked the OS settings: TCP/IP stack? ulimit for processes and files?

Added by cena cena over 3 years ago

Steffen Larsen wrote:

Have you checked the OS settings: TCP/IP stack? ulimit for processes and files?

The ulimit is 65535.

This is our /etc/sysctl.conf:

net.ipv4.tcp_keepalive_time = 60
net.ipv4.tcp_keepalive_probes = 2
net.ipv4.tcp_keepalive_intvl = 15
net.ipv4.tcp_retries2 = 3
fs.file-max = 1048576
net.ipv4.ip_local_port_range = 1024 65530
net.core.somaxconn = 2048
net.core.rmem_default = 262144
net.core.wmem_default = 262144
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 4096 16777216
net.ipv4.tcp_wmem = 4096 4096 16777216
net.ipv4.tcp_mem = 786432 2097152 3145728
net.ipv4.tcp_max_syn_backlog = 16384
net.core.netdev_max_backlog = 20000
net.ipv4.tcp_fin_timeout = 15
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_timestamps = 0
net.ipv4.tcp_max_orphans = 131072
kernel.pid_max = 800000
vm.max_map_count = 861072
kernel.threads-max = 800000

Added by Wojciech Kapcia TigaseTeam over 3 years ago

cena cena wrote:

The database performance is fine while the "in queue wait" is high.

  • Are there any exceptions in the logs?

  • Are there any deadlocks?

Another thing that stands out:

sess-man/Processed packets thread: in_39-sess-man = 43301
sess-man/Processed packets thread: in_40-sess-man = 36189
sess-man/Processed packets thread: in_41-sess-man = 40649
sess-man/Processed packets thread: in_42-sess-man = 1537011
sess-man/Processed packets thread: in_43-sess-man = 52622
sess-man/Processed packets thread: in_44-sess-man = 50031
sess-man/Processed packets thread: in_45-sess-man = 44245
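The skew becomes obvious when the counters are compared: among the seven threads listed, `in_42-sess-man` processed about 85% of the packets. The sketch below parses the `Processed packets thread` lines and reports the busiest thread and its share. Only an excerpt of the threads is shown above, so the share is relative to those lines alone. `ThreadSkew` is a hypothetical helper.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: quantifies how unevenly packets are spread over sess-man threads,
// using the "Processed packets thread" lines from the statistics output.
final class ThreadSkew {
    private static final Pattern LINE =
            Pattern.compile("Processed packets thread:\\s*(\\S+)\\s*=\\s*(\\d+)");

    /** Maps thread name to processed-packet count, in input order. */
    static Map<String, Long> parse(String stats) {
        Map<String, Long> counts = new LinkedHashMap<>();
        Matcher m = LINE.matcher(stats);
        while (m.find()) {
            counts.put(m.group(1), Long.parseLong(m.group(2)));
        }
        return counts;
    }

    /** Name of the thread with the highest count, or null if there are none. */
    static String hottestThread(Map<String, Long> counts) {
        return counts.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElse(null);
    }

    /** Fraction of all listed packets handled by the busiest thread. */
    static double hottestShare(Map<String, Long> counts) {
        long total = 0;
        long max = 0;
        for (long c : counts.values()) {
            total += c;
            max = Math.max(max, c);
        }
        return total == 0 ? 0.0 : (double) max / total;
    }

    public static void main(String[] args) {
        String stats = String.join("\n",
                "sess-man/Processed packets thread: in_39-sess-man = 43301",
                "sess-man/Processed packets thread: in_40-sess-man = 36189",
                "sess-man/Processed packets thread: in_41-sess-man = 40649",
                "sess-man/Processed packets thread: in_42-sess-man = 1537011",
                "sess-man/Processed packets thread: in_43-sess-man = 52622",
                "sess-man/Processed packets thread: in_44-sess-man = 50031",
                "sess-man/Processed packets thread: in_45-sess-man = 44245");
        Map<String, Long> counts = parse(stats);
        System.out.printf("busiest: %s, share: %.1f%%%n",
                hottestThread(counts), 100 * hottestShare(counts));
    }
}
```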
