sess-man/Total queues overflow: what does it mean?

Andrew Beni
Added 9 months ago

Hi,
looking at the stats I found:

sess-man/Processor: presence-state = , Queue: 0, AvTime: 1, Runs: 148685, Lost: 12063;
sess-man/Total queues overflow = 12063;
total/Total queues overflow = 12063;

What exactly does this overflow mean? Can it cause some problems on the server?

thanks


Replies (7)


Added by Artur Hefczyc TigaseTeam 9 months ago

This might be a problem, but not on the server. More for users.

What it means is that the server could not handle the load and dropped some packets, presence packets in this case. Dropped packets are lost and not delivered to the end user. This is why it might be a problem for your users: they do not get the presence information about their contacts.

Tigase is prepared for periodic traffic spikes and keeps packets (messages, presences and others) in queues. Normally all the queues are emptied right away; however, during a traffic spike a queue may grow, as processing/delivering is slower than the incoming traffic. If the spike is short, the queue is emptied once the traffic is back to normal. However, when the spike lasts longer, the queue grows too much and could consume all the available memory. Therefore, Tigase puts a limit on the maximum size of each queue. When the max size is reached, subsequent packets are dropped and the "queue overflow" counter is increased.

There are many ways to avoid queue overflow but the best solution really depends on why this happens on your installation.

Added by Andrew Beni 9 months ago

Thanks for the explanation.

In my use case I have an admin user that can talk with all other users (about 30k and growing), and all other users can talk only with the admin user. Do you think this could be the problem?

The admin user connects to the server with a pool of connections using different resources. Each client, instead, has only one resource.

Could it be a problem related to the roster? The admin user can have a big roster; is there any limit?

Do you think I have to apply some specific settings for this use case?

Thanks

Added by Andrew Beni 9 months ago

Is it also possible to increase the queue size?

Is it controlled by
http://docs.tigase.org/tigase-server/7.1.2/Properties_Guide/html/#maxQueueSize
?


Added by Artur Hefczyc TigaseTeam 9 months ago

Andrew Beni wrote:

Thanks for the explanation.

In my use case I have an admin user that can talk with all other users (about 30k and growing), and all other users can talk only with the admin user. Do you think this could be the problem?

Yes, it could be a problem. You have 30k users (and growing) talking to a single user, so you are trying to squeeze data from 30k connections into a single connection (or a pool of a few). It may simply not be possible to fit data from 30k pipes into a single pipe.
Even if you have a pool of connections for the admin, are you certain that all connections in the pool are used, not just one? Even if all connections in the pool are used, are you certain that you can read data quickly enough on the admin side to empty the queues from 30k connections?
Even if you can, there are limits to the amount of traffic that can be handled on a single connection (or a handful of connections).

The admin user connects to the server with a pool of connections using different resources. Each client, instead, has only one resource.

Could it be a problem related to the roster? The admin user can have a big roster; is there any limit?

There is no limit on the roster size (although, depending on the DB used, there might be one). However, the roster size greatly affects the traffic. Each time a new resource for the admin connects to or disconnects from the server, it generates an instant burst of 60k packets or so (2x the size of the roster). I guess this is when the queues are being filled up, and the fact that you experience presence queue overflow confirms this.
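For illustration, this is roughly what the XMPP spec (RFC 6121) prescribes when a new admin resource sends its initial presence, and why the burst is about 2x the roster size (the JIDs below are hypothetical):

    <!-- one outbound probe per contact the admin is subscribed to -->
    <presence type='probe' from='admin@example.com' to='user1@example.com'/>

    <!-- one presence broadcast per contact subscribed to the admin -->
    <presence from='admin@example.com/pool-1' to='user1@example.com'/>

With a 30k-contact roster that is roughly 30k probes plus 30k broadcasts generated at once.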

Do you think I have to apply some specific settings for this use case?

I do not know your specific use case, so it is impossible to suggest anything concrete. I would need to know what you are trying to accomplish and what your use case is; then I could suggest the most suitable solution.


Added by Artur Hefczyc TigaseTeam 9 months ago

Andrew Beni wrote:

Is it also possible to increase the queue size?

Yes, but I strongly recommend that you do NOT do it.
The queue sizes are automatically calculated and set based on the system you are running Tigase on: the number of CPUs, the amount of available memory and so on.
If you increase a queue size, you may just end up with an OOM error and a system failure instead.

Added by Andrew Beni 9 months ago

Artur Hefczyc wrote:

Andrew Beni wrote:

Thanks for the explanation.

In my use case I have an admin user that can talk with all other users (about 30k and growing), and all other users can talk only with the admin user. Do you think this could be the problem?

Yes, it could be a problem. You have 30k users (and growing) talking to a single user, so you are trying to squeeze data from 30k connections into a single connection (or a pool of a few). It may simply not be possible to fit data from 30k pipes into a single pipe.
Even if you have a pool of connections for the admin, are you certain that all connections in the pool are used, not just one? Even if all connections in the pool are used, are you certain that you can read data quickly enough on the admin side to empty the queues from 30k connections?
Even if you can, there are limits to the amount of traffic that can be handled on a single connection (or a handful of connections).

The admin user connects to the server with a pool of connections using different resources. Each client, instead, has only one resource.

Could it be a problem related to the roster? The admin user can have a big roster; is there any limit?

There is no limit on the roster size (although, depending on the DB used, there might be one). However, the roster size greatly affects the traffic. Each time a new resource for the admin connects to or disconnects from the server, it generates an instant burst of 60k packets or so (2x the size of the roster). I guess this is when the queues are being filled up, and the fact that you experience presence queue overflow confirms this.

Do you think I have to apply some specific settings for this use case?

I do not know your specific use case, so it is impossible to suggest anything concrete. I would need to know what you are trying to accomplish and what your use case is; then I could suggest the most suitable solution.

Hi,
now it is clearer. Here is some more info: the registered users are about 30k, but the connected ones are about 1k. This is from the Tigase stats:

c2s/Open connections 913
sess-man/Open user connections 912
sess-man/Open user sessions 883

The use case is this:

  • 2 admin users: 1) backendOUT (sends messages from our backend to users), 2) backendIN (receives messages from users)
  • 1k online users (growing), out of about 30k registered users (also growing)

There is a pool of connections with different resources for the backendOUT user; via the session manager I can see all the backendOUT resources connected correctly to the server.

The users cannot talk to each other; they can only receive messages from the backendOUT user and send messages to the backendIN user.

The main goal of the system is to send about 100 messages/second from backendOUT to the users.

As you correctly said, the overflow is related to presence messages. When a backendOUT connection is made to the server, for each resource (so for each backend connection in the pool) 2 x 30k packets are generated, correct? Or is it 2 x 1k (online users)?

This generates the presence overflow.

Since the users are not interested in viewing the online status of the backendOUT connection, the lost packets shouldn't cause any particular problem, correct? Is it possible to completely disable presence? I think it is part of the XMPP standard, though, so it is probably not easy to remove.

We run in cluster mode; is the presence forwarded to all the cluster nodes? In our use case, do you think it would be better to have two different "types" of cluster nodes, segregating the backend connections from the user connections? In that case the presence overflow would occur only on the nodes where the backend is connected, correct?


Added by Artur Hefczyc TigaseTeam 9 months ago

Thank you for providing more information. However, before I can really suggest a good solution, I would need to know what you want to accomplish, what functionality you need from the end-user point of view, and why you have such an unusual workflow: one user (admin or not) communicating with thousands of other users. I am afraid that this kind of design/implementation will always have a weak point: the admin user.

What if you have 100k online users? What if you have 1M online users? Will you still be able to handle this with the above implementation? Traffic on a single user's connection pool is one thing, and then you would also have to maintain a roster (contact list) with 1M users. This is not how XMPP is designed to work. Even if you get it working, it will be very hard to scale up. What would the traffic be at 1M online users?

I guess you need the roster just to know the users' JIDs where you want to send messages from backendOUT. You also need to know the presence of these users, to avoid sending messages to users who are not connected. The users are not interested in seeing the presence of backendOUT, so you could reduce the traffic by having a one-way presence subscription. Yes, this feature is available in the XMPP spec; it is just a different subscription type. Instead of using subscription "both", you would have "from" and "to", or, if you want to disable presences completely, you can use subscription "none". But then you do not know which users are online/offline, so you would be sending plenty of messages to offline users, generating unnecessary traffic.
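For illustration (hypothetical JIDs; the subscription states are defined in RFC 6121), the admin's roster items for such a one-way setup would look something like this:

    <iq type='result' id='roster1' to='backendOUT@example.com/pool-1'>
      <query xmlns='jabber:iq:roster'>
        <!-- "to": the admin sees the user's presence, the user does not see the admin's -->
        <item jid='user1@example.com' subscription='to'/>
        <!-- "none": no presence is exchanged in either direction -->
        <item jid='user2@example.com' subscription='none'/>
      </query>
    </iq>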

However, there are other ways to do this than using a single user with a huge roster.

My guess is that you have some external third-party system which sends information to the users/devices connected to the XMPP server and which also collects information from all the connected users/devices. There are a few different ways you can do this correctly and efficiently in XMPP. The first thing I suggest you explore is PubSub. It may seem complex and intimidating at first, but it is really a very simple thing, and it allows you to do what you need in an efficient, flexible and scalable way.

Instead of putting all users into a single huge roster, you can create a PubSub channel (node), which can be a uni-directional or bi-directional communication channel. You can put as many users into a single channel as you want, and you decide who can submit messages to the channel and who receives messages from it. Large PubSub channels can be distributed over many cluster nodes, or you can even maintain multiple PubSub channels and assign smaller groups of users to each one.
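As a rough sketch of how this could look (per XEP-0060; the service address, node name and payload element below are hypothetical), publishing one item to a channel is a single request:

    <iq type='set' id='publish1'
        from='backendOUT@example.com/pool-1'
        to='pubsub.example.com'>
      <pubsub xmlns='http://jabber.org/protocol/pubsub'>
        <publish node='broadcast-to-users'>
          <item>
            <!-- any namespaced payload; a hypothetical notification element -->
            <notification xmlns='urn:example:backend'>Hello subscribers</notification>
          </item>
        </publish>
      </pubsub>
    </iq>

The PubSub service then delivers the item to every subscribed user, so the fan-out happens on the server side instead of over the single admin connection pool.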
