Tigase Cluster? Presence

John Catron
Added over 4 years ago

We are experiencing a new clustering presence issue.

Essentially, an account has 50 buddies, 28 of whom are logged in. Sometimes the account logs in and sees all 28 online, but sometimes it sees only 25, and sometimes only 15 or so.

We have investigated and cannot say for sure that clustering has anything to do with this. On one occasion my account showed 17/47 people online and then showed only 12/47. When I checked, all 5 of the missing people were logged into the same Tigase instance I was (meaning no clustering was involved for those 5).

Our current setup for this beta environment is 3 Tigase nodes and our only custom plugins are for authentication (nothing for presence or buddies or rosters).

I can reproduce this behavior in our web client as well as in standard XMPP clients (Pidgin and Psi), so I'm fairly certain this isn't a client display issue.

We have restarted Tigase on all the servers. We have let it sit overnight to allow everyone to log in again. We have even gone so far as to completely uninstall and reinstall everything to make sure there were no issues. The behavior remains the same.

I uploaded a small screencast where you can see the extremely varying roster sizes over just a few login attempts. http://screencast.com/t/R7tjjwWX

(In 3 or 4 of the login attempts, enough users are online that only 1 offline user shows within the view frame. Some attempts have 2 offline within the view frame, and one showed 10 or more.)

When I do this with an external client, you can see that not only is the presence of buddies incorrect, but sometimes I actually receive different roster sizes, despite not having added or removed buddies for days. http://screencast.com/t/auT4od3Sm

Any insight you guys can offer on this would be amazing.


Replies (9)

Added by John Catron over 4 years ago

Guess I should also note that all 3 Tigase servers use the same DB, so the varying roster size is extremely confusing.

Added by John Catron over 4 years ago

The logs show nothing abnormal that we can see and there are no errors present.

Added by Artur Hefczyc TigaseTeam over 4 years ago

This is very strange indeed. I have never seen anything like this, and I cannot even imagine what may cause a different roster size between a user's logins unless you use the DynamicRoster API to generate the user's roster. To be honest, I do not have any suggestion for you off the top of my head. This is definitely something to investigate. A changing roster size is very unusual, and if it happens, then a different list of online buddies is to be expected, as online presence heavily depends on the roster content.

If you have Tigase running with server debug enabled, then you should be able to see the roster request/response exchange between the client and the server. This way you can see what is sent to the client. You can also open the Psi XML console and compare the roster content in the Tigase logs with the roster content in the Psi console. You can also see whether the user's entire roster is sent within a single roster result response or in multiple set responses. The set responses should happen only if you use DynamicRoster.

If the entire roster is sent within a single result response, then this is what Tigase loaded from the database. I do not know why or how Tigase could load different roster content from the DB on each query, unless each time the roster is loaded from the DB through a different pooled connection that somehow sees different DB content (different transaction, different DB cluster node?).
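The comparison described above can be done by hand, but with rosters of around 50 entries a small script is less error-prone. The following is only a sketch using the Python standard library. It assumes you have copied one complete roster result stanza from each source (for example, one from the Tigase log and one from the Psi XML console) into strings. The function names and the example JIDs are hypothetical and are not part of Tigase or Psi.

```python
import xml.etree.ElementTree as ET

# Namespace of the roster <query/> element defined by XMPP (RFC 6121).
ROSTER_NS = "jabber:iq:roster"


def roster_jids(iq_xml):
    """Return the set of JIDs listed in one roster <iq type='result'> stanza."""
    root = ET.fromstring(iq_xml)
    items = root.iter("{%s}item" % ROSTER_NS)
    return {item.get("jid") for item in items if item.get("jid")}


def roster_diff(first_xml, second_xml):
    """Report which JIDs appear in only one of the two roster stanzas."""
    first = roster_jids(first_xml)
    second = roster_jids(second_xml)
    return {
        "only_in_first": sorted(first - second),
        "only_in_second": sorted(second - first),
    }


if __name__ == "__main__":
    # Hypothetical stanzas standing in for the copied log/console output.
    login_1 = (
        "<iq type='result' id='r1'><query xmlns='jabber:iq:roster'>"
        "<item jid='alice@example.com' subscription='both'/>"
        "<item jid='bob@example.com' subscription='both'/>"
        "</query></iq>"
    )
    login_2 = (
        "<iq type='result' id='r2'><query xmlns='jabber:iq:roster'>"
        "<item jid='alice@example.com' subscription='both'/>"
        "</query></iq>"
    )
    print(roster_diff(login_1, login_2))
```

If both lists come back empty for the same login, the client received exactly what the server logged. A difference between two separate logins points at what the server loaded or generated.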

We know too little about your deployment to suggest anything else. If something comes to our mind we will post more suggestions.

Added by John Catron over 4 years ago

The changing roster size seems to have resolved itself. I'll keep you updated if it reappears, but I have been unable to reproduce it today. My assumption is that it had something to do with caching and the connection pool, for example one of the connections in the pool using a cache when it shouldn't.

We are still seeing the presence issue, though. It is slightly less drastic now that the number of buddies isn't also fluctuating, but the number of people online still changes by 3-5, sometimes more, each time you log in, despite no one actually logging in or out.
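To put numbers on the fluctuation, the incoming presence stanzas from each login could be saved and compared. Below is a rough sketch using only the Python standard library. It assumes each login's incoming traffic has been pasted from an XML console into a string that contains only complete stanzas (no stream header or prefixed stream-level elements). The function names and JIDs are hypothetical. As a simplification, a buddy is treated as offline as soon as any of their resources sends an unavailable presence.

```python
import xml.etree.ElementTree as ET


def bare_jid(jid):
    """Strip the resource part: 'user@host/resource' becomes 'user@host'."""
    return jid.split("/", 1)[0]


def online_buddies(console_dump):
    """Return the bare JIDs that are online at the end of the dump."""
    # The dump is a sequence of stanzas, so wrap it in a single root
    # element to make it parseable as one XML document.
    root = ET.fromstring("<dump>%s</dump>" % console_dump)
    online = set()
    for element in root.iter():
        # Ignore the namespace part of the tag, if there is one.
        if element.tag.rsplit("}", 1)[-1] != "presence":
            continue
        sender = element.get("from")
        if not sender:
            continue
        presence_type = element.get("type")
        if presence_type is None:
            # A presence stanza without a type attribute means "available".
            online.add(bare_jid(sender))
        elif presence_type == "unavailable":
            online.discard(bare_jid(sender))
    return online


if __name__ == "__main__":
    # Hypothetical dumps from two consecutive logins.
    login_1 = (
        "<presence from='alice@example.com/home'/>"
        "<presence from='bob@example.com/work'><show>away</show></presence>"
    )
    login_2 = "<presence from='alice@example.com/home'/>"
    missing = online_buddies(login_1) - online_buddies(login_2)
    print("online in login 1 but not in login 2:", sorted(missing))
```

Running this over several logins gives the exact list of buddies that drop out, which can then be checked against the node each of them is connected to.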

Added by Wojciech Kapcia TigaseTeam over 4 years ago

Could you try reproducing the issue after disabling the functionality that skips presence broadcasts to seemingly offline users:

sess-man/plugins-conf/presence/skip-offline=false
sess-man/plugins-conf/presence/skip-offline-sys=false

?

Added by John Catron over 4 years ago

Okay, before the change we were able to reproduce the presence bug at least 25-33% of the time. Since disabling skip-offline I have tried at least 100 times and seen no sign of it, so it seems skip-offline was indeed the cause. We will continue to test over the next hour or so to make absolutely sure, as this is a major issue, but it appears this fixed our problem.

We'll let you know if we gain any more information on this.
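As a sanity check on how strong that evidence is: if the bug still occurred at the earlier rate and each login attempt were independent (an assumption, not something measured here), the chance of 100 clean attempts in a row would be vanishingly small. A minimal sketch of the arithmetic, with a hypothetical helper name:

```python
def chance_of_clean_run(failure_rate, attempts):
    """Probability of zero failures in `attempts` independent tries."""
    return (1.0 - failure_rate) ** attempts


if __name__ == "__main__":
    # The two rates are the lower and upper bounds quoted above.
    for rate in (0.25, 0.33):
        probability = chance_of_clean_run(rate, 100)
        print("failure rate %.2f: %.1e" % (rate, probability))
```

Even at the lower 25% rate the result is far below one in a trillion, so 100 clean logins are very unlikely to be luck.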

Added by Wojciech Kapcia TigaseTeam over 4 years ago

Could you tell us which clustering strategy you use, and whether the issue is reproducible (without disabling skip-offline) in a multi-node setup with all users connecting to a single node, and in a single-node setup?

Added by Subir Jolly over 4 years ago

I just tried this on a single node (no cluster) and was able to reproduce the issue. We are using the default clustering strategy. I did not test a multi-node setup with all users connecting to one node.

Added by Artur Hefczyc TigaseTeam over 4 years ago

Thank you for all the details. If this is reproducible in single-node mode (no cluster), then we should be able to track it down and fix it. I have added a bug report for the issue: #1962. Please add yourself to the watchers for the ticket to be notified about progress.
