Race condition for offline messages

Daniele Ricci
Added over 3 years ago

I'm sometimes having trouble receiving offline messages. Specifically, for a given group of offline messages, the first one (that is, the oldest) is sometimes not delivered and is bounced back to offline storage, where it is inserted again just as when it is stored for the first time (i.e. the client was not detected as online).

When there is only one offline message, this happens very often: I had to reconnect about 10 times before I got the message (it seems quite random).

This is just a preliminary analysis, of course. I will investigate further and report a bug if it turns out to be one, but I'd like your opinion first.

OfflineMessages checks that Message.hasConnectionForMessageDelivery() returns true, which in turn checks for active sessions with positive presence. This behaviour only started after upgrading to 7.0.2, which contained this fix. Is it possible that the initial presence hasn't been processed yet and OfflineMessages is so fast that it doesn't see an active connection? I mean, packets are processed in parallel, right? Couldn't this be a race condition?

This would also explain the pseudorandom behaviour: sometimes OfflineMessages gets there first, other times Presence does.
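To make the suspected ordering problem concrete, here is a minimal sketch in plain Java. It is not Tigase code: `Session`, `hasConnectionForDelivery()` and `deliver()` are hypothetical stand-ins, and the two steps run in an explicitly chosen order rather than on real processor threads, so that both outcomes can be shown deterministically.

```java
import java.util.concurrent.atomic.AtomicBoolean;

class OfflineRaceSketch {

    // Hypothetical stand-in for a user session; NOT a Tigase class.
    static final class Session {
        final AtomicBoolean presenceProcessed = new AtomicBoolean(false);

        // Mirrors the idea of "is there a session with positive presence?"
        boolean hasConnectionForDelivery() {
            return presenceProcessed.get();
        }
    }

    // Runs the two steps in the given order and reports what happens
    // to the oldest offline message.
    static String deliver(boolean presenceFirst) {
        Session session = new Session();
        String[] outcome = new String[1];

        // Step A: the initial presence is processed.
        Runnable presence = () -> session.presenceProcessed.set(true);
        // Step B: the offline-message step checks for a deliverable connection.
        Runnable offline = () -> outcome[0] =
                session.hasConnectionForDelivery() ? "delivered" : "bounced";

        if (presenceFirst) {
            presence.run();
            offline.run();
        } else {
            offline.run();
            presence.run();
        }
        return outcome[0];
    }

    public static void main(String[] args) {
        System.out.println(deliver(true));   // prints "delivered"
        System.out.println(deliver(false));  // prints "bounced"
    }
}
```

With real parallel processing the order is not chosen by the caller, which would match the seemingly random behaviour described above.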

Thanks for any insight you might give me.


Replies (3)

Added by Daniele Ricci over 3 years ago

By the way, the commit involved is 889b4891bb57c05fb9b9224ae1f124c21a6bd4bf.

Added by Daniele Ricci over 3 years ago

For the moment I have implemented a workaround that waits for the session's presence for at most 100 ms:

https://github.com/kontalk/tigase-extension/commit/d49f3d889e692aff1211fe8a158b97ecd43064ac
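The general idea of such a bounded wait can be sketched as follows. This is not the code from the linked commit, only a generic illustration under assumptions: `waitForPresence`, the `presenceKnown` supplier and the 5 ms poll interval are hypothetical names and values chosen for the example.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

class PresenceWaitSketch {

    // Polls presenceKnown until it returns true or timeoutMillis elapses.
    // Returns true if presence was seen in time, false otherwise.
    static boolean waitForPresence(BooleanSupplier presenceKnown,
                                   long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (!presenceKnown.getAsBoolean()) {
            // Overflow-safe comparison against the deadline.
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicBoolean presence = new AtomicBoolean(false);

        // Simulate the initial presence being processed slightly late.
        Thread presenceProcessor = new Thread(() -> {
            try {
                Thread.sleep(20);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            presence.set(true);
        });
        presenceProcessor.start();

        boolean seen = waitForPresence(presence::get, 100, 5);
        presenceProcessor.join();
        System.out.println(seen ? "deliver offline messages" : "keep messages in offline storage");
    }
}
```

The trade-off is that a wait like this blocks the calling thread for up to the timeout, which is why it is only a stopgap.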

It's a dirty workaround, but it prevents the issue. I really don't know how else to fix it: I'm afraid the impact on Tigase's architecture would be too great (we would need something like prioritized processors).

Added by Daniele Ricci over 3 years ago

Oh... I just found a note by Andrzej:

https://projects.tigase.org/issues/2561#note-7

I'll keep my workaround for the time being.
