Messages with AMP expired tag delivered after the expiration

Luca Stucchi
Added almost 4 years ago

Hi there,

I am using a Tigase 7.1 snapshot with Stream Management to deliver a message with an AMP tag like this one:

<message to='user@sub.acme.org/Smack' from='admin@acme.org/Server-2' id='4711237392056728779'>
  <subject>Subject</subject>
  <body>body</body>
  <request xmlns='urn:xmpp:receipts'/>
  <amp xmlns="http://jabber.org/protocol/amp">
    <rule action="drop" condition="expire-at" value="2015-06-09T08:57:26Z"/>
    <rule action="drop" condition="match-resource" value="other"/>
  </amp>
</message>

I am testing that the message is not delivered once the expiry time has been reached.

If the client is not logged in, the message is simply dropped, as expected.

If the client is logged in and the connection then drops (I simulate this by disconnecting a mobile phone), the Stream Management plugin keeps the message in memory (it is not stored in msg_history). When I re-enable connectivity on my device, even 10 minutes after the expiry date, it automatically reconnects to the server and GETS THE MESSAGE, even though it has long since expired.

My questions are:

  • Is this expected behavior?

  • Is it possible that the Stream Management processor does not check, or fails to check, the expiry date?

  • As far as you know, is there anything I can do to have the message dropped?

This seems to me like a server-side issue. What do you think?

Thanks in advance,

Luca


Replies (10)

Added by Andrzej Wójcik IoT 1 CloudTigaseTeam almost 4 years ago

Hi,

I think you have more than one issue here but I will explain it later step by step.

From my point of view, the message should be delivered to the user, because the XMPP client was connected to the XMPP server. I know that the connection was broken, but Stream Management is responsible for resuming the XMPP stream, so while the connection was broken the stream was still established. So from my point of view this is OK.

But as I mentioned, I think you have other issues here. By default the Stream Management resumption timeout is only 60 seconds. If the connection were resumed after this time, the message would not be delivered: it would be saved to the msg_history table, the AMP rules would be checked again when a new XMPP stream was established, and the message would be dropped as you expected.

So either you increased the Stream Management resumption timeout in the configuration (which is now causing the behavior you did not expect), or Tigase XMPP Server never discovered that the TCP connection to the client was broken, so the resumption timeout never started counting, which allowed the client to resume the XMPP stream long after the configured resumption timeout. This failure to discover that an XMPP/TCP connection is broken is a known issue and is caused by the parameters the operating system uses to decide whether a connection is broken. On our forums you can find information on how to tune your operating system to speed up detection of a failed XMPP/TCP connection.

Added by Luca Stucchi almost 4 years ago

Hi Andrzej,

let's start from the disconnection detection: I followed the Tigase Administration Guide / Linux Settings for High Load Systems information (http://docs.tigase.org/tigase-server/7.0.0/Administration_Guide/webhelp/_tcp_keepalive.html) to tune my OS to speed up discovery.

# sysctl -w net.ipv4.tcp_keepalive_time="60"
net.ipv4.tcp_keepalive_time = 60
# sysctl -w net.ipv4.tcp_keepalive_probes="3"
net.ipv4.tcp_keepalive_probes = 3
# sysctl -w net.ipv4.tcp_keepalive_intvl="90"
net.ipv4.tcp_keepalive_intvl = 90

With the values indicated here the disconnection is detected in 10 minutes. I know this is not Tigase's fault, since it is driven by OS settings, but 10 minutes is not a short time. If there were any way to shorten this detection time, it would be a great help for reliability.
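
As a rough check on these numbers: for an idle connection, Linux sends the first keepalive probe after tcp_keepalive_time seconds and gives up after tcp_keepalive_probes unanswered probes spaced tcp_keepalive_intvl seconds apart. A minimal sketch of that arithmetic in Python (the function name is only illustrative):

```python
def keepalive_detection_seconds(time_s: int, probes: int, intvl_s: int) -> int:
    """Approximate time for TCP keepalive to declare an idle connection dead.

    time_s   -- net.ipv4.tcp_keepalive_time (idle time before the first probe)
    probes   -- net.ipv4.tcp_keepalive_probes (unanswered probes before giving up)
    intvl_s  -- net.ipv4.tcp_keepalive_intvl (seconds between probes)
    """
    return time_s + probes * intvl_s


# Values from the sysctl commands above.
print(keepalive_detection_seconds(60, 3, 90))  # 330
```

With the values above this gives 330 seconds, about 5.5 minutes, for an idle connection. When unacknowledged data is in flight, detection is instead governed by TCP retransmission timers (for example net.ipv4.tcp_retries2), which may be part of why the observed time is longer.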

We didn't change the Stream Management resumption timeout, so I expect my message to be stored after 11 minutes (since the 60 seconds of resumption timeout are counted AFTER the disconnection is detected, am I right?).

I understand your point: because the client was online when the message was sent, the message did not violate the AMP rule, so at that time delivery was allowed.

But frankly I cannot think that delivering a message 10 minutes after the expire time may be considered correct by the end user, and even more importantly by the business rules we are implementing. I know AMP and SM are different XEPs, but when someone uses AMP, the expectation is "... servers to perform advanced processing of XMPP message stanzas, including reliable data transport, time-sensitive delivery, and expiration of transient messages." (from http://www.xmpp.org/extensions/xep-0079.html).

And this scenario looks to me like violating the time-sensitive delivery and the expiration of transient messages, even if we are still using AMP.

Reducing the disconnection detection time could mitigate the risk of delivering expired messages, and likewise reducing the stream resumption timeout, but we could not have any assurance on those two requirements. As I see it, the most secure way to implement this would be performing a check on AMP rules right before sending a message after the stream resumption, either directly (but that would be mixing XEPs and duplicating code), or passing the message to the AMP processor in order to check any AMP rule and react in the expected way.
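
The check proposed above could look roughly like the following Python sketch. It is not Tigase code: the function names and the idea of holding unacknowledged stanzas as XML strings are assumptions made for illustration, and only the "expire-at" condition with the "drop" action and the UTC "Z" timestamp form from the example message are handled.

```python
from datetime import datetime, timezone
import xml.etree.ElementTree as ET

AMP_NS = "http://jabber.org/protocol/amp"


def is_expired(stanza_xml: str, now: datetime) -> bool:
    """Return True if the stanza carries a drop/expire-at rule that has passed."""
    msg = ET.fromstring(stanza_xml)
    for rule in msg.iterfind(f"{{{AMP_NS}}}amp/{{{AMP_NS}}}rule"):
        if rule.get("condition") == "expire-at" and rule.get("action") == "drop":
            # Assumption: timestamps use the plain UTC form, e.g. 2015-06-09T08:57:26Z.
            expires = datetime.strptime(
                rule.get("value"), "%Y-%m-%dT%H:%M:%SZ"
            ).replace(tzinfo=timezone.utc)
            if now >= expires:
                return True
    return False


def filter_on_resumption(queued: list, now: datetime) -> list:
    """Keep only the queued stanzas that are still valid at resumption time."""
    return [stanza for stanza in queued if not is_expired(stanza, now)]
```

On stream resumption the server would call something like filter_on_resumption(unacked, datetime.now(timezone.utc)) before re-sending, so that stanzas whose expire-at has passed are never put back on the wire. Other AMP actions (alert, notify, error) would need their own handling, which this sketch leaves out.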

Do you think this makes sense? Do you see any problem in implementing such a feature in Tigase? This way Tigase would offer a truly robust solution that strictly implements both AMP and SM.

Looking forward to hearing from you

All the best,

Luca

Added by Artur Hefczyc TigaseTeam almost 4 years ago

Hi Luca,

A few general remarks, not necessarily directly related to this particular issue.

Luca Stucchi wrote:

But frankly I cannot think that delivering a message 10 minutes after the expire time may be considered correct by the end user, and even more importantly by the business rules we are implementing.

And this scenario looks to me like violating the time-sensitive delivery and the expiration of transient messages, even if we are still using AMP.

It can be correct. It all depends on the requirements and tolerance margins that are put on the service. So maybe the discussion should start from what your requirements and tolerance margins are.

Before getting into the actual issue in Tigase discussed here, I would like you to realize that there are some limitations outside Tigase's control and outside yours. Even the TCP/IP layer does not give you any guarantee that a message is delivered within seconds or even minutes. A simple test: connect an XMPP client to an XMPP server on a different machine in a LAN. Unplug the Ethernet cable and send a message from the client to the server. Wait 1 minute and plug the Ethernet cable back in. The message arrives at the server, and neither the client nor the server notices a connectivity problem or delay. Of course, for this test to work certain conditions must be met: a specific OS, specific OS settings, specific hardware. In some cases it may work, in others it may not. And this is exactly what I mean: there are many factors that are out of your control, that is, out of your client software's control and out of Tigase's control. Please note that we are talking here only about a LAN; on the real Internet it gets much more complex and unpredictable.

If you have mobile clients it gets much, much worse, and you can forget about the 10 minutes mentioned above. There is no guarantee, no control. Everything is in the hands of mobile providers, who like to drop connections from mobile devices in such a way that both the server and the client believe the connection is still good.

Of course, message delivery taking minutes at the TCP/IP level is an extremely rare case, but it can still happen, and it is much more likely to happen at the XMPP level because of the circumstances described in this topic. Still, it should not happen often.

Now, the question is: if there are so many factors out of our control, can we do anything to get 100% correct delivery? The answer is similar to the one for security. Can we have 100% security? As far as I know, the correct answer is: we cannot have 100% security, but we can get very close. However, the closer we get to 100%, the higher the costs become. At some point the costs are so prohibitively high that it does not make business sense to go further.

This is why I asked the question about your requirements and tolerance margins. Once this is answered we can start thinking what we need to have or to do to meet these requirements.

Added by Luca Stucchi almost 4 years ago

Hi Artur

Thanks for the explanation and for your patience. Let me say it once more: I really appreciate the always-open line with the Tigase team, and since I really like Tigase as a product, I am very happy with my experience using it!

Let me explain a little more what I am trying to achieve with the system I am developing, of which Tigase is, obviously, one of the most important components:

I am implementing a layer of "reliability" and "traceability" on top of XMPP. In particular, I am overloading the AMP tag with a new meaning: "if the message is not delivered by the time it expires, my system can perform some other operation to deliver the information" (outside the XMPP protocol), together with "if I am not able to deliver the message, I will tell you". That's why I chose "DROP" in the AMP tag: if the message does not get through, it's not a big deal, since I may have other ways to deliver the information. I am fully aware that communication with mobile devices is HARD and that there can be a million reasons for a message not to reach the recipient before it expires, so I would never dare to claim that I will deliver 100% of the messages. That would be too optimistic, and I have absolutely no problem with that: I will do my best to deliver the message in time, but I can't guarantee the recipient will get it 100% of the time.

That said, let me return to the issue we are discussing. While I am aware that the mobile world is a wild world, I chose to use the state-of-the-art mechanism to provide time-sensitive delivery and expiration of transient messages (taken from the AMP description), and I chose to drop messages once they have expired. In fact, when the client is properly logged off, the message is saved in msg_history and, once processed, is not delivered. So far, so good.

But with Stream Management enabled, and in the situation described in the original post, something unexpected happens: a message (stored in memory by SM) can be delivered after its expiration date. For the moment let's ignore how much time has passed since the expiration: a message leaves the server heading toward the client at a moment when the server could drop it because it has expired. Allow me an analogy. A few minutes ago I was thinking of a similar scenario in the real world:

It's 9 AM on Sunday morning and I want to send an SMS to my grandmother saying "Let's meet at mass, at 10AM sharp".

But I am so tired that I fall asleep right before pressing the "send" button.

I wake up at 10:30 and press it, sending the message "Let's meet at mass, at 10AM sharp".

The problem is that it's too late: she's already at mass, and not only does the offer no longer make any sense, it will also make her quite angry because her phone will ring in the middle of mass.

What would be wise for me is to check the time right before pressing the send button. Likewise, the Stream Management component could check the AMP expiry date so as not to send messages that have already expired.

That is what I was proposing, in order to prevent the delivery of messages that have already expired. As for how to do it, well, I think you guys are far more skilled in exploiting Tigase features than I will ever be... I was thinking about passing the message to a component that checks the AMP rules and reacts to them in the correct way, but that is only a high-level idea.

This is not a solution for all the bad things that can happen when trying to deliver a message to a mobile device, but it precisely addresses the problem of sending messages that have already expired when using Stream Management.

What do you think about this strategy? Wouldn't it be an improvement to the reliability of the messaging solution?

All the best,

Luca

Added by Artur Hefczyc TigaseTeam almost 4 years ago

Andrzej, do you have any suggestions on how we could improve message handling by SM in such a use case?

Maybe something not directly related to AMP but more generic? For instance, we already have Packet priority metadata, which affects the way a packet is handled. I can imagine other metadata that could be added to Packets to affect processing and handling. Some time ago we were thinking of "max hops", similar to what exists in TCP/IP, which could help us prevent infinite processing loops. Another potentially useful feature could be a TTL (time to live), which could specify how long a packet can live inside Tigase before it is either delivered or dropped.

A TTL actually has quite a few applications. When the server is overloaded or is just slow at processing certain types of requests (for example user logins because of a slow DB), long queues build up, and at some point it no longer makes sense to process auth packets because the authentication timeout has already expired anyway. Similarly, for other packet types we can think of a max TTL to improve Tigase's data handling. And in this particular case of AMP and stream processing, a TTL could potentially be used to prevent packet delivery when the message has expired.
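
To make the TTL idea concrete, here is a small Python sketch. The Packet class below is hypothetical and is not Tigase's Packet class; the field names and the drain function are assumptions for illustration only.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Packet:
    """Hypothetical packet carrying TTL metadata (not the Tigase Packet class)."""
    payload: str
    created_at: float   # seconds, e.g. from time.time()
    ttl_seconds: float  # how long the packet may live inside the server

    def is_expired(self, now: float) -> bool:
        return now - self.created_at > self.ttl_seconds


def drain(queue: deque, now: float):
    """Empty the queue, separating packets still worth processing from expired ones."""
    delivered, dropped = [], []
    while queue:
        packet = queue.popleft()
        (dropped if packet.is_expired(now) else delivered).append(packet)
    return delivered, dropped
```

For example, an auth packet queued with a 30-second TTL and a message queued with a 600-second TTL, both drained 60 seconds later, would end up in dropped and delivered respectively. As discussed in this thread, a plain TTL only covers the "drop" case and does not express other AMP actions.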

Added by Andrzej Wójcik IoT 1 CloudTigaseTeam almost 4 years ago

As for the idea of adding a TTL, I would be against it: a TTL would only allow us to drop a message when it expires, and it would not help if the expire-at rule had alert set as its action. We still would not have support for that.

I now understand the requirements of this project, but from my point of view AMP and Stream Management work as expected, as I explained above.

If we change the way Stream Management currently handles AMP messages, we may introduce new issues. Below is a list of issues this may create (at least the ones I can think of right now):

We can have a case in which a message is processed by AMP (and by other processors, e.g. Message Carbons or the Message Archiving processor) and all of these processors assume that the XMPP stream was active, because C2S will not notify them about any issue while it may still be waiting for resumption. Even if we checked the AMP rules again on resumption, the message might already have been delivered to the Message Archive and archived as delivered.

Also, if the AMP rules specified alert when the message is delivered directly and drop when the message has expired, we would have another issue: the sender would get a notification confirming delivery while the message would be dropped on the second check of the AMP rules. This would create a mess.

We may also have a case in which a message with expire-at is sent to the client while the connection is valid and the AMP rule still permits delivery, but Stream Management does not receive an acknowledgement from the client. If we then process the AMP rules a second time, we may tell the sender that the message was not delivered (notify action on expire-at) while in fact the message may have been delivered but not acked. So we would still have an issue here.

Also, sending messages back to the AMP component for processing may change the order of some packets, i.e. new messages might be delivered to the client before the unacked messages from the resumed stream are sent.

From my point of view, we could possibly change the logic to resend messages to the AMP component for processing when they were not delivered and stream resumption succeeded. However, I would prefer this to happen only if some new parameter is set: in my experience stream/connection failure is usually discovered rather quickly and, as I pointed out above, I think the current way Stream Management handles AMP messages is OK.

Added by Luca Stucchi almost 4 years ago

Hi Artur, Andrzej,

Thanks for the time you spent figuring out a possible solution. I was pretty sure that there was more than one possible way to solve it, and I understand that some are better than others.

@Andrzej, talking about the proposal in your last paragraph:

From my point of view, we could possibly change the logic to resend messages to the AMP component for processing when they were not delivered and stream resumption succeeded. However, I would prefer this to happen only if some new parameter is set: in my experience stream/connection failure is usually discovered rather quickly and, as I pointed out above, I think the current way Stream Management handles AMP messages is OK.

I completely agree with the idea of a new parameter to activate this additional check, since this is not a feature everyone needs: it relates to a stricter-than-usual expiration policy. I think something like "AMP expiration check on stream resumption" would be clear enough. What do you think?

I really appreciate your help in this matter, and I am pretty sure that this feature will be VERY useful to any solution that focuses on communication reliability and time-sensitive delivery! That is one of the reasons we chose the XMPP protocol, AMP and Tigase in the first place, so I wouldn't be surprised if other companies followed the same path.

Please let me know if I can help you in any way, maybe with testing or validation!

All the best,

Luca

Added by Artur Hefczyc TigaseTeam almost 4 years ago

Andrzej Wójcik wrote:

As for the idea of adding a TTL, I would be against it: a TTL would only allow us to drop a message when it expires, and it would not help if the expire-at rule had alert set as its action. We still would not have support for that.

Andrzej,

My intention was not to replicate full AMP within Packet metadata. Instead, I only thought about TTL and only because the TTL parameter could be kind of useful in other cases mentioned above.

Added by Luca Stucchi over 3 years ago

Hi Artur and Andrzej ,

I just implemented some logic in my XMPP client to prevent problems in the case of delivery of a message after the AMP expire time.

But it's still a (nasty) patch, and I just wanted to ask whether you could confirm that the TTL development will, sooner or later, be integrated into Tigase.

Of course I am not in a position to ask you for an ETA, so I won't, but I just wanted to be sure that one shiny day we could count on it as an additional layer of time-sensitive delivery offered by Tigase. What do you think?

All the best,

Luca

Added by Artur Hefczyc TigaseTeam over 3 years ago

To be honest, I am not certain about an ETA yet. I am not sure what the best way to approach this problem is. We are still discussing what the best solution would be. TTL is just one of many similar issues; even AMP has more options, and there are other use cases besides AMP.
