Project

General

Profile

Impossible happened :)

Daniele Ricci
Added about 4 years ago

Hello devs,

today I got stuck in some kind of packet loop (Tigase taking lot of CPU and printing out huge quantities of log entries despite the server has a few dozens of users at most, not always active/online).

My Tigase is a patched version which includes a custom c2s/sm command to pass the SSL client certificate object over from c2s to the sm, for later inspection by the authentication plugin [1][2]. By the way, is there a more efficient way of passing the client certificate to a CallbackHandler? Or maybe I should implement these auth check in some other way?

2015-01-14 12:52:09.258 [in_14-sess-man]   SessionManager.processPacket()     INFO:     Impossible happened, please report to developer packet: from=sess-man@localhost, to=sess-man@localhost, DATA=<iq type="set" from="c2s@localhost/198.211.127.209_5999_89.204.137.38_61552" id="c2s--c2s29549" to="sess-man@localhost"><command xmlns="http://jabber.org/protocol/commands" node="CLIENT_AUTH"><x xmlns="jabber:x:data" type="submit"><field var="session-id"><value>69e4f971-f3ea-4ea2-ad01-25d3b92f2063</value></field><field var="peer-certificate"><value>true</value></field><field type="text-multi" var="jids"/><field var="peer-certificate-data"><value>MIIJ4zCCCMugAwIBAgIBATANBgkqhkiG9w0BAQUFADB+MSAwHgYDVQQKDBdPcGVuUEdQIHRvIFguNTA5IEJyaWRnZTFaMFgGA1UEAwxRRGVtYWRkaXMgKE5PIENPTU1FTlQpIDw1ZmQ4MWFjMzgxMjE3NGMxOTVmMmRjOGEwZmNkZDg1MWJkMGIzMTZjQGJldGEua29udGFsay5uZXQ+MB4XDTE1MDExMzIzMDMxOFoXDTE1MDExMzIzMDMxOFowfjEgMB4GA1UECgwXT3BlblBHUCB0byBYLjUwOSBCcmlkZ2UxWjBYBgNVBAMMUURlbWFkZGlzIChOTyBDT01NRU5UKSA8NWZkODFhYzM4MTIxNzRjMTk1ZjJkYzhhMGZjZGQ4NTFiZDBiMzE2Y0BiZXRhLmtvbnRhbGsubmV0PjCCASIwDQYJKoZIhvcNAQEBBQADggEPADCCAQoCggEBANIHHae3hRh4F/WVCOw/gsTFWTHGRZsoKJN7fz4vogvLl2FiRj3rF+GvTgi9LXlTk2PJeniGCuZEZanVmsXiTqIRhQGctwBOC1WssXg79m ... , SIZE=3865, XMLNS=null, PRIORITY=SYSTEM, PERMISSION=LOCAL, TYPE=set, connection: XMPPResourceConnection=[user_jid=sess-man@localhost, packets=729967, connectioId=null, domain=localhost, authState=NOT_AUTHORIZED, isAnon=false, isTmp=false].
2015-01-14 12:52:09.258 [in_14-sess-man]   OfflineMessages.savePacketForOffLineUser()  FINEST: Packet for offline user not suitable for storing: from=sess-man@localhost, to=sess-man@localhost, DATA=<iq type="set" from="c2s@localhost/198.211.127.209_5999_89.204.137.38_61552" id="c2s--c2s29549" to="sess-man@localhost"><command xmlns="http://jabber.org/protocol/commands" node="CLIENT_AUTH"><x xmlns="jabber:x:data" type="submit"><field var="session-id"><value>69e4f971-f3ea-4ea2-ad01-25d3b92f2063</value></field><field var="peer-certificate"><value>true</value></field><field type="text-multi" var="jids"/><field var="peer-certificate-data"><value>MIIJ4zCCCMugAwIBAgIBATANBgkqhkiG9w0BAQUFADB+MSAwHgYDVQQKDBdPcGVuUEdQIHRvIFguNTA5IEJyaWRnZTFaMFgGA1UEAwxRRGVtYWRkaXMgKE5PIENPTU1FTlQpIDw1ZmQ4MWFjMzgxMjE3NGMxOTVmMmRjOGEwZmNkZDg1MWJkMGIzMTZjQGJldGEua29udGFsay5uZXQ+MB4XDTE1MDExMzIzMDMxOFoXDTE1MDExMzIzMDMxOFowfjEgMB4GA1UECgwXT3BlblBHUCB0byBYLjUwOSBCcmlkZ2UxWjBYBgNVBAMMUURlbWFkZGlzIChOTyBDT01NRU5UKSA8NWZkODFhYzM4MTIxNzRjMTk1ZjJkYzhhMGZjZGQ4NTFiZDBiMzE2Y0BiZXRhLmtvbnRhbGsubmV0PjCCASIwDQYJKoZIhvcNAQEBBQADggEPADCCAQoCggEBANIHHae3hRh4F/WVCOw/gsTFWTHGRZsoKJN7fz4vogvLl2FiRj3rF+GvTgi9LXlTk2PJeniGCuZEZanVmsXiTqIRhQGctwBOC1WssXg79m ... , SIZE=3865, XMLNS=null, PRIORITY=SYSTEM, PERMISSION=LOCAL, TYPE=set

If you need any more info (I'd like to selectively send you log parts - it's much data to send), just ask.

Thank you

--

Daniele

[1] https://github.com/kontalk/tigase-server/commit/14c2d168631f5c663f91137a0b3c802f9253cf9d

[2] https://github.com/kontalk/tigase-server/commit/30caf23a8a78046693f6908d0b4424d1e2a0e098


Replies (12)

Avatar?id=6023&size=32x32

Added by Artur Hefczyc TigaseTeam about 4 years ago

Yes, such a loop may happen code inside Tigase does not handle correctly special or exceptional cases, such as an error packet or broken connections etc... Please examine your custom code to ensure it handles all the cases correctly instead of bouncing packets back.

As for the certificate passing. I think last version of Tigase supports external authentication mechanism, which might be what you are looking for.

Bartosz, please comment on this.

Added by Daniele Ricci about 4 years ago

Thanks for your fast reply.

Artur Hefczyc wrote:

Yes, such a loop may happen code inside Tigase does not handle correctly special or exceptional cases, such as an error packet or broken connections etc... Please examine your custom code to ensure it handles all the cases correctly instead of bouncing packets back.

That's strange... I mean all I did was adding another field (containing the Base64-encoded certificate data) to the already existing command CLIENT_AUTH which already carries JIDs found in the certificate.

SessionManager then re-generates a Certificate object from that data and puts it into session. I didn't change any routing or added new stanzas.

Nevertheless, debugging through all that log data is very difficult, especially because after the fourth file older data is lost; logs grow so fast when this happens that I can't realize it in time to look for the relevant log entries before they are rotated out - can I change that by the way? Maybe some configuration file?

As for the certificate passing. I think last version of Tigase supports external authentication mechanism, which might be what you are looking for.

Bartosz, please comment on this.

If you refer to a new SASL mechanism, I'd like to use EXTERNAL and just provide a custom certificate check behaviour (that is, not limiting it to issuer and JID checks).

I thought of the certificate passing to SM because it's the only way (that is, the session) to reach my custom CallbackHandler.

Added by Wojciech Kapcia TigaseTeam about 4 years ago

Daniele Ricci wrote:

Nevertheless, debugging through all that log data is very difficult, especially because after the fourth file older data is lost; logs grow so fast when this happens that I can't realize it in time to look for the relevant log entries before they are rotated out - can I change that by the way? Maybe some configuration file?

You can configure both file size and count:

basic-conf/logging/java.util.logging.FileHandler.limit=100000000
basic-conf/logging/java.util.logging.FileHandler.count=20

Added by Daniele Ricci about 4 years ago

This issue is really driving me mad.

In the meantime, I've patched this one:

https://github.com/kontalk/tigase-server/commit/a48425d52c247f1a8453254767be3dc6655e053f

To see what happens. I assure you, it's very hard to debug this kind of issues.

Added by Wojciech Kapcia TigaseTeam about 4 years ago

A few questions:

  • does it happen every time?

  • is the issue gone if you remove the additional field from the command?

  • have you made any other changes related to SessionManager handling and/or XMPPResourceConnection (I've briefly went over your github repository history but haven't noticed any significant changes)?

Added by Daniele Ricci about 4 years ago

The patch I've applied 2 days ago doesn't work. And neither returning from inside the catch (it actually makes things even worse, but the problem is that I've not fully grasped all of Tigase internals yet)

Wojciech Kapcia wrote:

A few questions:

  • does it happen every time?

It happens after some time, but it has always led me to restart Tigase.

  • is the issue gone if you remove the additional field from the command?

Well, that might be a problem... the field is needed for authentication, so I can't do that on production where the users are "unstable" enough (e.g. I'm guessing this happens because of faulty mobile connections).

  • have you made any other changes related to SessionManager handling and/or XMPPResourceConnection (I've briefly went over your github repository history but haven't noticed any significant changes)?

No, nothing.

I'm writing some stress tests to run locally to see if I can reproduce the issue with "good" (see: local) connections. If I'll not be successful, I'll try something to simulate broken sockets: maybe I can achieve something with iptables rules, I don't know...

Added by Daniele Ricci about 4 years ago

I've just reproduced the issue by doing a couple of hundred connections and killing them randomly after 2-3 seconds (so they should stop some time after SASL).

Tigase starts printing the same logs over and over, never stopping.

Time to sleep now; tomorrow I will try to reproduce it again, but in a more "controlled" environment.

Added by Wojciech Kapcia TigaseTeam about 4 years ago

OK, I may have an idea what's going on. Please note, that this is a command packet and the impossible happened stem from the fact that the command packets are addressed completely to session manager, which doesn't have connectionId which triggers the issues. Now, before processing in tigase.server.xmppsession.SessionManager.processPacket(tigase.server.Packet, tigase.xmpp.XMPPResourceConnection) takes place, first packet is checked for being a command in tigase.server.xmppsession.SessionManager.processPacket(tigase.server.Packet)

        if (packet.isCommand() && processCommand(packet)) {
            packet.processedBy("SessionManager");
            return;
        }

Now, you've modified the tigase.server.xmppsession.SessionManager.processCommand(tigase.server.Packet):boolean and it seems that flag processing_result = true; is not set for some reason - I would watch logs for whether received packed was treated as a command and also maybe add more debug to the processing of @case CLIENT_AUTH@.

Added by Daniele Ricci about 4 years ago

I'm trying to reproduce it here at work, but I can't. I'll try again at home, probably CPU power and other things were more favorable...

Anyway, the only condition that might make processing_result resulting to false is that the connection object is null... what would happen in that case?

I can see that STREAM_MOVED has processing_result = true outside the null connection check. Why CLIENT_AUTH has it inside the if block?

Added by Daniele Ricci about 4 years ago

Just to let you know that this situation happens very often and that this fixed it:

https://github.com/kontalk/tigase-server/commit/237c5302ef77a64abdfaa1dc06b20a361ab0f779

(it's the same as the other commit, just rebased)

You should include this as a hotfix to 7.0.0, it's a severe bug.

Do you want me to open an issue?

Avatar?id=6023&size=32x32

Added by Artur Hefczyc TigaseTeam about 4 years ago

Yes, please open an issue, thank you very much.

    (1-12/12)