Project

General

Profile

TLSIO.writeBuff() Infinite loop detected and Queue overflow

Andrew Beni
Added about 1 year ago

Hi,
I was making some stress test, with a client offline I sent about 100k message, when I try to connect the client I receive (with a custom SW o psi):

[in_22-c2s] TLSIO.writeBuff() WARNING: Infinite loop detected in writeBuff(buff) TLS code, tlsWrapper.getStatus(): NEED_READ, buff.remaining(): 16 io: TLS: c2s@localhost/172.31.17.24_5222_1.6.1.2_55441 Socket[addr=/1.6.1.2,port=55441,localport=5222]

I read in the forum that that problem happen with a big chunk of data to be downloaded but how can resolve it? There is some config to change? What is the limit of the number of message that can be downloaded without this problem?

my params:
7.1.1-SNAPSHOT-b4433

java -version
java version "1.8.0_111"
Java(TM) SE Runtime Environment (build 1.8.0_111-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.111-b14, mixed mode)

statup options:
-Xms2G -Xmx24G -XX:PermSize=256m -XX:MaxPermSize=1G -XX:MaxDirectMemorySize=256m

I would like also to know what does exactly means "sess-man/IN Queue overflow" I have read

The number of times the in or out queue has overflown for this component. That is there are more packets queues than the max queue size. A total statistic is also available that combines both results.

but what does it means exactly, the overflow packets are dropped?

thanks


Replies (12)

Added by Andrew Beni about 1 year ago

I saw the hack in the code for the old versions of Java


        // The loop below falls into infinite loop for some reason.
        // Let's try to detect it here and recover.
        // -- After some tests....
        // Looks like the cause has been detected. Sometimes the loop
        // below is executed a few times for some reason. It happens that
        // the tlsWarpper.getStatus() returns NEED_READ and it doesn't
        // accept any more data from the input buffer.
        // The proper handling would need reading from the socket to
        // reset TLS to the correct state, but this involves another problems.
        // What to do with possible user data received in such a call?
        // It happens extremely rarely and is hard to diagnose. Let's leave it
        // as it is now which just causes such connections to be closed.
        int loop_cnt = 0;
        int max_loop_runs = 100000;

        do {
            if (tlsWrapper.getStatus() == TLSStatus.NEED_READ) {

                // I wonder if some real data can be read from the socket here (and we
                // would
                // loose the data) or this is just TLS stuff here.....
                ByteBuffer rbuff = read(ByteBuffer.allocate(tlsWrapper.getNetBuffSize()));
            }

            ByteBuffer tlsOutput = ByteBuffer.allocate(tlsWrapper.getNetBuffSize());

            // Not sure if this is really needed, I guess not...
            tlsOutput.clear();
            tlsWrapper.wrap(buff, tlsOutput);

            if (tlsWrapper.getStatus() == TLSStatus.CLOSED) {
                throw new EOFException("Socket has been closed.");
            } // end of if (tlsWrapper.getStatus() == TLSStatus.CLOSED)

            tlsOutput.flip();
            wr = io.write(tlsOutput);
            result += wr;

            if (log.isLoggable(Level.FINEST)) {
                log.log(Level.FINER, "TLS - Writing data, remaining: {0}, {1}",
                        new Object[]{buff.remaining(), toString()});
            }

        } while (buff.hasRemaining() && (++loop_cnt < max_loop_runs));

        if (loop_cnt > (max_loop_runs / 2)) {
            log.warning("Infinite loop detected in writeBuff(buff) TLS code, " + "tlsWrapper.getStatus(): " +
                                tlsWrapper.getStatus() + ", buff.remaining(): " + buff.remaining() + " io: " +
                                toString());

            // Let's close the connection now
            throw new EOFException("Socket has been closed due to TLS problems.");
        }

this version require java 1.8, maybe you could remove the hack!
I image that there is no flag and to fix it must rebuild it

regards

Added by Andrew Beni about 1 year ago

is possible to delete all the message for a specif user via some admin task?

thanks

Added by Andrew Beni about 1 year ago

Hi, I'm still logging a lot of TLSIO.writeBuff() WARNING: Infinite loop detected in writeBuff(buff) TLS code

this problem attects only TLS and not SSL, correct?

regards

Added by Andrew Beni 12 months ago

Hi, any kind of advise for this problem?
regards

Added by Daniel Wisnewski IoT 1 Cloud 12 months ago

Hi Andrew, can you share some more from your logs? You can share your tigase.log.0 file here, or upload some of the text to give us context, thanks.

Added by Wojciech Kapcia TigaseTeam 12 months ago

Hi Andrew,
unfortunately this part of code can't be simply removed. It could be reworked to better handle some border-cases but it's further down the line on our to-do list.

Andrew Beni wrote:

I would like also to know what does exactly means "sess-man/IN Queue overflow" I have read

The number of times the in or out queue has overflown for this component. That is there are more packets queues than the max queue size. A total statistic is also available that combines both results.

but what does it means exactly, the overflow packets are dropped?

Exactly that - you have a queue and if it wasn't possible to insert packet into queue then the packet is dropped (increasing the overflow metric).

Andrew Beni wrote:

is possible to delete all the message for a specif user via some admin task?

Currently there is no such admin ad-hoc.

For the future reference - could you create single topic for each item? Thanking you in advance!

Added by Andrew Beni 12 months ago

Daniel Wisnewski wrote:

Hi Andrew, can you share some more from your logs? You can share your tigase.log.0 file here, or upload some of the text to give us context, thanks.

Hi,
thanks for the response, I don't see any particular additional log in tigase.log.0 around the "Infinite loop detected", I see only a lot of:

SessionManager$StaleConnectionCloser.closeConnections() INFO: Trying to find and remove stale XMPPResourceConnections

but nothing more.

I saw in my config an old setting:
c2s/net-buffer-limit[I]=4194304

I have removed it

Anyway do you think the problem is fixed using SSL instead of TLS?

regards

Added by Andrew Beni 12 months ago

Wojciech Kapcia wrote:

Hi Andrew,
unfortunately this part of code can't be simply removed. It could be reworked to better handle some border-cases but it's further down the line on our to-do list.

Hi,
I understand, to give you an idea I have got it logged 378 times in a week running the server with few users.

Maybe the version 7.1.3 improve this problem? I'm still using the 7.1.1

regards

Added by Andrew Beni 12 months ago

I have upgraded to 7.1.3 but the problem is still present

Added by Wojciech Kapcia TigaseTeam 12 months ago

Andrew Beni wrote:

I saw in my config an old setting:
c2s/net-buffer-limit[I]=4194304
I have removed it

Correctly - I would discourage from manually tweaking more advanced settings if it's not completely required

Anyway do you think the problem is fixed using SSL instead of TLS?

Pure SSL is considered legacy and not recommended. It won't sole the issue with the entries.

Andrew Beni wrote:

I understand, to give you an idea I have got it logged 378 times in a week running the server with few users.

What are the client software that you are using to connect to the server?

Maybe the version 7.1.3 improve this problem? I'm still using the 7.1.1

Mostly unlikely.

Added by Andrew Beni 12 months ago

Wojciech Kapcia wrote:

Andrew Beni wrote:

I saw in my config an old setting:
c2s/net-buffer-limit[I]=4194304
I have removed it

Correctly - I would discourage from manually tweaking more advanced settings if it's not completely required

yes, it was an old bug
http://www.tigase.net/blog-entry/uncontrolled-resource-consumption-highly-compressed-xmpp-messages
anyway I removed it

Anyway do you think the problem is fixed using SSL instead of TLS?

Pure SSL is considered legacy and not recommended. It won't sole the issue with the entries.

ok, good to know

Andrew Beni wrote:

I understand, to give you an idea I have got it logged 378 times in a week running the server with few users.

What are the client software that you are using to connect to the server?

XMPPFramework and Smack

not the latest versions, do you think it could be a client issue?

Maybe the version 7.1.3 improve this problem? I'm still using the 7.1.1

Mostly unlikely.

yes, same problem also with 7.1.3

Added by Andrzej Wójcik IoT 1 CloudTigaseTeam 12 months ago

I wouldn't say that this is a client issue, but it may be caused by a lot of data being sent to the client while client connection is poor/slow or client is unable to read data from the socket in a timely manner. It may be also caused by a lot of packets being sent to the single client or a small number of packets which are huge.

However, I've also seen those entries in some cases when the old version of OpenSSL was used by the client.

You mentioned that you were making a stress test. Was it done on the same installation? If so, are you sure that all old messages from the stress test were delivered so they are not generating this traffic?

    (1-12/12)