Project

General

Profile

Debug for SessionManagerClustered.closeSession()

Hamid Alimohammadi
Added almost 4 years ago

Hi,

I have been trying to find the reason for some errors in tigase-console log in my tigase cluster installation in acs mode.

To prevent this page from many console error logs, I pasted two error logs.

Actually in my Tigase Monitor dashboard, there has been a red alarm for "Overflow:13800" and sometimes CPU usage is high in nodes. I though the following console log may help to find the reason.

...
2015-05-25 15:26:28.850 [session-close Queue Worker 3]  SessionManagerClustered.closeSession()  WARNING: This should not happen, check it out!,
java.lang.NullPointerException
        at tigase.xmpp.impl.LastActivity.stopped(LastActivity.java:209)
        at tigase.server.xmppsession.SessionManager.closeSession(SessionManager.java:1076)
        at tigase.cluster.SessionManagerClustered.closeSession(SessionManagerClustered.java:503)
        at tigase.server.xmppsession.SessionManager.closeConnection(SessionManager.java:1006)
        at tigase.server.xmppsession.SessionManager$SessionCloseProc.process(SessionManager.java:2483)
        at tigase.server.xmppsession.SessionManager$ProcessorWorkerThread.process(SessionManager.java:2434)
        at tigase.util.WorkerThread.run(WorkerThread.java:128)
2015-05-25 15:28:58.893 [session-close Queue Worker 3]  SessionManagerClustered.closeSession()  WARNING: This should not happen, check it out!,
java.lang.NullPointerException
        at tigase.xmpp.impl.LastActivity.stopped(LastActivity.java:209)
        at tigase.server.xmppsession.SessionManager.closeSession(SessionManager.java:1076)
        at tigase.cluster.SessionManagerClustered.closeSession(SessionManagerClustered.java:503)
        at tigase.server.xmppsession.SessionManager.closeConnection(SessionManager.java:1006)
        at tigase.server.xmppsession.SessionManager$SessionCloseProc.process(SessionManager.java:2483)
        at tigase.server.xmppsession.SessionManager$ProcessorWorkerThread.process(SessionManager.java:2434)
        at tigase.util.WorkerThread.run(WorkerThread.java:128)
...

The following is the init.properties file of one node in the cluster:

config-type = --gen-config-all
--vhost-anonymous-enabled = false
--virt-hosts=xmpp.test.com
--admins = admin@x3.test.com,admin@xmpp.test.com
--user-db = mysql
--user-db-uri=jdbc:mysql://10.10.10.75/tigasedb?user=tigase&password=tigase12
message-archive/archive-repo-uri=jdbc:mysql://10.10.10.75/tigasedb?user=tigase&password=tigase12
--message-archive/auto=true
--cluster-mode = true
--cluster-connect-all=true
--sm-cluster-strategy-class = tigase.server.cluster.strategy.OnlineUsersCachingStrategy
--debug=server,db,archive,muc
--monitoring=jmx:9050,http:9080,snmp:9060
--sm-plugins=+message-archive-xep-0136,+jabber:iq:auth,+urn:ietf:params:xml:ns:xmpp-sasl,+urn:ietf:params:xml:ns:xmpp-bind,+urn:ietf:params:xml:ns:xmpp-session,-jabber:iq:register,+jabber:iq:roster,+presence,+jabber:iq:privacy,+jabber:iq:version,+http://jabber.org/protocol/stats,+starttls,+amp,-msgoffline,+vcard-temp,+http://jabber.org/protocol/commands,+jabber:iq:private,+urn:xmpp:ping,+basic-filter,+domain-filter,-pep,-zlib,+jabber:iq:last
--amp-security-level=NONE
sess-man/plugins-conf/message-archive-xep-0136/required-store-method=body
sess-man/plugins-conf/message-archive-xep-0136/component-jid=message-archive@xmpp.test.com
c2s/processors[s]=urn:xmpp:sm:3
c2s/watchdog_delay[L]=60000
c2s/watchdog_timeout[L]=40000
c2s/watchdog_ping_type=xmpp
c2s/max-inactivity-time[L]=100
--comp-class-6=tigase.archive.MessageArchiveComponent
--comp-name-6=message-archive
--comp-name-5=http
--comp-class-5=tigase.http.HttpMessageReceiver
--api-keys=open_access
--comp-class-1 = tigase.muc.cluster.MUCComponentClustered
--comp-name-1 = muc
muc/default_room_config/muc#roomconfig_persistentroom=true
muc/default_room_config/muc#roomconfig_publicroom=false
muc/default_room_config/muc#roomconfig_membersonly=true
muc/default_room_config/muc#roomconfig_changesubject=true
muc/default_room_config/muc#maxhistoryfetch=500
muc/default_room_config/muc#roomconfig_roomname=Group
muc/muc-allow-chat-states[B]=true

Just to add, I initially used "tigase-server-7.0.1-b3810.jar" console installer to ease the database creation,but I have used and run "tigase-server-7.0.1-b3810-dist-max.tar.gz" version.

Thank you for any help.


Replies (6)

Added by Wojciech Kapcia TigaseTeam almost 4 years ago

Those are the same binaries, only installation method is different.

There may have been a bug in jabber:iq:last plugin which was fixed later on. You can try disabling the plugin in your configuration by prefixing it with @-@, ie.: @-jabber:iq:last@.

Added by Hamid Alimohammadi almost 4 years ago

Thank you very much for help.

After disabling -jabber:iq:last, the above mentioned errors disappeared from my acs-cluster console-log files.

but I noticed there are still occasional CPU high load in Tigase Monitor dashboard, though its occurrence has been lessened.

I thought JDK version might be the issue, so I installed open-JDK in one cluster node and Oracle-JDK in the other cluster node, but again I observed occasional CPU high load in both nodes (not simultaneously).

After some tries I found some java errors in console-log that seems is related to DB like "...at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2302)..."

I have attached DB-related extracted from console-log and a snapshot of Tigase Monitoring CPU Load.

I also disabled some more plugins:

--sm-plugins=+message-archive-xep-0136,+jabber:iq:auth,+urn:ietf:params:xml:ns:xmpp-sasl,+urn:ietf:params:xml:ns:xmpp-bind,+urn:ietf:params:xml:ns:xmpp-session,-jabber:iq:register,+jabber:iq:roster,+presence,+jabber:iq:privacy,-jabber:iq:version,-http://jabber.org/protocol/stats,+starttls,+amp,-msgoffline,+vcard-temp,+http://jabber.org/protocol/commands,+jabber:iq:private,+urn:xmpp:ping,-basic-filter,-domain-filter,-pep,-zlib,-jabber:iq:last

I wonder if you can take a look at attached files to find the reason for occasional CPU loads.

Thanks again for help

Avatar?id=6023&size=32x32

Added by Artur Hefczyc TigaseTeam almost 4 years ago

The log indicates problems with connectivity to the database. This may also cause CPU spikes when the server attempts to reconnect to the DB.

Added by Hamid Alimohammadi almost 4 years ago

Thank you for reply,

I am trying to find the connectivity problem from Tigase to Database. Actually ping tests between two servers are OK, but there must have been another type of problem.

Added by Hamid Alimohammadi almost 4 years ago

As you mentioned the problem is related to our database that causes the CPU loads and errors in console logs.

We found out from your very last sentence in this topic https://projects.tigase.org/boards/4/topics/4642?r=5001

"... you have somewhere, in roster or in offline messages a JID with {invalid} as domain, hence the Tigase attempts to send something to that domain."

After some queries we found many many invalid JIDs in offline messages and rosters in database (probably left from some tsung tests).

Actually I have to keep and fix the current database. I read the following guide but not sure how the invalid JIDs can be deleted:

http://docs.tigase.org/tigase-server/7.0.0/Development_Guide/webhelp/_tigase_db_schema_explained.html

The following is a sample of invalid names "remote-hostname=4989" in the console log.

2015-06-02 13:25:45.910 [ConnectionOpenThread]  ConnectionOpenThread.addAllWaiting()  WARNING: Error: creating connection for: {cert-required-domain=4989, cid=test.com@4989, ifc=[Ljava.lang.String;@44c8a45f, local-hostname=test.com, port-no=5269, remote-hostname=4989, remote-ip=0.0.19.125, s2s-connection-key=S2S: null, socket=plain, srv-type=_xmpp-server._tcp, type=connect}
java.net.SocketException: Invalid argument
        at sun.nio.ch.Net.connect0(Native Method)
        at sun.nio.ch.Net.connect(Net.java:465)
        at sun.nio.ch.Net.connect(Net.java:457)
        at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:670)
        at tigase.net.ConnectionOpenThread.addISA(ConnectionOpenThread.java:352)
        at tigase.net.ConnectionOpenThread.addPort(ConnectionOpenThread.java:373)
        at tigase.net.ConnectionOpenThread.addAllWaiting(ConnectionOpenThread.java:306)
        at tigase.net.ConnectionOpenThread.run(ConnectionOpenThread.java:235)
        at java.lang.Thread.run(Thread.java:745)

Thank you in advance for any hint.

Added by Wojciech Kapcia TigaseTeam almost 4 years ago

You can either remove directly remove roster entries from tig_paris of the users that contains invalid rosters, or remove such users completely by calling TigRemoveUser("jid") stored procedure.

    (1-6/6)