Project

General

Profile

Cluster Configuration Not Working

Balakumar M
Added almost 3 years ago

Our current cluster configuration works perfectly. We are trying to setup tigase in a new environment (in aws), tigase works but cluster nodes are not recognised. cluster_nodes are empty, PSI service discovery doesn't return any cluster nodes. I have attached our configuration and log below. Please advice. We use tigase 7.0.0.

init.properties :

config-type = --gen-config-def
--virt-hosts = im.domain
--user-db-uri = xxxxxxxxxxxxxxxx
--user-db = mysql
--admins = admin@im.domain

--cluster-mode = true
--cluster-nodes= ip-10-0-1-16.ap-southeast-1.compute.internal,ip-10-0-2-17.ap-southeast-1.compute.internal    #(Note : When I remove this line, auto discovery also not working)
--max-queue-size= 500000

--sm-plugins=+urn:xmpp:sm:3,-presence,+message-archive-xep-0136,+jabber:iq:auth,+urn:ietf:params:xml:ns:xmpp-sasl,+urn:ietf:params:xml:ns:xmpp-bind,+urn:ietf:params:xml:ns:xmpp-session,+jabber:iq:register,-jabber:iq:roster,+roster-presence,+jabber:iq:privacy,+jabber:iq:version,+http://jabber.org/protocol/stats,+starttls,+vcard-temp,+http://jabber.org/protocol/commands,+jabber:iq:private,-basic-filter,-domain-filter,-pep,-zlib,+jabber:iq:last

--debug=server,cluster

--comp-name-6 = http
--comp-name-5 = push
--comp-name-4 = pubsub
--comp-name-3 = message-archive
--comp-name-2 = proxy
--comp-name-1 = ext

--comp-class-1 = tigase.server.ext.ComponentProtocol
--external=muc.im.domain:tigase:listen:5270
--comp-class-3 = tigase.archive.MessageArchiveComponent
--comp-class-2 = tigase.socks5.Socks5ProxyComponent
--comp-class-4 = tigase.pubsub.PubSubComponent
--comp-class-5 = com.ncs.onechat.PushComponent
--comp-class-6 = tigase.http.HttpMessageReceiver

LOG (10.0.1.16) :

2016-02-04 17:25:32.560 [scheduler_pool-5-thread-1-cl-comp]  ConnectionManager$1.run()  FINE: Reconnecting service for component: cl-comp, to remote host: ip-10-0-2-17.ap-southeast-1.compute.internal on port: 5277
2016-02-04 17:25:32.560 [scheduler_pool-5-thread-1-cl-comp]  ConnectionManager$1.run()  FINE: Reconnecting service for component: cl-comp, to remote host: ip-10-0-2-17.ap-southeast-1.compute.internal on port: 5277
2016-02-04 17:25:32.562 [ConnectionOpenThread]  ConnectionManager$ConnectionListenerImpl.accept()  FINEST: Accept called for service: null@null, port_props: {socket=plain, port-no=5277, service-connected-task-future=tigase.cluster.ClusterConnectionManager$ServiceConnectedTimerTask@2f9bc4ca, remote-host=ip-10-0-2-17.ap-southeast-1.compute.internal, routing-table=[Ljava.lang.String;@1b94c231, sessionID=7a079e53-d2af-4435-80e2-39ba0894b917, local-host=ip-10-0-1-16.ap-southeast-1.compute.internal, secret=5afd0245d0884704b987d31d6c324feec661b6bd7972fb6ee7283348be305802, hostname-key=ip-10-0-2-17.ap-southeast-1.compute.internal, ifc=[Ljava.lang.String;@28688f7b, type=connect, max-reconnects=99999999}
2016-02-04 17:25:32.604 [ConnectionOpenThread]  ConnectionManager.serviceStarted()  FINER: [[cl-comp]] Connection started: null, type: connect, Socket: nullSocket[addr=ip-10-0-2-17.ap-southeast-1.compute.internal/10.0.2.17,port=5277,localport=33149], jid: null
2016-02-04 17:25:32.604 [ConnectionOpenThread]  ClusterConnectionManager.serviceStarted()  INFO: cluster connection opened: 10.0.2.17, type: connect, id=10.0.1.16_33149_10.0.2.17_5277
2016-02-04 17:25:32.604 [ConnectionOpenThread]  ClusterConnectionManager.serviceStarted()  INFO: cid: null, sending: <stream:stream xmlns='tigase:cluster' xmlns:stream='http://etherx.jabber.org/streams' from='ip-10-0-1-16.ap-southeast-1.compute.internal' to='ip-10-0-2-17.ap-southeast-1.compute.internal'>
2016-02-04 17:25:32.604 [ConnectionOpenThread]  ConnectionManager$ConnectionListenerImpl.accept()  FINEST: Accept called for service: null@null, port_props: {socket=plain, port-no=5277, service-connected-task-future=tigase.cluster.ClusterConnectionManager$ServiceConnectedTimerTask@13a0abcf, remote-host=ip-10-0-2-17.ap-southeast-1.compute.internal, routing-table=[Ljava.lang.String;@196fad1a, sessionID=0e30a0bb-000e-4c51-b5b2-282e968bdf1e, local-host=ip-10-0-1-16.ap-southeast-1.compute.internal, secret=5afd0245d0884704b987d31d6c324feec661b6bd7972fb6ee7283348be305802, hostname-key=ip-10-0-2-17.ap-southeast-1.compute.internal, ifc=[Ljava.lang.String;@454d6695, type=connect, max-reconnects=99999999}
2016-02-04 17:25:32.605 [ConnectionOpenThread]  ClConSQLRepository.reload()   FINEST:   Last reload performed in 43, skipping: 
2016-02-04 17:25:32.605 [ConnectionOpenThread]  ConnectionManager.serviceStarted()  FINER: [[cl-comp]] Connection started: null, type: connect, Socket: nullSocket[addr=ip-10-0-2-17.ap-southeast-1.compute.internal/10.0.2.17,port=5277,localport=33150], jid: null
2016-02-04 17:25:32.605 [ConnectionOpenThread]  ClusterConnectionManager.serviceStarted()  INFO: cluster connection opened: 10.0.2.17, type: connect, id=10.0.1.16_33150_10.0.2.17_5277
2016-02-04 17:25:32.605 [ConnectionOpenThread]  ClusterConnectionManager.serviceStarted()  INFO: cid: null, sending: <stream:stream xmlns='tigase:cluster' xmlns:stream='http://etherx.jabber.org/streams' from='ip-10-0-1-16.ap-southeast-1.compute.internal' to='ip-10-0-2-17.ap-southeast-1.compute.internal'>
2016-02-04 17:25:32.605 [ConnectionOpenThread]  ConnectionManager$ConnectionListenerImpl.accept()  FINEST: Accept called for service: null@null, port_props: {type=accept, socket=plain, ifc=[Ljava.lang.String;@e8fb7a8, remote-host=localhost, required=false, port-no=5277}
2016-02-04 17:25:32.605 [ConnectionOpenThread]  ClConSQLRepository.reload()   FINEST:   Last reload performed in 43, skipping: 
2016-02-04 17:25:32.605 [ConnectionOpenThread]  ConnectionManager.serviceStarted()  FINER: [[cl-comp]] Connection started: null, type: accept, Socket: nullSocket[addr=/10.0.2.17,port=49058,localport=5277], jid: null
2016-02-04 17:25:32.605 [ConnectionOpenThread]  ClusterConnectionManager.serviceStarted()  INFO: cluster connection opened: 10.0.2.17, type: accept, id=10.0.1.16_5277_10.0.2.17_49058
2016-02-04 17:25:32.605 [ConnectionOpenThread]  ConnectionManager$ConnectionListenerImpl.accept()  FINEST: Accept called for service: null@null, port_props: {type=accept, socket=plain, ifc=[Ljava.lang.String;@e8fb7a8, remote-host=localhost, required=false, port-no=5277}
2016-02-04 17:25:32.605 [ConnectionOpenThread]  ClConSQLRepository.reload()   FINEST:   Last reload performed in 43, skipping: 
2016-02-04 17:25:32.606 [ConnectionOpenThread]  ConnectionManager.serviceStarted()  FINER: [[cl-comp]] Connection started: null, type: accept, Socket: nullSocket[addr=/10.0.2.17,port=49059,localport=5277], jid: null
2016-02-04 17:25:32.606 [ConnectionOpenThread]  ClusterConnectionManager.serviceStarted()  INFO: cluster connection opened: 10.0.2.17, type: accept, id=10.0.1.16_5277_10.0.2.17_49059
2016-02-04 17:25:32.625 [pool-18-thread-9]  ClusterConnectionManager.xmppStreamOpened()  INFO: Stream opened: {to=ip-10-0-1-16.ap-southeast-1.compute.internal, xmlns:stream=http://etherx.jabber.org/streams, from=ip-10-0-2-17.ap-southeast-1.compute.internal, xmlns=tigase:cluster}, service: null, type: accept, Socket: nullSocket[addr=/10.0.2.17,port=49058,localport=5277], jid: null
2016-02-04 17:25:32.625 [pool-18-thread-9]  ClusterConnectionManager.updateConnectionDetails()  FINEST: ClusterRepoItem: null, port_props: {port-no=5277, socket=plain, service-connected-task-future=tigase.cluster.ClusterConnectionManager$ServiceConnectedTimerTask@42a1b124, remote-host=ip-10-0-2-17.ap-southeast-1.compute.internal, routing-table=[Ljava.lang.String;@574d4a9, sessionID=91a674f9-863f-41bd-8d4b-f9b42870c94d, hostname-key=ip-10-0-2-17.ap-southeast-1.compute.internal, ifc=[Ljava.lang.String;@e8fb7a8, type=accept, required=false, max-reconnects=0}
2016-02-04 17:25:32.626 [pool-18-thread-2]  ClusterConnectionManager.xmppStreamOpened()  INFO: Stream opened: {to=ip-10-0-1-16.ap-southeast-1.compute.internal, xmlns:stream=http://etherx.jabber.org/streams, from=ip-10-0-2-17.ap-southeast-1.compute.internal, xmlns=tigase:cluster}, service: null, type: accept, Socket: nullSocket[addr=/10.0.2.17,port=49059,localport=5277], jid: null
2016-02-04 17:25:32.626 [pool-18-thread-2]  ClusterConnectionManager.updateConnectionDetails()  FINEST: ClusterRepoItem: null, port_props: {port-no=5277, socket=plain, service-connected-task-future=tigase.cluster.ClusterConnectionManager$ServiceConnectedTimerTask@71283984, remote-host=ip-10-0-2-17.ap-southeast-1.compute.internal, routing-table=[Ljava.lang.String;@7d84326, sessionID=a56a56c8-5314-4acf-8625-5693c7349897, hostname-key=ip-10-0-2-17.ap-southeast-1.compute.internal, ifc=[Ljava.lang.String;@e8fb7a8, type=accept, required=false, max-reconnects=0}
2016-02-04 17:25:32.627 [pool-18-thread-6]  ConnectionManager.serviceStopped()  FINER:  [[cl-comp]] Connection stopped: null, type: accept, Socket: nullSocket[unconnected], jid: null
2016-02-04 17:25:32.627 [pool-18-thread-6]  ClusterConnectionManager.serviceStopped()  FINEST: serviceStopped: result=true / size=0 / connPool={ip-10-0-2-17.ap-southeast-1.compute.internal=[]} / serv=null, type: accept, Socket: nullSocket[unconnected], jid: null / conns=[] / type=accept
2016-02-04 17:25:32.627 [pool-18-thread-4]  ClusterConnectionManager.xmppStreamOpened()  INFO: Stream opened: {id=1607f9af-4556-4cd9-8895-fd4f236c8357, to=ip-10-0-1-16.ap-southeast-1.compute.internal, xmlns:stream=http://etherx.jabber.org/streams, from=ip-10-0-2-17.ap-southeast-1.compute.internal, xmlns=tigase:cluster}, service: null, type: connect, Socket: nullSocket[addr=ip-10-0-2-17.ap-southeast-1.compute.internal/10.0.2.17,port=5277,localport=33149], jid: null
2016-02-04 17:25:32.627 [pool-18-thread-4]  ConnectionManager.serviceStopped()  FINER:  [[cl-comp]] Connection stopped: null, type: connect, Socket: nullSocket[unconnected], jid: null
2016-02-04 17:25:32.627 [pool-18-thread-4]  ClusterConnectionManager.serviceStopped()  FINEST: serviceStopped: result=true / size=0 / connPool={ip-10-0-2-17.ap-southeast-1.compute.internal=[]} / serv=null, type: connect, Socket: nullSocket[unconnected], jid: null / conns=[] / type=connect
2016-02-04 17:25:32.627 [pool-18-thread-4]  ConnectionManager.reconnectService()  FINER: Reconnecting service for: cl-comp, scheduling next try in 5secs, cid: null@null, props: {port-no=5277, socket=plain, service-connected-task-future=tigase.cluster.ClusterConnectionManager$ServiceConnectedTimerTask@6499fc57, remote-host=ip-10-0-2-17.ap-southeast-1.compute.internal, routing-table=[Ljava.lang.String;@3985e75a, sessionID=1607f9af-4556-4cd9-8895-fd4f236c8357, local-host=ip-10-0-1-16.ap-southeast-1.compute.internal, secret=5afd0245d0884704b987d31d6c324feec661b6bd7972fb6ee7283348be305802, hostname-key=ip-10-0-2-17.ap-southeast-1.compute.internal, ifc=[Ljava.lang.String;@28688f7b, type=connect, max-reconnects=99999999}
2016-02-04 17:25:32.628 [pool-18-thread-5]  ClusterConnectionManager.xmppStreamOpened()  INFO: Stream opened: {id=4c72de96-b70e-4691-bc38-da4e097dd64f, to=ip-10-0-1-16.ap-southeast-1.compute.internal, xmlns:stream=http://etherx.jabber.org/streams, from=ip-10-0-2-17.ap-southeast-1.compute.internal, xmlns=tigase:cluster}, service: null, type: connect, Socket: nullSocket[addr=ip-10-0-2-17.ap-southeast-1.compute.internal/10.0.2.17,port=5277,localport=33150], jid: null
2016-02-04 17:25:32.628 [pool-18-thread-7]  ConnectionManager.serviceStopped()  FINER:  [[cl-comp]] Connection stopped: null, type: accept, Socket: nullSocket[unconnected], jid: null
2016-02-04 17:25:32.628 [pool-18-thread-7]  ClusterConnectionManager.serviceStopped()  FINEST: serviceStopped: result=true / size=0 / connPool={ip-10-0-2-17.ap-southeast-1.compute.internal=[]} / serv=null, type: accept, Socket: nullSocket[unconnected], jid: null / conns=[] / type=accept
2016-02-04 17:25:32.628 [pool-18-thread-5]  ConnectionManager.serviceStopped()  FINER:  [[cl-comp]] Connection stopped: null, type: connect, Socket: nullSocket[unconnected], jid: null
2016-02-04 17:25:32.628 [pool-18-thread-5]  ClusterConnectionManager.serviceStopped()  FINEST: serviceStopped: result=true / size=0 / connPool={ip-10-0-2-17.ap-southeast-1.compute.internal=[]} / serv=null, type: connect, Socket: nullSocket[unconnected], jid: null / conns=[] / type=connect
2016-02-04 17:25:32.628 [pool-18-thread-5]  ConnectionManager.reconnectService()  FINER: Reconnecting service for: cl-comp, scheduling next try in 5secs, cid: null@null, props: {port-no=5277, socket=plain, service-connected-task-future=tigase.cluster.ClusterConnectionManager$ServiceConnectedTimerTask@617e337, remote-host=ip-10-0-2-17.ap-southeast-1.compute.internal, routing-table=[Ljava.lang.String;@a463301, sessionID=4c72de96-b70e-4691-bc38-da4e097dd64f, local-host=ip-10-0-1-16.ap-southeast-1.compute.internal, secret=5afd0245d0884704b987d31d6c324feec661b6bd7972fb6ee7283348be305802, hostname-key=ip-10-0-2-17.ap-southeast-1.compute.internal, ifc=[Ljava.lang.String;@454d6695, type=connect, max-reconnects=99999999}

Replies (6)

Added by Balakumar M almost 3 years ago

Additional Info : connection and hostname seems don't have any issue.

Debug Info :

[ec2-user@ip-10-0-1-16 ~]$ hostname
ip-10-0-1-16.ap-southeast-1.compute.internal
[ec2-user@ip-10-0-1-16 ~]$ telnet ip-10-0-2-17.ap-southeast-1.compute.internal  5277
Trying 10.0.2.17...
Connected to ip-10-0-2-17.ap-southeast-1.compute.internal.
Escape character is '^]'.
^CConnection closed by foreign host.


[ec2-user@ip-10-0-2-17 ~]$ hostname
ip-10-0-2-17.ap-southeast-1.compute.internal
[ec2-user@ip-10-0-2-17 ~]$ telnet ip-10-0-1-16.ap-southeast-1.compute.internal 5277
Trying 10.0.1.16...
Connected to ip-10-0-1-16.ap-southeast-1.compute.internal.
Escape character is '^]'.
^CConnection closed by foreign host.



[ec2-user@ip-10-0-1-16 ~]$ cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

10.0.1.16 ip-10-0-1-16.ap-southeast-1.compute.internal
10.0.2.17 ip-10-0-2-17.ap-southeast-1.compute.internal



[ec2-user@ip-10-0-2-17 ~]$ cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

10.0.1.16 ip-10-0-1-16.ap-southeast-1.compute.internal
10.0.2.17 ip-10-0-2-17.ap-southeast-1.compute.internal

Added by Wojciech Kapcia TigaseTeam almost 3 years ago

Are the times on both machines synchronised and in the same timezone? Can you check all the logs for the informations about reloading cluster nodes configuration (ideally with removed --cluster-nodes= entry). Does the entries in cluster_nodes show up when only single node is running?

Added by Balakumar M almost 3 years ago

Yes both time and timezone are same. When single node running, it says connection refused with other node. I setup a default tigase and tried it again.

With default config (without @--cluster-nodes=@), there is no sign of other node in the log. I have following warning for same node,

Config

--comp-class-1 = tigase.muc.MUCComponent
--virt-hosts = im.domain
--user-db-uri = xxxxxxxxxxxxxxxxx
--user-db = mysql
--admins = admin@im.domain
--comp-name-4 = message-archive
--comp-name-3 = proxy
config-type = --gen-config-def
--comp-name-2 = pubsub
--comp-name-1 = muc
--cluster-mode = true
--sm-plugins = +message-archive-xep-0136
--debug = server
--comp-class-4 = tigase.archive.MessageArchiveComponent
--comp-class-3 = tigase.socks5.Socks5ProxyComponent
--comp-class-2 = tigase.pubsub.PubSubComponent

LOG

2016-02-04 19:43:00.557 [main]             ClusterConnectionManager.itemAdded()  WARNING: Incorrect ClusterRepoItem, skipping connection attempt: ip-10-0-1-16.ap-southeast-1.compute.internal:07f1d104c8bc9b1a892cf2dd1d52e5a4fdddb4d76a97a02d80b0aa074520c78c:5277:0:0.0:0.0

[ec2-user@ip-10-0-1-16 ~]$ hostname
ip-10-0-1-16.ap-southeast-1.compute.internal

When i add --cluster-nodes= in the config, getting NullPointerException as below

Config

--cluster-nodes= ip-10-0-1-16.ap-southeast-1.compute.internal,ip-10-0-2-17.ap-southeast-1.compute.internal
(or)
--cluster-nodes=ip-10-0-1-16.ap-southeast-1.compute.internal:00e649c8de086c1d4c381b90e7eb290e5b34a2b09c2bbe0560592b953c1f787d:5277,ip-10-0-2-17.ap-southeast-1.compute.internal:256aa5f455e08a7c2a030dd0fc5ad2e731d93e2655062b58455ea6491410f1a1:5277

LOG :

2016-02-04 19:38:21.779 [pool-15-thread-9]  ClusterConnectionManager.xmppStreamOpened()  INFO: Stream opened: {id=ed631115-96c1-48de-a700-52635eff650b, to=ip-10-0-1-16.ap-southeast-1.compute.internal, xmlns:stream=http://etherx.jabber.org/streams, from=ip-10-0-2-17.ap-southeast-1.compute.internal, xmlns=tigase:cluster}, service: null, type: connect, Socket: nullSocket[addr=ip-10-0-2-17.ap-southeast-1.compute.internal/10.0.2.17,port=5277,localport=46064], jid: null
2016-02-04 19:38:21.780 [pool-15-thread-9]  XMPPIOService.processSocketData()  INFO:    null, type: connect, Socket: nullSocket[addr=ip-10-0-2-17.ap-southeast-1.compute.internal/10.0.2.17,port=5277,localport=46064], jid: null, Incorrect XML data: <stream:stream xmlns='tigase:cluster' xmlns:stream='http://etherx.jabber.org/streams' from='ip-10-0-2-17.ap-southeast-1.compute.internal' to='ip-10-0-1-16.ap-southeast-1.compute.internal' id='ed631115-96c1-48de-a700-52635eff650b'>, stopping connection: null, exception: 
java.lang.NullPointerException
    at tigase.cluster.ClusterConnectionManager.xmppStreamOpened(ClusterConnectionManager.java:573)
    at tigase.xmpp.XMPPIOService.xmppStreamOpened(XMPPIOService.java:801)
    at tigase.xmpp.XMPPDomBuilderHandler.startElement(XMPPDomBuilderHandler.java:280)
    at tigase.xml.SimpleParser.parse(SimpleParser.java:314)
    at tigase.xmpp.XMPPIOService.processSocketData(XMPPIOService.java:686)
    at tigase.net.IOService.call(IOService.java:252)
    at tigase.net.IOService.call(IOService.java:94)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
2016-02-04 19:38:21.780 [pool-15-thread-9]  ConnectionManager.serviceStopped()  FINER:  [[cl-comp]] Connection stopped: null, type: connect, Socket: nullSocket[unconnected], jid: null

Added by Wojciech Kapcia TigaseTeam almost 3 years ago

After WARNING: Incorrect ClusterRepoItem, skipping connection attempt there should be exception - in this case it's either UnknownHostException or SocketException which would indicate problems with the network.

As I asked earlier: Does the entries in cluster_nodes show up when only single node is running? That is, configure single node with cluster auto-discovery and then check if the cluster_nodes table is correctly propagated.

To get more information I would recommend extending debugging with cluster package, i.e.:

--debug = server,cluster

Added by Balakumar M almost 3 years ago

Thanks Wojciech Kapcia. I managed to find out the issue. My new setup uses MySQL in AWS RDS where parameter timezone is default which is UTC. But my app servers are configured to use local timezone.

When I configure timezone to local timezone in RDS, cluster works as expected.

Added by Wojciech Kapcia TigaseTeam almost 3 years ago

Just FYI latest nightlies fixed that problem and timezone differences won't break cluster auto-discovery.

    (1-6/6)