crash in cluster mode

firstime firstime
Added almost 5 years ago

Hello, everyone:

I've got a problem. When I enable cluster mode, Tigase crashes, and the error log is as follows:
2014-06-09 11:39:55.271 [pool-13-thread-1]  XMPPIOService.processSocketData()  INFO:    null, type: connect, Socket: nullSocket[addr=left/192.168.2.232,port=5277,localport=22752], jid: null, Incorrect XML data: <stream:stream xmlns='tigase:cluster' xmlns:stream='http://etherx.jabber.org/streams' from='left' to='myhost' id='5d8e4851-6cdf-4c47-8fcf-e2036d5d2e65'>, stopping connection: null, exception: 
java.lang.NullPointerException
        at tigase.cluster.ClusterConnectionManager.xmppStreamOpened(ClusterConnectionManager.java:684)
        at tigase.xmpp.XMPPIOService.xmppStreamOpened(XMPPIOService.java:775)
        at tigase.xmpp.XMPPDomBuilderHandler.startElement(XMPPDomBuilderHandler.java:316)
        at tigase.xml.SimpleParser.parse(SimpleParser.java:314)
        at tigase.xmpp.XMPPIOService.processSocketData(XMPPIOService.java:655)
        at tigase.net.IOService.call(IOService.java:262)
        at tigase.net.IOService.call(IOService.java:1)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)
I checked the source code; line 684 is:
        String          secret = item.getPassword();
The crash is due to item being null. The item comes from line 683:
        ClusterRepoItem item   = repo.getItem(getDefHostName().getDomain());
The repo is first initialized in _getDefaults()_, and at that point _getItem()_ works. But the repo is initialized a second time in _setProperties()_, and after setProperties() _getItem()_ returns null. What can I do? Thanks a lot.

Replies (11)

Added by firstime firstime almost 5 years ago

My version is 5.2.0

Added by Wojciech Kapcia TigaseTeam almost 5 years ago

Please share your etc/init.properties configuration. Have you changed any configuration regarding the repository?

Added by firstime firstime almost 5 years ago

config-type=--gen-config-def
--admins=211567@wewe.66call.com
--virt-hosts = wewe.66call.com
--debug=server
--cluster-connect-all = true
--cluster-mode = true
--cluster-nodes= left
--cl-conn-repo-class      = tigase.cluster.wewe.ClConSQLRepository

--auth-db                 = tigase.db.wewe.WeWeAuthRepository
--auth-db-uri             = wewe
--user-db                 = tigase.db.wewe.WeWeUserRepository
--user-db-uri             = oracle
--com.mchange.v2.c3p0.cfg.xml = etc/c3p0-config.xml
--data-repo               = tigase.db.wewe.WeWeDataRepository
--data-repo-pool-size     = 1

--sm-plugins = -jabber:iq:roster,-jabber:iq:private,-jabber:iq:privacy,-presence,-zlib,-amp,-message-carbons,+weweMsg,+weweAVCall,+jabber:iq:contactstats2

tigase.cluster.wewe.ClConSQLRepository is a copy of tigase.cluster.repo.ClConConfigRepository, and I removed storeItem() and reload() because I don't need to access any database.

Added by Wojciech Kapcia TigaseTeam almost 5 years ago

firstime firstime wrote:

tigase.cluster.wewe.ClConSQLRepository is a copy of tigase.cluster.repo.ClConConfigRepository, and I removed storeItem() and reload() because I don't need to access any database.

OK, but you haven't specified any cluster node, and cluster auto-discovery uses the database to discover nodes, so it won't work in such a setup. You can try specifying the list of nodes using --cluster-nodes with the password included.

The repo from getDefaults uses the default implementation, hence no error.

Added by firstime firstime almost 5 years ago

No, I have specified a cluster node; you missed it:

--cluster-nodes= left

Added by Wojciech Kapcia TigaseTeam almost 5 years ago

OK, my bad, I missed it. But please include the password in your --cluster-nodes configuration and try again.

Added by firstime firstime almost 5 years ago

Which password? The root password for logging in to the machine? Also, why was no password needed in older versions, e.g. 5.1.0?

Added by firstime firstime almost 5 years ago

I modified storeItem() and reload() in the class tigase.cluster.wewe.ClConSQLRepository, which is a copy of tigase.cluster.repo.ClConConfigRepository, as follows:

    @Override
    public void storeItem(ClusterRepoItem item) {
        items.add(item);
    }

    @Override
    public void reload() {
        super.reload();

        Iterator<ClusterRepoItem> iter  = items.iterator();
        while(iter.hasNext()){
            itemLoaded(iter.next());
        }
    }

The variable items is declared as:

private static final LinkedList<ClusterRepoItem> items  = new LinkedList<ClusterRepoItem>();

It no longer crashes, but another error is reported, as follows, and the cluster doesn't work:

ClusterConnectionManager.processHandshake()  WARNING: Handshaking password doesn't match, disconnecting: null, type: accept, Socket: nullSocket[addr=/192.168.2.232,port=63005,localport=5277], jid: null

Added by Wojciech Kapcia TigaseTeam almost 5 years ago

firstime firstime wrote:

Which password? The root password for logging in to the machine? Also, why was no password needed in older versions, e.g. 5.1.0?

Again, please read --cluster-nodes documentation.

In previous versions, all cluster nodes used the same password. With 5.2.0, if you don't specify it, each node generates its own password and tries to use it for handshaking while establishing the cluster connection. Because you disabled storing this information in the common database, the other nodes have no way of knowing the password, so the connection fails.

You can use:

--cluster-nodes=left:pass,right:pass
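
To make the hostname:password format above concrete, here is a minimal sketch of splitting such a value into hostname and password pairs. The class `ClusterNodesSketch` and its `parse` method are hypothetical names made up for illustration; this is not Tigase's own parser, and it ignores any fields after the password.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, NOT part of Tigase: splits a --cluster-nodes value
// such as "left:pass,right:pass" into hostname -> password pairs.
class ClusterNodesSketch {

    static Map<String, String> parse(String value) {
        Map<String, String> nodes = new LinkedHashMap<>();
        for (String entry : value.split(",")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate stray commas
            }
            // First field is the hostname, second (if present) the password.
            String[] parts = trimmed.split(":");
            String password = parts.length > 1 ? parts[1] : null;
            nodes.put(parts[0], password);
        }
        return nodes;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> e : parse("left:pass,right:pass").entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
        // An entry without a password, as in "--cluster-nodes= left",
        // yields no shared secret for that node in this sketch.
        System.out.println(parse(" left").get("left"));
    }
}
```

With a value like `left` alone, the sketch has no password to return, which mirrors the situation described above where each node ends up with a secret the others cannot know.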

Added by firstime firstime almost 5 years ago

I appended the password to each cluster node, but there is still a problem. I now use three servers.

*********** One ***********

/etc/hosts is:

127.0.0.1   left    localhost.localdomain   localhost
::1     localhost   localhost6.localdomain6 localhost6

192.168.2.228    myhost
192.168.2.170    hlbgp-170.66call.com

/etc/sysconfig/network is:

NETWORKING=yes
HOSTNAME=left

init.properties is:

config-type=--gen-config-def
--admins=211567@wewe.66call.com
--virt-hosts = wewe.66call.com
--debug=server
--cluster-connect-all = true
--cluster-mode = true
--cluster-nodes= left:wewe,myhost:wewe,hlbgp-170.66call.com:wewe
--cl-conn-repo-class      = tigase.cluster.wewe.ClConSQLRepository

--auth-db                 = tigase.db.wewe.WeWeAuthRepository
--auth-db-uri             = wewe
--auth-memcache           = 192.168.2.232:11211
--auth-memcache-pool-size = 10
--user-db                 = tigase.db.wewe.WeWeUserRepository
--user-db-uri             = oracle
--com.mchange.v2.c3p0.cfg.xml = etc/c3p0-config.xml
--user-db-redis-host      = 192.168.2.232
--user-db-redis-port      = 6379
--user-db-redis-db        = 2
--user-db-redis-timeout   = 1
--user-db-redis-maxactive = 100
--user-db-redis-minidle   = 10
--user-db-redis-maxidle   = 50
--data-repo               = tigase.db.wewe.WeWeDataRepository
--data-repo-pool-size     = 1

*********** Two ***********

/etc/hosts is:

192.168.0.3 localhost
127.0.0.1   localhost.localdomain   localhost
::1 myhost  localhost6.localdomain6 localhost6

192.168.2.232    left
192.168.2.170    hlbgp-170.66call.com

/etc/sysconfig/network is:

NETWORKING=yes
HOSTNAME=myhost

init.properties is the same as on the One

*********** Three ***********

/etc/hosts is:

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

192.168.2.232    left
192.168.2.228    myhost

/etc/sysconfig/network is:

NETWORKING=yes
HOSTNAME=hlbgp-170.66call.com

init.properties is the same as on the One

The problem is that clustering between the One and the Two works, but the Three fails. The log on the Three still shows the crash:

ClusterConnectionManager.xmppStreamOpened()  INFO: Stream opened: {id=ad77c43a-5b9e-46b4-9370-8fcb1334cf59, to=localhost, xmlns:stream=http://etherx.jabber.org/streams, from=myhost, xmlns=tigase:cluster}
XMPPIOService.processSocketData()  INFO:    null, type: connect, Socket: nullSocket[addr=left/192.168.2.232,port=5277,localport=36643], jid: null, Incorrect XML data: <stream:stream xmlns='tigase:cluster' xmlns:stream='http://etherx.jabber.org/streams' from='left' to='localhost' id='acf6a64e-6626-4957-bd94-c9ad42e2492e'>, stopping connection: null, exception: 
java.lang.NullPointerException
        at tigase.cluster.ClusterConnectionManager.xmppStreamOpened(ClusterConnectionManager.java:684)
        at tigase.xmpp.XMPPIOService.xmppStreamOpened(XMPPIOService.java:775)
        at tigase.xmpp.XMPPDomBuilderHandler.startElement(XMPPDomBuilderHandler.java:316)
        at tigase.xml.SimpleParser.parse(SimpleParser.java:314)
        at tigase.xmpp.XMPPIOService.processSocketData(XMPPIOService.java:655)
        at tigase.net.IOService.call(IOService.java:262)
        at tigase.net.IOService.call(IOService.java:1)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)

The log on the One and the Two shows the following:

ClusterConnectionManager.processHandshake()  WARNING: Remote hostname not found in local configuration or time difference between cluster nodes is too big. Connection not accepted: null, type: accept, Socket: nullSocket[addr=/192.168.2.170,port=36311,localport=5277], jid: null
ConnectionManager.serviceStopped()  FINER:  [[cl-comp]] Connection stopped: null, type: accept, Socket: nullSocket[unconnected], jid: null

I printed the variable repo before tigase.cluster.ClusterConnectionManager.xmppStreamOpened on the Three. It shows there is no localhost node, but the result of getDefHostName().getDomain() is localhost, so the crash happens:

repo = {hlbgp-170.66call.com=hlbgp-170.66call.com:wewe:5277:0:0.0:0.0, left=left:wewe:5277:0:0.0:0.0, myhost=myhost:wewe:5277:0:0.0:0.0}
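
The lookup miss can be reproduced outside Tigase with a plain map. The sketch below is only an illustration: `RepoLookupSketch` and `secretFor` are hypothetical names, the map stands in for the repo printed above, and the explicit check is one possible way to turn the bare NullPointerException into a readable error, not what ClusterConnectionManager actually does.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the repository lookup, NOT Tigase code.
class RepoLookupSketch {

    // Returns the password configured for the local node, or fails with a
    // message that names the hostname that could not be found.
    static String secretFor(Map<String, String> repo, String localHostname) {
        String secret = repo.get(localHostname);
        if (secret == null) {
            throw new IllegalStateException("Local hostname '" + localHostname
                    + "' is not among the configured cluster nodes");
        }
        return secret;
    }

    public static void main(String[] args) {
        // Same node names as in the repo dump above.
        Map<String, String> repo = new HashMap<>();
        repo.put("hlbgp-170.66call.com", "wewe");
        repo.put("left", "wewe");
        repo.put("myhost", "wewe");

        System.out.println(secretFor(repo, "left"));
        try {
            // The Three resolves its own name to "localhost", which is not a key.
            secretFor(repo, "localhost");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```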

Added by Wojciech Kapcia TigaseTeam almost 5 years ago

Yet again, please read --cluster-nodes documentation:

Please note the proper DNS configuration is critical for the cluster to work correctly. Make sure the 'hostname' command returns a full DNS name on each cluster node. Nodes don't have to be in the same network although good network connectivity is also a critical element for an effective cluster performance.

Tigase uses the current hostname of the machine to identify the cluster node instance and the components within that instance. You should configure the OS on each node to return such names when you execute $ hostname and $ hostname -f, and make the nodes accessible to each other using those names.

During server startup, one of the first entries in logs/tigase-console.log is an entry related to the hostname (i.e. DNSResolver.<clinit>() …); please check it and verify that the name is OK.
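
As a quick cross-check alongside $ hostname -f, the JVM's own view of the local name can be printed with the standard `java.net.InetAddress` API. This is a minimal sketch under the assumption that the JVM's resolution is what matters; `HostnameCheckSketch` and its methods are hypothetical names, and nothing here claims to reproduce what Tigase's DNSResolver does internally.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical diagnostic, NOT part of Tigase.
class HostnameCheckSketch {

    // Name the JVM resolves for this machine; falls back to "localhost"
    // when the local hostname cannot be resolved at all.
    static String localCanonicalName() {
        try {
            return InetAddress.getLocalHost().getCanonicalHostName();
        } catch (UnknownHostException e) {
            return "localhost";
        }
    }

    // True for names that cannot identify a node to its peers.
    static boolean looksLikeLoopbackName(String name) {
        String n = name.toLowerCase();
        return n.equals("localhost") || n.startsWith("localhost.");
    }

    public static void main(String[] args) {
        String name = localCanonicalName();
        System.out.println("JVM sees local hostname as: " + name);
        if (looksLikeLoopbackName(name)) {
            System.out.println("Warning: this name will not match any --cluster-nodes entry");
        }
    }
}
```

If this prints a loopback name on a node, the /etc/hosts and hostname settings on that machine are the first thing to review.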