MUC Clustering & failover scenario
Added over 5 years ago
I have two server instances (S1 and S2) and two MUC instances (M1 and M2).
Say that, to start with, only S1 and M1 are running; users connect to S1 and a room R1 is created on M1.
Then S2 and M2 come up. At this point the routing goes wrong: the MUC requests for R1 somehow get forwarded to M2, the MUC returns an item-not-found error to the server, and an error gets sent to the client.
I know that the MUC instances do not communicate with each other, and that when a given MUC instance goes down, all rooms on it are effectively dead and users cannot communicate in those rooms. But the scenario I am testing is different and could happen during a failover, etc.
Regardless of the order in which the MUCs get registered or unregistered on the server (connections), how does the server ensure that it picks the correct MUC to service a room request? As long as the MUC instance where the room resides is running, the server should be able to identify the correct MUC instance for that room, regardless of the list of MUCs it knows about.
Added by Anonymous over 5 years ago
Artur, Thanks for your response.
I was trying to figure out how to implement the required clustering support, so that we don't end up with duplicate rooms when MUC instances come up (in addition to at least one existing active one). Duplicate rooms are a major problem, and there is no way to address them right now other than restarting all MUCs when adding a MUC to the cluster.
The only way I can think of, consistent with the server-to-server clustering model, is to keep the knowledge of each room's location in memory on every server.
We currently have a "connections" map on the server's ComponentProtocol class instance (I don't know how it is all hooked up, but I know this much from some minor groundwork), which maintains the mapping from a MUC domain to a list of MUC connections.
I am thinking that if we introduce a new data structure (maybe another map) which holds the MUC-connection-to-room-names mapping, we should be able to avoid the duplicate-room problem to a large extent. The goal would be to sync up all the servers whenever a packet for a new "unrecognized" room is processed: we use one of the LB classes to decide where the room "should be present", add the new room entry to our own map, and send a cluster packet to the other server nodes we are connected to. At the same time, we forward the MUC packet on that MUC connection.
From there on, any stanza with this room name, on any server and from any user, should be routed only to the MUC which "will have" this room. When the MUC instance hosting the room goes down, we can't do much about it; then it's back to square one, which is OK.
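To make the idea concrete, here is a minimal, self-contained sketch of the proposed map. It is not Tigase code: `RoomRouter`, `selectConnection`, `route`, and `applyRemoteMapping` are hypothetical names, and the modulo-hash stands in for whichever LB class would actually pick the connection.

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch of the proposed room-to-connection map (not Tigase API).
class RoomRouter {

    // room name -> MUC connection that "will have" this room
    private final ConcurrentMap<String, String> roomToConn = new ConcurrentHashMap<>();

    // Deterministic load-balancing choice: every node computes the same
    // answer for the same room name and the same connection list.
    static String selectConnection(String room, List<String> connections) {
        int idx = Math.floorMod(room.hashCode(), connections.size());
        return connections.get(idx);
    }

    // Returns the connection to forward the packet on. On a cache miss it
    // picks a connection and records it; in the real design this is also
    // where the new mapping would be broadcast to the other cluster nodes.
    String route(String room, List<String> connections) {
        return roomToConn.computeIfAbsent(room,
                r -> selectConnection(r, connections));
    }

    // Applied when a cluster packet announces a mapping made on another node.
    void applyRemoteMapping(String room, String conn) {
        roomToConn.putIfAbsent(room, conn);
    }
}
```

The key property is that once a room is mapped, `route` keeps returning the same connection, and a mapping learned from another node takes effect the same way as a locally created one.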
When a server instance is brought in, it can query either one existing server it connects to, or all the server nodes it connects to (using a cluster command packet?). This way it gets the full conn-room mapping and would be able to service MUC requests properly.
What do you think of this approach? I don't know how feasible it is in the existing component layout, but I am hoping that at least my thinking is in line with how you would approach this issue.
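The join-time sync described above could look roughly like this. This is an illustrative sketch only: `Peer` and `snapshot` are stand-ins for whatever cluster command/IQ exchange would really carry the data, and the conflict rule (local entries win) is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the join-time sync (not Tigase API): a newly
// started server asks an existing peer for its conn-room map and merges
// it before servicing MUC traffic.
class RoomMapSync {

    interface Peer {
        // Stand-in for a cluster command/IQ round trip to one peer node.
        Map<String, String> snapshot();
    }

    // Merge the peer's mappings into ours; entries we already learned
    // locally are kept on conflict (an assumed, not specified, rule).
    static Map<String, String> syncFrom(Map<String, String> local, Peer peer) {
        Map<String, String> merged = new HashMap<>(peer.snapshot());
        merged.putAll(local);
        return merged;
    }
}
```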
Can you tell me if I can create a custom Command packet (i) to send a new conn-room mapping to all server nodes, and (ii) to query one or all of the server nodes with which a server node establishes a cluster connection?
And is there an example of how these packets can be received and processed on the receiving server node, and of how the data is saved/retrieved from the ComponentProtocol instance?
I would like to take a stab at it, if you think this approach makes sense and is possible without deviating much from the current design or messing too much with the component interaction.
Thanks a lot.
Added by Artur Hefczyc over 5 years ago
There is an API for this in Tigase! ;-)
I mean there is an API for knowing in your component about other cluster nodes, components on these nodes and exchanging information between cluster nodes (MUC components on all cluster nodes for example). It works a bit like RMI. Have a look at the new_cluster_api branch and SessionManagerClustered.java in particular.
This API is mostly available from version 5.1.x, but the code in the new branch uses it more extensively.
I do not fully understand your concept, but I can tell you how I imagine the clustering implementation for MUC using the existing Tigase API. It would be more or less similar to the clustering implementation in SessionManager.
You do not modify the existing MUC code or change its data structures.
Instead you create a new class, MUCComponentClustered, which extends the existing MUC component class; this new class has to implement ClusteredComponentIfc.
This new class controls the data flow in such a way that it can send notifications to other cluster nodes about events such as: MUC room created, somebody joined a room, somebody left a room, somebody posted a message to a room.
MUC components on other cluster nodes can act on these notifications; they can even keep a copy of each room on each node (or on some nodes only).
So the internal MUC logic stays the same but you just control the data flow. You can redirect all data related to some MUC rooms to selected cluster nodes or decide to send them to all nodes.
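The shape of that pattern can be sketched in a few lines. The stub types below (`ClusterNotifier`, `MucComponent`, the `room-created` event name) are illustrative stand-ins, not the real Tigase classes, whose actual signatures live in the new_cluster_api branch.

```java
// Illustrative-only sketch of the pattern described above: the clustered
// subclass leaves the MUC logic untouched and only adds cluster notifications.
class ClusteredMucSketch {

    interface ClusterNotifier {           // stands in for the cluster API
        void notifyNodes(String event, String roomJid);
    }

    static class MucComponent {           // stands in for the existing MUC code
        void onRoomCreated(String roomJid) { /* unchanged MUC logic */ }
    }

    static class MucComponentClustered extends MucComponent {
        private final ClusterNotifier notifier;

        MucComponentClustered(ClusterNotifier notifier) {
            this.notifier = notifier;
        }

        @Override
        void onRoomCreated(String roomJid) {
            super.onRoomCreated(roomJid);                   // existing behaviour
            notifier.notifyNodes("room-created", roomJid);  // added: tell the cluster
        }
    }
}
```

The same override-and-notify shape would repeat for the join/leave/message events listed above, with the redirect-vs-broadcast decision living in the notifier.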