[Original] RabbitMQ Official Documentation Translation -- Clustering Guide

2012/12/03 12:02

      To make this easier to use at work, I spent some spare weekend time translating the RabbitMQ clustering documentation. Given my limited skill, mistakes are hard to avoid; if anything looks wrong, feel free to point it out and discuss.

Official source: http://www.rabbitmq.com/clustering.html

============== Divider ================

Clustering Guide


A RabbitMQ broker is a logical grouping of one or several Erlang nodes, each running the RabbitMQ application and sharing users, virtual hosts, queues, exchanges, etc. Sometimes we refer to the collection of nodes as a cluster.



All data/state required for the operation of a RabbitMQ broker is replicated across all nodes, for reliability and scaling, with full ACID properties. An exception to this is message queues, which by default reside on the node that created them, though they are visible and reachable from all nodes. To replicate queues across nodes in a cluster, see the documentation on high availability (note that you will need a working cluster first).



RabbitMQ clustering does not tolerate network partitions well, so it should not be used over a WAN. The shovel or federation plugins are better solutions for connecting brokers across a WAN.



(Translator's note) A network partition occurs when every network link between two groups of nodes in a system fails at the same time. When it happens, each side of the split restarts its applications independently, which leads to duplicated services or "split brain". Split brain arises when two independent systems configured in one cluster each assume exclusive access to a given resource (typically a file system or volume). The most serious damage a network partition causes is to the data on shared disks.


The composition of a cluster can be altered dynamically. All RabbitMQ brokers start out as running on a single node. These nodes can be joined into clusters, and subsequently turned back into individual brokers again.



RabbitMQ brokers tolerate the failure of individual nodes. Nodes can be started and stopped at will.



A node can be a disk node or a RAM node. (Note: disk and disc are used interchangeably. Configuration syntax or status messages normally use disc.) RAM nodes keep their state only in memory (with the exception of queue contents, which can reside on disc if the queue is persistent or too big to fit in memory). Disk nodes keep state in memory and on disk. As RAM nodes don't have to write to disk as much as disk nodes, they can perform better. However, note that since the queue data is always stored on disc, the performance improvements will affect only resource management (e.g. adding/removing queues, exchanges, or vhosts), but not publishing or consuming speed. Because state is replicated across all nodes in the cluster, it is sufficient (but not recommended) to have just one disk node within a cluster, to store the state of the cluster safely.



Clustering transcript


The following is a transcript of setting up and manipulating a RabbitMQ cluster across three machines - rabbit1, rabbit2, rabbit3, with two of the machines replicating data on ram and disk, and the other replicating data in ram only.



We assume that the user is logged into all three machines, that RabbitMQ has been installed on the machines, and that the rabbitmq-server and rabbitmqctl scripts are in the user's PATH.



Initial setup


Erlang nodes use a cookie to determine whether they are allowed to communicate with each other - for two nodes to be able to communicate they must have the same cookie.



The cookie is just a string of alphanumeric characters. It can be as long or short as you like.



Erlang will automatically create a random cookie file when the RabbitMQ server starts up. This will be typically located in /var/lib/rabbitmq/.erlang.cookie on Unix systems and C:\Users\Current User\.erlang.cookie or C:\Documents and Settings\Current User\.erlang.cookie on Windows systems. The easiest way to proceed is to allow one node to create the file, and then copy it to all the other nodes in the cluster.

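For example, the copy step might look like this on Unix. This is a sketch only: the hostnames and the rabbitmq user/group owning the file are assumptions to adapt, and the cookie file must remain readable only by the user running the broker, or Erlang will refuse to use it.

```
rabbit1$ scp /var/lib/rabbitmq/.erlang.cookie rabbit2:/var/lib/rabbitmq/.erlang.cookie
rabbit1$ scp /var/lib/rabbitmq/.erlang.cookie rabbit3:/var/lib/rabbitmq/.erlang.cookie
rabbit1$ ssh rabbit2 "chown rabbitmq:rabbitmq /var/lib/rabbitmq/.erlang.cookie; chmod 400 /var/lib/rabbitmq/.erlang.cookie"
rabbit1$ ssh rabbit3 "chown rabbitmq:rabbitmq /var/lib/rabbitmq/.erlang.cookie; chmod 400 /var/lib/rabbitmq/.erlang.cookie"
```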


As an alternative, you can insert the option "-setcookie cookie" in the erl call in the rabbitmq-server and rabbitmqctl scripts.



Starting independent nodes


Clusters are set up by re-configuring existing RabbitMQ nodes into a cluster configuration. Hence the first step is to start RabbitMQ on all nodes in the normal way:

rabbit1$ rabbitmq-server -detached
rabbit2$ rabbitmq-server -detached
rabbit3$ rabbitmq-server -detached


This creates three independent RabbitMQ brokers, one on each node, as confirmed by the cluster_status command:


rabbit1$ rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit1 ...
[{nodes,[{disc,[rabbit@rabbit1]}]},{running_nodes,[rabbit@rabbit1]}]
...done.
rabbit2$ rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit2 ...
[{nodes,[{disc,[rabbit@rabbit2]}]},{running_nodes,[rabbit@rabbit2]}]
...done.
rabbit3$ rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit3 ...
[{nodes,[{disc,[rabbit@rabbit3]}]},{running_nodes,[rabbit@rabbit3]}]
...done.

The node name of a RabbitMQ broker started from the rabbitmq-server shell script is rabbit@shorthostname, where the short node name is lower-case (as in rabbit@rabbit1, above). If you use the rabbitmq-server.bat batch file on Windows, the short node name is upper-case (as in rabbit@RABBIT1). When you type node names, case matters, and these strings must match exactly.



Creating the cluster


In order to link up our three nodes in a cluster, we tell two of the nodes, say rabbit@rabbit2 and rabbit@rabbit3, to join the cluster of the third, say rabbit@rabbit1.



We first join rabbit@rabbit2 in a cluster with rabbit@rabbit1 as a ram node. To do that, on rabbit@rabbit2 we stop the RabbitMQ application, join the rabbit@rabbit1 cluster with the --ram flag enabled, and restart the RabbitMQ application. Note that joining a cluster implicitly resets the node, thus removing all resources and data that were previously present on that node.


rabbit2$ rabbitmqctl stop_app
Stopping node rabbit@rabbit2 ...done.
rabbit2$ rabbitmqctl join_cluster --ram rabbit@rabbit1
Clustering node rabbit@rabbit2 with [rabbit@rabbit1] ...done.
rabbit2$ rabbitmqctl start_app
Starting node rabbit@rabbit2 ...done.

We can see that the two nodes are joined in a cluster by running the cluster_status command on either of the nodes:


rabbit1$ rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit1 ...
[{nodes,[{disc,[rabbit@rabbit1]},{ram,[rabbit@rabbit2]}]},
 {running_nodes,[rabbit@rabbit2,rabbit@rabbit1]}]
...done.
rabbit2$ rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit2 ...
[{nodes,[{disc,[rabbit@rabbit1]},{ram,[rabbit@rabbit2]}]},
 {running_nodes,[rabbit@rabbit1,rabbit@rabbit2]}]
...done.

Now we join rabbit@rabbit3 as a disk node to the same cluster. The steps are identical to the ones above, except that we omit the --ram flag in order to turn it into a disk rather than ram node. This time we'll cluster to rabbit2 to demonstrate that the node chosen to cluster to does not matter - it is enough to provide one online node and the node will be clustered to the cluster that the specified node belongs to.


rabbit3$ rabbitmqctl stop_app
Stopping node rabbit@rabbit3 ...done.
rabbit3$ rabbitmqctl join_cluster rabbit@rabbit2
Clustering node rabbit@rabbit3 with rabbit@rabbit2 ...done.
rabbit3$ rabbitmqctl start_app
Starting node rabbit@rabbit3 ...done.

We can see that the three nodes are joined in a cluster by running the cluster_status command on any of the nodes:


rabbit1$ rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit1 ...
[{nodes,[{disc,[rabbit@rabbit1,rabbit@rabbit3]},{ram,[rabbit@rabbit2]}]},
 {running_nodes,[rabbit@rabbit3,rabbit@rabbit2,rabbit@rabbit1]}]
...done.
rabbit2$ rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit2 ...
[{nodes,[{disc,[rabbit@rabbit1,rabbit@rabbit3]},{ram,[rabbit@rabbit2]}]},
 {running_nodes,[rabbit@rabbit3,rabbit@rabbit1,rabbit@rabbit2]}]
...done.
rabbit3$ rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit3 ...
[{nodes,[{disc,[rabbit@rabbit3,rabbit@rabbit1]},{ram,[rabbit@rabbit2]}]},
 {running_nodes,[rabbit@rabbit2,rabbit@rabbit1,rabbit@rabbit3]}]
...done.

By following the above steps we can add new nodes to the cluster at any time, while the cluster is running.



Changing node types


We can change the type of a node from ram to disk and vice versa. Say we wanted to reverse the types of rabbit@rabbit2 and rabbit@rabbit3, turning the former from a ram node into a disk node and the latter from a disk node into a ram node. To do that we can use the change_cluster_node_type command. The node must be stopped first.


rabbit2$ rabbitmqctl stop_app
Stopping node rabbit@rabbit2 ...done.
rabbit2$ rabbitmqctl change_cluster_node_type disc
Turning rabbit@rabbit2 into a disc node ...
...done.
rabbit2$ rabbitmqctl start_app
Starting node rabbit@rabbit2 ...done.

rabbit3$ rabbitmqctl stop_app
Stopping node rabbit@rabbit3 ...done.
rabbit3$ rabbitmqctl change_cluster_node_type ram
Turning rabbit@rabbit3 into a ram node ...
...done.
rabbit3$ rabbitmqctl start_app
Starting node rabbit@rabbit3 ...done.


Restarting cluster nodes


Nodes that have been joined to a cluster can be stopped at any time. It is also ok for them to crash. In both cases the rest of the cluster continues operating unaffected, and the nodes automatically "catch up" with the other cluster nodes when they start up again.



We shut down the nodes rabbit@rabbit1 and rabbit@rabbit3 and check on the cluster status at each step:


rabbit1$ rabbitmqctl stop
Stopping and halting node rabbit@rabbit1 ...done.

rabbit2$ rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit2 ...
[{nodes,[{disc,[rabbit@rabbit1,rabbit@rabbit2]},{ram,[rabbit@rabbit3]}]},
 {running_nodes,[rabbit@rabbit3,rabbit@rabbit2]}]
...done.

rabbit3$ rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit3 ...
[{nodes,[{disc,[rabbit@rabbit2,rabbit@rabbit1]},{ram,[rabbit@rabbit3]}]},
 {running_nodes,[rabbit@rabbit2,rabbit@rabbit3]}]
...done.

rabbit3$ rabbitmqctl stop
Stopping and halting node rabbit@rabbit3 ...done.

rabbit2$ rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit2 ...
[{nodes,[{disc,[rabbit@rabbit1,rabbit@rabbit2]},{ram,[rabbit@rabbit3]}]},
 {running_nodes,[rabbit@rabbit2]}]
...done.

Now we start the nodes again, checking on the cluster status as we go along:


rabbit1$ rabbitmq-server -detached

rabbit1$ rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit1 ...
[{nodes,[{disc,[rabbit@rabbit1,rabbit@rabbit2]},{ram,[rabbit@rabbit3]}]},
 {running_nodes,[rabbit@rabbit2,rabbit@rabbit1]}]
...done.

rabbit2$ rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit2 ...
[{nodes,[{disc,[rabbit@rabbit1,rabbit@rabbit2]},{ram,[rabbit@rabbit3]}]},
 {running_nodes,[rabbit@rabbit1,rabbit@rabbit2]}]
...done.

rabbit3$ rabbitmq-server -detached

rabbit1$ rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit1 ...
[{nodes,[{disc,[rabbit@rabbit1,rabbit@rabbit2]},{ram,[rabbit@rabbit3]}]},
 {running_nodes,[rabbit@rabbit2,rabbit@rabbit1,rabbit@rabbit3]}]
...done.

rabbit2$ rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit2 ...
[{nodes,[{disc,[rabbit@rabbit1,rabbit@rabbit2]},{ram,[rabbit@rabbit3]}]},
 {running_nodes,[rabbit@rabbit1,rabbit@rabbit2,rabbit@rabbit3]}]
...done.

rabbit3$ rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit3 ...
[{nodes,[{disc,[rabbit@rabbit2,rabbit@rabbit1]},{ram,[rabbit@rabbit3]}]},
 {running_nodes,[rabbit@rabbit2,rabbit@rabbit1,rabbit@rabbit3]}]
...done.

There are some important caveats:



At least one disk node should be running at all times to prevent data loss. RabbitMQ will prevent the creation of a RAM-only cluster in many situations, but it still won't stop you from stopping and forcefully resetting all the disc nodes, which will lead to a RAM-only cluster. Doing this is not advisable and makes losing data very easy.



When the entire cluster is brought down, the last node to go down must be the first node to be brought online. If this doesn't happen, the nodes will wait 30 seconds for the last disc node to come back online, and fail afterwards. If the last node to go offline cannot be brought back up, it can be removed from the cluster using the forget_cluster_node command - consult the rabbitmqctl manpage for more information.



Breaking up a cluster


Nodes need to be removed explicitly from a cluster when they are no longer meant to be part of it. We first remove rabbit@rabbit3 from the cluster, returning it to independent operation. To do that, on rabbit@rabbit3 we stop the RabbitMQ application, reset the node, and restart the RabbitMQ application.


rabbit3$ rabbitmqctl stop_app
Stopping node rabbit@rabbit3 ...done.
rabbit3$ rabbitmqctl reset
Resetting node rabbit@rabbit3 ...done.
rabbit3$ rabbitmqctl start_app
Starting node rabbit@rabbit3 ...done.

Note that it would have been equally valid to list rabbit@rabbit3 as a node.



Running the cluster_status command on the nodes confirms that rabbit@rabbit3 now is no longer part of the cluster and operates independently:


rabbit1$ rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit1 ...
[{nodes,[{disc,[rabbit@rabbit1,rabbit@rabbit2]}]},
 {running_nodes,[rabbit@rabbit2,rabbit@rabbit1]}]
...done.

rabbit2$ rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit2 ...
[{nodes,[{disc,[rabbit@rabbit1,rabbit@rabbit2]}]},
 {running_nodes,[rabbit@rabbit1,rabbit@rabbit2]}]
...done.

rabbit3$ rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit3 ...
[{nodes,[{disc,[rabbit@rabbit3]}]},{running_nodes,[rabbit@rabbit3]}]
...done.

We can also remove nodes remotely. This is useful, for example, when having to deal with an unresponsive node. We can for example remove rabbit@rabbit1 from rabbit@rabbit2.


rabbit1$ rabbitmqctl stop_app
Stopping node rabbit@rabbit1 ...done.

rabbit2$ rabbitmqctl forget_cluster_node rabbit@rabbit1
Removing node rabbit@rabbit1 from cluster ...
...done.

Note that rabbit1 still thinks it's clustered with rabbit2, and trying to start it will result in an error. We will need to reset it to be able to start it again.


rabbit1$ rabbitmqctl start_app
Starting node rabbit@rabbit1 ...
Error: inconsistent_cluster: Node rabbit@rabbit1 thinks it's clustered with node rabbit@rabbit2, but rabbit@rabbit2 disagrees

rabbit1$ rabbitmqctl reset
Resetting node rabbit@rabbit1 ...done.

rabbit1$ rabbitmqctl start_app
Starting node rabbit@rabbit1 ...
...done.

The cluster_status command now shows all three nodes operating as independent RabbitMQ brokers:


rabbit1$ rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit1 ...
[{nodes,[{disc,[rabbit@rabbit1]}]},{running_nodes,[rabbit@rabbit1]}]
...done.

rabbit2$ rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit2 ...
[{nodes,[{disc,[rabbit@rabbit2]}]},{running_nodes,[rabbit@rabbit2]}]
...done.

rabbit3$ rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit3 ...
[{nodes,[{disc,[rabbit@rabbit3]}]},{running_nodes,[rabbit@rabbit3]}]
...done.

Note that rabbit@rabbit2 retains the residual state of the cluster, whereas rabbit@rabbit1 and rabbit@rabbit3 are freshly initialised RabbitMQ brokers. If we want to re-initialise rabbit@rabbit2 we follow the same steps as for the other nodes:


rabbit2$ rabbitmqctl stop_app
Stopping node rabbit@rabbit2 ...done.

rabbit2$ rabbitmqctl reset
Resetting node rabbit@rabbit2 ...done.

rabbit2$ rabbitmqctl start_app
Starting node rabbit@rabbit2 ...done.


Auto-configuration of a cluster


Instead of configuring clusters "on the fly" using the cluster command, clusters can also be set up via the RabbitMQ configuration file. The file should set the cluster_nodes field in the rabbit application to a tuple containing a list of rabbit nodes, and an atom - either disc or ram - indicating whether the node should join them as a disc node or not.



If cluster_nodes is specified, RabbitMQ will try to cluster to each node provided, and stop after it can cluster with one of them. RabbitMQ will try to cluster with any node which is online and has the same version of Erlang and RabbitMQ. If no suitable nodes are found, the node is left unclustered.



Note that the cluster configuration is applied only to fresh nodes. A fresh node is one which has just been reset or is being started for the first time. Thus, the automatic clustering won't take place after restarts of nodes. This means that any change to the clustering via rabbitmqctl will take precedence over the automatic clustering configuration.



A common use of cluster configuration via the RabbitMQ config file is to automatically configure nodes to join a common cluster. For this purpose the same list of cluster nodes can be specified on all nodes, plus the disc or ram atom to determine the node type.



Say we want to join our three separate nodes of our running example back into a single cluster, with rabbit@rabbit1 and rabbit@rabbit2 being the disk nodes of the cluster. First we reset and stop all nodes, to make sure that we're working with fresh nodes:


rabbit1$ rabbitmqctl stop_app
Stopping node rabbit@rabbit1 ...done.

rabbit1$ rabbitmqctl reset
Resetting node rabbit@rabbit1 ...done.

rabbit1$ rabbitmqctl stop
Stopping and halting node rabbit@rabbit1 ...done.

rabbit2$ rabbitmqctl stop_app
Stopping node rabbit@rabbit2 ...done.

rabbit2$ rabbitmqctl reset
Resetting node rabbit@rabbit2 ...done.

rabbit2$ rabbitmqctl stop
Stopping and halting node rabbit@rabbit2 ...done.

rabbit3$ rabbitmqctl stop_app
Stopping node rabbit@rabbit3 ...done.

rabbit3$ rabbitmqctl reset
Resetting node rabbit@rabbit3 ...done.

rabbit3$ rabbitmqctl stop
Stopping and halting node rabbit@rabbit3 ...done.

Now we set the relevant field in the config file:


[
  ...
  {rabbit, [
        ...
        {cluster_nodes, {['rabbit@rabbit1', 'rabbit@rabbit2', 'rabbit@rabbit3'], disc}},
        ...
  ]},
  ...
].

For instance, if this were the only field we needed to set, we would simply create the RabbitMQ config file with the contents:


[{rabbit,
  [{cluster_nodes, {['rabbit@rabbit1', 'rabbit@rabbit2', 'rabbit@rabbit3'], disc}}]}].
  

Since we want rabbit@rabbit3 to be a ram node, we need to specify that in its configuration file:


[{rabbit,
  [{cluster_nodes, {['rabbit@rabbit1', 'rabbit@rabbit2', 'rabbit@rabbit3'], ram}}]}].
  

(Note for Erlang programmers and the curious: this is a standard Erlang configuration file. For more details, see the configuration guide and the Erlang Config Man Page.)



Once we have the configuration files in place, we simply start the nodes:


rabbit1$ rabbitmq-server -detached
rabbit2$ rabbitmq-server -detached
rabbit3$ rabbitmq-server -detached

We can see that the three nodes are joined in a cluster by running the cluster_status command on any of the nodes:


rabbit1$ rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit1 ...
[{nodes,[{disc,[rabbit@rabbit1,rabbit@rabbit2]},{ram,[rabbit@rabbit3]}]},
 {running_nodes,[rabbit@rabbit1,rabbit@rabbit2,rabbit@rabbit3]}]
...done.

rabbit2$ rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit2 ...
[{nodes,[{disc,[rabbit@rabbit1,rabbit@rabbit2]},{ram,[rabbit@rabbit3]}]},
 {running_nodes,[rabbit@rabbit1,rabbit@rabbit2,rabbit@rabbit3]}]
...done.

rabbit3$ rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit3 ...
[{nodes,[{disc,[rabbit@rabbit1,rabbit@rabbit2]},{ram,[rabbit@rabbit3]}]},
 {running_nodes,[rabbit@rabbit1,rabbit@rabbit2,rabbit@rabbit3]}]
...done.

Note that, in order to remove a node from an auto-configured cluster, it must first be removed from the rabbitmq.config files of the other nodes in the cluster. Only then, can it be reset safely.



Upgrading clusters


When upgrading from one version of RabbitMQ to another, RabbitMQ will automatically update its persistent data structures if necessary. In a cluster, this task is performed by the first disc node to be started (the "upgrader" node). Therefore when upgrading a RabbitMQ cluster, you should not attempt to start any RAM nodes first; any RAM nodes started will emit an error message and fail to start up.



All nodes in a cluster must be running the same versions of Erlang and RabbitMQ, although they may have different plugins installed. Therefore it is necessary to stop all nodes in the cluster, then start all nodes when performing an upgrade.



While not strictly necessary, it is a good idea to decide ahead of time which disc node will be the upgrader, stop that node last, and start it first. Otherwise changes to the cluster configuration that were made between the upgrader node stopping and the last node stopping will be lost.



Automatic upgrades are only possible from RabbitMQ versions 2.1.1 and later. If you have an earlier cluster, you will need to rebuild it to upgrade.



A cluster on a single machine


Under some circumstances it can be useful to run a cluster of RabbitMQ nodes on a single machine. This would typically be useful for experimenting with clustering on a desktop or laptop without the overhead of starting several virtual machines for the cluster. The two main requirements for running more than one node on a single machine are that each node should have a unique name and bind to a unique port / IP address combination for each protocol in use.



You can start multiple nodes on the same host manually by repeated invocation of rabbitmq-server ( rabbitmq-server.bat on Windows). You must ensure that for each invocation you set the environment variables RABBITMQ_NODENAME and RABBITMQ_NODE_PORT to suitable values.



For example:

$ RABBITMQ_NODE_PORT=5672 RABBITMQ_NODENAME=rabbit rabbitmq-server -detached
$ RABBITMQ_NODE_PORT=5673 RABBITMQ_NODENAME=hare rabbitmq-server -detached
$ rabbitmqctl -n hare stop_app
$ rabbitmqctl -n hare reset
$ rabbitmqctl -n hare join_cluster rabbit@`hostname -s`
$ rabbitmqctl -n hare start_app

will set up a two node cluster with one disc node and one ram node. Note that if you have RabbitMQ opening any ports other than AMQP, you'll need to configure those not to clash as well - for example:


$ RABBITMQ_NODE_PORT=5672 RABBITMQ_SERVER_START_ARGS="-rabbitmq_management listener [{port,15672}]" RABBITMQ_NODENAME=rabbit rabbitmq-server -detached
$ RABBITMQ_NODE_PORT=5673 RABBITMQ_SERVER_START_ARGS="-rabbitmq_management listener [{port,15673}]" RABBITMQ_NODENAME=hare rabbitmq-server -detached

will start two nodes (which can then be clustered) when the management plugin is installed.



Firewalled nodes


The case for firewalled clustered nodes exists when nodes are in a data center or on a reliable network, but separated by firewalls. Again, clustering is not recommended over a WAN or when network links between nodes are unreliable.



If different nodes of a cluster are in the same data center, but behind firewalls, then additional configuration will be necessary to ensure inter-node communication. Erlang makes use of a Port Mapper Daemon (epmd) for resolution of node names in a cluster. Nodes must be able to reach each other and the port mapper daemon for clustering to work.



The default epmd port is 4369, but this can be changed using the ERL_EPMD_PORT environment variable. All nodes must use the same port. Firewalls must permit traffic on this port to pass between clustered nodes. For further details see the Erlang epmd manpage.

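As an illustrative sketch (not part of the official text): on a Linux node using iptables, allowing epmd traffic from the other cluster members might look like the following, where the 10.0.0.0/24 subnet is an assumption to adapt:

```
rabbit1# iptables -A INPUT -p tcp -s 10.0.0.0/24 --dport 4369 -j ACCEPT
```

An equivalent rule is needed on every node, and likewise for the inter-node distribution port range described below.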


Once a distributed Erlang node address has been resolved via epmd, other nodes will attempt to communicate directly with that address using the Erlang distributed node protocol. The port range for this communication can be configured with two parameters for the Erlang kernel application:


inet_dist_listen_min
inet_dist_listen_max

Firewalls must permit traffic in this range to pass between clustered nodes (assuming all nodes use the same port range). The default port range is unrestricted.

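For example, the range can be pinned down in the same style of Erlang config file shown earlier; the numbers 33672-33682 here are arbitrary example values, not defaults:

```
[{kernel,
  [{inet_dist_listen_min, 33672},
   {inet_dist_listen_max, 33682}]}].
```

With the range pinned, the firewall only needs to open these ports, plus the epmd port, between cluster nodes.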


The Erlang kernel_app manpage contains more details on the port range that distributed Erlang nodes listen on. See the configuration page for information on how to create and edit a configuration file.



Connecting to Clusters from Clients


A client can connect as normal to any node within a cluster. If that node should fail, and the rest of the cluster survives, then the client should notice the closed connection, and should be able to reconnect to some surviving member of the cluster. Generally, it's not advisable to bake in node hostnames or IP addresses into client applications: this introduces inflexibility and will require client applications to be edited, recompiled and redeployed should the configuration of the cluster change or the number of nodes in the cluster change. Instead, we recommend a more abstracted approach: this could be a dynamic DNS service which has a very short TTL configuration, or a plain TCP load balancer, or some sort of mobile IP achieved with pacemaker or similar technologies. In general, this aspect of managing the connection to nodes within a cluster is beyond the scope of RabbitMQ itself, and we recommend the use of other technologies designed specifically to solve these problems.

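The "plain TCP load balancer" option can be sketched with, for example, HAProxy. This configuration is an illustration, not part of the official text; the hostnames follow the transcript above, and 5672 is the standard AMQP port:

```
listen rabbitmq
    bind *:5672
    mode tcp
    balance roundrobin
    server rabbit1 rabbit1:5672 check
    server rabbit2 rabbit2:5672 check
    server rabbit3 rabbit3:5672 check
```

Clients then connect to the balancer's address instead of any individual node, and a failed node is taken out of rotation by the health check.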


Comments

jay_ (2013/04/15 08:56):
A question: once the cluster is built, how should HOST be set in the client code? Can multiple IPs be configured, or do we configure the disk node's IP and let rabbit distribute automatically? In that case, if the disk node goes down, does the client lose its connection?

摩云飞 (author, 2013/04/15 09:59):
Clustering addresses service availability, and a cluster is normally used together with an LB, so the multiple-IP problem does not arise. Also, the cluster shares queue metadata, so your client should be able to consume messages normally no matter which node's IP it connects to.

摩云飞 (author, 2013/04/15 10:02):
If the node the client is connected to goes down, then of course you can no longer reach that node. If there is an LB, it should simply direct your client to another node.

jay_ (2013/04/15 12:45):
So producers connect through an LB? How is that implemented? Is it like a web application with multiple instances mounted behind apache or nginx? When I used IBM WebSphere MQ it came with a gateway mechanism with N instances mounted under it.

摩云飞 (author, 2013/04/15 13:27):
I suggest you first study Chapter 5 of "RabbitMQ in Action" -- Clustering and dealing with failure -- and see whether it answers your question.

智深 (2012/12/03 13:50):
This deserves an upvote! I'm quite interested in this and will take a closer look soon.

摩云飞 (author, 2012/12/03 13:57):
Thanks for the interest -- happy to learn from each other.