# [Original] RabbitMQ Documentation Translation -- Clustering Guide

2012/12/03 12:02

To make things easier at work, I spent some spare weekend time translating the RabbitMQ clustering documentation. Given my limited ability, omissions and mistakes are hard to avoid; if anything looks wrong, you are welcome to point it out and discuss.

---

## Clustering Guide

A RabbitMQ broker is a logical grouping of one or several Erlang nodes, each running the RabbitMQ application and sharing users, virtual hosts, queues, exchanges, etc. Sometimes we refer to the collection of nodes as a cluster.


All data/state required for the operation of a RabbitMQ broker is replicated across all nodes, for reliability and scaling, with full ACID properties. An exception to this are message queues, which by default reside on the node that created them, though they are visible and reachable from all nodes. To replicate queues across nodes in a cluster, see the documentation on high availability (note that you will need a working cluster first).

RabbitMQ clustering does not tolerate network partitions well, so it should not be used over a WAN. The shovel or federation plugins are better solutions for connecting brokers across a WAN.


(Translator's note: a network partition occurs when all network links between two groups of nodes in a system fail at the same time. When this happens, each side of the partition may restart applications on its own, leading to duplicated services or "split-brain". Split-brain arises when two independent systems configured in one cluster each assume exclusive access to a given resource, typically a file system or volume. The most serious damage a network partition can cause is to data on shared disks.)

The composition of a cluster can be altered dynamically. All RabbitMQ brokers start out as running on a single node. These nodes can be joined into clusters, and subsequently turned back into individual brokers again.


RabbitMQ brokers tolerate the failure of individual nodes. Nodes can be started and stopped at will.


A node can be a disk node or a RAM node. (Note: disk and disc are used interchangeably. Configuration syntax or status messages normally use disc.) RAM nodes keep their state only in memory (with the exception of queue contents, which can reside on disc if the queue is persistent or too big to fit in memory). Disk nodes keep state in memory and on disk. As RAM nodes don't have to write to disk as much as disk nodes, they can perform better. However, note that since the queue data is always stored on disc, the performance improvements will affect only resource management (e.g. adding/removing queues, exchanges, or vhosts), but not publishing or consuming speed. Because state is replicated across all nodes in the cluster, it is sufficient (but not recommended) to have just one disk node within a cluster, to store the state of the cluster safely.


## Clustering transcript

The following is a transcript of setting up and manipulating a RabbitMQ cluster across three machines - rabbit1, rabbit2, rabbit3, with two of the machines replicating data on ram and disk, and the other replicating data in ram only.

We assume that the user is logged into all three machines, that RabbitMQ has been installed on the machines, and that the rabbitmq-server and rabbitmqctl scripts are in the user's PATH.

### Initial setup

Erlang nodes use a cookie to determine whether they are allowed to communicate with each other - for two nodes to be able to communicate they must have the same cookie.


The cookie is just a string of alphanumeric characters. It can be as long or short as you like.

Erlang will automatically create a random cookie file when the RabbitMQ server starts up. This will be typically located in /var/lib/rabbitmq/.erlang.cookie on Unix systems and C:\Users\Current User\.erlang.cookie or C:\Documents and Settings\Current User\.erlang.cookie on Windows systems. The easiest way to proceed is to allow one node to create the file, and then copy it to all the other nodes in the cluster.


As an alternative, you can insert the option "-setcookie cookie" in the erl call in the rabbitmq-server and rabbitmqctl scripts.
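Since every node must present the same cookie, a quick sanity check before clustering is to compare the cookie files across machines. Below is a minimal sketch of such a check in Python; the mount paths in the usage comment are hypothetical, and the default locations are the ones mentioned above:

```python
from pathlib import Path

def cookies_match(paths):
    """Return True when every .erlang.cookie file in `paths` has
    identical contents (ignoring surrounding whitespace)."""
    contents = [Path(p).read_text().strip() for p in paths]
    return len(set(contents)) == 1

# Hypothetical usage, assuming the node filesystems are mounted under /mnt:
#   cookies_match(["/mnt/rabbit1/var/lib/rabbitmq/.erlang.cookie",
#                  "/mnt/rabbit2/var/lib/rabbitmq/.erlang.cookie"])
```

If the check returns False, the nodes will refuse to communicate, which typically shows up as clustering commands failing with authentication errors.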

### Starting independent nodes

Clusters are set up by re-configuring existing RabbitMQ nodes into a cluster configuration. Hence the first step is to start RabbitMQ on all nodes in the normal way:

```
rabbit1$ rabbitmq-server -detached
rabbit2$ rabbitmq-server -detached
rabbit3$ rabbitmq-server -detached
```

This creates three independent RabbitMQ brokers, one on each node, as confirmed by the cluster_status command:

```
rabbit1$ rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit1 ...
[{nodes,[{disc,[rabbit@rabbit1]}]},{running_nodes,[rabbit@rabbit1]}]
...done.
rabbit2$ rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit2 ...
[{nodes,[{disc,[rabbit@rabbit2]}]},{running_nodes,[rabbit@rabbit2]}]
...done.
rabbit3$ rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit3 ...
[{nodes,[{disc,[rabbit@rabbit3]}]},{running_nodes,[rabbit@rabbit3]}]
...done.
```

The node name of a RabbitMQ broker started from the rabbitmq-server shell script is rabbit@shorthostname, where the short node name is lower-case (as in rabbit@rabbit1, above). If you use the rabbitmq-server.bat batch file on Windows, the short node name is upper-case (as in rabbit@RABBIT1). When you type node names, case matters, and these strings must match exactly.

### Creating the cluster

In order to link up our three nodes in a cluster, we tell two of the nodes, say rabbit@rabbit2 and rabbit@rabbit3, to join the cluster of the third, say rabbit@rabbit1.

We first join rabbit@rabbit2 in a cluster with rabbit@rabbit1 as a ram node. To do that, on rabbit@rabbit2 we stop the RabbitMQ application, join the rabbit@rabbit1 cluster passing the --ram flag, and restart the RabbitMQ application. Note that joining a cluster implicitly resets the node, thus removing all resources and data that were previously present on that node.

```
rabbit2$ rabbitmqctl stop_app
Stopping node rabbit@rabbit2 ...done.
rabbit2$ rabbitmqctl join_cluster --ram rabbit@rabbit1
Clustering node rabbit@rabbit2 with [rabbit@rabbit1] ...done.
rabbit2$ rabbitmqctl start_app
Starting node rabbit@rabbit2 ...done.
```

We can see that the two nodes are joined in a cluster by running the cluster_status command on either of the nodes:

```
rabbit1$ rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit1 ...
[{nodes,[{disc,[rabbit@rabbit1]},{ram,[rabbit@rabbit2]}]},
{running_nodes,[rabbit@rabbit2,rabbit@rabbit1]}]
...done.
rabbit2$ rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit2 ...
[{nodes,[{disc,[rabbit@rabbit1]},{ram,[rabbit@rabbit2]}]},
{running_nodes,[rabbit@rabbit1,rabbit@rabbit2]}]
...done.
```

Now we join rabbit@rabbit3 as a disk node to the same cluster. The steps are identical to the ones above, except that we omit the --ram flag in order to turn it into a disk rather than a ram node. This time we'll cluster to rabbit2 to demonstrate that the node chosen to cluster to does not matter - it is enough to provide one online node and the node will be clustered to the cluster that the specified node belongs to.

```
rabbit3$ rabbitmqctl stop_app
Stopping node rabbit@rabbit3 ...done.
rabbit3$ rabbitmqctl join_cluster rabbit@rabbit2
Clustering node rabbit@rabbit3 with rabbit@rabbit2 ...done.
rabbit3$ rabbitmqctl start_app
Starting node rabbit@rabbit3 ...done.
```

We can see that the three nodes are joined in a cluster by running the cluster_status command on any of the nodes:

```
rabbit1$ rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit1 ...
[{nodes,[{disc,[rabbit@rabbit1,rabbit@rabbit3]},{ram,[rabbit@rabbit2]}]},
{running_nodes,[rabbit@rabbit3,rabbit@rabbit2,rabbit@rabbit1]}]
...done.
rabbit2$ rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit2 ...
[{nodes,[{disc,[rabbit@rabbit1,rabbit@rabbit3]},{ram,[rabbit@rabbit2]}]},
{running_nodes,[rabbit@rabbit3,rabbit@rabbit1,rabbit@rabbit2]}]
...done.
rabbit3$ rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit3 ...
[{nodes,[{disc,[rabbit@rabbit3,rabbit@rabbit1]},{ram,[rabbit@rabbit2]}]},
{running_nodes,[rabbit@rabbit2,rabbit@rabbit1,rabbit@rabbit3]}]
...done.
```

By following the above steps we can add new nodes to the cluster at any time, while the cluster is running.

### Changing node types

We can change the type of a node from ram to disk and vice versa. Say we wanted to reverse the types of rabbit@rabbit2 and rabbit@rabbit3, turning the former from a ram node into a disk node and the latter from a disk node into a ram node. To do that we can use the change_cluster_node_type command. The node must be stopped first.

```
rabbit2$ rabbitmqctl stop_app
Stopping node rabbit@rabbit2 ...done.
rabbit2$ rabbitmqctl change_cluster_node_type disc
Turning rabbit@rabbit2 into a disc node ...
...done.
rabbit2$ rabbitmqctl start_app
Starting node rabbit@rabbit2 ...done.

rabbit3$ rabbitmqctl stop_app
Stopping node rabbit@rabbit3 ...done.
rabbit3$ rabbitmqctl change_cluster_node_type ram
Turning rabbit@rabbit3 into a ram node ...
rabbit3$ rabbitmqctl start_app
Starting node rabbit@rabbit3 ...done.
```

### Restarting cluster nodes

Nodes that have been joined to a cluster can be stopped at any time. It is also ok for them to crash. In both cases the rest of the cluster continues operating unaffected, and the nodes automatically "catch up" with the other cluster nodes when they start up again.

We shut down the nodes rabbit@rabbit1 and rabbit@rabbit3 and check on the cluster status at each step:

```
rabbit1$ rabbitmqctl stop
Stopping and halting node rabbit@rabbit1 ...done.
```

```
rabbit2$ rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit2 ...
[{nodes,[{disc,[rabbit@rabbit1,rabbit@rabbit2]},{ram,[rabbit@rabbit3]}]},
{running_nodes,[rabbit@rabbit3,rabbit@rabbit2]}]
...done.
rabbit3$ rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit3 ...
[{nodes,[{disc,[rabbit@rabbit2,rabbit@rabbit1]},{ram,[rabbit@rabbit3]}]},
{running_nodes,[rabbit@rabbit2,rabbit@rabbit3]}]
...done.

rabbit3$ rabbitmqctl stop
Stopping and halting node rabbit@rabbit3 ...done.
rabbit2$ rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit2 ...
[{nodes,[{disc,[rabbit@rabbit1,rabbit@rabbit2]},{ram,[rabbit@rabbit3]}]},
{running_nodes,[rabbit@rabbit2]}]
...done.
```

Now we start the nodes again, checking on the cluster status as we go along:

```
rabbit1$ rabbitmq-server -detached
rabbit1$ rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit1 ...
[{nodes,[{disc,[rabbit@rabbit1,rabbit@rabbit2]},{ram,[rabbit@rabbit3]}]},
{running_nodes,[rabbit@rabbit2,rabbit@rabbit1]}]
...done.
rabbit2$ rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit2 ...
[{nodes,[{disc,[rabbit@rabbit1,rabbit@rabbit2]},{ram,[rabbit@rabbit3]}]},
{running_nodes,[rabbit@rabbit1,rabbit@rabbit2]}]
...done.

rabbit3$ rabbitmq-server -detached
rabbit1$ rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit1 ...
[{nodes,[{disc,[rabbit@rabbit1,rabbit@rabbit2]},{ram,[rabbit@rabbit3]}]},
{running_nodes,[rabbit@rabbit2,rabbit@rabbit1,rabbit@rabbit3]}]
...done.
rabbit2$ rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit2 ...
[{nodes,[{disc,[rabbit@rabbit1,rabbit@rabbit2]},{ram,[rabbit@rabbit3]}]},
{running_nodes,[rabbit@rabbit1,rabbit@rabbit2,rabbit@rabbit3]}]
...done.
rabbit3$ rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit3 ...
[{nodes,[{disc,[rabbit@rabbit2,rabbit@rabbit1]},{ram,[rabbit@rabbit3]}]},
{running_nodes,[rabbit@rabbit2,rabbit@rabbit1,rabbit@rabbit3]}]
...done.
```

There are some important caveats:

- At least one disk node should be running at all times to prevent data loss. RabbitMQ will prevent the creation of a RAM-only cluster in many situations, but it still won't stop you from stopping and forcefully resetting all the disc nodes, which will lead to a RAM-only cluster. Doing this is not advisable and makes losing data very easy.
- When the entire cluster is brought down, the last node to go down must be the first node to be brought online. If this doesn't happen, the nodes will wait 30 seconds for the last disc node to come back online, and fail afterwards. If the last node to go offline cannot be brought back up, it can be removed from the cluster using the forget_cluster_node command - consult the rabbitmqctl manpage for more information.

### Breaking up a cluster

Nodes need to be removed explicitly from a cluster when they are no longer meant to be part of it. We first remove rabbit@rabbit3 from the cluster, returning it to independent operation. To do that, on rabbit@rabbit3 we stop the RabbitMQ application, reset the node, and restart the RabbitMQ application.

```
rabbit3$ rabbitmqctl stop_app
Stopping node rabbit@rabbit3 ...done.
rabbit3$ rabbitmqctl reset
Resetting node rabbit@rabbit3 ...done.
rabbit3$ rabbitmqctl start_app
Starting node rabbit@rabbit3 ...done.
```


Running the cluster_status command on the nodes confirms that rabbit@rabbit3 now is no longer part of the cluster and operates independently:

```
rabbit1$ rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit1 ...
[{nodes,[{disc,[rabbit@rabbit1,rabbit@rabbit2]}]},
{running_nodes,[rabbit@rabbit2,rabbit@rabbit1]}]
...done.
rabbit2$ rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit2 ...
[{nodes,[{disc,[rabbit@rabbit1,rabbit@rabbit2]}]},
{running_nodes,[rabbit@rabbit1,rabbit@rabbit2]}]
...done.
rabbit3$ rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit3 ...
[{nodes,[{disc,[rabbit@rabbit3]}]},{running_nodes,[rabbit@rabbit3]}]
...done.
```

We can also remove nodes remotely. This is useful, for example, when having to deal with an unresponsive node. We can, for example, remove rabbit@rabbit1 from rabbit@rabbit2.

```
rabbit1$ rabbitmqctl stop_app
Stopping node rabbit@rabbit1 ...done.

rabbit2$ rabbitmqctl forget_cluster_node rabbit@rabbit1
Removing node rabbit@rabbit1 from cluster ...
...done.
```

Note that rabbit1 still thinks it's clustered with rabbit2, and trying to start it will result in an error. We will need to reset it to be able to start it again.

```
rabbit1$ rabbitmqctl start_app
Starting node rabbit@rabbit1 ...
Error: inconsistent_cluster: Node rabbit@rabbit1 thinks it's clustered with node rabbit@rabbit2, but rabbit@rabbit2 disagrees

rabbit1$ rabbitmqctl reset
Resetting node rabbit@rabbit1 ...done.
rabbit1$ rabbitmqctl start_app
Starting node rabbit@rabbit1 ...
...done.
```

The cluster_status command now shows all three nodes operating as independent RabbitMQ brokers:

```
rabbit1$ rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit1 ...
[{nodes,[{disc,[rabbit@rabbit1]}]},{running_nodes,[rabbit@rabbit1]}]
...done.
rabbit2$ rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit2 ...
[{nodes,[{disc,[rabbit@rabbit2]}]},{running_nodes,[rabbit@rabbit2]}]
...done.
rabbit3$ rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit3 ...
[{nodes,[{disc,[rabbit@rabbit3]}]},{running_nodes,[rabbit@rabbit3]}]
...done.
```

Note that rabbit@rabbit2 retains the residual state of the cluster, whereas rabbit@rabbit1 and rabbit@rabbit3 are freshly initialised RabbitMQ brokers. If we want to re-initialise rabbit@rabbit2 we follow the same steps as for the other nodes:

```
rabbit2$ rabbitmqctl stop_app
Stopping node rabbit@rabbit2 ...done.
rabbit2$ rabbitmqctl reset
Resetting node rabbit@rabbit2 ...done.
rabbit2$ rabbitmqctl start_app
Starting node rabbit@rabbit2 ...done.
```

### Auto-configuration of a cluster

Instead of configuring clusters "on the fly" using the cluster command, clusters can also be set up via the RabbitMQ configuration file. The file should set the cluster_nodes field in the rabbit application to a tuple containing a list of rabbit nodes, and an atom - either disc or ram - indicating whether the node should join them as a disc node or not.

If cluster_nodes is specified, RabbitMQ will try to cluster to each node provided, and stop after it can cluster with one of them. RabbitMQ will try to cluster with any online node that runs the same versions of Erlang and RabbitMQ. If no suitable nodes are found, the node is left unclustered.
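The "try each listed node in order, stop at the first success" behaviour described above can be sketched abstractly. This is only an illustration of the selection logic, not RabbitMQ's implementation; `try_join` is a placeholder for the actual clustering attempt (which in reality also checks Erlang/RabbitMQ version compatibility):

```python
def first_successful_join(candidates, try_join):
    """Attempt to cluster with each candidate in order; return the first
    node that accepts, or None when no suitable node is found, in which
    case the node stays unclustered."""
    for node in candidates:
        if try_join(node):
            return node
    return None

# Hypothetical usage: only rabbit@rabbit2 is online and compatible.
online = {"rabbit@rabbit2"}
joined = first_successful_join(
    ["rabbit@rabbit1", "rabbit@rabbit2", "rabbit@rabbit3"],
    try_join=lambda n: n in online,
)
# joined == "rabbit@rabbit2"
```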

Note that the cluster configuration is applied only to fresh nodes. A fresh node is one that has just been reset or is being started for the first time. Thus, the automatic clustering won't take place after restarts of nodes. This means that any change to the clustering via rabbitmqctl will take precedence over the automatic clustering configuration.

A common use of cluster configuration via the RabbitMQ config file is to automatically configure nodes to join a common cluster. For this purpose the same cluster nodes can be specified on all cluster nodes, plus the disc or ram atom to determine the node type.

Say we want to join our three separate nodes of our running example back into a single cluster, with rabbit@rabbit1 and rabbit@rabbit2 being the disk nodes of the cluster. First we reset and stop all nodes, to make sure that we're working with fresh nodes:

```
rabbit1$ rabbitmqctl stop_app
Stopping node rabbit@rabbit1 ...done.
rabbit1$ rabbitmqctl reset
Resetting node rabbit@rabbit1 ...done.
rabbit1$ rabbitmqctl stop
Stopping and halting node rabbit@rabbit1 ...done.

rabbit2$ rabbitmqctl stop_app
Stopping node rabbit@rabbit2 ...done.
rabbit2$ rabbitmqctl reset
Resetting node rabbit@rabbit2 ...done.
rabbit2$ rabbitmqctl stop
Stopping and halting node rabbit@rabbit2 ...done.

rabbit3$ rabbitmqctl stop_app
Stopping node rabbit@rabbit3 ...done.
rabbit3$ rabbitmqctl reset
Resetting node rabbit@rabbit3 ...done.
rabbit3$ rabbitmqctl stop
Stopping and halting node rabbit@rabbit3 ...done.
```

Now we set the relevant field in the config file:

```
[
  ...
  {rabbit, [
    ...
    {cluster_nodes, {['rabbit@rabbit1', 'rabbit@rabbit2', 'rabbit@rabbit3'], disc}},
    ...
  ]},
  ...
].
```

For instance, if this were the only field we needed to set, we would simply create the RabbitMQ config file with the contents:

```
[{rabbit, [{cluster_nodes, {['rabbit@rabbit1', 'rabbit@rabbit2', 'rabbit@rabbit3'], disc}}]}].
```

Since we want rabbit@rabbit3 to be a ram node, we need to specify that in its configuration file:

```
[{rabbit, [{cluster_nodes, {['rabbit@rabbit1', 'rabbit@rabbit2', 'rabbit@rabbit3'], ram}}]}].
```

(Note for Erlang programmers and the curious: this is a standard Erlang configuration file. For more details, see the configuration guide and the Erlang Config Man Page.)

Once we have the configuration files in place, we simply start the nodes:

```
rabbit1$ rabbitmq-server -detached
rabbit2$ rabbitmq-server -detached
rabbit3$ rabbitmq-server -detached
```

We can see that the three nodes are joined in a cluster by running the cluster_status command on any of the nodes:

```
rabbit1$ rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit1 ...
[{nodes,[{disc,[rabbit@rabbit1,rabbit@rabbit2]},{ram,[rabbit@rabbit3]}]},
{running_nodes,[rabbit@rabbit1,rabbit@rabbit2,rabbit@rabbit3]}]
...done.
rabbit2$ rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit2 ...
[{nodes,[{disc,[rabbit@rabbit1,rabbit@rabbit2]},{ram,[rabbit@rabbit3]}]},
{running_nodes,[rabbit@rabbit1,rabbit@rabbit2,rabbit@rabbit3]}]
...done.
rabbit3$ rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit3 ...
[{nodes,[{disc,[rabbit@rabbit1,rabbit@rabbit2]},{ram,[rabbit@rabbit3]}]},
{running_nodes,[rabbit@rabbit1,rabbit@rabbit2,rabbit@rabbit3]}]
...done.
```

Note that, in order to remove a node from an auto-configured cluster, it must first be removed from the rabbitmq.config files of the other nodes in the cluster. Only then can it be reset safely.

### Upgrading clusters

When upgrading from one version of RabbitMQ to another, RabbitMQ will automatically update its persistent data structures if necessary. In a cluster, this task is performed by the first disc node to be started (the "upgrader" node). Therefore when upgrading a RabbitMQ cluster, you should not attempt to start any RAM nodes first; any RAM nodes started will emit an error message and fail to start up.

All nodes in a cluster must be running the same versions of Erlang and RabbitMQ, although they may have different plugins installed. Therefore it is necessary to stop all nodes in the cluster, then start all nodes when performing an upgrade.

While not strictly necessary, it is a good idea to decide ahead of time which disc node will be the upgrader, stop that node last, and start it first. Otherwise changes to the cluster configuration that were made between the upgrader node stopping and the last node stopping will be lost.

Automatic upgrades are only possible from RabbitMQ versions 2.1.1 and later. If you have an earlier cluster, you will need to rebuild it to upgrade.

### A cluster on a single machine

Under some circumstances it can be useful to run a cluster of RabbitMQ nodes on a single machine. This would typically be useful for experimenting with clustering on a desktop or laptop without the overhead of starting several virtual machines for the cluster. The two main requirements for running more than one node on a single machine are that each node should have a unique name and bind to a unique port / IP address combination for each protocol in use.

You can start multiple nodes on the same host manually by repeated invocation of rabbitmq-server (rabbitmq-server.bat on Windows). You must ensure that for each invocation you set the environment variables RABBITMQ_NODENAME and RABBITMQ_NODE_PORT to suitable values. For example:

```
$ RABBITMQ_NODE_PORT=5672 RABBITMQ_NODENAME=rabbit rabbitmq-server -detached
$ RABBITMQ_NODE_PORT=5673 RABBITMQ_NODENAME=hare rabbitmq-server -detached
$ rabbitmqctl -n hare stop_app
$ rabbitmqctl -n hare reset
$ rabbitmqctl -n hare cluster rabbit@`hostname -s`
$ rabbitmqctl -n hare start_app
```

will set up a two node cluster with one disc node and one ram node. Note that if you have RabbitMQ opening any ports other than AMQP, you'll need to configure those not to clash as well - for example:

```
$ RABBITMQ_NODE_PORT=5672 RABBITMQ_SERVER_START_ARGS="-rabbitmq_management listener [{port,15672}]" RABBITMQ_NODENAME=rabbit rabbitmq-server -detached
$ RABBITMQ_NODE_PORT=5673 RABBITMQ_SERVER_START_ARGS="-rabbitmq_management listener [{port,15673}]" RABBITMQ_NODENAME=hare rabbitmq-server -detached
```

will start two nodes (which can then be clustered) when the management plugin is installed.

### Firewalled nodes

The case for firewalled clustered nodes exists when nodes are in a data center or on a reliable network, but separated by firewalls. Again, clustering is not recommended over a WAN or when network links between nodes are unreliable.

If different nodes of a cluster are in the same data center, but behind firewalls then additional configuration will be necessary to ensure inter-node communication. Erlang makes use of a Port Mapper Daemon (epmd) for resolution of node names in a cluster. Nodes must be able to reach each other and the port mapper daemon for clustering to work.

The default epmd port is 4369, but this can be changed using the ERL_EPMD_PORT environment variable. All nodes must use the same port. Firewalls must permit traffic on this port to pass between clustered nodes. For further details see the Erlang epmd manpage.


Once a distributed Erlang node address has been resolved via epmd, other nodes will attempt to communicate directly with that address using the Erlang distributed node protocol. The port range for this communication can be configured with two parameters for the Erlang kernel application:

- `inet_dist_listen_min`
- `inet_dist_listen_max`

Firewalls must permit traffic in this range to pass between clustered nodes (assuming all nodes use the same port range). The default port range is unrestricted.

The Erlang kernel_app manpage contains more details on the port range that distributed Erlang nodes listen on. See the configuration page for information on how to create and edit a configuration file.


### Connecting to Clusters from Clients

A client can connect as normal to any node within a cluster. If that node should fail, and the rest of the cluster survives, then the client should notice the closed connection, and should be able to reconnect to some surviving member of the cluster. Generally, it's not advisable to bake in node hostnames or IP addresses into client applications: this introduces inflexibility and will require client applications to be edited, recompiled and redeployed should the configuration of the cluster change or the number of nodes in the cluster change. Instead, we recommend a more abstracted approach: this could be a dynamic DNS service which has a very short TTL configuration, or a plain TCP load balancer, or some sort of mobile IP achieved with pacemaker or similar technologies. In general, this aspect of managing the connection to nodes within a cluster is beyond the scope of RabbitMQ itself, and we recommend the use of other technologies designed specifically to solve these problems.
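The recommended client-side behaviour - notice the closed connection and reconnect to some surviving member - can be sketched as follows. This is an illustrative pattern, not part of any RabbitMQ client library; `open_connection` is a placeholder for your client's actual connect call, and in a real deployment the host list would come from DNS or a load balancer rather than being hard-coded, as advised above:

```python
import itertools

def connect_to_cluster(hosts, open_connection, attempts_per_host=1):
    """Cycle through the known cluster hosts until one accepts a connection.

    Raises RuntimeError when every host has been tried
    `attempts_per_host` times without success.
    """
    total = len(hosts) * attempts_per_host
    for host in itertools.islice(itertools.cycle(hosts), total):
        try:
            return open_connection(host)
        except ConnectionError:
            continue  # this node is down; try the next cluster member
    raise RuntimeError("no cluster node reachable")

# Hypothetical usage with a stubbed connect call: rabbit1 is down,
# rabbit2 and rabbit3 survive.
def fake_open(host):
    if host == "rabbit1":
        raise ConnectionError(host)
    return f"connected:{host}"

conn = connect_to_cluster(["rabbit1", "rabbit2", "rabbit3"], fake_open)
# conn == "connected:rabbit2"
```

The same skeleton works with a real client library by passing a function that opens an AMQP connection to the given host; the key design point is that the retry loop, not the application code, owns the list of candidate endpoints.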
