[Original] Translation of the official RabbitMQ documentation -- Highly Available Queues
摩云飞, posted 5 years ago

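To make things easier at work, I translated the RabbitMQ documentation on high availability. My abilities are limited, so the translation will inevitably contain some mistakes; if anything is unclear, feel free to point it out and discuss. The post was written as a parallel Chinese/English text.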

Official original: http://www.rabbitmq.com/ha.html


Highly Available Queues

If your RabbitMQ broker consists of a single node, then a failure of that node will cause downtime, temporary unavailability of service, and potentially loss of messages (especially non-persistent messages held by non-durable queues). You could publish all messages as persistent, to durable queues, but even then, due to buffering there is an amount of time between the message being sent and the message being written to disk and fsync'd. Using publisher confirms is one means to ensure the client understands which messages have been written to disk, but even so, you may not wish to suffer the downtime and inconvenience of the unavailability of service caused by a node failure, or the performance degradation of having to write every message to disk.

You could use a cluster of RabbitMQ nodes to construct your RabbitMQ broker. This will be resilient to the loss of individual nodes in terms of the overall availability of service, but some important caveats apply: whilst exchanges and bindings survive the loss of individual nodes, queues and their messages do not. This is because a queue and its contents reside on exactly one node, thus the loss of a node will render its queues unavailable.

You could use an active/passive pair of nodes such that should one node fail, the passive node will be able to come up and take over from the failed node. This can even be combined with clustering. Whilst this approach ensures that failures are quickly detected and recovered from, there can be reasons why the passive node can take a long time to start up, or potentially even fail to start. This can cause, at best, temporary unavailability of queues which were located on the failed node.

To solve these various problems, we have developed active/active high availability for queues. This works by allowing queues to be mirrored on other nodes within a RabbitMQ cluster. The result is that should one node of a cluster fail, the queue can automatically switch to one of the mirrors and continue to operate, with no unavailability of service. This solution still requires a RabbitMQ cluster, which means that it will not cope seamlessly with network partitions within the cluster and, for that reason, is not recommended for use across a WAN (though of course, clients can still connect from as near and as far as needed).

Mirrored Queue Behaviour

In normal operation, for each mirrored-queue, there is one master and several slaves, each on a different node. The slaves apply the operations that occur to the master in exactly the same order as the master and thus maintain the same state. All actions other than publishes go only to the master, and the master then broadcasts the effect of the actions to the slaves. Thus clients consuming from a mirrored queue are in fact consuming from the master.
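
The ordering guarantee can be pictured with a toy model. The sketch below is not RabbitMQ code: `Mirror` and `MirroredQueue` are hypothetical names, and it simply assumes that every operation is applied to the master first and then to each slave in the same order, which is why all copies end up in the same state.

```python
from collections import deque


class Mirror:
    """One copy of the queue contents (the master or a slave)."""

    def __init__(self, name):
        self.name = name
        self.messages = deque()

    def apply(self, op, payload=None):
        # Every mirror applies the same operations in the same order.
        if op == "publish":
            self.messages.append(payload)
        elif op == "consume":
            self.messages.popleft()


class MirroredQueue:
    """Toy model: each action is applied to the master, then to the slaves."""

    def __init__(self, master, slaves):
        self.master = Mirror(master)
        self.slaves = [Mirror(s) for s in slaves]

    def _broadcast(self, op, payload=None):
        self.master.apply(op, payload)
        for slave in self.slaves:
            slave.apply(op, payload)

    def publish(self, payload):
        # Publishes reach the master and all slaves.
        self._broadcast("publish", payload)

    def consume(self):
        # Consumers are served from the master's copy; the effect of the
        # consume is then broadcast so the slaves drop the message too.
        payload = self.master.messages[0]
        self._broadcast("consume")
        return payload
```

After two publishes and one consume, the master and every slave in this model hold only the second message.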

Should a slave fail, there is little to be done other than some bookkeeping: the master remains the master and no client need take any action or be informed of the failure. Note that slave failures may not be detected immediately and the interruption of the per-connection flow control mechanism can delay message publication. The details are described in the Flow Control section below.

If the master fails, then one of the slaves must be promoted. At this point, the following happens:

A slave is promoted to become the new master. The slave chosen for promotion is the eldest slave. As such, it has the best chance of being synchronised with the master. However, note that should there be no slave that is synchronised with the master, messages that only the master held will be lost. (Translator's aside: seniority counts even among slaves.)

The slave considers all previous consumers to have been abruptly disconnected. As such, it requeues all messages that have been delivered to clients but are pending acknowledgement. This can include messages for which a client has issued acknowledgements: either the acknowledgement was lost on the wire before reaching the master, or it was lost during broadcast from the master to the slaves. In either case, the new master has no choice but to requeue all messages it thinks have not been acknowledged.

Clients that were consuming from the mirrored-queue and support our Consumer Cancellation Notifications extension will receive a notification that their subscription to the mirrored-queue has been abruptly cancelled. At this point they should re-consume from the queue, which will pick up the new master. The reason for sending this notification is that informing clients of the loss of the master is essential: otherwise the client may continue to issue acknowledgements for messages they were sent by the old, failed master, and not expect that they might be about to see the same messages again, this time sent by the new master. Of course, clients that were connected to the failed node will find their connections failed, and will need to reconnect to a surviving node of the cluster.

As a result of the requeuing, clients that re-consume from the queue must be aware that they are likely to subsequently receive messages that they have seen previously.
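
The promotion and requeue steps above can be sketched in a few lines. This is only an illustration under stated assumptions: the dictionary keys (`joined_at`, `ready`, `unacked`), the choice to put requeued messages back at the front, and the `DedupConsumer` class are all hypothetical and not part of RabbitMQ or any client library. The second half shows one way a consumer might tolerate seeing the same message twice.

```python
def promote_eldest(slaves):
    """Pick the eldest slave (earliest join) as the new master.

    slaves: list of dicts with hypothetical keys
      name, joined_at, ready (queued), unacked (delivered, not yet acked).
    """
    if not slaves:
        raise RuntimeError("no slave left to promote")
    new_master = min(slaves, key=lambda s: s["joined_at"])
    # All previous consumers are treated as disconnected, so every message
    # still awaiting acknowledgement is requeued (assumed: at the front).
    new_master["ready"] = new_master["unacked"] + new_master["ready"]
    new_master["unacked"] = []
    return new_master


class DedupConsumer:
    """Consumer that tolerates redelivery by remembering message ids."""

    def __init__(self):
        self.seen = set()
        self.processed = []

    def handle(self, message_id, body):
        if message_id in self.seen:
            return False  # duplicate after failover: skip the side effect
        self.seen.add(message_id)
        self.processed.append(body)
        return True
```

A real consumer would need a message identifier supplied by the publisher and a bounded or persistent store for the ids it has seen; both are left out here.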

As the chosen slave becomes the master, no messages that are published to the mirrored-queue during this time will be lost: messages published to a mirrored-queue are always published directly to the master and all slaves. Thus should the master fail, the messages continue to be sent to the slaves and will be added to the queue once the promotion of a slave to the master completes.

Similarly, messages published by clients using publisher confirms will still be confirmed correctly even if the master (or any slaves) fail between the message being published and the message being able to be confirmed to the publisher. Thus from the point of view of the publisher, publishing to a mirrored-queue is no different from publishing to any other sort of queue. It is only consumers that need to be aware of the possibility of needing to re-consume from a mirrored-queue upon receipt of a Consumer Cancellation Notification.

If you are consuming from a mirrored-queue with noAck=true (i.e. the client is not sending message acknowledgements) then messages can be lost. This is no different from the norm of course: the broker considers a message acknowledged as soon as it has been sent to a noAck=true consumer, and should the client disconnect abruptly, the message may never be received. In the case of a mirrored-queue, should the master die, messages that are in-flight on their way to noAck=true consumers may never be received by those clients, and will not be requeued by the new master. Because of the possibility that the consuming client is connected to a node that survives, the Consumer Cancellation Notification is useful in identifying when such events may have occurred. Of course, in practice, if you care about not losing messages then you are advised to consume with noAck=false.

Publisher Confirms and Transactions

Mirrored queues support both Publisher Confirms and Transactions. The semantics chosen are that in the case of both confirms and transactions, the action spans all mirrors of the queue. So in the case of a transaction, a tx.commit-ok will only be returned to a client when the transaction has been applied across all mirrors of the queue. Equally, in the case of publisher confirms, a message will only be confirmed to the publisher when it has been accepted by all of the mirrors. It is correct to think of the semantics as being the same as a message being routed to multiple normal queues, and as a transaction whose publications are similarly routed to multiple queues.
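
The "all mirrors" rule is easy to state as a predicate. The function below is a minimal sketch with hypothetical names (`confirmed`, `accepted_by`); it only models the condition described above and says nothing about how the broker tracks acceptance internally.

```python
def confirmed(message_id, accepted_by, mirrors):
    """Return True once every mirror of the queue has accepted the message.

    accepted_by: dict mapping mirror name -> set of accepted message ids.
    mirrors: names of all current mirrors (master and slaves).
    """
    return all(message_id in accepted_by.get(m, set()) for m in mirrors)
```

With mirrors A, B and C, a message accepted only by A and B is not yet confirmed; it becomes confirmed when C accepts it as well.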

Flow Control

RabbitMQ uses a credit-based algorithm to limit the rate of message publication. Publishers are permitted to publish when they receive credit from all mirrors of a queue. Credit in this context means permission to publish. Slaves that fail to issue credit can cause publishers to stall. Publishers will remain stalled until all slaves issue credit or until the remaining nodes consider the slave to be disconnected from the cluster. Erlang detects such disconnections by periodically sending a tick to all nodes. The tick interval can be controlled with the net_ticktime configuration setting.
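
To make the stalling behaviour concrete, here is a toy credit model. It is a sketch only: the class name `CreditPublisher`, the per-mirror counters and the initial credit value are assumptions chosen for illustration, not RabbitMQ's actual data structures.

```python
class CreditPublisher:
    """Toy credit-based flow control: one credit counter per mirror."""

    def __init__(self, mirrors, initial_credit=2):
        self.credit = {m: initial_credit for m in mirrors}

    def can_publish(self):
        # Publishing needs credit from every mirror of the queue.
        return all(c > 0 for c in self.credit.values())

    def publish(self):
        if not self.can_publish():
            return False  # stalled until the missing credit arrives
        for m in self.credit:
            self.credit[m] -= 1
        return True

    def grant(self, mirror, amount=1):
        # A mirror issues more credit to the publisher.
        self.credit[mirror] += amount

    def drop_mirror(self, mirror):
        # The remaining nodes consider this slave disconnected,
        # so its credit is no longer required.
        del self.credit[mirror]
```

In this model a publisher blocked by a silent slave resumes either when that slave grants credit or when it is dropped from the mirror set, which mirrors the two exit conditions described above.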

Unsynchronised Slaves

A node may join a cluster at any time. Depending on the configuration of a queue, when a node joins a cluster, queues may add a slave on the new node. At this point, the new slave will be empty: it will not contain any existing contents of the queue, and by default the existing contents are not copied to it (explicit synchronisation is described below). Such a slave will receive new messages published to the queue, and thus over time will accurately represent the tail of the mirrored-queue. As messages are drained from the mirrored-queue, the size of the head of the queue for which the new slave is missing messages will shrink until eventually the slave's contents precisely match the master's contents. At this point, the slave can be considered fully synchronised, but it is important to note that this has occurred because of actions of clients in terms of draining the pre-existing head of the queue.
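
The way a new slave catches up "naturally" can be simulated in a few lines. This is a sketch under simple assumptions: messages are unique values, the slave joins empty, and `simulate` is a hypothetical helper written for this post.

```python
from collections import deque


def simulate(existing, new_publishes, consumes):
    """Return (master, slave, synchronised) after a slave joins empty.

    existing: messages already on the master when the slave joins.
    new_publishes: messages published afterwards (reach master and slave).
    consumes: number of messages consumed from the head of the queue.
    Assumes message values are unique.
    """
    master = deque(existing)
    slave = deque()  # a new slave starts empty
    for msg in new_publishes:
        master.append(msg)
        slave.append(msg)
    for _ in range(consumes):
        if not master:
            break
        head = master.popleft()
        # The slave only drops the message if it actually holds it.
        if slave and slave[0] == head:
            slave.popleft()
    return list(master), list(slave), list(master) == list(slave)
```

With three pre-existing messages and two new ones, the slave holds only the tail until the three old messages have been consumed; from that point on its contents match the master's.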

Thus a newly added slave provides no additional form of redundancy or availability of the queue's contents until the contents of the queue that existed before the slave was added have been removed, or the queue has been explicitly synchronised (see below). As a result of this, it is preferable to bring up all nodes on which slaves will exist prior to creating mirrored queues, or even better to ensure that your use of messaging generally results in very short or empty queues that rapidly drain. Since the queue becomes unresponsive while synchronisation is occurring, it is preferable to allow active queues from which messages are being drained to synchronise naturally, and only explicitly synchronise inactive queues.

You can determine which slaves have synchronised with the following rabbitmqctl invocation, or through the management plugin:

rabbitmqctl list_queues name slave_pids synchronised_slave_pids

Explicit synchronisation can be triggered in two ways: manually or automatically. If a queue is set to automatically synchronise it will synchronise whenever a new slave joins, becoming unresponsive until it has done so.

Starting and Stopping Nodes

If you stop a RabbitMQ node which contains the master of a mirrored-queue, some slave on some other node will be promoted to the master (assuming there is one). If you continue to stop nodes then you will reach a point where a mirrored-queue has no more slaves: it exists only on one node, which is now its master. If the mirrored-queue was declared durable then, if its last remaining node is shut down, durable messages in the queue will survive the restart of that node. In general, as you restart other nodes, if they were previously part of a mirrored-queue then they will rejoin the mirrored queue.

However, there is currently no way for a slave to know whether or not its queue contents have diverged from the master to which it is rejoining (this could happen during a network partition, for example). As such, when a slave rejoins a mirrored-queue, it throws away any durable local contents it already has and starts empty. Its behaviour is at this point the same as if it were a new node joining the cluster.

Configuring Mirroring

Queues have mirroring enabled via policy. Policies can change at any time; it is valid to create a non-mirrored queue, and then make it mirrored at some later point (and vice versa). There is a difference between a non-mirrored queue and a mirrored queue which does not have any slaves: the former lacks the extra mirroring infrastructure and will run faster.

You should be aware of the behaviour of adding mirrors to a queue (see Unsynchronised Slaves above).

To cause queues to become mirrored, you should create a policy which matches them and sets policy keys ha-mode and (optionally) ha-params. The following table explains the options for these keys:

ha-mode: all
ha-params: (absent)
Result: Queue is mirrored across all nodes in the cluster. When a new node is added to the cluster, the queue will be mirrored to that node.

ha-mode: exactly
ha-params: count
Result: Queue is mirrored to count nodes in the cluster. If there are fewer than count nodes in the cluster, the queue is mirrored to all nodes. If there are more than count nodes in the cluster, and a node containing a mirror goes down, then a new mirror will not be created on another node. (This is to prevent queues migrating across a cluster as it is brought down.)

ha-mode: nodes
ha-params: node names
Result: Queue is mirrored to the nodes listed in node names. If any of those node names are not a part of the cluster, this does not constitute an error. If none of the nodes in the list are online at the time when the queue is declared then the queue will be created on the node that the declaring client is connected to.
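
The three modes can be summarised as a small selection function. This is a sketch, not the broker's algorithm: `target_mirrors` is a hypothetical name, and for "exactly" the table does not say which nodes are picked, so the code simply assumes the master is kept and the rest are taken in cluster order.

```python
def target_mirrors(ha_mode, ha_params, cluster_nodes, master):
    """Return the nodes a queue should be mirrored to under a policy.

    cluster_nodes: list of node names currently in the cluster.
    master: node hosting the queue master (for a new queue, the node
            the declaring client is connected to).
    """
    if ha_mode == "all":
        return list(cluster_nodes)
    if ha_mode == "exactly":
        count = int(ha_params)
        # Assumption: keep the master, then fill up in cluster order.
        others = [n for n in cluster_nodes if n != master]
        return ([master] + others)[:count]
    if ha_mode == "nodes":
        # Names that are not part of the cluster are ignored, not an error.
        listed = [n for n in ha_params if n in cluster_nodes]
        # If none of the listed nodes is available, stay on the master node.
        return listed or [master]
    raise ValueError("unknown ha-mode: %r" % (ha_mode,))
```

For a three-node cluster, "exactly" with a count of 5 yields all three nodes, matching the "fewer than count nodes" row of the table.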

Whenever the HA policy for a queue changes it will endeavour to keep its existing mirrors as far as this fits with the new policy.

"nodes" policy and migrating masters

Note that setting or modifying a "nodes" policy will never cause an existing master to go away, even if you ask it to. For example, if a queue is on [A], and you give it a nodes policy telling it to be on [B C], it will end up on [A B C]. If node A then fails or is shut down, the mirror on that node will not come back and the queue will stay on [B C].

Note that setting or modifying a "nodes" policy can cause the existing master to go away if it is not listed in the new policy. In order to prevent message loss, RabbitMQ will keep the existing master around until at least one other slave has synchronised (even if this is a long time). However, once synchronisation has occurred things will proceed just as if the node had failed: consumers will be disconnected from the master and will need to reconnect.

For example, if a queue is on [A B] (with A the master), and you give it a nodes policy telling it to be on [C D], it will initially end up on [A C D]. As soon as the queue synchronises on its new mirrors [C D], the master on A will shut down.
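
The [A B] to [C D] example can be traced with two small helpers. They are hypothetical and only follow the steps of that example; in particular, which of the remaining mirrors becomes the new master is an assumption of the sketch.

```python
def apply_nodes_policy(master, wanted):
    """Mirror set right after a "nodes" policy is applied.

    The existing master is kept even if it is not listed, so that no
    messages are lost before another mirror has synchronised.
    """
    kept_master = [] if master in wanted else [master]
    return kept_master + list(wanted)


def after_sync(mirrors, master, wanted):
    """Mirror set and master once the listed mirrors have synchronised."""
    if master in wanted:
        return list(mirrors), master
    remaining = [n for n in mirrors if n != master]
    # Assumption: the first remaining mirror takes over as master.
    return remaining, remaining[0]
```

Starting from master A and a policy listing C and D, the first helper gives [A C D] and the second gives [C D].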

Synchronising Queues

Queues can be set to automatically synchronise by setting the ha-sync-mode policy key to automatic. ha-sync-mode can also be set to manual. If it is not set then manual is assumed.
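
Since ha-sync-mode is just another key in the policy definition, the JSON can be generated rather than typed by hand. The helper below is a sketch with a hypothetical name (`ha_definition`); it only uses the keys and values named in this document.

```python
import json

VALID_SYNC_MODES = ("manual", "automatic")


def ha_definition(ha_mode, ha_params=None, ha_sync_mode=None):
    """Build the JSON definition string for an HA policy."""
    definition = {"ha-mode": ha_mode}
    if ha_params is not None:
        definition["ha-params"] = ha_params
    if ha_sync_mode is not None:
        if ha_sync_mode not in VALID_SYNC_MODES:
            raise ValueError("ha-sync-mode must be manual or automatic")
        definition["ha-sync-mode"] = ha_sync_mode
    return json.dumps(definition, separators=(",", ":"))
```

The resulting string is intended to be used as the definition argument of rabbitmqctl set_policy, in the same position as the JSON in the examples further down.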

You can determine which slaves are synchronised with the following rabbitmqctl invocation:

rabbitmqctl list_queues name slave_pids synchronised_slave_pids

You can manually synchronise a queue with:

rabbitmqctl sync_queue name

And you can cancel synchronisation with:

rabbitmqctl cancel_sync_queue name

These features are also available through the management plugin.

Some examples

Policy where queues whose names begin with "ha." are mirrored to all nodes in the cluster:

rabbitmqctl --   rabbitmqctl set_policy ha-all "^ha\." '{"ha-mode":"all"}'
HTTP API    --   PUT /api/parameters/policy/%2f/ha-all {"pattern":"^ha\.", "definition":{"ha-mode":"all"}}
Web UI      --

First, navigate to Admin > Policies > Add / update a policy.
Next, enter "ha-all" in the Name field and "^ha\." in the Pattern field, and set "ha-mode" = "all" on the first line of the Policy field.
Finally, click Add policy.

Policy where queues whose names begin with "two." are mirrored to any two nodes in the cluster:

rabbitmqctl --   rabbitmqctl set_policy ha-two "^two\." '{"ha-mode":"exactly","ha-params":2}'
HTTP API    --   PUT /api/parameters/policy/%2f/ha-two {"pattern":"^two\.", "definition":{"ha-mode":"exactly", "ha-params":2}}
Web UI      --

First, navigate to Admin > Policies > Add / update a policy.
Next, enter "ha-two" in the Name field and "^two\." in the Pattern field, set "ha-mode" = "exactly" on the first line of the Policy field, then set "ha-params" = 2 on the second line and set the type of the second line to "Number".
Finally, click Add policy.
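
The patterns in these examples are regular expressions matched against queue names, so the escaped dot matters. The sketch below checks a few names with Python's re module; `matching_policy` is a hypothetical helper, and "first matching policy wins" is an assumption of the sketch rather than a statement about how the broker resolves overlapping policies.

```python
import re


def matching_policy(queue_name, policies):
    """Return the name of the first policy whose pattern matches the queue.

    policies: list of (policy_name, pattern) tuples, checked in order.
    """
    for name, pattern in policies:
        if re.search(pattern, queue_name):
            return name
    return None


# The two example policies from this section.
POLICIES = [("ha-all", r"^ha\."), ("ha-two", r"^two\.")]
```

A queue named ha.orders matches ha-all, while haha_queue does not, because the pattern requires a literal dot after "ha".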

==============================


In Release: RabbitMQ 3.0.0, the following change was made to mirrored queues:

24908 allow queue mirroring to be defined by broker-wide policy, not
      queue declaration, and add "exactly" mode

So from version 3.0.0 onwards, a mirrored queue can no longer be declared through the arguments of Queue.Declare. Mirroring is instead configured with a broker-wide policy.


==============================

Check the configuration of the queues that already exist in the cluster:

[root@Betty ~]# rabbitmqctl list_queues name durable auto_delete arguments policy pid slave_pids status
Listing queues ...
test_queue      false   false   []              <rabbit@Betty.1.724.0>          running
...done.
[root@Betty ~]#

Set a policy so that queues whose names begin with test are mirrored on all nodes in the cluster:

[root@Betty ~]# rabbitmqctl set_policy ha-all "^test" '{"ha-mode":"all"}'
Setting policy "ha-all" for pattern "^test" to "{\"ha-mode\":\"all\"}" ...
...done.
[root@Betty ~]#

Check the queue configuration again after the policy has been set; the mirrored queue has now been created:

[root@Betty ~]# rabbitmqctl list_queues name durable auto_delete arguments policy pid slave_pids status
Listing queues ...
test_queue      false   false   []      ha-all  <rabbit@Betty.1.724.0>  [<rabbit2@Betty.1.891.0>, <rabbit1@Betty.1.941.0>]      running
...done.
[root@Betty ~]#

Check the queue information as seen from each node in the cluster:

[root@Betty ~]# rabbitmqctl -n rabbit2 list_queues name durable auto_delete policy pid slave_pids status
Listing queues ...
haha_queue      false   false           <rabbit@Betty.1.861.0>          running
test_queue      false   false   ha-all  <rabbit@Betty.1.724.0>  [<rabbit2@Betty.1.891.0>, <rabbit1@Betty.1.941.0>]      running
...done.
[root@Betty ~]# rabbitmqctl -n rabbit1 list_queues name durable auto_delete policy pid slave_pids status
Listing queues ...
haha_queue      false   false           <rabbit@Betty.1.861.0>          running
test_queue      false   false   ha-all  <rabbit@Betty.1.724.0>  [<rabbit2@Betty.1.891.0>, <rabbit1@Betty.1.941.0>]      running
...done.
[root@Betty ~]# rabbitmqctl -n rabbit list_queues name durable auto_delete policy pid slave_pids status                      
Listing queues ...
haha_queue      false   false           <rabbit@Betty.1.861.0>          running
test_queue      false   false   ha-all  <rabbit@Betty.1.724.0>  [<rabbit2@Betty.1.891.0>, <rabbit1@Betty.1.941.0>]      running
...done.
[root@Betty ~]#


 
