
An HA Test

 start0cheng
Published 2015/02/09 23:11

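After I finished the configuration, pulling the heartbeat NIC did not make the application fail over, and for a while I thought my configuration was wrong. However, once I switched vnet3 to bridge directly to the physical NIC, the problem went away. This is most likely because packets sent between the two nodes over vnet3 were not getting through properly.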


  
  Prerequisites:
  1. Environment setup
  2. Hostnames, yum, ssh

  1. Install heartbeat.
  # yum install -y 'heartbeat*'     # run this twice; otherwise some packages may not get installed

  # rpm -qa | grep heartbeat
  heartbeat-gui-2.1.3-3.el5.centos
  heartbeat-2.1.3-3.el5.centos
  heartbeat-stonith-2.1.3-3.el5.centos
  heartbeat-devel-2.1.3-3.el5.centos
  heartbeat-ldirectord-2.1.3-3.el5.centos
  heartbeat-pils-2.1.3-3.el5.centos

  Copy the sample configuration files:
  # cp /usr/share/doc/heartbeat-2.1.3/ha.cf /etc/ha.d/          # ha.cf: main HA configuration file
  # cp /usr/share/doc/heartbeat-2.1.3/haresources /etc/ha.d/    # haresources: resource file
  # cp /usr/share/doc/heartbeat-2.1.3/authkeys /etc/ha.d/       # authkeys: authentication file shared by the HA nodes

  # yum install -y httpd

  # vim /etc/ha.d/ha.cf
  debugfile /var/log/ha-debug
  logfile /var/log/ha-log
  logfacility     local0
  keepalive 2
  deadtime 30
  warntime 10
  initdead 120
  udpport 694
  ucast eth1 1.1.1.2         # heartbeat link
  auto_failback on
  node    ha1
  node    ha2
  ping 172.16.1.1  172.16.1.11 # gateway and the other node's IP
  respawn hacluster /usr/lib/heartbeat/ipfail
  deadping 30
  apiauth ipfail uid=hacluster
  use_logd yes
  conn_logd_time 60

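In the ha.cf above, keepalive is the interval between heartbeats, warntime is how long to wait before logging a "late heartbeat" warning, deadtime is how long to wait before declaring the peer dead, and initdead is the longer grace period used right after startup. The following is a minimal Python sketch that parses an ha.cf-style file and flags timers that are inconsistent with each other. The function names are hypothetical. Values are assumed to be plain integer seconds. The "initdead at least twice deadtime" rule follows the comment in Heartbeat's sample ha.cf.

```python
from typing import Dict, List

# Hypothetical helpers. Timer values are assumed to be plain integer
# seconds (no "ms" suffix handling).
TIMING_KEYS = ("keepalive", "warntime", "deadtime", "initdead")


def parse_ha_cf(text: str) -> Dict[str, List[str]]:
    """Map each directive to the list of its values (directives such as node repeat)."""
    directives: Dict[str, List[str]] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding blanks
        if not line:
            continue
        parts = line.split(None, 1)
        key = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        directives.setdefault(key, []).append(value)
    return directives


def check_timing(directives: Dict[str, List[str]]) -> List[str]:
    """Return human-readable problems with the heartbeat timers (empty list means OK)."""
    t = {k: int(directives[k][-1]) for k in TIMING_KEYS if k in directives}
    problems: List[str] = []
    if "keepalive" in t and "deadtime" in t and t["deadtime"] <= t["keepalive"]:
        problems.append("deadtime must be larger than keepalive")
    if all(k in t for k in ("keepalive", "warntime", "deadtime")) and not (
        t["keepalive"] < t["warntime"] < t["deadtime"]
    ):
        problems.append("warntime should lie between keepalive and deadtime")
    if "deadtime" in t and "initdead" in t and t["initdead"] < 2 * t["deadtime"]:
        problems.append("initdead should be at least twice deadtime")
    return problems


SAMPLE = """\
keepalive 2
deadtime 30
warntime 10
initdead 120
ucast eth1 1.1.1.2         # heartbeat link
node    ha1
node    ha2
ping 172.16.1.1  172.16.1.11
"""

if __name__ == "__main__":
    cfg = parse_ha_cf(SAMPLE)
    print(cfg["node"])        # ['ha1', 'ha2']
    print(check_timing(cfg))  # []
```

With the author's values (keepalive 2, warntime 10, deadtime 30, initdead 120), the check returns an empty list.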
  # cat /etc/ha.d/authkeys         # defines the authentication keys
  auth 1
  1 crc
  ================
  heartbeat[8404]: 2011/07/26_05:02:48 ERROR: Bad permissions on keyfile [/etc/ha.d/authkeys], 600 recommended.
  heartbeat[8404]: 2011/07/26_05:02:48 ERROR: Authentication configuration error.
  heartbeat[8404]: 2011/07/26_05:02:48 ERROR: Configuration error, heartbeat not started.

  # chmod 600 /etc/ha.d/authkeys
  =================
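The errors above come from heartbeat refusing an authkeys file that other users can read. Note also that the crc method only detects corrupted packets and offers no protection against forged ones. The md5 and sha1 methods are the keyed alternatives. The following is a minimal Python sketch that creates the file with mode 600 from the start, so the chmod step cannot be forgotten. It assumes a single sha1 key is wanted, and the helper names are hypothetical.

```python
import os
import secrets
import stat
import tempfile

# Hypothetical helpers: write an authkeys file that uses method 1 = sha1
# and make sure it ends up with mode 600 (owner read/write only).


def write_authkeys(path: str, key: str) -> None:
    """Write a single-key authkeys file using sha1."""
    content = f"auth 1\n1 sha1 {key}\n"
    # Create with 0o600 so a new file is never readable by group/other.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write(content)
    # An existing file keeps its old mode when reopened, so force 600 explicitly.
    os.chmod(path, 0o600)


def permissions_ok(path: str) -> bool:
    """Approximate heartbeat's complaint: no access at all for group or other."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & 0o077 == 0


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        p = os.path.join(d, "authkeys")
        write_authkeys(p, secrets.token_hex(16))
        print(oct(stat.S_IMODE(os.stat(p).st_mode)), permissions_ok(p))  # 0o600 True
```

The same authkeys file must be present on both nodes, because each side authenticates the other's heartbeat packets with it.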
  # cat /etc/ha.d/haresources       # configure the HA resources
  ha1     IPaddr::172.16.1.100/24/eth0:0 httpd

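A haresources line names the preferred node, followed by its resources. A resource that takes arguments is written NAME::arg. The log further down shows Heartbeat starting these resources left to right (IPaddr, then httpd) and stopping them in reverse order. The following is a minimal Python sketch that parses such a line and derives the netmask and broadcast address that the IPaddr script reports in that log. The helper names are hypothetical, and it assumes the IPaddr argument has the form ip/prefix/label.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Resource:
    name: str        # resource script name, e.g. "IPaddr" or "httpd"
    args: List[str]  # "::"-separated arguments passed to the script


def parse_haresources_line(line: str) -> Tuple[str, List[Resource]]:
    """Split 'node res1::arg res2 ...' into the preferred node and its resources."""
    fields = line.split("#", 1)[0].split()
    node, specs = fields[0], fields[1:]
    resources = []
    for spec in specs:
        name, *args = spec.split("::")
        resources.append(Resource(name, args))
    return node, resources


def ipaddr_details(arg: str) -> Tuple[str, str, str, str]:
    """For an IPaddr argument 'ip/prefix/label', return ip, netmask, broadcast, label."""
    ip, prefix, label = arg.split("/")[:3]  # assumes all three parts are present
    iface = ipaddress.IPv4Interface(f"{ip}/{prefix}")
    return ip, str(iface.netmask), str(iface.network.broadcast_address), label


if __name__ == "__main__":
    node, resources = parse_haresources_line("ha1     IPaddr::172.16.1.100/24/eth0:0 httpd")
    print(node)                                   # ha1
    print([r.name for r in resources])            # start order: ['IPaddr', 'httpd']
    print([r.name for r in reversed(resources)])  # stop order: ['httpd', 'IPaddr']
    print(ipaddr_details(resources[0].args[0]))
    # ('172.16.1.100', '255.255.255.0', '172.16.1.255', 'eth0:0')
```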
  # /etc/init.d/heartbeat start
  logd is already running
  Starting High-Availability services: 
  2011/07/26_05:05:15 INFO:  Resource is stopped
  [  OK  ]

  # Between ha1 and ha2, the only configuration differences are the ucast value and the pinged IPs.
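Because the two nodes' ha.cf files differ only in the ucast target and the pinged addresses, both can be rendered from one shared template. The following is a minimal sketch. The function name is hypothetical, and the shared directives are copied from the ha1 listing. The ha2 addresses used below (1.1.1.1 for ha1's heartbeat NIC and 172.16.1.10 for ha1's public IP) are placeholders, not values from the original setup.

```python
from typing import List

# Directives shared by both nodes, copied from the ha1 listing.
COMMON = [
    "debugfile /var/log/ha-debug",
    "logfile /var/log/ha-log",
    "logfacility local0",
    "keepalive 2",
    "deadtime 30",
    "warntime 10",
    "initdead 120",
    "udpport 694",
    "auto_failback on",
    "node ha1",
    "node ha2",
    "respawn hacluster /usr/lib/heartbeat/ipfail",
    "deadping 30",
    "apiauth ipfail uid=hacluster",
    "use_logd yes",
    "conn_logd_time 60",
]


def render_ha_cf(hb_iface: str, peer_hb_ip: str, ping_ips: List[str]) -> str:
    """Hypothetical helper: shared lines plus the two node-specific lines."""
    specific = [f"ucast {hb_iface} {peer_hb_ip}", "ping " + " ".join(ping_ips)]
    return "\n".join(COMMON + specific) + "\n"


if __name__ == "__main__":
    ha1 = render_ha_cf("eth1", "1.1.1.2", ["172.16.1.1", "172.16.1.11"])
    # Placeholder addresses for ha2 (not given in the original article).
    ha2 = render_ha_cf("eth1", "1.1.1.1", ["172.16.1.1", "172.16.1.10"])
    # Only the ucast and ping lines should differ between the two files.
    print(sorted(set(ha1.splitlines()) ^ set(ha2.splitlines())))
```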
  #++++++++++++++++++++++++++++++++++++++++++++++++++++++
  #
  #++++++++++++++++++++++++++++++++++++++++++++++++++++++
  Below is the log from pulling the heartbeat cable and then plugging it back in:
    # Heartbeat cable pulled on one side
    heartbeat[7043]: 2011/07/26_13:53:40 WARN: node ha2.example.com: is dead
    heartbeat[7043]: 2011/07/26_13:53:40 info: Dead node ha2.example.com gave up resources.
    heartbeat[7043]: 2011/07/26_13:53:40 info: Link ha2.example.com:eth1 dead.  
    ipfail[7069]: 2011/07/26_13:53:40 info: Status update: Node ha2.example.com now has status dead
    ipfail[7069]: 2011/07/26_13:53:42 info: NS: We are still alive!
    ipfail[7069]: 2011/07/26_13:53:42 info: Link Status update: Link ha2.example.com/eth1 now has status dead
    ipfail[7069]: 2011/07/26_13:53:44 info: Asking other side for ping node count.
    ipfail[7069]: 2011/07/26_13:53:44 info: Checking remote count of ping nodes.
  At this point, run ip addr on both nodes and you will see the VIP on both machines: split brain!
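You can spot split brain mechanically. Collect each node's `ip -o -4 addr show` output (for example, over the ssh access set up in the prerequisites) and count how many nodes hold the VIP. The following is a minimal Python sketch. The helper names are hypothetical, and the sample output is illustrative, not captured from this test.

```python
from typing import Dict, List


def addresses(ip_o_output: str) -> List[str]:
    """Extract IPv4 addresses from `ip -o -4 addr show` style output."""
    found = []
    for line in ip_o_output.splitlines():
        tokens = line.split()
        for i, tok in enumerate(tokens[:-1]):
            if tok == "inet":
                found.append(tokens[i + 1].split("/")[0])  # strip the /prefix
    return found


def vip_holders(outputs: Dict[str, str], vip: str) -> List[str]:
    """Return the nodes whose address list contains the VIP."""
    return sorted(node for node, text in outputs.items() if vip in addresses(text))


if __name__ == "__main__":
    # Illustrative output, not captured from the original test.
    line = "2: eth0    inet 172.16.1.100/24 brd 172.16.1.255 scope global secondary eth0:0"
    holders = vip_holders({"ha1": line, "ha2": line}, "172.16.1.100")
    if len(holders) > 1:
        print("split brain: VIP held by", holders)
```

More than one holder means both sides believe they own 172.16.1.100, which is the situation observed here.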

    # The second node is alive again
    heartbeat[7043]: 2011/07/26_13:56:09 CRIT: Cluster node ha2.example.com returning after partition.
    heartbeat[7043]: 2011/07/26_13:56:09 info: For information on cluster partitions, See URL:
    http://linux-ha.org/SplitBrain
    heartbeat[7043]: 2011/07/26_13:56:09 WARN: Deadtime value may be too small.
    heartbeat[7043]: 2011/07/26_13:56:09 info: See FAQ for information on tuning deadtime.
    heartbeat[7043]: 2011/07/26_13:56:09 info: URL:
    http://linux-ha.org/FAQ#heavy_load

    heartbeat[7043]: 2011/07/26_13:56:09 info: Link ha2.example.com:eth1 up.
    heartbeat[7043]: 2011/07/26_13:56:09 WARN: Late heartbeat: Node ha2.example.com: interval 104930 ms
    ipfail[7069]: 2011/07/26_13:56:09 info: Link Status update: Link ha2.example.com/eth1 now has status up
    heartbeat[7043]: 2011/07/26_13:56:09 info: Status update for node ha2.example.com: status active
    ipfail[7069]: 2011/07/26_13:56:09 info: Status update: Node ha2.example.com now has status active
    harc[7916]:     2011/07/26_13:56:09 info: Running /etc/ha.d/rc.d/status status
    heartbeat[7043]: 2011/07/26_13:56:12 info: Heartbeat shutdown in progress. (7043)
    # heartbeat notices that node 2's heartbeat NIC is alive again and restarts.
    heartbeat[7932]: 2011/07/26_13:56:13 info: Giving up all HA resources.
    ResourceManager[7945]:  2011/07/26_13:56:13 info: Releasing resource group: ha1.example.com IPaddr::172.16.1.100/24/eth0:0 httpd
    ResourceManager[7945]:  2011/07/26_13:56:13 info: Running /etc/init.d/httpd  stop
    # The resource manager stops the application that was running
    ResourceManager[7945]:  2011/07/26_13:56:13 info: Running /etc/ha.d/resource.d/IPaddr 172.16.1.100/24/eth0:0 stop
    IPaddr[8037]:   2011/07/26_13:56:13 INFO: ifconfig eth0:0 down
    IPaddr[8008]:   2011/07/26_13:56:13 INFO:  Success
    # The corresponding VIP is taken down as well
    ResourceManager[8067]:  2011/07/26_13:56:13 info: Releasing resource group: ha2.example.com IPaddr::172.16.1.101/24/eth0:1 vsftpd
    # Release the FTP service that originally belonged to ha2.example.com
    ResourceManager[8067]:  2011/07/26_13:56:13 info: Running /etc/init.d/vsftpd  stop
    ResourceManager[8067]:  2011/07/26_13:56:14 info: Running /etc/ha.d/resource.d/IPaddr 172.16.1.101/24/eth0:1 stop
    IPaddr[8161]:   2011/07/26_13:56:14 INFO: ifconfig eth0:1 down
    # Stop the service, then bring the interface down.
    IPaddr[8132]:   2011/07/26_13:56:14 INFO:  Success
    heartbeat[7932]: 2011/07/26_13:56:14 info: All HA resources relinquished.
    heartbeat[7043]: 2011/07/26_13:56:16 info: killing /usr/lib/heartbeat/ipfail process group 7069 with signal 15
    heartbeat[7043]: 2011/07/26_13:56:17 info: Received shutdown notice from 'ha2.example.com'.
    heartbeat[7043]: 2011/07/26_13:56:17 info: Resource takeover cancelled - shutdown in progress.
    heartbeat[7043]: 2011/07/26_13:56:19 info: killing HBFIFO process 7045 with signal 15
    heartbeat[7043]: 2011/07/26_13:56:19 info: killing HBWRITE process 7046 with signal 15
    heartbeat[7043]: 2011/07/26_13:56:19 info: killing HBREAD process 7047 with signal 15
    heartbeat[7043]: 2011/07/26_13:56:19 info: killing HBWRITE process 7048 with signal 15
    heartbeat[7043]: 2011/07/26_13:56:19 info: killing HBREAD process 7049 with signal 15
    heartbeat[7043]: 2011/07/26_13:56:19 info: Core process 7049 exited. 5 remaining
    heartbeat[7043]: 2011/07/26_13:56:19 info: Core process 7047 exited. 4 remaining
    heartbeat[7043]: 2011/07/26_13:56:19 info: Core process 7046 exited. 3 remaining
    heartbeat[7043]: 2011/07/26_13:56:19 info: Core process 7048 exited. 2 remaining
    heartbeat[7043]: 2011/07/26_13:56:19 info: Core process 7045 exited. 1 remaining
    heartbeat[7043]: 2011/07/26_13:56:19 info: ha1.example.com Heartbeat shutdown complete.
    # The heartbeat service has shut down
    heartbeat[7043]: 2011/07/26_13:56:19 info: Heartbeat restart triggered.
    heartbeat[7043]: 2011/07/26_13:56:19 info: Restarting heartbeat.
    heartbeat[7043]: 2011/07/26_13:56:19 info: Performing heartbeat restart exec.
    heartbeat[7043]: 2011/07/26_13:56:30 info: Version 2 support: false
    heartbeat[7043]: 2011/07/26_13:56:30 WARN: Logging daemon is disabled --enabling logging daemon is recommended
    heartbeat[7043]: 2011/07/26_13:56:30 info: **************************
    heartbeat[7043]: 2011/07/26_13:56:30 info: Configuration validated. Starting heartbeat 2.1.3
    heartbeat[8191]: 2011/07/26_13:56:30 info: heartbeat: version 2.1.3
    heartbeat[8191]: 2011/07/26_13:56:30 info: Heartbeat generation: 1311635912
    heartbeat[8191]: 2011/07/26_13:56:30 info: glib: ucast: write socket priority set to IPTOS_LOWDELAY on eth1
    heartbeat[8191]: 2011/07/26_13:56:30 info: glib: ucast: bound send socket to device: eth1
    heartbeat[8191]: 2011/07/26_13:56:30 info: glib: ucast: bound receive socket to device: eth1
    heartbeat[8191]: 2011/07/26_13:56:30 info: glib: ucast: started on port 694 interface eth1 to 10.1.1.2
    heartbeat[8191]: 2011/07/26_13:56:30 info: glib: ping group heartbeat started.
    heartbeat[8191]: 2011/07/26_13:56:30 info: G_main_add_TriggerHandler: Added signal manual handler
    heartbeat[8191]: 2011/07/26_13:56:30 info: G_main_add_TriggerHandler: Added signal manual handler
    heartbeat[8191]: 2011/07/26_13:56:30 info: G_main_add_SignalHandler: Added signal handler for signal 17
    heartbeat[8191]: 2011/07/26_13:56:30 info: Local status now set to: 'up'
    heartbeat[8191]: 2011/07/26_13:56:32 info: Link group1:group1 up.
    heartbeat[8191]: 2011/07/26_13:56:32 info: Status update for node group1: status ping
    heartbeat[8191]: 2011/07/26_13:56:33 info: Link ha2.example.com:eth1 up.
    heartbeat[8191]: 2011/07/26_13:56:33 info: Status update for node ha2.example.com: status up
    harc[8199]:     2011/07/26_13:56:33 info: Running /etc/ha.d/rc.d/status status
    heartbeat[8191]: 2011/07/26_13:56:33 info: Comm_now_up(): updating status to active
    heartbeat[8191]: 2011/07/26_13:56:33 info: Local status now set to: 'active'
    heartbeat[8191]: 2011/07/26_13:56:33 info: Starting child client "/usr/lib/heartbeat/ipfail" (498,496)
    heartbeat[8216]: 2011/07/26_13:56:33 info: Starting "/usr/lib/heartbeat/ipfail" as uid 498  gid 496 (pid 8216)
    heartbeat[8191]: 2011/07/26_13:56:34 info: Status update for node ha2.example.com: status active
    harc[8219]:     2011/07/26_13:56:34 info: Running /etc/ha.d/rc.d/status status
    ipfail[8216]: 2011/07/26_13:56:40 info: Status update: Node ha2.example.com now has status active
    # Check the other node's status
    ipfail[8216]: 2011/07/26_13:56:43 info: Asking other side for ping node count.
    ipfail[8216]: 2011/07/26_13:56:46 info: No giveup timer to abort.
    heartbeat[8191]: 2011/07/26_13:56:50 info: local resource transition completed.
    heartbeat[8191]: 2011/07/26_13:56:50 info: Initial resource acquisition complete (T_RESOURCES(us))
    heartbeat[8191]: 2011/07/26_13:56:50 info: remote resource transition completed.
    IPaddr[8271]:   2011/07/26_13:56:51 INFO:  Resource is stopped
    heartbeat[8235]: 2011/07/26_13:56:51 info: Local Resource acquisition completed.
    harc[8324]:     2011/07/26_13:56:51 info: Running /etc/ha.d/rc.d/ip-request-resp ip-request-resp
    ip-request-resp[8324]:  2011/07/26_13:56:51 received ip-request-resp IPaddr::172.16.1.100/24/eth0:0 OK yes
    ResourceManager[8345]:  2011/07/26_13:56:51 info: Acquiring resource group: ha1.example.com IPaddr::172.16.1.100/24/eth0:0 httpd
    IPaddr[8372]:   2011/07/26_13:56:52 INFO:  Resource is stopped
    # Resource information acquired
    ResourceManager[8345]:  2011/07/26_13:56:53 info: Running /etc/ha.d/resource.d/IPaddr 172.16.1.100/24/eth0:0 start
    IPaddr[8470]:   2011/07/26_13:56:54 INFO: Using calculated netmask for 172.16.1.100: 255.255.255.0
    IPaddr[8470]:   2011/07/26_13:56:54 INFO: eval ifconfig eth0:0 172.16.1.100 netmask 255.255.255.0 broadcast 172.16.1.255
    IPaddr[8441]:   2011/07/26_13:56:54 INFO:  Success
    # VIP and IP address taken over
    ResourceManager[8345]:  2011/07/26_13:56:54 info: Running /etc/init.d/httpd  start
  The service is back to normal! The log above is complete.
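A log like the one above is easier to review as a timeline. The following is a minimal Python sketch that extracts the resource start/stop actions and the "Late heartbeat" intervals. It assumes the `prog[pid]: YYYY/MM/DD_HH:MM:SS [LEVEL:] message` layout seen in this log, and the helper names are hypothetical. For example, the 104930 ms interval was reported at the same moment Heartbeat logged "Deadtime value may be too small". That interval is far beyond the configured 30 s deadtime.

```python
import re
from datetime import datetime
from typing import List, Optional, Tuple

# Layout assumed from this log: prog[pid]: YYYY/MM/DD_HH:MM:SS [LEVEL:] message
LINE_RE = re.compile(
    r"^\s*(?P<prog>[\w.-]+)\[(?P<pid>\d+)\]:\s+"
    r"(?P<ts>\d{4}/\d{2}/\d{2}_\d{2}:\d{2}:\d{2})\s+"
    r"(?:(?P<level>[A-Za-z]+):\s+)?(?P<msg>.*)$"
)
# "Running <script> [argument] start|stop", as logged by ResourceManager
RUN_RE = re.compile(r"Running (\S+)(?:\s+(\S+))?\s+(start|stop)$")
LATE_RE = re.compile(r"Late heartbeat: .* interval (\d+) ms")


def parse_line(line: str) -> Optional[Tuple[datetime, str, str, str]]:
    """Return (timestamp, program, level, message), or None for non-log lines."""
    m = LINE_RE.match(line)
    if not m:
        return None
    ts = datetime.strptime(m["ts"], "%Y/%m/%d_%H:%M:%S")
    return ts, m["prog"], m["level"] or "", m["msg"].strip()


def resource_actions(lines: List[str]) -> List[Tuple[str, str, str]]:
    """List (action, script, argument) for each 'Running <script> [arg] start|stop'."""
    actions = []
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        m = RUN_RE.search(parsed[3])
        if m:
            actions.append((m.group(3), m.group(1), m.group(2) or ""))
    return actions


def late_heartbeats_ms(lines: List[str]) -> List[int]:
    """Intervals reported by 'Late heartbeat ... interval N ms' warnings."""
    return [int(m.group(1)) for m in map(LATE_RE.search, lines) if m]


if __name__ == "__main__":
    dead = parse_line("heartbeat[7043]: 2011/07/26_13:53:40 WARN: node ha2.example.com: is dead")
    back = parse_line("heartbeat[7043]: 2011/07/26_13:56:09 CRIT: Cluster node ha2.example.com returning after partition.")
    print((back[0] - dead[0]).total_seconds())  # 149.0
```

Run over the complete log, `resource_actions` lists the following sequence, which matches the comments in the log:

1. httpd and the 172.16.1.100 VIP are stopped.
2. vsftpd and 172.16.1.101 are stopped.
3. After the restart, the VIP is started before httpd.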

Dual heartbeat links, and a summary of my personal understanding of HA: http://myhat.blog.51cto.com/391263/623546

This article is from the blog “潜入技术的海洋”; please keep this source attribution: http://myhat.blog.51cto.com/391263/623559

Reposted from: http://myhat.blog.51cto.com/391263/623559
