Ceph: how to recover when a monitor's IP address changes

认真即可
Published 2016/06/29 16:38
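  1. Symptoms after correcting the IP addresses on all nodes
  • Running sudo ceph status on a node shows that ceph is still trying to connect to the "old" addresses;
  • systemctl status ceph-mon@xxx.service reports "unable to bind to ..." for the "old" address.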

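Editing /etc/hosts and /etc/ceph/ceph.conf alone does not help! The ceph monitors keep this configuration in the monmap, which cannot be changed casually: the monitors are the brain of the cluster and far too important for that. In the future, it is best to assign the monitors private network IP addresses.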
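
To see the two sources of truth side by side, a small helper can print what ceph.conf advertises to clients. This is only a sketch: the function name `conf_mon_host` is made up for illustration, and it assumes the option is written on a single line as `mon_host`, `mon host` or `mon-host`.

```shell
# Sketch only: print the monitor address list found in a ceph.conf file.
# conf_mon_host is a hypothetical helper name, not a ceph command.
conf_mon_host() {
    local conf="${1:-/etc/ceph/ceph.conf}"
    # Strip the option name and "=" and print whatever value follows.
    sed -n 's/^[[:space:]]*mon[_ -]host[[:space:]]*=[[:space:]]*//p' "$conf"
}

# Example: conf_mon_host /etc/ceph/ceph.conf
```

The monitors' own view is the monmap; `monmaptool --print` on a map extracted from a stopped monitor shows it, and that view is what has to change.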

  2. How to fix it?

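I will be lazy and paste the IRC chat log from when I asked the experts for help. The main idea is to stop all the monitors, remove them from the cluster until only one monitor is left, and then add the others back one by one from scratch.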
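
A rough sketch of the "shrink to one monitor" step, loosely following the upstream procedure in reference [1]. The function name `shrink_to_one_monitor`, the `MONMAP_PATH` variable and its default path are assumptions made for illustration. It also assumes systemd units named `ceph-mon@<id>.service`, and that the other monitors have already been stopped on their own hosts.

```shell
# Sketch only: reduce the monmap to a single surviving monitor.
# Usage: shrink_to_one_monitor <survivor-id> <removed-id>...
shrink_to_one_monitor() {
    local survivor="$1"
    shift
    local map="${MONMAP_PATH:-/tmp/monmap}"   # hypothetical scratch path
    local id

    # The monitor must be stopped, otherwise its store is locked.
    systemctl stop "ceph-mon@${survivor}.service" || return 1

    # Dump the monmap held by the surviving monitor.
    ceph-mon -i "$survivor" --extract-monmap "$map" || return 1

    # Drop every monitor that should no longer be in the map.
    for id in "$@"; do
        monmaptool "$map" --rm "$id" || return 1
    done

    # Write the edited map back and start the survivor on its own.
    ceph-mon -i "$survivor" --inject-monmap "$map" || return 1
    systemctl start "ceph-mon@${survivor}.service"
}

# Example: shrink_to_one_monitor ceph1 ceph2 ceph3
```

Once the surviving monitor is up on its own, the removed monitors can be re-created and added back one at a time.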

4:20:07 PM - zren: Hello, may I ask a quick question here: 
what should I do to recover my cluster after a long period of downtime?
2/3 nodes's IP has changed during this time. "ceph -s" still try to connect
the old IP even after I've set the new ip in /etc/hosts. 
And the "mon_host=" list in /etc/ceph/ceph.conf still shows the old IP addresses, 
should I correct the list manually?

4:28:15 PM - oms101: zren -> never done this but do change the /etc/ceph/ceph.conf
4:28:38 PM - oms101: This is definitely used by the client tools

4:52:04 PM - joao: <oms101> zren -> never done this but do change the /etc/ceph/ceph.conf
4:52:07 PM - joao: this may not be sufficient
4:52:15 PM - joao: how did the ip change?
4:52:51 PM - joao: did you properly moved the monitors to the new ips first?
4:53:03 PM - joao: i'm guessing no
4:53:22 PM - joao: so you'll likely have a monmap with the old ips in it
4:53:55 PM - joao: likelihood is that the monitors won't even be able to form quorum because
they have wrong ips for the monitors
4:53:56 PM - oms101: yes good point joao

4:54:28 PM - joao: in which case, your best chance will be extracting the current 
monmap from all monitors and injecting a new map
4:54:38 PM - oms101: http://docs.ceph.com/docs/master/man/8/monmaptool/
4:54:49 PM - oms101: is useful documentation on the monmaptool
4:54:49 PM - joao: this will mean shutting down your monitors, but given you likely don't even have quorum who cares anyway
4:56:01 PM - joao: if we're pointing to upstream, i'd instead suggest http://docs.ceph.com/docs/master/rados/operations/add-or-rm-mons/#removing-monitors-from-an-unhealthy-cluster
4:56:22 PM - joao: absolutely no clue if this has been mapped to our internal docs, although i hope so
4:56:48 PM - joao: omg
4:57:03 PM - zren: joao: first of all, thanks!  it changed because the network facility in server room was reconstructed by the IT guy, hah. 

5:22:07 PM - zren: joao: come back again;-) unfortunately, I got this error when trying to get the copy of monmap file according the link you point to:
5:22:07 PM - zren: ceph1:~ # ceph-mon -i `hostname` --extract-monmap /tmp/monmap
5:22:07 PM - zren: IO error: lock /var/lib/ceph/mon/ceph-ceph1/store.db/LOCK: Resource temporarily unavailable
5:22:07 PM - zren: 2016-06-29 17:14:04.155152 7fb9cf3607c0 -1 error opening mon data directory at '/var/lib/ceph/mon/ceph-ceph1': (22) Invalid argument
5:23:15 PM - joao: zren, the monitor is running
5:23:24 PM - joao: as i said, you need to shut them down
5:24:37 PM - zren: joao: Yes, according to the link, I stopped 2/3 nodes, so only one surviving monitor is left;-)
5:25:02 PM - joao: zren,you need to do that on *all* the monitors
5:25:11 PM - joao: you need the same map epoch on all the monitors
5:25:17 PM - joao: otherwise that will lead to inconsistencies

5:25:52 PM - joao: the idea is roughly to do
5:26:15 PM - zren: joao: OK, thanks! will try.. please treat me as a very newbie hah;-)
5:27:00 PM - joao: you only need to extract the monmap from the monitor with the latest monmap
5:27:11 PM - joao: but need to inject it into every monitor
5:27:23 PM - zren: joao: got it;-)
5:27:29 PM - smithfarm1 has left the room (Quit: Ping timeout: 121 seconds).
5:27:43 PM - joao: if by some chance you ended up running the cluster with quorum with less than 3 monitors, then you need to check which one has the latest monmap
5:28:02 PM - joao: in that case, extract the monmap on all the monitors and use the monmaptool to check the latest epoch
5:28:16 PM - joao: monmaptool --print /path/to/monmap
5:28:22 PM - joao: that will give you the map epoch
5:28:39 PM - joao: i can't emphasize this enough: use the latest epoch
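
joao's advice to use the latest epoch can be sketched as a small helper that compares monmap files already extracted from each stopped monitor. The function names `monmap_epoch` and `latest_monmap` are hypothetical, and the parsing assumes that `monmaptool --print` emits a line of the form `epoch <number>`.

```shell
# Sketch only: print the epoch recorded in one extracted monmap file.
monmap_epoch() {
    monmaptool --print "$1" | awk '$1 == "epoch" { print $2 }'
}

# Print the path of the monmap with the highest epoch among the arguments.
latest_monmap() {
    local best="" best_epoch=-1 f e
    for f in "$@"; do
        e=$(monmap_epoch "$f")
        # Skip files whose epoch could not be read as a number.
        case "$e" in
            ''|*[!0-9]*) continue ;;
        esac
        if [ "$e" -gt "$best_epoch" ]; then
            best="$f"
            best_epoch="$e"
        fi
    done
    [ -n "$best" ] && printf '%s\n' "$best"
}

# Example: latest_monmap /tmp/monmap.ceph1 /tmp/monmap.ceph2 /tmp/monmap.ceph3
```

The file it prints is the one to inject into every monitor.
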
The steps below are roughly how to recover the second monitor after all the monitors have been stopped:
ceph2:~ # ceph mon remove ceph2
ceph2:~ # rm -rf /var/lib/ceph/mon/ceph-ceph2/*
ceph2:~ # mkdir tmp
ceph2:~ # ceph mon getmap -o tmp/monmap
ceph2:~ # ceph auth get mon. -o tmp/keyring
ceph2:~ # ceph-mon -i ceph2 --mkfs --monmap tmp/monmap --keyring tmp/keyring
ceph2:~ # ceph-mon -i ceph2 --public-addr "new-ip":6789
ceph2:~ # systemctl start ceph-mon@ceph2.service
ceph2:~ # systemctl status ceph-mon@ceph2.service

You can also try the following commands, taken from [3]:

    #Add the new monitor locations (written to the file "monmap")
    # monmaptool --create --add mon0 192.168.32.2:6789 --add osd1 192.168.32.3:6789 \
      --add osd2 192.168.32.4:6789 --fsid 61a520db-317b-41f1-9752-30cedc5ffb9a \
      --clobber monmap

    #Retrieve the cluster's current monitor map for comparison (requires a reachable quorum)
    # ceph mon getmap -o monmap.bin

    #Check new contents
    # monmaptool --print monmap

    #Inject the new monmap (each monitor must be stopped first)
    # ceph-mon -i mon0 --inject-monmap monmap
    # ceph-mon -i osd1 --inject-monmap monmap
    # ceph-mon -i osd2 --inject-monmap monmap
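
As an alternative to building a map from scratch with --create, an extracted monmap can be edited in place, which keeps the fsid already stored in it. The sketch below is not taken from [3]; the function name `readdress_monitor` and the example arguments are assumptions, and only the `--rm`, `--add` and `--print` options of monmaptool are used.

```shell
# Sketch only: replace the address of one monitor in an extracted monmap.
# Usage: readdress_monitor <monmap-file> <mon-id> <new-ip:port>
readdress_monitor() {
    local map="$1" id="$2" addr="$3"

    # Remove the entry that still carries the old address.
    monmaptool --rm "$id" "$map" || return 1

    # Add the same monitor name back with its new address.
    monmaptool --add "$id" "$addr" "$map" || return 1

    # Show the result so it can be checked before injecting it.
    monmaptool --print "$map"
}

# Example: readdress_monitor /tmp/monmap ceph2 192.168.32.3:6789
```

The edited file still has to be injected into every stopped monitor with ceph-mon --inject-monmap, as in the commands above.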

References:

[1]  http://docs.ceph.com/docs/master/rados/operations/add-or-rm-mons/#removing-monitors-from-an-unhealthy-cluster

[2] http://docs.ceph.com/docs/master/man/8/monmaptool/

[3] http://os.51cto.com/art/201412/462140.htm

© Copyright belongs to the author
