Ceph: Adding a Monitor Fails
Posted by 哓竹 4 months ago

Summary: the Ceph versions involved are Jewel 10.2.2 and 10.2.7.

#1. Adding the mon

Current state of the Ceph cluster:

# ceph -s
    cluster f4833745-d220-407b-82ea-72eb6297d435
     health HEALTH_OK
     monmap e3: 3 mons at {dlw1=172.16.40.11:6789/0,dlw2=172.16.40.12:6789/0,dlw3=172.16.40.13:6789/0}
            election epoch 14, quorum 0,1,2 dlw1,dlw2,dlw3
     osdmap e26: 3 osds: 3 up, 3 in
            flags sortbitwise,require_jewel_osds
      pgmap v9695: 352 pgs, 6 pools, 45725 kB data, 20 objects
            253 MB used, 584 GB / 584 GB avail
                 352 active+clean

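There are currently three mons (dlw1, dlw2 and dlw3); the goal is to add a fourth, dlw4.

> Question: why run four mons, when an even count does not suit Paxos majority voting? Because after adding dlw4 as a mon I remove the mon on dlw1, which amounts to migrating the mon. That is not the point here, though. The point is the errors that appeared while adding the mon and how they were resolved, recorded for future reference.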
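
The remark about four mons comes down to majority arithmetic: a quorum needs more than half of the monitors, so four mons tolerate no more failures than three. A minimal shell sketch of that arithmetic (the function names are made up for illustration):

```shell
#!/bin/sh
# Smallest majority of n monitors: floor(n/2) + 1.
quorum_size() {
    echo $(( $1 / 2 + 1 ))
}

# Monitors that may fail while a majority still remains.
tolerated_failures() {
    echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 3 4 5; do
    echo "mons=$n quorum=$(quorum_size "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

Three and four mons both survive the loss of exactly one mon, which is why the fourth mon only makes sense here as a temporary step in the migration.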

The simplest and quickest way to add it is with ceph-deploy:

# ceph-deploy --overwrite-conf mon create dlw4
[ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (1.5.37): /usr/bin/ceph-deploy --overwrite-conf mon create dlw4
[ceph_deploy.cli][INFO  ] ceph-deploy options:
[ceph_deploy.cli][INFO  ]  username                      : None
[ceph_deploy.cli][INFO  ]  verbose                       : False
[ceph_deploy.cli][INFO  ]  overwrite_conf                : True
[ceph_deploy.cli][INFO  ]  subcommand                    : create
[ceph_deploy.cli][INFO  ]  quiet                         : False
[ceph_deploy.cli][INFO  ]  cd_conf                       : <ceph_deploy.conf.cephdeploy.Conf instance at 0x1dd2d88>
[ceph_deploy.cli][INFO  ]  cluster                       : ceph
[ceph_deploy.cli][INFO  ]  mon                           : ['dlw4']
[ceph_deploy.cli][INFO  ]  func                          : <function mon at 0x1c93de8>
[ceph_deploy.cli][INFO  ]  ceph_conf                     : None
[ceph_deploy.cli][INFO  ]  default_release               : False
[ceph_deploy.cli][INFO  ]  keyrings                      : None
[ceph_deploy.mon][DEBUG ] Deploying mon, cluster ceph hosts dlw4
[ceph_deploy.mon][DEBUG ] detecting platform for host dlw4 ...
[dlw4][DEBUG ] connected to host: dlw4 
[dlw4][DEBUG ] detect platform information from remote host
[dlw4][DEBUG ] detect machine type
[dlw4][DEBUG ] find the location of an executable
[ceph_deploy.mon][INFO  ] distro info: CentOS Linux 7.2.1511 Core
[dlw4][DEBUG ] determining if provided host has same hostname in remote
[dlw4][DEBUG ] get remote short hostname
[dlw4][DEBUG ] deploying mon to dlw4
[dlw4][DEBUG ] get remote short hostname
[dlw4][DEBUG ] remote hostname: dlw4
[dlw4][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[dlw4][DEBUG ] create the mon path if it does not exist
[dlw4][DEBUG ] checking for done path: /var/lib/ceph/mon/ceph-dlw4/done
[dlw4][DEBUG ] create a done file to avoid re-doing the mon deployment
[dlw4][DEBUG ] create the init path if it does not exist
[dlw4][INFO  ] Running command: systemctl enable ceph.target
[dlw4][INFO  ] Running command: systemctl enable ceph-mon@dlw4
[dlw4][INFO  ] Running command: systemctl start ceph-mon@dlw4
[dlw4][INFO  ] Running command: ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.dlw4.asok mon_status
[dlw4][ERROR ] admin_socket: exception getting command descriptions: [Errno 2] No such file or directory
[dlw4][WARNIN] monitor: mon.dlw4, might not be running yet
[dlw4][INFO  ] Running command: ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.dlw4.asok mon_status
[dlw4][ERROR ] admin_socket: exception getting command descriptions: [Errno 2] No such file or directory
[dlw4][WARNIN] dlw4 is not defined in `mon initial members`
[dlw4][WARNIN] monitor dlw4 does not exist in monmap
[dlw4][WARNIN] neither `public_addr` nor `public_network` keys are defined for monitors
[dlw4][WARNIN] monitors may not be able to form quorum


[root@dlw1 opt]# ceph-deploy --overwrite-conf mon add dlw4 
  
[ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (1.5.37): /usr/bin/ceph-deploy --overwrite-conf mon add dlw4
[ceph_deploy.cli][INFO  ] ceph-deploy options:
[ceph_deploy.cli][INFO  ]  username                      : None
[ceph_deploy.cli][INFO  ]  verbose                       : False
[ceph_deploy.cli][INFO  ]  overwrite_conf                : True
[ceph_deploy.cli][INFO  ]  subcommand                    : add
[ceph_deploy.cli][INFO  ]  quiet                         : False
[ceph_deploy.cli][INFO  ]  cd_conf                       : <ceph_deploy.conf.cephdeploy.Conf instance at 0xf08d88>
[ceph_deploy.cli][INFO  ]  cluster                       : ceph
[ceph_deploy.cli][INFO  ]  mon                           : ['dlw4']
[ceph_deploy.cli][INFO  ]  func                          : <function mon at 0xdcade8>
[ceph_deploy.cli][INFO  ]  address                       : None
[ceph_deploy.cli][INFO  ]  ceph_conf                     : None
[ceph_deploy.cli][INFO  ]  default_release               : False
[ceph_deploy.mon][INFO  ] ensuring configuration of new mon host: dlw4
[ceph_deploy.admin][DEBUG ] Pushing admin keys and conf to dlw4
[dlw4][DEBUG ] connected to host: dlw4 
[dlw4][DEBUG ] detect platform information from remote host
[dlw4][DEBUG ] detect machine type
[dlw4][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[ceph_deploy.mon][DEBUG ] Adding mon to cluster ceph, host dlw4
[ceph_deploy.mon][DEBUG ] using mon address by resolving host: 172.16.40.9
[ceph_deploy.mon][DEBUG ] detecting platform for host dlw4 ...
[dlw4][DEBUG ] connected to host: dlw4 
[dlw4][DEBUG ] detect platform information from remote host
[dlw4][DEBUG ] detect machine type
[dlw4][DEBUG ] find the location of an executable
[ceph_deploy.mon][INFO  ] distro info: CentOS Linux 7.2.1511 Core
[dlw4][DEBUG ] determining if provided host has same hostname in remote
[dlw4][DEBUG ] get remote short hostname
[dlw4][DEBUG ] adding mon to dlw4
[dlw4][DEBUG ] get remote short hostname
[dlw4][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[dlw4][DEBUG ] create the mon path if it does not exist
[dlw4][DEBUG ] checking for done path: /var/lib/ceph/mon/ceph-dlw4/done
[dlw4][DEBUG ] create a done file to avoid re-doing the mon deployment
[dlw4][DEBUG ] create the init path if it does not exist
[dlw4][INFO  ] Running command: systemctl enable ceph.target
[dlw4][INFO  ] Running command: systemctl enable ceph-mon@dlw4
[dlw4][INFO  ] Running command: systemctl start ceph-mon@dlw4
[dlw4][WARNIN] No data was received after 7 seconds, disconnecting...
[dlw4][INFO  ] Running command: ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.dlw4.asok mon_status
[dlw4][ERROR ] admin_socket: exception getting command descriptions: [Errno 2] No such file or directory
[dlw4][WARNIN] dlw4 is not defined in `mon initial members`
[dlw4][WARNIN] monitor dlw4 does not exist in monmap
[dlw4][WARNIN] neither `public_addr` nor `public_network` keys are defined for monitors
[dlw4][WARNIN] monitors may not be able to form quorum
[dlw4][INFO  ] Running command: ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.dlw4.asok mon_status
[dlw4][ERROR ] admin_socket: exception getting command descriptions: [Errno 2] No such file or directory
[dlw4][WARNIN] monitor: mon.dlw4, might not be running yet

#2. The errors

Both add and create were tried, and both failed with the same error: the asok file (the mon's admin socket) could not be found.

The file is missing because the mon service on dlw4 failed to start:

[root@dlw4 ceph]# systemctl status ceph-mon@`hostname`
● ceph-mon@dlw4.service - Ceph cluster monitor daemon
   Loaded: loaded (/usr/lib/systemd/system/ceph-mon@.service; enabled; vendor preset: disabled)
   Active: failed (Result: start-limit) since Wed 2017-08-02 16:04:20 CHOST; 11min ago
  Process: 20119 ExecStart=/usr/bin/ceph-mon -f --cluster ${CLUSTER} --id %i --setuser ceph --setgroup ceph (code=exited, status=1/FAILURE)
 Main PID: 20119 (code=exited, status=1/FAILURE)

#3. Troubleshooting

##3.1. Check the log

On dlw4:

[root@dlw4 ceph]# cd /var/log/ceph/
[root@dlw4 ceph]# ls
ceph.log  ceph-mon.dlw4.log 
[root@dlw4 ceph]# tail -f ceph-mon.dlw4.log 
2017-08-02 16:04:09.972349 7f0232f61640  0 set uid:gid to 167:167 (ceph:ceph)
2017-08-02 16:04:09.972374 7f0232f61640  0 ceph version 10.2.7 (50e863e0f4bc8f4b9e31156de690d765af245185), process ceph-mon, pid 20119
2017-08-02 16:04:09.972410 7f0232f61640  0 pidfile_write: ignore empty --pid-file
2017-08-02 16:04:09.998672 7f0232f61640  1 leveldb: Recovering log #30
2017-08-02 16:04:10.003399 7f0232f61640  1 leveldb: Delete type=0 #30

2017-08-02 16:04:10.003445 7f0232f61640  1 leveldb: Delete type=3 #29

2017-08-02 16:04:10.003694 7f0232f61640  0 mon.dlw4 does not exist in monmap, will attempt to join an existing cluster
2017-08-02 16:04:10.003795 7f0232f61640 -1 no public_addr or public_network specified, and mon.dlw4 not present in monmap or ceph.conf

The last line stands out: neither public_addr nor public_network is specified, and mon.dlw4 is not present in the monmap or in ceph.conf.
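
Before retrying, it can help to confirm whether a given ceph.conf defines either key. The following is a minimal sketch, not part of ceph-deploy: the function name `conf_has_public_net` and the scratch files are hypothetical, and the check is a plain text match that ignores which section the key sits in.

```shell
#!/bin/sh
# Succeeds if the given ceph.conf-style file defines public_network or
# public_addr. Ceph accepts both "public_network" and "public network".
conf_has_public_net() {
    grep -Eq '^[[:space:]]*public[_ ](network|addr)[[:space:]]*=' "$1"
}

# Demo on scratch files, not on the real /etc/ceph/ceph.conf.
demo_dir=$(mktemp -d)
printf '[global]\nmon_host = 172.16.40.11\n' > "$demo_dir/before.conf"
printf '[global]\npublic_network = 172.16.40.0/24\n' > "$demo_dir/after.conf"

for f in before after; do
    if conf_has_public_net "$demo_dir/$f.conf"; then
        echo "$f: public network defined"
    else
        echo "$f: public network missing"
    fi
done
```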

##3.2. Fix the configuration and add the mon again

On dlw1, add the following parameter to ceph.conf:

public_network=172.16.40.0/24

Also add dlw4 to mon_initial_members and mon_host, then push ceph.conf to all nodes:

# ceph-deploy --overwrite-conf config push dlw2 dlw3 dlw4
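
For orientation, the edited [global] section might look roughly like the sketch below. The fsid, public_network, host names and addresses come from this post; the order of the entries and the scratch-file location are illustrative assumptions, and a real ceph.conf carries further settings that are omitted here.

```shell
#!/bin/sh
# Write an illustrative [global] section to a scratch file.
# This does NOT touch /etc/ceph/ceph.conf or the ceph-deploy working directory.
sample_conf=$(mktemp)
cat > "$sample_conf" <<'EOF'
[global]
fsid = f4833745-d220-407b-82ea-72eb6297d435
mon_initial_members = dlw1, dlw2, dlw3, dlw4
mon_host = 172.16.40.11,172.16.40.12,172.16.40.13,172.16.40.9
public_network = 172.16.40.0/24
EOF

# Show the lines that this step adds or changes.
grep -E '^(mon_initial_members|mon_host|public_network)' "$sample_conf"
```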

再次添加mon

# ceph-deploy --overwrite-conf mon create dlw4
[ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (1.5.37): /usr/bin/ceph-deploy --overwrite-conf mon create dlw4
[ceph_deploy.cli][INFO  ] ceph-deploy options:
[ceph_deploy.cli][INFO  ]  username                      : None
[ceph_deploy.cli][INFO  ]  verbose                       : False
[ceph_deploy.cli][INFO  ]  overwrite_conf                : True
[ceph_deploy.cli][INFO  ]  subcommand                    : create
[ceph_deploy.cli][INFO  ]  quiet                         : False
[ceph_deploy.cli][INFO  ]  cd_conf                       : <ceph_deploy.conf.cephdeploy.Conf instance at 0x1ca3d88>
[ceph_deploy.cli][INFO  ]  cluster                       : ceph
[ceph_deploy.cli][INFO  ]  mon                           : ['dlw4']
[ceph_deploy.cli][INFO  ]  func                          : <function mon at 0x1b64de8>
[ceph_deploy.cli][INFO  ]  ceph_conf                     : None
[ceph_deploy.cli][INFO  ]  default_release               : False
[ceph_deploy.cli][INFO  ]  keyrings                      : None
[ceph_deploy.mon][DEBUG ] Deploying mon, cluster ceph hosts dlw4
[ceph_deploy.mon][DEBUG ] detecting platform for host dlw4 ...
[dlw4][DEBUG ] connected to host: dlw4 
[dlw4][DEBUG ] detect platform information from remote host
[dlw4][DEBUG ] detect machine type
[dlw4][DEBUG ] find the location of an executable
[ceph_deploy.mon][INFO  ] distro info: CentOS Linux 7.2.1511 Core
[dlw4][DEBUG ] determining if provided host has same hostname in remote
[dlw4][DEBUG ] get remote short hostname
[dlw4][DEBUG ] deploying mon to dlw4
[dlw4][DEBUG ] get remote short hostname
[dlw4][DEBUG ] remote hostname: dlw4
[dlw4][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[dlw4][DEBUG ] create the mon path if it does not exist
[dlw4][DEBUG ] checking for done path: /var/lib/ceph/mon/ceph-dlw4/done
[dlw4][DEBUG ] create a done file to avoid re-doing the mon deployment
[dlw4][DEBUG ] create the init path if it does not exist
[dlw4][INFO  ] Running command: systemctl enable ceph.target
[dlw4][INFO  ] Running command: systemctl enable ceph-mon@dlw4
[dlw4][INFO  ] Running command: systemctl start ceph-mon@dlw4
[dlw4][INFO  ] Running command: ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.dlw4.asok mon_status
[dlw4][DEBUG ] ********************************************************************************
[dlw4][DEBUG ] status for monitor: mon.dlw4
[dlw4][DEBUG ] {
[dlw4][DEBUG ]   "election_epoch": 0, 
[dlw4][DEBUG ]   "extra_probe_peers": [
[dlw4][DEBUG ]     "172.16.40.11:6789/0", 
[dlw4][DEBUG ]     "172.16.40.12:6789/0", 
[dlw4][DEBUG ]     "172.16.40.13:6789/0"
[dlw4][DEBUG ]   ], 
[dlw4][DEBUG ]   "monmap": {
[dlw4][DEBUG ]     "created": "2017-08-02 10:43:08.448472", 
[dlw4][DEBUG ]     "epoch": 0, 
[dlw4][DEBUG ]     "fsid": "f4833745-d220-407b-82ea-72eb6297d435", 
[dlw4][DEBUG ]     "modified": "2017-08-02 10:43:08.448472", 
[dlw4][DEBUG ]     "mons": [
[dlw4][DEBUG ]       {
[dlw4][DEBUG ]         "addr": "172.16.40.9:6789/0", 
[dlw4][DEBUG ]         "name": "dlw4", 
[dlw4][DEBUG ]         "rank": 0
[dlw4][DEBUG ]       }, 
[dlw4][DEBUG ]       {
[dlw4][DEBUG ]         "addr": "0.0.0.0:0/1", 
[dlw4][DEBUG ]         "name": "dlw1", 
[dlw4][DEBUG ]         "rank": 1
[dlw4][DEBUG ]       }, 
[dlw4][DEBUG ]       {
[dlw4][DEBUG ]         "addr": "0.0.0.0:0/2", 
[dlw4][DEBUG ]         "name": "dlw2", 
[dlw4][DEBUG ]         "rank": 2
[dlw4][DEBUG ]       }, 
[dlw4][DEBUG ]       {
[dlw4][DEBUG ]         "addr": "0.0.0.0:0/3", 
[dlw4][DEBUG ]         "name": "dlw3", 
[dlw4][DEBUG ]         "rank": 3
[dlw4][DEBUG ]       }
[dlw4][DEBUG ]     ]
[dlw4][DEBUG ]   }, 
[dlw4][DEBUG ]   "name": "dlw4", 
[dlw4][DEBUG ]   "outside_quorum": [
[dlw4][DEBUG ]     "dlw4"
[dlw4][DEBUG ]   ], 
[dlw4][DEBUG ]   "quorum": [], 
[dlw4][DEBUG ]   "rank": 0, 
[dlw4][DEBUG ]   "state": "probing", 
[dlw4][DEBUG ]   "sync_provider": []
[dlw4][DEBUG ] }
[dlw4][DEBUG ] ********************************************************************************
[dlw4][INFO  ] monitor: mon.dlw4 is running
[dlw4][INFO  ] Running command: ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.dlw4.asok mon_status

No errors this time, so the mon appeared to have been added. Running ceph -s, however, still shows only three mons:

# ceph -s
    cluster f4833745-d220-407b-82ea-72eb6297d435
     health HEALTH_OK
     monmap e3: 3 mons at {dlw1=172.16.40.11:6789/0,dlw2=172.16.40.12:6789/0,dlw3=172.16.40.13:6789/0}
            election epoch 14, quorum 0,1,2 dlw1,dlw2,dlw3
     osdmap e26: 3 osds: 3 up, 3 in
            flags sortbitwise,require_jewel_osds
      pgmap v9697: 352 pgs, 6 pools, 45725 kB data, 20 objects
            253 MB used, 584 GB / 584 GB avail
                 352 active+clean

Switching to dlw4 to check the mon service shows that it started normally. Running mon add again gives the same result.

[root@dlw4 ceph]# systemctl status ceph-mon@`hostname`
● ceph-mon@dlw4.service - Ceph cluster monitor daemon
   Loaded: loaded (/usr/lib/systemd/system/ceph-mon@.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2017-08-02 16:24:36 CHOST; 2min 29s ago
 Main PID: 20208 (ceph-mon)
   CGroup: /system.slice/system-ceph\x2dmon.slice/ceph-mon@dlw4.service
           └─20208 /usr/bin/ceph-mon -f --cluster ceph --id dlw4 --setuser ceph --setgroup ceph

Aug 02 16:24:36 dlw4 systemd[1]: Started Ceph cluster monitor daemon.
Aug 02 16:24:36 dlw4 systemd[1]: Starting Ceph cluster monitor daemon...
Aug 02 16:24:36 dlw4 ceph-mon[20208]: starting mon.dlw4 rank -1 at 172.16.40.9:6789/0 mon_data /var/lib/ceph/mon/ceph-dlw4 fsid f4833745-d220-407b-82ea-72eb6297d435

##3.3. Check the status

Check the mon status of the Ceph cluster:

# ceph mon_status |jq
{
  "name": "dlw1",
  "rank": 0,
  "state": "leader",
  "election_epoch": 14,
  "quorum": [
    0,
    1,
    2
  ],
  "outside_quorum": [],
  "extra_probe_peers": [
    "172.16.40.9:6789/0",
    "172.16.40.12:6789/0",
    "172.16.40.13:6789/0"
  ],
  "sync_provider": [],
  "monmap": {
    "epoch": 3,
    "fsid": "f4833745-d220-407b-82ea-72eb6297d435",
    "modified": "2017-08-01 19:00:04.795921",
    "created": "2017-07-20 12:38:26.592488",
    "mons": [
      {
        "rank": 0,
        "name": "dlw1",
        "addr": "172.16.40.11:6789/0"
      },
      {
        "rank": 1,
        "name": "dlw2",
        "addr": "172.16.40.12:6789/0"
      },
      {
        "rank": 2,
        "name": "dlw3",
        "addr": "172.16.40.13:6789/0"
      }
    ]
  }
}

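> Note: jq is a formatting tool that has to be installed separately (it is available in the EPEL repository). Ceph's own -f option can also pretty-print the output: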

# ceph mon_status -f json-pretty

Check the mon status on dlw4:

[root@dlw4 ceph]# ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.dlw4.asok mon_status
{
    "name": "dlw4",
    "rank": 0,
    "state": "probing",
    "election_epoch": 0,
    "quorum": [],
    "outside_quorum": [
        "dlw4"
    ],
    "extra_probe_peers": [
        "172.16.40.11:6789\/0",
        "172.16.40.12:6789\/0",
        "172.16.40.13:6789\/0"
    ],
    "sync_provider": [],
    "monmap": {
        "epoch": 0,
        "fsid": "f4833745-d220-407b-82ea-72eb6297d435",
        "modified": "2017-08-02 10:43:08.448472",
        "created": "2017-08-02 10:43:08.448472",
        "mons": [
            {
                "rank": 0,
                "name": "dlw4",
                "addr": "172.16.40.9:6789\/0"
            },
            {
                "rank": 1,
                "name": "dlw1",
                "addr": "0.0.0.0:0\/1"
            },
            {
                "rank": 2,
                "name": "dlw2",
                "addr": "0.0.0.0:0\/2"
            },
            {
                "rank": 3,
                "name": "dlw3",
                "addr": "0.0.0.0:0\/3"
            }
        ]
    }
}

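> The asok file created when the mon service starts shows that dlw4 is in the monmap held by dlw4 itself (epoch 0, with the other three mons still at placeholder 0.0.0.0 addresses), but its state is probing. The other three mons are in the leader or peon state (the boss and the workers), while dlw4 is still probing. In other words, the mon on dlw4 is running normally but is not taking part in the cluster's mon election: it has not connected to the cluster yet.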
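
The check made here by eye (is the state leader or peon, or still probing?) can be sketched as a small shell helper. The function names `mon_state` and `in_quorum` are hypothetical, and the sed pattern is only an approximation that suits the flat "state" field in the mon_status output shown above.

```shell
#!/bin/sh
# Extract the "state" value from mon_status JSON read on stdin.
# A sed pattern is enough for this flat field; use jq for anything more involved.
mon_state() {
    sed -n 's/.*"state"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# leader and peon are the states of a mon that is in quorum.
in_quorum() {
    case "$1" in
        leader|peon) return 0 ;;
        *) return 1 ;;
    esac
}

# Demo with the two states seen above, trimmed to a single field each.
for sample in '{ "name": "dlw1", "state": "leader" }' '{ "name": "dlw4", "state": "probing" }'; do
    state=$(printf '%s\n' "$sample" | mon_state)
    if in_quorum "$state"; then
        echo "$state: in quorum"
    else
        echo "$state: not in quorum"
    fi
done
```

On a live node the input would come from the admin socket command used in this post, for example by piping `ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.dlw4.asok mon_status` into `mon_state`.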

##3.4. Check the logs

On dlw1 (172.16.40.11):

# tail -f ceph-mon.dlw1.log
2017-08-02 16:42:52.877734 7fbfc2333700  0 -- 172.16.40.11:6789/0 >> 172.16.40.9:6789/0 pipe(0x7fbfd9c1c800 sd=22 :6789 s=0 pgs=0 cs=0 l=0 c=0x7fbfd7360700).accept: got bad authorizer
2017-08-02 16:42:54.879024 7fbfc2333700  0 cephx: verify_authorizer could not decrypt ticket info: error: NSS AES final round failed: -8190
2017-08-02 16:42:54.879028 7fbfc2333700  0 mon.dlw1@0(leader) e3 ms_verify_authorizer bad authorizer from mon 172.16.40.9:6789/0
2017-08-02 16:42:54.879034 7fbfc2333700  0 -- 172.16.40.11:6789/0 >> 172.16.40.9:6789/0 pipe(0x7fbfd93c6000 sd=22 :6789 s=0 pgs=0 cs=0 l=0 c=0x7fbfd7360400).accept: got bad authorizer
2017-08-02 16:42:55.076972 7fbfc06d0700  0 cephx: verify_reply couldn't decrypt with error: error decoding block for decryption
2017-08-02 16:42:55.076981 7fbfc06d0700  0 -- 172.16.40.11:6789/0 >> 172.16.40.9:6789/0 pipe(0x7fbfd9c1b400 sd=19 :55595 s=1 pgs=0 cs=0 l=0 c=0x7fbfd8b0ab80).failed verifying authorize reply
2017-08-02 16:42:56.885648 7fbfc2333700  0 cephx: verify_authorizer could not decrypt ticket info: error: NSS AES final round failed: -8190
2017-08-02 16:42:56.885657 7fbfc2333700  0 mon.dlw1@0(leader) e3 ms_verify_authorizer bad authorizer from mon 172.16.40.9:6789/0

On dlw4 (172.16.40.9):

[root@dlw4 ceph]# tail -f ceph-mon.dlw4.log 
2017-08-02 16:43:14.890240 7fc9b6483700  0 -- 172.16.40.9:6789/0 >> 172.16.40.13:6789/0 pipe(0x7fc9cb59e800 sd=24 :40309 s=1 pgs=0 cs=0 l=0 c=0x7fc9cb3fad00).failed verifying authorize reply
2017-08-02 16:43:14.919426 7fc9b6584700  0 cephx: verify_reply couldn't decrypt with error: error decoding block for decryption
2017-08-02 16:43:14.919456 7fc9b6584700  0 -- 172.16.40.9:6789/0 >> 172.16.40.12:6789/0 pipe(0x7fc9cb59d400 sd=15 :34233 s=1 pgs=0 cs=0 l=0 c=0x7fc9cb3fab80).failed verifying authorize reply
2017-08-02 16:43:15.693119 7fc9b5b81700  0 cephx: verify_authorizer could not decrypt ticket info: error: NSS AES final round failed: -8190
2017-08-02 16:43:15.693129 7fc9b5b81700  0 mon.dlw4@0(probing) e0 ms_verify_authorizer bad authorizer from mon 172.16.40.13:6789/0

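> Comparing the logs confirms the earlier guess: on one side, the authorizer presented by dlw4 is rejected; on the other, dlw4 fails authentication when it connects to the other mons. Ceph authenticates with cephx, which relies on shared secret keys: the client and the monitor cluster each hold a copy of the client's secret key. The protocol lets both parties prove to each other that they hold the key without revealing it, so the cluster is sure the user possesses the key and the user is sure the cluster has a copy of it.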
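
To gauge how often a mon log reports such failures, the lines can be counted instead of watched with tail -f. This is a rough sketch: the function name `cephx_failures` is hypothetical, and the patterns are taken only from the log excerpts in this post, so other Ceph versions may word the messages differently.

```shell
#!/bin/sh
# Count log lines that look like cephx authentication failures.
cephx_failures() {
    grep -Ec 'bad authorizer|could not decrypt ticket info|failed verifying authorize reply' "$1"
}

# Demo on a scratch file: two failure lines and one unrelated line,
# all copied from the logs in this post.
demo_log=$(mktemp)
cat > "$demo_log" <<'EOF'
2017-08-02 16:42:54.879024 7fbfc2333700  0 cephx: verify_authorizer could not decrypt ticket info: error: NSS AES final round failed: -8190
2017-08-02 16:42:54.879028 7fbfc2333700  0 mon.dlw1@0(leader) e3 ms_verify_authorizer bad authorizer from mon 172.16.40.9:6789/0
2017-08-02 16:04:09.972410 7f0232f61640  0 pidfile_write: ignore empty --pid-file
EOF

echo "cephx failures: $(cephx_failures "$demo_log")"
```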

##3.5. Fix the key

Check the key of every mon in the cluster. dlw1, dlw2 and dlw3 all have the same keyring:

# cat keyring 
[mon.]
        key = AQDiJHBZAAAAABAAiAz+B0XamXqLSemUudvStA==
        caps mon = "allow *"

while the keyring on dlw4 is:

# cd /var/lib/ceph/mon/ceph-dlw4/
# cat keyring 
[mon.]
        key = AQCIUIBZAAAAABAASHOxkpYwK6BlD4ITbuIrkQ==
        caps mon = "allow *"

So manually edit the keyring on dlw4, replacing the key in the file with the key from dlw1, and then restart the mon service on dlw4:

[root@dlw4 ceph-dlw4]# systemctl restart ceph-mon@`hostname`
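
The comparison that exposed the mismatch can be sketched as a small helper that pulls the key out of two keyring files and compares them. The function names `keyring_key` and `same_mon_key` are hypothetical, and the demo works on scratch copies of the two keyrings shown in this post rather than on the live files under /var/lib/ceph/mon/.

```shell
#!/bin/sh
# Print the value of the first "key = ..." line in a keyring file.
keyring_key() {
    awk '$1 == "key" && $2 == "=" { print $3; exit }' "$1"
}

# Succeeds only if both keyrings carry the same non-empty key.
same_mon_key() {
    a=$(keyring_key "$1")
    b=$(keyring_key "$2")
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# Demo on scratch copies of the two keyrings shown above.
demo_dir=$(mktemp -d)
printf '[mon.]\n\tkey = AQDiJHBZAAAAABAAiAz+B0XamXqLSemUudvStA==\n\tcaps mon = "allow *"\n' > "$demo_dir/dlw1.keyring"
printf '[mon.]\n\tkey = AQCIUIBZAAAAABAASHOxkpYwK6BlD4ITbuIrkQ==\n\tcaps mon = "allow *"\n' > "$demo_dir/dlw4.keyring"

if same_mon_key "$demo_dir/dlw1.keyring" "$demo_dir/dlw4.keyring"; then
    echo "keys match"
else
    echo "keys differ"
fi
```

The manual add-monitor procedure in the Ceph documentation retrieves the cluster's mon key with `ceph auth get mon. -o <file>`, which is another way to obtain the reference copy instead of reading it off dlw1.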

##3.6. Check the status

# ceph -s
    cluster f4833745-d220-407b-82ea-72eb6297d435
     health HEALTH_OK
     monmap e4: 4 mons at {dlw1=172.16.40.11:6789/0,dlw2=172.16.40.12:6789/0,dlw3=172.16.40.13:6789/0,dlw4=172.16.40.9:6789/0}
            election epoch 16, quorum 0,1,2,3 dlw4,dlw1,dlw2,dlw3
     osdmap e26: 3 osds: 3 up, 3 in
            flags sortbitwise,require_jewel_osds
      pgmap v9700: 352 pgs, 6 pools, 45725 kB data, 20 objects
            253 MB used, 584 GB / 584 GB avail
                 352 active+clean

There are now four mons.

[root@dlw4 ceph-dlw4]# ceph quorum_status  -f json-pretty

{
    "election_epoch": 16,
    "quorum": [
        0,
        1,
        2,
        3
    ],
    "quorum_names": [
        "dlw4",
        "dlw1",
        "dlw2",
        "dlw3"
    ],
    "quorum_leader_name": "dlw4",
    "monmap": {
        "epoch": 4,
        "fsid": "f4833745-d220-407b-82ea-72eb6297d435",
        "modified": "2017-08-02 16:54:32.549853",
        "created": "2017-07-20 12:38:26.592488",
        "mons": [
            {
                "rank": 0,
                "name": "dlw4",
                "addr": "172.16.40.9:6789\/0"
            },
            {
                "rank": 1,
                "name": "dlw1",
                "addr": "172.16.40.11:6789\/0"
            },
            {
                "rank": 2,
                "name": "dlw2",
                "addr": "172.16.40.12:6789\/0"
            },
            {
                "rank": 3,
                "name": "dlw3",
                "addr": "172.16.40.13:6789\/0"
            }
        ]
    }
}

#4. Summary

Adding this mon ran into two problems in total.

> The first problem: public_network is required when adding a mon.

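> The second problem: because the mon was first added without public_network, a keyring different from the original cluster's was generated. The mons could not authenticate each other via cephx, so the new mon could not join the cluster's mon election.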
