NetApp Storage Solution and Health-Check Commands

2018/03/07 11:03

I. MCC Overview

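Clustered MetroCluster (MCC) is the active-active storage solution provided by NetApp Data ONTAP. The original design stretched a single dual-controller FAS/V-Series system between two data centers to form a geographically separated HA pair, with only one controller node at each site. The two sites were linked through additional FC-VI cluster adapters, and the SAS disk shelves were connected across data centers through FibreBridge SAS-to-FC bridges. Within 500 meters (for example, in the same machine room), the sites are connected directly through Fibre Channel switches; beyond 500 meters (up to 100 km), they are connected through Fibre Channel switches and DWDM.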

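MetroCluster later evolved from this architecture. A dual-controller FAS/V array is placed at each of sites A and B. Controller A of array A pairs with controller A of array B, and controller B of array A pairs with controller B of array B, each pair forming a cluster. This makes full use of the resources in both data centers, which serve storage at the same time. However, the two controllers within the same array do not form a cluster. If either controller node in a cross-site pair fails, hosts at the failed site must access the remote controller node. If both nodes of a cross-site pair fail at the same time, service is interrupted.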

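NetApp Data ONTAP 8.3 introduced a four-node active-active solution that supports distances of up to 200 km. The four-node MetroCluster configuration starts with two HA pairs, each forming a local cluster, and then joins the two clusters into a four-node configuration. Write logs are held in each controller's NVRAM, and NVRAM mirrors the log entries that have not yet been written to disk. If a node fails, its HA partner can take over; if a whole site fails, the HA pair at the remote site can take over. When the log reaches a certain watermark, or when a system operation triggers a flush to disk, SyncMirror writes the data to the disks at both sites simultaneously. As a result, if the disks at one site fail, the disks at the other site can still serve data, which enables site failover without interrupting service.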
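To make the write path above concrete, here is a toy Python model written under simplifying assumptions. The class names, the `watermark` entry count and the two-plex layout are hypothetical illustrations, not NetApp interfaces. Each write is logged in the owning node's NVRAM and mirrored to its HA partner and to its DR partner at the other site. Once the log reaches the watermark, every pending entry is written to both plexes, which stands in for SyncMirror's dual write.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A controller with an NVRAM log of writes not yet flushed to disk."""
    name: str
    nvram: list = field(default_factory=list)


@dataclass
class Plex:
    """One SyncMirror copy of an aggregate, located at one site."""
    site: str
    blocks: dict = field(default_factory=dict)


class MirroredAggregate:
    """Toy model: NVRAM log mirroring plus a flush to both plexes."""

    def __init__(self, owner: Node, ha_partner: Node, dr_partner: Node, watermark: int = 4):
        self.owner = owner
        self.log_copies = [owner, ha_partner, dr_partner]  # nodes holding a copy of the log
        self.plexes = [Plex("site A"), Plex("site B")]
        self.watermark = watermark  # hypothetical flush threshold (number of log entries)

    def write(self, key: str, value: int) -> None:
        # Log the write locally and mirror it to the HA partner and the DR partner
        for node in self.log_copies:
            node.nvram.append((key, value))
        if len(self.owner.nvram) >= self.watermark:
            self.flush()

    def flush(self) -> None:
        # Write every pending entry to both plexes, i.e. to disks at both sites
        for key, value in self.owner.nvram:
            for plex in self.plexes:
                plex.blocks[key] = value
        # Once the data is on disk at both sites, the log copies can be discarded
        for node in self.log_copies:
            node.nvram.clear()


a1 = Node("cluster1-01")
a2 = Node("cluster1-02")
b1 = Node("cluster1_dr-01")
aggr = MirroredAggregate(a1, a2, b1, watermark=2)
aggr.write("x", 1)  # logged on all three nodes, not yet on disk
aggr.write("y", 2)  # watermark reached: flushed to both plexes
print(aggr.plexes[0].blocks, aggr.plexes[1].blocks, a1.nvram)
# {'x': 1, 'y': 2} {'x': 1, 'y': 2} []
```

The model ignores timing, failures and real consistency points. It only shows why a surviving partner can replay un-flushed writes: every partner already holds a copy of the log.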

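MetroCluster protects data with mirroring and clustering at two separate locations: each cluster synchronously mirrors its data and its Storage Virtual Machine (SVM) configuration to the other cluster. When a disaster strikes one site, an administrator can activate the remote SVMs and resume service at the other site. In addition, the nodes of each cluster are configured as a local HA pair, which provides local failover.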
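The two layers of protection described above can be summarized as a simple decision rule. The sketch below is a deliberate simplification written for illustration. The `Action` names and the node-count inputs are hypothetical, and a real switchover is an administrator action with its own preconditions, not an automatic function call.

```python
from enum import Enum


class Action(Enum):
    NONE = "no action needed"
    HA_TAKEOVER = "local HA partner takes over"
    SWITCHOVER = "administrator switches over to the remote cluster"
    OUTAGE = "no surviving node can serve the data"


def recovery_action(local_nodes_up: int, remote_nodes_up: int) -> Action:
    """Pick the recovery path for one site's workload in a four-node MCC.

    local_nodes_up / remote_nodes_up: healthy nodes (0-2) in the local and remote HA pairs.
    """
    if local_nodes_up == 2:
        return Action.NONE
    if local_nodes_up == 1:
        # A single node failure is absorbed by the local HA pair
        return Action.HA_TAKEOVER
    if remote_nodes_up > 0:
        # The whole local HA pair is gone: serve the mirrored SVMs from the other site
        return Action.SWITCHOVER
    return Action.OUTAGE


for local, remote in [(2, 2), (1, 2), (0, 2), (0, 0)]:
    print(local, remote, recovery_action(local, remote).value)
```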

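NetApp MetroCluster is implemented by combining NetApp SyncMirror with ClusterRemote and the controllers' Clustered Failover:

• Clustered Failover – provides high-availability failover between the primary storage and the disaster-recovery storage; the takeover decision is made by an administrator with a single command.

• SyncMirror – maintains an up-to-date copy of the data on the remote storage; after a takeover, the data can be accessed from the remote storage alone.

• ClusterRemote – provides the management mechanism for determining that a disaster has occurred and initiating takeover by the remote storage.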

II. Common MCC Health-Check Commands

1. Check overall system health

cluster1::> system health status show
Status
---------------
ok

2. Check cluster status

cluster1::> cluster show              
Node                  Health  Eligibility
--------------------- ------- ------------
cluster1-01           true    true
cluster1-02           true    true
2 entries were displayed.
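During a routine inspection it helps to check output like this automatically rather than by eye. The following is a minimal Python sketch. It assumes the text of `cluster show` has already been captured (for example over SSH) into a string, and `unhealthy_nodes` is a hypothetical helper name.

```python
def unhealthy_nodes(output: str) -> list:
    """Return nodes whose Health or Eligibility column in `cluster show` is not 'true'."""
    bad = []
    for line in output.splitlines():
        parts = line.split()
        # Data rows have exactly three columns: Node, Health, Eligibility
        if len(parts) == 3 and parts[1] in ("true", "false"):
            if parts[1] != "true" or parts[2] != "true":
                bad.append(parts[0])
    return bad


sample = """\
Node                  Health  Eligibility
--------------------- ------- ------------
cluster1-01           true    true
cluster1-02           true    true
2 entries were displayed."""
print(unhealthy_nodes(sample))  # []
```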

3. Check cluster statistics

cluster1::> cluster statistics show
         Counter             Value         Delta
---------------- ----------------- -------------
       CPU Busy:                0%             -
     Operations:
          Total:                 0             -
            NFS:                 0             -
           CIFS:                 0             -
   Data Network:
           Busy:                0%             -
       Received:            5.78GB             -
           Sent:            13.7GB             -
Cluster Network:
           Busy:                0%             -
       Received:             967KB             -
           Sent:             979KB             -
   Storage Disk:
           Read:            6.38PB             -
          Write:            6.26PB             -

4. View aggregate and RAID information

cluster1::> aggr show

Aggregate     Size Available Used% State   #Vols  Nodes            RAID Status
--------- -------- --------- ----- ------- ------ ---------------- ------------
aggr0_A1   953.8GB   247.3GB   74% online       1 cluster1-01      raid4,
                                                                   mirrored,
                                                                   normal
aggr0_A2   953.8GB   247.3GB   74% online       1 cluster1-02      raid4,
                                                                   mirrored,
                                                                   normal
aggr_data_A1 
           68.93TB   16.04TB   77% online      32 cluster1-01      mixed_raid_
                                                                   type,
                                                                   mirrored,
                                                                   hybrid,
                                                                   normal
aggr_data_A2 
           68.93TB   14.77TB   79% online      31 cluster1-02      mixed_raid_
                                                                   type,
                                                                   mirrored,
                                                                   hybrid,
                                                                   normal
4 entries were displayed.
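`aggr show` wraps long names and the RAID Status column across several lines, so a simple line-by-line check does not work. The sketch below assumes the output was captured as text. A line that starts in column 0 begins a new aggregate, and an indented line continues the previous one. The helper names and the 85% default threshold are hypothetical choices, not NetApp recommendations.

```python
import re


def parse_aggr_show(output: str) -> list:
    """Rebuild the wrapped rows of `aggr show` into one dict per aggregate."""
    records, in_table = [], False
    for line in output.splitlines():
        if line.startswith("---"):
            in_table = True
            continue
        if not in_table or not line.strip():
            continue
        if re.match(r"\d+ entr(y was|ies were) displayed", line):
            break
        if not line[0].isspace():
            records.append(line.split())      # a new aggregate starts in column 0
        elif records:
            records[-1].extend(line.split())  # an indented line continues the row above
    aggrs = []
    for tokens in records:
        name, _size, _avail, used, state, _vols, node, *raid = tokens
        aggrs.append({
            "aggregate": name,
            "used_pct": int(used.rstrip("%")),
            "state": state,
            "node": node,
            # Pieces split by the wrap, e.g. "mixed_raid_" + "type,", are glued back together
            "raid_status": "".join(raid),
        })
    return aggrs


def aggr_warnings(output: str, max_used: int = 85) -> list:
    """Flag aggregates that are not online, at or above max_used percent, or not RAID-normal."""
    warnings = []
    for a in parse_aggr_show(output):
        if a["state"] != "online":
            warnings.append(f"{a['aggregate']}: state {a['state']}")
        if a["used_pct"] >= max_used:
            warnings.append(f"{a['aggregate']}: {a['used_pct']}% used")
        if "normal" not in a["raid_status"].split(","):
            warnings.append(f"{a['aggregate']}: RAID status {a['raid_status']}")
    return warnings
```

In the output above, both data aggregates are below the hypothetical 85% threshold (77% and 79%), and every RAID status ends in `normal`, so `aggr_warnings` would return an empty list.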

5. View node information

cluster1::> node show
Node      Health Eligibility Uptime        Model       Owner    Location  
--------- ------ ----------- ------------- ----------- -------- ---------------
cluster1-01 
          true   true        
                            369 days 19:12 FAS8040              gz_idc
cluster1-02 
          true   true        
                            369 days 19:23 FAS8040              gz_idc
2 entries were displayed.

6. View the ONTAP version

cluster1::> version
NetApp Release 8.3.2P9: Fri Jan 06 05:54:05 UTC 2017

7. View serial numbers and licenses

cluster1::> system license show

Serial Number: 1-80-023992
Owner: cluster1
Package           Type    Description           Expiration
----------------- ------- --------------------- --------------------
Base              license Cluster Base License  -

Serial Number: 1-81-0000000000000451515******
Package           Type    Description           Expiration
----------------- ------- --------------------- --------------------
NFS               license NFS License           -
iSCSI             license iSCSI License         -

Serial Number: 1-81-0000000000000451515******
Owner: cluster1-02
Package           Type    Description           Expiration
----------------- ------- --------------------- --------------------
NFS               license NFS License           -
iSCSI             license iSCSI License         -
5 entries were displayed.

8. View subsystem health

cluster1::> system health subsystem show
Subsystem         Health
----------------- ------------------
SAS-connect       ok
Environment       ok
Memory            ok
Service-Processor ok
Switch-Health     ok
CIFS-NDO          ok
Motherboard       ok
IO                ok
MetroCluster      ok
MetroCluster_Node ok
FHM-Switch        ok
FHM-Bridge        ok
12 entries were displayed.
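A quick way to act on this table is to report only the subsystems whose Health value is anything other than `ok`. The minimal sketch below assumes the captured text of `system health subsystem show` is available as a string and that each Health value is a single word. `degraded_subsystems` is a hypothetical helper name.

```python
def degraded_subsystems(output: str) -> dict:
    """Map each subsystem whose Health column is not 'ok' to its reported value."""
    bad, in_table = {}, False
    for line in output.splitlines():
        if line.startswith("---"):
            in_table = True
            continue
        parts = line.split()
        # Table rows have two tokens (Subsystem, Health); the trailing
        # "12 entries were displayed." line has four and is skipped
        if in_table and len(parts) == 2 and parts[1] != "ok":
            bad[parts[0]] = parts[1]
    return bad
```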

9. View MCC cluster status and node status

cluster1::> metrocluster show

Configuration: fabric

Cluster                        Configuration State    Mode
------------------------------ ---------------------- ------------------------
 Local: cluster1               configured             normal
Remote: cluster1_dr            configured             normal

cluster1::> metrocluster node show
DR                               Configuration  DR
Group Cluster Node               State          Mirroring Mode
----- ------- ------------------ -------------- --------- --------------------
1     cluster1
              cluster1-01        configured     enabled   normal
              cluster1-02        configured     enabled   normal
      cluster1_dr
              cluster1_dr-01     configured     enabled   normal
              cluster1_dr-02     configured     enabled   normal
4 entries were displayed.
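In the healthy output above, every node is `configured`, mirroring is `enabled`, and the DR mode is `normal`. The sketch below flags any node that deviates from that triple. It assumes each of those values is a single word. It also assumes a node row ends with the node name followed by the three status columns, whether or not the DR group and cluster columns share the line. `mcc_node_problems` is a hypothetical name.

```python
def mcc_node_problems(output: str) -> list:
    """List nodes in `metrocluster node show` that are not configured/enabled/normal."""
    problems = []
    for line in output.splitlines():
        parts = line.split()
        # Node rows end with: <node> <configuration state> <mirroring> <mode>;
        # a DR group's first row may also carry the group and cluster columns
        if len(parts) >= 4 and parts[-2] in ("enabled", "disabled"):
            node, state, mirroring, mode = parts[-4:]
            if (state, mirroring, mode) != ("configured", "enabled", "normal"):
                problems.append(f"{node}: {state}/{mirroring}/{mode}")
    return problems
```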

10. View controller status

cluster1::> system controller show
Controller Name           System ID     Serial Number     Model    Status      
------------------------- ------------- ----------------- -------- ----------- 
cluster1-01               536964819     451515******      FAS8040  ok
cluster1-02               536961600     451515******      FAS8040  ok
2 entries were displayed.

11. View failed (broken) disks

cluster1::> storage disk show -broken 
There are no entries matching your query.

12. View spare disks

cluster1::> storage disk show -spare  
Original Owner: cluster1-01                                           
  Checksum Compatibility: block
                                                            Usable Physical
    Disk            HA Shelf Bay Chan   Pool  Type    RPM     Size     Size Owner
    --------------- ------------ ---- ------ ----- ------ -------- -------- --------
    1.30.11         3a    30  11    A  Pool0   SAS  10000   1.09TB   1.09TB cluster1-01
    1.30.13         3a    30  13    A  Pool0   SAS  10000   1.09TB   1.09TB cluster1-01
    1.31.4          3a    31   4    A  Pool0   SAS  10000   1.09TB   1.09TB cluster1-01
    1.32.20         4b    32  20    B  Pool0   SAS  10000   1.09TB   1.09TB cluster1-01
    1.32.23         3a    32  23    A  Pool0   SAS  10000   1.09TB   1.09TB cluster1-01
    1.33.0          3a    33   0    A  Pool0   SAS  10000   1.09TB   1.09TB cluster1-01
    1.33.1          3a    33   1    A  Pool0   SAS  10000   1.09TB   1.09TB cluster1-01
    1.33.10         4b    33  10    B  Pool0   SAS  10000   1.09TB   1.09TB cluster1-01
    2.42.22         3a    42  22    A  Pool1   SAS  10000   1.09TB   1.09TB cluster1-01
    2.42.23         4b    42  23    B  Pool1   SAS  10000   1.09TB   1.09TB cluster1-01
    2.43.2          4b    43   2    B  Pool1   SAS  10000   1.09TB   1.09TB cluster1-01
    2.43.22         3b    43  22    A  Pool1   SAS  10000   1.09TB   1.09TB cluster1-01
    2.43.23         4b    43  23    B  Pool1   SAS  10000   1.09TB   1.09TB cluster1-01
    3.11.21         4b    11  21    B  Pool0   SSD      -  372.4GB  372.6GB cluster1-01
    4.20.21         3a    20  21    A  Pool1   SSD      -  372.4GB  372.6GB cluster1-01
    4.21.14         3a    21  14    A  Pool1   SAS  10000   1.09TB   1.09TB cluster1-01
Original Owner: cluster1-02
  Checksum Compatibility: block
                                                            Usable Physical
    Disk            HA Shelf Bay Chan   Pool  Type    RPM     Size     Size Owner
    --------------- ------------ ---- ------ ----- ------ -------- -------- --------
    2.44.23         3b    44  23    A  Pool1   SAS  10000   1.09TB   1.09TB cluster1-02
    3.12.21         4a    12  21    B  Pool0   SSD      -  372.4GB  372.6GB cluster1-02
    4.23.21         3b    23  21    A  Pool1   SSD      -  372.4GB  372.6GB cluster1-02
    5.60.23         3b    60  23    B  Pool1   SAS  10000   1.09TB   1.09TB cluster1-02
20 entries were displayed.
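In a MetroCluster, Pool0 holds the disks for a node's local plex and Pool1 holds the disks for its mirrored plex at the other site. A useful check is therefore whether each node still has spares of each disk type in each pool. The following sketch counts the rows of `storage disk show -spare` by (owner, pool, type). It assumes the 11-column layout shown above, and the helper names are hypothetical.

```python
import re
from collections import Counter


def spare_counts(output: str) -> Counter:
    """Count spare disks per (owner, pool, disk type) in `storage disk show -spare`."""
    counts = Counter()
    for line in output.splitlines():
        t = line.split()
        # Disk, HA, Shelf, Bay, Chan, Pool, Type, RPM, Usable, Physical, Owner;
        # the exact "PoolN" match skips the column header row
        if len(t) == 11 and re.fullmatch(r"Pool\d", t[5]):
            counts[(t[10], t[5], t[6])] += 1
    return counts


def missing_spares(output: str, owners, pools=("Pool0", "Pool1"), disk_types=("SAS", "SSD")) -> list:
    """Return every (owner, pool, type) combination that has no spare at all."""
    counts = spare_counts(output)
    return [(o, p, d) for o in owners for p in pools for d in disk_types
            if counts[(o, p, d)] == 0]
```

Applied to the output above with owners `cluster1-01` and `cluster1-02`, `missing_spares` returns only `("cluster1-02", "Pool0", "SAS")`. cluster1-02 lists SAS spares in Pool1 and SSD spares in both pools, but no Pool0 SAS spare. Whether that matters depends on how its aggregates are laid out.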

13. Check the SAS bridges (FibreBridge) for faults

cluster1::> storage bridge show
                                       Is        Monitor
Bridge                   Symbolic Name Monitored Status  Vendor Model                 Bridge WWN
------------------------ ------------- --------- ------- ------ --------------------- ----------------
ATTO_10.0.15.17          BRIDGE_B_1
                                       true      ok      Atto   FibreBridge 6500N     2000001086627bc0
ATTO_10.0.15.18          BRIDGE_B_2
                                       true      ok      Atto   FibreBridge 6500N     2000001086630f0e
ATTO_10.0.15.19          BRIDGE_B_3
                                       true      ok      Atto   FibreBridge 6500N     2000001086630edc
ATTO_10.0.15.20          BRIDGE_B_4
                                       true      ok      Atto   FibreBridge 6500N     2000001086630ed2
ATTO_10.0.15.6           BRIDGE_A_1
                                       true      ok      Atto   FibreBridge 6500N     2000001086630eb4
ATTO_10.0.15.7           BRIDGE_A_2
                                       true      ok      Atto   FibreBridge 6500N     2000001086630efa
ATTO_10.0.15.8           BRIDGE_A_3
                                       true      ok      Atto   FibreBridge 6500N     2000001086630f18
ATTO_10.0.15.9           BRIDGE_A_4
                                       true      ok      Atto   FibreBridge 6500N     2000001086630ef0
ATTO_FibreBridge6500N_10 -
                                       false     -       Atto   FibreBridge6500N      200000108663e514
ATTO_FibreBridge6500N_11 -
                                       false     -       Atto   FibreBridge6500N      200000108663e3f2
ATTO_FibreBridge6500N_12 -
                                       false     -       Atto   FibreBridge6500N      200000108663e488
ATTO_FibreBridge6500N_13 -
                                       false     -       Atto   FibreBridge6500N      20000010866114ec
ATTO_FibreBridge6500N_14 -
                                       false     -       Atto   FibreBridge6500N      2000001086627bc0
ATTO_FibreBridge6500N_7  -
                                       false     -       Atto   FibreBridge6500N      2000001086630e96
ATTO_FibreBridge6500N_9  -
                                       false     -       Atto   FibreBridge6500N      200000108663e4c4
15 entries were displayed.
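Each bridge in this output occupies two lines: the name and symbolic name, then an indented line with the monitoring status, vendor, model and WWN. The sketch below rebuilds those pairs. It reports monitored bridges whose status is not `ok` and any WWN listed under more than one bridge name. The helper names are hypothetical, and the parser assumes the two-line layout shown above.

```python
from collections import defaultdict


def parse_bridges(output: str) -> list:
    """Pair each bridge name line of `storage bridge show` with its detail line."""
    bridges, name, in_table = [], None, False
    for line in output.splitlines():
        if line.startswith("---"):
            in_table = True
            continue
        if not in_table or not line.strip() or "entries were displayed" in line:
            continue
        if not line[0].isspace():
            name = line.split()[0]  # bridge name; the symbolic name follows it
        elif name is not None:
            t = line.split()
            # Is Monitored, Monitor Status, Vendor, Model (one or two words), Bridge WWN
            bridges.append({"bridge": name, "monitored": t[0] == "true",
                            "status": t[1], "wwn": t[-1]})
            name = None
    return bridges


def bridge_issues(output: str) -> list:
    bridges = parse_bridges(output)
    issues = [f"{b['bridge']}: status {b['status']}"
              for b in bridges if b["monitored"] and b["status"] != "ok"]
    by_wwn = defaultdict(list)
    for b in bridges:
        by_wwn[b["wwn"]].append(b["bridge"])
    issues += [f"WWN {wwn} listed as {', '.join(names)}"
               for wwn, names in by_wwn.items() if len(names) > 1]
    return issues
```

In the output above, all eight monitored bridges report `ok`. The duplicate check does flag WWN `2000001086627bc0`, which appears under both `ATTO_10.0.15.17` (BRIDGE_B_1) and the unmonitored entry `ATTO_FibreBridge6500N_14`.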

14. Check the Fibre Channel switches for faults

cluster1::> storage switch show
                      Symbolic                                Is        Monitor
Switch                Name     Vendor  Model Switch WWN       Monitored Status
--------------------- -------- ------- ----- ---------------- --------- -------
Brocade_10.0.15.10
                      SW_A_1
                               Brocade Brocade6505
                                             100050eb1a88327f true      ok
Brocade_10.0.15.11
                      SW_A_2
                               Brocade Brocade6505
                                             100050eb1a881582 true      ok
Brocade_10.0.15.21
                      SW_B_3
                               Brocade Brocade6505
                                             100050eb1a882f69 true      ok
Brocade_10.0.15.22
                      SW_B_4
                               Brocade Brocade6505
                                             100050eb1a881522 true      ok
4 entries were displayed.

15. View storage failover status

cluster1::> storage failover show 
                              Takeover          
Node           Partner        Possible State Description  
-------------- -------------- -------- -------------------------------------
cluster1-01    cluster1-02    true     Connected to cluster1-02
cluster1-02    cluster1-01    true     Connected to cluster1-01
2 entries were displayed.
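The State Description column of `storage failover show` contains spaces, so splitting on whitespace would break it apart. A more robust approach is to use the dash separator line for column positions: each run of dashes marks one column's character span. The following sketch assumes that layout, and `parse_table` and `takeover_not_possible` are hypothetical helpers. Column names are taken from the line directly above the dashes, so the two-line "Takeover / Possible" header becomes the key `Possible`.

```python
import re


def parse_table(output: str) -> list:
    """Split a fixed-width ONTAP table into dicts, using the dash line for column spans."""
    lines = output.splitlines()
    sep = next(i for i, line in enumerate(lines) if line.startswith("---"))
    # Each run of dashes marks one column; record its [start, end) character span
    spans = [[m.start(), m.end()] for m in re.finditer(r"-+", lines[sep])]
    spans[-1][1] = None  # the last column may run past its dashes
    names = [lines[sep - 1][start:end].strip() for start, end in spans]
    rows = []
    for line in lines[sep + 1:]:
        if not line.strip() or re.match(r"\d+ entr(y was|ies were) displayed", line):
            break
        rows.append({name: line[start:end].strip() for name, (start, end) in zip(names, spans)})
    return rows


def takeover_not_possible(output: str) -> list:
    """Nodes whose 'Takeover Possible' column in `storage failover show` is not true."""
    return [row["Node"] for row in parse_table(output) if row["Possible"] != "true"]
```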

16. View critical and error event logs

cluster1::> event log show -severity critical 
There are no entries matching your query.

cluster1::> event log show -severity error
Time                Node             Severity      Event
------------------- ---------------- ------------- ---------------------------
3/6/2018 02:28:30   cluster1-02      ERROR         asup.post.drop: AutoSupport message (HA Group Notification from cluster1-02 (MANAGEMENT_LOG) INFO) for host (0) was not posted to NetApp. The system will drop the message.
3/6/2018 01:28:18   cluster1-02      ERROR         asup.post.drop: AutoSupport message (HA Group Notification from cluster1-02 (PERFORMANCE DATA) INFO) for host (0) was not posted to NetApp. The system will drop the message.
3/6/2018 00:00:07   cluster1-02      ERROR         mgmtgwd.certificate.expired: A digital certificate with Fully Qualified Domain Name (FQDN) cluster1, Serial Number 5589765F, Certificate Authority 'cluster1' and type server for Vserver cluster1 has expired.
3/6/2018 00:00:07   cluster1-02      ERROR         mgmtgwd.certificate.expired: A digital certificate with Fully Qualified Domain Name (FQDN) UC_SVM2, Serial Number 55A03966, Certificate Authority 'SVM2' and type server for Vserver SVM2 has expired.
3/6/2018 00:00:07   cluster1-02      ERROR         mgmtgwd.certificate.expired: A digital certificate with Fully Qualified Domain Name (FQDN) UC_SVM, Serial Number 559FFD76, Certificate Authority 'SVM' and type server for Vserver SVM has expired.
3/6/2018 00:00:07   cluster1-02      ERROR         mgmtgwd.certificate.expired: A digital certificate with Fully Qualified Domain Name (FQDN) UCS_SVM_DR, Serial Number 545845C16E278, Certificate Authority 'SVM_DR' and type server for Vserver SVM_DR-mc has expired.
3/6/2018 00:00:07   cluster1-02      ERROR         mgmtgwd.certificate.expired: A digital certificate with Fully Qualified Domain Name (FQDN) UCS_SVM2_DR, Serial Number 545845A7B01FA, Certificate Authority 'SVM2_DR' and type server for Vserver SVM2_DR-mc has expired.
7 entries were displayed.
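When the error log is long, it helps to group entries by event name before reading individual messages. The sketch below assumes each entry sits on one line, as in the capture above: date, time, node, severity, then `event.name: message`. The regular expression and helper names are hypothetical, not part of ONTAP.

```python
import re
from collections import Counter

# Time (date + clock), node, severity, then "event.name: message"
EVENT_RE = re.compile(r"^(\S+ \S+)\s+(\S+)\s+([A-Z]+)\s+([\w.]+):\s*(.*)$")


def summarize_events(output: str) -> Counter:
    """Count `event log show` entries per event name."""
    counts = Counter()
    for line in output.splitlines():
        match = EVENT_RE.match(line)
        if match:
            counts[match.group(4)] += 1
    return counts


def expired_cert_vservers(output: str) -> list:
    """Vservers named in mgmtgwd.certificate.expired messages."""
    return re.findall(r"for Vserver (\S+) has expired", output)
```

Run against the log above, `summarize_events` counts `asup.post.drop` twice and `mgmtgwd.certificate.expired` five times. `expired_cert_vservers` returns `cluster1`, `SVM2`, `SVM`, `SVM_DR-mc` and `SVM2_DR-mc`. In other words, the errors come from AutoSupport delivery and expired certificates rather than from the storage hardware.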

17. View the status of volumes on a given aggregate

cluster1::> vol show -aggregate aggr_data_A1

18. View LUN information and LUN details

cluster1::> lun show
cluster1::> lun show -v

19. View igroup (mapping) information and details

cluster1::> igroup show
cluster1::> igroup show -v

20. View LUN mappings

cluster1::> lun show -m

21. Enter a node's nodeshell

cluster1::> run -node cluster1-01 
Type 'exit' or 'Ctrl-D' to return to the CLI
cluster1-01>

22. View spare disks from the nodeshell

cluster1-01> vol status -s

Local spares

Pool1 spare disks

RAID Disk       Device                  HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)
---------       ------                  ------------- ---- ---- ---- ----- --------------    --------------
Spare disks for block checksum
spare           SW_B_3:6.126L41         3a    21  14  FC:A   1   SAS 10000 1142352/2339537408 1144641/2344225968 (not zeroed)
spare           SW_B_3:7.126L75         3a    42  22  FC:A   1   SAS 10000 1142352/2339537408 1144641/2344225968 
spare           SW_B_3:7.126L101        3b    43  22  FC:A   1   SAS 10000 1142352/2339537408 1144641/2344225968 
spare           SW_B_4:7.126L76         4b    42  23  FC:B   1   SAS 10000 1142352/2339537408 1144641/2344225968 
spare           SW_B_4:7.126L29         4b    43  2   FC:B   1   SAS 10000 1142352/2339537408 1144641/2344225968 
spare           SW_B_4:7.126L50         4b    43  23  FC:B   1   SAS 10000 1142352/2339537408 1144641/2344225968 
spare           SW_B_3:6.126L22         3a    20  21  FC:A   1   SSD   N/A 381304/780910592  381554/781422768 

Pool0 spare disks

RAID Disk       Device                  HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)
---------       ------                  ------------- ---- ---- ---- ----- --------------    --------------
Spare disks for block checksum
spare           SW_A_1:7.126L12         3a    30  11  FC:A   0   SAS 10000 1142352/2339537408 1144641/2344225968 
spare           SW_A_1:7.126L14         3a    30  13  FC:A   0   SAS 10000 1142352/2339537408 1144641/2344225968 
spare           SW_A_1:7.126L31         3a    31  4   FC:A   0   SAS 10000 1142352/2339537408 1144641/2344225968 
spare           SW_A_1:7.126L76         3a    32  23  FC:A   0   SAS 10000 1142352/2339537408 1144641/2344225968 
spare           SW_A_1:7.126L79         3a    33  0   FC:A   0   SAS 10000 1142352/2339537408 1144641/2344225968 
spare           SW_A_1:7.126L80         3a    33  1   FC:A   0   SAS 10000 1142352/2339537408 1144641/2344225968 
spare           SW_A_2:7.126L73         4b    32  20  FC:B   0   SAS 10000 1142352/2339537408 1144641/2344225968 
spare           SW_A_2:7.126L37         4b    33  10  FC:B   0   SAS 10000 1142352/2339537408 1144641/2344225968 
spare           SW_A_2:6.126L74         4b    11  21  FC:B   0   SSD   N/A 381304/780910592  381554/781422768
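The nodeshell listing marks spares that still need zeroing with `(not zeroed)`, as on the first Pool1 disk above. Such a spare typically has to be zeroed before it can be added to an aggregate, so it is worth listing. The sketch below parses the `spare` rows under the column layout shown above. The helper names are hypothetical.

```python
def parse_nodeshell_spares(output: str) -> list:
    """Parse the 'spare' rows printed by the nodeshell command `vol status -s`."""
    spares = []
    for line in output.splitlines():
        t = line.split()
        if not t or t[0] != "spare":  # skips headers such as "Spare disks for block checksum"
            continue
        spares.append({
            "device": t[1],
            "pool": t[6],
            "type": t[7],
            "used_mb": int(t[9].split("/")[0]),  # Used (MB/blks): keep the MB part
            "zeroed": "(not zeroed)" not in line,
        })
    return spares


def unzeroed_spares(output: str) -> list:
    return [s["device"] for s in parse_nodeshell_spares(output) if not s["zeroed"]]
```

Against the output above, `unzeroed_spares` returns only `SW_B_3:6.126L41`.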

23. View failed disks from the nodeshell

cluster1-01> vol status -f

Broken disks (empty)

24. Show disks without ownership (unassigned disks)

cluster1-01> disk show -n

disk show : No unassigned disks

25. Assign disk ownership (commonly used after replacing a disk)

cluster1-01> disk assign all

26. View location information for all disks

cluster1-01> storage show disk -p