drainer 异常宕机原因分析

原创
01/11 00:00
阅读数 17

作者: Raymond 原文来源:https://tidb.net/blog/a8f3e65c

一、测试背景

最近在笔者个人的测试环境中突然发现一个drainer节点宕机了,且无法拉起,通过查看drainer的日志的关键字

table xxx not found,于是笔者猜测这应该和tidb节点的ddl变更有关系,于是看了下ddl的变更历史记录发现是对某个表做了交换分区的处理,难道是drainer 不兼容交换分区功能导致?通过查阅官方文档https://docs.pingcap.com/zh/tidb/v6.5/basic-features#分区 发现交换分区是在6.5的版本中正式引入的,恰巧笔者的环境也是6.5.3,所以笔者猜测还真有可能是交换分区导致的,于是带着这个好奇心重新测试了下,在这里简单说明下交换分区的作用,交换分区的作用是交换分区和非分区表,类似于rename table,具体的说明可以查阅官方文档https://docs.pingcap.com/zh/tidb/v6.5/partitioned-table#分区管理

二、测试过程

测试版本:v6.5.3

测试过程:

1.新建1个分区表e

表结构来源于
https://dev.mysql.com/doc/refman/8.0/en/partitioning-management-exchange.html
CREATE TABLE e (
    id INT NOT NULL,
    fname VARCHAR(30),
    lname VARCHAR(30)
)
    PARTITION BY RANGE (id) (
        PARTITION p0 VALUES LESS THAN (50),
        PARTITION p1 VALUES LESS THAN (100),
        PARTITION p2 VALUES LESS THAN (150),
        PARTITION p3 VALUES LESS THAN (MAXVALUE)
);
查询该表的table id 为93,然后查询每个分区的分区id
mysql> select * from information_schema.tables where table_name = 'e'\G
*************************** 1. row ***************************
             TABLE_CATALOG: def
              TABLE_SCHEMA: test
                TABLE_NAME: e
                TABLE_TYPE: BASE TABLE
                    ENGINE: InnoDB
                   VERSION: 10
                ROW_FORMAT: Compact
                TABLE_ROWS: 4
            AVG_ROW_LENGTH: 19
               DATA_LENGTH: 77
           MAX_DATA_LENGTH: 0
              INDEX_LENGTH: 0
                 DATA_FREE: 0
            AUTO_INCREMENT: NULL
               CREATE_TIME: 2023-09-03 10:27:19
               UPDATE_TIME: NULL
                CHECK_TIME: NULL
           TABLE_COLLATION: utf8mb4_bin
                  CHECKSUM: NULL
            CREATE_OPTIONS: partitioned
             TABLE_COMMENT:
             TIDB_TABLE_ID: 93
 TIDB_ROW_ID_SHARDING_INFO: NOT_SHARDED
              TIDB_PK_TYPE: NONCLUSTERED
TIDB_PLACEMENT_POLICY_NAME: NULL
1 row in set (0.00 sec)

mysql> select * from information_schema.PARTITIONS where TABLE_NAME ='e'\G
*************************** 1. row ***************************
                TABLE_CATALOG: def
                 TABLE_SCHEMA: test
                   TABLE_NAME: e
               PARTITION_NAME: p0
            SUBPARTITION_NAME: NULL
   PARTITION_ORDINAL_POSITION: 1
SUBPARTITION_ORDINAL_POSITION: NULL
             PARTITION_METHOD: RANGE
          SUBPARTITION_METHOD: NULL
         PARTITION_EXPRESSION: `id`
      SUBPARTITION_EXPRESSION: NULL
        PARTITION_DESCRIPTION: 50
                   TABLE_ROWS: 1
               AVG_ROW_LENGTH: 20
                  DATA_LENGTH: 20
              MAX_DATA_LENGTH: 0
                 INDEX_LENGTH: 0
                    DATA_FREE: 0
                  CREATE_TIME: 2023-09-03 10:27:19
                  UPDATE_TIME: NULL
                   CHECK_TIME: NULL
                     CHECKSUM: NULL
            PARTITION_COMMENT:
                    NODEGROUP: NULL
              TABLESPACE_NAME: NULL
            TIDB_PARTITION_ID: 94
   TIDB_PLACEMENT_POLICY_NAME: NULL
   由此可知e表p0分区的id为94

2.新建1个非分区表e2

CREATE TABLE e2 (
    id INT NOT NULL,
    fname VARCHAR(30),
    lname VARCHAR(30)

);
mysql> select * from information_schema.tables where table_name = 'e2'\G
*************************** 1. row ***************************
             TABLE_CATALOG: def
              TABLE_SCHEMA: test
                TABLE_NAME: e2
                TABLE_TYPE: BASE TABLE
                    ENGINE: InnoDB
                   VERSION: 10
                ROW_FORMAT: Compact
                TABLE_ROWS: 0
            AVG_ROW_LENGTH: 0
               DATA_LENGTH: 0
           MAX_DATA_LENGTH: 0
              INDEX_LENGTH: 0
                 DATA_FREE: 0
            AUTO_INCREMENT: NULL
               CREATE_TIME: 2023-09-03 10:32:47
               UPDATE_TIME: NULL
                CHECK_TIME: NULL
           TABLE_COLLATION: utf8mb4_bin
                  CHECKSUM: NULL
            CREATE_OPTIONS:
             TABLE_COMMENT:
             TIDB_TABLE_ID: 105
 TIDB_ROW_ID_SHARDING_INFO: NOT_SHARDED
              TIDB_PK_TYPE: NONCLUSTERED
TIDB_PLACEMENT_POLICY_NAME: NULL
1 row in set (0.01 sec)
由此可知e2的table id为105

3.分区表与非分区表进行交换

INSERT INTO e VALUES
    (1669, "Jim", "Smith"),
    (337, "Mary", "Jones"),
    (16, "Frank", "White"),
    (2005, "Linda", "Black");
 
    

mysql> select * from e;
+------+-------+-------+
| id   | fname | lname |
+------+-------+-------+
| 1669 | Jim   | Smith |
|  337 | Mary  | Jones |
| 2005 | Linda | Black |
| 1669 | Jim   | Smith |
|  337 | Mary  | Jones |
| 2005 | Linda | Black |
|   16 | Frank | White |
|   16 | Frank | White |
+------+-------+-------+
8 rows in set, 1 warning (0.05 sec)
 ALTER TABLE e EXCHANGE PARTITION p0 WITH TABLE e2;
将e表的p0分区的数据交换到e2表
mysql> select * from e2;
+----+-------+-------+
| id | fname | lname |
+----+-------+-------+
| 16 | Frank | White |
| 16 | Frank | White |
+----+-------+-------+
2 rows in set (0.00 sec)

mysql> select * from e;
+------+-------+-------+
| id   | fname | lname |
+------+-------+-------+
| 1669 | Jim   | Smith |
|  337 | Mary  | Jones |
| 2005 | Linda | Black |
| 1669 | Jim   | Smith |
|  337 | Mary  | Jones |
| 2005 | Linda | Black |
+------+-------+-------+
6 rows in set, 1 warning (0.01 sec)

由此可知,交换分区成功

4.交换后table id发生的变化

mysql> select * from information_schema.tables where table_name = 'e2'\G
*************************** 1. row ***************************
             TABLE_CATALOG: def
              TABLE_SCHEMA: test
                TABLE_NAME: e2
                TABLE_TYPE: BASE TABLE
                    ENGINE: InnoDB
                   VERSION: 10
                ROW_FORMAT: Compact
                TABLE_ROWS: 2
            AVG_ROW_LENGTH: 20
               DATA_LENGTH: 40
           MAX_DATA_LENGTH: 0
              INDEX_LENGTH: 0
                 DATA_FREE: 0
            AUTO_INCREMENT: NULL
               CREATE_TIME: 2023-09-03 10:38:24
               UPDATE_TIME: NULL
                CHECK_TIME: NULL
           TABLE_COLLATION: utf8mb4_bin
                  CHECKSUM: NULL
            CREATE_OPTIONS:
             TABLE_COMMENT:
             TIDB_TABLE_ID: 94
 TIDB_ROW_ID_SHARDING_INFO: NOT_SHARDED
              TIDB_PK_TYPE: NONCLUSTERED
TIDB_PLACEMENT_POLICY_NAME: NULL
1 row in set (0.00 sec)
mysql> select * from information_schema.PARTITIONS where TABLE_NAME ='e'\G
*************************** 1. row ***************************
                TABLE_CATALOG: def
                 TABLE_SCHEMA: test
                   TABLE_NAME: e
               PARTITION_NAME: p0
            SUBPARTITION_NAME: NULL
   PARTITION_ORDINAL_POSITION: 1
SUBPARTITION_ORDINAL_POSITION: NULL
             PARTITION_METHOD: RANGE
          SUBPARTITION_METHOD: NULL
         PARTITION_EXPRESSION: `id`
      SUBPARTITION_EXPRESSION: NULL
        PARTITION_DESCRIPTION: 50
                   TABLE_ROWS: 0
               AVG_ROW_LENGTH: 0
                  DATA_LENGTH: 0
              MAX_DATA_LENGTH: 0
                 INDEX_LENGTH: 0
                    DATA_FREE: 0
                  CREATE_TIME: 2023-09-03 10:27:19
                  UPDATE_TIME: NULL
                   CHECK_TIME: NULL
                     CHECKSUM: NULL
            PARTITION_COMMENT:
                    NODEGROUP: NULL
              TABLESPACE_NAME: NULL
            TIDB_PARTITION_ID: 105
   TIDB_PLACEMENT_POLICY_NAME: NULL
由此可知,交换分区后,e2的table id变成了原来e表p0分区的table id 94,而e表p0分区的tidb_partition_id   变成了原来e2表的
105,也就是分区交换后,table id也进行了互换                                         

三、测试结果

那么进行分区交换后导致的tabld id 也发生了变化会带来什么后果呢?当对e2进行一个ddl时,比如说删除e2表

drop table e2;

那么drainer 是否可以正常处理这个ddl,通过查看的drainer.log,可以看到drainer 已经开始报错了,而报错的原因是因为table 94 not found,这也是就是drainer 暂时无法支持这种如果表是交换分区后产生的特殊处理,而且会导致drainer进程无法正常运行

[2023/09/03 10:41:04.223 +08:00] [ERROR] [main.go:69] ["start drainer server failed"] [error="handle ddl job ID:108, Type:drop table, State:synced, SchemaState:none, SchemaID:85, TableID:94, RowCount:0, ArgLen:0, start time: 2023-09-03 10:41:02.807 +0800 CST, Err:<nil>, ErrCount:0, SnapshotVersion:0 failed, the schema info: xxxx 
table 94 not found"]

四、结论

通过以上的测试我们可以得出目前drainer组件其实是不兼容交换分区的处理,所以在日常的使用过程中我们要注意这点,在官方没有完善drainer的这个兼容性后,如果应用层面无法避免使用交换分区的逻辑,那么在TiDB层面应该使用br log来代替drainer的使用 https://docs.pingcap.com/zh/tidb/v6.5/br-pitr-guide。

展开阅读全文
加载中
点击引领话题📣 发布并加入讨论🔥
打赏
0 评论
0 收藏
0
分享
返回顶部
顶部