# 从库延迟案例分析

04/12 10:58

## 过程分析

$grep 010559 100werror3306.log | tail -n 3 2024-01-31T14:07:50.172007+08:00 6806 [Note] [MY-010559] [Repl] Multi-threaded slave statistics for channel 'cluster': seconds elapsed = 120; events assigned = 3318582273; worker queues filled over overrun level = 207029; waite d due a Worker queue full = 238; waited due the total size = 0; waited at clock conflicts = 348754579743300 waited (count) when Workers occupied = 34529247 waited when Workers occupied = 76847369713200 2024-01-31T14:09:50.078829+08:00 6806 [Note] [MY-010559] [Repl] Multi-threaded slave statistics for channel 'cluster': seconds elapsed = 120; events assigned = 3319256065; worker queues filled over overrun level = 207029; waite d due a Worker queue full = 238; waited due the total size = 0; waited at clock conflicts = 348851330164000 waited (count) when Workers occupied = 34535857 waited when Workers occupied = 76866419841900 2024-01-31T14:11:50.060510+08:00 6806 [Note] [MY-010559] [Repl] Multi-threaded slave statistics for channel 'cluster': seconds elapsed = 120; events assigned = 3319894017; worker queues filled over overrun level = 207029; waite d due a Worker queue full = 238; waited due the total size = 0; waited at clock conflicts = 348943740455400 waited (count) when Workers occupied = 34542790 waited when Workers occupied = 76890229805500  上述信息的详细解释，可以参考MTS性能监控你知道多少 去掉了发生次数比较少的统计，显示了一些关键数据的对比 可以发现自然时间120，回放的协调线程有90多秒由于无法并行回放而进入等待，有近20秒是由于没有空闲的work线程进入等待，折算下来协调线程工作的时间只有10秒左右。 ### 并行度统计 众所周知，mysql从库并行回放主要依赖于binlog中的last_commmitted来做判断，如果事务的last_committed相同，则基本上可以认为这些事务可以并行回放，下面从环境中获取一个relay log进行并行回放的大概统计 $ mysqlsqlbinlog --no-defaults 046638 |grep -o 'last_committed.*' | sed 's/=/ /g' | awk '{print $2}' |sort -n | uniq -c |awk 'BEGIN {print "last_commited group_count Percentage"} {count[$2]=$1 ; sum+=$1} END {for (i in count) printf "%d %d %.2f%%\n", i, count[i], (count[i]/sum)*100|"sort -k 1,1n"}' | awk '{if($2>=1 &&$2 <11){sum+=$2}} END {print sum}' 235703$ mysqlsqlbinlog --no-defaults 046638 |grep -o 'last_committed.*' | sed 's/=/ /g' | awk '{print $2}' |sort -n | uniq -c |awk 'BEGIN {print "last_commited group_count Percentage"} {count[$2]=$1 ; sum+=$1} END {for (i in count) printf "%d %d %.2f%%\n", i, count[i], (count[i]/sum)*100|"sort -k 1,1n"}' | awk '{if($2>10){sum+=$2}} END {print sum}'
314694


$mysqlbinlog --no-defaults 046638 |grep -o 'last_committed.*' | sed 's/=/ /g' | awk '{print$2}' |sort -n | uniq -c |awk 'BEGIN {print "last_commited group_count Percentage"} {count[$2]=$1; sum+=$1} END {for (i in count) printf "%d %d %.2f%%\n", i, count[i], (count[i]/sum)*100|"sort -k 1,1n"}' | awk '{if($2>=1 && $2 <11) {print$2}}' | sort | uniq -c
200863 1
17236 2
98 3
13 4
3 5
1 7


$mysqlsqlbinlog --no-defaults 046638 |grep -o 'last_committed.*' | sed 's/=/ /g' | awk '{print$2}' |sort -n | uniq -c |awk 'BEGIN {print "last_commited group_count Percentage"} {count[$2]=$1
; sum+=$1} END {for (i in count) printf "%d %d %.2f%%\n", i, count[i], (count[i]/sum)*100|"sort -k 1,1n"}' | awk '{if($2>11){print \$0}}' | column -t
last_commited  group_count  Percentage
1              7340         1.33%
11938          7226         1.31%
23558          7249         1.32%
35248          6848         1.24%
46421          7720         1.40%
59128          7481         1.36%
70789          7598         1.38%
82474          6538         1.19%
93366          6988         1.27%
104628         7968         1.45%
116890         7190         1.31%
128034         6750         1.23%
138849         7513         1.37%
150522         6966         1.27%
161989         7972         1.45%
175599         8315         1.51%
189320         8235         1.50%
202845         8415         1.53%
218077         8690         1.58%
234248         8623         1.57%
249647         8551         1.55%
264860         8958         1.63%
280962         8900         1.62%
297724         8768         1.59%
313092         8620         1.57%
327972         9179         1.67%
344435         8416         1.53%
359580         8924         1.62%
375314         8160         1.48%
390564         9333         1.70%
407106         8637         1.57%
422777         8493         1.54%
438500         8046         1.46%
453607         8948         1.63%
470939         8553         1.55%
486706         8339         1.52%
503562         8385         1.52%
520179         8313         1.51%
535929         7546         1.37%


### last_committed机制介绍

1. 基础算法为COMMIT_ORDER
2. WRITESET算法是在COMMIT_ORDER基础上再计算一次
3. SESSION_WRITESET算法是在WRITESET基础上再计算一次

#### COMMIT_ORDER

COMMIT_ORDER计算规则：如果两个事务在主节点上是同时提交的，说明两个事务的数据之间没有冲突，那么一定也是可以在从节点上并行执行的，理想中的典型案例如下面的例子

session-1 session-2
BEGIN BEGIN
INSERT t1 values(1)
INSERT t2 values(2)
commit (group_commit) commit (group_commit)

session-1 session-2
BEGIN BEGIN
INSERT t1 values(1)
INSERT t2 values(2)
commit
commit

session-1 session-2 session-3
BEGIN BEGIN BEGIN
INSERT t1 values(1) INSERT t2 values(1) INSERT t3 values(1)
INSERT t1 values(2) INSERT t2 values(2)
commit
INSERT t1 values(3)
commit
commit

#### WRITESET

1. 非DDL语句或者表具有主键或者唯一键或者空事务
2. 当前session使用的hash算法与hash map中的一致
3. 未使用外键
4. hash map的容量未超过binlog_transaction_dependency_history_size的设置 以上4个条件均满足时，则可以使用WRITESET算法，如果有任意一个条件不满足，则会退化为COMMIT_ORDER计算方式

1. last_committed设置为m_writeset_history_start，此值为m_writeset_history列表中最小的sequence_number

2. 遍历事务的writeset列表

a 如果某个writeset在全局m_writeset_history中不存在，构建一个pair<writeset, 当前事务的sequence_number>对象，插入到全局m_writeset_history列表中

b. 如果存在，那么last_committed=max(last_committed, 历史writeset的sequence_number值)，并同时更新m_writeset_history中该writeset对应的sequence_number为当前事务值

3. 如果has_missing_keys=false，即事务所有数据表均包含主键或者唯一索引，则最后取commit_order和writeset两种方式计算的最小值作为最终的last_committed值

TIPS：基于上面WRITESET规则，就会出现后提交的事务的last_committed比先提交的事务还小的情况

## 结论分析

### 结论描述

1. 单last_committed的事务中涉及到的数据和sequence_number存在数据冲突
2. 单last_committed的事务中涉及到的表存在无主键的情况，而且这种事务特别多

### 优化措施

1. 从业务侧对表做改造，在允许的情况下给相关表都添加上主键。
2. 尝试调大参数binlog_group_commit_sync_delay、binlog_group_commit_sync_no_delay_count从0修改为10000，由于特殊环境限制，该调整并未生效，不同的场景可能会有不同的表现。

Enjoy GreatSQL :)

## 关于 GreatSQL

GreatSQL是适用于金融级应用的国内自主开源数据库，具备高性能、高可靠、高易用性、高安全等多个核心特性，可以作为MySQL或Percona Server的可选替换，用于线上生产环境，且完全免费并兼容MySQL或Percona Server。

## GreatSQL社区：

（对文章有疑问或者有独到见解都可以去社区官网提出或分享哦~）

QQ群：533341697

0 评论
2 收藏
0