你应该了解NewSQL:CockroachDB验证文档

原创
2017/09/29 10:38
阅读数 1.2K

方案八:CockroachDB

cockroachdb是分布式、支持事务、支持SQL操作、K-V存储模式的数据库。CockroachDB的三位创始人全部来自Google,其架构受到Google的 Spanner和F1的启发,cockroach开源地址输入图片说明 具有:

  • 标准SQL接口, 使用PostgreSQL协议,支持标准SQL接口,兼容关系型数据库SQL生态;
  • 扩展能力强、高并发,支持类MPP并行查询框架;
  • 弹性扩容,持按需扩容, 自动负载均衡;
  • 多副本强一致,使用raft算法保证数据一致性;
  • 服务高可用,上去中心化,无SPOF;
  • 分布式事务,基于MVCC实现事务控制,支持SI和SSI两种隔离级别;

调研

建表

DROP TABLE IF EXISTS "tracks";  
CREATE TABLE IF NOT EXISTS "tracks" (  
"id"  SERIAL PRIMARY KEY ,  
"third_tracks_id" varchar(32)  NOT NULL DEFAULT '' ,  
"tracks_title" varchar(255) NOT NULL DEFAULT '' ,  
"tracks_title_other" varchar(255)  NOT NULL DEFAULT '',  
"tracks_title_py" varchar(64) NOT NULL DEFAULT '' ,  
"data_source" bigint DEFAULT 1 NOT NULL,  
"tags" varchar(255)  NOT NULL DEFAULT '',  
"duration" bigint DEFAULT 0 NOT NULL,  
"status" int DEFAULT 0 NOT NULL,  
"pa" int DEFAULT 0 NOT NULL,  
"announcer_name" varchar(255)  NOT NULL DEFAULT '',  
"anchor_name" varchar(255)  NOT NULL DEFAULT '',
"play_count" bigint DEFAULT 0 NOT NULL,
"own_count" bigint DEFAULT 0 NOT NULL,
"paid" int DEFAULT 0 NOT NULL,
"info" text NOT NULL,
"created_at" timestamp NOT NULL,
"updated_at" timestamp NOT NULL,
"data_updated" bigint NOT NULL,
"created" timestamp NOT NULL,
"updated" timestamp NOT NULL,
"announcer_id" varchar(256) NOT NULL DEFAULT '',
"anchor_id" varchar(256) NOT NULL DEFAULT '',
UNIQUE INDEX "idx_thirdTrackId_dataSource" (third_tracks_id ASC, data_source ASC),
INDEX "idx_announcerid_status_paid_playcount" (announcer_id ASC, status ASC, paid ASC, play_count DESC)
);

注意点

1.暂不支持数据表字段加注释; 2.不支持大量数据删除:

> DELETE FROM tracks where id <100000;
pq: kv/txn_coord_sender.go:428: transaction is too large to commit: 189948 intents

因为需要强一致性,如果删除大量数据会导致集群延时变大,如果需要删除大量数据可以采用:分段删除

alter table tracks rename to tracks_0907;
for (i=1;i<count(*);i+=2000){
	DELETE FROM tracks_0907 where id <= i;
}

性能圧测

圧测工具 go语言实现圧测脚本

圧测方法 三台机器执行圧测脚本,每一次圧测时长5-10min,每行记录约为1.6kb。因为测试集群为3台性能如下:

M02-XI3
整机SN216486580
CPU【INTEL Xeon E5-2650 V4 12C 2.2GHZ】*2
内存【LANGCHAO PC4-19200 16G】*8
硬盘【LANGCHAO SATA 3T 7.2K】*4
FLASH【LANGCHAO NVMe SSD 800G】*1
网卡【LANGCHAO INTEL 82599】*1
加速卡
RAID无硬件RAID卡

尽可能模拟将线上数据从MySQL迁移到NewSQL的场景。很多人看到mysql或者其他的数据库性能测试报告时候,看到qps都是几万左右,但是大家注意看就会发现测试的单行记录才50byte,与线上的数据是不相符的。

3台比较好的连接数为1000左右,所以测试会以1000并发为限。

圧测表格

序号 SQL 数据量 并发 qps 99分位延时 90分位延时 SQL Byte Traffic 备注
1 insert 1500w 100 666 152ms 117ms 832kb 18个newsql索引
2 insert 1500w 300 687 352ms 187ms 872kb 18个newsql索引
3 insert 1500w 900 778 700ms 1500ms 1.2MB 18个newsql索引
4 insert 1500w 1000 807 1500s 1200ms 1.3MB 18个newsql索引
5 insert 1500w 100 1051 42ms 12ms 1.2MB 1个newsql索引
6 insert 1500w 300 2254 92ms 32ms 3.2MB 1个newsql索引
7 insert 1500w 600 4021 130ms 56ms 6.1MB 1个newsql索引
8 insert 1500w 900 5938 250ms 148ms 8.4MB 1个newsql索引
9 insert 1500w 1000 6125 270ms 171ms 8.7MB 1个newsql索引
10 select * from tracks WHERE id = 随机id AND status = 0 1500w 300 5625 30ms 10ms 7.3MB 主键索引
11 select * from tracks WHERE id = 随机id AND status = 0 1500w 600 8713 45ms 8ms 11.7MB 主键索引
12 select * from tracks WHERE id = 随机id AND status = 0 1500w 1000 12320 160ms 130ms 16.2MB 主键索引
13 select * from tracks WHERE id (随机20ids) AND status = 0 1500w 300 2134 200ms 140ms 29.7MB 主键索引
14 select * from tracks WHERE id (随机20ids) AND status = 0 1500w 600 2526 420ms 350ms 34.1MB 主键索引
15 select * from tracks WHERE id (随机20ids) AND status = 0 1500w 1000 2650 771ms 640ms 36.1MB 主键索引
16 select * from tracks WHERE id (随机50ids) AND status = 0 1500w 300 714 670ms 540ms 23.2MB 主键索引
17 select * from tracks WHERE id (随机50ids) AND status = 0 1500w 600 672 1700ms 1300ms 21.4MB 主键索引
18 select * from tracks WHERE id (随机50ids) AND status = 0 1500w 600 757 3000ms 2490ms 24.7MB 主键索引
19 SELECT *FROM "tracks" WHERE "third_tracks_id" IN (随机1个id) AND "data_source" = 随机公司 AND "status" = 0 1500w 300 5553 40ms 5ms 4.1MB (third_tracks_id , data_source )索引
20 SELECT *FROM "tracks" WHERE "third_tracks_id" IN (随机1个id) AND "data_source" = 随机公司 AND "status" = 0 1500w 600 8624 120ms 18ms 6.0MB (third_tracks_id , data_source )索引
21 SELECT *FROM "tracks" WHERE "third_tracks_id" IN (随机1个id) AND "data_source" = 随机公司 AND "status" = 0 1500w 1000 1145 310ms 90ms 8.7MB (third_tracks_id , data_source )索引
22 SELECT *FROM "tracks" WHERE "announcer_id" IN (随机1个id) AND "paid" = 0 AND "status" = 0 order by play_count DESC 1500w 300 5493 160ms 3ms 5.1MB (announcer_id ASC, status ASC, paid ASC, play_count DESC)索引
23 SELECT *FROM "tracks" WHERE "announcer_id" IN (随机1个id) AND "paid" = 0 AND "status" = 0 order by play_count DESC 1500w 600 7825 283ms 23ms 7.5MB (announcer_id ASC, status ASC, paid ASC, play_count DESC)索引
24 SELECT *FROM "tracks" WHERE "announcer_id" IN (随机1个id) AND "paid" = 0 AND "status" = 0 order by play_count DESC 1500w 1000 11171 310ms 68ms 10.5MB (announcer_id ASC, status ASC, paid ASC, play_count DESC )索引
25 select * from tracks WHERE id = 随机id AND status = 0 3000w 300 5614 123ms 3ms 8.3MB 主键索引
26 select * from tracks WHERE id = 随机id AND status = 0 3000w 600 8723 130ms 8ms 12.4MB 主键索引
27 select * from tracks WHERE id = 随机id AND status = 0 3000w 1000 10764 320ms 30ms 16.2MB 主键索引
28 select * from tracks WHERE id = 随机id AND status = 0 5000w 300 5136 159ms 8ms 7.8MB 主键索引
29 select * from tracks WHERE id = 随机id AND status = 0 5000w 600 8463 180ms 13ms 11.8MB 主键索引
30 select * from tracks WHERE id = 随机id AND status = 0 5000w 1000 10848 220ms 26ms 16.3MB 主键索引

数据分析

  1. 与MySQL等数据库一样,存在索引时候insert的数据会慢上许多,但是在cockroachDB中会更加明显一些。
    输入图片说明 图中可以得出:
  • 大量索引对insert的影响是巨大,会导致写库操作qps大降,在提高并发数后qps亦没有显著提升;
  • 在仅有主键索引时候,insert的qps随着并发数的提升得到相应的提升,到了6000qps后再提升并发数效果就不再明显了。
  1. 通过对主键id的获取数据量不同的压测,可以得到如下图:
    输入图片说明
    图中可以得出:
  • id数越多需要查询的range就越多qps就越低;
  • id数越多得到的数据量也就越多,网络IO变大,性能也会有所下降;
  1. 对比主键索引和复合索引的查询效率,可以得到:
    输入图片说明
    图中可以得出:
  • 有索引的查询效率会有大幅提升,并且主键索引、唯一索引与复合索引的查询效率基本相同;
  • 主键查询效率最优;
  1. 对比在不同数据记录数中查询效率:
    输入图片说明
    图中可以得出:
  • 表中数据量的大小对qps影响不是很大,起码在千万级别是可以接受的;
  • 表中数据量越少速度就越快;
展开阅读全文
打赏
2
10 收藏
分享
加载中
更多评论
打赏
0 评论
10 收藏
2
分享
返回顶部
顶部