Contest diary day17 - TiDB performance challenge - tikv/pd#2950

Original
2020/10/27 23:28


changelog

  • 2020/10/27 day17 Moved my notes to Yuque; it works well as a pure technical notebook, with notes on the left and a flowchart on the right
  • 2020/10/26 day16 Continued translating the YugabyteDB docs; I found that when the task is spread over days instead of trying to finish it all in one day, each day's understanding goes a bit deeper than the previous day's
  • 2020/10/25 day14 Finished translating the YugabyteDB doc
  • 2020/10/24 day13 Break
  • 2020/10/23 day12 Break
  • 2020/10/22 day11 Got familiar with redis-py; I had not used Redis before
  • 2020/10/21 day10 Coasted
  • 2020/10/20 day9 The evening should have been spent translating the Colocated tables section of the YugabyteDB docs, but I wanted to finish the whole chapter and, after putting it off, translated nothing. So translating just the Colocated tables section first is more reliable, without rushing to run the code. After finishing Colocated tables I should be able to write a doc by following its example
  • 2020/10/19 day8 Overdid it and stayed up late to finish the Zoom walkthrough; the notes and flowchart are in the earlier post. Drawing flowcharts does help

Background

Read Chinese background material: YugabyteDB 介绍 - 知乎 (Zhihu)

Translation of Colocated tables | YugabyteDB Docs

Colocated tables: how to translate the term? Shared tables? Co-located tables? Host tables?

In workloads that need lower throughput and have a small data set, the bottleneck shifts from CPU/disk/network to the number of tablets that should be hosted per node. Since each table by default requires at least one tablet per node, a YugabyteDB cluster with 5000 relations (which includes tables and indexes) will result in 5000 tablets per node. There are practical limitations to the number of tablets that YugabyteDB can handle per node since each tablet adds some CPU, disk, and network overhead. If most or all of the tables in YugabyteDB cluster are small tables, then having separate tablets for each table unnecessarily adds pressure on CPU, network and disk.


Why does this result in every node needing 5000 tablets? Does every node really have to contain all 5000? For example, with 5 nodes, the first node could store 15000 and three other nodes could serve as backups; then not every node would need to contain 5000 tablets, right?
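A back-of-envelope sketch of the arithmetic in the passage above, under the assumption the text itself states (each table keeps at least one tablet on every node, so the per-node count does not shrink as nodes are added); the function name is illustrative:

```python
# Tablet-count arithmetic from the passage, as a sketch.
# Assumption (stated in the text): every relation has at least one
# tablet on every node, so the per-node tablet count equals the
# relation count, independent of how many nodes the cluster has.

def tablets_per_node(relations: int, tablets_per_relation_per_node: int = 1) -> int:
    """Per-node tablet count when every relation keeps a tablet on each node."""
    return relations * tablets_per_relation_per_node

print(tablets_per_node(5000))  # 5000 tablets on every node
# With colocation, all small relations in a database share one tablet:
print(tablets_per_node(1))     # 1
```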

To help accommodate such relational tables and workloads, YugabyteDB supports colocating SQL tables. Colocating tables puts all of their data into a single tablet, called the colocation tablet. This can dramatically increase the number of relations (tables, indexes, etc) that can be supported per node while keeping the number of tablets per node low. Note that all the data in the colocation tablet is still replicated across three nodes (or whatever the replication factor is). Large tablets can be dynamically split at a future date if there is need to serve more throughput over a larger data set.


In other words, multiple small tables are placed into a single tablet.

Motivation

This feature is desirable in a number of scenarios, some of which are described below.

Small datasets needing HA or geo-distribution

Applications that have a smaller dataset may fall into the following pattern:

  • They require a large number of tables, indexes and other relations created in a single database.
  • The size of the entire dataset is small. Typically, this entire database is less than 500 GB in size.
  • They need high availability and/or geographic data distribution.
  • Scaling the dataset or the number of IOPS is not an immediate concern.

In this scenario, it is undesirable to have the small dataset spread across multiple nodes because this might affect performance of certain queries due to more network hops (for example, joins).

Example: User identity service for a global application. The user dataset size may not be too large, but is accessed in a relational manner, requires high availability and might need to be geo-distributed for low latency access.


Somewhat similar to a CDN.

Large datasets - a few large tables with many small tables

Applications that have a large dataset may fall into the pattern where:

  • They need a large number of tables and indexes.
  • A handful of tables are expected to grow large, needing to be scaled out.
  • The rest of the tables will continue to remain small.

In this scenario, only the few large tables would need to be sharded and scaled out. All other tables would benefit from colocation because queries involving all tables, except the larger ones, would not need network hops.

Example: An IoT use case, where one table records the data from the IoT devices while there are a number of other tables that store data pertaining to user identity, device profiles, privacy, etc.


Scaling the number of databases, each database with a small dataset

There may be scenarios where the number of databases grows rapidly, while the dataset of each database is small. This is characteristic of a microservices-oriented architecture, where each microservice needs its own database. These microservices are hosted in dev, test, staging, production and other environments. The net result is a lot of small databases, and the need to be able to scale the number of databases hosted. Colocated tables allow for the entire dataset in each database to be hosted in one tablet, enabling scalability of the number of databases in a cluster by simply adding more nodes.

Example: Multi-tenant SaaS services where one database is created per customer. As new customers are rapidly on-boarded, it becomes necessary to add more databases quickly while maintaining high-availability and fault-tolerance of each database.


Tradeoffs

Fundamentally, colocated tables have the following tradeoffs:

  • Higher performance - no network reads for joins. All of the data across the various colocated tables is local, which means joins no longer have to read data over the network. This improves the speed of joins.
  • Support for a higher number of tables - using fewer tablets. Because multiple tables and indexes can share one underlying tablet, a much higher number of tables can be supported using colocated tables.
  • Lower scalability - until removal from the colocation tablet. The assumption behind colocated tables is that their data need not be automatically sharded and distributed across nodes. If it is known a priori that a table will get large, it can be opted out of the colocation tablet at creation time. If a table already present in the colocation tablet gets too large, it can dynamically be removed from the colocation tablet to enable splitting it into multiple tablets, allowing it to scale across nodes.


Usage

To learn more about using this feature, see Explore colocated tables.


What's next?

For more information, see the architecture for colocated tables.


Translation of Explore colocated tables on Linux | YugabyteDB Docs

In workloads that do very little IOPS and have a small data set, the bottleneck shifts from CPU/disk/network to the number of tablets one can host per node. Since each table by default requires at least one tablet per node, a YugabyteDB cluster with 5000 relations (tables, indexes) will result in 5000 tablets per node. There are practical limitations to the number of tablets that YugabyteDB can handle per node since each tablet adds some CPU, disk and network overhead. If most or all of the tables in YugabyteDB cluster are small tables, then having separate tablets for each table unnecessarily adds pressure on CPU, network and disk.

To help accommodate such relational tables and workloads, you can colocate SQL tables. Colocating tables puts all of their data into a single tablet, called the colocation tablet. This can dramatically increase the number of relations (tables, indexes, etc.) that can be supported per node while keeping the number of tablets per node low. Note that all the data in the colocation tablet is still replicated across three nodes (or whatever the replication factor is).

This tutorial uses the yb-ctl local cluster management utility.


  1. Create a universe

./bin/yb-ctl create # this creates a universe (a local cluster)

  2. Create a colocated database. (Why create a colocated database rather than a colocated tablet?)

Connect to the cluster using ysqlsh. (What is ysqlsh for?)

./bin/ysqlsh -h 127.0.0.1

Create the database with the colocated = true option, that is, add WITH colocated = true to the SQL:

yugabyte=# CREATE DATABASE northwind WITH colocated = true;

This creates a database northwind, all of whose tables are placed in a single tablet.

  3. Create tables

Connect to the northwind database and create tables using the standard CREATE TABLE command. Since the database was created with the colocated = true option, these tables will be colocated on one tablet.

-- connect to the northwind database
\c northwind
CREATE TABLE customers (
    customer_id bpchar,
    company_name character varying(40) NOT NULL, -- what is "character varying" for?
    contact_title character varying(30),
  PRIMARY KEY(customer_id ASC) -- what does ASC do here?
);
CREATE TABLE categories (
    category_id smallint, -- each object has many attributes; a table stores many objects of the same kind
    category_name character varying(15) NOT NULL,
    description text,
  PRIMARY KEY(category_id ASC) -- again, what is ASC?
);
-- another table; these tables are the different objects involved in one business scenario
CREATE TABLE suppliers (
    supplier_id smallint,
    company_name character varying(40) NOT NULL,
    contact_name character varying(30),
    contact_title character varying(30),
  PRIMARY KEY(supplier_id ASC)
);
-- products: this object interacts with several other objects at once
CREATE TABLE products (
    product_id smallint,
    product_name character varying(40) NOT NULL,
    supplier_id smallint,
    category_id smallint,
    quantity_per_unit character varying(20),
    unit_price real,
  PRIMARY KEY(product_id ASC),
  FOREIGN KEY (category_id) REFERENCES categories,
  FOREIGN KEY (supplier_id) REFERENCES suppliers
);

(screenshot omitted) If you go to the table view in the master UI, you will see that all the tables share the same tablet.

  4. Opt a table out of colocation. This is quite similar to anti-affinity; it seems I should still read the placement-rules code in PD.

YugabyteDB gives you the flexibility to opt a table out of colocation. In that case, the table uses its own set of tablets instead of the colocated database's tablet. This is useful for scaling tables that are expected to grow large. You do this with the colocated = false option at table creation time.

CREATE TABLE orders (
    order_id smallint NOT NULL PRIMARY KEY,
    customer_id bpchar,
    order_date date,
    ship_address character varying(60),
    ship_city character varying(15),
    ship_postal_code character varying(10),
    FOREIGN KEY (customer_id) REFERENCES customers
) WITH (colocated = false);

If you go to the table view in the master UI, you will see that the orders table has its own set of tablets. (screenshot omitted)

Translation of yugabyte-db/ysql-colocated-tables.md at master · yugabyte/yugabyte-db

  5. Read and write data in colocated tables

You can read and write data in colocated tables using standard YSQL DML statements. The YSQL query planner and executor handle routing the data to the correct tablet.
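As a local sketch of such DML, the snippet below runs the same shape of insert and join with Python's sqlite3 (used here only because it needs no cluster; against YugabyteDB the equivalent YSQL would go through ysqlsh, and the planner routes rows to the colocation tablet). The two tables are cut-down stand-ins for the northwind tables above, and the sample rows are invented:

```python
import sqlite3

# Minimal stand-ins for two of the colocated northwind tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE categories (category_id INTEGER PRIMARY KEY, category_name TEXT NOT NULL);
CREATE TABLE products  (product_id INTEGER PRIMARY KEY, product_name TEXT NOT NULL,
                        category_id INTEGER REFERENCES categories(category_id));
INSERT INTO categories VALUES (1, 'Beverages');
INSERT INTO products  VALUES (10, 'Chai', 1);
""")

# In a colocated database this join reads a single tablet, so it needs
# no network hops; the SQL itself is just standard DML.
row = conn.execute("""
    SELECT p.product_name, c.category_name
    FROM products p JOIN categories c ON p.category_id = c.category_id
""").fetchone()
print(row)  # ('Chai', 'Beverages')
```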

What's next?

For more information, see the architecture for colocated tables.

Concepts

Key-value record

It can be represented as (row:string, column:string, time:int64) -> string
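A minimal sketch of this record shape in Python, using a plain dict keyed by the (row, column, timestamp) triple; the names are illustrative, not any real storage-engine API:

```python
# (row: string, column: string, time: int64) -> string, modeled as a dict.
store = {}

def put(row, col, ts, value):
    store[(row, col, ts)] = value

def get_latest(row, col):
    """Return the value with the highest timestamp for (row, col), else None."""
    versions = [(ts, v) for (r, c, ts), v in store.items() if r == row and c == col]
    return max(versions)[1] if versions else None

put("user1", "name", 100, "alice")
put("user1", "name", 200, "bob")    # a newer version of the same cell
print(get_latest("user1", "name"))  # "bob": reads see the latest version
```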

ACID

The four properties a database management system (DBMS) must guarantee so that transactions are correct and reliable when writing or updating data: atomicity, consistency, isolation, and durability.
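A minimal sketch of atomicity (the A in ACID), using Python's sqlite3 purely as a convenient local database; the accounts table and the 200-unit transfer are invented for illustration:

```python
import sqlite3

# Atomicity: either both statements of the transfer commit, or neither does.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, "
             "balance INTEGER NOT NULL CHECK (balance >= 0))")
conn.execute("INSERT INTO accounts VALUES ('a', 100), ('b', 0)")
conn.commit()

try:
    with conn:  # one transaction: commits on success, rolls back on error
        conn.execute("UPDATE accounts SET balance = balance + 200 WHERE name = 'b'")
        # 100 - 200 violates the CHECK constraint, aborting the transaction:
        conn.execute("UPDATE accounts SET balance = balance - 200 WHERE name = 'a'")
except sqlite3.IntegrityError:
    pass  # the whole transaction was rolled back

balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(balances)  # {'a': 100, 'b': 0}: the partial credit to 'b' was undone
```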

tablet

region

TiDB source code annotations

To make later source reading easier, I extracted the contest version into branches and pushed them to gitee, which is convenient for adding annotations.

community/high-performance-tidb-challenge-cn.md at master · pingcap/community

There are 3 repositories in total. First clone them into your own account (with gitee you can also import directly without cloning locally), then switch to the contest commit and create new branches.

tidb:1bfeff96c7439ed672f8362cf67573666a43f781
tikv:dcd2f8f4076d847151fdf58e9c0ba333f242d374
pd:c05ef6f95773941db5c1060174f5a62e8f864e88

git clone https://github.com/eatcosmos/tidb.git && cd ~/git/tidb
git reset --hard 1bfeff96c7439ed672f8362cf67573666a43f781 && git checkout -b 1bfeff-dev && git push --set-upstream origin 1bfeff-dev
git reset --hard 1bfeff96c7439ed672f8362cf67573666a43f781 && git checkout -b 1bfeff-comment && git push --set-upstream origin 1bfeff-comment

git clone https://github.com/eatcosmos/tikv.git && cd ~/git/tikv
git reset --hard dcd2f8f4076d847151fdf58e9c0ba333f242d374 && git checkout -b dcd2f8-dev && git push --set-upstream origin dcd2f8-dev
git reset --hard dcd2f8f4076d847151fdf58e9c0ba333f242d374 && git checkout -b dcd2f8-comment && git push --set-upstream origin dcd2f8-comment

git clone https://github.com/eatcosmos/pd.git && cd ~/git/pd
git reset --hard c05ef6f95773941db5c1060174f5a62e8f864e88 && git checkout -b c05ef6-dev && git push --set-upstream origin c05ef6-dev
git reset --hard c05ef6f95773941db5c1060174f5a62e8f864e88 && git checkout -b c05ef6-comment && git push --set-upstream origin c05ef6-comment

# dev branches (github)
git clone --single-branch --branch 1bfeff-dev https://github.com/eatcosmos/tidb.git
git clone --single-branch --branch dcd2f8-dev https://github.com/eatcosmos/tikv.git
git clone --single-branch --branch c05ef6-dev https://github.com/eatcosmos/pd.git

# annotated branches (gitee)
https://gitee.com/eatcosmos/tidb/tree/1bfeff-comment/
https://gitee.com/eatcosmos/tikv/tree/dcd2f8-comment/
https://gitee.com/eatcosmos/pd/tree/c05ef6-comment/

Study methods

  1. Comparative study works best. My picture of TiKV's structure was fuzzy, and rereading did not make it much clearer, but after studying the similar YugabyteDB, contrasting the small differences deepened my understanding.
  2. It takes sustained, repeated reading; don't let the time be fragmented by messages. The interrupted days were mostly because I went looking for other material halfway through. That is unnecessary: skip what you don't understand rather than searching for more material, and fill the gaps in one pass after finishing.
  3. A channel for discussion is ideal, where questions and ideas can be posted; otherwise, write them down yourself first.