
前
言
拓数派首款数据计算引擎 PieCloudDB 是一款全新的云原生虚拟数仓。为了提升用户使用体验,提高查询效率,在实现存算分离的同时,PieCloudDB 设计与打造了全新的存储引擎「简墨」等模块,并针对云场景和分析型场景设计了高效的「Data Skipping」索引。
本文将详细介绍 PieCloudDB 的存储和索引的设计与打造过程,并将通过示例来演示 PieCloudDB 如何使用 Data Skipping 索引加速查询的效率。



「Data Skipping」演示
CREATE TABLE dataskip (a int, b int);
INSERT INTO dataskip SELECT i, i*2 FROM generate_series(1, 1000)i;
INSERT INTO dataskip SELECT i, i*2 FROM generate_series(1001, 2000)i;
INSERT INTO dataskip SELECT i, i*2 FROM generate_series(2001, 3000)i;
INSERT INTO dataskip SELECT i, i*2 FROM generate_series(3001, 4000)i;
INSERT INTO dataskip SELECT i, i*2 FROM generate_series(4001, 5000)i;
INSERT INTO dataskip SELECT i, i*2 FROM generate_series(5001, 6000)i;
INSERT INTO dataskip SELECT i, i*2 FROM generate_series(6001, 7000)i;
INSERT INTO dataskip SELECT i, i*2 FROM generate_series(7001, 8000)i;
INSERT INTO dataskip SELECT i, i*2 FROM generate_series(8001, 9000)i;
INSERT INTO dataskip SELECT i, i*2 FROM generate_series(9001, 10000)i;
CREATE TABLE jtbl (a int, b int);
INSERT INTO jtbl SELECT i, i * random() FROM generate_series(1, 1000)i;
demo=# EXPLAIN ANALYZE SELECT * FROM dataskip WHERE a < 10;
QUERY PLAN
-------------------------------------------------------------------------
Gather Motion 3:1 (slice1; segments: 3) (cost=2.00..10.21 rows=3 width=8) (actual time=34.361..36.928 rows=9 loops=1)
-> Bitmap Heap Scan on dataskip (cost=2.00..10.17 rows=1 width=8) (actual time=16.189..31.790 rows=5 loops=1)
Recheck Cond: (a < 10)
Rows Removed by Index Recheck: 316
-> Bitmap Index Scan on dataskip (cost=0.00..2.00 rows=333 width=0) (actual time=2.908..2.910 rows=1 loops=1)
Index Cond: (a < 10)
Planning Time: 4.259 ms
(slice0) Executor memory: 159K bytes.
(slice1) Executor memory: 32972K bytes avg x 3 workers, 32972K bytes max (seg0).
Memory used: 128000kB
Optimizer: Postgres query optimizer
Execution Time: 55.895 ms
(12 rows)
demo=# SET enable_bitmapscan = off;
SET
demo=# EXPLAIN ANALYZE SELECT * FROM dataskip WHERE a < 10;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
Gather Motion 3:1 (slice1; segments: 3) (cost=0.00..51.71 rows=3 width=8) (actual time=129.916..140.925 rows=9 loops=1)
-> Seq Scan on dataskip (cost=0.00..51.67 rows=1 width=8) (actual time=2.939..132.546 rows=5 loops=1)
Filter: (a < 10)
Rows Removed by Filter: 3292
Planning Time: 0.099 ms
(slice0) Executor memory: 123K bytes.
(slice1) Executor memory: 32825K bytes avg x 3 workers, 32825K bytes max (seg0).
Memory used: 128000kB
Optimizer: Postgres query optimizer
Execution Time: 154.416 ms
(10 rows)
demo=# EXPLAIN ANALYZE SELECT * FROM dataskip JOIN jtbl ON dataskip.a = jtbl.a and jtbl.a < 10;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------
Gather Motion 3:1 (slice1; segments: 3) (cost=2.00..15.47 rows=3 width=16) (actual time=33.638..33.712 rows=9 loops=1)
-> Nested Loop (cost=2.00..15.43 rows=1 width=16) (actual time=33.300..33.405 rows=5 loops=1)
Join Filter: (dataskip.a = jtbl.a)
Rows Removed by Join Filter: 20
-> Redistribute Motion 3:3 (slice2; segments: 3) (cost=0.00..5.21 rows=2 width=8) (actual time=0.003..0.013 rows=5 loops=1)
Hash Key: jtbl.a
-> Seq Scan on jtbl (cost=0.00..5.17 rows=2 width=8) (actual time=3.144..20.979 rows=3 loops=1)
Filter: (a < 10)
Rows Removed by Filter: 356
-> Materialize (cost=2.00..10.19 rows=1 width=8) (actual time=5.547..5.554 rows=4 loops=6)
-> Redistribute Motion 3:3 (slice3; segments: 3) (cost=2.00..10.19 rows=1 width=8) (actual time=33.130..33.269 rows=5 loops=1)
Hash Key: dataskip.a
-> Bitmap Heap Scan on dataskip (cost=2.00..10.17 rows=1 width=8) (actual time=11.766..24.910 rows=5 loops=1)
Recheck Cond: (a < 10)
Rows Removed by Index Recheck: 316
-> Bitmap Index Scan on dataskip (cost=0.00..2.00 rows=333 width=0) (actual time=2.783..2.784 rows=1 loops=1)
Index Cond: (a < 10)
Planning Time: 6.522 ms
(slice0) Executor memory: 220K bytes.
(slice1) Executor memory: 79K bytes avg x 3 workers, 80K bytes max (seg0). Work_mem: 17K bytes max.
(slice2) Executor memory: 32826K bytes avg x 3 workers, 32826K bytes max (seg0).
(slice3) Executor memory: 32975K bytes avg x 3 workers, 32975K bytes max (seg0).
Memory used: 128000kB
Optimizer: Postgres query optimizer
Execution Time: 68.989 ms
(25 rows)
demo=# SET enable_bitmapscan = off;
SET
demo=# EXPLAIN ANALYZE SELECT * FROM dataskip JOIN jtbl ON dataskip.a = jtbl.a and jtbl.a < 10;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
Gather Motion 3:1 (slice1; segments: 3) (cost=0.00..56.97 rows=3 width=16) (actual time=139.811..139.886 rows=9 loops=1)
-> Nested Loop (cost=0.00..56.92 rows=1 width=16) (actual time=139.587..139.691 rows=5 loops=1)
Join Filter: (dataskip.a = jtbl.a)
Rows Removed by Join Filter: 20
-> Redistribute Motion 3:3 (slice2; segments: 3) (cost=0.00..5.21 rows=2 width=8) (actual time=0.003..0.011 rows=5 loops=1)
Hash Key: jtbl.a
-> Seq Scan on jtbl (cost=0.00..5.17 rows=2 width=8) (actual time=1.758..21.023 rows=3 loops=1)
Filter: (a < 10)
Rows Removed by Filter: 356
-> Materialize (cost=0.00..51.69 rows=1 width=8) (actual time=23.262..23.269 rows=4 loops=6)
-> Redistribute Motion 3:3 (slice3; segments: 3) (cost=0.00..51.69 rows=1 width=8) (actual time=136.260..139.557 rows=5 loops=1)
Hash Key: dataskip.a
-> Seq Scan on dataskip (cost=0.00..51.67 rows=1 width=8) (actual time=1.730..134.913 rows=5 loops=1)
Filter: (a < 10)
Rows Removed by Filter: 3292
Planning Time: 0.248 ms
(slice0) Executor memory: 185K bytes.
(slice1) Executor memory: 79K bytes avg x 3 workers, 80K bytes max (seg0). Work_mem: 17K bytes max.
(slice2) Executor memory: 32826K bytes avg x 3 workers, 32826K bytes max (seg0).
(slice3) Executor memory: 32827K bytes avg x 3 workers, 32827K bytes max (seg0).
Memory used: 128000kB
Optimizer: Postgres query optimizer
Execution Time: 155.026 ms
(23 rows)
关于 PieCloudDB
本文分享自微信公众号 - 开源软件联盟PostgreSQL分会(kaiyuanlianmeng)。
如有侵权,请联系 support@oschina.cn 删除。
本文参与“OSC源创计划”,欢迎正在阅读的你也加入,一起分享。