ArangoDB gains another new skill: RocksDB Integration in ArangoDB

GermanWifi
Published 2017/05/19 19:40

RocksDB Integration in ArangoDB – FAQs

May 18, 2017 · Community, General, Releases · Tags: RocksDB

The new release of ArangoDB 3.2 is just around the corner and will include some major improvements, such as distributed graph processing with Pregel and a powerful export tool. Most importantly, we integrated Facebook’s RocksDB as the first pluggable storage engine in ArangoDB. With RocksDB you will be able to use as much data in ArangoDB as fits on your disk.

As this is an important change and many questions reached us from the community, we want to share answers to the most common ones. Please find them below.

Will I be able to go beyond the limit of RAM?

Yes. By choosing RocksDB as your storage engine you will be able to work with as much data as fits on your disk.

What is the locking behaviour with RocksDB in ArangoDB?

With RocksDB as your storage engine, writes take document-level locks and reads take no locks. Concurrent writes to the same document will cause write-write conflicts that are propagated to the calling code, so users can retry the operations when required.

… when you say “Concurrent writes of the same documents will cause write-write conflicts that will be propagated to the calling code”, does it mean that the behavior will differ from currently? Won’t writes try to acquire a lock on the document first?

Yes, the behavior will differ from the current behavior. The current engine (MMFiles, based on memory-mapped files) has collection-level locks, so write-write conflicts are not possible. The RocksDB engine has document-level locks, so write-write conflicts are possible.

Consider the following example of two transactions T1 and T2 both trying to write a document in collection “c”.

In the old (MMFiles) engine, these transactions would be serialized, e.g.

T1 begins
T1 writes document “a” in collection “c”
T1 commits
T2 begins
T2 writes document “a” in collection “c”
T2 commits

so there are no write conflicts here.

In the RocksDB engine, the transactions can run in parallel, but because they modify the same document, that document needs to be locked to prevent lost updates. The following schedule will cause a write-write conflict:

T1 begins
T2 begins
T1 writes document “a” in collection “c”
T2 writes document “a” in collection “c”

Here, one of the transactions (T2) will abort to prevent an unnoticed lost update. The conflict is propagated to the calling code, so users can retry the operation when required.
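The schedule above can be imitated with a small in-memory model. This is only a sketch of document-level write locking and retrying after a conflict; the `Collection`, `Transaction`, and `WriteConflict` names are hypothetical and are not part of ArangoDB or RocksDB.

```python
class WriteConflict(Exception):
    """Raised when a transaction writes a document locked by another one."""


class Collection:
    """Toy collection with document-level write locks (hypothetical model)."""

    def __init__(self):
        self.docs = {}   # committed documents: key -> value
        self.locks = {}  # key -> open transaction holding the write lock


class Transaction:
    def __init__(self, name, collection):
        self.name = name
        self.collection = collection
        self.pending = {}  # uncommitted writes of this transaction

    def write(self, key, value):
        holder = self.collection.locks.get(key)
        if holder is not None and holder is not self:
            # Another open transaction already wrote this document:
            # abort this one instead of silently losing an update.
            self.abort()
            raise WriteConflict(
                f"{self.name} conflicts with {holder.name} on {key!r}"
            )
        self.collection.locks[key] = self
        self.pending[key] = value

    def _release(self):
        # Drop every lock this transaction holds and forget its writes.
        for key in list(self.pending):
            if self.collection.locks.get(key) is self:
                del self.collection.locks[key]
        self.pending.clear()

    def commit(self):
        self.collection.docs.update(self.pending)
        self._release()

    def abort(self):
        self._release()


c = Collection()
t1 = Transaction("T1", c)
t2 = Transaction("T2", c)

t1.write("a", 1)          # T1 locks document "a"
conflict = None
try:
    t2.write("a", 2)      # T2 hits the lock held by T1 and aborts
except WriteConflict as exc:
    conflict = str(exc)
t1.commit()               # T1 commits and releases its lock

# The calling code sees the conflict and retries in a new transaction.
t2_retry = Transaction("T2-retry", c)
t2_retry.write("a", 2)
t2_retry.commit()
print(conflict, c.docs)
```

In this model the retry succeeds because T1 has already committed and released its lock by the time the second attempt runs.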

When using RocksDB as a storage engine, will I need a fast disk/SSD if an index is disk-based?

It will be beneficial to use fast storage. This is true for the memory-mapped files storage engine as well as for the RocksDB-based storage engine.

Will I be able to choose how different collections are stored, or will it be a per-database choice?

It is a per-server / per-cluster choice. It is not yet possible to mix modes or to use different storage engines in the same ArangoDB instance or cluster.

Can I switch from RocksDB to memory-mapped files with a collection or a database?

It is a per-server / per-cluster choice. The choice must be made before the first server start. The first server start stores the storage engine selection in a file on disk, and this file is validated on every restart. If the storage engine must be changed after the initial choice, data from the ArangoDB instance can be dumped with arangodump, and then arangod can be restarted with an empty database directory and a different storage engine. The data produced by arangodump can then be loaded into arangod with arangorestore.
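The dump, restart, and restore sequence described above can be written down as three command lines. The sketch below only builds the argument lists and does not run anything. The directory names are placeholders, connection and authentication options are omitted, and the option names (`--output-directory`, `--input-directory`, `--server.storage-engine`, `--database.directory`) should be checked against the documentation of your ArangoDB version.

```python
def engine_switch_commands(dump_dir, new_db_dir, engine="rocksdb"):
    """Return the three steps of a storage engine switch as argument lists.

    Hypothetical helper for illustration only; it executes nothing.
    """
    return [
        # 1. Dump the data from the running instance.
        ["arangodump", "--output-directory", dump_dir],
        # 2. Start arangod on an empty database directory with the new engine.
        ["arangod",
         "--server.storage-engine", engine,
         "--database.directory", new_db_dir],
        # 3. Load the dump into the freshly started instance.
        ["arangorestore", "--input-directory", dump_dir],
    ]


commands = engine_switch_commands("dump", "/path/to/empty-db-dir")
for step in commands:
    print(" ".join(step))
```

Step 2 starts a long-running server process, so in practice the restore in step 3 is run from a separate shell once the server accepts connections.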

Are indexes always stored on disk now, or only the persisted index type?

If you choose RocksDB as your storage engine, all indexes will be persisted on disk.

I’m using Microsoft Azure where virtual machines have very fast local SSD disks that are unfortunately “temporary” (meaning they may not survive a reboot), compared to slower but persistent network-attached disks (that can be SSD as well). Would there be any way to leverage the local disk? I’m thinking about something like, using the local disk for fast queries but having the data persisted to the network-attached disk?

RocksDB in general allows specifying different data directories for the different levels of the database. Data on lower levels is newer data, so it would in general be possible to write low-level data to SSD first and have RocksDB move it to slower HDDs or network-attached disks when it is moved to higher levels. Note that this is an option that RocksDB offers but that ArangoDB does not yet exploit. In general, we don’t think the choice between “read from fast SSD” and “read from slow disks” can be made on a per-query basis, because a query may touch arbitrary data. But recent data, or data that is accessed often, will likely sit in RocksDB’s in-memory block cache anyway.

 

If you would like to dig a bit deeper into our upgrades in 3.2, you can find more information in our release notes. If you would like to take the latest technical preview including RocksDB for a spin, you can download ArangoDB 3.2alpha4.

We hope to have covered all important questions. Please let us know via hackers@arangodb.com if we missed something important.

Find ArangoDB on GitHub

This article is reposted from: https://www.arangodb.com/2017/05/rocksdb-integration-arangodb-faqs/
