Each MOB-enabled column family has a threshold: if the length of a cell's value exceeds this threshold, the cell is regarded as a MOB cell.
When MOB cells are updated in a region, they are written to the WAL and the memstore, just like normal cells. During a flush, the MOBs are flushed to MOB files, and the metadata and paths of those MOB files are flushed to store files. Data consistency and HBase replication therefore come for free with this design.
MOB edits are larger than usual edits, so the I/O performed when syncing the WAL is larger too, which can slow down WAL sync operations. If other regions share the same WAL, their write latency can be affected as well. However, if data consistency and durability are required, the WAL is a must.
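The flush behavior described above can be sketched in a few lines of Python. This is only an illustrative model, not HBase code: the `Cell` class, the `flush` function, and the `is_reference` flag (standing in for the reference tag) are hypothetical names, and the 100 KB default is the value quoted later in this article.

```python
from dataclasses import dataclass

MOB_THRESHOLD = 100 * 1024  # bytes; the 100KB default quoted in this article


@dataclass(frozen=True)
class Cell:
    row: bytes
    value: bytes
    is_reference: bool = False  # hypothetical stand-in for the reference tag


def flush(memstore, mob_file_path, threshold=MOB_THRESHOLD):
    """Split memstore cells into store-file cells and MOB-file cells."""
    store_file, mob_file = [], []
    for cell in memstore:
        if len(cell.value) > threshold:
            # The large value goes to the MOB file; the store file keeps
            # only a small reference cell whose value is the MOB file path.
            mob_file.append(cell)
            store_file.append(Cell(cell.row, mob_file_path.encode(), True))
        else:
            # Values at or below the threshold stay in the store file.
            store_file.append(cell)
    return store_file, mob_file
```

In this model, every cell reaches the memstore the same way, and the split happens only at flush time, which matches the description of the write path above.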
By changing the threshold, cells can be moved between store files and MOB files during compactions. The default threshold is 100KB.
As illustrated below, the cells that contain the paths of MOB files are called reference cells. The tags are retained in the cells, so we can continue to rely on the HBase security mechanism.
The reference cells carry reference tags that differentiate them from normal cells. A reference tag indicates that the actual MOB cell lives in a MOB file, so the reference must be resolved further when reading.
On reads, the store scanner opens scanners on the memstore and the store files. If it meets a reference cell, the scanner reads the file path from the cell value and seeks the same row key in that MOB file. The block cache can be enabled for MOB files during scans, which can accelerate seeking.
It is not necessary to open readers on all the MOB files; a single reader is opened only when it is required. A random read is therefore not affected by the number of MOB files, so there is no need to compact MOB files over and over again once they are large enough.
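The read path above can be modeled as follows. This is a simplified sketch under stated assumptions, not the HBase scanner: `read_row`, the dictionary-based store, and the `open_mob_file` callable are hypothetical names chosen for illustration.

```python
def read_row(row, store_cells, open_mob_file):
    """Return the value for `row`, resolving a reference cell if needed.

    store_cells:   dict mapping row -> (value, is_reference)
    open_mob_file: hypothetical reader factory, path -> dict of row -> value
    """
    entry = store_cells.get(row)
    if entry is None:
        return None
    value, is_reference = entry
    if not is_reference:
        # A normal cell: the value is stored inline.
        return value
    # A reference cell: its value is the path of the MOB file.
    mob_path = value.decode()
    mob_file = open_mob_file(mob_path)  # only this one file is opened
    # Seek the same row key in the MOB file.
    return mob_file.get(row)
```

Because the reference cell names the exact MOB file, the lookup cost in this model does not grow with the number of MOB files.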
The MOB filename is readable and comprises three parts: the MD5 of the start key, the latest date of the cells in this MOB file, and a UUID. The first part is derived from the start key of the region from which this MOB file was flushed. MOBs usually have a user-defined TTL, so you can find and delete expired MOB files by comparing the second part with the TTL.
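As a rough illustration of this naming scheme, the sketch below builds such a name and checks it against a TTL. The exact layout is an assumption of this example (a 32-character MD5 hex digest, an 8-character `yyyyMMdd` date, and a UUID without dashes); the article itself only names the three parts. The function names and the expiry rule (latest cell date plus TTL earlier than today) are likewise hypothetical.

```python
import hashlib
import uuid
from datetime import date, datetime, timedelta


def mob_file_name(start_key: bytes, latest_cell_date: date, file_id: uuid.UUID) -> str:
    """Concatenate MD5(start key) + date + UUID (assumed layout)."""
    return (
        hashlib.md5(start_key).hexdigest()      # 32 hex characters
        + latest_cell_date.strftime("%Y%m%d")   # 8 characters, assumed format
        + file_id.hex                           # 32 hex characters, no dashes
    )


def parse_date(name: str) -> date:
    """Extract the date part that follows the 32-character MD5 prefix."""
    return datetime.strptime(name[32:40], "%Y%m%d").date()


def is_expired(name: str, ttl_days: int, today: date) -> bool:
    """A file counts as expired once its newest cell is older than the TTL."""
    return parse_date(name) + timedelta(days=ttl_days) < today
```

Since the date in the name is the latest cell date, every cell in an expired file is at least as old, so the whole file can be dropped without reading it.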
To be more friendly to the snapshot, the MOB files are stored in a special dummy region, whereby the snapshot, table export/clone, and archive work as expected.
When a snapshot of a table is taken, the MOB region is created in the snapshot and the existing MOB files are added to the manifest. When the snapshot is restored, file links are created in the MOB region.
There are two situations when MOB files should be deleted: when the MOB file is expired, and when the MOB file is too small and should be merged into bigger ones to improve HDFS efficiency.
HBase MOB has a chore in master: it scans the MOB files, finds the expired ones determined by the date in the filename, and deletes them. Thus disk space is reclaimed periodically by aging off expired MOB files.
MOB files may be relatively small compared to an HDFS block if you write rows where only a few entries qualify as MOBs; there might also be deleted cells. You need to drop the deleted cells and merge the small files into bigger ones to improve HDFS utilization. MOB compactions only compact the small files and leave the large files untouched, which avoids repeated compaction of large files.
Some other things to keep in mind:
· Know which cells are deleted. In every HBase major compaction, the delete markers are written to a del file before they are dropped.
· In the first step of MOB compactions, these del files are merged into bigger ones.
· All the small MOB files are selected. If the number of small files is equal to the number of existing MOB files, this compaction is regarded as a major one and is called an ALL_FILES compaction.
· These selected files are partitioned by the start key and date in the filename. The small files in each partition are compacted together with the del files so that deleted cells can be dropped. Meanwhile, a new HFile with new reference cells is generated; the compactor commits the new MOB file and then bulk loads this HFile into HBase.
· After compactions in all partitions are finished, if an ALL_FILES compaction is involved, the del files are archived.
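The selection and partitioning steps in the list above can be sketched like this. It is a minimal model rather than the HBase compactor: `MobFile`, `plan_compaction`, and the `small_file_threshold` parameter are hypothetical names, and the article does not state what size counts as "small".

```python
from collections import defaultdict
from typing import NamedTuple


class MobFile(NamedTuple):
    start_key_md5: str  # first part of the filename
    date: str           # second part of the filename
    size: int           # file size in bytes


def plan_compaction(files, small_file_threshold):
    """Select small MOB files and group them by (start key, date).

    Returns the partitions and whether this is an ALL_FILES compaction.
    """
    small = [f for f in files if f.size < small_file_threshold]
    # ALL_FILES: every existing MOB file was selected as a small file.
    all_files = len(files) > 0 and len(small) == len(files)
    partitions = defaultdict(list)
    for f in small:
        partitions[(f.start_key_md5, f.date)].append(f)
    return dict(partitions), all_files
```

Each partition would then be compacted on its own, and in this model the del files would be archived only when `all_files` is true, as the last step in the list describes.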
The life cycle of MOB files is illustrated below. Basically, they are created when memstore is flushed, and deleted by HFileCleaner from the filesystem when they are not referenced by the snapshot or expired in the archive.
In summary, the new HBase MOB design moves MOBs out of the main I/O path of HBase while retaining most security, compaction, and snapshotting features. It caters to the characteristics of operations in MOB, makes the write amplification of MOBs more predictable, and keeps low latencies in both reading and writing.
Jincheng Du is a Software Engineer at Intel and an HBase contributor.
Jon Hsieh is a Software Engineer at Cloudera and an HBase committer/PMC member. He is also the founder of Apache Flume, and a committer on Apache Sqoop.