Mercurial largefiles

2015/06/07 22:27

Original source: Largefiles extension

<!> This is considered a feature of last resort. Large binary files tend to compress poorly, diff poorly, and cannot be merged at all. Mercurial's storage format (Revlog), which is based on compressed binary deltas, handles such files badly. largefiles solves this problem by adding a centralized client-server layer on top of Mercurial: largefiles live in a central store somewhere on the network, and you fetch only the ones you need, when you need them.

##1. Status

This extension is distributed with Mercurial 2.0 and later.


Author: Various

##2. Overview

The largefiles extension allows for tracking large, incompressible binary files in Mercurial without requiring excessive bandwidth for clones and pulls. Files added as largefiles are not tracked directly by Mercurial; rather, their revisions are identified by a checksum, and Mercurial tracks these checksums. This way, when you clone a repository or pull in changesets, only the largefiles needed to update to the current version are downloaded. This saves both disk space and bandwidth.

If you are starting a new repository or adding new large binary files, using largefiles for them is as easy as adding '--large' to your hg add command. For example:

$ dd if=/dev/urandom of=thisfileislarge count=2000
$ hg add --large thisfileislarge
$ hg commit -m 'add thisfileislarge, which is large, as a largefile'

When you push a changeset that affects largefiles to a remote repository, its largefile revisions will be uploaded along with the changeset. This ensures that the central store gets a copy of every revision of every largefile. Note that the remote Mercurial must also have the largefiles extension enabled for this to work.


When you pull a changeset that affects largefiles from a remote repository, nothing different from Mercurial's normal behavior happens. However, when you update to such a revision, any largefiles needed by that revision are downloaded if they have never been downloaded before. This means that network access is required to update to a revision you have not yet updated to.


If you already have large files tracked by Mercurial without the largefiles extension, you will need to convert your repository in order to benefit from largefiles. This is done with the 'hg lfconvert' command:

$ hg lfconvert --size 10 oldrepo newrepo

By default, in repositories that already have largefiles in them, any new file over 10 MB will automatically be added as a largefile. To change this threshold, set largefiles.minsize in your Mercurial config file to the minimum size in megabytes to track as a largefile:


[largefiles]
minsize = 2

or use the --lfsize option to the add command (also in megabytes):


$ hg add --lfsize 2

The largefiles.patterns config option allows you to list space-separated filename patterns (in shell glob syntax) that should always be tracked as largefiles:

[largefiles]
patterns = *.jpg *.{png,bmp} content/audio/*

Note: the pattern syntax shown here is probably incorrect; run hg help patterns to see which form fits better. In particular, *.{png,bmp} does not seem to work, whereas re:.*\.(png|bmp) works as expected.
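To get a feel for which file names a regular expression such as re:.*\.(png|bmp) would select, the expression can be tried outside Mercurial. The sketch below uses grep -E as a stand-in for Mercurial's matcher, which is an assumption: the two engines are not identical, so hg help patterns remains the reference. The file names and the helper matches_pattern are made up for illustration.

```shell
# The expression from the note above, without Mercurial's "re:" prefix.
# grep -E only approximates how Mercurial evaluates such patterns.
pattern='.*\.(png|bmp)'

matches_pattern() {
    # Succeeds when the file name given as $1 matches the expression.
    printf '%s\n' "$1" | grep -Eq "$pattern"
}

# Hypothetical file names, used only for this illustration.
for name in logo.png scan.bmp photo.jpg notes.txt; do
    if matches_pattern "$name"; then
        echo "$name: would be tracked as a largefile"
    else
        echo "$name: would be tracked normally"
    fi
done
```

Because the expression has no end anchor, a name such as logo.png.txt would match as well; appending $ restricts it to names that end in .png or .bmp.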

##3. Configuration

Enable the largefiles extension by adding the following lines to your config file:


[extensions]
largefiles =

##4. Design

This section explains how largefiles works behind the scenes. If you're just adding/modifying/committing/pushing/pulling in a largefiles repo, you shouldn't have to read this section (although it can't hurt). But if you are setting up or administering Mercurial with largefiles, this is essential reading.


###4.1. The local store

Each local repository has a local largefiles store in '.hg/largefiles'. When you add a new largefile to a repository, it is first stored here. When largefiles are downloaded from the central store (see below), a copy is also saved here. Files in the local store are hard-linked to the user cache.

###4.2. The user cache

The user cache helps to avoid downloading and storing multiple copies of largefiles. When a largefile is needed but does not exist in the local store, Mercurial checks the user cache. If the needed largefile exists there, a hard link to it is created in the local store.
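The lookup order described above can be pictured with ordinary shell commands. This is only an illustration of the idea, not Mercurial's actual code: the directories are temporary stand-ins for the local store and the user cache, and the function name fetch_into_store and the file names are hypothetical.

```shell
# Temporary stand-ins for .hg/largefiles and the user cache.
work=$(mktemp -d)
store="$work/repo/.hg/largefiles"
cache="$work/usercache"
mkdir -p "$store" "$cache"

# Pretend an earlier download left this largefile in the user cache.
echo "large binary payload" > "$cache/example-largefile"

fetch_into_store() {
    # $1 is the name of the largefile that is needed.
    if [ -e "$store/$1" ]; then
        echo "already in local store"
    elif [ -e "$cache/$1" ]; then
        # Hard-link instead of copying: both names share one copy on disk.
        ln "$cache/$1" "$store/$1"
        echo "linked from user cache"
    else
        echo "would download from the central store"
    fi
}

fetch_into_store example-largefile   # prints "linked from user cache"
fetch_into_store example-largefile   # prints "already in local store"
fetch_into_store missing-largefile   # prints "would download from the central store"
```

Because the local store entry is a hard link, several repositories that need the same largefile can share a single copy on disk.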

The cache location is OS dependent:


  • OS X /Users/username/Library/Caches/largefiles
  • Windows (Vista and up) C:\Users\username\AppData\Local\largefiles
  • Windows (pre-Vista) C:\Documents and Settings\username\Application Data\largefiles
  • Linux /home/username/.cache/largefiles
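As a rough sketch, the Unix-like entries in the list above can be written as a lookup on the operating system name. The function name default_usercache is hypothetical, only the OS X and Linux entries are covered, and the sketch does not consult Mercurial itself; it simply mirrors the paths listed here.

```shell
default_usercache() {
    # $1: kernel name as printed by `uname -s`; $2: the user's home directory.
    case "$1" in
        Darwin) echo "$2/Library/Caches/largefiles" ;;
        Linux)  echo "$2/.cache/largefiles" ;;
        *)      echo "unknown" ;;
    esac
}

# Show the location this sketch expects on the current machine.
default_usercache "$(uname -s)" "$HOME"
```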

You can set your user cache to a non-default location by setting largefiles.usercache in your Mercurial config:

[largefiles]
usercache = /shared/myusercachedir

The user cache can be deleted at any time to reclaim disk space, but doing so may mean that largefiles have to be downloaded and stored again the next time they are needed.

####4.2.1. The central store

In a typical setup with a central Mercurial server, the user cache of the account that serves the central repositories acts as a central store for all of those repositories. This central largefiles store has every past revision of every largefile.

<!> Unlike other user caches, the central store should not be deleted! It may be the only cache that holds a largefile used by an old revision.

<!> When a client repository needs to download a largefile, it tries to get it from the repository specified as default in the hgrc file. If no default is specified, or the wrong repository is specified, the download fails. As an alternative, a default path can be set for a single hg update command:

$ hg --config paths.default=path-to-repo-with-the-file update

###4.3. Implementation details

Each largefile has a standin file in '.hglf/', which is tracked by Mercurial like any other file. The standin contains the SHA-1 hash of the largefile contents. When a largefile is added, removed, copied, renamed, and so on, the same operation is applied to the standin. Thus the history of the standin is the history of the largefile.
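Since a standin holds the SHA-1 hash of the largefile contents, the relationship can be reproduced by hand. The sketch below computes the hash with sha1sum (falling back to shasum, which some systems ship instead) and writes it into a mock '.hglf/' directory. The helper name sha1_of and the temporary directory are illustrative, and the exact on-disk layout of a real standin is an assumption here.

```shell
sha1_of() {
    # Print only the 40-character hex digest of the file given as $1.
    if command -v sha1sum >/dev/null 2>&1; then
        sha1sum "$1" | cut -d ' ' -f 1
    else
        shasum -a 1 "$1" | cut -d ' ' -f 1
    fi
}

# A throwaway directory standing in for a working copy.
work=$(mktemp -d)
mkdir -p "$work/.hglf"
printf 'hello' > "$work/thisfileislarge"

# Mock standin: a small text file holding the hash of the large file.
sha1_of "$work/thisfileislarge" > "$work/.hglf/thisfileislarge"
cat "$work/.hglf/thisfileislarge"
```

Any change to the large file yields a different hash, so the small standin is enough for Mercurial to record that the largefile changed without storing the large contents in the revlog.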

For performance reasons, the contents of a standin are only updated just before a commit. The add/remove/copy/rename Mercurial commands add, remove, copy, or rename the standin accordingly, but they do not update its contents. As a result, the contents of a standin are always the hash of the largefile as of the last commit. To support some commands (such as revert), some standins are temporarily updated and then changed back once the command has finished.

A Mercurial dirstate object tracks the state of the largefiles. The dirstate uses the last modified time and current size to detect if a file has changed without reading the entire contents of the file.
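The same shortcut can be sketched in shell: compare a cheap size-and-mtime fingerprint first, and only consider hashing the file when the fingerprint differs. This is an analogy rather than Mercurial's dirstate code. The function name fingerprint and the file name are hypothetical, and because the stat flags differ between GNU and BSD systems, both forms are tried.

```shell
fingerprint() {
    # Print "<size> <mtime>" for the file in $1 without reading its contents.
    # The first form is GNU stat, the fallback is BSD/macOS stat.
    stat -c '%s %Y' "$1" 2>/dev/null || stat -f '%z %m' "$1"
}

work=$(mktemp -d)
printf 'version 1' > "$work/big.bin"
before=$(fingerprint "$work/big.bin")

# Rewrite the file with content of a different length.
printf 'version 2, now longer' > "$work/big.bin"
after=$(fingerprint "$work/big.bin")

if [ "$before" != "$after" ]; then
    echo "big.bin may have changed; hash it to be sure"
else
    echo "big.bin looks unchanged; skip hashing"
fi
```

The check is cheap because it reads only file metadata, which matters when the files in question are hundreds of megabytes.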

##5. See also

There are a number of older extensions for managing large files. This extension is a descendant of the BfilesExtension and is now the recommended way to handle such files. Alternatives are BigfilesExtension and SnapExtension.
