# Mercurial largefiles

2015/06/07 22:27

Warning: This is considered a feature of last resort. Large binary files tend to be not very compressible, not very "diffable", and not at all mergeable. Such files are not handled well by Mercurial's storage format (Revlog), which is based on compressed binary deltas. largefiles solves this problem by adding a centralized client-server layer on top of Mercurial: largefiles live in a central store out on the network somewhere, and you only fetch the ones that you need when you need them.

##1. Status

This extension is distributed with Mercurial 2.0 and later.

Author: Various

##2. Overview

The largefiles extension allows for tracking large, incompressible binary files in Mercurial without requiring excessive bandwidth for clones and pulls. Files added as largefiles are not tracked directly by Mercurial; rather, their revisions are identified by a checksum, and Mercurial tracks these checksums. This way, when you clone a repository or pull in changesets, only the largefiles needed to update to the current version are downloaded. This saves both disk space and bandwidth.

If you are starting a new repository or adding new large binary files, using largefiles for them is as easy as adding '--large' to your hg add command. For example:

$ dd if=/dev/urandom of=thisfileislarge count=2000
$ hg add --large thisfileislarge
$ hg commit -m 'add thisfileislarge, which is large, as a largefile'

When you push a changeset that affects largefiles to a remote repository, its largefile revisions will be uploaded along with the changeset. This ensures that the central store gets a copy of every revision of every largefile. Note that the remote Mercurial must also have the largefiles extension enabled for this to work.

When you pull a changeset that affects largefiles from a remote repository, nothing different from Mercurial's normal behavior happens. However, when you update to such a revision, any largefiles needed by that revision are downloaded if they have never been downloaded before. This means that network access is required to update to a revision you have not yet updated to.

If you already have large files tracked by Mercurial without the largefiles extension, you will need to convert your repository in order to benefit from largefiles. This is done with the 'hg lfconvert' command:

$ hg lfconvert --size 10 oldrepo newrepo


By default, in repositories that already have largefiles in them, any new file over 10 MB will automatically be added as a largefile. To change this threshold, set largefiles.minsize in your Mercurial config file to the minimum size in megabytes to track as a largefile:

[largefiles]
minsize = 2


or use the --lfsize option to the add command (also in megabytes):

$ hg add --lfsize 2
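The size rule can be pictured with a short Python sketch. It is an illustration only, not the extension's code: the helper name `would_be_largefile` is made up here, and it assumes that a megabyte means 1024 * 1024 bytes and that a file exactly at the threshold counts as large.

```python
import os
import tempfile

MB = 1024 * 1024  # assumption: one megabyte is 1024 * 1024 bytes


def would_be_largefile(path, minsize_mb=10):
    """Return True if the file at `path` is at least `minsize_mb` megabytes.

    Hypothetical helper: it mirrors the idea behind largefiles.minsize and
    --lfsize, not the extension's actual implementation.
    """
    return os.path.getsize(path) >= minsize_mb * MB


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.bin")
        with open(path, "wb") as f:
            f.write(b"\0" * (3 * MB))  # a 3 MB file
        print(would_be_largefile(path))                # default threshold of 10 MB
        print(would_be_largefile(path, minsize_mb=2))  # threshold lowered to 2 MB
```

With the default threshold a 3 MB file stays a normal file; lowering the threshold to 2, as in the config and command above, would make it a largefile candidate.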


The largefiles.patterns config option lets you list space-separated filename patterns (in shell glob syntax) that should always be tracked as largefiles:

[largefiles]
patterns = *.jpg *.{png,bmp} library.zip content/audio/*


Note: the patterns syntax shown here is probably incorrect; try hg help patterns to see what fits better. In particular, *.{png,bmp} seems not to work, whereas re:.*\.(png|bmp) works as expected.
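To see which names a regular expression of this kind selects, here is a small Python sketch. It assumes the expression the note has in mind is `.*\.(png|bmp)` and uses Python's `re` module on plain strings; it says nothing about how Mercurial itself parses the patterns setting. The helper name `matches_image_pattern` is hypothetical.

```python
import re

# Assumption: this is the expression the note refers to after "re:".
IMAGE_RE = re.compile(r".*\.(png|bmp)")


def matches_image_pattern(filename):
    """Return True if `filename` matches IMAGE_RE starting at its first character."""
    return IMAGE_RE.match(filename) is not None


if __name__ == "__main__":
    for name in ["logo.png", "art/bg.bmp", "photo.jpg", "library.zip"]:
        print(name, matches_image_pattern(name))
```

In this sketch the expression is not anchored at the end, so a name such as logo.png.txt also matches; appending `$` to the expression would require the extension to be the final part of the name.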

##3. Configuration

Enable the largefiles extension by adding the following lines to your config file:

[extensions]
largefiles =


##4. Design

This section explains how largefiles works behind the scenes. If you're just adding/modifying/committing/pushing/pulling in a largefiles repo, you shouldn't have to read this section (although it can't hurt). But if you are setting up or administering Mercurial with largefiles, this is essential reading.

###4.1. The local store

Each local repository has a local largefiles store in '.hg/largefiles'. When you add a new largefile to a repository, it is first stored here. When largefiles are downloaded from the central store (see below), a copy is saved there. Files in the local store are also hard-linked to the user cache.

###4.2. The user cache

The user cache helps to avoid downloading and storing multiple copies of largefiles. When a largefile is needed but does not exist in the local store, Mercurial checks the user cache. If the needed largefile exists, a hard-link is created in the local store.
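The lookup order described above (local store first, then user cache, then the network) can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the extension's implementation: the function name `find_largefile` is hypothetical, and files in both directories are assumed to be named after their hash.

```python
import os
import tempfile


def find_largefile(file_hash, local_store, user_cache):
    """Return the path of the largefile in the local store, or None.

    Hypothetical helper: files are assumed to be named after their hash.
    """
    local_path = os.path.join(local_store, file_hash)
    if os.path.exists(local_path):
        return local_path  # already present in the local store

    cached_path = os.path.join(user_cache, file_hash)
    if os.path.exists(cached_path):
        os.makedirs(local_store, exist_ok=True)
        os.link(cached_path, local_path)  # hard link: no second copy on disk
        return local_path

    return None  # not available locally; it would have to be downloaded


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = os.path.join(tmp, "repo", ".hg", "largefiles")
        cache = os.path.join(tmp, "usercache")
        os.makedirs(cache)
        example_hash = "0" * 40  # placeholder, not a real checksum
        with open(os.path.join(cache, example_hash), "wb") as f:
            f.write(b"large file contents")
        print(find_largefile(example_hash, store, cache))
```

Hard links only work when both directories are on the same filesystem, which is worth keeping in mind when pointing the user cache at a non-default location.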

The cache location is OS dependent:

• Windows (Vista and up) C:\Users\username\AppData\Local\largefiles
• Windows (pre-Vista) C:\Documents and Settings\username\Application Data\largefiles

You can set your user cache to a non-default location by setting largefiles.usercache in your Mercurial config:

[largefiles]
usercache = /shared/myusercachedir


The user cache can be deleted at any time to reclaim disk space, but doing so may also result in downloading and storing additional copies of largefiles.

####4.2.1. The central store

In a typical setup with a central Mercurial server, the user who serves the central repositories will get a user cache that acts as a central store for all the repositories. This central largefiles store has every past revision of every largefile.

Warning: Unlike other user caches, the central store should not be deleted! It may be the only cache that holds a largefile used by an old revision.

Warning: When a client repository needs to download a largefile, it will try to get it from the repository specified as default in the hgrc file. If no default is specified, or the wrong repository is specified, the download will fail. As an alternative, a default path can be set for a specific hg update command:

hg --config paths.default=path-to-repo-with-the-file update


###4.3. Implementation details

Each largefile has a standin file in '.hglf/', which is tracked by Mercurial like any other file. The standin contains the SHA-1 hash of the largefile contents. When a largefile is added/removed/copied/renamed/etc the same operation is applied to the standin. Thus the history of the standin is the history of the largefile.
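As a rough illustration of the standin idea, the Python sketch below hashes a file with SHA-1 and writes the hex digest to a matching path under '.hglf/'. The helper names `sha1_of_file` and `write_standin` are hypothetical, and the exact standin file format (here: the digest followed by a newline) is an assumption.

```python
import hashlib
import os
import tempfile


def sha1_of_file(path, chunk_size=1024 * 1024):
    """Return the hex SHA-1 of a file, reading it in chunks to bound memory use."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_standin(repo_root, relpath):
    """Write the hash of repo_root/relpath to repo_root/.hglf/relpath.

    Hypothetical helper; the standin file format used here is an assumption.
    """
    file_hash = sha1_of_file(os.path.join(repo_root, relpath))
    standin = os.path.join(repo_root, ".hglf", relpath)
    os.makedirs(os.path.dirname(standin), exist_ok=True)
    with open(standin, "w") as f:
        f.write(file_hash + "\n")
    return standin


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        with open(os.path.join(root, "thisfileislarge"), "wb") as f:
            f.write(os.urandom(4096))
        print(write_standin(root, "thisfileislarge"))
```

Because the standin is a small text file, tracking it costs Mercurial almost nothing, while the 40-character digest is enough to locate the right largefile revision in a store.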

For performance reasons, the contents of a standin are only updated before a commit. Standins are added/removed/copied/renamed by the corresponding Mercurial commands (add/remove/copy/rename), but their contents are not updated at that point. Between commits, the contents of a standin are therefore the hash of the largefile as of the last commit. To support some commands (such as revert), some standins are temporarily updated, then changed back after the command has finished.

A Mercurial dirstate object tracks the state of the largefiles. The dirstate uses the last modified time and current size to detect if a file has changed without reading the entire contents of the file.
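The size-and-mtime check can be sketched in a few lines of Python. This is a generic illustration of the technique, not the dirstate's actual data structure; the names `snapshot` and `maybe_changed` are hypothetical.

```python
import os
import tempfile


def snapshot(path):
    """Record the size and modification time (in nanoseconds) of a file."""
    st = os.stat(path)
    return (st.st_size, st.st_mtime_ns)


def maybe_changed(path, recorded):
    """Return True if size or mtime differ from the recorded snapshot."""
    return snapshot(path) != recorded


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "thisfileislarge")
        with open(path, "wb") as f:
            f.write(b"a" * 1000)
        recorded = snapshot(path)
        print(maybe_changed(path, recorded))  # nothing touched the file yet
        with open(path, "ab") as f:
            f.write(b"b" * 10)                # the size changes
        print(maybe_changed(path, recorded))
```

The appeal of this check is that it needs only a `stat` call, so a multi-gigabyte file does not have to be re-read and re-hashed just to find out whether it was touched. In this simple form, an edit that keeps the same size and lands within the timestamp resolution of the filesystem could go unnoticed.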

There are a number of older extensions for managing large files. This extension is a descendant of the BfilesExtension and is now the recommended way to handle such files. Alternatives are BigfilesExtension and SnapExtension.
