GitHub recently introduced an extension to Git that stores large files in a different way. What exactly does "the extension replaces large files with text pointers inside Git" mean?
Best answer
You can see in the git-lfs sources how a "text pointer" is defined:
type Pointer struct {
    Version string
    Oid     string
    Size    int64
    OidType string
}
The smudge and clean sources show that git-lfs can use a content filter driver in order to:
- download the actual files on checkout
- store them in an external source on commit.
See the pointer specs:
The core Git LFS idea is that instead of writing large blobs to a Git repository, only a pointer file is written.
version https://git-lfs.github.com/spec/v1
oid sha256:4d7a214614ab2935c943f9e0ff69d22eadbb8f32b1258daaa5e2ca24d17e2393
size 12345
(ending \n)
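As a rough illustration of the pointer format quoted above, here is a minimal Go sketch that renders and parses the three keys shown (version, oid, size). The `Encode` and `ParsePointer` names are hypothetical helpers written for this answer, not the git-lfs API, and the sketch assumes only the keys that appear in the example.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// Pointer mirrors the fields of the struct shown earlier.
type Pointer struct {
	Version string
	Oid     string
	Size    int64
	OidType string
}

// Encode renders the pointer as "key value" lines, each ending in \n.
// (Hypothetical helper, not part of git-lfs.)
func (p Pointer) Encode() string {
	return fmt.Sprintf("version %s\noid %s:%s\nsize %d\n",
		p.Version, p.OidType, p.Oid, p.Size)
}

// ParsePointer reads the three keys shown in the example pointer file.
// (Hypothetical helper, not part of git-lfs.)
func ParsePointer(text string) (Pointer, error) {
	var p Pointer
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		key, value, ok := strings.Cut(sc.Text(), " ")
		if !ok {
			return p, fmt.Errorf("malformed line %q", sc.Text())
		}
		switch key {
		case "version":
			p.Version = value
		case "oid":
			// The oid value has the form "<type>:<hash>".
			typ, hash, ok := strings.Cut(value, ":")
			if !ok {
				return p, fmt.Errorf("malformed oid %q", value)
			}
			p.OidType, p.Oid = typ, hash
		case "size":
			n, err := strconv.ParseInt(value, 10, 64)
			if err != nil {
				return p, err
			}
			p.Size = n
		}
	}
	if err := sc.Err(); err != nil {
		return p, err
	}
	if p.Version == "" || p.Oid == "" {
		return p, fmt.Errorf("missing version or oid")
	}
	return p, nil
}

func main() {
	text := "version https://git-lfs.github.com/spec/v1\n" +
		"oid sha256:4d7a214614ab2935c943f9e0ff69d22eadbb8f32b1258daaa5e2ca24d17e2393\n" +
		"size 12345\n"
	p, err := ParsePointer(text)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s object, %d bytes\n", p.OidType, p.Size)
	fmt.Print(p.Encode())
}
```

The point of the sketch is only that the pointer is a few short text lines, which is what Git actually stores in place of the large blob.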
Git LFS needs a URL endpoint to talk to a remote server.
A Git repository can have different Git LFS endpoints for different remotes.
The actual files are uploaded to, or downloaded from, a server that implements the Git LFS API.
The git-lfs man page confirms this, mentioning:
The actual file gets pushed to a Git LFS API
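You need a Git server that implements that API in order to support uploading and downloading binary content.
As for the content filter driver (which has existed in Git for a long time, well before lfs, and is used here by lfs to add this "large file management" feature), this is where most of the work happens: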
The smudge filter runs as files are being checked out from the Git repository to the working directory.
Git sends the content of the Git blob as STDIN, and expects the content to write to the working directory as STDOUT.
- Read 100 bytes.
- If the content is ASCII and matches the pointer file format:
  - Look for the file in .git/lfs/objects/{OID}.
  - If it's not there, download it from the server.
  - Read its contents to STDOUT.
- Otherwise, simply pass the STDIN out through STDOUT.
The clean filter runs as files are added to repositories.
Git sends the content of the file being added as STDIN, and expects the content to write to Git as STDOUT.
- Stream binary content from STDIN to a temp file, while calculating its SHA-256 signature.
- Check for the file at .git/lfs/objects/{OID}.
- If it does not exist:
  - Queue the OID to be uploaded.
  - Move the temp file to .git/lfs/objects/{OID}.
- Delete the temp file.
- Write the pointer file to STDOUT.
Git 2.11 (November 2016) includes a commit that details how this works: commit edcc858, with help from Martin-Louis Bright and signed off by Lars Schneider.
convert: add filter.<driver>.process option
Git's clean/smudge mechanism invokes an external filter process for every single blob that is affected by a filter. If Git filters a lot of blobs then the startup time of the external filter processes can become a significant part of the overall Git execution time.
In a preliminary performance test this developer used a clean/smudge filter written in golang to filter 12,000 files. This process took 364s with the existing filter mechanism and 5s with the new mechanism. See details here: git-lfs/git-lfs#1382
This patch adds the filter.<driver>.process string option which, if used, keeps the external filter process running and processes all blobs with the packet format (pkt-line) based protocol over standard input and standard output.
The full protocol is explained in detail in Documentation/gitattributes.txt.
A few key decisions:
- The long running filter process is referred to as filter protocol version 2 because the existing single shot filter invocation is considered version 1.
- Git sends a welcome message and expects a response right after the external filter process has started. This ensures that Git will not hang if a version 1 filter is incorrectly used with the filter.<driver>.process option for version 2 filters. In addition, Git can detect this kind of error and warn the user.
- The status of a filter operation (e.g. "success" or "error") is set before the actual response and (if necessary!) re-set after the response. The advantage of this two step status response is that if the filter detects an error early, then the filter can communicate this and Git does not even need to create structures to read the response.
- All status responses are pkt-line lists terminated with a flush packet. This allows us to send other status fields with the same protocol in the future.
This led to a warning being added in Git 2.12 (Q1 2017).
See commit 7eeda8b (18 Dec 2016) and commit c6b0831 (3 Dec 2016) by Lars Schneider (larsxschneider).
(Merged by Junio C Hamano -- gitster -- in commit 08721a0, 27 Dec 2016)
docs: warn about possible '=' in clean/smudge filter process values
A pathname value in a clean/smudge filter process "key=value" pair can contain the '=' character (introduced in edcc858).
Make the user aware of this issue in the docs, add a corresponding test case, and fix the issue in filter process value parser of the example implementation in contrib.
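The pitfall described in that commit message is easy to reproduce in any language: splitting a "key=value" line on every '=' breaks when the value is a pathname that itself contains '='. A minimal Go sketch of the safer approach, splitting on the first '=' only, follows; `splitKeyValue` is a hypothetical helper and the pathname is made up for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// splitKeyValue splits on the first '=' only, so the value may itself
// contain '=' characters (for example in a pathname).
func splitKeyValue(line string) (key, value string, ok bool) {
	return strings.Cut(line, "=")
}

func main() {
	line := "pathname=dir/a=b.bin" // made-up pathname containing '='

	// Naive approach: splitting on every '=' yields three parts.
	fmt.Println(len(strings.Split(line, "="))) // prints: 3

	key, value, _ := splitKeyValue(line)
	fmt.Printf("%s -> %s\n", key, value) // prints: pathname -> dir/a=b.bin
}
```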
This Q&A ("git - What is the storage mechanism behind Git Large File Storage?") is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/29530200/