git - 将对象分类到以 SHA-1 字符串的前 2 个字符命名的文件夹中的优点?

标签 git

Git 使用对象的 SHA-1 字符串的前 2 个字符将对象存储到分类文件夹中,这种存储结构有什么优势?

我认为它无法避免任何潜在的冲突,为什么不将所有对象放入一个平面文件夹中呢?

最佳答案

loose objects 结构(example hereGit Internal - Packfiles)表示对象最初在 Git 中的存储方式。

你可以看到approach used elsewhere too (for images database for instance, actually on two levels ,但这也适用于 Git):

compute the SHA-1 hash of the image, generate its hexadecimal form, and use the first two characters of the SHA-1 string as a first-level directory.

SHA1 hashes give good distribution, even in the first few characters, so that will nicely distribute the files into a (relatively) balanced folder structure.
This simplistic approach will use no more than 256 folders at each level.

Using the hexadecimal form of the image's SHA-1 has two very nice benefits:

  • no name collisions, and
  • any given file will only be stored once even if the same file is uploaded more than once.

参见 gitrepository-layout :

对象/[0-9a-f][0-9a-f]

A newly created object is stored in its own file.
The objects are splayed over 256 subdirectories using the first two characters of the sha1 object name to keep the number of directory entries in objects itself to a manageable number.
Objects found here are often called unpacked (or loose) objects.

git commit 88520ca为我们提供了有关该结构优势的更多信息,这会影响何时运行 gc:

search 4 directories to improve statistic of gc hint

On Windows, git-gui suggests running the garbage collector if it finds 1 or more files in .git/objects/42 (as opposed to 8 files on other platforms).
The probability of that happening if the repo contains about 100 loose objects is 32%.
The probability for the same to happen when searching 4 directories is only 8%, which is bit more reasonable.

The following octave script shows the probability for at least m*q objects to be found in q subdirectories of .git/objects if n is the total number of objects.

(它使用 Cumulative Distribution function (CDF) 作为 binomial distribution binocdf )

q = 4;
m = [1 2 8];
n = 0:10:2000;

P = zeros(length(n), length(m));
for k = 1:length(n)
        P(k, :) = 1-binocdf(q*m-1, n(k), q/(256-q));
end
plot(n, P);

n \ q   1       4
50      18%     1%
100     32%     8%
200     54%     39%
500     86%     96%

该组织还允许 Git's packing heuristics尽可能快速高效,当git gc has to occur .

关于git - 将对象分类到以 SHA-1 字符串的前 2 个字符命名的文件夹中的优点?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/30662521/

相关文章:

git - 是否可以继续 Github 上其他人打开的 pull 请求?

git - 我如何 git push 特定分支?

Git:设置一个只获取的远程?

git - 修复旧 Git 提交中的许可证

git - 检查 Git 标签导致 "detached HEAD state"

git - 在 Visual Studio Code for Windows 中,Git 分支不显示且无法创建分支

每个文件的 Git 贡献者

java - Java WebApp 的 Azure GIT 持续部署

git - Web 开发问题的基本版本控制 - 单个开发人员。 (SVN/GIT)

windows - Github 窗口,整个文件显示提交后的更改。