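Does it make any sense to somehow store an "uncompressed" version of normally-compressed files in the repository?
If so, is there a standard way to do that?
(Perhaps a standard pre-commit hook that unzips each such file into a specially named folder,
and a post-checkout hook that zips each specially named folder back into the compressed file that LibreOffice knows how to read and write? Something like the process described in "Should I decompress zips before I archive?")
(Perhaps by hacking the version control software itself so that it automatically decompresses the old and new versions and stores the diff between the decompressed files, falling back, if that fails or gives no significant improvement, to a direct diff between the original files or to storing the file outright?)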
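I have a set of OpenOffice/LibreOffice files that I edit frequently.
I store them in a version control repository,
as recommended by "Should images be stored in a git repository?".
I happen to use TortoiseHg or SourceTree to access my repository, rather than git.
I happen to know that OpenOffice files are really zip-compressed containers with some XML files inside.
(I hear that many other popular applications' "binary file formats" are also some form of zip-compressed file.)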
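To make the container structure concrete, here is a minimal sketch using Python's standard `zipfile` module. The helper names `build_sample_document` and `list_members` are made up for illustration, and the archive built here only imitates the layout of an OpenDocument file (a `mimetype` entry plus a `content.xml`); it is not a valid document.

```python
import io
import zipfile


def build_sample_document() -> bytes:
    """Build a tiny ODF-like zip in memory (illustrative, not a valid .odt)."""
    body = "<text:p>Hello, world.</text:p>" * 50
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # This member is stored as-is, without compression.
        archive.writestr(
            "mimetype",
            "application/vnd.oasis.opendocument.text",
            compress_type=zipfile.ZIP_STORED,
        )
        # The document text is deflated like any other zip member.
        archive.writestr("content.xml", body, compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()


def list_members(data: bytes) -> list:
    """Return (name, uncompressed size, compressed size) for each member."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return [
            (info.filename, info.file_size, info.compress_size)
            for info in archive.infolist()
        ]


for name, size, packed in list_members(build_sample_document()):
    print(f"{name}: {size} bytes, {packed} bytes inside the archive")
```

Passing the bytes of a real document to `list_members` should show the same kind of listing, since each member is compressed on its own.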
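My understanding is that even the smallest change to such a "binary" file causes an entire new file to be stored in the repository.
By contrast, a small change to a "text" file causes only the change to be stored and transmitted.
In theory, this would have the following advantages: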
Best answer
Does it make any sense to somehow store an "uncompressed" version of normally-compressed files in the repository?
It makes a lot of sense, especially if you need to branch and diff.
This old thread sums up the situation:
- For Openoffice documents whose size is dominated by embed images and other large objects, the git delta mechanism already performs reasonably well, since OO files are Zip archives where each file is compressed separately. If you do not change an image, then that image remains stored in the same way and the delta can be done.
- For OO documents whose size is dominated by plain content, the git delta mechanism cannot work, since the zip compression introduces "mixing" and a small change in the document is converted into a very large change in the zip file.
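The "mixing" effect described in the second point can be measured with a small sketch. It uses `zlib` (the deflate implementation that zip archives normally rely on) on made-up XML-like text; `changed_span` and `sample_documents` are hypothetical helpers, and the exact numbers printed depend on the input and the zlib version.

```python
import zlib


def changed_span(old: bytes, new: bytes) -> int:
    """Length of the part of `new` outside the common prefix and suffix."""
    limit = min(len(old), len(new))
    prefix = 0
    while prefix < limit and old[prefix] == new[prefix]:
        prefix += 1
    suffix = 0
    # Cap the suffix so it never overlaps the prefix in the shorter input.
    while suffix < limit - prefix and old[-1 - suffix] == new[-1 - suffix]:
        suffix += 1
    return len(new) - prefix - suffix


def sample_documents() -> tuple:
    """Two versions of a made-up document that differ by one short insertion."""
    old = "".join(
        f"<text:p>Paragraph {i} of the report.</text:p>" for i in range(300)
    ).encode()
    new = old.replace(b"Paragraph 3 of", b"Paragraph 3 (revised) of", 1)
    return old, new


old_xml, new_xml = sample_documents()
print("changed span, uncompressed:", changed_span(old_xml, new_xml))
print(
    "changed span, compressed:  ",
    changed_span(zlib.compress(old_xml), zlib.compress(new_xml)),
)
```

In the uncompressed text the change stays confined to the inserted bytes, which is what a delta algorithm can exploit. In the compressed streams the differing region is usually a much larger share of the output, although this sketch does not guarantee that for every input.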
It could be possible to write a clean filter to uncompress before commit. However there is a trick with the complementary smudge filter to be used at checkout. If you do not smudge properly, git always shows the file as changed wrt the index.
Smudging correctly would mean using the very same compression ratio and compress method that OO uses, which can be a little tricky. I have tried using the zip binary both in the clean and the smudge phases and it does not work nicely. The smudged file is always different from the original one.
One should probably work at a lower level to have a finer control on what is happening (libzip) and prepend to the uncompressed file the compression parameters to be restored on smudging.
The bigger issue is however that the clean/smudge thing can be really slow when dealing with large OO files.
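As a rough illustration of the clean side only, the sketch below rewrites an archive so that every member is stored without compression. It uses Python's `zipfile` rather than the libzip approach the thread suggests, and `store_uncompressed` is a hypothetical name, not an existing tool. A real git filter would read the file from standard input and write the result to standard output, registered through a `filter` attribute in `.gitattributes`; that wiring is left out here, and the demo works on an in-memory archive instead.

```python
import io
import zipfile


def store_uncompressed(data: bytes) -> bytes:
    """Rewrite a zip archive so that every member uses the 'stored' method."""
    output = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(data)) as source, \
            zipfile.ZipFile(output, "w", zipfile.ZIP_STORED) as target:
        for info in source.infolist():
            # Keep the member name, order and timestamp; drop the compression.
            stored = zipfile.ZipInfo(info.filename, date_time=info.date_time)
            stored.compress_type = zipfile.ZIP_STORED
            stored.external_attr = info.external_attr
            target.writestr(stored, source.read(info))
    return output.getvalue()


# Demo on an in-memory archive; the content is made up for illustration.
sample = io.BytesIO()
with zipfile.ZipFile(sample, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("content.xml", "<text:p>Hello</text:p>" * 100)
cleaned = store_uncompressed(sample.getvalue())
print(len(sample.getvalue()), "bytes compressed ->", len(cleaned), "bytes stored")
```

This sketch does nothing about the smudge problem described above: recompressing at checkout will generally not reproduce the bytes the office suite originally wrote. It also reads every member fully into memory, so it is likely to be slow on large documents, in line with the last remark of the thread.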
On the topic of version-control and decompressing compressed data files before committing them to a repository, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/17501146/