python - How to get a specific version of a file from a git repository using Python

I have a local git repository, and I am trying to find a way to get a specific version of an xlsx file into my Python code so that I can process it with pandas.

I found the GitPython library, but I am not sure how to use it correctly.

from git import Repo

repo = Repo(path_to_repo)
commit = repo.commit(sha)
targetfile = commit.tree / 'dataset.xlsx'

I don't know what to do next. I tried loading it into pandas using the file path, but of course that just loads my latest version.

How can I load a previous version of the xlsx file into pandas?

Best answer

When you ask for commit.tree / 'dataset.xlsx', you get back a git.Blob object:

>>> targetfile
<git.Blob "3137d9443f54325b8ad8a263b13053fee47fbff2">

If you want to read the contents of the object, you can use its data_stream attribute, which gives you a file-like object:

>>> data = targetfile.data_stream.read()

Alternatively, you can use the stream_data method (don't look at me, I didn't name them), which writes the data into a file-like object:

>>> import io
>>> buf = io.BytesIO()
>>> targetfile.stream_data(buf)
<git.Blob "3137d9443f54325b8ad8a263b13053fee47fbff2">
>>> buf.getvalue()
b'The contents of the file...'

Source: a similar question on Stack Overflow about how to get a specific version of a file from a git repository using Python: https://stackoverflow.com/questions/71869347/
