python - Robustly using Git blame to retrieve SHA and line content (Python3)

Tags: python git-blame

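I'm contributing to a package (Python >= 3.5) that uses git blame to retrieve information about files. I'm working on replacing the GitPython dependency with custom code that supports only the small subset of functionality we actually need (and provides the data in the form we actually need).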

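I found that git blame -lts comes closest to what I need, which is retrieving the commit SHA and the line content for each line of a file. It gives me output like this: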

82a3e5021b7131e31fc5b110194a77ebee907955 books/main/docs/index.md  5) Softwareplattform [ILIAS](https://www.ilias.de/), die an zahlreichen

which I have processed with

    import re

    # Raw string, so the backslashes reach the regex engine unchanged.
    line_pattern = re.compile(r'(.*?)\s.*\s*\d\)(\s*.*)')

    for line in cmd.stdout():
        m = line_pattern.match(line)
        if m:
            sha = m.group(1)
            content = m.group(2).strip()

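This works well. However, the maintainer of the package rightly warned that "this could introduce hard-to-debug bugs for very specific groups of users. It would probably need extensive unit testing across multiple operating systems and Git versions."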
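
One concrete way such a bug could surface, as a minimal sketch: the pattern relies on the `<digit>)` marker that precedes the content, and its greedy `.*` runs up to the last such marker on the line. The helper name `parse_lts_line` and the second input line are made up for illustration; only the first input is taken (shortened) from the sample output above.

```python
import re

# Same pattern as in the question, written as a raw string.
line_pattern = re.compile(r'(.*?)\s.*\s*\d\)(\s*.*)')


def parse_lts_line(line):
    """Return (sha, content) for one line of `git blame -lts` output, or None."""
    m = line_pattern.match(line)
    if m is None:
        return None
    return m.group(1), m.group(2).strip()


sha = '82a3e5021b7131e31fc5b110194a77ebee907955'

# The (shortened) sample line from above parses as intended.
ok = sha + ' books/main/docs/index.md  5) Softwareplattform [ILIAS]'
print(parse_lts_line(ok))

# Hypothetical line whose content itself contains "<digit>)": the greedy
# `.*` consumes everything up to the last such marker, so the extracted
# content is cut down to the text after "3)".
tricky = sha + ' books/main/docs/index.md  5) see item 3) here'
print(parse_lts_line(tricky))
```

For the second line the extracted content is just `here`, without any error being raised, which is the kind of silent failure the maintainer's warning points at.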

I took this approach because I found the output of git blame --porcelain somewhat tedious to parse:

30ed8daf1c48e4a7302de23b6ed262ab13122d31 1 1 1
author XY
author-mail <XY>
author-time 1580742131
author-tz +0100
committer XY
committer-mail <XY>
committer-time 1580742131
committer-tz +0100
summary Stub-Outline-Dateien
filename home/docs/README.md
        hero: abcdefghijklmnopqrstuvwxyz
82a3e5021b7131e31fc5b110194a77ebee907955 18 18

82a3e5021b7131e31fc5b110194a77ebee907955 19 19
        ---
82a3e5021b7131e31fc5b110194a77ebee907955 20 20

...

I don't like the housekeeping involved in iterating over a list of strings like this.

My questions are:

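1) Should I rather use the --porcelain output, since it is explicitly intended for machine consumption?

2) Can I expect this format to be robust across Git versions and operating systems? Can I assume that a line starting with a TAB character is the content line, that it is the last line of output for a source line, and that everything after the TAB is the original line content?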
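
The assumption behind question 2 can be written down as a minimal sketch. It assumes that every header line starts with a 40-character hexadecimal SHA followed by line numbers, and that every content line starts with a TAB. `iter_blame_lines` is a hypothetical helper name, and the input is abbreviated from the porcelain output shown above, with the TAB written explicitly.

```python
import re

# Assumption: header lines start with a 40-character hexadecimal SHA
# followed by at least two line numbers.
RE_HEADER = re.compile(r'^([0-9a-f]{40}) \d+ \d+')


def iter_blame_lines(porcelain_lines):
    """Yield (sha, content) pairs from `git blame --porcelain` output lines."""
    sha = None
    for line in porcelain_lines:
        if line.startswith('\t'):
            # Content line: everything after the TAB is the source line.
            yield sha, line[1:]
            continue
        m = RE_HEADER.match(line)
        if m:
            sha = m.group(1)
        # All other lines (author, summary, ...) are metadata, skipped here.


sample = [
    '30ed8daf1c48e4a7302de23b6ed262ab13122d31 1 1 1',
    'author XY',
    'summary Stub-Outline-Dateien',
    'filename home/docs/README.md',
    '\thero: abcdefghijklmnopqrstuvwxyz',
    '82a3e5021b7131e31fc5b110194a77ebee907955 18 18',
    '\t',
    '82a3e5021b7131e31fc5b110194a77ebee907955 19 19',
    '\t---',
]
pairs = list(iter_blame_lines(sample))
for sha, content in pairs:
    print(sha[:8], repr(content))
```

Checking for the TAB before anything else means the content itself is never inspected, so a source line that happens to look like a header or a metadata key cannot confuse the parser.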

Best answer

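I don't know whether this is the best solution, but I went ahead and tried it without waiting for an answer here. I assumed that the answer to both of my questions is "yes".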

The following code can be seen in context here: https://github.com/uliska/mkdocs-git-authors-plugin/blob/6f5822c641452cea3edb82c2bbb9ed63bd254d2e/mkdocs_git_authors_plugin/repo.py#L466-L565

    def _process_git_blame(self):
        """
        Execute git blame and parse the results.

        This retrieves all data we need, also for the Commit object.
        Each line will be associated with a Commit object and counted
        to its author's "account".
        Whether empty lines are counted is determined by the
        count_empty_lines configuration option.

        git blame --porcelain will produce output like the following
        for each line in a file:

        When a commit is first seen in that file:
            30ed8daf1c48e4a7302de23b6ed262ab13122d31 1 2 1
            author John Doe
            author-mail <j.doe@example.com>
            author-time 1580742131
            author-tz +0100
            committer John Doe
            committer-mail <j.doe@example.com>
            committer-time 1580742131
            summary Fancy commit message title
            filename home/docs/README.md
                    line content (indicated by TAB. May be empty after that)

        When a commit has already been seen *in that file*:
            82a3e5021b7131e31fc5b110194a77ebee907955 4 5
                    line content

        In this case the metadata is not repeated, but it is guaranteed that
        a Commit object with that SHA has already been created so we don't
        need that information anymore.

        When a line has not been committed yet:
            0000000000000000000000000000000000000000 1 1 1
            author Not Committed Yet
            author-mail <not.committed.yet>
            author-time 1583342617
            author-tz +0100
            committer Not Committed Yet
            committer-mail <not.committed.yet>
            committer-time 1583342617
            committer-tz +0100
            summary Version of books/main/docs/index.md from books/main/docs/index.md
            previous 1f0c3455841488fe0f010e5f56226026b5c5d0b3 books/main/docs/index.md
            filename books/main/docs/index.md
                    uncommitted line content

        In this case exactly one Commit object with the special SHA and fake
        author will be created and counted.

        Args:
            ---
        Returns:
            --- (this method works through side effects)
        """

        re_sha = re.compile(r'^\w{40}')

        cmd = GitCommand('blame', ['--porcelain', str(self._path)])
        cmd.run()

        commit_data = {}
        for line in cmd.stdout():
            key = line.split(' ')[0]
            m = re_sha.match(key)
            if m:
                commit_data = {
                    'sha': key
                }
            elif key in [
                'author',
                'author-mail',
                'author-time',
                'author-tz',
                'summary'
            ]:
                commit_data[key] = line[len(key)+1:]
            elif line.startswith('\t'):
                # assign the line to a commit
                # and create the Commit object if necessary
                commit = self.repo().get_commit(
                    commit_data.get('sha'),
                    # The following values are guaranteed to be present
                    # when a commit is seen for the first time,
                    # so they can be used for creating a Commit object.
                    author_name=commit_data.get('author'),
                    author_email=commit_data.get('author-mail'),
                    author_time=commit_data.get('author-time'),
                    author_tz=commit_data.get('author-tz'),
                    summary=commit_data.get('summary')
                )
                if len(line) > 1 or self.repo().config('count_empty_lines'):
                    author = commit.author()
                    if author not in self._authors:
                        self._authors.append(author)
                    author.add_lines(self, commit)
                    self.add_total_lines()
                    self.repo().add_total_lines()
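
For readers without the plugin's `GitCommand` and `Commit` classes, the same idea can be sketched as standalone functions. This is a simplified illustration under stated assumptions, not the plugin's code: `parse_blame` and `run_blame` are hypothetical names, plain dictionaries stand in for Commit objects, and SHAs are assumed to be 40 hexadecimal characters.

```python
import subprocess

METADATA_KEYS = ('author', 'author-mail', 'author-time', 'author-tz', 'summary')
HEX_DIGITS = set('0123456789abcdef')


def parse_blame(lines):
    """Return (commits, blamed) for `git blame --porcelain` output lines.

    commits maps each SHA to its metadata dict; blamed is a list of
    (sha, content) tuples in file order.
    """
    commits = {}
    blamed = []
    current = None
    for line in lines:
        if line.startswith('\t'):
            # Content line: closes the record for one source line.
            blamed.append((current, line[1:]))
            continue
        key, _, value = line.partition(' ')
        if len(key) == 40 and set(key) <= HEX_DIGITS:
            current = key
            # Metadata only follows the first time a SHA appears,
            # so an existing entry is kept as it is.
            commits.setdefault(current, {})
        elif key in METADATA_KEYS and current is not None:
            commits[current][key] = value
    return commits, blamed


def run_blame(path):
    """Run git blame on path; needs git and must run inside a repository."""
    result = subprocess.run(
        ['git', 'blame', '--porcelain', str(path)],
        stdout=subprocess.PIPE, check=True, universal_newlines=True,
    )
    return parse_blame(result.stdout.split('\n'))


# Usage with canned output instead of a real repository:
sample = [
    '30ed8daf1c48e4a7302de23b6ed262ab13122d31 1 1 1',
    'author XY',
    'author-mail <XY>',
    'summary Stub-Outline-Dateien',
    'filename home/docs/README.md',
    '\thero: abc',
    '30ed8daf1c48e4a7302de23b6ed262ab13122d31 2 2',
    '\t',
]
commits, blamed = parse_blame(sample)
print(commits)
print(blamed)
```

Keeping the parsing separate from the subprocess call makes the parser testable with canned output, which goes some way towards the unit testing across Git versions that the maintainer asked for.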

Regarding "python - Robustly using Git blame to retrieve SHA and line content (Python3)", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/60523415/
