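I'm trying to use R to compute some statistics on a git repository, and I want to turn the output of git log --numstat into an R table, but I'm having trouble parsing that output. How can I parse it?
I'm using the following git command to extract the data I need from the log: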
git log --pretty=format:"[%H],%an,%ae,%aD,%aI,%x22%s%x22" --numstat --perl-regexp --no-merges > Commits.txt
This creates output similar to the following
(columns: "Hash", "Name", "Email", "Day", "Date", "DateISO8608", "Subject")
[b5db76a6403a2354e46fa9bbcc314689adb3a75d],AuthorName1,Author1Email@email.com,Fri, 27 Oct 2017 11:38:31 -0700,2017-10-27T11:38:31-07:00,"Some subject line"
31 1 MergedComponents/SourceFolder/sources.cpp
2 0 MergedComponents/SourceFolder/sources.h
[81da4b2cf2531e7a5e771d0dc8e344dd9ad69843],AuthorName2,Author2Email@email.com,Fri, 27 Oct 2017 11:08:13 -0700,2017-10-27T11:08:13-07:00,"Another subject line"
24 0 MergedComponents/SourceFolder/sources.cpp
1 2 MergedComponents/SourceFolder/sources.h
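What I can't figure out is how to create a table with one row per source file, so that the columns look like this:
Columns ("Hash", "Name", "Email", "Day", "Date", "DateISO8608", "Subject", "LinesAdded", "LinesDeleted", "SourceFile")
So far I have the following R code (see below), but I got stuck trying to combine the current source-file line with the commit metadata line that precedes it.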
# data was created via
# git log --pretty=format:"[%H],%an,%ae,%aD,%aI,%x22%s%x22" --numstat --no-merges > Commits.txt
allData <- readLines("Commits.txt");
head(allData);
# find all lines that start with [
commitEntries = grep("^\\[", allData)
# create table for the commit metadata
commitTable = read.table(text = allData[commitEntries], sep = ",")
colnames(commitTable) <- c("Hash","Name","Email","Day","Date","DateISO8608","Subject")
head(commitTable)
#create table for each file
numOfLines <- length(allData)
for (i in commitEntries) {
  lastCommitMetaLine <- allData[i]
  j <- i + 1
  # the numstat lines follow the header until the next header (or end of file)
  while (j <= numOfLines && !(j %in% commitEntries)) {
    print(c(lastCommitMetaLine, allData[j]))
    # how to append lastCommitMetaLine joined to allData[j] to a table row??
    j <- j + 1
  }
}
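Best answer
You can try my gitsum package, which was written for parsing git repository history. It can parse detailed file-specific information: for example, you get a table in which each changed file in a commit corresponds to one row, and you can see how many lines were changed in that file.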
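If you would rather stay in base R, the one-row-per-file table can also be built without a loop. The following is only a minimal sketch, not what gitsum does internally. It assumes that numstat lines are tab-separated (as git writes them), that no author name or subject contains a comma or a double quote, and it uses the two sample commits from the question inline instead of reading Commits.txt. The names isHeader, commitId, isFile, fileTable and result are made up for this example.

```r
# Sample lines in the shape produced by the git log command in the question.
# Assumption: numstat fields are separated by tabs, as git emits them.
allData <- c(
  '[b5db76a6403a2354e46fa9bbcc314689adb3a75d],AuthorName1,Author1Email@email.com,Fri, 27 Oct 2017 11:38:31 -0700,2017-10-27T11:38:31-07:00,"Some subject line"',
  "31\t1\tMergedComponents/SourceFolder/sources.cpp",
  "2\t0\tMergedComponents/SourceFolder/sources.h",
  "",
  '[81da4b2cf2531e7a5e771d0dc8e344dd9ad69843],AuthorName2,Author2Email@email.com,Fri, 27 Oct 2017 11:08:13 -0700,2017-10-27T11:08:13-07:00,"Another subject line"',
  "24\t0\tMergedComponents/SourceFolder/sources.cpp",
  "1\t2\tMergedComponents/SourceFolder/sources.h"
)

# Header lines start with "["; cumsum() then labels every line with the
# index of the most recent header above it.
isHeader <- grepl("^\\[", allData)
commitId <- cumsum(isHeader)

# One row per commit. The comma inside %aD splits it into Day and Date.
commitTable <- read.table(text = allData[isHeader], sep = ",", quote = "\"",
                          comment.char = "", strip.white = TRUE,
                          stringsAsFactors = FALSE)
colnames(commitTable) <- c("Hash", "Name", "Email", "Day", "Date",
                           "DateISO8608", "Subject")
commitTable$Hash <- gsub("^\\[|\\]$", "", commitTable$Hash)

# One row per changed file; binary files report "-" and become NA.
isFile <- !isHeader & nzchar(allData)
fileTable <- read.table(text = allData[isFile], sep = "\t", quote = "",
                        comment.char = "", na.strings = "-",
                        stringsAsFactors = FALSE,
                        col.names = c("LinesAdded", "LinesDeleted", "SourceFile"))

# Repeat each commit row once per file that belongs to it, then bind.
result <- cbind(commitTable[commitId[isFile], ], fileTable)
rownames(result) <- NULL
print(result)
```

With real data, replace the inline vector with allData <- readLines("Commits.txt"). If commas or quotes can appear in names or subjects, a delimiter that cannot occur in those fields would be a safer choice in the --pretty=format string.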
Regarding "r - Parsing Git log numstats into an R table", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/47042101/