linux - Finding the average time/distance between duplicate lines

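I have a file with tens of thousands of lines, many of which are duplicates. I want to find the average time/distance between duplicates, measured in line numbers.

For example (the first column is the line number):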

1 string1
2 string2
3 string2
4 string1
5 string3

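This would give 2: the first pair of duplicates (string1, lines 1 and 4) is 3 lines apart, the second pair (string2, lines 2 and 3) is 1 line apart, and (3 + 1) / 2 = 2 because there are 2 duplicates.

Any ideas on how to solve this?

Edit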

Starting test!
32-bit hash: 0x995D9A6E
32-bit hash: 0xA27B264D
32-bit hash: 0x856ED0A5
32-bit hash: 0x3B83614D
32-bit hash: 0x23D92F43
32-bit hash: 0xA1D0BE63
32-bit hash: 0xB0BF66B6
32-bit hash: 0x968F7074
32-bit hash: 0x76F75FD1
32-bit hash: 0x76A51358

Best answer

You can do this with GNU awk:

$ cat a.txt 
string1
string2
string2
string1
string3

$ cat test.awk
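{
    if($0 in lines) {
        # Seen before: add the gap back to the first occurrence.
        distance += NR - lines[$0];
        ++count;
    }
    else {
        # First occurrence: remember its line number.
        lines[$0] = NR;
    }
}
END {
    # Avoid dividing by zero when the input has no duplicates.
    if (count)
        print distance / count;
}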

$ awk -f test.awk < a.txt 
2

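The script above measures the distance from the first occurrence of a line to each later occurrence. If you instead want the distance between each occurrence and the previous occurrence of the same line, do the following: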

    # ...
    if($0 in lines) {
        distance += NR - lines[$0];
        lines[$0] = NR; # <--- add this
        ++count;
    }
    # ...
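
To make the difference between the two variants concrete, here is a minimal sketch that wraps both in one shell function. The function name `avg_dup_distance` and its `first`/`prev` argument are made up for this illustration and are not part of the original answer. It also prints nothing when no line repeats, which is an added assumption about the desired behavior.

```shell
#!/bin/sh
# avg_dup_distance MODE < input
#   MODE=first : measure each repeat against the first occurrence of the line
#   MODE=prev  : measure each repeat against the previous occurrence
# The function name and MODE values are hypothetical, for illustration only.
avg_dup_distance() {
    awk -v mode="$1" '
    {
        if ($0 in lines) {
            distance += NR - lines[$0]
            ++count
            # In "prev" mode, move the reference point to this occurrence.
            if (mode == "prev")
                lines[$0] = NR
        } else {
            lines[$0] = NR
        }
    }
    END {
        # Print nothing rather than divide by zero when no line repeats.
        if (count)
            print distance / count
    }'
}

# Example: the line "a" appears on lines 1, 3 and 4.
printf 'a\nb\na\na\n' | avg_dup_distance first   # (2 + 3) / 2, prints 2.5
printf 'a\nb\na\na\n' | avg_dup_distance prev    # (2 + 1) / 2, prints 1.5
```

On the sample a.txt both modes print 2, because no string occurs more than twice. The results only diverge once a line appears three or more times.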

Regarding linux - Finding the average time/distance between duplicate lines, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/21659608/
