linux - How to remove duplicate headers from a file (except the first occurrence) in Linux

I have a file like the one shown below.

file1:

No name city country
1  xyz yyyy zzz
No name city country
2 test dddd xxxx
No name city country
3  xyz yyyy zzz

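I want to remove the duplicate header lines from this file, keeping only the first occurrence, and save the result in the same file.

I tried the code below, but it did not help.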

header=$(head -n 1 file1)
(printf "%s\n" "$header";
 grep -vFxe "$header" file1
) > file1
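
A likely reason the attempt fails: the shell opens and truncates file1 for the `> file1` redirection before grep starts reading it, so grep sees an empty file and only the header line survives. Below is a minimal sketch of the same grep approach that writes to a temporary file first. The scratch directory and the variable names `workdir`, `f`, and `tmp` are assumptions made for illustration.

```shell
# Work on a copy of the sample data in a scratch directory.
workdir=$(mktemp -d)
f=$workdir/file1
printf '%s\n' \
  'No name city country' '1  xyz yyyy zzz' \
  'No name city country' '2 test dddd xxxx' \
  'No name city country' '3  xyz yyyy zzz' > "$f"

header=$(head -n 1 "$f")
tmp=$(mktemp)

# Write to a temporary file first; redirecting straight to "$f" would
# truncate it before grep gets a chance to read it.
{
  printf '%s\n' "$header"
  # grep exits with 1 when it selects no lines; treat that as success.
  grep -vFxe "$header" "$f" || [ $? -eq 1 ]
} > "$tmp" && mv "$tmp" "$f"

cat "$f"
```

With the sample data, the file ends up with one header line followed by the three data rows.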

Best answer

This is very simple in Awk: use all the fields of the line together as a unique key,

awk '!unique[$1$2$3$4]++' file > new-file

which produces the output

No name city country
1  xyz yyyy zzz
2 test dddd xxxx
3  xyz yyyy zzz

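A more general version in Awk, which does not hard-code the number of fields, loops over every field in the line (up to NF) to build the key: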

awk '{key=""; for(i=1;i<=NF;i++) key=key$i;}!unique[key]++' file > new-file
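
One caveat with building the key by plain concatenation: the fields are joined with no separator, so two different rows such as `ab c` and `a bc` both produce the key `abc`, and the second would be dropped as a duplicate. The sketch below joins the fields with awk's built-in `SUBSEP` so that field boundaries stay part of the key. The sample rows are invented for illustration.

```shell
# Three rows: the first two differ only in where the field boundary falls,
# the third is a true duplicate of the first.
out=$(printf '%s\n' 'ab c' 'a bc' 'ab c' |
  awk '
    # Build the key with SUBSEP between fields so "ab c" and "a bc" differ.
    { key = ""; for (i = 1; i <= NF; i++) key = key SUBSEP $i }
    # Print a line only the first time its key is seen.
    !unique[key]++
  ')
printf '%s\n' "$out"
```

This prints `ab c` and `a bc`, and drops only the repeated `ab c`.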

(or) the most readable version, from Sundeep's comment below, which uses $0 to represent the whole line:

awk '!unique[$0]++' file
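
Note that the `!unique[...]++` idiom drops every repeated line, not just repeated headers, so two identical data rows would also collapse into one. If only the repeated header should be removed, one possible sketch is to remember line 1 and skip later copies of it. The variable name `hdr` and the sample input with a deliberately duplicated data row are assumptions for illustration.

```shell
# Sample input: the header appears twice, and so does one data row.
out=$(printf '%s\n' \
  'No name city country' '1  xyz yyyy zzz' \
  'No name city country' '1  xyz yyyy zzz' |
  awk '
    # Remember the first line as the header and print it once.
    NR == 1 { hdr = $0; print; next }
    # Print every later line that is not an exact copy of the header.
    $0 != hdr
  ')
printf '%s\n' "$out"
```

Here the second header is removed while both copies of the data row are kept.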

For the OP's follow-up question about saving the file in place:

Recent versions of GNU Awk (since release 4.1.0) have an "inplace" file editing option:

[...] The "inplace" extension, built using the new facility, can be used to simulate the GNU "sed -i" feature. [...]

Example usage:

gawk -i inplace '{key=""; for(i=1;i<=NF;i++) key=key$i;}!unique[key]++' file

To keep a backup:

gawk -i inplace -v INPLACE_SUFFIX=.bak '{key=""; for(i=1;i<=NF;i++) key=key$i;}!unique[key]++' file

(or) if your Awk does not support it, write to a temporary file with standard shell utilities and move it over the original:

tmp=$(mktemp)
awk '{key=""; for(i=1;i<=NF;i++) key=key$i;}!unique[key]++' file > "$tmp" && mv "$tmp" file
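
One thing to keep in mind with the `mv` approach: the original file is replaced by the file that `mktemp` created, so its permissions and any hard links are not carried over. If that matters, a possible variant is to copy the filtered contents back into the original file instead. This is only a sketch. The scratch directory and the names `workdir` and `f` are assumptions for illustration.

```shell
# Work on a copy of the sample data in a scratch directory.
workdir=$(mktemp -d)
f=$workdir/file
printf '%s\n' \
  'No name city country' '1  xyz yyyy zzz' \
  'No name city country' '2 test dddd xxxx' > "$f"

tmp=$(mktemp)
# Filter into the temp file, then overwrite the contents of the original
# in place so the original inode and permissions are kept.
awk '!unique[$0]++' "$f" > "$tmp" && cat "$tmp" > "$f"
rm -f "$tmp"

cat "$f"
```

The trade-off is that `cat "$tmp" > "$f"` is not atomic, so a reader could briefly see a partially written file.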

This question, "linux - How to remove duplicate headers from a file (except the first occurrence) in Linux", comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/45082275/