linux - Union of two columns of a TSV file

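I have a file that stores a directed graph. Each line has the form

node1 TAB node2 TAB weight

I want to find the set of nodes. Is there a better way to get the union? My current solution involves creating temporary files: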

cut -f1 input_graph | sort | uniq > nodes1
cut -f2 input_graph | sort | uniq > nodes2
cat nodes1 nodes2 | sort | uniq > nodes

Best answer

{ cut -f1 input_graph; cut -f2 input_graph; } | sort | uniq

There is no need to sort twice.

The { cmd1; cmd2; } syntax is equivalent to (cmd1; cmd2) but can avoid a subshell.
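As a minimal sketch of the grouped form, the snippet below builds a tiny sample graph and collects the node set in one pass through sort. The sample data and the variable names (workdir, nodes) are made up for illustration; sort -u is used here as a shorthand for sort | uniq.

```shell
# Hypothetical sample graph in a scratch directory: node1 TAB node2 TAB weight
workdir=$(mktemp -d)
printf 'a\tb\t1\nb\tc\t2\na\tc\t3\n' > "$workdir/input_graph"

# The brace group concatenates the output of both cut commands,
# so the combined stream is sorted and de-duplicated only once.
nodes=$({ cut -f1 "$workdir/input_graph"; cut -f2 "$workdir/input_graph"; } | sort -u)
printf '%s\n' "$nodes"

# Clean up the scratch directory
rm -rf "$workdir"
```

Note that the space after { and the semicolon before } are required by the shell grammar.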

In another language (Perl, for example), you could put the first column into a hash and then process the second column sequentially.
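The hash idea can be sketched without leaving the shell by using awk instead of Perl. This is a single-pass variant, not the exact two-step procedure described above: an associative array (named seen here, an arbitrary choice) acts as the set, and each node is printed the first time it shows up in either column. The sample data is hypothetical.

```shell
# Hypothetical sample graph in a scratch directory: node1 TAB node2 TAB weight
workdir=$(mktemp -d)
printf 'a\tb\t1\nb\tc\t2\na\tc\t3\n' > "$workdir/input_graph"

# seen[] counts how often each node has been encountered;
# !seen[x]++ is true only on the first encounter, so each node prints once.
nodes_awk=$(awk -F'\t' '
    !seen[$1]++ { print $1 }
    !seen[$2]++ { print $2 }
' "$workdir/input_graph")
printf '%s\n' "$nodes_awk"

rm -rf "$workdir"
```

The output is in order of first appearance rather than sorted, and no sort step is needed at all, at the cost of holding every distinct node in memory.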

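Using only Bash, you can use the syntax cat <(cmd1) <(cmd2) to avoid temporary files. Bash takes care of creating the temporary file descriptors and setting up the pipes.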
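A small sketch of the process substitution form follows. Because <( ) is a Bash feature and not part of POSIX sh, the command is wrapped in bash -c here so the snippet also works when pasted into another shell; in a Bash script the inner command could be written directly. The sample data and variable names are assumptions for illustration.

```shell
# Hypothetical sample graph in a scratch directory: node1 TAB node2 TAB weight
workdir=$(mktemp -d)
printf 'a\tb\t1\nb\tc\t2\na\tc\t3\n' > "$workdir/input_graph"

# Each <(...) expands to a file name that reads from the command's output,
# so cat sees two "files" without any temporary file being created by hand.
# "$1" inside the bash -c script is the path passed after the _ placeholder.
nodes_ps=$(bash -c 'cat <(cut -f1 "$1") <(cut -f2 "$1") | sort -u' _ "$workdir/input_graph")
printf '%s\n' "$nodes_ps"

rm -rf "$workdir"
```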

In a script (where you may want to avoid requiring bash), if you do end up needing temporary files, use mktemp.
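If temporary files really are needed, one possible shape for the script is sketched below. It assumes a mktemp that supports -d (GNU and BSD versions do, though mktemp itself is not specified by POSIX), and the trap-based cleanup is a common convention, not something the answer prescribes. The sample data and file names under tmpdir are hypothetical.

```shell
# Create a private scratch directory with an unpredictable name
tmpdir=$(mktemp -d) || exit 1
# Remove the directory and everything in it when the script exits
trap 'rm -rf "$tmpdir"' EXIT

# Hypothetical sample graph: node1 TAB node2 TAB weight
printf 'a\tb\t1\nb\tc\t2\na\tc\t3\n' > "$tmpdir/input_graph"

# Same three steps as in the question, but with files kept inside tmpdir
cut -f1 "$tmpdir/input_graph" | sort -u > "$tmpdir/nodes1"
cut -f2 "$tmpdir/input_graph" | sort -u > "$tmpdir/nodes2"
sort -u "$tmpdir/nodes1" "$tmpdir/nodes2" > "$tmpdir/nodes"

nodes_tmp=$(cat "$tmpdir/nodes")
printf '%s\n' "$nodes_tmp"
```

Compared with fixed names such as nodes1 in the current directory, this avoids clobbering existing files and collisions between concurrent runs.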

For more on linux - Union of two columns of a TSV file, see the similar question on Stack Overflow: https://stackoverflow.com/questions/19020255/
