perl - How to trim a file: remove the columns that have the same value in every row

I would like your help trimming a file by removing the columns that hold the same value on every line.

# the file I have (tab-delimited, millions of columns)
jack 1 5 9
john 3 5 0
lisa 4 5 7

# the file I want (remove the columns with the same value in all lines)
jack 1 9
john 3 0
lisa 4 7

Could you give me any pointers on this problem? I would prefer a sed or awk solution, or possibly a perl solution.

Thanks in advance.
Best,

Best answer

Here is a quick perl script that works out which columns can be cut.

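use strict;
use warnings;

open my $fh, '<', 'file' or die "Cannot open file: $!";
chomp(my $first = <$fh>);
my @baseline = split /\t/, $first, -1;  #snag the first row (-1 keeps trailing empty fields)
my @linemap = 0..$#baseline;            #list all equivalent columns (all of them to start)

while (my $row = <$fh>) {               #loop over the rest of the file
    chomp $row;                         #keep the newline out of the last column
    my @line = split /\t/, $row, -1;
    #filter out any column that is missing or differs from the first row
    @linemap = grep { defined $line[$_] && $baseline[$_] eq $line[$_] } @linemap;
}
close $fh;
print join(" ", @linemap), "\n";        #0-based indices of the constant columns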
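
Since the question asked for awk, the same detection step can be sketched in awk as well. This is only a rough equivalent, not part of the original answer: the function name constant_columns is made up for illustration, it assumes tab-delimited input, and it prints 1-based indices (the form cut expects) rather than the 0-based ones the perl script prints.

```shell
# Hypothetical helper: print the 1-based indices of the columns whose value
# is identical on every line of a tab-delimited file, joined by commas.
constant_columns() {
    awk -F'\t' '
        # first row: remember every value and mark every column as a candidate
        NR == 1 { n = NF; for (i = 1; i <= n; i++) { first[i] = $i; same[i] = 1 }; next }
        # later rows: drop a candidate as soon as it differs from the first row
        # (appending "" forces a string comparison, like eq in perl)
        { for (i = 1; i <= n; i++) if (same[i] && ($i "") != (first[i] "")) same[i] = 0 }
        END {
            sep = ""
            for (i = 1; i <= n; i++) if (same[i]) { printf "%s%d", sep, i; sep = "," }
            print ""
        }
    ' "$1"
}
```

For the sample file in the question, constant_columns file should print 3, because only the third column holds the same value (5) on every line.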

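You can use many of the suggestions in the other answers to actually remove the columns. My favorite is probably the cut implementation, partly because the perl script above can be modified to give you the exact command (or even run it for you).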

@linemap = map { $_ + 1 } @linemap;               #cut fields are 1-based
#--complement is a GNU cut option; this assumes at least one constant column was found
print "cut --complement -f " . join(",", @linemap) . " file\n";
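
If GNU cut is not available, or the list of field numbers gets too long to pass comfortably on a command line, the detection and the removal could be combined in one awk call that reads the file twice. The sketch below is an assumption-laden alternative rather than the answer's method: drop_constant_columns is a hypothetical name, and the input is assumed to be tab-delimited.

```shell
# Hypothetical helper: print a tab-delimited file without its constant columns.
# awk is given the file twice: pass 1 finds the columns, pass 2 prints the rest.
drop_constant_columns() {
    awk -F'\t' -v OFS='\t' '
        # pass 1, first row: remember every value, mark every column as constant
        NR == FNR && FNR == 1 { n = NF; for (i = 1; i <= n; i++) { first[i] = $i; same[i] = 1 }; next }
        # pass 1, later rows: unmark any column that differs (string comparison)
        NR == FNR { for (i = 1; i <= n; i++) if (same[i] && ($i "") != (first[i] "")) same[i] = 0; next }
        # pass 2: print only the fields that were not marked as constant
        {
            sep = ""
            for (i = 1; i <= NF; i++)
                if (!(i in same) || !same[i]) { printf "%s%s", sep, $i; sep = OFS }
            print ""
        }
    ' "$1" "$1"
}
```

Running drop_constant_columns file > trimmed would write the trimmed rows to a new file. Note that a file with a single line comes out as one empty line, since every column is trivially constant in that case.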

Regarding "perl - How to trim a file: remove the columns that have the same value in every row", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/6363583/
