linux - 在 Linux 中操作文本文件

我在一个文件夹中有一堆 CSV 文件(操作系统:Ubuntu)。他们都在同一个结构上。超过 2k 列(这就是得到它的方式)。第一列是 ID。

我无法使用 SQL(不管为什么)，所以我想我需要使用 bash 命令，例如 awk、cut、sed 等，我对它们有基本的了解。

我需要做以下事情: 遍历文件(就像文件合并为一个文件):对于每个偶数列，检查它是否有一个等于 0 的不同值 --> 如果是，则删除该列和下一列. 此外，我需要将已删除列的索引打印到新文件中。

例子

file_1:
2231, 0, 5, 0, 9, 0, 9, 3, 3
1322, 0, 5, 0, 1, 0, 9, 2, 5
1233, 5, 5, 0, 3, 0, 9, 4, 6
1543, 2, 5, 0, 4, 0, 9, 6, 1
2341, 0, 5, 0, 7, 0, 9, 0, 2

files_2:
1322, 0, 5, 0, 3, 0, 9, 1, 2
1432, 0, 5, 0, 0, 0, 9, 3, 7
1434, 0, 5, 0, 8, 0, 9, 1, 4
1132, 0, 5, 0, 4, 0, 9, 3, 5
1434, 0, 5, 0, 7, 0, 9, 1, 0

预期结果:

Removed index columns file: 4, 5, 6, 7

    file_1 content:
    2231, 0, 5, 3, 3
    1322, 0, 5, 2, 5
    1233, 5, 5, 4, 6
    1543, 2, 5, 6, 1
    2341, 0, 5, 0, 2

    files_2 content:
    1322, 0, 5, 1, 2
    1432, 0, 5, 3, 7
    1434, 0, 5, 1, 4
    1132, 0, 5, 3, 5
    1434, 0, 5, 1, 0

是否可以使用这些 bash 命令来做到这一点？如果是这样，如何？任何其他解决方案也都不错，但我更喜欢 bash 命令。

最佳答案

您可以使用 awk 跳过这些全为零的列:

awk 'BEGIN { FS=OFS=", " }
NR==1 {
   for (i=2; i<=NF; i+=2)
      a[i]
} FNR==NR {
   for (i=2; i<=NF; i+=2)
      if (i in a && $i>0)
         delete a[i];
   next
} {
   for (i=1; i<=NF; i++)
      if (!(i in a))
         printf "%s%s", $i, (i<NF)? OFS : RS
}' file1 file1

输出:

2231, 0, 5, 9, 9, 3, 3
1322, 0, 5, 1, 9, 2, 5
1233, 5, 5, 3, 9, 4, 6
1543, 2, 5, 4, 9, 6, 1
2341, 0, 5, 7, 9, 0, 2

它使用数组 a 来保留输出中应该跳过的偶数列。

在第 1 遍时:

NR==1   # will run for first row to create an array a with even # of columns as index
FNR==NR # block will run for 1st pass of the file. It will delete entries from array a
        # if current value is greater than zero.
{...}   # in the 2nd pass we iterate each column and print if col is not in array a

更新:

根据下面的评论

awk 'BEGIN{FS=OFS=","}
FNR==NR {
   for (i=1; i<=NF; i++)
      sums[i] += $i;
   ++r;
   next
} {
   for (i=1; i<=NF; i++)
      if (sums[i] > 0 && sums[i+1]>0 && sums[i] != 100*r)
         printf "%s%s", (i>1)?OFS:"", $i;
      print ""
}' file file

关于linux - 在 Linux 中操作文本文件，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/32489330/

linux - 在 Linux 中操作文本文件

上一篇：java - 在 Linux (Ubuntu) 中在 java 7 和 java 8 之间切换的推荐方法是什么？

下一篇：python - E501 行太长(99 > 79 个字符)