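bash - How do I split a text file by the line counts of another set of files?

I want to cut one file into several files according to the line counts of a list of files: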

$ wc -l all.txt
    8500   all.txt

$ wc -l STS.*.txt  
   2000 STS.input.answers-forums.txt
   1500 STS.input.answers-students.txt
   2000 STS.input.belief.txt
   1500 STS.input.headlines.txt
   1500 STS.input.images.txt

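How do I split my all.txt into chunks whose sizes match the number of lines in each STS.*.txt file, and then save each chunk to the corresponding STS.output.*.txt?

I have been doing this manually: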

$ sed '1,2000!d' all.txt > STS.output.answers-forums.txt
$ sed '2001,3500!d' all.txt > STS.output.answers-students.txt
$ sed '3501,5500!d' all.txt > STS.output.belief.txt
$ sed '5501,7000!d' all.txt > STS.output.headlines.txt
$ sed '7001,8500!d' all.txt > STS.output.images.txt

The all.txt input looks like this:

$ head all.txt
2.3059
2.2371
2.1277
2.1261
2.0576
2.0141
2.0206
2.0397
1.9467
1.8518

Sometimes all.txt looks like this:

$ head all.txt
2.3059  92.123
2.2371  1.123
2.1277  0.12452
2.1261123   213
2.0576  100
2.0141  0
2.02062 1
2.03972 34.123
1.9467  9.23
1.8518  9123.1

As for the STS.*.txt files, they are just lines of plain text, for example:

$ head STS.output.answers-forums.txt
The problem likely will mean corrective changes before the shuttle fleet starts flying again.   He said the problem needs to be corrected before the space shuttle fleet is cleared to fly again.
The technology-laced Nasdaq Composite Index .IXIC inched down 1 point, or 0.11 percent, to 1,650.   The broad Standard & Poor's 500 Index .SPX inched up 3 points, or 0.32 percent, to 970.
"It's a huge black eye," said publisher Arthur Ochs Sulzberger Jr., whose family has controlled the paper since 1896.   "It's a huge black eye," Arthur Sulzberger, the newspaper's publisher, said of the scandal.

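Best answer

I wish you had posted sample input for splitting, say, a 10-line input file into output files of 2, 3 and 5 lines instead of 8500 lines, since that would have given us something to test a solution against. Oh well, this might work, but it is of course untested: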

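awk '
# For every file except the last (all.txt), map each cumulative line number to an output file name.
ARGIND < (ARGC-1) { outfile[NR] = gensub(/input/,"output",1,FILENAME); next }
# For all.txt, write each line to the file recorded for that line number.
{ print > outfile[FNR] }
' STS.input.* all.txt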

The above uses GNU awk for ARGIND and gensub().
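If GNU awk is not available, the same idea can probably be expressed with POSIX awk features only. The sketch below is a variant, not the answer's original command: it compares FILENAME with the last command-line argument instead of using ARGIND, and uses sub() on a copy of the name instead of gensub(). The function name split_by_counts is a hypothetical helper introduced here for illustration, and it assumes the file to split is passed as the last argument.

```shell
# split_by_counts is a hypothetical helper name, not part of the original answer.
# Usage: split_by_counts STS.input.*.txt all.txt   (the file to split goes last)
split_by_counts() {
  awk '
    # Every file except the last argument: record the output name
    # for each cumulative line number.
    FILENAME != ARGV[ARGC-1] {
      out = FILENAME
      sub(/input/, "output", out)   # replaces the first "input" in the copy
      outfile[NR] = out
      next
    }
    # Last argument (the file to split): route each line by its own line number.
    { print > (outfile[FNR]) }
  ' "$@"
}
```

Called as `split_by_counts STS.input.*.txt all.txt`, this should write the STS.output.*.txt files next to the inputs. The parentheses around outfile[FNR] avoid a parsing ambiguity that some awk implementations have with unparenthesized redirection targets.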

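It just creates an array that maps every line number across all of the "input" files to the name of the "output" file that the same line number of "all.txt" should be written to.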
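To make that mapping concrete, here is a small sketch on made-up data: three hypothetical input files of 2, 3 and 5 lines and a 10-line all.txt, created in a scratch directory. Instead of writing output files, it prints which output file each line of all.txt would go to. It uses a POSIX awk rewrite of the same array (FILENAME compared with the last argument, sub() on a copy of the name) rather than the GNU awk command above, and the file names are assumptions chosen for the demo.

```shell
# Scratch directory with hypothetical sample files of 2, 3 and 5 lines.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
printf 'text %s\n' 1 2       > STS.input.a.txt
printf 'text %s\n' 1 2 3     > STS.input.b.txt
printf 'text %s\n' 1 2 3 4 5 > STS.input.c.txt
printf '%s\n' 1 2 3 4 5 6 7 8 9 10 > all.txt

# Build the line-number -> file-name map, then print it instead of writing files.
mapping=$(awk '
  FILENAME != ARGV[ARGC-1] {
    out = FILENAME
    sub(/input/, "output", out)
    outfile[NR] = out              # NR keeps counting across the input files
    next
  }
  { print "all.txt line " FNR " -> " outfile[FNR] }
' STS.input.*.txt all.txt)

printf '%s\n' "$mapping"
```

Lines 1-2 should map to STS.output.a.txt, lines 3-5 to STS.output.b.txt, and lines 6-10 to STS.output.c.txt, because NR runs from 1 to 10 across the three input files while FNR restarts at 1 for all.txt.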

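Any time you write a loop in shell just to manipulate text, you have the wrong approach. The people who created the shell also created awk for the shell to call to manipulate text, so just do that.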

Source: a similar question on Stack Overflow: https://stackoverflow.com/questions/27553750/
