I have a tab-delimited text file in which each record has 10 columns, like this:
p001 64 20141209 meals (attendees) ML ENTER Entertainment xyz Restaurants 6.0 "_e' Restaurants (123) 456-7890 \r\n FORUM \r\n ,Around \r\n\r\n':33 113-2 \r\n\r\n 8440 XYZ09'15 1:11PM \r\n\r\n 1 Burger 6.00 \r\n\r\n SSIONS 6.00 \r\n TOTAL PAID 6 .00 \r\n XXXXXXXXXXX2012 XX/XX \r\n XYZ EXPRESS
6.00 \r\n\r\n\r\n 7,-10( YOU! FOR DINING WITH US! \r\n\r\n 113-2 \r\n\r\nYour r is: 840 \r\n"
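P.S.: The text in the last column is enclosed in double quotes (""). The values in my first column are not unique.
I want to convert this text file to a csv file so that only the data from columns 1, 2, 8, 9 and 10 of each record is picked. In addition, every value should be enclosed in double quotes ("").
For example, the record above should be converted to the following line in the output csv file: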
"p001","64","xyz Restaurants","6.0","_e' Restaurants (123) 456-7890 \r\n FORUM \r\n ,Around \r\n\r\n':33 113-2 \r\n\r\n 8440 XYZ09'15 1:11PM \r\n\r\n 1 Burger 6.00 \r\n\r\n SSIONS 6.00 \r\n TOTAL PAID 6 .00 \r\n XXXXXXXXXXX2012 XX/XX \r\n XYZ EXPRESS
6.00 \r\n\r\n\r\n 7,-10( YOU! FOR DINING WITH US! \r\n\r\n 113-2 \r\n\r\nYour r is: 840 \r\n"
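Best answer
This should work for you. Note that it uses the csv library for both input and output; we simply change the delimiter. The csv module should automatically escape quote characters when you write the file.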
import csv

try:
    # newline='' lets the csv module handle line endings itself,
    # including line breaks inside quoted fields.
    with open('input.tsv', 'r', newline='') as in_f, \
         open('output.csv', 'w', newline='') as out_f:
        reader = csv.reader(in_f, delimiter='\t')
        # Quoting added per comment from @Rob.
        writer = csv.writer(out_f, delimiter=',', quoting=csv.QUOTE_ALL)
        for li in reader:
            try:
                # Columns 1, 2, 3, 8, 9 and 10 (zero-based indices).
                writer.writerow([li[0], li[1], li[2], li[7], li[8], li[9]])
            except IndexError:  # Prevent errors on blank lines.
                pass
except IOError as err:
    print(err)
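The claim that the csv module escapes quote characters on output can be checked in isolation. The following is only a small sketch: the sample row is made up for illustration, and an in-memory buffer stands in for output.csv.

```python
import csv
import io

# Made-up row: the last field contains double quotes, a comma and a line break.
row = ['p001', '64', 'say "hi", then\r\nleave']

buf = io.StringIO()
csv.writer(buf, quoting=csv.QUOTE_ALL).writerow(row)
encoded = buf.getvalue()
print(encoded)

# Reading the text back with csv.reader recovers the original values.
decoded = next(csv.reader(io.StringIO(encoded, newline='')))
print(decoded == row)
```

Embedded double quotes are written doubled (""), and the comma and line break stay inside the quoted field, so a CSV-aware reader sees a single value.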
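I could not work out where the tabs (rather than spaces) belong in the sample data, but testing the script with the following tab-separated input.tsv data: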
1 2 3 4 5 6 7 8 9 10
11 12 13 14 15 16 17 18 19 20
21 22 23 24 25 26 27 28 29 30
produces the following in output.csv:
"1","2","3","8","9","10"
"11","12","13","18","19","20"
"21","22","23","28","29","30"
Update
Note that the code was updated to add quoting=csv.QUOTE_ALL, following the suggestion in Rob's comment. Thanks for the catch!
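One point on column choice: the question asks for columns 1, 2, 8, 9 and 10, while the script above also writes column 3. If only the five requested columns are wanted, something along these lines might do. It is a sketch rather than part of the original answer: the names `KEEP` and `convert` are hypothetical, and in-memory streams stand in for input.tsv and output.csv.

```python
import csv
import io

# Zero-based indices for columns 1, 2, 8, 9 and 10 (hypothetical name).
KEEP = (0, 1, 7, 8, 9)


def convert(in_f, out_f, keep=KEEP):
    """Copy the selected tab-separated columns to a fully quoted CSV stream."""
    reader = csv.reader(in_f, delimiter='\t')
    writer = csv.writer(out_f, quoting=csv.QUOTE_ALL)
    for row in reader:
        if len(row) <= max(keep):
            continue  # Skip blank or short lines instead of raising IndexError.
        writer.writerow([row[i] for i in keep])


# Two ten-column records separated by a blank line.
src = io.StringIO(
    '1\t2\t3\t4\t5\t6\t7\t8\t9\t10\n'
    '\n'
    '11\t12\t13\t14\t15\t16\t17\t18\t19\t20\n'
)
dst = io.StringIO()
convert(src, dst)
result = dst.getvalue()
print(result)
```

With real files, the same function could be called on handles opened with newline='', as the csv documentation recommends.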
Original question on Stack Overflow, "Python - Convert tab delimited file to csv in a specific manner": https://stackoverflow.com/questions/29951191/