Sample text file:
["abc","123","apple","red","<a href='link1'>zzz</a>"],
["abc","124","orange","blue","<a href='link1'>zzz</a>"],
["abc","125","almond","black","<a href='link1'>zzz</a>"],
["abc","126","mango","pink","<a href='link1'>zzz</a>"]
Expected output:
abc 123 apple red 'link1'>zzz
abc 124 orange blue 'link1'>zzz
abc 125 almond black 'link1'>zzz
abc 126 mango pink 'link1'>zzz
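I want the file without the square brackets and commas, with the fields separated by spaces, and for the last element of each line I only want the link part.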
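As a sketch of the requested transformation (a different technique from the accepted answer below), each row can be parsed as a JSON array. This assumes every line is a valid JSON list of strings, optionally followed by a comma. The helper name `transform_line` is hypothetical.

```python
import json
import re


def transform_line(line):
    """Hypothetical helper: turn one bracketed row into a space-separated line.

    Assumes the row is a JSON array of strings with an optional trailing comma.
    Returns None for blank lines.
    """
    line = line.strip().rstrip(",")  # drop the comma that separates rows
    if not line:
        return None                  # nothing to convert on a blank line
    fields = json.loads(line)        # parse the brackets and double quotes
    # Keep only the text between "<a href=" and "</a>" in the last field
    match = re.search(r"<a href=(.*?)</a>", fields[-1])
    if match:
        fields[-1] = match.group(1)
    return " ".join(fields)


sample = '''["abc","123","apple","red","<a href='link1'>zzz</a>"],
["abc","126","mango","pink","<a href='link1'>zzz</a>"]'''

for raw in sample.splitlines():
    result = transform_line(raw)
    if result is not None:
        print(result)
```

Parsing with `json.loads` avoids hand-written bracket and comma stripping, but it will raise `json.JSONDecodeError` if a row is not valid JSON.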
I tried using lists in Python.
I don't know how to proceed. I guess I went wrong somewhere. Any help would be appreciated. Thanks in advance :)
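import sys
import re

# Read every line of the file named on the command line, stripping whitespace
with open(sys.argv[1], 'r') as f:
    Lines = [Line.strip() for Line in f]
for EachLine in Lines:
    Parts = EachLine.split(",")
    for EachPart in Parts:
        # '[' is a regex metacharacter, so the brackets must be escaped
        EachPart = re.sub(r'\[', '', EachPart)
        EachPart = re.sub(r'\]', '', EachPart)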
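In a regular expression, `[` opens a character class, so an unescaped `[` on its own is not a valid pattern. The sketch below shows the error and two ways to remove the brackets literally; the sample `line` is taken from the question's data.

```python
import re

line = '["abc","123","apple","red","<a href=\'link1\'>zzz</a>"],'

# An unescaped "[" starts a character class that is never closed
try:
    re.compile(r"[")
except re.error as exc:
    print("re.error:", exc)

# Escaped inside a character class, both brackets are matched literally
no_brackets = re.sub(r"[\[\]]", "", line)
print(no_brackets)

# For fixed characters, str.replace avoids regular expressions entirely
same = line.replace("[", "").replace("]", "")
print(same == no_brackets)
```

Note that this only removes the brackets; the commas and double quotes are still in the string.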
Best answer
This can be done with the following script:
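import csv
import re

with open('input.txt', 'r', newline='') as f_input, open('output.txt', 'w') as f_output:
    # Split each line on double quotes; quote handling is switched off
    # because '"' is being used as the delimiter
    csv_input = csv.reader(f_input, delimiter='"', quoting=csv.QUOTE_NONE, quotechar=None)
    for cols in csv_input:
        if cols:  # skip blank lines
            # The values sit at the odd positions, between the double quotes
            cols = cols[1:-1:2]
            # Keep only the text from the first single quote up to the next '<'
            link = re.search(r"('.*?)<", cols[-1])
            if link:
                cols[-1] = link.group(1)
            f_output.write('{}\n'.format(' '.join(cols)))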
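To see why `cols[1:-1:2]` picks out the values, the sketch below uses plain `str.split('"')` in place of `csv.reader`; for these sample rows the two should split the line the same way. Splitting on the double quote leaves the opening bracket, the separating commas and the closing `],` at the even positions.

```python
import re

line = '["abc","123","apple","red","<a href=\'link1\'>zzz</a>"],'

pieces = line.split('"')
print(pieces)
# ['[', 'abc', ',', '123', ',', 'apple', ',', 'red', ',', "<a href='link1'>zzz</a>", '],']

# Start at index 1, stop before the trailing '],' piece, take every second item
values = pieces[1:-1:2]

# Same link extraction as the answer: from the first ' up to the next <
link = re.search(r"('.*?)<", values[-1])
if link:
    values[-1] = link.group(1)
print(" ".join(values))  # abc 123 apple red 'link1'>zzz
```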
This gives you an output.txt containing:
abc 123 apple red 'link1'>zzz
abc 124 orange blue 'link1'>zzz
abc 125 almond black 'link1'>zzz
abc 126 mango pink 'link1'>zzz
Update: a simplified version of this code, which takes its input from a string and prints the output, was run on repl.it and shows the correct output.
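The repl.it version itself is not reproduced here. A hypothetical reconstruction along the same lines wraps the sample data in `io.StringIO` so that `csv.reader` can read it like a file; the names `data` and `format_rows` are illustrative only.

```python
import csv
import io
import re

# Hypothetical stand-in for input.txt: the sample rows held in a string
data = '''["abc","123","apple","red","<a href='link1'>zzz</a>"],
["abc","124","orange","blue","<a href='link1'>zzz</a>"],
["abc","125","almond","black","<a href='link1'>zzz</a>"],
["abc","126","mango","pink","<a href='link1'>zzz</a>"]
'''


def format_rows(text):
    """Return the reformatted lines for the bracketed rows in text."""
    output = []
    # Split on double quotes; quoting is disabled because '"' is the delimiter
    reader = csv.reader(io.StringIO(text), delimiter='"',
                        quoting=csv.QUOTE_NONE, quotechar=None)
    for cols in reader:
        if not cols:  # csv.reader yields an empty list for a blank line
            continue
        cols = cols[1:-1:2]
        link = re.search(r"('.*?)<", cols[-1])
        if link:
            cols[-1] = link.group(1)
        output.append(" ".join(cols))
    return output


for row in format_rows(data):
    print(row)
```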
Update: the script now skips blank lines.
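The blank-line check matters because `csv.reader` returns an empty list for an empty line, and slicing an empty list gives another empty list, so `cols[-1]` would raise `IndexError`. A small sketch with made-up two-column rows:

```python
import csv
import io

# Two short rows separated by an empty line, as can happen in input.txt
text = '["abc","123"],\n\n["abc","124"]\n'
reader = csv.reader(io.StringIO(text), delimiter='"',
                    quoting=csv.QUOTE_NONE, quotechar=None)
rows = list(reader)
print(rows)

# The blank line comes back as [], and its slice is empty too,
# so indexing it with [-1] would fail without the "if cols:" guard
empty = rows[1][1:-1:2]
print(empty)
```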
Source: "python - Formatting a text file in python" on Stack Overflow: https://stackoverflow.com/questions/32473343/