This is my txt file:
In File Name: C:\Users\naqushab\desktop\files\File 1.m1
Out File Name: C:\Users\naqushab\desktop\files\Output\File 1.m2
In File Size: Low: 22636 High: 0
Total Process time: 1.859000
Out File Size: Low: 77619 High: 0

In File Name: C:\Users\naqushab\desktop\files\File 2.m1
Out File Name: C:\Users\naqushab\desktop\files\Output\File 2.m2
In File Size: Low: 20673 High: 0
Total Process time: 3.094000
Out File Size: Low: 94485 High: 0

In File Name: C:\Users\naqushab\desktop\files\File 3.m1
Out File Name: C:\Users\naqushab\desktop\files\Output\File 3.m2
In File Size: Low: 66859 High: 0
Total Process time: 3.516000
Out File Size: Low: 217268 High: 0
I am trying to parse it into an XML format like this:
<?xml version='1.0' encoding='utf-8'?>
<root>
  <filedata>
    <InFileName>File 1.m1</InFileName>
    <OutFileName>File 1.m2</OutFileName>
    <InFileSize>22636</InFileSize>
    <OutFileSize>77619</OutFileSize>
    <ProcessTime>1.859000</ProcessTime>
  </filedata>
  <filedata>
    <InFileName>File 2.m1</InFileName>
    <OutFileName>File 2.m2</OutFileName>
    <InFileSize>20673</InFileSize>
    <OutFileSize>94485</OutFileSize>
    <ProcessTime>3.094000</ProcessTime>
  </filedata>
  <filedata>
    <InFileName>File 3.m1</InFileName>
    <OutFileName>File 3.m2</OutFileName>
    <InFileSize>66859</InFileSize>
    <OutFileSize>217268</OutFileSize>
    <ProcessTime>3.516000</ProcessTime>
  </filedata>
</root>
Here is the code I tried in order to achieve this (I am using Python 2):
import re
import xml.etree.ElementTree as ET

rex = re.compile(r'''(?P<title>In File Name:
                       |Out File Name:
                       |In File Size: Low:
                       |Total Process time:
                       |Out File Size: Low:
                     )
                     (?P<value>.*)
                     ''', re.VERBOSE)

root = ET.Element('root')
root.text = '\n'    # newline before the celldata element

with open('Performance.txt') as f:
    celldata = ET.SubElement(root, 'filedata')
    celldata.text = '\n'    # newline before the collected element
    celldata.tail = '\n\n'  # empty line after the celldata element
    for line in f:
        # Empty line starts new celldata element (hack style, ugly)
        if line.isspace():
            celldata = ET.SubElement(root, 'filedata')
            celldata.text = '\n'
            celldata.tail = '\n\n'

        # If the line contains the wanted data, process it.
        m = rex.search(line)
        if m:
            # Fix some problems with the title as it will be used
            # as the tag name.
            title = m.group('title')
            title = title.replace('&', '')
            title = title.replace(' ', '')

            e = ET.SubElement(celldata, title.lower())
            e.text = m.group('value')
            e.tail = '\n'

# Display for debugging
ET.dump(root)

# Include the root element to the tree and write the tree
# to the file.
tree = ET.ElementTree(root)
tree.write('Performance.xml', encoding='utf-8', xml_declaration=True)
But I am getting empty values. Is it possible to parse this txt into XML?
Best answer
A correction to your regular expression: it should be
m = re.search('(?P<title>(In File Name)|(Out File Name)|(In File Size: *Low)|(Total Process time)|(Out File Size: *Low)):(?P<value>.*)',line)
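instead of the one you gave. Your pattern is compiled with re.VERBOSE, and in verbose mode whitespace inside the pattern is ignored unless it is escaped or inside a character class. So In File Name: is treated as InFileName:, Out File Name: as OutFileName:, and so on, and none of the alternatives can match a line from your file. The pattern above is used without re.VERBOSE, so its spaces are matched literally.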
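A quick way to check how a pattern behaves under re.VERBOSE, which ignores unescaped whitespace in the pattern, is to run it against one line of the file. The following is only a minimal sketch: the variable names are made up for illustration, and the pattern is cut down to a single label.

```python
import re

line = r"In File Name: C:\Users\naqushab\desktop\files\File 1.m1"

# Verbose mode with plain spaces: the spaces are dropped, so this
# really searches for the text "InFileName:".
verbose_unescaped = re.compile(r'''(?P<title>In File Name:)
                                   (?P<value>.*)''', re.VERBOSE)

# Verbose mode with each space inside a character class: the spaces
# are now part of the pattern.
verbose_escaped = re.compile(r'''(?P<title>In[ ]File[ ]Name):
                                 \s*(?P<value>.*)''', re.VERBOSE)

# No verbose flag: spaces are matched literally.
plain = re.compile(r'(?P<title>In File Name):\s*(?P<value>.*)')

m1 = verbose_unescaped.search(line)
m2 = verbose_escaped.search(line)
m3 = plain.search(line)

print(m1)                 # no match object
print(m2.group('value'))  # the path after the label
print(m3.group('value'))  # same path
```

If you prefer to keep the multi-line layout of the original pattern, writing the spaces as [ ] (or escaping them with a backslash) should be enough to make it match.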
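A suggestion:
You can do this without a regular expression. xml.dom.minidom can be used to prettify your XML string.
I have added inline comments for better understanding!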
Node.toprettyxml([indent=""[, newl=""[, encoding=""]]])
Return a pretty-printed version of the document. indent specifies the indentation string and defaults to a tabulator; newl specifies the string emitted at the end of each line and defaults to \n.
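As a small illustration of the indent and newl parameters described above, here is a sketch that pretty-prints a one-record XML string. The input string is made up from the sample data. It omits encoding on purpose, because with an encoding argument toprettyxml returns bytes rather than text on Python 3.

```python
import xml.dom.minidom as minidom

# A flat, single-line XML string such as ElementTree.tostring() would give.
flat = "<root><filedata><InFileName>File 1.m1</InFileName></filedata></root>"

doc = minidom.parseString(flat)

# indent is repeated once per nesting level; newl ends every line.
pretty = doc.toprettyxml(indent="  ", newl="\n")
print(pretty)
```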
Edit
import itertools as it
[line[0] for line in it.groupby(lines)]
You can use groupby from the itertools package to collapse consecutive duplicate entries in the list lines into one.
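To make the effect of groupby concrete, here is a short sketch on a made-up list. It only merges neighbours that are equal, which is what turns a run of several blank lines into a single separator; equal items that are not adjacent are kept.

```python
import itertools as it

# Hypothetical input: two blank lines in a row, a repeated "b",
# and a second "a" that is not adjacent to the first one.
lines = ["a", "", "", "b", "b", "", "a"]

# groupby yields (key, group) pairs; keeping only the key leaves
# one entry per run of identical neighbours.
deduped = [key for key, _group in it.groupby(lines)]

print(deduped)  # ['a', '', 'b', '', 'a']
```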
So,
import itertools as it
import xml.dom.minidom as minidom
import xml.etree.ElementTree as ET

root = ET.Element('root')
with open('file1.txt') as f:
    lines = f.read().splitlines()

# add the first subelement
celldata = ET.SubElement(root, 'filedata')

# for every line in the input file,
# with consecutive duplicate lines grouped into one
for line, _ in it.groupby(lines):
    # an empty line marks the break between subelements
    if not line:
        # add the next subelement and use it as celldata
        celldata = ET.SubElement(root, 'filedata')
    else:
        # otherwise, split on ':' to get the tag name
        tag = line.split(":")
        # format the tag name
        el = ET.SubElement(celldata, tag[0].replace(" ", ""))
        tag = ' '.join(tag[1:]).strip()

        # get the file name from the file path
        if 'File Name' in line:
            tag = line.split("\\")[-1].strip()
        elif 'File Size' in line:
            # split() with no argument already drops empty strings
            splist = line.split()
            tag = splist[splist.index('Low:') + 1]
            # splist[splist.index('High:') + 1]
        el.text = tag

# prettify the XML
formatedXML = minidom.parseString(
    ET.tostring(root)).toprettyxml(indent=" ", encoding='utf-8').strip()

# display for debugging
print(formatedXML)

# write formatedXML to the file; it is an encoded byte string,
# so the file is opened in binary mode
with open("Performance.xml", "wb") as f:
    f.write(formatedXML)
Output: Performance.xml
<?xml version="1.0" encoding="utf-8"?>
<root>
 <filedata>
  <InFileName>File 1.m1</InFileName>
  <OutFileName>File 1.m2</OutFileName>
  <InFileSize>22636</InFileSize>
  <TotalProcesstime>1.859000</TotalProcesstime>
  <OutFileSize>77619</OutFileSize>
 </filedata>
 <filedata>
  <InFileName>File 2.m1</InFileName>
  <OutFileName>File 2.m2</OutFileName>
  <InFileSize>20673</InFileSize>
  <TotalProcesstime>3.094000</TotalProcesstime>
  <OutFileSize>94485</OutFileSize>
 </filedata>
 <filedata>
  <InFileName>File 3.m1</InFileName>
  <OutFileName>File 3.m2</OutFileName>
  <InFileSize>66859</InFileSize>
  <TotalProcesstime>3.516000</TotalProcesstime>
  <OutFileSize>217268</OutFileSize>
 </filedata>
</root>
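Note that this output names the time element TotalProcesstime and keeps the field order of the text file, whereas the question asks for ProcessTime placed after OutFileSize. If the exact layout from the question matters, one possible variation is sketched below. It is written for Python 3, and the names TAGS, parse_value and build_root are hypothetical helpers, not part of any library. It starts a new record at every In File Name line instead of relying on blank lines.

```python
import xml.etree.ElementTree as ET

# Hypothetical label -> tag mapping, in the element order the question asks for.
TAGS = [
    ("In File Name", "InFileName"),
    ("Out File Name", "OutFileName"),
    ("In File Size", "InFileSize"),
    ("Out File Size", "OutFileSize"),
    ("Total Process time", "ProcessTime"),
]
LABELS = {label for label, _tag in TAGS}


def parse_value(label, rest):
    """Extract the value to store for one 'label: rest' line."""
    rest = rest.strip()
    if label.endswith("File Name"):
        return rest.split("\\")[-1]            # keep only the file name
    if label.endswith("File Size"):
        parts = rest.split()
        return parts[parts.index("Low:") + 1]  # the number after 'Low:'
    return rest


def build_root(lines):
    """Build <root> with one <filedata> per 'In File Name' line."""
    records = []
    for line in lines:
        label, sep, rest = line.partition(":")
        label = label.strip()
        if not sep or label not in LABELS:
            continue                           # blank or unrelated line
        if label == "In File Name":
            records.append({})                 # a new record starts here
        if records:
            records[-1][label] = parse_value(label, rest)

    root = ET.Element("root")
    for record in records:
        filedata = ET.SubElement(root, "filedata")
        for label, tag in TAGS:
            if label in record:
                ET.SubElement(filedata, tag).text = record[label]
    return root


sample = [
    r"In File Name: C:\Users\naqushab\desktop\files\File 1.m1",
    r"Out File Name: C:\Users\naqushab\desktop\files\Output\File 1.m2",
    "In File Size: Low: 22636 High: 0",
    "Total Process time: 1.859000",
    "Out File Size: Low: 77619 High: 0",
    "",
    r"In File Name: C:\Users\naqushab\desktop\files\File 2.m1",
    r"Out File Name: C:\Users\naqushab\desktop\files\Output\File 2.m2",
    "In File Size: Low: 20673 High: 0",
    "Total Process time: 3.094000",
    "Out File Size: Low: 94485 High: 0",
]
root = build_root(sample)
print(ET.tostring(root, encoding="unicode"))
```

The resulting root element can be passed through minidom for pretty-printing in the same way as in the code above.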
Hope this helps!
Regarding "python - How to parse a .txt file into .xml?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/42835956/