Python: parse XML, count occurrences of a string, and output to Excel

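Here is my problem.

I have more than 100 XML files. I need to parse them and find strings by tag name (or by regular expression).

Once a string/tag value is found, I need to count how many times it occurs (or find the highest number attached to that string).

Example: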

<content styleCode="Bold">Value 1</content>
<content styleCode="Bold">Value 2</content>
<content styleCode="Bold">Value 3</content>

<content styleCode="Bold">Another Value 1</content>
<content styleCode="Bold">Another Value 2</content>
<content styleCode="Bold">Another Value 3</content>
<content styleCode="Bold">Another Value 4</content>

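So basically I want to parse the XML, find the tags listed above, and output the highest value found for each to an Excel spreadsheet. The spreadsheet already has headers, so only the numbers should be written to the Excel file.

The output in Excel would therefore be:

Value    Another Value
3        4

Each file is written to its own row.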

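Best answer

I'm not sure how your XML files are named. For the simple case, assume they follow this pattern:

file1.xml, file2.xml, ... and are stored in the same folder as the Python script.

Then you can use the following code to do the job: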

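import re
import xml.etree.ElementTree as ElementTree  # cElementTree was removed in Python 3.9

from xlrd import open_workbook
from xlutils.copy import copy


def process():
    # xlrd opens the workbook read-only; xlutils.copy returns a writable copy.
    rb = open_workbook("names.xls")
    wb = copy(rb)
    sheet = wb.get_sheet(0)

    for i in range(1, 100):  # loop from file1.xml to file99.xml
        result = {}
        root = ElementTree.parse('file%d.xml' % i).getroot()
        for child in root:
            # Split text such as "Another Value 4" into label and trailing number.
            match = re.match(r'(.*?)\s*(\d+)\s*$', child.text or '')
            if match is None:
                continue  # no text, or no trailing number
            key, value = match.group(1), int(match.group(2))
            # Compare as integers; as strings, "9" would rank above "10".
            result[key] = max(value, result.get(key, value))

        # Row 0 holds the headers, so file i goes to row i. Dicts keep
        # insertion order (Python 3.7+), so columns follow first appearance.
        for column, value in enumerate(result.values()):
            sheet.write(i, column, value)

    wb.save('names.xls')


if __name__ == '__main__':
    process()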
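
The per-file step can also be pulled out into small functions that are testable without touching Excel. The following is only a minimal sketch: the names `parse_labels`, `highest_values` and `count_labels` are made up for illustration, and it assumes the text always has the form "label number" and that the files declare no default XML namespace (otherwise the tag would have to be given as `{namespace-uri}content`). Unlike looping over the root's direct children, `iter()` also finds `content` elements nested deeper in the tree.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# A label followed by a trailing number, e.g. "Another Value 4".
LABEL_NUMBER = re.compile(r'(.*?)\s*(\d+)\s*$')


def parse_labels(xml_text, tag='content'):
    """Yield (label, number) pairs for every <tag> element at any depth."""
    root = ET.fromstring(xml_text)
    for element in root.iter(tag):
        match = LABEL_NUMBER.match(element.text or '')
        if match:
            yield match.group(1), int(match.group(2))


def highest_values(xml_text, tag='content'):
    """Map each label to the largest number seen for it."""
    highest = {}
    for label, number in parse_labels(xml_text, tag):
        highest[label] = max(number, highest.get(label, number))
    return highest


def count_labels(xml_text, tag='content'):
    """Map each label to the number of elements carrying it."""
    return Counter(label for label, _ in parse_labels(xml_text, tag))


# The sample from the question, wrapped in hypothetical parent elements.
sample = """<root>
  <section>
    <content styleCode="Bold">Value 1</content>
    <content styleCode="Bold">Value 2</content>
    <content styleCode="Bold">Value 3</content>
  </section>
  <content styleCode="Bold">Another Value 1</content>
  <content styleCode="Bold">Another Value 2</content>
  <content styleCode="Bold">Another Value 3</content>
  <content styleCode="Bold">Another Value 4</content>
</root>"""

print(highest_values(sample))
print(count_labels(sample))
```

If the files do not follow the file1.xml pattern, `glob.glob('*.xml')` from the standard library could supply the file names instead. Each file's text would then be read and passed to `highest_values`.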

Source: the corresponding question on Stack Overflow: https://stackoverflow.com/questions/30732696/
