Python 3 : Unable to convert XML to dict using xmltodict

I am trying to convert data from an XML file into a Python dictionary, but I can't get it to work. Here is the code I have written.

import xmltodict
input_xml  = 'data.xml'  # This is the source file

with open(input_xml, encoding='utf-8', errors='ignore') as _file:
    data = _file.read()
    data = xmltodict.parse(data,'ASCII')
    print(data)
    exit()

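When I run this code, I get the following error:
xml.parsers.expat.ExpatError: not well-formed (invalid token): line 239, column 40
After a lot of trial and error, I realized that my XML contains some Hindi characters inside a particular tag, as shown below: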

<DECL>!! आप की सेवा में पुनः पधारे !!</DECL>
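Rather than finding the offending line by trial and error, the line and column reported in the ExpatError can be used to print the text the parser choked on. The following is a minimal sketch using the standard library's expat module directly (the same parser that raised the error above). The helper name locate_parse_error and the sample document are hypothetical, and the sample triggers the error with an unescaped & rather than with the asker's actual data.

```python
from xml.parsers import expat


def locate_parse_error(xml_text):
    """Return (lineno, column, offending_line) for the first parse error,
    or None if the document is well-formed. Hypothetical helper."""
    parser = expat.ParserCreate()
    try:
        parser.Parse(xml_text, True)
    except expat.ExpatError as err:
        lines = xml_text.split("\n")
        # err.lineno is 1-based; err.offset is the 0-based column.
        bad_line = lines[err.lineno - 1] if err.lineno <= len(lines) else ""
        return err.lineno, err.offset, bad_line
    return None


# Illustrative document: the bare '&' on the second line is not valid XML.
sample = "<ROOT>\n<DECL>tea & coffee</DECL>\n</ROOT>"
location = locate_parse_error(sample)
print(location)
```

Printing the returned line makes it easier to see which characters sit at the reported position.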

How can I ignore these unencoded characters before running xmltodict.parse?

Best answer

My guess is that the problem is related to the encoding of the file you are reading. Why are you trying to parse it as 'ASCII'?

If you read the same XML from a Python string without specifying ASCII, it should work fine:

import xmltodict
xml = """<DECL>!! आप की सेवा में पुनः पधारे !!</DECL>"""
xmltodict.parse(xml, process_namespaces=True)

Result:

OrderedDict([('DECL', '!! आप की सेवा में पुनः पधारे !!')])

Using a file that contains that single input line, I can parse it correctly:

import xmltodict
input_xml  = 'tmp.txt'  # This is the source file

with open(input_xml, encoding='utf-8', mode='r') as _file:
    data = _file.read()
    data = xmltodict.parse(data)
    print(data)

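The problem is most likely that you are trying to parse it as 'ASCII': the second positional argument to xmltodict.parse is the encoding, and Hindi text cannot be represented in ASCII. Dropping that argument lets the UTF-8 content through.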
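If the non-ASCII text really does have to be discarded, as the question literally asks, one option is to strip it from the string before parsing. This is only a sketch under that assumption: the helper name strip_non_ascii is hypothetical, and the approach permanently loses the Hindi content, so parsing the file as UTF-8 is usually the better fix.

```python
def strip_non_ascii(text):
    """Drop every character that cannot be encoded as ASCII.
    Hypothetical helper; this discards data irreversibly."""
    return text.encode("ascii", errors="ignore").decode("ascii")


raw = "<DECL>!! आप की सेवा में पुनः पधारे !!</DECL>"
cleaned = strip_non_ascii(raw)
print(cleaned)
```

The cleaned string can then be passed to xmltodict.parse without an encoding argument.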

A similar question about "Python 3 : Unable to convert XML to dict using xmltodict" can be found on Stack Overflow: https://stackoverflow.com/questions/56804129/
