python - 替换 XML 文件中的单词

我需要编写 python 脚本来替换 xml 文件中所有出现的特定单词。我只需要替换标记中包含的禁用词。

这应该被替换:

<some_xml_tag>some text REPLACE_ME some text</some_xml_tag>

这不应该:

<some_xml_tag attr="REPLACE_ME">some text</some_xml_tag>

<REPLACE_ME>some text</REPLACE_ME>

我不是正则表达式专家，但它是否可行？

最佳答案

改用XML 解析器。

示例使用 lxml图书馆。这里我们使用 xpath() 搜索具有所需文本的节点，然后使用 replace() 替换它:

import lxml.etree as ET

ban_word = 'REPLACE_ME'
replacement = 'HELLO'

data = """<root>
    <some_xml_tag>REPLACE_ME</some_xml_tag>
    <some_xml_tag attr="REPLACE_ME">some text</some_xml_tag>
    <REPLACE_ME>some text</REPLACE_ME>
</root>
"""

root = ET.fromstring(data)

for item in root.xpath('//*[. = "%s"]' % ban_word):
    item.text = item.text.replace(ban_word, replacement)

print ET.tostring(root)

打印:

<root>
    <some_xml_tag>HELLO</some_xml_tag>
    <some_xml_tag attr="REPLACE_ME">some text</some_xml_tag>
    <REPLACE_ME>some text</REPLACE_ME>
</root>

注意事项:

比较不区分大小写
xml.etree.ElementTree 不会处理这种特殊方法，因为它仅提供有限的 xpath 支持
正如@tdelaney 在评论中指出的那样，如果您有一个要替换的单词列表，那么简单地遍历所有节点并在必要时替换文本可能是个好主意

关于python - 替换 XML 文件中的单词，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/27869571/

python - 替换 XML 文件中的单词

上一篇：python - 遍历文件中的行时进行循环？

下一篇：python - 用正则表达式插入字符