python - Parsing XML with Python: Keeping text within attribute while deleting tag around it

Tags: python xml parsing elementtree

Input:
<p>
<milestone n="14" unit="verse" />
 The name of the third river is
<placeName key="tgn,1130850" authname="tgn,1130850">Hiddekel</placeName>: this is the one which flows in front of Assyria. The fourth
river is the <placeName key="tgn,1123842" authname="tgn,1123842">Euphrates</placeName>. 
</p>

Desired output:

<p>
<milestone n="14" unit="verse" />
 The name of the third river is Hiddekel: this is the one which flows in front of Assyria. The fourth river is the Euphrates. 
</p>

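Hello, I would like to find a way to extract the text from a child element (placeName) and put it back into the larger body of text. I have similar issues elsewhere in the XML file, for example with personal names. I want to be able to extract the names and places without deleting the milestone. Thank you for your help!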

Current code:

for p in chapter.findall('p'):
    i = 1
    for text in p.itertext():
        file.write(body.attrib["n"] + " " + chapter.attrib["n"] + ":" +  str(i) + text)
        i = i + 1

Best answer

You can do this with BeautifulSoup and its unwrap() method:

from bs4 import BeautifulSoup as bs

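snippet = """your html above"""

# The lxml HTML parser lowercases tag names, so <placeName> is found as 'placename'.
soup = bs(snippet, 'lxml')

# unwrap() removes each tag but leaves its contents in place.
for tag in soup.find_all('placename'):
    tag.unwrap()

print(soup)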

Output:

<html><body><p>
<milestone n="14" unit="verse"></milestone>
 The name of the third river is
Hiddekel: this is the one which flows in front of Assyria. The fourth
river is the Euphrates. 
</p>
</body></html>
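
Since the question is tagged elementtree, the same unwrapping can also be sketched with the standard library alone, which avoids the added <html><body> wrapper and keeps the original tag case. This is a minimal sketch, not the answerer's method: the helper name unwrap_children is hypothetical, the sample is shortened from the question's input, and it assumes the unwrapped elements (placeName here) contain only text and no nested elements.

```python
import xml.etree.ElementTree as ET

def unwrap_children(parent, tag):
    """Remove direct children named `tag`, keeping their text and tail.

    Hypothetical helper; assumes the removed elements have no child elements.
    """
    previous = None  # last sibling that was kept
    for child in list(parent):
        if child.tag == tag:
            kept = (child.text or "") + (child.tail or "")
            if previous is None:
                # No kept sibling yet: the text belongs to the parent itself.
                parent.text = (parent.text or "") + kept
            else:
                # Otherwise it continues the tail of the previous kept sibling.
                previous.tail = (previous.tail or "") + kept
            parent.remove(child)
        else:
            previous = child

# Shortened sample based on the question's input.
snippet = """<p>
<milestone n="14" unit="verse" />
 The name of the third river is
<placeName key="tgn,1130850">Hiddekel</placeName>: this is the one which flows in front of Assyria. The fourth
river is the <placeName key="tgn,1123842">Euphrates</placeName>.
</p>"""

root = ET.fromstring(snippet)

# Visit a snapshot of every element so nested occurrences are handled too.
for element in list(root.iter()):
    unwrap_children(element, "placeName")

result = ET.tostring(root, encoding="unicode")
print(result)
```

The milestone element stays in place because only elements whose tag matches are removed. For personal names, the same loop can be run again with that element's tag name.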

Source: a similar question on Stack Overflow: https://stackoverflow.com/questions/59865426/
