python - BeautifulSoup: iterating over multiple XML tags and extracting a list of strings

Tags: python xml beautifulsoup iterator

# Sample XML file.
xml = """
<1 sno=1>
    <2>
        Some content
    </2>
    <2>
        Some other content
    </2>
    <2>
        Some more contents
    </2>
</1>
<1 sno=2>
<2>
        Some content
    </2>
    <2>
        Some other content
    </2>
    <2>
        Some more contents
    </2>
</1>
<1 sno=3>
<2>
        Some content
    </2>
    <2>
        Some other content
    </2>
    <2>
        Some more contents
    </2>
</1>
"""

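This is a sample XML file; I want to process all of the <1> tags.

First, I need to find all of the <1> tags.
Second, I want to get their contents as a list, with each part of a <2> element as a separate list item. For example, I expect a list like ['<2>', 'Some content', '</2>', ...] rather than ['<2>Some content</2>', ...].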

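from bs4 import BeautifulSoup as BS

xml = BS(xml)
xmlList = []
# A name starting with a digit cannot be used as an attribute (xml.1 is a syntax error), so look the tag up with find()
for line in xml.find('1'):
    xmlList.append(line)
print(xmlList)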

# To grab multiple '1' tags:
from bs4 import BeautifulSoup as BS

xml = BS(xml)
xmlList = []
for line in xml.findall('1'):
    xmlList.append(line)
print(xmlList)

This displays a list like ['<2>Some content</2>', ...], which is not what I want.

If I use find_all() to grab all of the '1' tags, the result is the same. How can I get around this?

Best answer

How about slicing off the end before appending to the list?

for line in xml.findAll('1'):  # note: findAll() (or find_all()), not findall()
    xmlList.append(str(line)[:-4])  # convert the Tag to a string, then slice off the closing '</1>'
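
The slice above only trims the closing tag from each string; it does not split the tags from their text. As a minimal sketch of one way to build the list format the question asks for, the following walks each child element and appends its opening tag, its text, and its closing tag as separate items. The element names item and entry are hypothetical stand-ins for <1> and <2> (a name starting with a digit is not accepted as an element name by HTML or XML parsers), and split_tags is an illustrative helper, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in data: "item" and "entry" replace the placeholder
# tag names <1> and <2> used in the question.
xml = """
<item sno="1">
    <entry>Some content</entry>
    <entry>Some other content</entry>
</item>
<item sno="2">
    <entry>Some more contents</entry>
</item>
"""

soup = BeautifulSoup(xml, "html.parser")


def split_tags(parent):
    """Return [open tag, text, close tag, ...] for every <entry> inside parent."""
    parts = []
    for child in parent.find_all("entry"):
        parts.append("<%s>" % child.name)         # opening tag as its own item
        parts.append(child.get_text(strip=True))  # text with surrounding whitespace removed
        parts.append("</%s>" % child.name)        # closing tag as its own item
    return parts


# One list per <item> element
xml_list = [split_tags(item) for item in soup.find_all("item")]
print(xml_list)
```

For the sample data above, this should print one inner list per item element, for example ['<entry>', 'Some more contents', '</entry>'] for the second one. Attributes on the child tags are dropped in this sketch, since only the tag name is written back out.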

Regarding "python - BeautifulSoup: iterating over multiple XML tags and extracting a list of strings", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/23693110/
