Python BeautifulSoup does not return tags from XML

Tags: python xml parsing beautifulsoup xml-parsing

Given the following XML, saved as test.xml in my working directory:

<workbook>
    <style>
          <style-rule element='worksheet'>
            <format attr='font-family' value='Tahoma' />
            <format attr='font-size' value='15' />
            <format attr='font-weight' value='bold' />
            <format attr='color' value='#ffbe7d' />
          </style-rule>
    </style>
</workbook>

I am trying to return the style-rule element and, eventually, each format element inside it. I tried the Python code below, and it prints None:

from bs4 import BeautifulSoup
import os

with open(os.getcwd()+'//test.xml') as xmlfile:
    soup = BeautifulSoup(xmlfile, 'html.parser')
    print(soup.style.find('style-rule'))

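Because the element name contains a hyphen, I know I have to use the find method, and I have used this technique successfully on other hyphenated parts of the XML file. This instance, however, fails for a reason I cannot work out.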

Best answer

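The problem is not the hyphen. If you print the inner text of the style tag, you will see that the style-rule markup comes back as a plain string rather than as child tags.

My guess is that a style tag normally holds content that bs4 treats as a string (CSS in an HTML page), whereas here it is being used as a container for child elements.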
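
The guess above can be checked directly. The following is a minimal sketch, assuming the XML from the question is held in a string (the name `xml_text` is introduced here for illustration) and parsed with 'html.parser' as in the question. It shows that the style tag ends up with no child tags and that the style-rule markup survives only as raw text.

```python
from bs4 import BeautifulSoup

# XML from the question, inlined so the snippet is self-contained.
# `xml_text` is a name chosen for this sketch.
xml_text = """<workbook>
    <style>
          <style-rule element='worksheet'>
            <format attr='font-family' value='Tahoma' />
            <format attr='font-size' value='15' />
            <format attr='font-weight' value='bold' />
            <format attr='color' value='#ffbe7d' />
          </style-rule>
    </style>
</workbook>"""

soup = BeautifulSoup(xml_text, 'html.parser')
style = soup.style

# Collect every tag nested inside <style>; the parser did not descend into it.
child_tags = style.find_all(True)
print(child_tags)

# The markup is still present, but only as one raw string.
raw = style.string
print('<style-rule' in raw)
```

Because find only searches tags, it has nothing to match inside style, which is consistent with the None seen in the question.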

Workaround: parse the text of the style tag a second time:

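from bs4 import BeautifulSoup

# Read the XML from the question (test.xml in the working directory).
with open('test.xml') as xmlfile:
    text = xmlfile.read()

# html.parser keeps the content of <style> as raw text, so parse that text again.
soup = BeautifulSoup(text, 'html.parser')
soup = BeautifulSoup(soup.find('style').text, 'html.parser')

# Named fmt to avoid shadowing the built-in format().
for fmt in soup.select('style-rule > format'):
    print(fmt)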
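
Since the file is XML rather than HTML, another option is a parser that gives style no special treatment. The sketch below uses the standard library's `xml.etree.ElementTree`. This is a swapped-in technique, not what the answer above does, and the XML is inlined as `xml_text` (a name chosen here) instead of being read from test.xml.

```python
import xml.etree.ElementTree as ET

# XML from the question, inlined so the snippet is self-contained.
xml_text = """<workbook>
    <style>
          <style-rule element='worksheet'>
            <format attr='font-family' value='Tahoma' />
            <format attr='font-size' value='15' />
            <format attr='font-weight' value='bold' />
            <format attr='color' value='#ffbe7d' />
          </style-rule>
    </style>
</workbook>"""

# The root element is <workbook>.
root = ET.fromstring(xml_text)

# ElementTree paths accept the hyphenated tag name as written.
formats = root.findall('./style/style-rule/format')

# Pull the attr/value pair out of each <format> element.
pairs = [(f.get('attr'), f.get('value')) for f in formats]
for attr, value in pairs:
    print(attr, value)
```

If lxml is installed, passing 'xml' instead of 'html.parser' to BeautifulSoup is a further possibility, since that also parses the document as XML.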

Regarding "Python BeautifulSoup does not return tags from XML", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/58532451/