python - Is there a way to extract all class names from an HTML file using BeautifulSoup?

Tags: python beautifulsoup

<tr id="section_1asd8aa" class="main">
<td class="header">
  <table cellspacing="0" cellpadding="0">
      <tbody>
        <tr>
            <td style="font-family: arial,sans-serif; font-size: 11px;">DUMMY TEXT<a href="#">browser.</a>
            </td>
          </tr>
      </tbody>
    </table>
</td></tr>

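Above is a sample of the HTML. I want to extract all the class names from the HTML file.
Desired output: '{"c1": "main", "c2": "header"}'

Best answer

You can use find_all to get every node, then iterate over the nodes and check whether each one has a class attribute; if it does, collect that attribute's value: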

from bs4 import BeautifulSoup
soup = BeautifulSoup("""<tr id="section_1asd8aa" class="main">
<td class="header">
  <table cellspacing="0" cellpadding="0">
      <tbody>
        <tr>
            <td style="font-family: arial,sans-serif; font-size: 11px;">DUMMY TEXT<a href="#">browser.</a>
            </td>
          </tr>
      </tbody>
    </table>
</td></tr>""", "html.parser")

To get the list of class names:

lst = [node['class'] for node in soup.find_all() if node.has_attr('class')]
lst
# [['main'], ['header']]
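
If a flat list of distinct class names is more useful than one list per node, the same idea can be sketched as follows. The markup and the name unique_classes here are hypothetical and not part of the original question; the first element deliberately carries two classes to show how multi-valued attributes come back.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup (an assumption, not from the question).
# The outer div has two classes, and "main" appears on two elements.
html = '<div class="main wide"><p class="header">x</p><span class="main">y</span></div>'
soup = BeautifulSoup(html, "html.parser")

# class_=True restricts the search to tags that have a class attribute.
unique_classes = []
for node in soup.find_all(class_=True):
    for name in node["class"]:          # node["class"] is a list of strings
        if name not in unique_classes:  # keep first-seen order, drop repeats
            unique_classes.append(name)

print(unique_classes)
# ['main', 'wide', 'header']
```

A plain list is used instead of a set so that the names stay in document order.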

To convert the list into a dictionary (the keys start at c0 because enumerate counts from 0):

{"c" + str(i): v for i, v in enumerate(lst)}
# {'c0': ['main'], 'c1': ['header']}

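Note that each value is a list, because a single class attribute can hold several class names. If needed, you can join each list into a single string:

{"c" + str(i): " ".join(v) for i, v in enumerate(lst)}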
# {'c0': 'main', 'c1': 'header'}
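
The question asked for keys starting at c1 and for the result as a string. A minimal sketch of one way to get there, assuming JSON is an acceptable string format: pass start=1 to enumerate and serialize with json.dumps. The shortened markup and the names class_lists, result, and as_json are illustrative only.

```python
import json
from bs4 import BeautifulSoup

# Shortened version of the sample markup, for illustration only.
html = '<tr id="section_1asd8aa" class="main"><td class="header">DUMMY TEXT</td></tr>'
soup = BeautifulSoup(html, "html.parser")

# One list of class names per tag that has a class attribute.
class_lists = [node["class"] for node in soup.find_all() if node.has_attr("class")]

# start=1 makes the keys c1, c2, ... instead of c0, c1, ...
result = {f"c{i}": " ".join(names) for i, names in enumerate(class_lists, start=1)}

# Serialize the dictionary to a JSON string.
as_json = json.dumps(result)
print(as_json)
# {"c1": "main", "c2": "header"}
```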

Regarding "python - Is there a way to extract all class names from an HTML file using BeautifulSoup?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/43600813/
