Given some HTML code, let's say:
<div class="class1">
<span class="class2">some text</span>
<span class="class3">some text</span>
<span class="class4">some text</span>
</div>
How can I retrieve all the class names? That is: ['class1','class2','class3','class4']
I tried:
soup.find_all(class_=True)
but that retrieves the entire tags, and then I would have to run some regular expressions over the strings.
Best answer
You can treat each Tag instance found as a dictionary when retrieving attributes. Note that the class attribute value will be a list, because class is a special "multi-valued" attribute:
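To see the multi-valued behaviour in isolation, here is a minimal sketch using a made-up tag with two space-separated classes (the `<p>` element and its class names are illustrative, not from the question). It assumes the bundled "html.parser" builder, which splits `class` on whitespace:

```python
from bs4 import BeautifulSoup

# Hypothetical tag carrying two space-separated classes
soup = BeautifulSoup('<p class="intro highlight">hi</p>', "html.parser")
tag = soup.p

# Dictionary-style access: "class" comes back as a list, not a string
print(tag["class"])
# .get() returns None instead of raising KeyError for a missing attribute
print(tag.get("id"))
```

Because the value is already a list, it can be passed straight to `list.extend` without any string splitting or regex.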
classes = []
for element in soup.find_all(class_=True):
    # element["class"] is already a list, so extend rather than append
    classes.extend(element["class"])
Or:
classes = [value
           for element in soup.find_all(class_=True)
           for value in element["class"]]
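As a further alternative (not part of the original answer), the same elements can be selected with a CSS attribute selector. This sketch assumes Beautiful Soup 4.7 or later, where `select()` is backed by the soupsieve package; the HTML snippet is made up for illustration:

```python
from bs4 import BeautifulSoup

# Illustrative markup: one element with two classes, one with none
html = '<div class="a"><span class="b c">x</span><span>y</span></div>'
soup = BeautifulSoup(html, "html.parser")

# "[class]" matches every element that has a class attribute, in document order
classes = [value
           for element in soup.select("[class]")
           for value in element["class"]]
print(classes)
```

The unclassed `<span>` is skipped, just as it would be by `find_all(class_=True)`.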
Demo:
from bs4 import BeautifulSoup
data = """
<div class="class1">
<span class="class2">some text</span>
<span class="class3">some text</span>
<span class="class4">some text</span>
</div>
"""
soup = BeautifulSoup(data, "html.parser")
classes = [value
           for element in soup.find_all(class_=True)
           for value in element["class"]]
print(classes)
# Prints:
# ['class1', 'class2', 'class3', 'class4']
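The result keeps duplicates, so a class used on several elements appears once per element. If only the distinct names are wanted, one option (an addition beyond the original answer) is `dict.fromkeys`, which preserves first-seen order on Python 3.7+. The `shared` class below is a hypothetical addition to the question's markup:

```python
from bs4 import BeautifulSoup

data = """
<div class="class1">
<span class="class2 shared">some text</span>
<span class="class3 shared">some text</span>
</div>
"""
soup = BeautifulSoup(data, "html.parser")

all_classes = [value
               for element in soup.find_all(class_=True)
               for value in element["class"]]
# dict keys are unique and keep insertion order, so this drops repeats
# while keeping each class at its first position
unique_classes = list(dict.fromkeys(all_classes))
print(unique_classes)
```

A plain `set(all_classes)` would also remove duplicates, but without a predictable order.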
For more on Python, Beautiful Soup, and getting all class names, see this similar question on Stack Overflow: https://stackoverflow.com/questions/43751699/