The HTML looks like this:
<div class="AAA">Text of AAA<a href="......AAA/url">Display text of URL A</a></div>
<div class="BBB">Text of BBB<a href="......BBB/url">Display text of URL B</a></div>
<div class="CCC">Text of CCC</div>
<div class="DDD">Text of DDD</div>
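I want to parse the text of every div and also check whether it contains a URL; if it does, the URL should be extracted and shown in the output.
The desired output is: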
Text of AAA
Display text of URL A
......AAA/url
Text of BBB
Display text of URL B
......BBB/url
Text of CCC
Text of DDD
I tried nesting a find_all('a') loop inside the find_all('div') loop, but that messed up my output.
Best answer
from bs4 import BeautifulSoup

html = """
<div class="AAA">Text of AAA<a href="......AAA/url">Display text of URL A</a></div>
<div class="BBB">Text of BBB<a href="......BBB/url">Display text of URL B</a></div>
<div class="CCC">Text of CCC</div>
<div class="DDD">Text of DDD</div>
"""

soup = BeautifulSoup(html, "lxml")
for div in soup.find_all('div'):
    # div.text also includes the text of any nested <a>
    print(div.text)
    # find() returns None when the div contains no <a>
    link = div.find('a')
    if link is not None:
        print(link.text)
        print(link["href"])
Output
Text of AAADisplay text of URL A
Display text of URL A
......AAA/url
Text of BBBDisplay text of URL B
Display text of URL B
......BBB/url
Text of CCC
Text of DDD
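Note that the first line printed for each div above also contains the link text, because div.text gathers the text of all descendants. If the goal is the exact output from the question, one possible approach is to join only the div's direct text nodes. The following is a minimal sketch, not the original answerer's code: the helper name own_text_and_links is made up for illustration, and it uses the built-in "html.parser" instead of "lxml" only so that it runs without an extra dependency.

```python
from bs4 import BeautifulSoup, NavigableString

html = """
<div class="AAA">Text of AAA<a href="......AAA/url">Display text of URL A</a></div>
<div class="BBB">Text of BBB<a href="......BBB/url">Display text of URL B</a></div>
<div class="CCC">Text of CCC</div>
<div class="DDD">Text of DDD</div>
"""


def own_text_and_links(markup):
    """Hypothetical helper: return the output lines for every div in markup."""
    # Assumption: "html.parser" is swapped in for "lxml" to avoid a dependency.
    soup = BeautifulSoup(markup, "html.parser")
    lines = []
    for div in soup.find_all("div"):
        # Keep only the strings that are direct children of the div,
        # so the text of a nested <a> is not repeated here.
        own_text = "".join(
            child for child in div.children if isinstance(child, NavigableString)
        ).strip()
        lines.append(own_text)

        # find() returns None when the div has no <a> inside it.
        link = div.find("a")
        if link is not None:
            lines.append(link.get_text())
            # .get() returns None instead of raising if href is missing.
            lines.append(link.get("href"))
    return lines


print("\n".join(own_text_and_links(html)))
```

Checking link is not None makes the "does this div contain a link" test explicit, and link.get("href") avoids a KeyError for an <a> tag that has no href attribute.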
Regarding python - How to check whether an <a href> element exists inside a <div> element?, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/53904852/