python - How to find all anchor tags inside a div using BeautifulSoup in Python

Tags: python html python-2.7 web-scraping beautifulsoup

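This is what the HTML I'm parsing looks like. It is all inside a table and repeats many times. I only want the href attribute values inside the div that has the attribute class="Special_Div_Name". All of these divs sit inside table rows, and there are many rows.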

<tr>
   <div class="Special_Div_Name">
      <a href="something.mp3">text</a>
   </div>
</tr>

All I want are the href attribute values ending in ".mp3" that are located inside a div with the attribute class="Special_Div_Name".

So far I have come up with this code:

import re

download = soup.find_all('a', href=re.compile('.mp3'))
for link in download:
    hrefText = link['href']
    print(hrefText)

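This code currently prints every href value on the page that ends in ".mp3", which is very close to what I want. I only want the ".mp3" links that are inside that div class.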
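A side note on the pattern in the question: in re.compile('.mp3') the dot is a regex wildcard and nothing anchors the match to the end of the string, so it can match more than URLs ending in ".mp3". BeautifulSoup tests a compiled pattern against the attribute value with search(), so any substring hit counts. The stdlib-only check below compares it with the anchored r'\.mp3$' pattern; the sample strings are made up for illustration.

```python
import re

loose = re.compile('.mp3')       # '.' matches any character, and the hit can be anywhere
strict = re.compile(r'\.mp3$')   # literal dot, anchored at the end of the string

# Hypothetical href values, chosen to show where the two patterns differ
samples = ['song.mp3', 'xmp3-notes.html', 'track.mp3?dl=1', 'clip.MP3']

loose_hits = [s for s in samples if loose.search(s)]
strict_hits = [s for s in samples if strict.search(s)]

print(loose_hits)   # ['song.mp3', 'xmp3-notes.html', 'track.mp3?dl=1']
print(strict_hits)  # ['song.mp3']
```

Anchoring with $ also drops URLs that carry a query string, such as track.mp3?dl=1, and both patterns are case-sensitive; re.IGNORECASE or a pattern like r'\.mp3($|\?)' are possible adjustments if those cases matter.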

Best answer

This small tweak should get you what you want:

import re

special_divs = soup.find_all('div', {'class': 'Special_Div_Name'})
for div in special_divs:
    # Search only inside this div, for hrefs that end in ".mp3"
    download = div.find_all('a', href=re.compile(r'\.mp3$'))
    for link in download:
        hrefText = link['href']
        print(hrefText)
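For a self-contained way to try the answer, the sketch below parses a small made-up table modeled on the question's snippet (each div is wrapped in a td so the markup is valid, and the html.parser backend is an assumption). It runs the answer's nested find_all loop, then expresses the same query as a single CSS selector through soup.select, which needs a bs4 version with CSS selector support (4.7 and later delegate to the soupsieve package).

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample markup: skip.mp3 sits in a different div and notes.txt
# is not an .mp3, so neither should be returned.
html = """
<table>
  <tr><td><div class="Special_Div_Name"><a href="first.mp3">one</a></div></td></tr>
  <tr><td><div class="Other_Div"><a href="skip.mp3">two</a></div></td></tr>
  <tr><td>
    <div class="Special_Div_Name">
      <a href="second.mp3">three</a>
      <a href="notes.txt">four</a>
    </div>
  </td></tr>
</table>
"""

soup = BeautifulSoup(html, 'html.parser')

# Approach 1: the answer's nested find_all (class_ is bs4's keyword for the class attribute)
nested = []
for div in soup.find_all('div', class_='Special_Div_Name'):
    for link in div.find_all('a', href=re.compile(r'\.mp3$')):
        nested.append(link['href'])

# Approach 2: descendant combinator plus the "ends with" attribute selector
selected = [a['href'] for a in soup.select('div.Special_Div_Name a[href$=".mp3"]')]

print(nested)    # ['first.mp3', 'second.mp3']
print(selected)  # ['first.mp3', 'second.mp3']
```

Both lists come out in document order; the selector form simply folds the div/anchor nesting and the "ends with .mp3" test into one expression.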

A similar question about finding all anchor tags inside a div with BeautifulSoup in Python can be found on Stack Overflow: https://stackoverflow.com/questions/35471319/
