python - Beautiful Soup: check for a tag inside a tag

Tags: python python-3.x screen-scraping beautifulsoup scraper

I'm using Beautiful Soup 4 to scrape a page. There is one piece of text I don't want:

<p class="MsoNormal" style="text-align: center"><b>
                            <span lang="EN-US" style="font-family: Arial; color: blue">
                            <font size="4">1 </font></span>
                            <span lang="AR-SA" dir="RTL" style="font-family: Arial; color: blue">
                            <font size="4">&#1600;</font></span><span lang="EN-US" style="font-family: Arial; color: blue"><font size="4"> 
                            с&#1199;р&#1241; фати&#1211;&#1241;</font></span></b></p>

What makes it unique is that it contains a <b> tag. I've already used find_all() to get all the <p> tags, so now I have a for loop like this:

for el in doc.find_all('p'):
    if el.hasChildTag('b'):  # hypothetical method: bs4 has no hasChildTag
        break

Unfortunately, bs4 has no "hasChildTag" method.
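For comparison, the check can be expressed with the existing Tag.find() method, which returns None when no matching element is found. The sketch below is a minimal illustration, not the asker's code: has_child_tag is a hypothetical helper name, and the second, plain <p> in the sample markup is invented so there is something to keep.

```python
from bs4 import BeautifulSoup

# Trimmed-down version of the question's markup, plus a hypothetical
# plain paragraph that should survive the filter.
html = """
<p class="MsoNormal" style="text-align: center"><b>
  <span lang="EN-US"><font size="4">1 </font></span></b></p>
<p>Keep this paragraph.</p>
"""

soup = BeautifulSoup(html, "html.parser")


def has_child_tag(tag, name, direct_only=False):
    """Hypothetical stand-in for the missing hasChildTag().

    Tag.find() searches all descendants by default; recursive=False
    limits the search to direct children only.
    """
    return tag.find(name, recursive=not direct_only) is not None


# Keep only the <p> elements that contain no <b> at any depth.
wanted = [p for p in soup.find_all("p") if not has_child_tag(p, "b")]

for p in wanted:
    print(p.get_text(strip=True))
```

In the question's markup the <b> is a direct child of the <p>, so direct_only=True would also match it; the default (recursive) search is the closer equivalent of "contains a <b> somewhere".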

Best answer

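It should also be possible to use CSS selectors via select(). The selector "p b" matches every <b> element nested anywhere inside a <p>: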

http://www.crummy.com/software/BeautifulSoup/bs4/doc/#css-selectors

soup.select("p b")
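Since soup.select("p b") returns the <b> elements rather than the paragraphs, one way to get back to the <p> tags is to walk up with find_parent(). This is a minimal sketch under the assumption that the goal is to separate paragraphs containing bold text from the rest; the sample markup and variable names are illustrative, not from the original answer.

```python
from bs4 import BeautifulSoup

# Illustrative markup: one paragraph with a <b>, one without.
html = """
<p class="MsoNormal"><b><span lang="EN-US"><font size="4">1 </font></span></b></p>
<p>Plain paragraph without bold text.</p>
"""

soup = BeautifulSoup(html, "html.parser")

# Descendant combinator: every <b> anywhere inside a <p>.
bold_in_p = soup.select("p b")

# Walk up from each match to its enclosing <p>. Compare by identity,
# because one <p> may contain several <b> elements.
paragraphs_with_b = []
for b in bold_in_p:
    p = b.find_parent("p")
    if not any(p is seen for seen in paragraphs_with_b):
        paragraphs_with_b.append(p)

# The remaining paragraphs are the text the question wants to keep.
others = [p for p in soup.find_all("p")
          if not any(p is seen for seen in paragraphs_with_b)]

for p in others:
    print(p.get_text())
```

To match only a <b> that is a direct child of the <p>, the child combinator "p > b" can be used instead of "p b".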

For more on "python - Beautiful Soup check for tag in tag", see the original question on Stack Overflow: https://stackoverflow.com/questions/14307042/
