jquery - Equivalent of the contains() selector in BeautifulSoup/Python

Tags: jquery python web-scraping beautifulsoup

With a jQuery selector, you can select the divs whose innerText contains "John" using $("div:contains('John')"), so you can match the second <div> in:

<div>Bill</div>
<div>John</div>
<div>Joe</div>

How can I do this in Python with Beautiful Soup or another Python module?

I just watched a lecture on scraping from PyCon 2010, and the speaker mentioned that you can use CSS selectors with lxml. Do I have to use that, or is there a way to do it with Soup alone?

Background: I am asking for the purpose of parsing scraped web pages.

Best answer

A more concise way using BeautifulSoup:

>>> soup('div', text='John')
[u'John']
>>> import re
>>> soup('div', text=re.compile('Jo'))
[u'John', u'Joe']

soup() is equivalent to soup.findAll(). You can pass a string, a regular expression, or an arbitrary function to select what you need.
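The session above returns strings (`[u'John']`), which is how BeautifulSoup 3 behaves. As a rough sketch of the same three filter styles under BeautifulSoup 4, assuming the `bs4` package is installed and version 4.4.0 or later (where the `text` argument is also available as `string`), something like the following should work. Here the matching tags are returned rather than the strings. The variable names are illustrative only.

```python
import re
from bs4 import BeautifulSoup

html = "<div>Bill</div><div>John</div><div>Joe</div>"
soup = BeautifulSoup(html, "html.parser")

# String filter: divs whose .string is exactly "John".
exact = soup.find_all("div", string="John")

# Regular expression filter: divs whose .string contains "Jo".
by_regex = soup.find_all("div", string=re.compile("Jo"))

# Function filter on the tag itself. get_text() includes text from
# descendants, which is the closest match to jQuery's :contains().
by_func = soup.find_all(lambda tag: tag.name == "div" and "John" in tag.get_text())

print(exact)
print(by_regex)
print(by_func)
```

The function filter is the one to reach for when the text may sit inside nested tags, since the `string` filter compares against the tag's `.string`, which is `None` when a tag has more than one child.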

The standard library's ElementTree is sufficient in your case:

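import xml.etree.ElementTree as etree

xml = """
    <div>Bill</div>
    <div>John</div>
    <div>Joe</div>
"""
# Wrap the fragment in a single root element so it parses as well-formed XML.
root = etree.fromstring("<root>%s</root>" % xml)
for div in root.iter('div'):  # iter() replaces getiterator(), removed in Python 3.9
    # div.text is None for an empty element, so guard before the substring test.
    if div.text and "John" in div.text:
        print(etree.tostring(div, encoding="unicode"))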
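One caveat with the loop above: `div.text` only holds the text before an element's first child, whereas jQuery's `:contains()` also looks at text in descendants. A minimal sketch of a closer equivalent, using only the standard library, might join `itertext()` instead. The helper name `elements_containing` and the sample markup with a nested `<b>` are assumptions made for illustration, not part of the original answer.

```python
import xml.etree.ElementTree as ET


def elements_containing(root, tag, needle):
    """Return elements named `tag` whose text, descendants included, contains `needle`."""
    return [el for el in root.iter(tag) if needle in "".join(el.itertext())]


# Hypothetical sample: the second div keeps "John" inside a nested <b>.
root = ET.fromstring(
    "<root><div>Bill</div><div><b>John</b> Smith</div><div>Joe</div></root>"
)

matches = elements_containing(root, "div", "John")
for el in matches:
    print(ET.tostring(el, encoding="unicode"))
```

With this input, a plain `"John" in div.text` test would fail on the second div, because its `text` attribute is `None` (the text lives in the `<b>` child).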

This question, "Equivalent of the contains() selector in BeautifulSoup/Python", comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/8569322/
