python - How to use partial text instead of exact text in a selector?

Tags: python python-3.x web-scraping beautifulsoup css-selectors

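I've written a script in Python to scrape movie names and their genres from a torrent site. Since BeautifulSoup doesn't support pseudo-selectors, I found a technique to work around that. The only problem I'm facing now is that the text inside the quotation marks in the script below has to match exactly to get a result. Is there any way to do a partial match, similar to :contains, so that I still parse the genre I'm after even if the query text is only part of the word? [I expect to be able to use Gen, nre: or enr instead of Genre: in the script.]

This is the script: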

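import requests
from bs4 import BeautifulSoup

# Fetch the search page and parse it with the lxml parser
soup = BeautifulSoup(requests.get("https://www.yify-torrent.org/search/1080p/").text, "lxml")
for title in soup.select("div.mv"):
    # The movie name is the text of the first link inside the <h3>
    names = title.select("h3 a")[0].text
    # Keep the text that follows each <b> label whose text is exactly "Genre:"
    genre = ' '.join([item.next_sibling for item in title.select(".mdif li b") if item.text == "Genre:"])
    print(names, genre)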

Result:

Swelter (2014) 1080p Action
Larry Crowne (2011) 1080p Comedy
Terminal Island (1973) 1080p Action
Heart of Midnight (1988) 1080p Drama
The Lift (1983) 1080p Fantasy

Best answer

You can simply use the in operator to check whether a string contains a substring:

genre = ' '.join([item.next_sibling for item in title.select(".mdif li b") if "Genre:" in item.text])

if "Genre:" in item.text works, and so do if "nre:" in item.text, if "Gen" in item.text, and so on.
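To try the substring check without hitting the live site, here is a minimal sketch that runs against an inline HTML snippet. The markup in SAMPLE_HTML and the helper name find_value are assumptions made for illustration: the snippet only imitates the structure that the selectors in the question imply (div.mv, h3 a, .mdif li b), and the built-in html.parser is used instead of lxml so that no extra parser is required.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that imitates the structure the question's selectors expect.
SAMPLE_HTML = """
<div class="mv">
  <h3><a href="#">Swelter (2014) 1080p</a></h3>
  <div class="mdif">
    <ul>
      <li><b>Genre:</b>Action</li>
      <li><b>Quality:</b>1080p</li>
    </ul>
  </div>
</div>
"""


def find_value(block, label_part):
    """Join the text following every <b> whose text contains label_part."""
    return " ".join(
        str(b.next_sibling).strip()
        for b in block.select(".mdif li b")
        # "in" performs a case-sensitive substring test
        if label_part in b.text and b.next_sibling is not None
    )


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
results = {}
for block in soup.select("div.mv"):
    name = block.select_one("h3 a").text
    # Each of these fragments is a substring of the label "Genre:"
    for part in ("Genre:", "Gen", "nre:", "enr"):
        results[part] = find_value(block, part)
        print(name, results[part])
```

Note that in is case-sensitive, so "genre" would not match "Genre:" here; comparing label_part.lower() against b.text.lower() is one way to relax that. A very short fragment may also match more than one label on a real page, in which case the values are simply joined with spaces.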

Regarding "python - How to use partial text instead of exact text in a selector?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/47163072/
