python - Parse only the text inside a div class in Python

Tags: python python-2.7 beautifulsoup html-parsing

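What I want to do is read the page source, search for the div with class "gsc_prf_il", and then extract only the text inside that div, ignoring the href link. For example: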

<div class="gsc_prf_il"><a href="/citations?view_op=view_org&hl=en&org=13784427342582529234">McGill University</a></div>

But when I use this code it does not work and only gives me the error: AttributeError: 'NoneType' object has no attribute 'contents'

soup=BeautifulSoup(p.readlines()[0], 'html.parser')
s=soup.find(id='gsc_prf_il')
scholar_info['department']= s.contents

Then I tried this:

scholar_info['department']=[s.find('a')['href'], s.find('a').contents[0]]

That does not work either. What am I doing wrong?

Best answer

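Just find the div and pull out its text. Your call soup.find(id='gsc_prf_il') looks for an element whose id is gsc_prf_il, not a div with that class, so it returns None and accessing .contents raises the AttributeError. Given: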

from bs4 import BeautifulSoup

# Parse the sample markup; naming the parser explicitly avoids the "no parser specified" warning.
soup = BeautifulSoup("""<div class="gsc_prf_il"><a href="/citations?view_op=view_org&hl=en&org=13784427342582529234">McGill University</a></div>""", "html.parser")

So use class_="gsc_prf_il":

print(soup.find("div", class_="gsc_prf_il").text)  # McGill University
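To make the difference between the two lookups concrete, here is a minimal sketch using the sample markup from the question. The variable names (by_id, by_class, department) are only illustrative, and the None guard is one possible way to avoid the AttributeError, not the only one.

```python
from bs4 import BeautifulSoup

html = '<div class="gsc_prf_il"><a href="/citations?view_op=view_org&hl=en&org=13784427342582529234">McGill University</a></div>'
soup = BeautifulSoup(html, "html.parser")

# Looking up by id finds nothing: the div has a class attribute but no id.
by_id = soup.find(id="gsc_prf_il")
print(by_id)  # None

# Looking up by class finds the div.
by_class = soup.find("div", class_="gsc_prf_il")

# Guard against a missing element before reading its text.
department = by_class.get_text(strip=True) if by_class is not None else None
print(department)  # McGill University
```

Checking for None before reading the text keeps the scraper from crashing on pages where the div is absent.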

Or use a CSS selector:

print(soup.select_one("div.gsc_prf_il").text)  # McGill University
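If the page contains several divs with this class, the same selector can collect the text of each one. The sketch below assumes a hypothetical page: the second div and the shortened href are invented for illustration and do not come from the question. It also shows one way to get the link and its text separately, as the second attempt in the question tried to do.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second div and the shortened href are made up for this example.
html = """
<div class="gsc_prf_il"><a href="/citations?view_op=view_org">McGill University</a></div>
<div class="gsc_prf_il">Verified email at example.edu</div>
"""
soup = BeautifulSoup(html, "html.parser")

# select() returns every matching element; get_text(strip=True) drops the tags and surrounding whitespace.
texts = [div.get_text(strip=True) for div in soup.select("div.gsc_prf_il")]
print(texts)  # ['McGill University', 'Verified email at example.edu']

# If the link itself is also needed, select the <a> inside the div and check it exists first.
first_link = soup.select_one("div.gsc_prf_il a")
link_info = None
if first_link is not None:
    link_info = [first_link["href"], first_link.get_text(strip=True)]
print(link_info)  # ['/citations?view_op=view_org', 'McGill University']
```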

Regarding "python - Parse only the text inside a div class in Python", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/39681359/
