python - Extracting a link URL after a specified element with Python and BeautifulSoup4

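I'm trying to extract a link from a page using Python and the BeautifulSoup library, but I'm stuck. The link is in the sidebar area of the page below, directly beneath the h4 subheading "Original Source":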

http://www.eurekalert.org/pub_releases/2016-06/uonc-euc062016.php

I've managed to isolate the link (mostly), but I'm not sure how to narrow my targeting further to actually extract the URL. Here is my code so far:

import requests
from bs4 import BeautifulSoup

url = "http://www.eurekalert.org/pub_releases/2016-06/uonc-euc062016.php"
data = requests.get(url)
soup = BeautifulSoup(data.text, 'lxml')

# Take the last <a> inside the sidebar widget; find_all is the current name for findAll.
source_url = soup.find('section', class_='widget hidden-print').find('div', class_='widget-content').find_all('a')[-1]

print(source_url)

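Right now I get the full HTML of the last element I isolated, when all I want is the link itself. For what it's worth, this is the only link on the page that I'm trying to get.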

Best answer

The link you're after is the value of the tag's href HTML attribute. source_url is a bs4.element.Tag, which has a get method, for example:

source_url.get('href')
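Since the question asks for the link that follows a specific heading, a minimal sketch of anchoring on the h4 itself may be more robust than taking the last a in the widget. The markup in SAMPLE_HTML is a hypothetical stand-in for the sidebar (the real page may differ), the example.org addresses are placeholders, and the built-in html.parser is used here only so that the sketch runs without lxml or network access.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the sidebar markup; the real page may differ.
SAMPLE_HTML = """
<section class="widget hidden-print">
  <div class="widget-content">
    <h4>Media Contact</h4>
    <p><a href="mailto:press@example.org">press@example.org</a></p>
    <h4>Original Source</h4>
    <p><a href="http://example.org/original-story">example.org/original-story</a></p>
  </div>
</section>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Find the h4 whose text contains the heading we care about.
heading = soup.find("h4", string=lambda s: s is not None and "Original Source" in s)

# find_next returns the first matching tag that appears after the heading.
link = heading.find_next("a") if heading is not None else None

# Tag.get returns None for a missing attribute, whereas link["href"]
# would raise KeyError if the attribute were absent.
source_href = link.get("href") if link is not None else None
missing_attr = link.get("title") if link is not None else None

print(source_href)
```

On the real page, the same two calls (find on the h4, then find_next("a")) could be applied to the soup object built from data.text in the question.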

This question and answer on extracting a link URL after a specified element with Python and BeautifulSoup4 come from a similar question on Stack Overflow: https://stackoverflow.com/questions/37930164/
