python - How to get the last URL link element with BeautifulSoup

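How can I get the last HTML link from a given page using BeautifulSoup? I am trying to get the links that contain lenta.ru. However, if a page contains several lenta.ru links, every one of them is printed. I only want the last lenta.ru link on each page, which is the pointer link for the translation.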

I get these results

http://lenta.ru/news/2012/09/03/ipsos/ https://uynaa.wordpress.com/2012/09/06/%d0%b0%d1%80%d0%b0%d0%b2%d0%b4%d1%83%d0%b3%d0%b0%d0%b0%d1%80-%d1%81%d0%b0%d1%80%d1%8b%d0%bd-%d0%b1%d1%8d%d0%bb%d1%8d%d0%b3/
http://lenta.ru/news/2012/09/04/endofobama/ https://uynaa.wordpress.com/2012/09/06/%d0%b0%d1%80%d0%b0%d0%b2%d0%b4%d1%83%d0%b3%d0%b0%d0%b0%d1%80-%d1%81%d0%b0%d1%80%d1%8b%d0%bd-%d0%b1%d1%8d%d0%bb%d1%8d%d0%b3/
http://lenta.ru/news/2012/09/04/response/ https://uynaa.wordpress.com/2012/09/06/%d0%b0%d1%80%d0%b0%d0%b2%d0%b4%d1%83%d0%b3%d0%b0%d0%b0%d1%80-%d1%81%d0%b0%d1%80%d1%8b%d0%bd-%d0%b1%d1%8d%d0%bb%d1%8d%d0%b3/
http://www.lenta.ru/articles/2012/09/05/threat/ https://uynaa.wordpress.com/2012/09/06/%d0%b0%d1%80%d0%b0%d0%b2%d0%b4%d1%83%d0%b3%d0%b0%d0%b0%d1%80-%d1%81%d0%b0%d1%80%d1%8b%d0%bd-%d0%b1%d1%8d%d0%bb%d1%8d%d0%b3/
http://lenta.ru/articles/2012/08/21/terranova/ https://uynaa.wordpress.com/2012/08/23/%d1%85%d2%af%d0%bd-%d0%b1%d0%b0-%d0%bc%d3%a9%d1%81/

Expected output

http://www.lenta.ru/articles/2012/09/05/threat/ https://uynaa.wordpress.com/2012/09/06/%d0%b0%d1%80%d0%b0%d0%b2%d0%b4%d1%83%d0%b3%d0%b0%d0%b0%d1%80-%d1%81%d0%b0%d1%80%d1%8b%d0%bd-%d0%b1%d1%8d%d0%bb%d1%8d%d0%b3/
http://lenta.ru/articles/2012/08/21/terranova/ https://uynaa.wordpress.com/2012/08/23/%d1%85%d2%af%d0%bd-%d0%b1%d0%b0-%d0%bc%d3%a9%d1%81/

My code

from bs4 import BeautifulSoup
from urllib.request import urlopen

# Read one page URL per line, dropping trailing newlines and blank lines
with open("./uynaa.txt") as in_file:
    page_urls = [line.strip() for line in in_file if line.strip()]

for page_url in page_urls:
    page_html = urlopen(page_url).read()
    soup = BeautifulSoup(page_html, "lxml")

    # Prints every matching link on the page, not only the last one
    for a in soup.select('div.entry a'):
        if "lenta.ru" in a.get('href', ''):
            print(a['href'], page_url)

uynaa.txt

https://uynaa.wordpress.com/2012/09/06/%d0%b0%d1%80%d0%b0%d0%b2%d0%b4%d1%83%d0%b3%d0%b0%d0%b0%d1%80-%d1%81%d0%b0%d1%80%d1%8b%d0%bd-%d0%b1%d1%8d%d0%bb%d1%8d%d0%b3/
https://uynaa.wordpress.com/2012/08/23/%d1%85%d2%af%d0%bd-%d0%b1%d0%b0-%d0%bc%d3%a9%d1%81/

Best answer

Solution

soup.select('div.entry a')[-1]

Explanation

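soup.select returns a list. You can use [-1] to retrieve the last item in that list. If the page has only one matching link, the last item is also the first item, so this causes no problem. If the list is empty, however, [-1] raises an IndexError, so check that there is at least one match when a page may contain no links.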
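Note that 'div.entry a' matches every link in the entry, so the last item is not necessarily a lenta.ru link. A minimal sketch of one way to handle that, filtering first and then indexing: the sample markup and the helper name last_link_containing are made up for illustration, and "html.parser" is used only to avoid the lxml dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page; the real pages are assumed to wrap post links in div.entry.
SAMPLE_HTML = """
<div class="entry">
  <a href="http://lenta.ru/news/2012/09/03/ipsos/">first</a>
  <a href="http://www.lenta.ru/articles/2012/09/05/threat/">last lenta link</a>
  <a href="https://example.com/other/">unrelated</a>
</div>
"""


def last_link_containing(soup, needle):
    """Return the href of the last div.entry link whose href contains needle, or None."""
    hrefs = [
        a["href"]
        for a in soup.select("div.entry a")
        if needle in a.get("href", "")  # links without href are skipped
    ]
    # Guard against an empty list instead of letting [-1] raise IndexError.
    return hrefs[-1] if hrefs else None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
print(last_link_containing(soup, "lenta.ru"))
```

Returning None for a page without a match lets the caller decide whether to skip the page or report it.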

# full working code

from bs4 import BeautifulSoup
example_page = """
<body>
<a href="http://lenta.ru/news/2012/09/03/ipsos/"></a>
<a href="http://lenta.ru/news/2012/09/04/endofobama/"></a>
<a href="http://lenta.ru/news/2012/09/04/response/"></a>
</body>
"""
soup = BeautifulSoup(example_page, "lxml")

print(soup.body.select("a")[-1])
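To tie this back to the question, here is a rough sketch of the same idea applied once per page, using a CSS attribute selector (a[href*="lenta.ru"]) so that only lenta.ru links are selected before taking the last one. The PAGES dictionary is a hypothetical stand-in for the downloaded HTML, the example.com URLs are placeholders, and last_lenta_href is an illustrative name, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-ins for the downloaded pages, keyed by page URL.
PAGES = {
    "https://example.com/post-1/": """
        <div class="entry">
          <a href="http://lenta.ru/news/a/">a</a>
          <a href="http://lenta.ru/news/b/">b</a>
        </div>""",
    "https://example.com/post-2/": """
        <div class="entry"><a href="https://example.org/">no match</a></div>""",
}


def last_lenta_href(page_html):
    """Return the href of the last lenta.ru link inside div.entry, or None."""
    soup = BeautifulSoup(page_html, "html.parser")
    # [href*="lenta.ru"] keeps only links whose href contains the substring.
    matches = soup.select('div.entry a[href*="lenta.ru"]')
    return matches[-1]["href"] if matches else None


results = {url: last_lenta_href(page) for url, page in PAGES.items()}

for url, href in results.items():
    if href is not None:  # skip pages without any lenta.ru link
        print(href, url)
```

In the real script, the page HTML would come from urlopen(page_url).read() instead of the dictionary.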

Regarding "python - How to get the last URL link element with BeautifulSoup", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/63904967/
