javascript - How do I get a URL from Beautiful Soup?

Tags: javascript python html beautifulsoup html-parsing

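I am new to Python and am trying to write a crawler; I want to use Beautiful Soup to scrape some data from BBC News.

But when I inspect an element with Firebug, I find that the HTML on this page does not contain full URL links.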

<li class="">
<a class="navigation-wide-list__link navigation-arrow--open" data-panel-id="js-navigation-panel-World" href="/news/world">
    <span>World</span>
</a>
</li>

The href = '/news/world' attribute does not show the real URL. What should I do if I want to crawl all the links on this page? Is this because the site uses JavaScript?

Best answer

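The href holds a relative URL, which has nothing to do with JavaScript: the browser resolves it against the address of the current page. You need to do the same, building an absolute URL from the base (current) URL and the relative value in href. The recommended way is urljoin() (urllib.parse.urljoin on Python 3, urlparse.urljoin on Python 2):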

from urllib.parse import urljoin  # on Python 2: from urlparse import urljoin

# url is the address of the page you fetched; href is the value read from the <a> tag
absolute_url = urljoin(url, href)
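
To crawl every link on a page rather than a single one, the same call can be applied to each href that Beautiful Soup finds. The following is only a minimal sketch: BASE_URL, the sample HTML string, and the helper name absolute_links are assumptions made for illustration, and it uses Python 3 with the bs4 package and the built-in "html.parser".

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Assumed page address, for illustration only; use the URL you actually fetched.
BASE_URL = "https://www.bbc.com/news"

# Sample markup modelled on the snippet in the question (not the real page).
HTML = """
<ul>
  <li class="">
    <a class="navigation-wide-list__link" href="/news/world"><span>World</span></a>
  </li>
  <li><a href="https://www.bbc.com/sport">Sport</a></li>
  <li><a name="no-href">Skipped</a></li>
</ul>
"""


def absolute_links(html, base_url):
    """Return every href found in html, resolved against base_url."""
    soup = BeautifulSoup(html, "html.parser")
    # href=True keeps only <a> tags that actually carry an href attribute.
    return [urljoin(base_url, a["href"]) for a in soup.find_all("a", href=True)]


links = absolute_links(HTML, BASE_URL)
print(links)
```

urljoin leaves an href that is already absolute unchanged, so relative and absolute links can go through the same loop.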

For "javascript - How do I get a URL from Beautiful Soup?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/36988287/
