python - Unable to extract the href value from an <a> tag using BS4

I am scraping a web page with BS4 and have the following HTML:

<a style="display:inline; position:relative;" href="

                                      /aems/file/filegetrevision.do?fileEntityId=8120070&cs=LU31NT9us5P9Pvkb1BrtdwaCrEraskiCJcY6E2ucP5s.xyz">
                                Screenshot.docx                      </a>

How can I get the value of href using BS4? I have not been able to retrieve it. Can you help?

Thanks,

Best answer

Isn't this enough?

# soup is the BeautifulSoup object built from the page
for a in soup.find_all('a', href=True):
    print(a['href'])
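The loop above can be tried directly on the markup from the question. The following is a minimal sketch, assuming the `html.parser` backend (the original does not say which parser was used); it prints each value with `repr` so the leading newline and spaces inside the attribute are visible.

```python
from bs4 import BeautifulSoup

# Markup from the question: the href value starts with a newline and spaces.
html = """<a style="display:inline; position:relative;" href="

    /aems/file/filegetrevision.do?fileEntityId=8120070&cs=LU31NT9us5P9Pvkb1BrtdwaCrEraskiCJcY6E2ucP5s.xyz">
    Screenshot.docx </a>"""

# "html.parser" ships with Python; this parser choice is an assumption.
soup = BeautifulSoup(html, "html.parser")

# href=True keeps only <a> tags that actually carry an href attribute.
hrefs = [a["href"] for a in soup.find_all("a", href=True)]

for href in hrefs:
    print(repr(href))          # raw value, whitespace included
    print(repr(href.strip()))  # value with surrounding whitespace removed
```

The attribute is returned exactly as written in the page, so any cleanup such as `strip()` has to be done by the caller.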

If you need to, you can pass attrs to find_all to match on the tag's attributes:

soup.find_all("a", {"style": "display:inline; position:relative;"})

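To strip the whitespace and make the link absolute:

# Python 3: urljoin lives in urllib.parse (the Python 2 module was urlparse)
from urllib.parse import urljoin
# url is the address of the page the link was scraped from
urljoin(url, a['href'].strip())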
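As a small worked example of the strip-and-join step, here is a sketch that uses only the standard library. The page URL `http://example.com/aems/index.do` is hypothetical (the original does not give the real one), and the query string is shortened for readability.

```python
from urllib.parse import urljoin

# Hypothetical page URL; the real one is not given in the question.
page_url = "http://example.com/aems/index.do"

# href as scraped, with the leading newline and spaces (query string shortened).
raw_href = "\n\n      /aems/file/filegetrevision.do?fileEntityId=8120070"

# strip() removes the surrounding whitespace; urljoin then resolves the
# root-relative path against the scheme and host of page_url.
absolute = urljoin(page_url, raw_href.strip())
print(absolute)
# → http://example.com/aems/file/filegetrevision.do?fileEntityId=8120070
```

Because the href starts with `/`, it replaces the whole path of the page URL rather than being appended to it.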

Source: a similar question on Stack Overflow, "python - Unable to extract the href value from an <a> tag using BS4": https://stackoverflow.com/questions/14196538/
