python - Why isn't my link extraction working?

I'm learning Beautiful Soup and trying to extract all the links from the page http://www.popsci.com ... but I'm getting an error.

This code should work, but it fails on every page I try it on. I'm trying to figure out exactly why it doesn't work.

Here is my code:

from BeautifulSoup import BeautifulSoup
import urllib2

url="http://www.popsci.com/"

page=urllib2.urlopen(url)
soup = BeautifulSoup(page.read())

sci=soup.findAll('a')

for eachsci in sci:
    print eachsci['href']+","+eachsci.string

...and this is the error I get:

Traceback (most recent call last):
  File "/root/Desktop/3.py", line 12, in <module>
    print eachsci['href']+","+eachsci.string
TypeError: coercing to Unicode: need string or buffer, NoneType found
[Finished in 1.3s with exit code 1]

Best answer

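When an a element contains no text (for example, a link that wraps only an image), eachsci.string is None, and you cannot concatenate None with a string using the + operator, as you are trying to do.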
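The failure can be reproduced without Beautiful Soup at all. The sketch below is written for Python 3, so the error message differs from the Python 2 traceback in the question, but the cause is the same. The variables href and link_text are hypothetical stand-ins for eachsci['href'] and eachsci.string.

```python
# Hypothetical stand-ins for eachsci['href'] and eachsci.string.
href = "http://www.popsci.com/"
link_text = None  # what .string yields for a link with no text

try:
    # str + None is not defined, so this raises TypeError.
    line = href + "," + link_text
except TypeError as exc:
    line = None
    print("concatenation failed:", exc)

# Falling back to an empty string avoids the error.
safe_line = href + "," + (link_text or "")
print(safe_line)
```

The `or ""` fallback is one possible guard; it simply substitutes an empty string whenever the text is missing.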

If you replace eachsci.string with eachsci.text, the error goes away, because eachsci.text is the empty string '' when the a element is empty, and concatenating that with another string works fine.

However, you will run into another problem when you reach an a element that has no href attribute: when that happens, you get a KeyError.

You can use dict.get() to solve this: it returns a default value if the key is not in the dictionary (an a element behaves like a dictionary, so this works).
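As a small illustration of the difference between indexing and get(), here is a sketch using an ordinary dict in place of a tag. The attrs dictionary is made up for the example and simply mimics the attributes of an a element that has no href.

```python
# A plain dict standing in for the attributes of an <a> tag with no href.
attrs = {"class": "logo"}

try:
    attrs["href"]          # indexing a missing key raises KeyError
    lookup_failed = False
except KeyError:
    lookup_failed = True

# get() returns the supplied default instead of raising.
href = attrs.get("href", "[no href found]")
print(lookup_failed, href)
```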

Putting all of this together, here is a variant of your for loop that works:

for eachsci in sci:
    print eachsci.get('href', '[no href found]') + "," + eachsci.text
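
The code in this thread targets Python 2 and Beautiful Soup 3. For readers on Python 3, a rough equivalent might look like the sketch below. It assumes the beautifulsoup4 package (imported as bs4) is installed, and it parses a small made-up HTML snippet instead of fetching popsci.com. The names SAMPLE_HTML and format_links are hypothetical and not part of the original answer.

```python
from bs4 import BeautifulSoup

# Made-up sample markup: a normal link, an image-only link,
# and an anchor without an href attribute.
SAMPLE_HTML = """
<a href="/science">Science</a>
<a href="/home"><img src="logo.png"></a>
<a name="top">Top</a>
"""

def format_links(html):
    """Return one 'href,text' string per <a> element in html."""
    soup = BeautifulSoup(html, "html.parser")
    lines = []
    for link in soup.find_all("a"):
        # get() supplies a default when the href attribute is missing.
        href = link.get("href", "[no href found]")
        # get_text() returns '' for a link with no text, never None.
        lines.append(href + "," + link.get_text())
    return lines

for line in format_links(SAMPLE_HTML):
    print(line)
```

Parsing an inline string keeps the example independent of the network; to run it against a live page, the HTML would have to be downloaded first, for instance with urllib.request.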

Regarding "python - Why isn't my link extraction working?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/18289295/
