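I am trying to scrape this afghanistan page by extracting the cities and area codes from its table. When I try to scrape this american-samoa page, however, findAll() cannot find any <td>, which is correct for that page. How do I catch this exception?
Here is my code: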
from bs4 import BeautifulSoup
import urllib2
import re
url = "http://www.howtocallabroad.com/american-samoa"
html_page = urllib2.urlopen(url)
soup = BeautifulSoup(html_page)
areatable = soup.find('table',{'id':'codes'})
d = {}
def chunks(l, n):
    return [l[i:i+n] for i in range(0, len(l), n)]
li = dict(chunks([i.text for i in areatable.findAll('td')], 2))
if li != []:
    print li
    for key in li:
        print key, ":", li[key]
else:
    print "list is empty"
This is the error I get:
Traceback (most recent call last):
  File "extract_table.py", line 15, in <module>
    li = dict(chunks([i.text for i in areatable.findAll('td')], 2))
AttributeError: 'NoneType' object has no attribute 'findAll'
I also tried this, but it does not work:
def gettdtag(tag):
    return "empty" if areatable.findAll(tag) is None else tag
all_td = gettdtag('td')
print all_td
Best answer
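The error tells you that areatable is None: soup.find() returns None when no matching element exists, so the failure happens before findAll() ever runs. Check for None before using it: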
areatable = soup.find('table',{'id':'codes'})
#areatable = soup.find('table', id='codes') # Also works
if areatable is None:
    print 'Something happened'
    # Exit out
Also, I would use find_all instead of findAll, and get_text() instead of text.
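Putting these suggestions together, a minimal sketch could look like the following. It is not the asker's script: the function name extract_area_codes and the two inline HTML samples are made up for illustration, the markup is parsed from a string rather than fetched from the site, and the built-in "html.parser" backend is assumed. The single-argument print() calls are written so they run on Python 3 as well as Python 2.

```python
from bs4 import BeautifulSoup


def extract_area_codes(html):
    """Return {city: area_code} from the table with id="codes", or {} if absent.

    Hypothetical helper for illustration; not part of the original script.
    """
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id="codes")
    if table is None:
        # find() returns None when nothing matches, so there are no cells to read.
        return {}
    cells = [td.get_text(strip=True) for td in table.find_all("td")]
    # Pair consecutive cells: (city, code), (city, code), ...
    return dict(zip(cells[0::2], cells[1::2]))


# Made-up sample markup, not copied from the real pages.
WITH_TABLE = """
<table id="codes">
  <tr><td>Kabul</td><td>20</td></tr>
  <tr><td>Herat</td><td>40</td></tr>
</table>
"""
WITHOUT_TABLE = "<p>No area codes are used.</p>"

print(extract_area_codes(WITH_TABLE))
print(extract_area_codes(WITHOUT_TABLE))
```

Wrapping the original line in try/except AttributeError would also stop the crash, but the explicit None check makes it clearer which step failed.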
This question, "python - catching an exception in the BeautifulSoup.findAll function", comes from Stack Overflow: https://stackoverflow.com/questions/16954693/