python - Using BeautifulSoup

Tags: python beautifulsoup attributeerror

I am building a text crawler with BeautifulSoup, but when I run the code below I get this error:

Traceback (most recent call last):
  File "D:\Python27\Crawling.py", line 33, in <module>
    text = content.get_text()
AttributeError: 'NoneType' object has no attribute 'get_text'

I would really appreciate it if you could tell me how to fix it.

import codecs
import urllib
from bs4 import BeautifulSoup
import xml.dom.minidom

keyWord = raw_input("Enter the key-word : ")
#Enter my Search KeyWord

address = "http://openapi.naver.com/search?key=8d4b5b7fef7a607863013302754262a3&query=" + keyWord + "&display=5&start=1&target=kin&sort=sim"

search_result = urllib.urlopen(address)
raw_data = search_result.read()
parsed_result = xml.dom.minidom.parseString(raw_data)
links = parsed_result.getElementsByTagName('link')

source_URL = links[3].firstChild.nodeValue
#The number 3 has no meaning, it has 0 to 9 and I just chose 3
page = urllib.urlopen(source_URL).read()

#save as html file
g = open(keyWord + '.html', 'w')
g.write(page)
g.close()

#open html file
g = open(keyWord + '.html', 'r')
bs = BeautifulSoup(g)
g.close()


content = bs.find(id="end_content")
text = content.get_text()

#save as text file
h = codecs.open(keyWord + '.txt', 'w', 'utf-8')
h.write(keyWord + ' ')
h.write(text)
h.close()

print "file created"

Best answer

Taking the answers from @Hooked and @alecxe into account, here is how to do this with requests. Note that I use the keyword handbag for the search query.

import requests as rq
from bs4 import BeautifulSoup as bsoup
from xml.dom.minidom import parseString

# Query the search API; the response is XML
url = "http://openapi.naver.com/search?key=8d4b5b7fef7a607863013302754262a3&query=handbag&display=100&start=1&target=kin&sort=sim"
result = rq.get(url)
parsed_result = parseString(result.content)
links = parsed_result.getElementsByTagName("link")

# Fetch the page behind the <link> element at index 3, as in the question
new_url = links[3].firstChild.nodeValue
new_result = rq.get(new_url).content

# Save the page as an HTML file
g = open("handbag.html", "w")
g.write(new_result)
g.close()

# Reopen the saved file and parse it
g = open("handbag.html", "r")
soup = bsoup(g)
g.close()

# Look the element up by class instead of by id
content = soup.find("div", class_="end_content")
text = content.get_text()

print text.encode("utf-8").strip()

The .encode("utf-8") part is there so the Korean characters print correctly. The result is as follows:

아디다스 그래픽핸드백
거의품절이던데............
어디파는데알수없을가요 ㅜ ㅜ ??!?!?
[Finished in 4.7s]
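
The error in the question comes from find() returning None when nothing matches, so the following .get_text() call fails. The lookup by class works where the lookup by id did not, presumably because the page marks that element with class="end_content" rather than id="end_content". Below is a minimal sketch of guarding against that case. It is written for Python 3 (unlike the Python 2 code above), and SAMPLE_HTML and extract_text are hypothetical names made up for illustration, with the sample markup standing in for the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page: the target <div> carries
# class="end_content", not id="end_content" (an assumption for this sketch).
SAMPLE_HTML = """
<html><body>
  <div class="end_content">Adidas graphic handbag</div>
</body></html>
"""


def extract_text(html, **find_kwargs):
    """Return the stripped text of the first matching element, or None."""
    soup = BeautifulSoup(html, "html.parser")
    node = soup.find(**find_kwargs)
    if node is None:
        # Nothing matched; calling node.get_text() here would raise
        # AttributeError: 'NoneType' object has no attribute 'get_text'
        return None
    return node.get_text().strip()


by_id = extract_text(SAMPLE_HTML, id="end_content")
by_class = extract_text(SAMPLE_HTML, class_="end_content")
print(by_id)
print(by_class)
```

Checking for None before calling get_text() turns the crash into a case the script can report or skip, which helps when the link picked from the search results points to a page with a different layout.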

Let us know if this helps.

Regarding python - Using BeautifulSoup, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/22763236/
