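python - Getting HTML with Pycurl

I have been trying to retrieve an HTML page with pycurl so that I can parse it with str.split and a few for loops to extract the relevant information. I know Pycurl retrieves the HTML, because it prints it to the terminal. However, if I try something like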

html = str(c.perform())

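the variable only contains the string "None".

How can I get the HTML with pycurl, or redirect whatever it sends to the console, so that it can be used as a string as described above?

Many thanks to anyone with any suggestions!

Best answer

perform() returns None, and by default libcurl writes the response body to stdout, which is why str(c.perform()) yields "None". The following sends the request and stores/prints the response body: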

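from io import BytesIO  # Python 3; on Python 2 use: from StringIO import StringIO
import pycurl

url = 'http://www.google.com/'

# PycURL passes the response to a callback as bytes, so collect it in a buffer.
storage = BytesIO()
c = pycurl.Curl()
c.setopt(c.URL, url)
c.setopt(c.WRITEFUNCTION, storage.write)
c.perform()  # returns None; the body ends up in storage
c.close()
# Decoding as UTF-8 is an assumption; use the page's actual charset if it differs.
content = storage.getvalue().decode('utf-8', errors='replace')
print(content)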
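If the goal is simply to get the body back from the call itself, as the question attempts with str(c.perform()), a minimal sketch along the following lines may help. It assumes Python 3 and a PycURL release that provides perform_rb() (to my knowledge, 7.43.0.2 or later). The helper names fetch_html and decode_body, and the UTF-8 default, are illustrative choices and not part of PycURL.

```python
def decode_body(body, charset="utf-8"):
    """Turn raw response bytes into text; undecodable bytes are replaced."""
    return body.decode(charset, errors="replace")


def fetch_html(url, charset="utf-8"):
    """Fetch url and return the response body as a str (hypothetical helper)."""
    import pycurl  # imported here so decode_body works without PycURL installed

    c = pycurl.Curl()
    try:
        c.setopt(c.URL, url)
        # perform_rb() runs the transfer and returns the body as bytes,
        # unlike perform(), which returns None.
        body = c.perform_rb()
    finally:
        c.close()  # release the handle even if the transfer fails
    return decode_body(body, charset)


# Example usage (needs network access):
# html = fetch_html("http://www.google.com/")
# for line in html.split("\n")[:5]:
#     print(line)
```

The result is an ordinary str, so str.split and for loops can be applied to it directly. On older PycURL versions, the WRITEFUNCTION approach remains the way to go.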

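If you also want to store the response headers, use the following. Note that reusing storage here puts the headers in the same buffer as the body; pass the write method of a second buffer to keep them apart: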

c.setopt(c.HEADERFUNCTION, storage.write)
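To keep headers and body apart, one option is to give HEADERFUNCTION its own buffer and parse the result afterwards. The sketch below assumes Python 3; fetch_with_headers and parse_headers are hypothetical helper names, and the parser is deliberately simple (a repeated header name overwrites the earlier value).

```python
from io import BytesIO


def parse_headers(raw):
    """Parse raw header bytes into a dict keyed by lower-cased header name."""
    headers = {}
    # HTTP header lines are conventionally decoded as ISO-8859-1.
    for line in raw.decode("iso-8859-1").splitlines():
        if ":" not in line:
            continue  # skips the status line and the blank separator line
        name, value = line.split(":", 1)
        headers[name.strip().lower()] = value.strip()
    return headers


def fetch_with_headers(url):
    """Return (headers_dict, body_bytes) for url (hypothetical helper)."""
    import pycurl  # imported here so parse_headers works without PycURL installed

    body = BytesIO()
    header_buf = BytesIO()
    c = pycurl.Curl()
    try:
        c.setopt(c.URL, url)
        c.setopt(c.WRITEFUNCTION, body.write)         # body goes here
        c.setopt(c.HEADERFUNCTION, header_buf.write)  # header lines go here
        c.perform()
    finally:
        c.close()
    return parse_headers(header_buf.getvalue()), body.getvalue()


# Example usage (needs network access):
# headers, body = fetch_with_headers("http://www.google.com/")
# print(headers.get("content-type"))
```

The parsed Content-Type value can then be used to pick the charset for decoding the body instead of assuming UTF-8.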

Regarding python - Getting HTML with Pycurl, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/6554386/
