python - Save the contents of a URL to a text file

Tags: python python-3.x

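I am trying to save the contents of a URL to a text file. I found several sample scripts online for doing this, and the two below look like they should do what I want, but both fail with this error: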

TypeError: a bytes-like object is required, not 'str'

import html2text
import urllib.request

with urllib.request.urlopen("http://www.msnbc.com") as r:
    html_content = r.read()
rendered_content = html2text.html2text(html_content)
file = open('C:\\Users\\Excel\\Desktop\\URL.txt', 'w')
file.write(rendered_content)
file.close()



import sys
if sys.version_info[0] == 3:
    from urllib.request import urlopen
else:
    # Not Python 3 - today, it is most likely to be Python 2
    # But note that this might need an update when Python 4
    # might be around one day
    from urllib import urlopen
# Your code where you can use urlopen
with urlopen("http://www.msnbc.com") as r:
    s = r.read()
rendered_content = html2text.html2text(html_content)
file = open('C:\\Users\\Excel\\Desktop\\URL.txt', 'w')
file.write(rendered_content)
file.close()

I am probably missing something simple here, but I can't figure out what it is.

I am using Python 3.6.

Best answer

You need to call decode('utf-8') on the data you read:

with urlopen("http://www.msnbc.com") as r:
    s = r.read().decode('utf-8')

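r.read() returns a bytes object, which has to be decoded into a str before it is passed to html2text; with the call above, s holds the decoded text. The error comes from Python 3's strict separation of Unicode strings and bytes: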

Python 3's standard string type is Unicode based, and Python 3 adds a dedicated bytes type, but critically, no automatic coercion between bytes and unicode strings is provided. The closest the language gets to implicit coercion are a few text-based APIs that assume a default encoding (usually UTF-8) if no encoding is explicitly stated. Thus, the core interpreter, its I/O libraries, module names, etc. are clear in their distinction between unicode strings and bytes. Python 3's unicode support even extends to the filesystem, so that non-ASCII file names are natively supported.

This string/bytes clarity is often a source of difficulty in transitioning existing code to Python 3, because many third party libraries and applications are themselves ambiguous in this distinction. Once migrated though, most UnicodeErrors can be eliminated.

Source: https://www.python.org/dev/peps/pep-0404/#strings-and-bytes
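
Putting the fix together, here is a minimal sketch of the whole script, not a definitive implementation. The helper names decode_body and save_url_text are hypothetical, as is the choice to prefer the charset announced by the server and fall back to UTF-8 with errors="replace" when none is given. html2text is the same third-party package the question uses.

```python
import urllib.request


def decode_body(raw, charset=None, fallback="utf-8"):
    """Turn the bytes returned by r.read() into a str.

    `charset` is the encoding announced by the server, if any. `fallback`
    is an assumption used when the server does not announce one.
    Undecodable bytes are replaced rather than raising UnicodeDecodeError.
    """
    return raw.decode(charset or fallback, errors="replace")


def save_url_text(url, path):
    """Fetch `url`, convert its HTML to plain text, and write it to `path`."""
    import html2text  # third-party package, as used in the question

    with urllib.request.urlopen(url) as r:
        raw = r.read()                             # bytes
        charset = r.headers.get_content_charset()  # e.g. "utf-8", or None
    html_content = decode_body(raw, charset)       # str
    rendered_content = html2text.html2text(html_content)
    # Write with an explicit encoding so non-ASCII characters do not
    # depend on the platform default (often cp1252 on Windows).
    with open(path, "w", encoding="utf-8") as f:
        f.write(rendered_content)
```

With the URL and path from the question, the call would be save_url_text("http://www.msnbc.com", 'C:\\Users\\Excel\\Desktop\\URL.txt'). Opening the file in a with block also closes it even if the write fails.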

This question about saving the contents of a URL to a text file comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/48512988/
