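I'm trying to save the contents of a URL to a text file. I found several example scripts online that do this, and the two below look like they should do what I want, but both return this error: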
TypeError: a bytes-like object is required, not 'str'
import html2text
import urllib.request

with urllib.request.urlopen("http://www.msnbc.com") as r:
    html_content = r.read()
rendered_content = html2text.html2text(html_content)
file = open('C:\\Users\\Excel\\Desktop\\URL.txt', 'w')
file.write(rendered_content)
file.close()

import sys
if sys.version_info[0] == 3:
    from urllib.request import urlopen
else:
    # Not Python 3 - today, it is most likely to be Python 2
    # But note that this might need an update when Python 4
    # might be around one day
    from urllib import urlopen

# Your code where you can use urlopen
with urlopen("http://www.msnbc.com") as r:
    s = r.read()
rendered_content = html2text.html2text(html_content)
file = open('C:\\Users\\Excel\\Desktop\\URL.txt', 'w')
file.write(rendered_content)
file.close()
I may be missing something simple here, but I can't figure out what it is.
I'm using Python 3.6.
Best answer
You need to call decode('utf-8') on the bytes you read:
with urlopen("http://www.msnbc.com") as r:
    s = r.read().decode('utf-8')
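The line above hardcodes UTF-8. As a sketch that goes beyond the original answer, the helpers below read the charset the server declares in its Content-Type header and fall back to UTF-8 when none is given. They also open the output file with an explicit encoding='utf-8', because on Windows the default encoding for open() follows the locale (often cp1252), which may not be able to encode every character html2text produces. The names fetch_text and save_url_text and the convert parameter are hypothetical, chosen only for illustration.

```python
import urllib.request
from typing import Callable, Optional


def fetch_text(url: str, fallback: str = "utf-8") -> str:
    """Download url and return the body decoded to str (hypothetical helper)."""
    with urllib.request.urlopen(url) as r:
        raw = r.read()  # read() always returns bytes in Python 3
        # Prefer the charset declared in the Content-Type header;
        # use the assumed fallback when the server declares none.
        charset = r.headers.get_content_charset() or fallback
    return raw.decode(charset)


def save_url_text(url: str, path: str,
                  convert: Optional[Callable[[str], str]] = None) -> None:
    """Fetch url, optionally convert it (e.g. with html2text), save as UTF-8."""
    text = fetch_text(url)
    if convert is not None:
        text = convert(text)
    # An explicit encoding avoids depending on the platform default.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
```

With html2text installed, the script from the question would reduce to something like save_url_text("http://www.msnbc.com", 'C:\\Users\\Excel\\Desktop\\URL.txt', convert=html2text.html2text), since html2text.html2text then receives a str instead of bytes.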
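The variable s holds a sequence of bytes, which must be decoded into a string. The error comes from Python 3's distinction between Unicode strings and bytes: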
Python 3's standard string type is Unicode based, and Python 3 adds a dedicated bytes type, but critically, no automatic coercion between bytes and unicode strings is provided. The closest the language gets to implicit coercion are a few text-based APIs that assume a default encoding (usually UTF-8) if no encoding is explicitly stated. Thus, the core interpreter, its I/O libraries, module names, etc. are clear in their distinction between unicode strings and bytes. Python 3's unicode support even extends to the filesystem, so that non-ASCII file names are natively supported.
This string/bytes clarity is often a source of difficulty in transitioning existing code to Python 3, because many third party libraries and applications are themselves ambiguous in this distinction. Once migrated though, most UnicodeErrors can be eliminated.
Source: https://www.python.org/dev/peps/pep-0404/#strings-and-bytes
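To see the missing coercion described in the quote in isolation, the sketch below (no network access, no html2text) calls a bytes method with a str argument, which raises the same TypeError message as in the question, and then decodes first so the str operation succeeds. It only reproduces the error with a plain bytes method; which operation inside html2text actually raises it is not shown here and is an assumption.

```python
raw = b"<p>caf\xc3\xa9</p>"  # stands in for r.read(): a bytes object

message = ""
try:
    # A bytes method given a str argument: Python 3 does not coerce between them.
    raw.replace("<p>", "")
except TypeError as exc:
    message = str(exc)
print(message)  # a bytes-like object is required, not 'str'

text = raw.decode("utf-8")         # explicit bytes -> str conversion
cleaned = text.replace("<p>", "")  # str method with a str argument works
# cleaned == "café</p>"
```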
For this Python question about saving the contents of a URL to a text file, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/48512988/