html - Convert io.BytesIO to io.StringIO to parse an HTML page

Tags: html beautifulsoup pycurl stringio type-conversion

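I'm trying to parse an HTML page that I retrieved with pyCurl, but the pyCurl WRITEFUNCTION returns the page as bytes rather than as a string, so I can't parse it with BeautifulSoup.

Is there a way to convert an io.BytesIO to an io.StringIO?

Or is there another way to parse the HTML page?

I'm using Python 3.3.2.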

Best answer

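The code in the accepted answer actually reads the whole stream in order to decode it. Below is the right way to do it: wrap one stream in the other, so that the data can be read block by block.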

import io

# Initialize a read buffer containing UTF-8 encoded bytes
raw = io.BytesIO(
    b'Initial value for read buffer with unicode characters ' +
    'ÁÇÊ'.encode('utf-8')
)
# Wrap the byte stream so that reads return decoded text (str)
wrapper = io.TextIOWrapper(raw, encoding='utf-8')

# Read from the buffer
print(wrapper.read())
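
As a rough sketch of how this might look with a buffer that was filled by writes (as a pyCurl WRITEFUNCTION callback would do), the example below reads the wrapped stream a few characters at a time. pyCurl itself is not used here: the `buffer.write(...)` call merely stands in for the callback, and the sample markup, the UTF-8 encoding, and the names `buffer`, `text_stream`, and `html_text` are assumptions made for illustration.

```python
import io

# Stand-in for the buffer a pyCurl WRITEFUNCTION callback would fill
# (assumption: the callback is buffer.write and the page is UTF-8).
buffer = io.BytesIO()
buffer.write('<html><body><p>ÁÇÊ</p></body></html>'.encode('utf-8'))

# After writing, the stream position is at the end; rewind before reading.
buffer.seek(0)

# Wrap the byte stream so that reads return decoded text.
text_stream = io.TextIOWrapper(buffer, encoding='utf-8')

# Read up to 5 characters per call. The wrapper decodes incrementally,
# so a multi-byte character is never split between two chunks.
chunks = []
while True:
    chunk = text_stream.read(5)
    if not chunk:
        break
    chunks.append(chunk)

html_text = ''.join(chunks)
print(html_text)
```

The `seek(0)` step matters only when the buffer was filled by writing; a `BytesIO` constructed from initial bytes already starts at position 0. Either `html_text` or the wrapper itself could then be handed to a parser that accepts a string or a file-like object.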

This Q&A on converting io.BytesIO to io.StringIO to parse an HTML page is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/24566630/
