python - pandas DataFrame.style.render when some strings have UTF-8 characters

When using dataframe.style, accented characters are displayed correctly in Jupyter:

import pandas

df = pandas.DataFrame([['Madrid', 'León']], index=['Spain'], columns=['BigCity', 'SmallCity'])
df.style
        BigCity SmallCity
Spain   Madrid  León

However, if we use the style.render() method to get the HTML and write it to a file, the accented characters are not encoded correctly:

df.style.render()
'<style  type="text/css" >\n</style>  \n<table id="T_a3788466_eb00_11e8_8a82_88e9fe638ee6" > \n<thead>    <tr> \n        <th class="blank level0" ></th> \n        <th class="col_heading level0 col0" >BigCity</th> \n        <th class="col_heading level0 col1" >SmallCity</th> \n    </tr></thead> \n<tbody>    <tr> \n        <th id="T_a3788466_eb00_11e8_8a82_88e9fe638ee6level0_row0" class="row_heading level0 row0" >Spain</th> \n        <td id="T_a3788466_eb00_11e8_8a82_88e9fe638ee6row0_col0" class="data row0 col0" >Madrid</td> \n        <td id="T_a3788466_eb00_11e8_8a82_88e9fe638ee6row0_col1" class="data row0 col1" >León</td> \n    </tr></tbody> \n</table> '

Of course, that does not work. This is what the browser shows:

How can this be fixed?

Best answer

The problem you are running into here is not really an HTML or Pandas problem but a character set problem. See https://www.w3schools.com/html/html_charset.asp

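Your "LATIN SMALL LETTER O WITH ACUTE" (ó) is 0xC3 0xB3 in UTF-8, so the first byte is 195 and the second byte is 179. A browser that assumes a single-byte character set reads each byte as its own character: in the table at the link above, 195 is "LATIN CAPITAL LETTER A WITH TILDE" (Ã) and 179 is "SUPERSCRIPT THREE" (³). That is why you see Ã³ instead of ó.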
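The byte values described above can be checked with the standard library alone. This is a minimal sketch; the variable names are illustrative and do not come from the original question.

```python
# Encode ó as UTF-8 and inspect the two resulting bytes.
text = "ó"
utf8_bytes = text.encode("utf-8")
byte_values = list(utf8_bytes)  # decimal values of 0xC3 and 0xB3

# Decoding those UTF-8 bytes as ISO-8859-1 treats each byte as one
# character, which reproduces what a misinformed browser displays.
misread = utf8_bytes.decode("iso-8859-1")
misread_city = "León".encode("utf-8").decode("iso-8859-1")

print(byte_values, misread, misread_city)
```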

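So Pandas is generating correct UTF-8 HTML, but nothing tells the browser that. You can either explicitly set the HTML character set to UTF-8, or explicitly set the HTML version to 5 (which defaults to UTF-8, although there may be browser-specific issues).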
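One way to declare the character set explicitly is to wrap the rendered table in a small HTML5 page with a meta charset tag and write the file as UTF-8. The sketch below makes some assumptions: wrap_html is a hypothetical helper, the output path is arbitrary, and the fragment literal stands in for the string returned by df.style.render() so that the example runs without pandas.

```python
import os
import tempfile


def wrap_html(fragment: str) -> str:
    """Hypothetical helper: wrap an HTML fragment in a page that declares UTF-8."""
    return (
        "<!DOCTYPE html>\n"
        "<html>\n<head>\n"
        '<meta charset="utf-8">\n'
        "</head>\n<body>\n"
        + fragment
        + "\n</body>\n</html>\n"
    )


# Stand-in for the output of df.style.render() (assumption for this sketch).
fragment = '<table><tbody><tr><td class="data row0 col1">León</td></tr></tbody></table>'
page = wrap_html(fragment)

# Write the page with an explicit encoding rather than the platform default.
path = os.path.join(tempfile.mkdtemp(), "table.html")
with open(path, "w", encoding="utf-8") as f:
    f.write(page)

# Read the raw bytes back to confirm what ended up on disk.
with open(path, "rb") as f:
    raw = f.read()
```

Passing encoding="utf-8" to open() matters here: without it Python uses the platform default encoding, which may not match the charset declared in the page.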

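Another way to fix it might be to take the output from Pandas and call .encode('ISO-8859-1') on it before writing it to the file. That writes ó as the single byte 243, which should work without changing the HTML header. This will definitely not work if your document contains characters that are not in ISO-8859-1, whereas keeping it in UTF-8 supports all characters.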
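A short sketch of the ISO-8859-1 alternative and of its limitation. The HTML snippets are made up for illustration; with real data the string would come from df.style.render().

```python
# Encoding to ISO-8859-1 turns ó into a single byte.
html = "<td>León</td>"
latin1_bytes = html.encode("iso-8859-1")
o_byte = "ó".encode("iso-8859-1")[0]  # decimal value of the single byte

# Characters outside ISO-8859-1 cannot be encoded this way.
# "Ł" and "ź" are not in that character set, so this raises.
try:
    "<td>Łódź</td>".encode("iso-8859-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

print(latin1_bytes, o_byte, latin1_ok)
```

The bytes would then be written with open(path, "wb"), since they are already encoded.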

Regarding python - pandas DataFrame.style.render when some strings have UTF-8 characters, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/53358725/
