python - dtype changes when using DataFrame.to_dict

Tags: python pandas

My DataFrame has a uint64 column, but when I convert the DataFrame to a list of Python dicts with DataFrame.to_dict('record'), the values that were uint64 are magically converted to float:

In [24]: mid['bd_id'].head()
Out[24]:
0                0
1    6957860914294
2    7219009614965
3    7602051814214
4    7916807114255
Name: bd_id, dtype: uint64

In [25]: mid.to_dict('record')[2]['bd_id']
Out[25]: 7219009614965.0

In [26]: bd = mid['bd_id']

In [27]: bd.head().to_dict()
Out[27]: {0: 0, 1: 6957860914294, 2: 7219009614965, 3: 7602051814214, 4: 7916807114255}

How can I avoid this strange behavior?

Update

Strangely, if I use to_dict() instead of to_dict('records'), the bd_id column comes out as int:

In [43]: mid.to_dict()['bd_id']
Out[43]:
{0: 0,
 1: 6957860914294,
 2: 7219009614965,
...

Best answer

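This happens because another column contains a float. More specifically, to_dict('records') is implemented using the DataFrame's values attribute rather than the columns themselves. values returns a single array with one common dtype for all columns, which amounts to "implicit upcasting"; in your case it converts uint64 to float.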
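The upcast described above can be observed directly on the values attribute. The following is a minimal sketch, not the asker's actual data: the frame `mid` and its `score` column are hypothetical stand-ins for a DataFrame with one uint64 column and one float column. It inspects only `.values`, because the behavior of to_dict itself may differ between pandas versions.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's `mid`: one uint64 column, one float column.
mid = pd.DataFrame({
    "bd_id": np.array([0, 6957860914294, 7219009614965], dtype="uint64"),
    "score": [0.5, 1.5, 2.5],  # assumed float column; not shown in the question
})

# Each column keeps its own dtype inside the DataFrame...
print(mid.dtypes)

# ...but .values must fit every column into one NumPy array, so it picks a
# common dtype. For uint64 plus float64, that common dtype is float64.
block = mid.values
print(block.dtype)

# Any row taken from that array therefore carries bd_id as a float.
row_bd_id = block[2][0]
print(type(row_bd_id), row_bd_id)
```

A single column has only one dtype to preserve, which is consistent with `bd.head().to_dict()` in the question returning plain integers.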

If you want to work around this bug, you can explicitly cast your DataFrame to the object dtype:

df.astype(object).to_dict('record')[2]['bd_id']
Out[96]: 7602051814214
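As a self-contained illustration of this workaround, here is a small sketch. The frame `mid` and its `score` column are again hypothetical stand-ins, and the orient is spelled "records", the documented name, rather than the abbreviation used above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's `mid`.
mid = pd.DataFrame({
    "bd_id": np.array([0, 6957860914294, 7219009614965], dtype="uint64"),
    "score": [0.5, 1.5, 2.5],  # assumed float column; not shown in the question
})

# Casting to object stores each cell as its own Python object, so no common
# numeric dtype has to be found when the rows are assembled.
records = mid.astype(object).to_dict("records")

bd_id = records[2]["bd_id"]
print(type(bd_id), bd_id)
```

Object columns are slower and use more memory than numeric ones, so it is probably best to apply the cast only at the point of export and keep the original frame for computation.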

By the way, if you are using IPython and want to see how a function is implemented in a library, you can look it up by appending ?? to the method name. For pd.DataFrame.to_dict?? we see

    ...
    elif orient.lower().startswith('r'):
        return [dict((k, v) for k, v in zip(self.columns, row))
                for row in self.values]
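To make the effect of that snippet concrete, the sketch below re-creates it as a standalone function and contrasts it with a variant that builds each row from the individual columns. Both function names, the frame `mid`, and its `score` column are hypothetical. The second function is just one possible alternative based on DataFrame.itertuples, not a claim about how any particular pandas release implements to_dict.

```python
import numpy as np
import pandas as pd

def records_via_values(frame):
    """Mirror of the quoted snippet: rows come from the single .values array."""
    return [dict(zip(frame.columns, row)) for row in frame.values]

def records_via_columns(frame):
    """Alternative sketch: rows come from per-column iteration (itertuples)."""
    return [
        dict(zip(frame.columns, row))
        for row in frame.itertuples(index=False, name=None)
    ]

# Hypothetical stand-in for the asker's `mid`.
mid = pd.DataFrame({
    "bd_id": np.array([0, 6957860914294, 7219009614965], dtype="uint64"),
    "score": [0.5, 1.5, 2.5],  # assumed float column; not shown in the question
})

upcast = records_via_values(mid)[2]["bd_id"]      # taken from the float64 array
preserved = records_via_columns(mid)[2]["bd_id"]  # taken from the uint64 column

print(type(upcast), upcast)
print(type(preserved), preserved)
```

The row-wise version has to pass through one shared array, while the column-wise version never mixes dtypes, which is why only the former turns bd_id into a float.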

Regarding python - dtype changes when using DataFrame.to_dict, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/31374928/
