python - Why does filtering with pandas where reset the dtype to the default?

I have a DataFrame in which one column's dtype is explicitly set to int32. When I filter with the bracket operator, the dtype does not change.

scripts[scripts['Security Id'] == 'ABB']['Security Code'].head()

0    500002
Name: Security Code, dtype: int32

However, when I filter with where, the dtype is reset to the default, float64.

(scripts.where(scripts['Security Id'] == 'ABB')
       .dropna())['Security Code'].head()

The dtype changes back to float64:

0    500002.0
Name: Security Code, dtype: float64

So I am wondering why this happens, especially since method chaining is the idiomatic style in pandas.

Best answer

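The dtype change in the second case happens because NumPy has no integer representation of NaN. If a NumPy-backed numeric column contains NaN, pandas therefore casts it to a float dtype.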
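The upcast can be reproduced without where at all. The following is a minimal sketch, assuming a small made-up Series named codes (not part of the original question): any operation that introduces a missing value into a NumPy-backed integer column triggers the same cast. It also shows pandas' nullable Int32 extension dtype, which tracks missing values separately and so stays an integer type.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration only.
codes = pd.Series([1, 2, 3], dtype='int32')

# Reindexing to a label that does not exist (3) introduces a missing value,
# so the NumPy-backed int32 data is upcast to float64 to hold NaN.
with_gap = codes.reindex([0, 1, 2, 3])
print(with_gap.dtype)

# The nullable extension dtype 'Int32' (capital I) represents the gap as
# pd.NA instead of NaN, so the values remain integers.
nullable = codes.astype('Int32').reindex([0, 1, 2, 3])
print(nullable.dtype)
```

Whether the nullable dtype is appropriate depends on the rest of the pipeline, since some downstream code expects plain NumPy dtypes.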

In the first case:

scripts[scripts['Security Id'] == 'ABB']['Security Code'].head()

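you are simply returning the subset of the scripts DataFrame that satisfies the condition. Since the column in the underlying DataFrame is int32, the subset keeps the same dtype.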

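In the second case, however, DataFrame.where returns an object of the same shape as the original: values in rows where the condition is True are passed through, and values in all other rows are replaced with np.nan. So you are modifying the DataFrame and introducing NaN values, which forces pandas to cast the column to float64. The later dropna removes the NaN rows but does not cast the column back to int32.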

For example:

import pandas as pd
scripts = pd.DataFrame({'Security Id': ['ABB', 'ABB', 'ABC', 'ABB'],
                        'Security Code': [1, 2, 3, 4]})
scripts['Security Code'] = scripts['Security Code'].astype('int32')

scripts.where(scripts['Security Id'] == 'ABB')

   Security Code Security Id
0            1.0         ABB
1            2.0         ABB
2            NaN         NaN
3            4.0         ABB

scripts.where(scripts['Security Id'] == 'ABB').dtypes

Security Code    float64
Security Id       object
dtype: object
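Since the question is about keeping a method-chaining style, here is a hedged sketch of one way to filter inside a chain without going through NaN. It relies on .loc accepting a callable, and reuses the example frame above; the variable names chained, via_where and restored are illustrative only.

```python
import pandas as pd

scripts = pd.DataFrame({'Security Id': ['ABB', 'ABB', 'ABC', 'ABB'],
                        'Security Code': [1, 2, 3, 4]})
scripts['Security Code'] = scripts['Security Code'].astype('int32')

# .loc accepts a callable that receives the DataFrame, so the boolean mask
# can be built inside a chain. Rows are selected, never replaced with NaN.
chained = scripts.loc[lambda df: df['Security Id'] == 'ABB']['Security Code']
print(chained.dtype)

# where + dropna ends up with the same rows, but the values have already
# been cast to float64 by the time dropna runs.
via_where = (scripts.where(scripts['Security Id'] == 'ABB')
                    .dropna())['Security Code']
print(via_where.dtype)

# If where is still preferred, the column can be cast back explicitly
# once the NaN rows are gone.
restored = via_where.astype('int32')
print(restored.dtype)
```

The explicit cast back is safe here only because dropna has removed every NaN; casting a column that still contains NaN to int32 raises an error.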

This question and answer come from Stack Overflow: https://stackoverflow.com/questions/46007352/
