python - Can pd.DataFrame.set_index preserve the dtype?

I am trying to call df.set_index so that the dtype of the column I set as the index becomes the new index.dtype. Unfortunately, in the example below, set_index changes the dtype.

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': pd.Series(np.array([-1, 0, 1, 2], dtype=np.int8))})
df['ignore'] = df['a']
assert (df.dtypes == np.int8).all()  # fine
df2 = df.set_index('a')
# Fails on pandas 0.23.3: the index dtype is int64, not int8
assert df2.index.dtype == df['a'].dtype, df2.index.dtype

Is it possible to avoid this behavior? My pandas version is 0.23.3.

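Similarly,

new_idx = pd.Index(np.array([-1, 0, 1, 2]), dtype=np.dtype('int8'))
assert new_idx.dtype == np.dtype('int64')  # passes on 0.23.3: the requested int8 is ignored

even though the documentation for the dtype parameter says: "If an actual dtype is provided, we coerce to that dtype if it's safe. Otherwise, an error will be raised."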
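
A hedged aside: the result of this check depends on the installed pandas version. On 0.23.3 the index comes back as int64, as described above. As far as I know, pandas 2.0 and later let a plain Index hold other NumPy integer dtypes, so the int8 may be kept there. A minimal sketch for checking what a given installation does (the variable names are only illustrative):

```python
import numpy as np
import pandas as pd

# Same frame as in the question: one int8 column.
df = pd.DataFrame({'a': pd.Series(np.array([-1, 0, 1, 2], dtype=np.int8))})

column_dtype = df['a'].dtype
index_dtype = df.set_index('a').index.dtype

# True only if set_index kept the column's dtype on this pandas version.
preserved = index_dtype == column_dtype

print(pd.__version__, column_dtype, index_dtype, preserved)
```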

Best answer

Despite my long-winded remarks in the comments above, this may be enough to get a suitable index that is both low on memory and starts at -1.

`pandas.RangeIndex`

takes start and stop arguments, like range:

df = df.set_index(pd.RangeIndex(-1, len(df) - 1))

print(df.index, df.index.dtype, sep='\n')
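
To make the range analogy concrete, here is a small sketch that assumes the four-row frame from the question. It shows that the RangeIndex labels are the same as those from the built-in range. Note that this approach generates new labels rather than reusing column 'a': the column stays in the frame, and the labels only match it because its values happen to be consecutive integers starting at -1.

```python
import numpy as np
import pandas as pd

# Frame from the question, rebuilt here so the sketch is self-contained.
df = pd.DataFrame({'a': pd.Series(np.array([-1, 0, 1, 2], dtype=np.int8))})

idx = pd.RangeIndex(-1, len(df) - 1)   # start is inclusive, stop is exclusive
labels = list(idx)
same_as_range = labels == list(range(-1, len(df) - 1))

indexed = df.set_index(idx)
# Column 'a' is still a regular column; compare it with the new labels.
matches_column = list(indexed.index) == indexed['a'].tolist()

print(labels, same_as_range, matches_column)
```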

This should be very memory efficient.

Although it is still dtype int64 (which is what you should want), it takes up very little memory.

pd.RangeIndex(-1, 4000000).memory_usage()

84

and

for i in range(1, 1000000, 100000):
    print(pd.RangeIndex(-1, i).memory_usage())

84
84
84
84
84
84
84
84
84
84
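
To put the constant figure in context, the sketch below compares a RangeIndex with an int64 index of the same length that stores every label. The exact byte counts vary between pandas versions, so none are quoted here; the point is that the RangeIndex footprint does not grow with length, while the materialized index needs about 8 bytes per label. The names `lazy` and `dense` are only illustrative.

```python
import numpy as np
import pandas as pd

n = 1_000_000

# RangeIndex stores only start, stop and step.
lazy = pd.RangeIndex(-1, n - 1)

# A materialized index stores every label as an int64.
dense = pd.Index(np.arange(-1, n - 1, dtype=np.int64))

lazy_bytes = lazy.memory_usage()
dense_bytes = dense.memory_usage()

print(lazy_bytes, dense_bytes)
```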

Regarding python - Can pd.DataFrame.set_index preserve the dtype?, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/52249639/
