python - Pandas changes index data type

I have a Series normal_row whose index values are:

Int64Index([  1,   2,   3,   4,   5,   6,   7,   8,   9,  10,
            ...
            910, 911, 912, 913, 914, 915, 916, 917, 918, 919],
           dtype='int64', length=919)

I have a DataFrame resultp:

resultp.index

Int64Index([  1,   2,   3,   4,   5,   6,   7,   8,   9,  10,
            ...
            910, 911, 912, 913, 914, 915, 916, 917, 918, 919],
           dtype='int64', length=919)

However,

resultp.loc[14].index

Index([u'1', u'2', u'3', u'4', u'5', u'6', u'7', u'8', u'9', u'10',
       ...
       u'910', u'911', u'912', u'913', u'914', u'915', u'916', u'917', u'918',
       u'919'],
      dtype='object', length=919)

This causes a problem, because

resultp.mul(normal_row, axis = 1)

returns a DataFrame full of NaN values. The shape of the DataFrame also changes from (919, 919) to (919, 1838).

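This seems to happen because the index type changes during the operation. How can this be fixed? And why does pandas keep changing the index type? Shouldn't it stay the same as the original index?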

Best answer

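resultp.loc[14].index consists of strings. Calling loc[14] returns the row whose index label is 14, and that row comes back as a Series object whose index is equal to the columns of resultp: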

Index([u'1', u'2', u'3', u'4', u'5', u'6', u'7', u'8', u'9', u'10',
       ...
       u'910', u'911', u'912', u'913', u'914', u'915', u'916', u'917', u'918',
       u'919'],
      dtype='object', length=919)

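This shows that the columns of resultp are strings. pandas did not change any type during the operation; the column labels were already strings, while the row index is integers.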
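A minimal sketch of why the row's index reveals the column type, assuming a small hypothetical frame (the 3x3 shape and the labels are made up for illustration, not taken from the question): selecting a row with loc returns a Series whose index is exactly the frame's columns.

```python
import numpy as np
import pandas as pd

# Hypothetical small frame mimicking the question: integer row labels,
# but column labels stored as strings.
resultp = pd.DataFrame(
    np.ones((3, 3)),
    index=[1, 2, 3],
    columns=["1", "2", "3"],
)

row = resultp.loc[2]  # select one row by its integer label

# The selected row is a Series whose index *is* the frame's columns
print(type(row).__name__)                              # Series
print(row.index.equals(resultp.columns))               # True
print([type(label).__name__ for label in row.index])   # ['str', 'str', 'str']
```

So inspecting resultp.loc[14].index is really inspecting resultp.columns.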

Consider the following objects:

import numpy as np
import pandas as pd

idx = pd.RangeIndex(0, 5)
col = idx.astype(str)
resultp = pd.DataFrame(np.random.rand(5, 5), idx, col)
normal_row = pd.Series(np.random.rand(5), resultp.index)

Note that col looks the same as idx, but its values are of type str.
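A short sketch of what "looks the same but is a different type" means for pandas, using the same idx and col as above: the two indexes print identically, yet an integer label never equals the corresponding string label, so they share no labels at all.

```python
import pandas as pd

idx = pd.RangeIndex(0, 5)
col = idx.astype(str)

# Both display as 0 1 2 3 4, but the label types differ
print(list(idx))   # [0, 1, 2, 3, 4]
print(list(col))   # ['0', '1', '2', '3', '4']

# Integer labels and string labels never compare equal,
# so pandas treats them as entirely different labels
print(idx.equals(col))        # False
print(idx.intersection(col))  # an empty Index: no label in common
```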

print(resultp)

          0         1         2         3         4
0  0.242878  0.995860  0.486782  0.601954  0.500455
1  0.015091  0.173417  0.508923  0.152233  0.673011
2  0.022210  0.842158  0.302539  0.408297  0.983856
3  0.978881  0.760028  0.254995  0.610134  0.247800
4  0.233714  0.401079  0.984682  0.354219  0.816966

print(normal_row)

0    0.778379
1    0.019352
2    0.583937
3    0.227633
4    0.646096
dtype: float64

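Because resultp.columns holds strings, this multiplication returns NaNs. With axis=1, mul aligns the index of normal_row against the columns of resultp. The integer labels 0 to 4 never match the string labels '0' to '4', so the result gets the union of both label sets (twice as many columns) and every cell is NaN. This is also why your shape grows from (919, 919) to (919, 1838).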
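A sketch of that alignment step, assuming a 5x5 frame of ones and a hypothetical Series s standing in for normal_row: the union of the string columns and the integer Series index has 10 labels, which is exactly the width of the all-NaN result.

```python
import numpy as np
import pandas as pd

idx = pd.RangeIndex(0, 5)
df = pd.DataFrame(np.ones((5, 5)), index=idx, columns=idx.astype(str))
s = pd.Series(np.arange(5, dtype=float), index=idx)  # hypothetical stand-in for normal_row

# With axis=1, mul aligns s.index against df.columns (an outer join of labels)
aligned_cols = df.columns.union(s.index)
print(len(aligned_cols))  # 10: '0'..'4' plus 0..4, no label in common

product = df.mul(s, axis=1)
print(product.shape)               # (5, 10)
print(product.isna().all().all())  # True: no cell found a matching partner
```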

resultp.mul(normal_row, axis=1)

    0   1   2   3   4   0   1   2   3   4
0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

You need to convert resultp.columns to int:

resultp.columns = resultp.columns.astype(int)

and then multiply:

resultp.mul(normal_row, axis=1)

          0         1         2         3         4
0  0.305954  0.079327  0.351183  0.588635  0.209578
1  0.136023  0.152232  0.443796  0.493444  0.678651
2  0.411359  0.267142  0.202791  0.327760  0.307422
3  0.399191  0.225889  0.130076  0.147862  0.038032
4  0.039647  0.058929  0.358210  0.684927  0.180250
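A sketch comparing the fix above with two alternatives, using random data like the answer's example. The alternatives are my own suggestions, not part of the original answer: relabel the Series with strings so it matches the existing columns, or skip label alignment entirely and multiply by position, which is only correct if the Series is already in the same order as the columns.

```python
import numpy as np
import pandas as pd

idx = pd.RangeIndex(0, 5)
resultp = pd.DataFrame(np.random.rand(5, 5), idx, idx.astype(str))
normal_row = pd.Series(np.random.rand(5), resultp.index)

# Fix from the answer: make the column labels integers again
fixed = resultp.copy()
fixed.columns = fixed.columns.astype(int)
by_int_cols = fixed.mul(normal_row, axis=1)

# Alternative: leave the frame alone and turn the Series labels into strings
by_str_index = resultp.mul(normal_row.rename(index=str), axis=1)

# Alternative: bypass label alignment and multiply column j by element j
# (only safe if normal_row is already ordered like resultp.columns)
by_position = resultp * normal_row.to_numpy()

print(by_int_cols.shape, by_int_cols.isna().any().any())  # (5, 5) False
```

Converting the columns once is usually the cleanest choice, because every later label-based operation then lines up as expected.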

Original question on Stack Overflow (python - Pandas changes index data type): https://stackoverflow.com/questions/41729016/
