python - Replacing a column with iloc when duplicate column names exist

Suppose I have the following DataFrame, in which several columns share the same name:

import numpy as np
import pandas as pd

test = pd.DataFrame([[1, 2, 3, np.nan, np.nan],
                     [1, 2, 3,      4,      5],
                     [1, 2, 3, np.nan, np.nan],
                     [1, 2, 3,      4, np.nan]],
                    columns=['One', 'Two', 'Three', 'Three', 'Three'])

I want to fill the NaN values in the fourth column only. I expected to be able to do this by position with iloc:

test.iloc[:, 3] = test.iloc[:, 3].fillna('F')

But this gives

In [121]: test
Out[121]:
   One  Two Three Three Three
0    1    2     F     F     F
1    1    2     4     4     4
2    1    2     F     F     F
3    1    2     4     4     4

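So the assignment is applied by column name rather than by position: all three columns named 'Three' are changed. I can get what I want with a very naive workaround, temporarily replacing the column names with unique integers: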

c = test.columns
test.columns = range(len(test.columns))
test.iloc[:, 3] = test.iloc[:, 3].fillna('F')
test.columns = c
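
The workaround above can be wrapped in a small function so that the original frame is left untouched and the labels are always restored. This is only a minimal sketch and not part of the original question; `apply_at_position` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd


def apply_at_position(df, pos, func):
    """Return a copy of df with func applied to the column at integer position pos.

    Hypothetical helper: it temporarily swaps in unique integer labels so the
    assignment cannot touch other columns that share the same name.
    """
    out = df.copy()
    original_columns = out.columns
    out.columns = range(out.shape[1])  # unique labels 0..n-1
    out[pos] = func(out[pos])          # label pos now identifies exactly one column
    out.columns = original_columns     # restore the duplicate names
    return out


test = pd.DataFrame([[1, 2, 3, np.nan, np.nan],
                     [1, 2, 3,      4,      5],
                     [1, 2, 3, np.nan, np.nan],
                     [1, 2, 3,      4, np.nan]],
                    columns=['One', 'Two', 'Three', 'Three', 'Three'])

result = apply_at_position(test, 3, lambda s: s.fillna('F'))
```

Because the function works on a copy, `test` keeps its NaN values and only `result` carries the filled column.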

This gives the correct result:

In [142]: test
Out[142]:
   One  Two  Three  Three  Three
0    1    2      3      F    NaN
1    1    2      3      4    5.0
2    1    2      3      F    NaN
3    1    2      3      4    NaN

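But this seems rather clumsy for such a simple task.

So my question is twofold:

Is there a more direct way to do this?
Why doesn't the first approach work? (Why does iloc still use names when replacing a column?)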

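Best answer

The answer to your second question, why the first technique does not work, probably lies in how pandas handles duplicate columns. The DataFrame constructor has no setting for this, but the read_csv documentation describes a parameter, mangle_dupe_cols, that defaults to True. The documentation says that passing False may cause data to be overwritten. I suspect that pandas handles duplicate column names in a questionable way.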
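
For the first question, one way to avoid positional assignment altogether is to pull the columns out by position, change the one you need, and reassemble the frame. This is a sketch under the assumption that rebuilding the frame is acceptable for your data size; it is not taken from the original answer, and the names `columns` and `filled` are illustrative.

```python
import numpy as np
import pandas as pd

test = pd.DataFrame([[1, 2, 3, np.nan, np.nan],
                     [1, 2, 3,      4,      5],
                     [1, 2, 3, np.nan, np.nan],
                     [1, 2, 3,      4, np.nan]],
                    columns=['One', 'Two', 'Three', 'Three', 'Three'])

# Reading with iloc is purely positional, so each Series is a single column
# even though several of them carry the same name.
columns = [test.iloc[:, i] for i in range(test.shape[1])]

# Fill NaN in the fourth column only.
columns[3] = columns[3].fillna('F')

# Reassemble side by side; column order and (duplicate) names are preserved.
filled = pd.concat(columns, axis=1)
```

Since nothing is ever assigned back into a frame with duplicate labels, the result should not depend on how a given pandas version resolves such assignments.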

Source: this question on Stack Overflow: https://stackoverflow.com/questions/44701507/