Given the DataFrame:
import pandas as pd
df = pd.DataFrame({
    'transformed': ['left', 'right', 'left', 'right'],
    'left_f': [1, 2, 3, 4],
    'right_f': [10, 20, 30, 40],
    'left_t': [-1, -2, -3, -4],
    'right_t': [-10, -20, -30, -40],
})
I want to create two new columns, picking from left_* or right_* depending on the value of transformed:
df['transformed_f'] = df['right_f'].where(
    df['transformed'] == 'right',
    df['left_f']
)
df['transformed_t'] = df['right_t'].where(
    df['transformed'] == 'right',
    df['left_t']
)
I get the expected result:
df
#    transformed  left_f  right_f  left_t  right_t  transformed_f  transformed_t
# 0         left       1       10      -1      -10              1             -1
# 1        right       2       20      -2      -20             20            -20
# 2         left       3       30      -3      -30              3             -3
# 3        right       4       40      -4      -40             40            -40
However, when I try to do this in a single operation, I get an unexpected result containing NaN values:
df[['transformed_f', 'transformed_t']] = df[['right_f', 'right_t']].where(
    df['transformed'] == 'right',
    df[['left_f', 'left_t']]
)
df
#    transformed  left_f  right_f  left_t  right_t  transformed_f  transformed_t
# 0         left       1       10      -1      -10            NaN            NaN
# 1        right       2       20      -2      -20           20.0          -20.0
# 2         left       3       30      -3      -30            NaN            NaN
# 3        right       4       40      -4      -40           40.0          -40.0
Is there a way to use df.where() on multiple columns at once?
Best answer
You are close. Just add .values or .to_numpy() to the slice you pass as other, so that it becomes a NumPy ndarray.
According to the documentation:
other : scalar, NDFrame, or callable Entries where cond is False are replaced with corresponding value from other. If other is callable, it is computed on the NDFrame and should return scalar or NDFrame. The callable must not change input NDFrame (though pandas doesn’t check it).
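So when you pass a DataFrame slice directly as other, pandas aligns it with the calling frame by label. The column names do not match (left_f and left_t versus right_f and right_t), so the aligned other has no values to offer, and every row where the condition is False ends up as NaN. When you pass .values, there are no labels left to align on, and the values are filled in by position: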
df[['transformed_f', 'transformed_t']] = df[['right_f', 'right_t']].where(
    df['transformed'] == 'right',
    df[['left_f', 'left_t']].values
)
print(df)
   transformed  left_f  right_f  left_t  right_t  transformed_f  transformed_t
0         left       1       10      -1      -10              1             -1
1        right       2       20      -2      -20             20            -20
2         left       3       30      -3      -30              3             -3
3        right       4       40      -4      -40             40            -40
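To see the label alignment more directly, here is a minimal sketch that rebuilds the question's frame and compares the failing call with a label-preserving variant. The variable names (cond, right, left, aligned_left, naive, left_relabelled, picked) are only illustrative, and relabelling the columns with rename() is an alternative not taken from the answer above. The reindex() call is an approximation of what the alignment step produces, not the exact internal code path.

```python
import pandas as pd

df = pd.DataFrame({
    'transformed': ['left', 'right', 'left', 'right'],
    'left_f': [1, 2, 3, 4],
    'right_f': [10, 20, 30, 40],
    'left_t': [-1, -2, -3, -4],
    'right_t': [-10, -20, -30, -40],
})

cond = df['transformed'] == 'right'
right = df[['right_f', 'right_t']]
left = df[['left_f', 'left_t']]

# Roughly what `other` looks like once it is aligned to the caller's
# columns: none of the labels match, so every cell is NaN.
aligned_left = left.reindex(columns=right.columns)

# The call from the question: rows where cond is False receive those NaNs.
naive = right.where(cond, left)

# Alternative (an assumption, not from the answer): give `other` the same
# column labels as the caller, so alignment finds matching columns.
left_relabelled = left.rename(columns=lambda c: c.replace('left_', 'right_'))
picked = right.where(cond, left_relabelled)

# Same assignment pattern as in the question.
df[['transformed_f', 'transformed_t']] = picked
print(df[['transformed', 'transformed_f', 'transformed_t']])
```

Relabelling keeps the row index alignment intact, which may matter if the two slices could ever differ in row order. Passing .values or .to_numpy() is shorter, but it relies purely on position.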
Regarding python - Using Pandas df.where on multiple columns produces unexpected NaN values, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/56791544/