I have a DataFrame:
data_in = {'A':['A1', '', '', 'A4',''],
'B':['', 'B2', 'B3', '',''],
'C':['C1','C2','','','C5']}
df_in = pd.DataFrame(data_in)
print(df_in)
    A   B   C
0  A1      C1
1      B2  C2
2      B3
3  A4
4          C5
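When column C is not empty and A or B is not empty, I want to replace the value in A or B with the value from C. After the replacement, I need to clear the value in column C.
I expect this output: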
    A   B   C
0  C1
1      C2
2      B3
3  A4
4          C5
I tried several approaches; the closest was:
df_in['A'] = np.where(
(df_in['A'] !='') & (df_in['C'] != '') , df_in['A'], df_in['C']
)
df_in['B'] = np.where(
(df_in['B'] !='') & (df_in['C'] != '') , df_in['B'], df_in['C']
)
But this also clears other values: I lose A4 and B3, and C1 and C2 are not cleared from column C.
What I get:
A B C
0 C1 C1
1 C2 C2
2
3
4 C5
Thanks
Best answer
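You were very close, but you swapped the arguments in np.where. The syntax is np.where(cond, if_cond_True, if_cond_False). If the condition is met (if_cond_True), columns A and B should take the value of column C; otherwise they keep their original value (if_cond_False).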
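As a minimal illustration of that argument order, here is a sketch with made-up arrays (the names cond, when_true and when_false are hypothetical and not part of the question):

```python
import numpy as np

# Hypothetical toy arrays, only to show which argument is picked when.
cond = np.array([True, False, True])
when_true = np.array(["x1", "x2", "x3"])
when_false = np.array(["y1", "y2", "y3"])

# Element-wise: take from when_true where cond is True, else from when_false.
result = np.where(cond, when_true, when_false)
print(result)  # ['x1' 'y2' 'x3']
```

The full code below applies the same pattern to the data frame from the question: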
import pandas as pd
import numpy as np
data_in = {'A':['A1', '', '', 'A4',''],
'B':['', 'B2', 'B3', '',''],
'C':['C1','C2','','','C5']}
df_in = pd.DataFrame(data_in)
maskA = df_in['A'] != '' # A not empty
maskB = df_in['B'] != '' # B not empty
maskC = df_in['C'] != '' # C not empty
# If the columns have NaNs instead of '', use:
#
# maskA = df_in['A'].notnull() # A not empty
# maskB = df_in['B'].notnull() # B not empty
# maskC = df_in['C'].notnull() # C not empty
# If A and C are not empty, A = C; otherwise A keeps its value
df_in['A'] = np.where(maskA & maskC, df_in['C'], df_in['A'])
# If B and C are not empty, B = C; otherwise B keeps its value
df_in['B'] = np.where(maskB & maskC, df_in['C'], df_in['B'])
# If (A and C are not empty) or (B and C are not empty),
# C should be empty; otherwise C keeps its value
df_in['C'] = np.where((maskA & maskC) | (maskB & maskC), "", df_in['C'])
Output
>>> df_in
    A   B   C
0  C1
1      C2
2      B3
3  A4
4          C5
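The same result can probably also be reached without np.where, by assigning through boolean masks with DataFrame.loc. This is only an alternative sketch, not the method from the answer; the names df, has_c, replace_a and replace_b are hypothetical, and it assumes empty cells are '' rather than NaN:

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['A1', '', '', 'A4', ''],
    'B': ['', 'B2', 'B3', '', ''],
    'C': ['C1', 'C2', '', '', 'C5'],
})

# Compute every mask before modifying anything, so later
# assignments cannot change which rows are selected.
has_c = df['C'] != ''
replace_a = (df['A'] != '') & has_c  # A and C both non-empty
replace_b = (df['B'] != '') & has_c  # B and C both non-empty

# Overwrite A and B with C only on the selected rows.
df.loc[replace_a, 'A'] = df.loc[replace_a, 'C']
df.loc[replace_b, 'B'] = df.loc[replace_b, 'C']

# Clear C wherever its value was copied into A or B.
df.loc[replace_a | replace_b, 'C'] = ''

print(df)
```

Rows that match neither mask are never touched, so A4, B3 and C5 stay where they are.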
This post on replacing values in pandas based on conditions across multiple columns comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/69950220/