I have 3 columns in the dataframe. object, id and price. I want fill the blanks by reading the id column and discover which price should I use. For exemple: If the id ends in (A,B or C) the price should be 30 but if it's end (7A,7B or 7C) the price should be 50, If the id ends in (E,F or G) the price should be 20, If the id ends in (O,M or N), the price should be 10.
Here is the dataframe:
object id price
0 laptop 24A 30
1 laptop 37C NaN
2 laptop 21O NaN
3 laptop 17C 50
4 laptop 55A 30
5 laptop 34N NaN
6 laptop 05E 20
7 laptop 29B NaN
8 laptop 22M 10
9 laptop 62F NaN
10 laptop 23G 20
11 laptop 61O NaN
12 laptop 27A NaN
Expected output:
object id price
0 laptop 24A 30
1 laptop 37C 50
2 laptop 21O 10
3 laptop 17C 50
4 laptop 55A 30
5 laptop 34N 10
6 laptop 05E 20
7 laptop 29B 30
8 laptop 22M 10
9 laptop 62F 20
10 laptop 23G 20
11 laptop 61O 10
12 laptop 27A 50
最佳答案
您可以使用np.select
与 str.contains
条件:
conditions = {
30: df.id.str.contains('[^7][ABC]$'),
50: df.id.str.contains('7[ABC]$'),
20: df.id.str.contains('[EFG]$'),
10: df.id.str.contains('[OMN]$'),
}
df.price = np.select(conditions.values(), conditions.keys())
# object id price
# 0 laptop 24A 30
# 1 laptop 37C 50
# 2 laptop 21O 10
# 3 laptop 17C 50
# 4 laptop 55A 30
# 5 laptop 34N 10
# 6 laptop 05E 20
# 7 laptop 29B 30
# 8 laptop 22M 10
# 9 laptop 62F 20
# 10 laptop 23G 20
# 11 laptop 61O 10
# 12 laptop 27A 50
如果您想使用fillna
,您也可以使用loc
掩码。 :
for price, condition in conditions.items():
df.loc[condition, 'price'] = df.loc[condition, 'price'].fillna(price)
更新 1
如果想通过df.object
进一步限制,可以用&
添加df.object
条件:
conditions = {
30: df.object.eq('laptop') & df.id.str.contains('[^7][ABC]$'),
50: df.object.eq('laptop') & df.id.str.contains('7[ABC]$'),
20: df.object.eq('laptop') & df.id.str.contains('[EFG]$'),
10: df.object.eq('laptop') & df.id.str.contains('[OMN]$'),
1000: df.object.eq('phone') & df.id.str.contains('[OMN]$'),
}
更新2
如果你确实想使用某个函数,可以apply
沿着行(axis=1
),但是 row-apply 速度要慢得多,当您有像 np.select
这样的矢量化选项时,不建议这样做:
def price(row):
result = np.nan
if row.object == 'laptop':
if row.id[-2:] in ['7A', '7B', '7C']:
result = 50
elif row.id[-1] in list('ABC'):
result = 30
elif row.id[-1] in list('EFG'):
result = 20
elif row.id[-1] in list('OMN'):
result = 10
elif row.object == 'phone':
if row.id[-2:] in ['7A', '7B', '7C']:
result = 5000
...
return result
df.price = df.apply(price, axis=1)
关于python - 通过使用函数关联多个列来填充,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/67953949/