python - Filling values with a function that relates multiple columns

I have three columns in a dataframe: object, id, and price. I want to fill the blanks in price by reading the id column to work out which price to use. For example: if the id ends in A, B, or C, the price should be 30, but if it ends in 7A, 7B, or 7C, the price should be 50. If the id ends in E, F, or G, the price should be 20, and if it ends in O, M, or N, the price should be 10.

Here is the dataframe:

    object    id  price
0   laptop   24A   30
1   laptop   37C   NaN
2   laptop   21O   NaN
3   laptop   17C   50
4   laptop   55A   30
5   laptop   34N   NaN
6   laptop   05E   20
7   laptop   29B   NaN
8   laptop   22M   10
9   laptop   62F   NaN
10  laptop   23G   20
11  laptop   61O   NaN
12  laptop   27A   NaN

Expected output:

    object    id  price
0   laptop   24A   30
1   laptop   37C   50
2   laptop   21O   10
3   laptop   17C   50
4   laptop   55A   30
5   laptop   34N   10
6   laptop   05E   20
7   laptop   29B   30
8   laptop   22M   10
9   laptop   62F   20
10  laptop   23G   20
11  laptop   61O   10
12  laptop   27A   50

Best answer

You can use np.select with str.contains conditions:

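import numpy as np

# Map each price to a boolean mask over df.id (each regex is anchored at the end of the id).
conditions = {
    30: df.id.str.contains('[^7][ABC]$'),  # ends in A/B/C, not preceded by 7
    50: df.id.str.contains('7[ABC]$'),     # ends in 7A/7B/7C
    20: df.id.str.contains('[EFG]$'),
    10: df.id.str.contains('[OMN]$'),
}
# Recomputes every row's price from its id; rows matching no condition get 0 (np.select's default).
df.price = np.select(list(conditions.values()), list(conditions.keys()))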

#     object   id  price
# 0   laptop  24A     30
# 1   laptop  37C     50
# 2   laptop  21O     10
# 3   laptop  17C     50
# 4   laptop  55A     30
# 5   laptop  34N     10
# 6   laptop  05E     20
# 7   laptop  29B     30
# 8   laptop  22M     10
# 9   laptop  62F     20
# 10  laptop  23G     20
# 11  laptop  61O     10
# 12  laptop  27A     50
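
As a self-contained variation on the np.select approach, the sketch below rebuilds the question's dataframe and fills only the missing prices. Two choices here are assumptions of this sketch rather than part of the answer: the more specific 7[ABC] rule is listed first (np.select uses the first condition that matches, so the general rule no longer needs the [^7] guard), and default=np.nan is passed so that ids matching no rule stay empty instead of becoming 0.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question (NaN marks the missing prices).
df = pd.DataFrame({
    "object": ["laptop"] * 13,
    "id": ["24A", "37C", "21O", "17C", "55A", "34N", "05E",
           "29B", "22M", "62F", "23G", "61O", "27A"],
    "price": [30, np.nan, np.nan, 50, 30, np.nan, 20,
              np.nan, 10, np.nan, 20, np.nan, np.nan],
})

# np.select takes the first matching condition, so the specific rule goes first.
conditions = [
    df["id"].str.contains(r"7[ABC]$"),
    df["id"].str.contains(r"[ABC]$"),
    df["id"].str.contains(r"[EFG]$"),
    df["id"].str.contains(r"[OMN]$"),
]
choices = [50, 30, 20, 10]

# default=np.nan leaves unmatched ids empty instead of writing 0.
derived = pd.Series(np.select(conditions, choices, default=np.nan), index=df.index)

# Only fill the gaps; prices that were already present are kept.
df["price"] = df["price"].fillna(derived)
print(df)
```

Because of the NaN values, the price column is stored as float, so the filled values print as 50.0, 10.0, and so on.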

If you would rather use fillna, so that prices already present are left untouched, you can apply it through loc masks:

for price, condition in conditions.items():
    df.loc[condition, 'price'] = df.loc[condition, 'price'].fillna(price)
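
To see that the loc/fillna loop really leaves existing prices alone, here is a minimal sketch on a hypothetical three-row frame. The value 99 in row 0 is invented for illustration and deliberately disagrees with the rule for its id.

```python
import numpy as np
import pandas as pd

# Hypothetical mini-frame: row 0 already has a price, rows 1 and 2 are missing.
df = pd.DataFrame({
    "id": ["24A", "37C", "21O"],
    "price": [99, np.nan, np.nan],
})

conditions = {
    30: df["id"].str.contains(r"[^7][ABC]$"),
    50: df["id"].str.contains(r"7[ABC]$"),
    10: df["id"].str.contains(r"[OMN]$"),
}

for price, condition in conditions.items():
    # fillna only touches the NaN cells inside the masked rows.
    df.loc[condition, "price"] = df.loc[condition, "price"].fillna(price)

print(df)
```

Row 0 keeps its 99 even though its id matches the rule for 30, while rows 1 and 2 are filled with 50 and 10.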

Update 1

To restrict the rules further by df.object, add a df.object condition with &:

conditions = {
    30: df.object.eq('laptop') & df.id.str.contains('[^7][ABC]$'),
    50: df.object.eq('laptop') & df.id.str.contains('7[ABC]$'),
    20: df.object.eq('laptop') & df.id.str.contains('[EFG]$'),
    10: df.object.eq('laptop') & df.id.str.contains('[OMN]$'),
    1000: df.object.eq('phone') & df.id.str.contains('[OMN]$'),
}
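
A runnable sketch of the combined conditions follows. The mixed frame is hypothetical: the phone and tablet rows are invented for illustration, and the tablet row is there only to show what happens to a row that no rule covers.

```python
import numpy as np
import pandas as pd

# Hypothetical mixed frame; the phone and tablet rows are not from the question.
df = pd.DataFrame({
    "object": ["laptop", "laptop", "phone", "tablet"],
    "id": ["37C", "21O", "21O", "21O"],
    "price": [np.nan] * 4,
})

is_laptop = df["object"].eq("laptop")
is_phone = df["object"].eq("phone")

conditions = {
    50: is_laptop & df["id"].str.contains(r"7[ABC]$"),
    10: is_laptop & df["id"].str.contains(r"[OMN]$"),
    1000: is_phone & df["id"].str.contains(r"[OMN]$"),
}

# default=np.nan keeps rows with no matching rule (the tablet) empty.
derived = np.select(list(conditions.values()), list(conditions.keys()), default=np.nan)
df["price"] = df["price"].fillna(pd.Series(derived, index=df.index))
print(df)
```

One design caveat: because the dict is keyed by price, two rules that should yield the same price cannot both be keys. If that case arises, a list of (condition, price) pairs is a safer container.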

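Update 2

If you really do want to use a function, you can apply it along the rows (axis=1), but a row-wise apply is much slower and is not recommended when a vectorized option such as np.select is available: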

def price(row):
    result = np.nan
    if row.object == 'laptop':
        if row.id[-2:] in ['7A', '7B', '7C']:
            result = 50
        elif row.id[-1] in list('ABC'):
            result = 30
        elif row.id[-1] in list('EFG'):
            result = 20
        elif row.id[-1] in list('OMN'):
            result = 10
    elif row.object == 'phone':
        if row.id[-2:] in ['7A', '7B', '7C']:
            result = 5000
        ...
    return result
df.price = df.apply(price, axis=1)
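
If a row function is still preferred, one possible refinement is to call it only on the rows whose price is missing, which keeps existing prices and shortens the slow row-wise pass. The sketch below assumes only laptop rules; price_for and the three-row frame are hypothetical names and data used for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "object": ["laptop", "laptop", "laptop"],
    "id": ["24A", "37C", "62F"],
    "price": [30, np.nan, np.nan],
})

def price_for(row):
    """Return the rule-based price for one row, or NaN if no rule applies."""
    if row["object"] != "laptop":
        return np.nan
    item_id = row["id"]
    if item_id[-2:] in ("7A", "7B", "7C"):  # check the specific rule first
        return 50
    if item_id[-1] in "ABC":
        return 30
    if item_id[-1] in "EFG":
        return 20
    if item_id[-1] in "OMN":
        return 10
    return np.nan

missing = df["price"].isna()
if missing.any():  # skip the apply entirely when there is nothing to fill
    df.loc[missing, "price"] = df.loc[missing].apply(price_for, axis=1)
print(df)
```

Row 0 keeps its original 30, and only rows 1 and 2 go through price_for.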

This question and answer come from a similar question on Stack Overflow: https://stackoverflow.com/questions/67953949/
