python - Pandas: create a column based on multiple other columns. apply() fails

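I have a dataframe with several columns. I want to assign a priority to each row, and that priority is derived from the data in the other columns.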

I defined a priority function:

def priority(Bcat,Brand,IPC,Customer, Type):
    p=1
    if Bcat != "*":
        p+= len(Bcat)/3
    if Brand != "*":
        p+= 2
    if IPC != "*":
        p+= 4
    if Customer != "*" & Customer != "REPLCUST":
        p+= 8
    if Type == "Default":
        p+= -16
    return p
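Separately from the apply error, the Customer test in this function cannot work as written. In Python, & binds more tightly than !=, so the condition is parsed as the chained comparison Customer != ("*" & Customer) != "REPLCUST", and "*" & Customer raises a TypeError for strings. A minimal sketch of the pitfall and two working spellings (the helper names and the input "C" are illustrative only):

```python
def customer_bonus_broken(customer):
    # Parsed as: customer != ("*" & customer) != "REPLCUST"
    # "*" & customer is str & str, which raises TypeError.
    return customer != "*" & customer != "REPLCUST"


def customer_bonus_fixed(customer):
    # Parenthesised comparisons, combined with `and` for plain Python scalars.
    return (customer != "*") and (customer != "REPLCUST")


def customer_bonus_isin(customer):
    # Equivalent membership test.
    return customer not in ("*", "REPLCUST")


try:
    customer_bonus_broken("C")
except TypeError as exc:
    print("broken version raises TypeError:", exc)

print(customer_bonus_fixed("C"), customer_bonus_fixed("REPLCUST"))  # True False
print(customer_bonus_isin("C"), customer_bonus_isin("*"))  # True False
```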

Now I want to apply it to my dataframe.

This is what my dataframe looks like (2500 rows):

Bcat Brand Customer   IPC   LOC MKT_BUD      Type   STARTEFF    Value
A    B     C          D      E   F            1     2001-01-01    1.0

I'm trying this, but it doesn't work:

df["Priority"] = df[["Bcat","Brand","IPC","Customer","Type"]].apply(priority,axis=1,args=("Bcat","Brand","IPC","Customer","Type"))

I get this message:

TypeError: ('priority() takes 5 positional arguments but 6 were given', 'occurred at index 0')
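The message comes from how DataFrame.apply forwards args: with axis=1 it calls the function as func(row, *args), so the row Series is always the first positional argument and the five column names in args follow it, six arguments in total. The strings are passed literally; they are not looked up as columns. A small sketch that only counts what the function receives (the two-column frame and the name count_args are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"Bcat": ["A"], "Brand": ["B"]})


def count_args(*received):
    # apply(axis=1) passes the row Series first, then each item of `args`.
    return len(received)


counts = df.apply(count_args, axis=1, args=("Bcat", "Brand"))
print(counts.iloc[0])  # 3: the row plus the two strings from args
```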

I also tried:

df["Priority"] = np.vectorize(priority(df.Bcat,df.Brand,df.IPC,df.Customer,df.Type))

and got this message:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
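This attempt fails for a different reason: priority(df.Bcat, ...) is executed before np.vectorize ever sees it, so inside the function Bcat != "*" is a whole boolean Series, and if cannot reduce a Series to a single True/False. np.vectorize has to wrap the function object itself, and the vectorized function is then called with the columns. A sketch with a corrected copy of priority and a small made-up frame; otypes=[float] is an addition worth considering, because without it np.vectorize takes the output dtype from the first call, and a first row that returns the int 15 would truncate later fractional priorities:

```python
import numpy as np
import pandas as pd


def priority(Bcat, Brand, IPC, Customer, Type):
    # Scalar version: every argument is a single cell value.
    p = 1
    if Bcat != "*":
        p += len(Bcat) / 3
    if Brand != "*":
        p += 2
    if IPC != "*":
        p += 4
    if Customer != "*" and Customer != "REPLCUST":
        p += 8
    if Type == "Default":
        p -= 16
    return p


# Illustrative rows; the first one yields the int 15.
df = pd.DataFrame(
    {
        "Bcat": ["*", "A", "AB"],
        "Brand": ["B", "B", "*"],
        "IPC": ["D", "D", "*"],
        "Customer": ["C", "C", "REPLCUST"],
        "Type": ["1", "1", "Default"],
    }
)

# Vectorize the function object, then call it with the columns.
# otypes=[float] stops the first (int) result from fixing an int output dtype.
vec_priority = np.vectorize(priority, otypes=[float])
df["Priority"] = vec_priority(df.Bcat, df.Brand, df.IPC, df.Customer, df.Type)
print(df["Priority"].tolist())
```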

Best answer

If you want to use apply on your dataframe, you probably need a lambda function:

def priority(Bcat,Brand,IPC,Customer, Type):
    p=1
    if Bcat != "*":
        p+= len(Bcat)/3
    if Brand != "*":
        p+= 2
    if IPC != "*":
        p+= 4
    if (Customer != "*") & (Customer != "REPLCUST"):  # brackets needed: & binds more tightly than !=
        p+= 8
    if Type == "Default":
        p+= -16
    return p

import pandas as pd

df = pd.DataFrame([['A','B','C','D','E','F','1','2001-01-01','1.0']],
                  columns=['Bcat','Brand','Customer','IPC','LOC','MKT_BUD','Type','STARTEFF','Value'])

df.apply(lambda x: priority(x.Bcat,x.Brand,x.IPC,x.Customer,x.Type),axis = 1)

0    15.333333
dtype: float64
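The single output value can be checked by hand: Bcat is "A" (length 1), Brand, IPC and Customer are all set to something other than "*" or "REPLCUST", and Type is "1" rather than "Default", so the -16 penalty does not apply:

```latex
p = 1 + \frac{\operatorname{len}(\text{"A"})}{3} + 2 + 4 + 8
  = 1 + \frac{1}{3} + 14
  = \frac{46}{3} \approx 15.333333
```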

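This works on the dataframe, but it is probably not optimal, since it iterates over the rows to get the length of each string in df.Bcat. I'll look for a more efficient approach.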
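One middle ground, not part of the original answer, is to keep the scalar priority function but skip apply's per-row Series construction by zipping the column values directly. This is typically cheaper than apply(axis=1), although it is still a Python-level loop. A sketch, assuming the same priority function and one-row frame as above:

```python
import pandas as pd


def priority(Bcat, Brand, IPC, Customer, Type):
    p = 1
    if Bcat != "*":
        p += len(Bcat) / 3
    if Brand != "*":
        p += 2
    if IPC != "*":
        p += 4
    if (Customer != "*") & (Customer != "REPLCUST"):
        p += 8
    if Type == "Default":
        p += -16
    return p


df = pd.DataFrame(
    [["A", "B", "C", "D", "E", "F", "1", "2001-01-01", "1.0"]],
    columns=["Bcat", "Brand", "Customer", "IPC", "LOC", "MKT_BUD", "Type", "STARTEFF", "Value"],
)

# zip yields one tuple of plain values per row, in the argument order
# priority expects; no intermediate Series is built for each row.
df["Priority"] = [
    priority(*vals)
    for vals in zip(df["Bcat"], df["Brand"], df["IPC"], df["Customer"], df["Type"])
]
print(df["Priority"])
```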

Edit:

Alternatively, you can use str.len to work column-wise:

df['priority'] = 1.0  # start as float: the Bcat term adds fractional values
mask = df.Bcat != "*"
df.loc[mask,'priority'] += df.loc[mask,'Bcat'].str.len()/3
df.loc[df.Brand != "*",'priority'] += 2
df.loc[df.IPC != "*",'priority'] += 4
df.loc[~df.Customer.isin(['*','REPLCUST']),'priority'] += 8
df.loc[df.Type == "Default",'priority'] -= 16

    Bcat    Brand   Customer    IPC LOC MKT_BUD Type  STARTEFF    Value priority
0   A       B       C           D   E   F       1     2001-01-01  1.0   15.333333

This is faster because you operate on whole Series instead of iterating over rows.
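The same column-wise idea can also be written as a single expression that adds up boolean masks (True counts as 1 when multiplied) instead of updating the column in several .loc steps. This is an alternative formulation, not the answer's code; the three rows are made up to exercise each rule:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Bcat": ["A", "*", "ABC"],
        "Brand": ["B", "B", "*"],
        "IPC": ["D", "*", "D"],
        "Customer": ["C", "REPLCUST", "*"],
        "Type": ["1", "Default", "1"],
    }
)

# len(Bcat)/3 where Bcat is set, 0 where it is "*".
bcat_term = df.Bcat.str.len().where(df.Bcat != "*", 0) / 3

# Each boolean mask becomes 0/1 when multiplied by its weight.
df["priority"] = (
    1
    + bcat_term
    + 2 * (df.Brand != "*")
    + 4 * (df.IPC != "*")
    + 8 * ~df.Customer.isin(["*", "REPLCUST"])
    - 16 * (df.Type == "Default")
)
print(df["priority"].tolist())
```

Each term is a whole-column operation, so the cost scales with vectorized pandas operations rather than a Python loop over rows, just like the .loc version.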

For this question (python - Pandas: create a column based on multiple other columns. apply() fails), see the Stack Overflow thread: https://stackoverflow.com/questions/45843618/
