python - Checking a reference list against a Pandas column using numpy vectorization

I have a reference list

ref = ['September', 'August', 'July', 'June', 'May', 'April', 'March']

and a DataFrame

df = pd.DataFrame({'Month_List': [['July'], ['August'], ['July', 'June'], ['May', 'April', 'March']]})
df
    Month_List
0   [July]
1   [August]
2   [July, June]
3   [May, April, March]

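For each row, I want to check which elements of the reference list are present and encode the result as a binary list, with one position per element of ref.

I can do this with apply: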

def convert_month_to_binary(ref,lst):
    s = pd.Series(ref)
    return s.isin(lst).astype(int).tolist()  

df['Binary_Month_List'] = df['Month_List'].apply(lambda x: convert_month_to_binary(ref, x))
df

    Month_List          Binary_Month_List
0   [July]              [0, 0, 1, 0, 0, 0, 0]
1   [August]            [0, 1, 0, 0, 0, 0, 0]
2   [July, June]        [0, 0, 1, 1, 0, 0, 0]
3   [May, April, March] [0, 0, 0, 0, 1, 1, 1]

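However, apply is very slow on large datasets, so I would like to use numpy vectorization instead. How can I improve the performance?

Extension:

I want to use numpy vectorization because I now need to apply another function to this list.

I am trying the following, but it is still slow, with results similar to apply: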

def count_one(lst):
    index = [i for i, e in enumerate(lst) if e != 0] 
    return len(index)

vfunc = np.vectorize(count_one)
df['Value'] = vfunc(df['Binary_Month_List'])

Best answer

We can use explode with get_dummies. Note that explode is only available from pandas 0.25 onward.

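# explode gives one row per month, get_dummies one column per month;
# grouping on the original index collapses them back to one row per list
df.Month_List.explode().str.get_dummies().groupby(level=0).sum().reindex(columns=ref, fill_value=0).values.tolist()
Out[79]: 
[[0, 0, 1, 0, 0, 0, 0],
 [0, 1, 0, 0, 0, 0, 0],
 [0, 0, 1, 1, 0, 0, 0],
 [0, 0, 0, 0, 1, 1, 1]]

#df['new']=df.Month_List.explode().str.get_dummies().groupby(level=0).sum().reindex(columns=ref, fill_value=0).values.tolist()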
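
If you would rather stay closer to plain numpy, here is a rough sketch of an alternative that is not part of the answer above. It flattens the lists once, looks up each month's position in ref, and sets the matching cells of a zero matrix with integer-array indexing. It assumes ref has no duplicates (get_indexer needs a unique index), and the names lengths, rows, flat, cols, keep and binary are made up for this example. Summing the matrix along axis 1 also gives the count from the Extension part without np.vectorize.

```python
from itertools import chain

import numpy as np
import pandas as pd

ref = ['September', 'August', 'July', 'June', 'May', 'April', 'March']
df = pd.DataFrame({'Month_List': [['July'], ['August'], ['July', 'June'],
                                  ['May', 'April', 'March']]})

# Number of months in each row, and the row number repeated once per month.
lengths = df['Month_List'].str.len().to_numpy()
rows = np.repeat(np.arange(len(df)), lengths)

# All months in one flat list, in row order.
flat = list(chain.from_iterable(df['Month_List']))

# Column position of each month in ref; -1 marks months not found in ref.
# Assumption: ref contains no duplicates.
cols = pd.Index(ref).get_indexer(flat)
keep = cols >= 0

# Fill the binary matrix in a single indexing step.
binary = np.zeros((len(df), len(ref)), dtype=int)
binary[rows[keep], cols[keep]] = 1

df['Binary_Month_List'] = binary.tolist()
# Number of ones per row, computed on the whole matrix at once.
df['Value'] = binary.sum(axis=1)
```

The only Python-level loop left is the flattening of the lists; the lookup, the matrix fill and the row sums all run on whole arrays. I have not benchmarked this against the explode version, so time both on your own data.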

For more on python - Checking a reference list against a Pandas column using numpy vectorization, see the related question on Stack Overflow: https://stackoverflow.com/questions/58136267/
