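In my actual dataset, data is 35 million rows x 20 columns and data2 is 4,000 rows x 10 columns. Although this code works, it takes so long that my system times out. I am therefore looking for an alternative solution that runs faster.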
import pandas as pd
data = pd.DataFrame({'variable1':[1,2,3,4,0,6,7], 'variable2':[1,2,3,4,5,6,7], 'variable3':[1,200,3,4,50,6,7], 'variable4':[1,2,3,4,5,6,7]})
data2 = pd.DataFrame({'variable1':[2,0], 'variable2':[2,5], 'variable3':[200,50], 'variable4':[17,20]})
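target = []
for i in range(len(data)):
    for j in range(len(data2)):
        # Compare row i of data with row j of data2 on both key columns.
        if (data['variable1'].iloc[i] == data2['variable1'].iloc[j]) and (data['variable2'].iloc[i] == data2['variable2'].iloc[j]):
            target.append("Yes")
            break  # one match is enough, so stop scanning data2
    else:
        # The inner loop finished without finding a match for row i.
        target.append("No")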
Proper output would be:
[[1,1,1,1,"No"],
[2,2,200,2,"Yes"],
[3,3,3,3,"No"],
[4,4,4,4,"No"],
[0,5,50,5,"Yes"],
[6,6,6,6,"No"],
[7,7,7,7,"No"]]
Best answer
MultiIndex.isin
c = ['variable1', 'variable2']
data['match'] = data.set_index(c).index.isin(data2.set_index(c).index)
   variable1  variable2  variable3  variable4  match
0          1          1          1          1  False
1          2          2        200          2   True
2          3          3          3          3  False
3          4          4          4          4  False
4          0          5         50          5   True
5          6          6          6          6  False
6          7          7          7          7  False
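The answer produces a boolean column, while the question asks for "Yes"/"No" labels. The following is a minimal sketch of one way to get them, assuming numpy is available. The column name target and the helper names mask, keys, merged and target_via_merge are illustrative and do not appear in the original answer. The merge-based variant is an alternative technique added here for comparison, not part of the answer.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'variable1': [1, 2, 3, 4, 0, 6, 7],
                     'variable2': [1, 2, 3, 4, 5, 6, 7],
                     'variable3': [1, 200, 3, 4, 50, 6, 7],
                     'variable4': [1, 2, 3, 4, 5, 6, 7]})
data2 = pd.DataFrame({'variable1': [2, 0],
                      'variable2': [2, 5],
                      'variable3': [200, 50],
                      'variable4': [17, 20]})

c = ['variable1', 'variable2']

# Boolean mask: True where the (variable1, variable2) pair of a row in data
# also occurs somewhere in data2.
mask = data.set_index(c).index.isin(data2.set_index(c).index)

# Map the mask to the "Yes"/"No" labels asked for in the question.
data['target'] = np.where(mask, 'Yes', 'No')

# Alternative (assumption, not from the answer): left merge against the
# de-duplicated key pairs of data2. indicator=True adds a "_merge" column
# whose value is "both" for rows of data that found a match.
keys = data2[c].drop_duplicates()
merged = data[c].merge(keys, on=c, how='left', indicator=True)
target_via_merge = np.where(merged['_merge'] == 'both', 'Yes', 'No')

print(data['target'].tolist())
```

Both variants compare key pairs in bulk instead of looping row by row in Python. De-duplicating the keys of data2 before the merge keeps the result at one row per row of data.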
Regarding python - How to create a list based on column matching in 2 dataframes? Python, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/73140226/