python - How to add a column checking whether another DataFrame's values for the corresponding attributes are contained in a list?

df1 looks like this:

  attribute_1 attribute_2
0           A           Y
1           A           Z
2           B           Y
3           B           Z

import pandas as pd

df1 = pd.DataFrame({'attribute_1': ['A', 'A', 'B', 'B'],
                    'attribute_2': ['Y', 'Z', 'Y', 'Z']})

df2 is larger: it has multiple rows with the same attribute values, and also many columns that df1 does not have:

  attribute_1 attribute_2   fruit
0           A           Y   apple
1           A           Y  banana
2           A           Z   melon
3           B           Z  orange
4           B           Z   grape
5           B           Y    pear
6           B           Z  orange

df2 = pd.DataFrame({'attribute_1': ['A', 'A', 'A', 'B', 'B', 'B', 'B'],
                    'attribute_2': ['Y', 'Y', 'Z', 'Z', 'Z', 'Y', 'Z'],
                    'fruit': ['apple', 'banana', 'melon', 'orange', 'grape', 'pear', 'orange']})

I want to add a column to df1 that checks whether any value of df2.fruit for the corresponding attributes is in ['apple', 'orange'], producing desired_df:

  attribute_1 attribute_2  has_apple_or_orange
0           A           Y                 True
1           A           Z                False
2           B           Y                False
3           B           Z                 True

desired_df = pd.DataFrame({'attribute_1': ['A', 'A', 'B', 'B'],
                           'attribute_2': ['Y', 'Z', 'Y', 'Z'],
                           'has_apple_or_orange': [True, False, False, True]})

How can I do this? With merge somehow?

I'm not sure how to describe this problem, so forgive me if it has already been answered elsewhere.

Best answer

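First flag the matching rows of df2 with Series.isin and store the result as a new column via DataFrame.assign. Then aggregate per attribute pair with GroupBy.any, and add the resulting column to df1 with DataFrame.join: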

f = ['apple', 'orange']
# Flag rows whose fruit is in f, then reduce to one boolean per attribute pair.
s = (df2.assign(has_apple_or_orange=df2['fruit'].isin(f))
        .groupby(['attribute_1', 'attribute_2'])['has_apple_or_orange']
        .any())
print(s)

attribute_1  attribute_2
A            Y               True
             Z              False
B            Y              False
             Z               True
Name: has_apple_or_orange, dtype: bool

# Match df1's attribute columns against the MultiIndex of s.
df = df1.join(s, on=['attribute_1', 'attribute_2'])
print(df)
  attribute_1 attribute_2  has_apple_or_orange
0           A           Y                 True
1           A           Z                False
2           B           Y                False
3           B           Z                 True
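
Since the question asks whether merge could be used, here is a minimal sketch of a merge-based variant. It is not part of the answer above; the names keys, wanted, hits, merged and result are chosen here for illustration only. It relies on the indicator=True option of DataFrame.merge, which adds a _merge column saying whether each row was found on both sides.

```python
import pandas as pd

df1 = pd.DataFrame({'attribute_1': ['A', 'A', 'B', 'B'],
                    'attribute_2': ['Y', 'Z', 'Y', 'Z']})
df2 = pd.DataFrame({'attribute_1': ['A', 'A', 'A', 'B', 'B', 'B', 'B'],
                    'attribute_2': ['Y', 'Y', 'Z', 'Z', 'Z', 'Y', 'Z'],
                    'fruit': ['apple', 'banana', 'melon', 'orange', 'grape', 'pear', 'orange']})

keys = ['attribute_1', 'attribute_2']   # illustrative name
wanted = ['apple', 'orange']            # illustrative name

# Keep only the attribute pairs that have at least one wanted fruit,
# one row per pair, so the merge cannot duplicate rows of df1.
hits = df2.loc[df2['fruit'].isin(wanted), keys].drop_duplicates()

# A left merge with indicator=True labels each df1 row 'both' or 'left_only'.
merged = df1.merge(hits, on=keys, how='left', indicator=True)
result = (merged.drop(columns='_merge')
                .assign(has_apple_or_orange=merged['_merge'] == 'both'))
print(result)
```

One difference worth noting: if df1 contained an attribute pair that never appears in df2, the join approach would leave NaN in the new column for that row, whereas this variant yields False.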

Source: this question and answer are from Stack Overflow: https://stackoverflow.com/questions/62189162/
