python - Filter rows in a grouped dataframe based on a string column

I have a dataframe that is grouped by multiple columns, but in this example it is grouped only by Year.

   Year Animal1  Animal2
0  2002    Dog   Mouse,Lion
1  2002  Mouse            
2  2002   Lion            
3  2002   Duck            
4  2010    Dog   Cat
5  2010    Cat            
6  2010   Lion            
7  2010  Mouse

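For each group, I want to keep the rows where Animal2 is not empty and, among the remaining rows, filter out those whose Animal1 value does not appear in that group's Animal2 column.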

Expected output:

  Year Animal1   Animal2
0  2002    Dog   Mouse,Lion
1  2002  Mouse            
2  2002   Lion                   
3  2010    Dog   Cat
4  2010    Cat

Rows 0 and 3 are kept because their Animal2 is not empty (row numbers refer to the expected output).

For the first group, rows 1 and 2 are kept because Mouse and Lion appear in Animal2.

For the second group, row 4 is kept because Cat appears in Animal2.
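To make the rule concrete, here is a minimal sketch that rebuilds the sample data and applies the rule with a plain per-group loop. It assumes the empty Animal2 cells are None/NaN rather than empty strings. It is slower than a vectorized solution, but it shows exactly which rows survive.

```python
import pandas as pd

# Sample data from the question; empty Animal2 cells are assumed to be None
df = pd.DataFrame({
    "Year": [2002, 2002, 2002, 2002, 2010, 2010, 2010, 2010],
    "Animal1": ["Dog", "Mouse", "Lion", "Duck", "Dog", "Cat", "Lion", "Mouse"],
    "Animal2": ["Mouse,Lion", None, None, None, "Cat", None, None, None],
})

keep = []
for year, group in df.groupby("Year"):
    # Animals named in any non-empty Animal2 cell of this Year
    listed = set()
    for cell in group["Animal2"].dropna():
        listed.update(cell.split(","))
    for idx, row in group.iterrows():
        # Keep rows with a non-empty Animal2, or whose Animal1 is listed
        if pd.notna(row["Animal2"]) or row["Animal1"] in listed:
            keep.append(idx)

out = df.loc[keep]
print(out)
```

The kept labels are 0, 1, 2, 4 and 5: the expected output above, but with the original index preserved instead of renumbered.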

Edit: I get an error with an input dataframe like this:

  Year Animal1   Animal2
0  2002    Dog   Mouse
1  2002  Mouse            
2  2002   Lion                   
3  2010    Dog   
4  2010    Cat

Expected output:

  Year Animal1   Animal2
0  2002    Dog   Mouse
1  2002  Mouse

The error is raised in the .apply(lambda g: g.isin(sets[g.name])) part of the code:
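A minimal sketch reproducing the likely cause with the edited input (again assuming empty cells are None): `sets` is built only from rows with a non-empty Animal2, so a Year whose Animal2 cells are all empty (2010 here) never becomes one of its keys, and `sets[g.name]` raises a `KeyError` for that group. In the question the key is a tuple, presumably because the real data is grouped by more than one column.

```python
import pandas as pd

# Edited input from the question; empty Animal2 cells are assumed to be None
df = pd.DataFrame({
    "Year": [2002, 2002, 2002, 2010, 2010],
    "Animal1": ["Dog", "Mouse", "Lion", "Dog", "Cat"],
    "Animal2": ["Mouse", None, None, None, None],
})

m1 = df["Animal2"].notna()

# Same construction as the answer's `sets`: one set of Animal2 names per Year
sets = (df.loc[m1, "Animal2"].str.split(",")
          .groupby(df["Year"])
          .agg(lambda x: set().union(*x)))

print(sets)
# Year 2010 has no non-empty Animal2, so it is missing from `sets`
print(2010 in sets.index)  # False, hence sets[2010] raises KeyError
```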

  if not any(isinstance(k, slice) for k in key):

      if len(key) == self.nlevels and self.is_unique:
          # Complete key in unique index -> standard get_loc
          try:
              return (self._engine.get_loc(key), None)
          except KeyError as err:
              raise KeyError(key) from err
  KeyError: (2010, 'Dog')

Best answer

You can use masks and a regex:

# non-empty Animal2
m1 = df['Animal2'].notna()

# make patterns with those Animals2 per Year, e.g. 'Mouse|Lion' for 2002
patterns = df[m1].groupby('Year')['Animal2'].agg('|'.join).str.replace(',', '|')

# for each Year select with the matching regex; '(?!)' never matches,
# so a Year without any Animal2 value does not raise a KeyError
m2 = (df.groupby('Year', group_keys=False)['Animal1']
        .apply(lambda g: g.str.fullmatch(patterns.get(g.name, '(?!)')))
     )

out = df.loc[m1|m2]
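To see what the regex approach builds, this sketch prints the intermediate `patterns` Series for the sample data (assuming empty cells are None) and shows why `str.fullmatch` fits here: unlike `str.contains`, it only accepts a value equal to one of the alternatives. The probe values are made up for illustration.

```python
import pandas as pd

# Sample data from the question; empty Animal2 cells are assumed to be None
df = pd.DataFrame({
    "Year": [2002, 2002, 2002, 2002, 2010, 2010, 2010, 2010],
    "Animal1": ["Dog", "Mouse", "Lion", "Duck", "Dog", "Cat", "Lion", "Mouse"],
    "Animal2": ["Mouse,Lion", None, None, None, "Cat", None, None, None],
})

m1 = df["Animal2"].notna()
# Join each Year's Animal2 strings, then turn the commas into regex alternation
patterns = df[m1].groupby("Year")["Animal2"].agg("|".join).str.replace(",", "|")
print(patterns.to_dict())  # {2002: 'Mouse|Lion', 2010: 'Cat'}

# Hypothetical probe values: fullmatch rejects partial matches, contains does not
probe = pd.Series(["Mouse", "Mousetrap", "Duck"])
print(probe.str.fullmatch(patterns[2002]).tolist())  # [True, False, False]
print(probe.str.contains(patterns[2002]).tolist())   # [True, True, False]
```

The patterns are built from raw names, so an animal name containing regex metacharacters would need `re.escape` first; plain names like these are safe.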

Or use sets:

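m1 = df['Animal2'].notna()

# one set of Animal2 names per Year, e.g. {'Mouse', 'Lion'} for 2002
sets = (df.loc[m1, 'Animal2'].str.split(',')
          .groupby(df['Year'])
          .agg(lambda x: set().union(*x))
       )

# sets.get(..., set()) avoids a KeyError for a Year without any Animal2 value
m2 = (df.groupby('Year', group_keys=False)['Animal1']
        .apply(lambda g: g.isin(sets.get(g.name, set())))
     )

out = df.loc[m1|m2]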
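As a different technique from the answer's, here is a minimal sketch that avoids per-group lookups entirely. It explodes the comma-separated Animal2 values into (Year, animal) pairs and tests each row's (Year, Animal1) pair for membership with `MultiIndex.isin`. A Year with no Animal2 values simply contributes no pairs, so the edited input needs no special case. The helper name `filter_rows` is hypothetical, and empty cells are again assumed to be None.

```python
import pandas as pd


def filter_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Keep rows with a non-empty Animal2, or whose Animal1 is named in
    Animal2 somewhere in the same Year (hypothetical helper)."""
    m1 = df["Animal2"].notna()
    # One row per (Year, animal named in Animal2)
    listed = (df.loc[m1, ["Year", "Animal2"]]
                .assign(Animal2=lambda d: d["Animal2"].str.split(","))
                .explode("Animal2"))
    pairs = pd.MultiIndex.from_frame(listed)
    # Membership test on (Year, Animal1) pairs: there is no lookup by Year,
    # so a Year without any Animal2 value cannot raise a KeyError
    m2 = pd.MultiIndex.from_frame(df[["Year", "Animal1"]]).isin(pairs)
    return df.loc[m1 | m2]


# Both inputs from the question
original = pd.DataFrame({
    "Year": [2002, 2002, 2002, 2002, 2010, 2010, 2010, 2010],
    "Animal1": ["Dog", "Mouse", "Lion", "Duck", "Dog", "Cat", "Lion", "Mouse"],
    "Animal2": ["Mouse,Lion", None, None, None, "Cat", None, None, None],
})
edited = pd.DataFrame({
    "Year": [2002, 2002, 2002, 2010, 2010],
    "Animal1": ["Dog", "Mouse", "Lion", "Dog", "Cat"],
    "Animal2": ["Mouse", None, None, None, None],
})

print(filter_rows(original))  # keeps index labels 0, 1, 2, 4, 5
print(filter_rows(edited))    # keeps index labels 0, 1
```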

Output:

   Year Animal1     Animal2
0  2002     Dog  Mouse,Lion
1  2002   Mouse        None
2  2002    Lion        None
4  2010     Dog         Cat
5  2010     Cat        None

This question, "python - Filter rows in a grouped dataframe based on a string column", is from Stack Overflow: https://stackoverflow.com/questions/75086562/
