python - Find counts of all pairs in a table by a specific column

I have a DataFrame with columns such as week, shop and brand. For example:

week shop brand
1    1    cola
1    2    sprite
1    3    pepsi
1    4    pepsi
2    1    cola 
2    2    sprite
2    3    sprite
2    4    sprite

For each week, I want to count how often each pair of different brands occurs across shops. The expected result table is:

week brand1  brand2  num_shops
1    cola    sprite  1
1    cola    pepsi   2
1    sprite  cola    1
1    sprite  pepsi   1  
1    pepsi   cola    2
1    pepsi   sprite  1    
2    cola    sprite  3
2    sprite  cola    3

I know I should do something like this:

def func(x):
    x1 = x.merge(x,on=["week"],suffixes =('1','2'))
    x1.groupby(["brand1","brand2"]).apply(func1)
    return x1

def func1(x):
    # make count
    ...

data.groupby(["week"]).apply(func)

Can this be done faster if I have a large amount of data?

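Edit: the num_shops column is built as follows. Take one week, list all brand pairs within it as above, and count how many times each pair is repeated. For example, we first get a table like this one and then derive num_shops from it: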

week brand1  brand2 
1    cola    sprite  
1    cola    pepsi
1    cola    pepsi   
1    sprite  cola    
1    sprite  pepsi     
1    pepsi   cola
1    pepsi   cola    
1    pepsi   sprite      
2    cola    sprite  
2    cola    sprite  
2    cola    sprite  
2    sprite  cola
2    sprite  cola
2    sprite  cola

Best answer

Use merge together with DataFrame.query to filter out rows where both brand columns hold the same value, then count with DataFrame.groupby and GroupBy.size:

df = (df.merge(df,on=["week"], suffixes= ('1','2'))
       .query("brand1 != brand2")
       .groupby(['week','brand1','brand2'], sort=False)
       .size()
       .reset_index(name='num_shops'))
print (df)
   week  brand1  brand2  num_shops
0     1    cola  sprite          1
1     1    cola   pepsi          2
2     1  sprite    cola          1
3     1  sprite   pepsi          2
4     1   pepsi    cola          2
5     1   pepsi  sprite          2
6     2    cola  sprite          3
7     2  sprite    cola          3

Edit:

Your solution should be changed to:

def func(x):
    x1 = x.merge(x,on=["week"],suffixes =('1','2'))
    x1 = x1[x1['brand1'].ne(x1['brand2'])]
    return x1.groupby(["brand1","brand2"], sort=False).size()

df = df.groupby(["week"]).apply(func).reset_index(name='num_shops')
print (df)
   week  brand1  brand2  num_shops
0     1    cola  sprite          1
1     1    cola   pepsi          2
2     1  sprite    cola          1
3     1  sprite   pepsi          2
4     1   pepsi    cola          2
5     1   pepsi  sprite          2
6     2    cola  sprite          3
7     2  sprite    cola          3
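
On the question of speed with a lot of data: the self-merge builds one row for every pair of rows within a week before counting. One possible alternative, not part of the original answer, is to count rows per (week, brand) first and only then pair the brands, multiplying the two counts. The sketch below assumes the intended num_shops is the number of row pairs, as in the output above; the names per_brand, pairs, n and result are illustrative.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "week": [1, 1, 1, 1, 2, 2, 2, 2],
    "shop": [1, 2, 3, 4, 1, 2, 3, 4],
    "brand": ["cola", "sprite", "pepsi", "pepsi",
              "cola", "sprite", "sprite", "sprite"],
})

# One row per (week, brand) with the number of rows carrying that brand.
per_brand = (df.groupby(["week", "brand"], sort=False)
               .size()
               .reset_index(name="n"))

# Pair brands within a week; this merge runs over distinct brands, not rows.
pairs = (per_brand.merge(per_brand, on="week", suffixes=("1", "2"))
                  .query("brand1 != brand2"))

# Every brand1 row pairs with every brand2 row, so the two counts multiply.
result = (pairs.assign(num_shops=pairs["n1"] * pairs["n2"])
               .loc[:, ["week", "brand1", "brand2", "num_shops"]]
               .reset_index(drop=True))
print(result)
```

The intermediate frame grows with the number of distinct brands per week rather than with the number of rows per week, which may help when many shops carry the same brand. Whether it is faster in practice depends on the data, so it is worth timing both versions.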

This page is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/57769260/
