python - Counting a series of DataFrames by a specific class

The two DataFrames c1 and c2 contain some companies and the countries they belong to.

## Data of c1: 6 companies in 4 countries.
   Country    Company
0   USA    Walmart
1   USA    Apple
2   China  CNPC 
3   China  State_grid 
4   UK     BP 
5   Japan  Toyota    

## Data of c2: 10 companies in the same 4 countries.
   Country    Company
0   USA    Walmart
1   USA    Apple
2   USA    Verizon
3   USA    JP_Morgan
4   China  CNPC 
5   China  China_Bank 
6   UK     BP 
7   Japan  Toyota
8   Japan  Honda
9   Japan  Sony

Note that some companies appear in only one of c1 and c2 (e.g. Honda), while others appear in both (e.g. Walmart).
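To make the overlap concrete, here is a minimal sketch that rebuilds the two frames and lists the shared companies. The post only shows the printed frames, so building them from dictionaries is an assumption, as are the variable names `in_both` and `only_in_c2`.

```python
import pandas as pd

# Rebuild the two frames shown above (assumed construction; the post only prints them).
c1 = pd.DataFrame({
    "Country": ["USA", "USA", "China", "China", "UK", "Japan"],
    "Company": ["Walmart", "Apple", "CNPC", "State_grid", "BP", "Toyota"],
})
c2 = pd.DataFrame({
    "Country": ["USA", "USA", "USA", "USA", "China", "China",
                "UK", "Japan", "Japan", "Japan"],
    "Company": ["Walmart", "Apple", "Verizon", "JP_Morgan", "CNPC", "China_Bank",
                "BP", "Toyota", "Honda", "Sony"],
})

# Companies present in both frames, and companies found only in c2.
in_both = sorted(set(c1["Company"]) & set(c2["Company"]))
only_in_c2 = sorted(set(c2["Company"]) - set(c1["Company"]))

print(in_both)     # ['Apple', 'BP', 'CNPC', 'Toyota', 'Walmart']
print(only_in_c2)  # ['China_Bank', 'Honda', 'JP_Morgan', 'Sony', 'Verizon']
```

Five companies occur in both frames, which matches the gap between 16 total rows and 11 distinct company names.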

My goal

Combine these two DataFrames and count the number of companies for each country.

A. For a single DataFrame, I can use

> c1.Country.value_counts()
output:
USA      2
China    2
UK       1
Japan    1

B. For two DataFrames with overlapping content, I tried using the unique function to remove the duplicates.

> dc = pd.concat([c1.Company, c2.Company])
> print(len(dc))
> print(len(dc.unique()))

Output:
> 16
> 11

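How can I combine c1 and c2 and filter out the duplicates?
Then I could get a combined result like the following, and count the companies per country from it: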

   Country    Company
0   USA    Walmart
1   USA    Apple
2   USA    Verizon
3   USA    JP_Morgan
4   China  CNPC 
5   China  State_grid 
6   China  China_Bank 
7   UK     BP 
8   Japan  Toyota
9   Japan  Honda
10   Japan  Sony

Best answer

I think you can first concat the DataFrames, then use drop_duplicates together with reset_index:

 c = pd.concat([c1, c2]).drop_duplicates(subset=['Country','Company']).reset_index(drop=True)
 print(c)
   Country     Company
0      USA     Walmart
1      USA       Apple
2    China        CNPC
3    China  State_grid
4       UK          BP
5    Japan      Toyota
6      USA     Verizon
7      USA   JP_Morgan
8    China  China_Bank
9    Japan       Honda
10   Japan        Sony

print(c.Country.value_counts())
USA      4
China    3
Japan    3
UK       1
Name: Country, dtype: int64
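For reference, the same approach as a self-contained script. This is a sketch assuming pandas is installed; the helper name `count_per_country` is hypothetical and not part of the original answer. It also shows `groupby(...).nunique()` as an alternative that counts distinct companies per country without an explicit `drop_duplicates` step.

```python
import pandas as pd

# Sample data from the question (construction from tuples is an assumption).
c1 = pd.DataFrame(
    [("USA", "Walmart"), ("USA", "Apple"), ("China", "CNPC"),
     ("China", "State_grid"), ("UK", "BP"), ("Japan", "Toyota")],
    columns=["Country", "Company"],
)
c2 = pd.DataFrame(
    [("USA", "Walmart"), ("USA", "Apple"), ("USA", "Verizon"),
     ("USA", "JP_Morgan"), ("China", "CNPC"), ("China", "China_Bank"),
     ("UK", "BP"), ("Japan", "Toyota"), ("Japan", "Honda"), ("Japan", "Sony")],
    columns=["Country", "Company"],
)


def count_per_country(frames):
    """Hypothetical helper: stack the frames, drop repeated rows, count per country."""
    combined = (
        pd.concat(frames, ignore_index=True)
        .drop_duplicates(subset=["Country", "Company"])
        .reset_index(drop=True)
    )
    return combined, combined["Country"].value_counts()


combined, counts = count_per_country([c1, c2])
print(combined)
print(counts)

# Alternative: count distinct companies per country directly.
alt_counts = pd.concat([c1, c2]).groupby("Country")["Company"].nunique()
print(alt_counts)
```

Because the helper takes a list, it works for more than two frames. Deduplicating on both Country and Company keeps two companies with the same name in different countries as separate rows, which deduplicating on Company alone would not.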

Source: a similar question on Stack Overflow: https://stackoverflow.com/questions/35718911/
