Python: How to convert a complex SQL aggregate statement to pandas?

Using pandas, what is the best way to write the equivalent of a SQL GROUP BY statement that has:

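a different aggregate function for each field (e.g. I need the mean of field1 and field2, and the max of field3)
slightly more complex calculations such as sum(field1)/sum(field2), e.g. for weighted averages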
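Both requirements can be sketched in pandas roughly as follows. This is a minimal illustration, not the answer's code: `field1`, `field2` and `field3` are the placeholder names from the question, while the grouping column `key` and all the values are invented. A dict passed to `agg` picks a function per column; a ratio of sums is computed by aggregating the sums first and dividing afterwards. The weighted mean (sum(w*x)/sum(w)) uses `field2` as the weight purely for illustration.

```python
import pandas as pd

# Hypothetical data: 'key' is an invented grouping column.
df = pd.DataFrame({
    "key": ["a", "a", "b"],
    "field1": [1.0, 3.0, 5.0],
    "field2": [2.0, 2.0, 10.0],
    "field3": [7, 9, 4],
})

g = df.groupby("key")

# A different aggregate per column: the dict maps column -> function name.
per_column = g.agg({"field1": "mean", "field2": "mean", "field3": "max"})

# Ratio of sums: aggregate the sums first, then divide the aggregated columns.
sums = g[["field1", "field2"]].sum()
per_column["ratio"] = sums["field1"] / sums["field2"]

# Weighted mean of field1 with field2 as (illustrative) weights:
# sum(w * x) / sum(w), computed per group.
weighted_num = (df["field1"] * df["field2"]).groupby(df["key"]).sum()
per_column["field1_wavg"] = weighted_num / g["field2"].sum()

print(per_column)
```

Both new columns are Series indexed by `key`, so assigning them to `per_column` aligns on the group labels rather than on position.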

Suppose I have a table of city-level data and I want to aggregate it by country and region. In SQL I would write:

select Country, Region
, count(*) as '# of cities'
,sum(GDP) as GDP
,avg(Population) as 'avg # inhabitants per city'
,sum(male_population) / sum(Population) as '% of male population'
from CityTable
group by Country, Region
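To see what the SQL above returns, it can be run against an in-memory SQLite database loaded with the same four sample rows used in the answer below. This is a sketch for checking the semantics only. One caveat it shows: in SQLite (and some other engines), dividing an integer sum by an integer sum is integer division, so the ratio would truncate to 0. The sketch multiplies by 1.0 to get a fractional result, quotes the aliases with double quotes, and adds an `order by` for a deterministic row order. These three changes are not in the original query.

```python
import sqlite3

# In-memory database with the same sample rows as the answer's DataFrame.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CityTable (Country TEXT, Region TEXT, GDP INTEGER,"
    " Population INTEGER, male_population INTEGER)"
)
conn.executemany(
    "INSERT INTO CityTable VALUES (?, ?, ?, ?, ?)",
    [
        ("USA", "TX", 10, 100, 50),
        ("USA", "TX", 11, 120, 60),
        ("USA", "KY", 11, 200, 120),
        ("Austria", "Wienna", 5, 50, 34),
    ],
)

query = """
select Country, Region
, count(*) as "# of cities"
, sum(GDP) as GDP
, avg(Population) as "avg # inhabitants per city"
, 1.0 * sum(male_population) / sum(Population) as "% of male population"
from CityTable
group by Country, Region
order by Country, Region
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)

# Without the 1.0 factor SQLite divides integer by integer and truncates.
int_ratio = conn.execute(
    "select sum(male_population) / sum(Population) from CityTable"
    " where Country = 'Austria'"
).fetchone()[0]
print(int_ratio)
```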

How can I do the same thing in pandas? Thanks!

Best answer

>>> import numpy as np
>>> df
   Country  Region  GDP  Population  male_population
0      USA      TX   10         100               50
1      USA      TX   11         120               60
2      USA      KY   11         200              120
3  Austria  Wienna    5          50               34
>>>
>>> df2 = df.groupby(['Country','Region']).agg({'GDP': [np.size, np.sum], 'Population': [np.average, np.sum], 'male_population': np.sum})
>>> df2
                GDP     male_population Population     
               size sum             sum    average  sum
Country Region                                         
Austria Wienna    1   5              34         50   50
USA     KY        1  11             120        200  200
        TX        2  21             110        110  220
>>>
>>> df2['% of male population'] = df2['male_population','sum'].divide(df2['Population','sum'])
>>> df2
                GDP     male_population Population      % of male population
               size sum             sum    average  sum                     
Country Region                                                              
Austria Wienna    1   5              34         50   50                 0.68
USA     KY        1  11             120        200  200                 0.60
        TX        2  21             110        110  220                 0.50
>>>
>>> del df2['male_population', 'sum']
>>> del df2['Population', 'sum']
>>> df2.columns = ['# of cities', 'GDP', 'avg # inhabitants per city', '% of male population']

Result

>>> df2
                # of cities  GDP  avg # inhabitants per city  % of male population
Country Region                                                                    
Austria Wienna            1    5                          50                  0.68
USA     KY                1   11                         200                  0.60
        TX                2   21                         110                  0.50
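In newer pandas (0.25 and later), named aggregation can build the same table in one chain, without deleting MultiIndex columns and renaming the rest by position. The sketch below assumes pandas 0.25 or later; the helper columns `_male` and `_pop` are invented names that exist only to compute the ratio and are dropped at the end. The `**{...}` unpacking is needed because the output names contain spaces, `#` and `%`, which are not valid keyword names. It also passes function names as strings ("sum", "mean", "size") instead of NumPy functions, since recent pandas versions warn when callables such as `np.sum` are passed to `agg`.

```python
import pandas as pd

# Same sample data as in the answer above.
df = pd.DataFrame({
    "Country": ["USA", "USA", "USA", "Austria"],
    "Region": ["TX", "TX", "KY", "Wienna"],
    "GDP": [10, 11, 11, 5],
    "Population": [100, 120, 200, 50],
    "male_population": [50, 60, 120, 34],
})

result = (
    df.groupby(["Country", "Region"])
    .agg(
        **{
            # output name: (input column, aggregation)
            "# of cities": ("GDP", "size"),
            "GDP": ("GDP", "sum"),
            "avg # inhabitants per city": ("Population", "mean"),
            # Hypothetical helper sums, used only for the ratio below.
            "_male": ("male_population", "sum"),
            "_pop": ("Population", "sum"),
        }
    )
    # Ratio of sums, the pandas counterpart of sum(a) / sum(b) in SQL.
    .assign(**{"% of male population": lambda t: t["_male"] / t["_pop"]})
    .drop(columns=["_male", "_pop"])
)
print(result)
```

Because each output column is named where it is defined, the result does not depend on the column order that `agg` happens to produce.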

A similar question about converting a complex SQL aggregate statement to pandas can be found on Stack Overflow: https://stackoverflow.com/questions/27260003/