python - Compare columns in a pivot table and add the result

Tags: python dataframe pivot-table

I am using the open-data CSV on the population of Senegal from http://senegal.opendataforafrica.org/SNVS2015/vital-statistics-of-senegal-2015. I imported it into a DataFrame with pandas (shape 17568, 7).

   region  regional-division  sex    indicator                                Unit    Date  Value
0  Dakar   Total              Total  Populations (projection de 2008 à 2015)  Number  2008  2482294.0
1  Dakar   Total              Total  Populations (projection de 2008 à 2015)  Number  2009  2536959.0
2  Dakar   Total              Total  Populations (projection de 2008 à 2015)  Number  2010  2592191.0
3  Dakar   Total              Total  Populations (projection de 2008 à 2015)  Number  2011  2647751.0
4  Dakar   Total              Total  Populations (projection de 2008 à 2015)  Number  2012  2703203.0
5  Dakar   Total              Total  Populations (projection de 2008 à 2015)  Number  2013  2776787.0
6  Dakar   Total              Total  Populations (projection de 2008 à 2015)  Number  2014  2851556.0
7  Dakar   Total              Total  Populations (projection de 2008 à 2015)  Number  2015  2927422.0
8  Dakar   Total              Men    Populations (projection de 2008 à 2015)  Number  2008  1242463.0
9  Dakar   Total              Men    Populations (projection de 2008 à 2015)  Number  2009  1269764.0

Then I kept only the rows holding totals:

total_population_condition = (population['sex'] == 'Total') & (population['regional-division'] == 'Total')
total_population = population[total_population_condition]

On top of that, I built a pivot table:

pivot_total_population = pd.pivot_table(total_population,values='Value',index=['region','sex'],columns='Date')
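To make the shape of that pivot table concrete, here is a minimal sketch on a toy frame. The column names match the question, but the regions' values are invented and only two years are included, so treat it as an illustration rather than the real data.

```python
import pandas as pd

# Toy stand-in for the real data: same column names, invented values.
population = pd.DataFrame({
    "region": ["Dakar", "Dakar", "Dakar", "Thies", "Thies", "Thies"],
    "regional-division": ["Total"] * 6,
    "sex": ["Total", "Total", "Men", "Total", "Total", "Men"],
    "Date": [2008, 2015, 2008, 2008, 2015, 2008],
    "Value": [100.0, 120.0, 49.0, 200.0, 210.0, 98.0],
})

# Keep only the rows that hold totals, as in the question.
mask = (population["sex"] == "Total") & (population["regional-division"] == "Total")
total_population = population[mask]

# One row per (region, sex), one column per year.
pivot_total_population = pd.pivot_table(
    total_population, values="Value", index=["region", "sex"], columns="Date"
)
print(pivot_total_population)
```

Note that the column labels of the result are the values of `Date`. If pandas read that column as integers, the labels are the integers 2008 and 2015, not the strings "2008" and "2015".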

(Image: the resulting pivot table)

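Now the question: I want to find the 5 regions whose population grew the most between 2008 and 2015, and the 5 regions that shrank the most. I tried to access the pivot table columns holding the "2008" and "2015" values, divide the latter by the former, and add the result to the DataFrame, but I did not manage to. How can I do this?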

Update: I just figured out how...

# compute growth first per region
# (column 0 is 2008 and column 7 is 2015; the statement must stay on one line)
pivot_total_population['growth'] = pivot_total_population.iloc[:,7]/pivot_total_population.iloc[:,0]

# then determine which are top 10 growing regions in terms of total population
pivot_total_population.sort_values(['growth'],ascending=False).head(10)

# then determine which are top 10 shrinking regions in terms of total population
pivot_total_population.sort_values(['growth'],ascending=True).head(10)

Best answer

Found the answer (thanks to gboffi for the tip on procedure for a newbie ;-))

# compute growth first per region
# (column 0 is 2008 and column 7 is 2015; the statement must stay on one line)
pivot_total_population['growth'] = pivot_total_population.iloc[:,7]/pivot_total_population.iloc[:,0]

# then determine which are top 10 growing regions in terms of total population
pivot_total_population.sort_values(['growth'],ascending=False).head(10)

# then determine which are top 10 shrinking regions in terms of total population
pivot_total_population.sort_values(['growth'],ascending=True).head(10)
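As a possible variation on the answer above, the year columns can be selected by label instead of by position, and `nlargest` / `nsmallest` can replace the sort-then-head pattern. The sketch below uses an invented four-region frame named `pivot` as a stand-in for `pivot_total_population`, and it assumes the year labels are integers. If they were read as strings, `'2008'` and `'2015'` would be needed instead.

```python
import pandas as pd

# Invented stand-in for the pivot table: one row per region, one column per year.
pivot = pd.DataFrame(
    {2008: [100.0, 200.0, 50.0, 80.0], 2015: [120.0, 210.0, 45.0, 80.0]},
    index=pd.Index(["A", "B", "C", "D"], name="region"),
)

# Select the years by label, so the result does not depend on column order
# or on whether a 'growth' column has already been appended.
growth = pivot[2015] / pivot[2008]

# assign() returns a new frame and leaves the original pivot untouched.
result = pivot.assign(growth=growth)

# A ratio above 1 means the population grew, below 1 means it shrank.
top_growing = result.nlargest(2, "growth")
top_shrinking = result.nsmallest(2, "growth")

print(top_growing)
print(top_shrinking)
```

For the question as asked, the `2` would become `5`. Because the ratio is computed from labels, rerunning the cell does not shift which columns are compared, which the positional `iloc[:,7]` version is sensitive to once `growth` has been added.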

The original question is on Stack Overflow: https://stackoverflow.com/questions/52005592/
