Suppose I have a dataset (df_data) that looks like this:
Time Geography Population
2016 England and Wales 58381200
2017 England and Wales 58744600
2016 Northern Ireland 1862100
2017 Northern Ireland 1870800
2016 Scotland 5404700
2017 Scotland 5424800
2016 Wales 3113200
2017 Wales 3125200
If I do the following:
df_nireland = df_data[df_data['Geography']=='Northern Ireland']
df_wales = df_data[df_data['Geography']=='Wales']
df_scotland = df_data[df_data['Geography']=='Scotland']
df_engl_n_wales = df_data[df_data['Geography']=='England and Wales']
df_england = df_engl_n_wales
df_england['Population'] = df_engl_n_wales['Population'] - df_wales['Population']
then df_england has NA values in the Population column.
How can I fix this?
By the way, I have read the related posts, but the suggestions there (.loc, .copy, etc.) didn't work for me.
Best answer
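This is really an organizational problem. Arithmetic between pandas Series aligns on index labels, and your England and Wales rows and your Wales rows have different index labels, so nothing lines up and the result is NaN. If you pivot, you can do the subtraction easily and be sure that Time is aligned: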
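# Reshape so each Geography becomes a column, with one row per Time
df_pop = df_data.pivot(index='Time', columns='Geography', values='Population')
# Both columns now share the Time index, so the subtraction lines up by year
df_pop['England'] = df_pop['England and Wales'] - df_pop['Wales']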
Output of df_pop:
Geography England and Wales Northern Ireland Scotland Wales England
Time
2016 58381200 1862100 5404700 3113200 55268000
2017 58744600 1870800 5424800 3125200 55619400
If you need to get back to the original long format, you can do:
df_pop.stack().to_frame('Population').reset_index()
# Time Geography Population
#0 2016 England and Wales 58381200
#1 2016 Northern Ireland 1862100
#2 2016 Scotland 5404700
#3 2016 Wales 3113200
#4 2016 England 55268000
#5 2017 England and Wales 58744600
#6 2017 Northern Ireland 1870800
#7 2017 Scotland 5424800
#8 2017 Wales 3125200
#9 2017 England 55619400
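If you would rather not pivot at all, one variant (not part of the original answer) is to index each subset by Time, so the subtraction aligns on year instead of on row labels, and then append the result to df_data as new England rows. A sketch, assuming the column names from the question:

```python
import pandas as pd

# Sample data from the question
df_data = pd.DataFrame({
    "Time": [2016, 2017] * 4,
    "Geography": ["England and Wales"] * 2 + ["Northern Ireland"] * 2
                 + ["Scotland"] * 2 + ["Wales"] * 2,
    "Population": [58381200, 58744600, 1862100, 1870800,
                   5404700, 5424800, 3113200, 3125200],
})

# Index each subset by Time so the subtraction lines up years, not row labels
engl_n_wales = df_data[df_data["Geography"] == "England and Wales"].set_index("Time")["Population"]
wales = df_data[df_data["Geography"] == "Wales"].set_index("Time")["Population"]

# Turn the difference back into rows with Time, Geography, Population columns
england = (engl_n_wales - wales).rename("Population").reset_index()
england["Geography"] = "England"

# Append the derived rows, keeping the original long format
df_long = pd.concat(
    [df_data, england[["Time", "Geography", "Population"]]],
    ignore_index=True,
)
print(df_long)
```

Like the pivot approach, this assumes each (Time, Geography) pair occurs only once.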
Regarding "python - Subtraction and assignment of dataframes returns NA", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/55325009/