I'm learning pandas and want to create a new column for my data (I'm using the national baby names data).
I'm only using the years 1880 and 1881.
name sex births year
0 Mary F 7065 1880
1 Anna F 2604 1880
2 Emma F 2003 1880
3 Elizabeth F 1939 1880
4 Worthy M 5 1880
5 Wright M 5 1880
6 York M 5 1880
7 Zachariah M 5 1880
8 Mary F 6919 1881
9 Anna F 2698 1881
10 Emma F 2034 1881
11 Elizabeth F 1852 1881
12 Wilton M 5 1881
13 Wing M 5 1881
14 Wood M 5 1881
15 Wright M 5 1881
I'm computing the total births:
total_births = names.pivot_table('births', index='year', columns='sex', aggfunc='sum')
which gives:
sex F M
year
1880 13611 20
1881 13503 20
Now I want to add another column to the data that holds the ratio of each row's births to the total births for that year.
For example:
name sex births year ratio
Mary F 7065 1880 7065/13611
Wilton M 5 1881 5/13503
I tried:
new = (names.groupby(['year', 'sex'])).assign(ratio= (names.groupby(['year','sex'])).names['births'] / total_births )
which gives:
AttributeError: Cannot access callable attribute 'assign' of 'DataFrameGroupBy' objects, try using the 'apply' method
Alternatively, I tried breaking it into steps:
ratio = names.groupby(['year','sex'])
ratio1 = ratio.loc[:,'births']
but that gives:
AttributeError: Cannot access callable attribute 'loc' of 'DataFrameGroupBy' objects, try using the 'apply' method
Best answer
I think you need groupby with transform to get the sum for each group, and then divide with div:
rat = names.groupby(['year','sex'])['births'].transform('sum')
print (rat)
0 13611
1 13611
2 13611
3 13611
4 20
5 20
6 20
7 20
8 13503
9 13503
10 13503
11 13503
12 20
13 20
14 20
15 20
Name: births, dtype: int64
names['ratio'] = names.births.div(rat)
print (names)
name sex births year ratio
0 Mary F 7065 1880 0.519065
1 Anna F 2604 1880 0.191316
2 Emma F 2003 1880 0.147160
3 Elizabeth F 1939 1880 0.142458
4 Worthy M 5 1880 0.250000
5 Wright M 5 1880 0.250000
6 York M 5 1880 0.250000
7 Zachariah M 5 1880 0.250000
8 Mary F 6919 1881 0.512405
9 Anna F 2698 1881 0.199807
10 Emma F 2034 1881 0.150633
11 Elizabeth F 1852 1881 0.137155
12 Wilton M 5 1881 0.250000
13 Wing M 5 1881 0.250000
14 Wood M 5 1881 0.250000
15 Wright M 5 1881 0.250000
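For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same transform and div approach. The frame is rebuilt by hand from the sample rows in the question; the names group_total and check are only illustrative and do not appear in the original code. Because transform('sum') returns one value per original row (aligned on the index), the division works row by row, and the ratios in each (year, sex) group should add up to 1.

```python
import pandas as pd

# Sample rows copied from the question (1880 and 1881 only).
names = pd.DataFrame({
    "name": ["Mary", "Anna", "Emma", "Elizabeth",
             "Worthy", "Wright", "York", "Zachariah",
             "Mary", "Anna", "Emma", "Elizabeth",
             "Wilton", "Wing", "Wood", "Wright"],
    "sex": ["F"] * 4 + ["M"] * 4 + ["F"] * 4 + ["M"] * 4,
    "births": [7065, 2604, 2003, 1939, 5, 5, 5, 5,
               6919, 2698, 2034, 1852, 5, 5, 5, 5],
    "year": [1880] * 8 + [1881] * 8,
})

# transform('sum') broadcasts each group's total back onto its rows,
# so the result has the same length and index as `names`.
group_total = names.groupby(["year", "sex"])["births"].transform("sum")

# Row-wise division: each row's births over its group's total.
names["ratio"] = names["births"].div(group_total)

# Sanity check (illustrative): ratios within each group sum to 1.
check = names.groupby(["year", "sex"])["ratio"].sum()

print(names)
print(check)
```

The key difference from a plain names.groupby(...)['births'].sum() is the shape of the result: sum() collapses to one row per group, while transform('sum') keeps one row per original record, which is what makes the direct division possible.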
Solution with assign:
names = names.assign(ratio=lambda x: x.births.div(rat))
print (names)
name sex births year ratio
0 Mary F 7065 1880 0.519065
1 Anna F 2604 1880 0.191316
2 Emma F 2003 1880 0.147160
3 Elizabeth F 1939 1880 0.142458
4 Worthy M 5 1880 0.250000
5 Wright M 5 1880 0.250000
6 York M 5 1880 0.250000
7 Zachariah M 5 1880 0.250000
8 Mary F 6919 1881 0.512405
9 Anna F 2698 1881 0.199807
10 Emma F 2034 1881 0.150633
11 Elizabeth F 1852 1881 0.137155
12 Wilton M 5 1881 0.250000
13 Wing M 5 1881 0.250000
14 Wood M 5 1881 0.250000
15 Wright M 5 1881 0.250000
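If you would rather reuse the total_births pivot table from the question, one possible route (not part of the original answer, just a sketch) is to reshape it to long form and merge it back onto the rows. The names totals and merged, and the column name total, are assumptions made for illustration.

```python
import pandas as pd

# Sample rows copied from the question (1880 and 1881 only).
names = pd.DataFrame({
    "name": ["Mary", "Anna", "Emma", "Elizabeth",
             "Worthy", "Wright", "York", "Zachariah",
             "Mary", "Anna", "Emma", "Elizabeth",
             "Wilton", "Wing", "Wood", "Wright"],
    "sex": ["F"] * 4 + ["M"] * 4 + ["F"] * 4 + ["M"] * 4,
    "births": [7065, 2604, 2003, 1939, 5, 5, 5, 5,
               6919, 2698, 2034, 1852, 5, 5, 5, 5],
    "year": [1880] * 8 + [1881] * 8,
})

# The pivot table from the question: years as rows, sexes as columns.
total_births = names.pivot_table("births", index="year",
                                 columns="sex", aggfunc="sum")

# Reshape to long form: one row per (sex, year) with its total.
# 'total' is a hypothetical column name chosen for this sketch.
totals = total_births.unstack().rename("total").reset_index()

# A left merge keeps every row of `names` in its original order
# and attaches the matching group total.
merged = names.merge(totals, on=["year", "sex"], how="left")
merged["ratio"] = merged["births"] / merged["total"]

print(merged)
```

This gives the same ratio column as the transform version, at the cost of an extra total column and a merge. The transform approach is usually the shorter of the two when the totals are not needed elsewhere.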
Regarding "python - Add new column and calculate ratio", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/40933545/