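I find it hard to explain the problem precisely, but I can demonstrate it precisely. I just want to find proportions within groups.
Using the following data: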
import pandas as pd

df1 = pd.DataFrame(
    {
        "large": ["L1" for _ in range(8)],
        "small": ["S1" for _ in range(4)] + ["S2" for _ in range(4)],
        "who": ["D", "E", "F", "G"] + ["D", "E", "F", "G"],
        "amount": [1, 3, 2, 0, 3, 10, 2, 1],
        "total": [22 for _ in range(8)],
    }
)
df2 = pd.DataFrame(
    {
        "large": ["L2" for _ in range(8)],
        "small": ["S3" for _ in range(4)] + ["S4" for _ in range(4)],
        "who": ["D", "E", "F", "G"] + ["D", "E", "F", "G"],
        "amount": [0, 8, 1, 1, 5, 3, 4, 1],
        "total": [23 for _ in range(8)],
    }
)
df = pd.concat([df1, df2]).reset_index(drop=True)
which outputs:
In [82]: df
Out[82]:
   large small who  amount  total
0     L1    S1   D       1     22
1     L1    S1   E       3     22
2     L1    S1   F       2     22
3     L1    S1   G       0     22
4     L1    S2   D       3     22
5     L1    S2   E      10     22
6     L1    S2   F       2     22
7     L1    S2   G       1     22
8     L2    S3   D       0     23
9     L2    S3   E       8     23
10    L2    S3   F       1     23
11    L2    S3   G       1     23
12    L2    S4   D       5     23
13    L2    S4   E       3     23
14    L2    S4   F       4     23
15    L2    S4   G       1     23
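I want to compute (amount within large) / (total within large) for each who, so the result will contain some repeated values.
I can compute the per-who sums as follows: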
In [85]: df.groupby(['large','who']).agg('sum')
Out[85]:
           amount  total
large who
L1    D         4     44
      E        13     44
      F         4     44
      G         1     44
L2    D         5     46
      E        11     46
      F         5     46
      G         2     46
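where the amount column is the one of interest.
Using large_proportions to denote the calculation I'm after, the result would look like the following (I've written it as fractions to make it clearer what is going on):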
   large small who  amount  total large_proportions
0     L1    S1   D       1     22              4/22
1     L1    S1   E       3     22             13/22
2     L1    S1   F       2     22              4/22
3     L1    S1   G       0     22              1/22
4     L1    S2   D       3     22              4/22
5     L1    S2   E      10     22             13/22
6     L1    S2   F       2     22              4/22
7     L1    S2   G       1     22              1/22
8     L2    S3   D       0     23              5/23
9     L2    S3   E       8     23             11/23
10    L2    S3   F       1     23              5/23
11    L2    S3   G       1     23              2/23
12    L2    S4   D       5     23              5/23
13    L2    S4   E       3     23             11/23
14    L2    S4   F       4     23              5/23
15    L2    S4   G       1     23              2/23
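Summary
So the question is: given the original DataFrame df, how do I construct the final output containing large_proportions?
Best answer
You can use transform in the calculation, so that the result keeps the original dimensions: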
df['large_proportions'] = df.groupby(['large','who'])['amount'].transform('sum') / df['total']
Out[32]:
   large small who  amount  total  large_proportions
0     L1    S1   D       1     22           0.181818
1     L1    S1   E       3     22           0.590909
2     L1    S1   F       2     22           0.181818
3     L1    S1   G       0     22           0.045455
4     L1    S2   D       3     22           0.181818
5     L1    S2   E      10     22           0.590909
6     L1    S2   F       2     22           0.181818
7     L1    S2   G       1     22           0.045455
8     L2    S3   D       0     23           0.217391
9     L2    S3   E       8     23           0.478261
10    L2    S3   F       1     23           0.217391
11    L2    S3   G       1     23           0.086957
12    L2    S4   D       5     23           0.217391
13    L2    S4   E       3     23           0.478261
14    L2    S4   F       4     23           0.217391
15    L2    S4   G       1     23           0.086957
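As a quick sanity check, the decimals above can be compared against the fractions written out in the question. The sketch below rebuilds the same data in a more compact form (the small column is left out because the calculation does not use it); the names expected, matches, and derived_total are illustrative only. It also checks an observation that holds for this sample data but is an assumption in general: total equals the sum of amount within each large group, so the denominator could be derived with a second transform if no total column were available.

```python
from fractions import Fraction

import pandas as pd

# Same values as the question's df, without the unused "small" column.
df = pd.DataFrame(
    {
        "large": ["L1"] * 8 + ["L2"] * 8,
        "who": ["D", "E", "F", "G"] * 4,
        "amount": [1, 3, 2, 0, 3, 10, 2, 1, 0, 8, 1, 1, 5, 3, 4, 1],
        "total": [22] * 8 + [23] * 8,
    }
)

# The answer's one-liner: per-(large, who) sum broadcast to every row.
df["large_proportions"] = (
    df.groupby(["large", "who"])["amount"].transform("sum") / df["total"]
)

# Fractions copied from the question's hand-written table.
expected = (
    [Fraction(4, 22), Fraction(13, 22), Fraction(4, 22), Fraction(1, 22)] * 2
    + [Fraction(5, 23), Fraction(11, 23), Fraction(5, 23), Fraction(2, 23)] * 2
)
matches = all(
    abs(got - float(want)) < 1e-12
    for got, want in zip(df["large_proportions"], expected)
)
print(matches)

# Assumption specific to this data: "total" is the per-"large" sum of "amount".
derived_total = df.groupby("large")["amount"].transform("sum")
print((derived_total == df["total"]).all())
```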
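transform aggregates your values within each group and then repeats the aggregated value for every row of that group, so the result has the same length (and index) as the original Series even after the groupby has been applied.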
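To make the difference concrete, here is a minimal sketch contrasting agg and transform on a four-row toy frame. The frame toy and the variable names are made up for illustration and are not part of the original question.

```python
import pandas as pd

# Hypothetical toy data: two (large, who) groups, two rows each.
toy = pd.DataFrame(
    {
        "large": ["L1", "L1", "L1", "L1"],
        "who": ["D", "E", "D", "E"],
        "amount": [1, 3, 3, 10],
    }
)

grouped = toy.groupby(["large", "who"])["amount"]

aggregated = grouped.agg("sum")        # one value per group, indexed by (large, who)
broadcast = grouped.transform("sum")   # one value per original row, indexed like toy

print(len(aggregated), len(broadcast))
print(broadcast.tolist())
```

Because broadcast shares its index with toy, it can be assigned straight back as a new column or divided by another column of the same frame, which is what the one-liner in the answer relies on.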
Regarding python - creating a grouped proportion vector without losing rows, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/59994639/