I'm new to Python and pandas. I have a sample dataset like this:
PRODUCT REGION COUNTRY MEASURE Month_ID QTY
P1 West UK M1 Mon_1 200
P1 West UK M2 Mon_1 150
P1 East JAPAN M1 Mon_1 100
P1 East JAPAN M2 Mon_1 100
P1 West UK M1 Mon_2 300
P1 West UK M2 Mon_2 450
P1 East JAPAN M1 Mon_2 500
P1 East JAPAN M2 Mon_2 600
I want the data to look like this:
PRODUCT REGION COUNTRY MEASURE Month_ID QTY
P1 West UK M1 Mon_1 200
P1 West UK M2 Mon_1 150
P1 West UK NEW_M Mon_1 350
P1 East JAPAN M1 Mon_1 100
P1 East JAPAN M2 Mon_1 100
P1 East JAPAN NEW_M Mon_1 200
P1 West UK M1 Mon_2 300
P1 West UK M2 Mon_2 450
P1 West UK NEW_M Mon_2 750
P1 East JAPAN M1 Mon_2 500
P1 East JAPAN M2 Mon_2 600
P1 East JAPAN NEW_M Mon_2 1100
I want to group by the columns (PRODUCT, REGION, COUNTRY, Month_ID) with SUM(QTY), and add a new row after each group whose MEASURE column is NEW_M.
Best answer
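You can create a new DataFrame by aggregating with sum. To get the ordering right, give it the index labels of the last (duplicated) row of each group with DataFrame.set_index, so that after concat a DataFrame.sort_index places each new row directly after its group: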
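import pandas as pd

cols = ['PRODUCT', 'REGION', 'COUNTRY', 'Month_ID']
# Index labels of the last row in each group (assumes each group has exactly two rows)
idx = df.index[df.duplicated(cols)]
# One summed row per group, labelled NEW_M and indexed like the group's last row
df1 = (df.groupby(cols, as_index=False, sort=False)['QTY']
         .sum()
         .assign(MEASURE='NEW_M')
         .set_index(idx))
# A stable sort keeps each original row ahead of the new row sharing its index
df = pd.concat([df, df1], sort=False).sort_index(kind='mergesort').reset_index(drop=True)
print(df)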
PRODUCT REGION COUNTRY MEASURE Month_ID QTY
0 P1 West UK M1 Mon_1 200
1 P1 West UK M2 Mon_1 150
2 P1 West UK NEW_M Mon_1 350
3 P1 East JAPAN M1 Mon_1 100
4 P1 East JAPAN M2 Mon_1 100
5 P1 East JAPAN NEW_M Mon_1 200
6 P1 West UK M1 Mon_2 300
7 P1 West UK M2 Mon_2 450
8 P1 West UK NEW_M Mon_2 750
9 P1 East JAPAN M1 Mon_2 500
10 P1 East JAPAN M2 Mon_2 600
11 P1 East JAPAN NEW_M Mon_2 1100
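The snippet above relies on `df.duplicated(cols)` flagging exactly one row per group, which holds for the sample data because every group has two rows. For groups of other sizes, one possible variant is sketched below. It records each row's position and uses the largest position in each group as the index of the total row. The helper name `append_group_totals` and the temporary column `_pos` are hypothetical, and the named-aggregation syntax assumes pandas 0.25 or later.

```python
import pandas as pd


def append_group_totals(df, cols, label='NEW_M'):
    """Hypothetical helper: add one summed QTY row after the last row of each group."""
    # Work on a clean 0..n-1 index so that index labels equal row positions.
    df = df.reset_index(drop=True)
    totals = (df.assign(_pos=range(len(df)))          # remember each row's position
                .groupby(cols, sort=False)
                .agg(QTY=('QTY', 'sum'),              # group total
                     _pos=('_pos', 'max'))            # position of the group's last row
                .reset_index()
                .assign(MEASURE=label)
                .set_index('_pos'))
    # Stable sort: for equal index labels, rows of `df` stay ahead of rows of `totals`.
    return (pd.concat([df, totals], sort=False)
              .sort_index(kind='mergesort')
              .reset_index(drop=True))


sample = pd.DataFrame({
    'PRODUCT': ['P1'] * 8,
    'REGION': ['West', 'West', 'East', 'East'] * 2,
    'COUNTRY': ['UK', 'UK', 'JAPAN', 'JAPAN'] * 2,
    'MEASURE': ['M1', 'M2'] * 4,
    'Month_ID': ['Mon_1'] * 4 + ['Mon_2'] * 4,
    'QTY': [200, 150, 100, 100, 300, 450, 500, 600],
})
result = append_group_totals(sample, ['PRODUCT', 'REGION', 'COUNTRY', 'Month_ID'])
print(result)
```

On the sample data this should give the same twelve rows as the output above. If the rows of a group are not adjacent, the total row is placed after the group's last row rather than pulling the group together.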
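Edit: For subtraction, a small trick is used: the QTY values of the rows where MEASURE is M2 are multiplied by -1, so aggregating with sum yields the difference (M1 minus M2):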
# Keep only the M1 and M2 rows, if other measures are present
df = df[df['MEASURE'].isin(['M1', 'M2'])]
cols = ['PRODUCT', 'REGION', 'COUNTRY', 'Month_ID']
idx = df.index[df.duplicated(cols)]
# Negate QTY on the M2 rows so that the group sum equals M1 - M2
df1 = (df.assign(QTY=df['QTY'].mask(df['MEASURE'].eq('M2'), df['QTY'] * -1))
         .groupby(cols, as_index=False, sort=False)['QTY']
         .sum()
         .assign(MEASURE='NEW_M')
         .set_index(idx))
df2 = pd.concat([df, df1], sort=False).sort_index(kind='mergesort').reset_index(drop=True)
print(df2)
PRODUCT REGION COUNTRY MEASURE Month_ID QTY
0 P1 West UK M1 Mon_1 200
1 P1 West UK M2 Mon_1 150
2 P1 West UK NEW_M Mon_1 50
3 P1 East JAPAN M1 Mon_1 100
4 P1 East JAPAN M2 Mon_1 100
5 P1 East JAPAN NEW_M Mon_1 0
6 P1 West UK M1 Mon_2 300
7 P1 West UK M2 Mon_2 450
8 P1 West UK NEW_M Mon_2 -150
9 P1 East JAPAN M1 Mon_2 500
10 P1 East JAPAN M2 Mon_2 600
11 P1 East JAPAN NEW_M Mon_2 -100
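To see the sign trick on its own, the following minimal sketch rebuilds the sample data from the question and computes only the per-group differences, without inserting them back into the frame. The variable names `signed` and `diff` are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'PRODUCT': ['P1'] * 8,
    'REGION': ['West', 'West', 'East', 'East'] * 2,
    'COUNTRY': ['UK', 'UK', 'JAPAN', 'JAPAN'] * 2,
    'MEASURE': ['M1', 'M2'] * 4,
    'Month_ID': ['Mon_1'] * 4 + ['Mon_2'] * 4,
    'QTY': [200, 150, 100, 100, 300, 450, 500, 600],
})
cols = ['PRODUCT', 'REGION', 'COUNTRY', 'Month_ID']

# Flip the sign of the M2 quantities; M1 quantities are left unchanged.
signed = df['QTY'].mask(df['MEASURE'].eq('M2'), -df['QTY'])

# Summing the signed values per group gives M1 - M2 for that group.
diff = (df.assign(QTY=signed)
          .groupby(cols, as_index=False, sort=False)['QTY']
          .sum())
print(diff)
```

The QTY column of `diff` should hold 50, 0, -150 and -100, matching the NEW_M rows in the output above.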
Regarding python - row-level grouping in pandas, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/60521768/