python - Grouping a Pandas dataframe across rows - 2.0

Tags: python pandas dataframe

With regard to the question "Grouping Pandas dataframe across rows", the result of the operation was:

          amount
clients           
Comp1    16.360417
Comp2    69.697501
Comp3    85.700000
Comp4    36.666667
Comp5    44.156500 

If a date column is added to the input:

tdate,client1,client2,client3,client4,client5,client6,amount
12/31/2017,,,Comp1,,,,4.475000
12/31/2017,,,Comp2,,,,16.305584
10/31/2107,,,Comp3,,,,4.050000
10/31/2017,Comp2,Comp1,,Comp4,,,21.000000
1/1/2017,,,Comp4,,,,30.000000
2/2/2017,Comp1,,Comp2,,,,5.137500
10/31/2017,,,Comp3,,,,52.650000
12/31/2017,,,Comp1,,,,2.650000
10/31/2017,Comp3,,,Comp3,,,29.000000
12/31/2017,Comp5,,,Comp2,,,20.809000
1/1/2017,Comp5,,,Comp2,,,15.100000
10/31/2017,Comp5,,,Comp2,,,52.404000

How can we get this output:

12/31/2017 Comp1 4.475+2.65
12/31/2017 Comp2 16.305584+20.809/2 
10/31/2017 Comp2 21/3+52.404/2
2/2/2017   Comp2 5.1375/2
1/1/2017   Comp2 15.1/2
10/31/2017 Comp3 4.05+52.65+29
1/1/2017   Comp4 30
10/31/2017 Comp4 21/3
12/31/2017 Comp5 20.809/2
1/1/2017   Comp5 15.1/2
10/31/2017 Comp5 52.404/2   
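
The expressions above suggest the rule being asked for: each row's amount is split evenly among the clients named in that row, and the shares are then summed per date and client. A minimal arithmetic check of three of the entries, with illustrative variable names:

```python
import math

# Each row's amount is divided by the number of clients in that row,
# then the shares are added up per (date, client) pair.
comp1_dec31 = 4.475 + 2.65            # two rows that name only Comp1
comp2_dec31 = 16.305584 + 20.809 / 2  # one Comp2-only row plus half of a row shared with Comp5
comp5_jan1 = 15.1 / 2                 # half of a row shared with Comp2

print(round(comp1_dec31, 6), round(comp2_dec31, 6), round(comp5_jan1, 6))
```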

Best answer

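Building on the answer to the previous question, we set two columns (the per-client share and tdate) as the index and then use stack.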
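
To see what the index-then-stack step does, here is a toy sketch on a hypothetical two-row frame (the clients `A` and `B` and the name `toy` are made up for illustration). Once the share and the date sit in the index, `stack()` moves the client columns into rows, so every client name carries its share and date with it.

```python
import pandas as pd

# Hypothetical two-row frame: one row shared by two clients, one single-client row.
toy = pd.DataFrame({
    'tdate': ['1/1/2017', '1/1/2017'],
    'client1': ['A', None],
    'client2': ['B', 'A'],
    'amount': [10.0, 4.0],
})

# Per-client share: amount divided by the number of non-null clients in the row.
toy['new'] = toy['amount'] / toy[['client1', 'client2']].count(axis=1)

# Index by (new, tdate), then stack the client columns into rows and drop empty slots.
stacked = toy.drop(columns=['amount']).set_index(['new', 'tdate']).stack().dropna()
print(stacked)
```

The result has three entries: `A` and `B` each with a share of 5.0, and `A` again with a share of 4.0. The code below applies the same idea to the full input.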

import pandas as pd

# df holds the input shown in the question
cols = ['amount', 'tdate']
# Split each row's amount evenly among the clients present in that row
df['new'] = df['amount'] / df.drop(columns=cols).count(axis=1)

# Set new and tdate as the index (dropping the amount column), stack the client columns and drop the NaNs
x = df.drop(columns=['amount']).set_index(['new', 'tdate']).stack().dropna()

# Create a dataframe from the per-client amount, tdate and the clients
ndf = pd.DataFrame({'amount': x.index.get_level_values('new'),
                    'tdate': x.index.get_level_values('tdate'),
                    'clients': x.values})

# Group by `clients` and `tdate` and sum the shares
ndf.groupby(['clients', 'tdate']).sum().reset_index()

Output:

  clients       tdate     amount
0    Comp1  10/31/2017   7.000000
1    Comp1  12/31/2017   7.125000
2    Comp1    2/2/2017   2.568750
3    Comp2    1/1/2017   7.550000
4    Comp2  10/31/2017  33.202000
5    Comp2  12/31/2017  26.710084
6    Comp2    2/2/2017   2.568750
7    Comp3  10/31/2017  81.650000
8    Comp3  10/31/2107   4.050000
9    Comp4    1/1/2017  30.000000
10   Comp4  10/31/2017   7.000000
11   Comp5    1/1/2017   7.550000
12   Comp5  10/31/2017  26.202000
13   Comp5  12/31/2017  10.404500
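
As an alternative sketch (not the method used in the answer), `DataFrame.melt` can do the same reshaping without carrying values in the index. The example assumes the client columns all start with `client` and uses only the 12/31/2017 rows of the input, each written with eight fields; the names `share`, `long_df` and `result` are illustrative.

```python
import io
import pandas as pd

# Excerpt: only the 12/31/2017 rows of the input (assumed to have eight fields each).
raw = """tdate,client1,client2,client3,client4,client5,client6,amount
12/31/2017,,,Comp1,,,,4.475000
12/31/2017,,,Comp2,,,,16.305584
12/31/2017,,,Comp1,,,,2.650000
12/31/2017,Comp5,,,Comp2,,,20.809000
"""
df = pd.read_csv(io.StringIO(raw))

client_cols = [c for c in df.columns if c.startswith('client')]

# Split each row's amount evenly among the clients present in that row.
df['share'] = df['amount'] / df[client_cols].count(axis=1)

# melt() gathers the client columns into a single `clients` column;
# rows that came from empty client slots are dropped.
long_df = (df.melt(id_vars=['tdate', 'share'], value_vars=client_cols,
                   value_name='clients')
             .dropna(subset=['clients']))

# Sum the shares per client and date.
result = (long_df.groupby(['clients', 'tdate'], as_index=False)['share'].sum()
                 .rename(columns={'share': 'amount'}))
print(result)
```

For this excerpt the sums are 7.125 for Comp1, 26.710084 for Comp2 and 10.4045 for Comp5, which agree with rows 1, 5 and 13 of the output above.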

This question and answer on grouping a Pandas dataframe across rows (2.0) comes from Stack Overflow: https://stackoverflow.com/questions/46997707/
