python - How to aggregate values in one column that fall between values in another column in pandas

I have two dataframes that I want to merge. They look like this:

df_1
unit   start_time   stop_time
A        0.0          1.2
B        1.3          4.1
A        4.2          4.5
B        4.6          7.2
A        7.3          8.0

df_2
time    other_data
0.2       .0122
0.4       .0128
0.6       .0101
0.8       .0091
1.0       .2122
1.2       .1542
1.4       .1546
1.6       .1522
1.8       .2542
2.0       .1557
2.2       .2542
2.4       .1543
2.6       .0121
2.8       .0111
3.0       .0412
3.2       .0214
3.4       .0155
3.6       .0159
3.8       .0154
4.0       .0155
4.2       .0211
4.4       .0265
4.6       .0146
4.8       .0112
5.0       .0166
5.2       .0101
5.4       .0132
5.6       .0112
5.8       .0121
6.0       .0142
6.2       .0124
6.4       .0111
6.6       .0123
6.8       .0111
6.0       .0119
6.2       .0112
6.4       .0131
6.6       .0117
6.8       .0172
7.0       .0123
7.2       .0127
7.4       .0121
7.6       .0110
7.8       .0120
8.0       .0121

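I want to merge these dataframes using the following criteria:

Step 1

I want to group all values of df_2.other_data where df_2.time falls between df_1.start_time and df_1.stop_time. For example, for the first row of df_1, the following rows from df_2 would be grouped: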

time    other_data
0.2       .0122
0.4       .0128
0.6       .0101
0.8       .0091
1.0       .2122
1.2       .1542
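
To make step 1 concrete, here is a minimal sketch of the selection for the first row of df_1 only. It assumes both bounds are inclusive, which matches the rows listed above, and it uses only the first seven rows of df_2 to keep the example short.

```python
import pandas as pd

# First seven rows of df_2, copied from the question.
df_2 = pd.DataFrame({
    "time": [0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4],
    "other_data": [0.0122, 0.0128, 0.0101, 0.0091, 0.2122, 0.1542, 0.1546],
})

start_time, stop_time = 0.0, 1.2  # first row of df_1

# Series.between is inclusive on both ends by default.
group = df_2[df_2["time"].between(start_time, stop_time)]
print(group)
```

The row at time 1.4 is dropped because it lies outside the window.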

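Step 2

Within this group, I want to count the number of observations where df_2.other_data is above a threshold, which in this example is set to .0120. The number of observations in this group above the threshold is 4. This is the value I want to merge into df_1. The result should look like this: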

unit   start_time   stop_time   other_data_above_threshold
A        0.0          1.2             4
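
The count in step 2 can be sketched as a boolean comparison followed by a sum. The variable names `group`, `threshold`, and `above` are illustrative only, and the values are the six other_data entries of the first group.

```python
import pandas as pd

# other_data values of the first group, copied from the question.
group = pd.Series([0.0122, 0.0128, 0.0101, 0.0091, 0.2122, 0.1542], name="other_data")
threshold = 0.0120

# The comparison yields a boolean Series; summing it counts the True values.
above = int((group > threshold).sum())
print(above)  # 4
```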

The final dataframe should look like this:

unit   start_time   stop_time   other_data_above_threshold
A        0.0          1.2              4
B        1.3          4.1              13
A        4.2          4.5              3
B        4.6          7.2              11
A        7.3          8.0              4

Best answer

IIUC, this is what you need.

df_1['other_data_at'] = df_1.apply(lambda x: (df_2['time'].between(x['start_time'], x['stop_time']) & (df_2['other_data'] >= 0.012)).sum(), axis=1)

Output

  unit  start_time  stop_time  other_data_at
0    A         0.0        1.2              4
1    B         1.3        4.1             13
2    A         4.2        4.5              2  # your expected output shows 3, but it should be 2
3    B         4.6        7.2             11
4    A         7.3        8.0              3
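
For reference, here is a self-contained sketch that rebuilds both dataframes from the question and computes the same counts with NumPy broadcasting instead of a row-wise apply. This is a swapped-in alternative, not the answerer's method. It assumes inclusive window bounds and a `>=` comparison against the threshold, as in the answer, and it writes to the column name the question asked for.

```python
import pandas as pd

df_1 = pd.DataFrame({
    "unit": list("ABABA"),
    "start_time": [0.0, 1.3, 4.2, 4.6, 7.3],
    "stop_time": [1.2, 4.1, 4.5, 7.2, 8.0],
})

# df_2 as posted, including the repeated 6.0-6.8 timestamps.
df_2 = pd.DataFrame({
    "time": [
        0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0,
        2.2, 2.4, 2.6, 2.8, 3.0, 3.2, 3.4, 3.6, 3.8, 4.0,
        4.2, 4.4, 4.6, 4.8, 5.0, 5.2, 5.4, 5.6, 5.8, 6.0,
        6.2, 6.4, 6.6, 6.8, 6.0, 6.2, 6.4, 6.6, 6.8, 7.0,
        7.2, 7.4, 7.6, 7.8, 8.0,
    ],
    "other_data": [
        .0122, .0128, .0101, .0091, .2122, .1542, .1546, .1522, .2542, .1557,
        .2542, .1543, .0121, .0111, .0412, .0214, .0155, .0159, .0154, .0155,
        .0211, .0265, .0146, .0112, .0166, .0101, .0132, .0112, .0121, .0142,
        .0124, .0111, .0123, .0111, .0119, .0112, .0131, .0117, .0172, .0123,
        .0127, .0121, .0110, .0120, .0121,
    ],
})

threshold = 0.012

# Column vector of times, shape (45, 1), compared against 5 windows -> (45, 5).
t = df_2["time"].to_numpy()[:, None]
in_window = (t >= df_1["start_time"].to_numpy()) & (t <= df_1["stop_time"].to_numpy())

# Rows at or above the threshold, shape (45, 1), broadcast across the windows.
above = (df_2["other_data"].to_numpy() >= threshold)[:, None]

# Summing down each column counts the qualifying rows per window.
df_1["other_data_above_threshold"] = (in_window & above).sum(axis=0)
print(df_1)
```

The intermediate boolean matrix has one cell per (df_2 row, df_1 row) pair. That is fine at this size, but it grows quickly for large inputs.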

Regarding python - How to aggregate values in one column that fall between values in another column in pandas, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/58279227/
