I have a dataframe named df1 that looks like this:
details endFrame id indexID object startFrame
'series of numbers' 1111 78 0 Motorbike 1
'series of numbers' 3647 78 1 Motorbike 1112
'series of numbers' 3678 78 2 Motorbike 3649
'series of numbers' 704 120 3 Pedestrian 66
'series of numbers' 817 120 4 Pedestrian 705
'series of numbers' 922 120 5 Pedestrian 818
'series of numbers' 121 110 6 Pedestrian 69
'series of numbers' 140 109 7 Pedestrian 69
'series of numbers' 4161 109 8 Pedestrian 140
'series of numbers' 4344 109 9 Pedestrian 4163
'series of numbers' 3603 79 10 Motorbike 70
I also have another dataframe, df2, that looks like this:
indexID matchID
0 1
1 2
3 4
4 5
7 8
8 9
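matchID shows which IDs need to be joined. For example, the first 2 rows mean that indexIDs 0, 1 and 2 should be joined together, and all of their details in df1 should be concatenated. The final df should look like this: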
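One way to read the matchID pairs is as links in a chain: 1 merges into 0 and 2 merges into 1, so 0, 1 and 2 all end up under label 0. A minimal sketch of that idea, assuming each matchID points back to its indexID and that the links contain no cycles (the helper `group_labels` is hypothetical, not a pandas function):

```python
import pandas as pd


def group_labels(index_ids, pairs):
    """Hypothetical helper: map each indexID to the first ID of its chain.

    `pairs` yields (indexID, matchID) tuples, where matchID is merged
    into indexID. Assumes the links contain no cycles.
    """
    parent = {match: idx for idx, match in pairs}
    labels = {}
    for i in index_ids:
        root = i
        # Walk back along the links until reaching an ID that is not
        # itself a matchID; that ID labels the whole chain.
        while root in parent:
            root = parent[root]
        labels[i] = root
    return labels


df2 = pd.DataFrame({'indexID': [0, 1, 3, 4, 7, 8],
                    'matchID': [1, 2, 4, 5, 8, 9]})
pairs = zip(df2['indexID'].tolist(), df2['matchID'].tolist())
labels = group_labels(range(11), pairs)
print(labels)
# {0: 0, 1: 0, 2: 0, 3: 3, 4: 3, 5: 3, 6: 6, 7: 7, 8: 7, 9: 7, 10: 10}
```

The resulting dict could then be applied with `df1['indexID'].map(labels)` to get grouping labels that do not depend on the row order of df1.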
details id indexID
'series of numbers''series of numbers''series of numbers' 78 0
'series of numbers''series of numbers''series of numbers' 120 3
'series of numbers' 110 6
'series of numbers''series of numbers''series of numbers' 109 7
'series of numbers' 79 10
How can I do this?
EDIT: The series of numbers is actually a list. Rather than output like this:
details id indexID
[series of numbers][series of numbers][series of numbers] 78 0
[series of numbers][series of numbers][series of numbers] 120 3
[series of numbers] 110 6
[series of numbers][series of numbers][series of numbers] 109 7
[series of numbers] 79 10
I would like the output to look like this:
details id indexID
[series of numbersseries of numbersseries of numbers] 78 0
[series of numbersseries of numbersseries of numbers] 120 3
[series of numbers] 110 6
[series of numbersseries of numbersseries of numbers] 109 7
[series of numbers] 79 10
Best answer
Use mask with isin to replace the matched values with missing values, then forward fill them with the previous values:
g = df1['indexID'].mask(df1['indexID'].isin(df2['matchID'])).ffill().astype(int)
print (g)
0 0
1 0
2 0
3 3
4 3
5 3
6 6
7 7
8 7
9 7
10 10
Name: indexID, dtype: int32
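The mask/ffill step can be checked on a small stand-in for df1. This sketch keeps only the columns the answer uses, replaces the real details with placeholder strings d0 to d10, and assumes df1 is sorted by indexID so that every matched row sits directly below the row it merges into. The int32 shown above is likely a platform effect: astype(int) yields int32 on Windows with NumPy 1.x and int64 on most other setups.

```python
import pandas as pd

# Toy stand-in for df1: only the columns the answer uses, with
# placeholder strings instead of the real "series of numbers".
df1 = pd.DataFrame({
    'details': [f'd{i}' for i in range(11)],
    'id': [78, 78, 78, 120, 120, 120, 110, 109, 109, 109, 79],
    'indexID': list(range(11)),
})
df2 = pd.DataFrame({'indexID': [0, 1, 3, 4, 7, 8],
                    'matchID': [1, 2, 4, 5, 8, 9]})

# Rows whose indexID appears in df2['matchID'] become NaN...
masked = df1['indexID'].mask(df1['indexID'].isin(df2['matchID']))
# ...and then take the label of the nearest unmatched row above them.
g = masked.ffill().astype(int)
print(g.tolist())  # [0, 0, 0, 3, 3, 3, 6, 7, 7, 7, 10]
```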
Then use groupby and join:
# if you want to group only by the new Series g
df = df1.groupby(g).agg({'details':' '.join, 'id':'first'}).reset_index()
print (df)
indexID details id
0 0 'series of numbers' 'series of numbers' 'serie... 78
1 3 'series of numbers' 'series of numbers' 'serie... 120
2 6 'series of numbers' 110
3 7 'series of numbers' 'series of numbers' 'serie... 109
4 10 'series of numbers' 79
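A self-contained version of the groupby/agg step, using the same kind of placeholder data (d0 to d10 stand in for the real strings, and the frame columns not used by the answer are omitted):

```python
import pandas as pd

# Placeholder data shaped like df1 and df2 from the question.
df1 = pd.DataFrame({
    'details': [f'd{i}' for i in range(11)],
    'id': [78, 78, 78, 120, 120, 120, 110, 109, 109, 109, 79],
    'indexID': list(range(11)),
})
df2 = pd.DataFrame({'indexID': [0, 1, 3, 4, 7, 8],
                    'matchID': [1, 2, 4, 5, 8, 9]})
g = df1['indexID'].mask(df1['indexID'].isin(df2['matchID'])).ffill().astype(int)

# Group by the labels in g: join each group's strings with a space and
# keep the first id; reset_index turns the g labels into an indexID column.
df = df1.groupby(g).agg({'details': ' '.join, 'id': 'first'}).reset_index()
print(df)
```

Taking `'first'` for id assumes every row in a merged group shares the same id, which holds for the sample data.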
# or group by the id column as well
df = df1.groupby(['id',g], sort=False)['details'].agg(' '.join).reset_index()
print (df)
id indexID details
0 78 0 'series of numbers' 'series of numbers' 'serie...
1 120 3 'series of numbers' 'series of numbers' 'serie...
2 110 6 'series of numbers'
3 109 7 'series of numbers' 'series of numbers' 'serie...
4 79 10 'series of numbers'
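The EDIT says each details cell is a list, and ' '.join only accepts strings, so it would fail on list cells. A minimal sketch for that case, assuming the desired result is one flat list per group: it keeps the same grouping but flattens the lists with `itertools.chain.from_iterable`. The numbers in the lists are placeholders, not the real data.

```python
from itertools import chain

import pandas as pd

# Toy stand-in for df1 where each 'details' cell is a list
# (placeholder numbers, not the real data).
df1 = pd.DataFrame({
    'details': [[i, i + 100] for i in range(11)],
    'id': [78, 78, 78, 120, 120, 120, 110, 109, 109, 109, 79],
    'indexID': list(range(11)),
})
df2 = pd.DataFrame({'indexID': [0, 1, 3, 4, 7, 8],
                    'matchID': [1, 2, 4, 5, 8, 9]})
g = df1['indexID'].mask(df1['indexID'].isin(df2['matchID'])).ffill().astype(int)

# Flatten each group's lists into a single list instead of joining
# strings; sort=False keeps the groups in order of first appearance.
df = (df1.groupby(['id', g], sort=False)['details']
         .agg(lambda s: list(chain.from_iterable(s)))
         .reset_index())
print(df)
```

Here the first group's details becomes `[0, 100, 1, 101, 2, 102]`, i.e. the three lists merged into one, which matches the shape of the desired output in the EDIT.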
Regarding "python - merge specific rows of a dataframe and delete the unused rows", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/52438106/