python - Fastest way to append foreign keys from one dataframe to another

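This is frustrating me because I'm sure it's easy to do, but I just can't come up with the best solution.

Basically, suppose I have df1 with the columns vehicle and checkpoint, representing the times at which each vehicle in a race passes each checkpoint during a single lap. However, some checkpoints in the same race were not recorded.

Then I have df2, which contains a single column, checkpoint, holding the checkpoints that should be included in df1.

I'm trying to find a fast way of essentially adding those checkpoints to each unique lap value in df1 (each unique vehicle in the example below).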

For example:

df1 = pd.DataFrame({'vehicle': [1,1,2,2,3,3], 'checkpoint': [1,5,1,5,1,5]})
df2 = pd.DataFrame({"checkpoints": range(2,5)})

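What I want is a fast way to generate a dataframe that adds all the missing checkpoints from df2 to every vehicle in df1, so that the resulting dataframe has checkpoints 1 through 5 for each of the 3 unique vehicles.

The expected output looks like the following, although the checkpoints and vehicles do not necessarily have to be in order. What matters is that all 5 checkpoints are included for all 3 vehicles: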

    vehicle  checkpoints
0         1            1
1         1            2
2         1            3
3         1            4
4         1            5
5         2            1
6         2            2
7         2            3
8         2            4
9         2            5
10        3            1
11        3            2
12        3            3
13        3            4
14        3            5

I've come up with solutions using list comprehensions and concatenation but it's far too slow on larger datasets. I'm not the most at ease with using apply either, so if there's a way to use apply or an entirely different and faster solution, I would be very much appreciative.

If you need more information don't hesitate to ask.

Best answer

import pandas as pd
df1 = pd.DataFrame({'vehicle': [1,1,2,2,3,3], 'checkpoint': [1,5,1,5,1,5]}) 
df2 = pd.DataFrame({"checkpoint": range(2,5)})

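Merge-based solution

Concatenate df1 with a full outer merge of the unique vehicles from df1 and the missing checkpoints from df2. Because both sides carry the same constant temp key, the merge pairs every vehicle with every checkpoint in df2: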

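# Cross-join the unique vehicles with df2's checkpoints via a constant
# helper key (temp), then append the result to the rows already in df1.
pd.concat([df1,
           pd.merge(df1[['vehicle']].drop_duplicates().assign(temp=1),
                    df2.assign(temp=1), how='outer').drop('temp', axis=1)]
         ).sort_values(['vehicle', 'checkpoint']).reset_index(drop=True)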

The output is as shown in the question.
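
On pandas 1.2 or later, the helper temp column can probably be dropped in favour of how='cross'. The following is a minimal sketch under that version assumption, using the same df1 and df2 as above; the names vehicles, missing, and result are illustrative only and not part of the original answer:

```python
import pandas as pd

df1 = pd.DataFrame({'vehicle': [1, 1, 2, 2, 3, 3],
                    'checkpoint': [1, 5, 1, 5, 1, 5]})
df2 = pd.DataFrame({'checkpoint': range(2, 5)})

# One row per distinct vehicle.
vehicles = df1[['vehicle']].drop_duplicates()

# Cartesian product: every vehicle paired with every checkpoint in df2.
# how='cross' needs pandas >= 1.2 (an assumption about the installed version).
missing = vehicles.merge(df2, how='cross')

# Append the generated rows to the recorded ones and tidy up the ordering.
result = (pd.concat([df1, missing])
            .sort_values(['vehicle', 'checkpoint'])
            .reset_index(drop=True))
print(result)
```

If df2 could contain checkpoints that are already recorded in df1, adding .drop_duplicates() after the concat should keep each (vehicle, checkpoint) pair from appearing twice.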

Reindex-based solution

import itertools

all_vehicles = df1.vehicle.unique().tolist()
all_checkpoints = (df1.checkpoint.unique().tolist()
                   + df2.checkpoint.unique().tolist())

(df1.set_index(['vehicle', 'checkpoint'])
    .reindex(index=itertools.product(all_vehicles, all_checkpoints))
    .reset_index())
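
The same idea can also be sketched with an explicit pd.MultiIndex.from_product target, which names the index levels up front and removes duplicate checkpoints in case df1 and df2 overlap. This is a hedged variant, not the original answer's code; full_index and result are illustrative names, and it assumes each (vehicle, checkpoint) pair occurs at most once in df1, since reindex raises on duplicate labels:

```python
import pandas as pd

df1 = pd.DataFrame({'vehicle': [1, 1, 2, 2, 3, 3],
                    'checkpoint': [1, 5, 1, 5, 1, 5]})
df2 = pd.DataFrame({'checkpoint': range(2, 5)})

all_vehicles = df1['vehicle'].unique()

# Union of both checkpoint columns; drop_duplicates guards against overlap
# between df1 and df2, and sort_values gives a predictable order.
all_checkpoints = (pd.concat([df1['checkpoint'], df2['checkpoint']])
                     .drop_duplicates()
                     .sort_values())

# Every (vehicle, checkpoint) combination that should exist in the result.
full_index = pd.MultiIndex.from_product(
    [all_vehicles, all_checkpoints], names=['vehicle', 'checkpoint'])

# After set_index no data columns remain, so reindexing only creates the
# missing index entries; reset_index turns the levels back into columns.
result = (df1.set_index(['vehicle', 'checkpoint'])
             .reindex(index=full_index)
             .reset_index())
print(result)
```

If df1 carried additional data columns (for example a recorded time), the newly created rows would hold NaN in those columns.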

This question, python - Fastest way to append foreign keys from one dataframe to another, comes from a similar question found on Stack Overflow: https://stackoverflow.com/questions/59140072/
