python - Pandas 根据时间条件连接两个表

标签 python pandas

我正在尝试根据时间戳连接 Pandas 中的两个表。基本上结构看起来像这样:

表2

Timestamp          Truck     MineX           MineY
2016-08-27 01:10    CT77    -11346.36655    -650404.405
2016-08-27 01:12    CT45    -11596.88137    -648294.056
2016-08-27 01:13    CT67    -11953.16118    -648325.114
2016-08-27 01:13    CT75    -11326.54075    -650447.462
2016-08-27 01:14    CT79    -11380.27834    -650425.968
2016-08-27 01:15    CT26    -9493.153286    -652313.633
2016-08-27 01:16    CT73    -11527.47602    -650210.723
2016-08-27 01:16    CT40    -11596.90867    -648260.214
2016-08-27 01:17    CT26    -9493.153286    -652313.633
2016-08-27 01:17    CT80    -11363.34558    -650385.959
2016-08-27 01:17    CT72    -11527.47355    -650213.8

表1

Truck   LoadLocation    Tonnes  ArriveTimestamp
CT70    338-001       261         2016-02-21 00:23
CT66    338-001       271         2016-02-21 00:31
CT62    338-001       264         2016-02-21 00:45
CT73    338-001       254         2016-02-21 00:54
CT71    338-001       250         2016-02-21 01:04
CT39    338-001       182.172     2016-02-21 01:11
CT62    338-001       285         2016-02-21 01:19
CT70    338-001       282         2016-02-21 01:25
CT73    338-001       250         2016-02-21 01:30
CT73    338-001       275         2016-02-21 01:35
CT64    338-001       253         2016-02-21 01:42

表1和表2需要连接,其中Timestamp和ArriveTimeStamp相差一分钟以内,并且卡车ID相同。首选左连接,如果没有匹配,表 2 中的记录将被丢弃

最佳答案

您可以使用merge :

df = pd.merge(df1, df2, on='Truck', how='left')
print (df)
   Truck LoadLocation   Tonnes     ArriveTimestamp           Timestamp  \
0   CT70      338-001  261.000 2016-02-21 00:23:00                 NaT   
1   CT66      338-001  271.000 2016-02-21 00:31:00                 NaT   
2   CT62      338-001  264.000 2016-02-21 00:45:00                 NaT   
3   CT73      338-001  254.000 2016-02-21 00:54:00 2016-08-27 01:16:00   
4   CT71      338-001  250.000 2016-02-21 01:04:00                 NaT   
5   CT39      338-001  182.172 2016-02-21 01:11:00                 NaT   
6   CT62      338-001  285.000 2016-02-21 01:19:00                 NaT   
7   CT70      338-001  282.000 2016-02-21 01:25:00                 NaT   
8   CT73      338-001  250.000 2016-02-21 01:30:00 2016-08-27 01:16:00   
9   CT73      338-001  275.000 2016-02-21 01:35:00 2016-08-27 01:16:00   
10  CT64      338-001  253.000 2016-02-21 01:42:00                 NaT   

          MineX       MineY  
0           NaN         NaN  
1           NaN         NaN  
2           NaN         NaN  
3  -11527.47602 -650210.723  
4           NaN         NaN  
5           NaN         NaN  
6           NaN         NaN  
7           NaN         NaN  
8  -11527.47602 -650210.723  
9  -11527.47602 -650210.723  
10          NaN         NaN  

boolean indexing ,其中过滤 datetimes 中的绝对差异 - 示例返回空 DataFrame:

print ((df.Timestamp - df.ArriveTimestamp).astype('timedelta64[s]'))
0            NaN
1            NaN
2            NaN
3     16244520.0
4            NaN
5            NaN
6            NaN
7            NaN
8     16242360.0
9     16242060.0
10           NaN
dtype: float64

print ((df.Timestamp - df.ArriveTimestamp).astype('timedelta64[s]').abs() < 60)
0     False
1     False
2     False
3     False
4     False
5     False
6     False
7     False
8     False
9     False
10    False
dtype: bool
Empty DataFrame

print (df[(df.Timestamp - df.ArriveTimestamp).astype('timedelta64[s]').abs() < 60])
Empty DataFrame
Columns: [Truck, LoadLocation, Tonnes, ArriveTimestamp, Timestamp, MineX, MineY]
Index: []

关于python - Pandas 根据时间条件连接两个表,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/39721008/

相关文章:

python - Pandas:返回两个 DataFrame 变量之间匹配值的计数

python - perl 的哈希和 python 的字典之间的区别

python - 如果我向字符串添加新元素,如何将字符串中的元素重新排列为想要的结果

python - Pandas Xlsxwriter 时间格式

python - 类型错误 : unsupported format string passed to Series. __format__

python - 如何使用 Pandas 将字符串与数据框中的字符串进行比较?

python - 返回两个字典的最大键

python - 如何从 python pandas 数据框中获取行并将其放入一个新数据框的列中

python - 根据特定日期条件在Python中选择日期列

python - Pandas:按 A 列对数据进行分组,按 B 列的现有值过滤 A