python - Pandas: remove all rows within a time interval of another Series' time index (i.e. time range exclusion)

Suppose I have two DataFrames:

#df1
time
2016-09-12 13:00:00.017    1.0
2016-09-12 13:00:03.233    1.0
2016-09-12 13:00:10.256    1.0
2016-09-12 13:00:19.605    1.0

#df2
time
2016-09-12 13:00:00.017    1.0
2016-09-12 13:00:00.233    0.0
2016-09-12 13:00:01.016    1.0
2016-09-12 13:00:01.505    0.0
2016-09-12 13:00:06.017    1.0
2016-09-12 13:00:07.233    0.0
2016-09-12 13:00:08.256    1.0
2016-09-12 13:00:19.705    0.0

I want to remove every row in df2 whose time index falls within 1 second after (+1 second of) a time index in df1, which produces:

#result
time
2016-09-12 13:00:01.505    0.0
2016-09-12 13:00:06.017    1.0
2016-09-12 13:00:07.233    0.0
2016-09-12 13:00:08.256    1.0

What is the most efficient way to do this? I don't see anything in the API that helps with time range exclusion.

Best answer

You can use pd.merge_asof, which was added in 0.19.0. It accepts a tolerance argument that limits matches to keys within the specified time interval.

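import pandas as pd

# Assuming time is set as the index of both DataFrames
df1.reset_index(inplace=True)
df2.reset_index(inplace=True)

# Keep the df2 rows that have no df1 timestamp within the preceding second
matched = pd.merge_asof(df2, df1, on='time', tolerance=pd.Timedelta('1s'))
df2.loc[matched.isnull().any(axis=1)]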
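
To check the approach end to end, here is a minimal sketch that rebuilds the two frames from the question and applies the same merge_asof filter. The column name "value" is an assumption (the question does not show one), and "value_y" is simply the name df1's column receives from the default merge suffixes.

```python
import pandas as pd

# Sample data copied from the question; the column name "value" is assumed.
df1 = pd.DataFrame(
    {"value": [1.0, 1.0, 1.0, 1.0]},
    index=pd.DatetimeIndex([
        "2016-09-12 13:00:00.017",
        "2016-09-12 13:00:03.233",
        "2016-09-12 13:00:10.256",
        "2016-09-12 13:00:19.605",
    ], name="time"),
)
df2 = pd.DataFrame(
    {"value": [1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0]},
    index=pd.DatetimeIndex([
        "2016-09-12 13:00:00.017",
        "2016-09-12 13:00:00.233",
        "2016-09-12 13:00:01.016",
        "2016-09-12 13:00:01.505",
        "2016-09-12 13:00:06.017",
        "2016-09-12 13:00:07.233",
        "2016-09-12 13:00:08.256",
        "2016-09-12 13:00:19.705",
    ], name="time"),
)

# merge_asof needs the key as a regular, sorted column.
left = df2.reset_index()
right = df1.reset_index()

# df2 rows with no df1 timestamp in the preceding second get NaN in "value_y".
merged = pd.merge_asof(left, right, on="time", tolerance=pd.Timedelta("1s"))
result = left.loc[merged["value_y"].isnull()].set_index("time")
print(result)
```

The four rows that remain are the ones listed under #result in the question.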

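Note that matching is done in the backward direction by default: for each row of the left DataFrame (df2), the last row of the right DataFrame (df1) whose "on" key (here "time") is less than or equal to the left key is selected. The tolerance therefore extends only backward, so a df2 row is matched only when a df1 timestamp lies at most 1 second before it.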
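
The one-sided nature of the default match can be seen in a small sketch. The data and the names ref, probes, and ref_time are hypothetical and chosen only for illustration: one reference timestamp with one probe on either side of it.

```python
import pandas as pd

# Hypothetical data: a single reference timestamp ...
ref = pd.DataFrame({"time": pd.to_datetime(["2016-09-12 13:00:03.000"])})
ref["ref_time"] = ref["time"]  # copy of the key so the matched row is visible

# ... and two probes, one before and one after it.
probes = pd.DataFrame({
    "time": pd.to_datetime([
        "2016-09-12 13:00:02.500",  # 0.5 s before the reference
        "2016-09-12 13:00:03.500",  # 0.5 s after the reference
    ])
})

# Default direction is "backward": only reference keys <= the probe key qualify.
backward = pd.merge_asof(probes, ref, on="time", tolerance=pd.Timedelta("1s"))
print(backward)
```

Only the second probe finds a match. The first one is within 0.5 s of the reference but lies before it, so its ref_time is missing.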

To look both backward and forward, pass direction='nearest' in the function call, which is available from 0.20.0 onward. The tolerance then extends in both directions, giving a +/- matching window.
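
As a rough sketch of the two-sided variant, the snippet below uses direction='nearest' to drop every probe within 1 second of a reference timestamp on either side. The data and the names ref, probes, and ref_time are hypothetical, and pandas 0.20.0 or later is assumed.

```python
import pandas as pd

# Hypothetical reference timestamp; ref_time is a copy of the key for inspection.
ref = pd.DataFrame({"time": pd.to_datetime(["2016-09-12 13:00:03.000"])})
ref["ref_time"] = ref["time"]

probes = pd.DataFrame({
    "time": pd.to_datetime([
        "2016-09-12 13:00:02.500",  # 0.5 s before the reference
        "2016-09-12 13:00:03.500",  # 0.5 s after the reference
        "2016-09-12 13:00:05.000",  # 2.0 s after the reference
    ])
})

# direction="nearest" considers the closest reference key on either side.
nearest = pd.merge_asof(
    probes, ref, on="time",
    tolerance=pd.Timedelta("1s"), direction="nearest",
)

# Keep only the probes that found no reference within +/- 1 s.
kept = probes.loc[nearest["ref_time"].isnull()]
print(kept)
```

Here the first two probes are matched and dropped, and only the probe 2 seconds away survives.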

Regarding python - Pandas: remove all rows within a time interval of another Series' time index (i.e. time range exclusion), a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/40512442/
