python - How to remove the extra days added by pandas resample?

I have a pandas df of tick data with a datetime64[ns] index. I want to resample this data to 5-minute intervals, like this:

price_5min = price.price.resample('5T').ohlc().between_time('09:00:00', '16:20:00')

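It works, but it adds weekends and holidays to the new time series, and I need to remove them. I don't follow the US holiday calendar (or any other standard one). I just want to remove the dates that are not in the original price df.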
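To see where the extra days come from, here is a minimal sketch with made-up tick data (the timestamps and prices are hypothetical, not from the question) covering a Friday and the following Monday. `resample` builds a regular 5-minute grid from the first to the last timestamp, so the Saturday and Sunday bars survive `between_time` as all-NaN rows. It uses `"5min"`, which means the same as `"5T"`; the `T` alias is deprecated in recent pandas.

```python
import pandas as pd

# Hypothetical tick data: Friday 2016-06-17 and Monday 2016-06-20, no weekend trades.
# The first timestamp repeats, as in the question's data.
idx = pd.to_datetime([
    "2016-06-17 09:01:42", "2016-06-17 09:01:42", "2016-06-17 16:19:20",
    "2016-06-20 09:02:10", "2016-06-20 16:18:05",
])
price = pd.DataFrame({"price": [29.05, 29.10, 29.85, 29.40, 29.55]}, index=idx)

# Same pipeline as in the question; "5min" is equivalent to "5T"
price_5min = price["price"].resample("5min").ohlc().between_time("09:00", "16:20")

# The regular grid spans the weekend, so Saturday and Sunday show up
print(sorted(set(price_5min.index.date)))
```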

The index is not unique; many timestamps are repeated. Pandas version 0.20.1.

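What I've tried:

1) dropna(): I need to keep the rows that are filled with NaN, so this doesn't work.

2) price.index.difference(price_5min.index): gives me every tick timestamp, not the dates.

3) price.index.date.difference(price_5min.index.date): doesn't work, because index.date is a numpy.ndarray.

4) price != price_5min: error: Can only compare identically-labeled DataFrame objects

5) price.index != price_5min.index: error: Lengths must match to compare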
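Attempts 1 and 3 can be illustrated on hypothetical Friday/Monday tick data (not from the question). `dropna()` removes the empty 5-minute bars inside the trading days as well, not just the weekend. Attempt 3 fails only because `index.date` returns a plain NumPy array; wrapping both arrays in `pd.Index` gives them `.difference()`, which then yields exactly the extra dates. A sketch under those assumptions:

```python
import pandas as pd

# Hypothetical tick data: Friday 2016-06-17 and Monday 2016-06-20
idx = pd.to_datetime(["2016-06-17 09:01:42", "2016-06-17 09:01:42",
                      "2016-06-17 16:19:20", "2016-06-20 09:02:10"])
price = pd.DataFrame({"price": [29.05, 29.10, 29.85, 29.40]}, index=idx)
price_5min = price["price"].resample("5min").ohlc().between_time("09:00", "16:20")

# Attempt 1: dropna() keeps only bars that contain trades, so the empty
# intraday bars on Friday disappear along with the weekend
kept = price_5min.dropna()
print(len(kept), "of", len(price_5min), "rows survive dropna()")

# Attempt 3, fixed: index.date is a numpy.ndarray (no .difference()),
# but a pd.Index built from it supports set operations
extra_days = pd.Index(price_5min.index.date).difference(pd.Index(price.index.date))
print(list(extra_days))
```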

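Possible approaches I've thought of:

a) Somehow compare the dates in the two DataFrames and drop rows accordingly, but how?

b) Remove the days with no variance, but how?

c) Something obvious I haven't thought of (quite likely...)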
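Idea (b) can be made concrete as follows, sketched on hypothetical Friday/Monday data: after resampling, a day on which every 5-minute bar is NaN had no ticks in the 09:00-16:20 window, so it can be treated as a non-trading day. This is an assumption-laden variant, not the accepted answer: unlike comparing against the tick dates, it would also drop a day whose ticks all fall outside the `between_time` window.

```python
import pandas as pd

# Hypothetical tick data: Friday 2016-06-17 and Monday 2016-06-20
idx = pd.to_datetime(["2016-06-17 09:01:42", "2016-06-17 16:19:20",
                      "2016-06-20 09:02:10", "2016-06-20 16:18:05"])
price = pd.DataFrame({"price": [29.05, 29.85, 29.40, 29.55]}, index=idx)
price_5min = price["price"].resample("5min").ohlc().between_time("09:00", "16:20")

# Days that have at least one non-NaN bar, i.e. at least one tick in the window
trading_days = price_5min.dropna().index.normalize().unique()

# Keep every bar (including the empty intraday ones) that belongs to such a day
result = price_5min[price_5min.index.normalize().isin(trading_days)]
```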

The price df looks like this:

                     price  quantity
time                                
2016-06-15 16:19:20  29.85     429.6
2016-06-15 16:19:20  29.85      65.6
2016-06-15 16:19:20  29.85    1351.4
2016-06-15 16:19:30  29.70     729.4
2016-06-15 16:19:30  29.70     287.0
2016-06-15 16:19:30  29.70     219.4
2016-06-15 16:19:49  29.70      47.4
2016-06-15 16:19:52  29.70      11.8
2016-06-16 09:01:42  29.05     350.0
2016-06-16 09:01:42  29.10     189.4
2016-06-16 09:01:45  29.05      33.6
2016-06-16 09:01:54  29.05      33.6
...
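To make the question reproducible, the sample rows above can be loaded into a DataFrame (column names taken from the printout). The sketch also shows that the non-unique index is not a problem for `resample`: repeated timestamps simply fall into the same 5-minute bar. Because these rows cover two consecutive weekdays, no extra days appear in this particular output.

```python
import pandas as pd

# The sample rows shown in the question; several timestamps repeat
rows = [
    ("2016-06-15 16:19:20", 29.85, 429.6),
    ("2016-06-15 16:19:20", 29.85, 65.6),
    ("2016-06-15 16:19:20", 29.85, 1351.4),
    ("2016-06-15 16:19:30", 29.70, 729.4),
    ("2016-06-15 16:19:30", 29.70, 287.0),
    ("2016-06-15 16:19:30", 29.70, 219.4),
    ("2016-06-15 16:19:49", 29.70, 47.4),
    ("2016-06-15 16:19:52", 29.70, 11.8),
    ("2016-06-16 09:01:42", 29.05, 350.0),
    ("2016-06-16 09:01:42", 29.10, 189.4),
    ("2016-06-16 09:01:45", 29.05, 33.6),
    ("2016-06-16 09:01:54", 29.05, 33.6),
]
price = pd.DataFrame(rows, columns=["time", "price", "quantity"])
price["time"] = pd.to_datetime(price["time"])
price = price.set_index("time")

# resample does not need a unique index: repeated timestamps land in the same bin
price_5min = price["price"].resample("5min").ohlc().between_time("09:00", "16:20")
print(price_5min.dropna())
```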

Best answer

I think you can use np.setdiff1d and numpy.in1d, and filter with boolean indexing:

import numpy as np

diffs = np.setdiff1d(price_5min.index.date, price.index.date)
df = price_5min[~np.in1d(price_5min.index.date, diffs)]
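A runnable version of the first approach, on hypothetical Friday/Monday tick data (not from the question). It swaps `np.in1d` for `np.isin`, which performs the same membership test and is the name NumPy recommends since `np.in1d` was deprecated in NumPy 2.0. The NaN bars inside the trading days are kept; only the weekend dates are removed.

```python
import numpy as np
import pandas as pd

# Hypothetical tick data: Friday 2016-06-17 and Monday 2016-06-20
idx = pd.to_datetime(["2016-06-17 09:01:42", "2016-06-17 09:01:42",
                      "2016-06-17 16:19:20", "2016-06-20 09:02:10",
                      "2016-06-20 16:18:05"])
price = pd.DataFrame({"price": [29.05, 29.10, 29.85, 29.40, 29.55]}, index=idx)
price_5min = price["price"].resample("5min").ohlc().between_time("09:00", "16:20")

# Dates present in the resampled frame but not in the tick data (the weekend)
diffs = np.setdiff1d(price_5min.index.date, price.index.date)

# Boolean mask: True for rows whose date is NOT one of the extra days
df = price_5min[~np.isin(price_5min.index.date, diffs)]
```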

Another solution, with DatetimeIndex.floor or to_period:

dates = price.index.floor('D')
dates_5min = price_5min.index.floor('D')
df = price_5min[~dates_5min.isin(dates_5min.difference(dates))]
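The `floor` variant on the same kind of hypothetical data. `floor("D")` maps every timestamp to midnight of its day, so `difference` compares whole days rather than individual ticks, which is why attempt 2 in the question returned timestamps instead of dates.

```python
import pandas as pd

# Hypothetical tick data: Friday 2016-06-17 and Monday 2016-06-20
idx = pd.to_datetime(["2016-06-17 09:01:42", "2016-06-17 16:19:20",
                      "2016-06-20 09:02:10", "2016-06-20 16:18:05"])
price = pd.DataFrame({"price": [29.05, 29.85, 29.40, 29.55]}, index=idx)
price_5min = price["price"].resample("5min").ohlc().between_time("09:00", "16:20")

# Truncate both indexes to midnight so they can be compared day by day
dates = price.index.floor("D")
dates_5min = price_5min.index.floor("D")

# Midnights that exist only after resampling are the extra days; drop their rows
extra = dates_5min.difference(dates)
df = price_5min[~dates_5min.isin(extra)]
```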

or:

dates = price.index.to_period('D')
dates_5min = price_5min.index.to_period('D')
df = price_5min[~dates_5min.isin(dates_5min.difference(dates))]
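A possible simplification of the `to_period` version, not part of the original answer: since the goal is to keep only the days that occur in the tick data, the `difference` step can be skipped and the daily periods tested directly with `isin`. A sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical tick data: Friday 2016-06-17 and Monday 2016-06-20
idx = pd.to_datetime(["2016-06-17 09:01:42", "2016-06-17 16:19:20",
                      "2016-06-20 09:02:10", "2016-06-20 16:18:05"])
price = pd.DataFrame({"price": [29.05, 29.85, 29.40, 29.55]}, index=idx)
price_5min = price["price"].resample("5min").ohlc().between_time("09:00", "16:20")

# Daily periods for the ticks and for the resampled bars
days = price.index.to_period("D")
days_5min = price_5min.index.to_period("D")

# Positive filter: keep bars whose day appears in the tick data
df = price_5min[days_5min.isin(days)]
```

This selects the same rows as the double negation above; it just avoids computing the extra days explicitly.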

A similar question, python - How to remove the extra days added by pandas resample?, can be found on Stack Overflow: https://stackoverflow.com/questions/44900011/
