python - Resampling in 10-minute bins

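I have a pandas DataFrame with the following columns:

col1, col2, _time

The _time column holds datetime objects giving the time at which each row occurred.

I want to resample the DataFrame into 10-minute bins, group by the two columns, and count the rows of each group that fall within each 10-minute bin. The resulting DataFrame should have the following columns:

col1 col2 since until count

where since is the start of each 10-minute period, until is the end of each 10-minute period, and count is the number of rows found in the original DataFrame for that group and period, for example: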

col1  col2          since                  until         count
1      1       08/12/2017 12:00      08/12/2017 12:10       10
1      2       08/12/2017 12:00      08/12/2017 12:10        5
1      1       08/12/2017 12:10      08/12/2017 12:20        3

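Can this be done with the DataFrame's resample method?

Best answer

I had also been trying to do this with resample, without success. Luckily, I found a solution using pd.Series.dt.floor!

Use .dt.floor to align the timestamps to 10-minute intervals.
Use the resulting object in the groupby (or, optionally, assign it to a column of the source data and group by that column).
Use pd.to_timedelta to compute the until column from the since column.

For example: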

import pandas as pd

interval = '10min'  # 10 minutes intervals, please

# Dummy data with 3-minute intervals
data = pd.DataFrame({
    'col1': [0, 0, 1, 0, 0, 0, 1, 0, 1, 1],
    'col2': [4, 4, 4, 3, 4, 4, 3, 3, 4, 4],
    '_time': pd.date_range(start='2010-01-01 00:01:00', freq='3min', periods=10),
})

# Floor the timestamps to your desired interval
since = data['_time'].dt.floor(interval).rename('since')

# Get the size of each group - groups are in the index of `agg`
agg = data.groupby(['col1', 'col2', since]).size()
agg = agg.rename('count')

# Back to dataframe
agg = agg.reset_index()

# Simply add your interval to `since`
agg['until'] = agg['since'] + pd.to_timedelta(interval)

print(agg)

   col1  col2               since  count               until
0     0     3 2010-01-01 00:10:00      1 2010-01-01 00:20:00
1     0     3 2010-01-01 00:20:00      1 2010-01-01 00:30:00
2     0     4 2010-01-01 00:00:00      2 2010-01-01 00:10:00
3     0     4 2010-01-01 00:10:00      2 2010-01-01 00:20:00
4     1     3 2010-01-01 00:10:00      1 2010-01-01 00:20:00
5     1     4 2010-01-01 00:00:00      1 2010-01-01 00:10:00
6     1     4 2010-01-01 00:20:00      2 2010-01-01 00:30:00
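
Since the question asks about resample specifically, a closely related option is to pass pd.Grouper with a freq to groupby. This is not part of the answer above, only a minimal sketch that assumes the same dummy data; the variable name alt is illustrative. It also reorders the columns to match the layout requested in the question.

```python
import pandas as pd

interval = '10min'

# Same dummy data as in the answer above
data = pd.DataFrame({
    'col1': [0, 0, 1, 0, 0, 0, 1, 0, 1, 1],
    'col2': [4, 4, 4, 3, 4, 4, 3, 3, 4, 4],
    '_time': pd.date_range(start='2010-01-01 00:01:00', freq='3min', periods=10),
})

# pd.Grouper bins `_time` into left-labelled 10-minute intervals
grouped = data.groupby(['col1', 'col2', pd.Grouper(key='_time', freq=interval)])

# Count rows per group, then turn the group keys back into columns
alt = grouped.size().rename('count').reset_index()

# The binned `_time` column holds the start of each interval
alt = alt.rename(columns={'_time': 'since'})
alt['until'] = alt['since'] + pd.to_timedelta(interval)

# Column order requested in the question
alt = alt[['col1', 'col2', 'since', 'until', 'count']]

print(alt)
```

With the default settings of pd.Grouper, the bin labels are the left edges of the intervals, so since should line up with the values produced by .dt.floor above.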

For python - Resampling in 10-minute bins, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/45547635/
