python - Resample a pandas DataFrame and return start and end times

Tags: python pandas dataframe resampling

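I have a pandas DataFrame that I want to resample into 10-second bins for each id. I also want to extend the output so that it returns, for each id, the start and end time of every sample, that is, the first and last timestamps that fall into the bin. The DataFrame, the expected output, and what I have tried are shown below.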

DataFrame:

id,date,value
1,2012-01-01 00:09:45,1
1,2012-01-01 00:09:46,1
2,2012-01-01 00:09:47,1
1,2012-01-01 00:09:47,1
2,2012-01-01 00:09:48,1
1,2012-01-01 00:09:51,1
1,2012-01-01 00:09:52,1
1,2012-01-01 00:09:53,1
2,2012-01-01 00:10:00,1
2,2012-01-01 00:10:01,1
2,2012-01-01 00:10:04,1
2,2012-01-01 00:10:05,1
2,2012-01-01 00:10:06,1
3,2012-01-01 00:30:04,1
3,2012-01-01 00:30:05,1
3,2012-01-01 00:30:06,1
3,2012-01-01 00:30:08,1
3,2012-01-01 00:30:09,1
2,2012-01-01 00:30:18,1
2,2012-01-01 00:30:19,1
2,2012-01-01 00:30:23,1
2,2012-01-01 00:30:24,1
3,2012-01-01 00:30:25,1
3,2012-01-01 00:30:26,1
3,2012-01-01 00:30:29,1
3,2012-01-01 00:30:30,1
3,2012-01-01 00:30:32,1
3,2012-01-01 00:30:33,1

Expected output:

id,date,value,start-time,end-time
1,2012-01-01 00:09:40,3,2012-01-01 00:09:45,2012-01-01 00:09:47
2,2012-01-01 00:09:40,2,2012-01-01 00:09:47,2012-01-01 00:09:48
1,2012-01-01 00:09:50,3,2012-01-01 00:09:51,2012-01-01 00:09:53
2,2012-01-01 00:10:00,5,2012-01-01 00:10:00,2012-01-01 00:10:06
3,2012-01-01 00:30:00,5,2012-01-01 00:30:04,2012-01-01 00:30:09
2,2012-01-01 00:30:10,2,2012-01-01 00:30:18,2012-01-01 00:30:19
2,2012-01-01 00:30:20,2,2012-01-01 00:30:23,2012-01-01 00:30:24
3,2012-01-01 00:30:20,3,2012-01-01 00:30:25,2012-01-01 00:30:29
3,2012-01-01 00:30:30,3,2012-01-01 00:30:30,2012-01-01 00:30:33

Here is what I have done so far to produce the output:

import pandas as pd
df = pd.read_csv('df.csv')
df['date'] = pd.to_datetime(df['date'])
df_resampled = df.set_index('date').groupby('id').resample('10s')['value'].sum().reset_index()
df = df_resampled[df_resampled['value']!=0]
print(df.sort_values(['date']))

Output so far:

id,date,value
1,2012-01-01 00:09:40,3
2,2012-01-01 00:09:40,2
1,2012-01-01 00:09:50,3
2,2012-01-01 00:10:00,5
3,2012-01-01 00:30:00,5
2,2012-01-01 00:30:10,2
2,2012-01-01 00:30:20,2
3,2012-01-01 00:30:20,3
3,2012-01-01 00:30:30,3

How can I extend this simple code to include the start and end times of each 10-second sample for every id?

Best answer

Try:

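import pandas as pd

# df is the DataFrame loaded in the question (columns: id, date, value)
df["date"] = pd.to_datetime(df["date"])
# "date" is consumed as the grouping key, so keep a copy to aggregate on
df["date2"] = df["date"]

# group by id and by 10-second bins of "date"
x = (
    df.groupby(["id", pd.Grouper(freq="10s", key="date")])
    .agg({"value": ["sum"], "date2": ["first", "last"]})
    .reset_index()
)
# flatten the two-level column index, e.g. ("value", "sum") -> "value sum"
x.columns = x.columns.map(" ".join).str.strip()
x = x.rename(
    columns={
        "value sum": "value",
        "date2 first": "start-time",
        "date2 last": "end-time",
    }
).sort_values(by="date")
print(x)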

Prints:

   id                date  value          start-time            end-time
0   1 2012-01-01 00:09:40      3 2012-01-01 00:09:45 2012-01-01 00:09:47
2   2 2012-01-01 00:09:40      2 2012-01-01 00:09:47 2012-01-01 00:09:48
1   1 2012-01-01 00:09:50      3 2012-01-01 00:09:51 2012-01-01 00:09:53
3   2 2012-01-01 00:10:00      5 2012-01-01 00:10:00 2012-01-01 00:10:06
6   3 2012-01-01 00:30:00      5 2012-01-01 00:30:04 2012-01-01 00:30:09
4   2 2012-01-01 00:30:10      2 2012-01-01 00:30:18 2012-01-01 00:30:19
5   2 2012-01-01 00:30:20      2 2012-01-01 00:30:23 2012-01-01 00:30:24
7   3 2012-01-01 00:30:20      3 2012-01-01 00:30:25 2012-01-01 00:30:29
8   3 2012-01-01 00:30:30      3 2012-01-01 00:30:30 2012-01-01 00:30:33
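
As a variation on the answer above, the same result can probably be reached with named aggregation, which avoids flattening a two-level column index afterwards. This is only a sketch under a few assumptions: the helper column name `ts` is made up for illustration, only the first eight rows of the question's data are used, and `min`/`max` are swapped in for `first`/`last` so the result does not depend on the rows being sorted by time.

```python
import io

import pandas as pd

# First eight rows of the question's data, inlined so the sketch is self-contained.
csv_text = """id,date,value
1,2012-01-01 00:09:45,1
1,2012-01-01 00:09:46,1
2,2012-01-01 00:09:47,1
1,2012-01-01 00:09:47,1
2,2012-01-01 00:09:48,1
1,2012-01-01 00:09:51,1
1,2012-01-01 00:09:52,1
1,2012-01-01 00:09:53,1
"""

df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

# "ts" is a hypothetical helper column: "date" is used as the grouping key,
# so a copy is kept for computing the start and end times.
df = df.assign(ts=df["date"])

out = (
    df.groupby(["id", pd.Grouper(key="date", freq="10s")])
    .agg(
        # Names with hyphens are not valid keyword arguments,
        # so they are passed through dict unpacking.
        **{
            "value": ("value", "sum"),
            "start-time": ("ts", "min"),
            "end-time": ("ts", "max"),
        }
    )
    .reset_index()
    .sort_values(["date", "id"])
)
print(out)
```

The resulting columns are `id`, `date`, `value`, `start-time`, `end-time`, matching the layout asked for in the question. If the first and last rows in file order are wanted rather than the earliest and latest timestamps, `"first"` and `"last"` can be used in place of `"min"` and `"max"`.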

Regarding python - Resample a pandas DataFrame and return start and end times, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/68262497/
