如何在对数据帧进行上采样时拆分持续时间,如下例所示。 我可以用例如替换 for 循环吗? group_by 函数?
我想用 pandas 像这样转换数据:
activity name time started time ended
0 Bedtime 2021-10-25 00:00:00 2021-10-25 08:25:42
1 videos 2021-10-25 08:25:42 2021-10-25 08:51:54
2 Commute 2021-10-25 08:51:54 2021-10-25 09:29:34
进入这个:
time started Bedtime videos Commute
2021-10-25 00:00:00 0 days 01:00:00 0 days 00:00:00 0 days
2021-10-25 01:00:00 0 days 01:00:00 0 days 00:00:00 0 days
2021-10-25 02:00:00 0 days 01:00:00 0 days 00:00:00 0 days
2021-10-25 03:00:00 0 days 01:00:00 0 days 00:00:00 0 days
2021-10-25 04:00:00 0 days 01:00:00 0 days 00:00:00 0 days
2021-10-25 05:00:00 0 days 01:00:00 0 days 00:00:00 0 days
2021-10-25 06:00:00 0 days 01:00:00 0 days 00:00:00 0 days
2021-10-25 07:00:00 0 days 01:00:00 0 days 00:00:00 0 days
2021-10-25 08:00:00 0 days 00:25:42 0 days 00:26:12 0 days 00:08:06
...
我走到这一步了:
import pandas as pd
df=pd.DataFrame({'activity name':['Bedtime','videos','Commute'],'time started':["2021-10-25 00:00:00","2021-10-25 08:25:42","2021-10-25 08:51:54"],'time ended':["2021-10-25 08:25:42","2021-10-25 08:51:54","2021-10-25 09:29:34"]})
# converting strings to datetime
df['time ended']=pd.to_datetime(df['time ended'])
df['time started']=pd.to_datetime(df['time started'])
# calclating the duration
df['duration']=df['time ended']-df['time started']
# changeing index
df.index=df['time started']
df=df.drop(columns=['time started','time ended'])
for a in df['activity name'].unique():
df[a]=(df['activity name']==a)*df['duration']
df=df.drop(columns=['activity name','duration'])
df.resample('H').first()
time started
2021-10-25 00:00:00 0 days 08:25:42 0 days 00:00:00 0 days
2021-10-25 01:00:00 NaT NaT NaT
2021-10-25 02:00:00 NaT NaT NaT
2021-10-25 03:00:00 NaT NaT NaT
2021-10-25 04:00:00 NaT NaT NaT
2021-10-25 05:00:00 NaT NaT NaT
2021-10-25 06:00:00 NaT NaT NaT
2021-10-25 07:00:00 NaT NaT NaT
2021-10-25 08:00:00 0 days 00:00:00 0 days 00:26:12 0 days
最佳答案
试试这个:
import pandas as pd
from io import StringIO
txtfile = StringIO(
""" activity name time started time ended
0 Bedtime 2021-10-25 00:00:00 2021-10-25 08:25:42
1 videos 2021-10-25 08:25:42 2021-10-25 08:51:54
2 Commute 2021-10-25 08:51:54 2021-10-25 09:29:34"""
)
df = pd.read_csv(txtfile, sep="\s\s+", engine="python")
df[["time started", "time ended"]] = df[["time started", "time ended"]].apply(
pd.to_datetime
)
df_e = df.assign(
date=[
pd.date_range(s, e, freq="s")
for s, e in zip(df["time started"], df["time ended"])
]
).explode("date")
df_out = (
df_e.groupby(["activity name", pd.Grouper(key="date", freq="H")])["activity name"]
.count()
.unstack(0)
.apply(pd.to_timedelta, unit="s")
)
print(df_out)
输出:
activity name Bedtime Commute videos
date
2021-10-25 00:00:00 0 days 01:00:00 NaT NaT
2021-10-25 01:00:00 0 days 01:00:00 NaT NaT
2021-10-25 02:00:00 0 days 01:00:00 NaT NaT
2021-10-25 03:00:00 0 days 01:00:00 NaT NaT
2021-10-25 04:00:00 0 days 01:00:00 NaT NaT
2021-10-25 05:00:00 0 days 01:00:00 NaT NaT
2021-10-25 06:00:00 0 days 01:00:00 NaT NaT
2021-10-25 07:00:00 0 days 01:00:00 NaT NaT
2021-10-25 08:00:00 0 days 00:25:43 0 days 00:08:06 0 days 00:26:13
2021-10-25 09:00:00 NaT 0 days 00:29:35 NaT
地址@DerekO评论:
import pandas as pd
from io import StringIO
txtfile = StringIO(
""" activity name time started time ended
0 Bedtime 2021-10-25 00:00:00 2021-10-25 08:25:42
1 videos 2021-10-25 08:25:42 2021-10-25 08:51:54
2 Commute 2021-10-25 08:51:54 2021-10-25 09:29:34
3 Bedtime 2021-10-25 11:00:00 2021-10-25 13:04:31"""
)
df = pd.read_csv(txtfile, sep="\s\s+", engine="python")
df[["time started", "time ended"]] = df[["time started", "time ended"]].apply(
pd.to_datetime
)
df_e = df.assign(
date=[
pd.date_range(s, e, freq="s")
for s, e in zip(df["time started"], df["time ended"])
]
).explode("date")
df_out = (
df_e.groupby(["activity name", pd.Grouper(key="date", freq="H")])["activity name"]
.count()
.unstack(0)
.apply(pd.to_timedelta, unit="s")
.sort_index()
)
print(df_out)
输出:
activity name Bedtime Commute videos
date
2021-10-25 00:00:00 0 days 01:00:00 NaT NaT
2021-10-25 01:00:00 0 days 01:00:00 NaT NaT
2021-10-25 02:00:00 0 days 01:00:00 NaT NaT
2021-10-25 03:00:00 0 days 01:00:00 NaT NaT
2021-10-25 04:00:00 0 days 01:00:00 NaT NaT
2021-10-25 05:00:00 0 days 01:00:00 NaT NaT
2021-10-25 06:00:00 0 days 01:00:00 NaT NaT
2021-10-25 07:00:00 0 days 01:00:00 NaT NaT
2021-10-25 08:00:00 0 days 00:25:43 0 days 00:08:06 0 days 00:26:13
2021-10-25 09:00:00 NaT 0 days 00:29:35 NaT
2021-10-25 11:00:00 0 days 01:00:00 NaT NaT
2021-10-25 12:00:00 0 days 01:00:00 NaT NaT
2021-10-25 13:00:00 0 days 00:04:32 NaT NaT
关于python - 在上采样数据帧时拆分持续时间,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/70319624/