I have the following DataFrame:
      Sleep Stage  Time[hh:mm:ss]      Event  Duration[s]
0        SLEEP-S0        23:27:14   SLEEP-S0           30
1        SLEEP-S0        23:27:44   SLEEP-S0           30
2        SLEEP-MT        23:28:14   SLEEP-MT           30
3        SLEEP-S0        23:28:44   SLEEP-S0           30
4        SLEEP-S0        23:29:14   SLEEP-S0           30
...           ...             ...        ...          ...
1022    SLEEP-REM        07:57:14  SLEEP-REM           30
1023    SLEEP-REM        07:57:44  SLEEP-REM           30
1024     SLEEP-S2        07:58:14   SLEEP-S2           30
1025     SLEEP-S2        07:58:44   SLEEP-S2           30
1026     SLEEP-S2        07:59:14   SLEEP-S2           30
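I want to create segments that contain the stage, the start time of that stage, and the end time of that stage. Note that stages can repeat. For example, I would like something like this: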
STAGE     START_POINT  END_POINT  DURATION
SLEEP-S0  23:27:14     23:28:14   60
SLEEP-MT  23:28:14     23:28:44   30
SLEEP-S0  ...          ...        ...
Can anyone help me?
Best answer
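Your problem is a variant of the gaps-and-islands problem. Every time Stage changes from one row to the next, a new island begins. The end of one island is the start of the next.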
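To see how the island labels come out, here is a minimal sketch on a made-up sequence of stages (the values below are hypothetical and only for illustration). Comparing each row with the previous one flags the rows where the stage changes, and a running total of those flags gives every consecutive run its own number:

```python
import pandas as pd

# Hypothetical toy sequence of stages, for illustration only
stages = pd.Series(["S0", "S0", "MT", "S0", "S0", "REM"], name="Stage")

# True wherever the value differs from the previous row
# (the first row is always True because it is compared with NaN)
changed = stages.ne(stages.shift())

# The running total of those flags gives each consecutive run its own label
islands = changed.cumsum()

print(islands.tolist())  # [1, 1, 2, 3, 3, 4]
```

The second run of "S0" gets label 3 rather than 1, which is why grouping by the island label keeps repeated stages apart.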
import pandas as pd

# Assumes df has columns named "Stage" and "Time"; rename yours if they differ
# Every time the Stage changes from the previous row, a new island starts
islands = df["Stage"].ne(df["Stage"].shift()).rename("island").cumsum()
# Parse the Time column with an explicit format; pandas fills in the date 1900-01-01
time = pd.to_datetime(df["Time"], format="%H:%M:%S")
# Left alone, "00:01:00" would fall on the same day as "23:59:00", but it most
# likely means 12:01 AM *the next day*. Times in the wee hours need special
# handling, so every time before noon is pushed forward by 24 hours.
noon = pd.Timestamp("1900-01-01 12:00:00")
time = time.mask(time < noon, time + pd.Timedelta(hours=24))
# Each island has the same Stage across all rows
result = time.groupby([islands, df["Stage"]]).agg(START_POINT="min")
# END_POINT is START_POINT of the next island (NaT for the last island)
result["END_POINT"] = result["START_POINT"].shift(-1)
# DURATION is measured in seconds
result["DURATION"] = (result["END_POINT"] - result["START_POINT"]) / pd.Timedelta(seconds=1)
# Format START_POINT and END_POINT back to strings
for col in ["START_POINT", "END_POINT"]:
    result[col] = result[col].dt.strftime("%H:%M:%S")
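With the approach above, the last island has no following island, so its END_POINT and DURATION come out empty. One possible variation, not part of the original answer, is to derive each row's end from its own duration and take the latest end per island. The sketch below assumes simplified column names ("Stage", "Time", "Duration"), a small made-up sample, and rows that follow each other without gaps. It also detects the midnight crossing by looking for the clock running backwards rather than by comparing against noon.

```python
import pandas as pd

# Hypothetical sample mirroring the question's layout, with simplified column names
df = pd.DataFrame({
    "Stage": ["SLEEP-S0", "SLEEP-S0", "SLEEP-MT", "SLEEP-S0", "SLEEP-S0"],
    "Time": ["23:59:00", "23:59:30", "00:00:00", "00:00:30", "00:01:00"],
    "Duration": [30, 30, 30, 30, 30],
})

# Parse the clock times as offsets from midnight
start = pd.to_timedelta(df["Time"])
# Each time the clock runs backwards we have crossed midnight, so add one more day
days = start.diff().lt(pd.Timedelta(0)).cumsum()
start = start + pd.to_timedelta(days, unit="D")
# A row ends once its own duration (in seconds) has elapsed
end = start + pd.to_timedelta(df["Duration"], unit="s")

# Label each run of consecutive identical stages
islands = df["Stage"].ne(df["Stage"].shift()).cumsum()

segments = (
    pd.DataFrame({"STAGE": df["Stage"], "START_POINT": start, "END_POINT": end})
    .groupby(islands)
    .agg(
        STAGE=("STAGE", "first"),
        START_POINT=("START_POINT", "min"),
        END_POINT=("END_POINT", "max"),
    )
    .reset_index(drop=True)
)
segments["DURATION"] = (segments["END_POINT"] - segments["START_POINT"]).dt.total_seconds()

# Render the offsets as HH:MM:SS again; adding them to a fixed date wraps past midnight
for col in ["START_POINT", "END_POINT"]:
    segments[col] = (pd.Timestamp("1970-01-01") + segments[col]).dt.strftime("%H:%M:%S")

print(segments)
```

On this sample the three segments last 60, 30 and 60 seconds, and the final segment gets an end time of 00:01:30 instead of a missing value. If the recording has gaps between rows, the two approaches will disagree, and which one is appropriate depends on how the gaps should be treated.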
Regarding python - Extracting stages/segments from pandas dataframe and corresponding timestamps, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/74083683/