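python - Taking time points and making labels against datetime objects to correlate things around those points

Tags: python pandas

I am trying to use the usual times at which I take my medication (plus the 4 hours that follow each one) to fill in a data frame column with a label of 2, 1, or 0 marking when I am on this medication, with 2 for the hour after the medication, when I have just come off it.
Here is a sample of the data frame to which I am trying to add this column: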

<bound method NDFrame.to_clipboard of                           id  sentiment  magnitude  angry  disgusted  fearful  \
created                                                                         
2020-05-21 12:00:00     23.0  -0.033333        0.5    NaN        NaN      NaN   
2020-05-21 12:15:00      NaN        NaN        NaN    NaN        NaN      NaN   
2020-05-21 12:30:00      NaN        NaN        NaN    NaN        NaN      NaN   
2020-05-21 12:45:00      NaN        NaN        NaN    NaN        NaN      NaN   
2020-05-21 13:00:00      NaN        NaN        NaN    NaN        NaN      NaN   
...                      ...        ...        ...    ...        ...      ...   
2021-04-20 00:45:00      NaN        NaN        NaN    NaN        NaN      NaN   
2021-04-20 01:00:00      NaN        NaN        NaN    NaN        NaN      NaN   
2021-04-20 01:15:00      NaN        NaN        NaN    NaN        NaN      NaN   
2021-04-20 01:30:00      NaN        NaN        NaN    NaN        NaN      NaN   
2021-04-20 01:45:00  46022.0  -1.000000        1.0    NaN        NaN      NaN   

                     happy  neutral  sad  surprised  
created                                              
2020-05-21 12:00:00    NaN      NaN  NaN        NaN  
2020-05-21 12:15:00    NaN      NaN  NaN        NaN  
2020-05-21 12:30:00    NaN      NaN  NaN        NaN  
2020-05-21 12:45:00    NaN      NaN  NaN        NaN  
2020-05-21 13:00:00    NaN      NaN  NaN        NaN  
...                    ...      ...  ...        ...  
2021-04-20 00:45:00    NaN      NaN  NaN        NaN  
2021-04-20 01:00:00    NaN      NaN  NaN        NaN  
2021-04-20 01:15:00    NaN      NaN  NaN        NaN  
2021-04-20 01:30:00    NaN      NaN  NaN        NaN  
2021-04-20 01:45:00    NaN      NaN  NaN        NaN  

[32024 rows x 10 columns]>
And here are the timestamps at which I usually take my medication:
['09:00 AM', '12:00 PM', '03:00 PM']
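How would I use these timestamps to produce such a column?
Update
Building on this question, how can I make sure the medication label is only added where data is available, and that the one-hour post-medication period is applied correctly?
Thanks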

Best answer

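Use np.select() to pick the appropriate label for the given conditions.
First, dropna() the rows where all values after created are null (subset=df.columns[1:]). You can change subset depending on your needs (e.g., subset=['id'] if a row should be dropped just because its id is null).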
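As a quick check of the dropna() step on its own, here is a minimal sketch on a made-up toy frame (the values and the name `toy` are illustrative, not taken from the question's data). One detail worth keeping in mind: dropna() defaults to how="any", so dropping only the rows where every listed column is null requires how="all".

```python
import numpy as np
import pandas as pd

# Toy frame with made-up values; `created` is column 0, as the answer assumes.
toy = pd.DataFrame({
    "created": pd.to_datetime(
        ["2020-05-21 12:00", "2020-05-21 12:15", "2020-05-21 12:30"]
    ),
    "id": [23.0, np.nan, np.nan],
    "sentiment": [-0.03, np.nan, 0.5],
    "angry": [np.nan, np.nan, np.nan],
})

# how="all": drop a row only when every column after `created` is null.
kept_all = toy.dropna(subset=toy.columns[1:], how="all")

# Default how="any": drop a row as soon as one listed column is null.
kept_any = toy.dropna(subset=toy.columns[1:])

# Restricting subset to `id` drops every row whose id is null.
kept_id = toy.dropna(subset=["id"])

print(len(kept_all), len(kept_any), len(kept_id))  # 2 0 1
```

Because the `angry` column is entirely null here, the default how="any" removes every row, which is why the choice of `how` and `subset` matters for data shaped like the question's.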
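Then, based on duration, generate arrays of times for the medication's taken, active, and after periods. Check whether each created time matches any of the times in active (label 1) or after (label 2), and otherwise default to 0.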

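import numpy as np
import pandas as pd

# assumes `created` is a regular column (column 0); if it is the index, call df.reset_index() first
# drop rows that are empty except for column 0 (i.e., except for df.created);
# how='all' is needed because dropna() defaults to how='any'
df.dropna(subset=df.columns[1:], how='all', inplace=True)

# convert times to datetime
df.created = pd.to_datetime(df.created)
taken = pd.to_datetime(['09:00:00', '12:00:00', '15:00:00'])

# generate time arrays ('h' replaces the 'H' alias, which recent pandas versions deprecate)
duration = 2 # hours
active = np.array([(taken + pd.Timedelta(f'{h}h')).time for h in range(duration)]).ravel()
after = (taken + pd.Timedelta(f'{duration}h')).time

# define boolean masks by label, comparing the hour each row falls in
created_hour = df.created.dt.floor('h').dt.time
conditions = {
    1: created_hour.isin(active),
    2: created_hour.isin(after),
}

# create medication column with np.select(); the first matching condition wins
df['medication'] = np.select(list(conditions.values()), list(conditions.keys()), default=0)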
Here is the output with some slightly modified data that better demonstrates the active/after/NaN scenarios:
               created       id  sentiment  magnitude  medication
0  2020-05-21 12:00:00     23.0  -0.033333        0.5           1
3  2020-05-21 12:45:00     39.0  -0.500000        0.5           1
4  2020-05-21 13:00:00     90.0  -0.500000        0.5           1
5  2020-05-21 13:15:00    100.0  -0.033333        0.1           1
9  2020-05-21 14:15:00   1000.0   0.033333        0.5           2
10 2020-05-21 14:30:00      3.0   0.001000        1.0           2
17 2021-04-20 01:00:00  46022.0  -1.000000        1.0           0
20 2021-04-20 01:45:00  46022.0  -1.000000        1.0           0
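
In the question's data, created is the index rather than a column, and the dose times are 12-hour strings such as '09:00 AM'. The following is a minimal sketch of one way to adapt the same np.select() idea to that layout. The helper name `label_medication`, its parameters, and the toy data are hypothetical, and the defaults of 4 active hours plus 1 post-medication hour are only one reading of the question, not something the answer confirms.

```python
import numpy as np
import pandas as pd

def label_medication(index, taken, active_hours=4, after_hours=1):
    """Label timestamps: 1 = medicated, 2 = just off medication, 0 = neither.

    Hypothetical helper; `index` is a DatetimeIndex and `taken` holds
    12-hour strings like '09:00 AM'.
    """
    doses = pd.to_datetime(list(taken), format="%I:%M %p")
    dose_min = (doses.hour * 60 + doses.minute).to_numpy()
    row_min = (index.hour * 60 + index.minute).to_numpy()

    # Minutes elapsed since each dose, wrapped at midnight; shape (rows, doses).
    elapsed = (row_min[:, None] - dose_min[None, :]) % (24 * 60)

    active_end = active_hours * 60
    after_end = (active_hours + after_hours) * 60
    active = (elapsed < active_end).any(axis=1)
    after = ((elapsed >= active_end) & (elapsed < after_end)).any(axis=1)

    # np.select takes the first matching condition, so a row that is still
    # active because of a later dose gets 1 rather than 2.
    return np.select([active, after], [1, 2], default=0)

# Toy data (made up) indexed by `created`, like the question's frame.
idx = pd.to_datetime([
    "2020-05-21 08:45", "2020-05-21 12:00", "2020-05-21 18:45",
    "2020-05-21 19:15", "2020-05-21 20:00",
])
df = pd.DataFrame({"id": [1.0, 23.0, np.nan, 4.0, 5.0]}, index=idx)
df.index.name = "created"

# Label only the rows that actually contain data.
df = df.dropna(subset=["id"])
df["medication"] = label_medication(df.index, ["09:00 AM", "12:00 PM", "03:00 PM"])
print(df["medication"].tolist())  # [0, 1, 2, 0]
```

Working in minutes since midnight avoids flooring to the hour, so a dose taken at a time such as 09:30 would still get windows of the intended length.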

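Regarding "python - Taking time points and making labels against datetime objects to correlate things around those points", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/67252795/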
