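I have two separate data frames. Both contain timestamps and corresponding values. My goal is to subset, or provide a bool index, where the times of one df fall between two time points of the second df.
df contains a name and start/end times. I want to apply this information to df2. So, where the name is the same (John), I want to provide a bool index that is True if Time falls within the Start Time and End Time range.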
import pandas as pd

df = pd.DataFrame({
'Start Time' : ['2010-03-20 09:27:00','2010-03-20 10:15:00','2010-03-20 11:10:38','2010-03-20 11:32:15','2010-03-20 11:45:38'],
'End Time' : ['2010-03-20 09:40:00','2010-03-20 10:32:15','2010-03-20 11:35:38','2010-03-20 11:38:15','2010-03-20 11:50:38'],
"Name":['John', 'Brian', 'Suni', 'Gary', 'Li'],
"Occ":[1, 2, 3, 4, 5],
})
df2 = pd.DataFrame({
'Time' : ['2010-03-20 09:27:28','2010-03-20 09:29:15','2010-03-20 09:30:38','2010-03-20 09:32:15','2010-03-20 09:38:38',
'2010-03-20 10:15:08','2010-03-20 10:16:36','2010-03-20 10:30:12','2010-03-20 10:31:08','2010-03-20 10:32:48'],
'Name':['John', 'John', 'John', 'John', 'John',
'John', 'John', 'Li', 'Li', 'Li'],
})
df['Start Time'] = pd.to_datetime(df['Start Time'])
df['End Time'] = pd.to_datetime(df['End Time'])
df2['Time'] = pd.to_datetime(df2['Time'])
mask1 = (df2['Time'] > df['Start Time']) & (df2['Time'] < df['End Time'])
This raises:
ValueError: Can only compare identically-labeled Series objects
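For context, this error appears to come from comparing two Series whose indexes differ (df2 has 10 rows, df has 5), so pandas cannot line the values up element by element. A minimal sketch that reproduces it, using two short made-up Series (`times` and `starts` are illustrative names, not from the question):

```python
import pandas as pd

# Two Series of different lengths, hence different index labels (0..2 vs 0..1)
times = pd.Series(pd.to_datetime(['2010-03-20 09:27:28',
                                  '2010-03-20 09:29:15',
                                  '2010-03-20 09:30:38']))
starts = pd.Series(pd.to_datetime(['2010-03-20 09:27:00',
                                   '2010-03-20 10:15:00']))

try:
    times > starts          # element-wise comparison needs matching labels
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

Any approach therefore has to pair each row of df2 with the relevant row(s) of df first, for example through a merge on Name.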
Intended output: use the mask to find the rows that meet the condition and subset df2.
True
True
True
True
True
False
False
False
False
False
Time Name
0 2010-03-20 09:27:28 John
1 2010-03-20 09:29:15 John
2 2010-03-20 09:30:38 John
3 2010-03-20 09:32:15 John
4 2010-03-20 09:38:38 John
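Edit 2:
Is it possible to assign values from df to df2 where the same condition is met?
For example, the data is the same except for an additional column (desc in df2):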
import numpy as np

df = pd.DataFrame({
'Start Time' : ['2010-03-20 09:27:00','2010-03-20 10:15:00','2010-03-20 11:10:38','2010-03-20 11:32:15','2010-03-20 11:45:38'],
'End Time' : ['2010-03-20 09:40:00','2010-03-20 10:32:15','2010-03-20 11:35:38','2010-03-20 11:38:15','2010-03-20 11:50:38'],
"Name":['John', 'Brian', 'Suni', 'John', 'Li'],
"Occ":[1, 2, 3, 4, 5],
})
df2 = pd.DataFrame({
'Time' : ['2010-03-20 09:27:28','2010-03-20 09:29:15','2010-03-20 09:30:38','2010-03-20 09:32:15','2010-03-20 09:38:38',
'2010-03-20 11:11:08','2010-03-20 11:16:36','2010-03-20 11:30:12','2010-03-20 11:31:08','2010-03-20 11:32:48',
],
'Name':['John', 'John', 'John', 'John', 'John',
'Suni', 'Suni', 'Li', 'John', 'John',
],
"desc":[6, 6, 6, np.nan, np.nan,
89, 89, np.nan, 2, 2
],
})
Can we use the same function but pass the Occ of the relevant row to df2?
Intended output:
Time Name desc msk Occ
0 2010-03-20 09:27:28 John 6.0 True 1.0
1 2010-03-20 09:29:15 John 6.0 True 1.0
2 2010-03-20 09:30:38 John 6.0 True 1.0
3 2010-03-20 09:32:15 John NaN True 1.0
4 2010-03-20 09:38:38 John NaN True 1.0
5 2010-03-20 11:11:08 Suni 89.0 True 3.0
6 2010-03-20 11:16:36 Suni 89.0 True 3.0
7 2010-03-20 11:30:12 Li NaN False NaN
8 2010-03-20 11:31:08 John 2.0 False NaN
9 2010-03-20 11:32:48 John 2.0 True 4.0
Best answer
As a first approach, you can merge your dataframes. You need reset_index to preserve the df2 index:
idx = (df2.reset_index().merge(df, on='Name')
.loc[lambda x:x['Time'].between(x['Start Time'], x['End Time']), 'index'])
msk = df2.index.isin(idx)
Output:
>>> msk
array([ True, True, True, True, True, False, False, False, False,
False])
>>> pd.Series(msk, index=df2.index)
0 True
1 True
2 True
3 True
4 True
5 False
6 False
7 False
8 False
9 False
dtype: bool
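The question also asks to subset df2 with the mask. Here is a self-contained sketch of that step, assuming the df and df2 from the first example. Note that between is inclusive on both ends by default; passing inclusive='neither' (available in pandas 1.3 and later) mirrors the strict > and < from the question. The names `merged`, `inside` and `subset` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Start Time': ['2010-03-20 09:27:00', '2010-03-20 10:15:00', '2010-03-20 11:10:38',
                   '2010-03-20 11:32:15', '2010-03-20 11:45:38'],
    'End Time': ['2010-03-20 09:40:00', '2010-03-20 10:32:15', '2010-03-20 11:35:38',
                 '2010-03-20 11:38:15', '2010-03-20 11:50:38'],
    'Name': ['John', 'Brian', 'Suni', 'Gary', 'Li'],
    'Occ': [1, 2, 3, 4, 5],
})
df2 = pd.DataFrame({
    'Time': ['2010-03-20 09:27:28', '2010-03-20 09:29:15', '2010-03-20 09:30:38',
             '2010-03-20 09:32:15', '2010-03-20 09:38:38', '2010-03-20 10:15:08',
             '2010-03-20 10:16:36', '2010-03-20 10:30:12', '2010-03-20 10:31:08',
             '2010-03-20 10:32:48'],
    'Name': ['John'] * 7 + ['Li'] * 3,
})
df['Start Time'] = pd.to_datetime(df['Start Time'])
df['End Time'] = pd.to_datetime(df['End Time'])
df2['Time'] = pd.to_datetime(df2['Time'])

# Pair every df2 row with the df rows that share its Name,
# keeping the original df2 index in the 'index' column
merged = df2.reset_index().merge(df, on='Name')

# Strict comparison, as in the question's attempted mask
inside = merged['Time'].between(merged['Start Time'], merged['End Time'],
                                inclusive='neither')

msk = df2.index.isin(merged.loc[inside, 'index'])
subset = df2[msk]   # rows of df2 that fall inside a matching interval
print(subset)
```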
Maybe merge_asof would work better?
idx = (pd.merge_asof(df2.reset_index().sort_values('Time'),
df.sort_values('Start Time'),
left_on='Time', right_on='Start Time', by='Name')
.loc[lambda x:x['Time'].between(x['Start Time'], x['End Time']), 'index'])
msk = df2.index.isin(idx)
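With its default backward direction, merge_asof pairs each Time only with the latest Start Time at or before it for the same Name. That avoids the row multiplication of a full merge, but it seems to assume that one person's intervals do not overlap. A quick sketch checking that both approaches agree, assuming the Edit 2 data where John has two intervals (the desc column is left out for brevity; `msk_merge` and `msk_asof` are illustrative names):

```python
import pandas as pd

df = pd.DataFrame({
    'Start Time': pd.to_datetime(['2010-03-20 09:27:00', '2010-03-20 10:15:00',
                                  '2010-03-20 11:10:38', '2010-03-20 11:32:15',
                                  '2010-03-20 11:45:38']),
    'End Time': pd.to_datetime(['2010-03-20 09:40:00', '2010-03-20 10:32:15',
                                '2010-03-20 11:35:38', '2010-03-20 11:38:15',
                                '2010-03-20 11:50:38']),
    'Name': ['John', 'Brian', 'Suni', 'John', 'Li'],
    'Occ': [1, 2, 3, 4, 5],
})
df2 = pd.DataFrame({
    'Time': pd.to_datetime(['2010-03-20 09:27:28', '2010-03-20 09:29:15',
                            '2010-03-20 09:30:38', '2010-03-20 09:32:15',
                            '2010-03-20 09:38:38', '2010-03-20 11:11:08',
                            '2010-03-20 11:16:36', '2010-03-20 11:30:12',
                            '2010-03-20 11:31:08', '2010-03-20 11:32:48']),
    'Name': ['John'] * 5 + ['Suni', 'Suni', 'Li', 'John', 'John'],
})

def in_interval(x):
    # True where Time lies inside [Start Time, End Time]; unmatched rows (NaT) give False
    return x['Time'].between(x['Start Time'], x['End Time'])

# Full merge: every df2 row is paired with every interval of the same Name
full = df2.reset_index().merge(df, on='Name')
msk_merge = df2.index.isin(full.loc[in_interval(full), 'index'])

# merge_asof: each df2 row is paired with at most one interval (latest start <= Time)
asof = pd.merge_asof(df2.reset_index().sort_values('Time'),
                     df.sort_values('Start Time'),
                     left_on='Time', right_on='Start Time', by='Name')
msk_asof = df2.index.isin(asof.loc[in_interval(asof), 'index'])

print((msk_merge == msk_asof).all())
```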
Edit 2
You can do this:
cx = (df2.reset_index().merge(df, on='Name')
.loc[lambda x:x['Time'].between(x['Start Time'], x['End Time'])]
.set_index('index').rename_axis(None))
df2['mask'] = df2.index.isin(cx.index)
df2['Occ'] = cx['Occ']
Output:
>>> df2
Time Name desc mask Occ
0 2010-03-20 09:27:28 John 6.0 True 1.0
1 2010-03-20 09:29:15 John 6.0 True 1.0
2 2010-03-20 09:30:38 John 6.0 True 1.0
3 2010-03-20 09:32:15 John NaN True 1.0
4 2010-03-20 09:38:38 John NaN True 1.0
5 2010-03-20 11:11:08 Suni 89.0 True 3.0
6 2010-03-20 11:16:36 Suni 89.0 True 3.0
7 2010-03-20 11:30:12 Li NaN False NaN
8 2010-03-20 11:31:08 John 2.0 False NaN
9 2010-03-20 11:32:48 John 2.0 True 4.0
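The assignment df2['Occ'] = cx['Occ'] aligns on the index, so it relies on each df2 row matching at most one interval. If a row fell inside two overlapping intervals for the same name, cx would carry duplicate index labels and the aligned assignment would no longer be well defined. A hedged sketch that guards against this by keeping the first match per row (an arbitrary choice made for this illustration, not part of the original answer):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Start Time': pd.to_datetime(['2010-03-20 09:27:00', '2010-03-20 10:15:00',
                                  '2010-03-20 11:10:38', '2010-03-20 11:32:15',
                                  '2010-03-20 11:45:38']),
    'End Time': pd.to_datetime(['2010-03-20 09:40:00', '2010-03-20 10:32:15',
                                '2010-03-20 11:35:38', '2010-03-20 11:38:15',
                                '2010-03-20 11:50:38']),
    'Name': ['John', 'Brian', 'Suni', 'John', 'Li'],
    'Occ': [1, 2, 3, 4, 5],
})
df2 = pd.DataFrame({
    'Time': pd.to_datetime(['2010-03-20 09:27:28', '2010-03-20 09:29:15',
                            '2010-03-20 09:30:38', '2010-03-20 09:32:15',
                            '2010-03-20 09:38:38', '2010-03-20 11:11:08',
                            '2010-03-20 11:16:36', '2010-03-20 11:30:12',
                            '2010-03-20 11:31:08', '2010-03-20 11:32:48']),
    'Name': ['John'] * 5 + ['Suni', 'Suni', 'Li', 'John', 'John'],
    'desc': [6, 6, 6, np.nan, np.nan, 89, 89, np.nan, 2, 2],
})

# Same pipeline as in the answer: merge on Name, keep rows inside the interval,
# then restore the original df2 index
cx = (df2.reset_index().merge(df, on='Name')
         .loc[lambda x: x['Time'].between(x['Start Time'], x['End Time'])]
         .set_index('index').rename_axis(None))

# Guard (assumption of this sketch): keep only the first match per df2 row
cx = cx[~cx.index.duplicated(keep='first')]

df2['mask'] = df2.index.isin(cx.index)
df2['Occ'] = cx['Occ']   # index-aligned; rows without a match become NaN
print(df2)
```

On the sample data no row matches twice, so the guard changes nothing and the result is the same as the output shown above.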
Regarding "python - bool mask where the timestamps of one df fall between two time points of a second df", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/77356307/