I have a df, for example:
Hour
12:00pm
12:00am
3:00pm
2:00pm
11:00pm
Continued....
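I want to create a new column that gives the time segment based on the following conditions: 'default' if the time is between 6:00 AM and 11:59 AM; 'timely' if it is between 12:00 PM and 3:59 PM; 'late' if it is between 4:00 PM and 11:59 PM; and 'nonvalid' if it is between 12:00 AM and 5:59 AM.
I want to use something like the code below: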
def func(row):
    if row['Hour'] >= 06:00am & < 12:00pm:
        return 'default'
    elif row['Hour'] >= 12:00pm & < 04:00pm:
        return 'timely'
    elif row['Hour'] >= 04:00pm & < 12:00am:
        return 'late'
    elif row['Hour'] >= 12:00am & < 06:00am:
        return 'nonvalid'
    else:
        return 'other'
df['Segment'] = df.apply(func, axis=1)
But the Hour column is not a datetime, so I am not sure whether the ranges in my function will be read correctly.
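To illustrate why the string column is a concern, here is a minimal sketch (the two sample values are chosen for illustration). Plain strings compare character by character, so '2:00pm' sorts after '11:00pm', while values parsed with pd.to_datetime compare in clock order.

```python
import pandas as pd

hours = pd.Series(['2:00pm', '11:00pm'])

# As plain strings the comparison is lexicographic: '2' sorts after '1'.
string_says_later = hours[0] > hours[1]

# Parsed as datetimes, the comparison follows the actual time of day.
parsed = pd.to_datetime(hours, format='%I:%M%p')
time_says_later = parsed[0] > parsed[1]

print(string_says_later, time_says_later)
```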
Expected output:
Hour Segment
12:00pm timely
12:00am nonvalid
3:00pm timely
2:00pm timely
11:00pm late
Best answer
I think it is necessary here to convert both the bins and the column values to datetimes and pass them to cut:
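import pandas as pd

# Parse the 12-hour strings; the date part defaults to 1900-01-01.
dates = pd.to_datetime(df['Hour'], format='%I:%M%p')
# Parse the bin edges the same way so they are comparable with the values.
b = pd.to_datetime(['12:00am','06:00am','12:00pm','04:00pm', '11:59pm'], format='%I:%M%p')
l = ['nonvalid','Default', 'timely','late']
# right=False makes each bin closed on the left: [start, end).
df['new'] = pd.cut(dates, bins=b, labels=l, right=False)
print (df)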
Hour new
0 12:00pm timely
1 12:00am nonvalid
2 3:00pm timely
3 2:00pm timely
4 11:00pm late
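For comparison, the row-wise function from the question can also be made to work once each string is parsed into a time of day. This is only a sketch, not part of the original answer: the helper name segment is hypothetical, and it is assumed that every value matches the '%I:%M%p' format.

```python
from datetime import time

import pandas as pd

df = pd.DataFrame({'Hour': ['12:00pm', '12:00am', '3:00pm', '2:00pm', '11:00pm']})

def segment(value):
    # Hypothetical helper: parse the string so comparisons follow clock order.
    t = pd.to_datetime(value, format='%I:%M%p').time()
    if time(6, 0) <= t < time(12, 0):
        return 'default'
    elif time(12, 0) <= t < time(16, 0):
        return 'timely'
    elif t >= time(16, 0):
        return 'late'
    else:
        # Anything before 06:00am.
        return 'nonvalid'

df['Segment'] = df['Hour'].apply(segment)
print(df)
```

On large frames the cut-based approach is likely to be faster, since apply calls the function once per row.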
Testing with more times:
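# One timestamp per hour of the day; newer pandas versions deprecate the 'H' alias in favor of 'h'.
df = pd.DataFrame({'Hour': pd.date_range('2020-01-01', periods=24, freq='H')})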
df['Hour'] = df['Hour'].dt.strftime('%I:%M%p')
#print (df)
dates = pd.to_datetime(df['Hour'], format='%I:%M%p')
b = pd.to_datetime(['12:00am','06:00am','12:00pm','04:00pm', '11:59pm'], format='%I:%M%p')
l = ['nonvalid','Default', 'timely','late']
df['new'] = pd.cut(dates, bins=b, labels=l, right=False)
print (df)
Hour new
0 12:00AM nonvalid
1 01:00AM nonvalid
2 02:00AM nonvalid
3 03:00AM nonvalid
4 04:00AM nonvalid
5 05:00AM nonvalid
6 06:00AM Default
7 07:00AM Default
8 08:00AM Default
9 09:00AM Default
10 10:00AM Default
11 11:00AM Default
12 12:00PM timely
13 01:00PM timely
14 02:00PM timely
15 03:00PM timely
16 04:00PM late
17 05:00PM late
18 06:00PM late
19 07:00PM late
20 08:00PM late
21 09:00PM late
22 10:00PM late
23 11:00PM late
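One edge case the hourly test does not cover: because the last bin edge is '11:59pm' and right=False leaves the right side open, a value of exactly 11:59pm falls outside every bin and gets NaN. A possible variation, sketched below under the assumption that the segments only ever change on the hour, is to bin the integer hour instead. The sample times and the lowercase 'default' label are illustrative choices.

```python
import pandas as pd

df = pd.DataFrame({'Hour': ['12:00am', '5:59am', '6:00am', '11:59am',
                            '12:00pm', '3:59pm', '4:00pm', '11:59pm']})

# Extract the hour of day (0-23) from the parsed times.
hours = pd.to_datetime(df['Hour'], format='%I:%M%p').dt.hour

# Bins are [0, 6), [6, 12), [12, 16), [16, 24), so hour 23 is still covered.
df['new'] = pd.cut(hours,
                   bins=[0, 6, 12, 16, 24],
                   labels=['nonvalid', 'default', 'timely', 'late'],
                   right=False)
print(df)
```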
Regarding python - Create new column based on condition of another column if the value falls within a range, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/60000726/