python - Create a new column based on a condition on another column if the value falls within a range

Tags: python python-3.x pandas

I have a df like this:

Hour 
12:00pm
12:00am
3:00pm
2:00pm
11:00pm
...

I want to create a new column that labels the time period of each row using these rules: 6:00am to 11:59am is Default; 12:00pm to 3:59pm is timely; 4:00pm to 11:59pm is late; 12:00am to 5:59am is nonvalid.

I'd like to use something like the code below:

from datetime import datetime, time

def func(row):
    # 'Hour' holds strings such as '3:00pm', so parse each one into a time of day first
    t = datetime.strptime(row['Hour'], '%I:%M%p').time()
    if time(6) <= t < time(12):
        return 'Default'
    elif time(12) <= t < time(16):
        return 'timely'
    elif t >= time(16):
        return 'late'
    else:
        # remaining case: 12:00am up to 5:59am
        return 'nonvalid'

df['Segment'] = df.apply(func, axis=1)

But the Hour column is not a datetime, so I'm not sure whether my function can compare it against these ranges.
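The concern is well founded: Python compares strings character by character, so raw `Hour` strings do not order by time of day. A minimal sketch using values from the question shows the difference between comparing the strings and comparing parsed timestamps (assuming every value matches the `%I:%M%p` pattern):

```python
import pandas as pd

# Plain strings compare character by character, not by time of day:
# '1' sorts before '3', so 11:00pm looks "earlier" than 3:00pm
as_strings = '11:00pm' < '3:00pm'

# Parsing with an explicit format gives timestamps that compare chronologically
fmt = '%I:%M%p'
as_times = pd.to_datetime('11:00pm', format=fmt) < pd.to_datetime('3:00pm', format=fmt)

print(as_strings, as_times)
```

The first comparison is `True` and the second is `False`, which is why both answers below parse the column before binning.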

Expected output:

     Hour   Segment
    12:00pm timely
    12:00am nonvalid
    3:00pm  timely
    2:00pm  timely
    11:00pm late

Best answer

I think you need to convert both the bin edges and the column values to datetimes and pass them to cut:

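import pandas as pd

# Parse the strings; pandas fills in the placeholder date 1900-01-01
dates = pd.to_datetime(df['Hour'], format='%I:%M%p')
# Edges: midnight, 6am, noon, 4pm and the following midnight, so that with
# right=False a time of 11:59pm still falls inside the last bin
edges = pd.to_datetime(['12:00am','06:00am','12:00pm','04:00pm'], format='%I:%M%p')
b = edges.append(pd.DatetimeIndex([edges[0] + pd.Timedelta(days=1)]))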
l = ['nonvalid','Default', 'timely','late']
df['new'] = pd.cut(dates, bins=b, labels=l, right=False)
print (df)
      Hour       new
0  12:00pm    timely
1  12:00am  nonvalid
2   3:00pm    timely
3   2:00pm    timely
4  11:00pm      late
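With `right=False`, each bin includes its left edge and excludes its right edge, so the choice of the last edge matters: a time exactly equal to it lands in no bin and comes back as NaN. A small sketch comparing a last edge of 11:59pm with one at the following midnight (the three test times are chosen for illustration):

```python
import pandas as pd

fmt = '%I:%M%p'
times = pd.to_datetime(pd.Series(['06:00am', '04:00pm', '11:59pm']), format=fmt)
labels = ['nonvalid', 'Default', 'timely', 'late']

# Last edge at 11:59pm: the final bin is [4:00pm, 11:59pm), so 11:59pm is excluded
edges_1159 = pd.to_datetime(['12:00am', '06:00am', '12:00pm', '04:00pm', '11:59pm'], format=fmt)
with_1159_edge = pd.cut(times, bins=edges_1159, labels=labels, right=False).tolist()

# Last edge at the following midnight: the final bin is [4:00pm, midnight)
next_midnight = edges_1159[0] + pd.Timedelta(days=1)
edges_midnight = edges_1159[:-1].append(pd.DatetimeIndex([next_midnight]))
with_midnight_edge = pd.cut(times, bins=edges_midnight, labels=labels, right=False).tolist()

print(with_1159_edge)      # 11:59pm is missing (NaN)
print(with_midnight_edge)  # 11:59pm is labeled 'late'
```

Both variants put 6:00am in Default and 4:00pm in late, because each boundary time belongs to the bin that starts there.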

Testing with more times:

df = pd.DataFrame({'Hour': pd.date_range('2020-01-01', periods=24, freq='H')})
df['Hour'] = df['Hour'].dt.strftime('%I:%M%p')
#print (df)

dates = pd.to_datetime(df['Hour'], format='%I:%M%p')
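# End the last bin at the following midnight so 11:59pm is not dropped
edges = pd.to_datetime(['12:00am','06:00am','12:00pm','04:00pm'], format='%I:%M%p')
b = edges.append(pd.DatetimeIndex([edges[0] + pd.Timedelta(days=1)]))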
l = ['nonvalid','Default', 'timely','late']
df['new'] = pd.cut(dates, bins=b, labels=l, right=False)
print (df)
       Hour       new
0   12:00AM  nonvalid
1   01:00AM  nonvalid
2   02:00AM  nonvalid
3   03:00AM  nonvalid
4   04:00AM  nonvalid
5   05:00AM  nonvalid
6   06:00AM   Default
7   07:00AM   Default
8   08:00AM   Default
9   09:00AM   Default
10  10:00AM   Default
11  11:00AM   Default
12  12:00PM    timely
13  01:00PM    timely
14  02:00PM    timely
15  03:00PM    timely
16  04:00PM      late
17  05:00PM      late
18  06:00PM      late
19  07:00PM      late
20  08:00PM      late
21  09:00PM      late
22  10:00PM      late
23  11:00PM      late
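Because every boundary in the question falls exactly on the hour, an alternative (not the answer's approach, just a sketch) is to bin the integer hour of day instead of full timestamps. The frame below is rebuilt from the question's sample data, and it assumes every string matches `%I:%M%p`:

```python
import pandas as pd

df = pd.DataFrame({'Hour': ['12:00pm', '12:00am', '3:00pm', '2:00pm', '11:00pm']})

# Parse once, then keep only the hour of day (0-23) as an integer
hours = pd.to_datetime(df['Hour'], format='%I:%M%p').dt.hour

# Integer bins [0, 6), [6, 12), [12, 16), [16, 24) cover the whole day;
# minutes can be ignored because no boundary falls inside an hour
df['Segment'] = pd.cut(hours, bins=[0, 6, 12, 16, 24],
                       labels=['nonvalid', 'Default', 'timely', 'late'],
                       right=False)
print(df)
```

Using 24 as the last edge avoids the 11:59pm edge case entirely, since the largest possible hour is 23.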

Regarding python - Create a new column based on a condition on another column if the value falls within a range, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/60000726/
