I have a large batch of CSV files in which the date column looks like this:
print(df)
Date
0 20090501 00:00:00.831
1 20090501 00:00:00.832
2 20090501 00:00:01.078
3 20090501 00:00:01.337
4 20090501 00:00:01.580
5 20090501 00:00:01.581
6 20090501 00:00:01.582
7 20090501 00:00:01.602
From here I want to parse it with the format '%Y%m%d %H:%M:%S.%f', so:
df['Date'] = pd.to_datetime(df['Date'], format='%Y%m%d %H:%M:%S.%f')
print(df)
Date
0 2009-05-01 00:00:00.831
1 2009-05-01 00:00:00.832
2 2009-05-01 00:00:01.078
3 2009-05-01 00:00:01.337
4 2009-05-01 00:00:01.580
5 2009-05-01 00:00:01.581
Finally, I split it into separate date and time columns using:
df['Time'] = df['Date'].apply(lambda x:x.time())
df['Date1']= df['Date'].apply(lambda x:x.date())
print(df)
Time Date1
0 00:00:00.831000 2009-05-01
1 00:00:00.832000 2009-05-01
2 00:00:01.078000 2009-05-01
3 00:00:01.337000 2009-05-01
4 00:00:01.580000 2009-05-01
5 00:00:01.581000 2009-05-01
6 00:00:01.582000 2009-05-01
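The problem is that the lambda functions take about a minute to finish, and I have roughly 30,000 CSV files to process, each with about 2 million rows. A faster solution would help a great deal.
Thanks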
Best answer
df['Time'] = df['Date'].dt.time
df['Date1']= df['Date'].dt.date
print(df)
Date Time Date1
0 2009-05-01 00:00:00.831 00:00:00.831000 2009-05-01
1 2009-05-01 00:00:00.832 00:00:00.832000 2009-05-01
2 2009-05-01 00:00:01.078 00:00:01.078000 2009-05-01
3 2009-05-01 00:00:01.337 00:00:01.337000 2009-05-01
4 2009-05-01 00:00:01.580 00:00:01.580000 2009-05-01
5 2009-05-01 00:00:01.581 00:00:01.581000 2009-05-01
6 2009-05-01 00:00:01.582 00:00:01.582000 2009-05-01
7 2009-05-01 00:00:01.602 00:00:01.602000 2009-05-01
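For anyone who wants to check this on a small frame first, here is a minimal sketch. It rebuilds a few rows from the question, applies the .dt accessor, and confirms the result matches the apply/lambda version. The column names Day and SinceMidnight and the variables raw, time_via_apply and date_via_apply are made up for illustration. The last two lines are an optional variation, on the assumption that downstream code can work with datetime64/timedelta64 columns instead of Python date and time objects.

```python
import pandas as pd

# A few sample values copied from the question's Date column.
raw = [
    "20090501 00:00:00.831",
    "20090501 00:00:00.832",
    "20090501 00:00:01.078",
    "20090501 00:00:01.337",
]
df = pd.DataFrame({"Date": raw})

# Parse with the explicit format from the question.
df["Date"] = pd.to_datetime(df["Date"], format="%Y%m%d %H:%M:%S.%f")

# The .dt accessor works on the whole column at once,
# with no Python-level lambda call per row.
df["Time"] = df["Date"].dt.time
df["Date1"] = df["Date"].dt.date

# The apply/lambda version from the question, kept only for comparison.
time_via_apply = df["Date"].apply(lambda x: x.time())
date_via_apply = df["Date"].apply(lambda x: x.date())
same_time = df["Time"].tolist() == time_via_apply.tolist()
same_date = df["Date1"].tolist() == date_via_apply.tolist()

# Optional variation (hypothetical column names): keep native dtypes
# rather than object columns of datetime.date / datetime.time.
df["Day"] = df["Date"].dt.normalize()        # midnight of each day
df["SinceMidnight"] = df["Date"] - df["Day"]  # time of day as a timedelta

print(same_time, same_date)
print(df)
```

Note that .dt.time and .dt.date still return object-dtype columns holding Python objects, so if the split columns are only used for grouping or filtering, the normalize/subtract variant may be worth timing on your own data.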
Regarding "python - a fast way to convert a datetime column in pandas", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/40874984/