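I have a fairly large dataset, nothing crazy, but when I run the code below to convert a column to datetime, the cell seems to run for a very long time. Is there a reason for this, and is there any way to improve the performance?
Code: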
df["created_at"] = pd.to_datetime(df["timestamp"]).dt.strftime('%Y-%m-%d %H:%M:%S')
The timestamp column values originally look like this:
Wed Nov 22 08:31:24 +0000 2017
Thanks
Best answer
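Yes, the pandas to_datetime method is notoriously slow if you don't specify the format explicitly. See the pandas docs for the method, and the standard strftime/strptime format codes for how to write the format string.
I don't know exactly how your time data is laid out, but this should put you on the right track. The format is built up one directive at a time: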
import pandas as pd
# Weekday as abbreviated name "%a"
df = pd.DataFrame(["Wed"], columns = ["timestamp"])
df["created_at"] = pd.to_datetime(df["timestamp"], format="%a")
print(df)
# Month as abbreviated "%b"
df = pd.DataFrame(["Wed Nov"], columns = ["timestamp"])
df["created_at"] = pd.to_datetime(df["timestamp"], format="%a %b")
print(df)
# Day with zero-padded decimal "%d"
df = pd.DataFrame(["Wed Nov 22"], columns = ["timestamp"])
df["created_at"] = pd.to_datetime(df["timestamp"], format="%a %b %d")
print(df)
# Time as hour:minute:second "%H:%M:%S"
df = pd.DataFrame(["Wed Nov 22 08:31:24"], columns = ["timestamp"])
df["created_at"] = pd.to_datetime(df["timestamp"], format="%a %b %d %H:%M:%S")
print(df)
# UTC offset (%z)
df = pd.DataFrame(["Wed Nov 22 08:31:24 +0000"], columns = ["timestamp"])
df["created_at"] = pd.to_datetime(df["timestamp"], format="%a %b %d %H:%M:%S %z")
print(df)
# Year is "%Y"
df = pd.DataFrame(["Wed Nov 22 08:31:24 +0000 2017"], columns = ["timestamp"])
df["created_at"] = pd.to_datetime(df["timestamp"], format="%a %b %d %H:%M:%S %z %Y")
print(df)
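Putting the pieces together for the original question, here is a minimal sketch. It assumes every row follows the "Wed Nov 22 08:31:24 +0000 2017" layout; the second sample row and the FORMAT constant are made up for illustration.

```python
import pandas as pd

# Hypothetical sample rows in the same layout as the question's column.
df = pd.DataFrame(
    {
        "timestamp": [
            "Wed Nov 22 08:31:24 +0000 2017",
            "Thu Nov 23 17:05:00 +0000 2017",
        ]
    }
)

# Passing the format explicitly means pandas does not have to guess the layout.
FORMAT = "%a %b %d %H:%M:%S %z %Y"
parsed = pd.to_datetime(df["timestamp"], format=FORMAT)

# Same final step as in the question: render the datetimes back to strings.
df["created_at"] = parsed.dt.strftime("%Y-%m-%d %H:%M:%S")
print(df)
```

Note that .dt.strftime turns the column back into plain strings, which is extra work on every row. If you only need datetimes for filtering or sorting, it may be enough to keep the parsed column and skip that step.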
Regarding python - pandas to_datetime conversion running slowly or not finishing, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/64020879/