My data frame df looks like this:
Code DateTime Reading
801 2011-01-15 08:30:00 0.0
801 2011-01-15 07:45:00 0.5
801 2011-01-16 06:30:00 5.0
801 2011-02-05 05:30:00 0.0
801 2011-02-08 00:45:00 10.0
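and so on for the whole of 2011. The readings are not at any fixed time interval. So I want to fix a 15-minute interval and get continuous, uniform data from 2011-01-01 00:00:00 to 2011-12-31 23:45:00, where the reading is 0.0 for every newly added row. The existing readings must be preserved.
In addition, I want to add 4 columns, Year, Month, Day and Hour, which must be extracted from the DateTime column.
My output should look like this: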
Code DateTime Year Month Day Hour Reading
801 2011-01-01 00:00:00 2011 1 1 0 0.0
801 2011-01-01 00:15:00 2011 1 1 0 0.0
801 2011-01-01 00:30:00 2011 1 1 0 0.0
801 2011-01-01 00:45:00 2011 1 1 0 0.0
801 2011-01-01 01:00:00 2011 1 1 1 0.0
.
.
.
801 2011-12-31 23:45:00 2011 12 31 23 0.0
Can someone guide me through this?
Best answer
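You can use the dt accessor to get the year, month, day and hour from a timestamp column. You can use date_range to build the full range of dates, with the frequency set to 15min so that there is one row every 15 minutes. For your desired output, you can do the following.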
import pandas as pd

df['DateTime'] = pd.to_datetime(df['DateTime'])
# Create a dataframe of year, month, day and hour for the existing rows
new = pd.DataFrame({"Year": df["DateTime"].dt.year, "Month": df["DateTime"].dt.month,"Day":df["DateTime"].dt.day,"Hour":df["DateTime"].dt.hour})
# Set the index to DateTime after concatenating both dataframes column-wise
df = pd.concat((df,new),axis=1).set_index(df['DateTime'])
# Create a dataframe with one row per 15 minutes of 2011 and a default reading of 0.0
time_df = pd.DataFrame({"DateTime":pd.date_range(start='2011-01-01 00:00:00', end='2011-12-31 23:45:00',freq="15min"),"Code":801,"Reading":0.0})
# Create a dataframe of year, month, day and hour for the generated rows
k = pd.DataFrame({"Year": time_df["DateTime"].dt.year, "Month": time_df["DateTime"].dt.month,"Day":time_df["DateTime"].dt.day,"Hour":time_df["DateTime"].dt.hour})
# Set the index to DateTime after concatenating both dataframes column-wise
time_df = pd.concat((time_df,k),axis=1).set_index(time_df['DateTime'])
# Stack the existing rows on top of the generated rows (axis=0)
orginal_df = pd.concat((df,time_df),axis=0)
# Remove duplicate timestamps; keep='first' keeps the existing readings because df comes first
orginal_df = orginal_df[~orginal_df.index.duplicated(keep='first')]
# Sort the dataframe by time
orginal_df = orginal_df.sort_index()
# Reset the index (DateTime is still available as a column)
orginal_df = orginal_df.reset_index(drop=True)
Output
Code DateTime Reading Day Hour Month Year
0 801 2011-01-01 00:00:00 0.0 1 0 1 2011
1 801 2011-01-01 00:15:00 0.0 1 0 1 2011
2 801 2011-01-01 00:30:00 0.0 1 0 1 2011
3 801 2011-01-01 00:45:00 0.0 1 0 1 2011
4 801 2011-01-01 01:00:00 0.0 1 1 1 2011
5 801 2011-01-01 01:15:00 0.0 1 1 1 2011
6 801 2011-01-01 01:30:00 0.0 1 1 1 2011
.
.
.
1375 801 2011-01-15 07:45:00 0.5 15 7 1 2011
.
.
1378 801 2011-01-15 08:30:00 0.0 15 8 1 2011
.
.
35039 801 2011-12-31 23:45:00 0.0 31 23 12 2011
If you want the columns in the order shown in the question, select them explicitly:
orginal_df[['Code','DateTime','Year','Month','Day','Hour','Reading']]
Code DateTime Year Month Day Hour Reading
0 801 2011-01-01 00:00:00 2011 1 1 0 0.0
1 801 2011-01-01 00:15:00 2011 1 1 0 0.0
2 801 2011-01-01 00:30:00 2011 1 1 0 0.0
3 801 2011-01-01 00:45:00 2011 1 1 0 0.0
4 801 2011-01-01 01:00:00 2011 1 1 1 0.0
5 801 2011-01-01 01:15:00 2011 1 1 1 0.0
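As a possible alternative to concatenating and de-duplicating, the same result can probably be reached by reindexing the readings onto the 15-minute grid. This is only a sketch, not the approach from the answer above, and it rests on some assumptions: the sample rows are copied from the question, there is a single Code value (801), the timestamps are unique, and every existing timestamp already falls on a 15-minute boundary (reindex would silently drop any that do not). The names grid, readings and result are made up for illustration.

```python
import pandas as pd

# Sample rows copied from the question (assumption: the real df has the same columns)
df = pd.DataFrame({
    "Code": [801, 801, 801, 801, 801],
    "DateTime": ["2011-01-15 08:30:00", "2011-01-15 07:45:00",
                 "2011-01-16 06:30:00", "2011-02-05 05:30:00",
                 "2011-02-08 00:45:00"],
    "Reading": [0.0, 0.5, 5.0, 0.0, 10.0],
})
df["DateTime"] = pd.to_datetime(df["DateTime"])

# Full 15-minute grid covering 2011
grid = pd.date_range(start="2011-01-01 00:00:00", end="2011-12-31 23:45:00",
                     freq="15min", name="DateTime")

# Align the existing readings to the grid; slots without a reading get 0.0.
# Assumption: DateTime values are unique, otherwise reindex raises an error.
readings = df.set_index("DateTime")["Reading"].reindex(grid, fill_value=0.0)

# Turn the index back into a column and add the remaining columns
result = readings.rename_axis("DateTime").reset_index()
result["Code"] = 801  # assumption: one Code for the whole frame
result["Year"] = result["DateTime"].dt.year
result["Month"] = result["DateTime"].dt.month
result["Day"] = result["DateTime"].dt.day
result["Hour"] = result["DateTime"].dt.hour
result = result[["Code", "DateTime", "Year", "Month", "Day", "Hour", "Reading"]]

print(result.head())
print(len(result))
```

2011 has 365 days and each day has 96 quarter-hour slots, so the result should have 35040 rows, which matches the last index 35039 in the output above. If the data held several Code values, the reindex would need to be done per Code (for example inside a groupby), which this sketch does not cover.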
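Regarding "python - Convert datetime to a uniform 15-minute format and extract Year, Month, Day and Hour columns from the datetime", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/45051310/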