python - How to fill missing values (date and time) with specific values in pandas

Tags: python python-3.x python-2.7 pandas datetime

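I am trying to fill missing values with pandas, but I can't get the output I want.

Input data: some rows are missing here (for example 02:00 and 03:00).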

             Date_time  current_demand   Temp_Mean  humidity_Mean
0     2018-05-01 00:00         15951.0  300.904267      49.600000
1     2018-05-01 00:15         16075.0  300.904267      49.600000
2     2018-05-01 00:30         15977.0  300.904267      49.600000
3     2018-05-01 00:45         15945.0  300.837600      50.333333
4     2018-05-01 01:00         15868.0  298.889333      59.133333
5     2018-05-01 01:15         15583.0  298.889333      59.133333
6     2018-05-01 01:30         15470.0  298.756000      59.800000
7     2018-05-01 01:45         15301.0  298.756000      59.800000
8     2018-05-01 02:15         14946.0  298.756000      59.800000
9     2018-05-01 02:30         14736.0  298.756000      59.800000
10    2018-05-01 02:45         14630.0  298.502333      59.000000
11    2018-05-01 03:15         14350.0  298.502333      59.000000

Script I tried:

import pandas as pd
import numpy as np


df = pd.read_csv(r'submission.csv', index_col=[1], parse_dates=[1], dayfirst=True)

df['Date_time'] = pd.to_datetime(df['Date_time']).dt.time
start = pd.to_datetime(str(df['Date_time'].min()))
end = pd.to_datetime(str(df['Date_time'].max()))
dates = pd.date_range(start=start, end=end, freq='15Min').time


df1 = pd.pivot_table(df, "current_demand", "Temp_Mean", "humidity_Mean").stack(dropna=False).reset_index(name="current_demand")
df1.loc[df1['current_demand'].isnull(), "Temp_Mean", "Temp_Mean" , "humidity_Mean"] = np.nan
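For reference, the date_range idea in this script can work if the full timestamps are kept (calling .dt.time throws the date away, so the grid can no longer span days) and the frame is reindexed onto the generated grid. A minimal sketch, assuming a small hypothetical sample with only the current_demand column; "15min" is the same frequency as the older "15T" alias:

```python
import pandas as pd

# Hypothetical sample taken from the question's data; 02:00 and 03:00 are missing
df = pd.DataFrame({
    "Date_time": ["2018-05-01 01:45", "2018-05-01 02:15", "2018-05-01 02:30",
                  "2018-05-01 02:45", "2018-05-01 03:15"],
    "current_demand": [15301.0, 14946.0, 14736.0, 14630.0, 14350.0],
})

# Keep full timestamps (not .dt.time) so the grid includes the date
df["Date_time"] = pd.to_datetime(df["Date_time"])
df = df.set_index("Date_time")

# Complete 15-minute grid from the first to the last timestamp
full_index = pd.date_range(start=df.index.min(), end=df.index.max(), freq="15min")

# Missing slots appear as NaN after reindex; ffill copies the previous row into them
filled = df.reindex(full_index).ffill()
filled.index.name = "Date_time"
print(filled)
```

The reindex step is what creates the 02:00 and 03:00 rows; without a fill step they would simply be NaN rather than 0.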

Expected output:

             Date_time  current_demand   Temp_Mean  humidity_Mean
0     2018-05-01 00:00         15951.0  300.904267      49.600000
1     2018-05-01 00:15         16075.0  300.904267      49.600000
2     2018-05-01 00:30         15977.0  300.904267      49.600000
3     2018-05-01 00:45         15945.0  300.837600      50.333333
4     2018-05-01 01:00         15868.0  298.889333      59.133333
5     2018-05-01 01:15         15583.0  298.889333      59.133333
6     2018-05-01 01:30         15470.0  298.756000      59.800000
7     2018-05-01 01:45         15301.0  298.756000      59.800000
8     2018-05-01 02:00               0           0              0
9     2018-05-01 02:15         14946.0  298.756000      59.800000
10    2018-05-01 02:30         14736.0  298.756000      59.800000
11    2018-05-01 02:45         14630.0  298.502333      59.000000
12    2018-05-01 03:00               0           0              0
13    2018-05-01 03:15         14350.0  298.502333      59.000000

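But in place of the 0s, I want to fill in yesterday's data (that is, the previous day's data, or otherwise the preceding data).

Please suggest a solution. Thanks in advance.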

EDIT

df = df.set_index(['Date_time']).asfreq('15T').ffill()
#df = df.set_index('Date_time').resample('15T').ffill() # same result
#df = df.asfreq('15T').ffill()

df = df.asfreq('15T').fillna(df.shift(1, freq='d'))

Why am I getting NaN? Please explain.

                     current_demand  Temp_Mean  humidity_Mean
Date_time                                                    
2018-05-01 00:00:00             NaN        NaN            NaN
2018-05-01 00:15:00             NaN        NaN            NaN
2018-05-01 00:30:00             NaN        NaN            NaN
2018-05-01 00:45:00             NaN        NaN            NaN
2018-05-01 01:00:00             NaN        NaN            NaN

Best Answer

Use asfreq or resample with forward filling (ffill):

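import pandas as pd

# Date_time (column 1) becomes a parsed DatetimeIndex
df = pd.read_csv(r'submission.csv', index_col=[1], parse_dates=[1], dayfirst=True)

# reindex to a regular 15-minute grid, then forward-fill the new rows
df = df.asfreq('15T').ffill()
#or
df = df.resample('15T').ffill()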

print (df)

                     current_demand   Temp_Mean  humidity_Mean
Date_time                                                     
2018-05-01 00:00:00         15951.0  300.904267      49.600000
2018-05-01 00:15:00         16075.0  300.904267      49.600000
2018-05-01 00:30:00         15977.0  300.904267      49.600000
2018-05-01 00:45:00         15945.0  300.837600      50.333333
2018-05-01 01:00:00         15868.0  298.889333      59.133333
2018-05-01 01:15:00         15583.0  298.889333      59.133333
2018-05-01 01:30:00         15470.0  298.756000      59.800000
2018-05-01 01:45:00         15301.0  298.756000      59.800000
2018-05-01 02:00:00         15301.0  298.756000      59.800000
2018-05-01 02:15:00         14946.0  298.756000      59.800000
2018-05-01 02:30:00         14736.0  298.756000      59.800000
2018-05-01 02:45:00         14630.0  298.502333      59.000000
2018-05-01 03:00:00         14630.0  298.502333      59.000000
2018-05-01 03:15:00         14350.0  298.502333      59.000000
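To see both variants side by side without the CSV, here is a minimal sketch on a hypothetical in-memory slice of the question's data (02:00 and 03:00 missing). It uses "15min", which is the same frequency as "15T" and is the spelling newer pandas versions prefer.

```python
import pandas as pd

# Hypothetical slice of the question's data with a DatetimeIndex; 02:00 and 03:00 are missing
df = pd.DataFrame(
    {
        "current_demand": [15301.0, 14946.0, 14736.0, 14630.0, 14350.0],
        "Temp_Mean": [298.756, 298.756, 298.756, 298.502333, 298.502333],
    },
    index=pd.to_datetime([
        "2018-05-01 01:45", "2018-05-01 02:15", "2018-05-01 02:30",
        "2018-05-01 02:45", "2018-05-01 03:15",
    ]),
)
df.index.name = "Date_time"

# asfreq: conform to a regular 15-minute grid (new slots are NaN), then forward-fill
via_asfreq = df.asfreq("15min").ffill()

# resample(...).ffill(): upsample to the same grid, filling new slots from the previous row
via_resample = df.resample("15min").ffill()

print(via_asfreq)
```

On data like this the two give the same result. One difference worth knowing: asfreq().ffill() also fills NaNs that were already present in the original rows, whereas resample().ffill() only fills the newly created slots.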

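If you want to replace the NaNs with the values from the same time on the previous day, the solution is fillna with the DataFrame shifted by one day: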

df = df.asfreq('15T').fillna(df.shift(1, freq='d'))
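A minimal sketch of how the previous-day fill aligns, assuming a hypothetical two-day sample (values invented for illustration) where 2018-05-02 00:30 is missing. shift(1, freq="D") moves the index labels forward by one day, so fillna matches each gap with the same time on the day before. Slots that have no reading on the previous day stay NaN; chaining ffill afterwards falls back to the preceding value, which matches the "previous day, or otherwise the preceding data" requirement.

```python
import pandas as pd

# Hypothetical two-day sample: 2018-05-02 00:30 is missing
df = pd.DataFrame(
    {"current_demand": [15951.0, 16075.0, 15977.0, 15945.0, 16010.0, 16100.0, 15990.0]},
    index=pd.to_datetime([
        "2018-05-01 00:00", "2018-05-01 00:15", "2018-05-01 00:30", "2018-05-01 00:45",
        "2018-05-02 00:00", "2018-05-02 00:15", "2018-05-02 00:45",
    ]),
)

# Same data with every timestamp moved one day later, so each value lines up
# with the same time-of-day on the following day
prev_day = df.shift(1, freq="D")

# Regular 15-minute grid; gaps become NaN and fillna fills them by label alignment
by_prev_day = df.asfreq("15min").fillna(prev_day)

# Slots with no previous-day value (e.g. 2018-05-01 01:00) remain NaN;
# forward-fill them from the last earlier reading as a fallback
with_fallback = by_prev_day.ffill()

print(by_prev_day.loc["2018-05-02"])
```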

Regarding python - How to fill missing values (date and time) with specific values in pandas, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/50621498/
