I am trying to fill in missing values using pandas, but I cannot get the output I want.
Input data (some rows are missing here):
Date_time current_demand Temp_Mean humidity_Mean
0 2018-05-01 00:00 15951.0 300.904267 49.600000
1 2018-05-01 00:15 16075.0 300.904267 49.600000
2 2018-05-01 00:30 15977.0 300.904267 49.600000
3 2018-05-01 00:45 15945.0 300.837600 50.333333
4 2018-05-01 01:00 15868.0 298.889333 59.133333
5 2018-05-01 01:15 15583.0 298.889333 59.133333
6 2018-05-01 01:30 15470.0 298.756000 59.800000
7 2018-05-01 01:45 15301.0 298.756000 59.800000
8 2018-05-01 02:15 14946.0 298.756000 59.800000
9 2018-05-01 02:30 14736.0 298.756000 59.800000
10 2018-05-01 02:45 14630.0 298.502333 59.000000
11 2018-05-01 03:15 14350.0 298.502333 59.000000
The script I tried:
import pandas as pd
import numpy as np
df = pd.read_csv(r'submission.csv', index_col=[1], parse_dates=[1], dayfirst=True)
df['Date_time'] = pd.to_datetime(df['Date_time']).dt.time
start = pd.to_datetime(str(df['Date_time'].min()))
end = pd.to_datetime(str(df['Date_time'].max()))
dates = pd.date_range(start=start, end=end, freq='15Min').time
df1 = pd.pivot_table(df, "current_demand", "Temp_Mean", "humidity_Mean").stack(dropna=False).reset_index(name="current_demand")
df1.loc[df1['current_demand'].isnull(), "Temp_Mean", "Temp_Mean" , "humidity_Mean"] = np.nan
Expected output:
Date_time current_demand Temp_Mean humidity_Mean
0 2018-05-01 00:00 15951.0 300.904267 49.600000
1 2018-05-01 00:15 16075.0 300.904267 49.600000
2 2018-05-01 00:30 15977.0 300.904267 49.600000
3 2018-05-01 00:45 15945.0 300.837600 50.333333
4 2018-05-01 01:00 15868.0 298.889333 59.133333
5 2018-05-01 01:15 15583.0 298.889333 59.133333
6 2018-05-01 01:30 15470.0 298.756000 59.800000
7 2018-05-01 01:45 15301.0 298.756000 59.800000
8 2018-05-01 02:00 0 0 0
9 2018-05-01 02:15 14946.0 298.756000 59.800000
10 2018-05-01 02:30 14736.0 298.756000 59.800000
11 2018-05-01 02:45 14630.0 298.502333 59.000000
12 2018-05-01 03:00 0 0 0
13 2018-05-01 03:15 14350.0 298.502333 59.000000
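But in place of the 0 rows, I want to fill in yesterday's data (that is, the previous day's data or the preceding data).
Please advise. Thanks in advance.
Edit: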
df = df.set_index(['Date_time']).asfreq('15T').ffill()
#df = df.set_index('Date_time').resample('15T').ffill() #as same
#df = df.asfreq('15T').ffill()
df = df.asfreq('15T').fillna(df.shift(1, freq='d'))
Why am I getting NaN? Please let me know.
current_demand Temp_Mean humidity_Mean
Date_time
2018-05-01 00:00:00 NaN NaN NaN
2018-05-01 00:15:00 NaN NaN NaN
2018-05-01 00:30:00 NaN NaN NaN
2018-05-01 00:45:00 NaN NaN NaN
2018-05-01 01:00:00 NaN NaN NaN
Best answer
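import pandas as pd

# Read the CSV with Date_time (column 1) parsed as a DatetimeIndex.
df = pd.read_csv(r'submission.csv', index_col=[1], parse_dates=[1], dayfirst=True)
# Insert the missing 15-minute rows and forward-fill them ('15T' is spelled '15min' in newer pandas).
df = df.asfreq('15T').ffill()
# Alternative that gives the same result here: df = df.resample('15T').ffill()
print(df)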
current_demand Temp_Mean humidity_Mean
Date_time
2018-05-01 00:00:00 15951.0 300.904267 49.600000
2018-05-01 00:15:00 16075.0 300.904267 49.600000
2018-05-01 00:30:00 15977.0 300.904267 49.600000
2018-05-01 00:45:00 15945.0 300.837600 50.333333
2018-05-01 01:00:00 15868.0 298.889333 59.133333
2018-05-01 01:15:00 15583.0 298.889333 59.133333
2018-05-01 01:30:00 15470.0 298.756000 59.800000
2018-05-01 01:45:00 15301.0 298.756000 59.800000
2018-05-01 02:00:00 15301.0 298.756000 59.800000
2018-05-01 02:15:00 14946.0 298.756000 59.800000
2018-05-01 02:30:00 14736.0 298.756000 59.800000
2018-05-01 02:45:00 14630.0 298.502333 59.000000
2018-05-01 03:00:00 14630.0 298.502333 59.000000
2018-05-01 03:15:00 14350.0 298.502333 59.000000
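To try this without the CSV, here is a minimal sketch on a small inline frame. The sample rows are copied from the question, but the frame construction and the name `filled` are only illustrative assumptions, not part of the original script.

```python
import pandas as pd

# Hypothetical inline sample standing in for submission.csv;
# the 02:00 row is deliberately missing.
df = pd.DataFrame(
    {
        "Date_time": pd.to_datetime(
            [
                "2018-05-01 01:30",
                "2018-05-01 01:45",
                "2018-05-01 02:15",
                "2018-05-01 02:30",
            ]
        ),
        "current_demand": [15470.0, 15301.0, 14946.0, 14736.0],
    }
).set_index("Date_time")

# asfreq reindexes onto a regular 15-minute grid, so the missing
# timestamp appears as a NaN row; ffill then copies the last
# valid row into that gap.
filled = df.asfreq("15min").ffill()
print(filled)
```

Note that asfreq needs a DatetimeIndex, so the date column has to be the index (and a real datetime, not a string or a time of day) before it is called.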
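If you want to replace the NaN values with the values from the same time on the previous day, the solution is fillna with the DataFrame shifted by one day: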
df = df.asfreq('15T').fillna(df.shift(1, freq='d'))
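As a rough illustration of how the shifted frame lines up, here is a sketch on made-up two-day data. The demand values for the second day and the names `regular`, `previous_day` and `filled` are assumptions for the example only.

```python
import pandas as pd

# Hypothetical two-day sample: day 1 has a 02:00 reading,
# day 2 is missing its 02:00 row.
idx = pd.to_datetime(
    [
        "2018-05-01 01:45",
        "2018-05-01 02:00",
        "2018-05-01 02:15",
        "2018-05-02 01:45",
        "2018-05-02 02:15",
    ]
)
df = pd.DataFrame(
    {"current_demand": [15301.0, 15100.0, 14946.0, 15000.0, 14800.0]},
    index=idx,
)

# Regular 15-minute grid; every timestamp without data becomes NaN.
regular = df.asfreq("15min")

# Same values, but each timestamp moved forward by one day, so the
# row labelled 2018-05-02 02:00 holds the reading from 2018-05-01 02:00.
previous_day = df.shift(1, freq="D")

# fillna aligns on index and columns, so each NaN takes the value
# from the same time of day on the previous day, where one exists.
filled = regular.fillna(previous_day)
print(filled.tail(3))
```

Gaps that have no reading 24 hours earlier (for example, anything missing on the first day in the data) stay NaN with this approach. If that matters, chaining `.ffill()` after the `fillna` call is one possible fallback.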
Regarding python - how to fill missing values (date and time) with specific values in pandas, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/50621498/