I am trying to merge two DataFrames (df and df2) into a new DataFrame (df3) in pandas with pd.concat, using the following code:
df3 = pd.concat([df, df2])
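This works almost the way I want, but it creates one problem.
df contains the data for the current day, and its index is a time series. It looks like this: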
Facility Servers PUE
2016-10-31 00:00:00 6.0 5.0 1.2
2016-10-31 00:30:00 7.0 5.0 1.4
2016-10-31 01:00:00 6.0 5.0 1.2
2016-10-31 01:30:00 6.0 5.0 1.2
2016-10-31 02:00:00 6.0 5.0 1.2
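df2 contains only NaN values. Its index is a time series in the same format as the one in df, but it starts at an earlier date and runs for a full year (17520 rows, i.e. 365 * 48 thirty-minute intervals). It basically looks like this: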
Facility Servers PUE
2016-10-01 00:00:00 NaN NaN NaN
2016-10-01 00:30:00 NaN NaN NaN
2016-10-01 01:00:00 NaN NaN NaN
2016-10-01 01:30:00 NaN NaN NaN
2016-10-01 02:00:00 NaN NaN NaN
2016-10-01 02:30:00 NaN NaN NaN
<continues to 17520 rows, i.e. one year of 30 minute time intervals>
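For anyone who wants to reproduce the setup, here is a minimal sketch of how a placeholder frame like df2 could be built. The question does not show how df2 was actually created, so the use of pd.date_range and the exact start timestamp are assumptions based on the printed sample.

```python
import numpy as np
import pandas as pd

# Assumption: one year of 30-minute slots starting at the first timestamp
# shown in the sample above (365 * 48 = 17520 rows).
index = pd.date_range("2016-10-01 00:00:00", periods=365 * 48, freq="30min")

# All-NaN frame with the same column names as df.
df2 = pd.DataFrame(np.nan, index=index, columns=["Facility", "Servers", "PUE"])

print(len(df2))
print(df2.head())
```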
When I run: df3 = pd.concat([df, df2])
and then call df3.head(), I get the following result:
Facility Servers PUE
2016-10-31 00:00:00 6.0 5.0 1.2
2016-10-31 00:30:00 7.0 5.0 1.4
2016-10-31 01:00:00 6.0 5.0 1.2
2016-10-31 01:30:00 6.0 5.0 1.2
2016-10-31 02:00:00 6.0 5.0 1.2
2016-10-31 02:30:00 NaN NaN NaN
2016-10-31 03:00:00 NaN NaN NaN
2016-10-31 03:30:00 NaN NaN NaN
<continues to the end of the year>
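In other words, the code appears to drop all of the NaN rows for the time intervals that come before the data in df. Can anyone suggest how to keep all of the rows from df2 and only replace them with the data from df at the corresponding time intervals?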
Best answer
# Build the union of both indexes: every timestamp from df2 plus those from df
print(df2.index.union(df.index))
DatetimeIndex(['2016-10-01 00:00:00', '2016-10-01 00:30:00',
'2016-10-01 01:00:00', '2016-10-01 01:30:00',
'2016-10-01 02:00:00', '2016-10-01 02:30:00',
'2016-10-31 00:00:00', '2016-10-31 00:30:00',
'2016-10-31 01:00:00', '2016-10-31 01:30:00',
'2016-10-31 02:00:00'],
dtype='datetime64[ns]', freq=None)
# Reindex df onto the union; timestamps that df does not have are filled with NaN
df = df.reindex(df2.index.union(df.index))
print(df)
Facility Servers PUE
2016-10-01 00:00:00 NaN NaN NaN
2016-10-01 00:30:00 NaN NaN NaN
2016-10-01 01:00:00 NaN NaN NaN
2016-10-01 01:30:00 NaN NaN NaN
2016-10-01 02:00:00 NaN NaN NaN
2016-10-01 02:30:00 NaN NaN NaN
2016-10-31 00:00:00 6.0 5.0 1.2
2016-10-31 00:30:00 7.0 5.0 1.4
2016-10-31 01:00:00 6.0 5.0 1.2
2016-10-31 01:30:00 6.0 5.0 1.2
2016-10-31 02:00:00 6.0 5.0 1.2
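The approach above can be tried end to end with the small sketch below. The frames are shortened stand-ins for df and df2 (three and four rows, built with pd.date_range, which is an assumption about how the data was created). The last step uses DataFrame.combine_first as a possible alternative that is not part of the answer; for this all-NaN df2 it should give the same rows.

```python
import numpy as np
import pandas as pd

cols = ["Facility", "Servers", "PUE"]

# Shortened stand-in for df: real values for the current day.
df = pd.DataFrame(
    [[6.0, 5.0, 1.2], [7.0, 5.0, 1.4], [6.0, 5.0, 1.2]],
    index=pd.date_range("2016-10-31 00:00", periods=3, freq="30min"),
    columns=cols,
)

# Shortened stand-in for df2: all-NaN placeholder rows from an earlier date.
df2 = pd.DataFrame(
    np.nan,
    index=pd.date_range("2016-10-01 00:00", periods=4, freq="30min"),
    columns=cols,
)

# The answer's technique: reindex df onto the sorted union of both indexes.
reindexed = df.reindex(df2.index.union(df.index))
print(reindexed)

# Possible alternative (not from the answer): keep df's values and fall back
# to df2 wherever df has no row or holds NaN.
combined = df.combine_first(df2)
print(combined)
```

In both cases every timestamp from df2 is kept, and rows that also exist in df carry df's values. combine_first may be the better fit if df2 ever holds real values that should survive where df has gaps, since reindex only ever fills with NaN.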
A similar question about merging with NaN data in Python/pandas can be found on Stack Overflow: https://stackoverflow.com/questions/40344257/