Python/Pandas merge issue with NaN data

I'm trying to use pd.concat in Pandas to combine two dataframes (df and df2) into a new dataframe (df3) with the following code:

df3 = pd.concat([df, df2])

This almost works the way I want, but it causes one problem.

df contains data for the current date, and its index is a time series. It looks like this:

                        Facility    Servers   PUE
2016-10-31  00:00:00    6.0         5.0       1.2
2016-10-31  00:30:00    7.0         5.0       1.4
2016-10-31  01:00:00    6.0         5.0       1.2
2016-10-31  01:30:00    6.0         5.0       1.2
2016-10-31  02:00:00    6.0         5.0       1.2

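df2 contains only NaN data. Its index is a time series in the same format as the one in df, but it starts at an earlier date and runs for a full year (i.e. 17520 rows, corresponding to 365 * 48 half-hour intervals). It looks basically like this: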

                        Facility    Servers   PUE
2016-10-01  00:00:00    NaN         NaN       NaN
2016-10-01  00:30:00    NaN         NaN       NaN
2016-10-01  01:00:00    NaN         NaN       NaN
2016-10-01  01:30:00    NaN         NaN       NaN
2016-10-01  02:00:00    NaN         NaN       NaN
2016-10-01  02:30:00    NaN         NaN       NaN
<continues to 17520 rows, i.e. one year of 30 minute time intervals>
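To make the setup reproducible, here is a minimal sketch that builds frames shaped like df and df2. The five sample rows are taken from the question; the exact start of df2 (2016-10-01 00:00) and the use of pd.date_range with freq="30min" are assumptions for illustration. It also confirms the 365 * 48 = 17520 row count.

```python
import numpy as np
import pandas as pd

# df: the current day's readings (sample rows taken from the question)
df = pd.DataFrame(
    {"Facility": [6.0, 7.0, 6.0, 6.0, 6.0],
     "Servers": [5.0, 5.0, 5.0, 5.0, 5.0],
     "PUE": [1.2, 1.4, 1.2, 1.2, 1.2]},
    index=pd.date_range("2016-10-31 00:00", periods=5, freq="30min"),
)

# df2: an all-NaN frame covering one year of half-hour slots
# (assumed start 2016-10-01 00:00; 365 days * 48 slots per day = 17520 rows)
idx2 = pd.date_range("2016-10-01 00:00", periods=365 * 48, freq="30min")
df2 = pd.DataFrame(np.nan, index=idx2, columns=df.columns)

print(len(df2))       # 17520
print(df2.index[-1])  # last half-hour slot of the year
```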

When I run: df3 = pd.concat([df, df2])

and then run df3.head(), I get the following result:

                        Facility    Servers   PUE
2016-10-31  00:00:00    6.0         5.0       1.2
2016-10-31  00:30:00    7.0         5.0       1.4
2016-10-31  01:00:00    6.0         5.0       1.2
2016-10-31  01:30:00    6.0         5.0       1.2
2016-10-31  02:00:00    6.0         5.0       1.2
2016-10-31  02:30:00    NaN         NaN       NaN
2016-10-31  03:00:00    NaN         NaN       NaN
2016-10-31  03:30:00    NaN         NaN       NaN
<continues to the end of the year>

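In other words, the code seems to drop all the NaN data for the time intervals that occur before the data in df. Can anyone suggest how to keep all of the data in df2 and replace it with data from df only for the corresponding time intervals?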
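One way to check what pd.concat is doing: with its default arguments it appends the rows of its inputs one after another, without aligning or sorting on the index. Rows are not dropped, but df's rows come first, and timestamps present in both frames end up appearing twice. A small sketch with tiny hypothetical frames standing in for df and df2:

```python
import numpy as np
import pandas as pd

# Hypothetical miniature versions of df and df2
df = pd.DataFrame(
    {"PUE": [1.2, 1.4]},
    index=pd.to_datetime(["2016-10-31 00:00", "2016-10-31 00:30"]),
)
df2 = pd.DataFrame(
    {"PUE": np.nan},
    index=pd.to_datetime(["2016-10-01 00:00", "2016-10-31 00:00", "2016-10-31 00:30"]),
)

df3 = pd.concat([df, df2])

print(len(df3))                           # 5: every row from both frames is kept
print(df3.index.is_monotonic_increasing)  # False: df's rows come first, unsorted
print(df3.index.duplicated().sum())       # 2: the overlapping timestamps appear twice
```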

Best answer

I think you need to reindex by the union of both indexes:

print(df2.index.union(df.index))
DatetimeIndex(['2016-10-01 00:00:00', '2016-10-01 00:30:00',
               '2016-10-01 01:00:00', '2016-10-01 01:30:00',
               '2016-10-01 02:00:00', '2016-10-01 02:30:00',
               '2016-10-31 00:00:00', '2016-10-31 00:30:00',
               '2016-10-31 01:00:00', '2016-10-31 01:30:00',
               '2016-10-31 02:00:00'],
              dtype='datetime64[ns]', freq=None)
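Index.union returns the set union of the two indexes: each timestamp appears only once, and for comparable values such as timestamps the result is sorted. That is what makes it a suitable target for reindex. A minimal check with two small hypothetical DatetimeIndex objects that share one timestamp:

```python
import pandas as pd

# Two small hypothetical indexes that share the 2016-10-31 00:00 timestamp
a = pd.DatetimeIndex(["2016-10-01 00:00", "2016-10-31 00:00"])
b = pd.DatetimeIndex(["2016-10-31 00:00", "2016-10-31 00:30"])

u = a.union(b)
print(u)
print(u.is_unique, u.is_monotonic_increasing)  # True True
```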

df = df.reindex(df2.index.union(df.index))
print(df)
                     Facility  Servers  PUE
2016-10-01 00:00:00       NaN      NaN  NaN
2016-10-01 00:30:00       NaN      NaN  NaN
2016-10-01 01:00:00       NaN      NaN  NaN
2016-10-01 01:30:00       NaN      NaN  NaN
2016-10-01 02:00:00       NaN      NaN  NaN
2016-10-01 02:30:00       NaN      NaN  NaN
2016-10-31 00:00:00       6.0      5.0  1.2
2016-10-31 00:30:00       7.0      5.0  1.4
2016-10-31 01:00:00       6.0      5.0  1.2
2016-10-31 01:30:00       6.0      5.0  1.2
2016-10-31 02:00:00       6.0      5.0  1.2
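The answer's approach as a self-contained sketch, using the sample rows from the question and a shorter hypothetical df2 instead of the full year. For comparison it also shows DataFrame.combine_first, an alternative not mentioned in the answer: it takes values from df where they exist and falls back to df2 elsewhere. reindex uses only df2's index and never its values, so the two agree here because df2 is all NaN; combine_first would also keep any non-NaN values df2 might hold.

```python
import numpy as np
import pandas as pd

# df: sample rows from the question
df = pd.DataFrame(
    {"Facility": [6.0, 7.0, 6.0, 6.0, 6.0],
     "Servers": [5.0] * 5,
     "PUE": [1.2, 1.4, 1.2, 1.2, 1.2]},
    index=pd.date_range("2016-10-31 00:00", periods=5, freq="30min"),
)

# df2: hypothetical all-NaN frame from 2016-10-01 00:00 to 2016-10-31 02:00
idx2 = pd.date_range("2016-10-01 00:00", "2016-10-31 02:00", freq="30min")
df2 = pd.DataFrame(np.nan, index=idx2, columns=df.columns)

# Approach from the answer: reindex df onto the union of both indexes.
# Timestamps missing from df become NaN rows; df's own rows keep their values.
via_reindex = df.reindex(df2.index.union(df.index))

# Alternative: start from df and fill any gaps with df2's values.
via_combine = df.combine_first(df2)

print(via_reindex.tail(6))
print(via_reindex.equals(via_combine))  # True for this all-NaN df2
```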

We found a similar question about this Python/Pandas merge issue with NaN data on Stack Overflow: https://stackoverflow.com/questions/40344257/
