python - Aligning time series datasets with missing values for plotting

Tags: python pandas numpy matplotlib

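I have three datasets with missing values. Each dataset consists of a time column and a data column. The minimum time difference between two rows is 1 second (00:00:01):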

Dataset 1:          Dataset 2:          Dataset 3:  
00:00:00    81                          00:00:00    70
00:00:01    81                      
00:00:02    81                      
00:00:03    81                          00:00:03    99
00:00:04    81                          00:00:04    100
00:00:05    80      00:00:05    80      00:00:05    101
00:00:06    80      00:00:06    100         
                    00:00:07    92      00:00:07    88
00:00:08    83      00:00:08    80      00:00:08    88
00:00:09    84      00:00:09    83      00:00:09    87
00:00:10    86                      
00:00:11    89                      
00:00:12    90                      
00:00:13    92                          00:00:13    92
00:00:14    94                          00:00:14    94
00:00:15    94      00:00:15    96      00:00:15    93
00:00:16    96      00:00:16    97          
00:00:17    98      00:00:17    100     00:00:17    99
00:00:18    100                         00:00:18    99
00:00:19    101                         00:00:19    101
00:00:20    103                     

For illustration, the table above shows missing values as empty fields. The real data is dense, with no empty rows, and looks like this:

Dataset 1:          Dataset 2:          Dataset 3:  
00:00:00    81      00:00:05    80      00:00:00    70
00:00:01    81      00:00:06    100     00:00:03    99
00:00:02    81      00:00:07    92      00:00:04    100
00:00:03    81      00:00:08    80      00:00:05    101
00:00:04    81      00:00:09    83      00:00:07    88
00:00:05    80      00:00:15    96      00:00:08    88
00:00:06    80      00:00:16    97      00:00:09    87
00:00:08    83      00:00:17    100     00:00:13    92
00:00:09    84                          00:00:14    94
00:00:10    86                          00:00:15    93
00:00:11    89                          00:00:17    99
00:00:12    90                          00:00:18    99
00:00:13    92                          00:00:19    101
00:00:14    94                      
00:00:15    94                      
00:00:16    96                      
00:00:17    98                      
00:00:18    100                     
00:00:19    101                     
00:00:20    103                     

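Now I would like to align the data so that it can be plotted either combined in one chart:

[image: Combined]

or split into one chart per dataset:

[image: Split]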

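My naive approach would be:

  1. Find the minimum and maximum time across the datasets.
  2. Create a table with one row per second and three columns, each initialised to n/a.
  3. Loop over each dataset and assign its values to the table.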
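
The three steps above can be sketched roughly as follows. This is only an illustration: the two small Series, the names `datasets`, `grid` and `table`, and the choice of a Timedelta index are assumptions made for the sketch, not part of the real data.

```python
import pandas as pd

# Hypothetical stand-ins for the real datasets, indexed by elapsed time.
s1 = pd.Series([81.0, 80.0], index=pd.to_timedelta(["00:00:00", "00:00:02"]))
s2 = pd.Series([80.0], index=pd.to_timedelta(["00:00:01"]))
datasets = {"a": s1, "b": s2}

# Step 1: overall minimum and maximum time.
start = min(s.index.min() for s in datasets.values())
end = max(s.index.max() for s in datasets.values())

# Step 2: one row per second, every cell starting out as NaN.
grid = pd.timedelta_range(start, end, freq="s")
table = pd.DataFrame(index=grid, columns=list(datasets), dtype=float)

# Step 3: copy each dataset into its column, matched on the time index.
for name, s in datasets.items():
    table.loc[s.index, name] = s

print(table)
```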

Is there a Python function or library that performs these steps efficiently? Or is there a better approach?

Regards,

Best answer

You can concat all the DataFrames after setting the time column as the index of each one:

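import pandas as pd

# df1, df2 and df3 each have a 'time' column and a 'val' column
dfs = [df1, df2, df3]
# Index each frame by time, then place the 'val' columns side by side;
# timestamps missing from a dataset become NaN in its column
df = pd.concat([x.set_index('time')['val'] for x in dfs],
                axis=1,
                keys=['a','b','c'],
                sort=True)
print(df)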
              a      b      c
00:00:00   81.0    NaN   70.0
00:00:01   81.0    NaN    NaN
00:00:02   81.0    NaN    NaN
00:00:03   81.0    NaN   99.0
00:00:04   81.0    NaN  100.0
00:00:05   80.0   80.0  101.0
00:00:06   80.0  100.0    NaN
00:00:07    NaN   92.0   88.0
00:00:08   83.0   80.0   88.0
00:00:09   84.0   83.0   87.0
00:00:10   86.0    NaN    NaN
00:00:11   89.0    NaN    NaN
00:00:12   90.0    NaN    NaN
00:00:13   92.0    NaN   92.0
00:00:14   94.0    NaN   94.0
00:00:15   94.0   96.0   93.0
00:00:16   96.0   97.0    NaN
00:00:17   98.0  100.0   99.0
00:00:18  100.0    NaN   99.0
00:00:19  101.0    NaN  101.0
00:00:20  103.0    NaN    NaN
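
For a self-contained version of this step, here is a minimal sketch that first builds the input frames. The helper `read_dataset` and the shortened sample data are hypothetical; the only assumption carried over from above is that each frame has the columns `time` and `val`.

```python
import io

import pandas as pd


def read_dataset(text):
    """Parse whitespace-separated 'HH:MM:SS value' lines into a DataFrame."""
    return pd.read_csv(io.StringIO(text), sep=r"\s+", header=None,
                       names=["time", "val"])


# Shortened, made-up excerpts in the same two-column layout as the question.
df1 = read_dataset("00:00:00 81\n00:00:01 81\n00:00:02 81\n00:00:03 81")
df2 = read_dataset("00:00:01 80\n00:00:02 100")
df3 = read_dataset("00:00:00 70\n00:00:03 99")

# Align on the time index; missing timestamps show up as NaN.
aligned = pd.concat(
    [d.set_index("time")["val"] for d in (df1, df2, df3)],
    axis=1,
    keys=["a", "b", "c"],
    sort=True,
)
print(aligned)
```

The columns come out as floats because NaN cannot be stored in an integer column.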

If some timestamps are missing from every DataFrame, add DataFrame.asfreq to insert the missing rows. It requires a DatetimeIndex:

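# asfreq needs a DatetimeIndex, so parse the time strings first
df.index = pd.to_datetime(df.index)
# One row per second; newer pandas versions prefer the lowercase alias 's'
df = df.asfreq('S')
# Drop the date part again, keeping only the time of day
df.index = df.index.time
print(df)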
              a      b      c
00:00:00   81.0    NaN   70.0
00:00:01   81.0    NaN    NaN
00:00:02   81.0    NaN    NaN
00:00:03   81.0    NaN   99.0
00:00:04   81.0    NaN  100.0
00:00:05   80.0   80.0  101.0
00:00:06   80.0  100.0    NaN
00:00:07    NaN   92.0   88.0
00:00:08   83.0   80.0   88.0
00:00:09   84.0   83.0   87.0
00:00:10   86.0    NaN    NaN
00:00:11   89.0    NaN    NaN
00:00:12   90.0    NaN    NaN
00:00:13   92.0    NaN   92.0
00:00:14   94.0    NaN   94.0
00:00:15   94.0   96.0   93.0
00:00:16   96.0   97.0    NaN
00:00:17   98.0  100.0   99.0
00:00:18  100.0    NaN   99.0
00:00:19  101.0    NaN  101.0
00:00:20  103.0    NaN    NaN
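
In the output above nothing changes, because every second already occurs in at least one dataset. A small sketch with a real gap shows what asfreq adds. The sample values are made up, and passing an explicit `format` to `pd.to_datetime` is a choice made for this sketch rather than something the step requires.

```python
import pandas as pd

# Hypothetical aligned frame in which 00:00:02 is missing from all datasets.
df = pd.DataFrame(
    {"a": [81.0, 81.0, 83.0], "b": [80.0, None, 97.0]},
    index=["00:00:00", "00:00:01", "00:00:03"],
)

# Parse the strings into a DatetimeIndex (the date part is a placeholder).
df.index = pd.to_datetime(df.index, format="%H:%M:%S")

# Insert one row per second; the new row for 00:00:02 is all NaN.
filled = df.asfreq("s")

# Keep only the time of day for display.
filled.index = filled.index.time
print(filled)
```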

Finally, plot with DataFrame.plot:

df.plot()

For a separate plot per column:

df.plot(subplots=True)
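
Both calls can be tried on a small frame as sketched below. The sample values, the `Agg` backend (chosen so the sketch also runs without a display) and the `marker="o"` argument are assumptions for illustration. The marker is worth considering because lines are interrupted at NaN, so a value with NaN on both sides would otherwise not be drawn at all.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; omit this for on-screen plots

import numpy as np
import pandas as pd

# Hypothetical aligned frame with gaps, shaped like the concat result.
df = pd.DataFrame(
    {
        "a": [81, 81, 81, 81],
        "b": [np.nan, 80, 100, np.nan],
        "c": [70, np.nan, np.nan, 99],
    },
    index=["00:00:00", "00:00:01", "00:00:02", "00:00:03"],
)

# Combined: all columns on one set of axes.
ax = df.plot(marker="o", title="Combined")

# Split: one set of axes per column, sharing the x axis.
axes = df.plot(subplots=True, sharex=True, marker="o", title="Split")
```

If the gaps should be bridged instead, interpolating the data first (for example with DataFrame.interpolate) is one option, at the cost of drawing values that were never measured.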

Regarding python - Aligning time series datasets with missing values for plotting, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/59442241/
