python - Preserving original data points when padding a signal with pandas

Consider the following test dataset:

from datetime import datetime
import pandas

testdf = pandas.DataFrame({'t': [datetime(2015, 1, 1, 10,  0),
                                 datetime(2015, 1, 1, 11, 32),
                                 datetime(2015, 1, 1, 12,  0)],
                           'val': [1, 2, 3]})

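I want to interpolate this dataset using simple padding, so that I have a data point at least every 30 minutes while retaining the original data points.

A suitable result would look like this: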

't'                'val'
2015-01-01 10:00   1
2015-01-01 10:30   1
2015-01-01 11:00   1
2015-01-01 11:30   1
2015-01-01 11:32   2
2015-01-01 12:00   3

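What is a good way to achieve this result, preferably using standard pandas methods?

I am aware of the DataFrame.resample method, but

a) I cannot seem to find the right value for the how parameter to get the simple padding I want, and

b) I cannot find a way to keep the original data points in the result.

Problem b) can of course be worked around by manually adding the original data points to the resampled DataFrame, although I would not consider that a particularly neat solution.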

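Best answer

Generate an index containing the missing timestamps and create a DataFrame of NaN values on it. Then combine it with the original data using the combine_first method and fill in the NaN values: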

import numpy

idx = pandas.date_range(datetime(2015, 1, 1, 10, 0), datetime(2015, 1, 1, 12, 0), freq='30min')
df = pandas.DataFrame(numpy.nan, index=idx, columns=['val'])

testdf.set_index('t', inplace=True)
# ffill() is the current spelling of fillna(method='ffill'), which newer pandas versions deprecate
testdf.combine_first(df).ffill()

The documentation of the combine_first method reads:

Combine two DataFrame objects and default to non-null values in frame calling the method. Result index columns will be the union of the respective indexes and columns
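
To make the quoted behavior concrete, here is a minimal sketch on two toy frames. The frames a and b and their values are hypothetical and are not part of the original question. Values from the calling frame win wherever they are non-null, and the resulting index is the union of both indexes:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frames, used only to illustrate combine_first
a = pd.DataFrame({'val': [1.0, np.nan]}, index=[0, 1])
b = pd.DataFrame({'val': [10.0, 20.0, 30.0]}, index=[0, 1, 2])

# Non-null values of `a` are kept; gaps in `a` are filled from `b`;
# the result index is the union of both indexes
combined = a.combine_first(b)
print(combined)
```

In the answer above the second frame contains only NaN, so combine_first contributes nothing but the extra timestamps; the actual values come from the forward fill that follows.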

The ffill option of the fillna method (the same forward fill that DataFrame.ffill performs) does the following (source):

ffill: propagate last valid observation forward to next valid
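
As a possible alternative that does not build an intermediate NaN frame, the same idea can be sketched with an index union followed by reindex. This is not the answer's method, only an illustration under the assumption of a reasonably recent pandas; the names grid, full_index, and padded are made up for this example:

```python
from datetime import datetime
import pandas as pd

testdf = pd.DataFrame({'t': [datetime(2015, 1, 1, 10, 0),
                             datetime(2015, 1, 1, 11, 32),
                             datetime(2015, 1, 1, 12, 0)],
                       'val': [1, 2, 3]}).set_index('t')

# Regular 30-minute grid spanning the original data
grid = pd.date_range(testdf.index.min(), testdf.index.max(), freq='30min')

# The union keeps every original timestamp and adds the grid points, sorted
full_index = testdf.index.union(grid)

# Rows introduced by reindex start out as NaN; ffill carries the last
# known value forward into them
padded = testdf.reindex(full_index).ffill()
print(padded)
```

Because reindex introduces NaN before the fill, the val column ends up as floating point rather than integer, just as with the combine_first approach.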

Regarding python - Preserving original data points when padding a signal with pandas, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/35918248/
