python - Resampling shifted intervals with Pandas

Given the following time series (for illustration purposes):

From                | Till                | Precipitation
2022-01-01 06:00:00 | 2022-01-02 06:00:00 | 0.5
2022-01-02 06:00:00 | 2022-01-03 06:00:00 | 1.2
2022-01-03 06:00:00 | 2022-01-04 06:00:00 | 0.0
2022-01-04 06:00:00 | 2022-01-05 06:00:00 | 1.3
2022-01-05 06:00:00 | 2022-01-06 06:00:00 | 9.8
2022-01-06 06:00:00 | 2022-01-07 06:00:00 | 0.1

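I would like to estimate the daily precipitation between 2022-01-02 00:00:00 and 2022-01-06 00:00:00. We can assume that the precipitation rate is constant within each interval given in the table.

Doing it by hand, I would assume something like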

2022-01-02 00:00:00 | 2022-01-03 00:00:00 | 0.25 * 0.5 + 0.75 * 1.2
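
The manual estimate above can be checked with a few lines of pandas. This is only a sketch of the arithmetic under the constant-rate assumption; the helper name `overlap_fraction` is made up for illustration and is not part of pandas.

```python
import pandas as pd


def overlap_fraction(row_from, row_till, win_start, win_end):
    # hypothetical helper: fraction of the row's duration that falls inside the window
    overlap = min(row_till, win_end) - max(row_from, win_start)
    overlap = max(overlap, pd.Timedelta(0))  # no overlap counts as zero
    return overlap / (row_till - row_from)


win_start = pd.Timestamp('2022-01-02 00:00:00')
win_end = pd.Timestamp('2022-01-03 00:00:00')

# first two rows of the table in the question
f1 = overlap_fraction(pd.Timestamp('2022-01-01 06:00:00'),
                      pd.Timestamp('2022-01-02 06:00:00'), win_start, win_end)
f2 = overlap_fraction(pd.Timestamp('2022-01-02 06:00:00'),
                      pd.Timestamp('2022-01-03 06:00:00'), win_start, win_end)

estimate = f1 * 0.5 + f2 * 1.2
print(f1, f2, estimate)
```

The two fractions come out as 6 h / 24 h and 18 h / 24 h, which are the 0.25 and 0.75 weights used in the hand calculation.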

Note: real-world data will most likely look less regular, somewhat like the following (missing intervals can be assumed to be 0.0):

From                | Till                | Precipitation
2022-01-01 05:45:12 | 2022-01-02 02:11:20 | 0.8
2022-01-03 02:01:59 | 2022-01-04 12:01:00 | 5.4
2022-01-04 06:00:00 | 2022-01-05 06:00:00 | 1.3
2022-01-05 07:10:00 | 2022-01-06 07:10:00 | 9.2
2022-01-06 02:54:00 | 2022-01-07 02:53:59 | 0.1

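Is there perhaps a library that provides a general and efficient solution?
If there is no such library, how can the resampled time series be computed in the most efficient way?

Best answer

Just compute the overlap of the periods... I think this will be fast.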

import pandas as pd
import numpy as np


def create_test_data():
    # just a helper to construct a test dataframe
    from_dates = pd.date_range(start='2022-01-01 06:00:00', freq='D', periods=6)
    till_dates = pd.date_range(start='2022-01-02 06:00:00', freq='D', periods=6)
    precip_amounts = [0.5, 1.2, 1, 2, 3, 0.5]
    return pd.DataFrame({'From': from_dates, 'Till': till_dates, 'Precip': precip_amounts})


def get_between(df, start_datetime, end_datetime):
    # all the entries that end (Till) after start_time
    # and start(From) before the end
    mask1 = df['Till'] > start_datetime
    mask2 = df['From'] < end_datetime
    return df[mask1 & mask2]


def get_ratio_values(df, start_datetime, end_datetime, debug=True):
    # get the ratios of the period windows
    df2 = get_between(df, start_datetime, end_datetime)  # get only the rows of interest
    precip_values = df2['Precip']  # use the filtered rows so the values align with the ratios
    # get overlap from the end time of row to start of our period of interest
    overlap_period1 = df2['Till'] - start_datetime
    # get overlap from end of our period of interest and the start time of row
    overlap_period2 = end_datetime - df2['From']
    # get the "best" overlap for each row
    best_overlap = np.minimum(overlap_period1, overlap_period2)
    # get the period of each duration
    window_durations = df2['Till'] - df2['From']
    # calculate the ratios of overlap (cannot be greater than 1)
    ratios = np.minimum(1.0, best_overlap / window_durations)
    # calculate the value * the ratio
    ratio_values = ratios * precip_values
    if debug:
        # just some prints for verification
        print("Ratio * value = result")
        print("----------------------")
        print("\n".join(f"{x:0.3f} * {y:0.2f} = {z}" for x, y, z in zip(ratios, precip_values, ratio_values)))
        print("----------------------")
    return ratio_values


start = pd.to_datetime('2022-01-02 00:00:00')
end = pd.to_datetime('2022-01-04 00:00:00')
ratio_vals = get_ratio_values(create_test_data(), start, end)
total_precip = ratio_vals.sum()
print("SUM RESULT   =", total_precip)
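
The code above handles a single window. To get one value per day, as the question asks, the same overlap idea can be applied to each daily window in turn. The following is a minimal sketch, not a definitive implementation: the function name `window_total` is hypothetical, the data is the regular table from the question, and the overlap is computed as min(Till, end) - max(From, start), which also copes with rows that are longer than the window.

```python
import pandas as pd

# the regular table from the question
df = pd.DataFrame({
    'From': pd.date_range('2022-01-01 06:00:00', freq='D', periods=6),
    'Till': pd.date_range('2022-01-02 06:00:00', freq='D', periods=6),
    'Precip': [0.5, 1.2, 0.0, 1.3, 9.8, 0.1],
})


def window_total(df, win_start, win_end):
    # hypothetical helper: precipitation attributed to [win_start, win_end)
    till = df['Till'].where(df['Till'] < win_end, win_end)        # min(Till, win_end)
    frm = df['From'].where(df['From'] > win_start, win_start)     # max(From, win_start)
    overlap = till - frm
    # rows that do not touch the window get a negative value; treat it as zero
    overlap = overlap.where(overlap > pd.Timedelta(0), pd.Timedelta(0))
    fraction = overlap / (df['Till'] - df['From'])
    return (fraction * df['Precip']).sum()


# daily window edges from 2022-01-02 to 2022-01-06
edges = pd.date_range('2022-01-02', '2022-01-06', freq='D')
daily = pd.Series(
    [window_total(df, s, e) for s, e in zip(edges[:-1], edges[1:])],
    index=edges[:-1],
)
print(daily)
```

Each call scans the whole frame, so this is simple rather than fast; for long series it may be worth filtering the rows per window first, as `get_between` does.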

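You could also compute only the first and last entries, since everything in between will always have a ratio of 1, provided the rows are contiguous and do not overlap (this may be both simpler and faster).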

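def get_ratio_values(df, start_datetime, end_datetime, debug=True):
    # get the ratios of the period windows
    # assumes the rows are sorted, contiguous and non-overlapping
    df2 = get_between(df, start_datetime, end_datetime)  # get only the rows of interest
    precip_values = df2['Precip']
    if df2.empty:
        return precip_values  # nothing overlaps the window
    first, last = df2.iloc[0], df2.iloc[-1]

    # overlap with first row and duration of first row
    overlap_start = first['Till'] - start_datetime
    duration_start = first['Till'] - first['From']

    # overlap with last row and duration of last row
    overlap_end = end_datetime - last['From']
    duration_end = last['Till'] - last['From']

    # rows in between lie fully inside the window, so their ratio is 1
    ratios = np.ones(len(df2))
    # subtract the share of the first/last row that falls outside the window
    # (subtracting also gives the right answer when first and last are the same row)
    ratios[0] -= 1 - min(1.0, overlap_start / duration_start)
    ratios[-1] -= 1 - min(1.0, overlap_end / duration_end)

    return ratios * precip_values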
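
The first-and-last shortcut relies on the rows being contiguous and non-overlapping. For irregular data like the second table in the question, one option is to compute all row/window overlaps at once with NumPy broadcasting. This is a sketch under stated assumptions rather than the answer's method: the function name `resample_intervals` is hypothetical, gaps contribute 0.0, and overlapping rows are simply summed.

```python
import numpy as np
import pandas as pd

# the irregular table from the question
df = pd.DataFrame({
    'From': pd.to_datetime(['2022-01-01 05:45:12', '2022-01-03 02:01:59',
                            '2022-01-04 06:00:00', '2022-01-05 07:10:00',
                            '2022-01-06 02:54:00']),
    'Till': pd.to_datetime(['2022-01-02 02:11:20', '2022-01-04 12:01:00',
                            '2022-01-05 06:00:00', '2022-01-06 07:10:00',
                            '2022-01-07 02:53:59']),
    'Precip': [0.8, 5.4, 1.3, 9.2, 0.1],
})


def resample_intervals(df, edges):
    # hypothetical helper: one total per window between consecutive edges
    # work on integer nanoseconds so plain NumPy arithmetic applies
    frm = df['From'].to_numpy().astype('datetime64[ns]').astype('int64')
    till = df['Till'].to_numpy().astype('datetime64[ns]').astype('int64')
    e = edges.to_numpy().astype('datetime64[ns]').astype('int64')
    win_start = e[:-1, None]  # shape (n_windows, 1)
    win_end = e[1:, None]
    # overlap[i, j] = time that row j spends inside window i, never negative
    overlap = np.minimum(till, win_end) - np.maximum(frm, win_start)
    overlap = np.clip(overlap, 0, None)
    fraction = overlap / (till - frm)
    return pd.Series(fraction @ df['Precip'].to_numpy(), index=edges[:-1])


edges = pd.date_range('2022-01-02', '2022-01-06', freq='D')
daily = resample_intervals(df, edges)
print(daily)
```

The intermediate matrix has one entry per (window, row) pair, so memory grows with the product of the two; for very long series, restricting each window to nearby rows (for example with a sorted search) would likely be needed.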

Regarding python - Resampling shifted intervals with Pandas, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/72412448/
