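python - How to build overlapping week windows that include the nearest available date on each side?

Tags: python pandas dataframe

Sorry about the title, but this is really what I want to do.

Here is a table to explain further. The thick lines separate years and the thin lines separate weeks.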

[Image: table of the data with the RED and BLUE windows marked]

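As for the expected output, its format doesn't really matter. All I need is that when I query a YEAR/WEEK pair, I get the corresponding window of dates: every date of that week, plus the nearest available date before it and the nearest available date after it.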

For example, some_window_function(2022, 5) should return the following (this corresponds to the RED WINDOW):

                                  DATE
YEAR WEEK                             
2020 30          Friday, July 24, 2020
2022 5     Wednesday, February 2, 2022
     5      Thursday, February 3, 2022
     5        Friday, February 4, 2022
     7      Tuesday, February 15, 2022

Likewise, some_window_function(2022, 7) should return the following (this corresponds to the BLUE WINDOW):

                                   DATE
YEAR WEEK                              
2022 5         Friday, February 4, 2022
2022 7       Tuesday, February 15, 2022
     7     Wednesday, February 16, 2022
     7      Thursday, February 17, 2022
2023 44       Tuesday, October 31, 2023
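A minimal sketch of the rule both examples follow, assuming the rows are already in chronological order so that each week's rows are contiguous: find the positions of the requested week, then slice from one row before the first match to one row after the last. `week_window` is a hypothetical helper name and the data is a subset of the question's DataFrame; setting YEAR/WEEK as the index only mimics the layout of the expected outputs.

```python
import numpy as np
import pandas as pd

# Subset of the question's data, rows already in chronological order
df = pd.DataFrame({
    'YEAR': [2020, 2020, 2022, 2022, 2022, 2022, 2022, 2022, 2023],
    'WEEK': [30, 30, 5, 5, 5, 7, 7, 7, 44],
    'DATE': ['Thursday, July 23, 2020', 'Friday, July 24, 2020',
             'Wednesday, February 2, 2022', 'Thursday, February 3, 2022',
             'Friday, February 4, 2022', 'Tuesday, February 15, 2022',
             'Wednesday, February 16, 2022', 'Thursday, February 17, 2022',
             'Tuesday, October 31, 2023'],
})

def week_window(frame, year, week):
    """Hypothetical helper: rows of (year, week) plus the nearest row on each side.

    Assumes `frame` is sorted chronologically, so the week's rows are contiguous.
    """
    # Integer positions of every row belonging to the requested week
    pos = np.flatnonzero((frame['YEAR'] == year) & (frame['WEEK'] == week))
    if pos.size == 0:
        # Unknown YEAR/WEEK: return an empty frame with the same layout
        return frame.iloc[0:0].set_index(['YEAR', 'WEEK'])
    # One row before the first match (never below 0) ...
    start = max(pos[0] - 1, 0)
    # ... through one row after the last match (iloc clips past the end)
    stop = pos[-1] + 2
    return frame.iloc[start:stop].set_index(['YEAR', 'WEEK'])

print(week_window(df, 2022, 5))
print(week_window(df, 2022, 7))
```

At the edges of the data the slice simply has nothing to add on that side; for instance, the last week gets no following row.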

The DataFrame used is:

import pandas as pd

df = pd.DataFrame({'YEAR': [2020, 2020, 2020, 2020, 2020, 2020, 2020, 2022, 2022, 2022, 2022, 2022, 2022, 2023, 2023, 2023, 2023, 2023, 2023, 2023, 2023, 2023, 2023, 2023], 'WEEK': [29, 29, 29, 30, 30, 30, 30, 5, 5, 5, 7, 7, 7, 44, 44, 44, 44, 45, 45, 45, 46, 46, 46, 46], 'DATE': ['Monday, July 13, 2020', 'Thursday, July 16, 2020', 'Friday, July 17, 2020', 'Monday, July 20, 2020', 'Tuesday, July 21, 2020', 'Thursday, July 23, 2020', 'Friday, July 24, 2020', 'Wednesday, February 2, 2022', 'Thursday, February 3, 2022', 'Friday, February 4, 2022', 'Tuesday, February 15, 2022', 'Wednesday, February 16, 2022', 'Thursday, February 17, 2022', 'Tuesday, October 31, 2023', 'Wednesday, November 02, 2023', 'Friday, November 03, 2023', 'Sunday, November 05, 2023', 'Monday, November 06, 2023', 'Tuesday, November 07, 2023', 'Wednesday, November 08, 2023', 'Monday, November 13, 2023', 'Tuesday, November 14, 2023', 'Wednesday, November 15, 2023', 'Thursday, November 16, 2023']})
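Since the windows are defined by the nearest *dates*, it may help to parse the DATE strings before sorting. A sketch, assuming the strings always follow the `'Weekday, Month day, Year'` pattern seen in the question (some days are zero-padded, e.g. `November 02`, and some are not); the small frame below is a subset of the question's data in deliberately shuffled order.

```python
import pandas as pd

# Subset of the question's rows, deliberately out of order
df = pd.DataFrame({
    'YEAR': [2023, 2023, 2022, 2020],
    'WEEK': [44, 44, 5, 30],
    'DATE': ['Tuesday, October 31, 2023', 'Wednesday, November 02, 2023',
             'Wednesday, February 2, 2022', 'Friday, July 24, 2020'],
})

# Parse the long-form strings; %d accepts both "2" and "02"
df['DATE'] = pd.to_datetime(df['DATE'], format='%A, %B %d, %Y')

# Sort by the real date and renumber the rows 0..n-1
df = df.sort_values('DATE', ignore_index=True)
print(df)
```

Sorting by the parsed DATE, rather than only by YEAR/WEEK, makes the positional "previous" and "next" rows the chronologically nearest ones.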

I wrote the code below, but it just returns a DataFrame similar to my input:

def make_windows(group):
    if group.name == df.loc[df['YEAR'] == group.name, 'WEEK'].min():
        group.at[group.index[-1]+1, 'DATE'] = df.at[group.index[-1]+1, 'DATE']
        return group.ffill()

    elif group.name < df.loc[df['YEAR']== group.name, 'WEEK'].max():
        group.at[group.index[-1]+1, 'DATE'] = df.at[group.index[-1]+1, 'DATE']
        return group.iloc[1:].ffill()
    else:
        return group.iloc[1:].ffill()

results = df.groupby('YEAR').apply(make_windows)

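Best answer

It looks like you can use a simple mask on YEAR/WEEK and extend it by one row above and one row below (assuming the rows are sorted by date):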

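import pandas as pd

# df is the DataFrame from the question; rows must be in chronological order
df = df.sort_values(by=['YEAR', 'WEEK'])

def some_window_function(year, week):
    mask = df['YEAR'].eq(year) & df['WEEK'].eq(week)
    # Keep the matching rows plus one row above and one row below them
    return df[mask | mask.shift(fill_value=False) | mask.shift(-1, fill_value=False)]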

some_window_function(2022, 5)

Output:

    YEAR  WEEK                         DATE
6   2020    30        Friday, July 24, 2020
7   2022     5  Wednesday, February 2, 2022
8   2022     5   Thursday, February 3, 2022
9   2022     5     Friday, February 4, 2022
10  2022     7   Tuesday, February 15, 2022
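To see why `shift` produces the one-row expansion, here is a toy example (hypothetical data, hypothetical helper `window_positions`) showing which rows each shifted mask flags. It passes `fill_value=False` so the boundary positions stay boolean instead of becoming NaN.

```python
import pandas as pd

# Hypothetical toy data: three weeks, rows in chronological order
df = pd.DataFrame({'YEAR': [2022, 2022, 2022, 2022, 2022],
                   'WEEK': [5, 5, 7, 7, 9]})

def window_positions(year, week):
    mask = df['YEAR'].eq(year) & df['WEEK'].eq(week)
    # shift(-1): row i gets mask[i + 1], flagging the row just BEFORE the week
    before = mask.shift(-1, fill_value=False)
    # shift(1): row i gets mask[i - 1], flagging the row just AFTER the week
    after = mask.shift(1, fill_value=False)
    return df[mask | before | after].index.tolist()

print(window_positions(2022, 7))  # [1, 2, 3, 4]  middle week: one extra row on each side
print(window_positions(2022, 5))  # [0, 1, 2]     first week: nothing before it
print(window_positions(2022, 9))  # [3, 4]        last week: nothing after it
```

When the requested week is the first or last one in the data, the window simply has no neighbor on that side.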

A similar question, "python - How to build overlapping week windows that include the nearest available date on each side?", can be found on Stack Overflow: https://stackoverflow.com/questions/77521686/
