python - Pandas - DateTime groupby to a structured dict

Tags: python pandas

I have a dataset that contains a DateTime field. I need to group it by hour and dispatch each group into a dictionary with the following structure:

{year_1: {
    month_1: {
        week_1: {
            day_1: {hour_1: df_1, hour_2: df_2}
        },
        week_2: {
            day_1: {hour_1: df_1}
        }
    },
    month_3: {
        week_1: {
            day_1: {hour_1: df_1, hour_2: df_2}
        }
    }
 },
 year_2: {
    month_5: {
        week_1: {
            day_1: {hour_2: df_2}
        }
    }
 }
}

To do this, I used the following code:

import pandas as pd

# pd.datetime was removed in pandas 2.0; pd.Timestamp accepts the same positional arguments.
df = pd.DataFrame({'date': [pd.Timestamp(2015,3,17,2), pd.Timestamp(2014,3,24,3), pd.Timestamp(2014,3,17,4)], 'hdg_id': [4041,4041,4041], 'stock': [1.0,1.0,1.0]})
# Split the timestamp into its components with the vectorized .dt accessor.
df['year'] = df['date'].dt.year
df['month'] = df['date'].dt.month
df['week'] = df['date'].dt.isocalendar().week  # ISO week number (pandas >= 1.1)
df['day'] = df['date'].dt.day
df['hour'] = df['date'].dt.hour

result = {}
# The group keys must be listed in the same order in which they are unpacked below.
for to_unpack, df_hour in df.groupby(['year','month','week','day','hour']):
    year, month, week, day, hour = to_unpack
    try:
        result[year]
    except KeyError:
        result[year] = {}
    try:
        result[year][month]
    except KeyError:
        result[year][month] = {}
    try:
        result[year][month][week]
    except KeyError:
        result[year][month][week] = {}
    try:
        result[year][month][week][day]
    except KeyError:
        result[year][month][week][day] = {}

    result[year][month][week][day][hour] = df_hour

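As you can see, this is pretty much a brute-force solution, and I am looking for something cleaner and easier to understand. It is also very slow. I tried different ways of grouping (Python Pandas Group by date using datetime data), and I also tried a MultiIndex with one level per datetime component (Pandas DataFrame with MultiIndex: Group by year of DateTime level values). However, the problem is always how to create the dictionaries. Ideally, I would just like to write something like this: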

result[year][month][week][day][hour] = df_hour

But as far as I know, I first need to initialize every intermediate dictionary.

Best answer

You need dict.setdefault

result = {}
for to_unpack, df_hour in df.groupby(['year','month','week','day','hour']):
    year, month, week, day, hour = to_unpack

    result.setdefault(year, {}) \
          .setdefault(month, {}) \
          .setdefault(week, {}) \
          .setdefault(day, {}) \
          .setdefault(hour, df_hour)
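
If the nesting depth may change later, the setdefault chain can be folded into a small helper. The following is only a sketch: `set_nested` is a hypothetical name that does not appear in the original answer, and placeholder strings stand in for the per-hour DataFrames so that the snippet runs without pandas.

```python
def set_nested(target, keys, value):
    """Store value at target[k0][k1]...[kn], creating inner dicts as needed."""
    *parents, last = keys
    node = target
    for key in parents:
        # setdefault returns the existing inner dict, or inserts and returns a new one.
        node = node.setdefault(key, {})
    node[last] = value


result = {}
# Keys are (year, month, week, day, hour); the strings are stand-ins for DataFrames.
set_nested(result, (2015, 3, 12, 17, 2), "df_a")
set_nested(result, (2014, 3, 12, 17, 4), "df_b")
set_nested(result, (2014, 3, 13, 24, 3), "df_c")
```

Inside the groupby loop the call would presumably be `set_nested(result, to_unpack, df_hour)`. Unlike a final `setdefault`, the last assignment overwrites an existing entry, which makes no difference here because every group key is unique.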

You can also subclass dict to do this

class Fict(dict):
    def __getitem__(self, item):
        return super().setdefault(item, type(self)())

result = Fict()

for to_unpack, df_hour in df.groupby(['year','month','week','day','hour']):
    year, month, week, day, hour = to_unpack

    result[year][month][week][day][hour] = df_hour
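
A related option from the standard library, not part of the original answer, is a recursive `collections.defaultdict`. The sketch below assumes placeholder strings in place of the DataFrames; `tree` and `to_plain_dict` are hypothetical helper names. Like the dict subclass above, it creates a key whenever a missing one is merely read, so converting to plain dicts once the structure is built may be worthwhile.

```python
from collections import defaultdict


def tree():
    """Return a dict whose missing keys are filled with further trees."""
    return defaultdict(tree)


def to_plain_dict(node):
    """Recursively convert nested defaultdicts into ordinary dicts."""
    if isinstance(node, defaultdict):
        return {key: to_plain_dict(value) for key, value in node.items()}
    return node


result = tree()
# Keys are year, month, week, day, hour; the strings are stand-ins for DataFrames.
result[2015][3][12][17][2] = "df_a"
result[2014][3][12][17][4] = "df_b"
result[2014][3][13][24][3] = "df_c"

# Freeze the structure so that later lookups of missing keys raise KeyError.
plain = to_plain_dict(result)
```

After the conversion, `plain[1999]` raises `KeyError` instead of silently adding an empty branch.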

Regarding python - Pandas - DateTime groupby to a structured dict, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/55928354/
