python - Converting a DataFrame to a nested dictionary without column names

Here is a sample of my pandas DataFrame; the real one contains close to 100k rows.

import pandas as pd
df = pd.DataFrame({'cluster': ['5', '5', '5', '5', '5', '5'],
         'mdse_item_i': ['23627102',
                         '23627102',
                         '23627102',
                         '23627102',
                         '23627102',
                         '23627102'],
         'predPriceQty': ['35.675543',
                         '33.236678',
                         '35.675543',
                         '35.675543',
                         '35.675543',
                         '35.675543'],
         'schedule_i': ['56', '56', '56', '56', '56', '56'],
         'segment_id': ['4123', '4123', '4144', '4161', '4295', '4454'],
         'wk': ['1', '2', '1', '1', '1', '1']} )

segment_id  cluster  schedule_i  mdse_item_i  wk  predPriceQty
4123        5        56          23627102     1   35.675543
4123        5        56          23627102     2   33.236678
4144        5        56          23627102     1   35.675543
4161        5        56          23627102     1   35.675543
4295        5        56          23627102     1   35.675543
4454        5        56          23627102     1   35.675543

Below is the nested dictionary format I want to produce:

{(4123, 5): {56.0: {23627102.0: {1: 35.6755430505491, 2:33.236678}}},
 (4144, 5): {56.0: {23627102.0: {1: 35.6755430505491}}},
 (4161, 5): {56.0: {23627102.0: {1: 35.6755430505491}}},
 (4295, 5): {56.0: {23627102.0: {1: 35.6755430505491}}},
 (4454, 5): {56.0: {23627102.0: {1: 35.6755430505491}}}}

The code below works for me, but on a huge DataFrame it takes hours to build the dictionary, so I am trying to avoid iterating row by row.

forecast_dict_all = {}
for _, row in df.iterrows():
        item_agg_id = int(row["segment_id"])
        mdse_item_i = row["mdse_item_i"]
        cluster = int(row["cluster"])
        wk = int(row["wk"])
        forecast = float(row["predPriceQty"])
        schedule_id = row["schedule_i"]
        
        if (item_agg_id, cluster) not in forecast_dict_all:
            forecast_dict_all[item_agg_id, cluster] = {
                schedule_id: {mdse_item_i: {wk: forecast}}
            }
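
The loop above only shows the branch for keys that have not been seen yet, so a second week for the same (segment_id, cluster) pair would not be merged in. A minimal sketch of one way to merge every row, assuming numeric columns and using `dict.setdefault` with `itertuples` (the variable names `by_schedule`, `by_item` and `by_week` are illustrative, not from the original post):

```python
import pandas as pd

# Sample rows from the question, already typed as numbers (an assumption).
df = pd.DataFrame({
    "segment_id": [4123, 4123, 4144, 4161, 4295, 4454],
    "cluster": [5] * 6,
    "schedule_i": [56] * 6,
    "mdse_item_i": [23627102] * 6,
    "wk": [1, 2, 1, 1, 1, 1],
    "predPriceQty": [35.675543, 33.236678, 35.675543,
                     35.675543, 35.675543, 35.675543],
})

forecast_dict_all = {}
# itertuples yields lightweight namedtuples and is usually faster than iterrows.
for row in df.itertuples(index=False):
    key = (int(row.segment_id), int(row.cluster))
    # setdefault returns the existing inner dict or creates an empty one.
    by_schedule = forecast_dict_all.setdefault(key, {})
    by_item = by_schedule.setdefault(int(row.schedule_i), {})
    by_week = by_item.setdefault(int(row.mdse_item_i), {})
    by_week[int(row.wk)] = float(row.predPriceQty)
```

This still visits every row once, so it is a constant-factor improvement rather than a vectorized solution.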

What I have tried so far:

dict(df.groupby(['segment_id','cluster'],as_index=False).apply(lambda x: x.to_dict()).to_dict())

df.set_index(['segment_id', 'cluster'], inplace=True)
    
di = df.to_dict(orient='index')
    
forecast_dict_all = {k:{v['schedule_i']: {v['mdse_item_i']: {v['wk']: v['predPriceQty']}}} 
                            for k,v in di.items()}

df.set_index(['segment_id', 'cluster'], inplace=True)
{k:{grp['schedule_i']: {grp['mdse_item_i']: {grp['wk']: grp['predPriceQty']}}}
for k, grp in df.groupby(['schedule_i','mdse_item_i','wk','predPriceQty'])}

I even tried using zip, but in neither case could I get the desired output.

Edit: I am using Python 2.7.13.final.0 and pandas 0.20.1.

Any help is appreciated, thanks.

Best answer

I don't know whether this will be faster, but it gives the expected output for the sample data.

# d is the dict of column lists passed to pd.DataFrame in the question
df = pd.DataFrame(d)
df = df.astype(dtype={'cluster': int, 'mdse_item_i': int, 'predPriceQty': float,
                'schedule_i': int, 'segment_id': int, 'wk': int})
df.drop_duplicates(inplace=True)
df.set_index(['segment_id', 'cluster'], inplace=True)
answer = df.apply(lambda row:
                {row['schedule_i']: {row['mdse_item_i']: {row['wk']: row['predPriceQty']}}},
                axis=1).to_dict()

Result:

{(4123, 5): {56.0: {23627102.0: {1.0: 35.675543}}},
 (4144, 5): {56.0: {23627102.0: {1.0: 35.675543}}},
 (4161, 5): {56.0: {23627102.0: {1.0: 35.675543}}},
 (4295, 5): {56.0: {23627102.0: {1.0: 35.675543}}},
 (4454, 5): {56.0: {23627102.0: {1.0: 35.675543}}}}
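
Because dictionary keys are unique, calling `to_dict()` on a row-wise result indexed by (segment_id, cluster) keeps a single entry per key, so the two weeks for (4123, 5) do not end up in the same inner dictionary. A hedged sketch of one alternative, assuming numeric columns: group on the four outer keys and build the week mapping once per group (`group_cols`, `nested` and `weeks` are illustrative names, not from the original answer):

```python
import pandas as pd

# Sample rows from the question, already typed as numbers (an assumption).
df = pd.DataFrame({
    "segment_id": [4123, 4123, 4144, 4161, 4295, 4454],
    "cluster": [5] * 6,
    "schedule_i": [56] * 6,
    "mdse_item_i": [23627102] * 6,
    "wk": [1, 2, 1, 1, 1, 1],
    "predPriceQty": [35.675543, 33.236678, 35.675543,
                     35.675543, 35.675543, 35.675543],
})

group_cols = ["segment_id", "cluster", "schedule_i", "mdse_item_i"]
nested = {}
for (seg, clu, sched, item), grp in df.groupby(group_cols):
    # One {week: forecast} mapping per (segment, cluster, schedule, item) group.
    weeks = {int(w): float(q) for w, q in zip(grp["wk"], grp["predPriceQty"])}
    outer = nested.setdefault((int(seg), int(clu)), {})
    outer.setdefault(int(sched), {})[int(item)] = weeks
```

The Python-level loop here runs once per group rather than once per row, which may help when many rows share the same outer keys.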

Note: I fixed the DataFrame's types because you do that in your code, but the best time to get the types right is when the DataFrame is created.
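
If the values do arrive as strings, one possible way to convert every column in a single step is `pd.to_numeric` applied column by column. This is only a sketch on a reduced, hypothetical two-row frame (`raw` and `typed` are illustrative names):

```python
import pandas as pd

# A reduced frame with string values, mimicking the question's input.
raw = pd.DataFrame({
    "segment_id": ["4123", "4144"],
    "cluster": ["5", "5"],
    "wk": ["1", "1"],
    "predPriceQty": ["35.675543", "35.675543"],
})

# DataFrame.apply passes each column to pd.to_numeric, which infers
# an integer or floating-point dtype from the string values.
typed = raw.apply(pd.to_numeric)
```

After this, the `int(...)` and `float(...)` casts inside a row loop are no longer needed to get numeric keys and values.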

Regarding "python - Converting a DataFrame to a nested dictionary without column names", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/73334697/
