python - Group a dictionary by dynamic key names and aggregate some keys of the nested dictionaries in Python

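I am struggling to group a nested dictionary by a key that lives inside the nested dictionaries, and to aggregate some of the nested dictionaries' data. I hope someone here can give me some helpful hints, because I am not making any progress. I am using Python 3.6. I looked at the collections and pandas modules and think that pandas may contain what I need to achieve my goal.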

Given the following dictionary:


{
  12345: {
    'date': '2019-07-26',
    'time_spent': 0.5,
    'color': 'yellow',
    'drive_id': 1804
  },
  54321: {
    'date': '2019-07-26',
    'time_spent': 1.5,
    'color': 'yellow',
    'drive_id': 3105
  },
  11561: {
    'date': '2019-07-25',
    'time_spent': 1.25,
    'color': 'red',
    'drive_id': 1449
  },
  12101: {
    'date': '2019-07-25',
    'time_spent': 0.25,
    'color': 'red',
    'drive_id': 2607
  },
  12337: {
    'date': '2019-07-24',
    'time_spent': 2.0,
    'color': 'yellow',
    'drive_id': 3105
  },
  54123: {
    'date': '2019-07-24',
    'time_spent': 1.5,
    'color': 'yellow',
    'drive_id': 4831
  },
  15931: {
    'date': '2019-07-19',
    'time_spent': 3.0,
    'color': 'yellow',
    'drive_id': 3105
  },
  13412: {
    'date': '2019-07-19',
    'time_spent': 1.5,
    'color': 'red',
    'drive_id': 1449
  }
}

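Think of it as a car dealer's list of test drives over the last few days: each entry records the time spent on a single test drive and rates the sales opportunity with a color. Now I need to group this data:

Group by date, so the new dictionary has each individual date as a key
Sum the time_spent for each date and provide the total for that date
Carry the color over, but if a day has mixed colors (e.g. red and yellow), red always wins
For each date, provide a comma-separated aggregated list of drive_id values
Discard the key names of the top-level dictionary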
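The "red always wins" rule can be read as taking the maximum over a color precedence ranking. A minimal sketch, assuming only the two colors that occur in the sample data; `COLOR_RANK` and `merge_color` are hypothetical names introduced here for illustration:

```python
# Hypothetical precedence table: the color with the higher rank wins.
# Only 'yellow' and 'red' occur in the sample data; anything else is an assumption.
COLOR_RANK = {'yellow': 0, 'red': 1}


def merge_color(colors):
    """Return the winning color for one day (red beats yellow)."""
    return max(colors, key=COLOR_RANK.__getitem__)


print(merge_color(['yellow', 'yellow']))  # yellow
print(merge_color(['yellow', 'red']))     # red
```

With this shape, supporting an additional color would only mean adding an entry to the table.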

So if I transformed the data by hand, the result might look like this:

{
  '2019-07-26':
  {
    'time_spent': '2.0',
    'color': 'yellow',
    'drive_id': '1804, 3105',

  },
  '2019-07-25':
  {
    'time_spent': '1.5',
    'color': 'red',
    'drive_id': '1449, 2607',

  },
  '2019-07-24':
  {
    'time_spent': '3.5',
    'color': 'yellow',
    'drive_id': '3105, 4831',

  },
  '2019-07-19':
  {
    'time_spent': '4.5',
    'color': 'red',
    'drive_id': '1449, 3105',
  }

}

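So where am I stuck? Clearly my Python skills are limited, and I struggle with the dynamically generated dict key names (e.g. 13412). I found a solution here (Group pandas dataframe by a nested dictionary key), but I could not apply it to my case, because here the dict key names are not known in advance. So basically I tried to create a pandas DataFrame and to group the original dictionary by date first, but I already failed at that step.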

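I apologize if I have overlooked something in the pandas documentation or an existing question on StackOverflow. I would be grateful if someone could give me a hint and explain how to approach this situation.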
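The dynamic top-level keys (12345, 54321, ...) never need to be known in advance: iterating over `d.values()` simply ignores them. As a sketch of a standard-library alternative to pandas (a different technique from the `dict.setdefault` loop in the answer), the records can be sorted by date and grouped with `itertools.groupby`; the `group_by_date` helper is a hypothetical name, and the data is the question's dictionary written compactly:

```python
from itertools import groupby
from operator import itemgetter

# The question's data in compact form; the top-level keys are arbitrary IDs.
d = {
    12345: {'date': '2019-07-26', 'time_spent': 0.5, 'color': 'yellow', 'drive_id': 1804},
    54321: {'date': '2019-07-26', 'time_spent': 1.5, 'color': 'yellow', 'drive_id': 3105},
    11561: {'date': '2019-07-25', 'time_spent': 1.25, 'color': 'red', 'drive_id': 1449},
    12101: {'date': '2019-07-25', 'time_spent': 0.25, 'color': 'red', 'drive_id': 2607},
    12337: {'date': '2019-07-24', 'time_spent': 2.0, 'color': 'yellow', 'drive_id': 3105},
    54123: {'date': '2019-07-24', 'time_spent': 1.5, 'color': 'yellow', 'drive_id': 4831},
    15931: {'date': '2019-07-19', 'time_spent': 3.0, 'color': 'yellow', 'drive_id': 3105},
    13412: {'date': '2019-07-19', 'time_spent': 1.5, 'color': 'red', 'drive_id': 1449},
}


def group_by_date(records):
    """Group nested dicts by 'date' and aggregate them (hypothetical helper)."""
    by_date = itemgetter('date')
    # groupby only merges adjacent items, so sort by the grouping key first;
    # using .values() drops the top-level keys entirely
    rows = sorted(records.values(), key=by_date)
    out = {}
    for date, group in groupby(rows, key=by_date):
        group = list(group)  # the group iterator is consumed several times below
        out[date] = {
            'time_spent': str(sum(r['time_spent'] for r in group)),
            'color': 'red' if any(r['color'] == 'red' for r in group) else 'yellow',
            'drive_id': ', '.join(str(r['drive_id']) for r in group),
        }
    return out


print(group_by_date(d))
```

Because `sorted` is stable, drive IDs within a date keep their original order, and the resulting dates come out in ascending order.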

Best Answer

With a simple iteration and dict.setdefault:

d = {
  12345: {
    'date': '2019-07-26',
    'time_spent': 0.5,
    'color': 'yellow',
    'drive_id': 1804
  },
  54321: {
    'date': '2019-07-26',
    'time_spent': 1.5,
    'color': 'yellow',
    'drive_id': 3105
  },
  11561: {
    'date': '2019-07-25',
    'time_spent': 1.25,
    'color': 'red',
    'drive_id': 1449
  },
  12101: {
    'date': '2019-07-25',
    'time_spent': 0.25,
    'color': 'red',
    'drive_id': 2607
  },
  12337: {
    'date': '2019-07-24',
    'time_spent': 2.0,
    'color': 'yellow',
    'drive_id': 3105
  },
  54123: {
    'date': '2019-07-24',
    'time_spent': 1.5,
    'color': 'yellow',
    'drive_id': 4831
  },
  15931: {
    'date': '2019-07-19',
    'time_spent': 3.0,
    'color': 'yellow',
    'drive_id': 3105
  },
  13412: {
    'date': '2019-07-19',
    'time_spent': 1.5,
    'color': 'red',
    'drive_id': 1449
  }
}

from pprint import pprint

out = {}
for item in d.values():  # the top-level keys (12345, ...) are never needed
    # create the per-date accumulator the first time a date is seen
    out.setdefault(item['date'], {})
    out[item['date']].setdefault('time_spent', 0.0)
    out[item['date']].setdefault('color', 'yellow')
    out[item['date']].setdefault('drive_id', [])

    # sum the time, let red override yellow, collect the drive IDs
    out[item['date']]['time_spent'] += item['time_spent']
    if item['color'] == 'red':
        out[item['date']]['color'] = 'red'
    out[item['date']]['drive_id'].append(item['drive_id'])

# post-processing: convert the aggregates to the requested string format
for k in out.values():
    k['drive_id'] = ', '.join(str(i) for i in k['drive_id'])
    k['time_spent'] = str(k['time_spent'])

pprint(out)

Prints:

{'2019-07-19': {'color': 'red', 'drive_id': '3105, 1449', 'time_spent': '4.5'},
 '2019-07-24': {'color': 'yellow',
                'drive_id': '3105, 4831',
                'time_spent': '3.5'},
 '2019-07-25': {'color': 'red', 'drive_id': '1449, 2607', 'time_spent': '1.5'},
 '2019-07-26': {'color': 'yellow',
                'drive_id': '1804, 3105',
                'time_spent': '2.0'}}
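Note that the answer lists the 2019-07-19 IDs as '3105, 1449', i.e. in the order they appear in `d`, whereas the question's expected output shows '1449, 3105'. If sorted IDs are wanted, one option (a sketch, not part of the original answer) is to accumulate into a `collections.defaultdict` and sort the IDs numerically before joining; `aggregate` is a hypothetical helper name:

```python
from collections import defaultdict

# The question's data in compact form.
d = {
    12345: {'date': '2019-07-26', 'time_spent': 0.5, 'color': 'yellow', 'drive_id': 1804},
    54321: {'date': '2019-07-26', 'time_spent': 1.5, 'color': 'yellow', 'drive_id': 3105},
    11561: {'date': '2019-07-25', 'time_spent': 1.25, 'color': 'red', 'drive_id': 1449},
    12101: {'date': '2019-07-25', 'time_spent': 0.25, 'color': 'red', 'drive_id': 2607},
    12337: {'date': '2019-07-24', 'time_spent': 2.0, 'color': 'yellow', 'drive_id': 3105},
    54123: {'date': '2019-07-24', 'time_spent': 1.5, 'color': 'yellow', 'drive_id': 4831},
    15931: {'date': '2019-07-19', 'time_spent': 3.0, 'color': 'yellow', 'drive_id': 3105},
    13412: {'date': '2019-07-19', 'time_spent': 1.5, 'color': 'red', 'drive_id': 1449},
}


def aggregate(records):
    """Aggregate per date, with drive IDs sorted numerically (hypothetical helper)."""
    # every new date starts with zero time, yellow, and a fresh empty ID list
    acc = defaultdict(lambda: {'time_spent': 0.0, 'color': 'yellow', 'drive_id': []})
    for item in records.values():
        entry = acc[item['date']]
        entry['time_spent'] += item['time_spent']
        if item['color'] == 'red':
            entry['color'] = 'red'  # red always wins
        entry['drive_id'].append(item['drive_id'])
    # format: time as a string, IDs sorted as integers and comma-joined
    return {
        date: {
            'time_spent': str(e['time_spent']),
            'color': e['color'],
            'drive_id': ', '.join(str(i) for i in sorted(e['drive_id'])),
        }
        for date, e in acc.items()
    }


print(aggregate(d)['2019-07-19'])
# {'time_spent': '4.5', 'color': 'red', 'drive_id': '1449, 3105'}
```

For this sample data, the result should equal the hand-written expected output from the question, including the order of the dates.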

A similar question on this topic can be found on Stack Overflow: https://stackoverflow.com/questions/57358407/
