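python - Removing duplicates from a list of dictionaries created with itertools groupby in Python

Tags: python python-3.x dictionary arraylist duplicates

I want to remove certain duplicates from my merged dictionaries.

My data: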

mongo_data = [{
 'url': 'https://goodreads.com/',
 'variables': [{'key': 'Harry Potter', 'value': '10.0'},
               {'key': 'Discovery of Witches', 'value': '8.5'},],
 'vendor': 'Fantasy' 
 },{
 'url': 'https://goodreads.com/',
 'variables': [{'key': 'Hunger Games', 'value': '10.0'},
               {'key': 'Maze Runner', 'value': '5.5'},],
 'vendor': 'Dystopia' 
 },{
 'url': 'https://kindle.com/',
 'variables': [{'key': 'Divergent', 'value': '9.0'},
               {'key': 'Lord of the Rings', 'value': '9.0'},],
 'vendor': 'Fantasy' 
 },{
 'url': 'https://kindle.com/',
 'variables': [{'key': 'The Handmaids Tale', 'value': '10.0'},
               {'key': 'Divergent', 'value': '9.0'},],
 'vendor': 'Fantasy' 
 }]

My code:
from itertools import groupby

searches = []
for key, group in groupby(mongo_data, key=lambda chunk: chunk['url']):
    search = {"url": key, "results": []}
    for vendor, group2 in groupby(group, key=lambda chunk2: chunk2['vendor']):
        result = {
            "genre": vendor,
            "data": [{'key': key['key'], 'value': key['value']} 
                     for result2 in group2
                     for key in result2["variables"]],
        }
        search["results"].append(result)
    searches.append(search)

My result:
[
  {
    "url": "https://goodreads.com/",
    "results": [
      {
        "genre": "Fantasy",
        "data": [
          {
            "key": "Harry Potter",
            "value": "10.0"
          },
          {
            "key": "Discovery of Witches",
            "value": "8.5"
          }
        ]
      },
      {
        "genre": "Dystopia",
        "data": [
          {
            "key": "Hunger Games",
            "value": "10.0"
          },
          {
            "key": "Maze Runner",
            "value": "5.5"
          }
        ]
      }
    ]
  },
  {
    "url": "https://kindle.com/",
    "results": [
      {
        "genre": "Fantasy",
        "data": [
          {
            "key": "Divergent",
            "value": "9.0"
          },
          {
            "key": "Lord of the Rings",
            "value": "9.0"
          },
          {
            "key": "The Handmaids Tale",
            "value": "10.0"
          },
          {
            "key": "Divergent",
            "value": "9.0"
          }
        ]
      }
    ]
  }
]

I don't want any duplicates in my structure, and I'm not sure how to remove them. My expected result is shown below.

Expected result:
[
  {
    "url": "https://goodreads.com/",
    "results": [
      {
        "genre": "Fantasy",
        "data": [
          {
            "key": "Harry Potter",
            "value": "10.0"
          },
          {
            "key": "Discovery of Witches",
            "value": "8.5"
          }
        ]
      },
      {
        "genre": "Dystopia",
        "data": [
          {
            "key": "Hunger Games",
            "value": "10.0"
          },
          {
            "key": "Maze Runner",
            "value": "5.5"
          }
        ]
      }
    ]
  },
  {
    "url": "https://kindle.com/",
    "results": [
      {
        "genre": "Fantasy",
        "data": [
          {
            "key": "Divergent",
            "value": "9.0"
          },
          {
            "key": "Lord of the Rings",
            "value": "9.0"
          },
          {
            "key": "The Handmaids Tale",
            "value": "10.0"
          }
        ]
      }
    ]
  }
]

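In the last list of dictionaries, Divergent appears more than once. When I merge the dictionaries, I want the duplicates under https://kindle.com/ --> Fantasy to be merged into a single entry as well. Is there a way to remove the duplicate dictionaries?

I want the https://kindle.com/ section to look like: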
{
"url": "https://kindle.com/",
"results": [
  {
    "genre": "Fantasy",
    "data": [
      {
        "key": "Divergent",
        "value": "9.0"
      },
      {
        "key": "Lord of the Rings",
        "value": "9.0"
      },
      {
        "key": "The Handmaids Tale",
        "value": "10.0"
      }
    ]
  }
]
}

Best answer

You can first convert these dicts to a set of tuples, and then convert them back to a list of dicts afterwards:

mongo_data = [{
 'url': 'https://goodreads.com/',
 'variables': [{'key': 'Harry Potter', 'value': '10.0'},
               {'key': 'Discovery of Witches', 'value': '8.5'},],
 'vendor': 'Fantasy' 
 },{
 'url': 'https://goodreads.com/',
 'variables': [{'key': 'Hunger Games', 'value': '10.0'},
               {'key': 'Maze Runner', 'value': '5.5'},],
 'vendor': 'Dystopia' 
 },{
 'url': 'https://kindle.com/',
 'variables': [{'key': 'Divergent', 'value': '9.0'},
               {'key': 'Lord of the Rings', 'value': '9.0'},],
 'vendor': 'Fantasy' 
 },{
 'url': 'https://kindle.com/',
 'variables': [{'key': 'The Handmaids Tale', 'value': '10.0'},
               {'key': 'Divergent', 'value': '9.0'},],
 'vendor': 'Fantasy' 
 }]
from itertools import groupby
searches = []
for key, group in groupby(mongo_data, key=lambda chunk: chunk['url']):
    search = {"url": key, "results": []}
    for vendor, group2 in groupby(group, key=lambda chunk2: chunk2['vendor']):
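        result = {
            "genre": vendor,
            # dicts are not hashable, so store (key, value) tuples in a set to drop repeats
            "data": set((key['key'], key['value'])
                     for result2 in group2
                     for key in result2["variables"]),
        }
        # convert the tuples back into dicts; set iteration order is arbitrary
        result['data'] = [{"key": tup[0], "value": tup[1]} for tup in result['data']]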
        search["results"].append(result)
    searches.append(search)
searches

Output:
[{'results': [{'data': [{'key': 'Harry Potter', 'value': '10.0'},
                        {'key': 'Discovery of Witches', 'value': '8.5'}],
               'genre': 'Fantasy'},
              {'data': [{'key': 'Maze Runner', 'value': '5.5'},
                        {'key': 'Hunger Games', 'value': '10.0'}],
               'genre': 'Dystopia'}],
  'url': 'https://goodreads.com/'},
 {'results': [{'data': [{'key': 'The Handmaids Tale', 'value': '10.0'},
                        {'key': 'Lord of the Rings', 'value': '9.0'},
                        {'key': 'Divergent', 'value': '9.0'}],
               'genre': 'Fantasy'}],
  'url': 'https://kindle.com/'}]
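
Note that the order inside each 'data' list above differs from the order in the input, because a set does not keep insertion order. If the original order matters, one possible variation is to track the pairs already seen and keep only the first occurrence. The sketch below is not part of the answer above; the helper names dedupe_pairs and build_searches are made up for illustration, and it assumes the records are already ordered by url and vendor, as in the sample data, since groupby only merges adjacent items.

```python
from itertools import groupby


def dedupe_pairs(variables):
    """Drop repeated (key, value) pairs, keeping first-seen order."""
    seen = set()
    unique = []
    for item in variables:
        pair = (item["key"], item["value"])
        if pair not in seen:
            seen.add(pair)
            unique.append({"key": pair[0], "value": pair[1]})
    return unique


def build_searches(records):
    """Group records by url, then by vendor, with de-duplicated data.

    Assumes records with the same url and vendor are adjacent,
    because groupby only groups consecutive items.
    """
    searches = []
    for url, by_url in groupby(records, key=lambda r: r["url"]):
        search = {"url": url, "results": []}
        for vendor, by_vendor in groupby(by_url, key=lambda r: r["vendor"]):
            # Flatten the variables of every record in this url/vendor group.
            variables = [v for record in by_vendor for v in record["variables"]]
            search["results"].append(
                {"genre": vendor, "data": dedupe_pairs(variables)}
            )
        searches.append(search)
    return searches


# The two kindle.com records from the question.
sample = [
    {
        "url": "https://kindle.com/",
        "variables": [
            {"key": "Divergent", "value": "9.0"},
            {"key": "Lord of the Rings", "value": "9.0"},
        ],
        "vendor": "Fantasy",
    },
    {
        "url": "https://kindle.com/",
        "variables": [
            {"key": "The Handmaids Tale", "value": "10.0"},
            {"key": "Divergent", "value": "9.0"},
        ],
        "vendor": "Fantasy",
    },
]

print(build_searches(sample))
```

With this variation the kindle.com Fantasy entries come out as Divergent, Lord of the Rings, The Handmaids Tale, which matches the order in the expected result from the question.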

Regarding python - Removing duplicates from a list of dictionaries created with itertools groupby in Python, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/59780030/
