python - Create a nested JSON object from a list of dictionaries

Tags: python json google-bigquery

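I want to convert a list of dictionaries into a nested JSON object. One field in each dictionary indicates whether that entry should be nested in the .json file and, if so, where in the file it belongs.

I can nest the entries under the appropriate table, but nesting them further inside fields is where I get stuck.

My data is in the following format: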

table_list = [
    {"Table": "table1", "Field": "field1", "Description": "description1", "Type": "STR"}, 
    {"Table": "table1", "Field": "field2", "Description": "description2", "Type": "STR"}, 
    {"Table": "table1", "Field": "field3", "Description": "description3", "Type": "STR"},
    {"Table": "table1", "Field": "field4", "Description": "description4", "Type": "STR"},
    {"Table": "table1", "Field": "field5", "Description": "description5", "Type": "RECORD"},
    {"Table": "table1", "Field": "field5.nest1", "Description": "description6", "Type": "STR"},
    {"Table": "table1", "Field": "field5.nest2", "Description": "description7", "Type": "STR"},
    {"Table": "table1", "Field": "field5.nest3", "Description": "description8", "Type": "STR"},
    {"Table": "table1", "Field": "field5.nest4", "Description": "description9", "Type": "RECORD"},
    {"Table": "table1", "Field": "field5.nest4.nest1", "Description": "description10", "Type": "STR"},
    {"Table": "table1", "Field": "field5.nest4.nest2", "Description": "description11", "Type": "STR"},
    {"Table": "table2", "Field": "field1", "Description": "description1", "Type": "STR"}
]

I'd like the output to be in this format (apologies for any typos):

{
    "table1": [
    {
        "Field": "field1",
        "Description": "description1",
        "Mode": "NULLABLE",
        "Type": "STR"
    },
    {
        "Field": "field2",
        "Description": "description2",
        "Mode": "NULLABLE",
        "Type": "STR"
    },
    {
        "Field": "field3",
        "Description": "description3",
        "Mode": "NULLABLE",
        "Type": "STR"
    },
    {
        "Field": "field4",
        "Description": "description4",
        "Mode": "NULLABLE",
        "Type": "STR"
    },
    {
        "Field": "field5",
        "Description": "description5",
        "Mode": "REPEATED",
        "Type": "RECORD",
        "Fields": [
            {
                "Field": "nest1",
                "Description": "description6",
                "Mode": "NULLABLE",
                "Type": "STR"
            },
            {
                "Field": "nest2",
                "Description": "description7",
                "Mode": "NULLABLE",
                "Type": "STR"
            },
            {
                "Field": "nest3",
                "Description": "description8",
                "Mode": "NULLABLE",
                "Type": "STR"
            },
            {
                "Field": "nest4",
                "Description": "description9",
                "Mode": "REPEATED",
                "Type": "RECORD",
                "Fields": [
                    {
                        "Field": "nest1",
                        "Description": "description10",
                        "Mode": "NULLABLE",
                        "Type": "STR"
                    },
                    {
                        "Field": "nest2",
                        "Description": "description11",
                        "Mode": "NULLABLE",
                        "Type": "STR"
                    }
                ]
            }
        ]
    }
    ],
    "table2": [
    {
        "Field": "field1",
        "Description": "description1",
        "Mode": "NULLABLE",
        "Type": "STR"
    }
    ]
}

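I'm having trouble getting nest1 and nest2 to create a new field inside an existing dictionary, holding an open list that can be appended to at any depth. The nesting in this example is only 3 levels deep, but I may need to go as deep as 15 levels.

I have code that does this at the first level using "Table", but descending into the fields to append to that list has been challenging, and I haven't found a question with exactly the same problem.

I see many people going the other way, flattening a nested structure, but I'm trying to create the nesting.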

import json


def create_schema(file_to_read):
    all_tables = {}
    for row in file_to_read:
        if row['Table'] in all_tables.keys():
            all_tables[row['Table']].append({"Mode": "NULLABLE",
                                             "Field": row['Field'],
                                             "Type": row['Type'],
                                             "Description": row['Description']})
        else:
            all_tables[row['Table']] = []
            all_tables[row['Table']].append({"Mode": "NULLABLE",
                                             "Field": row['Field'],
                                             "Type": row['Type'],
                                             "Description": row['Description']})
    return json.dumps(all_tables, indent=4, sort_keys=True)

What I actually get from this function is:

{
    "table1": [
    {
        "Field": "field1",
        "Description": "description1",
        "Mode": "NULLABLE",
        "Type": "STR"
    },
    {
        "Field": "field2",
        "Description": "description2",
        "Mode": "NULLABLE",
        "Type": "STR"
    },
    {
        "Field": "field3",
        "Description": "description3",
        "Mode": "NULLABLE",
        "Type": "STR"
    },
    {
        "Field": "field4",
        "Description": "description4",
        "Mode": "NULLABLE",
        "Type": "STR"
    },
    {
        "Field": "field5",
        "Description": "description5",
        "Mode": "NULLABLE",
        "Type": "RECORD"
    },
    {
        "Field": "nest1",
        "Description": "description6",
        "Mode": "NULLABLE",
        "Type": "STR"
    },
    {
        "Field": "nest2",
        "Description": "description7",
        "Mode": "NULLABLE",
        "Type": "STR"
    },
    {
        "Field": "nest3",
        "Description": "description8",
        "Mode": "NULLABLE",
        "Type": "STR"
    },
    {
        "Field": "nest4",
        "Description": "description9",
        "Mode": "NULLABLE",
        "Type": "RECORD"
    },
    {
        "Field": "nest1",
        "Description": "description10",
        "Mode": "NULLABLE",
        "Type": "STR"
    },
    {
        "Field": "nest2",
        "Description": "description11",
        "Mode": "NULLABLE",
        "Type": "STR"
    }
    ],
    "table2": [
    {
        "Field": "field1",
        "Description": "description1",
        "Mode": "NULLABLE",
        "Type": "STR"
    }
    ]
}

(For context, this is intended to be a BigQuery JSON schema.)

Best answer

This should do what you want:

import json
from collections import defaultdict

d = defaultdict(list)
for t in table_list:
    field_list = d[t['Table']]
    field = t['Field'].split('.')
    for f in field[:-1]:
        field_list = next(el['Fields'] for el in field_list if el['Field'] == f)
    new_d = {'Field': field[-1], 'Description': t['Description'], 'Mode': 'NULLABLE' if t['Type'] == 'STR' else 'REPEATED', 'Type': t['Type']}
    field_list.append(defaultdict(list, new_d))

print(json.dumps(d, indent=4))
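
The nesting above relies on a detail of defaultdict that is easy to miss: each field is stored as defaultdict(list, new_d), so reading el['Fields'] on a parent creates the empty list the first time a child arrives. Here is a minimal sketch of just that behaviour; the names parent and child are illustrative and not part of the answer:

```python
import json
from collections import defaultdict

# A field entry as the loop above stores it: a defaultdict pre-filled with the field's keys.
parent = defaultdict(list, {'Field': 'field5', 'Type': 'RECORD'})
has_fields_before = 'Fields' in parent  # nothing has read 'Fields' yet

# Reading a missing key creates it with an empty list, which can then be appended to.
child = {'Field': 'nest1', 'Type': 'STR'}
parent['Fields'].append(child)
has_fields_after = 'Fields' in parent

# defaultdict is a dict subclass, so json.dumps serialises it like a plain dict.
serialised = json.dumps(parent)
print(serialised)
```

One consequence is that leaf fields never gain a 'Fields' key as long as nothing looks it up, which is why the STR entries in the output carry no empty list.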

Or, if you'd rather not use defaultdict:

d = {}
for t in table_list:
    if t['Table'] not in d:
        d[t['Table']] = []
    field_list = d[t['Table']]
    field = t['Field'].split('.')
    for f in field[:-1]:
        inner = next(el for el in field_list if el['Field'] == f)
        if 'Fields' not in inner:
            inner['Fields'] = []
        field_list = inner['Fields']
    new_d = {'Field': field[-1], 'Description': t['Description'], 'Mode': 'NULLABLE' if t['Type'] == 'STR' else 'REPEATED', 'Type': t['Type']}
    field_list.append(new_d)

Output:

{
    "table1": [
        {
            "Field": "field1",
            "Description": "description1",
            "Mode": "NULLABLE",
            "Type": "STR"
        },
        {
            "Field": "field2",
            "Description": "description2",
            "Mode": "NULLABLE",
            "Type": "STR"
        },
        {
            "Field": "field3",
            "Description": "description3",
            "Mode": "NULLABLE",
            "Type": "STR"
        },
        {
            "Field": "field4",
            "Description": "description4",
            "Mode": "NULLABLE",
            "Type": "STR"
        },
        {
            "Field": "field5",
            "Description": "description5",
            "Mode": "REPEATED",
            "Type": "RECORD",
            "Fields": [
                {
                    "Field": "nest1",
                    "Description": "description6",
                    "Mode": "NULLABLE",
                    "Type": "STR"
                },
                {
                    "Field": "nest2",
                    "Description": "description7",
                    "Mode": "NULLABLE",
                    "Type": "STR"
                },
                {
                    "Field": "nest3",
                    "Description": "description8",
                    "Mode": "NULLABLE",
                    "Type": "STR"
                },
                {
                    "Field": "nest4",
                    "Description": "description9",
                    "Mode": "REPEATED",
                    "Type": "RECORD",
                    "Fields": [
                        {
                            "Field": "nest1",
                            "Description": "description10",
                            "Mode": "NULLABLE",
                            "Type": "STR"
                        },
                        {
                            "Field": "nest2",
                            "Description": "description11",
                            "Mode": "NULLABLE",
                            "Type": "STR"
                        }
                    ]
                }
            ]
        }
    ],
    "table2": [
        {
            "Field": "field1",
            "Description": "description1",
            "Mode": "NULLABLE",
            "Type": "STR"
        }
    ]
}
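
Both loops assume that every parent row (for example field5) appears before its children; if one does not, next() raises a bare StopIteration with no hint about which row caused it. The following is a hedged sketch that wraps the same logic in a function and reports the missing parent. The name build_schema, the sample rows, and the error message are illustrative assumptions, not part of the original answer:

```python
import json


def build_schema(rows):
    """Group rows by table and nest dotted field names under their parents."""
    tables = {}
    for row in rows:
        field_list = tables.setdefault(row['Table'], [])
        *parents, leaf = row['Field'].split('.')
        for name in parents:
            # Find the already-inserted parent entry at this level.
            parent = next((el for el in field_list if el['Field'] == name), None)
            if parent is None:
                raise ValueError(
                    f"parent {name!r} of {row['Field']!r} must appear before its children"
                )
            field_list = parent.setdefault('Fields', [])
        field_list.append({
            'Field': leaf,
            'Description': row['Description'],
            # Same rule as the answer: anything that is not STR is treated as REPEATED.
            'Mode': 'NULLABLE' if row['Type'] == 'STR' else 'REPEATED',
            'Type': row['Type'],
        })
    return tables


# Hypothetical two-row input, shaped like table_list in the question.
sample = [
    {"Table": "t", "Field": "a", "Description": "d1", "Type": "RECORD"},
    {"Table": "t", "Field": "a.b", "Description": "d2", "Type": "STR"},
]
schema = build_schema(sample)
print(json.dumps(schema, indent=4))
```

Because each level is found by walking down from the table's list, the depth is limited only by the number of dot-separated parts in "Field", so 15 levels need no extra code.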

Source: a similar question on Stack Overflow: https://stackoverflow.com/questions/55481423/
