python - Convert a nested dictionary to a table / parent-child structure, Python 3.6

Tags: python python-3.x pandas dataframe dictionary

I want to convert the nested dictionary produced by the code below.

import requests
from bs4 import BeautifulSoup

url = 'https://www.bundesbank.de/en/statistics/time-series-databases/time-series-databases/743796/openAll?treeAnchor=BANKEN&statisticType=BBK_ITS'
result = requests.get(url)
soup = BeautifulSoup(result.text, 'html.parser')

def get_child_nodes(parent_node):
    node_name = parent_node.a.get_text(strip=True)

    result = {"name": node_name, "children": []}

    children_list = parent_node.find('ul', recursive=False)
    if not children_list:
        return result

    for child_node in children_list('li', recursive=False):
        result["children"].append(get_child_nodes(child_node))

    return result

Data_Dict = get_child_nodes(soup.find("div", class_="statisticTree"))

Is it possible to export this as a parent-child table, with each node listed next to its parent?

The code above comes from @alecxe's answer to: Fetch complete List of Items using BeautifulSoup, Python 3.6

I tried, but the structure is too complex for me to work out. Please help.

Dictionary: http://s000.tinyupload.com/index.php?file_id=97731876598977568058

Sample dictionary data (truncated):

{"name": "Banks", "children": [{"name": "Banks", "children": [{"name": "Balance sheet items", "children": 
[{"name": "Minimum reserves", "children": [{"name": "Reserve maintenance in the euro area", "children": []}, {"name": "Reserve maintenance in Germany", "children": []}]}, 

{"name": "Bank Lending Survey (BLS) - Results for Germany", "children": [{"name": "Lending", "children": [{"name": "Enterprises", "children": [{"name": "Changes over the past three months", "children": [{"name": "Credit standards and explanatory factors", "children": [{"name": "Overall", "children": []}, {"name": "Loans to small and medium-sized enterprises", "children": []}, {"name": "Loans to large enterprises", "children": []}, {"name": "Short-term loans", "children": []}, {"name": "Long-term loans", "children": []}]}, {"name": "Terms and conditions and explanatory factors", "children": [{"name": "Overall", "children": [{"name": "Overall terms and conditions and explanatory factors", "children": []}, {"name": "Margins on average loans and explanatory factors", "children": []}, {"name": "Margins on riskier loans and explanatory factors", "children": []}, {"name": "Non-interest rate charges", "children": []}, {"name": "Size of the loan or credit line", "children": []}, {"name": "Collateral requirements", "children": []}, {"name": "Loan covenants", "children": []}, {"name": "Maturity", "children": []}]}, {"name": "Loans to small and medium-sized enterprises", "children": []}, {"name": "Loans to large enterprises", "children": []}]}, {"name": "Share of enterprise rejected loan applications", "children": []}]}, {"name": "Expected changes over the next three months", "children": [{"name": "Credit standards", "children": []}]}]}, {"name": "Households", "children": [{"name": "Changes over the past three months", "children": [{"name": "Credit standards and explanatory factors", "children": [{"name": "Loans for house purchase", "children": []}, {"name": "Consumer credit and other lending", "children": []}]}, 

Best answer

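You can handle this with a recursive function that walks the tree and records each node's name together with its parent's name.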

def get_pairs(data, parent=''):
    rv = [(data['name'], parent)]
    for d in data['children']:    
        rv.extend(get_pairs(d, parent=data['name']))
    return rv

Data_Dict = get_child_nodes(soup.find("div", class_="statisticTree"))

pairs = get_pairs(Data_Dict)
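To see what the recursion produces without fetching the live page, here is a minimal sketch that runs the same get_pairs on a small hand-written tree. The `sample` dictionary is a hypothetical miniature in the same {"name", "children"} shape, not the real Bundesbank data.

```python
def get_pairs(data, parent=''):
    """Return (name, parent_name) tuples in depth-first, pre-order."""
    rv = [(data['name'], parent)]
    for d in data['children']:
        rv.extend(get_pairs(d, parent=data['name']))
    return rv

# Hypothetical miniature tree, same shape as Data_Dict.
sample = {
    "name": "Banks",
    "children": [
        {"name": "Minimum reserves", "children": [
            {"name": "Reserve maintenance in the euro area", "children": []},
            {"name": "Reserve maintenance in Germany", "children": []},
        ]},
    ],
}

sample_pairs = get_pairs(sample)
for name, parent in sample_pairs:
    print(repr(name), '<-', repr(parent))
```

The root comes out first with an empty parent, and each child follows its own parent, so the row order mirrors a top-down reading of the tree.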

You can then either build a DataFrame or export straight to CSV. To create a DataFrame, we can simply do the following:

import pandas as pd

df = pd.DataFrame(get_pairs(Data_Dict), columns=['Name', 'Parent'])

This gives:

                                             Name               Parent
0                                           Banks                     
1                                           Banks                Banks
2                             Balance sheet items                Banks
3                                Minimum reserves  Balance sheet items
4            Reserve maintenance in the euro area     Minimum reserves
                                          ...                  ...
3890  Number of transactions per type of terminal  Payments statistics
3891   Value of transactions per type of terminal  Payments statistics
3892                   Number of OTC transactions  Payments statistics
3893                    Value of OTC transactions  Payments statistics
3894                        Issuance of banknotes  Payments statistics

[3895 rows x 2 columns]
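One caveat with the table above: names are not unique in this tree. The root "Banks" has a child also called "Banks", and "Overall" appears under more than one parent in the sample data. A Name/Parent table keyed on names alone therefore cannot always be turned back into the original tree. A possible variant, sketched below, gives every node a numeric id and records the parent's id instead. The function name `get_rows`, the id scheme and the small `sample` tree are assumptions for illustration, not part of the original answer.

```python
from itertools import count

def get_rows(data, parent_id=None, counter=None):
    """Return (id, parent_id, name) tuples; ids follow depth-first visiting order."""
    if counter is None:
        counter = count()          # one shared counter for the whole walk
    node_id = next(counter)
    rows = [(node_id, parent_id, data['name'])]
    for child in data['children']:
        rows.extend(get_rows(child, parent_id=node_id, counter=counter))
    return rows

# Hypothetical miniature tree with repeated names.
sample = {"name": "Banks", "children": [
    {"name": "Banks", "children": [
        {"name": "Credit standards", "children": [
            {"name": "Overall", "children": []}]},
        {"name": "Terms and conditions", "children": [
            {"name": "Overall", "children": []}]},
    ]},
]}

rows = get_rows(sample)
for row in rows:
    print(row)
```

The two "Overall" rows now carry different parent ids (2 and 4), so they stay distinguishable. The same rows can be passed to pd.DataFrame with columns such as ['Id', 'ParentId', 'Name'].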

Or, to write a CSV file, we can use the built-in csv library:

import csv

with open('out.csv', 'w', newline='') as f:
    writer = csv.writer(f, delimiter=',')
    writer.writerow(('Name', 'Parent'))
    for pair in pairs:
        writer.writerow(pair)
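To check the CSV text before writing anything to disk, the same writer can target an in-memory buffer. This is a small sketch using the standard library's io.StringIO and csv.writer.writerows; the two-row `pairs` list is a hypothetical stand-in for the real output of get_pairs.

```python
import csv
import io

# Hypothetical stand-in for the real pairs list.
pairs = [('Banks', ''), ('Balance sheet items', 'Banks')]

buffer = io.StringIO(newline='')   # newline='' leaves the csv module's line endings untouched
writer = csv.writer(buffer)
writer.writerow(('Name', 'Parent'))
writer.writerows(pairs)            # writes every pair in one call

csv_text = buffer.getvalue()
print(csv_text)
```

writerows replaces the explicit loop over pairs; the result is the same either way.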


Source: a similar question on Stack Overflow, "python - Convert a nested dictionary to a table / parent-child structure, Python 3.6": https://stackoverflow.com/questions/59269960/
