python - 如何将字典文本文件读入数据帧

标签 python pandas dictionary text json-normalize

我有一个来自 Kaggle of Clash Royale 统计数据的文本文件。它采用 Python 字典的格式。我正在努力找出如何以有意义的方式将其读入文件。好奇最好的方法是什么。这是一个相当复杂的带有列表的字典。

原始数据集在这里: https://www.kaggle.com/s1m0n38/clash-royale-matches-dataset

{'players': {'right': {'deck': [['Mega Minion', '9'], ['Electro Wizard', '3'], ['Arrows', '11'], ['Lightning', '5'], ['Tombstone', '9'], ['The Log', '2'], ['Giant', '9'], ['Bowler', '5']], 'trophy': '4258', 'clan': 'TwoFiveOne', 'name': 'gpa raid'}, 'left': {'deck': [['Fireball', '9'], ['Archers', '12'], ['Goblins', '12'], ['Minions', '11'], ['Bomber', '12'], ['The Log', '2'], ['Barbarians', '12'], ['Royal Giant', '13']], 'trophy': '4325', 'clan': 'battusai', 'name': 'Supr4'}}, 'type': 'ladder', 'result': ['2', '0'], 'time': '2017-07-12'}
{'players': {'right': {'deck': [['Ice Spirit', '10'], ['Valkyrie', '9'], ['Hog Rider', '9'], ['Inferno Tower', '9'], ['Goblins', '12'], ['Musketeer', '9'], ['Zap', '12'], ['Fireball', '9']], 'trophy': '4237', 'clan': 'The Wolves', 'name': 'TITAN'}, 'left': {'deck': [['Royal Giant', '13'], ['Ice Wizard', '2'], ['Bomber', '12'], ['Knight', '12'], ['Fireball', '9'], ['Barbarians', '12'], ['The Log', '2'], ['Archers', '12']], 'trophy': '4296', 'clan': 'battusai', 'name': 'Supr4'}}, 'type': 'ladder', 'result': ['1', '0'], 'time': '2017-07-12'}
{'players': {'right': {'deck': [['Miner', '3'], ['Ice Golem', '9'], ['Spear Goblins', '12'], ['Minion Horde', '12'], ['Inferno Tower', '8'], ['The Log', '2'], ['Skeleton Army', '6'], ['Fireball', '10']], 'trophy': '4300', 'clan': '@LA PERLA NEGRA', 'name': 'Victor'}, 'left': {'deck': [['Royal Giant', '13'], ['Ice Wizard', '2'], ['Bomber', '12'], ['Knight', '12'], ['Fireball', '9'], ['Barbarians', '12'], ['The Log', '2'], ['Archers', '12']], 'trophy': '4267', 'clan': 'battusai', 'name': 'Supr4'}}, 'type': 'ladder', 'result': ['0', '1'], 'time': '2017-07-12'}

最佳答案

根据此数据集的synopsis on kaggle ,每个字典代表两个玩家之间的一场比赛。我觉得让数据框中的每一行代表单个匹配的所有特征是有意义的。

这可以通过几个简短的步骤来完成。

  1. 将所有匹配字典(来自 kaggle 的数据集的每一行)存储在一个列表中:
matches = [
{'players': {'right': {'deck': [['Mega Minion', '9'], ['Electro Wizard', '3'], ['Arrows', '11'], ['Lightning', '5'], ['Tombstone', '9'], ['The Log', '2'], ['Giant', '9'], ['Bowler', '5']], 'trophy': '4258', 'clan': 'TwoFiveOne', 'name': 'gpa raid'}, 'left': {'deck': [['Fireball', '9'], ['Archers', '12'], ['Goblins', '12'], ['Minions', '11'], ['Bomber', '12'], ['The Log', '2'], ['Barbarians', '12'], ['Royal Giant', '13']], 'trophy': '4325', 'clan': 'battusai', 'name': 'Supr4'}}, 'type': 'ladder', 'result': ['2', '0'], 'time': '2017-07-12'},
{'players': {'right': {'deck': [['Ice Spirit', '10'], ['Valkyrie', '9'], ['Hog Rider', '9'], ['Inferno Tower', '9'], ['Goblins', '12'], ['Musketeer', '9'], ['Zap', '12'], ['Fireball', '9']], 'trophy': '4237', 'clan': 'The Wolves', 'name': 'TITAN'}, 'left': {'deck': [['Royal Giant', '13'], ['Ice Wizard', '2'], ['Bomber', '12'], ['Knight', '12'], ['Fireball', '9'], ['Barbarians', '12'], ['The Log', '2'], ['Archers', '12']], 'trophy': '4296', 'clan': 'battusai', 'name': 'Supr4'}}, 'type': 'ladder', 'result': ['1', '0'], 'time': '2017-07-12'},
{'players': {'right': {'deck': [['Miner', '3'], ['Ice Golem', '9'], ['Spear Goblins', '12'], ['Minion Horde', '12'], ['Inferno Tower', '8'], ['The Log', '2'], ['Skeleton Army', '6'], ['Fireball', '10']], 'trophy': '4300', 'clan': '@LA PERLA NEGRA', 'name': 'Victor'}, 'left': {'deck': [['Royal Giant', '13'], ['Ice Wizard', '2'], ['Bomber', '12'], ['Knight', '12'], ['Fireball', '9'], ['Barbarians', '12'], ['The Log', '2'], ['Archers', '12']], 'trophy': '4267', 'clan': 'battusai', 'name': 'Supr4'}}, 'type': 'ladder', 'result': ['0', '1'], 'time': '2017-07-12'}
]
  • 从上面的列表中创建一个数据框,该数据框将自动填充包含类型时间结果信息的列匹配:
  • df = pd.DataFrame(matches)
    
  • 然后,使用一些简单的逻辑来填充包含牌组奖杯氏族名称信息的列比赛中左右双方球员的code>:
  • sides = ['right', 'left']
    player_keys = ['deck', 'trophy', 'clan', 'name']
    
    for side in sides:
        for key in player_keys:
            for i, row in df.iterrows():
                df[side + '_' + key] = df['players'].apply(lambda x: x[side][key])
    
    df = df.drop('players', axis=1) # no longer need this after populating the other columns
    
    df = df.iloc[:, ::-1] # made sense to display columns in order of player info from left to right,
                          # followed by general match info at the far right of the dataframe
    

    生成的数据框如下所示:

        left_name   left_clan   left_trophy   left_deck                                           right_name    right_clan  right_trophy    right_deck                                          type    time         result
    0   Supr4       battusai           4325   [[Fireball, 9], [Archers, 12], [Goblins, 12], ...   gpa raid      TwoFiveOne          4258    [[Mega Minion, 9], [Electro Wizard, 3], [Arrow...   ladder  2017-07-12   [2, 0]
    1   Supr4       battusai           4296   [[Royal Giant, 13], [Ice Wizard, 2], [Bomber, ...   TITAN The     Wolves              4237    [[Ice Spirit, 10], [Valkyrie, 9], [Hog Rider, ...   ladder  2017-07-12   [1, 0]
    2   Supr4       battusai           4267   [[Royal Giant, 13], [Ice Wizard, 2], [Bomber, ...   Victor        @LA PERLA NEGRA     4300    [[Miner, 3], [Ice Golem, 9], [Spear Goblins, 1...   ladder  2017-07-12   [0, 1]
    

    关于python - 如何将字典文本文件读入数据帧,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/54489615/

    相关文章:

    python - pandas.read_csv 未按分号分隔符对数据进行分区

    python - 如何为我的数据创建三个新列?

    python:递归更新字典的值

    用于文本文件中值的 Python 拆分函数

    使用文件而不是字典时的python for循环

    python - 通过 psycopg2 获取警告信息

    python - 打印 bool 值正则表达式匹配的真实结果-Pandas Dataframe

    python - 从 "non uniform"python 字典中选择随机元素

    python - 如何查找一维张量中的重复元素

    python - 如何在 Django 中组合/嵌套枚举字段?