python - 如何将字典文本文件读入数据帧

我有一个来自 Kaggle of Clash Royale 统计数据的文本文件。它采用 Python 字典的格式。我正在努力找出如何以有意义的方式将其读入文件。好奇最好的方法是什么。这是一个相当复杂的带有列表的字典。

原始数据集在这里: https://www.kaggle.com/s1m0n38/clash-royale-matches-dataset

{'players': {'right': {'deck': [['Mega Minion', '9'], ['Electro Wizard', '3'], ['Arrows', '11'], ['Lightning', '5'], ['Tombstone', '9'], ['The Log', '2'], ['Giant', '9'], ['Bowler', '5']], 'trophy': '4258', 'clan': 'TwoFiveOne', 'name': 'gpa raid'}, 'left': {'deck': [['Fireball', '9'], ['Archers', '12'], ['Goblins', '12'], ['Minions', '11'], ['Bomber', '12'], ['The Log', '2'], ['Barbarians', '12'], ['Royal Giant', '13']], 'trophy': '4325', 'clan': 'battusai', 'name': 'Supr4'}}, 'type': 'ladder', 'result': ['2', '0'], 'time': '2017-07-12'}
{'players': {'right': {'deck': [['Ice Spirit', '10'], ['Valkyrie', '9'], ['Hog Rider', '9'], ['Inferno Tower', '9'], ['Goblins', '12'], ['Musketeer', '9'], ['Zap', '12'], ['Fireball', '9']], 'trophy': '4237', 'clan': 'The Wolves', 'name': 'TITAN'}, 'left': {'deck': [['Royal Giant', '13'], ['Ice Wizard', '2'], ['Bomber', '12'], ['Knight', '12'], ['Fireball', '9'], ['Barbarians', '12'], ['The Log', '2'], ['Archers', '12']], 'trophy': '4296', 'clan': 'battusai', 'name': 'Supr4'}}, 'type': 'ladder', 'result': ['1', '0'], 'time': '2017-07-12'}
{'players': {'right': {'deck': [['Miner', '3'], ['Ice Golem', '9'], ['Spear Goblins', '12'], ['Minion Horde', '12'], ['Inferno Tower', '8'], ['The Log', '2'], ['Skeleton Army', '6'], ['Fireball', '10']], 'trophy': '4300', 'clan': '@LA PERLA NEGRA', 'name': 'Victor'}, 'left': {'deck': [['Royal Giant', '13'], ['Ice Wizard', '2'], ['Bomber', '12'], ['Knight', '12'], ['Fireball', '9'], ['Barbarians', '12'], ['The Log', '2'], ['Archers', '12']], 'trophy': '4267', 'clan': 'battusai', 'name': 'Supr4'}}, 'type': 'ladder', 'result': ['0', '1'], 'time': '2017-07-12'}

最佳答案

根据此数据集的synopsis on kaggle ，每个字典代表两个玩家之间的一场比赛。我觉得让数据框中的每一行代表单个匹配的所有特征是有意义的。

这可以通过几个简短的步骤来完成。

将所有匹配字典(来自 kaggle 的数据集的每一行)存储在一个列表中:

matches = [
{'players': {'right': {'deck': [['Mega Minion', '9'], ['Electro Wizard', '3'], ['Arrows', '11'], ['Lightning', '5'], ['Tombstone', '9'], ['The Log', '2'], ['Giant', '9'], ['Bowler', '5']], 'trophy': '4258', 'clan': 'TwoFiveOne', 'name': 'gpa raid'}, 'left': {'deck': [['Fireball', '9'], ['Archers', '12'], ['Goblins', '12'], ['Minions', '11'], ['Bomber', '12'], ['The Log', '2'], ['Barbarians', '12'], ['Royal Giant', '13']], 'trophy': '4325', 'clan': 'battusai', 'name': 'Supr4'}}, 'type': 'ladder', 'result': ['2', '0'], 'time': '2017-07-12'},
{'players': {'right': {'deck': [['Ice Spirit', '10'], ['Valkyrie', '9'], ['Hog Rider', '9'], ['Inferno Tower', '9'], ['Goblins', '12'], ['Musketeer', '9'], ['Zap', '12'], ['Fireball', '9']], 'trophy': '4237', 'clan': 'The Wolves', 'name': 'TITAN'}, 'left': {'deck': [['Royal Giant', '13'], ['Ice Wizard', '2'], ['Bomber', '12'], ['Knight', '12'], ['Fireball', '9'], ['Barbarians', '12'], ['The Log', '2'], ['Archers', '12']], 'trophy': '4296', 'clan': 'battusai', 'name': 'Supr4'}}, 'type': 'ladder', 'result': ['1', '0'], 'time': '2017-07-12'},
{'players': {'right': {'deck': [['Miner', '3'], ['Ice Golem', '9'], ['Spear Goblins', '12'], ['Minion Horde', '12'], ['Inferno Tower', '8'], ['The Log', '2'], ['Skeleton Army', '6'], ['Fireball', '10']], 'trophy': '4300', 'clan': '@LA PERLA NEGRA', 'name': 'Victor'}, 'left': {'deck': [['Royal Giant', '13'], ['Ice Wizard', '2'], ['Bomber', '12'], ['Knight', '12'], ['Fireball', '9'], ['Barbarians', '12'], ['The Log', '2'], ['Archers', '12']], 'trophy': '4267', 'clan': 'battusai', 'name': 'Supr4'}}, 'type': 'ladder', 'result': ['0', '1'], 'time': '2017-07-12'}
]

从上面的列表中创建一个数据框，该数据框将自动填充包含类型、时间和结果信息的列匹配:

df = pd.DataFrame(matches)

然后，使用一些简单的逻辑来填充包含牌组、奖杯、氏族和名称信息的列比赛中左右双方球员的code>:

sides = ['right', 'left']
player_keys = ['deck', 'trophy', 'clan', 'name']

for side in sides:
    for key in player_keys:
        for i, row in df.iterrows():
            df[side + '_' + key] = df['players'].apply(lambda x: x[side][key])

df = df.drop('players', axis=1) # no longer need this after populating the other columns

df = df.iloc[:, ::-1] # made sense to display columns in order of player info from left to right,
                      # followed by general match info at the far right of the dataframe

生成的数据框如下所示:

    left_name   left_clan   left_trophy   left_deck                                           right_name    right_clan  right_trophy    right_deck                                          type    time         result
0   Supr4       battusai           4325   [[Fireball, 9], [Archers, 12], [Goblins, 12], ...   gpa raid      TwoFiveOne          4258    [[Mega Minion, 9], [Electro Wizard, 3], [Arrow...   ladder  2017-07-12   [2, 0]
1   Supr4       battusai           4296   [[Royal Giant, 13], [Ice Wizard, 2], [Bomber, ...   TITAN The     Wolves              4237    [[Ice Spirit, 10], [Valkyrie, 9], [Hog Rider, ...   ladder  2017-07-12   [1, 0]
2   Supr4       battusai           4267   [[Royal Giant, 13], [Ice Wizard, 2], [Bomber, ...   Victor        @LA PERLA NEGRA     4300    [[Miner, 3], [Ice Golem, 9], [Spear Goblins, 1...   ladder  2017-07-12   [0, 1]

关于python - 如何将字典文本文件读入数据帧，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/54489615/

python - 如何将字典文本文件读入数据帧

上一篇：python - 查找与 pandas 中的条件匹配的第 N 行

下一篇：Python 相当于将 POST 负载用单引号括起来