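python - Can Pandas read an Excel group structure as a MultiIndex?

Tags: python excel pandas multi-index

I have an Excel file in which the rows are (mostly) nicely grouped. I have built a fake example below.

Is there a way to make read_excel in Pandas produce a MultiIndex that preserves this structure?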

(Image: investment sample)

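In this example, the MultiIndex would have four levels (family, person, child (optional), investment). It is fine if the subtotal values are lost, since they can easily be recreated in Pandas.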

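Best answer

No, pandas cannot read such a structure.

An alternative solution is to read the data with pandas but transform it into an easily accessible dictionary, rather than keeping it in a dataframe with a MultiIndex.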

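To make your data more usable, there are 2 sensible requirements:

  1. Make your investment fund names unique. This is trivial.
  2. Convert the Excel grouping into an additional column that indicates the parent of each row.

The example below assumes both requirements are met.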
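As a rough illustration of the second requirement, here is a minimal sketch of how a parent column might be derived once each row's outline (grouping) level is known. How those levels are obtained is outside this sketch; a library such as openpyxl exposes an outline level on row dimensions, but that is an assumption about your tooling, not part of the answer. The function name `derive_parents` and the `(name, level)` input are hypothetical, and the sketch assumes summary rows sit above their detail rows with the outermost group at level 0.

```python
def derive_parents(rows, root="Families"):
    """Map each row name to its parent, given (name, outline_level) pairs.

    Assumes summary rows appear ABOVE their detail rows and that the
    outermost group has level 0; both are assumptions about the sheet.
    """
    stack = []    # stack[i] holds the most recent name seen at depth i
    parents = {}
    for name, level in rows:
        del stack[level:]  # drop entries at this level or deeper
        parents[name] = stack[-1] if stack else root
        stack.append(name)
    return parents

# Hypothetical (name, outline_level) pairs for part of the sample sheet.
rows = [
    ("Simpson Family", 0),
    ("Marge Simpson", 1),
    ("Maggies College Fund", 2),
    ("MCF Investment 2", 3),
    ("MS Investment 1", 2),
    ("Homer Simpson", 1),
    ("HS Investment 1", 2),
    ("Griffin Family", 0),
    ("Lois Griffin", 1),
]
parents = derive_parents(rows)
print(parents["MCF Investment 2"])  # Maggies College Fund
print(parents["Homer Simpson"])     # Simpson Family
```

The resulting mapping could then be assigned as the `parent` column, for example with `df['name'].map(parents)`.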

Setup

from collections import defaultdict
from functools import reduce
import operator
import pandas as pd

df = pd.DataFrame({'name': ['Simpson Family', 'Marge Simpson', 'Maggies College Fund',
                            'MCF Investment 2', 'MS Investment 1', 'MS Investment 2', 'MS Investment 3',
                            'Homer Simpson', 'HS Investment 1', 'HS Investment 3', 'HS Investment 2',
                            'Griffin Family', 'Lois Griffin', 'LG Investment 2', 'LG Investment 3',
                            'Brian Giffin', 'BG Investment 3'],
                   'Value': [600, 450, 100, 100, 100, 200, 50, 150, 100, 50, 0, 200, 150, 100, 50, 50, 50],
                   'parent': ['Families', 'Simpson Family', 'Marge Simpson', 'Maggies College Fund',
                              'Marge Simpson', 'Marge Simpson', 'Marge Simpson', 'Simpson Family',
                              'Homer Simpson', 'Homer Simpson', 'Homer Simpson', 'Families',
                              'Griffin Family', 'Lois Griffin', 'Lois Griffin', 'Griffin Family',
                              'Brian Giffin']})

    Value                  name                parent  
0     600        Simpson Family              Families   
1     450         Marge Simpson        Simpson Family   
2     100  Maggies College Fund         Marge Simpson   
3     100      MCF Investment 2  Maggies College Fund   
4     100       MS Investment 1         Marge Simpson   
5     200       MS Investment 2         Marge Simpson   
6      50       MS Investment 3         Marge Simpson   
7     150         Homer Simpson        Simpson Family   
8     100       HS Investment 1         Homer Simpson   
9      50       HS Investment 3         Homer Simpson   
10      0       HS Investment 2         Homer Simpson   
11    200        Griffin Family              Families   
12    150          Lois Griffin        Griffin Family   
13    100       LG Investment 2          Lois Griffin   
14     50       LG Investment 3          Lois Griffin   
15     50          Brian Giffin        Griffin Family   
16     50       BG Investment 3          Brian Giffin

Step 1

Define a child -> parent dictionary and some utility functions:

# Map every row name to the name of its parent row.
child_parent_dict = df.set_index('name')['parent'].to_dict()

# Recursive defaultdict: accessing a missing key creates a new subtree.
tree = lambda: defaultdict(tree)

d = tree()

def get_all_parents(child):
    """Yield the ancestors of child, nearest first, excluding the root 'Families'."""
    while child != 'Families':
        child = child_parent_dict[child]
        if child != 'Families':
            yield child

def getFromDict(dataDict, mapList):
    """Walk a nested dictionary along the keys in mapList and return the node reached."""
    return reduce(operator.getitem, mapList, dataDict)

def default_to_regular_dict(d):
    """Convert nested defaultdict to regular dict of dicts."""
    if isinstance(d, defaultdict):
        d = {k: default_to_regular_dict(v) for k, v in d.items()}
    return d
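To see why the recursive defaultdict and `reduce(operator.getitem, ...)` fit together, here is a small standalone sketch on a single hand-written path. The variable names `demo`, `path` and `node` are only for illustration.

```python
from collections import defaultdict
from functools import reduce
import operator

# Same recursive defaultdict idea as above: missing keys create new subtrees.
tree = lambda: defaultdict(tree)
demo = tree()

path = ["Families", "Simpson Family", "Marge Simpson"]

# reduce applies operator.getitem one key at a time, i.e. it evaluates
# demo["Families"]["Simpson Family"]["Marge Simpson"], creating each
# level on the way because demo is a defaultdict.
node = reduce(operator.getitem, path, demo)
node["MS Investment 1"]["Value"] = 100

print(demo["Families"]["Simpson Family"]["Marge Simpson"]["MS Investment 1"]["Value"])  # 100
```

Because every lookup on a missing key creates a subtree, no explicit "create if absent" checks are needed while filling the structure. The same behavior means a mistyped key silently creates an empty branch, which is one reason to convert to a regular dict once the structure is built.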

Step 2

Apply this to your dataframe and use it to build a nested dictionary structure, which will be more efficient for repeated queries.

df['structure'] = df['name'].apply(lambda x: ['Families'] + list(get_all_parents(x))[::-1])

for idx, row in df.iterrows():
    getFromDict(d, row['structure'])[row['name']]['Value'] = row['Value']

res = default_to_regular_dict(d)
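If a MultiIndex is still wanted, the same kind of path lists could in principle be padded to a fixed depth and passed to `pd.MultiIndex.from_tuples`. The following is only a sketch under assumptions: it keeps leaf (investment) rows only, uses a handful of hand-written paths rather than the full dataframe, and fills the optional child level with an empty string. The helper `to_four_levels` and the level names are hypothetical.

```python
import pandas as pd

# Hypothetical leaf rows: (path from family down to the investment, value).
leaves = [
    (["Simpson Family", "Marge Simpson", "Maggies College Fund", "MCF Investment 2"], 100),
    (["Simpson Family", "Marge Simpson", "MS Investment 1"], 100),
    (["Simpson Family", "Homer Simpson", "HS Investment 1"], 100),
    (["Griffin Family", "Lois Griffin", "LG Investment 2"], 100),
]

def to_four_levels(path):
    """Return (family, person, child, investment); child is '' when absent."""
    family, person, *middle, investment = path
    child = middle[0] if middle else ""
    return (family, person, child, investment)

index = pd.MultiIndex.from_tuples(
    [to_four_levels(p) for p, _ in leaves],
    names=["family", "person", "child", "investment"],
)
s = pd.Series([v for _, v in leaves], index=index, name="Value")

# Subtotals can be recomputed from the leaf rows, e.g. per person.
per_person = s.groupby(level=["family", "person"]).sum()
print(per_person)
```

Dropping the subtotal rows avoids double counting when aggregating, which matches the question's remark that subtotals can be recreated in Pandas.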

Result

DataFrame

    Value                  name                parent  \
0     600        Simpson Family              Families   
1     450         Marge Simpson        Simpson Family   
2     100  Maggies College Fund         Marge Simpson   
3     100      MCF Investment 2  Maggies College Fund   
4     100       MS Investment 1         Marge Simpson   
5     200       MS Investment 2         Marge Simpson   
6      50       MS Investment 3         Marge Simpson   
7     150         Homer Simpson        Simpson Family   
8     100       HS Investment 1         Homer Simpson   
9      50       HS Investment 3         Homer Simpson   
10      0       HS Investment 2         Homer Simpson   
11    200        Griffin Family              Families   
12    150          Lois Griffin        Griffin Family   
13    100       LG Investment 2          Lois Griffin   
14     50       LG Investment 3          Lois Griffin   
15     50          Brian Giffin        Griffin Family   
16     50       BG Investment 3          Brian Giffin   

                                            structure  
0                                          [Families]  
1                          [Families, Simpson Family]  
2           [Families, Simpson Family, Marge Simpson]  
3   [Families, Simpson Family, Marge Simpson, Magg...  
4           [Families, Simpson Family, Marge Simpson]  
5           [Families, Simpson Family, Marge Simpson]  
6           [Families, Simpson Family, Marge Simpson]  
7                          [Families, Simpson Family]  
8           [Families, Simpson Family, Homer Simpson]  
9           [Families, Simpson Family, Homer Simpson]  
10          [Families, Simpson Family, Homer Simpson]  
11                                         [Families]  
12                         [Families, Griffin Family]  
13           [Families, Griffin Family, Lois Griffin]  
14           [Families, Griffin Family, Lois Griffin]  
15                         [Families, Griffin Family]  
16           [Families, Griffin Family, Brian Giffin]

Dictionary

{'Families': {'Griffin Family': {'Brian Giffin': {'BG Investment 3': {'Value': 50},
                                                  'Value': 50},
                                 'Lois Griffin': {'LG Investment 2': {'Value': 100}, 'LG Investment 3': {'Value': 50},
                                                  'Value': 150},
                                 'Value': 200},
              'Simpson Family': {'Homer Simpson': {'HS Investment 1': {'Value': 100}, 'HS Investment 2': {'Value': 0}, 'HS Investment 3': {'Value': 50},
                                                   'Value': 150},
                                 'Marge Simpson': {'MS Investment 1': {'Value': 100}, 'MS Investment 2': {'Value': 200}, 'MS Investment 3': {'Value': 50},
                                                   'Maggies College Fund': {'MCF Investment 2': {'Value': 100},
                                                                            'Value': 100},
                                                   'Value': 450},
              'Value': 600}}}
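As a usage sketch, the nested dictionary can be queried by chaining keys, and subtotals can be recomputed from the leaves. The helper `subtotal` is hypothetical and assumes that a leaf is a dict whose only key is 'Value'. Only a fragment of the dictionary above is repeated here to keep the example self-contained.

```python
def subtotal(node):
    """Sum the 'Value' of all leaf investments below a node.

    A leaf is assumed to be a dict whose only key is 'Value'.
    """
    children = [v for k, v in node.items() if k != "Value"]
    if not children:
        return node["Value"]
    return sum(subtotal(child) for child in children)

# A fragment of the dictionary shown above.
res = {"Families": {"Griffin Family": {
    "Brian Giffin": {"BG Investment 3": {"Value": 50}, "Value": 50},
    "Lois Griffin": {"LG Investment 2": {"Value": 100},
                     "LG Investment 3": {"Value": 50},
                     "Value": 150},
    "Value": 200}}}

griffin = res["Families"]["Griffin Family"]
print(griffin["Lois Griffin"]["Value"])  # 150 (stored subtotal)
print(subtotal(griffin))                 # 200 (recomputed from the leaves)
```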

Regarding "python - Can Pandas read an Excel group structure as a MultiIndex?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/49826003/
