Python pandas: creating an uneven MultiIndex

Tags: python pandas dataframe

I have the following code:

import pandas as pd

IDX_VALS_BANKNOTER_PATRIMONY = [['PATRIMONY'],['GOLD']]
IDX_VALS_BANKNOTER_ASSETS = [['ASSETS'],['DEPOSITS', 'ADVANCES']]
IDX_VALS_BANKNOTER_LIABILITIES = [['LIABILITIES'], ['CLIENTS', 'SUPPLIERS']]

IDX_BANKNOTER_PATRIMONY = pd.MultiIndex.from_product(IDX_VALS_BANKNOTER_PATRIMONY)
IDX_BANKNOTER_ASSETS = pd.MultiIndex.from_product(IDX_VALS_BANKNOTER_ASSETS)
IDX_BANKNOTER_LIABILITIES = pd.MultiIndex.from_product(IDX_VALS_BANKNOTER_LIABILITIES)

IDX_BANKNOTER = IDX_BANKNOTER_PATRIMONY.append(IDX_BANKNOTER_ASSETS).append(IDX_BANKNOTER_LIABILITIES)

print(IDX_BANKNOTER)
which prints the following index:
MultiIndex([(  'PATRIMONY',      'GOLD'),
            (     'ASSETS',  'DEPOSITS'),
            (     'ASSETS',  'ADVANCES'),
            ('LIABILITIES',   'CLIENTS'),
            ('LIABILITIES', 'SUPPLIERS')],
           )
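(I'm using .from_product() because I expect to add more labels eventually.)
My question is this: I'd like to extend this MultiIndex with a third level, so that I get a MultiIndex that looks like this: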
'PATRIMONY', 'GOLD',
'ASSETS', 'DEPOSITS',
'ASSETS', 'ADVANCES',
'LIABILITIES', 'CLIENTS', 'Dr. Foo'
'LIABILITIES', 'CLIENTS', 'Dr. House'
'LIABILITIES', 'CLIENTS', 'Richard'
'LIABILITIES', 'SUPPLIERS', 'PORT1',
'LIABILITIES', 'SUPPLIERS', 'PORT2'
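This means the MultiIndex would be uneven: the third level is used only under 'LIABILITIES', and CLIENTS and SUPPLIERS each carry different labels on it, depending on the client or supplier names. I tried appending the following indexes: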
IDX_FIRST_EXTENSION_NAMES = [['LIABILITIES'], ['CLIENTS'], ['Dr. Foo', 'Dr. House', 'Richard']]
IDX_FIRST_EXTENSION = pd.MultiIndex.from_product(IDX_FIRST_EXTENSION_NAMES)
IDX_SECOND_EXTENSION_NAMES = [['LIABILITIES'], ['SUPPLIERS'], ['PORT1', 'PORT2']]
IDX_SECOND_EXTENSION = pd.MultiIndex.from_product(IDX_SECOND_EXTENSION_NAMES)
DESIRED_RESULT = IDX_BANKNOTER.append(IDX_FIRST_EXTENSION).append(IDX_SECOND_EXTENSION)
But what I get back is:
MultiIndex([(  'PATRIMONY',      'GOLD'),
            (     'ASSETS',  'DEPOSITS'),
            (     'ASSETS',  'ADVANCES'),
            ('LIABILITIES',   'CLIENTS'),
            ('LIABILITIES',   'CLIENTS'),
            ('LIABILITIES',   'CLIENTS'),
            ('LIABILITIES', 'SUPPLIERS'),
            ('LIABILITIES', 'SUPPLIERS')],
           )
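I'm fairly new to pandas, and the documentation on MultiIndex hasn't helped (it has plenty of examples of initializing a MultiIndex, but none of an uneven one). Does anyone have any pointers? I'm building this MultiIndex to make the corresponding data easier to manipulate, for example to be able to access a specific client account with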
df['LIABILITIES']['CLIENTS']['(CLIENT NAME)']
or to get the sum of all the values under ['CLIENTS']. Ideally, I'd like to keep the DataFrame's columns as time labels.
Any help is appreciated, thanks.

Best answer
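
The output in the question suggests that append kept only as many levels as the index it was called on, so the names on the third level were dropped. A quick way to check for this kind of mismatch is to compare nlevels on the pieces before combining them. The sketch below uses shortened, hypothetical variable names (two_level, three_level) rather than the ones from the question:

```python
import pandas as pd

# A two-level piece, built the same way as in the question
two_level = pd.MultiIndex.from_product([["ASSETS"], ["DEPOSITS", "ADVANCES"]])

# A three-level piece, like the CLIENTS extension
three_level = pd.MultiIndex.from_product(
    [["LIABILITIES"], ["CLIENTS"], ["Dr. Foo", "Dr. House", "Richard"]]
)

# The two pieces do not have the same depth
print(two_level.nlevels)
print(three_level.nlevels)

depths_match = two_level.nlevels == three_level.nlevels
print("same depth:", depths_match)
```

If the depths differ, the pieces need to be brought to the same number of levels before they are appended.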

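Give every entry a third level, using an empty string where there is no name, so that all the pieces have three levels before they are appended.

Code: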

import pandas as pd

IDX_VALS_BANKNOTER_PATRIMONY = [['PATRIMONY'],['GOLD'], ['']]
IDX_VALS_BANKNOTER_ASSETS = [['ASSETS'],['DEPOSITS', 'ADVANCES'], ['']]

IDX_BANKNOTER_PATRIMONY = pd.MultiIndex.from_product(IDX_VALS_BANKNOTER_PATRIMONY)
IDX_BANKNOTER_ASSETS = pd.MultiIndex.from_product(IDX_VALS_BANKNOTER_ASSETS)

IDX_BANKNOTER = IDX_BANKNOTER_PATRIMONY.append(IDX_BANKNOTER_ASSETS)

IDX_FIRST_EXTENSION_NAMES = [['LIABILITIES'], ['CLIENTS'], ['Dr. Foo', 'Dr. House', 'Richard']]
IDX_FIRST_EXTENSION = pd.MultiIndex.from_product(IDX_FIRST_EXTENSION_NAMES)
IDX_SECOND_EXTENSION_NAMES = [['LIABILITIES'], ['SUPPLIERS'], ['PORT1', 'PORT2']]
IDX_SECOND_EXTENSION = pd.MultiIndex.from_product(IDX_SECOND_EXTENSION_NAMES)
WANTED_RESULT = IDX_BANKNOTER.append(IDX_FIRST_EXTENSION).append(IDX_SECOND_EXTENSION)

print(WANTED_RESULT)
Output:
MultiIndex([(  'PATRIMONY',      'GOLD',          ''),
            (     'ASSETS',  'DEPOSITS',          ''),
            (     'ASSETS',  'ADVANCES',          ''),
            ('LIABILITIES',   'CLIENTS',   'Dr. Foo'),
            ('LIABILITIES',   'CLIENTS', 'Dr. House'),
            ('LIABILITIES',   'CLIENTS',   'Richard'),
            ('LIABILITIES', 'SUPPLIERS',     'PORT1'),
            ('LIABILITIES', 'SUPPLIERS',     'PORT2')],
           )
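
With the padded index in place, the access patterns mentioned in the question can be sketched roughly as follows. Because the MultiIndex sits on the rows, .loc with a tuple is probably the closest equivalent of the chained brackets in the question. The data values and the time labels ('2020-01', '2020-02') are made up for illustration, and the index is written out as tuples only to keep the example self-contained:

```python
import numpy as np
import pandas as pd

# The same padded three-level index as in the output above
idx = pd.MultiIndex.from_tuples(
    [
        ("PATRIMONY", "GOLD", ""),
        ("ASSETS", "DEPOSITS", ""),
        ("ASSETS", "ADVANCES", ""),
        ("LIABILITIES", "CLIENTS", "Dr. Foo"),
        ("LIABILITIES", "CLIENTS", "Dr. House"),
        ("LIABILITIES", "CLIENTS", "Richard"),
        ("LIABILITIES", "SUPPLIERS", "PORT1"),
        ("LIABILITIES", "SUPPLIERS", "PORT2"),
    ]
)

# Hypothetical data: two time labels as columns, values 0..15
df = pd.DataFrame(
    np.arange(16).reshape(8, 2),
    index=idx,
    columns=["2020-01", "2020-02"],
)

# Sorting changes the row order but keeps label-based lookups on a
# sorted index, which pandas handles more efficiently
df = df.sort_index()

# One specific client account: a Series indexed by the time labels
house = df.loc[("LIABILITIES", "CLIENTS", "Dr. House")]

# Every row under LIABILITIES / CLIENTS, then the total per time label
clients = df.loc[("LIABILITIES", "CLIENTS")]
clients_total = clients.sum()

print(house)
print(clients_total)
```

Note that sort_index reorders the rows alphabetically; if the original order matters for display, the sorted frame can be kept as a separate variable used only for lookups.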

For more on creating an uneven MultiIndex with Python pandas, see the similar question on Stack Overflow: https://stackoverflow.com/questions/63208223/
