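Pandas MultiIndex slicing and indexing

I have just started working with multi-indexed DataFrames and am having some trouble with the fairly sparse documentation and online examples on slicing and indexing.

Consider the following multi-indexed DataFrame: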

import pandas as pd
import numpy as np
levels={
'produce_source':['Vendor A', 'Vendor B'],
'day':['mon','wed','fri'],
'chiller_temp':['low','mid'],
'fruit':['apples','pears','nanas']
}

index = pd.MultiIndex.from_product(levels.values(), names = list(levels.keys()))
df = pd.DataFrame(index=index)
df = df.assign(deliveries=np.random.rand(len(df)))


                                        deliveries
produce_source day chiller_temp fruit             
Vendor A       mon low          apples    0.748376
                                pears     0.639824
                                nanas     0.604342
                   mid          apples    0.160837
                                pears     0.970412
                                nanas     0.301815
               wed low          apples    0.572627
                                pears     0.254242
                                nanas     0.590702
                   mid          apples    0.153772
                                pears     0.180117
                                nanas     0.858085
               fri low          apples    0.535358
                                pears     0.576359
                                nanas     0.893993
                   mid          apples    0.334602
                                pears     0.053892
                                nanas     0.778767
Vendor B       mon low          apples    0.565761
                                pears     0.437994
                                nanas     0.090994
                   mid          apples    0.261041
                                pears     0.028795
                                nanas     0.057612
               wed low          apples    0.808108
                                pears     0.914724
                                nanas     0.020663
                   mid          apples    0.055319
                                pears     0.888612
                                nanas     0.623370
               fri low          apples    0.419422
                                pears     0.938593
                                nanas     0.358441
                   mid          apples    0.534191
                                pears     0.590103
                                nanas     0.753034

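What is the most Pythonic way to achieve the following?

1) View all the wed data as a slice

1a) Stretch goal: don't rely on 'day' being index.names[1]; index by the level name 'day' instead

2) Write an iterable of data only to that wed slice

3) Add a chiller_temp of high for all vendors, days and fruits

I have seen some slicing done using idx = pd.IndexSlice.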

idx = pd.IndexSlice
df_wip = df.loc[idx[:,'wed'], ] #1)  
#would love to write to df_wip sliced df here but get slice copy warning with df_wip['deliveries'] = list(range(0,100*len(df_wip),100)) 
df = df.loc[idx[:,'wed'],'deliveries'] = list(range(0,100*len(df_wip),100)) #2)

This raises AttributeError: 'list' object has no attribute 'loc'

df = df.loc[idx[:,'wed'],'deliveries'] = pd.Series(range(0,100*len(df_wip),100)) #2)

This raises TypeError: unhashable type: 'slice'

Best answer

1) View all the wed data as a slice

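To view data in a MultiIndex, .xs (cross-section) is much easier to use. It lets you specify a value for one particular index level, instead of typing out every level as .loc with a slice requires: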

df.xs('wed', level='day')

Out:
                                        deliveries
produce_source  chiller_temp    fruit   
Vendor A        low             apples  0.521861
                                pears   0.741856
                                nanas   0.245843
                mid             apples  0.471135
                                pears   0.191322
                                nanas   0.153920
Vendor B        low             apples  0.711457
                                pears   0.211794
                                nanas   0.599071
                mid             apples  0.303910
                                pears   0.657348
                                nanas   0.111750
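
For the stretch goal in 1a), here is a minimal sketch of two selections that refer to the level by its name 'day' rather than by its position. The names wed_xs, mask and wed_mask are illustrative only, and the frame is rebuilt here just so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

levels = {
    'produce_source': ['Vendor A', 'Vendor B'],
    'day': ['mon', 'wed', 'fri'],
    'chiller_temp': ['low', 'mid'],
    'fruit': ['apples', 'pears', 'nanas'],
}
index = pd.MultiIndex.from_product(levels.values(), names=list(levels.keys()))
df = pd.DataFrame({'deliveries': np.random.rand(len(index))}, index=index)

# Cross-section by level name; drop_level=False keeps the 'day' level in the result
wed_xs = df.xs('wed', level='day', drop_level=False)

# Boolean mask built from the named level, also independent of the level's position
mask = df.index.get_level_values('day') == 'wed'
wed_mask = df[mask]
```

Both results are best treated as copies for reading; to write values back, index df itself.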

2) Write an iterable of data only to that wed slice

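If I understand correctly, you are trying to replace the values in the 'deliveries' column with a specific iterable (e.g. a list) wherever the day is 'wed'. Unfortunately, .loc-style replacement does not work for that case. As far as I know, pandas only has simple syntax for storing an iterable in one cell at a time, using .at or .loc (see this SO answer). However, we can use iterrows to do it: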

idx = pd.IndexSlice

# If we don't change the column's type, which was float, this will error
df['deliveries'] = df['deliveries'].astype(object)

# Loop through rows, replacing single values
# Only necessary if the new assigned value is mutable
for index, row in df.loc[idx[:,'wed'], 'deliveries':'deliveries'].iterrows():
    df.at[index, 'deliveries'] = ["We", "changed", "this"]

df.head(10)

Out:
                                            deliveries
produce_source  day  chiller_temp   fruit   
Vendor A        mon  low            apples  0.0287606
                                    pears   0.264512
                                    nanas   0.238089
                     mid            apples  0.814985
                                    pears   0.590967
                                    nanas   0.919351
                wed  low            apples  [We, changed, this]
                                    pears   [We, changed, this]
                                    nanas   [We, changed, this]
                     mid            apples  [We, changed, this]

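Although a loop is needed as far as I know, I find it easier to follow if the selection is done with df.xs followed by df.update rather than with .loc. For example, the following code does the same thing as the .loc code above: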

df['deliveries'] = df['deliveries'].astype(object)

# Create a temporary copy of our cross section
df2 = df.xs('wed', level='day', drop_level=False)

# The same loop as before
for index, row in df2.iterrows():
    df2.at[index, 'deliveries'] = ["We", "changed", "this"]

# Update the original df with the values we want from df2
# (recent pandas versions no longer accept the raise_conflict argument)
df.update(df2, join="left", overwrite=True)
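
If the goal in 2) is instead to write one value per row (as with the range(...) list in the question), no loop should be needed. The following is a minimal sketch under that assumption; mask and new_values are illustrative names, and a boolean mask on the named level is used here in place of pd.IndexSlice.

```python
import numpy as np
import pandas as pd

levels = {
    'produce_source': ['Vendor A', 'Vendor B'],
    'day': ['mon', 'wed', 'fri'],
    'chiller_temp': ['low', 'mid'],
    'fruit': ['apples', 'pears', 'nanas'],
}
index = pd.MultiIndex.from_product(levels.values(), names=list(levels.keys()))
df = pd.DataFrame({'deliveries': np.random.rand(len(index))}, index=index)

# Select the 'wed' rows by level name
mask = df.index.get_level_values('day') == 'wed'

# One new value per selected row, in the order the rows appear in df
new_values = list(range(0, 100 * int(mask.sum()), 100))

# Assign on df itself (not on a sliced copy), so no copy warning is involved
df.loc[mask, 'deliveries'] = new_values
```

The list must have exactly as many elements as there are selected rows; values are written positionally, top to bottom.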

3) add a chiller_temp of high for all vendors and days and fruits

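Replacing values within an existing level of a MultiIndex requires replacing the whole level. This can be done with df.index.set_levels (the simpler way, IMO) or with pd.MultiIndex.from_arrays. Depending on the use case, map and/or replace may also be useful. See this SO answer for some other examples.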

df.index = df.index.set_levels(['high' for v in df.index.get_level_values('chiller_temp')], level='chiller_temp')
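
Note that the one-liner above relabels the existing level rather than adding rows, and recent pandas versions expect level values to be unique, so it may raise there. If the intent of 3) is to add new 'high' rows for every vendor, day and fruit, one possible sketch (not the answer's original method) is to build the larger index and reindex onto it. The names new_levels, full_index and df_high are illustrative, and the new rows are assumed to start out as NaN.

```python
import numpy as np
import pandas as pd

levels = {
    'produce_source': ['Vendor A', 'Vendor B'],
    'day': ['mon', 'wed', 'fri'],
    'chiller_temp': ['low', 'mid'],
    'fruit': ['apples', 'pears', 'nanas'],
}
index = pd.MultiIndex.from_product(levels.values(), names=list(levels.keys()))
df = pd.DataFrame({'deliveries': np.random.rand(len(index))}, index=index)

# Same levels as before, with 'high' added to chiller_temp
new_levels = dict(levels, chiller_temp=['low', 'mid', 'high'])
full_index = pd.MultiIndex.from_product(
    new_levels.values(), names=list(new_levels.keys())
)

# Existing rows keep their values; the new 'high' rows are filled with NaN
df_high = df.reindex(full_index)
```

The NaN placeholders can then be filled with a boolean mask on the 'chiller_temp' level, in the same way as the 'wed' rows.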

4) I saw some slicing happening using idx = pd.IndexSlice...This raises an error AttributeError: 'list' object has no attribute 'loc'...raises TypeError: unhashable type: 'slice'

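As for the AttributeError: 'list' object has no attribute 'loc' and TypeError: unhashable type: 'slice' errors: each of those lines contains two assignments (df = df.loc[...] = value). Python binds the targets from left to right, so df is already rebound to the list (or Series) by the time df.loc[...] is evaluated.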
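
The chained-assignment behaviour can be reproduced on a tiny frame. This is only an illustration; the three-row frame and the names new_values, message and fixed are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({'deliveries': [0.1, 0.2, 0.3]})
new_values = [0, 100, 200]

try:
    # Targets are bound left to right: df becomes the list first,
    # then df.loc is looked up on that list and fails
    df = df.loc[:, 'deliveries'] = new_values
except AttributeError as exc:
    message = str(exc)

# Dropping the leading "df =" leaves a single assignment, which works
fixed = pd.DataFrame({'deliveries': [0.1, 0.2, 0.3]})
fixed.loc[:, 'deliveries'] = new_values
```

After the failed line, df no longer refers to the DataFrame at all, so it has to be rebuilt before retrying.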

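Your .loc syntax looks correct, except that you cannot assign a pd.Series this way without the cell values ending up as NaN, most likely because pandas aligns the Series on its own index, which does not match the MultiIndex (see answer 2) for the correct syntax). This works: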

idx = pd.IndexSlice
df.loc[idx[:,'wed'], 'deliveries':'deliveries'] = "We changed this"
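
If the new values already live in a pd.Series, one way to sidestep index alignment is to hand .loc the underlying array instead. This is a sketch, assuming the values should be written positionally; mask and new_series are illustrative names.

```python
import numpy as np
import pandas as pd

levels = {
    'produce_source': ['Vendor A', 'Vendor B'],
    'day': ['mon', 'wed', 'fri'],
    'chiller_temp': ['low', 'mid'],
    'fruit': ['apples', 'pears', 'nanas'],
}
index = pd.MultiIndex.from_product(levels.values(), names=list(levels.keys()))
df = pd.DataFrame({'deliveries': np.random.rand(len(index))}, index=index)

mask = df.index.get_level_values('day') == 'wed'

# A Series with a default integer index, as in the question
new_series = pd.Series(range(0, 100 * int(mask.sum()), 100))

# .to_numpy() drops the Series index, so values are written by position
df.loc[mask, 'deliveries'] = new_series.to_numpy()
```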

Regarding Pandas MultiIndex slicing and indexing, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/51458293/
