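I'm just getting started with MultiIndex DataFrames and am having some trouble with the rather sparse documentation and online examples on slicing and indexing.
Consider the following MultiIndex DataFrame: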
import pandas as pd
import numpy as np
levels={
'produce_source':['Vendor A', 'Vendor B'],
'day':['mon','wed','fri'],
'chiller_temp':['low','mid'],
'fruit':['apples','pears','nanas']
}
index = pd.MultiIndex.from_product(levels.values(), names = list(levels.keys()))
df = pd.DataFrame(index=index)
df = df.assign(deliveries=np.random.rand(len(df)))
                                       deliveries
produce_source day chiller_temp fruit
Vendor A       mon low          apples   0.748376
                                pears    0.639824
                                nanas    0.604342
                   mid          apples   0.160837
                                pears    0.970412
                                nanas    0.301815
               wed low          apples   0.572627
                                pears    0.254242
                                nanas    0.590702
                   mid          apples   0.153772
                                pears    0.180117
                                nanas    0.858085
               fri low          apples   0.535358
                                pears    0.576359
                                nanas    0.893993
                   mid          apples   0.334602
                                pears    0.053892
                                nanas    0.778767
Vendor B       mon low          apples   0.565761
                                pears    0.437994
                                nanas    0.090994
                   mid          apples   0.261041
                                pears    0.028795
                                nanas    0.057612
               wed low          apples   0.808108
                                pears    0.914724
                                nanas    0.020663
                   mid          apples   0.055319
                                pears    0.888612
                                nanas    0.623370
               fri low          apples   0.419422
                                pears    0.938593
                                nanas    0.358441
                   mid          apples   0.534191
                                pears    0.590103
                                nanas    0.753034
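What is the most Pythonic way to achieve the following?
1) View all the wed data as a slice
1a) Stretch goal: index by the level name "day" instead of relying on "day" being index.names[1]
2) Write an iterable of data only to that wed slice
3) Add a chiller_temp of high for all vendors, days and fruits
I've seen some slicing done with idx = pd.IndexSlice: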
idx = pd.IndexSlice
df_wip = df.loc[idx[:,'wed'], ] #1)
#would love to write to df_wip sliced df here but get slice copy warning with df_wip['deliveries'] = list(range(0,100*len(df_wip),100))
df = df.loc[idx[:,'wed'],'deliveries'] = list(range(0,100*len(df_wip),100)) #2)
This raises AttributeError: 'list' object has no attribute 'loc'
df = df.loc[idx[:,'wed'],'deliveries'] = pd.Series(range(0,100*len(df_wip),100)) #2)
This raises TypeError: unhashable type: 'slice'
Best answer
1) View all the wed data as a slice
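To view data in a MultiIndex, it's much easier to use .xs (cross-section), which lets you give a value for one specific index level by name, instead of spelling out every level as .loc with slices requires you to do: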
df.xs('wed', level='day')
Out:
                                   deliveries
produce_source chiller_temp fruit
Vendor A       low          apples   0.521861
                            pears    0.741856
                            nanas    0.245843
               mid          apples   0.471135
                            pears    0.191322
                            nanas    0.153920
Vendor B       low          apples   0.711457
                            pears    0.211794
                            nanas    0.599071
               mid          apples   0.303910
                            pears    0.657348
                            nanas    0.111750
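A self-contained sketch of 1) and 1a): it rebuilds the question's frame (random values, so the numbers differ on each run) and selects the wed rows purely by the level name "day", never by its position. The boolean mask built with get_level_values is an alternative not used in this answer; unlike the result of .xs, the same mask can also be passed to .loc on the left-hand side of an assignment.

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame
levels = {
    'produce_source': ['Vendor A', 'Vendor B'],
    'day': ['mon', 'wed', 'fri'],
    'chiller_temp': ['low', 'mid'],
    'fruit': ['apples', 'pears', 'nanas'],
}
index = pd.MultiIndex.from_product(list(levels.values()), names=list(levels.keys()))
df = pd.DataFrame({'deliveries': np.random.rand(len(index))}, index=index)

# Cross-section selected by level name; the 'day' level is dropped from the result
wed = df.xs('wed', level='day')

# Same rows, but drop_level=False keeps 'day' in the index
wed_full = df.xs('wed', level='day', drop_level=False)

# Alternative: a boolean mask built from the level name, usable with .loc
is_wed = df.index.get_level_values('day') == 'wed'
wed_mask = df.loc[is_wed]
```

Both selections return the 12 wed rows (2 vendors x 2 temperatures x 3 fruits) in their original order.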
2) Write an iterable of data only to that wed slice
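If I understand correctly, you're trying to replace the values in the "deliveries" column with a specific iterable (e.g. a list), storing that whole list in every cell where day is "wed". Unfortunately, .loc-style assignment doesn't work for this: .loc treats a list on the right-hand side as one value per selected row, so a 3-element list assigned to the 12 wed rows raises a length-mismatch error. As far as I know, pandas only has simple syntax for putting such a value into a single cell, using .at or .loc (see this SO answer). However, we can get there with iterrows: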
idx = pd.IndexSlice
# If we don't change the column's type, which was float, this will error
df['deliveries'] = df['deliveries'].astype(object)
# Loop through rows, replacing single values
# The loop with .at is needed because each cell receives a whole list, not a scalar
for index, row in df.loc[idx[:,'wed'], 'deliveries':'deliveries'].iterrows():
    df.at[index, 'deliveries'] = ["We", "changed", "this"]
df.head(10)
Out:
                                                deliveries
produce_source day chiller_temp fruit
Vendor A       mon low          apples           0.0287606
                                pears             0.264512
                                nanas             0.238089
                   mid          apples            0.814985
                                pears             0.590967
                                nanas             0.919351
               wed low          apples [We, changed, this]
                                pears  [We, changed, this]
                                nanas  [We, changed, this]
                   mid          apples [We, changed, this]
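As far as I know a loop is still required, but in my opinion using df.xs followed by df.update is easier to follow than .loc. For example, the following code does the same thing as the .loc code above: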
df['deliveries'] = df['deliveries'].astype(object)
# Create a temporary copy of our cross section (drop_level=False keeps the full index)
df2 = df.xs('wed', level='day', drop_level=False).copy()
# The same loop as before
for index, row in df2.iterrows():
    df2.at[index, 'deliveries'] = ["We", "changed", "this"]
# Update the original df for the values we want from df2
# (errors="ignore" replaced the older raise_conflict=False argument)
df.update(df2, join="left", overwrite=True, filter_func=None, errors="ignore")
3) add a chiller_temp of high for all vendors and days and fruits
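Replacing values in an existing level of a MultiIndex requires replacing the entire level. This can be done with df.index.set_levels (the simpler approach, IMO) or with pd.MultiIndex.from_arrays. Depending on the exact use case, map and/or replace may also be useful. See this SO answer for some other examples.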
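# set_levels rejects repeated labels, so first point every chiller_temp code at one label, then rename that label to 'high'
df.index = df.index.set_codes([0] * len(df), level='chiller_temp').set_levels(['high'], level='chiller_temp')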
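The code above relabels the existing low and mid rows as high. If 3) is instead read literally, as adding new high rows next to low and mid, one option (an alternative sketch, not from the original answer) is to reindex onto the full product of the existing labels plus 'high'. The new rows start out as NaN and would still need filling, for example with fillna or a .loc assignment.

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame
levels = {
    'produce_source': ['Vendor A', 'Vendor B'],
    'day': ['mon', 'wed', 'fri'],
    'chiller_temp': ['low', 'mid'],
    'fruit': ['apples', 'pears', 'nanas'],
}
index = pd.MultiIndex.from_product(list(levels.values()), names=list(levels.keys()))
df = pd.DataFrame({'deliveries': np.random.rand(len(index))}, index=index)

# Unique labels of each level, in order of first appearance
new_levels = [list(df.index.unique(level=name)) for name in df.index.names]

# Append the new label to the chiller_temp level (position looked up by name)
new_levels[df.index.names.index('chiller_temp')].append('high')

# Reindex onto the full product: existing rows keep their values,
# every (vendor, day, 'high', fruit) row is new and holds NaN
full_index = pd.MultiIndex.from_product(new_levels, names=df.index.names)
df_high = df.reindex(full_index)
```

df_high.xs('high', level='chiller_temp') then shows the 18 new, still empty rows.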
4) I saw some slicing happening using idx = pd.IndexSlice...This raises an error AttributeError: 'list' object has no attribute 'loc'...raises TypeError: unhashable type: 'slice'
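As for the AttributeError: 'list' object has no attribute 'loc'
and TypeError: unhashable type: 'slice'
errors: each of those lines contains two chained assignments (df = df.loc[...] = ...). Python binds the right-hand value to the targets from left to right, so df is first rebound to the list (or Series), and the .loc assignment is then attempted on that object instead of on the DataFrame. Dropping the leading "df =" fixes that part.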
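A minimal reproduction of the AttributeError on a made-up three-row frame (the values are arbitrary): the right-hand side is evaluated once and then assigned to each target from left to right, so df is already a list when df.loc is looked up. The pd.Series variant fails the same way, with df rebound to a Series instead; its exact error message depends on the pandas version, so it is not reproduced here.

```python
import pandas as pd

# A tiny made-up frame standing in for the question's df
df = pd.DataFrame({'deliveries': [0.1, 0.2, 0.3]})

error_message = None
try:
    # `df = [...]` runs first, so `df.loc` is then looked up on a plain list
    df = df.loc[:, 'deliveries'] = [10.0, 20.0, 30.0]
except AttributeError as exc:
    error_message = str(exc)

# Fix: keep a single assignment target so df stays a DataFrame
df = pd.DataFrame({'deliveries': [0.1, 0.2, 0.3]})
df.loc[:, 'deliveries'] = [10.0, 20.0, 30.0]
```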
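Apart from that, your .loc syntax looks correct, except that you can't assign a pd.Series this way without the cells ending up as NaN: pandas aligns the Series' default 0..n-1 index against the MultiIndex, and no labels match (see 2) above for the correct syntax). This works: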
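idx = pd.IndexSlice
# 'deliveries' holds floats, so convert it to object (as in 2) above) before writing a string into it
df['deliveries'] = df['deliveries'].astype(object)
df.loc[idx[:,'wed'], 'deliveries':'deliveries'] = "We changed this"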
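For the question's original goal in 2), writing one value per wed row rather than one list per cell, plain .loc assignment is enough as long as the right-hand side carries no index of its own. A sketch on the question's frame: a list is written positionally, and a Series is stripped of its RangeIndex with .to_numpy() so that pandas does not try to align it.

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame
levels = {
    'produce_source': ['Vendor A', 'Vendor B'],
    'day': ['mon', 'wed', 'fri'],
    'chiller_temp': ['low', 'mid'],
    'fruit': ['apples', 'pears', 'nanas'],
}
index = pd.MultiIndex.from_product(list(levels.values()), names=list(levels.keys()))
df = pd.DataFrame({'deliveries': np.random.rand(len(index))}, index=index)

idx = pd.IndexSlice
n_wed = len(df.loc[idx[:, 'wed'], :])  # 2 vendors x 2 temps x 3 fruits = 12 rows

# A plain list is written positionally: one value per selected row, in row order
df.loc[idx[:, 'wed'], 'deliveries'] = list(range(0, 100 * n_wed, 100))

# A Series with its default RangeIndex would be aligned by label against the
# MultiIndex (nothing matches, hence NaN); .to_numpy() drops that index first
new_values = pd.Series(range(0, 100 * n_wed, 100))
df.loc[idx[:, 'wed'], 'deliveries'] = new_values.to_numpy()
```

Only the wed rows change; every other row keeps its random value.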
For a similar question on Pandas MultiIndex slicing and indexing, see Stack Overflow: https://stackoverflow.com/questions/51458293/