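python - `.loc` and `.iloc` with a MultiIndex'd DataFrame

When indexing a MultiIndex-ed DataFrame, it seems that .iloc assumes you are referring to the "inner level" of the index, while .loc looks at the outer level.

For example: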

import numpy as np
import pandas as pd

np.random.seed(123)
iterables = [['bar', 'baz', 'foo', 'qux'], ['one', 'two']]
idx = pd.MultiIndex.from_product(iterables, names=['first', 'second'])
df = pd.DataFrame(np.random.randn(8, 4), index=idx)

# .loc looks at the outer index:

# df.loc['two'] would raise a KeyError
print(df.loc['qux'])
              0        1        2        3
second                                    
one    -1.25388 -0.63775  0.90711 -1.42868
two    -0.14007 -0.86175 -0.25562 -2.79859

# while .iloc looks at the inner index:

print(df.iloc[-1])
0   -0.14007
1   -0.86175
2   -0.25562
3   -2.79859
Name: (qux, two), dtype: float64

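Two questions:

First, why is this? Is it a deliberate design decision?

Second, can I use .iloc to reference the outer level of the index, so as to produce the result below? I know I could first find the last member of the index with get_level_values and then use that with .loc, but I wonder whether it can be done more directly, either with funky .iloc syntax or with some existing function designed specifically for this case.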

# desired result of something like df.iloc[-1] applied to the outer level:
qux   one     0.89071  1.75489  1.49564  1.06939
      two    -0.77271  0.79486  0.31427 -1.32627

Best answer

Yes, this is a deliberate design decision:

.iloc is a strict positional indexer, it does not regard the structure at all, only the first actual behavior. ... .loc does take into account the level behavior. [emphasis added]
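
To make the quoted distinction concrete, here is a minimal sketch that rebuilds the question's DataFrame and compares the two indexers. The variable names (`last_by_position`, `last_by_label`, `qux_block`) are illustrative only and do not come from the original post.

```python
import numpy as np
import pandas as pd

# Rebuild the DataFrame from the question.
np.random.seed(123)
iterables = [['bar', 'baz', 'foo', 'qux'], ['one', 'two']]
idx = pd.MultiIndex.from_product(iterables, names=['first', 'second'])
df = pd.DataFrame(np.random.randn(8, 4), index=idx)

# .iloc counts physical rows: -1 is simply the last of the 8 rows,
# regardless of how the index levels are organized.
last_by_position = df.iloc[-1]

# Reaching the same row by label requires the full (outer, inner) key.
last_by_label = df.loc[('qux', 'two')]

# A partial key on the outer level selects every row under that label
# and drops the matched level from the result's index.
qux_block = df.loc['qux']

print(last_by_position.equals(last_by_label))
print(qux_block.index.tolist())
```

Seen this way, .iloc is not really "looking at the inner level": it addresses rows by position, and each row happens to correspond to one complete (first, second) pair.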

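Therefore, the expected result given in the question cannot be achieved with .iloc in a flexible way. The closest workaround, used in several similar questions, is: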

print(df.loc[[df.index.get_level_values(0)[-1]]])
                    0        1        2        3
first second                                    
qux   one    -1.25388 -0.63775  0.90711 -1.42868
      two    -0.14007 -0.86175 -0.25562 -2.79859

Using double brackets (passing a list of labels to .loc) preserves the first index level in the result.
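
If this lookup is needed for positions other than the last one, the workaround can be wrapped in a small function. The sketch below is one possible way to do that, not an existing pandas feature: `select_outer_by_position` is a hypothetical helper name, and it assumes that "position" means the order in which distinct outer labels first appear in the index.

```python
import numpy as np
import pandas as pd


def select_outer_by_position(frame, pos):
    """Hypothetical helper: rows under the pos-th distinct outer label."""
    # unique() keeps labels in order of first appearance.
    outer_labels = frame.index.get_level_values(0).unique()
    label = outer_labels[pos]
    # Passing a list keeps the outer level in the result's index.
    return frame.loc[[label]]


# Rebuild the DataFrame from the question.
np.random.seed(123)
iterables = [['bar', 'baz', 'foo', 'qux'], ['one', 'two']]
idx = pd.MultiIndex.from_product(iterables, names=['first', 'second'])
df = pd.DataFrame(np.random.randn(8, 4), index=idx)

# Both rows under 'qux', with the full two-level index preserved.
last_block = select_outer_by_position(df, -1)

# For comparison: a scalar label (single brackets) drops the outer level.
dropped = df.loc[df.index.get_level_values(0)[-1]]

print(last_block.index.nlevels)
print(dropped.index.nlevels)
```

The helper still goes through .loc internally, which is consistent with the answer above: .iloc itself has no notion of index levels.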

This Q&A on python - `.loc` and `.iloc` with a MultiIndex'd DataFrame is based on a question on Stack Overflow: https://stackoverflow.com/questions/45967702/
