python - Subset a MultiIndex df based on multiple level 1 columns

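I have a DataFrame with MultiIndex columns. Across all the level 0 labels, I want to keep only the columns whose level 1 label is 'one' or 'two'. I can subset each level 1 label separately, but I would like to do it in a single step so that the values stay side by side.
Here is the DataFrame: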

import numpy as np
import pandas as pd
index = pd.MultiIndex.from_tuples(list(zip(*[['bar1', 'foo1', 'bar1', 'foo2', 'bar3', 'foo3'], ['one', 'two', 'three', 'two', 'one', 'four']])))
df = pd.DataFrame(np.random.randn(2, 6), columns=index)

This is how to subset on a single level 1 label:

df.iloc[:, df.columns.get_level_values(1) == 'one']
# or
df.xs('one', level=1, axis=1)

# but passing two labels to either command does not work, e.g.
df.xs(('one', 'two'), level=1, axis=1)

This would be the expected output:

         bar1        foo1       foo2         bar3
          one         two        two          one
0   -0.508272   -0.195379   0.865563     2.002205
1   -0.771565    1.360479   1.900931    -1.589277

Any suggestions are welcome. Many thanks!

Best answer

Here is one way, using pd.IndexSlice:

idnx = pd.IndexSlice[:, ['one', 'two']]
df.loc[:, idnx]

Output:

       bar1      bar3      foo1      foo2
        one       one       two       two
0  0.589999  0.261224 -0.106588 -2.309628
1  0.646201 -0.491110  0.430724  1.027424
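
For anyone who wants to try this in isolation, here is a minimal self-contained sketch of the pd.IndexSlice approach. The seed and the variable names (rng, wanted, subset) are assumptions chosen for illustration and are not part of the original answer; the column layout is the one from the question.

```python
import numpy as np
import pandas as pd

# Same column layout as in the question.
columns = pd.MultiIndex.from_tuples(
    [("bar1", "one"), ("foo1", "two"), ("bar1", "three"),
     ("foo2", "two"), ("bar3", "one"), ("foo3", "four")]
)

# Arbitrary seed so the random values are repeatable (an assumption, not in the original).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((2, 6)), columns=columns)

# ":" keeps every level 0 label; the list picks the level 1 labels to keep.
wanted = pd.IndexSlice[:, ["one", "two"]]
subset = df.loc[:, wanted]

print(subset)
```

The result keeps four of the six columns: every column whose level 1 label is 'one' or 'two'. The order of the returned columns may differ from the order in the original frame, as the output above shows.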

Another way uses a lesser-known parameter, axis, of pd.DataFrame.loc:

df.loc(axis=1)[:, ['one', 'two']]

Output:

       bar1      bar3      foo1      foo2
        one       one       two       two
0  0.589999  0.261224 -0.106588 -2.309628
1  0.646201 -0.491110  0.430724  1.027424

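Note: this parameter is not listed in the documented API for pd.DataFrame.loc, but it is referenced in the user guide, in the Using Slicers part of the MultiIndex/Advanced indexing section, with an example about halfway through.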
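
The expected output in the question lists the columns in their original order (bar1, foo1, foo2, bar3). If that order matters, one possible alternative, which is not part of the answer above, is to extend the boolean-mask idea from the question with Index.isin. The sketch below uses integer values from np.arange instead of random numbers so the result is easy to check by eye; the names mask and subset are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Same column layout as in the question.
columns = pd.MultiIndex.from_tuples(
    [("bar1", "one"), ("foo1", "two"), ("bar1", "three"),
     ("foo2", "two"), ("bar3", "one"), ("foo3", "four")]
)

# Deterministic values 0..11 (an assumption for readability, not in the original).
df = pd.DataFrame(np.arange(12).reshape(2, 6), columns=columns)

# True for every column whose level 1 label is 'one' or 'two'.
mask = df.columns.get_level_values(1).isin(["one", "two"])

# A boolean mask selects columns positionally, so the original order is kept.
subset = df.loc[:, mask]

print(subset.columns.tolist())
# [('bar1', 'one'), ('foo1', 'two'), ('foo2', 'two'), ('bar3', 'one')]
```

Because the mask is applied position by position, the selected columns come back in the same order they have in df.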

This question, "python - Subset a MultiIndex df based on multiple level 1 columns", comes from Stack Overflow: https://stackoverflow.com/questions/68774700/
