python - Comparing one part of a DataFrame against another using a MultiIndex

I have a DataFrame with a 3-level MultiIndex:

>>> import numpy as np
>>> import pandas as pd
>>> np.random.seed(0)
>>> df = pd.DataFrame(np.random.randint(10, size=(18, 2)),
                      index=pd.MultiIndex.from_product([[True, False],
                                                        ['yes', 'no', 'maybe'],
                                                        ['one', 'two', 'three']],
                                                       names=['bool', 'ans', 'count']),
                      columns=['A', 'B'])
>>> df
                   A  B
bool  ans   count      
True  yes   one    5  0
            two    3  3
            three  7  9
      no    one    3  5
            two    2  4
            three  7  6
      maybe one    8  8
            two    1  6
            three  7  7
False yes   one    8  1
            two    5  9
            three  8  9
      no    one    4  3
            two    0  3
            three  5  0
      maybe one    2  3
            two    8  1
            three  3  3

My goal is to subtract the maybe values from all the other values that share the same bool and count. The subtrahend is

>>> sub = df.loc[(slice(None), 'maybe', slice(None)), :]
>>> sub
                   A  B
bool  ans   count      
True  maybe one    8  8
            two    1  6
            three  7  7
False maybe one    2  3
            two    8  1
            three  3  3

The problem is that when I try to subtract it from the other rows, the indexes do not line up the way I expected:

>>> df - sub
                     A    B
bool  ans   count          
False maybe one    0.0  0.0
            three  0.0  0.0
            two    0.0  0.0
      no    one    NaN  NaN
            three  NaN  NaN
            two    NaN  NaN
      yes   one    NaN  NaN
            three  NaN  NaN
            two    NaN  NaN
True  maybe one    0.0  0.0
            three  0.0  0.0
            two    0.0  0.0
      no    one    NaN  NaN
            three  NaN  NaN
            two    NaN  NaN
      yes   one    NaN  NaN
            three  NaN  NaN
            two    NaN  NaN

The result I want is

                   A   B
bool  ans   count
True  yes   one   -3  -8
            two    2  -3
            three  0   2
True  no    one   -5  -3
            two    1  -2
            three  0  -1
True  maybe one    0   0
            two    0   0
            three  0   0
False yes   one    6  -2
            two   -3   8
            three  5   6
False no    one    2   0
            two   -8   2
            three  2  -3
False maybe one    0   0
            two    0   0
            three  0   0

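How can I tell pandas to respect the bool and count levels, but ignore the ans level?

Best answer

A cross-section with xs may be a good option. It drops the ans level, so the subtraction aligns on the remaining bool and count levels; swaplevel and reindex then restore the original level and row order: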

df.sub(
    df.xs('maybe', level=1)
).swaplevel().reindex(df.index)

Output:

                   A  B
bool  ans   count      
True  yes   one   -3 -8
            two    2 -3
            three  0  2
      no    one   -5 -3
            two    1 -2
            three  0 -1
      maybe one    0  0
            two    0  0
            three  0  0
False yes   one    6 -2
            two   -3  8
            three  5  6
      no    one    2  0
            two   -8  2
            three  2 -3
      maybe one    0  0
            two    0  0
            three  0  0
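
If you would rather not depend on the level order that the aligned subtraction produces, here is a minimal alternative sketch. It is not part of the answer above. It assumes the frame from the question (rebuilt here from the literal values shown, so it does not depend on the random seed) and that every (bool, count) pair has exactly one maybe row. The names maybe, keys and result are only illustrative.

```python
import pandas as pd

# Rebuild the example frame from the values printed in the question.
index = pd.MultiIndex.from_product(
    [[True, False], ['yes', 'no', 'maybe'], ['one', 'two', 'three']],
    names=['bool', 'ans', 'count'],
)
values = [
    [5, 0], [3, 3], [7, 9],   # True,  yes
    [3, 5], [2, 4], [7, 6],   # True,  no
    [8, 8], [1, 6], [7, 7],   # True,  maybe
    [8, 1], [5, 9], [8, 9],   # False, yes
    [4, 3], [0, 3], [5, 0],   # False, no
    [2, 3], [8, 1], [3, 3],   # False, maybe
]
df = pd.DataFrame(values, index=index, columns=['A', 'B'])

# Cross-section: the 'maybe' rows with the 'ans' level removed,
# which leaves a unique (bool, count) index.
maybe = df.xs('maybe', level='ans')

# For every row of df, the (bool, count) pair that row belongs to.
keys = df.index.droplevel('ans')

# Look up the matching 'maybe' row for each key and subtract the raw
# values, so the operation is positional and no index alignment occurs.
result = df - maybe.reindex(keys).to_numpy()
print(result)
```

Because the subtrahend is converted to a plain array, result keeps df's index and row order unchanged, so no swaplevel or reindex step is needed afterwards.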

Regarding "python - Comparing one part of a DataFrame against another using a MultiIndex", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/67203888/
