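python - Pandas pivot table shape without aggregation

Tags: python pandas dataframe

I would like to know whether I can reshape a DataFrame into a pivot-style DataFrame with a MultiIndex on the rows and multiple header levels on the columns without aggregating, because the aggregated values already exist as columns in my DataFrame.

I have the following DataFrame: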

card_type           payment_status  airbnb                                     paid revenue - sum   revenue - min   debit - sum
American Express    Checked Out     Premium Queen Ensuite                      No   591.49          0.0             2
American Express    Checked Out     Queen Room w. Shared Facilities            No   255.52          0.0             2
American Express    Checked Out     Single Room w. Shared Facilities           No   1602.02         0.0             5
American Express    Confirmed       Compact Double Room w. Shared Facilities   No   189.05          0.0             1
American Express    Confirmed       Premium Queen Ensuite                      No   350.0           0.0             1
American Express    Confirmed       Queen Room w. Shared Facilities            Yes  110.53          0.0             1
American Express    Confirmed       Single Room w. Shared Facilities           No   4258.48         0.0             3
Mastercard          Cancelled       Queen Room w. Shared Facilities            No   28.5            0.0             3
Mastercard          Cancelled       Single Room w. Shared Facilities           Yes  578.55          0.0             2
Mastercard          Checked Out     Compact Double Room w. Shared Facilities   No   4637.71         0.0             22

...

import pandas as pd

df = pd.DataFrame.from_dict({
    'card_type': {0: 'American Express', 1: 'American Express', 2: 'American Express', 3: 'American Express', 4: 'American Express', 5: 'American Express', 6: 'American Express', 7: 'Mastercard', 8: 'Mastercard', 9: 'Mastercard'},
    'payment_status': {0: 'Checked Out', 1: 'Checked Out', 2: 'Checked Out', 3: 'Confirmed', 4: 'Confirmed', 5: 'Confirmed', 6: 'Confirmed', 7: 'Cancelled', 8: 'Cancelled', 9: 'Checked Out'},
    'airbnb': {0: 'Premium Queen Ensuite ', 1: 'Queen Room w. Shared Facilities ', 2: 'Single Room w. Shared Facilities ', 3: 'Compact Double Room w. Shared Facilities ', 4: 'Premium Queen Ensuite ', 5: 'Queen Room w. Shared Facilities ', 6: 'Single Room w. Shared Facilities ', 7: 'Queen Room w. Shared Facilities ', 8: 'Single Room w. Shared Facilities ', 9: 'Compact Double Room w. Shared Facilities '},
    'paid': {0: 'No', 1: 'No', 2: 'No', 3: 'No', 4: 'No', 5: 'Yes', 6: 'No', 7: 'No', 8: 'Yes', 9: 'No'},
    'revenue - sum': {0: 591.49, 1: 255.52, 2: 1602.02, 3: 189.05, 4: 350.0, 5: 110.53, 6: 4258.48,7: 28.5, 8: 578.55, 9: 4637.71},
    'revenue - min': {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0, 5: 0.0, 6: 0.0, 7: 0.0, 8: 0.0, 9: 0.0},
    'debit - sum': {0: 2, 1: 2, 2: 5, 3: 1, 4: 1, 5: 1, 6: 3, 7: 3, 8: 2, 9: 22}})

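I have already used this approach (based on Pandas Pivot table without aggregating) to get part of the shape I am looking for. However, I would like to move the aggfunc labels to the bottom column level (perhaps with https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.swaplevel.html ), and the approach feels wrong because my values have already been computed and should not need to be aggregated again: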

df.pivot_table(index=["card_type", "payment_status"], columns=["airbnb", "paid"], values=["revenue - sum", "revenue - min", "debit - sum"], aggfunc={"revenue - sum": ["sum"], "revenue - min": ["max"], "debit - sum": ["mean"]}, fill_value="-")

What I expect to achieve is a DataFrame like this: [image not included]

Is there any way to solve this? Thanks!

Best answer

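If you have already computed your values, you can use either:

  • pivot_table with aggfunc='first' and fill_value='-'
  • pivot followed by fillna('-')

For your column levels, use reorder_levels instead of swaplevel. It rearranges the column levels according to the order you pass in, here from levels [0, 1, 2] to [1, 2, 0]: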
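The two options can be compared on a small frame. The sketch below is illustrative only: `df_small` and its values are hypothetical stand-ins for the already-aggregated data, and the fill step is left out so the numeric values can be compared directly.

```python
import pandas as pd

# Hypothetical, already-aggregated data: one row per index/column combination.
df_small = pd.DataFrame({
    "card_type": ["American Express", "American Express", "Mastercard"],
    "payment_status": ["Checked Out", "Confirmed", "Cancelled"],
    "airbnb": ["Queen Room", "Queen Room", "Single Room"],
    "paid": ["No", "Yes", "Yes"],
    "revenue - sum": [255.52, 110.53, 578.55],
    "debit - sum": [2, 1, 2],
})

index = ["card_type", "payment_status"]
columns = ["airbnb", "paid"]
values = ["revenue - sum", "debit - sum"]

# 'first' picks the single value present in each group, so nothing is recomputed.
via_pivot_table = df_small.pivot_table(
    index=index, columns=columns, values=values, aggfunc="first"
)

# pivot performs no aggregation at all; it raises a ValueError
# if an index/column combination appears more than once.
via_pivot = df_small.pivot(index=index, columns=columns, values=values)

print(via_pivot_table.shape, via_pivot.shape)
```

Both calls should place the same values in the same cells. pivot is the stricter choice, because it fails loudly on duplicate combinations instead of silently keeping the first one.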

out = df.pivot(index=["card_type", "payment_status"],
               columns=["airbnb", "paid"],
               values=["revenue - sum", "revenue - min", "debit - sum"]) \
        .fillna('-').reorder_levels([1, 2, 0], axis=1)

Output:

>>> out
airbnb                          Premium Queen Ensuite  Queen Room w. Shared Facilities  Single Room w. Shared Facilities   ... Compact Double Room w. Shared Facilities  Queen Room w. Shared Facilities  Single Room w. Shared Facilities 
paid                                                No                               No                                No  ...                                        No                              Yes                               Yes
                                         revenue - sum                    revenue - sum                     revenue - sum  ...                               debit - sum                      debit - sum                       debit - sum
card_type        payment_status                                                                                            ...                                                                                                             
American Express Checked Out                    591.49                           255.52                           1602.02  ...                                         -                                -                                 -
                 Confirmed                       350.0                                -                           4258.48  ...                                       1.0                              1.0                                 -
Mastercard       Cancelled                           -                             28.5                                 -  ...                                         -                                -                               2.0
                 Checked Out                         -                                -                                 -  ...                                      22.0                                -                                 -
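To illustrate why reorder_levels is suggested over swaplevel, here is a minimal sketch on a hypothetical one-row frame. The level name "value" is an assumption added for readability; in the output above that level is unnamed.

```python
import pandas as pd

# Hypothetical three-level column index: (value, airbnb, paid).
cols = pd.MultiIndex.from_tuples(
    [("revenue - sum", "Queen Room", "No"), ("debit - sum", "Queen Room", "No")],
    names=["value", "airbnb", "paid"],
)
toy = pd.DataFrame([[255.52, 2]], columns=cols)

# swaplevel exchanges exactly two levels and leaves the others in place.
swapped = toy.swaplevel(0, 2, axis=1)

# reorder_levels accepts any permutation, so the first level can be moved
# to the bottom while airbnb and paid keep their relative order.
rotated = toy.reorder_levels([1, 2, 0], axis=1)

print(list(swapped.columns.names))  # ['paid', 'airbnb', 'value']
print(list(rotated.columns.names))  # ['airbnb', 'paid', 'value']
```

A single swaplevel call puts paid above airbnb, so a second swap would be needed to reach the order that reorder_levels produces in one step.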

Update

I would like to create one more level which results from the split of values by: "-"

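Because you have to split some column names into two parts, use a different strategy. First, move the identifier columns into the index of the DataFrame, then split the remaining column names into a MultiIndex. Finally, unstack the airbnb and paid index levels and reorder the column levels: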

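# Move the identifier columns into the index, leaving only the value columns
out = df.set_index(['card_type', 'payment_status', 'airbnb', 'paid'])
# Split 'revenue - sum' into ('revenue', 'sum') to get two column levels
out.columns = out.columns.str.split(' - ').map(tuple)
# x.any() drops columns that are all NaN or all zero (here: revenue/min)
out = (out.unstack(['airbnb', 'paid'])
          .loc[:, lambda x: x.any()].fillna('-')
          .reorder_levels([2, 3, 0, 1], axis=1))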

Output:

>>> out
airbnb                          Compact Double Room w. Shared Facilities  Premium Queen Ensuite  Queen Room w. Shared Facilities          Single Room w. Shared Facilities   ... Premium Queen Ensuite  Queen Room w. Shared Facilities        Single Room w. Shared Facilities       
paid                                                                   No                     No                               No     Yes                                No  ...                     No                               No   Yes                                No   Yes
                                                                  revenue                revenue                          revenue revenue                           revenue  ...                  debit                            debit debit                             debit debit
                                                                      sum                    sum                              sum     sum                               sum  ...                    sum                              sum   sum                               sum   sum
card_type        payment_status                                                                                                                                              ...                                                                                                      
American Express Checked Out                                            -                 591.49                           255.52       -                           1602.02  ...                    2.0                              2.0     -                               5.0     -
                 Confirmed                                         189.05                  350.0                                -  110.53                           4258.48  ...                    1.0                                -   1.0                               3.0     -
Mastercard       Cancelled                                              -                      -                             28.5       -                                 -  ...                      -                              3.0     -                                 -   2.0
                 Checked Out                                      4637.71                      -                                -       -                                 -  ...                      -                                -     -                                 -     -

[4 rows x 12 columns]
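The output has 12 columns rather than 18 because the x.any() filter treats 0.0 as false, so the all-zero revenue/min columns are removed along with any all-NaN ones. The following sketch, on a hypothetical two-row frame, shows the column split and contrasts that filter with dropna(axis=1, how='all'), which is one possible alternative if all-zero columns should be kept.

```python
import pandas as pd

# Hypothetical toy frame; names and values are illustrative only.
toy = pd.DataFrame({
    "card_type": ["American Express", "American Express"],
    "airbnb": ["Queen Room", "Single Room"],
    "revenue - sum": [255.52, 1602.02],
    "revenue - min": [0.0, 0.0],
}).set_index(["card_type", "airbnb"])

# Splitting on ' - ' and mapping to tuples yields a two-level column index.
toy.columns = toy.columns.str.split(" - ").map(tuple)
print(toy.columns.nlevels)

wide = toy.unstack("airbnb")

# any() is False for columns that are all NaN *or* all zero.
kept_by_any = wide.loc[:, lambda x: x.any()]

# dropna(how='all') removes only columns that are entirely NaN.
kept_by_dropna = wide.dropna(axis=1, how="all")

print(kept_by_any.shape, kept_by_dropna.shape)
```

Which filter is appropriate depends on whether an all-zero column such as revenue/min carries information you want to show.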

Regarding "python - Pandas pivot table shape without aggregation", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/70790037/
