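I'd like to know whether I can reshape a DataFrame into a multi-index, multi-header/multi-column (pivoted) DataFrame without aggregating, because the aggregate calculations already exist in my DataFrame's columns.
I have the following DataFrame: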
card_type        | payment_status | airbnb                                   | paid | revenue - sum | revenue - min | debit - sum
American Express | Checked Out    | Premium Queen Ensuite                    | No   |        591.49 |           0.0 |           2
American Express | Checked Out    | Queen Room w. Shared Facilities          | No   |        255.52 |           0.0 |           2
American Express | Checked Out    | Single Room w. Shared Facilities         | No   |       1602.02 |           0.0 |           5
American Express | Confirmed      | Compact Double Room w. Shared Facilities | No   |        189.05 |           0.0 |           1
American Express | Confirmed      | Premium Queen Ensuite                    | No   |         350.0 |           0.0 |           1
American Express | Confirmed      | Queen Room w. Shared Facilities          | Yes  |        110.53 |           0.0 |           1
American Express | Confirmed      | Single Room w. Shared Facilities         | No   |       4258.48 |           0.0 |           3
Mastercard       | Cancelled      | Queen Room w. Shared Facilities          | No   |          28.5 |           0.0 |           3
Mastercard       | Cancelled      | Single Room w. Shared Facilities         | Yes  |        578.55 |           0.0 |           2
Mastercard       | Checked Out    | Compact Double Room w. Shared Facilities | No   |       4637.71 |           0.0 |          22
...
import pandas as pd

df = pd.DataFrame.from_dict({
    'card_type': {0: 'American Express', 1: 'American Express', 2: 'American Express', 3: 'American Express', 4: 'American Express', 5: 'American Express', 6: 'American Express', 7: 'Mastercard', 8: 'Mastercard', 9: 'Mastercard'},
    'payment_status': {0: 'Checked Out', 1: 'Checked Out', 2: 'Checked Out', 3: 'Confirmed', 4: 'Confirmed', 5: 'Confirmed', 6: 'Confirmed', 7: 'Cancelled', 8: 'Cancelled', 9: 'Checked Out'},
    'airbnb': {0: 'Premium Queen Ensuite ', 1: 'Queen Room w. Shared Facilities ', 2: 'Single Room w. Shared Facilities ', 3: 'Compact Double Room w. Shared Facilities ', 4: 'Premium Queen Ensuite ', 5: 'Queen Room w. Shared Facilities ', 6: 'Single Room w. Shared Facilities ', 7: 'Queen Room w. Shared Facilities ', 8: 'Single Room w. Shared Facilities ', 9: 'Compact Double Room w. Shared Facilities '},
    'paid': {0: 'No', 1: 'No', 2: 'No', 3: 'No', 4: 'No', 5: 'Yes', 6: 'No', 7: 'No', 8: 'Yes', 9: 'No'},
    'revenue - sum': {0: 591.49, 1: 255.52, 2: 1602.02, 3: 189.05, 4: 350.0, 5: 110.53, 6: 4258.48, 7: 28.5, 8: 578.55, 9: 4637.71},
    'revenue - min': {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0, 5: 0.0, 6: 0.0, 7: 0.0, 8: 0.0, 9: 0.0},
    'debit - sum': {0: 2, 1: 2, 2: 5, 3: 1, 4: 1, 5: 1, 6: 3, 7: 3, 8: 2, 9: 22}})
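Reshaping without aggregation only works if each (card_type, payment_status, airbnb, paid) combination occurs at most once; otherwise a cell would receive several values and something has to aggregate them. A minimal check on the sample rows above, rebuilt here as a list of tuples with the room names stripped of their trailing spaces (that stripping is the only change to the data):

```python
import pandas as pd

# The ten sample rows from the question (room names without trailing spaces)
rows = [
    ("American Express", "Checked Out", "Premium Queen Ensuite", "No", 591.49, 0.0, 2),
    ("American Express", "Checked Out", "Queen Room w. Shared Facilities", "No", 255.52, 0.0, 2),
    ("American Express", "Checked Out", "Single Room w. Shared Facilities", "No", 1602.02, 0.0, 5),
    ("American Express", "Confirmed", "Compact Double Room w. Shared Facilities", "No", 189.05, 0.0, 1),
    ("American Express", "Confirmed", "Premium Queen Ensuite", "No", 350.0, 0.0, 1),
    ("American Express", "Confirmed", "Queen Room w. Shared Facilities", "Yes", 110.53, 0.0, 1),
    ("American Express", "Confirmed", "Single Room w. Shared Facilities", "No", 4258.48, 0.0, 3),
    ("Mastercard", "Cancelled", "Queen Room w. Shared Facilities", "No", 28.5, 0.0, 3),
    ("Mastercard", "Cancelled", "Single Room w. Shared Facilities", "Yes", 578.55, 0.0, 2),
    ("Mastercard", "Checked Out", "Compact Double Room w. Shared Facilities", "No", 4637.71, 0.0, 22),
]
df = pd.DataFrame(rows, columns=["card_type", "payment_status", "airbnb", "paid",
                                 "revenue - sum", "revenue - min", "debit - sum"])

keys = ["card_type", "payment_status", "airbnb", "paid"]
# True when no key combination appears more than once
keys_unique = not df.duplicated(subset=keys).any()
print(keys_unique)  # True

# Duplicate one row on purpose: pivot then refuses to reshape instead of aggregating
dup = pd.concat([df, df.iloc[[0]]], ignore_index=True)
try:
    dup.pivot(index=["card_type", "payment_status"],
              columns=["airbnb", "paid"], values=["revenue - sum"])
    pivot_failed = False
except ValueError:
    pivot_failed = True
print(pivot_failed)  # True
```

pivot raising ValueError on duplicate keys is what separates it from pivot_table, which would quietly apply its aggfunc instead.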
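I have used this approach (based on "Pandas Pivot table without aggregating") to (partially) achieve the shape I'm looking for. However, I'd like to move the aggfunc labels to the bottom (possibly using https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.swaplevel.html), but it feels wrong, because my values were already computed beforehand and don't need to be computed again: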
df.pivot_table(index=["card_type", "payment_status"], columns=["airbnb", "paid"], values=["revenue - sum", "revenue - min", "debit - sum"], aggfunc={"revenue - sum": ["sum"], "revenue - min": ["max"], "debit - sum": ["mean"]}, fill_value="-")
Is there any way to solve this? Thanks!
Best answer
If your values are already computed, you can use either:
- pivot_table with aggfunc='first' and fill_value='-'
- pivot and fillna('-')
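The pivot_table option can be sketched as follows, again on the question's sample rows with the room names stripped of trailing spaces. Because each cell has exactly one source value, aggfunc='first' simply returns that value unchanged, so nothing is recomputed:

```python
import pandas as pd

rows = [
    ("American Express", "Checked Out", "Premium Queen Ensuite", "No", 591.49, 0.0, 2),
    ("American Express", "Checked Out", "Queen Room w. Shared Facilities", "No", 255.52, 0.0, 2),
    ("American Express", "Checked Out", "Single Room w. Shared Facilities", "No", 1602.02, 0.0, 5),
    ("American Express", "Confirmed", "Compact Double Room w. Shared Facilities", "No", 189.05, 0.0, 1),
    ("American Express", "Confirmed", "Premium Queen Ensuite", "No", 350.0, 0.0, 1),
    ("American Express", "Confirmed", "Queen Room w. Shared Facilities", "Yes", 110.53, 0.0, 1),
    ("American Express", "Confirmed", "Single Room w. Shared Facilities", "No", 4258.48, 0.0, 3),
    ("Mastercard", "Cancelled", "Queen Room w. Shared Facilities", "No", 28.5, 0.0, 3),
    ("Mastercard", "Cancelled", "Single Room w. Shared Facilities", "Yes", 578.55, 0.0, 2),
    ("Mastercard", "Checked Out", "Compact Double Room w. Shared Facilities", "No", 4637.71, 0.0, 22),
]
df = pd.DataFrame(rows, columns=["card_type", "payment_status", "airbnb", "paid",
                                 "revenue - sum", "revenue - min", "debit - sum"])

out = df.pivot_table(index=["card_type", "payment_status"],
                     columns=["airbnb", "paid"],
                     values=["revenue - sum", "revenue - min", "debit - sum"],
                     aggfunc="first",   # one value per cell, so "first" returns it as-is
                     fill_value="-")    # placeholder for combinations that do not exist
# Move the value-name level (level 0) below airbnb and paid
out = out.reorder_levels([1, 2, 0], axis=1)

print(out.shape)  # (4, 18)
print(out.loc[("American Express", "Checked Out"),
              ("Premium Queen Ensuite", "No", "revenue - sum")])  # 591.49
```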
For the column levels, use reorder_levels instead of swaplevel to arrange them in the order you want; here levels [0, 1, 2] become [1, 2, 0]:
out = df.pivot(index=["card_type", "payment_status"],
columns=["airbnb", "paid"],
values=["revenue - sum", "revenue - min", "debit - sum"]) \
.fillna('-').reorder_levels([1, 2, 0], axis=1)
Output:
>>> out
airbnb Premium Queen Ensuite Queen Room w. Shared Facilities Single Room w. Shared Facilities ... Compact Double Room w. Shared Facilities Queen Room w. Shared Facilities Single Room w. Shared Facilities
paid No No No ... No Yes Yes
revenue - sum revenue - sum revenue - sum ... debit - sum debit - sum debit - sum
card_type payment_status ...
American Express Checked Out 591.49 255.52 1602.02 ... - - -
Confirmed 350.0 - 4258.48 ... 1.0 1.0 -
Mastercard Cancelled - 28.5 - ... - - 2.0
Checked Out - - - ... 22.0 - -
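Why reorder_levels rather than swaplevel: swaplevel exchanges exactly two levels, so moving the top level of a three-level header to the bottom takes two swaps, while reorder_levels states the complete new order in one call. A small illustration on a hypothetical two-column header with the same level layout as the pivot result (value name, airbnb, paid):

```python
import pandas as pd

# Hypothetical column header: level 0 = value name, level 1 = airbnb, level 2 = paid
cols = pd.MultiIndex.from_tuples(
    [("revenue - sum", "Premium Queen Ensuite", "No"),
     ("debit - sum", "Premium Queen Ensuite", "No")],
    names=[None, "airbnb", "paid"],
)

# swaplevel exchanges two levels only: 0 <-> 2 gives [paid, airbnb, value]
swapped = cols.swaplevel(0, 2)
# reorder_levels takes the full new order: [1, 2, 0] gives [airbnb, paid, value]
reordered = cols.reorder_levels([1, 2, 0])
# The same result with swaplevel needs two consecutive swaps
two_swaps = cols.swaplevel(0, 1).swaplevel(1, 2)

print(list(swapped.names))    # ['paid', 'airbnb', None]
print(list(reordered.names))  # ['airbnb', 'paid', None]
print(reordered[0])           # ('Premium Queen Ensuite', 'No', 'revenue - sum')
```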
Update
I would like to create one more level which results from the split of values by: "-"
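Since some column names have to be split into two parts, use a different strategy. First, move the key columns into the DataFrame's index, then split the remaining column names into a MultiIndex. Finally, unstack the airbnb and paid index levels and reorder the column levels: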
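out = df.set_index(['card_type', 'payment_status', 'airbnb', 'paid'])
# 'revenue - sum' -> ('revenue', 'sum'): the remaining column names become a two-level MultiIndex
out.columns = out.columns.str.split(' - ').map(tuple)
# Move airbnb/paid into the columns, keep only columns where x.any() is True
# (all-NaN or all-zero columns are dropped), fill gaps with '-',
# and put the airbnb/paid levels on top
out = out.unstack(['airbnb', 'paid']) \
    .loc[:, lambda x: x.any()].fillna('-') \
    .reorder_levels([2, 3, 0, 1], axis=1)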
Output:
>>> out
airbnb Compact Double Room w. Shared Facilities Premium Queen Ensuite Queen Room w. Shared Facilities Single Room w. Shared Facilities ... Premium Queen Ensuite Queen Room w. Shared Facilities Single Room w. Shared Facilities
paid No No No Yes No ... No No Yes No Yes
revenue revenue revenue revenue revenue ... debit debit debit debit debit
sum sum sum sum sum ... sum sum sum sum sum
card_type payment_status ...
American Express Checked Out - 591.49 255.52 - 1602.02 ... 2.0 2.0 - 5.0 -
Confirmed 189.05 350.0 - 110.53 4258.48 ... 1.0 - 1.0 3.0 -
Mastercard Cancelled - - 28.5 - - ... - 3.0 - - 2.0
Checked Out 4637.71 - - - - ... - - - - -
[4 rows x 12 columns]
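One side effect of .loc[:, lambda x: x.any()] is worth knowing: DataFrame.any() skips NaN and treats 0.0 as False, so a column of zeros is dropped just like an all-NaN one. That is why the output above has 12 columns and no min level, even though revenue - min exists in the input. A sketch comparing it with dropna(axis=1, how='all'), which removes only the all-NaN columns, on the same sample rows (room names stripped of trailing spaces; str.split(..., expand=True) is used here as an equivalent of .map(tuple)):

```python
import pandas as pd

rows = [
    ("American Express", "Checked Out", "Premium Queen Ensuite", "No", 591.49, 0.0, 2),
    ("American Express", "Checked Out", "Queen Room w. Shared Facilities", "No", 255.52, 0.0, 2),
    ("American Express", "Checked Out", "Single Room w. Shared Facilities", "No", 1602.02, 0.0, 5),
    ("American Express", "Confirmed", "Compact Double Room w. Shared Facilities", "No", 189.05, 0.0, 1),
    ("American Express", "Confirmed", "Premium Queen Ensuite", "No", 350.0, 0.0, 1),
    ("American Express", "Confirmed", "Queen Room w. Shared Facilities", "Yes", 110.53, 0.0, 1),
    ("American Express", "Confirmed", "Single Room w. Shared Facilities", "No", 4258.48, 0.0, 3),
    ("Mastercard", "Cancelled", "Queen Room w. Shared Facilities", "No", 28.5, 0.0, 3),
    ("Mastercard", "Cancelled", "Single Room w. Shared Facilities", "Yes", 578.55, 0.0, 2),
    ("Mastercard", "Checked Out", "Compact Double Room w. Shared Facilities", "No", 4637.71, 0.0, 22),
]
df = pd.DataFrame(rows, columns=["card_type", "payment_status", "airbnb", "paid",
                                 "revenue - sum", "revenue - min", "debit - sum"])

out = df.set_index(["card_type", "payment_status", "airbnb", "paid"])
# "revenue - sum" -> ("revenue", "sum"), returned directly as a MultiIndex
out.columns = out.columns.str.split(" - ", expand=True)
wide = out.unstack(["airbnb", "paid"])

# x.any() is False for all-NaN columns and for all-zero columns alike
by_any = wide.loc[:, lambda x: x.any()]
# dropna(how="all") removes only the columns that are entirely NaN
by_dropna = wide.dropna(axis=1, how="all")

print(by_any.shape)     # (4, 12)
print(by_dropna.shape)  # (4, 18)
```

If the zero-valued min columns should survive, swapping the lambda for dropna(axis=1, how='all') keeps them while still removing combinations that never occur.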
Source: "python - Pandas pivot table shape without aggregation", Stack Overflow: https://stackoverflow.com/questions/70790037/