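python - Add a calculated field between pivot 'Values' based on 'Columns' data

Tags: python pandas pivot-table

I am working on a report that shows the difference between two quarters. I have a SQL query that I read into a Pandas DataFrame and then pivot.

Here is my code: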

    import pandas as pd

    df = pd.read_sql_query(mtd_query, cnxn, params=[report_start, end_mtd, report_start, end_mtd, whse])
    # Determine which quarter each month falls in: (m - 1) // 3 + 1
    df['QUARTER'] = (df['INV_MONTH'].astype(int) - 1) // 3 + 1
    # Build the "PERIOD" label by combining the year and the quarter, e.g. 2017Q4
    df['PERIOD'] = df['INV_YEAR'].astype(str) + 'Q' + df['QUARTER'].astype(int).astype(str)
    df['MARGIN'] = df['PROFIT'].astype(float) / df['SALES'].astype(float)

    df = df.drop(['INV_MONTH', 'INV_YEAR'], axis=1)
    df = pd.pivot_table(df, index=['REP', 'REP_NAME', 'CUST_NO', 'CUST_NAME', 'TOTALSALES'], columns=['PERIOD'], values=['SALES', 'PROFIT', 'MARGIN'], fill_value=0)
    # Put PERIOD on the top column level and list the newest period first
    df = df.reorder_levels([1, 0], axis=1).sort_index(axis=1, ascending=False)
    # sort_index(level=0) replaces the deprecated DataFrame.sortlevel
    df = df.sort_index(level=0, ascending=True)
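
The quarter and period derivation used above can be checked in isolation. The following is only a minimal sketch: the `sample` frame and its values are made up for illustration, and only the column names `INV_YEAR` and `INV_MONTH` come from the question.

```python
import pandas as pd

# Hypothetical sample rows; only the column names come from the question.
sample = pd.DataFrame({
    "INV_YEAR": [2016, 2017, 2017, 2017],
    "INV_MONTH": ["12", "1", "6", "7"],
})

# Months 1-3 map to quarter 1, 4-6 to 2, 7-9 to 3, 10-12 to 4.
sample["QUARTER"] = (sample["INV_MONTH"].astype(int) - 1) // 3 + 1

# Combine year and quarter into a label such as "2017Q2".
sample["PERIOD"] = sample["INV_YEAR"].astype(str) + "Q" + sample["QUARTER"].astype(str)

print(sample["PERIOD"].tolist())
# ['2016Q4', '2017Q1', '2017Q2', '2017Q3']
```

Because the labels are zero-padding free but share the same `YYYYQn` shape, sorting them as strings also sorts them chronologically, which is what the later column sort relies on.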

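I am trying to compute the difference in the MARGIN column between consecutive periods. I have not been able to find a way to do this. Any suggestions are appreciated.

Current output: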

PERIOD                                                                                            2017Q4                                 2017Q3                                 2017Q2                                 2017Q1                                 2016Q4                        
                                                                                                   SALES        PROFIT    MARGIN          SALES        PROFIT    MARGIN          SALES        PROFIT    MARGIN          SALES        PROFIT    MARGIN          SALES        PROFIT    MARGIN
REP    REP_NAME                       CUST_NO  CUST_NAME                      TOTALSALES                                                                                                                                                                                                    
1.0    Greensboro - House             245.0    TE CONNECTIVITY CORPORATION    103361.05         0.000000      0.000000  0.000000     434.500000     69.520000  0.160000   20391.666667   3262.666667  0.160000       0.000000      0.000000  0.000000       0.000000      0.000000  0.000000
                                      1789.0   GOOD HOUSEKEEPER               50108.47        678.508182     80.170909  0.145883     585.301429     64.180476  0.121915     718.685000     92.033125  0.130453     720.729333     97.955333  0.134821    1237.308333     88.210000  0.099450

The desired output looks like this:

PERIOD                                                                                            2017Q4                                 2017Q3                                 2017Q2                                 2017Q1                                 2016Q4                        
                                                                                                   SALES        PROFIT    MARGIN   VARIANCE          SALES        PROFIT    MARGIN    VARIANCE          SALES        PROFIT    MARGIN    VARIANCE          SALES        PROFIT    MARGIN    VARIANCE          SALES        PROFIT    MARGIN
REP    REP_NAME                       CUST_NO  CUST_NAME                      TOTALSALES                                                                                                                                                                                                    
1.0    Greensboro - House             245.0    TE CONNECTIVITY CORPORATION    103361.05         0.000000      0.000000  0.000000    -.16         434.500000     69.520000  0.160000    0           20391.666667   3262.666667  0.160000    .16           0.000000      0.000000  0.000000      0            0.000000      0.000000  0.000000
                                      1789.0   GOOD HOUSEKEEPER               50108.47        678.508182     80.170909  0.145883    .023968     585.301429     64.180476  0.121915    -0.008537     718.685000     92.033125  0.130453    -.004368     720.729333     97.955333  0.134821     .035372       1237.308333     88.210000  0.099450

Output of df.to_dict('r'):

[{('2016Q4', 'SALES'): 0.0, ('2017Q3', 'PROFIT'): 69.520000000000067, ('2017Q1', 'PROFIT'): 0.0, ('2017Q2', 'SALES'): 20391.666666666668, ('2017Q3', 'MARGIN'): 0.16, ('2016Q4', 'PROFIT'): 0.0, ('2017Q3', 'SALES'): 434.5, ('2017Q1', 'SALES'): 0.0, ('2017Q4', 'SALES'): 0.0, ('2016Q4', 'MARGIN'): 0.0, ('2017Q4', 'PROFIT'): 0.0, ('2017Q1', 'MARGIN'): 0.0, ('2017Q4', 'MARGIN'): 0.0, ('2017Q2', 'MARGIN'): 0.16, ('2017Q2', 'PROFIT'): 3262.6666666666665}, {('2016Q4', 'SALES'): 1237.3083333333332, ('2017Q3', 'PROFIT'): 64.180476190476185, ('2017Q1', 'PROFIT'): 97.9553333333333, ('2017Q2', 'SALES'): 718.68500000000006, ('2017Q3', 'MARGIN'): 0.1219152103415191, ('2016Q4', 'PROFIT'): 88.209999999999994}]

Best answer

IIUC (if I understand correctly):

Source DF:

In [60]: df
Out[60]:
  2016Q4                     2017Q1                  2017Q2               \
  MARGIN PROFIT        SALES MARGIN     PROFIT SALES MARGIN       PROFIT
0    0.0   0.00     0.000000    0.0   0.000000   0.0   0.16  3262.666667
1    NaN  88.21  1237.308333    NaN  97.955333   NaN    NaN          NaN

                   2017Q3                   2017Q4
          SALES    MARGIN     PROFIT  SALES MARGIN PROFIT SALES
0  20391.666667  0.160000  69.520000  434.5    0.0    0.0   0.0
1    718.685000  0.121915  64.180476    NaN    NaN    NaN   NaN

Solution:

In [61]: tmp = (df.loc[:, pd.IndexSlice[:, 'MARGIN']]
    ...:          .fillna(0)
    ...:          .diff(axis=1)
    ...:          .rename(columns=lambda x: 'VARIANCE' if x=='MARGIN' else x))
    ...:

In [62]: pd.concat([df, tmp], axis=1).sort_index(axis=1)
Out[62]:
  2016Q4                              2017Q1                           2017Q2  \
  MARGIN PROFIT        SALES VARIANCE MARGIN     PROFIT SALES VARIANCE MARGIN
0    0.0   0.00     0.000000      NaN    0.0   0.000000   0.0      0.0   0.16
1    NaN  88.21  1237.308333      NaN    NaN  97.955333   NaN      0.0    NaN

                                         2017Q3                              \
        PROFIT         SALES VARIANCE    MARGIN     PROFIT  SALES  VARIANCE
0  3262.666667  20391.666667     0.16  0.160000  69.520000  434.5  0.000000
1          NaN    718.685000     0.00  0.121915  64.180476    NaN  0.121915

  2017Q4
  MARGIN PROFIT SALES  VARIANCE
0    0.0    0.0   0.0 -0.160000
1    NaN    NaN   NaN -0.121915
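
The same approach can be run end to end on a small frame. This is a sketch under stated assumptions rather than the asker's real data: the `pivoted` frame and its numbers are invented, PROFIT is left out for brevity, and the periods are assumed to be in ascending order so that `diff(axis=1)` subtracts the previous quarter from the current one. The final `reindex` step is an addition not present in the answer; it is one possible way to get the newest-first layout shown in the question.

```python
import pandas as pd

# Hypothetical pivoted frame: (PERIOD, measure) column pairs, periods ascending.
columns = pd.MultiIndex.from_product(
    [["2017Q1", "2017Q2", "2017Q3"], ["MARGIN", "SALES"]]
)
pivoted = pd.DataFrame(
    [[0.10, 100.0, 0.15, 200.0, 0.12, 150.0],
     [0.20, 50.0, 0.20, 60.0, 0.25, 70.0]],
    columns=columns,
)

# Select the MARGIN column of every period, then subtract the previous period.
margin = pivoted.loc[:, pd.IndexSlice[:, "MARGIN"]]
variance = (
    margin.fillna(0)
          .diff(axis=1)
          .rename(columns={"MARGIN": "VARIANCE"}, level=1)
)

# Attach the new columns and sort so each VARIANCE sits under its period.
result = pd.concat([pivoted, variance], axis=1).sort_index(axis=1)

# Optional: newest period first, with an explicit measure order per period.
periods = sorted(result.columns.get_level_values(0).unique(), reverse=True)
layout = pd.MultiIndex.from_product([periods, ["SALES", "MARGIN", "VARIANCE"]])
report = result.reindex(columns=layout)

print(report.round(2))
```

The earliest period has no predecessor, so its VARIANCE column is all NaN. Note also that `fillna(0)` treats a missing margin as zero, so a quarter with no sales shows up as a full swing in the variance, as it does in the answer's output.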

Regarding python - Add a calculated field between pivot 'Values' based on 'Columns' data, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/47759982/
