python - Pivot a DataFrame by diagonals

Tags: python pandas numpy dataframe

Given a DataFrame

   col1  col2  col3
0     1     4     7
1     2     5     8
2     3     6     9
how can I get something like this:
         0    1    2
               
0      1.0  2.0  3.0
1      5.0  4.0  7.0
2      9.0  6.0  NaN
3      NaN  8.0  NaN

If we view the DataFrame as an array indexed by i, j, then diagonal n consists of the elements for which abs(i - j) = n.
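As a minimal sketch of this definition, the values can be grouped by abs(i - j) with a plain loop. The helper name `group_by_diagonal` is hypothetical (it is not part of the question), and the order within each group is simply row-major here:

```python
import numpy as np

def group_by_diagonal(a):
    """Group the elements of a square 2D array by abs(i - j)."""
    a = np.asarray(a)
    n = len(a)
    groups = {d: [] for d in range(n)}
    for i in range(n):
        for j in range(n):
            # abs(i - j) is the distance of cell (i, j) from the main diagonal
            groups[abs(i - j)].append(a[i, j].item())
    return groups

# Same values as the example DataFrame (rows 0..2, columns col1..col3)
a = np.array([[1, 4, 7],
              [2, 5, 8],
              [3, 6, 9]])
groups = group_by_diagonal(a)
print(groups)  # {0: [1, 5, 9], 1: [4, 2, 8, 6], 2: [7, 3]}
```

Each key then becomes one column of the desired output, padded with NaN where a group is shorter than the longest one.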
A plus would be the ability to choose the order:
intercale = True, first_diag = 'left'
         0    1    2
               
0      1.0  2.0  3.0
1      5.0  4.0  7.0
2      9.0  6.0  NaN
3      NaN  8.0  NaN
intercalate = False, first_diag ='left'
         0    1    2
               
0      1.0  2.0  3.0
1      5.0  6.0  7.0
2      9.0  4.0  NaN
3      NaN  8.0  NaN
intercalate = True, first_diag ='right'
         0    1    2
               
0      1.0  4.0  7.0
1      5.0  2.0  3.0
2      9.0  8.0  NaN
3      NaN  6.0  NaN
intercalate = False, first_diag ='right'
         0    1    2
               
0      1.0  4.0  7.0
1      5.0  8.0  3.0
2      9.0  2.0  NaN
3      NaN  6.0  NaN
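There could even be another degree of freedom for the ordering: going from the bottom corner to the top corner or the other way around, or choosing the other main diagonal.
My approach with pandas: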
df2 = df.reset_index().melt('index').assign(variable = lambda x: x.variable.factorize()[0])
df2['diag'] = df2['index'].sub(df2['variable']).abs()
new_df = (df2.assign(index = df2.groupby('diag').cumcount())
             .pivot_table(index = 'index',columns = 'diag',values = 'value'))
print(new_df)
diag     0    1    2
index               
0      1.0  2.0  3.0
1      5.0  4.0  7.0
2      9.0  6.0  NaN
3      NaN  8.0  NaN
I would like to know whether there is a simpler way to do this, perhaps with NumPy.

Best answer

Case #1: The order of elements within each output column doesn't matter

Approach #1: Here's one way with NumPy -

import numpy as np
import pandas as pd

def diagonalize(a): # input is a square 2D array and output is df
    n = len(a)    
    r = np.arange(n)

    idx = np.abs(r[:,None]-r)
    lens = np.r_[n,np.arange(2*n-2,0,-2)]
    split_idx = lens.cumsum()

    b = a.flat[idx.ravel().argsort()]
    v = np.split(b,split_idx[:-1])
    return pd.DataFrame(v).T

Sample run -
In [110]: df
Out[110]: 
   col1  col2  col3  col4
0     1     2     3     4
1     5     6     7     8
2     9    10    11    12
3    13    14    15    16

In [111]: diagonalize(df.to_numpy(copy=False))
Out[111]: 
      0     1     2     3
0   1.0   2.0   3.0   4.0
1   6.0   5.0   8.0  13.0
2  11.0   7.0   9.0   NaN
3  16.0  10.0  14.0   NaN
4   NaN  12.0   NaN   NaN
5   NaN  15.0   NaN   NaN
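The `lens` array holds how many elements sit at each distance from the main diagonal: n on the diagonal itself, then 2*(n-k) at distance k. The following small check is only a sketch (the name `counts` is introduced here for illustration); it compares that closed form against a direct count for n = 4:

```python
import numpy as np

n = 4
r = np.arange(n)

# Distance of every cell (i, j) from the main diagonal
idx = np.abs(r[:, None] - r)

# Closed form used in the answer: n, then 2*(n-1), 2*(n-2), ..., 2
lens = np.r_[n, np.arange(2 * n - 2, 0, -2)]

# Direct count of how often each distance occurs in idx
counts = np.bincount(idx.ravel(), minlength=n)

print(idx)
print(lens)    # [4 6 4 2]
print(counts)  # [4 6 4 2]
```

The cumulative sum of `lens` gives the positions at which the sorted values are split into columns, and the largest entry (6 here) is the number of rows in the output.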

Approach #2: Similar to the previous one, but entirely NumPy based and without loops -
def diagonalize_v2(a): # input, outputs are arrays
    # Setup params
    n = len(a)    
    r = np.arange(n)

    # Get indices based on "diagonalization" (distance off diagonal)
    idx = np.abs(r[:,None]-r)
    lens = np.r_[n,np.arange(2*n-2,0,-2)]

    # Values in the order of "diagonalization"
    b = a.flat[idx.ravel().argsort()]

    # Get a mask for the final o/p where elements are to be assigned
    mask = np.arange(lens.max())[:,None]<lens

    # Setup o/p and assign
    out = np.full(mask.shape,np.nan)
    out.T[mask.T] = b
    return out

Sample run -
In [2]: a
Out[2]: 
array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12],
       [13, 14, 15, 16]])

In [3]: diagonalize_v2(a)
Out[3]: 
array([[ 1.,  2.,  3.,  4.],
       [ 6.,  5.,  8., 13.],
       [11.,  7.,  9., nan],
       [16., 10., 14., nan],
       [nan, 12., nan, nan],
       [nan, 15., nan, nan]])
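The assignment `out.T[mask.T] = b` fills the output one column at a time: boolean indexing walks the transposed mask in row-major order, which corresponds to walking down each column of `out` in turn. Here is a small illustration with made-up values (the arrays below are hypothetical and not taken from the answer):

```python
import numpy as np

lens = np.array([3, 4, 2])          # number of valid entries per output column
b = np.arange(1, lens.sum() + 1)    # values already ordered column by column: 1..9

# mask[i, j] is True where row i lies within the length of column j
mask = np.arange(lens.max())[:, None] < lens

out = np.full(mask.shape, np.nan)
# out.T is a view of out, so assigning through it writes into out itself
out.T[mask.T] = b
print(out)
# [[ 1.  4.  8.]
#  [ 2.  5.  9.]
#  [ 3.  6. nan]
#  [nan  7. nan]]
```

Because `out` is created with `np.full(..., np.nan)`, the result is always a float array, which is why the integer input shows up as floats in the sample run.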

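Case #2: The order of elements within each column matters

We have two additional input arguments to manage the order. The solution is a modified version inspired mainly by Approach #1 -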
def diagonalize_generic(a, intercale = True ,first_diag = 'left'):
    # Setup params
    n = len(a)    
    r = np.arange(n)

    # Get indices based on "diagonalization" (distance off diagonal)
    idx = np.abs(r[:,None]-r)
    lens = np.r_[n,np.arange(2*n-2,0,-2)]

    # Secondary sort key: decides which side of the main diagonal comes first
    if first_diag=='left':
        w = np.triu(np.ones((n, n), dtype=int))
    elif first_diag=='right':
        w = np.tril(np.ones((n, n), dtype=int))
    else:
        raise ValueError('Wrong first_diag value!')

    order = np.lexsort(np.c_[w.ravel(),idx.ravel()].T)

    split_idx = lens.cumsum()
    o_split = np.split(order,split_idx[:-1])

    f = a.flat

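    if intercale:
        # Alternate between the two sides of the main diagonal within each column
        v = [f[o_split[0]]] + [f[o.reshape(2,-1).ravel('F')] for o in o_split[1:]]
    else:
        # Keep each side contiguous: one side of the diagonal, then the other
        v = [f[o] for o in o_split]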
    return pd.DataFrame(v).T

Sample run

Input as an array:
In [53]: a
Out[53]: 
array([[1, 4, 7],
       [2, 5, 8],
       [3, 6, 9]])

Different scenarios:
In [54]: diagonalize_generic(a, intercale = True, first_diag = 'left')
Out[54]: 
     0    1    2
0  1.0  2.0  3.0
1  5.0  4.0  7.0
2  9.0  6.0  NaN
3  NaN  8.0  NaN

In [55]: diagonalize_generic(a, intercale = False, first_diag = 'left')
Out[55]: 
     0    1    2
0  1.0  2.0  3.0
1  5.0  6.0  7.0
2  9.0  4.0  NaN
3  NaN  8.0  NaN

In [56]: diagonalize_generic(a, intercale = True, first_diag = 'right')
Out[56]: 
     0    1    2
0  1.0  4.0  7.0
1  5.0  2.0  3.0
2  9.0  8.0  NaN
3  NaN  6.0  NaN

In [57]: diagonalize_generic(a, intercale = False, first_diag = 'right')
Out[57]: 
     0    1    2
0  1.0  4.0  7.0
1  5.0  8.0  3.0
2  9.0  2.0  NaN
3  NaN  6.0  NaN
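The ordering step relies on `np.lexsort`, which treats the last key as the primary sort key. The sketch below (the name `d1` is introduced here for illustration) shows, for the 3x3 example with first_diag = 'left', how the cells at distance 1 are ordered and how the reshape followed by a Fortran-order ravel interleaves the two halves:

```python
import numpy as np

a = np.array([[1, 4, 7],
              [2, 5, 8],
              [3, 6, 9]])
n = len(a)
r = np.arange(n)

idx = np.abs(r[:, None] - r)               # primary key: distance from the diagonal
w = np.triu(np.ones((n, n), dtype=int))    # secondary key: 0 below the diagonal, 1 on or above it

# lexsort sorts by the LAST key first, so idx is the primary key and w breaks ties
order = np.lexsort((w.ravel(), idx.ravel()))

# Cells at distance 1 occupy positions n .. n + 2*(n-1) of the sorted order
d1 = order[n:n + 2 * (n - 1)]
print(a.flat[d1])                              # [2 6 4 8]  (lower half, then upper half)
print(a.flat[d1.reshape(2, -1).ravel('F')])    # [2 4 6 8]  (interleaved)
```

Swapping `np.triu` for `np.tril` flips the secondary key, which is what first_diag = 'right' does in the function above.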

Regarding python - Pivot a DataFrame by diagonals, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/59885197/
