Given a DataFrame:

   col1  col2  col3
0     1     4     7
1     2     5     8
2     3     6     9

how can I get something like this:

     0    1    2
0  1.0  2.0  3.0
1  5.0  4.0  7.0
2  9.0  6.0  NaN
3  NaN  8.0  NaN
If we think of the DataFrame as an array indexed by i, j, then diagonal n contains the elements with abs(i - j) = n.
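To make the definition concrete, here is a small NumPy sketch (assuming the question's 3x3 values, i.e. `df.to_numpy()`) that computes |i - j| for every cell and collects the values of each diagonal in row-major order:

```python
import numpy as np

# Assumed example: the question's DataFrame as an array (col1, col2, col3)
a = np.array([[1, 4, 7],
              [2, 5, 8],
              [3, 6, 9]])

n = len(a)
r = np.arange(n)
# dist[i, j] = |i - j|, i.e. the diagonal that a[i, j] belongs to
dist = np.abs(r[:, None] - r)

# Boolean indexing returns the selected cells in row-major order
diagonals = {k: a[dist == k].tolist() for k in range(n)}
print(diagonals)  # {0: [1, 5, 9], 1: [4, 2, 8, 6], 2: [7, 3]}
```

Diagonal 0 has n elements and every diagonal k >= 1 has 2(n - k): half below the main diagonal, half above it.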
A plus would be the ability to choose the order:

intercale = True, first_diag = 'left'

     0    1    2
0  1.0  2.0  3.0
1  5.0  4.0  7.0
2  9.0  6.0  NaN
3  NaN  8.0  NaN

intercalate = False, first_diag = 'left'

     0    1    2
0  1.0  2.0  3.0
1  5.0  6.0  7.0
2  9.0  4.0  NaN
3  NaN  8.0  NaN

intercalate = True, first_diag = 'right'

     0    1    2
0  1.0  4.0  7.0
1  5.0  2.0  3.0
2  9.0  8.0  NaN
3  NaN  6.0  NaN

intercalate = False, first_diag = 'right'

     0    1    2
0  1.0  4.0  7.0
1  5.0  8.0  3.0
2  9.0  2.0  NaN
3  NaN  6.0  NaN
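One way to pin down the four orderings above is a plain-loop reference implementation. This is only a sketch: the name `diagonalize_reference` is hypothetical, and the reading that 'left' puts the elements below the main diagonal first (and 'right' those above it) is inferred from the example tables rather than stated in the question.

```python
import numpy as np
import pandas as pd

def diagonalize_reference(a, intercalate=True, first_diag='left'):
    """Hypothetical loop-based sketch of the orderings in the question.

    Assumption (inferred from the examples): 'left' lists the cells below
    the main diagonal first, 'right' lists the cells above it first.
    """
    a = np.asarray(a)
    n = len(a)
    columns = [[a[i, i] for i in range(n)]]  # main diagonal, top to bottom
    for k in range(1, n):
        lower = [a[i + k, i] for i in range(n - k)]  # below the main diagonal
        upper = [a[i, i + k] for i in range(n - k)]  # above the main diagonal
        if first_diag == 'left':
            first, second = lower, upper
        elif first_diag == 'right':
            first, second = upper, lower
        else:
            raise ValueError("first_diag must be 'left' or 'right'")
        if intercalate:
            # Alternate: first[0], second[0], first[1], second[1], ...
            col = [x for pair in zip(first, second) for x in pair]
        else:
            col = first + second
        columns.append(col)
    # Columns have different lengths; pandas pads the short ones with NaN
    return pd.DataFrame(columns).T
```

It is slow for large frames, but handy for checking a vectorized version: with the question's values, `diagonalize_reference(df.to_numpy(), False, 'right')` reproduces the last table above.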
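There could even be another degree of freedom in the ordering, by choosing to go from the lower corner to the upper corner or the reverse, or by choosing the other main diagonal.

My approach with pandas: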
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6], 'col3': [7, 8, 9]})

df2 = df.reset_index().melt('index').assign(variable=lambda x: x.variable.factorize()[0])
df2['diag'] = df2['index'].sub(df2['variable']).abs()
new_df = (df2.assign(index=df2.groupby('diag').cumcount())
             .pivot_table(index='index', columns='diag', values='value'))
print(new_df)
diag     0    1    2
index
0      1.0  2.0  3.0
1      5.0  4.0  7.0
2      9.0  6.0  NaN
3      NaN  8.0  NaN
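To see why the pandas approach works, the sketch below (same `df` as in the question; `per_diag` is a hypothetical name) stops before the pivot and lists each diagonal's values in the order `cumcount` numbers them. Because `melt` stacks the frame column by column, that order is column-major.

```python
import pandas as pd

# Same DataFrame as in the question
df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6], 'col3': [7, 8, 9]})

# Long format: one row per cell, with its row position ('index'),
# its column position ('variable', via factorize) and its 'value'
df2 = (df.reset_index()
         .melt('index')
         .assign(variable=lambda x: x['variable'].factorize()[0]))
# Which diagonal each cell lies on: |i - j|
df2['diag'] = (df2['index'] - df2['variable']).abs()

# Values of each diagonal in melt (column-major) order, i.e. the order
# in which groupby('diag').cumcount() numbers them
per_diag = df2.groupby('diag')['value'].apply(list).to_dict()
print(per_diag)
```

For this 3x3 example, diagonal 1 comes out as 2, 4, 6, 8, which happens to coincide with the `intercale = True, first_diag = 'left'` layout.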
I wonder whether there is a simpler way to do this, maybe with NumPy.

Best answer
Case #1: The order of elements within each output column doesn't matter

Approach #1: Here's one NumPy way -
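import numpy as np
import pandas as pd

def diagonalize(a):  # input is an array, output is a DataFrame
    n = len(a)
    r = np.arange(n)
    # Distance of each cell from the main diagonal
    idx = np.abs(r[:,None]-r)
    # Diagonal 0 has n elements; each diagonal k >= 1 has 2*(n-k)
    lens = np.r_[n,np.arange(2*n-2,0,-2)]
    split_idx = lens.cumsum()
    # Group the values by diagonal (order within a diagonal is not guaranteed)
    b = a.flat[idx.ravel().argsort()]
    v = np.split(b,split_idx[:-1])
    return pd.DataFrame(v).T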
Sample run -
In [110]: df
Out[110]:
   col1  col2  col3  col4
0     1     2     3     4
1     5     6     7     8
2     9    10    11    12
3    13    14    15    16

In [111]: diagonalize(df.to_numpy(copy=False))
Out[111]:
      0     1     2     3
0   1.0   2.0   3.0   4.0
1   6.0   5.0   8.0  13.0
2  11.0   7.0   9.0   NaN
3  16.0  10.0  14.0   NaN
4   NaN  12.0   NaN   NaN
5   NaN  15.0   NaN   NaN
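Two small observations on Approach #1, shown as a hedged variant (the name `diagonalize_stable` is hypothetical and not part of the answer): `lens` is simply the number of cells on each diagonal, which `np.bincount` can produce directly, and the default `argsort` is not stable, so cells on the same diagonal may come out in any order. Passing `kind='stable'` keeps each diagonal's values in row-major order.

```python
import numpy as np
import pandas as pd

def diagonalize_stable(a):
    """Hypothetical variant of Approach #1 with a deterministic order."""
    a = np.asarray(a)
    n = len(a)
    r = np.arange(n)
    idx = np.abs(r[:, None] - r)
    # Cells per diagonal: n for k = 0, 2*(n - k) for k >= 1 (same as `lens`)
    lens = np.bincount(idx.ravel())
    # A stable sort keeps ties (same diagonal) in their row-major order
    b = a.ravel()[np.argsort(idx.ravel(), kind='stable')]
    v = np.split(b, lens.cumsum()[:-1])
    return pd.DataFrame(v).T
```

With the question's 3x3 input, column 1 comes out as 4, 2, 8, 6, the row-major order of the cells on diagonal 1.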
Approach #2: Similar to the previous one, but purely NumPy-based and without loops -
import numpy as np

def diagonalize_v2(a): # input, outputs are arrays
    # Setup params
    n = len(a)
    r = np.arange(n)
    # Get indices based on "diagonalization" (distance off diagonal)
    idx = np.abs(r[:,None]-r)
    lens = np.r_[n,np.arange(2*n-2,0,-2)]
    # Values in the order of "diagonalization"
    b = a.flat[idx.ravel().argsort()]
    # Get a mask for the final o/p where elements are to be assigned
    mask = np.arange(lens.max())[:,None]<lens
    # Setup o/p and assign
    out = np.full(mask.shape,np.nan)
    out.T[mask.T] = b
    return out
Sample run -
In [2]: a
Out[2]:
array([[ 1, 2, 3, 4],
[ 5, 6, 7, 8],
[ 9, 10, 11, 12],
[13, 14, 15, 16]])
In [3]: diagonalize_v2(a)
Out[3]:
array([[ 1., 2., 3., 4.],
[ 6., 5., 8., 13.],
[11., 7., 9., nan],
[16., 10., 14., nan],
[nan, 12., nan, nan],
[nan, 15., nan, nan]])
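The key line in Approach #2 is `out.T[mask.T] = b`. Boolean indexing walks the transposed view in C order, which is column by column of `out`, so `b` is poured into column 0 first, then column 1, and so on. A tiny sketch with assumed column lengths (3, 4, 2, as for a 3x3 input) and placeholder values 1..9:

```python
import numpy as np

# Assumed column lengths, matching `lens` for a 3x3 input
lens = np.array([3, 4, 2])
b = np.arange(1, lens.sum() + 1, dtype=float)  # placeholder values 1..9

# mask[i, j] is True where column j still has a slot at row i
mask = np.arange(lens.max())[:, None] < lens
out = np.full(mask.shape, np.nan)
# out.T is a view, so assigning through it fills `out` column by column
out.T[mask.T] = b
print(out)
# [[ 1.  4.  8.]
#  [ 2.  5.  9.]
#  [ 3.  6. nan]
#  [nan  7. nan]]
```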
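Case #2: The order of elements within each column matters

We have two extra input arguments to manage the order. The solution is a modified version mostly inspired by Approach #1 -

import numpy as np
import pandas as pd

def diagonalize_generic(a, intercale = True, first_diag = 'left'):
    # Setup params
    n = len(a)
    r = np.arange(n)
    # Get indices based on "diagonalization" (distance off diagonal)
    idx = np.abs(r[:,None]-r)
    lens = np.r_[n,np.arange(2*n-2,0,-2)]
    # Tie-breaking weights: 'left' puts cells below the main diagonal
    # first, 'right' puts cells above it first
    if first_diag=='left':
        w = np.triu(np.ones((n, n), dtype=int))
    elif first_diag=='right':
        w = np.tril(np.ones((n, n), dtype=int))
    else:
        raise ValueError('Wrong first_diag value!')
    # Sort by diagonal (primary key), then by weight; lexsort is stable
    order = np.lexsort(np.c_[w.ravel(),idx.ravel()].T)
    split_idx = lens.cumsum()
    o_split = np.split(order,split_idx[:-1])
    f = a.flat
    if intercale:
        # Interleave the two halves of each off-diagonal
        v = [f[o_split[0]]] + [f[o.reshape(2,-1).ravel('F')] for o in o_split[1:]]
    else:
        v = [f[o] for o in o_split]
    return pd.DataFrame(v).T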
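The two tricks in `diagonalize_generic` can be checked by hand on the question's 3x3 values. `np.lexsort` sorts by its last key first, so `idx` picks the diagonal and `w` only breaks ties; then `reshape(2, -1).ravel('F')` interleaves the two halves of a diagonal's chunk. The sketch below assumes `first_diag='left'`:

```python
import numpy as np

a = np.array([[1, 4, 7],
              [2, 5, 8],
              [3, 6, 9]])
n = len(a)
r = np.arange(n)
idx = np.abs(r[:, None] - r)             # primary key: which diagonal
w = np.triu(np.ones((n, n), dtype=int))  # 'left': 0 below the diagonal, 1 on/above

# Last key = primary key; below-diagonal cells (w == 0) come first within
# each diagonal, and the stable sort keeps remaining ties in row-major order
order = np.lexsort((w.ravel(), idx.ravel()))
print(order)                  # [0 4 8 3 7 1 5 6 2]

# Chunk for diagonal 1: two lower cells followed by two upper cells
o = order[3:7]                # [3 7 1 5]
# Row 0 = lower half, row 1 = upper half; reading column by column interleaves them
inter = o.reshape(2, -1).ravel('F')
print(inter)                  # [3 1 7 5]
print(a.ravel()[inter])       # [2 4 6 8]
```

The reshape only works because each off-diagonal has exactly as many cells below the main diagonal as above it.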
Sample run

With the input as an array:
In [53]: a
Out[53]:
array([[1, 4, 7],
[2, 5, 8],
[3, 6, 9]])
Different scenarios:
In [54]: diagonalize_generic(a, intercale = True, first_diag = 'left')
Out[54]:
     0    1    2
0  1.0  2.0  3.0
1  5.0  4.0  7.0
2  9.0  6.0  NaN
3  NaN  8.0  NaN

In [55]: diagonalize_generic(a, intercale = False, first_diag = 'left')
Out[55]:
     0    1    2
0  1.0  2.0  3.0
1  5.0  6.0  7.0
2  9.0  4.0  NaN
3  NaN  8.0  NaN

In [56]: diagonalize_generic(a, intercale = True, first_diag = 'right')
Out[56]:
     0    1    2
0  1.0  4.0  7.0
1  5.0  2.0  3.0
2  9.0  8.0  NaN
3  NaN  6.0  NaN

In [57]: diagonalize_generic(a, intercale = False, first_diag = 'right')
Out[57]:
     0    1    2
0  1.0  4.0  7.0
1  5.0  8.0  3.0
2  9.0  2.0  NaN
3  NaN  6.0  NaN
Regarding "python - rotating a DataFrame by diagonals", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/59885197/