python - Why does pandas unstack throw an error?

Tags: python pandas numpy dataframe stack

I am trying to unstack two columns:

cols = res.columns[:31]
res[cols] = res[cols].ffill()
res = res.set_index(cols + [31])[32].unstack().reset_index().rename_axis(None, 1)

But I get an error:

TypeError: can only perform ops with scalar values

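What should I do to avoid it?

My original question: LINK

Best answer

I think you need to convert the columns to a list first, because + on an Index is element-wise arithmetic, not concatenation: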

cols = res.columns[:31].tolist()
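
As a minimal sketch of why the conversion helps (the tiny DataFrame below is hypothetical, not the asker's data): once the labels are a plain Python list, + concatenates them, and the result can be passed straight to set_index.

```python
import pandas as pd

# Hypothetical frame with default integer column labels 0..3
df = pd.DataFrame([[1, 2, 3, 4]])

cols = df.columns[:2].tolist()   # plain Python list: [0, 1]
keys = cols + [2]                # list concatenation: [0, 1, 2]
print(keys)

# The concatenated labels are valid keys for set_index
indexed = df.set_index(keys)
print(indexed.index.nlevels)
```

Without .tolist(), the same + is handed to the Index as an arithmetic operation, and what happens then depends on the pandas version.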

Edit:

Index contains duplicate entries, cannot reshape

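This means there are duplicate keys. The first 6 columns become the new index and the 7th column becomes the new column labels, so rows that agree on all of them compete for a single cell of the new DataFrame. Here the 8th column supplies 2 values for the same cell: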

    0  1  2  3  4   5  6   7
0  xx  s  1  d  f  df  f  54 
1  xx  s  1  d  f  df  f  g4 

Both rows map to the same cell of the new DataFrame:

 index = xx  s  1  d  f  df
 column = f
 value = 54 

 index = xx  s  1  d  f  df
 column = f
 value = g4 
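
To find the colliding rows before reshaping, something like the following may help. It is only a sketch: the column names key, col and val are hypothetical stand-ins for the index columns, the column that becomes the new headers, and the value column.

```python
import pandas as pd

# Hypothetical stand-in data: two rows share the same (key, col) pair
df = pd.DataFrame({"key": ["xx", "xx", "xx"],
                   "col": ["f", "f", "x"],
                   "val": ["54", "g4", "r4"]})

# keep=False marks every member of a duplicated group, not only the repeats
dupes = df[df.duplicated(["key", "col"], keep=False)]
print(dupes)

# Unstacking with the duplicated keys still present fails
try:
    df.set_index(["key", "col"])["val"].unstack()
    raised = False
except ValueError as exc:
    raised = True
    print(exc)
```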

So one solution is to aggregate. The values here are strings, so join them with .apply(', '.join):

 index = xx  s  1  d  f  df
 column = f
 value = 54, g4 

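Or remove the duplicates with drop_duplicates, keeping the first or the last value of each set of duplicate rows:

 index = xx  s  1  d  f  df
 column = f
 value = 54    (keep='first', the default)

 index = xx  s  1  d  f  df
 column = f
 value = g4    (keep='last')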

import numpy as np
import pandas as pd

res = pd.DataFrame({0: ['xx',np.nan,np.nan,np.nan,'ds', np.nan, np.nan, np.nan, np.nan, 'as'],
                    1: ['s',np.nan,np.nan,np.nan,'a', np.nan, np.nan, np.nan, np.nan, 't'],
                    2: ['1',np.nan,np.nan,np.nan,'s', np.nan, np.nan, np.nan, np.nan, 'r'],
                    3: ['d',np.nan, np.nan, np.nan,'d', np.nan, np.nan, np.nan, np.nan, 'a'],
                    4: ['f',np.nan, np.nan, np.nan,'f', np.nan, np.nan, np.nan, np.nan, '2'],
                    5: ['df',np.nan,np.nan,np.nan,'ds',np.nan, np.nan, np.nan, np.nan, 'ds'],
                    6: ['f','f', 'x', 'r', 'f', 'd', 's', '1', '3', 'k'], 
                    7: ['54','g4', 'r4', '43', '64', '43', 'se', 'gf', 's3', 's4']})


cols = res.columns[:6].tolist()
res[cols] = res[cols].ffill()
print(res)
    0  1  2  3  4   5  6   7
0  xx  s  1  d  f  df  f  54 
1  xx  s  1  d  f  df  f  g4 
2  xx  s  1  d  f  df  x  r4
3  xx  s  1  d  f  df  r  43
4  ds  a  s  d  f  ds  f  64
5  ds  a  s  d  f  ds  d  43
6  ds  a  s  d  f  ds  s  se
7  ds  a  s  d  f  ds  1  gf
8  ds  a  s  d  f  ds  3  s3
9  as  t  r  a  2  ds  k  s4

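# Join the values that share a key, then reshape.
# axis is passed by keyword, as recent pandas versions require.
out = res.groupby(cols + [6])[7].apply(', '.join).unstack().reset_index().rename_axis(None, axis=1)
print(out)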

    0  1  2  3  4   5    1    3    d       f    k    r    s    x
0  as  t  r  a  2  ds  NaN  NaN  NaN     NaN   s4  NaN  NaN  NaN
1  ds  a  s  d  f  ds   gf   s3   43      64  NaN  NaN   se  NaN
2  xx  s  1  d  f  df  NaN  NaN  NaN  54, g4  NaN   43  NaN   r4 <-54, g4
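
The same aggregate-then-reshape step can probably also be written with pivot_table, which accepts an aggregation function directly. This is a sketch on hypothetical stand-in data (columns key, col, val), not the answer's own method:

```python
import pandas as pd

# Hypothetical stand-in data with one duplicated (key, col) pair
df = pd.DataFrame({"key": ["xx", "xx", "xx", "ds"],
                   "col": ["f", "f", "x", "f"],
                   "val": ["54", "g4", "r4", "64"]})

# aggfunc joins all strings that land in the same cell
wide = df.pivot_table(index="key", columns="col", values="val", aggfunc=", ".join)
wide = wide.reset_index().rename_axis(None, axis=1)
print(wide)
```

Cells with no matching row come out as missing values, as in the groupby version.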

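Another solution is to remove the duplicates. Starting again from the forward-filled res, keep the first row of each key:

first = res.drop_duplicates(cols + [6])

first = first.set_index(cols + [6])[7].unstack().reset_index().rename_axis(None, axis=1)
print(first)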
    0  1  2  3  4   5    1    3    d    f    k    r    s    x
0  as  t  r  a  2  ds  NaN  NaN  NaN  NaN   s4  NaN  NaN  NaN
1  ds  a  s  d  f  ds   gf   s3   43   64  NaN  NaN   se  NaN
2  xx  s  1  d  f  df  NaN  NaN  NaN   54  NaN   43  NaN   r4 <- 54

Or keep the last row of each key instead, again starting from the forward-filled res:

last = res.drop_duplicates(cols + [6], keep='last')

last = last.set_index(cols + [6])[7].unstack().reset_index().rename_axis(None, axis=1)
print(last)
    0  1  2  3  4   5    1    3    d    f    k    r    s    x
0  as  t  r  a  2  ds  NaN  NaN  NaN  NaN   s4  NaN  NaN  NaN
1  ds  a  s  d  f  ds   gf   s3   43   64  NaN  NaN   se  NaN
2  xx  s  1  d  f  df  NaN  NaN  NaN   g4  NaN   43  NaN   r4 <- g4

Regarding python - Why does pandas unstack throw an error?, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/51129662/
