python - Pandas DataFrame: convert columns into rows of a single column

I have a DataFrame that looks like this:

userId  feature1  feature2  feature3  ...
123456  0         0.45      0         ...
234567  0         0         0         ...
345678  0.6       0         0.2       ...
.
.

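Most of these features are zero, but occasionally some of them have non-zero values. A single row for a userId may have zero, one, or several non-zero features.

I want to convert it into the following dataset: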

userId  feature  value
123456  feature2 0.45
345678  feature1 0.6
345678  feature3 0.2

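Essentially, we keep only the non-zero features for each userId. So for userId 345678 the transformed dataset has 2 rows, one for feature1 and another for feature3. userId 234567 is dropped because all of its features are zero.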

Can this be done with groupby or pivot? If so, how?

Are there any other idiomatic pandas solutions?

Best answer

A bit of magic from melt:

df.melt('userId').query('value!=0')
Out[459]: 
   userId  variable  value
2  345678  feature1   0.60
3  123456  feature2   0.45
8  345678  feature3   0.20
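For anyone who wants to reproduce this end to end, here is a minimal sketch. It assumes the frame contains only the three feature columns shown in the question (the elided "..." columns are left out), and it uses melt's var_name and value_name arguments so the result has the feature / value column names the question asked for. The sort and index reset are optional additions, not part of the one-liner above.

```python
import pandas as pd

# Sample data rebuilt from the question; only the three visible features are included.
df = pd.DataFrame({
    "userId": [123456, 234567, 345678],
    "feature1": [0.0, 0.0, 0.6],
    "feature2": [0.45, 0.0, 0.0],
    "feature3": [0.0, 0.0, 0.2],
})

long_df = (
    df.melt(id_vars="userId", var_name="feature", value_name="value")  # wide -> long
      .query("value != 0")                   # keep only the non-zero features
      .sort_values(["userId", "feature"])    # optional: group rows per user
      .reset_index(drop=True)                # optional: tidy 0..n-1 index
)
print(long_df)
```

Because userId 234567 has no non-zero feature, it simply does not appear in long_df.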

Note that if you use stack instead, you need to mask the 0s to NaN first, so that stack drops those entries:

df.mask(df.eq(0)).set_index('userId').stack().reset_index()
Out[460]: 
   userId   level_1     0
0  123456  feature2  0.45
1  345678  feature1  0.60
2  345678  feature3  0.20
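The stack variant can be sketched in a self-contained form as well. Again the sample frame is rebuilt from the question with only the three visible features. Two things here are my additions rather than part of the answer: the explicit dropna(), included on the assumption that newer pandas releases may keep NaN entries when stacking, and the rename_axis / reset_index(name=...) calls, which replace the default level_1 / 0 column names with feature / value.

```python
import pandas as pd

# Sample data rebuilt from the question; only the three visible features are included.
df = pd.DataFrame({
    "userId": [123456, 234567, 345678],
    "feature1": [0.0, 0.0, 0.6],
    "feature2": [0.45, 0.0, 0.0],
    "feature3": [0.0, 0.0, 0.2],
})

wide = df.set_index("userId")        # keep userId out of the masking step

stacked = (
    wide.mask(wide.eq(0))            # turn zeros into NaN
        .stack()                     # columns -> inner index level
        .dropna()                    # explicit drop, in case stack keeps NaN
        .rename_axis(["userId", "feature"])
        .sort_index()
        .reset_index(name="value")
)
print(stacked)
```

Setting the index before masking means the comparison with 0 only ever touches the feature columns.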

Regarding python - Pandas DataFrame: convert columns into rows of a single column, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/54792245/
