My work environment is mainly PySpark, but from some Googling it looks like transposing is quite complicated in PySpark. I'd prefer to keep this in PySpark, but if it's much easier in Pandas I'll convert the Spark dataframe to a Pandas dataframe. The dataset isn't large enough for performance to be a concern, I think.
I want to convert a dataframe with several columns into rows:
Input:
import pandas as pd

df = pd.DataFrame({'Record': {0: 1, 1: 2, 2: 3},
                   'Hospital': {0: 'Red Cross', 1: 'Alberta Hospital', 2: 'General Hospital'},
                   'Hospital Address': {0: '1234 Street 429',
                                        1: '553 Alberta Road 441',
                                        2: '994 Random Street 923'},
                   'Medicine_1': {0: 'Effective', 1: 'Effecive', 2: 'Normal'},
                   'Medicine_2': {0: 'Effective', 1: 'Normal', 2: 'Effective'},
                   'Medicine_3': {0: 'Normal', 1: 'Normal', 2: 'Normal'},
                   'Medicine_4': {0: 'Effective', 1: 'Effective', 2: 'Effective'}})
Record Hospital Hospital Address Medicine_1 Medicine_2 Medicine_3 Medicine_4
1 Red Cross 1234 Street 429 Effective Effective Normal Effective
2 Alberta Hospital 553 Alberta Road 441 Effecive Normal Normal Effective
3 General Hospital 994 Random Street 923 Normal Effective Normal Effective
Output:
Record Hospital Hospital Address Name Value
0 1 Red Cross 1234 Street 429 Medicine_1 Effective
1 2 Red Cross 1234 Street 429 Medicine_2 Effective
2 3 Red Cross 1234 Street 429 Medicine_3 Normal
3 4 Red Cross 1234 Street 429 Medicine_4 Effective
4 5 Alberta Hospital 553 Alberta Road 441 Medicine_1 Effecive
5 6 Alberta Hospital 553 Alberta Road 441 Medicine_2 Normal
6 7 Alberta Hospital 553 Alberta Road 441 Medicine_3 Normal
7 8 Alberta Hospital 553 Alberta Road 441 Medicine_4 Effective
8 9 General Hospital 994 Random Street 923 Medicine_1 Normal
9 10 General Hospital 994 Random Street 923 Medicine_2 Effective
10 11 General Hospital 994 Random Street 923 Medicine_3 Normal
11 12 General Hospital 994 Random Street 923 Medicine_4 Effective
Looking at a PySpark example, it seems complicated: PySpark Dataframe melt columns into rows
Looking at Pandas examples, it seems much easier. But there are many different answers on Stack Overflow, some saying to use pivot, melt, stack, unstack, and more, which ends up being confusing.
So if anyone has a simple way to do this in PySpark, I'm all ears. If not, I'll happily accept a Pandas answer.
Thanks a lot for your help!
Best Answer
You can use .melt and specify id_vars; every other column is then treated as a value_vars column. The number of value_vars columns multiplies the row count of the dataframe by that number, stacking the information from all four Medicine columns into a single column and duplicating the id_vars columns, which gives you the format you want:
Dataframe setup:
import pandas as pd

df = pd.DataFrame({'Record': {0: 1, 1: 2, 2: 3},
                   'Hospital': {0: 'Red Cross', 1: 'Alberta Hospital', 2: 'General Hospital'},
                   'Hospital Address': {0: '1234 Street 429',
                                        1: '553 Alberta Road 441',
                                        2: '994 Random Street 923'},
                   'Medicine_1': {0: 'Effective', 1: 'Effecive', 2: 'Normal'},
                   'Medicine_2': {0: 'Effective', 1: 'Normal', 2: 'Effective'},
                   'Medicine_3': {0: 'Normal', 1: 'Normal', 2: 'Normal'},
                   'Medicine_4': {0: 'Effective', 1: 'Effective', 2: 'Effective'}})
Code:
df = (df.melt(id_vars=['Record','Hospital', 'Hospital Address'],
var_name='Name',
value_name='Value')
.sort_values('Record')
.reset_index(drop=True))
df['Record'] = df.index+1
df
Out[1]:
Record Hospital Hospital Address Name Value
0 1 Red Cross 1234 Street 429 Medicine_1 Effective
1 2 Red Cross 1234 Street 429 Medicine_2 Effective
2 3 Red Cross 1234 Street 429 Medicine_3 Normal
3 4 Red Cross 1234 Street 429 Medicine_4 Effective
4 5 Alberta Hospital 553 Alberta Road 441 Medicine_1 Effecive
5 6 Alberta Hospital 553 Alberta Road 441 Medicine_2 Normal
6 7 Alberta Hospital 553 Alberta Road 441 Medicine_3 Normal
7 8 Alberta Hospital 553 Alberta Road 441 Medicine_4 Effective
8 9 General Hospital 994 Random Street 923 Medicine_1 Normal
9 10 General Hospital 994 Random Street 923 Medicine_2 Effective
10 11 General Hospital 994 Random Street 923 Medicine_3 Normal
11 12 General Hospital 994 Random Street 923 Medicine_4 Effective
About "pandas - Stack, unstack, melt, pivot, transpose? What is the simple method to convert multiple columns into rows (PySpark or Pandas)?" — a similar question on Stack Overflow: https://stackoverflow.com/questions/64179626/