python - Pivoting a series of rating columns in Pandas

Tags: python pandas numpy

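In Pandas, I have a DataFrame in which each row corresponds to a user and each column to a variable related to that user, including how they rated something: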

+----------------+--------------------------+----------+----------+
|      name      |          email           | rating_a | rating_b |
+----------------+--------------------------+----------+----------+
| Someone        | someone@mail.com         |      7.8 |      9.9 |
| Someone Else   | someone.else@mail.com    |      2.4 |      9.2 |
| Another Person | another.person@mail.com  |      3.5 |      7.5 |
+----------------+--------------------------+----------+----------+

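I would like to pivot the table so that one column holds the rating type (a or b), another holds the rating value (7.8, 3.5, etc.), and the other columns stay the same as above, like this: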

+----------------+-------------------------+-------------+--------------+
|      name      |          email          | rating_type | rating_value |
+----------------+-------------------------+-------------+--------------+
| Someone        | someone@mail.com        | a           |          7.8 |
| Someone        | someone@mail.com        | b           |          9.9 |
| Someone Else   | someone.else@mail.com   | a           |          2.4 |
| Someone Else   | someone.else@mail.com   | b           |          9.2 |
| Another Person | another.person@mail.com | a           |          3.5 |
| Another Person | another.person@mail.com | b           |          7.5 |
+----------------+-------------------------+-------------+--------------+

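The Pandas melt method seems to be on the right track, but I'm not entirely sure what my id_vars and my value_vars would be in this case. It also seems to drop every column that does not fall into one of those two categories, such as the email address. But I want to keep all of that information.

How can I do this with Pandas?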

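Best answer

You can use melt after stripping the prefix from the column names with str.replace. The id_vars are the columns to keep as identifiers (name and email); when value_vars is omitted, melt treats every remaining column as a value column: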

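# Strip the 'rating_' prefix so the rating columns are named just 'a' and 'b'
df.columns = df.columns.str.replace('rating_', '')
# Keep name and email as identifiers; every other column is melted into rows
df = df.melt(id_vars=['name', 'email'], var_name='rating_type', value_name='rating_value')
print(df)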
             name                    email rating_type  rating_value
0         Someone         someone@mail.com           a           7.8
1    Someone Else    someone.else@mail.com           a           2.4
2  Another Person  another.person@mail.com           a           3.5
3         Someone         someone@mail.com           b           9.9
4    Someone Else    someone.else@mail.com           b           9.2
5  Another Person  another.person@mail.com           b           7.5
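
To address the question's doubt about id_vars and value_vars directly, here is a minimal, self-contained sketch that rebuilds the sample table and passes both arguments explicitly. The variable names `wide` and `tidy` are illustrative assumptions, not part of the original answer, and the prefix is stripped after melting only to show that either order works.

```python
import pandas as pd

# Sample data copied from the question's table
wide = pd.DataFrame({
    'name': ['Someone', 'Someone Else', 'Another Person'],
    'email': ['someone@mail.com', 'someone.else@mail.com', 'another.person@mail.com'],
    'rating_a': [7.8, 2.4, 3.5],
    'rating_b': [9.9, 9.2, 7.5],
})

# id_vars: columns repeated unchanged on every output row (so email is kept)
# value_vars: columns whose names go to rating_type and whose cells go to rating_value
tidy = wide.melt(
    id_vars=['name', 'email'],
    value_vars=['rating_a', 'rating_b'],
    var_name='rating_type',
    value_name='rating_value',
)

# Remove the 'rating_' prefix from the type labels (plain string match, no regex)
tidy['rating_type'] = tidy['rating_type'].str.replace('rating_', '', regex=False)
print(tidy)
```

Columns are dropped only when they appear in neither id_vars nor value_vars, so listing email in id_vars is what keeps it in the result.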

Another solution uses set_index + stack + rename_axis + reset_index:

df.columns = df.columns.str.replace('rating_', '')
# Parentheses let the method chain span several lines
df = (df.set_index(['name', 'email'])
        .stack()
        .rename_axis(['name', 'email', 'rating_type'])
        .reset_index(name='rating_value'))
print(df)
             name                    email rating_type  rating_value
0         Someone         someone@mail.com           a           7.8
1         Someone         someone@mail.com           b           9.9
2    Someone Else    someone.else@mail.com           a           2.4
3    Someone Else    someone.else@mail.com           b           9.2
4  Another Person  another.person@mail.com           a           3.5
5  Another Person  another.person@mail.com           b           7.5
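
As a quick sanity check, the sketch below runs the melt and stack approaches on the question's sample data and confirms that they contain the same rows once sorted into a common order. The names `wide`, `melted`, `stacked` and `same` are assumptions made for illustration; newer pandas versions may emit a FutureWarning about the stack implementation, which does not affect the result here.

```python
import pandas as pd

# Sample data from the question, with the 'rating_' prefix already stripped
wide = pd.DataFrame({
    'name': ['Someone', 'Someone Else', 'Another Person'],
    'email': ['someone@mail.com', 'someone.else@mail.com', 'another.person@mail.com'],
    'a': [7.8, 2.4, 3.5],
    'b': [9.9, 9.2, 7.5],
})

# Approach 1: melt (rows come out grouped by rating type)
melted = wide.melt(id_vars=['name', 'email'],
                   var_name='rating_type', value_name='rating_value')

# Approach 2: stack (rows come out grouped by user)
stacked = (wide.set_index(['name', 'email'])
               .stack()
               .rename_axis(['name', 'email', 'rating_type'])
               .reset_index(name='rating_value'))

# Put both results in the same row order before comparing their values
key = ['name', 'rating_type']
melted_sorted = melted.sort_values(key).reset_index(drop=True)
stacked_sorted = stacked.sort_values(key).reset_index(drop=True)

same = melted_sorted.values.tolist() == stacked_sorted.values.tolist()
print(same)
```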

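If you need the melt solution to return the rows grouped by user, as in the desired output, keep the original row position and sort on it: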

df.columns = df.columns.str.replace('rating_', '')
# Keep the original row position as an 'index' column, sort on it, then drop it
df = (df.reset_index()
        .melt(id_vars=['index', 'name', 'email'],
              var_name='rating_type',
              value_name='rating_value')
        .sort_values(['index', 'rating_type'])
        .drop('index', axis=1)
        .reset_index(drop=True))
print(df)
             name                    email rating_type  rating_value
0         Someone         someone@mail.com           a           7.8
1         Someone         someone@mail.com           b           9.9
2    Someone Else    someone.else@mail.com           a           2.4
3    Someone Else    someone.else@mail.com           b           9.2
4  Another Person  another.person@mail.com           a           3.5
5  Another Person  another.person@mail.com           b           7.5
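
A possible variation, not part of the original answer: assuming pandas 1.1 or newer, melt accepts ignore_index=False, which keeps the original row labels so a stable sort on the index can restore the per-user order without a temporary 'index' column. The names `wide` and `result` are illustrative.

```python
import pandas as pd

# Sample data copied from the question's table
wide = pd.DataFrame({
    'name': ['Someone', 'Someone Else', 'Another Person'],
    'email': ['someone@mail.com', 'someone.else@mail.com', 'another.person@mail.com'],
    'rating_a': [7.8, 2.4, 3.5],
    'rating_b': [9.9, 9.2, 7.5],
})

result = (
    wide.rename(columns=lambda c: c.replace('rating_', ''))  # 'rating_a' -> 'a'
        .melt(id_vars=['name', 'email'],
              var_name='rating_type',
              value_name='rating_value',
              ignore_index=False)       # keep the original row labels 0, 1, 2
        .sort_index(kind='mergesort')   # stable sort: 'a' stays ahead of 'b' per user
        .reset_index(drop=True)
)
print(result)
```

The stable sort matters here: melt emits all 'a' rows before all 'b' rows, and a stable sort on the repeated labels preserves that relative order within each user.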

Regarding python - Pivoting a series of rating columns in Pandas, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/44236927/
