python - How to resolve "The column label 'Avg_Threat_Score' is not unique."? Pandas question

Tags: python pandas pivot pivot-table

When I run my code I get the following error: The column label 'Avg_Threat_Score' is not unique.

I am creating a pivot table and want to sort its values from high to low.

pt = df.pivot_table(index = 'User Name',values = ['Threat Score', 'Score'], 
        aggfunc = {
                   'Threat Score': np.mean,
                   'Score' :[np.mean, lambda x: len(x.dropna())]
                  }, 
        margins = False) 

new_col =['User Name Count', 'AVG_TH_Score', 'Avg_Threat_Score']
pt.columns = [new_col]
# the code works up to this point; the next statement raises the error
df = df.reindex(pt.sort_values
                    (by = 'Avg_Threat_Score',ascending=False).index)

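I need to sort the values of the 'Avg_Threat_Score' column from high to low.

Best answer

You need to pass the new column names as a flat list, not a nested list, because pandas builds a one-level MultiIndex from a nested list. This: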

new_col =['User Name Count', 'AVG_TH_Score', 'Avg_Threat_Score']
pt.columns = [new_col]

is the same as:

pt.columns = [['User Name Count', 'AVG_TH_Score', 'Avg_Threat_Score']]

ValueError: The column label 'Avg_Threat_Score' is not unique.
For a multi-index, the label must be a tuple with elements corresponding to each level.

So use:

pt.columns = ['User Name Count', 'AVG_TH_Score', 'Avg_Threat_Score']
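The difference between the two assignments can be checked directly. The following is a minimal sketch, assuming a small hand-built `pt` (its values are typed in for illustration rather than computed by `pivot_table`); the names `nested` and `flat` are hypothetical.

```python
import pandas as pd

# Hand-built stand-in for the pivot table (illustrative values only).
pt = pd.DataFrame(
    [[2.0, 5.5, 4.25], [2.0, 6.0, 5.00]],
    index=pd.Index(['a', 'b'], name='User Name'),
)
new_col = ['User Name Count', 'AVG_TH_Score', 'Avg_Threat_Score']

# Nested list: pandas treats it as a list of level arrays,
# so the columns become a MultiIndex with a single level.
nested = pt.copy()
nested.columns = [new_col]
print(type(nested.columns).__name__, nested.columns.nlevels)

# Flat list: the columns stay a plain Index, so a string label works.
flat = pt.copy()
flat.columns = new_col
print(type(flat.columns).__name__, flat.columns.nlevels)

# Sorting by the plain string label works with the flat columns.
print(flat.sort_values(by='Avg_Threat_Score', ascending=False))
```

With the nested assignment every column label is a one-element tuple such as `('Avg_Threat_Score',)`, which is why the plain string is not accepted as a sort key.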

Sample:

import numpy as np
import pandas as pd

df = pd.DataFrame({
        'User Name':list('ababaa'),
         'Threat Score':[4,5,4,np.nan,5,4],
         'Score':[np.nan,8,9,4,2,np.nan],
         'D':[1,3,5,7,1,0]})

pt = (df.pivot_table(index = 'User Name',values = ['Threat Score', 'Score'], 
        aggfunc = {
                   'Threat Score': np.mean,
                   'Score' :[np.mean, lambda x: len(x.dropna())]
                  }, 
        margins = False))

pt.columns = ['User Name Count', 'AVG_TH_Score', 'Avg_Threat_Score']
print (pt)
           User Name Count  AVG_TH_Score  Avg_Threat_Score
User Name                                                 
a                      2.0           5.5              4.25
b                      2.0           6.0              5.00

Then, to sort the original rows by the order of Avg_Threat_Score, convert the User Name column to an ordered Categorical whose categories follow that order, so the final sort_values works:

names = pt.sort_values(by = 'Avg_Threat_Score',ascending=False).index
print (names)
#Index(['b', 'a'], dtype='object', name='User Name')

df['User Name'] = pd.CategoricalIndex(df['User Name'], categories=names, ordered=True)
df = df.sort_values('User Name')

print (df)
  User Name  Threat Score  Score  D
1         b           5.0    8.0  3
3         b           NaN    4.0  7
0         a           4.0    NaN  1
2         a           4.0    9.0  5
4         a           5.0    2.0  1
5         a           4.0    NaN  0
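As an alternative to the ordered Categorical (not the approach used in the answer), the per-user average can be mapped onto each row and used as a temporary sort key. This is a rough sketch, assuming the same sample data; the helper column `_avg` and the names `avg` and `df_sorted` are hypothetical, and the average is taken with `groupby` rather than from the pivot table.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'User Name': list('ababaa'),
    'Threat Score': [4, 5, 4, np.nan, 5, 4],
    'Score': [np.nan, 8, 9, 4, 2, np.nan],
    'D': [1, 3, 5, 7, 1, 0],
})

# Average Threat Score per user (NaN values are skipped by mean()).
avg = df.groupby('User Name')['Threat Score'].mean()

# Attach each row's group average as a temporary key, sort on it
# from high to low, then drop the key again.
df_sorted = (
    df.assign(_avg=df['User Name'].map(avg))
      .sort_values('_avg', ascending=False, kind='mergesort')
      .drop(columns='_avg')
)
print(df_sorted)
```

This leaves the `User Name` column as ordinary strings, which may be preferable if later code does not expect a categorical dtype.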

This question, python - How to resolve "The column label 'Avg_Threat_Score' is not unique."? Pandas question, comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/56312312/
