python - Pandas pivot table: formatting column names

Tags: python pandas dataframe pivot-table data-munging

I used the pandas.pivot_table function on a Pandas DataFrame, and my output looks similar to this:

                    Winners                 Runnerup            
         year       2016    2015    2014    2016    2015    2014
Country  Sport                              
india    badminton                              
india    wrestling  

What I actually need is something like the following:

Country Sport   Winners_2016    Winners_2015    Winners_2014    Runnerup_2016   Runnerup_2015   Runnerup_2014
india   badminton   1   1   1   1   1   1
india   wrestling   1   0   1   0   1   0

I have many columns and years, so I can't rename them by hand. Can anyone tell me how to do this?

Best answer

You can use a list comprehension to join the levels of each column tuple:

df.columns = ['_'.join(col) for col in df.columns]
print (df)
                   Winners_2016  Winners_2015  Winners_2014  Runnerup_2016  \
Country Sport                                                                
india   badminton             1             1             1              1   
        wrestling             1             1             1              1   

                   Runnerup_2015  Runnerup_2014  
Country Sport                                    
india   badminton              1              1  
        wrestling              1              1  
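Note that '_'.join(col) only works when every level value is a string. If the year level holds integers (an assumption here; the question does not show the dtypes), the join raises a TypeError. The following is a minimal sketch, built on hypothetical sample data that mirrors the question's column names, which casts each level to str before joining:

```python
import pandas as pd

# Hypothetical long-format data; names mirror the question, values are made up.
raw = pd.DataFrame({
    "Country": ["india"] * 6,
    "Sport": ["badminton"] * 3 + ["wrestling"] * 3,
    "year": [2016, 2015, 2014] * 2,   # integers, not strings
    "Winners": [1, 1, 1, 1, 0, 1],
    "Runnerup": [1, 1, 1, 0, 1, 0],
})

# Pivot so that the columns become a two-level MultiIndex: (measure, year).
df = pd.pivot_table(
    raw,
    index=["Country", "Sport"],
    columns="year",
    values=["Winners", "Runnerup"],
    aggfunc="sum",
)

# Cast every level value to str so that integer years can be joined.
df.columns = ["_".join(map(str, col)) for col in df.columns]
print(df.columns.tolist())
```

The column order after pivot_table may differ from the order in the question, since pivot_table can sort the column labels.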

Another solution is to convert the columns with to_series and then call str.join:

df.columns = df.columns.to_series().str.join('_')
print (df)
                   Winners_2016  Winners_2015  Winners_2014  Runnerup_2016  \
Country Sport                                                                
india   badminton             1             1             1              1   
        wrestling             1             1             1              1   

                   Runnerup_2015  Runnerup_2014  
Country Sport                                    
india   badminton              1              1  
        wrestling              1              1  
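In both outputs above, Country and Sport are still index levels, whereas the layout requested in the question shows them as ordinary columns. One way to get there, sketched here on a hypothetical frame rebuilt from the question's desired values, is to call reset_index after flattening the column names:

```python
import pandas as pd

# Hypothetical frame shaped like the pivot_table result in the question.
columns = pd.MultiIndex.from_product(
    [["Winners", "Runnerup"], ["2016", "2015", "2014"]]
)
index = pd.MultiIndex.from_tuples(
    [("india", "badminton"), ("india", "wrestling")],
    names=["Country", "Sport"],
)
df = pd.DataFrame(
    [[1, 1, 1, 1, 1, 1], [1, 0, 1, 0, 1, 0]],
    index=index,
    columns=columns,
)

# Flatten the two column levels into single strings first ...
df.columns = ["_".join(col) for col in df.columns]

# ... then move the index levels into regular columns.
flat = df.reset_index()
print(flat)
```

Flattening before reset_index matters: resetting the index while the columns are still a MultiIndex would turn Country and Sport into tuples with an empty second level.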

I was curious about the timings:

In [45]: %timeit ['_'.join(col) for col in df.columns]
The slowest run took 7.82 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 4.05 µs per loop

In [44]: %timeit ['{}_{}'.format(x,y) for x,y in zip(df.columns.get_level_values(0),df.columns.get_level_values(1))]
The slowest run took 4.56 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 131 µs per loop

In [46]: %timeit df.columns.to_series().str.join('_')
The slowest run took 4.31 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 452 µs per loop
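The numbers above come from an IPython session on a six-column frame and will vary with machine and pandas version. To rerun the comparison outside IPython, a rough sketch using the standard timeit module might look like this (the sample frame and the candidates dictionary are assumptions made for illustration):

```python
import timeit

import pandas as pd

# Hypothetical frame with the same six two-level columns as in the answer.
columns = pd.MultiIndex.from_product(
    [["Winners", "Runnerup"], ["2016", "2015", "2014"]]
)
df = pd.DataFrame([[1] * 6, [1] * 6], columns=columns)

# The three approaches that were timed above.
candidates = {
    "list comprehension": lambda: ["_".join(col) for col in df.columns],
    "format + zip": lambda: [
        "{}_{}".format(x, y)
        for x, y in zip(
            df.columns.get_level_values(0), df.columns.get_level_values(1)
        )
    ],
    "to_series + str.join": lambda: df.columns.to_series().str.join("_"),
}

RUNS = 100  # calls per measurement
timings = {
    name: min(timeit.repeat(fn, number=RUNS, repeat=3))
    for name, fn in candidates.items()
}
for name, seconds in timings.items():
    print(f"{name}: {seconds / RUNS * 1e6:.1f} µs per call")
```

With only a handful of columns the absolute differences are tiny, so readability is probably a better reason to pick one approach over another.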

This question and answer on formatting Pandas pivot table column names come from a similar question on Stack Overflow: https://stackoverflow.com/questions/39050318/
