python - Inserting values into columns named after the entries of a known pandas column

Tags: python pandas bigdata

I am preparing data for machine learning. The data is in a pandas DataFrame that looks like this:

Column   v1    v2
first    1      2
second   3      4
third    5      6

Now I want to turn it into:

Column  v1  v2  first-v1  first-v2  second-v1  second-v2  third-v1  third-v2
first   1   2     1        2         NaN        NaN        NaN       NaN
second  3   4     NaN      NaN       3          4          NaN       NaN
third   5   6     NaN      NaN       NaN        NaN        5         6

I tried doing something like this:

# we know how many values there are but 
# length can be changed into length of [1, 2, 3, ...] values
values = ['v1', 'v2']

# data with description from above is saved in data 
for value in values:
    data[ str(data['Column'] + '-' + value)] = data[ value]

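The result is one column per value, named like ['first-v1' 'second-v1' ..] and ['first-v2' 'second-v2' ..], and it does hold the correct values. What am I doing wrong? Since my data is large, is there a more efficient way to do this?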

Thank you for your time!

Best answer

You can use unstack, then swap and sort the levels of the resulting MultiIndex in the columns:

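# Unstack 'Column' into the columns, then swap and sort the two levels
df = (data.set_index('Column', append=True)[values]
          .unstack()
          .swaplevel(0, 1, axis=1)
          .sort_index(axis=1))
# Flatten each (Column, value) pair into a 'Column-value' label
df.columns = df.columns.map('-'.join)
print(df)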
   first-v1  first-v2  second-v1  second-v2  third-v1  third-v2
0       1.0       2.0        NaN        NaN       NaN       NaN
1       NaN       NaN        3.0        4.0       NaN       NaN
2       NaN       NaN        NaN        NaN       5.0       6.0
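
To see what each step of that chain contributes, here is a minimal sketch that rebuilds the sample frame from the question and looks at the column labels after every stage. The intermediate names `wide`, `swapped`, and `ordered` are illustrative only and do not appear in the original answer.

```python
import pandas as pd

# Sample data from the question
data = pd.DataFrame({
    'Column': ['first', 'second', 'third'],
    'v1': [1, 3, 5],
    'v2': [2, 4, 6],
})
values = ['v1', 'v2']

# Stage 1: put 'Column' into the index next to the row number, then
# unstack it, so the column labels become (value, Column) pairs.
wide = data.set_index('Column', append=True)[values].unstack()
print(wide.columns.tolist())

# Stage 2: swap the two column levels to get (Column, value) pairs.
swapped = wide.swaplevel(0, 1, axis=1)
print(swapped.columns.tolist())

# Stage 3: sort the columns so each label's values sit side by side.
ordered = swapped.sort_index(axis=1)

# Stage 4: flatten each (Column, value) tuple into 'Column-value'.
ordered.columns = ordered.columns.map('-'.join)
print(ordered.columns.tolist())
```

Each row keeps its value only under the columns that carry its own label. Every other cell is NaN, which is why the integer inputs come out as floats.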

Or use stack + unstack:

# Stack v1/v2 into the index, then unstack 'Column' and the value name together
df = data.set_index('Column', append=True).stack().unstack([1, 2])
df.columns = df.columns.map('-'.join)
print(df)
   first-v1  first-v2  second-v1  second-v2  third-v1  third-v2
0       1.0       2.0        NaN        NaN       NaN       NaN
1       NaN       NaN        3.0        4.0       NaN       NaN
2       NaN       NaN        NaN        NaN       5.0       6.0

Finally, join the result back to the original:

# join aligns the new columns with the original rows on the shared index
df = data.join(df)
print(df)
   Column  v1  v2  first-v1  first-v2  second-v1  second-v2  third-v1  \
0   first   1   2       1.0       2.0        NaN        NaN       NaN   
1  second   3   4       NaN       NaN        3.0        4.0       NaN   
2   third   5   6       NaN       NaN        NaN        NaN       5.0   

   third-v2  
0       NaN  
1       NaN  
2       6.0  
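
As for why the loop in the question creates a single oddly named column per value: `data['Column'] + '-' + value` is a whole Series, and `str()` turns it into one long string, which is then used as one column label. Below is a minimal sketch of a loop-based fix, under the assumption that the labels in `Column` are suitable as name prefixes. It uses `Series.where` to keep each value only on the rows whose label matches. The names `bad_label` and `result` are illustrative.

```python
import pandas as pd

# Sample data from the question
data = pd.DataFrame({
    'Column': ['first', 'second', 'third'],
    'v1': [1, 3, 5],
    'v2': [2, 4, 6],
})
values = ['v1', 'v2']

# str() of a Series is one string describing the whole Series,
# so the original loop ended up with one column label per value.
bad_label = str(data['Column'] + '-' + values[0])
print(type(bad_label))

result = data.copy()
for name in data['Column'].unique():
    # True only on the rows that carry this label
    mask = data['Column'] == name
    for value in values:
        # Keep the value where the label matches; NaN elsewhere
        result[f'{name}-{value}'] = data[value].where(mask)

print(result)
```

This loop makes one pass over the data per label and value, so with many distinct labels the reshaping approach above may be the better fit for large data.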

Regarding "python - Inserting values into columns named after the entries of a known pandas column", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/43629853/
