pandas - Split a Pandas column into multiple columns

I have the following DataFrame:

     ColumnA      ColumnB         ColumnC
0       usr       usr1,usr2       X1
1       xyz       xyz1,xyz2,xyz3  X2
2       abc       abc1,abc2,abc3  X3

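What I want to do:

Split ColumnB on ",".

The problem is that the number of values per cell in ColumnB is not fixed: some cells hold 3 values (xyz1, xyz2, xyz3), others hold 6, and so on.

Expected output: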

     ColumnA      ColumnB          usercol1    usercol2    usercol3  ColumnC
0       usr       usr1,usr2           usr1      usr2           -       X1
1       xyz       xyz1,xyz2,xyz3      xyz1      xyz2          xyz3     X2
2       abc       abc1,abc2,abc3      abc1      abc2          abc3     X3

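Best answer

Create a new DataFrame by calling str.split() with expand=True.

Then concat the first two columns, the new expanded DataFrame, and the third original column. This adapts to lists of uneven length: shorter rows are padded with missing values, which are replaced with '-' here. Note that add_prefix numbers the new columns from usercol0, not from usercol1 as in the expected output.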

import numpy as np
import pandas as pd

df1 = df['ColumnB'].str.split(',', expand=True).add_prefix('usercol')
df1 = pd.concat([df[['ColumnA', 'ColumnB']], df1, df[['ColumnC']]], axis=1).replace(np.nan, '-')
df1
Out[1]: 
     ColumnA      ColumnB          usercol0    usercol1    usercol2  ColumnC
0       usr       usr1,usr2           usr1      usr2          -        X1
1       xyz       xyz1,xyz2,xyz3      xyz1      xyz2          xyz3     X2
2       abc       abc1,abc2,abc3      abc1      abc2          abc3     X3
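
The output above numbers the new columns from usercol0, while the question asked for usercol1, usercol2, and so on. One possible way to get 1-based names is to rename the integer labels that str.split(expand=True) produces instead of using add_prefix. This is a minimal sketch, not part of the original answer; the names parts and result are made up for illustration, the sample data is rebuilt from the question, and fillna('-') is swapped in for replace(np.nan, '-').

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "ColumnA": ["usr", "xyz", "abc"],
    "ColumnB": ["usr1,usr2", "xyz1,xyz2,xyz3", "abc1,abc2,abc3"],
    "ColumnC": ["X1", "X2", "X3"],
})

# Split on commas; rows with fewer values are padded with missing values.
parts = df["ColumnB"].str.split(",", expand=True)

# The split columns are labelled 0, 1, 2, ...; shift them to usercol1, usercol2, ...
# and fill the padding with "-".
parts = parts.rename(columns=lambda i: f"usercol{i + 1}").fillna("-")

# Reassemble: first two original columns, the split columns, then ColumnC.
result = pd.concat([df[["ColumnA", "ColumnB"]], parts, df[["ColumnC"]]], axis=1)
print(result)
```

The column order here should match the expected output in the question: ColumnA, ColumnB, usercol1, usercol2, usercol3, ColumnC.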

Technically, the concat approach can also be written as a single statement:

df = pd.concat([df[['ColumnA', 'ColumnB']],
                df['ColumnB'].str.split(',',expand=True).add_prefix('usercol'),
                df[['ColumnC']]], axis=1).replace(np.nan, '-')
df
Out[1]: 
  ColumnA         ColumnB usercol0 usercol1 usercol2 ColumnC
0     usr       usr1,usr2     usr1     usr2        -      X1
1     xyz  xyz1,xyz2,xyz3     xyz1     xyz2     xyz3      X2
2     abc  abc1,abc2,abc3     abc1     abc2     abc3      X3
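
Both versions above list ColumnA, ColumnB and ColumnC by hand. If the frame has more columns, the same idea could be wrapped in a small helper that places the split columns right after the source column, wherever it sits. The sketch below is an assumption-laden illustration, not the answer's code: split_after is a hypothetical helper, the fourth row with six values is invented to exercise the uneven-length case, and it assumes the column name is unique in the frame.

```python
import pandas as pd


def split_after(df, column, sep=",", prefix="usercol", fill="-"):
    """Return a new frame with `column` split on `sep`; the new columns
    are placed immediately after `column` (hypothetical helper)."""
    parts = df[column].str.split(sep, expand=True).add_prefix(prefix).fillna(fill)
    # Position just past the source column; assumes `column` is unique.
    pos = df.columns.get_loc(column) + 1
    return pd.concat([df.iloc[:, :pos], parts, df.iloc[:, pos:]], axis=1)


# Question data plus one invented row holding six values.
df = pd.DataFrame({
    "ColumnA": ["usr", "xyz", "abc", "def"],
    "ColumnB": ["usr1,usr2", "xyz1,xyz2,xyz3", "abc1,abc2,abc3",
                "def1,def2,def3,def4,def5,def6"],
    "ColumnC": ["X1", "X2", "X3", "X4"],
})

out = split_after(df, "ColumnB")
print(out.columns.tolist())
```

Because the number of new columns follows the longest row, this input yields usercol0 through usercol5, and the shorter rows get '-' in the unused positions.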

Regarding pandas - Split a Pandas column into multiple columns, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/64278047/
