python - "Expanding" the contents of a DataFrame column into new columns

I'm sure there must be a way to do this without resorting to nested loops.

I have a df (note that one column contains lists of strings):

import pandas as pd
df = pd.DataFrame({'A': [5, 6, 3, 4], 'B': [1, 2, 3, 5], 'C': [['a', 'b'], ['b', 'c'], ['g', 'h'], ['x', 'y']]})

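Ultimately, I want to "expand" the values in the column's lists so that there is one column for every possible list item, with a 1 in the matching column of each row where that value occurs. For example: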

df =

A  B      C      a  b  c  g  h  x  y
5  1  ['a','b']  1  1
6  2  ['b','c']     1  1
3  3  ['g','h']           1  1
4  5  ['x','y']                 1  1

Best answer

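You can use pandas.get_dummies, but the same value can appear at more than one list position, which produces duplicate column names, so you then need to groupby on the columns and aggregate with max: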

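# the outer parentheses let the method chain continue on the next line
# note: recent pandas releases deprecate groupby(axis=1)
df1 = (pd.get_dummies(pd.DataFrame(df.C.values.tolist()), prefix='', prefix_sep='')
         .groupby(axis=1, level=0).max())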

df1 = pd.concat([df, df1], axis=1)
print (df1)

   A  B       C  a  b  c  g  h  x  y
0  5  1  [a, b]  1  1  0  0  0  0  0
1  6  2  [b, c]  0  1  1  0  0  0  0
2  3  3  [g, h]  0  0  0  1  1  0  0
3  4  5  [x, y]  0  0  0  0  0  1  1
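
To see why the max aggregation is needed, the following is a minimal sketch of the same idea, not the answer's exact code. It assumes a pandas version where get_dummies accepts dtype, and it swaps groupby(axis=1) for a transpose-based grouping, which is one way to avoid the axis argument. The names items, dummies, duplicated_labels and collapsed are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'A': [5, 6, 3, 4],
                   'B': [1, 2, 3, 5],
                   'C': [['a', 'b'], ['b', 'c'], ['g', 'h'], ['x', 'y']]})

# One column per list position: column 0 holds the first items, column 1 the second.
items = pd.DataFrame(df['C'].tolist(), index=df.index)

# With an empty prefix, both positions contribute a column labelled 'b'.
dummies = pd.get_dummies(items, prefix='', prefix_sep='', dtype=int)
duplicated_labels = dummies.columns[dummies.columns.duplicated()].tolist()

# Collapse duplicate labels: transpose, group rows by label, take the max, transpose back.
collapsed = dummies.T.groupby(level=0).max().T

print(duplicated_labels)
print(pd.concat([df, collapsed], axis=1))
```

Taking the max per label means a column holds 1 if the value occurred at any list position in that row.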

Another solution uses replace + str.get_dummies:

df1 = df.C.astype(str).replace([r'\[', r'\]', "'", r'\s+'], '', regex=True).str.get_dummies(',')
df1 = pd.concat([df, df1], axis=1)
print (df1)

   A  B       C  a  b  c  g  h  x  y
0  5  1  [a, b]  1  1  0  0  0  0  0
1  6  2  [b, c]  0  1  1  0  0  0  0
2  3  3  [g, h]  0  0  0  1  1  0  0
3  4  5  [x, y]  0  0  0  0  0  1  1
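
A related variant, sketched here as an assumption rather than as part of the original answer, skips the string conversion and regex cleanup by joining each list with a separator first. It assumes that no list item contains the separator character '|'. The names joined, dummies and result are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [5, 6, 3, 4],
                   'B': [1, 2, 3, 5],
                   'C': [['a', 'b'], ['b', 'c'], ['g', 'h'], ['x', 'y']]})

# Series.str.join concatenates the items of each list: ['a', 'b'] -> 'a|b'.
joined = df['C'].str.join('|')

# str.get_dummies splits on the separator and builds one indicator column per item.
dummies = joined.str.get_dummies(sep='|')

result = pd.concat([df, dummies], axis=1)
print(result)
```

Because the lists are joined directly, quotes and brackets never enter the strings, so nothing has to be stripped out afterwards.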

It is also possible to remove the 0s, but then the columns hold strings mixed with numbers, and some pandas functions may break:

df1 = df.C.astype(str).replace([r'\[', r'\]', "'", r'\s+'], '', regex=True).str.get_dummies(',')
df1 = df1.replace(0,'')
df1 = pd.concat([df, df1], axis=1)
print (df1)
   A  B       C  a  b  c  g  h  x  y
0  5  1  [a, b]  1  1               
1  6  2  [b, c]     1  1            
2  3  3  [g, h]           1  1      
3  4  5  [x, y]                 1  1
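
The caveat about mixed values can be checked on a small frame. This is a minimal sketch with made-up data; the names dummies and blanked are illustrative only.

```python
import pandas as pd

# A tiny indicator frame standing in for the dummy columns.
dummies = pd.DataFrame({'a': [1, 0], 'b': [1, 1]})

# Replacing 0 with an empty string mixes integers and strings in column 'a'.
blanked = dummies.replace(0, '')

print(dummies['a'].dtype)   # integer dtype before the replacement
print(blanked['a'].dtype)   # object dtype afterwards
print(blanked['a'].tolist())
```

One option, if the blanks are wanted only for display, is to keep the numeric frame for computation and print a blanked copy, for example print(dummies.replace(0, '')).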

Regarding python - "Expanding" the contents of a DataFrame column into new columns, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/43544707/
