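I have a DataFrame with a column that contains lists. I am trying to iterate over each row and concatenate that row's "molecule" value with every element of the row's list. I want to write code that produces the result shown in "molecule_species" below. Any ideas would be greatly appreciated.
The DataFrame: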
import pandas as pd
df = pd.DataFrame({'molecule': ['a',
'b',
'c',
'd',
'e'],
'species' : [['dog'],
['horse','pig'],
['cat', 'dog'],
['cat','horse','pig'],
['chicken','pig']]})
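I am trying to create a new column by iterating over the rows and the list elements, concatenating "molecule" with each element of the list in "species". The desired result is: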
df['molecule_species'] = [['a dog'],
['b horse','b pig'],
['c cat', 'c dog'],
['d cat','d horse','d pig'],
['e chicken','e pig']]
Best answer
Pandas >= 0.25.0
Use DataFrame.explode, then join the values of each row, and collect the results back into lists with GroupBy.agg:
df['molecule_species'] = (df.explode('species')
.apply(' '.join,axis=1)
.groupby(level=0)
.agg(list) )
print(df)
molecule species molecule_species
0 a [dog] [a dog]
1 b [horse, pig] [b horse, b pig]
2 c [cat, dog] [c cat, c dog]
3 d [cat, horse, pig] [d cat, d horse, d pig]
4 e [chicken, pig] [e chicken, e pig]
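To make the chain above easier to follow, here is a minimal sketch that runs the same three steps one at a time on a shortened copy of the data. The intermediate names (exploded, joined, result) are only illustrative and do not appear in the answer.

```python
import pandas as pd

# Shortened copy of the question's data, for illustration only.
df = pd.DataFrame({
    'molecule': ['a', 'b', 'c'],
    'species': [['dog'], ['horse', 'pig'], ['cat', 'dog']],
})

# Step 1: one row per list element; the original index label is repeated.
exploded = df.explode('species')

# Step 2: join the two string values of each row with a space.
joined = exploded.apply(' '.join, axis=1)

# Step 3: group on the repeated index labels and collect the strings into lists.
result = joined.groupby(level=0).agg(list)

df['molecule_species'] = result
print(df)
```

Because explode keeps the original index labels, groupby(level=0) can put the pieces back together in the right rows, and the final assignment aligns on that same index.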
Pandas < 0.25.0
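import numpy as np  # needed for np.concatenate below

# Repeat each row once per list element, then overwrite 'species' with the flattened values.
df['molecule_species']=(df.reindex(df.index.repeat(df.species.str.len()))
                          .assign(species=np.concatenate(df.species.values))
                          .apply(' '.join,axis=1)
                          .groupby(level=0)
                          .agg(list) )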
print(df)
molecule species molecule_species
0 a [dog] [a dog]
1 b [horse, pig] [b horse, b pig]
2 c [cat, dog] [c cat, c dog]
3 d [cat, horse, pig] [d cat, d horse, d pig]
4 e [chicken, pig] [e chicken, e pig]
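The reindex/repeat combination above is essentially a hand-written explode. The sketch below, on a shortened copy of the data, shows the intermediate values; the names lengths, repeated and flat are illustrative only.

```python
import numpy as np
import pandas as pd

# Shortened copy of the question's data, for illustration only.
df = pd.DataFrame({
    'molecule': ['a', 'b', 'c'],
    'species': [['dog'], ['horse', 'pig'], ['cat', 'dog']],
})

# Number of elements in each list.
lengths = df['species'].str.len()

# Repeat every row as many times as its list has elements.
repeated = df.reindex(df.index.repeat(lengths))

# Replace the list column with the flattened elements, one per repeated row.
flat = repeated.assign(species=np.concatenate(df['species'].values))

print(flat)
```

At this point flat has one row per (molecule, species) pair with the original index labels repeated, so the same apply / groupby / agg steps can be applied to it.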
Another approach is Series.str.cat:
df2 = df.explode('species')
df['molecule_species']=df2['molecule'].str.cat(df2['species'],sep=' ').groupby(level=0).agg(list)
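Since the question asks about iterating over rows and list elements, a plain nested list comprehension is one more option. This sketch is not part of the answer above; it assumes Python 3.6+ for f-strings and that every entry in "species" is a list of strings.

```python
import pandas as pd

df = pd.DataFrame({
    'molecule': ['a', 'b', 'c', 'd', 'e'],
    'species': [['dog'],
                ['horse', 'pig'],
                ['cat', 'dog'],
                ['cat', 'horse', 'pig'],
                ['chicken', 'pig']],
})

# Outer loop walks the rows; inner loop walks the elements of each row's list.
df['molecule_species'] = [
    [f'{mol} {sp}' for sp in species]
    for mol, species in zip(df['molecule'], df['species'])
]

print(df)
```

This avoids reshaping the frame altogether, which may be easier to read for small data; whether it is faster than the explode-based versions would need to be measured on your own data.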
For more on python - looping over list elements in a Pandas DataFrame column to return lists in a new column, see the similar question on Stack Overflow: https://stackoverflow.com/questions/59776345/