I have a pandas DataFrame:
word_list
['nuclear','election','usa','baseball']
['football','united','thriller']
['marvels','hollywood','spiderman']
....................
....................
....................
I also have several lists named after categories, for example:
movies=['spiderman','marvels','thriller']
sports=['baseball','hockey','football']
politics=['election','china','usa']
and many other categories.
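I just want to match the keywords in the pandas column word_list against my category lists and, wherever keywords match, put the corresponding list names in a separate column. If any keyword does not appear in any of the lists, simply add miscellaneous.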
So the output I am looking for is:
word_list matched_list_names
['nuclear','election','usa','baseball'] politics,sports,miscellaneous
['football','united','thriller'] sports,movies,miscellaneous
['marvels','spiderman','hockey'] movies,sports
.................... .....................
.................... .....................
.................... ....................
I managed to get the matching keywords with:
for words in df['word_list']:
    for word in words:
        if word in movies:
            print(word)
But this only gives me the list of matching keywords. How do I get the list names and add them to a pandas column?
Best answer
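You can first flatten the dictionary of lists into a keyword-to-category mapping, then look up each word with .get, using miscellaneous as the default for unmatched values, then convert the result to a set to keep only unique categories, and finally turn it into a string with join: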
movies=['spiderman','marvels','thriller']
sports=['baseball','hockey','football']
politics=['election','china','usa']
d = {'movies':movies, 'sports':sports, 'politics':politics}
d1 = {k: oldk for oldk, oldv in d.items() for k in oldv}
f = lambda x: ','.join(set([d1.get(y, 'miscellaneous') for y in x]))
df['matched_list_names'] = df['word_list'].apply(f)
print (df)
word_list matched_list_names
0 [nuclear, election, usa, baseball] politics,miscellaneous,sports
1 [football, united, thriller] miscellaneous,sports,movies
2 [marvels, hollywood, spiderman, budget] miscellaneous,movies
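Because a set has no guaranteed order, the category order in the output above can differ from run to run. If a stable order matters, one option is to deduplicate with dict.fromkeys, which keeps first-seen order. The following is only a minimal sketch: the function name match_categories and the three-row sample DataFrame are assumptions made for illustration, not part of the original answer.

```python
import pandas as pd

movies = ['spiderman', 'marvels', 'thriller']
sports = ['baseball', 'hockey', 'football']
politics = ['election', 'china', 'usa']

d = {'movies': movies, 'sports': sports, 'politics': politics}
# Flatten to a keyword -> category-name mapping
d1 = {k: oldk for oldk, oldv in d.items() for k in oldv}

# Sample data assumed for illustration
df = pd.DataFrame({'word_list': [
    ['nuclear', 'election', 'usa', 'baseball'],
    ['football', 'united', 'thriller'],
    ['marvels', 'hollywood', 'spiderman'],
]})

def match_categories(words):
    # dict.fromkeys drops duplicates while keeping first-seen order
    labels = dict.fromkeys(d1.get(w, 'miscellaneous') for w in words)
    return ','.join(labels)

df['matched_list_names'] = df['word_list'].apply(match_categories)
print(df)
```

With this variant the categories appear in the order their keywords occur in each row, so repeated runs produce the same strings.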
A similar solution with a list comprehension:
df['matched_list_names'] = [','.join(set([d1.get(y, 'miscellaneous') for y in x]))
                            for x in df['word_list']]
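Note that the flattened dictionary d1 maps each keyword to exactly one category, so a keyword that appears in two lists keeps only the last one. If keywords can overlap between categories, a set-intersection approach may fit better. This is a rough sketch under that assumption: the keyword 'olympics', the function name match_all, and the sample rows are hypothetical and only serve to show the overlap case.

```python
import pandas as pd

# Category name -> set of keywords; 'olympics' is a hypothetical
# keyword placed in two categories to illustrate overlap
categories = {
    'movies': {'spiderman', 'marvels', 'thriller'},
    'sports': {'baseball', 'hockey', 'football', 'olympics'},
    'politics': {'election', 'china', 'usa', 'olympics'},
}
# Every keyword known to at least one category
known = set().union(*categories.values())

def match_all(words, default='miscellaneous'):
    words = set(words)
    # Keep every category sharing at least one keyword with the row
    names = [name for name, kws in categories.items() if words & kws]
    # Add the default label if any word is unknown to all categories
    if words - known:
        names.append(default)
    return ','.join(names)

df = pd.DataFrame({'word_list': [
    ['olympics', 'nuclear'],
    ['football', 'united', 'thriller'],
]})
df['matched_list_names'] = df['word_list'].apply(match_all)
print(df)
```

Here the category names follow the order in which they are defined in the categories dictionary, with miscellaneous always last.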
About python - matching keywords in a pandas column against another list of elements: a similar question was found on Stack Overflow: https://stackoverflow.com/questions/51574485/