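I want to check whether the split column contains any of the values in my Class list. If it does, I want to update the Category column with the matching value from the list. The Desired Category column shows the result I am aiming for.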
domain             split    Category  Desired Category
[email protected]  XYT.com  Null      XYT
[email protected]  XTY.com  Null      Null
[email protected]  ssa.com  Null      ssa
[email protected]  bbc.com  Null      bbc
[email protected]  abk.com  Null      abk
[email protected]  ssb.com  Null      ssb
Class=['NaN','XYT','ssa','abk','abc','def','asds','ssb','bbc','XY','ab']
for index, row in df.iterrows():
    for x in class:
        intersection=row.split.contains(x)
        if intersection:
            df.loc[index,'class'] = intersection
I just can't get this to work.
Please help, thanks.
Best answer
Use str.extract. Build a regular expression that matches any one of the words in the list, and extract the word that matched (or NaN if none of them did).
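Update: the "|" operator is never greedy. Alternatives are tried from left to right and the first one that matches wins, even when a later alternative would produce a longer overall match. You therefore have to reverse-sort the list yourself so that longer words (such as XYT) come before their prefixes (such as XY).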
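The ordering effect can be checked in isolation with Python's re module. This is only a small illustration, not part of the original answer; the two patterns and the variable names are chosen here for the demonstration.

```python
import re

# Shorter word first: alternatives are tried left to right,
# so the engine settles for "XY" and never tries "XYT".
short_first = re.search(r"(XY|XYT)", "XYT.com").group(1)

# Longer word first: the full "XYT" is captured.
long_first = re.search(r"(XYT|XY)", "XYT.com").group(1)

print(short_first)  # XY
print(long_first)   # XYT
```

A reverse lexicographic sort places 'XYT' ahead of 'XY' because a string always sorts after any of its own prefixes in ascending order.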
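lst = ['NaN','XY','ab','XYT','ssa','abk','abc','def','asds','ssb','bbc']
# Reverse sort so longer words come before their prefixes ('XYT' before 'XY')
lst = sorted(lst, reverse=True)
# One capture group containing every word as an alternative
pat = fr"({'|'.join(lst)})"
# Rows where no word matches get NaN
df['Category'] = df['split'].str.extract(pat)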
>>> df
              domain    split Category
0  [email protected]  XYT.com      XYT
1  [email protected]  XTY.com      NaN
2  [email protected]  ssa.com      ssa
3  [email protected]  bbc.com      bbc
4  [email protected]  abk.com      abk
5  [email protected]  ssb.com      ssb
>>> lst
['ssb', 'ssa', 'def', 'bbc', 'asds', 'abk', 'abc', 'ab', 'XYT', 'XY', 'NaN']
>>> pat
'(ssb|ssa|def|bbc|asds|abk|abc|ab|XYT|XY|NaN)'
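For anyone who wants to run the approach end to end, here is a self-contained sketch. It rebuilds only the split column from the question. Two details are variations introduced here rather than taken from the answer: the words are sorted by length instead of reverse alphabetically, and each word is passed through re.escape in case a list ever contains regex metacharacters. The names words, ordered and pat are illustrative.

```python
import re
import pandas as pd

# Sample data mirroring the 'split' column from the question.
df = pd.DataFrame(
    {"split": ["XYT.com", "XTY.com", "ssa.com", "bbc.com", "abk.com", "ssb.com"]}
)

words = ["NaN", "XYT", "ssa", "abk", "abc", "def", "asds", "ssb", "bbc", "XY", "ab"]

# Longest words first, so a prefix such as "XY" cannot shadow "XYT".
# (Assumption: length ordering is used here in place of the reverse sort.)
ordered = sorted(words, key=len, reverse=True)

# re.escape keeps any regex metacharacters in the words literal.
pat = "(" + "|".join(re.escape(w) for w in ordered) + ")"

# expand=False returns a Series; rows without a match become NaN.
df["Category"] = df["split"].str.extract(pat, expand=False)
print(df)
```

With this data, every row except XTY.com should receive a category, and XTY.com stays NaN because none of the listed words occurs in it.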
Regarding "python - How to fill NaN values with values contained in another column?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/68854237/