python - How to fill NaN values with values contained in another column?

Tags: python pandas

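I want to find out whether the split column contains any of the values in the Class list. If it does, I want to update the Category column with the matching value from that list. The Desired Category column shows the result I am aiming for.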

Sample data

domain             split    Category  Desired Category
[email protected]  XYT.com  Null      XYT
[email protected]  XTY.com  Null      Null
[email protected]  ssa.com  Null      ssa
[email protected]  bbc.com  Null      bbc
[email protected]  abk.com  Null      abk
[email protected]  ssb.com  Null      ssb
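To experiment with the table above, it can be rebuilt as a DataFrame. This is a sketch under two assumptions: the domain column is left out because the page redacted the e-mail addresses, and "Null" is modelled as a real NaN. The name desired is introduced here only for illustration.

```python
import numpy as np
import pandas as pd

# Sample data from the question. The 'domain' e-mails were redacted on the page,
# so that column is omitted; 'Null' in Category is modelled as NaN (assumption).
df = pd.DataFrame({
    'split': ['XYT.com', 'XTY.com', 'ssa.com', 'bbc.com', 'abk.com', 'ssb.com'],
    'Category': np.nan,
})

# Hypothetical helper list holding the question's "Desired Category" column.
desired = ['XYT', np.nan, 'ssa', 'bbc', 'abk', 'ssb']
print(df)
```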
            
Class=['NaN','XYT','ssa','abk','abc','def','asds','ssb','bbc','XY','ab']    



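for index, row in df.iterrows():
    for x in Class:                          # `class` is a reserved word; the list is named Class
        if x in row['split']:                # substring test on the row's string value
            df.loc[index, 'Category'] = x    # store the matched word, not a boolean
            break                            # stop at the first match in list order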

I just can't get it to work.

Please help, thanks.

Best answer

Use str.extract. Build a regular expression that matches any one of the words in the list, and extract the matched word (or NaN if nothing matches).

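Update: the "|" operator is never greedy. At a given position, the regex engine tries the alternatives from left to right and takes the first one that matches, even if a later alternative would produce a longer match. You therefore have to sort the list in reverse order manually, so that a longer word such as 'XYT' comes before its prefix 'XY'.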
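A small demonstration of that left-to-right behaviour with Python's re module, plus two ways of ordering the list. Sorting by length with key=len is offered as an alternative to the answer's reverse sort, not as part of the original answer; the word list here is a trimmed example.

```python
import re

text = 'XYT.com'

# Alternatives are tried left to right; the first one that matches wins.
short_first = re.search('XY|XYT', text).group()   # 'XY'
long_first = re.search('XYT|XY', text).group()    # 'XYT'
print(short_first, long_first)

words = ['XY', 'ab', 'XYT', 'abk']
# Reverse lexicographic order puts every word before its own prefixes.
by_reverse = sorted(words, reverse=True)
# Longest-first ordering also guarantees that (stable for equal lengths).
by_length = sorted(words, key=len, reverse=True)
print(by_reverse)  # ['abk', 'ab', 'XYT', 'XY']
print(by_length)   # ['XYT', 'abk', 'XY', 'ab']
```

Both orderings work here, because any two alternatives that match at the same position must be prefixes of one another.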

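import pandas as pd

lst = ['NaN','XY','ab','XYT','ssa','abk','abc','def','asds','ssb','bbc']
lst = sorted(lst, reverse=True)   # longer words sort before their prefixes ('XYT' > 'XY')
pat = fr"({'|'.join(lst)})"       # a single capture group of alternatives

# expand=False returns a Series, with NaN where nothing matches
df['Category'] = df['split'].str.extract(pat, expand=False)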
>>> df
              domain    split Category
0  [email protected]  XYT.com      XYT
1  [email protected]  XTY.com      NaN
2  [email protected]  ssa.com      ssa
3  [email protected]  bbc.com      bbc
4  [email protected]  abk.com      abk
5  [email protected]  ssb.com      ssb

>>> lst
['ssb', 'ssa', 'def', 'bbc', 'asds', 'abk', 'abc', 'ab', 'XYT', 'XY', 'NaN']

>>> pat
'(ssb|ssa|def|bbc|asds|abk|abc|ab|XYT|XY|NaN)'
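A self-contained sketch of the same str.extract approach, under a few stated assumptions: "Null" means a real NaN, only rows whose Category is missing should be filled (as the question title suggests), and the redacted domain column is omitted. Sorting with key=len and escaping each word with re.escape are substitutions for the answer's plain reverse sort; re.escape only matters if a class name contains regex metacharacters such as "." or "+".

```python
import re

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'split': ['XYT.com', 'XTY.com', 'ssa.com', 'bbc.com', 'abk.com', 'ssb.com'],
    'Category': np.nan,   # assumption: 'Null' in the question is NaN
})
classes = ['NaN', 'XYT', 'ssa', 'abk', 'abc', 'def', 'asds', 'ssb', 'bbc', 'XY', 'ab']

# Longest words first, so 'XYT' is tried before 'XY' and 'abk' before 'ab';
# re.escape makes each word match literally.
pat = '(' + '|'.join(map(re.escape, sorted(classes, key=len, reverse=True))) + ')'

# One capture group + expand=False -> a Series, NaN where nothing matched.
found = df['split'].str.extract(pat, expand=False)

# Fill only the missing categories; existing values are left untouched.
df['Category'] = df['Category'].fillna(found)
print(df)
```

Using fillna instead of overwriting the column keeps any Category values that were already set.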

Source: for "python - How to fill NaN values with values contained in another column?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/68854237/
