Python numpy where function behavior

I have a question about using a condition with numpy's where. I can use where with the == operator, but I cannot get it to work with a test for "is one string a substring of another?"

Code:

    import pandas as pd
    import datetime as dt
    import numpy as np

    data = {'name': ['Smith, Jason', 'Bush, Molly', 'Smith, Tina',    
        'Clinton,     Jake', 'Hamilton, Amy'],
        'age': [42, 52, 36, 24, 73],
        'preTestScore': [4, 24, 31, 2, 3],
        'postTestScore': [25, 94, 57, 62, 70]}
    df = pd.DataFrame(data, columns = ['name', 'age', 'preTestScore',     
    'postTestScore'])
    print("BEFORE---- ")
    print(df)
    print("AFTER----- ")
    df["Smith Family"] = np.where("Smith" in df['name'], 'Y', 'N')
    print(df)

Output:

    BEFORE-----

                name  age  preTestScore  postTestScore
    0   Smith, Jason   42             4             25
    1    Bush, Molly   52            24             94
    2    Smith, Tina   36            31             57
    3  Clinton, Jake   24             2             62
    4  Hamilton, Amy   73             3             70


    AFTER----- 
                name  age  preTestScore  postTestScore Smith Family
    0   Smith, Jason   42             4             25            N
    1    Bush, Molly   52            24             94            N
    2    Smith, Tina   36            31             57            N
    3  Clinton, Jake   24             2             62            N
    4  Hamilton, Amy   73             3             70            N

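Why does the numpy.where condition not work in the case above? I expected the Smith Family column to have the values Y, N, Y, N, N, but the output shown above is N for every row.

I used the condition "Smith" in df['name'] (and also tried str(df['name']).find("Smith") > -1), but neither worked.

Any idea what is going wrong, or what I could do differently?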

Best answer

I think you need str.contains to build the boolean mask:

    print(df['name'].str.contains("Smith"))
    0     True
    1    False
    2     True
    3    False
    4    False
    Name: name, dtype: bool

    df["Smith Family"] = np.where(df['name'].str.contains("Smith"), 'Y', 'N')
    print(df)
                    name  age  preTestScore  postTestScore Smith Family
    0       Smith, Jason   42             4             25            Y
    1        Bush, Molly   52            24             94            N
    2        Smith, Tina   36            31             57            Y
    3  Clinton,     Jake   24             2             62            N
    4      Hamilton, Amy   73             3             70            N
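To illustrate why the original attempt gives N everywhere, here is a minimal sketch. The three-row names Series is a cut-down stand-in for the question's column, and the variable names are only for illustration. The key point is that in applied to a Series tests the index labels, not the values, so the condition collapses to a single False before np.where ever sees it.

```python
import numpy as np
import pandas as pd

# Cut-down stand-in for df['name'] (illustrative data only).
names = pd.Series(['Smith, Jason', 'Bush, Molly', 'Smith, Tina'])

# `in` on a Series checks the index labels (0, 1, 2 here), not the values.
label_hit = 0 in names          # 0 is an index label
value_hit = "Smith" in names    # "Smith" is not an index label

# With a scalar condition np.where returns a single 0-d value,
# which pandas then broadcasts to every row on assignment.
scalar_result = np.where(value_hit, 'Y', 'N')

# str(series) is the printed form of the whole Series as one string,
# so this is again a single bool rather than one bool per row.
whole_repr_hit = str(names).find("Smith") > -1

# An element-wise mask has one bool per row, which is what np.where needs.
mask = names.str.contains("Smith")
per_row = np.where(mask, 'Y', 'N')
```

The str(df['name']).find(...) attempt has the same shape of problem in the other direction: it yields a single True, so every row would come out as Y.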

Or str.startswith:

    df["Smith Family"] = np.where(df['name'].str.startswith("Smith"), 'Y', 'N')
    print(df)
                    name  age  preTestScore  postTestScore Smith Family
    0       Smith, Jason   42             4             25            Y
    1        Bush, Molly   52            24             94            N
    2        Smith, Tina   36            31             57            Y
    3  Clinton,     Jake   24             2             62            N
    4      Hamilton, Amy   73             3             70            N
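On the sample data the two methods give the same result, but they are not interchangeable in general. A small sketch with hypothetical names (not from the question) chosen so that they disagree; regex=False is passed only to make it explicit that a plain substring match is intended:

```python
import numpy as np
import pandas as pd

# Hypothetical names, picked so that contains and startswith differ.
names = pd.Series(['Smith, Jason', 'Goldsmith, Ann',
                   'Jones-Smith, Lee', 'Bush, Molly'])

# Matches "Smith" anywhere in the string (case-sensitive by default).
anywhere = np.where(names.str.contains("Smith", regex=False), 'Y', 'N')

# Matches only when the string begins with "Smith".
at_start = np.where(names.str.startswith("Smith"), 'Y', 'N')
```

Here 'Jones-Smith, Lee' is flagged by contains but not by startswith, and 'Goldsmith, Ann' is flagged by neither because the match is case-sensitive. Which one is right depends on how surnames are stored in your data.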

If you want to use in, which works on scalars, you need apply. This solution is faster, but it fails if the name column contains NaN:

    df["Smith Family"] = np.where(df['name'].apply(lambda x: "Smith" in x), 'Y', 'N')
    print(df)
                    name  age  preTestScore  postTestScore Smith Family
    0       Smith, Jason   42             4             25            Y
    1        Bush, Molly   52            24             94            N
    2        Smith, Tina   36            31             57            Y
    3  Clinton,     Jake   24             2             62            N
    4      Hamilton, Amy   73             3             70            N
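The NaN caveat can be seen with a minimal sketch. The Series below is hypothetical data with one missing name. It also shows two possible workarounds, the na=False argument of str.contains and a guarded lambda; these are suggestions on my part rather than something the original question needed.

```python
import numpy as np
import pandas as pd

# Hypothetical data with a missing name.
names = pd.Series(['Smith, Jason', np.nan, 'Smith, Tina'])

# apply with a bare `in` raises TypeError on the NaN,
# because NaN is a float and a float cannot be searched with `in`.
try:
    names.apply(lambda x: "Smith" in x)
    apply_failed = False
except TypeError:
    apply_failed = True

# na=False makes str.contains treat missing values as "no match".
mask = names.str.contains("Smith", na=False)
flags = np.where(mask, 'Y', 'N')

# A guarded lambda is one way to keep the apply approach working.
guarded = names.apply(lambda x: isinstance(x, str) and "Smith" in x)
```

With either workaround the row with the missing name ends up as N instead of raising an error.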

Source: a similar question about Python numpy where function behavior on Stack Overflow: https://stackoverflow.com/questions/40723299/
