python - Why does a space affect the identity comparison of equal strings?


I've noticed that adding a space to identical strings makes them compare unequal with is, while the versions without a space compare equal.

a = 'abc'
b = 'abc'
a is b
#outputs: True

a = 'abc abc'
b = 'abc abc'
a is b
#outputs: False

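I have read this question about comparing strings with == and is. I think this is a different question, because it is the space character that changes the behavior, not the length of the string. See: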

a = 'abc'
b = 'abc'
a is b # True

a = 'gfhfghssrtjyhgjdagtaerjkdhhgffdhfdah'
b = 'gfhfghssrtjyhgjdagtaerjkdhhgffdhfdah'
a is b # True

Why does adding a space to the string change the result of the comparison?

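Best answer

The Python interpreter caches some strings based on certain criteria. The first string, abc, is cached and reused for both names, but the second is not. The same applies to small integers from -5 to 256.

Because the string is interned (cached), assigning "abc" to both a and b makes a and b point to the same object in memory, so using is, which checks whether two names refer to the very same object, returns True.

The second string, abc abc, is not cached, so the two are entirely separate objects in memory and the identity check with is returns False. This time a is not b: they point to different objects in memory.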

In [43]: a = "abc" # python caches abc
In [44]: b = "abc" # it reuses the object when assigning to b
In [45]: id(a)
Out[45]: 139806825858808    # same id's, same object in memory
In [46]: id(b)
Out[46]: 139806825858808    
In [47]: a = 'abc abc'   # not cached  
In [48]: id(a)
Out[48]: 139806688800984    
In [49]: b = 'abc abc'    
In [50]: id(b)         # different id's different objects
Out[50]: 139806688801208
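
As a practical aside, here is a minimal sketch (assuming CPython 3; the variable names are only illustrative) showing why == is the reliable way to compare string values, and how sys.intern can be used when a shared object is really wanted. The strings are built at runtime so that the compiler cannot merge them as constants.

```python
import sys

# Build the strings at runtime so the compiler cannot merge them as constants.
parts = ["abc", "abc"]
a = " ".join(parts)
b = " ".join(parts)

same_value = a == b      # compares contents
same_object = a is b     # compares identity; the result is an implementation detail

# sys.intern returns the canonical interned copy for a given string value.
a_interned = sys.intern(a)
b_interned = sys.intern(b)
interned_same_object = a_interned is b_interned

print(same_value, same_object, interned_same_object)
```

On CPython, same_object is usually False here because each join call builds a fresh string, but that should not be relied on. The interned copies, by contrast, are the same object by definition.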

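The criterion for caching a string constant is whether it contains only letters, underscores, and digits, so in your case the space means the string does not qualify.

When working in the interpreter, you can still end up with both names pointing to the same object even if the string does not meet the above criterion, for example when both are assigned in a single statement, since the two literals are then compiled together: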

In [51]: a,b  = 'abc abc','abc abc'

In [52]: id(a)
Out[52]: 139806688801768

In [53]: id(b)
Out[53]: 139806688801768

In [54]: a is b
Out[54]: True
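
The effect is not specific to tuple assignment. A rough sketch, assuming CPython 3 (the "<demo>" filename and the variable names are placeholders): when two equal literals are compiled as part of the same code object, the compiler stores the constant once, so both names end up bound to that single object.

```python
# Two separate assignment statements, but compiled together as one unit.
source = "a = 'abc abc'\nb = 'abc abc'\n"
code = compile(source, "<demo>", "exec")

# Equal constants within one code object are stored only once.
occurrences = code.co_consts.count("abc abc")

namespace = {}
exec(code, namespace)
same_object = namespace["a"] is namespace["b"]

print(occurrences, same_object)
```

This is also why the same two assignments can behave differently in a script (one compilation unit) than when typed line by line at an interactive prompt, where each line is compiled separately.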

Looking at the codeobject.c source, we can see that NAME_CHARS determines what can be interned:

#define NAME_CHARS \
    "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz"

/* all_name_chars(s): true iff all chars in s are valid NAME_CHARS */

static int
all_name_chars(unsigned char *s)
{
    static char ok_name_char[256];
    static unsigned char *name_chars = (unsigned char *)NAME_CHARS;

    if (ok_name_char[*name_chars] == 0) {
        unsigned char *p;
        for (p = name_chars; *p; p++)
            ok_name_char[*p] = 1;
    }
    while (*s) {
        if (ok_name_char[*s++] == 0)
            return 0;
    }
    return 1;
}
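
For readers less familiar with C, the check above can be approximated in Python. This is only an illustrative sketch of the same test, not code from CPython; the Python function simply mirrors the name of the C helper.

```python
import string

# Mirrors NAME_CHARS from the C source: ASCII letters, digits and underscore.
NAME_CHARS = frozenset(string.ascii_letters + string.digits + "_")


def all_name_chars(s: str) -> bool:
    """Return True if every character of s is in NAME_CHARS."""
    return all(ch in NAME_CHARS for ch in s)


print(all_name_chars("abc"))
print(all_name_chars("abc abc"))
```

The first call reports that "abc" qualifies, while the second fails because of the space, which matches the behavior seen in the question.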

Strings of length 0 or 1 are always shared, as we can see in the PyString_FromStringAndSize function in the stringobject.c source (part of CPython 2).

/* share short strings */
    if (size == 0) {
        PyObject *t = (PyObject *)op;
        PyString_InternInPlace(&t);
        op = (PyStringObject *)t;
        nullstring = op;
        Py_INCREF(op);
    } else if (size == 1 && str != NULL) {
        PyObject *t = (PyObject *)op;
        PyString_InternInPlace(&t);
        op = (PyStringObject *)t;
        characters[*str & UCHAR_MAX] = op;
        Py_INCREF(op);
    }
    return (PyObject *) op;
}
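
The C code above is from CPython 2, but comparable sharing can be observed from Python code. A small sketch, assuming CPython 3, where the empty string and single ASCII characters are likewise shared (an implementation detail, not a language guarantee):

```python
# Two empty strings produced at runtime in different ways.
empty_a = "".join([])
empty_b = "abc"[:0]

# Two one-character strings produced at runtime in different ways.
char_a = chr(97)      # 'a'
char_b = "abc"[0]     # 'a'

empty_shared = empty_a is empty_b
char_shared = char_a is char_b

print(empty_shared, char_shared)
```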

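Not directly related to the question, but for those interested: PyCode_New, also from the codeobject.c source, shows that names are always interned, and that string constants are interned once they satisfy the criterion in all_name_chars.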

PyCodeObject *
PyCode_New(int argcount, int nlocals, int stacksize, int flags,
       PyObject *code, PyObject *consts, PyObject *names,
       PyObject *varnames, PyObject *freevars, PyObject *cellvars,
       PyObject *filename, PyObject *name, int firstlineno,
       PyObject *lnotab)
{
    PyCodeObject *co;
    Py_ssize_t i;
    /* Check argument types */
    if (argcount < 0 || nlocals < 0 ||
        code == NULL ||
        consts == NULL || !PyTuple_Check(consts) ||
        names == NULL || !PyTuple_Check(names) ||
        varnames == NULL || !PyTuple_Check(varnames) ||
        freevars == NULL || !PyTuple_Check(freevars) ||
        cellvars == NULL || !PyTuple_Check(cellvars) ||
        name == NULL || !PyString_Check(name) ||
        filename == NULL || !PyString_Check(filename) ||
        lnotab == NULL || !PyString_Check(lnotab) ||
        !PyObject_CheckReadBuffer(code)) {
        PyErr_BadInternalCall();
        return NULL;
    }
    intern_strings(names);
    intern_strings(varnames);
    intern_strings(freevars);
    intern_strings(cellvars);
    /* Intern selected string constants */
    for (i = PyTuple_Size(consts); --i >= 0; ) {
        PyObject *v = PyTuple_GetItem(consts, i);
        if (!PyString_Check(v))
            continue;
        if (!all_name_chars((unsigned char *)PyString_AS_STRING(v)))
            continue;
        PyString_InternInPlace(&PyTuple_GET_ITEM(consts, i));
    }

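This answer is based on simple assignments in the CPython interpreter. Interning in relation to functions, or anything else beyond simple assignment, was not asked about and is not covered here.

If anyone with a deeper understanding of the C code has anything to add, feel free to edit.

There is a more thorough explanation of string interning as a whole here.

A similar question about "python - Why does a space affect the identity comparison of equal strings?" can be found on Stack Overflow: https://stackoverflow.com/questions/28329498/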
