python - Same memory address for different strings in Cython

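I wrote a tree object in Cython that has many nodes, each containing a single unicode character. I want to test whether the character gets interned if I use Py_UNICODE or str as the variable type. I tried to test this by creating several instances of the node class and getting the memory address of the character for each one, but somehow I ended up with the same memory address, even though different instances contain different characters. Here is my code: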

from libc.stdint cimport uintptr_t

cdef class Node():
    cdef:
        public str character
        public unsigned int count
        public Node lo, eq, hi

    def __init__(self, str character):
        self.character = character

    def memory(self):
        return <uintptr_t>&self.character[0]

I'm trying to compare the memory locations from Python:

a = Node("a")
a2 = Node("a")
b = Node("b")
print(a.memory(), a2.memory(), b.memory())

But the printed memory addresses are all the same. What am I doing wrong?

Best answer

Apparently, what you are doing is not what you think it does.

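self.character[0] does not return the address of (or a reference to) the first character, as it would for an array. Instead it returns a Py_UCS4 value (i.e. an unsigned 32-bit integer), which is copied into a local, temporary variable on the stack.

In your function, <uintptr_t>&self.character[0] therefore takes the address of that local stack variable, which is likely always the same, because the stack layout is identical every time memory is called.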
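The copy semantics can be mimicked in plain Python. This is only an analogy, assuming ctypes.c_uint32 as a stand-in for Py_UCS4; `first_code_point_copy` is a hypothetical helper, not part of Cython. Reading the first character yields an independent integer, and changing that integer has no effect on the string:

```python
import ctypes

def first_code_point_copy(s: str) -> ctypes.c_uint32:
    """Mimic `cdef Py_UCS4 tmp = s[0]`: copy the first code point into a new
    32-bit unsigned variable (ctypes.c_uint32 stands in for Py_UCS4 here)."""
    return ctypes.c_uint32(ord(s[0]))

s = "abc"
tmp = first_code_point_copy(s)
print(tmp.value)  # 97, the code point of "a"

# Overwriting the copy does not touch the string it was read from.
tmp.value = ord("z")
print(s)  # abc
```

Taking the address of `tmp` gives you the location of the copy, which says nothing about where the string stores its characters.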

To make this clearer, contrast it with a char * c_string, where &c_string[0] gives you the address of the first character in c_string.

Compare:

%%cython
from libc.stdint cimport uintptr_t

cdef char *c_string = "name"
def get_addresses_from_chars():
    for i in range(4):
        print(<uintptr_t>&c_string[i])

cdef str py_string = "name"
def get_addresses_from_pystr():
    for i in range(4):
        print(<uintptr_t>&py_string[i])

Now:

>>> get_addresses_from_chars() # works  - different addresses every time
# ...7752
# ...7753
# ...7754
# ...7755
>>> get_addresses_from_pystr() # works differently - the same address.
# ...0672 
# ...0672
# ...0672
# ...0672

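You can think of it this way: c_string[...] is a C-level (cdef) operation, whereas py_string[...] is a Python-level operation and thus, by construction, cannot return an address for each element.

To influence the stack layout, you can use a recursive function: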
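The char * behaviour can be reproduced from Python with ctypes, as a rough sketch assuming CPython's ctypes module (`char_addresses` is a hypothetical helper). A C char buffer has one address per element, one byte apart, and reading each address back yields the matching character:

```python
import ctypes
from typing import List

def char_addresses(text: bytes) -> List[int]:
    """Return the address of each byte of a C char buffer holding `text`."""
    buf = ctypes.create_string_buffer(text)  # NUL-terminated char array
    base = ctypes.addressof(buf)             # address of buf[0], like &c_string[0]
    addrs = [base + i for i in range(len(text))]
    # Read each byte back through its address while buf is still alive,
    # to show that the addresses really point at the characters.
    for i, addr in enumerate(addrs):
        assert ctypes.string_at(addr, 1) == text[i:i + 1]
    return addrs

addrs = char_addresses(b"name")
print([a - addrs[0] for a in addrs])  # [0, 1, 2, 3]
```

The returned integers are only meaningful while the buffer exists, which is why the check happens inside the function.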

def memory(self, level):
    if level == 0:
        return <uintptr_t>&self.character[0]
    else:
        return self.memory(level-1)

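Calling it as a.memory(0), a.memory(1), and so on now gives you different addresses (unless tail-call optimization kicks in, which I don't think will happen, but you can disable optimization with -O0 just to be sure), because, depending on the level/recursion depth, the local variable whose address is returned sits at a different position on the stack.

To check whether the unicode objects are interned, using id is enough: it yields the address of the object (a CPython implementation detail), so you don't need Cython at all: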

>>> id(a.character) == id(a2.character)
# True
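The same kind of check can be tried on strings that are not compile-time constants, using sys.intern. A minimal sketch, assuming CPython (`same_object` is a hypothetical helper): two equal strings built at run time start out as separate objects, and interning maps both to a single object:

```python
import sys

def same_object(x: str, y: str) -> bool:
    """True if x and y are the very same object (what the id() comparison checks)."""
    return id(x) == id(y)

# Build two equal strings at run time so the compiler cannot share one constant.
a = "".join(["na", "me"])
b = "".join(["n", "ame"])

print(a == b)                                     # True: equal contents
print(same_object(a, b))                          # False in CPython: two separate objects
print(same_object(sys.intern(a), sys.intern(b)))  # True: interning yields one object
```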

Or, in Cython, do the same thing id does (a bit faster):

%%cython
from libc.stdint cimport uintptr_t
from cpython cimport PyObject
...
    def memory(self):
        # cast from object to PyObject, so the address can be used
        return <uintptr_t>(<PyObject*>self.character)

You need to cast the object to PyObject *, so that Cython lets you use the object's address (the pointer value) as an integer.

Now:
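To convince yourself that id() and the <PyObject*> cast yield the same number, you can turn the integer back into the object with ctypes. A sketch assuming CPython, and only safe while the object is alive; `object_from_address` is a hypothetical helper:

```python
import ctypes

def object_from_address(addr: int) -> object:
    """Reinterpret an address as a PyObject* (CPython only; object must be alive)."""
    return ctypes.cast(addr, ctypes.py_object).value

s = "".join(["a", "bc"])
addr = id(s)  # the same number that <uintptr_t>(<PyObject*>s) would give
print(object_from_address(addr) is s)  # True
```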

>>> ...
>>> print(a.memory(), a2.memory(), b.memory())
# ...5800 ...5800 ...5000

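If you want the address of the first code point in the unicode object (which differs from the address of the string object itself), you can use <Py_UNICODE *>self.character, which Cython replaces with a call to PyUnicode_AsUnicode (deprecated, and removed in Python 3.12), e.g.: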

%%cython
...   
def memory(self):
    return <uintptr_t>(<Py_UNICODE*>self.character), id(self.character)

Now:

>>> ...
>>> print(a.memory(), a2.memory(), b.memory())
# (...768, ...800) (...768, ...800) (...144, ...000)

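That is, "a" has been interned and has a different address from "b", and the address of the code-point buffer differs from the address of the object that contains it (as one would expect).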
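The gap between the object address and its character buffer can also be observed without PyUnicode_AsUnicode. This sketch relies on CPython internals and is an assumption, not a supported API: for a compact ASCII string, sys.getsizeof(s) equals the header size plus len(s) + 1, and the NUL-terminated data starts right after the header. `ascii_data_address` is a hypothetical helper:

```python
import ctypes
import sys

def ascii_data_address(s: str) -> int:
    """Address of the character buffer of a compact ASCII str (CPython internals!)."""
    if not s.isascii():
        raise ValueError("this sketch only handles compact ASCII strings")
    # Assumed layout: header, then len(s) bytes of data, then a NUL byte.
    header_size = sys.getsizeof(s) - (len(s) + 1)
    return id(s) + header_size

s = "".join(["na", "me"])
data = ascii_data_address(s)
print(data - id(s))                         # size of the object header
print(ctypes.string_at(data, len(s) + 1))   # b'name\x00'
```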

For python - Same memory address for different strings in Cython, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/56174313/
