c - Why null-terminated strings? Or: null-terminated vs. characters + length storage

I'm writing a language interpreter in C, and my string type contains a length attribute, like so:

struct String
{
    char* characters;
    size_t length;
};

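So I've had to spend a lot of time in my interpreter handling these strings by hand, because C has no built-in support for them. I've considered switching to plain null-terminated strings just to match the underlying C, but there seem to be a lot of reasons not to: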

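Bounds checking is built in if you use the length instead of searching for the null.

With a null-terminated string, you have to walk the entire string to find its length.

You have to do extra work to handle null characters in the middle of a null-terminated string.

Null-terminated strings handle Unicode poorly: in UTF-16, for example, every ASCII character is encoded with a zero byte, which a byte-oriented terminator check mistakes for the end of the string.

Counted strings allow more sharing: for example, "Hello, world" and "Hello" can use the same characters in memory, just with different lengths. This can't be done with null-terminated strings, because "Hello" would need its own '\0' right after the "o".

String slicing (note: strings are immutable in my language). Compare the two implementations: obviously the second is slower (and more error-prone: consider adding error checking for begin and end to both functions).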

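#include <stdlib.h>
#include <string.h>

/* Counted version: O(1), no allocation. The result points into `in`,
   which is safe because strings are immutable. (No begin/end checks.) */
struct String slice(struct String in, size_t begin, size_t end)
{
    struct String out;
    out.characters = in.characters + begin;
    out.length = end - begin;

    return out;
}

/* Null-terminated version: must copy so the result can carry its own '\0'.
   Renamed from slice, since C has no overloading. Caller must free(). */
char* slice_cstr(const char* in, size_t begin, size_t end)
{
    char* out = malloc(end - begin + 1);
    if (out == NULL)
        return NULL;

    memcpy(out, in + begin, end - begin);  /* copy the bytes in [begin, end) */
    out[end - begin] = '\0';

    return out;
}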

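After all this, I'm no longer asking whether I should use null-terminated strings; I'm wondering why C uses them at all!

So my question is: does null termination have any benefits that I'm missing?

Best answer

From Joel's Back to Basics: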

Why do C strings work this way? It's because the PDP-7 microprocessor, on which UNIX and the C programming language were invented, had an ASCIZ string type. ASCIZ meant "ASCII with a Z (zero) at the end."

Is this the only way to store strings? No, in fact, it's one of the worst ways to store strings. For non-trivial programs, APIs, operating systems, class libraries, you should avoid ASCIZ strings like the plague.

A similar question can be found on Stack Overflow: https://stackoverflow.com/questions/1253291/
