C array of strings

Tags: c, arrays, string

I just wrote a string splitting function:

typedef enum {
    strspl_allocation_error = 1
} strspl_error;


int strspl(const char *string, const char separator, char ***result) {

    const int stringLength = strlen(string);
    int lastSplit = 0;
    int numberOfComponents = 1;

    // Compute the number of components
    for (int i = 0; i <= stringLength; i++) {
        if (string[i] == separator)
            numberOfComponents++;
    }

    // Allocate space to hold pointers to each component
    *result = (char **) malloc(numberOfComponents * sizeof(char *));
    if (result == NULL)
        return strspl_allocation_error;

    numberOfComponents = 0;

    for (int i = 0; i <= stringLength; i++) {
        char c = string[i];
        if (c == separator || i == stringLength) {

            const int componentLength = i - lastSplit;

            // Allocate space to hold the component
            char *component = (char *) malloc(componentLength * sizeof(char));
            if (component == NULL)
                return strspl_allocation_error;

            // Copy the characters from the string into the component
            for (int j = 0; j < componentLength; j++)
                component[j] = string[lastSplit + j];
            component[componentLength] = '\0';

            // Put the component into the array
            *result[numberOfComponents] = component;

            lastSplit = i + 1;
            numberOfComponents++;
        }
    }

    return numberOfComponents;
}

Example:

char **result;
int r = strspl("aaaBcBddddeeBk", 'B', result);

for (int i = 0; i < r; i++)
    printf("component: %s\n", result[i]);

This should output:

component: aaa
component: c
component: ddddee
component: k

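But when I run it, it either crashes or returns garbage values. I don't understand where I made a mistake...

Update: here is a version that is hopefully free of bugs: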

int strspl(const char *string, const char separator, char ***results) {

    const char *separatorString = (char[]){separator, '\0'};
    int numberOfComponents = 0;
    int stringLength = strlen(string);

    int lastCharacterWasSeparator = 1;

    // Compute the number of components
    for (int i = 0; i < stringLength; i++) {
        if (string[i] != separator) {
            if (lastCharacterWasSeparator)
                numberOfComponents++;
            lastCharacterWasSeparator = 0;
        }
        else
            lastCharacterWasSeparator = 1;
    }

    // Allocate space to hold pointers to components
    *results = malloc(numberOfComponents * sizeof(**results));

    char *stringCopy = strdup(string); // A reference to the copy of the string to modify it and to free() it later.
    char *strptr = stringCopy; // This will be used to iterate through the string.
    int componentLength = 0;
    int component = 0;

    while (component < numberOfComponents) {

        // Move to the startpoint of the next component.
        while (componentLength == 0) {
            componentLength = strcspn(strptr, separatorString);

            // Break out the while loop if we found an actual component.
            if (componentLength != 0)
                break;

            // If we found two adjacent separators, we just "silently" move over them.
            strptr += componentLength + 1;
        }

        // Replace the terminating separator character with a NUL character.
        strptr[componentLength] = '\0';

        // Copy the new component into the array.
        (*results)[component++] = strdup(strptr);

        // Move the string pointer ahead so we can work on the next component.
        strptr += componentLength + 1;

        componentLength = 0;
    }

    // Free the copy of the string.
    free(stringCopy);

    return numberOfComponents;
}

Best answer

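Sorry, all the things we suggested you do to fix it got combined together and broke it again! Starting from your original code, here are the adjustments you need to make: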

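  1. The function's signature needs to take char ***result instead of char **result.
  2. The array allocation should be *result = malloc(...) instead of result = malloc(...).
  3. The component pointer was never stored in the result array. A line reading (*result)[numberOfComponents] = component; should be placed after component[componentLength] = '\0'; (the parentheses are needed because the result parameter has been changed to char ***).
  4. Finally, the call to the function should look like strspl(..., &result); instead of strspl(..., result);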
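
As a rough sketch, here is what the original function might look like with the four adjustments applied. It is one possible reading, not the asker's final code. It also makes three changes that the list above does not mention, all of which are my own assumptions: it allocates componentLength + 1 bytes so the terminating '\0' fits, it checks *result rather than result after malloc, and it uses a negative error code so that an error cannot be confused with a component count of 1. strspl_free is a hypothetical helper added for cleanup.

```c
#include <stdlib.h>
#include <string.h>

typedef enum {
    /* Assumption: negative, so it can never look like a valid count. */
    strspl_allocation_error = -1
} strspl_error;

/* Hypothetical helper: frees `count` components and then the array itself. */
void strspl_free(char **components, int count) {
    for (int i = 0; i < count; i++)
        free(components[i]);
    free(components);
}

/* Adjustment 1: the caller's char ** is passed by address, hence char ***. */
int strspl(const char *string, const char separator, char ***result) {
    const int stringLength = (int) strlen(string);
    int lastSplit = 0;
    int numberOfComponents = 1;

    /* One component more than there are separators. */
    for (int i = 0; i < stringLength; i++) {
        if (string[i] == separator)
            numberOfComponents++;
    }

    /* Adjustment 2: write the array through the caller's pointer. */
    *result = malloc((size_t) numberOfComponents * sizeof(char *));
    if (*result == NULL)
        return strspl_allocation_error;

    numberOfComponents = 0;

    for (int i = 0; i <= stringLength; i++) {
        if (i == stringLength || string[i] == separator) {
            const int componentLength = i - lastSplit;

            /* Extra fix (assumption): one more byte for the '\0'. */
            char *component = malloc((size_t) componentLength + 1);
            if (component == NULL) {
                /* Release everything allocated so far. */
                strspl_free(*result, numberOfComponents);
                *result = NULL;
                return strspl_allocation_error;
            }

            memcpy(component, string + lastSplit, (size_t) componentLength);
            component[componentLength] = '\0';

            /* Adjustment 3: parentheses, because [] binds tighter than *. */
            (*result)[numberOfComponents++] = component;
            lastSplit = i + 1;
        }
    }

    return numberOfComponents;
}
```

With char **result; declared in the caller, adjustment 4 is the call strspl("aaaBcBddddeeBk", 'B', &result);, which should yield the four components aaa, c, ddddee and k. The caller owns the memory afterwards and can release it with strspl_free(result, r). Note that this sketch, like the original code, keeps empty components between adjacent separators.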

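Pointers have always been one of the harder things to understand when working in C/C++... Let me see if I can explain:

Suppose the caller's stack looks like this: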

Address     -  Data        -  Description
0x99887760  -  0xbaadbeef  -  caller-result variable (uninitialized garbage)

When the call is made like this: strspl(..., result);, the compiler copies the local pointer (0xbaadbeef) onto the stack of strspl:

Address     -  Data        -  Description
0x99887750  -  0xbaadbeef  -  strspl-result variable (copy of uninitialized garbage)
...
0x99887760  -  0xbaadbeef  -  caller-result variable (uninitialized garbage)

Now when we call result = malloc(...) and the return value is copied into the local strspl-result variable, we get:

Address     -  Data        -  Description
0x99887750  -  0x01000100  -  strspl-result variable (new array)
...
0x99887760  -  0xbaadbeef  -  caller-result variable (uninitialized garbage)

This obviously does not update the caller's result variable.


If we call with the address of the result variable, strspl(..., &result);, we get this:

Address     -  Data        -  Description
0x99887750  -  0x99887760  -  strspl-result variable (pointer to the caller's result)
...
0x99887760  -  0xbaadbeef  -  caller-result variable (uninitialized garbage)

Then when we call result = malloc(...), we get:

Address     -  Data        -  Description
0x99887750  -  0x01000100  -  strspl-result variable (new array)
...
0x99887760  -  0xbaadbeef  -  caller-result variable (uninitialized garbage)

Still not what we want, because the caller never gets the pointer to the array.


If we call *result = malloc(...) instead, we get:

Address     -  Data        -  Description
0x99887750  -  0x99887760  -  strspl-result variable (pointer to the caller's result)
...
0x99887760  -  0x01000100  -  caller-result variable (new array)

This way, by the time we return, we have overwritten the caller's garbage with our newly malloc'd array.


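As you can see, the compiler copies the caller's argument onto the stack of the called function. Because it is a copy, the function cannot modify the caller's variable unless the caller passes a pointer to that variable (which is why the parameter needs to be char*** and not char**).

I hope this clears things up rather than making them harder to understand! :-P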
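
The stack diagrams above can be reproduced with a small sketch. Both function names are hypothetical, and the caller's pointer is assumed to start out as NULL (rather than uninitialized garbage) so that the effect can be checked.

```c
#include <stdlib.h>

/* Receives a copy of the caller's pointer. The assignment only changes
 * that copy, so the caller never sees the new array. */
void alloc_into_copy(char **result, size_t n) {
    result = malloc(n * sizeof *result);
    free(result); /* freed here, because the caller can never reach it */
}

/* Receives the address of the caller's pointer. Writing through it
 * updates the caller's variable. Returns 1 on success, 0 on failure. */
int alloc_into_caller(char ***result, size_t n) {
    *result = malloc(n * sizeof **result);
    return *result != NULL;
}
```

Given char **r = NULL;, the call alloc_into_copy(r, 4) leaves r as NULL, whereas alloc_into_caller(&r, 4) makes r point at the new array, which the caller is then responsible for freeing.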

Regarding C array of strings, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/4770456/
