I just wrote a string-splitting function:
typedef enum {
    strspl_allocation_error = 1
} strspl_error;

int strspl(const char *string, const char separator, char ***result) {
    const int stringLength = strlen(string);
    int lastSplit = 0;
    int numberOfComponents = 1;

    // Compute the number of components
    for (int i = 0; i <= stringLength; i++) {
        if (string[i] == separator)
            numberOfComponents++;
    }

    // Allocate space to hold pointers to each component
    *result = (char **) malloc(numberOfComponents * sizeof(char *));
    if (result == NULL)
        return strspl_allocation_error;

    numberOfComponents = 0;
    for (int i = 0; i <= stringLength; i++) {
        char c = string[i];
        if (c == separator || i == stringLength) {
            const int componentLength = i - lastSplit;

            // Allocate space to hold the component
            char *component = (char *) malloc(componentLength * sizeof(char));
            if (component == NULL)
                return strspl_allocation_error;

            // Copy the characters from the string into the component
            for (int j = 0; j < componentLength; j++)
                component[j] = string[lastSplit + j];
            component[componentLength] = '\0';

            // Put the component into the array
            *result[numberOfComponents] = component;

            lastSplit = i + 1;
            numberOfComponents++;
        }
    }
    return numberOfComponents;
}
Example:
char **result;
int r = strspl("aaaBcBddddeeBk", 'B', result);
for (int i = 0; i < r; i++)
    printf("component: %s\n", result[i]);
It should print:
component: aaa
component: c
component: ddddee
component: k
But when I run it, it either crashes or returns garbage. I can't see where I went wrong...
Update: here is a version that is hopefully free of bugs:
int strspl(const char *string, const char separator, char ***results) {
    const char *separatorString = (char[]){separator, '\0'};
    int numberOfComponents = 0;
    int stringLength = strlen(string);
    int lastCharacterWasSeparator = 1;

    // Compute the number of components
    for (int i = 0; i < stringLength; i++) {
        if (string[i] != separator) {
            if (lastCharacterWasSeparator)
                numberOfComponents++;
            lastCharacterWasSeparator = 0;
        }
        else
            lastCharacterWasSeparator = 1;
    }

    // Allocate space to hold pointers to components
    *results = malloc(numberOfComponents * sizeof(**results));

    char *stringCopy = strdup(string); // A reference to the copy of the string to modify it and to free() it later.
    char *strptr = stringCopy; // This will be used to iterate through the string.
    int componentLength = 0;
    int component = 0;

    while (component < numberOfComponents) {
        // Move to the startpoint of the next component.
        while (componentLength == 0) {
            componentLength = strcspn(strptr, separatorString);
            // Break out the while loop if we found an actual component.
            if (componentLength != 0)
                break;
            // If we found two adjacent separators, we just "silently" move over them.
            strptr += componentLength + 1;
        }

        // Replace the terminating separator character with a NULL character.
        strptr[componentLength] = '\0';

        // Copy the new component into the array.
        (*results)[component++] = strdup(strptr);

        // Move the string pointer ahead so we can work on the next component.
        strptr += componentLength + 1;
        componentLength = 0;
    }

    // Free the copy of the string.
    free(stringCopy);
    return numberOfComponents;
}
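Best answer
Sorry, all the things we suggested you do to fix it combined to break it again! Starting from your original code, here are the adjustments you need to make: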
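- The function's signature needs to take char ***result rather than char **result.
- The array allocation should be *result = malloc(...) rather than result = malloc(...).
- The component pointer was never stored in the result array. A line reading (*result)[numberOfComponents] = component; should be placed after component[componentLength] = '\0'; (the parentheses are needed because the result parameter was changed to char ***, and [] binds more tightly than unary *).
- Finally, the call to the function should look like strspl(..., &result); rather than strspl(..., result);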
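Putting those adjustments together, the original function might end up looking something like the sketch below. It is not the only way to write it. Beyond the four points above, it makes a few assumptions of my own: it reserves one extra byte per component for the terminating '\0', it checks *result (not result) after the array allocation, it changes the error code to -1 so that it cannot be confused with a count of 1, and it adds a hypothetical strspl_free helper that is not part of the original code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef enum {
    strspl_allocation_error = -1 /* assumption: negative, so it never looks like a count */
} strspl_error;

/* Splits `string` at every `separator`. On success, stores a malloc'd array of
   malloc'd strings in *result and returns the number of components.
   Empty components (adjacent separators) are kept, as in the original code. */
int strspl(const char *string, char separator, char ***result) {
    assert(string != NULL && result != NULL);

    const size_t stringLength = strlen(string);
    size_t lastSplit = 0;
    int numberOfComponents = 1;

    /* Count the components: one more than the number of separators. */
    for (size_t i = 0; i < stringLength; i++) {
        if (string[i] == separator)
            numberOfComponents++;
    }

    /* Write through the pointer so the caller's variable receives the array. */
    *result = malloc((size_t)numberOfComponents * sizeof **result);
    if (*result == NULL)
        return strspl_allocation_error;

    int stored = 0;
    for (size_t i = 0; i <= stringLength; i++) {
        if (i == stringLength || string[i] == separator) {
            const size_t componentLength = i - lastSplit;

            /* One extra byte for the terminating '\0'. */
            char *component = malloc(componentLength + 1);
            if (component == NULL) {
                /* Release everything allocated so far before reporting failure. */
                while (stored > 0)
                    free((*result)[--stored]);
                free(*result);
                *result = NULL;
                return strspl_allocation_error;
            }

            memcpy(component, string + lastSplit, componentLength);
            component[componentLength] = '\0';

            /* Parentheses matter: [] binds more tightly than unary *. */
            (*result)[stored++] = component;
            lastSplit = i + 1;
        }
    }
    return stored;
}

/* Hypothetical helper (not in the original post): frees what strspl allocated. */
void strspl_free(char **components, int count) {
    for (int i = 0; i < count; i++)
        free(components[i]);
    free(components);
}
```

The caller declares char **result, calls strspl("aaaBcBddddeeBk", 'B', &result), loops over the returned count, and then hands the array back to strspl_free.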
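Pointers have always been one of the hardest things to understand when working in C/C++... let me see if I can explain it:
Suppose the caller's stack looks like this: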
Address - Data - Description
0x99887760 - 0xbaadbeef - caller-result variable (uninitialized garbage)
When the function is called like this: strspl(..., result); the compiler copies the local pointer (0xbaadbeef) onto strspl's stack:
Address - Data - Description
0x99887750 - 0xbaadbeef - strspl-result variable (copy of uninitialized garbage)
...
0x99887760 - 0xbaadbeef - caller-result variable (uninitialized garbage)
Now when we call result = malloc(...) and the return value is copied into the local strspl-result variable, we get:
Address - Data - Description
0x99887750 - 0x01000100 - strspl-result variable (new array)
...
0x99887760 - 0xbaadbeef - caller-result variable (uninitialized garbage)
Obviously, this does not update the caller's result variable.
If we call with the address of the result variable instead: strspl(..., &result); we get this:
Address - Data - Description
0x99887750 - 0x99887760 - strspl-result variable (pointer to the caller's result)
...
0x99887760 - 0xbaadbeef - caller-result variable (uninitialized garbage)
Then when we call result = malloc(...), we get:
Address - Data - Description
0x99887750 - 0x01000100 - strspl-result variable (new array)
...
0x99887760 - 0xbaadbeef - caller-result variable (uninitialized garbage)
Still not what we want, because the caller never gets a pointer to the array.
If we call *result = malloc(...) instead, we get:
Address - Data - Description
0x99887750 - 0x99887760 - strspl-result variable (pointer to the caller's result)
...
0x99887760 - 0x01000100 - caller-result variable (new array)
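This way, when we return, we have overwritten the caller's garbage with our newly malloc'ed array.
As you can see, the compiler copies whatever the caller passes onto the called function's stack. Because the function only has a copy, it cannot modify the caller's variable unless the caller passes a pointer to that variable (which is why the parameter needs to be char*** rather than char**).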
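The same point can be seen in a few lines of C. This is only a minimal sketch: the names assign_to_copy, assign_through_pointer, and demo are made up for illustration and do not appear in the original code, and the second assert assumes that malloc succeeds.

```c
#include <assert.h>
#include <stdlib.h>

/* Receives a COPY of the caller's pointer; assigning to it changes only the copy. */
void assign_to_copy(char **result) {
    result = malloc(4 * sizeof *result);
    free(result); /* the caller can never reach this block, so release it here */
}

/* Receives the ADDRESS of the caller's pointer; *result is the caller's variable. */
void assign_through_pointer(char ***result) {
    *result = malloc(4 * sizeof **result);
}

/* Walks through both calls; the asserts state what the caller observes. */
void demo(void) {
    char **result = NULL;

    assign_to_copy(result);
    assert(result == NULL); /* unchanged: only the callee's copy was assigned */

    assign_through_pointer(&result);
    assert(result != NULL); /* updated (assuming malloc succeeded) */

    free(result);
}
```

Starting from NULL instead of uninitialized garbage just makes the "unchanged" case easy to check; the mechanics are the same as in the stack tables above.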
I hope this clears things up rather than making them harder to understand! :-P
Source: a similar question about arrays of C strings on Stack Overflow: https://stackoverflow.com/questions/4770456/