C - wcstok() gives an incorrect result

Tags: c, wchar-t

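I have a problem with one of the functions in my program. I have a text made up of sentences. In each sentence I need to find the symbols “@”, “#” and “%” and change them to “(at)”, “<решетка>” and “<percent>” respectively. I use wcstok for this because I am working with Russian text. I ran into the following problem.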

Input:

He was an old man who fished alone in a skiff in the Gulf Stream and he had gone eighty-four days now without tak%ing a fish. In the first forty days a boy had been with him. But after forty days without a fish the boy’s parents had told him that the old man was now definitely and finally sa@lao, which is the worst form of unlucky, and the boy had gone at their orders in another boat which caught three good fis#h the first week.

Output:

He was an old man who fished alone in a skiff in the Gulf Stream and he had gone eighty-four days now without tak<percent>ing a fish. In the first forty days a boy had been with him. B(at) (at)f(at)er for(at)y d(at)ys wi(at)ho(at) (at) fish (at)he boy’s p(at)ren(at)s h(at)d (at)old him (at)h(at) (at)he old m(at)n w(at)s now defini(at)ely (at)nd fin(at)lly s(at)l(at)o, which is (at)he wors(at) form of (at)nl(at)cky, (at)nd (at)he boy h(at)d gone (at) (at)heir orders in (at)no(at)her bo(at) which c(at)gh(at) (at)hree good fis(at)h (at)he firs(at) week.

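As you can see, it changed all the letters “a” and “t” to “(at)”, and I don't understand why this happens. The same thing happens with Russian letters. Here are the two functions responsible for this work.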

void changeSomeSymbols(Text *text) {
    wchar_t atSymbol = L'@';
    wchar_t atString[5] = L"(at)";
    wchar_t percentSymbol = L'%';
    wchar_t percentString[10] = L"<percent>";
    wchar_t barsSymbol = L'#';
    wchar_t barsString[10] = L"<решетка>";
    for (int i = 0; i < text->textSize; i++) {
        for (int j = 0; j < text->sentences[i].sentenceSize; j++) {
            switch (text->sentences[i].symbols[j])
            {
            case L'@':
                changeSentence(&(text->sentences[i]), &atSymbol, atString);
                break;
            case L'#':
                changeSentence(&(text->sentences[i]), &barsSymbol, barsString);
                break;
            case L'%':
                changeSentence(&(text->sentences[i]), &percentSymbol, percentString);
                break;
            default:
                break;
            }
        }
    }
}

void changeSentence(Sentence *sentence, wchar_t *flagSymbol, wchar_t *insertWstr) {
    wchar_t *pwc;
    wchar_t *newWcsentence;
    wchar_t *buffer;
    int insertionSize;
    int tokenSize;
    int newSentenceSize = 0;
    insertionSize = wcslen(insertWstr);
    newWcsentence = (wchar_t*)malloc(1 * sizeof(wchar_t));
    newWcsentence[0] = L'\0';
    pwc = wcstok(sentence->symbols, flagSymbol, &buffer);
    do {
        tokenSize = wcslen(pwc);
        newWcsentence = (wchar_t*)realloc(newWcsentence, (newSentenceSize + tokenSize + 1) * sizeof(wchar_t));
        newSentenceSize += tokenSize;
        wcscat(newWcsentence, pwc);
        newWcsentence = (wchar_t*)realloc(newWcsentence, (newSentenceSize + insertionSize + 1) * sizeof(wchar_t));
        newSentenceSize += insertionSize;
        wcscat(newWcsentence, insertWstr);
        pwc = wcstok(NULL, flagSymbol, &buffer);
    } while (pwc != NULL);
    newSentenceSize -= insertionSize;
    newWcsentence = (wchar_t*)realloc(newWcsentence, (newSentenceSize) * sizeof(wchar_t));
    newWcsentence[newSentenceSize] = '\0';
    free(sentence->symbols);
    sentence->symbols = (wchar_t*)malloc((newSentenceSize + 1) * sizeof(wchar_t));
    wcscpy(sentence->symbols, newWcsentence);
    sentence->sentenceSize = newSentenceSize;
    free(pwc);
    free(newWcsentence);
}

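Best answer

Text and Sentence are not defined, so it is not clear what they are supposed to be. It is simpler to do the whole job in one function.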

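#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>

/* Append src to *dst, growing the buffer; *dstlen is the current length
   without the terminator. A NULL src is ignored. */
void realloc_and_copy(wchar_t** dst, int *dstlen, const wchar_t *src)
{
    if(!src)
        return;
    int srclen = (int)wcslen(src);
    wchar_t *grown = realloc(*dst, (*dstlen + srclen + 1) * sizeof(wchar_t));
    if (!grown)
        exit(EXIT_FAILURE);
    *dst = grown;
    if (*dstlen)
        wcscat(*dst, src);
    else
        wcscpy(*dst, src);
    *dstlen += srclen;
}

int main()
{
    //needed so that wprintf can output non-ASCII characters
    setlocale(LC_ALL, "");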
    const wchar_t* src = L"He was an old man who fished alone in a skiff \
in the Gulf Stream and he had gone eighty - four days now without tak%ing a fish.\
In the first forty days a boy had been with him.But after forty days without a fish \
the boy’s parents had told him that the old man was now definitely and finally sa@lao, \
which is the worst form of unlucky, and the boy had gone at their orders in another \
boat which caught three good fis#h the first week.";

    wchar_t *buf = wcsdup(src);
    wchar_t *dst = NULL;
    int dstlen = 0;

    wchar_t *context = NULL;
    const wchar_t* delimiter = L"@#%";
    wchar_t *token = wcstok(buf, delimiter, &context);
    while(token)
    {
        const wchar_t* modify = NULL;
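        //index of the delimiter just before this token; wcstok overwrote it in buf, so it is read from src
        int cursor = (int)(token - buf - 1);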
        if (cursor >= 0)
            switch(src[cursor])
            {
            case L'@': modify = L"(at)"; break;
            case L'%': modify = L"<percent>"; break;
            case L'#': modify = L"<решетка>"; break;
            }

        //append modified text
        realloc_and_copy(&dst, &dstlen, modify);

        //append token
        realloc_and_copy(&dst, &dstlen, token);

        token = wcstok(NULL, delimiter, &context);
    }

    wprintf(L"%ls\n", dst);

    free(buf);
    free(dst);

    return 0;
}

Regarding "C - wcstok() gives an incorrect result", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/53792617/
