c - Frequency of characters in C - strange numbers

Tags: c

Here is my function:

void printStatistics(const char *current) {
    int count = 0, i = 0, length = strlen(current);
    int lowercaseLetters[26] = {0};
    int uppercaseLetters[26] = {0};
    char *token;

    for (i = 0; i < length; i++) {
        if (current[i] >= 'a' & current[i] <= 'z') {
            lowercaseLetters[current[i] - 'a']++;
        }
    }

    for (i = 0; i < length; i++) {
        if (current[i] >= 'A' & current[i] <= 'Z') {
            uppercaseLetters[current[i] - 'A']++;
        }
    }

    char tempToken[10] = "";
    strcpy(tempToken, current);
    token = strtok(tempToken, " ");
    while (token != NULL) {
        token = strtok(NULL, " ");
        count++;
    }

    printf("Statistics:\n"
           "\tlength:\t\t%d\n"
           "\tword:\t\t%d\n"
           "Frequency:\n", length, count);

    printf("Printing Uppercase matrix...\n");
    for (i = 0; i < 26; i++) {
        printf("\tfrequency of %c:\t%d\n", 'a' + i, uppercaseLetters[i]);
    }

    printf("Printing Lowercase matrix...\n");
    for (i = 0; i < 26; i++) {
        printf("\tfrequency of %c:\t%d\n", 'a' + i, lowercaseLetters[i]);
    }
}

Here is the output I get when I test it with a string:

Statistics:
    length:         74
    word:           2
Frequency:
Printing Uppercase matrix...
    frequency of a: 1734829927
    frequency of b: 1734829927
    frequency of c: 1107322727
    frequency of d: 1111638594
    frequency of e: 1111638594
    frequency of f: 1111638594
    frequency of g: 1111638594
    frequency of h: 1111638594
    frequency of i: 1111638594
    frequency of j: 1111638594
    frequency of k: 1111638594
    frequency of l: 1111638594
    frequency of m: 1111638594
    frequency of n: 1111638594
    frequency of o: 1111638594
    frequency of p: 1111638594
    frequency of q: 0
    frequency of r: 0
    frequency of s: 0
    frequency of t: 0
    frequency of u: 0
    frequency of v: 0
    frequency of w: 0
    frequency of x: 0
    frequency of y: 0
    frequency of z: 0
Printing Lowercase matrix...
    frequency of a: 0
    frequency of b: 0
    frequency of c: 0
    frequency of d: 0
    frequency of e: 0
    frequency of f: 0
    frequency of g: 20
    frequency of h: 0
    frequency of i: 0
    frequency of j: 0
    frequency of k: 0
    frequency of l: 0
    frequency of m: 0
    frequency of n: 0
    frequency of o: 0
    frequency of p: 0
    frequency of q: 0
    frequency of r: 0
    frequency of s: 0
    frequency of t: 0
    frequency of u: 0
    frequency of v: 0
    frequency of w: 0
    frequency of x: 0
    frequency of y: 0
    frequency of z: 0

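Why am I getting these strange large numbers in the uppercase matrix? As far as I can tell I am not indexing outside the uppercase array; I handle it exactly the same way as the lowercase array.

What am I doing wrong here?

Best Answer

You are causing undefined behaviour by writing past the end of a buffer. The main problem is here: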

char tempToken[10] = "";
strcpy(tempToken, current);

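Because you don't check the length of the string in current before copying it into tempToken, you are very likely to exceed its 9-character limit (the tenth byte is reserved for the terminating '\0') and overwrite memory allocated to other data.

In your case, this is what the stack looks like when the program calls printStatistics() (but see the note below):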

+--------------------+--------------------------+--------------------------+--------------
| char tempToken[10] | int uppercaseLetters[26] | int lowercaseLetters[26] | token, etc...
+--------------------+--------------------------+--------------------------+--------------

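When you copy the string gggggggggggggggggggg BBBBBBBBBBBBBB... into tempToken, the first ten characters completely fill that array, and the rest are written into the uppercaseLetters array instead. So when you read data back from that array, you are actually reading these ASCII characters (1734829927 == 0x67676767 == "gggg"; 1111638594 == 0x42424242 == "BBBB").

If you copied a longer string, you would also overwrite lowercaseLetters, and then the other variables (token, etc.).

The strncpy() function lets you limit how many bytes are copied and is meant to help avoid this kind of problem. You should use it too, or check the length of current before copying.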
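To see where those numbers come from, the sketch below copies four characters into a 32-bit unsigned integer with memcpy(). This roughly mimics what happens when the overflowing bytes land in uppercaseLetters and are later read back as counters. The helper name bytes_as_u32 is hypothetical, and the sketch assumes a 32-bit int as on the asker's platform. Because all four bytes are identical, the result does not depend on the machine's byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reinterpret the first four bytes at `bytes` as an
 * unsigned 32-bit value, the way the overflowed characters are read back
 * as int counters in uppercaseLetters[]. memcpy() is used instead of a
 * pointer cast to avoid alignment and strict-aliasing problems. */
uint32_t bytes_as_u32(const char *bytes)
{
    uint32_t value;
    assert(bytes != NULL);
    memcpy(&value, bytes, sizeof value);
    return value;
}
```

For example, bytes_as_u32("gggg") returns 1734829927 and bytes_as_u32("BBBB") returns 1111638594, matching the values in the question's output.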
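One caveat with strncpy(): it does not write a terminating '\0' when the source is at least as long as the limit, so the last byte has to be set by hand. A minimal sketch of a bounded copy, using a hypothetical helper named copy_bounded:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy at most dst_size - 1 characters of src into dst
 * and always terminate the result. strncpy() alone leaves dst unterminated
 * when src is too long, so the final byte is written explicitly.
 * Returns 1 if src fit completely, 0 if it was truncated. */
int copy_bounded(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL && dst_size > 0);
    strncpy(dst, src, dst_size - 1);
    dst[dst_size - 1] = '\0';
    return strlen(src) < dst_size;
}
```

Note that truncating the input would make the word count in printStatistics() wrong for long strings, so sizing the buffer from strlen(current) + 1, or counting words without copying at all, is the more complete fix there.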


Also, as others have pointed out, you are using the bitwise AND operator & where you need the logical AND operator &&.
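For reference, here is the range check written with &&. In this particular code & happens to produce the same result, because each comparison evaluates to 0 or 1, but && states the intent and skips the second comparison once the first is false. The function name count_letters is hypothetical. Like the original, it assumes the letters are contiguous as in ASCII; isupper() and islower() from <ctype.h> are the portable alternative.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: tally ASCII letters into two 26-element tables.
 * The caller must zero both tables before calling. Uses logical && for
 * the range checks, and a single pass over the string. */
void count_letters(const char *s, int lower[26], int upper[26])
{
    assert(s != NULL && lower != NULL && upper != NULL);
    for (size_t i = 0; s[i] != '\0'; i++) {
        if (s[i] >= 'a' && s[i] <= 'z') {
            lower[s[i] - 'a']++;
        } else if (s[i] >= 'A' && s[i] <= 'Z') {
            upper[s[i] - 'A']++;
        }
    }
}
```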


  • Note: Other systems and compilers lay out memory differently and will misbehave in other ways. Your code simply crashed when I compiled it on my computer.

For the original question "c - Frequency of characters in C - strange numbers", see Stack Overflow: https://stackoverflow.com/questions/39885417/
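Putting these points together, here is one possible rewrite of printStatistics() that avoids the fixed-size copy altogether by counting words in place, so no buffer can overflow. It is a sketch, not the answerer's own code. The count_words helper is hypothetical and treats only ' ' as a separator, matching the original strtok() call. It also labels the uppercase rows with 'A' + i, whereas the original printed 'a' + i there.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: count words separated by ' ' without copying or
 * modifying the string. A word starts at each non-space character that
 * is at the start of the text or follows a space, which gives the same
 * count as repeated strtok(..., " ") calls. */
int count_words(const char *s)
{
    int count = 0;
    assert(s != NULL);
    for (size_t i = 0; s[i] != '\0'; i++) {
        if (s[i] != ' ' && (i == 0 || s[i - 1] == ' ')) {
            count++;
        }
    }
    return count;
}

void printStatistics(const char *current)
{
    int lowercaseLetters[26] = {0};
    int uppercaseLetters[26] = {0};
    size_t length = strlen(current);

    /* One pass, logical && in the range checks. */
    for (size_t i = 0; i < length; i++) {
        if (current[i] >= 'a' && current[i] <= 'z') {
            lowercaseLetters[current[i] - 'a']++;
        } else if (current[i] >= 'A' && current[i] <= 'Z') {
            uppercaseLetters[current[i] - 'A']++;
        }
    }

    printf("Statistics:\n"
           "\tlength:\t\t%zu\n"
           "\tword:\t\t%d\n"
           "Frequency:\n", length, count_words(current));

    printf("Printing Uppercase matrix...\n");
    for (int i = 0; i < 26; i++) {
        /* Uppercase label; the original used 'a' + i here. */
        printf("\tfrequency of %c:\t%d\n", 'A' + i, uppercaseLetters[i]);
    }

    printf("Printing Lowercase matrix...\n");
    for (int i = 0; i < 26; i++) {
        printf("\tfrequency of %c:\t%d\n", 'a' + i, lowercaseLetters[i]);
    }
}
```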
