c - 如何仅计算列表中以大写字母开头的单词？

如何统计仅以大写字母开头的单词数？

查找所有大写开头的单词(例如 Any = 3)
计算这些单词出现的次数(例如 Any=3 + any+4 = Any=7)

<小时/>

查看此演示中的示例代码:

https://codeforwin.org/2018/02/c-program-count-occurrences-of-all-words-a-file.html

代码示例:

/**
* C program to count occurrences of all words in a file.
*/

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>

#define MAX_WORDS   50

char *strlwr(char *str)
{
    unsigned char *p = (unsigned char *)str;

    while (*p) {
        *p = tolower((unsigned char)*p);
        p++;
    }

    return str;
}

int main()
{
    FILE *fptr;
    char path[100];
    int i, len, index, isUnique;

    // List of distinct words
    char words[MAX_WORDS][50];
    char word[50];

    // Count of distinct words
    int  count[MAX_WORDS];


    /* Input file path */
    printf("Enter file path: ");
    scanf("%s", path);


    /* Try to open file */
    fptr = fopen(path, "r");

    /* Exit if file not opened successfully */
    if (fptr == NULL)
    {
        printf("Unable to open file.\n");
        printf("Please check you have read privileges.\n");

        exit(EXIT_FAILURE);
    }

    // Initialize words count to 0
    for (i=0; i<MAX_WORDS; i++)
        count[i] = 0;




    index = 0;

    while (fscanf(fptr, "%s", word) != EOF)
    {
        // Convert word to lowercase
        strlwr(word);

        // Remove last punctuation character
        len = strlen(word);
        if (ispunct(word[len - 1]))
            word[len - 1] = '\0';


        // Check if word exits in list of all distinct words
        isUnique = 1;
        for (i=0; i<index && isUnique; i++)
        {
            if (strcmp(words[i], word) == 0)
                isUnique = 0;
        }

        // If word is unique then add it to distinct words list
        // and increment index. Otherwise increment occurrence
        // count of current word.
        if (isUnique)
        {
            strcpy(words[index], word);
            count[index]++;

            index++;
        }
        else
        {
            count[i - 1]++;
        }



    }

    // Close file
    fclose(fptr);


    /*
     * Print occurrences of all words in file.
     */
    printf("\nOccurrences of all distinct words in file: \n");
    for (i=0; i<index; i++)
    {
        /*
         * %-15s prints string in 15 character width.
         * - is used to print string left align inside
         * 15 character width space.
         */
        printf("%-15s %d\n", words[i], count[i]);
    }


    return 0;
}

在此代码示例中，他们将所有单词变为小写字母，然后对所有这些单词进行计数。

相反:如何仅将单词添加到以大写字母开头的唯一列表中，然后计算该单词的所有出现次数

是否应该将 fscanf 与 if (isupper[0]) 结合使用

Test.txt 文件

Any girl jumped over one boy.
Some car skipped to some boy.
One town drove over the town.
Any town ran under some dog.
Some girl drove to a town.
The boy walked under any town.
A town jumped over any car.
Any boy jumped from a car.
A dog ran over a boy.
A girl ran to some car.
A car ran under the girl.
The car ran on any town.
One dog walked under any dog.
A car jumped on some town.
A boy ran to a boy.
The dog drove over a boy.
A boy jumped over the car.
Some car drove on some girl.
One boy drove under some girl.
A girl walked over some dog.

预期输出:

Any  7
Some 3
One  4
The  6
A    8

当前输出:

any             7
girl            7
jumped          5
over            7
one             4
boy             10
some            10
car             9
skipped         1
to              4
town            8
drove           5
the             6
ran             6
under           5
dog             6
a               13
walked          3
from            1
on              3

可能的解决方案:

    // skip the word if it does not contain a capital letter at start
    if (islower(word[0])) {
        continue;
    }

然后创建另一个 FOR 循环，检查这些单词以小写或大写开头的情况在文件中出现的频率

最佳答案

您的工作方向是正确的，您只需稍微重新排列一下代码即可。虽然您可以使用单独的数组来尝试跟踪每个唯一的单词，同时将该索引与保存该单词出现次数的数组中的索引以及另一个数组中保存该单词是否出现在文件中的相同索引(大写或大写)进行协调不是——但是有更好的方法。

每当您必须将不同类型的数据协调为一个单元时，您都需要考虑struct。使用struct，您可以协调每个唯一的单词，无论它在文件中是否大写，以及它作为一个单元出现的次数(不区分大小写)，例如

typedef struct {            /* use a struct to hold */
    char word[MAX_WORD];    /* lowercase word, and */
    int cap, count;         /* if it appeast capitalized, and its count */
} words_t;

现在您可以简单地创建一个 words_t 数组，并能够将每个 word(小写)添加为结构中的单词，捕获它是否曾经出现过 cap斜体以及发生的总计数。

这简化了代码中的逻辑。现在您只需声明一个 words_t 数组，例如

#define MAX_WORD     50     /* max word size */
#define MAX_WORDS   512     /* max number of words */
...
    /* Array of struct of distinct words, initialized all zero */
    words_t words[MAX_WORDS] = {{ .word = "" }};

您询问您的文件名 - 验证每个用户输入，例如

    /* Input file path */
    printf ("Enter file path: ");
    if (scanf ("%s", path) != 1) {  /* validate every input */
        fputs ("error: invalid file path or cancellation.\n", stderr);
        return 1;
    }

现在循环你的单词——同时保护你的数组边界并读取每个单词:

    while (index < MAX_WORDS &&                 /* protect array bounds  */
            fscanf (fptr, "%s", word) == 1) {   /* while valid word read */

现在，存储和跟踪哪些单词出现大写所需的逻辑关键开始发挥作用。首先，您需要一个标志来测试该单词是否显示为大写，以便在将单词转换为小写之前捕获信息(以及您的 isunique 标志)。您只需使用 isupper() 测试第一个字符即可捕获单词是否大写，例如

        int iscap = 0, isunique = 1;    /* is captial, is unique flags */

        if (isupper (*word))            /* is the word uppercase */
            iscap = 1;

您不仅可以修剪单个标点符号，还可以在将单词转换为小写之前轻松修剪所有标点符号，例如

        /* remove all trailing punctuation characters */
        len = strlen (word);                    /* get length */
        while (len && ispunct(word[len - 1]))   /* only if len > 0 */
            word[--len] = 0;

        strlwr (word);                  /* convert word to lowercase */

现在，从数据存储的角度来看，剩下的就是循环确定单词是否唯一，如果不是，只需设置cap 标记 iscap 是否为 TRUE 并增加计数。如果它是唯一的，那么在退出循环后，您可以将单词复制到数组中的新元素(使用memcpy您可以获得长度，无需再次扫描)，您对 cap 和 count 执行相同的操作，完成后递增 index，

        /* check if word exits in list of all distinct words */
        for (i = 0; i < index; i++) {
            if (strcmp(words[i].word, word) == 0) {
                isunique = 0;               /* set unique flag zero */
                if (iscap)                  /* if capital flag set */
                    words[i].cap = iscap;   /* set capital flag in struct */
                words[i].count++;           /* increment word count */
                break;                      /* bail - done */
            }
        }
        if (isunique) { /* if unique, add to array, increment index */
            memcpy (words[index].word, word, len + 1);  /* have len */
            if (iscap)                      /* if cap flag set */
                words[index].cap = iscap;   /* set capital flag in struct */
            words[index++].count++;         /* increment count & index */
        }
    }
    fclose (fptr);  /* close file */

剩下的就是循环遍历数组中存储的元素并检查 .cap 成员以确定该单词在打印该单词之前是否大写以及它出现的次数(注意:您的预期输出:根据您的示例文本计数是错误的)

    /*
     * Print occurrences of all words in file.
     */
    puts ("\nOccurrences of all distinct words with Cap in file:");
    for (i = 0; i < index; i++) {
        if (words[i].cap) {
            strcpy (word, words[i].word);
            *word = toupper (*word);
            /*
             * %-15s prints string in 15 character width.
             * - is used to print string left align inside
             * 15 character width space.
             */
            printf("%-15s %d\n", word, words[i].count);
        }
    }

(注意:使用 puts 而不是 printf 来表示“出现...”不需要转换...一个好的编译器会为您优化它)

总而言之，你可以这样做:

/**
 * C program to count occurrences of all words in a file.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include <limits.h>

#define MAX_WORD     50     /* max word size */
#define MAX_WORDS   512     /* max number of words */

#ifndef PATH_MAX
#define PATH_MAX   2048     /* max path (defined for Linux in limits.h) */
#endif

typedef struct {            /* use a struct to hold */
    char word[MAX_WORD];    /* lowercase word, and */
    int cap, count;         /* if it appeast capitalized, and its count */
} words_t;

char *strlwr (char *str)    /* no need for unsigned char */
{
    char *p = str;

    while (*p) {
        *p = tolower(*p);
        p++;
    }

    return str;
}

int main (void) {

    FILE *fptr;
    char path[PATH_MAX], word[MAX_WORD];
    size_t i, len, index = 0;

    /* Array of struct of distinct words, initialized all zero */
    words_t words[MAX_WORDS] = {{ .word = "" }};

    /* Input file path */
    printf ("Enter file path: ");
    if (scanf ("%s", path) != 1) {  /* validate every input */
        fputs ("error: invalid file path or cancellation.\n", stderr);
        return 1;
    }

    fptr = fopen (path, "r");   /* open file */
    if (fptr == NULL) {         /* validate file open */
        fputs ( "Unable to open file.\n"
                "Please check you have read privileges.\n", stderr);
        exit (EXIT_FAILURE);
    }

    while (index < MAX_WORDS &&                 /* protect array bounds  */
            fscanf (fptr, "%s", word) == 1) {   /* while valid word read */
        int iscap = 0, isunique = 1;    /* is captial, is unique flags */

        if (isupper (*word))            /* is the word uppercase */
            iscap = 1;

        /* remove all trailing punctuation characters */
        len = strlen (word);                    /* get length */
        while (len && ispunct(word[len - 1]))   /* only if len > 0 */
            word[--len] = 0;

        strlwr (word);                  /* convert word to lowercase */

        /* check if word exits in list of all distinct words */
        for (i = 0; i < index; i++) {
            if (strcmp(words[i].word, word) == 0) {
                isunique = 0;               /* set unique flag zero */
                if (iscap)                  /* if capital flag set */
                    words[i].cap = iscap;   /* set capital flag in struct */
                words[i].count++;           /* increment word count */
                break;                      /* bail - done */
            }
        }
        if (isunique) { /* if unique, add to array, increment index */
            memcpy (words[index].word, word, len + 1);  /* have len */
            if (iscap)                      /* if cap flag set */
                words[index].cap = iscap;   /* set capital flag in struct */
            words[index++].count++;         /* increment count & index */
        }
    }
    fclose (fptr);  /* close file */

    /*
     * Print occurrences of all words in file.
     */
    puts ("\nOccurrences of all distinct words with Cap in file:");
    for (i = 0; i < index; i++) {
        if (words[i].cap) {
            strcpy (word, words[i].word);
            *word = toupper (*word);
            /*
             * %-15s prints string in 15 character width.
             * - is used to print string left align inside
             * 15 character width space.
             */
            printf("%-15s %d\n", word, words[i].count);
        }
    }

    return 0;
}

示例使用/输出

使用您发布的输入

$ ./bin/unique_words_with_cap
Enter file path: dat/girljumped.txt

Occurrences of all distinct words with Cap in file:
Any             7
One             4
Some            10
The             6
A               13

(注意: “Some/some” 出现 10 次，“A/a” 出现 13 次，而不是预期输出:中显示的 3/8，您可以通过简单的计数来确认)

仔细检查一下，如果您还有其他问题，请告诉我。

关于c - 如何仅计算列表中以大写字母开头的单词？，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/56141468/

c - 如何仅计算列表中以大写字母开头的单词？

上一篇：c - 如何修复 sctp_sendmsg 挂起至无法访问的主机

下一篇：c - C 中的错误；方法分割错误