c - Trying to find the index of a word in a line (measured in words)

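I have the following code, which examines a line, counts some of its words, and tallies how often each word occurs. Basically, I want to add a counter so that, for each word (obtained via getNextWord), the number of that word within the line is recorded as wordCount in struct word_t. For example, if the line is "all, cats! Swim very far", I need it to record 0 for all, 1 for cats, 2 for Swim, and so on. This is the function I am working on: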

void
func4(line_t * linePtr, int argnum, char * argval[]) {

    int getfrequency(line_t* linePtr, char* checkword);
    word_t word;
    int Size;
    int i;
    int k=0;
    int wordfound[MAX_QUERYLENGTH]; /* 1 if query term matches word in line */
    for(i = 1; i < argnum; i++,k++) {
        int freq = 0;
        int offsetval = 0;
        int* offset = &offsetval;
        wordfound[i] = 0; 


        if (wordfound[i] == 1) {
            freq = getfrequency(linePtr, argval[i]);            
            /* this is where I require access to the index */
        }
    }                   
}

My struct is defined here. Note that stopIndex and startIndex are measured in characters. The index I am looking for (measured in words) will be called wordCount (listed below).

typedef struct word_t {
  int startIndex;
  int stopIndex;
  int wordCount;
  int frequency;
  char buf[MAX_LINELEN];
  int wordSz; /* size of word */
} word_t;

FYI, here is the function call as well:

while (-1 != (line.lineLength = getline((char**) &line.buf, &maxSz, stdin))) {
    /* more code here! */
    stage4(&line,argc,argv,&word);
}

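I tried adding word.wordCount++ after the call to getNextWord, but that did not work (even after initialising it to 0 inside the while loop that contains the function call). I have also been trying to use wordPtr->wordCount, but I am not getting what I want.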

Thanks.

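Further clarification: I am taking command-line arguments, where argv is argval. These make up the query terms. I then search the text (stdin) line by line for the query terms. The frequency is how often words matching a query term occur in one line of the text (for example, if the query is "cat hat mat" and the first line of stdin is "cat cat cat sat that", then the frequency of cat is 3).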

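What I need is to take each line of stdin and record the word number (wordCount) of every word that matches a query term (using strncasecmp to determine a match). That is, if the query is "cat hat mat" and the first line of stdin is "cat cat cat sat mat", I need to record that the first cat is at wordCount = 0, the second cat at wordCount = 1, the third cat at wordCount = 2, and mat at wordCount = 4. I just need a way to store each wordCount in my struct so that I can access it later.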

Best answer

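I would approach this differently. You can get both the frequency of a word and the index of its first occurrence in a single loop. I would also swap the inner and outer loops, so that you do not have to tokenise the line again for every query term.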

Use a counter that is incremented for every word to keep track of the word index. Likewise, keep an array of frequencies.

void stage4(line_t * linePtr, int argnum, char * argval[])
{
    word_t word;
    int wordindex[MAX_QUERYLENGTH];    /* First index in this line or -1 if none */
    int freq[MAX_QUERYLENGTH];         /* Frequency in this line */
    int count = 0;                     /* Word counter */

    int offsetval = 0;
    int* offset = &offsetval;
    int i;    

    /* Initialise */
    for(i = 1; i < argnum; i++) {
        wordindex[i] = -1;
        freq[i] = 0;
    }

    word = getNextWord(linePtr->buf, linePtr->lineLength, offset, &word);

    while (0 != word.stopIndex) {           
        for(i = 1; i < argnum; i++) {
            if (0 == strcasecmp(argval[i], word.buf)) {
                /* Found a word, update frequency and pos */
                if (freq[i] == 0) wordindex[i] = count;
                freq[i]++;

                /* Assuming argval is unique, there are no further hits */
                break;
            }
        }                   

        /* Update word count and read next word */
        count++;
        word = getNextWord(linePtr->buf, linePtr->lineLength, offset, &word);
    }    

    /* Print frequencies */    
    for(i = 1; i < argnum; i++) {
        if (wordindex[i] >= 0) printf("%s: %d\n", argval[i], freq[i]);
    }   
}

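This code is untested, because I do not have access to your (rather complex) reading functions. It also does not address how to store the per-line frequencies and positions for further use. But it should give you a rough idea of how to tackle the problem.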

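(The -1 for the word index is redundant here, because the absence of a word is already indicated by a zero frequency, but it could be useful if the frequency were a cumulative value over all lines.)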
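As for keeping every matching position rather than only the first one, here is a minimal, self-contained sketch of one way it might be done. It is not the asker's code: next_word is a simplified stand-in for getNextWord (it treats runs of alphanumeric characters as words), and term_hits_t, record_hits, MAX_TERMS, MAX_HITS and MAX_WORD are hypothetical names and limits introduced only for illustration. Unlike the code above, it indexes the query terms from 0.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>
#include <strings.h>   /* strcasecmp (POSIX) */

#define MAX_TERMS 16   /* hypothetical limit on the number of query terms */
#define MAX_HITS  64   /* hypothetical limit on stored positions per term */
#define MAX_WORD  128  /* hypothetical limit on word length */

/* Hypothetical record: where one query term matched within a line. */
typedef struct {
    int count;                /* number of matches in the line (frequency) */
    int positions[MAX_HITS];  /* 0-based word indices of the first MAX_HITS matches */
} term_hits_t;

/* Simplified stand-in for getNextWord: copy the next run of alphanumeric
   characters at or after *pos into buf (truncating if it does not fit).
   Returns 1 if a word was found, 0 at the end of the line. */
static int next_word(const char *line, size_t *pos, char *buf, size_t bufsz)
{
    size_t n = 0;

    while (line[*pos] != '\0' && !isalnum((unsigned char)line[*pos]))
        (*pos)++;                       /* skip separators and punctuation */
    while (line[*pos] != '\0' && isalnum((unsigned char)line[*pos])) {
        if (n + 1 < bufsz)
            buf[n++] = line[*pos];
        (*pos)++;
    }
    buf[n] = '\0';
    return n > 0;
}

/* Tokenise the line once and fill hits[t] for each of the nterms terms. */
static void record_hits(const char *line, char *const terms[], int nterms,
                        term_hits_t hits[])
{
    char word[MAX_WORD];
    size_t pos = 0;
    int wordCount = 0;   /* index of the current word, counted in words */
    int t;

    for (t = 0; t < nterms; t++)
        hits[t].count = 0;

    while (next_word(line, &pos, word, sizeof word)) {
        for (t = 0; t < nterms; t++) {
            if (strcasecmp(terms[t], word) == 0) {
                if (hits[t].count < MAX_HITS)
                    hits[t].positions[hits[t].count] = wordCount;
                hits[t].count++;
                break;   /* assumes the query terms are unique */
            }
        }
        wordCount++;     /* advances for every word, matched or not */
    }
}
```

Called from main with something like record_hits(line, argv + 1, argc - 1, hits), where hits is an array of at least argc - 1 elements (at most MAX_TERMS in this sketch), this leaves the positions of each term available after the loop. The key point is that wordCount lives outside the per-term loop and is incremented exactly once per word read.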

Regarding "c - Trying to find the index of a word in a line (measured in words)", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/26031705/
