c++ - Counting the occurrences of each word in a text file


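Given a large text file containing many strings, what is the most efficient way to read the file in C++ and count the number of occurrences of each word? The size of the text file is unknown, so I can't just use a simple array. There is also a further requirement: each line of the text file begins with a category keyword, and the words that follow are features of that category. I need to be able to count how many times each word occurs within its category.

For example: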

colors red blue green yellow orange purple
sky blue high clouds air empty vast big
ocean wet water aquatic blue
colors brown black blue white blue blue

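In this example, I need to count that "blue" appears 4 times in the "colors" category, even though it appears 6 times in total.

Best answer

I would use a stream to read and separate the words (it separates words by looking for whitespace) and save them into a dictionary (the standard C++ way is to use std::map).

Here is a documented piece of C++ code: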

#include <iostream>
#include <map> // A map will be used to count the words.
#include <fstream> // Will be used to read from a file.
#include <string> // The map's key value.
#include <cstdlib> // EXIT_SUCCESS and EXIT_FAILURE.
using namespace std;


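// Will be used to print the map later.
template <class KTy, class Ty>
void PrintMap(const map<KTy, Ty>& m) // Take the map by const reference to avoid copying it.
{
    // 'typename' is required because iterator is a dependent type.
    typedef typename std::map<KTy, Ty>::const_iterator iterator;
    for (iterator p = m.begin(); p != m.end(); ++p)
        cout << p->first << ": " << p->second << endl;
}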

int main(void)
{
    static const char* fileName = "C:\\MyFile.txt";

    // Will store the word and count.
    map<string, unsigned int> wordsCount;


    {
        // Begin reading from file:
        ifstream fileStream(fileName);

        // Check if we've opened the file (as we should have).
        if (fileStream.is_open())
        {
            // Read whitespace-separated words until extraction fails.
            // Testing the extraction itself avoids counting an empty word at end of file.
            string word;
            while (fileStream >> word)
                ++wordsCount[word]; // operator[] creates a missing entry with count 0 first.
        }
        else  // We couldn't open the file. Report the error in the error stream.
        {
            cerr << "Couldn't open the file." << endl;
            return EXIT_FAILURE;
        }

        // Print the words map.
        PrintMap(wordsCount);
    }

    return EXIT_SUCCESS;
}

Output:

air: 1
aquatic: 1
big: 1
black: 1
blue: 6
brown: 1
clouds: 1
colors: 2
empty: 1
green: 1
high: 1
ocean: 1
orange: 1
purple: 1
red: 1
sky: 1
vast: 1
water: 1
wet: 1
white: 1
yellow: 1
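
The answer's code counts each word across the whole file, so it does not yet cover the per-category part of the question. One possible extension is sketched below. It assumes that the first whitespace-separated token on each line is the category and that repeated category lines should be merged. The names CategoryCounts and CountByCategory are made up for this illustration and are not part of the original answer.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical type for this sketch:
// category -> (word -> number of occurrences within that category).
typedef std::map<std::string, std::map<std::string, unsigned int> > CategoryCounts;

// Hypothetical helper: reads the input line by line and counts words per category.
CategoryCounts CountByCategory(std::istream& input)
{
    CategoryCounts counts;
    std::string line;

    while (std::getline(input, line))
    {
        std::istringstream words(line);

        // Assumption: the first token on the line names the category.
        std::string category;
        if (!(words >> category))
            continue; // Skip blank lines.

        // Every remaining token on the line is a word of that category.
        std::string word;
        while (words >> word)
            ++counts[category][word];
    }

    return counts;
}
```

Passing a std::ifstream opened on the file to CountByCategory and then reading counts["colors"]["blue"] should give 4 for the sample input above, because the two "colors" lines are merged into one inner map.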

For "c++ - Counting the occurrences of each word in a text file", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/16867944/
