What is an elegant and efficient way to count how often each "English" word occurs in a file?

Best answer

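First, I define a letter_only facet (a std::ctype<char> installed in a std::locale) so that the stream ignores punctuation and reads only valid "English" letters from the input. With it, the stream treats "ways", "ways." and "ways!" as the same word "ways", because punctuation such as "." and "!" is classified as whitespace and skipped.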

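#include <algorithm>
#include <locale>
#include <vector>

struct letter_only : std::ctype<char>
{
    letter_only() : std::ctype<char>(get_table()) {}

    static std::ctype_base::mask const* get_table()
    {
        // Start with every character classified as whitespace.
        static std::vector<std::ctype_base::mask>
            rc(std::ctype<char>::table_size, std::ctype_base::space);

        // Mark A-Z and a-z separately: the ASCII characters between 'Z' and 'a'
        // ([ \ ] ^ _ `) are not letters and must stay whitespace.
        std::fill(&rc['A'], &rc['Z' + 1], std::ctype_base::alpha);
        std::fill(&rc['a'], &rc['z' + 1], std::ctype_base::alpha);
        return &rc[0];
    }
};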
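
To see the effect of the facet without a file on disk, the following is a minimal sketch that imbues a std::istringstream instead of a std::ifstream. The helper name count_words is hypothetical and not part of the original answer; letter_only is repeated only so that the snippet compiles on its own.

```cpp
#include <algorithm>
#include <locale>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Same facet as above: every char is whitespace unless it is A-Z or a-z.
struct letter_only : std::ctype<char>
{
    letter_only() : std::ctype<char>(get_table()) {}

    static std::ctype_base::mask const* get_table()
    {
        static std::vector<std::ctype_base::mask>
            rc(std::ctype<char>::table_size, std::ctype_base::space);
        std::fill(&rc['A'], &rc['Z' + 1], std::ctype_base::alpha);
        std::fill(&rc['a'], &rc['z' + 1], std::ctype_base::alpha);
        return &rc[0];
    }
};

// Hypothetical helper (an assumption for illustration): counts the words in a string.
std::map<std::string, int> count_words(const std::string& text)
{
    std::istringstream input(text);
    // The locale takes ownership of the facet, so no delete is needed.
    input.imbue(std::locale(std::locale(), new letter_only()));

    std::map<std::string, int> counts;
    std::string word;
    while (input >> word)   // punctuation and digits act as separators
    {
        ++counts[word];
    }
    return counts;
}
```

Calling count_words("ways ways. ways!") yields a map with a single entry, "ways" mapped to 3. The comparison is still case-sensitive, so "Ways" and "ways" would be counted as two different words.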

Solution 1

#include <fstream>
#include <iostream>
#include <map>
#include <string>

int main()
{
    std::map<std::string, int> wordCount;
    std::ifstream input;
    // Imbue before opening so that everything except letters is treated as whitespace.
    input.imbue(std::locale(std::locale(), new letter_only()));
    input.open("filename.txt");
    std::string word;
    while (input >> word)
    {
        ++wordCount[word];
    }
    for (std::map<std::string, int>::iterator it = wordCount.begin(); it != wordCount.end(); ++it)
    {
        std::cout << it->first << " : " << it->second << std::endl;
    }
}

Solution 2

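#include <algorithm>
#include <fstream>
#include <iostream>
#include <iterator>
#include <map>
#include <string>

// Function object that tallies each word it is called with.
struct Counter
{
    std::map<std::string, int> wordCount;
    void operator()(const std::string& item) { ++wordCount[item]; }
    // Lets the Counter returned by std::for_each convert directly to the map.
    operator std::map<std::string, int>() const { return wordCount; }
};

int main()
{
    std::ifstream input;
    // Imbue before opening so that everything except letters is treated as whitespace.
    input.imbue(std::locale(std::locale(), new letter_only()));
    input.open("filename.txt");
    std::istream_iterator<std::string> start(input);
    std::istream_iterator<std::string> end;
    std::map<std::string, int> wordCount = std::for_each(start, end, Counter());
    for (std::map<std::string, int>::iterator it = wordCount.begin(); it != wordCount.end(); ++it)
    {
        std::cout << it->first << " : " << it->second << std::endl;
    }
}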
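
Solution 2 works because std::for_each takes the function object by value and returns it after the last call, so the counts accumulated inside Counter are not lost. The sketch below isolates that mechanism on an in-memory list of words; the helper name tally is hypothetical, and the conversion operator is left out here only to keep the example short.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

// Function object that tallies each word it is called with.
struct Counter
{
    std::map<std::string, int> wordCount;
    void operator()(const std::string& item) { ++wordCount[item]; }
};

// Hypothetical helper (an assumption for illustration): counts the words in a vector.
std::map<std::string, int> tally(const std::vector<std::string>& words)
{
    // The Counter() temporary is passed by value; for_each returns the functor
    // after the traversal, and that returned object holds the counts.
    Counter result = std::for_each(words.begin(), words.end(), Counter());
    return result.wordCount;
}
```

Passing a named Counter and then reading that same variable afterwards would not work, since for_each operates on its own copy; the return value is the only place the counts end up.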

Regarding "c++ - An elegant way to count word frequencies in a file", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/4888879/
