c++ - 在 C++ 中创建自定义比较器

标签 c++ algorithm sorting

背景:

我今天在一次在线练习面试中被问到这个问题,我很难找出一个自定义比较器来排序。这是问题

问题:

Implement a document scanning function wordCountEngine, which receives a string document and returns a list of all unique words in it and their number of occurrences, sorted by the number of occurrences in a descending order. If two or more words have the same count, they should be sorted according to their order in the original sentence. Assume that all letters are in english alphabet. You function should be case-insensitive, so for instance, the words “Perfect” and “perfect” should be considered the same word.

The engine should strip out punctuation (even in the middle of a word) and use whitespaces to separate words.

Analyze the time and space complexities of your solution. Try to optimize for time while keeping a polynomial space complexity.

Examples:

input: document = "Practice makes perfect. you'll only get Perfect by practice. just practice!"

output: [ ["practice", "3"], ["perfect", "2"], ["makes", "1"], ["youll", "1"], ["only", "1"], ["get", "1"], ["by", "1"], ["just", "1"] ]

我的想法:

我首先想到的是首先将没有标点符号且全部为小写的字符串放入字符串 vector 中。然后我使用 unordered_map 容器来存储字符串及其出现次数。我遇到困难的地方是创建一个自定义比较器,以确保如果我有一个具有相同计数的字符串,那么我会根据它在实际给定字符串中的优先级对其进行排序。

代码:

#include <iostream>
#include <string>
#include <vector>
#include <unordered_map>
#include <sstream>
#include <iterator>
#include <numeric>
#include <algorithm>
using namespace std;


struct cmp
{
  bool operator()(std::string& word1, std::string& word2)
  {

  }
};

vector<vector<string>> wordCountEngine( const string& document ) 
{
  // your code goes here
  // Step 1
  auto doc = document;
  std::string str;
  remove_copy_if(doc.begin(), doc.end(), std::back_inserter(str), 
                     std::ptr_fun<int, int>(&std::ispunct));
  for(int i = 0; i < str.size(); ++i)
    str[i] = tolower(str[i]);
  std::stringstream ss(str);
  istream_iterator<std::string> begin(ss);
  istream_iterator<std::string> end;
  std::vector<std::string> vec(begin, end);

  // Step 2
  std::unordered_map<std::string, int> m;
  for(auto word : vec)
    m[word]++;

  // Step 3
  std::vector<std::vector<std::string>> result;
  for(auto it : m)
  {
    result.push_back({it.first, std::to_string(it.second)});
  }


  return result;

}

int main() {

  std::string document = "Practice makes perfect. you'll only get Perfect by practice. just practice!";
  auto result = wordCountEngine(document);
  for(int i = 0; i < result.size(); ++i)
  {
    for(int j = 0; j < result[0].size(); ++j)
    {
      std::cout << result[i][j] << " ";
    }
    std::cout << "\n";
  }

  return 0;
}

如果有人可以帮助我学习如何为此代码构建自定义比较器,我将不胜感激。

最佳答案

你可以使用 std::vector<std::pair<std::string, int>> ,每一对代表一个词和该词在序列中出现的次数。当两个或多个单词具有相同的计数时,使用 vector 将有助于保持原始序列的顺序。最后按出现次数排序。

#include <vector>
#include <algorithm>
#include <string>
#include <sstream>

std::vector<std::vector<std::string>> wordCountEngine(const std::string& document)
{
    std::vector<std::pair<std::string, int>> words;
    std::istringstream ss(document);
    std::string word;

    //Loop through words in sequence
    while (getline(ss, word, ' '))
    {
        //Convert to lowercase
        std::transform(word.begin(), word.end(), word.begin(), tolower);

        //Remove punctuation characters
        auto it = std::remove_if(word.begin(), word.end(), [](char c) { return !isalpha(c); });
        word.erase(it, word.end());

        //Find this word in the result vector
        auto pos = std::find_if(words.begin(), words.end(),
            [&word](const std::pair<std::string, int>& p) { return p.first == word; });
        if (pos == words.end()) {
            words.push_back({ word, 1 });  //Doesn't occur -> add it
        }
        else {
            pos->second++;                 //Increment count
        }
    }

    //Sort vector by word occurrences
    std::sort(words.begin(), words.end(),
        [](const std::pair<std::string, int>& p1, const std::pair<std::string, int>& p2) { return p1.second > p2.second; });

    //Convert to vector<vector<string>>
    std::vector<std::vector<std::string>> result;
    result.reserve(words.size());

    for (auto& p : words)
    {
        std::vector<std::string> v = { p.first, std::to_string(p.second) };
        result.push_back(v);
    }
    return result;
}

int main()
{
    std::string document = "Practice makes perfect. you'll only get Perfect by practice. just practice!";
    auto result = wordCountEngine(document);
    for (auto& word : result)
    {
        std::cout << word[0] << ", " << word[1] << std::endl;
    }
    return 0;
}

输出:
实践,3
完美,2
使,1
你,1
只有,1
得到,1
通过, 1
只是,1

关于c++ - 在 C++ 中创建自定义比较器,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/58830827/

相关文章:

algorithm - 识别数据 block (段)以实现集群排序

c++ - 访问 3 索引数组的最快方法

c++ - 使用 Android NDK 构建 mariadb 客户端

java - 将二进制数转换为十进制数

algorithm - 四面体网格中的点位置

c++ - List sort() vs own sort() 时差

c++ - 使用nanoflann的缠结行为

algorithm - 是否有任何算法可以将图像投影到非平面上?

java - 如何在不改变列表的情况下对列表进行排序

JQuery tablesorter 插件 - 修改行后更新排序