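Efficient large-file parser in C++

Question

I am looking for a more efficient way to create smaller subsections of log files that contain 50,000 to 400,000+ lines, where each line holds about 50 values. A simple example: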

   log file                                subsection log file
x   y   z   a   b   c                      a   b   c
1.0 2.0 3.0 4.0 5.0 6.0 ...                4.0 5.0 6.0 ...
1.1 2.1 3.1 4.1 5.1 6.1 ...    -->         4.1 5.1 6.1 ...
...                                        ...

Current implementation

My current implementation takes about 3 minutes, which seems slow.

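#include <fstream>
#include <string>
#include <vector>
#include <boost/algorithm/string.hpp>

using namespace std;

int main() {
  string input_file_name = "<path/filename>";
  string line;

  ifstream input_file(input_file_name);
  ofstream output_file(input_file_name + "_parsed");

  vector<string> line_split_values;
  while (getline(input_file, line)) {
    // split on tabs and spaces, merging adjacent delimiters
    boost::algorithm::split(line_split_values, line, boost::is_any_of("\t "), boost::token_compress_on);

    // skip lines that are too short to hold fields 3..9
    if (line_split_values.size() < 10) continue;

    // extract data points: the 7 fields starting at index 3
    vector<string> fields(7);
    for (int i = 0; i < 7; i++) {
      fields[i] = line_split_values[3 + i];
    }
    output_file << boost::algorithm::join(fields, ",") << endl;
  }
  input_file.close();
  output_file.close();
}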

Best answer

I would probably try to avoid std::vector and try something more like this:

std::ofstream output_file(input_file_name + "_parsed");

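std::string line;
// read and discard the first three fields, skip the whitespace after them,
// then read the remainder of the line into `line`
while(std::getline(input_file >> line >> line >> line >> std::ws, line))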
{
    output_file << line << '\n'; // '\n' should be faster than std::endl
}

This assumes that your "subsection log file" is the output format you want.

Source: a similar question on Stack Overflow about an efficient large-file parser in C++: https://stackoverflow.com/questions/25950880/
