algorithm - Using STL to run-length encode a string using std::adjacent_find

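I am trying to perform run-length compression on a string for a special protocol that I am using. A run is considered efficient when the run size of a particular character in the string is >= 3. Can someone help me achieve this? I have a live demo on coliru. I am pretty sure this is possible with the standard library's std::adjacent_find, combined with std::not_equal_to<> as the binary predicate to search for run boundaries, and probably std::equal_to<> once I find a boundary. Here is what I have so far, but I am having trouble with the results: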

Given the following input text string containing runs of spaces and other characters (in this case a run of the letter 's'):

"---thisssss---is-a---tesst--"

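I am trying to convert the text string above into a vector whose elements are either pure runs of more than 2 characters or mixed characters. The results are almost correct, but not quite, and I cannot spot the error.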

    g++ -std=c++14 -O2 -Wall -pedantic -pthread main.cpp && ./a.out
    expected the following
    ======================
    ---,thi,sssss,---,is-a,---,tesst--,
    actual results
    ==============
    ---,thi,sssss,---,is-a,---,te,ss,--,

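Edit: I fixed the previous code so that this version is closer to the final solution. Specifically, I added an explicit test so that only runs of size > 2 are included. I seem to be having boundary case problems though - the all spaces case and the case where the end of the strings ends in several spaces: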

    #include <iterator>
    #include <iostream>
    #include <memory>
    #include <string>
    #include <vector>
    #include <algorithm>
    #include <functional>

    int main()
    {
        // I want to convert this string containing adjacent runs of characters
        std::string testString("---thisssss---is-a---tesst--");

        // to the following
        std::vector<std::string> idealResults = {
            "---", "thi", "sssss",
            "---", "is-a",
            "---", "tesst--"
        };

        std::vector<std::string> tokenizedStrings;
        auto adjIter = testString.begin();
        auto lastIter = adjIter;

        // temporary string used to accumulate characters that
        // are not part of a run.
        std::unique_ptr<std::string> stringWithoutRun;

        while ((adjIter = std::adjacent_find(
            adjIter, testString.end(), std::not_equal_to<>())) != testString.end()) {
            auto next = std::string(lastIter, adjIter + 1);
            // append to foo if < run threshold
            if (next.length() < 2) {
                if (!stringWithoutRun) {
                    stringWithoutRun = std::make_unique<std::string>();
                }
                *stringWithoutRun += next;
            } else {
                // if we have encountered non run characters, save them first
                if (stringWithoutRun) {
                    tokenizedStrings.push_back(*stringWithoutRun);
                    stringWithoutRun.reset();
                }
                tokenizedStrings.push_back(next);
            }
            lastIter = adjIter + 1;
            adjIter = adjIter + 1;
        }
        tokenizedStrings.push_back(std::string(lastIter, adjIter));

        std::cout << "expected the following" << std::endl;
        std::cout << "======================" << std::endl;
        std::copy(idealResults.begin(), idealResults.end(),
                  std::ostream_iterator<std::string>(std::cout, ","));
        std::cout << std::endl;

        std::cout << "actual results" << std::endl;
        std::cout << "==============" << std::endl;
        std::copy(tokenizedStrings.begin(), tokenizedStrings.end(),
                  std::ostream_iterator<std::string>(std::cout, ","));
        std::cout << std::endl;
    }

Best answer

    if (next.length() < 2) {
        if (!stringWithoutRun) {
            stringWithoutRun = std::make_unique<std::string>();
        }
        *stringWithoutRun += next;
    }

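This should be if (next.length() <= 2). A run of identical characters needs to be appended to the current token when its length is either 1 or 2, because only runs of 3 or more characters count as runs.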
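To see why the threshold matters, it can help to look at the raw pieces that std::adjacent_find with std::not_equal_to<> produces. The following is only a minimal sketch, not code from the question: the helper name splitIntoRuns is hypothetical, and it simply cuts a string into maximal runs of identical characters. For "tesst" the pieces are "t", "e", "ss", "t", so the length-2 piece "ss" is exactly the one that the < 2 test lets through as a token of its own.

```cpp
#include <algorithm>
#include <functional>
#include <string>
#include <vector>

// Hypothetical helper (not part of the original post): splits text into
// maximal runs of identical adjacent characters.
std::vector<std::string> splitIntoRuns(const std::string& text)
{
    std::vector<std::string> runs;
    auto runStart = text.begin();
    while (runStart != text.end()) {
        // adjacent_find with not_equal_to returns the last character of the
        // current run, or end() if the run reaches the end of the string.
        auto boundary = std::adjacent_find(runStart, text.end(),
                                           std::not_equal_to<>());
        auto runEnd = (boundary == text.end()) ? text.end() : boundary + 1;
        runs.emplace_back(runStart, runEnd);
        runStart = runEnd;
    }
    return runs;
}
```

Calling splitIntoRuns("tesst") gives the four pieces listed above; any piece shorter than 3 characters is one that should be merged into the mixed token rather than emitted alone.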

"I seem to be having boundary case problems though - the all spaces case and the case where the end of the strings ends in several spaces"

When stringWithoutRun is not empty after the loop has finished, the characters accumulated in it are never added to the token array. You can fix it like this:

    // The loop has finished
    if (stringWithoutRun)
        tokenizedStrings.push_back(*stringWithoutRun);
    tokenizedStrings.push_back(std::string(lastIter, adjIter));
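One thing to watch: if I trace the posted loop correctly, the leftover chunk after the loop is always pushed as a token of its own, so a short trailing run such as "--" ends up separate from "tesst" instead of giving "tesst--". Below is a hedged sketch of one way to fold everything together. The function name tokenizeRuns and the parameter minRunLength are my own assumptions, not from the question; the default of 3 follows the ">= 3" rule stated there. The idea is to send every run, including the last one, through the same threshold test, and to flush the accumulated characters once at the end.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical function (names are assumptions, not from the original post).
// Runs of at least minRunLength identical characters become their own token;
// everything else is accumulated into mixed tokens.
std::vector<std::string> tokenizeRuns(const std::string& text,
                                      std::size_t minRunLength = 3)
{
    std::vector<std::string> tokens;
    std::string pending;  // characters that are not part of a long enough run
    auto runStart = text.begin();
    while (runStart != text.end()) {
        // Last character of the current run, or end() for the final run.
        auto boundary = std::adjacent_find(runStart, text.end(),
                                           std::not_equal_to<>());
        auto runEnd = (boundary == text.end()) ? text.end() : boundary + 1;
        std::string run(runStart, runEnd);
        if (run.length() < minRunLength) {
            pending += run;  // too short to count as a run
        } else {
            if (!pending.empty()) {  // save the mixed characters first
                tokens.push_back(pending);
                pending.clear();
            }
            tokens.push_back(run);
        }
        runStart = runEnd;
    }
    // Flush whatever is still accumulated once the loop has finished.
    if (!pending.empty()) {
        tokens.push_back(pending);
    }
    return tokens;
}
```

For the test string "---thisssss---is-a---tesst--" this yields the same seven tokens as idealResults in the question, and a string consisting only of one long run comes back as a single token. A plain std::string replaces the std::unique_ptr<std::string> here because an empty string already signals that nothing is pending.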

Regarding algorithm - Using STL to run-length encode a string using std::adjacent_find, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/45666512/
