我正在尝试用 boostspirit 替换一段旧代码，该代码与字符串末尾是否存在子字符串相匹配。当我对两个实现进行基准测试时，我发现旧的实现速度更快。

我的问题是

当我用 boostspirit 替换旧代码时，期望我会获得更快的解析和匹配是否明智？
如果(1.)是肯定的，请就我做错了什么或我应该尝试什么提出一些建议。

逻辑解释: 如果子字符串是“TRINNDECK” 我需要检查“SERVERTRINN-11/TRINNDECK-1/TRINNUSER-1”中的最后一个条目是否有子字符串。所有条目都具有相似的格式(STRING-INT/STRING-INT) 所以在我的示例中，“SERVERMAGNUS-1/MAGNUSDECK-1/TRINNDECK-1”匹配，但不匹配“SERVERTRINN-11/TRINNDECK-1/TRINNUSER-1”

#include <iostream>
#include <chrono>
#include <boost/spirit/home/x3.hpp>

bool isSubStrInStr(const std::string subStr, const std::string& str)
{
    auto begin = str.find_last_of('/');
    auto end = str.find_last_of('-');
    if (end != std::string::npos)
    {
        if (begin != std::string::npos)
        {
            if (str.substr((begin + 1), (end - begin - 1)) == subStr)
            {
                return true;
            }
        }
        else
        {
            if (str.substr(0, end) == subStr)
            {
                return true;
            }
        }
    }
    return false;
}

bool isSubStrInStrSpirit(const std::string& subStr, const std::string& str)
{
    std::vector<std::string> values;
    bool const result = boost::spirit::x3::parse(
        str.begin(), str.end(),
        +boost::spirit::x3::alnum % +(boost::spirit::x3::char_('-') >> boost::spirit::x3::int_ >> "/"),
        values
    );

    if (result && values.back() == subStr)
    {
        return true;
    }
    return false;
}


int main()
{
    std::string name = "TRINNDECK";
    std::vector<std::string> infoMaps{
        "SERVERTRINN-11/TRINNDECK-1/TRINNUSER-1",
        "SERVERMAGNUS-1/MAGNUSDECK-1/TRINNDECK-1",
    };

    std::cout << "WITH SPIRIT" << std::endl;
    auto start = std::chrono::steady_clock::now();
    for (auto& item : infoMaps)
    {
        std::cout << item << "  " << std::boolalpha << isSubStrInStrSpirit(name, item) << std::endl;
    }
    auto stop = std::chrono::steady_clock::now();
    auto duration = std::chrono::duration_cast<std::chrono::microseconds>(stop - start);
    std::cout << "Time taken: " << duration.count() << " microseconds." << std::endl;

    std::cout << "WITHOUT SPIRIT" << std::endl;
    start = std::chrono::steady_clock::now();
    for (auto& item : infoMaps)
    {
        std::cout << item << "  " << std::boolalpha << isSubStrInStr(name, item) << std::endl;
    }
    stop = std::chrono::steady_clock::now();
    duration = std::chrono::duration_cast<std::chrono::microseconds>(stop - start);
    std::cout << "Time taken: " << duration.count() << " microseconds." << std::endl;
    return 0;
}

最佳答案

我开始优化，直到我费心阅读你的原始实现。这是我的银河大脑版本:

bool isSubStrInStrGalaxybrain(std::string_view expected, std::string_view input) {
    auto last = input.substr(input.find_last_of('/') + 1);
    return expected == last.substr(0, last.find('-'));
}

每个人都讨厌 string/string_view 接口(interface)，但如果你找到最佳点，它会非常强大。

剧透:对原始答案的其余部分持保留态度，因为除非您希望手动 SSE4 优化它(编译器可能已经这样做了)，否则没有什么可以超越它:

旧东西

我稍微简化了 Spirit 解析器，使其更加直接

bool isSubStrInStrSpirit(std::string const& expected, std::string const& input) {
    namespace x3 = boost::spirit::x3;
    std::vector<std::string> values;

    bool result = parse(input.begin(), input.end(), +x3::alnum % +('-' >> x3::int_ >> '/'), values);
    return (result && values.back() == expected);
}

让我们看看这个。

它所做的工作比您需要的多得多。特别是它

严格验证整个输入的整数和分隔符(即使它丢弃结果)
它会在 vector 中为您实际不需要的所有“值”创建 std::string 条目
每次都编译解析器表达式。

此外，您的基准测试主要测量控制台 IO。让我们使用适当的微基准工具进行统计分析:

NONIUS_BENCHMARK("Spirit", [](/*nonius::chronometer cm*/) {
    for (auto& item : infoMaps)
        isSubStrInStrSpirit(name, item);
});

NONIUS_BENCHMARK("Nospirit", [](/*nonius::chronometer cm*/) {
    for (auto& item : infoMaps)
        isSubStrInStr(name, item);
});

输出:

clock resolution: mean is 21.2206 ns (20480002 iterations)

benchmarking Spirit
collecting 100 samples, 33 iterations each, in estimated 2.1648 ms
mean: 490.892 ns, lb 479.67 ns, ub 509.147 ns, ci 0.95
std dev: 71.4656 ns, lb 49.6218 ns, ub 98.7286 ns, ci 0.95
found 21 outliers among 100 samples (21%)
variance is severely inflated by outliers

benchmarking Nospirit
collecting 100 samples, 315 iterations each, in estimated 2.1105 ms
mean: 60.2737 ns, lb 59.6758 ns, ub 61.197 ns, ci 0.95
std dev: 3.69908 ns, lb 2.68804 ns, ub 5.1282 ns, ci 0.95
found 21 outliers among 100 samples (21%)
variance is severely inflated by outliers

这证实了预期的缓慢。

修复/改进

避免重新编译解析器表达式

让我们将规则编译从热路径中取出:

namespace x3 = boost::spirit::x3;
static const inline auto rule = +x3::alnum % +('-' >> x3::int_ >> '/');

bool isSubStrInStrSehe(std::string const& expected, std::string const& input) {
    std::vector<std::string> values;
    bool result = parse(input.begin(), input.end(), rule, values);
    return (result && values.back() == expected);
}

最小的改变:)编译器确实巧妙地优化了 X3

避免字符串构造

bool isSubStrInStrSehe(std::string_view expected, std::string_view input) {
    std::vector<boost::iterator_range<char const*> > values;
    bool result = parse(input.begin(), input.end(), rule, values);
    return (result && values.back() == expected);
}

这非常简单，并且使功能界面的启动更加灵活。现在我们看到了差异!

避免 vector 构建

您只需要最后一个值。让我们切割 vector ，并使用语义 Action 来记住最后一个匹配的值:

static const inline auto token     = x3::raw[+x3::alnum];
static const inline auto trailer = '-' >> x3::int_ >> '/';

bool isSubStrInStrSehe(std::string_view expected, std::string_view input) {
    boost::iterator_range<char const*> last;
    auto assign = [&last](auto& ctx) { last = _attr(ctx); };

    bool result = parse(input.begin(), input.end(), token[assign] % +trailer);

    return (result && last == expected);
}

如果你像我一样，结果会令人失望

关于c++ - 任何方法都可以使子字符串的解析和匹配比默认解析和匹配更快，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/76405808/

c++ - 任何方法都可以使子字符串的解析和匹配比默认解析和匹配更快

旧东西

修复/改进

避免重新编译解析器表达式

避免字符串构造

避免 vector 构建

上一篇：debugging - 在 Dyalog APL 中条件断点后恢复执行

下一篇：java - 连接到远程 JMX 代理 ActiveMQ Artemis 时出错