c++ - How to repeat a code segment without using a function or class for a high-performance loop in C++

Tags: c++ algorithm loops inline compiler-optimization

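My C++11 program performs online processing of serialized data, and the loop needs to run over millions of memory positions. Computational efficiency is a must, and my concern is that calling a function or class inside such a loop will create unnecessary operations that hurt efficiency, for example transferring the several pointer values the operation needs between different variable scopes.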


do {
    if ( spam == spam_spam ) {
        // ...
    } else {
        // ...
    }
} while (foo<bar);
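One way to keep a repeated segment in a single place without passing pointers between scopes is a lambda that captures the surrounding variables by reference. The following is a minimal sketch, not the code from the question: `Totals`, `scan`, and `step` are hypothetical names, and the even/odd logic only stands in for the real per-element work.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type used only for this illustration.
struct Totals {
    long even_sum = 0;
    long odd_count = 0;
};

Totals scan(const std::vector<long>& data) {
    // A do/while body runs at least once, so the input must not be empty.
    assert(!data.empty());

    Totals totals;

    // The repeated segment lives in one place. It captures `totals` by
    // reference, so nothing is passed explicitly between scopes.
    auto step = [&totals](long value) {
        if (value % 2 == 0) {
            totals.even_sum += value;
        } else {
            ++totals.odd_count;
        }
    };

    std::size_t i = 0;
    do {
        step(data[i]);
        ++i;
    } while (i < data.size());

    return totals;
}
```

Whether the call to `step` is actually inlined depends on the compiler and the optimization level, so it is worth measuring rather than assuming.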



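As suggested, I ran some tests using the code provided below. I tested several options for calling a simple increment 100 million times. I am using GCC on RHEL 7 Server 7.6 in an x86_64 virtual machine under Hyper-V.

Initially, I compiled with "g++ -std=c++17 -o test.o test.cpp":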

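  • Simple loop computation (baseline): 211.046 ms

  • Inline function: 468.768 ms

  • Lambda function: 253.466 ms

  • Define macro: 211.995 ms

  • Function passing values: 466.986 ms

  • Function passing pointers: 344.646 ms

  • Function with void: 190.557 ms

  • Object method operating with members: 231.458 ms

  • Object method passing values: 227.615 ms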

From these results, I realized that the compiler was not taking the inline suggestion, even after I tried to force it as suggested in "g++ doesn't inline functions".
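For context, GCC treats `inline` as a hint and does not inline ordinary functions when no optimization level is given. The `always_inline` attribute overrides that, but only for the function it is attached to, so the attributed declaration needs the same parameter list as the definition. A minimal sketch, assuming GCC or Clang; `add_step` and `count_up` are hypothetical names:

```cpp
#include <cassert>

// GCC/Clang-specific attribute. The declaration carries the same
// parameter list as the definition so the attribute applies to it.
inline long add_step(long value, long step) __attribute__((always_inline));
inline long add_step(long value, long step) {
    return value + step;
}

// Counts upward from `start` in increments of `step` until `end` is reached
// or passed, calling the force-inlined helper on every iteration.
long count_up(long start, long step, long end) {
    assert(step > 0);  // a non-positive step would never terminate
    long value = start;
    do {
        value = add_step(value, step);
    } while (value < end);
    return value;
}
```

Inspecting the generated assembly (for example with `g++ -S`) is one way to confirm that no call instruction remains in the loop.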

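Later, as suggested in Mat's answer to that same post, I turned on compiler optimizations with "g++ -std=c++17 -O2 -o test.o test.cpp" and got the following results for the same number of iterations as in the unoptimized test.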

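  • Simple loop computation (baseline): 62.9254 ms

  • Inline function: 65.0564 ms

  • Lambda function: 32.8637 ms

  • Define macro: 63.0299 ms

  • Function passing values: 64.2876 ms

  • Function passing pointers: 63.3416 ms

  • Function with void: 32.1073 ms

  • Object method operating with members: 63.3847 ms

  • Object method passing values: 62.5151 ms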


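  • Inline functions are not a good option, because there is no way to be sure how the compiler will actually treat them, and the result can be as bad as using a standard function.

  • "Define macros" and "lambda functions" are better alternatives to inline. Each has its own benefits and features, with #define being more flexible.

  • Using object members and methods gives a good balance for solving the problem in any situation, while keeping the code in a form that is easier to maintain and optimize.

  • Tweaking the compiler is worth it.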
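To make the macro and lambda options from the list above concrete, here is a minimal side-by-side sketch. The names `ACCUMULATE`, `sum_with_macro`, and `sum_with_lambda` are hypothetical, and the `do { ... } while (0)` wrapper is a common idiom for making a macro behave like a single statement, not something taken from the test program.

```cpp
#include <cassert>

// Hypothetical macro: pure textual substitution, so it works on whatever
// expressions are passed in and never involves a call.
#define ACCUMULATE(total, value) do { (total) += (value); } while (0)

long sum_with_macro(const long* data, long n) {
    assert(n >= 0);
    long total = 0;
    for (long i = 0; i < n; ++i) {
        ACCUMULATE(total, data[i]);
    }
    return total;
}

long sum_with_lambda(const long* data, long n) {
    assert(n >= 0);
    long total = 0;
    // The lambda is type-checked and scoped like ordinary code; it reaches
    // `total` through a by-reference capture instead of a parameter.
    auto accumulate = [&total](long value) { total += value; };
    for (long i = 0; i < n; ++i) {
        accumulate(data[i]);
    }
    return total;
}
```

The macro is flexible because it is untyped text, which is also why mistakes in it surface only at the point of expansion; the lambda gives up some of that flexibility in exchange for normal scoping and type checking.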


// Libraries
    #include <iostream>
    #include <cmath>
    #include <chrono>
    #include <cstdint>
    #include <string>
    #include <utility>

// Namespaces
    using namespace std;
    using namespace std::chrono;

// constants that control program behaviour
    const long END_RESULT = 100000000;
    const double AVERAGING_LENGTH = 40.0;
    const int NUMBER_OF_ALGORITHM = 9;
    const long INITIAL_VALUE = 0;
    const long INCREMENT = 1;

// Global variables used for test with void function and to general control of the program;
    long global_variable;
    long global_increment;

// Function that returns the execution time for a simple loop
int64_t simple_loop_computation(long local_variable, long local_increment) {
    // Starts the clock to measure the execution time for the baseline
        high_resolution_clock::time_point timer_start = high_resolution_clock::now();

    // Perform the computation for baseline
        do {
            local_variable += local_increment;
        } while ( local_variable != END_RESULT);

    // Stop the clock to measure performance of the silly version
        high_resolution_clock::time_point timer_stop = high_resolution_clock::now();

        return(duration_cast<microseconds>( timer_stop - timer_start ).count());
}

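// Functions that compute the execution time when using inline code within the loop
// (the attribute applies only if this declaration matches the definition's parameter list)
inline long increment_variable(long local_variable, long local_increment) __attribute__((always_inline));
inline long increment_variable(long local_variable, long local_increment) {
    return local_variable += local_increment;
}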

int64_t inline_computation(long local_variable, long local_increment) {
    // Starts the clock to measure the execution time for the baseline
        high_resolution_clock::time_point timer_start = high_resolution_clock::now();

    // Perform the computation for baseline
        do {
            local_variable = increment_variable(local_variable,local_increment);
        } while ( local_variable != END_RESULT);

    // Stop the clock to measure performance of the silly version
        high_resolution_clock::time_point timer_stop = high_resolution_clock::now();

        return duration_cast<microseconds>( timer_stop - timer_start ).count();
}

// Functions that computes the execution time when using lambda code within the loop
int64_t labda_computation(long local_variable, long local_increment) {
    // Starts the clock to measure the execution time for the baseline
        high_resolution_clock::time_point timer_start = high_resolution_clock::now();

    // define lambda function
        auto lambda_increment = [&] {
            local_variable += local_increment;
        };

    // Perform the computation using the lambda
        do {
            lambda_increment();
        } while ( local_variable != END_RESULT);

    // Stop the clock to measure performance of the lambda version
        high_resolution_clock::time_point timer_stop = high_resolution_clock::now();

        return duration_cast<microseconds>( timer_stop - timer_start ).count();
}

// define macro (expands in place, using the local names of the enclosing function)
    #define define_increment() local_variable += local_increment

// Function that computes the execution time when using a macro within the loop
int64_t define_computation(long local_variable, long local_increment) {
    // Starts the clock to measure the execution time for the macro version
        high_resolution_clock::time_point timer_start = high_resolution_clock::now();

    // Perform the computation using the macro
        do {
            define_increment();
        } while ( local_variable != END_RESULT);

    // Stop the clock to measure performance of the macro version
        high_resolution_clock::time_point timer_stop = high_resolution_clock::now();

        return duration_cast<microseconds>( timer_stop - timer_start ).count();
}

// Functions that compute the execution time when calling a function within the loop passing variable values
long increment_with_values_function(long local_variable, long local_increment) {
    return local_variable += local_increment;
}

int64_t function_values_computation(long local_variable, long local_increment) {
    // Starts the clock to measure the execution time for the baseline
        high_resolution_clock::time_point timer_start = high_resolution_clock::now();

    // Perform the computation for baseline
        do {
            local_variable = increment_with_values_function(local_variable,local_increment);
        } while ( local_variable != END_RESULT);

    // Stop the clock to measure performance of the silly version
        high_resolution_clock::time_point timer_stop = high_resolution_clock::now();

        return duration_cast<microseconds>( timer_stop - timer_start ).count();
}

// Functions that compute the execution time when calling a function within the loop passing variable pointers
long increment_with_pointers_function(long *local_variable, long *local_increment) {
    return *local_variable += *local_increment;
}

int64_t function_pointers_computation(long local_variable, long local_increment) {
    // Starts the clock to measure the execution time for the baseline
        high_resolution_clock::time_point timer_start = high_resolution_clock::now();

    // Perform the computation for baseline
        do {
            local_variable = increment_with_pointers_function(&local_variable,&local_increment);
        } while ( local_variable != END_RESULT);

    // Stop the clock to measure performance of the silly version
        high_resolution_clock::time_point timer_stop = high_resolution_clock::now();

        return duration_cast<microseconds>( timer_stop - timer_start ).count();
}

// Functions that compute the execution time when calling a function within the loop without passing variables 
void increment_with_void_function(void) {
    global_variable += global_increment;
}

int64_t function_void_computation(long local_variable, long local_increment) {
    // Starts the clock to measure the execution time for the baseline
        high_resolution_clock::time_point timer_start = high_resolution_clock::now();

    // set global variables
        global_variable = local_variable;
        global_increment = local_increment;

    // Perform the computation through the void function, which works on the globals
        do {
            increment_with_void_function();
        } while ( global_variable != END_RESULT);

    // Stop the clock to measure performance of the void-function version
        high_resolution_clock::time_point timer_stop = high_resolution_clock::now();

        return duration_cast<microseconds>( timer_stop - timer_start ).count();
}

// Object and Function that compute the duration when using a method of the object where data is stored without passing variables
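struct object {
    long object_variable = 0;
    long object_increment = 1;

    object(long local_variable, long local_increment) {
        object_variable = local_variable;
        object_increment = local_increment;
    }

    // Increment using the increment stored in the object
    void increment_object(void){
        object_variable += object_increment;
    }

    // Increment using a value passed by the caller
    void increment_object_with_value(long local_increment){
        object_variable += local_increment;
    }
};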

int64_t object_members_computation(long local_variable, long local_increment) {
    // Starts the clock to measure the execution time for the baseline
        high_resolution_clock::time_point timer_start = high_resolution_clock::now();

    // Create object
        object object_instance = {local_variable,local_increment};

    // Perform the computation for baseline
        do {
            object_instance.increment_object();
        } while ( object_instance.object_variable != END_RESULT);

    // Get the results out of the object
        local_variable = object_instance.object_variable;

    // Stop the clock to measure performance of the object-members version
        high_resolution_clock::time_point timer_stop = high_resolution_clock::now();

        return duration_cast<microseconds>( timer_stop - timer_start ).count();
}

// Function that compute the duration when using a method of the object where data is stored passing variables
int64_t object_values_computation(long local_variable, long local_increment) {
    // Starts the clock to measure the execution time for the baseline
        high_resolution_clock::time_point timer_start = high_resolution_clock::now();

    // Create object
        object object_instance = {local_variable,local_increment};

    // Perform the computation for baseline
        do {
            object_instance.increment_object_with_value(local_increment);
        } while ( object_instance.object_variable != END_RESULT);

    // Get the results out of the object
        local_variable = object_instance.object_variable;

    // Stop the clock to measure performance of the object-values version
        high_resolution_clock::time_point timer_stop = high_resolution_clock::now();

        return duration_cast<microseconds>( timer_stop - timer_start ).count();
}

int main() {

    // Create array to store execution time results for all tests
        pair<string,int64_t> duration_sum[NUMBER_OF_ALGORITHM]={
            make_pair("Simple loop computation (baseline): ",0),
            make_pair("Inline Function: ",0),
            make_pair("Lambda Function: ",0),
            make_pair("Define Macro: ",0),
            make_pair("Function passing values: ",0),
            make_pair("Function passing pointers: ",0),
            make_pair("Function with void: ",0),
            make_pair("Object method operating with members: ",0),
            make_pair("Object method passing values: ",0)
        };

    // loop to compute average of several execution times
        for ( int i = 0; i < AVERAGING_LENGTH; i++) {
            // Compute the execution time for a simple loop as the baseline
                duration_sum[0].second = duration_sum[0].second + simple_loop_computation(INITIAL_VALUE, INCREMENT);

            // Compute the execution time when using inline code within the loop (expected same as baseline)
                duration_sum[1].second = duration_sum[1].second + inline_computation(INITIAL_VALUE, INCREMENT);

            // Compute the execution time when using lambda code within the loop (expected same as baseline)
                duration_sum[2].second = duration_sum[2].second + labda_computation(INITIAL_VALUE, INCREMENT);

            // Compute the duration when using a define macro
                duration_sum[3].second = duration_sum[3].second + define_computation(INITIAL_VALUE, INCREMENT);

            // Compute the execution time when calling a function within the loop passing variables values
                duration_sum[4].second = duration_sum[4].second + function_values_computation(INITIAL_VALUE, INCREMENT);

            // Compute the execution time when calling a function within the loop passing variables pointers
                duration_sum[5].second = duration_sum[5].second + function_pointers_computation(INITIAL_VALUE, INCREMENT);

            // Compute the execution time when calling a function within the loop without passing variables
                duration_sum[6].second = duration_sum[6].second + function_void_computation(INITIAL_VALUE, INCREMENT);

            // Compute the duration when using a method of the object where data is stored without passing variables
                duration_sum[7].second = duration_sum[7].second + object_members_computation(INITIAL_VALUE, INCREMENT);

            // Compute the duration when using a method of the object where data is stored passing variables
                duration_sum[8].second = duration_sum[8].second + object_values_computation(INITIAL_VALUE, INCREMENT);
        }

        double average_baseline_duration = 0.0;

    // Print out results
        for ( int i = 0; i < NUMBER_OF_ALGORITHM; i++) {
        // compute average from sum (microseconds to milliseconds)
            average_baseline_duration = ((double)duration_sum[i].second/AVERAGING_LENGTH)/1000.0;

        // Print the result
            cout << duration_sum[i].first << average_baseline_duration << "ms \n";
        }

    return 0;
}
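Each test function above repeats the same start/stop timing code around its loop. As a possible simplification, the timing can be factored into a small template that accepts any callable. This is a sketch under assumptions: `time_microseconds` and `average_milliseconds` are hypothetical helpers, and `steady_clock` is swapped in for `high_resolution_clock` because it is guaranteed to be monotonic.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical helper: runs `work` once and returns the elapsed microseconds.
template <typename Work>
std::int64_t time_microseconds(Work&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}

// Hypothetical helper: averages `repeats` timed runs and converts to milliseconds.
template <typename Work>
double average_milliseconds(Work&& work, int repeats) {
    assert(repeats > 0);
    std::int64_t total = 0;
    for (int i = 0; i < repeats; ++i) {
        total += time_microseconds(work);
    }
    return static_cast<double>(total) / repeats / 1000.0;
}
```

A test could then be written as `average_milliseconds([&] { /* loop under test */ }, 40)`, keeping each variant down to the loop it actually measures. `high_resolution_clock` may be an alias of `system_clock` on some implementations, which can be adjusted while the program runs.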




Regarding c++ - How to repeat a code segment without using a function or class for a high-performance loop in C++, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/55383866/

