C++ 元组与结构

使用 std::tuple 和纯数据 struct 有什么区别吗？

typedef std::tuple<int, double, bool> foo_t;

struct bar_t {
    int id;
    double value;
    bool dirty;
}

从网上查到的，主要有两个区别:struct可读性更强，而tuple有很多通用函数可以使用. 是否应该有任何显着的性能差异？另外，数据布局是否相互兼容(可互换)？

最佳答案

我们对 tuple 和 struct 进行了类似的讨论，我在一位同事的帮助下编写了一些简单的基准测试，以确定 tuple 和 struct 在性能方面的差异。我们首先从一个默认结构和一个元组开始。

struct StructData {
    int X;
    int Y;
    double Cost;
    std::string Label;

    bool operator==(const StructData &rhs) {
        return std::tie(X,Y,Cost, Label) == std::tie(rhs.X, rhs.Y, rhs.Cost, rhs.Label);
    }

    bool operator<(const StructData &rhs) {
        return X < rhs.X || (X == rhs.X && (Y < rhs.Y || (Y == rhs.Y && (Cost < rhs.Cost || (Cost == rhs.Cost && Label < rhs.Label)))));
    }
};

using TupleData = std::tuple<int, int, double, std::string>;

然后我们使用 Celero 来比较我们的简单结构和元组的性能。以下是使用 gcc-4.9.2 和 clang-4.0.0 收集的基准代码和性能结果:

std::vector<StructData> test_struct_data(const size_t N) {
    std::vector<StructData> data(N);
    std::transform(data.begin(), data.end(), data.begin(), [N](auto item) {
        std::random_device rd;
        std::mt19937 gen(rd());
        std::uniform_int_distribution<> dis(0, N);
        item.X = dis(gen);
        item.Y = dis(gen);
        item.Cost = item.X * item.Y;
        item.Label = std::to_string(item.Cost);
        return item;
    });
    return data;
}

std::vector<TupleData> test_tuple_data(const std::vector<StructData> &input) {
    std::vector<TupleData> data(input.size());
    std::transform(input.cbegin(), input.cend(), data.begin(),
                   [](auto item) { return std::tie(item.X, item.Y, item.Cost, item.Label); });
    return data;
}

constexpr int NumberOfSamples = 10;
constexpr int NumberOfIterations = 5;
constexpr size_t N = 1000000;
auto const sdata = test_struct_data(N);
auto const tdata = test_tuple_data(sdata);

CELERO_MAIN

BASELINE(Sort, struct, NumberOfSamples, NumberOfIterations) {
    std::vector<StructData> data(sdata.begin(), sdata.end());
    std::sort(data.begin(), data.end());
    // print(data);

}

BENCHMARK(Sort, tuple, NumberOfSamples, NumberOfIterations) {
    std::vector<TupleData> data(tdata.begin(), tdata.end());
    std::sort(data.begin(), data.end());
    // print(data);
}

使用 clang-4.0.0 收集的性能结果

Celero
Timer resolution: 0.001000 us
-----------------------------------------------------------------------------------------------------------------------------------------------
     Group      |   Experiment    |   Prob. Space   |     Samples     |   Iterations    |    Baseline     |  us/Iteration   | Iterations/sec  | 
-----------------------------------------------------------------------------------------------------------------------------------------------
Sort            | struct          | Null            |              10 |               5 |         1.00000 |    196663.40000 |            5.08 | 
Sort            | tuple           | Null            |              10 |               5 |         0.92471 |    181857.20000 |            5.50 | 
Complete.

以及使用 gcc-4.9.2 收集的性能结果

Celero
Timer resolution: 0.001000 us
-----------------------------------------------------------------------------------------------------------------------------------------------
     Group      |   Experiment    |   Prob. Space   |     Samples     |   Iterations    |    Baseline     |  us/Iteration   | Iterations/sec  | 
-----------------------------------------------------------------------------------------------------------------------------------------------
Sort            | struct          | Null            |              10 |               5 |         1.00000 |    219096.00000 |            4.56 | 
Sort            | tuple           | Null            |              10 |               5 |         0.91463 |    200391.80000 |            4.99 | 
Complete.

从上面的结果我们可以清楚的看到

元组比默认结构更快
clang 生成的二进制文件比 gcc 具有更高的性能。 clang-vs-gcc 不是这个讨论的目的，所以我不会深入细节。

我们都知道为每个单独的结构定义编写 == 或 < 或 > 运算符将是一项痛苦且有缺陷的任务。让我们使用 std::tie 替换我们的自定义比较器并重新运行我们的基准测试。

bool operator<(const StructData &rhs) {
    return std::tie(X,Y,Cost, Label) < std::tie(rhs.X, rhs.Y, rhs.Cost, rhs.Label);
}

Celero
Timer resolution: 0.001000 us
-----------------------------------------------------------------------------------------------------------------------------------------------
     Group      |   Experiment    |   Prob. Space   |     Samples     |   Iterations    |    Baseline     |  us/Iteration   | Iterations/sec  | 
-----------------------------------------------------------------------------------------------------------------------------------------------
Sort            | struct          | Null            |              10 |               5 |         1.00000 |    200508.20000 |            4.99 | 
Sort            | tuple           | Null            |              10 |               5 |         0.90033 |    180523.80000 |            5.54 | 
Complete.

现在我们可以看到，使用 std::tie 使我们的代码更优雅，更不容易出错，但是，我们会损失大约 1% 的性能。我现在将继续使用 std::tie 解决方案，因为我还会收到有关将 float 与自定义比较器进行比较的警告。

到目前为止，我们还没有任何解决方案可以让我们的结构代码运行得更快。让我们看一下swap函数并重写它，看看我们是否可以获得任何性能:

struct StructData {
    int X;
    int Y;
    double Cost;
    std::string Label;

    bool operator==(const StructData &rhs) {
        return std::tie(X,Y,Cost, Label) == std::tie(rhs.X, rhs.Y, rhs.Cost, rhs.Label);
    }

    void swap(StructData & other)
    {
        std::swap(X, other.X);
        std::swap(Y, other.Y);
        std::swap(Cost, other.Cost);
        std::swap(Label, other.Label);
    }  

    bool operator<(const StructData &rhs) {
        return std::tie(X,Y,Cost, Label) < std::tie(rhs.X, rhs.Y, rhs.Cost, rhs.Label);
    }
};

使用 clang-4.0.0 收集的性能结果

Celero
Timer resolution: 0.001000 us
-----------------------------------------------------------------------------------------------------------------------------------------------
     Group      |   Experiment    |   Prob. Space   |     Samples     |   Iterations    |    Baseline     |  us/Iteration   | Iterations/sec  | 
-----------------------------------------------------------------------------------------------------------------------------------------------
Sort            | struct          | Null            |              10 |               5 |         1.00000 |    176308.80000 |            5.67 | 
Sort            | tuple           | Null            |              10 |               5 |         1.02699 |    181067.60000 |            5.52 | 
Complete.

以及使用gcc-4.9.2收集的性能结果

Celero
Timer resolution: 0.001000 us
-----------------------------------------------------------------------------------------------------------------------------------------------
     Group      |   Experiment    |   Prob. Space   |     Samples     |   Iterations    |    Baseline     |  us/Iteration   | Iterations/sec  | 
-----------------------------------------------------------------------------------------------------------------------------------------------
Sort            | struct          | Null            |              10 |               5 |         1.00000 |    198844.80000 |            5.03 | 
Sort            | tuple           | Null            |              10 |               5 |         1.00601 |    200039.80000 |            5.00 | 
Complete.

现在我们的结构体比元组快一点(clang 大约 3%，gcc 不到 1%)，但是，我们确实需要为所有结构体编写自定义交换函数。

关于C++ 元组与结构，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/5852261/

上一篇：c++ - 为什么 C 和 C++ 支持在结构中按成员分配数组，但通常不支持？

下一篇：c++ - 使用 c++11 复制代码