我正在做一些涉及 std:vector
的快速基准测试。我从一个相对较小的 100 个整数 vector 开始,并调用各种方法用 1,000,000 个整数填充它。我的大多数功能涉及清除元素并再次添加元素或创建新 vector 并将其移动或与原始 vector 交换。我还有一个函数,可以调整 vector 大小并覆盖元素。
您可以在下面的代码中看到这些函数。有趣的是,调整 vector 大小并覆盖元素是迄今为止最快的。我认为在推送元素之前保留内存会提高性能。
我知道 std::vector::resize()
将调整 vector 的大小以包含新的计数。根据cppreference :
If the current size is less than count, additional elements are appended and initialized with copies of value.
resize()
应该比其他函数少构造 100 个整数。所以我对速度的差异感到惊讶。我认为resize()会分配并初始化新元素,而reserve只会分配内存。
#include <algorithm>
#include <chrono>
#include <iostream>
constexpr int InitialSize = 100;
constexpr int NewSize = 1000000;
void overwrite(std::vector<int>& v)
{
v.resize(NewSize);
for (int i = 0; i < NewSize; ++i)
{
v[i] = i;
}
}
void clear(std::vector<int>& v)
{
v.clear();
v.reserve(NewSize);
for (int i = 0; i < NewSize; ++i)
{
v.push_back(i);
}
}
void swap(std::vector<int> &v)
{
std::vector<int> vnew;
vnew.reserve(NewSize);
for (int i = 0; i < NewSize; ++i)
{
vnew.push_back(i);
}
v.swap(vnew);
}
void move(std::vector<int> &v)
{
std::vector<int> vnew;
vnew.reserve(NewSize);
for (int i = 0; i < NewSize; ++i)
{
vnew.push_back(i);
}
v = std::move(vnew);
}
int main()
{
{
std::vector<int> v(InitialSize);
std::iota(v.begin(), v.end(), 1);
auto start = std::chrono::high_resolution_clock::now();
move(v);
auto finish = std::chrono::high_resolution_clock::now();
std::chrono::duration<double, std::milli> elapsed = finish - start;
std::cout << "Move - elapsed time: " << elapsed.count() << " ms" << std::endl;
}
{
std::vector<int> v(InitialSize);
std::iota(v.begin(), v.end(), 1);
auto start = std::chrono::high_resolution_clock::now();
clear(v);
auto finish = std::chrono::high_resolution_clock::now();
std::chrono::duration<double, std::milli> elapsed = finish - start;
std::cout << "Clear - elapsed time: " << elapsed.count() << " ms" << std::endl;
}
{
std::vector<int> v(InitialSize);
std::iota(v.begin(), v.end(), 1);
auto start = std::chrono::high_resolution_clock::now();
swap(v);
auto finish = std::chrono::high_resolution_clock::now();
std::chrono::duration<double, std::milli> elapsed = finish - start;
std::cout << "Swap - elapsed time: " << elapsed.count() << " ms" << std::endl;
}
{
std::vector<int> v(InitialSize);
std::iota(v.begin(), v.end(), 1);
auto start = std::chrono::high_resolution_clock::now();
overwrite(v);
auto finish = std::chrono::high_resolution_clock::now();
std::chrono::duration<double, std::milli> elapsed = finish - start;
std::cout << "Overwrite - elapsed time: " << elapsed.count() << " ms" << std::endl;
}
return 0;
}
输出:
Move - elapsed time: 14.6284 ms
Clear - elapsed time: 17.5072 ms
Swap - elapsed time: 12.9111 ms
Overwrite - elapsed time: 5.19079 ms
快速工作台results .
有人能解释一下这是怎么回事吗?
最佳答案
push_back 是比基于索引的访问成本更高的操作,即使分配已经由保留预先处理好。
- push_back 需要处理结束指针,以便正确计算 vector 大小
- push_back 将检查是否需要重新分配。本质上是分支预测。
- push_back 将导致值的复制(或移动)被推回。如果是 int,则不会导致性能差异。
如果您看到汇编转换(取自问题中给出的 godbolt 链接),则索引操作是少量移动和移位操作的非分支序列,而 push_back 涉及更多。在长时间运行的循环中(给定示例中为 1000000),这种差异很重要。编译器优化级别肯定会影响差异。
对于索引运算符[]
push rbp
mov rbp, rsp
mov QWORD PTR [rbp-8], rdi
mov QWORD PTR [rbp-16], rsi
mov rax, QWORD PTR [rbp-8]
mov rax, QWORD PTR [rax]
mov rdx, QWORD PTR [rbp-16]
sal rdx, 2
add rax, rdx
pop rbp
ret
对于push_back
push rbp
mov rbp, rsp
sub rsp, 16
mov QWORD PTR [rbp-8], rdi
mov QWORD PTR [rbp-16], rsi
mov rax, QWORD PTR [rbp-8]
mov rdx, QWORD PTR [rax+8]
mov rax, QWORD PTR [rbp-8]
mov rax, QWORD PTR [rax+16]
cmp rdx, rax
je .L73
mov rax, QWORD PTR [rbp-8] // When allocation is not needed
mov rcx, QWORD PTR [rax+8]
mov rax, QWORD PTR [rbp-8]
mov rdx, QWORD PTR [rbp-16]
mov rsi, rcx
mov rdi, rax
call void std::allocator_traits<std::allocator<int> >::construct<int, int const&>(std::allocator<int>&, int*, int const&)
mov rax, QWORD PTR [rbp-8]
mov rax, QWORD PTR [rax+8]
lea rdx, [rax+4]
mov rax, QWORD PTR [rbp-8]
mov QWORD PTR [rax+8], rdx
jmp .L75
.L73: // When allocation is needed
mov rax, QWORD PTR [rbp-8]
mov rdi, rax
call std::vector<int, std::allocator<int> >::end()
mov rcx, rax
mov rdx, QWORD PTR [rbp-16]
mov rax, QWORD PTR [rbp-8]
mov rsi, rcx
mov rdi, rax
.L75:
nop
leave
ret
关于c++ - 为什么调整 vector 大小比 Reserve 和 Push_back 更快?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/60583054/