C++ heap memory performance improvement

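I'm writing a function that needs a significant amount of heap memory. Is it possible to tell the compiler that this data will be accessed frequently within a specific for loop, in order to improve performance (through compile options or something similar)?

The reason I can't use the stack is that the number of elements I need to store is large, and I get a segmentation fault if I try.

Right now the code works, but I think it could be faster.

Update: I'm doing something like this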

vector< set<uint> > vec(node_vec.size());
for (uint i = 0; i < node_vec.size(); i++) {
  for (uint j = i + 1; j < node_vec.size(); j++) {
    // some computation, basic math, store the result in variable x
    if (x > threshold) {
      // record the pair in both directions
      vec[i].insert(j);
      vec[j].insert(i);
    }
  }
}

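Some details:
- I tried hash_set and saw almost no improvement; besides, hash_set is not available on all the machines I use for simulations
- I tried allocating vec on the stack using arrays but, as I said, I can get a segmentation fault if the number of elements is too large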

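If node_vec.size() is equal to k, where k is on the order of a few thousand, I expect vec to be 4 or 5 times bigger than node_vec. For this order of magnitude the code seems slow, considering that I have to run it many times. Of course, I'm using multithreading to parallelize these calls, but I can't get the function itself to run much faster than what I'm seeing right now.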

Is it possible, for example, to have vec allocated in cache memory for fast data retrieval, or something similar?

Best answer

I'm writing a function where I need a significant amount of heap memory ... will be accessed frequently within a specific for loop

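This isn't something you can really optimize at the compiler level. I think your concern is that you have a lot of memory that may be "stale" (paged out), but at a particular point in time you will need to iterate over all of it, maybe several times, and you don't want those memory pages to be paged out to disk.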

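You will need to investigate platform-specific strategies to improve performance. Keeping the pages in memory can be achieved with mlockall or VirtualLock, but you really shouldn't need to do this. Make sure you understand the implications of locking your application's memory pages into RAM, however: you are taking memory away from other processes.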
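For readers who do want to try page locking, here is a minimal sketch, assuming a POSIX system where mlock and munlock are declared in <sys/mman.h>. It locks a single buffer rather than the whole process, which is a narrower variation on mlockall. The helper names try_lock_in_ram and unlock_from_ram are hypothetical and introduced only for illustration; the Windows counterpart, VirtualLock, is not shown.

```cpp
#include <sys/mman.h>  // mlock, munlock (POSIX; not available on Windows)
#include <cerrno>
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <vector>

// Hypothetical helper: tries to pin the storage of `data` in RAM.
// Returns true on success. On failure the data stays fully usable;
// it simply remains pageable.
bool try_lock_in_ram(const std::vector<double>& data) {
    if (data.empty()) return true;
    const std::size_t bytes = data.size() * sizeof(double);
    if (mlock(data.data(), bytes) != 0) {
        std::fprintf(stderr, "mlock failed: %s\n", std::strerror(errno));
        return false;
    }
    return true;
}

// Hypothetical helper: releases a lock taken by try_lock_in_ram.
// Call it before the vector is resized or destroyed.
void unlock_from_ram(const std::vector<double>& data) {
    if (!data.empty()) {
        munlock(data.data(), data.size() * sizeof(double));
    }
}
```

The call can fail without elevated privileges once the request exceeds the process's locked-memory limit (RLIMIT_MEMLOCK), so the sketch treats failure as non-fatal and lets the program carry on with ordinary pageable memory.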

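You may also want to investigate a low fragmentation heap (although it may not be relevant to this problem at all) and this page, which describes cache lines with respect to for loops.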

The latter page covers the nitty-gritty of how CPUs work with respect to memory access (details you usually don't have to bother with).

Example 1: Memory accesses and performance

How much faster do you expect Loop 2 to run, compared to Loop 1?
int[] arr = new int[64 * 1024 * 1024];

// Loop 1
for (int i = 0; i < arr.Length; i++) arr[i] *= 3;

// Loop 2
for (int i = 0; i < arr.Length; i += 16) arr[i] *= 3;
The first loop multiplies every value in the array by 3, and the second loop multiplies only every 16-th. The second loop only does about 6% of the work of the first loop, but on modern machines, the two for-loops take about the same time: 80 and 78 ms respectively on my machine.
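
One way to act on this observation in the loop from the question is to keep each node's neighbors in contiguous storage instead of a node-based set. The following is only a sketch of that idea, not something proposed in the answer above. It assumes the loop runs single-threaded in exactly the i < j order shown in the question. The names build_neighbors, has_neighbor and score are hypothetical; score stands in for the "some computation" step.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Builds, for every node, the list of other nodes whose pair score exceeds
// `threshold`. Because pairs are visited with i < j in increasing order,
// each list ends up sorted and free of duplicates without any extra work:
// node i first receives smaller indices (in ascending order, from earlier
// outer iterations) and then larger ones (ascending, during its own).
template <typename Score>
std::vector<std::vector<unsigned>> build_neighbors(std::size_t n,
                                                   double threshold,
                                                   Score score) {
    std::vector<std::vector<unsigned>> vec(n);
    for (unsigned i = 0; i < n; ++i) {
        for (unsigned j = i + 1; j < n; ++j) {
            const double x = score(i, j);  // placeholder for the real math
            if (x > threshold) {
                vec[i].push_back(j);
                vec[j].push_back(i);
            }
        }
    }
    return vec;
}

// Membership test on a sorted neighbor list, replacing set::count.
inline bool has_neighbor(const std::vector<unsigned>& neighbors, unsigned node) {
    return std::binary_search(neighbors.begin(), neighbors.end(), node);
}
```

Each inner vector stores its elements back to back, so scanning a node's neighbors walks through whole cache lines, whereas a set allocates every element as a separate tree node. Whether this helps in practice depends on how much time goes into the math versus the inserts, so it is worth measuring before committing to it.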

This question and answer on C++ heap memory performance improvement come from a similar question on Stack Overflow: https://stackoverflow.com/questions/5683972/
