c++ - Loading large amounts of binary data into RAM

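My application needs to load binary data (multiple files), ranging from megabytes to tens of gigabytes, into RAM. After some searching, I decided to use std::vector<unsigned char> for this, although I'm not sure it is the best choice.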

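I would use one vector per file. Since the application knows the file sizes in advance, it calls reserve() to allocate memory for each one. Sometimes the application needs to read a whole file, while in other cases it only needs part of a file, and vector's iterators fit that well. It may also need to unload a file from RAM and load another in its place, where std::vector::swap() and std::vector::shrink_to_fit() would be useful. I don't want to do the hard work of dealing with low-level memory allocation (otherwise I would have chosen C).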
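As a minimal sketch of the one-vector-per-file approach (assuming C++11 or later; `load_file` and `unload` are hypothetical names), a file could be read into a vector like this. Note that `reserve()` alone is not enough for reading into the vector: it changes `capacity()` but not `size()`, so the sketch sizes the vector up front instead.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: reads the whole file at `path` into a byte vector.
std::vector<unsigned char> load_file(const std::string& path) {
    // Open at the end (ios::ate) so tellg() reports the file size.
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in) throw std::runtime_error("cannot open " + path);
    const std::streamsize size = static_cast<std::streamsize>(in.tellg());
    if (size < 0) throw std::runtime_error("cannot determine size of " + path);
    in.seekg(0, std::ios::beg);

    // reserve() would only set capacity(); size() would stay 0, so reading
    // into data() would be undefined. Sizing the vector up front makes the
    // bytes addressable (they are zero-filled, then overwritten by read()).
    std::vector<unsigned char> buf(static_cast<std::size_t>(size));
    if (size > 0 && !in.read(reinterpret_cast<char*>(buf.data()), size))
        throw std::runtime_error("read failed: " + path);
    return buf;
}

// Hypothetical helper: releases a file's memory by swapping with an empty
// vector. clear() followed by shrink_to_fit() is only a non-binding request.
void unload(std::vector<unsigned char>& buf) {
    std::vector<unsigned char>().swap(buf);
}
```

Swapping with a temporary is used for unloading because, unlike `shrink_to_fit()`, it reliably hands the old buffer to the temporary, which frees it when destroyed.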

I have a few questions:

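The application must load as many files from a list into RAM as possible. How would it know if there is enough memory space to load one more file? Should it call reserve() and check for errors? How? The reference only says that reserve() throws an exception when the requested size is greater than std::vector::max_size.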
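Is std::vector<unsigned char> suitable for holding such large amounts of binary data in RAM? I'm worried about std::vector::max_size, because its reference says its value depends on system or implementation limits. I assume the system limit is the available RAM, right? If so, no problem. But what about implementation limits? Is there anything about implementations that could prevent me from doing what I want? If so, please suggest an alternative.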
What if I want to use all of the RAM except N GB? Is the best way really to call sysinfo() and decide from the available RAM whether each file can be loaded?

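Note: this part of the application must achieve the highest possible performance (low processing time/CPU usage and low RAM consumption). Thanks a lot for your help.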

Best answer

How would it know if there is enough memory space to load one more file?

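You won't know in advance. Wrap the loading process in a try/catch. If memory runs out, std::bad_alloc will be thrown (assuming you use the default allocator). Write the loading code as if memory were plentiful, and deal with running out of memory in the exception handler.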
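A minimal sketch of that try/catch approach, assuming the default allocator (`try_allocate` is a hypothetical name). It also catches std::length_error, which typical implementations throw when the requested size exceeds `max_size()`. One caveat: on Linux with memory overcommit enabled (the default), a large allocation may succeed and the process may later be killed by the OOM killer when the pages are actually touched, instead of ever seeing std::bad_alloc.

```cpp
#include <cassert>
#include <cstddef>
#include <new>        // std::bad_alloc
#include <stdexcept>  // std::length_error
#include <vector>

// Hypothetical helper: tries to obtain an n-byte buffer. Returns false
// instead of propagating if the allocation fails, so the caller can skip
// this file and move on.
bool try_allocate(std::size_t n, std::vector<unsigned char>& out) {
    try {
        std::vector<unsigned char> buf(n);  // may throw
        out.swap(buf);                      // commit only on success
        return true;
    } catch (const std::bad_alloc&) {
        return false;  // the allocator could not provide n bytes
    } catch (const std::length_error&) {
        return false;  // n exceeds max_size() (typical implementations)
    }
}
```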

But what about implementations limitation? ... Are there anything regarding to implementations that could prevent me from doing what I want to?

You can check std::vector::max_size() at runtime to verify this.

If the program is compiled for a 64-bit word size, vector will very likely have a max_size large enough to hold several hundred GB.
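A small sketch of that runtime check (`fits_in_vector` and `report_limits` are hypothetical names). Keep in mind that `max_size()` is only an upper bound on what the container can represent; it says nothing about how much memory can actually be allocated.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <vector>

// Hypothetical check: can a std::vector<unsigned char> on this
// implementation even represent a buffer of `bytes` bytes?
bool fits_in_vector(std::uint64_t bytes) {
    const std::vector<unsigned char> probe;
    return bytes <= static_cast<std::uint64_t>(probe.max_size());
}

// Prints the pointer width and the implementation's max_size().
void report_limits() {
    std::cout << "sizeof(void*) = " << sizeof(void*) << '\n'
              << "vector<unsigned char>::max_size() = "
              << std::vector<unsigned char>().max_size() << '\n';
}
```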

This section of the application must be get the more performance

This conflicts with

I don't want to have the hard work of dealing with low level memory allocation stuff

But if the low-level memory work is worth it for the performance, you can memory-map the files instead.
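A minimal, POSIX-only sketch of memory-mapping a file read-only (Linux/macOS; Windows would need CreateFileMapping/MapViewOfFile instead). `MappedFile` is a hypothetical name; wrapping the mapping in an RAII class keeps most of the low-level work out of the calling code.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <fcntl.h>     // open
#include <sys/mman.h>  // mmap, munmap
#include <sys/stat.h>  // fstat
#include <unistd.h>    // close, write

// Hypothetical RAII wrapper around a read-only POSIX memory mapping.
class MappedFile {
public:
    explicit MappedFile(const std::string& path) {
        int fd = ::open(path.c_str(), O_RDONLY);
        if (fd < 0) throw std::runtime_error("open failed: " + path);
        struct stat st;
        if (::fstat(fd, &st) != 0) {
            ::close(fd);
            throw std::runtime_error("fstat failed: " + path);
        }
        size_ = static_cast<std::size_t>(st.st_size);
        if (size_ > 0) {  // mmap() rejects a length of 0
            void* p = ::mmap(nullptr, size_, PROT_READ, MAP_PRIVATE, fd, 0);
            if (p == MAP_FAILED) {
                ::close(fd);
                throw std::runtime_error("mmap failed: " + path);
            }
            data_ = static_cast<const unsigned char*>(p);
        }
        ::close(fd);  // the mapping stays valid after the descriptor is closed
    }
    ~MappedFile() {
        if (data_) ::munmap(const_cast<unsigned char*>(data_), size_);
    }
    MappedFile(const MappedFile&) = delete;
    MappedFile& operator=(const MappedFile&) = delete;

    const unsigned char* data() const { return data_; }
    std::size_t size() const { return size_; }
    // Raw pointers are random-access iterators, so <algorithm> works directly.
    const unsigned char* begin() const { return data_; }
    const unsigned char* end() const { return data_ + size_; }

private:
    const unsigned char* data_ = nullptr;
    std::size_t size_ = 0;
};
```

The kernel pages the file in on demand, so opening a large mapping is cheap; the cost is paid as the bytes are first touched.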

I've read on some SO questions to avoid them on applications that need high performance and prefer dealing with return values, errno, etc

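Unfortunately, non-throwing memory allocation is not an option if you use the standard containers. If you are allergic to exceptions, you will have to use a different implementation of vector, or of whichever container you decide to use. With mmap, though, you don't need any container at all.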
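For comparison, outside the standard containers C++ does offer a non-throwing allocation path: `new (std::nothrow)` returns a null pointer instead of throwing std::bad_alloc, so failure can be checked like a return value. A sketch, where `allocate_nothrow` is a hypothetical name; unlike `std::vector<unsigned char>(n)`, this also leaves the bytes uninitialized rather than zero-filling them.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>  // std::nothrow

// Hypothetical helper: allocates n bytes without exceptions. Returns an
// empty unique_ptr on failure, which the caller checks like an error code.
std::unique_ptr<unsigned char[]> allocate_nothrow(std::size_t n) {
    return std::unique_ptr<unsigned char[]>(new (std::nothrow) unsigned char[n]);
}
```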

Won't handling exceptions break performance?

Fortunately, the runtime cost of exceptions is negligible compared with reading hundreds of GB from disk.
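A rough back-of-the-envelope check, assuming (illustrative figures, not measurements) a sustained read rate of 0.5 GB/s for 100 GB of data and a generous 10 µs for one throw/catch:

$$
t_{\text{read}} = \frac{100\ \text{GB}}{0.5\ \text{GB/s}} = 200\ \text{s},
\qquad
\frac{t_{\text{throw}}}{t_{\text{read}}} \approx \frac{10^{-5}\ \text{s}}{200\ \text{s}} = 5 \times 10^{-8}
$$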

May it be better to run sysinfo() and work on checking free RAM before loading a file?

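A sysinfo() call may well be much slower than handling an exception (I haven't measured it; this is only a guess), and it won't tell you about any process-specific limits that may apply.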
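To illustrate that point, here is a Linux-only sketch (`free_ram_bytes` and `address_space_limit_bytes` are hypothetical names). sysinfo() reports a system-wide snapshot that can change right after the call, and its `freeram` field does not count reclaimable page cache; a per-process limit such as RLIMIT_AS (`ulimit -v`) has to be queried separately with getrlimit(), and cgroup memory limits are not reported by either call.

```cpp
#include <cassert>
#include <cstdint>
#include <sys/resource.h>  // getrlimit, RLIMIT_AS
#include <sys/sysinfo.h>   // sysinfo (Linux-specific)

// Hypothetical helper: free RAM as reported by the kernel, in bytes.
std::uint64_t free_ram_bytes() {
    struct sysinfo info;
    if (sysinfo(&info) != 0) return 0;
    return static_cast<std::uint64_t>(info.freeram) * info.mem_unit;
}

// Hypothetical helper: this process's address-space limit, which
// sysinfo() does not report. UINT64_MAX means "no limit set".
std::uint64_t address_space_limit_bytes() {
    struct rlimit rl;
    if (getrlimit(RLIMIT_AS, &rl) != 0 || rl.rlim_cur == RLIM_INFINITY)
        return UINT64_MAX;
    return static_cast<std::uint64_t>(rl.rlim_cur);
}
```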

And also, it looks hard and costly to repetitively try load a file, catch exception and try load a smaller file (requires recursion?)

No recursion is needed. You can use recursion if you prefer; it can be written as a tail call, which the compiler may optimize away.
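A sketch of the non-recursive version (`load_what_fits` is a hypothetical name, and allocating a vector of n bytes stands in for actually reading a file of that size). A failed allocation simply moves the loop on to the next candidate.

```cpp
#include <cassert>
#include <cstddef>
#include <new>        // std::bad_alloc
#include <stdexcept>  // std::length_error
#include <vector>

// Hypothetical sketch: given candidate file sizes (for example sorted
// largest first), keep every buffer that can be allocated.
std::vector<std::vector<unsigned char>> load_what_fits(
        const std::vector<std::size_t>& sizes) {
    std::vector<std::vector<unsigned char>> loaded;
    loaded.reserve(sizes.size());  // avoid reallocating the outer vector
    for (std::size_t n : sizes) {
        try {
            loaded.emplace_back(n);  // stand-in for reading n bytes of a file
        } catch (const std::bad_alloc&) {
            continue;  // did not fit; try the next (smaller) file
        } catch (const std::length_error&) {
            continue;  // larger than max_size()
        }
    }
    return loaded;
}
```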

About memory mapping: I took a look on it sometime ago and found boring to deal with. Would require to use C's open() and all that stuff and say bye to std::fstream.

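Once you have mapped the memory, it is easier to use than std::fstream. You can skip the copy-into-a-vector step entirely and simply use the mapped memory as if it were an array already in memory.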
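Because raw pointers are iterators, a mapped region works directly with the standard algorithms, with no copy, which also covers reading only part of a file. A sketch, where `find_byte` is a hypothetical name and `base`/`size` stand for the pointer and length obtained from mmap() (any contiguous buffer works the same way).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical helper: searches for `value` in bytes [offset, offset + count)
// of a mapped region, in place. Returns the absolute offset of the first
// match, or `size` if there is none.
std::size_t find_byte(const unsigned char* base, std::size_t size,
                      std::size_t offset, std::size_t count,
                      unsigned char value) {
    if (offset > size || count > size - offset)
        throw std::out_of_range("range outside mapped region");
    const unsigned char* first = base + offset;
    const unsigned char* last = first + count;
    const unsigned char* it = std::find(first, last, value);
    return it == last ? size : static_cast<std::size_t>(it - base);
}
```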

Looks like best way of partially reading a file using std::fstream is to derive std::streambuf

I don't see why you would need to derive anything. Just use std::basic_fstream::seekg() to jump to the part you want to read.
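A sketch of the seekg() approach (`read_range` is a hypothetical name). seekg() is inherited from std::basic_istream, so it is available on std::ifstream as well as std::fstream; no custom streambuf is involved.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: reads `count` bytes starting at byte `offset`
// of a file. The result is shorter if the file ends first.
std::vector<unsigned char> read_range(const std::string& path,
                                      std::streamoff offset,
                                      std::size_t count) {
    std::ifstream in(path, std::ios::binary);
    if (!in) throw std::runtime_error("cannot open " + path);
    in.seekg(offset, std::ios::beg);  // jump straight to the wanted part
    std::vector<unsigned char> buf(count);
    in.read(reinterpret_cast<char*>(buf.data()),
            static_cast<std::streamsize>(count));
    buf.resize(static_cast<std::size_t>(in.gcount()));  // trim at end of file
    return buf;
}
```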

Original question on Stack Overflow: https://stackoverflow.com/questions/38653627/