python - Initializing a very large vector in C++

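I created a very large list of floats, O(10M) entries, in Python. I want to use this lookup table in my C++ project. What is the simplest and most efficient way to transfer this array from Python to C++?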

My first idea was to generate a C++ function that initializes such a long vector, and then compile it. The Python code is shown below:

def generate_initLookupTable_function():
    numbers_per_row = 100
    function_body = """
#include "PatBBDTSeedClassifier.h"

std::vector<double> PatBBDTSeedClassifier::initLookupTable()
{
   std::vector<double> indicesVector ={
    """
    row_nb = 1
    for bin_value in classifier._lookup_table[:,0]:
        function_body += "\t" + str(bin_value) +" , "
        if (row_nb % numbers_per_row) == 0:
            function_body += "\n"
        row_nb += 1

    function_body += """\n };
return indicesVector;
}
    """
    return function_body

The output file is 500 MB, and it cannot be compiled (compilation terminates because gcc crashes):

../src/PatBBDTSeedClassifier_lookupTable.cpp
lcg-g++-4.9.3: internal compiler error: Killed (program cc1plus)

0x409edc execute
../../gcc-4.9.3/gcc/gcc.c:2854
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.

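Another idea was to store the Python array in a binary file and then read it in C++. But this is tricky: I could not read it back correctly. I generate the table with these simple commands: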

file = open("models/BBDT_lookuptable.dat", 'wb')
table = numpy.array(classifier._lookup_table[:,0])
table.tofile(file)
file.close()

Can you tell me how I should do this? I googled, but could not find an adequate answer.

Do you have any idea how I can deal with such a large array?

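I should give a more detailed description of the problem. I use Python to train an ML (sklearn) classifier, and then I want to deploy it in C++. Due to timing constraints (execution speed is a key part of my research), I use the idea of bonsai boosted decision trees. In this approach, you convert the BDT into a lookup table.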
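
For the binary-file route described in the question, the following is a minimal sketch of reading such a file back in C++. It assumes the NumPy array has dtype float64 (so each element is an 8-byte double) and that the file is read on a machine with the same byte order as the one that wrote it. The function name readLookupTable is hypothetical and does not come from the original post.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <ios>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: loads a file of raw doubles (as written by
// numpy's tofile() for a float64 array) into a std::vector<double>.
std::vector<double> readLookupTable(const std::string& path) {
    // Open at the end so that tellg() reports the file size in bytes.
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in) {
        throw std::runtime_error("cannot open " + path);
    }
    const std::streamoff bytes = in.tellg();
    if (bytes < 0 || static_cast<std::size_t>(bytes) % sizeof(double) != 0) {
        throw std::runtime_error("file size is not a multiple of sizeof(double): " + path);
    }

    // Allocate once, then read the whole file straight into the vector's storage.
    std::vector<double> table(static_cast<std::size_t>(bytes) / sizeof(double));
    assert(table.size() * sizeof(double) == static_cast<std::size_t>(bytes));
    in.seekg(0, std::ios::beg);
    in.read(reinterpret_cast<char*>(table.data()), static_cast<std::streamsize>(bytes));
    if (in.gcount() != static_cast<std::streamsize>(bytes)) {
        throw std::runtime_error("short read from " + path);
    }
    return table;
}
```

With the file from the question, a call might look like readLookupTable("models/BBDT_lookuptable.dat"). If the array's dtype is not float64 (for example float32), the element type on the C++ side would have to change to match; a mismatch here is one possible reason for reading back garbage.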

Best answer

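If you are using GNU tools, this is fairly easy to do directly with objcopy, which achieves what Jean-Francois suggested. Combined with PM2Ring's Python script that writes a binary array, you can run: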

objcopy -I binary test.data -B i386:x86-64 -O elf64-x86-64 testdata.o

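(Depending on your actual processor architecture, you may need to adjust this.) The command creates a new object file named testdata.o with the following symbols: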

0000000000000100 D _binary_test_data_end
0000000000000100 A _binary_test_data_size
0000000000000000 D _binary_test_data_start

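All of these symbols will be visible in the linked program as symbols with C linkage. The size is not usable as a value directly (it, too, is turned into an address), but *start and *end can be used. Here is a minimal C++ program: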

#include <iostream>

extern "C" double _binary_test_data_start[];
extern "C" double _binary_test_data_end[];

int main(void) {
    double *d = _binary_test_data_start;
    const double *end = _binary_test_data_end;

    std::cout << (end - d) << " doubles in total" << std::endl;
    while (d < end) {
        std::cout << *d++ << std::endl;
    }
}

_binary_test_data_end actually points just past the last element of the array _binary_test_data_start.

Compile and link this program with g++ test.cc testdata.o -o program (using the testdata.o produced by objcopy above).

Output (by default cout prints at most 6 significant digits, so longer decimals would be rounded):

% ./program
32 doubles in total
0
0.0625
0.125
0.1875
0.25
0.3125
0.375
0.4375
0.5
0.5625
0.625
0.6875
0.75
0.8125
0.875
0.9375
1
1.0625
1.125
1.1875
1.25
1.3125
1.375
1.4375
1.5
1.5625
1.625
1.6875
1.75
1.8125
1.875
1.9375
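
If the full value of each double matters when printing, the stream precision can be raised. The sketch below is one way to do that; formatTable is a hypothetical helper name, and the pointer pair stands in for _binary_test_data_start and _binary_test_data_end.

```cpp
#include <cassert>
#include <limits>
#include <sstream>
#include <string>

// Hypothetical helper: formats each value in [begin, end) on its own line
// with max_digits10 significant digits (17 for IEEE 754 doubles), which is
// enough for every double to round-trip through text.
std::string formatTable(const double* begin, const double* end) {
    assert(begin <= end);
    std::ostringstream out;
    out.precision(std::numeric_limits<double>::max_digits10);
    for (const double* p = begin; p != end; ++p) {
        out << *p << '\n';
    }
    return out.str();
}
```

In the program above this could be used as std::cout << formatTable(_binary_test_data_start, _binary_test_data_end). Values such as 0.0625 still print in their short form, because the default floating-point format drops trailing zeros.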

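You can also easily put these values into a vector; std::vector<double> has a constructor that takes two iterators, the first pointing to the first element and the second pointing one past the last. You can use the arrays here because they decay to pointers, and pointers can be used as iterators: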

std::vector<double> vec(_binary_test_data_start, _binary_test_data_end);
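
The iterator-pair constructor can be tried without objcopy by passing any pair of pointers into a contiguous array of doubles. The following is a small sketch under that assumption; copyRange is a hypothetical name used only for illustration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: copies the half-open range [begin, end) into a vector
// using std::vector's iterator-pair constructor. In the real program the
// arguments would be _binary_test_data_start and _binary_test_data_end.
std::vector<double> copyRange(const double* begin, const double* end) {
    assert(begin <= end);  // end must point one past the last element
    return std::vector<double>(begin, end);
}
```

The resulting vector owns its own copy of the data, so later changes to the vector do not affect the embedded array.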

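However, for large arrays this is just unnecessary copying. Moreover, using the plain C array has the added benefit of being lazily loaded: an ELF executable is not read into memory up front but is paged in on demand, so the binary array is loaded from the file into RAM only when it is accessed.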
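
To keep vector-like convenience without the copy, one option is a small non-owning view over the pointer pair. This is only a sketch: TableView is a hypothetical class, not part of the original answer, and it assumes the underlying array outlives the view (which holds for data embedded in the executable).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical non-owning view over [begin, end). It stores two pointers and
// never copies or frees the underlying doubles.
class TableView {
public:
    TableView(const double* begin, const double* end)
        : begin_(begin), end_(end) {
        assert(begin <= end);
    }

    std::size_t size() const { return static_cast<std::size_t>(end_ - begin_); }

    // Bounds are checked only in debug builds (assert).
    const double& operator[](std::size_t i) const {
        assert(i < size());
        return begin_[i];
    }

    // begin()/end() make the view usable in range-based for loops.
    const double* begin() const { return begin_; }
    const double* end() const { return end_; }

private:
    const double* begin_;
    const double* end_;
};
```

A view over the embedded table would be constructed as TableView view(_binary_test_data_start, _binary_test_data_end). With a C++20 compiler, std::span<const double> from the <span> header offers the same idea in the standard library.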

This Q&A on "python - Initializing a very large vector in C++" is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/39529799/
