I have a simple C++ function that I'm trying to wrap with pybind11 (the ehvi3d_sliceupdate function from the KMAC library). It sits deep inside a loop in my Python module and gets called hundreds of thousands to a million times. Unfortunately, it appears to be leaking memory (12+ GB after ~700k calls), and I'm not sure what the cause might be. The wrapper I compiled looks like this:
#include <pybind11/pybind11.h>
#include <pybind11/numpy.h>
#include <iostream>
#include "helper.h"
#include "ehvi_calculations.h"
#include "ehvi_sliceupdate.h"
namespace py = pybind11;
// Copied from main.cc
// Checks if p dominates P. Removes points dominated by p from P and returns the number of points removed.
int checkdominance(deque<individual*> & P, individual* p){
    int nr = 0;
    for (int i = P.size() - 1; i >= 0; i--){
        if (p->f[0] >= P[i]->f[0] && p->f[1] >= P[i]->f[1] && p->f[2] >= P[i]->f[2]){
            cerr << "Individual " << (i+1) << " is dominated or the same as another point; removing." << endl;
            P.erase(P.begin() + i);
            nr++;
        }
    }
    return nr;
}
// Wrap the ehvi3d_sliceupdate function - not sure how to pass straight in
double wrap_ehvi3d_sliceupdate(py::array_t<double> y_par, py::array_t<double> ref_point, py::array_t<double> mean_vector, py::array_t<double> std_dev) {
    deque<individual*> nd_samples;
    // Get y_par and feed by individual via numpy direct access
    // https://pybind11.readthedocs.io/en/stable/advanced/pycpp/numpy.html
    auto yp = y_par.unchecked<2>(); // y_par must have ndim = 2
    for (py::ssize_t i = 0; i < yp.shape(0); i++) {
        individual * tempvidual = new individual;
        tempvidual->f[0] = yp(i, 0);
        tempvidual->f[1] = yp(i, 1);
        tempvidual->f[2] = yp(i, 2);
        // cerr << i << ": " << yp(i, 0) << " " << yp(i, 1) << " " << yp(i, 2) << endl;
        checkdominance(nd_samples, tempvidual);
        nd_samples.push_back(tempvidual);
    }
    // Marshall ref_point, mean_vector, and std_dev into arrays
    // (might be better ways to do this..)
    auto rp = ref_point.unchecked<1>(); // ref_point must have ndim = 1, len 3
    double r [] = {rp(0), rp(1), rp(2)};
    auto mv = mean_vector.unchecked<1>(); // mean_vector must have ndim = 1, len 3
    double mu [] = {mv(0), mv(1), mv(2)};
    auto sd = std_dev.unchecked<1>(); // std_dev must have ndim = 1, len 3
    double s [] = {sd(0), sd(1), sd(2)};
    double hvi = ehvi3d_sliceupdate(nd_samples, r, mu, s);
    return hvi;
}
PYBIND11_MODULE(kmac, m) {
    // module docstring
    m.doc() = "EHVI using KMAC";
    // define EHVI slice update function
    m.def("ehvi3d_sliceupdate", &wrap_ehvi3d_sliceupdate, "O(n^3) slice-update scheme for calculating the EHVI.");
}
There's probably a simpler way to go about this, since I just pieced it together from the pybind11 docs and things found here. I'm not very familiar with C++, so with my limited knowledge I may have made some other egregious coding mistakes when creating the arrays or passing pointers. Have I created something that needs to be cleaned up on every call? At first I thought I might need to wrap my numpy arrays as shown in this and this previous post, but I'm only returning a double, so there's no numpy array to deal with on the Python side.
EDIT
I tried changing the tempvidual block to use stack memory (I believe that's what it's called?), since I read that it cleans itself up, like so:
individual tempvidual;
tempvidual.f[0] = yp(i, 0);
tempvidual.f[1] = yp(i, 1);
tempvidual.f[2] = yp(i, 2);
checkdominance(nd_samples, &tempvidual);
nd_samples.push_back(&tempvidual);
Before returning hvi at the end, I also tried adding nd_samples.clear(); to empty the deque before returning to Python, but I still accumulate memory on every call to the wrapper. Is there something else that needs cleaning up?
EDIT 2
It turns out part of the problem is in the library itself, which according to valgrind leaks roughly 4 kB or so per call. Thanks (and a shout-out) to @ajum on the pybind11 gitter for the amazing help; he essentially walked me through refactoring most of the code to use shared_ptr and make_shared instead of raw pointers, which fixed all the leaks in the library. That also required a small update to the wrapper, see below. Unfortunately, even with the (I believe) leak-free library and the updated wrapper, I still get the following report:
==1932812== LEAK SUMMARY:
==1932812== definitely lost: 676 bytes in 1 blocks
==1932812== indirectly lost: 0 bytes in 0 blocks
==1932812== possibly lost: 145,291 bytes in 80 blocks
==1932812== still reachable: 1,725,888 bytes in 1,013 blocks
That's less than before, but I can't tell what's causing it.
The edited parts of the wrapper:
// Copied from main.cc
// Checks if p dominates P. Removes points dominated by p from P and returns the number of points removed.
int checkdominance(deque<shared_ptr<individual>> & P, shared_ptr<individual> p){
    int nr = 0;
    for (int i = P.size() - 1; i >= 0; i--){
        if (p->f[0] >= P[i]->f[0] && p->f[1] >= P[i]->f[1] && p->f[2] >= P[i]->f[2]){
            cerr << "Individual " << (i+1) << " is dominated or the same as another point; removing." << endl;
            P.erase(P.begin() + i);
            nr++;
        }
    }
    return nr;
}
// Wrap the ehvi3d_sliceupdate function - not sure how to pass straight in
double wrap_ehvi3d_sliceupdate(py::array_t<double> y_par, py::array_t<double> ref_point, py::array_t<double> mean_vector, py::array_t<double> std_dev) {
    // deque<individual*> nd_samples;
    deque<shared_ptr<individual>> nd_samples;
    // Get y_par and feed by individual via numpy direct access
    // https://pybind11.readthedocs.io/en/stable/advanced/pycpp/numpy.html
    auto yp = y_par.unchecked<2>(); // y_par must have ndim = 2
    for (py::ssize_t i = 0; i < yp.shape(0); i++) {
        auto tempvidual = make_shared<individual>();
        // individual * tempvidual = new individual;
        tempvidual->f[0] = yp(i, 0);
        tempvidual->f[1] = yp(i, 1);
        tempvidual->f[2] = yp(i, 2);
        // cerr << i << ": " << yp(i, 0) << " " << yp(i, 1) << " " << yp(i, 2) << endl;
        // cerr << i << ": " << tempvidual->f[0] << " " << tempvidual->f[1] << " " << tempvidual->f[2] << endl;
        checkdominance(nd_samples, tempvidual);
        nd_samples.push_back(tempvidual);
    }
    // Marshall ref_point, mean_vector, and std_dev into arrays
    // (might be better ways to do this..)
    auto rp = ref_point.unchecked<1>(); // ref_point must have ndim = 1, len 3
    double r [] = {rp(0), rp(1), rp(2)};
    auto mv = mean_vector.unchecked<1>(); // mean_vector must have ndim = 1, len 3
    double mu [] = {mv(0), mv(1), mv(2)};
    auto sd = std_dev.unchecked<1>(); // std_dev must have ndim = 1, len 3
    double s [] = {sd(0), sd(1), sd(2)};
    double hvi = ehvi3d_sliceupdate(nd_samples, r, mu, s);
    return hvi;
}
From the output of running valgrind on my Python test script, I can't pinpoint the problem. An excerpt of the output containing the "definitely lost" block looks like this:
==1932812== 676 bytes in 1 blocks are definitely lost in loss record 212 of 485
==1932812== at 0x4C30F0B: malloc (vg_replace_malloc.c:307)
==1932812== by 0x2D595F: _PyMem_RawWcsdup (obmalloc.c:592)
==1932812== by 0x166786: _PyCoreConfig_Copy.cold (main.c:2535)
==1932812== by 0x34C4C7: _Py_InitializeCore (pylifecycle.c:850)
==1932812== by 0x34CCB3: pymain_init (main.c:3041)
==1932812== by 0x3503EB: pymain_main (main.c:3063)
==1932812== by 0x35085B: _Py_UnixMain (main.c:3103)
==1932812== by 0x5A137B2: (below main) (in /usr/lib64/libc-2.28.so)
==1932812==
==1932812== 688 bytes in 1 blocks are possibly lost in loss record 214 of 485
==1932812== at 0x4C33419: realloc (vg_replace_malloc.c:834)
==1932812== by 0x21E8F8: _PyObject_GC_Resize (gcmodule.c:1758)
==1932812== by 0x2345DA: UnknownInlinedFun (frameobject.c:726)
==1932812== by 0x2345DA: UnknownInlinedFun (call.c:272)
==1932812== by 0x2345DA: _PyFunction_FastCallKeywords (call.c:408)
==1932812== by 0x2979C7: call_function (ceval.c:4616)
==1932812== by 0x2BE4AB: _PyEval_EvalFrameDefault (ceval.c:3124)
==1932812== by 0x233E93: UnknownInlinedFun (ceval.c:547)
==1932812== by 0x233E93: UnknownInlinedFun (call.c:283)
==1932812== by 0x233E93: _PyFunction_FastCallKeywords (call.c:408)
==1932812== by 0x2979C7: call_function (ceval.c:4616)
==1932812== by 0x2BE4AB: _PyEval_EvalFrameDefault (ceval.c:3124)
==1932812== by 0x233E93: UnknownInlinedFun (ceval.c:547)
==1932812== by 0x233E93: UnknownInlinedFun (call.c:283)
==1932812== by 0x233E93: _PyFunction_FastCallKeywords (call.c:408)
==1932812== by 0x2979C7: call_function (ceval.c:4616)
==1932812== by 0x2BE4AB: _PyEval_EvalFrameDefault (ceval.c:3124)
==1932812== by 0x233E93: UnknownInlinedFun (ceval.c:547)
==1932812== by 0x233E93: UnknownInlinedFun (call.c:283)
==1932812== by 0x233E93: _PyFunction_FastCallKeywords (call.c:408)
==1932812==
==1932812== 1,056 bytes in 2 blocks are possibly lost in loss record 350 of 485
==1932812== at 0x4C30F0B: malloc (vg_replace_malloc.c:307)
==1932812== by 0x221130: UnknownInlinedFun (obmalloc.c:520)
==1932812== by 0x221130: UnknownInlinedFun (obmalloc.c:1584)
==1932812== by 0x221130: UnknownInlinedFun (obmalloc.c:1576)
==1932812== by 0x221130: UnknownInlinedFun (obmalloc.c:633)
==1932812== by 0x221130: UnknownInlinedFun (gcmodule.c:1693)
==1932812== by 0x221130: UnknownInlinedFun (gcmodule.c:1715)
==1932812== by 0x221130: _PyObject_GC_NewVar (gcmodule.c:1744)
==1932812== by 0x2344F2: UnknownInlinedFun (frameobject.c:713)
==1932812== by 0x2344F2: UnknownInlinedFun (call.c:272)
==1932812== by 0x2344F2: _PyFunction_FastCallKeywords (call.c:408)
==1932812== by 0x2979C7: call_function (ceval.c:4616)
==1932812== by 0x2BE4AB: _PyEval_EvalFrameDefault (ceval.c:3124)
==1932812== by 0x206EAC: UnknownInlinedFun (ceval.c:547)
==1932812== by 0x206EAC: UnknownInlinedFun (call.c:283)
==1932812== by 0x206EAC: _PyFunction_FastCallDict (call.c:322)
==1932812== by 0x20F1BA: UnknownInlinedFun (call.c:98)
==1932812== by 0x20F1BA: object_vacall (call.c:1200)
==1932812== by 0x28E2E6: _PyObject_CallMethodIdObjArgs (call.c:1250)
==1932812== by 0x1FC4A6: UnknownInlinedFun (import.c:1652)
==1932812== by 0x1FC4A6: PyImport_ImportModuleLevelObject (import.c:1764)
==1932812== by 0x2C069F: UnknownInlinedFun (ceval.c:4770)
==1932812== by 0x2C069F: _PyEval_EvalFrameDefault (ceval.c:2600)
==1932812== by 0x205AF1: UnknownInlinedFun (ceval.c:547)
==1932812== by 0x205AF1: _PyEval_EvalCodeWithName (ceval.c:3930)
==1932812== by 0x206D08: PyEval_EvalCodeEx (ceval.c:3959)
Is this caused by pybind11 itself, or by the way I'm calling it?
P.S. Not sure whether adding edits or replacing the original question with a (long) update is proper SO style. Thanks!
Best answer
It turns out that removing the use of new where possible (and adding delete where it wasn't), plus replacing all raw pointers with make_shared and shared_ptr in both the underlying library and the wrapper, actually fixed the problem. Using these instead of raw pointers appears to free the memory automatically once the variables go out of scope (knowledgeable C++ users may correct me in the comments).
This is probably basic/obvious to C++ coders, but for non-C++ users/beginners (and for my own record, in case I forget), the fixes were:
// Change declarations like these:
// vector<mus*> pdf;
vector<shared_ptr<mus>> pdf;

// mus * tempmus = new mus;
auto tempmus = make_shared<mus>();

// newind = new specialind;
auto newind = make_shared<specialind>();

// deque<specialind*> Px, Py, Pz;
deque<shared_ptr<specialind>> Px, Py, Pz;

// Replace function signatures and headers like this:
// int checkdominance(deque<individual*> & P, individual* p);
int checkdominance(deque<shared_ptr<individual>> & P, shared_ptr<individual> p);

// And parts of structs like this:
struct specialind{
    // individual *point;
    std::shared_ptr<individual> point;
};

// Couldn't figure out how to change this one to remove new, as it was needed in a later scope...
Pstruct = new thingy[n*n];
// ...
delete [] Pstruct; // Added this at the end, once it looked like it was no longer needed.
While doing this, I initially ran into a lot of segfaults. I was able to track down the lines causing them with the help of this SO post.
And although running valgrind --leak-check=full --track-origins=yes python test.py still produces the leak messages from EDIT 2, where test.py is just a simple loop (plus the input numpy ndarrays):
while True:
    hvi = kmac.ehvi3d_sliceupdate(dat, ref_point, mean_vector, std_dev)
...memory consumption actually looks stable and no longer grows. (I'm not sure why there are spurious messages from valgrind, but they don't appear to noticeably affect memory over a run.) I can now run python test.py for minutes on end and it levels off at around 15 MB.
Thanks to the pybind11 folks and to Adam Thompson for walking me through the basics.
Regarding "python - pybind11 - identifying and removing memory leaks in a C++ wrapper", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/67110549/