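I have a program that allocates a lot of memory. I hoped to speed it up by splitting the work across threads, but that only made the program slower.
I made this minimal example. It has nothing to do with my real code except that it allocates memory in different threads.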
#include <windows.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

class ThreadStartInfo
{
public:
unsigned char *arr_of_5m_elems;
bool TaskDoneFlag; // written by the worker thread, polled by main()
ThreadStartInfo()
{
this->TaskDoneFlag = false;
this->arr_of_5m_elems = NULL;
}
~ThreadStartInfo()
{
if (this->arr_of_5m_elems)
free(this->arr_of_5m_elems);
}
};
unsigned long __stdcall CalcSomething(void *tsi_ptr)
{
ThreadStartInfo *tsi = (ThreadStartInfo*)tsi_ptr;
for (int i = 0; i < 5000000; i++)
{
double *test_ptr = (double*)malloc(tsi->arr_of_5m_elems[i] * sizeof(double));
memset(test_ptr, 0, tsi->arr_of_5m_elems[i] * sizeof(double));
free(test_ptr);
}
tsi->TaskDoneFlag = true;
return 0;
}
int main()
{
ThreadStartInfo *tsi1 = new ThreadStartInfo();
// Note: neither array is initialized, so the per-element allocation sizes are indeterminate.
tsi1->arr_of_5m_elems = (unsigned char*)malloc(5000000 * sizeof(unsigned char));
ThreadStartInfo *tsi2 = new ThreadStartInfo();
tsi2->arr_of_5m_elems = (unsigned char*)malloc(5000000 * sizeof(unsigned char));
ThreadStartInfo **tsi_arr = (ThreadStartInfo**)malloc(2 * sizeof(ThreadStartInfo*));
tsi_arr[0] = tsi1;
tsi_arr[1] = tsi2;
time_t start_dt = time(NULL);
CalcSomething(tsi1);
CalcSomething(tsi2);
printf("Task done in %i seconds.\n", (int)(time(NULL) - start_dt));
//--
tsi1->TaskDoneFlag = false;
tsi2->TaskDoneFlag = false;
//--
start_dt = time(NULL);
unsigned long th1_id = 0;
void *th1h = CreateThread(NULL, 0, CalcSomething, tsi1, 0, &th1_id);
unsigned long th2_id = 0;
void *th2h = CreateThread(NULL, 0, CalcSomething, tsi2, 0, &th2_id);
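// Poll until both workers report that they are done. TaskDoneFlag is a
// plain bool, so this read races with the workers' writes.
retry: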
for (int i = 0; i < 2; i++)
if (!tsi_arr[i]->TaskDoneFlag)
{
Sleep(100);
goto retry;
}
CloseHandle(th1h);
CloseHandle(th2h);
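printf("MT Task done in %i seconds.\n", (int)(time(NULL) - start_dt));
free(tsi_arr);
delete tsi1; // ~ThreadStartInfo frees arr_of_5m_elems
delete tsi2;
return 0;
}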
It prints results like this:
Task done in 16 seconds.
MT Task done in 19 seconds.
And... I did not expect it to get slower. Is there a way to make memory allocation faster with multiple threads?
Best answer
Apart from some undefined behavior caused by the lack of synchronization on TaskDoneFlag, all the threads are doing is calling malloc/free repeatedly.
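On the synchronization point: rather than polling a plain bool with Sleep() and goto, main() can simply wait for the threads to finish. The sketch below assumes a C++11 compiler (VC++ 2010 Express provides neither <thread> nor <atomic>); TaskInfo, DoWork and RunBoth are hypothetical stand-ins for ThreadStartInfo and CalcSomething, and the flag is kept only to show a race-free version of it.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical stand-in for ThreadStartInfo. The done flag is a
// std::atomic<bool>, so a worker writing it while another thread
// reads it is not a data race.
struct TaskInfo {
    std::vector<unsigned char> sizes;
    std::atomic<bool> done{false};
    unsigned long checksum = 0;  // plain field: only read after join()
};

// Hypothetical stand-in for CalcSomething: sums the sizes, then
// publishes completion with a release store.
void DoWork(TaskInfo* info) {
    unsigned long sum = 0;
    for (unsigned char s : info->sizes) sum += s;
    info->checksum = sum;
    info->done.store(true, std::memory_order_release);
}

// Runs both tasks on their own threads and waits with join()
// instead of the Sleep()/goto polling loop.
void RunBoth(TaskInfo& a, TaskInfo& b) {
    std::thread t1(DoWork, &a);
    std::thread t2(DoWork, &b);
    // join() blocks until the thread has finished and makes everything
    // it wrote (including checksum) visible to this thread.
    t1.join();
    t2.join();
    assert(a.done.load(std::memory_order_acquire));
    assert(b.done.load(std::memory_order_acquire));
}
```

With the Win32 API used in the question, the equivalent of join() is waiting on the thread handles (for example WaitForMultipleObjects on th1h and th2h with bWaitAll set to TRUE) before calling CloseHandle.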
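The Visual C++ CRT heap is effectively single-threaded [1], because malloc/free delegate to HeapAlloc/HeapFree, which execute inside a critical section (only one thread at a time). Calling them from several threads at once will never be faster than calling them from a single thread, and is often slower because of the lock-contention overhead.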
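To see why a second thread cannot help, it may be useful to model the heap as one mutex wrapped around every allocate and free. This is a simplified model under that assumption, not the actual CRT or HeapAlloc implementation; it uses C++11 std::mutex and std::thread, and SerializedHeap, Churn and RunTwoThreads are hypothetical names.

```cpp
#include <cassert>
#include <cstdlib>
#include <mutex>
#include <thread>

// Simplified model (an assumption, not the real CRT code) of a heap whose
// every allocate/free takes the same lock, as a serialized heap does.
// Only one thread can be inside Allocate/Free at any moment.
class SerializedHeap {
public:
    void* Allocate(std::size_t n) {
        std::lock_guard<std::mutex> lock(mutex_);
        ++calls_;
        return std::malloc(n);
    }
    void Free(void* p) {
        std::lock_guard<std::mutex> lock(mutex_);
        ++calls_;
        std::free(p);
    }
    unsigned long Calls() {
        std::lock_guard<std::mutex> lock(mutex_);
        return calls_;
    }
private:
    std::mutex mutex_;
    unsigned long calls_ = 0;  // protected by mutex_
};

// Like CalcSomething, each worker does little besides allocate and free,
// so the threads mostly take turns waiting for the single lock.
void Churn(SerializedHeap* heap, int iterations) {
    for (int i = 0; i < iterations; ++i) {
        void* p = heap->Allocate(64);
        assert(p != nullptr);
        heap->Free(p);
    }
}

// Runs two churning threads against one shared heap and returns how many
// calls went through its lock.
unsigned long RunTwoThreads(int iterations) {
    SerializedHeap heap;
    std::thread t1(Churn, &heap, iterations);
    std::thread t2(Churn, &heap, iterations);
    t1.join();
    t2.join();
    return heap.Calls();
}
```

Every one of the 2 × 2 × iterations calls passes through the same lock, so adding threads adds contention rather than parallelism.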
Either reduce the number of allocations done in the threads, or switch to another memory allocator such as jemalloc or tcmalloc.
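For the first option, the allocation in the example can be hoisted out of the loop. The sketch below assumes, as in the question, that every size is an unsigned char, so a single buffer of 255 doubles is always large enough; CalcWithReusedBuffer is a hypothetical name, and its return value exists only to make the work observable.

```cpp
#include <cassert>
#include <cstring>
#include <limits>
#include <vector>

// Variant of CalcSomething that allocates once per call instead of once per
// element. Because each size is an unsigned char, no request can exceed
// numeric_limits<unsigned char>::max() doubles, so one scratch buffer of
// that size covers every element. Returns the total number of doubles zeroed.
unsigned long CalcWithReusedBuffer(const std::vector<unsigned char>& sizes) {
    std::vector<double> scratch(std::numeric_limits<unsigned char>::max());
    unsigned long zeroed = 0;
    for (unsigned char n : sizes) {
        assert(static_cast<std::size_t>(n) <= scratch.size());
        // Same work as the malloc/memset/free loop body, minus the heap calls.
        std::memset(scratch.data(), 0, n * sizeof(double));
        zeroed += n;
    }
    return zeroed;
}
```

Each thread that calls this touches the shared heap only twice (allocating and freeing the vector's storage) instead of twice per element, so the lock described above is no longer the bottleneck.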
[1] See this note on HeapAlloc:
Serialization ensures mutual exclusion when two or more threads attempt to simultaneously allocate or free blocks from the same heap. There is a small performance cost to serialization, but it must be used whenever multiple threads allocate and free memory from the same heap. Setting the HEAP_NO_SERIALIZE value eliminates mutual exclusion on the heap. Without serialization, two or more threads that use the same heap handle might attempt to allocate or free memory simultaneously, likely causing corruption in the heap.
Source: the Stack Overflow question "c++ - On C++ (VC++ 2010 Express), a two-threaded application is slower than a single-threaded one. How to fix it?": https://stackoverflow.com/questions/59465580/