c++ - Two-threaded application is slower than single-threaded on C++ (VC++ 2010 Express). How to solve?

Tags: c++ windows multithreading performance winapi

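I have a program that allocates a lot of memory, and I hoped to speed it up by splitting the work across threads, but that only made my program slower.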

I made this minimal example. It has nothing to do with my real code, except that it allocates memory in different threads.

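// Headers for CreateThread/Sleep/CloseHandle, printf, malloc/free, memset and time
#include <windows.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

class ThreadStartInfo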
{
public:
    unsigned char *arr_of_5m_elems;
    bool TaskDoneFlag;

    ThreadStartInfo()
    {
        this->TaskDoneFlag = false;
        this->arr_of_5m_elems = NULL;
    }

    ~ThreadStartInfo()
    {
        if (this->arr_of_5m_elems)
            free(this->arr_of_5m_elems);
    }
};

unsigned long __stdcall CalcSomething(void *tsi_ptr)
{
    ThreadStartInfo *tsi = (ThreadStartInfo*)tsi_ptr;

    for (int i = 0; i < 5000000; i++)
    {
        double *test_ptr = (double*)malloc(tsi->arr_of_5m_elems[i] * sizeof(double));
        memset(test_ptr, 0, tsi->arr_of_5m_elems[i] * sizeof(double));
        free(test_ptr);
    }

    tsi->TaskDoneFlag = true;
    return 0;
}

int main()
{
    ThreadStartInfo *tsi1 = new ThreadStartInfo();
    tsi1->arr_of_5m_elems = (unsigned char*)malloc(5000000 * sizeof(unsigned char));
    ThreadStartInfo *tsi2 = new ThreadStartInfo();
    tsi2->arr_of_5m_elems = (unsigned char*)malloc(5000000 * sizeof(unsigned char));
    ThreadStartInfo **tsi_arr = (ThreadStartInfo**)malloc(2 * sizeof(ThreadStartInfo*));
    tsi_arr[0] = tsi1;
    tsi_arr[1] = tsi2;

    time_t start_dt = time(NULL);
    CalcSomething(tsi1);
    CalcSomething(tsi2);
    printf("Task done in %i seconds.\n", (int)(time(NULL) - start_dt));  // time_t is 64-bit, so cast for %i
    //--

    tsi1->TaskDoneFlag = false;
    tsi2->TaskDoneFlag = false;
    //--

    start_dt = time(NULL);
    unsigned long th1_id = 0;
    void *th1h = CreateThread(NULL, 0, CalcSomething, tsi1, 0, &th1_id);
    unsigned long th2_id = 0;
    void *th2h = CreateThread(NULL, 0, CalcSomething, tsi2, 0, &th2_id);

    retry:
    for (int i = 0; i < 2; i++)
        if (!tsi_arr[i]->TaskDoneFlag)
        {
            Sleep(100);
            goto retry;
        }

    CloseHandle(th1h);
    CloseHandle(th2h);

    printf("MT Task done in %i seconds.\n", (int)(time(NULL) - start_dt));

    delete tsi1;  // the destructor frees arr_of_5m_elems
    delete tsi2;
    free(tsi_arr);
}

It prints results like this:
Task done in 16 seconds.
MT Task done in 19 seconds.

And... I didn't expect it to get slower. Is there a way to make memory allocation faster when using multiple threads?

Best answer

Apart from some undefined behavior caused by the lack of synchronization on TaskDoneFlag, all of the threads are calling malloc/free repeatedly.
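The undefined behavior comes from a worker thread writing TaskDoneFlag while main reads it with no synchronization. Below is a minimal race-free sketch, assuming a C++11 compiler (std::thread and std::atomic are not in VC++ 2010; they arrived with Visual Studio 2012). The names Task, RunTask and RunBoth are hypothetical stand-ins for ThreadStartInfo and CalcSomething. On VC++ 2010 itself, the equivalent of join() is to wait on both thread handles with WaitForMultipleObjects(2, handles, TRUE, INFINITE) instead of polling a flag.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for ThreadStartInfo. The completion flag is
// std::atomic, so reading it from another thread is not a data race.
struct Task {
    std::vector<unsigned char> sizes;
    unsigned long long total;  // written by the worker, read after join()
    std::atomic<bool> done;

    Task(std::size_t n, unsigned char value)
        : sizes(n, value), total(0), done(false) {}
};

// Worker: sums the element sizes, then publishes completion.
void RunTask(Task* t) {
    unsigned long long sum = 0;
    for (std::size_t i = 0; i < t->sizes.size(); ++i)
        sum += t->sizes[i];
    t->total = sum;
    t->done.store(true, std::memory_order_release);
}

// Starts both workers and blocks until they finish. join() replaces the
// Sleep/goto polling loop and guarantees the caller sees every write the
// workers made, including the non-atomic 'total'.
void RunBoth(Task& a, Task& b) {
    std::thread t1(RunTask, &a);
    std::thread t2(RunTask, &b);
    t1.join();
    t2.join();
}
```

If polling is genuinely needed, the flag must at least be atomic as above; joining (or waiting on the handles) avoids the 100 ms Sleep granularity as well.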

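The Visual C++ CRT heap is effectively single-threaded¹: malloc/free delegate to HeapAlloc/HeapFree, which run inside a critical section (only one thread at a time). Calling them from several threads at once will never be faster than calling them from a single thread, and is often slower because of lock-contention overhead.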

Either reduce the number of allocations made in the threads, or switch to a different memory allocator such as jemalloc or tcmalloc.
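Reducing allocations can be as simple as hoisting the buffer out of the loop. The sketch below (hypothetical functions CalcWithReusedBuffer and RunBothReused, assuming C++11) does the same zero-fill as CalcSomething but allocates one buffer per thread up front. Because an unsigned char size is at most 255, reserving 255 doubles means assign() never reallocates, so the threads stop contending for the heap lock inside the loop.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical rewrite of CalcSomething's loop: the buffer of doubles is
// allocated once per call (i.e. once per thread) and reused for every element.
// Returns the total number of doubles zeroed.
std::size_t CalcWithReusedBuffer(const std::vector<unsigned char>& sizes) {
    std::vector<double> buf;
    buf.reserve(255);  // sizes are at most 255, so this is the only allocation
    std::size_t zeroed = 0;
    for (std::size_t i = 0; i < sizes.size(); ++i) {
        buf.assign(sizes[i], 0.0);  // fits in the reserved capacity: no malloc/free
        zeroed += buf.size();
    }
    return zeroed;
}

// Runs two independent inputs in parallel. Each thread owns its own buffer,
// so the loop bodies share nothing and take no heap lock inside the loop.
void RunBothReused(const std::vector<unsigned char>& a,
                   const std::vector<unsigned char>& b,
                   std::size_t& ra, std::size_t& rb) {
    std::thread t1([&] { ra = CalcWithReusedBuffer(a); });
    std::thread t2([&] { rb = CalcWithReusedBuffer(b); });
    t1.join();
    t2.join();
}
```

With the allocation gone from the hot loop, the two threads no longer serialize on the CRT heap, which is the situation in which splitting the work can actually pay off.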

¹ See this note on HeapAlloc:

Serialization ensures mutual exclusion when two or more threads attempt to simultaneously allocate or free blocks from the same heap. There is a small performance cost to serialization, but it must be used whenever multiple threads allocate and free memory from the same heap. Setting the HEAP_NO_SERIALIZE value eliminates mutual exclusion on the heap. Without serialization, two or more threads that use the same heap handle might attempt to allocate or free memory simultaneously, likely causing corruption in the heap.
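The note also points the other way: if each thread has its own heap that no other thread ever touches, serialization is unnecessary. A sketch under that assumption wraps HeapCreate(HEAP_NO_SERIALIZE, 0, 0), HeapAlloc, HeapFree and HeapDestroy in a hypothetical PrivateHeap class, with a malloc/free fallback off Windows so it still compiles. It is only safe if a PrivateHeap and every block allocated from it stay on one thread. Also, newer Windows versions back the default heap with the low-fragmentation heap, which cannot be enabled on a HEAP_NO_SERIALIZE heap, so whether this is faster than the default heap is something to measure.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: a heap owned by exactly one thread.
class PrivateHeap {
public:
    PrivateHeap() {
#ifdef _WIN32
        // Growable private heap without internal locking: safe only because
        // no other thread ever uses this object.
        heap_ = HeapCreate(HEAP_NO_SERIALIZE, 0, 0);
#endif
    }
    ~PrivateHeap() {
#ifdef _WIN32
        if (heap_) HeapDestroy(heap_);  // releases any blocks still allocated
#endif
    }
    PrivateHeap(const PrivateHeap&) = delete;
    PrivateHeap& operator=(const PrivateHeap&) = delete;

    void* Alloc(std::size_t n) {
#ifdef _WIN32
        return heap_ ? HeapAlloc(heap_, 0, n) : nullptr;
#else
        return std::malloc(n);  // fallback so the sketch builds off Windows
#endif
    }
    void Free(void* p) {
#ifdef _WIN32
        if (heap_ && p) HeapFree(heap_, 0, p);
#else
        std::free(p);
#endif
    }

private:
#ifdef _WIN32
    HANDLE heap_;
#endif
};

// Same loop as CalcSomething, but every allocation goes to a heap private to
// the calling thread. Returns the number of bytes zeroed.
std::size_t CalcWithPrivateHeap(const unsigned char* sizes, std::size_t count) {
    PrivateHeap heap;
    std::size_t zeroed = 0;
    for (std::size_t i = 0; i < count; ++i) {
        std::size_t bytes = sizes[i] * sizeof(double);
        void* p = heap.Alloc(bytes);
        if (p) {
            std::memset(p, 0, bytes);
            zeroed += bytes;
        }
        heap.Free(p);
    }
    return zeroed;
}
```

Each worker thread would call CalcWithPrivateHeap on its own input, so no two threads ever share a PrivateHeap.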

Source: the original question on Stack Overflow: https://stackoverflow.com/questions/59465580/
