c - User time increases in a multi-CPU job

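I am running the following code (listed at the end of this question).

When I run this code with 1 child process, I get the following timing information:

(I run it with /usr/bin/time ./job 1)

5.489u 0.090s 0:05.58 99.8% (1 job running)

When I run it with 6 child processes, I get the following:

74.731u 0.692s 0:12.59 599.0% (6 jobs running in parallel)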

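The machine I am running the experiment on has 6 cores and 198 GB of RAM, and nothing else is running on it.

With 6 jobs running in parallel, I expected the reported user time to be 6 times that of a single job. It is far more than that (13.6 times). My question is: where does this increase in user time come from? Is it because, with 6 jobs running in parallel, the cores jump from one memory location to another more often? Or am I missing something else?

Thanks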

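#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_SIZE 7000000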
#define LOOP_COUNTER 100

#define simple_struct struct _simple_struct
simple_struct {
    int n;
    simple_struct *next;
};

#define ALLOCATION_SPLIT 5
#define CHAIN_LENGTH 1
void do_function3(void)
{
    int i = 0, j = 0, k = 0, l = 0;
    simple_struct **big_array = NULL;
    simple_struct *temp = NULL;

    big_array = calloc(MAX_SIZE + 1, sizeof(simple_struct*));


    for(k = 0; k < ALLOCATION_SPLIT; k ++) {
        for(i =k ; i < MAX_SIZE; i +=ALLOCATION_SPLIT) {
            big_array[i] = calloc(1, sizeof(simple_struct));
            if((CHAIN_LENGTH-1)) {
                for(l = 1; l < CHAIN_LENGTH; l++) {
                    temp = calloc(1, sizeof(simple_struct));
                    temp->next = big_array[i];
                    big_array[i] = temp;
                }
            }
        }
    }

    for (j = 0; j < LOOP_COUNTER; j++) {
        for(i=0 ; i < MAX_SIZE; i++) {
            if(big_array[i] == NULL) {
                big_array[i] = calloc(1, sizeof(simple_struct));
            }
            big_array[i]->n = i * 13;
            temp = big_array[i]->next;
            while(temp) {
                temp->n = i*13;
                temp = temp->next;
            }
        }
    }
}

int main(int argc, char **argv)
{
    int i, no_of_processes = 0;
    pid_t pid, wpid;
    int child_done = 0;
    int status;
    if(argc != 2) {
        printf("usage: this_binary number_of_processes");
        return 0;
    }

    no_of_processes = atoi(argv[1]);

    for(i = 0; i < no_of_processes; i ++) {
        pid = fork();

        switch(pid) {
            case -1:
                printf("error forking");
                exit(-1);
            case 0:
                do_function3();
                return 0;
            default:
                printf("\nchild %d launched with pid %d\n", i, pid);
                break;
        }
    }

    while(child_done != no_of_processes) {
        wpid = wait(&status);
        child_done++;
        printf("\nchild done with pid %d\n", wpid);
    }

    return 0;
}

Best answer

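First of all, your benchmark is a bit unusual. Normally, when benchmarking a concurrent application, one compares two implementations:

A single-threaded version solving a problem of size S;
A multi-threaded version with N threads cooperatively solving the problem of size S; in your case, each thread would solve a sub-problem of size S/N.

You then divide the single-threaded execution time by the multi-threaded execution time to obtain the speedup.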

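If your speedup is:

Around 1: the parallel implementation performs about the same as the single-threaded one;
Above 1 (usually between 1 and N): parallelizing the application improves performance;
Below 1: parallelizing the application hurts performance.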
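As a rough illustration of the speedup calculation described above, here is a minimal C sketch. The function names `speedup` and `serial_equivalent` are hypothetical, and treating six independent copies of the job as a problem six times larger is an assumption made only to fit the question's benchmark into this framework.

```c
/* Speedup = serial execution time / parallel execution time,
 * both measured for the same total amount of work. */
static double speedup(double serial_seconds, double parallel_seconds)
{
    return serial_seconds / parallel_seconds;
}

/* Hypothetical helper: time needed to run `jobs` independent copies
 * one after another, if a single copy takes `one_job_seconds` alone. */
static double serial_equivalent(double one_job_seconds, int jobs)
{
    return one_job_seconds * jobs;
}
```

With the wall-clock figures from the question, `speedup(serial_equivalent(5.58, 6), 12.59)` comes to roughly 2.66, well short of the ideal value of 6 for six cores.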

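The impact on performance depends on several factors:

How well your algorithm parallelizes. See Amdahl's law. Does not apply here.
The overhead of inter-thread communication. Does not apply here.
The overhead of inter-thread synchronization. Does not apply here.
Contention for CPU resources. Should not apply here (since the number of threads equals the number of cores). However, hyper-threading may hurt.
Contention for memory caches. Since the threads (here, separate processes) do not share memory, they compete for cache space instead of reusing each other's data, and this will reduce performance.
Contention for main memory. This will reduce performance.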
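To get a feel for why the cache and main-memory factors matter for this benchmark, the sketch below estimates a lower bound on the memory each child touches per pass. `struct node` mirrors `simple_struct` from the question; the helper name `working_set_lower_bound` is hypothetical, and allocator bookkeeping per `calloc` call is deliberately ignored, so the real footprint is larger.

```c
#include <stddef.h>

#define MAX_SIZE 7000000

struct node {              /* same fields as simple_struct in the question */
    int n;
    struct node *next;
};

/* Lower bound on the bytes one child touches per pass over the array:
 * one pointer slot plus one node per slot. Allocator overhead is ignored. */
static size_t working_set_lower_bound(size_t slots)
{
    return slots * (sizeof(struct node *) + sizeof(struct node));
}
```

On a typical 64-bit system (8-byte pointers, 16-byte struct) `working_set_lower_bound(MAX_SIZE)` is 168,000,000 bytes per process, so six processes stream through about 1 GB on every one of the 100 passes. The question does not state the cache size, but this is probably far more than the machine's last-level cache can hold.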

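You can measure the last two factors (cache contention and main-memory contention) with a profiler. Look for cache misses and stalled instructions.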
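Short of a full profiler, a first check is whether every child slows down or only some of them. The sketch below is one possible way to do that, assuming a Linux or BSD system where `wait4()` is available; it is not part of the original answer. `run_children` and `busy_work` are hypothetical names, and `busy_work` is only a stand-in for `do_function3()`.

```c
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for do_function3(): burns a little CPU. */
static void busy_work(void)
{
    volatile unsigned long x = 0;
    for (unsigned long i = 0; i < 20000000UL; i++)
        x += i;
}

/* Fork n children that each run work(), then reap them with wait4() and
 * store each child's user CPU time (seconds) in user_seconds[], in the
 * order the children finish. Returns the number of children reaped. */
static int run_children(int n, void (*work)(void), double *user_seconds)
{
    int launched = 0, reaped = 0;

    for (; launched < n; launched++) {
        pid_t pid = fork();
        if (pid == -1)
            break;                 /* stop launching, still reap the rest */
        if (pid == 0) {
            work();
            _exit(0);
        }
    }

    while (reaped < launched) {
        struct rusage ru;
        int status;
        if (wait4(-1, &status, 0, &ru) == -1)
            break;
        user_seconds[reaped++] = (double)ru.ru_utime.tv_sec
                               + (double)ru.ru_utime.tv_usec / 1e6;
    }
    return reaped;
}
```

Calling `run_children(1, do_function3, t)` and then `run_children(6, do_function3, t)` would show each child's own user time. If all six values are roughly double the single-child value, the slowdown is spread evenly, which would be consistent with contention for shared caches and memory bandwidth.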

Regarding "c - User time increases in a multi-CPU job", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/35852127/
