java - Strange behavior when scaling processing across many CPUs

Tags: java performance scalability

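I am studying how Java code performs as it scales across several CPUs. To do so, I wrote a simple program that computes Fibonacci 50000 times on one thread, then 2*50000 times on two threads, 3*50000 times on three threads, and so on, up to the number of CPUs on the target host.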

Here is my code:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class MultiThreadScalability {

    static final int MAX_THREADS = 4;
    static final int NB_RUN_PER_THREAD = 50000;
    static final int FIBO_VALUE = 25;

    public static void main(String[] args) {
        MultiThreadScalability multiThreadScalability = new MultiThreadScalability();
        multiThreadScalability.runTest();
    }


    private void runTest() {
        int availableProcs = Runtime.getRuntime().availableProcessors();
        System.out.println(availableProcs + " processors available");

        for (int i = 1 ; i <= availableProcs ; i++) {
            System.out.println("Running scalability test for " + i + " threads");
            long timeInMillisecs = runTestForThreads(i);
            System.out.println("=> " + timeInMillisecs + " milli-seconds");
        }
    }


    private long runTestForThreads(int threadsNumber) {
        final int nbRun = NB_RUN_PER_THREAD * threadsNumber;
        ExecutorService executor = Executors.newFixedThreadPool(threadsNumber);

        long startTime = System.currentTimeMillis();

        for (int i = 0 ; i < nbRun ; i++) {
            Runnable worker = new Runnable()
            {
                public void run()
                {
                    fibo(FIBO_VALUE);
                }
            };

            executor.execute(worker);
        }

        executor.shutdown();

        while (!executor.isTerminated())
        {}

        return (System.currentTimeMillis() - startTime);
    }


    private static long fibo(int n) {
        if (n < 2) {
            return (n);
        }

        return (fibo(n - 1) + fibo(n - 2));
    }

}
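One possible refinement of the harness, offered as a sketch rather than as the asker's code: the loop `while (!executor.isTerminated()) {}` keeps the main thread spinning on a CPU for the whole measurement, so at 48 workers there are really 49 busy threads on 48 logical processors. The version below waits with `ExecutorService.awaitTermination` and times with the monotonic `System.nanoTime()`. The class name `ScalabilityHarness`, the extra parameters and the one-hour timeout are illustrative assumptions, and whether this changes the curve on that host has not been tested.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ScalabilityHarness {

    // Same naive recursive Fibonacci as in the question.
    static long fibo(int n) {
        return n < 2 ? n : fibo(n - 1) + fibo(n - 2);
    }

    // Runs runsPerThread * threads fibo(fiboValue) tasks on a pool of `threads`
    // workers and returns the elapsed wall-clock time in milliseconds.
    static long runTestForThreads(int threads, int runsPerThread, int fiboValue) {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        long start = System.nanoTime(); // monotonic clock, suited to measuring intervals
        for (int i = 0; i < runsPerThread * threads; i++) {
            executor.execute(() -> fibo(fiboValue));
        }
        executor.shutdown();
        try {
            // Block until all tasks finish instead of spinning on isTerminated(),
            // so the main thread does not occupy a hardware thread during the run.
            // The one-hour timeout is an arbitrary illustrative bound.
            if (!executor.awaitTermination(1, TimeUnit.HOURS)) {
                throw new IllegalStateException("tasks did not finish within the timeout");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for tasks", e);
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        int procs = Runtime.getRuntime().availableProcessors();
        for (int t = 1; t <= procs; t++) {
            System.out.println(t + " threads => " + runTestForThreads(t, 50000, 25) + " ms");
        }
    }
}
```

The per-thread workload (50000 calls of fibo(25)) is kept identical to the question so that results remain comparable.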

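Under these conditions, each thread always does the same amount of work, so I expected the execution time to stay constant regardless of the number of threads.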
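To put a number on "the same amount of work": every task is pure recursion on the stack, and the naive fibo(n) makes calls(n) = calls(n - 1) + calls(n - 2) + 1 invocations, with calls(0) = calls(1) = 1. The small sketch below (the class `FiboCallCount` and its helper `calls` are hypothetical, not part of the question) evaluates that recurrence: fibo(25) costs 242785 calls per task, so each thread performs about 50000 × 242785 ≈ 1.2 × 10^10 calls whatever the thread count.

```java
public class FiboCallCount {

    // Number of invocations made by the naive recursive fibo(n), including the top-level call:
    // calls(0) = calls(1) = 1, calls(n) = calls(n - 1) + calls(n - 2) + 1.
    static long calls(int n) {
        long a = 1; // calls(i - 2)
        long b = 1; // calls(i - 1)
        for (int i = 2; i <= n; i++) {
            long next = a + b + 1;
            a = b;
            b = next;
        }
        return b;
    }

    public static void main(String[] args) {
        long perTask = calls(25);
        long perThread = perTask * 50000L; // NB_RUN_PER_THREAD in the question
        System.out.println("fibo(25) calls per task: " + perTask);   // 242785
        System.out.println("fibo calls per thread:   " + perThread); // 12139250000
    }
}
```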

I ran it on a powerful machine and got the following output:

48 processors available
Running scalability test for 1 threads
=> 34199 milli-seconds
Running scalability test for 2 threads
=> 34141 milli-seconds
Running scalability test for 3 threads
=> 34009 milli-seconds
Running scalability test for 4 threads
=> 34000 milli-seconds
Running scalability test for 5 threads
=> 34034 milli-seconds
Running scalability test for 6 threads
=> 34086 milli-seconds
Running scalability test for 7 threads
=> 34094 milli-seconds
Running scalability test for 8 threads
=> 34673 milli-seconds
Running scalability test for 9 threads
=> 35297 milli-seconds
Running scalability test for 10 threads
=> 35486 milli-seconds
Running scalability test for 11 threads
=> 35913 milli-seconds
Running scalability test for 12 threads
=> 36324 milli-seconds
Running scalability test for 13 threads
=> 35722 milli-seconds
Running scalability test for 14 threads
=> 35750 milli-seconds
Running scalability test for 15 threads
=> 35634 milli-seconds
Running scalability test for 16 threads
=> 35970 milli-seconds
Running scalability test for 17 threads
=> 37914 milli-seconds
Running scalability test for 18 threads
=> 36560 milli-seconds
Running scalability test for 19 threads
=> 36720 milli-seconds
Running scalability test for 20 threads
=> 37028 milli-seconds
Running scalability test for 21 threads
=> 37381 milli-seconds
Running scalability test for 22 threads
=> 37529 milli-seconds
Running scalability test for 23 threads
=> 37632 milli-seconds
Running scalability test for 24 threads
=> 39942 milli-seconds
Running scalability test for 25 threads
=> 40090 milli-seconds
Running scalability test for 26 threads
=> 41238 milli-seconds
Running scalability test for 27 threads
=> 42336 milli-seconds
Running scalability test for 28 threads
=> 43377 milli-seconds
Running scalability test for 29 threads
=> 44394 milli-seconds
Running scalability test for 30 threads
=> 46245 milli-seconds
Running scalability test for 31 threads
=> 45928 milli-seconds
Running scalability test for 32 threads
=> 47490 milli-seconds
Running scalability test for 33 threads
=> 47674 milli-seconds
Running scalability test for 34 threads
=> 48775 milli-seconds
Running scalability test for 35 threads
=> 56456 milli-seconds
Running scalability test for 36 threads
=> 50557 milli-seconds
Running scalability test for 37 threads
=> 51393 milli-seconds
Running scalability test for 38 threads
=> 52971 milli-seconds
Running scalability test for 39 threads
=> 53077 milli-seconds
Running scalability test for 40 threads
=> 54015 milli-seconds
Running scalability test for 41 threads
=> 55924 milli-seconds
Running scalability test for 42 threads
=> 55560 milli-seconds
Running scalability test for 43 threads
=> 56554 milli-seconds
Running scalability test for 44 threads
=> 57073 milli-seconds
Running scalability test for 45 threads
=> 65193 milli-seconds
Running scalability test for 46 threads
=> 58549 milli-seconds
Running scalability test for 47 threads
=> 59302 milli-seconds
Running scalability test for 48 threads
=> 60662 milli-seconds

Up to about 24 threads the time stays almost constant. Beyond that, it gets slower and slower.
[Graph: execution time (ms) vs. number of threads]
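The size of the break can be read straight off the log. The sketch below (the class `SlowdownFromLog` is illustrative) divides a few of the measured times by the single-thread time of 34199 ms, where 1.0 would mean perfect weak scaling. It gives roughly 1.01x at 8 threads, 1.05x at 16, 1.17x at 24, 1.39x at 32 and 1.77x at 48.

```java
import java.util.Locale;

public class SlowdownFromLog {

    // Slowdown relative to the single-thread run: 1.0 means perfect weak scaling.
    static double slowdown(long millisForN, long millisForOne) {
        return (double) millisForN / millisForOne;
    }

    public static void main(String[] args) {
        long t1 = 34199; // 1 thread, taken from the output above
        int[] threads = {8, 16, 24, 32, 48};
        long[] millis = {34673, 35970, 39942, 47490, 60662}; // same log
        for (int i = 0; i < threads.length; i++) {
            System.out.printf(Locale.ROOT, "%2d threads: %.2fx%n",
                    threads[i], slowdown(millis[i], t1));
        }
    }
}
```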

I would like help understanding why this "break" happens.

Last but not least, here is the CPU configuration of the host I ran the test on:

$ cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 46
model name      : Intel(R) Xeon(R) CPU           E7540  @ 2.00GHz
stepping        : 6
cpu MHz         : 1997.885
cache size      : 18432 KB
physical id     : 0
siblings        : 12
core id         : 0
cpu cores       : 6
apicid          : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 11
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx rdtscp lm constant_tsc ida nonstop_tsc pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr sse4_1 sse4_2 popcnt lahf_lm
bogomips        : 3995.77
clflush size    : 64
cache_alignment : 64
address sizes   : 44 bits physical, 48 bits virtual
power management: [8]

Here I can see that the real number of cores is only 6. Runtime.getRuntime().availableProcessors() does not return the number of physical CPUs but the number of "hyper-threads": 48.

Do you think this could explain the "break" I observe at 24 threads?

Best answer

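It looks to me like your machine has 4 Intel E7540 CPUs, each with 6 cores and 12 threads, for a total of 24 cores and 48 threads: the /proc/cpuinfo entry reports "siblings : 12" and "cpu cores : 6" per physical package, and 48 logical processors / 12 siblings = 4 packages. So only 24 threads can truly run in parallel, one per physical core.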
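The same counting can be done programmatically. A minimal sketch, assuming a Linux host whose /proc/cpuinfo uses the "key : value" layout shown in the question: it counts "processor" entries as logical CPUs, distinct "physical id" values as sockets, and distinct ("physical id", "core id") pairs as physical cores. The class `CpuTopology` and method `countTopology` are hypothetical names. If the answer's reading is right, on the asker's host it should report 48 logical processors, 4 sockets and 24 physical cores.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CpuTopology {

    // Returns {logical processors, sockets, physical cores} parsed from /proc/cpuinfo lines.
    // A physical core is identified by its ("physical id", "core id") pair, because
    // core ids restart at 0 on every socket.
    static int[] countTopology(List<String> lines) {
        int logical = 0;
        Set<String> sockets = new HashSet<>();
        Set<String> cores = new HashSet<>();
        String physicalId = null; // "physical id" precedes "core id" in each processor block
        for (String line : lines) {
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // blank separator lines between processor blocks
            }
            String key = line.substring(0, colon).trim();
            String value = line.substring(colon + 1).trim();
            switch (key) {
                case "processor":
                    logical++;
                    break;
                case "physical id":
                    physicalId = value;
                    sockets.add(value);
                    break;
                case "core id":
                    cores.add(physicalId + "/" + value);
                    break;
                default:
                    break;
            }
        }
        return new int[] {logical, sockets.size(), cores.size()};
    }

    public static void main(String[] args) throws IOException {
        int[] t = countTopology(Files.readAllLines(Paths.get("/proc/cpuinfo")));
        System.out.println(t[0] + " logical processors, " + t[1] + " sockets, "
                + t[2] + " physical cores");
        System.out.println("availableProcessors() = " + Runtime.getRuntime().availableProcessors());
    }
}
```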

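The 48 threads come from Hyper-Threading, a feature designed to exploit the micro-stalls that occur when a thread has to wait for memory before it can continue: while one hardware thread waits, the other can use the core. Since your test does not access any new memory in its innermost work (the recursive fibo calls), there are few such stalls to fill, so you are limited by the 24 cores.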
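The measured numbers can be checked against this explanation. The sketch below (the class `HyperThreadingGain` is illustrative) converts the logged times into aggregate throughput: 24 threads × 50000 tasks in 39942 ms is about 30044 tasks/s, and 48 threads × 50000 tasks in 60662 ms is about 39563 tasks/s. Doubling the software threads therefore raises total throughput by only about 1.32x rather than 2x. This is consistent with the second hardware thread on each core mostly competing for the same execution units, although the log alone cannot prove the cause.

```java
import java.util.Locale;

public class HyperThreadingGain {

    // Aggregate throughput in fibo(25) tasks per second for a run where
    // `threads` threads each executed `runsPerThread` tasks in `millis` ms.
    static double tasksPerSecond(int threads, int runsPerThread, long millis) {
        return threads * (double) runsPerThread * 1000.0 / millis;
    }

    public static void main(String[] args) {
        double at24 = tasksPerSecond(24, 50000, 39942); // from the question's log
        double at48 = tasksPerSecond(48, 50000, 60662); // from the question's log
        System.out.printf(Locale.ROOT, "24 threads: %.0f tasks/s%n", at24);
        System.out.printf(Locale.ROOT, "48 threads: %.0f tasks/s%n", at48);
        System.out.printf(Locale.ROOT, "throughput gain from 24 to 48 threads: %.2fx%n", at48 / at24);
    }
}
```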

So yes, the number of cores versus the number of hardware threads explains the break at 24 threads.

For java - Strange behavior when scaling processing across many CPUs, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/35837650/
