tensorflow - What is "Device interconnect StreamExecutor with strength 1 edge matrix"?

I have four NVIDIA GTX 1080 graphics cards, and when I initialize a session I see the following console output:

Adding visible gpu devices: 0, 1, 2, 3
 Device interconnect StreamExecutor with strength 1 edge matrix:
      0 1 2 3 
 0:   N Y N N 
 1:   Y N N N 
 2:   N N N Y 
 3:   N N Y N

I also have 2 NVIDIA M60 Tesla graphics cards, and there the initialization looks like this:

Adding visible gpu devices: 0, 1, 2, 3
 Device interconnect StreamExecutor with strength 1 edge matrix:
      0 1 2 3 
 0:   N N N N 
 1:   N N N N 
 2:   N N N N 
 3:   N N N N

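I noticed that this output changed for the 1080 GPUs after my last update from 1.6 to 1.8. It used to look something like this (I can't recall it precisely, this is just from memory):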

 Adding visible gpu devices: 0, 1, 2, 3
Device interconnect StreamExecutor with strength 1 edge matrix:
     0 1 2 3            0 1 2 3
0:   Y N N N         0: N N Y N
1:   N Y N N    or   1: N N N Y
2:   N N Y N         2: Y N N N
3:   N N N Y         3: N Y N N

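My questions are:

What is this device interconnect?

What influence does it have on computation power?

Why does it differ for different GPUs?

Can it change over time due to hardware reasons (failures, driver inconsistency...)?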

Best answer

TL;DR

What is this device interconnect?

As Almog David said in the comments, this tells you whether one GPU has direct memory access to another.

What influence does it have on computation power?

It only affects multi-GPU training. Data transfer between two GPUs is faster if they have a device interconnect.

Why does it differ for different GPUs?

This depends on the topology of the hardware setup. A motherboard only has so many PCI-e slots connected by the same bus. (Check the topology with nvidia-smi topo -m.)

Can it change over time due to hardware reasons (failures, driver inconsistency...)?

I don't think the order can change over time, unless NVIDIA changes the default enumeration scheme. There is a little more detail here

Explanation

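This message is generated in the BaseGPUDeviceFactory::CreateDevices function. It iterates over each pair of devices in the given order and calls cuDeviceCanAccessPeer. As Almog David mentioned in the comments, this just indicates whether you can perform DMA between the devices.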
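The pairwise loop described above can be sketched in plain Python. This is only an illustration, not TensorFlow's actual code: `can_access_peer` is a hypothetical stand-in for the cuDeviceCanAccessPeer query, hard-coded to the topology from the question (GPUs 0 and 1 are peers, as are GPUs 2 and 3), and `edge_matrix` and `format_matrix` are names made up for this sketch.

```python
def can_access_peer(src, dst):
    """Hypothetical stand-in for the peer-access query (assumed topology)."""
    peers = {(0, 1), (1, 0), (2, 3), (3, 2)}
    return (src, dst) in peers


def edge_matrix(device_ids, query=can_access_peer):
    """Query every ordered pair of devices and record 'Y' or 'N'."""
    return [
        ["Y" if query(src, dst) else "N" for dst in device_ids]
        for src in device_ids
    ]


def format_matrix(rows):
    """Lay the flags out roughly the way the log message does."""
    lines = ["     " + " ".join(str(i) for i in range(len(rows)))]
    for i, row in enumerate(rows):
        lines.append(f"{i}:   " + " ".join(row))
    return "\n".join(lines)


print(format_matrix(edge_matrix([0, 1, 2, 3])))
```

Under the assumed topology, the rows carry the same Y/N flags as the 1.8 log in the question, including the N entries on the diagonal.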

You can run a small test to check that the order matters. Consider the following snippet:

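# test.py (TensorFlow 1.x API)
import tensorflow as tf

# Allow memory growth so the session takes up minimal GPU resources
config = tf.ConfigProto()
config.gpu_options.allow_growth = True

# Creating the session triggers the device interconnect log message
sess = tf.Session(config=config)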

Now let's check the output for different device orders in CUDA_VISIBLE_DEVICES.

$ CUDA_VISIBLE_DEVICES=0,1,2,3 python3 test.py
...
2019-03-26 15:26:16.111423: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0, 1, 2, 3
2019-03-26 15:26:18.635894: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-03-26 15:26:18.635965: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0 1 2 3 
2019-03-26 15:26:18.635974: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N Y N N 
2019-03-26 15:26:18.635982: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 1:   Y N N N 
2019-03-26 15:26:18.635987: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 2:   N N N Y 
2019-03-26 15:26:18.636010: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 3:   N N Y N 
...

$ CUDA_VISIBLE_DEVICES=2,0,1,3 python3 test.py
...
2019-03-26 15:26:30.090493: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0, 1, 2, 3
2019-03-26 15:26:32.758272: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-03-26 15:26:32.758349: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0 1 2 3 
2019-03-26 15:26:32.758358: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N N N Y 
2019-03-26 15:26:32.758364: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 1:   N N Y N 
2019-03-26 15:26:32.758389: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 2:   N Y N N 
2019-03-26 15:26:32.758412: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 3:   Y N N N
...
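The relationship between the two logs above can be reproduced with a short sketch. It assumes that CUDA_VISIBLE_DEVICES simply renumbers the devices in the order listed, which is what the logs suggest; `PHYSICAL` and `remap` are names invented for this illustration.

```python
# Peer-access flags by physical GPU id, copied from the first log above.
PHYSICAL = [
    "NYNN",
    "YNNN",
    "NNNY",
    "NNYN",
]


def remap(physical, visible_order):
    """Logical device i is assumed to be physical device visible_order[i]."""
    return [
        "".join(physical[src][dst] for dst in visible_order)
        for src in visible_order
    ]


# Same order as CUDA_VISIBLE_DEVICES=2,0,1,3 in the second run.
logical = remap(PHYSICAL, [2, 0, 1, 3])
for i, row in enumerate(logical):
    print(f"{i}:   " + " ".join(row))
```

With the order 2,0,1,3 the remapped rows match the second log: the physical links are unchanged, only the device numbering differs.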

You can get a more detailed description of the connections by running nvidia-smi topo -m. For example:

        GPU0    GPU1    GPU2    GPU3    CPU Affinity
GPU0     X      PHB     SYS     SYS     0-7,16-23
GPU1    PHB      X      SYS     SYS     0-7,16-23
GPU2    SYS     SYS      X      PHB     8-15,24-31
GPU3    SYS     SYS     PHB      X      8-15,24-31

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe switches (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing a single PCIe switch
  NV#  = Connection traversing a bonded set of # NVLinks

I believe that the lower a connection type is on this list, the faster the transfer.
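As a rough sketch of that reading of the legend, the snippet below ranks link types by their position in the list and picks the best-ranked pair from the sample matrix. The ranking is the assumption stated above, not something measured; `SAMPLE` is the example matrix with the header row and CPU affinity column dropped, and all function names are hypothetical.

```python
# Link types in legend order; a later entry is assumed to be a faster path.
LEGEND_ORDER = ["SYS", "NODE", "PHB", "PXB", "PIX", "NV"]

# The example matrix above, without the header row and CPU affinity column.
SAMPLE = """\
GPU0 X PHB SYS SYS
GPU1 PHB X SYS SYS
GPU2 SYS SYS X PHB
GPU3 SYS SYS PHB X
"""


def link_rank(link):
    """Higher rank means lower in the legend; NV1, NV2, ... count as 'NV'."""
    key = "NV" if link.startswith("NV") else link
    return LEGEND_ORDER.index(key)


def parse_topology(text):
    """Map (row, column) GPU index pairs to a link type, skipping 'X' (self)."""
    links = {}
    for i, line in enumerate(text.strip().splitlines()):
        for j, cell in enumerate(line.split()[1:]):
            if cell != "X":
                links[(i, j)] = cell
    return links


links = parse_topology(SAMPLE)
best = max(links, key=lambda pair: link_rank(links[pair]))
print(best, links[best])
```

In this sample the PHB pairs (GPU0/GPU1 and GPU2/GPU3) are the ones marked Y in the TensorFlow matrix, while the SYS pairs are marked N.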

Source: this question appears on Stack Overflow at https://stackoverflow.com/questions/52192461/
