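python - Why is cuda-gdb much slower than gdb when running the same program with no breakpoints in CUDA kernels?

Tags: python tensorflow cuda gdb cuda-gdb

I am having trouble with cuda-gdb. My program is launched from python and loads a shared library containing tensorflow and cuda code. The command I use to start cuda-gdb is cuda-gdb --args python test_cr_bbp_tf2.py . After typing run in cuda-gdb, I waited about 10 minutes for execution to finish, and the program hung for long stretches as new thread messages appeared. These long hangs make cuda-gdb useless for debugging my program. Another execution using gdb --args python test_cr_bbp_tf2.py took only 10 seconds. The cuda-gdb log is shown below (cuda-gdb and gdb print the same log):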

Starting program: /usr/bin/python test_cr_bbp_tf2.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/x86_64-linux-gnu/libthread_db.so.1".
2022-05-04 19:44:17.020132: I tensorflow/stream_executor/platform/default/dso_loader.cc:54] Successfully opened dynamic library libcudart.so.11.0
[Detaching after fork from child process 61729]
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
testmatching
2022-05-04 19:44:18.724780: I tensorflow/stream_executor/platform/default/dso_loader.cc:54] Successfully opened dynamic library libcudart.so.11.0
2022-05-04 19:44:19.270957: I tensorflow/stream_executor/platform/default/dso_loader.cc:54] Successfully opened dynamic library libcuda.so.1
[Detaching after fork from child process 61762]
[New Thread 0x7fff83ec3700 (LWP 61868)]
2022-05-04 19:44:23.413363: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1734] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: NVIDIA A10 computeCapability: 8.6
coreClock: 1.695GHz coreCount: 72 deviceMemorySize: 22.20GiB deviceMemoryBandwidth: 558.88GiB/s
2022-05-04 19:44:23.414389: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1734] Found device 1 with properties: 
pciBusID: 0000:41:00.0 name: NVIDIA A10 computeCapability: 8.6
coreClock: 1.695GHz coreCount: 72 deviceMemorySize: 22.20GiB deviceMemoryBandwidth: 558.88GiB/s
2022-05-04 19:44:23.415424: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1734] Found device 2 with properties: 
pciBusID: 0000:61:00.0 name: NVIDIA A10 computeCapability: 8.6
coreClock: 1.695GHz coreCount: 72 deviceMemorySize: 22.20GiB deviceMemoryBandwidth: 558.88GiB/s
2022-05-04 19:44:23.416413: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1734] Found device 3 with properties: 
pciBusID: 0000:81:00.0 name: NVIDIA A10 computeCapability: 8.6
coreClock: 1.695GHz coreCount: 72 deviceMemorySize: 22.20GiB deviceMemoryBandwidth: 558.88GiB/s
2022-05-04 19:44:23.417823: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1734] Found device 4 with properties: 
pciBusID: 0000:c1:00.0 name: NVIDIA A40 computeCapability: 8.6
coreClock: 1.74GHz coreCount: 84 deviceMemorySize: 44.56GiB deviceMemoryBandwidth: 648.29GiB/s
2022-05-04 19:44:23.417844: I tensorflow/stream_executor/platform/default/dso_loader.cc:54] Successfully opened dynamic library libcudart.so.11.0
2022-05-04 19:44:23.460000: I tensorflow/stream_executor/platform/default/dso_loader.cc:54] Successfully opened dynamic library libcublas.so.11
2022-05-04 19:44:23.460044: I tensorflow/stream_executor/platform/default/dso_loader.cc:54] Successfully opened dynamic library libcublasLt.so.11
2022-05-04 19:44:23.484264: I tensorflow/stream_executor/platform/default/dso_loader.cc:54] Successfully opened dynamic library libcufft.so.10
2022-05-04 19:44:23.507764: I tensorflow/stream_executor/platform/default/dso_loader.cc:54] Successfully opened dynamic library libcurand.so.10
2022-05-04 19:44:23.531913: I tensorflow/stream_executor/platform/default/dso_loader.cc:54] Successfully opened dynamic library libcutensor.so.1
2022-05-04 19:44:23.554745: I tensorflow/stream_executor/platform/default/dso_loader.cc:54] Successfully opened dynamic library libcusolver.so.11
2022-05-04 19:44:23.577344: I tensorflow/stream_executor/platform/default/dso_loader.cc:54] Successfully opened dynamic library libcusparse.so.11
2022-05-04 19:44:23.598496: I tensorflow/stream_executor/platform/default/dso_loader.cc:54] Successfully opened dynamic library libcudnn.so.8
2022-05-04 19:44:23.610402: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1872] Adding visible gpu devices: 0, 1, 2, 3, 4
[New Thread 0x7fff8349a700 (LWP 61869)]
[New Thread 0x7fff82c99700 (LWP 61870)]
[New Thread 0x7fff82498700 (LWP 61871)]
[New Thread 0x7fff81c97700 (LWP 61872)]
[New Thread 0x7fff81496700 (LWP 61873)]
[New Thread 0x7fff80c95700 (LWP 61874)]
[New Thread 0x7fff0fa23700 (LWP 61875)]
[New Thread 0x7fff0f222700 (LWP 61876)]
[New Thread 0x7fff0ea21700 (LWP 61877)]
[New Thread 0x7fff0e220700 (LWP 61878)]
[New Thread 0x7fff0da1f700 (LWP 61879)]
[New Thread 0x7fff0d21e700 (LWP 61880)]
[New Thread 0x7fff0ca1d700 (LWP 61881)]
[New Thread 0x7ffedffff700 (LWP 61882)]
[New Thread 0x7ffedf7fe700 (LWP 61883)]
[New Thread 0x7ffedeffd700 (LWP 61884)]
[New Thread 0x7ffede7fc700 (LWP 61885)]
[New Thread 0x7ffeddffb700 (LWP 61886)]
[New Thread 0x7ffedd7fa700 (LWP 61887)]
[New Thread 0x7ffedcff9700 (LWP 61888)]
[New Thread 0x7ffebbfff700 (LWP 61889)]
[New Thread 0x7ffebb7fe700 (LWP 61890)]
[New Thread 0x7ffebaffd700 (LWP 61891)]
[New Thread 0x7ffeba7fc700 (LWP 61892)]
[New Thread 0x7ffeb9ffb700 (LWP 61893)]
[New Thread 0x7ffeb97fa700 (LWP 61894)]
[New Thread 0x7ffeb8ff9700 (LWP 61895)]
[New Thread 0x7ffe9bfff700 (LWP 61896)]
[New Thread 0x7ffe9b7fe700 (LWP 61897)]
[New Thread 0x7ffe9affd700 (LWP 61898)]
[New Thread 0x7ffe9a7fc700 (LWP 61899)]
[New Thread 0x7ffe99ffb700 (LWP 61900)]
[New Thread 0x7ffe997fa700 (LWP 61901)]
[New Thread 0x7ffe98ff9700 (LWP 61902)]
[New Thread 0x7ffe7bfff700 (LWP 61903)]
[New Thread 0x7ffe7b7fe700 (LWP 61904)]
[New Thread 0x7ffe7affd700 (LWP 61905)]
[New Thread 0x7ffe7a7fc700 (LWP 61906)]
[New Thread 0x7ffe79ffb700 (LWP 61907)]
[New Thread 0x7ffe797fa700 (LWP 61908)]
[New Thread 0x7ffe78ff9700 (LWP 61909)]
[New Thread 0x7ffe63fff700 (LWP 61910)]
[New Thread 0x7ffe637fe700 (LWP 61911)]
[New Thread 0x7ffe62ffd700 (LWP 61912)]
[New Thread 0x7ffe627fc700 (LWP 61913)]
[New Thread 0x7ffe61ffb700 (LWP 61914)]
[New Thread 0x7ffe617fa700 (LWP 61915)]
[New Thread 0x7ffe60ff9700 (LWP 61916)]
[New Thread 0x7ffe3ffff700 (LWP 61917)]
[New Thread 0x7ffe3f7fe700 (LWP 61918)]
[New Thread 0x7ffe3effd700 (LWP 61919)]
[New Thread 0x7ffe3e7fc700 (LWP 61920)]
[New Thread 0x7ffe3dffb700 (LWP 61921)]
[New Thread 0x7ffe3d7fa700 (LWP 61922)]
[New Thread 0x7ffe3cff9700 (LWP 61923)]
[New Thread 0x7ffe1bfff700 (LWP 61924)]
[New Thread 0x7ffe1b7fe700 (LWP 61925)]
[New Thread 0x7ffe1affd700 (LWP 61926)]
[New Thread 0x7ffe1a7fc700 (LWP 61927)]
[New Thread 0x7ffe19ffb700 (LWP 61928)]
[New Thread 0x7ffe197fa700 (LWP 61929)]
[New Thread 0x7ffe18ff9700 (LWP 61930)]
[New Thread 0x7ffdfbfff700 (LWP 61931)]
[New Thread 0x7ffdf3fff700 (LWP 61932)]
[New Thread 0x7ffdfb7fe700 (LWP 61933)]
[New Thread 0x7ffdfaffd700 (LWP 61934)]
[New Thread 0x7ffdfa7fc700 (LWP 61935)]
[New Thread 0x7ffdf9ffb700 (LWP 61936)]
[New Thread 0x7ffdf97fa700 (LWP 61937)]
[New Thread 0x7ffdf8ff9700 (LWP 61938)]
[New Thread 0x7ffdf37fe700 (LWP 61939)]
[New Thread 0x7ffdf2ffd700 (LWP 61940)]
[New Thread 0x7ffdf27fc700 (LWP 61941)]
[New Thread 0x7ffdf1ffb700 (LWP 61942)]
[New Thread 0x7ffdf17fa700 (LWP 61943)]
[New Thread 0x7ffdf0ff9700 (LWP 61944)]
[New Thread 0x7ffdbbfff700 (LWP 61945)]
[New Thread 0x7ffdbb7fe700 (LWP 61946)]
[New Thread 0x7ffdbaffd700 (LWP 61947)]
[New Thread 0x7ffdba7fc700 (LWP 61948)]
[New Thread 0x7ffdb9ffb700 (LWP 61949)]
[New Thread 0x7ffdb97fa700 (LWP 61950)]
[New Thread 0x7ffdb8ff9700 (LWP 61951)]
[New Thread 0x7ffd9bfff700 (LWP 61952)]
[New Thread 0x7ffd9b7fe700 (LWP 61953)]
[New Thread 0x7ffd9affd700 (LWP 61954)]
[New Thread 0x7ffd9a7fc700 (LWP 61955)]
[New Thread 0x7ffd99ffb700 (LWP 61956)]
[New Thread 0x7ffd997fa700 (LWP 61957)]
[New Thread 0x7ffd98ff9700 (LWP 61958)]
[New Thread 0x7ffd7bfff700 (LWP 61959)]
[New Thread 0x7ffd737fe700 (LWP 61960)]
[New Thread 0x7ffd7b7fe700 (LWP 61961)]
[New Thread 0x7ffd7affd700 (LWP 61962)]
[New Thread 0x7ffd7a7fc700 (LWP 61963)]
[New Thread 0x7ffd79ffb700 (LWP 61964)]
[New Thread 0x7ffd797fa700 (LWP 61965)]
[New Thread 0x7ffd78ff9700 (LWP 61966)]
[New Thread 0x7ffd73fff700 (LWP 61967)]
[New Thread 0x7ffd72ffd700 (LWP 61968)]
[New Thread 0x7ffd727fc700 (LWP 61986)]
[New Thread 0x7ffd71ffb700 (LWP 61987)]
[New Thread 0x7ffd717fa700 (LWP 61988)]
[New Thread 0x7ffd70ff9700 (LWP 61989)]
[New Thread 0x7ffd41fff700 (LWP 62007)]
[New Thread 0x7ffd417fe700 (LWP 62008)]
2022-05-04 19:44:32.857957: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1734] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: NVIDIA A10 computeCapability: 8.6
coreClock: 1.695GHz coreCount: 72 deviceMemorySize: 22.20GiB deviceMemoryBandwidth: 558.88GiB/s
2022-05-04 19:44:32.859130: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1734] Found device 1 with properties: 
pciBusID: 0000:41:00.0 name: NVIDIA A10 computeCapability: 8.6
coreClock: 1.695GHz coreCount: 72 deviceMemorySize: 22.20GiB deviceMemoryBandwidth: 558.88GiB/s
2022-05-04 19:44:32.862621: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1734] Found device 2 with properties: 
pciBusID: 0000:61:00.0 name: NVIDIA A10 computeCapability: 8.6
coreClock: 1.695GHz coreCount: 72 deviceMemorySize: 22.20GiB deviceMemoryBandwidth: 558.88GiB/s
2022-05-04 19:44:32.864706: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1734] Found device 3 with properties: 
pciBusID: 0000:81:00.0 name: NVIDIA A10 computeCapability: 8.6
coreClock: 1.695GHz coreCount: 72 deviceMemorySize: 22.20GiB deviceMemoryBandwidth: 558.88GiB/s
2022-05-04 19:44:32.866416: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1734] Found device 4 with properties: 
pciBusID: 0000:c1:00.0 name: NVIDIA A40 computeCapability: 8.6
coreClock: 1.74GHz coreCount: 84 deviceMemorySize: 44.56GiB deviceMemoryBandwidth: 648.29GiB/s
2022-05-04 19:44:32.877182: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1872] Adding visible gpu devices: 0, 1, 2, 3, 4
2022-05-04 19:46:31.513555: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2022-05-04 19:46:31.513646: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264]      0 1 2 3 4 
2022-05-04 19:46:31.513655: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0:   N Y Y Y Y 
2022-05-04 19:46:31.513660: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 1:   Y N Y Y Y 
2022-05-04 19:46:31.513664: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 2:   Y Y N Y Y 
2022-05-04 19:46:31.513668: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 3:   Y Y Y N Y 
2022-05-04 19:46:31.513672: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 4:   Y Y Y Y N 
2022-05-04 19:46:32.375262: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 20566 MB memory) -> physical GPU (device: 0, name: NVIDIA A10, pci bus id: 0000:01:00.0, compute capability: 8.6)
[New Thread 0x7ffd40ffd700 (LWP 63758)]
[New Thread 0x7ffcf3fff700 (LWP 63759)]
2022-05-04 19:46:33.726341: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 20566 MB memory) -> physical GPU (device: 1, name: NVIDIA A10, pci bus id: 0000:41:00.0, compute capability: 8.6)
[New Thread 0x7ffcf37fe700 (LWP 63788)]
[New Thread 0x7ffcf2f14700 (LWP 63789)]
2022-05-04 19:46:35.095419: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 20566 MB memory) -> physical GPU (device: 2, name: NVIDIA A10, pci bus id: 0000:61:00.0, compute capability: 8.6)
[New Thread 0x7ffcf2713700 (LWP 63813)]
[New Thread 0x7ffcf1f12700 (LWP 63814)]
2022-05-04 19:46:36.507977: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 20566 MB memory) -> physical GPU (device: 3, name: NVIDIA A10, pci bus id: 0000:81:00.0, compute capability: 8.6)
[New Thread 0x7ffcf1711700 (LWP 63878)]
[New Thread 0x7ffcf0f10700 (LWP 63879)]
2022-05-04 19:46:37.914435: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:4 with 43434 MB memory) -> physical GPU (device: 4, name: NVIDIA A40, pci bus id: 0000:c1:00.0, compute capability: 8.6)
[New Thread 0x7ffc9bfff700 (LWP 63908)]
[New Thread 0x7ffc9b7fe700 (LWP 63909)]
2022-05-04 19:46:39.270650: I tensorflow/core/common_runtime/process_util.cc:146] Creating new thread pool with default inter op setting: 96. Tune using inter_op_parallelism_threads for best performance.
[New Thread 0x7ffc9affd700 (LWP 63910)]
[New Thread 0x7ffc9a7fc700 (LWP 63911)]
[New Thread 0x7ffc99ffb700 (LWP 63912)]
[New Thread 0x7ffc997fa700 (LWP 63913)]
[New Thread 0x7ffc98ff9700 (LWP 63914)]
[New Thread 0x7ffc7ffff700 (LWP 63915)]
[New Thread 0x7ffc7f7fe700 (LWP 63916)]
[New Thread 0x7ffc7effd700 (LWP 63917)]
[New Thread 0x7ffc7e7fc700 (LWP 63918)]
[New Thread 0x7ffc7dffb700 (LWP 63919)]
[New Thread 0x7ffc7d7fa700 (LWP 63920)]
[New Thread 0x7ffc7cff9700 (LWP 63921)]
[New Thread 0x7ffc53fff700 (LWP 63922)]
[New Thread 0x7ffc537fe700 (LWP 63923)]
[New Thread 0x7ffc52ffd700 (LWP 63924)]
[New Thread 0x7ffc527fc700 (LWP 63925)]
[New Thread 0x7ffc51ffb700 (LWP 63926)]
[New Thread 0x7ffc517fa700 (LWP 63927)]
[New Thread 0x7ffc50ff9700 (LWP 63928)]
[New Thread 0x7ffc3bfff700 (LWP 63929)]
[New Thread 0x7ffc3b7fe700 (LWP 63930)]
[New Thread 0x7ffc3affd700 (LWP 63931)]
[New Thread 0x7ffc3a7fc700 (LWP 63932)]
[New Thread 0x7ffc39ffb700 (LWP 63933)]
[New Thread 0x7ffc397fa700 (LWP 63934)]
[New Thread 0x7ffc38ff9700 (LWP 63935)]
[New Thread 0x7ffc17fff700 (LWP 63936)]
[New Thread 0x7ffc177fe700 (LWP 63937)]
[New Thread 0x7ffc16ffd700 (LWP 63938)]
[New Thread 0x7ffc167fc700 (LWP 63939)]
[New Thread 0x7ffc15ffb700 (LWP 63940)]
[New Thread 0x7ffc157fa700 (LWP 63941)]
[New Thread 0x7ffc14ff9700 (LWP 63942)]
[New Thread 0x7ffbf7fff700 (LWP 63943)]
[New Thread 0x7ffbef7fe700 (LWP 63944)]
[New Thread 0x7ffbf77fe700 (LWP 63945)]
[New Thread 0x7ffbf6ffd700 (LWP 63946)]
[New Thread 0x7ffbf67fc700 (LWP 63947)]
[New Thread 0x7ffbf5ffb700 (LWP 63948)]
[New Thread 0x7ffbf57fa700 (LWP 63949)]
[New Thread 0x7ffbf4ff9700 (LWP 63950)]
[New Thread 0x7ffbeffff700 (LWP 63951)]
[New Thread 0x7ffbeeffd700 (LWP 63952)]
[New Thread 0x7ffbee7fc700 (LWP 63953)]
[New Thread 0x7ffbedffb700 (LWP 63954)]
[New Thread 0x7ffbed7fa700 (LWP 63955)]
[New Thread 0x7ffbecff9700 (LWP 63956)]
[New Thread 0x7ffbb7fff700 (LWP 63957)]
[New Thread 0x7ffbaffff700 (LWP 63958)]
[New Thread 0x7ffbb77fe700 (LWP 63959)]
[New Thread 0x7ffbb6ffd700 (LWP 63960)]
[New Thread 0x7ffbb67fc700 (LWP 63961)]
[New Thread 0x7ffbb5ffb700 (LWP 63962)]
[New Thread 0x7ffbb57fa700 (LWP 63963)]
[New Thread 0x7ffbb4ff9700 (LWP 63964)]
[New Thread 0x7ffbaf7fe700 (LWP 63965)]
[New Thread 0x7ffbaeffd700 (LWP 63966)]
[New Thread 0x7ffbae7fc700 (LWP 63967)]
[New Thread 0x7ffbadffb700 (LWP 63968)]
[New Thread 0x7ffbad7fa700 (LWP 63969)]
[New Thread 0x7ffbacff9700 (LWP 63970)]
[New Thread 0x7ffb77fff700 (LWP 63971)]
[New Thread 0x7ffb777fe700 (LWP 63972)]
[New Thread 0x7ffb76ffd700 (LWP 63973)]
[New Thread 0x7ffb767fc700 (LWP 63974)]
[New Thread 0x7ffb75ffb700 (LWP 63975)]
[New Thread 0x7ffb757fa700 (LWP 63976)]
[New Thread 0x7ffb74ff9700 (LWP 63977)]
[New Thread 0x7ffb57fff700 (LWP 63978)]
[New Thread 0x7ffb4f7fe700 (LWP 63979)]
[New Thread 0x7ffb577fe700 (LWP 63980)]
[New Thread 0x7ffb56ffd700 (LWP 63981)]
[New Thread 0x7ffb567fc700 (LWP 63982)]
[New Thread 0x7ffb55ffb700 (LWP 63983)]
[New Thread 0x7ffb557fa700 (LWP 63984)]
[New Thread 0x7ffb54ff9700 (LWP 63985)]
[New Thread 0x7ffb4ffff700 (LWP 63986)]
[New Thread 0x7ffb4effd700 (LWP 63987)]
[New Thread 0x7ffb4e7fc700 (LWP 63988)]
[New Thread 0x7ffb4dffb700 (LWP 63989)]
[New Thread 0x7ffb4d7fa700 (LWP 63990)]
[New Thread 0x7ffb4cff9700 (LWP 63991)]
[New Thread 0x7ffb17fff700 (LWP 63992)]
[New Thread 0x7ffb0f7fe700 (LWP 63993)]
[New Thread 0x7ffb177fe700 (LWP 63994)]
[New Thread 0x7ffb16ffd700 (LWP 63995)]
[New Thread 0x7ffb167fc700 (LWP 63996)]
[New Thread 0x7ffb15ffb700 (LWP 63997)]
[New Thread 0x7ffb157fa700 (LWP 63998)]
[New Thread 0x7ffb14ff9700 (LWP 63999)]
[New Thread 0x7ffb0ffff700 (LWP 64000)]
[New Thread 0x7ffb0effd700 (LWP 64001)]
[New Thread 0x7ffb0e7fc700 (LWP 64002)]
[New Thread 0x7ffb0dffb700 (LWP 64003)]
[New Thread 0x7ffb0d7fa700 (LWP 64004)]
[New Thread 0x7ffb0cff9700 (LWP 64005)]
inBox: [   8 1527   56  839]
outBox: [   9 1526   57  838]
msdiffGPUBBP 9.787128717049706 8.836871411244479 8.792443641292834 1.0240484074218186
msdiffGPUBBP 8.284130262488215 8.504984482311862 8.468490694873381 0.9001938761471568
msdiffGPUBBP 7.607680601497159 8.326243672859956 8.302741359766436 0.7882571994561002
msdiffGPUBBP 9.01851150047646 8.826698188773078 8.800456119590438 0.852269245604869
msdiffGPUBBP 10.002452782253538 9.02661007092474 9.003068931959348 0.7974960688804729
msdiffGPUBBP 9.362822708408418 8.672302889066888 8.648869787310186 0.8399175541051185
msdiffGPUBBP 13.275880608755909 13.290950955327459 13.297561482034173 0.5154759081244825
mean (GPU-BBP) 0.8168083228200026
mean diff Median (GPU-BBP) 0.0
[Thread 0x7ffb4f7fe700 (LWP 63979) exited]
[Thread 0x7ffb0dffb700 (LWP 64003) exited]
[Thread 0x7ffb0e7fc700 (LWP 64002) exited]
[Thread 0x7ffb167fc700 (LWP 63996) exited]
[Thread 0x7ffb157fa700 (LWP 63998) exited]
[Thread 0x7ffb0ffff700 (LWP 64000) exited]
[Thread 0x7ffb177fe700 (LWP 63994) exited]
[Thread 0x7ffb0f7fe700 (LWP 63993) exited]
[Thread 0x7ffb0d7fa700 (LWP 64004) exited]
[Thread 0x7ffb0effd700 (LWP 64001) exited]
[Thread 0x7ffb15ffb700 (LWP 63997) exited]
[Thread 0x7ffb0cff9700 (LWP 64005) exited]
[Thread 0x7ffb14ff9700 (LWP 63999) exited]
[Thread 0x7ffb16ffd700 (LWP 63995) exited]
[Thread 0x7ffb17fff700 (LWP 63992) exited]
[Thread 0x7ffb4cff9700 (LWP 63991) exited]
[Thread 0x7ffb4d7fa700 (LWP 63990) exited]
[Thread 0x7ffb4dffb700 (LWP 63989) exited]
[Thread 0x7ffb4e7fc700 (LWP 63988) exited]
[Thread 0x7ffb4effd700 (LWP 63987) exited]
[Thread 0x7ffb4ffff700 (LWP 63986) exited]
[Thread 0x7ffb54ff9700 (LWP 63985) exited]
[Thread 0x7ffb557fa700 (LWP 63984) exited]
[Thread 0x7ffb55ffb700 (LWP 63983) exited]
[Thread 0x7ffb567fc700 (LWP 63982) exited]
[Thread 0x7ffb56ffd700 (LWP 63981) exited]
[Thread 0x7ffb577fe700 (LWP 63980) exited]
[Thread 0x7ffb57fff700 (LWP 63978) exited]
[Thread 0x7ffb74ff9700 (LWP 63977) exited]
[Thread 0x7ffb757fa700 (LWP 63976) exited]
[Thread 0x7ffb75ffb700 (LWP 63975) exited]
[Thread 0x7ffb767fc700 (LWP 63974) exited]
[Thread 0x7ffb76ffd700 (LWP 63973) exited]
[Thread 0x7ffb777fe700 (LWP 63972) exited]
[Thread 0x7ffb77fff700 (LWP 63971) exited]
[Thread 0x7ffbacff9700 (LWP 63970) exited]
[Thread 0x7ffbad7fa700 (LWP 63969) exited]
[Thread 0x7ffbadffb700 (LWP 63968) exited]
[Thread 0x7ffbae7fc700 (LWP 63967) exited]
[Thread 0x7ffbaeffd700 (LWP 63966) exited]
[Thread 0x7ffbaf7fe700 (LWP 63965) exited]
[Thread 0x7ffbb4ff9700 (LWP 63964) exited]
[Thread 0x7ffbb57fa700 (LWP 63963) exited]
[Thread 0x7ffbb5ffb700 (LWP 63962) exited]
[Thread 0x7ffbb67fc700 (LWP 63961) exited]
[Thread 0x7ffbb6ffd700 (LWP 63960) exited]
[Thread 0x7ffbb77fe700 (LWP 63959) exited]
[Thread 0x7ffbaffff700 (LWP 63958) exited]
[Thread 0x7ffbb7fff700 (LWP 63957) exited]
[Thread 0x7ffbecff9700 (LWP 63956) exited]
[Thread 0x7ffbed7fa700 (LWP 63955) exited]
[Thread 0x7ffbedffb700 (LWP 63954) exited]
[Thread 0x7ffbee7fc700 (LWP 63953) exited]
[Thread 0x7ffbeeffd700 (LWP 63952) exited]
[Thread 0x7ffbeffff700 (LWP 63951) exited]
[Thread 0x7ffbf4ff9700 (LWP 63950) exited]
[Thread 0x7ffbf57fa700 (LWP 63949) exited]
[Thread 0x7ffbf5ffb700 (LWP 63948) exited]
[Thread 0x7ffbf67fc700 (LWP 63947) exited]
[Thread 0x7ffbf6ffd700 (LWP 63946) exited]
[Thread 0x7ffbf77fe700 (LWP 63945) exited]
[Thread 0x7ffbef7fe700 (LWP 63944) exited]
[Thread 0x7ffbf7fff700 (LWP 63943) exited]
[Thread 0x7ffc14ff9700 (LWP 63942) exited]
[Thread 0x7ffc157fa700 (LWP 63941) exited]
[Thread 0x7ffc15ffb700 (LWP 63940) exited]
[Thread 0x7ffc167fc700 (LWP 63939) exited]
[Thread 0x7ffc16ffd700 (LWP 63938) exited]
[Thread 0x7ffc177fe700 (LWP 63937) exited]
[Thread 0x7ffc17fff700 (LWP 63936) exited]
[Thread 0x7ffc38ff9700 (LWP 63935) exited]
[Thread 0x7ffc397fa700 (LWP 63934) exited]
[Thread 0x7ffc39ffb700 (LWP 63933) exited]
[Thread 0x7ffc3a7fc700 (LWP 63932) exited]
[Thread 0x7ffc3affd700 (LWP 63931) exited]
[Thread 0x7ffc3b7fe700 (LWP 63930) exited]
[Thread 0x7ffc3bfff700 (LWP 63929) exited]
[Thread 0x7ffc50ff9700 (LWP 63928) exited]
[Thread 0x7ffc517fa700 (LWP 63927) exited]
[Thread 0x7ffc51ffb700 (LWP 63926) exited]
[Thread 0x7ffc527fc700 (LWP 63925) exited]
[Thread 0x7ffc52ffd700 (LWP 63924) exited]
[Thread 0x7ffc537fe700 (LWP 63923) exited]
[Thread 0x7ffc53fff700 (LWP 63922) exited]
[Thread 0x7ffc7cff9700 (LWP 63921) exited]
[Thread 0x7ffc7d7fa700 (LWP 63920) exited]
[Thread 0x7ffc7dffb700 (LWP 63919) exited]
[Thread 0x7ffc7e7fc700 (LWP 63918) exited]
[Thread 0x7ffc7effd700 (LWP 63917) exited]
[Thread 0x7ffc7f7fe700 (LWP 63916) exited]
[Thread 0x7ffc7ffff700 (LWP 63915) exited]
[Thread 0x7ffc98ff9700 (LWP 63914) exited]
[Thread 0x7ffc997fa700 (LWP 63913) exited]
[Thread 0x7ffc99ffb700 (LWP 63912) exited]
[Thread 0x7ffc9a7fc700 (LWP 63911) exited]
[Thread 0x7ffc9affd700 (LWP 63910) exited]
[Thread 0x7ffc9b7fe700 (LWP 63909) exited]
[Thread 0x7ffc9bfff700 (LWP 63908) exited]
[Thread 0x7ffcf0f10700 (LWP 63879) exited]
[Thread 0x7ffcf1711700 (LWP 63878) exited]
[Thread 0x7ffcf1f12700 (LWP 63814) exited]
[Thread 0x7ffcf2713700 (LWP 63813) exited]
[Thread 0x7ffcf2f14700 (LWP 63789) exited]
[Thread 0x7ffcf37fe700 (LWP 63788) exited]
[Thread 0x7ffcf3fff700 (LWP 63759) exited]
[Thread 0x7ffd40ffd700 (LWP 63758) exited]
[Thread 0x7ffd417fe700 (LWP 62008) exited]
[Thread 0x7ffd41fff700 (LWP 62007) exited]
[Thread 0x7ffd70ff9700 (LWP 61989) exited]
[Thread 0x7ffd717fa700 (LWP 61988) exited]
[Thread 0x7ffd71ffb700 (LWP 61987) exited]
[Thread 0x7ffd727fc700 (LWP 61986) exited]
[Thread 0x7ffd72ffd700 (LWP 61968) exited]
[Thread 0x7ffd73fff700 (LWP 61967) exited]
[Thread 0x7ffd78ff9700 (LWP 61966) exited]
[Thread 0x7ffd797fa700 (LWP 61965) exited]
[Thread 0x7ffd79ffb700 (LWP 61964) exited]
[Thread 0x7ffd7a7fc700 (LWP 61963) exited]
[Thread 0x7ffd7affd700 (LWP 61962) exited]
[Thread 0x7ffd7b7fe700 (LWP 61961) exited]
[Thread 0x7ffd737fe700 (LWP 61960) exited]
[Thread 0x7ffd7bfff700 (LWP 61959) exited]
[Thread 0x7ffd98ff9700 (LWP 61958) exited]
[Thread 0x7ffd997fa700 (LWP 61957) exited]
[Thread 0x7ffd99ffb700 (LWP 61956) exited]
[Thread 0x7ffd9a7fc700 (LWP 61955) exited]
[Thread 0x7ffd9affd700 (LWP 61954) exited]
[Thread 0x7ffd9b7fe700 (LWP 61953) exited]
[Thread 0x7ffd9bfff700 (LWP 61952) exited]
[Thread 0x7ffdb8ff9700 (LWP 61951) exited]
[Thread 0x7ffdb97fa700 (LWP 61950) exited]
[Thread 0x7ffdb9ffb700 (LWP 61949) exited]
[Thread 0x7ffdba7fc700 (LWP 61948) exited]
[Thread 0x7ffdbaffd700 (LWP 61947) exited]
[Thread 0x7ffdbb7fe700 (LWP 61946) exited]
[Thread 0x7ffdbbfff700 (LWP 61945) exited]
[Thread 0x7ffdf0ff9700 (LWP 61944) exited]
[Thread 0x7ffdf17fa700 (LWP 61943) exited]
[Thread 0x7ffdf1ffb700 (LWP 61942) exited]
[Thread 0x7ffdf27fc700 (LWP 61941) exited]
[Thread 0x7ffdf2ffd700 (LWP 61940) exited]
[Thread 0x7ffdf37fe700 (LWP 61939) exited]
[Thread 0x7ffdf8ff9700 (LWP 61938) exited]
[Thread 0x7ffdf97fa700 (LWP 61937) exited]
[Thread 0x7ffdf9ffb700 (LWP 61936) exited]
[Thread 0x7ffdfa7fc700 (LWP 61935) exited]
[Thread 0x7ffdfaffd700 (LWP 61934) exited]
[Thread 0x7ffdfb7fe700 (LWP 61933) exited]
[Thread 0x7ffdf3fff700 (LWP 61932) exited]
[Thread 0x7ffdfbfff700 (LWP 61931) exited]
[Thread 0x7ffe18ff9700 (LWP 61930) exited]
[Thread 0x7ffe197fa700 (LWP 61929) exited]
[Thread 0x7ffe19ffb700 (LWP 61928) exited]
[Thread 0x7ffe1a7fc700 (LWP 61927) exited]
[Thread 0x7ffe1affd700 (LWP 61926) exited]
[Thread 0x7ffe1b7fe700 (LWP 61925) exited]
[Thread 0x7ffe1bfff700 (LWP 61924) exited]
[Thread 0x7ffe3cff9700 (LWP 61923) exited]
[Thread 0x7ffe3d7fa700 (LWP 61922) exited]
[Thread 0x7ffe3dffb700 (LWP 61921) exited]
[Thread 0x7ffe3e7fc700 (LWP 61920) exited]
[Thread 0x7ffe3effd700 (LWP 61919) exited]
[Thread 0x7ffe3ffff700 (LWP 61917) exited]
[Thread 0x7ffe60ff9700 (LWP 61916) exited]
[Thread 0x7ffe617fa700 (LWP 61915) exited]
[Thread 0x7ffe61ffb700 (LWP 61914) exited]
[Thread 0x7ffe627fc700 (LWP 61913) exited]
[Thread 0x7ffe62ffd700 (LWP 61912) exited]
[Thread 0x7ffe637fe700 (LWP 61911) exited]
[Thread 0x7ffe63fff700 (LWP 61910) exited]
[Thread 0x7ffe78ff9700 (LWP 61909) exited]
[Thread 0x7ffe797fa700 (LWP 61908) exited]
[Thread 0x7ffe79ffb700 (LWP 61907) exited]
[Thread 0x7ffe7a7fc700 (LWP 61906) exited]
[Thread 0x7ffe7affd700 (LWP 61905) exited]
[Thread 0x7ffe7b7fe700 (LWP 61904) exited]
[Thread 0x7ffe7bfff700 (LWP 61903) exited]
[Thread 0x7ffe98ff9700 (LWP 61902) exited]
[Thread 0x7ffe997fa700 (LWP 61901) exited]
[Thread 0x7ffe99ffb700 (LWP 61900) exited]
[Thread 0x7ffe9a7fc700 (LWP 61899) exited]
[Thread 0x7ffe9affd700 (LWP 61898) exited]
[Thread 0x7ffe9b7fe700 (LWP 61897) exited]
[Thread 0x7ffe9bfff700 (LWP 61896) exited]
[Thread 0x7ffeb8ff9700 (LWP 61895) exited]
[Thread 0x7ffeb97fa700 (LWP 61894) exited]
[Thread 0x7ffeb9ffb700 (LWP 61893) exited]
[Thread 0x7ffeba7fc700 (LWP 61892) exited]
[Thread 0x7ffebaffd700 (LWP 61891) exited]
[Thread 0x7ffebb7fe700 (LWP 61890) exited]
[Thread 0x7ffebbfff700 (LWP 61889) exited]
[Thread 0x7ffedcff9700 (LWP 61888) exited]
[Thread 0x7ffedd7fa700 (LWP 61887) exited]
[Thread 0x7ffeddffb700 (LWP 61886) exited]
[Thread 0x7ffede7fc700 (LWP 61885) exited]
[Thread 0x7ffedeffd700 (LWP 61884) exited]
[Thread 0x7ffedf7fe700 (LWP 61883) exited]
[Thread 0x7ffedffff700 (LWP 61882) exited]
[Thread 0x7fff0ca1d700 (LWP 61881) exited]
[Thread 0x7fff0d21e700 (LWP 61880) exited]
[Thread 0x7fff0da1f700 (LWP 61879) exited]
[Thread 0x7fff0e220700 (LWP 61878) exited]
[Thread 0x7fff0ea21700 (LWP 61877) exited]
[Thread 0x7fff0f222700 (LWP 61876) exited]
[Thread 0x7fff0fa23700 (LWP 61875) exited]
[Thread 0x7fff80c95700 (LWP 61874) exited]
[Thread 0x7fff81496700 (LWP 61873) exited]
[Thread 0x7fff81c97700 (LWP 61872) exited]
[Thread 0x7fff82498700 (LWP 61871) exited]
[Thread 0x7fff82c99700 (LWP 61870) exited]
[Thread 0x7fff8349a700 (LWP 61869) exited]
[Thread 0x7fff83ec3700 (LWP 61868) exited]
[Thread 0x7ffff7c09740 (LWP 61655) exited]

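I am confused. As I understand it, without setting breakpoints in CUDA kernels, gdb and cuda-gdb should be approximately the same. So why does cuda-gdb run the program so much more slowly than gdb? Also, is there any way to improve the performance?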

Best answer

.... without setting breakpoints in CUDA kernels, gdb and cuda-gdb should be approximately the same.

Your own results show that this assumption is wrong.

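The internal workings of the debugging and profiling implementation on CUDA GPUs are not publicly documented, but there is a lot of empirical evidence that debugging requires a different execution mode from normal kernel execution. This includes using hardware or software preemption so that the debugger running on the host can control GPU execution instruction by instruction, trapping errors and capturing GPU stack traces, and mirroring the register file so that GPU state can be examined at runtime, to name a few examples.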

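None of that is free. If you run cuda-gdb, you get all of this additional instrumentation and diagnostic capability, but at a performance cost. Your own benchmarking, as described in your question, has demonstrated that the cost is incurred whether or not there are breakpoints on the GPU side.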
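The comparison described in the question can be repeated without typing run by hand. The following is a minimal sketch, not a definitive benchmark: the helper names debugger_command, wall_time and compare are made up for illustration, and it assumes cuda-gdb accepts the same -batch, -ex and --args options as the gdb it is built on.

```python
import shutil
import subprocess
import sys
import time


def debugger_command(debugger, script):
    """Build a non-interactive command line: start the script, then quit."""
    # -batch, -ex and --args are standard gdb options; cuda-gdb is
    # assumed here to accept the same ones.
    return [debugger, "-batch", "-ex", "run", "--args", "python", script]


def wall_time(command):
    """Run a command to completion and return elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(command, check=False)
    return time.perf_counter() - start


def compare(script, debuggers=("gdb", "cuda-gdb")):
    """Time the same script under each debugger that is installed."""
    results = {}
    for debugger in debuggers:
        if shutil.which(debugger) is None:
            # Skip debuggers that are not on PATH instead of failing.
            print(f"{debugger}: not found, skipped", file=sys.stderr)
            continue
        results[debugger] = wall_time(debugger_command(debugger, script))
    return results
```

Calling compare("test_cr_bbp_tf2.py") returns a dictionary of elapsed seconds per debugger. No breakpoints are set in either run, so any difference comes from the debugger itself.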

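If you do not run cuda-gdb and instead use a purely host-side debugger that cannot interrogate or control GPU execution, you do not incur these performance penalties. The choice is yours.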
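To see where the time goes in a particular run, the timestamps that TensorFlow prints can be compared line by line. The sketch below is only an illustration, with a hypothetical helper named largest_gap. It uses two lines copied from the log in the question, and it assumes that log came from the slow cuda-gdb run.

```python
import re
from datetime import datetime

# TensorFlow log lines start with a timestamp like "2022-05-04 19:44:32.877182:"
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{6}):")


def largest_gap(log_text):
    """Return (seconds, line_before, line_after) for the longest pause
    between consecutive timestamped lines, or None if there are fewer
    than two such lines."""
    stamped = []
    for line in log_text.splitlines():
        match = TIMESTAMP.match(line)
        if match:
            when = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
            stamped.append((when, line))
    best = None
    for (t0, before), (t1, after) in zip(stamped, stamped[1:]):
        gap = (t1 - t0).total_seconds()
        if best is None or gap > best[0]:
            best = (gap, before, after)
    return best


# Two consecutive timestamped lines copied from the log in the question.
SAMPLE = """\
2022-05-04 19:44:32.877182: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1872] Adding visible gpu devices: 0, 1, 2, 3, 4
2022-05-04 19:46:31.513555: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
"""

gap, before, after = largest_gap(SAMPLE)
print(f"{gap:.1f} s between:\n  {before}\n  {after}")
```

For these two lines the gap is about 118.6 seconds. In the posted log that pause falls during GPU device setup, before the program prints any results. Feeding a complete saved log to largest_gap would show whether the same pattern holds on another machine.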

Original question on Stack Overflow: https://stackoverflow.com/questions/72126775/
