python - 进程完成,退出代码为 -1073741571 (0xC00000FD) Tensorflow

标签 python tensorflow pycharm anaconda

我知道这个问题被问了很多,但就我而言,它有点奇怪。我刚拿到 RTX 3080 并尝试根据我在 reddit 上找到的教程安装 Tensorflow .我按照那里的描述做了一切: 安装 Anaconda --> Python 3.8 --> TF-nightly v. 2.5.0 --> Visual Studio C++ --> Cuda 11.1.0 --> cuDNN 8.0.4 --> 添加路径 --> 重启电脑。一开始似乎一切正常。我尝试了以下命令:

import tensorflow as tf
tf.config.list_physical_devices()

正如您在输出中看到的那样,这没有任何错误:

C:\Users\loose\.conda\envs\tf2\python.exe C:/Users/loose/PycharmProjects/GenerateAutomatedEMail/python/test.py
2021-01-16 00:40:45.043205: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2021-01-16 00:40:46.676446: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library nvcuda.dll
2021-01-16 00:40:46.699117: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1760] Found device 0 with properties: 
pciBusID: 0000:2d:00.0 name: GeForce RTX 3080 computeCapability: 8.6
coreClock: 1.785GHz coreCount: 68 deviceMemorySize: 10.00GiB deviceMemoryBandwidth: 707.88GiB/s
2021-01-16 00:40:46.699285: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2021-01-16 00:40:46.713523: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2021-01-16 00:40:46.713626: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2021-01-16 00:40:46.717017: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cufft64_10.dll
2021-01-16 00:40:46.718013: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library curand64_10.dll
2021-01-16 00:40:46.725508: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusolver64_10.dll
2021-01-16 00:40:46.728010: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusparse64_11.dll
2021-01-16 00:40:46.728534: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
2021-01-16 00:40:46.728660: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1898] Adding visible gpu devices: 0

Process finished with exit code 0

我目前尝试从 TF tutorials 训练 Seq2Seq 模型.代码几乎完全相同,但我使用 PyCharm 而不是 Jupyter,我将所有内容都放在一个类中,但代码本身是相同的。我的完整代码在 GitHub 中可用.当我想训练模型时,出现错误 “进程已完成,退出代码为 -1073741571 (0xC00000FD)”。但是没有真正的错误显示程序刚刚结束并使用此退出代码:

C:\Users\loose\.conda\envs\tf2\python.exe C:/Users/loose/PycharmProjects/GenerateAutomatedEMail/python/train_model.py
2021-01-16 00:50:34.337791: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2021-01-16 00:50:36.873698: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library nvcuda.dll
2021-01-16 00:50:36.894834: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1760] Found device 0 with properties: 
pciBusID: 0000:2d:00.0 name: GeForce RTX 3080 computeCapability: 8.6
coreClock: 1.785GHz coreCount: 68 deviceMemorySize: 10.00GiB deviceMemoryBandwidth: 707.88GiB/s
2021-01-16 00:50:36.895004: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2021-01-16 00:50:36.909453: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2021-01-16 00:50:36.909542: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2021-01-16 00:50:36.912954: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cufft64_10.dll
2021-01-16 00:50:36.914024: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library curand64_10.dll
2021-01-16 00:50:36.921476: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusolver64_10.dll
2021-01-16 00:50:36.924059: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusparse64_11.dll
2021-01-16 00:50:36.924660: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
2021-01-16 00:50:36.924807: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1898] Adding visible gpu devices: 0
2021-01-16 00:50:36.925280: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-01-16 00:50:36.926213: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1760] Found device 0 with properties: 
pciBusID: 0000:2d:00.0 name: GeForce RTX 3080 computeCapability: 8.6
coreClock: 1.785GHz coreCount: 68 deviceMemorySize: 10.00GiB deviceMemoryBandwidth: 707.88GiB/s
2021-01-16 00:50:36.926418: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1898] Adding visible gpu devices: 0
2021-01-16 00:50:37.388811: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1300] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-01-16 00:50:37.388901: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1306]      0 
2021-01-16 00:50:37.388947: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1319] 0:   N 
2021-01-16 00:50:37.389134: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1446] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7447 MB memory) -> physical GPU (device: 0, name: GeForce RTX 3080, pci bus id: 0000:2d:00.0, compute capability: 8.6)
2021-01-16 00:50:38.006971: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
2021-01-16 00:50:38.586194: I tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Loaded cuDNN version 8004
2021-01-16 00:50:38.709516: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2021-01-16 00:50:39.312210: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2021-01-16 00:50:39.313013: I tensorflow/stream_executor/cuda/cuda_bl

as.cc:1838] TensorFloat-32 will be used for the matrix multiplication. This will only be logged once.

Process finished with exit code -1073741571 (0xC00000FD)

所以我试图在程序崩溃时定位到那一行。我发现它在我初始化“BahdanauAttention”类后立即崩溃,如本 picture 所示。 .

经过几个小时的测试后,我可以假设/确认一些事情:

  • 我可以在这个 venv 中正常运行(非 tensorflow)代码而没有这个错误
  • 我没有用完 ram(最多只有 17GB 的 32GB ram 在使用中)
  • 我没有打开任何可能导致冲突的程序(例如 NVIDIA Broadcast 或 Jupyter Lab 等)

我为解决该问题而进行的测试:

  • 重新安装 Conda
  • 创建新的 venv
  • 重新安装 TF 以及所有 NVIVIDA 驱动程序
  • 尝试不同的 Python 版本(3.7 而不是 3.8)
  • 重启我的电脑

此时我有点别无选择。有谁知道如何解决这个问题?

最佳答案

您可以将 Tensorflow 升级到最新的稳定版本,因为 Tensorflow 2.4 版本支持新的 Nvidia Ampere 架构是 RTX 30 系列和 CUDA 11 支持也可用。
您可以查看此图表了解详细信息,并按照指南进行安装。
https://www.tensorflow.org/install/source_windows#tested_build_configurations

关于 GPU 上的内存使用,您始终可以像提到的那样在代码的开头设置内存增长 here .

关于python - 进程完成,退出代码为 -1073741571 (0xC00000FD) Tensorflow,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/65745157/

相关文章:

tensorflow - 如何在 Keras 中的两个不同 GPU 上并行运行两个模型

python - 如何在 PyCharm 中反转控制台历史记录顺序以进行复制粘贴?

python - 如何在不在 PyCharm 中重新加载包的情况下在 Python 控制台中进行调试?

python - 使用 scrapy 选择单选按钮

python - 无法从 plt.bar 生成条形图

python - 如何重写 tensorflow 图以使用 CPU 进行所有操作

python - 带有远程ssh解释器的pycharm中的matplotlib

python - Protocol Buffers 重复字段

python - 在 Keras 的 tokenizer 类中使用 num_words

python - 将元数据添加到 tensorflow 卡住图 pb