python - Process finished with exit code -1073741571 (0xC00000FD) Tensorflow

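I know this question has been asked a lot, but my case is a bit odd. I just got an RTX 3080 and tried to install Tensorflow following a tutorial I found on reddit. I did everything as described there: install Anaconda --> Python 3.8 --> TF-nightly v. 2.5.0 --> Visual Studio C++ --> Cuda 11.1.0 --> cuDNN 8.0.4 --> add the paths --> restart the PC. At first everything seemed to work. I tried the following commands: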

import tensorflow as tf
tf.config.list_physical_devices()

As you can see in the output, this ran without any errors:

C:\Users\loose\.conda\envs\tf2\python.exe C:/Users/loose/PycharmProjects/GenerateAutomatedEMail/python/test.py
2021-01-16 00:40:45.043205: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2021-01-16 00:40:46.676446: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library nvcuda.dll
2021-01-16 00:40:46.699117: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1760] Found device 0 with properties: 
pciBusID: 0000:2d:00.0 name: GeForce RTX 3080 computeCapability: 8.6
coreClock: 1.785GHz coreCount: 68 deviceMemorySize: 10.00GiB deviceMemoryBandwidth: 707.88GiB/s
2021-01-16 00:40:46.699285: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2021-01-16 00:40:46.713523: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2021-01-16 00:40:46.713626: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2021-01-16 00:40:46.717017: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cufft64_10.dll
2021-01-16 00:40:46.718013: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library curand64_10.dll
2021-01-16 00:40:46.725508: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusolver64_10.dll
2021-01-16 00:40:46.728010: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusparse64_11.dll
2021-01-16 00:40:46.728534: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
2021-01-16 00:40:46.728660: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1898] Adding visible gpu devices: 0

Process finished with exit code 0

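I am currently trying to train a Seq2Seq model from the TF tutorials. The code is almost identical, except that I use PyCharm instead of Jupyter and have put everything into a class. My full code is available on GitHub. When I try to train the model, I get the error "Process finished with exit code -1073741571 (0xC00000FD)". No actual error message is shown; the program simply ends with this exit code: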

C:\Users\loose\.conda\envs\tf2\python.exe C:/Users/loose/PycharmProjects/GenerateAutomatedEMail/python/train_model.py
2021-01-16 00:50:34.337791: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2021-01-16 00:50:36.873698: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library nvcuda.dll
2021-01-16 00:50:36.894834: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1760] Found device 0 with properties: 
pciBusID: 0000:2d:00.0 name: GeForce RTX 3080 computeCapability: 8.6
coreClock: 1.785GHz coreCount: 68 deviceMemorySize: 10.00GiB deviceMemoryBandwidth: 707.88GiB/s
2021-01-16 00:50:36.895004: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2021-01-16 00:50:36.909453: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2021-01-16 00:50:36.909542: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2021-01-16 00:50:36.912954: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cufft64_10.dll
2021-01-16 00:50:36.914024: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library curand64_10.dll
2021-01-16 00:50:36.921476: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusolver64_10.dll
2021-01-16 00:50:36.924059: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusparse64_11.dll
2021-01-16 00:50:36.924660: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
2021-01-16 00:50:36.924807: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1898] Adding visible gpu devices: 0
2021-01-16 00:50:36.925280: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-01-16 00:50:36.926213: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1760] Found device 0 with properties: 
pciBusID: 0000:2d:00.0 name: GeForce RTX 3080 computeCapability: 8.6
coreClock: 1.785GHz coreCount: 68 deviceMemorySize: 10.00GiB deviceMemoryBandwidth: 707.88GiB/s
2021-01-16 00:50:36.926418: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1898] Adding visible gpu devices: 0
2021-01-16 00:50:37.388811: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1300] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-01-16 00:50:37.388901: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1306]      0 
2021-01-16 00:50:37.388947: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1319] 0:   N 
2021-01-16 00:50:37.389134: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1446] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7447 MB memory) -> physical GPU (device: 0, name: GeForce RTX 3080, pci bus id: 0000:2d:00.0, compute capability: 8.6)
2021-01-16 00:50:38.006971: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
2021-01-16 00:50:38.586194: I tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Loaded cuDNN version 8004
2021-01-16 00:50:38.709516: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2021-01-16 00:50:39.312210: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2021-01-16 00:50:39.313013: I tensorflow/stream_executor/cuda/cuda_blas.cc:1838] TensorFloat-32 will be used for the matrix multiplication. This will only be logged once.

Process finished with exit code -1073741571 (0xC00000FD)
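
For reference, the negative number and the hex value in that last line are the same 32-bit quantity: PyCharm prints the Windows exit code once as a signed integer and once as an unsigned one. The sketch below shows the conversion. The helper names and the small lookup table are made up for illustration and are far from complete; the two status names are the standard Windows NTSTATUS names for those values, which suggests (but does not prove) that the crash here is a native stack overflow rather than a RAM or GPU memory shortage.

```python
# Hypothetical helpers, not part of Python or TensorFlow.

# Deliberately tiny lookup table of Windows NTSTATUS values.
KNOWN_STATUS = {
    0xC00000FD: "STATUS_STACK_OVERFLOW",
    0xC0000005: "STATUS_ACCESS_VIOLATION",
}


def to_ntstatus(exit_code):
    """Reinterpret a signed 32-bit exit code as an unsigned 32-bit value."""
    return exit_code & 0xFFFFFFFF


def describe_exit_code(exit_code):
    """Format the exit code as hex and attach a name if the table knows it."""
    status = to_ntstatus(exit_code)
    name = KNOWN_STATUS.get(status, "unknown")
    return f"0x{status:08X} ({name})"


print(describe_exit_code(-1073741571))  # prints: 0xC00000FD (STATUS_STACK_OVERFLOW)
```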

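So I tried to find the line at which the program crashes. I found that it crashes immediately after I initialize the "BahdanauAttention" class, as shown in this picture.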

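After a few hours of testing, I can assume/confirm a few things:

I can run ordinary (non-tensorflow) code in this venv without this error
I am not running out of RAM (at most 17GB of 32GB RAM is in use)
I have no other programs open that could cause a conflict (e.g. NVIDIA Broadcast or Jupyter Lab)

Things I have tried in order to fix the problem:

Reinstalling Conda
Creating a new venv
Reinstalling TF and all NVIDIA drivers
Trying a different Python version (3.7 instead of 3.8)
Restarting my PC

At this point I am running out of options. Does anyone know how to solve this?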

Best answer

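You can upgrade Tensorflow to the latest stable release: Tensorflow 2.4 supports the new Nvidia Ampere architecture (the RTX 30 series), and CUDA 11 support is also available.
See this chart for details and follow the guide to install it.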
https://www.tensorflow.org/install/source_windows#tested_build_configurations
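
One way to check whether an installed build meets that 2.4 threshold, and whether it is a stable release rather than a nightly, is to inspect the version string. This is only a rough sketch: `parse_major_minor`, `meets_minimum` and `is_prerelease` are hypothetical helpers, and the assumption that a stable release looks like plain `X.Y.Z` is mine.

```python
import re


def parse_major_minor(version):
    """Return (major, minor) from a string such as '2.4.0' or '2.5.0-dev20210115'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def meets_minimum(version, minimum=(2, 4)):
    """True if the version is at least `minimum` (2.4 by default, per the answer)."""
    return parse_major_minor(version) >= minimum


def is_prerelease(version):
    """True if anything follows the X.Y.Z part (assumed to mean nightly or rc build)."""
    return re.fullmatch(r"\d+\.\d+\.\d+", version) is None
```

With Tensorflow installed, the string to pass in would be `tf.__version__`, for example `meets_minimum(tf.__version__)`.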

Regarding GPU memory usage, you can always enable memory growth at the start of your code, as mentioned here.

Regarding python - Process finished with exit code -1073741571 (0xC00000FD) Tensorflow, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/65745157/
