python - Running TensorFlow with a single graph across multiple processes

Tags: python tensorflow multiprocessing python-multiprocessing ensemble-learning

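I am trying to train a classifier as an ensemble of 5 networks. I decided to train each network on a different batch, so I want to use multiple processes to save time.

Here is my design: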

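import multiprocessing as mp
import tensorflow as tf

# create() is the asker's helper (not shown); it returns 5 optimizers,
# one per network, i.e. len(opt_list) == 5
opt_list = create()

# The original snippet never defines sess; it is assumed here to be a
# single session created in the parent process and passed to every child.
sess = tf.Session()

def sub_process(sess, opt, feed_batch):
    # Run one training step of a single network on its own batch
    sess.run(opt, feed_dict=feed_batch)

# generate_batch() is the asker's helper (not shown); one batch per network
batch_list = []
for i in range(5):
    batch = generate_batch(batch_size=100)
    batch_list.append(batch)

# Start one process per network and keep every handle so all can be joined
processes = []
for i in range(5):
    p = mp.Process(target=sub_process, args=(sess, opt_list[i], batch_list[i]))
    p.start()
    processes.append(p)

for p in processes:
    p.join()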

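First, I build the graph and place each network on a different device (I have 5 GPUs in total).

Then I draw samples from the dataset (for example, if I want to feed 100 images to each network, I generate 500 samples in total).

Next, I create 5 processes with the Python 3 multiprocessing package. Each process runs the sub_process function with the arguments it is given.

However, when I run the code, I get the following errors: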

2018-08-14 18:13:56.776853: E tensorflow/core/grappler/clusters/utils.cc:82] Failed to get device properties, error code: 3
2018-08-14 18:13:56.776940: E tensorflow/core/grappler/clusters/utils.cc:82] Failed to get device properties, error code: 3
2018-08-14 18:13:56.776978: E tensorflow/core/grappler/clusters/utils.cc:82] Failed to get device properties, error code: 3
2018-08-14 18:13:56.777004: E tensorflow/core/grappler/clusters/utils.cc:82] Failed to get device properties, error code: 3
2018-08-14 18:13:56.830762: E tensorflow/core/grappler/clusters/utils.cc:82] Failed to get device properties, error code: 3
2018-08-14 18:13:56.831239: E tensorflow/core/grappler/clusters/utils.cc:82] Failed to get device properties, error code: 3
2018-08-14 18:13:56.831262: E tensorflow/core/grappler/clusters/utils.cc:82] Failed to get device properties, error code: 3
2018-08-14 18:13:56.831285: E tensorflow/core/grappler/clusters/utils.cc:82] Failed to get device properties, error code: 3
2018-08-14 18:13:56.902612: E tensorflow/core/grappler/clusters/utils.cc:82] Failed to get device properties, error code: 3
2018-08-14 18:13:57.654653: E tensorflow/stream_executor/cuda/cuda_driver.cc:1227] failed to enqueue async memcpy from host to device: CUDA_ERROR_NOT_INITIALIZED; GPU dst: 0x1085d87f000; host src: 0x1083783f700; size: 4=0x4
2018-08-14 18:13:57.660200: E tensorflow/stream_executor/cuda/cuda_driver.cc:1227] failed to enqueue async memcpy from host to device: CUDA_ERROR_NOT_INITIALIZED; GPU dst: 0x1085d87f000; host src: 0x1083783f700; size: 4=0x4
2018-08-14 18:13:57.758658: E tensorflow/stream_executor/cuda/cuda_driver.cc:1227] failed to enqueue async memcpy from host to device: CUDA_ERROR_NOT_INITIALIZED; GPU dst: 0x1085d87f000; host src: 0x1083783f700; size: 4=0x4
2018-08-14 18:13:57.808281: E tensorflow/stream_executor/cuda/cuda_driver.cc:1227] failed to enqueue async memcpy from host to device: CUDA_ERROR_NOT_INITIALIZED; GPU dst: 0x1085d87f000; host src: 0x1083783f700; size: 4=0x4

Can anyone tell me why these errors occur, and what I should change in my code to get what I want?

Thanks!

Best answer

I suggest taking a look at tf.contrib.distribute, which has a nice API for getting good performance out of multiple GPUs.
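
As a side note, CUDA_ERROR_NOT_INITIALIZED in child processes is commonly seen when a CUDA context (here, most likely the session created in the parent) is carried into forked workers, since a context set up before the fork is generally not usable afterwards. If separate processes are still wanted, one option is to launch an independent Python process per GPU and let each one build its own graph and session. The sketch below is a minimal illustration of that idea, not the asker's code: it uses subprocess rather than multiprocessing, and launch_per_gpu and CHILD_CODE are hypothetical names. The child only reports which GPU it was pinned to, so the sketch runs without TensorFlow or a GPU.

```python
import os
import subprocess
import sys

# Hypothetical stand-in for a real training script. A real child would
# import tensorflow here, build its own graph and tf.Session, and train
# one network; this one only reports the GPU it has been pinned to.
CHILD_CODE = "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])"


def launch_per_gpu(num_gpus):
    """Start one fresh Python process per GPU and collect what each prints."""
    procs = []
    for gpu_id in range(num_gpus):
        # CUDA_VISIBLE_DEVICES is set in the child's environment before the
        # child starts, so any CUDA library it loads sees only this GPU.
        env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu_id))
        procs.append(
            subprocess.Popen(
                [sys.executable, "-c", CHILD_CODE],
                env=env,
                stdout=subprocess.PIPE,
                text=True,
            )
        )
    # All children are already running; wait for each and read its output.
    return [p.communicate()[0].strip() for p in procs]


if __name__ == "__main__":
    print(launch_per_gpu(5))
```

Because every child is a fresh interpreter, nothing CUDA-related is inherited from the parent. The trade-off is that the networks no longer share a graph, so combining the ensemble's predictions has to happen through files, checkpoints, or some other explicit exchange.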

Source: a similar question on Stack Overflow, "python - Running TensorFlow with a single graph across multiple processes": https://stackoverflow.com/questions/51851916/
