machine-learning - Running a queue in the background in TensorFlow causes a strange exception

Tags: machine-learning tensorflow

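I am building a graph in TensorFlow that has a queue Q, with a background thread enqueuing tensors into it. In the main thread, I dequeue the elements from Q in order.

My code can be simplified as follows: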

import time
import threading
import tensorflow as tf

sess = tf.InteractiveSession()
coord = tf.train.Coordinator()

q = tf.FIFOQueue(32, dtypes=tf.int32)

def loop(g):
    with g.as_default():
        enqueue_op = q.enqueue(1, name="example_enqueue")

        for i in range(20):
            if coord.should_stop():
                return

            try:
                sess.run(enqueue_op)
            except tf.errors.CancelledError:
                print("enqueue cancelled")

threads = [
    threading.Thread(target=loop, args=(tf.get_default_graph(),))
]

sess.run(tf.initialize_all_variables())

for t in threads: t.start()

# If I sleep for 1 second here, it works fine!
# time.sleep(1)

print(sess.run(q.dequeue()))

coord.request_stop()
coord.join(threads)

sess.close()

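As the comment in the code notes, everything works if I sleep for one second before running the dequeue op. If the dequeue runs immediately, however, the following exception is raised: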

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 715, in _do_call
    return fn(*args)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 697, in _run_fn
    status, run_metadata)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/contextlib.py", line 66, in __exit__
    next(self.gen)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/tensorflow/python/framework/errors.py", line 450, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors.NotFoundError: FetchOutputs node fifo_queue_Dequeue:0: not found

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/runpy.py", line 170, in _run_module_as_main
    "__main__", mod_spec)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/hanxu/Downloads/BrainSeg/playgrounds/7.py", line 32, in <module>
    print(sess.run(q.dequeue()))
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 372, in run
    run_metadata_ptr)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 636, in _run
    feed_dict_string, options, run_metadata)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 708, in _do_run
    target_list, options, run_metadata)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 728, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors.NotFoundError: FetchOutputs node fifo_queue_Dequeue:0: not found

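Can anyone help? Thanks a lot!

Update

I am using TensorFlow 0.9.0rc0.

My real case is a bit more complicated. In fact, the tensor being enqueued is different on every iteration, for example: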

def loop(g):
    with g.as_default():
        for i in range(20):
            if coord.should_stop():
                return

            # Look here!
            enqueue_op = q.enqueue(i, name="example_enqueue")

            try:
                sess.run(enqueue_op)
            except tf.errors.CancelledError:
                print("enqueue cancelled")

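So moving the enqueue op to the main thread is not trivial, and I don't know how to do it. Please help.

Accepted answer

This is an issue with older versions of TensorFlow (before 0.9), and it was fixed in the 0.9 release. The problem is that adding nodes to the graph (here, the calls to q.enqueue() inside loop()) is not thread-safe while another thread (here, the main thread running q.dequeue()) is using the graph.

In versions before 0.9, you need to fix two things to avoid the race condition:

  1. Don't call q.enqueue() in the loop() thread. Create the op in the main thread instead. For example: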

    q = tf.FIFOQueue(32, dtypes=tf.int32)
    enqueue_op = q.enqueue(1, name="example_enqueue")
    
    def loop(g):
        for i in range(20):
            if coord.should_stop():
                return
            try:
                sess.run(enqueue_op)
            except tf.errors.CancelledError:
                print("enqueue cancelled")
    
  2. Move the call to q.dequeue() (which also adds a node to the graph) to before the point where you start the loop() threads:

    dequeued_t = q.dequeue()
    
    for t in threads: t.start()
    
    print(sess.run(dequeued_t))
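
For the case in the question's update, where a different value is enqueued on each iteration, one common approach is to create a single enqueue op in the main thread that reads from a placeholder, and to feed the value at run time. The sketch below is a minimal illustration of that idea, not code from the original answer. The names value_ph and NUM_ITEMS are hypothetical. It also assumes a TensorFlow install that provides tensorflow.compat.v1 (1.14 or later, including 2.x); on an older 1.x install, `import tensorflow as tf` without the disable_v2_behavior() call should be the equivalent.

```python
import threading

# Assumption: TensorFlow 1.14+ or 2.x, which ships the compat.v1 module.
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()

NUM_ITEMS = 20  # hypothetical name; same count as the loop in the question

graph = tf.Graph()
with graph.as_default():
    q = tf.FIFOQueue(32, dtypes=[tf.int32], shapes=[()])
    # Build every op in the main thread, before any worker thread starts.
    value_ph = tf.placeholder(tf.int32, shape=[], name="value_to_enqueue")
    enqueue_op = q.enqueue(value_ph, name="example_enqueue")
    dequeued_t = q.dequeue()

# Optional guard: a finalized graph raises an error if any code,
# in any thread, later tries to add a node to it.
graph.finalize()

sess = tf.Session(graph=graph)
coord = tf.train.Coordinator()


def loop():
    # The worker only runs existing ops; the per-iteration value is fed in.
    for i in range(NUM_ITEMS):
        if coord.should_stop():
            return
        try:
            sess.run(enqueue_op, feed_dict={value_ph: i})
        except tf.errors.CancelledError:
            print("enqueue cancelled")
            return


threads = [threading.Thread(target=loop)]
for t in threads:
    t.start()

# Each dequeue blocks until the worker has enqueued the next element.
results = [int(sess.run(dequeued_t)) for _ in range(NUM_ITEMS)]
print(results)

coord.request_stop()
coord.join(threads)
sess.close()
```

Because the worker thread never calls q.enqueue() itself, the graph is not modified after the threads start, which is the condition both steps above are meant to guarantee.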
    

Source: Stack Overflow, https://stackoverflow.com/questions/37797751/