我正在 Tensorflow 中实现这样的图:有一个队列 Q,后台线程正在将张量排入队列。在主线程中,我按顺序将 Q 中的元素出队。
我的代码可以简化如下:
import time
import threading
import tensorflow as tf
sess = tf.InteractiveSession()
coord = tf.train.Coordinator()
q = tf.FIFOQueue(32, dtypes=tf.int32)
def loop(g):
with g.as_default():
enqueue_op = q.enqueue(1, name="example_enqueue")
for i in range(20):
if coord.should_stop():
return
try:
sess.run(enqueue_op)
except tf.errors.CancelledError:
print("enqueue canncelled")
threads = [
threading.Thread(target=loop, args=(tf.get_default_graph(),))
]
sess.run(tf.initialize_all_variables())
for t in threads: t.start()
# If I sleep 1 seconds, it will be fine!
# time.sleep(1)
print(sess.run(q.dequeue()))
coord.request_stop()
coord.join(threads)
sess.close()
我评论过,如果我在运行出队操作之前睡一秒钟,事情就会好起来。但是,如果立即运行,将引发以下异常:
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 715, in _do_call
return fn(*args)
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 697, in _run_fn
status, run_metadata)
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/contextlib.py", line 66, in __exit__
next(self.gen)
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/tensorflow/python/framework/errors.py", line 450, in raise_exception_on_not_ok_status
pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors.NotFoundError: FetchOutputs node fifo_queue_Dequeue:0: not found
在处理上述异常的过程中,又发生了一个异常:
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/runpy.py", line 170, in _run_module_as_main
"__main__", mod_spec)
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/Users/hanxu/Downloads/BrainSeg/playgrounds/7.py", line 32, in <module>
print(sess.run(q.dequeue()))
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 372, in run
run_metadata_ptr)
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 636, in _run
feed_dict_string, options, run_metadata)
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 708, in _do_run
target_list, options, run_metadata)
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 728, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors.NotFoundError: FetchOutputs node fifo_queue_Dequeue:0: not found
HanXus-MacBook-Pro:BrainSeg hanxu$ python3 -m playgrounds.7
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 715, in _do_call
return fn(*args)
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 697, in _run_fn
status, run_metadata)
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/contextlib.py", line 66, in __exit__
next(self.gen)
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/tensorflow/python/framework/errors.py", line 450, in raise_exception_on_not_ok_status
pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors.NotFoundError: FetchOutputs node fifo_queue_Dequeue:0: not found
在处理上述异常的过程中,又发生了一个异常:
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/runpy.py", line 170, in _run_module_as_main
"__main__", mod_spec)
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/Users/hanxu/Downloads/BrainSeg/playgrounds/7.py", line 34, in <module>
print(sess.run(q.dequeue()))
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 372, in run
run_metadata_ptr)
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 636, in _run
feed_dict_string, options, run_metadata)
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 708, in _do_run
target_list, options, run_metadata)
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 728, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors.NotFoundError: FetchOutputs node fifo_queue_Dequeue:0: not found
有人可以帮忙吗?非常感谢!!
更新
我使用的是 Tensorflow 9.0rc0。
我的真实情况有点复杂。事实上,排队的张量每次都不同,比如
def loop(g):
with g.as_default():
for i in range(20):
if coord.should_stop():
return
# Look here!
enqueue_op = q.enqueue(i, name="example_enqueue")
try:
sess.run(enqueue_op)
except tf.errors.CancelledError:
print("enqueue canncelled")
因此,将入队操作移至主线程并非易事:(我不知道如何操作。请帮忙:)
最佳答案
这是 an issue使用旧版本(0.9 之前)的 TensorFlow,即 fixed在0.9版本中。问题在于,当其他线程(即您的 q.dequeue()
线程)使用该图时,向图中添加节点(即在对 q.enqueue()
和 loop()
的调用中)不是线程安全的。
您需要修复两个问题才能避免竞争条件(在 0.9 之前的版本中):
请勿调用
q.enqueue()
在loop()
线。而是在主线程中创建它。例如:q = tf.FIFOQueue(32, dtypes=tf.int32) enqueue_op = q.enqueue(1, name="example_enqueue") def loop(g): for i in range(20): if coord.should_stop(): return try: sess.run(enqueue_op) except tf.errors.CancelledError: print("enqueue canncelled")
将调用转移至
q.dequeue()
(这会向图表添加一个节点)在您开始loop()
的位置之前线程:dequeued_t = q.dequeue() for t in threads: t.start() print(sess.run(deqeueued_t))
关于machine-learning - 在 Tensorflow 中后台运行队列会导致奇怪的异常,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/37797751/