python - Why do multiple threads and different functions/scopes share a single import process?

Tags: python multithreading python-3.x python-import python-2.x

In the years I have been using Python, this pitfall is the first bug that was really hard for me to track down.

Here is a deliberately simplified example. I have these files and directories:

[xiaobai@xiaobai import_pitfall]$ tree -F -C -a
.
├── import_all_pitall/
│   ├── hello.py
│   └── __init__.py
└── thread_test.py

1 directory, 3 files
[xiaobai@xiaobai import_pitfall]$

Contents of thread_test.py:

[xiaobai@xiaobai import_pitfall]$ cat thread_test.py 
import time
import threading

def do_import1():
    print( "do_import 1A" )
    from import_all_pitall import hello
    print( "do_import 1B", id(hello), locals() )

def do_import2():
    print( "do_import 2A" )
    from import_all_pitall import hello as h
    print( "do_import 2B", id(h), locals() )

def do_import3():
    print( "do_import 3A" )
    import import_all_pitall.hello as h2
    #no problem if import different module #import urllib as h2
    print( "do_import 3B", id(h2), locals() )

print( "main 1" )
t = threading.Thread(target=do_import1)
print( "main 2" )
t.start()
print( "main 3" )
t2 = threading.Thread(target=do_import2)
print( "main 4" )
t2.start()
print( "main 5" )
print(globals()) #no such hello
#time.sleep(2) #slightly wait for do_import 1A import finished to test print hello below.
#print( "main 6", id(hello), locals() ) #"name 'hello' not defined" error even do_import1 was success
do_import3()
print( "main -1" )
[xiaobai@xiaobai import_pitfall]$

Contents of hello.py:

[xiaobai@xiaobai import_pitfall]$ cat import_all_pitall/hello.py
print( "haha0" )
import time
t = time.time()
print( "haha1" )
def do_task():
    success = 0
    while not success:
        try:
            time.sleep(1)
            undefined_func( "Done haha" )
            success = 1
        except Exception as e:
            print("exception occur", e)
            print( "haha time is ", t )
do_task()
print( "haha -1" )
[xiaobai@xiaobai import_pitfall]$

import_all_pitall/__init__.py is an empty file.

Let's run it:

[xiaobai@xiaobai import_pitfall]$ python thread_test.py 
main 1
main 2
do_import 1A
 main 3
haha0
haha1
main 4
do_import 2A
main 5
{'do_import1': <function do_import1 at 0x7f9d884760c8>, 'do_import3': <function do_import3 at 0x7f9d884a6758>, 'do_import2': <function do_import2 at 0x7f9d884a66e0>, '__builtins__': <module '__builtin__' (built-in)>, '__file__': 'thread_test.py', 't2': <Thread(Thread-2, started 140314429765376)>, '__package__': None, 'threading': <module 'threading' from '/usr/lib64/python2.7/threading.pyc'>, 't': <Thread(Thread-1, started 140314438158080)>, 'time': <module 'time' from '/usr/lib64/python2.7/lib-dynload/timemodule.so'>, '__name__': '__main__', '__doc__': None}
do_import 3A
('exception occur', NameError("global name 'undefined_func' is not defined",))
('haha time is ', 1439451183.753475)
('exception occur', NameError("global name 'undefined_func' is not defined",))
('haha time is ', 1439451183.753475)
('exception occur', NameError("global name 'undefined_func' is not defined",))
('haha time is ', 1439451183.753475)
('exception occur', NameError("global name 'undefined_func' is not defined",))
('haha time is ', 1439451183.753475)
('exception occur', NameError("global name 'undefined_func' is not defined",))
('haha time is ', 1439451183.753475)
('exception occur', NameError("global name 'undefined_func' is not defined",))
('haha time is ', 1439451183.753475)
^C('exception occur', NameError("global name 'undefined_func' is not defined",))
('haha time is ', 1439451183.753475)
('exception occur', NameError("global name 'undefined_func' is not defined",))
('haha time is ', 1439451183.753475)
^C('exception occur', NameError("global name 'undefined_func' is not defined",))
('haha time is ', 1439451183.753475)
^C^C('exception occur', NameError("global name 'undefined_func' is not defined",))
('haha time is ', 1439451183.753475)
... #Forever

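Look closely: where are "do_import 2B" and "do_import 3B"? Those functions are stuck on the import statement itself. They never even reach the first line of the imported module, since time.time() is evaluated only once (the printed time never changes). They hang simply because another thread/function started importing the same module first, and that import is still stuck in its unfinished loop. My whole system is large and multithreaded, so this was very hard to debug before I understood what was going on.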

After commenting out undefined_func( "Done haha" ) in hello.py:

print( "haha0" )
import time
t = time.time()
print( "haha1" )
def do_task():
    success = 0
    while not success:
        try:
            time.sleep(1)
            #undefined_func( "Done haha" )
            success = 1
        except Exception as e:
            print("exception occur", e)
            print( "haha time is ", t )
do_task()
print( "haha -1" )

and running it:

[xiaobai@xiaobai import_pitfall]$ python3 thread_test.py 
main 1
main 2
do_import 1A
main 3
main 4
do_import 2A
main 5
{'do_import3': <function do_import3 at 0x7f31a462c048>, '__package__': None, 't2': <Thread(Thread-2, started 139851179529984)>, '__name__': '__main__', '__cached__': None, 'threading': <module 'threading' from '/usr/lib64/python3.4/threading.py'>, '__doc__': None, 'do_import2': <function do_import2 at 0x7f31ac1d56a8>, 'do_import1': <function do_import1 at 0x7f31ac2c0bf8>, '__spec__': None, 't': <Thread(Thread-1, started 139851187922688)>, '__file__': 'thread_test.py', 'time': <module 'time' from '/usr/lib64/python3.4/lib-dynload/time.cpython-34m.so'>, '__loader__': <_frozen_importlib.SourceFileLoader object at 0x7f31ac297048>, '__builtins__': <module 'builtins' (built-in)>}
do_import 3A
haha0
haha1
haha -1
do_import 1B 139851188124312 {'hello': <module 'import_all_pitall.hello' from '/home/xiaobai/note/python/import_pitfall/import_all_pitall/hello.py'>}
do_import 2B 139851188124312 {'h': <module 'import_all_pitall.hello' from '/home/xiaobai/note/python/import_pitfall/import_all_pitall/hello.py'>}
do_import 3B 139851188124312 {'h2': <module 'import_all_pitall.hello' from '/home/xiaobai/note/python/import_pitfall/import_all_pitall/hello.py'>}
main -1
[xiaobai@xiaobai import_pitfall]$ 

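I printed the ids and found that they all share the same id, 139851188124312. So the 3 functions share the same import object/process. But this doesn't make sense to me. I thought the object was local to the function, because if I try to print the imported "hello" object at global scope, it throws an error: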

Edit thread_test.py to print the hello object at global scope:

...
print( "main 5" )
print(globals()) #no such hello
time.sleep(2) #slightly wait for do_import 1A import finished to test print hello below.
print( "main 6", id(hello), locals() ) #"name 'hello' not defined" error even do_import1 was success
do_import3()
print( "main -1" )

Let's run it:

[xiaobai@xiaobai import_pitfall]$ python3 thread_test.py 
main 1
main 2
do_import 1A
main 3
main 4
do_import 2A
main 5
{'t': <Thread(Thread-1, started 140404878976768)>, '__spec__': None, 'time': <module 'time' from '/usr/lib64/python3.4/lib-dynload/time.cpython-34m.so'>, '__cached__': None, '__loader__': <_frozen_importlib.SourceFileLoader object at 0x7fb296b87048>, 'do_import2': <function do_import2 at 0x7fb296ac56a8>, 'do_import1': <function do_import1 at 0x7fb296bb0bf8>, '__doc__': None, '__file__': 'thread_test.py', 'do_import3': <function do_import3 at 0x7fb28ef19f28>, 't2': <Thread(Thread-2, started 140404870584064)>, '__name__': '__main__', '__package__': None, '__builtins__': <module 'builtins' (built-in)>, 'threading': <module 'threading' from '/usr/lib64/python3.4/threading.py'>}
haha0
haha1
haha -1
do_import 1B 140404879178392 {'hello': <module 'import_all_pitall.hello' from '/home/xiaobai/note/python/import_pitfall/import_all_pitall/hello.py'>}
do_import 2B 140404879178392 {'h': <module 'import_all_pitall.hello' from '/home/xiaobai/note/python/import_pitfall/import_all_pitall/hello.py'>}
Traceback (most recent call last):
  File "thread_test.py", line 31, in <module>
    print( "main 6", id(hello), locals() ) #"name 'hello' not defined" error even do_import1 was success
NameError: name 'hello' is not defined
[xiaobai@xiaobai import_pitfall]$ 

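hello is not global, so why can it be shared by different threads in different functions? Why doesn't Python allow a unique local import? Why does Python share the import process, and make all the other threads "wait" for no apparent reason just because one thread hangs during the import?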

Best answer

To answer one of the questions -

I print the id and figure they all share the same id 140589697897480. So 3 functions share the same import object/process.

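Yes. When you import a module, Python creates the module object and caches it in sys.modules. For any subsequent import of that module, Python fetches the module object from sys.modules and returns it; it does not import the module again.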
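The caching described above can be checked with a small sketch. It uses the standard-library json module as a stand-in for hello (an assumption made only so the snippet runs anywhere), and the function names are hypothetical:

```python
import sys

def import_in_first_function():
    # Plain import inside a function.
    import json
    return json

def import_in_second_function():
    # Same module, bound to a different local name.
    import json as j
    return j

first = import_in_first_function()
second = import_in_second_function()

print(first is second)               # True: both names refer to one module object
print(sys.modules["json"] is first)  # True: it is the entry cached in sys.modules
```

Both functions receive the very same object, which is why the three id() values in the question are identical.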

For the second part of the same question -

But this doesn't make sense to me, I thought the object is local to the function, because if I try to print the imported "hello" object in global scope, it will throw an error

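Well, sys.modules is not local, but the name hello is local to the function. As mentioned above, if you try to import the module again, Python first looks in sys.modules to see whether it has already been imported; if the module is there it is returned, otherwise Python imports it and adds it to sys.modules.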
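A minimal sketch of that distinction, again assuming the standard-library json module in place of hello and a hypothetical local name local_json: the import binds a name only in the function's scope, while the module object itself lives in the process-wide sys.modules cache.

```python
import sys

def import_locally():
    import json as local_json   # binds a name in this function's scope only
    return sorted(locals())

local_names = import_locally()

print(local_names)                 # ['local_json']
print("local_json" in globals())   # False: the name never reached global scope
print("json" in sys.modules)       # True: the module object is cached globally
```

This matches the NameError in the question: the name hello exists only inside do_import1, even though the module it points to is shared.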


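For the first program: when a Python module is imported, its code runs from the top level. In your hello.py, do_task() is called at the top level and its loop, while not success:, never ends, because undefined_func raises a NameError on every iteration and success is never set to 1. So the import never completes, and the other threads that import the same module block on CPython's import lock while waiting for it to finish.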
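The waiting behaviour can be reproduced without hanging forever. The following is a sketch for Python 3 under stated assumptions: it writes a throwaway module named slow_hello_demo (a hypothetical name) into a temporary directory, and a 0.5 second sleep stands in for the long-running top-level code of hello.py.

```python
import importlib
import os
import sys
import tempfile
import threading
import time

# Hypothetical module name, used only for this demonstration.
MODULE_NAME = "slow_hello_demo"

# Create a module whose top-level code is slow (but finite).
tmp_dir = tempfile.mkdtemp()
with open(os.path.join(tmp_dir, MODULE_NAME + ".py"), "w") as f:
    f.write(
        "import time\n"
        "time.sleep(0.5)  # stands in for slow top-level work\n"
        "loaded_at = time.time()\n"
    )
sys.path.insert(0, tmp_dir)
importlib.invalidate_caches()

results = {}

def worker(label):
    start = time.time()
    module = importlib.import_module(MODULE_NAME)  # blocks until the import is done
    results[label] = (id(module), module.loaded_at, time.time() - start)

threads = [threading.Thread(target=worker, args=(name,)) for name in ("first", "second")]
for t in threads:
    t.start()
for t in threads:
    t.join()

for label, (module_id, loaded_at, waited) in sorted(results.items()):
    print(label, module_id, loaded_at, round(waited, 2))
```

Both threads report the same module id and the same loaded_at value, so the top-level code ran once, and both waited roughly the full half second. If the top-level code never returned, as in the question, the second thread would wait indefinitely.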

If you don't want that loop to run on import, put the code that should not run when the module is imported inside -

if __name__ == '__main__':

The code inside the above if statement runs only when the script is executed directly; it does not run when the module is imported.
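As a rough sketch of how hello.py might be restructured along these lines: the retry limit, the delay, and the helper names work and failing_work are assumptions added for illustration, not part of the original code. Bounding the retries is an extra safeguard beyond the guard itself.

```python
import time

def do_task(work, max_attempts=3, delay=0.01):
    """Call work() until it succeeds or max_attempts is reached."""
    for attempt in range(1, max_attempts + 1):
        try:
            work()
            return True
        except Exception as e:
            print("exception occurred on attempt", attempt, ":", e)
            time.sleep(delay)
    return False

def failing_work():
    # Stands in for the call to the undefined function in the question.
    raise NameError("undefined_func is not defined")

if __name__ == "__main__":
    # Runs only when this file is executed directly, never on import.
    print("succeeded:", do_task(failing_work))
```

With this layout, importing the module only defines the functions, so no thread can get stuck inside the import itself.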


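When you say -

After i comment out the '#undefined_func( "Done haha" )' in hello.py

the loop stops being infinite: with that call commented out no exception is raised, success is set to 1 after the first iteration, and so the import succeeds.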

For more on python - Why do multiple threads and different functions/scopes share a single import process, see the similar question on Stack Overflow: https://stackoverflow.com/questions/31982561/
