此回溯来自多线程应用程序中的死锁情况。 其他死锁线程在调用 malloc() 时锁定,并出现 等待这个线程。
我不明白是什么创建了这个线程,因为它在调用之前死锁了 我的应用程序中的任何功能:
Thread 6 (Thread 0x7ff69d43a700 (LWP 14191)):
#0 0x00007ff6a2932eec in __lll_lock_wait_private () from /usr/lib64/libc.so.6
#1 0x00007ff6a299460d in _L_lock_27 () from /usr/lib64/libc.so.6
#2 0x00007ff6a29945bd in arena_thread_freeres () from /usr/lib64/libc.so.6
#3 0x00007ff6a2994662 in __libc_thread_freeres () from /usr/lib64/libc.so.6
#4 0x00007ff6a3875e38 in start_thread () from /usr/lib64/libpthread.so.0
#5 0x00007ff6a292534d in clone () from /usr/lib64/libc.so.6
clone() 用于实现 fork()、pthread_create() 以及其他功能。参见 here和 here .
如何确定此跟踪是来自 fork()
、pthread_create()
、信号处理程序还是其他?我只需要深入了解 glibc 代码,还是可以使用 gdb 或其他工具?为什么这个线程需要内部 glibc 锁?这将有助于确定死锁的原因。
其他信息和研究:
malloc() 是线程安全的,但不是可重入的(递归安全的)(参见 this 和 this ,所以 malloc() 也不是异步信号安全的。我们没有为这个过程,所以我知道我们不从信号处理程序调用 malloc()。死锁线程永远不会调用递归函数,并且回调在新线程中处理,所以我认为我们不需要担心关于这里的重入。(也许我错了?)
这种死锁发生在许多回调被生成以通知(最终终止)不同的进程时。回调在它们自己的线程中产生。
我们是否可能以不安全的方式使用 malloc?
可能相关:
Malloc inside of signal handler causes deadlock.
How are signal handlers delivered in a multi-threaded application?
glibc fork/malloc deadlock bug这已在 glibc-2.17-162.el7 中修复。这看起来很相似,但这不是我的错误 - 我使用的是固定版本的 glibc。
(我一直未能成功创建一个最小的、完整的、可验证的示例。不幸的是,唯一的重现方法是使用应用程序 (Slurm),而且重现相当困难。)
编辑: 这是所有线程的回溯。线程 6 是我最初发布的跟踪。线程 1 只是在等待 pthread_join()。线程 2-5 在调用 malloc() 后被锁定。线程 7 正在监听消息并在新线程(线程 2-5)中生成回调。这些将是最终会向其他进程发出信号的回调。
Thread 7 (Thread 0x7ff69e672700 (LWP 12650)):
#0 0x00007ff6a291aa3d in poll () from /usr/lib64/libc.so.6
#1 0x00007ff6a3c09064 in _poll_internal (shutdown_time=<optimized out>, nfds=2,
pfds=0x7ff6980009f0) at ../../../../slurm/src/common/eio.c:364
#2 eio_handle_mainloop (eio=0xf1a970) at ../../../../slurm/src/common/eio.c:328
#3 0x000000000041ce78 in _msg_thr_internal (job_arg=0xf07760)
at ../../../../../slurm/src/slurmd/slurmstepd/req.c:245
#4 0x00007ff6a3875e25 in start_thread () from /usr/lib64/libpthread.so.0
#5 0x00007ff6a292534d in clone () from /usr/lib64/libc.so.6
Thread 6 (Thread 0x7ff69d43a700 (LWP 14191)):
#0 0x00007ff6a2932eec in __lll_lock_wait_private () from /usr/lib64/libc.so.6
#1 0x00007ff6a299460d in _L_lock_27 () from /usr/lib64/libc.so.6
#2 0x00007ff6a29945bd in arena_thread_freeres () from /usr/lib64/libc.so.6
#3 0x00007ff6a2994662 in __libc_thread_freeres () from /usr/lib64/libc.so.6
#4 0x00007ff6a3875e38 in start_thread () from /usr/lib64/libpthread.so.0
#5 0x00007ff6a292534d in clone () from /usr/lib64/libc.so.6
Thread 5 (Thread 0x7ff69e773700 (LWP 22471)):
#0 0x00007ff6a2932eec in __lll_lock_wait_private () from /usr/lib64/libc.so.6
#1 0x00007ff6a28af7d8 in _L_lock_1579 () from /usr/lib64/libc.so.6
#2 0x00007ff6a28a7ca0 in arena_get2.isra.3 () from /usr/lib64/libc.so.6
#3 0x00007ff6a28ad0fe in malloc () from /usr/lib64/libc.so.6
#4 0x00007ff6a3c02e60 in slurm_xmalloc (size=size@entry=24, clear=clear@entry=false,
file=file@entry=0x7ff6a3c1f1f0 "../../../../slurm/src/common/pack.c",
line=line@entry=152, func=func@entry=0x7ff6a3c1f4a6 <__func__.7843> "init_buf")
at ../../../../slurm/src/common/xmalloc.c:86
#5 0x00007ff6a3b2e5b7 in init_buf (size=16384)
at ../../../../slurm/src/common/pack.c:152
#6 0x000000000041caab in _handle_accept (arg=0x0)
at ../../../../../slurm/src/slurmd/slurmstepd/req.c:384
#7 0x00007ff6a3875e25 in start_thread () from /usr/lib64/libpthread.so.0
#8 0x00007ff6a292534d in clone () from /usr/lib64/libc.so.6
Thread 4 (Thread 0x7ff6a4086700 (LWP 5633)):
#0 0x00007ff6a2932eec in __lll_lock_wait_private () from /usr/lib64/libc.so.6
#1 0x00007ff6a28af7d8 in _L_lock_1579 () from /usr/lib64/libc.so.6
#2 0x00007ff6a28a7ca0 in arena_get2.isra.3 () from /usr/lib64/libc.so.6
#3 0x00007ff6a28ad0fe in malloc () from /usr/lib64/libc.so.6
#4 0x00007ff6a3c02e60 in slurm_xmalloc (size=size@entry=24, clear=clear@entry=false,
file=file@entry=0x7ff6a3c1f1f0 "../../../../slurm/src/common/pack.c",
line=line@entry=152, func=func@entry=0x7ff6a3c1f4a6 <__func__.7843> "init_buf")
at ../../../../slurm/src/common/xmalloc.c:86
#5 0x00007ff6a3b2e5b7 in init_buf (size=16384)
at ../../../../slurm/src/common/pack.c:152
#6 0x000000000041caab in _handle_accept (arg=0x0)
at ../../../../../slurm/src/slurmd/slurmstepd/req.c:384
#7 0x00007ff6a3875e25 in start_thread () from /usr/lib64/libpthread.so.0
#8 0x00007ff6a292534d in clone () from /usr/lib64/libc.so.6
Thread 3 (Thread 0x7ff69d53b700 (LWP 12963)):
#0 0x00007ff6a2932eec in __lll_lock_wait_private () from /usr/lib64/libc.so.6
#1 0x00007ff6a28af7d8 in _L_lock_1579 () from /usr/lib64/libc.so.6
#2 0x00007ff6a28a7ca0 in arena_get2.isra.3 () from /usr/lib64/libc.so.6
#3 0x00007ff6a28ad0fe in malloc () from /usr/lib64/libc.so.6
#4 0x00007ff6a3c02e60 in slurm_xmalloc (size=size@entry=24, clear=clear@entry=false,
file=file@entry=0x7ff6a3c1f1f0 "../../../../slurm/src/common/pack.c",
line=line@entry=152, func=func@entry=0x7ff6a3c1f4a6 <__func__.7843> "init_buf")
at ../../../../slurm/src/common/xmalloc.c:86
#5 0x00007ff6a3b2e5b7 in init_buf (size=16384)
at ../../../../slurm/src/common/pack.c:152
#6 0x000000000041caab in _handle_accept (arg=0x0)
at ../../../../../slurm/src/slurmd/slurmstepd/req.c:384
#7 0x00007ff6a3875e25 in start_thread () from /usr/lib64/libpthread.so.0
#8 0x00007ff6a292534d in clone () from /usr/lib64/libc.so.6
Thread 2 (Thread 0x7ff69f182700 (LWP 19734)):
#0 0x00007ff6a2932eec in __lll_lock_wait_private () from /usr/lib64/libc.so.6
#1 0x00007ff6a28af7d8 in _L_lock_1579 () from /usr/lib64/libc.so.6
#2 0x00007ff6a28a7ca0 in arena_get2.isra.3 () from /usr/lib64/libc.so.6
#3 0x00007ff6a28ad0fe in malloc () from /usr/lib64/libc.so.6
#4 0x00007ff6a3c02e60 in slurm_xmalloc (size=size@entry=24, clear=clear@entry=false,
file=file@entry=0x7ff6a3c1f1f0 "../../../../slurm/src/common/pack.c",
line=line@entry=152, func=func@entry=0x7ff6a3c1f4a6 <__func__.7843> "init_buf")
at ../../../../slurm/src/common/xmalloc.c:86
#5 0x00007ff6a3b2e5b7 in init_buf (size=16384)
at ../../../../slurm/src/common/pack.c:152
#6 0x000000000041caab in _handle_accept (arg=0x0)
at ../../../../../slurm/src/slurmd/slurmstepd/req.c:384
#7 0x00007ff6a3875e25 in start_thread () from /usr/lib64/libpthread.so.0
#8 0x00007ff6a292534d in clone () from /usr/lib64/libc.so.6
Thread 1 (Thread 0x7ff6a4088880 (LWP 12616)):
#0 0x00007ff6a3876f57 in pthread_join () from /usr/lib64/libpthread.so.0
#1 0x000000000041084a in _wait_for_io (job=0xf07760)
at ../../../../../slurm/src/slurmd/slurmstepd/mgr.c:2219
#2 job_manager (job=job@entry=0xf07760)
at ../../../../../slurm/src/slurmd/slurmstepd/mgr.c:1397
#3 0x000000000040ca07 in main (argc=1, argv=0x7fffacab93d8)
at ../../../../../slurm/src/slurmd/slurmstepd/slurmstepd.c:172
最佳答案
回溯中存在 start_thread()
表明这是一个 pthread_create()
线程。
__libc_thread_freeres()
是 glibc 在线程退出时调用的一个函数,它调用一组回调来释放每个线程的内部状态。这表示您突出显示的线程正在退出。
arena_thread_freeres()
是其中一个回调。它用于 malloc arena 分配器,它将空闲列表从退出线程的私有(private) arena 移动到全局空闲列表。为此,它必须使用保护全局空闲列表的锁(这是 arena.c
中的 list_lock
)。
突出显示的线程(线程 6)似乎是在这个锁上被阻塞的。
arena 分配器安装 pthread_atfork()
处理程序,在 fork()
处理开始时锁定列表锁,并在结束时解锁。这意味着当 other pthread_atfork()
处理程序正在运行时,所有其他线程将阻塞此锁。
您要安装自己的 pthread_atfork()
处理程序吗?似乎其中之一可能会导致您的僵局。
关于c - 我怎样才能找到源自 clone() 的 glibc 回溯的来源?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/49701888/