Can I rule out SIGBUS being raised by a "minor page fault"? (The kernel log shows no allocation failures)

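Motivation

I am trying to improve my understanding of a SIGBUS error in Xwayland. Some Fedora Linux users have been seeing it since around 2018-02-20. They are running Xwayland 1.19.6-5.fc27.x86_64 and Linux kernel 4.15.3-300.fc27.x86_64.

Sadly I do not have a kernel "segfault" log message (or the SIGBUS equivalent). Xwayland has some rather pointless code that catches fatal signals. However, I can see the siginfo by debugging the coredump, which seems to be about as good.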

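Definitions

I understand that a "major page fault" occurs when a page of virtual memory is not available in RAM and must be read from disk. I think I am specifically interested in pages backed by an ext4 filesystem (i.e. not direct access to a block device).

A "minor page fault" is then one where no disk access is required. I think the distinction is fairly well defined, because Linux exposes separate counters for major and minor page faults.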
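As a minimal sketch of where those counters can be read: on Linux, `/proc/<pid>/stat` carries `minflt` and `majflt` as fields 10 and 12 (per the proc(5) manual page). The helper names below are made up for illustration.

```python
from pathlib import Path


def parse_fault_counters(stat_text):
    """Return (minflt, majflt) from the text of /proc/<pid>/stat."""
    # The second field (comm) is wrapped in parentheses and may itself
    # contain spaces or ')', so split on the last ')' in the line.
    rest = stat_text.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state), so field 10 (minflt) is rest[7]
    # and field 12 (majflt) is rest[9].
    return int(rest[7]), int(rest[9])


def current_fault_counters():
    """Read the counters of the calling process (Linux only)."""
    return parse_fault_counters(Path("/proc/self/stat").read_text())


if __name__ == "__main__":
    minflt, majflt = current_fault_counters()
    print(f"minor faults: {minflt}, major faults: {majflt}")
```

Sampling the pair before and after a suspect operation shows which kind of fault it incurred, though only as totals for the whole process.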

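My question

If the kernel sends a program SIGBUS, I want to know whether I should generally assume that it happened on a major page fault.

According to the coredump and the disassembly, the program was reading memory, not writing to it, when it received SIGBUS. The faulting address in siginfo->si_addr lies inside the mapped system executable file, which is not writable by the user, and the address appears to be within the current length of the file. In fact, when debugging the coredump I have read very plausible values from that memory address. It seems the coredump generation process had no difficulty reading this address :-(.

I am also confident in ruling out the "invalid address alignment" case (BUS_ADRALN), because siginfo->si_code is 2, i.e. BUS_ADRERR, "nonexistent physical address". Also because I am on x86, which allows unaligned access in most cases, and the trap is not in any SSE extension instruction.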
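For reference, the mapping from si_code to the symbolic names can be sketched as a small lookup table. The numeric values are the ones Linux defines for SIGBUS in asm-generic/siginfo.h, and signal number 7 is SIGBUS on x86 Linux. The function name is only illustrative.

```python
SIGBUS_LINUX_X86 = 7  # matches $_siginfo.si_signum in the coredump

# si_code values for SIGBUS on Linux.
BUS_CODES = {
    1: "BUS_ADRALN",     # invalid address alignment
    2: "BUS_ADRERR",     # nonexistent physical address
    3: "BUS_OBJERR",     # object-specific hardware error
    4: "BUS_MCEERR_AR",  # hardware memory error, action required
    5: "BUS_MCEERR_AO",  # hardware memory error, action optional
}


def describe_sigbus(si_signo, si_code):
    """Name the si_code of a SIGBUS siginfo taken from a coredump."""
    if si_signo != SIGBUS_LINUX_X86:
        raise ValueError(f"signal {si_signo} is not SIGBUS")
    return BUS_CODES.get(si_code, f"unknown si_code {si_code}")


if __name__ == "__main__":
    print(describe_sigbus(7, 2))  # the values seen in this crash
```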

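I considered what the kernel is generally responsible for when it handles a page fault that it determines to be "minor". I guess a minor fault could fail to allocate memory and so raise SIGBUS. However, I believe I would have noticed such an allocation failure:

I have plenty of free swap that user pages could be evicted to, and I did not notice the obvious slowdown that usually happens when the system starts swapping. The crash happens within a few seconds of the laptop resuming from suspend-to-RAM. Even at about 100MB/s, that is not enough time to fill 8GB of swap space (filling it would take on the order of 80 seconds).
I also do not see the dreaded out-of-memory (OOM) killer in the kernel log, as I would expect if the kernel had failed to allocate a page frame or a page table.

Are there other ways a minor page fault could fail and result in SIGBUS? I.e. is there any cause that I would not notice when looking for errors in the kernel log, and that could strike very quickly?

Again, multiple coredumps show it as a page fault, triggered by a read from a mapped file on the filesystem.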
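One well-known way for a read fault on a file mapping to end in SIGBUS without any I/O error in the kernel log is touching a page that lies beyond the end of the file, for example after the file has been truncated. The sketch below only illustrates that general mechanism in a child process. It is not a claim about this crash, where the faulting address appears to be within the file length. It assumes Linux and CPython, whose mmap objects read the mapped memory directly.

```python
import signal
import subprocess
import sys
import textwrap

# Code for the child: map one page of a file, truncate the file to zero
# length, then read from the mapping.
CHILD = textwrap.dedent("""
    import mmap, os, tempfile
    with tempfile.TemporaryFile() as f:
        f.write(b"x" * mmap.PAGESIZE)
        f.flush()
        m = mmap.mmap(f.fileno(), mmap.PAGESIZE, prot=mmap.PROT_READ)
        os.ftruncate(f.fileno(), 0)  # the mapped page is now past EOF
        m[0]                         # read fault on the mapping
""")


def signal_from_read_past_eof():
    """Run the child; return the name of the signal that killed it, or None."""
    proc = subprocess.run([sys.executable, "-c", CHILD])
    if proc.returncode < 0:
        return signal.Signals(-proc.returncode).name
    return None


if __name__ == "__main__":
    print(signal_from_read_past_eof())
```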

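Ulterior motive

I would really like there to be a minor page fault case that I am missing. The scary flip side is that I do not understand how this SIGBUS could be caused on the hard page fault side. A few of us users have very similar errors, starting a couple of months ago. My kernel log has no IO errors. In normal operation there are no IO errors when reading the indicated file. I get no errors from running rpm --verify --all, or from running an extended SMART test on the HDD. Unfortunately I seem to have very few suspects. My main suspect is a kernel upgrade, which I obviously would not want to rule out. The dates do not line up perfectly for that, but it cannot be completely excluded yet. What is closest in date is this year's microcode updates, which seem harder to pin down.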

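Known causes of minor page faults

Logically, it sounds like minor page faults will arise when copy-on-write is performed for MAP_PRIVATE mappings.

They should also include read faults on /dev/zero or MAP_ANONYMOUS mappings, assuming the kernel does not implement these as reading a shared zero page, and does not allocate pages for the whole mapping up front.

More generally, though, it can be any first access to a page. This is because the page tables for a memory mapping generally seem to be populated on demand. (That is done by the page fault, which will only be a minor page fault if the file page is already in the cache.)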

MAP_NONBLOCK (since Linux 2.5.46)

This flag is meaningful only in conjunction with MAP_POPULATE. Don't perform read-ahead: create page tables entries only for pages that are already present in RAM. Since Linux 2.6.23, this flag causes MAP_POPULATE to do nothing. One day, the combination of MAP_POPULATE and MAP_NONBLOCK may be reimplemented.
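The on-demand population of page tables can be observed from user space. As a rough sketch, assuming Linux and a Python build whose mmap module accepts the `flags` argument, the function below maps anonymous private memory, writes one byte per page, and reports how many minor faults the writes caused, with and without MAP_POPULATE. The fallback value 0x8000 for MAP_POPULATE is an assumption that holds on x86 Linux.

```python
import mmap
import resource

PAGES = 256
SIZE = PAGES * mmap.PAGESIZE
# Assumption: use the x86 Linux value if this Python build does not
# export mmap.MAP_POPULATE (it was added in Python 3.10).
MAP_POPULATE = getattr(mmap, "MAP_POPULATE", 0x8000)


def minor_faults_when_touching(extra_flags=0):
    """Map anonymous private memory, write one byte to every page, and
    return the number of minor faults counted during the writes."""
    flags = mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS | extra_flags
    m = mmap.mmap(-1, SIZE, flags=flags)
    before = resource.getrusage(resource.RUSAGE_SELF).ru_minflt
    for offset in range(0, SIZE, mmap.PAGESIZE):
        m[offset] = 1
    after = resource.getrusage(resource.RUSAGE_SELF).ru_minflt
    m.close()
    return after - before


if __name__ == "__main__":
    print("populated on demand:", minor_faults_when_touching())
    print("with MAP_POPULATE:  ", minor_faults_when_touching(MAP_POPULATE))
```

On a typical system one would expect roughly one minor fault per page in the first case and few or none in the second, because MAP_POPULATE moves the work into the mmap() call itself. The exact numbers depend on the kernel and its configuration.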

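EDIT: further detailed excerpts of the above

A commenter asked for more specific details, to clarify the faulting address and instruction. There are many excerpts in the original link, https://bugzilla.redhat.com/show_bug.cgi?id=1557682

The faults vary, as described in the bug link. Here are excerpts from the most recent instance.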

$ gdb 2018-03-21.core
...
Core was generated by `/usr/bin/Xwayland :0 -rootless -terminate -core -listen 4 -listen 5 -displayfd'.
Program terminated with signal SIGBUS, Bus error.
#0  _dl_fixup (l=0x7fc0be2e0130, reloc_arg=203) at ../elf/dl-runtime.c:73
73    const ElfW(Sym) *sym = &symtab[ELFW(R_SYM) (reloc->r_info)];
[Current thread is 1 (Thread 0x7fc0be29fa80 (LWP 1918))]
(gdb) p $_siginfo.si_signum
$1 = 7
(gdb) p $_siginfo.si_code
$2 = 2
(gdb) p $_siginfo._sifields._sigfault.si_addr
$3 = (void *) 0x41bd80
(gdb) disassemble
Dump of assembler code for function _dl_fixup:
   0x00007fc0be0c8bd0 <+0>: push   %rbx
   0x00007fc0be0c8bd1 <+1>: mov    %rdi,%r10
   0x00007fc0be0c8bd4 <+4>: mov    %esi,%esi
   0x00007fc0be0c8bd6 <+6>: lea    (%rsi,%rsi,2),%rdx
   0x00007fc0be0c8bda <+10>:    sub    $0x10,%rsp
   0x00007fc0be0c8bde <+14>:    mov    0x68(%rdi),%rax
   0x00007fc0be0c8be2 <+18>:    mov    0x8(%rax),%rdi
   0x00007fc0be0c8be6 <+22>:    mov    0xf8(%r10),%rax
   0x00007fc0be0c8bed <+29>:    mov    0x8(%rax),%rax
   0x00007fc0be0c8bf1 <+33>:    lea    (%rax,%rdx,8),%r8
   0x00007fc0be0c8bf5 <+37>:    mov    0x70(%r10),%rax
=> 0x00007fc0be0c8bf9 <+41>:    mov    0x8(%r8),%rcx
(gdb) p/x $r8
$4 = 0x41bd78
(gdb) p/x $r8 + 8
$5 = 0x41bd80

Note that this instruction is fetching the value reloc->r_info, as per the source line shown above.

(gdb) p reloc
$6 = (const Elf64_Rela * const) 0x41bd78
(gdb) p &reloc->r_info
$7 = (Elf64_Xword *) 0x41bd80
(gdb) p *reloc
$8 = {r_offset = 8443504, r_info = 936302870535, r_addend = 0}
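To check that the values gdb read back are plausible, the relocation entry can be decoded by hand. The sketch below applies the standard ELF64 split of r_info (symbol index in the upper 32 bits, relocation type in the lower 32 bits) to the numbers printed above. The function name is only illustrative.

```python
R_X86_64_JUMP_SLOT = 7   # relocation type used for PLT slots on x86-64
SIZEOF_ELF64_RELA = 24   # three 8-byte fields: r_offset, r_info, r_addend


def decode_r_info(r_info):
    """Split Elf64_Rela.r_info the way ELF64_R_SYM / ELF64_R_TYPE do."""
    return r_info >> 32, r_info & 0xFFFFFFFF


if __name__ == "__main__":
    sym, rtype = decode_r_info(936302870535)
    print(sym, rtype)        # 218 7, i.e. symbol 218, R_X86_64_JUMP_SLOT
    print(hex(8443504))      # 0x80d670, the r_offset
    # Start of the relocation table implied by reloc and reloc_arg=203:
    print(hex(0x41bd78 - 203 * SIZEOF_ELF64_RELA))  # 0x41aa70
```

The decoded entry looks like an ordinary PLT relocation: the type is R_X86_64_JUMP_SLOT, and r_offset 0x80d670 falls inside the rw-p mapping of /usr/bin/Xwayland (0080d000-00817000) listed in the maps excerpt in this post.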

The faulting address lies inside the text mapping below (from the maps file captured by abrtd):

00400000-0060b000 r-xp 00000000 fd:00 1708508                            /usr/bin/Xwayland
0080a000-0080d000 r--p 0020a000 fd:00 1708508                            /usr/bin/Xwayland
0080d000-00817000 rw-p 0020d000 fd:00 1708508                            /usr/bin/Xwayland

$ size -x /usr/bin/Xwayland
   text    data     bss     dec     hex filename
0x209ffb     0xbe9d 0x1f3e0 2314872  235278 /usr/bin/Xwayland
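The lookup of the faulting address in the maps file can be sketched as follows. It uses the three lines quoted above and returns the permissions, the offset within the file, and the path. The function name is made up for illustration.

```python
MAPS = """\
00400000-0060b000 r-xp 00000000 fd:00 1708508                            /usr/bin/Xwayland
0080a000-0080d000 r--p 0020a000 fd:00 1708508                            /usr/bin/Xwayland
0080d000-00817000 rw-p 0020d000 fd:00 1708508                            /usr/bin/Xwayland
"""


def locate(addr, maps_text):
    """Return (perms, file_offset, path) for the mapping holding addr."""
    for line in maps_text.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue
        start, end = (int(part, 16) for part in fields[0].split("-"))
        if start <= addr < end:
            # Offset of the mapping in the file plus offset into the mapping.
            file_offset = int(fields[2], 16) + (addr - start)
            path = fields[5].strip() if len(fields) > 5 else ""
            return fields[1], file_offset, path
    return None


if __name__ == "__main__":
    perms, offset, path = locate(0x41BD80, MAPS)
    print(perms, hex(offset), path)  # r-xp 0x1bd80 /usr/bin/Xwayland
```

The resulting file offset 0x1bd80 is below the text size of 0x209ffb reported by `size`, which is consistent with the address being within the file.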

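Best answer

I certainly have some bug in the kernel, unless it is a bug in the kernel selftests.

EDIT: hmm, it actually seems that others have also noticed the GS selftest failure recently, but that it was already present in earlier kernels and also shows up on AMD CPUs. There does not seem to be a conclusion yet on how to fix it. https://lkml.org/lkml/2018/1/26/436

So although I cannot rule out that this GS bug causes more visible breakage when something like PTI is enabled, it is not the problem by itself.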

$ uname -r
4.15.10-300.fc27.x86_64

$ git describe --all
heads/4.15.10
$ cat ./Documentation/x86/pti.txt
...
2. Run several copies of all of the tools/testing/selftests/x86/ tests
   (excluding MPX and protection_keys) in a loop on multiple CPUs for
   several minutes.  These tests frequently uncover corner cases in the
   kernel entry code.  In general, old kernels might cause these tests
   themselves to crash, but they should never crash the kernel.

$ cd tools/testing/selftests/x86
$ make
...

In 4 terminals, to match my 4 hardware threads:

sh -c ' while true; do for i in *; do if test -x $i; then ./$i || exit; fi ; done; done '

Failures appear quickly:

[RUN]   ARCH_SET_GS(0x200000000), then schedule to 0x200000000
    Before schedule, set selector to 0x3
    other thread: ARCH_SET_GS(0x200000000) -- sel is 0x0
[FAIL]  GS/BASE changed from 0x3/0x0 to 0x0/0x0

Also:

[RUN]   Executing 6-argument 32-bit syscall via VDSO
[WARN]  Flags before=0000000000200ed7 id 0 00 o d i s z 0 a 0 p 1 c
[WARN]  Flags  after=0000000000200682 id 0 00 d i s 0 0 1 
[WARN]  Flags change=0000000000000855 0 00 o z 0 a 0 p 0 c
[OK]    Arguments are preserved across syscall
[NOTE]  R11 has changed:0000000000200682 - assuming clobbered by SYSRET insn
[OK]    R8..R15 did not leak kernel data
[RUN]   Executing 6-argument 32-bit syscall via INT 80
[OK]    Arguments are preserved across syscall
[OK]    R8..R15 did not leak kernel data
[RUN]   Running tests under ptrace
[RUN]   Executing 6-argument 32-bit syscall via VDSO
[WARN]  Flags before=0000000000200ed7 id 0 00 o d i s z 0 a 0 p 1 c
[WARN]  Flags  after=0000000000200686 id 0 00 d i s 0 0 p 1 
[WARN]  Flags change=0000000000000851 0 00 o z 0 a 0 0 c
[OK]    Arguments are preserved across syscall
[NOTE]  R11 has changed:0000000000200686 - assuming clobbered by SYSRET insn
[OK]    R8..R15 did not leak kernel data
[RUN]   Executing 6-argument 32-bit syscall via INT 80
[OK]    Arguments are preserved across syscall
[OK]    R8..R15 did not leak kernel data
Warning: failed to find getcpu in vDSO
[RUN]   Testing getcpu...
[OK]    CPU 0: syscall: cpu 0, node 0
[OK]    CPU 1: syscall: cpu 1, node 0
[OK]    CPU 2: syscall: cpu 2, node 0
[OK]    CPU 3: syscall: cpu 3, node 0
[RUN]   Testing getcpu...
[OK]    CPU 0: syscall: cpu 0, node 0 vdso: cpu 0, node 0 vsyscall: cpu 0, node 0
[OK]    CPU 1: syscall: cpu 1, node 0 vdso: cpu 1, node 0 vsyscall: cpu 1, node 0
[OK]    CPU 2: syscall: cpu 2, node 0 vdso: cpu 2, node 0 vsyscall: cpu 2, node 0
[OK]    CPU 3: syscall: cpu 3, node 0 vdso: cpu 3, node 0 vsyscall: cpu 3, node 0
[NOTE]  failed to find getcpu in vDSO
[RUN]   test gettimeofday()
    vDSO time offsets: 0.000006 0.000000
[OK]    vDSO gettimeofday()'s timeval was okay
[RUN]   test time()
[FAIL]  vDSO returned the wrong time (1522063297 1522063296 1522063297)

Regarding "Can I rule out SIGBUS being raised by a "minor page fault"? (The kernel log shows no allocation failures)", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/49477340/
