linux - Explain the Linux commit message that patches/secures POP SS followed by a #BP interrupt (INT3)

This is in reference to CVE-2018-8897 (which appears related to CVE-2018-1087), described as follows:

A statement in the System Programming Guide of the Intel 64 and IA-32 Architectures Software Developer's Manual (SDM) was mishandled in the development of some or all operating-system kernels, resulting in unexpected behavior for #DB exceptions that are deferred by MOV SS or POP SS, as demonstrated by (for example) privilege escalation in Windows, macOS, some Xen configurations, or FreeBSD, or a Linux kernel crash. The MOV to SS and POP SS instructions inhibit interrupts (including NMIs), data breakpoints, and single step trap exceptions until the instruction boundary following the next instruction (SDM Vol. 3A; section 6.8.3). (The inhibited data breakpoints are those on memory accessed by the MOV to SS or POP to SS instruction itself.) Note that debug exceptions are not inhibited by the interrupt enable (EFLAGS.IF) system flag (SDM Vol. 3A; section 2.3). If the instruction following the MOV to SS or POP to SS instruction is an instruction like SYSCALL, SYSENTER, INT 3, etc. that transfers control to the operating system at CPL < 3, the debug exception is delivered after the transfer to CPL < 3 is complete. OS kernels may not expect this order of events and may therefore experience unexpected behavior when it occurs.

When reading this related git commit to the Linux kernel, I noticed that the commit message states:

x86/entry/64: Don't use IST entry for #BP stack

There's nothing IST-worthy about #BP/int3. We don't allow kprobes in the small handful of places in the kernel that run at CPL0 with an invalid stack, and 32-bit kernels have used normal interrupt gates for #BP forever.

Furthermore, we don't allow kprobes in places that have usergs while in kernel mode, so "paranoid" is also unnecessary.

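I am trying to understand the last sentence/paragraph of the commit message in light of the vulnerability. I understand that an IST entry refers to one of the (allegedly) "known good" stack pointers in the Interrupt Stack Table that can be used for handling interrupts. I also understand that #BP refers to the breakpoint exception (equivalent to INT3), and that kprobes is a debugging mechanism that is claimed to run in only a few places in the kernel at the ring 0 (CPL0) privilege level.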

But I am completely lost on the next part, which may be because "usergs" is a typo and I am simply missing what was intended:

Furthermore, we don't allow kprobes in places that have usergs while in kernel mode, so "paranoid" is also unnecessary.

What does this statement mean?

Best answer

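usergs is a reference to the x86-64 swapgs instruction, which exchanges the GS base with an internally saved value (the IA32_KERNEL_GS_BASE MSR) so that the kernel can find the kernel stack from a syscall entry point; "usergs" means GS still holds the user-space value. The swap operates on the cached gsbase segment information rather than reloading it from the GDT based on the gs selector value itself. (wrgsbase can also change the GS base independently of the GDT/LDT.)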
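To make the exchange concrete, here is a toy model of the two values that swapgs touches. It is only a sketch: the class name, field names, and both addresses are made up for illustration, and nothing here models selectors, the GDT, or privilege checks.

```python
from dataclasses import dataclass


@dataclass
class Cpu:
    """Toy model holding only the two GS-base values that swapgs exchanges."""
    gs_base: int         # active GS base, used by gs-relative memory accesses
    kernel_gs_base: int  # hidden value, modelled on the IA32_KERNEL_GS_BASE MSR

    def swapgs(self) -> None:
        # Exchange the two values; no memory access and no GDT lookup.
        self.gs_base, self.kernel_gs_base = self.kernel_gs_base, self.gs_base


USER_GS = 0x0000_7F00_0000_1000    # hypothetical user-space GS base
KERNEL_GS = 0xFFFF_8880_0000_0000  # hypothetical kernel GS base

cpu = Cpu(gs_base=USER_GS, kernel_gs_base=KERNEL_GS)

cpu.swapgs()                       # syscall entry: switch to the kernel's GS base
base_in_kernel = cpu.gs_base

cpu.swapgs()                       # before returning to user space: switch back
base_after_return = cpu.gs_base

print(hex(base_in_kernel), hex(base_after_return))
```

Because the operation is a pure exchange, running it twice restores the starting state, which is why entry and exit paths have to stay exactly paired.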

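AMD's design is that syscall does not change RSP to point to the kernel stack and does not read or write any memory, so syscall itself can be fast. But then you enter the kernel with the registers still holding their user-space values (apart from RCX and R11, which syscall overwrites with the return RIP and RFLAGS). See "Why does Windows64 use a different calling convention from all other OSes on x86-64?" for some links to mailing-list discussions between kernel developers and AMD architects around 2000, tweaking the design of syscall and swapgs to make it usable before any AMD64 CPUs were sold.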

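Apparently, keeping track of whether GS currently holds the kernel or the user value is tricky for error handling: there is no way to say "I want kernelgs now"; in every error-handling path you have to know whether or not to run swapgs. The only instruction available is an exchange, not one that sets GS to one value or the other.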
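A minimal sketch of that bookkeeping problem follows. It assumes two ways a handler might decide whether to run swapgs: trusting the privilege level of the interrupted code, or inspecting the GS base itself (which is, as far as I understand it, what the "paranoid" entry path amounts to). The function names, selector values, and addresses are illustrative and not taken from the kernel source.

```python
USER_CS, KERNEL_CS = 0x33, 0x10    # illustrative selectors; only the low 2 bits matter here
USER_GS = 0x0000_7F00_0000_1000    # hypothetical user-space GS base
KERNEL_GS = 0xFFFF_8880_0000_0000  # hypothetical kernel GS base


def needs_swapgs_normal(saved_cs: int) -> bool:
    """Normal entry: assume GS is the user value only if we interrupted CPL 3 code."""
    return (saved_cs & 3) == 3


def needs_swapgs_paranoid(gs_base: int) -> bool:
    """'Paranoid' entry: ignore CPL and check whether the GS base looks like a
    kernel address (assumption: kernel addresses have bit 63 set)."""
    return (gs_base >> 63) == 0


cases = {
    "from user mode": (USER_CS, USER_GS),
    "from kernel, kernel GS": (KERNEL_CS, KERNEL_GS),
    "from kernel, still user GS": (KERNEL_CS, USER_GS),
}

# For each case: (decision of the normal check, decision of the paranoid check)
decisions = {
    name: (needs_swapgs_normal(cs), needs_swapgs_paranoid(gs))
    for name, (cs, gs) in cases.items()
}

for name, (normal, paranoid) in decisions.items():
    print(f"{name:28} normal={normal!s:5} paranoid={paranoid}")
```

The two checks disagree only in the third case, where kernel-mode code is still running with the user GS base. If a handler can never be entered from such a window, the cheaper check is enough, which is one way to read the "paranoid is also unnecessary" remark in the commit message.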

Read the comments in arch/x86/entry/entry_64.S, e.g. https://github.com/torvalds/linux/blob/9fb71c2f230df44bdd237e9a4457849a3909017d/arch/x86/entry/entry_64.S#L1267 (from current Linux), which mentions usergs; the next comment block describes doing a swapgs before jumping to some error-handling code with the kernel gsbase.

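IIRC, in the Linux kernel [gs:0] holds a thread-info block at the lowest addresses of that thread's kernel stack. The block includes the kernel stack pointer (as an absolute address, not relative to gs).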

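I wouldn't be surprised if this bug basically tricks the kernel into loading the kernel rsp from a user-controlled gsbase, or otherwise screws up the dead reckoning of swapgs so that it has the wrong gs at some point.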
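The guess above can be illustrated with a toy memory model. This is a sketch of the hypothesis only, not a description of the actual exploit: the offset, the addresses, and the load_kernel_rsp helper are all invented for illustration.

```python
STACK_PTR_OFFSET = 0x10            # hypothetical offset of the saved stack pointer
USER_GS = 0x0000_7F00_0000_1000    # hypothetical user-space GS base
KERNEL_GS = 0xFFFF_8880_0000_0000  # hypothetical kernel GS base

# Toy memory: address -> 64-bit value.
memory = {
    KERNEL_GS + STACK_PTR_OFFSET: 0xFFFF_C900_0000_4000,  # genuine kernel stack top
    USER_GS + STACK_PTR_OFFSET: 0x4141_4141_4141_4141,    # value written by user code
}


def load_kernel_rsp(gs_base: int) -> int:
    """Model of an entry path that reads the stack pointer relative to GS."""
    return memory[gs_base + STACK_PTR_OFFSET]


rsp_with_kernel_gs = load_kernel_rsp(KERNEL_GS)  # what the kernel expects
rsp_with_user_gs = load_kernel_rsp(USER_GS)      # what it gets if swapgs was skipped

print(hex(rsp_with_kernel_gs), hex(rsp_with_user_gs))
```

With the wrong GS base active, the same load returns whatever user space placed at that address, so the kernel would start using an attacker-chosen stack pointer.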

Regarding "linux - Explain the Linux commit message that patches/secures POP SS followed by a #BP interrupt (INT3)", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/50286277/
