gcc - x86_64 : Is it possible to "in-line substitute" PLT/GOT references?

Tags: gcc assembly x86-64 ld elf

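I'm not sure what a good subject line for this question would be, but here we go:

In order to enforce code locality / compactness for a critical section of code, I'm looking for a way to call a function in an external (dynamically loaded) library through a "jump slot" (an ELF R_X86_64_JUMP_SLOT relocation) directly at the call site - what the linker ordinarily puts into the PLT/GOT, but inlined right at the call site.

If I emulate such a call like this: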

#include <stdio.h>
int main(int argc, char **argv)
{
        asm ("push $1f\n\t"
             "jmp *0f\n\t"
             "0: .quad %P0\n"
             "1:\n\t"
             : : "i"(printf), "D"("Hello, World!\n"));
        return 0;
}

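to gain the space for a 64-bit word, the call itself works (please, no comments about this being a lucky coincidence because it breaks certain ABI rules - none of that is the subject of this question, and in my case it can be worked around or addressed in other ways; I'm trying to keep this example brief).

It creates the following assembly: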

0000000000000000 <main>:
   0:   bf 00 00 00 00          mov    $0x0,%edi
                        1: R_X86_64_32  .rodata.str1.1
   5:   68 00 00 00 00          pushq  $0x0
                        6: R_X86_64_32  .text+0x19
   a:   ff 24 25 00 00 00 00    jmpq   *0x0
                        d: R_X86_64_32S .text+0x11
        ...
                        11: R_X86_64_64 printf
  19:   31 c0                   xor    %eax,%eax
  1b:   c3                      retq

But (due to using printf as the immediate, I guess ... ?) the target address here is still that of the PLT hook - the same R_X86_64_64 reloc. Linking the object file against libc into an actual executable results in:

0000000000400428 <printf@plt>:
  400428:       ff 25 92 04 10 00       jmpq   *1049746(%rip)        # 5008c0 <_GLOBAL_OFFSET_TABLE_+0x20>
[ ... ]
0000000000400500 <main>:
  400500:       bf 0c 06 40 00          mov    $0x40060c,%edi
  400505:       68 19 05 40 00          pushq  $0x400519
  40050a:       ff 24 25 11 05 40 00    jmpq   *0x400511
  400511:       [ .quad 400428 ]
  400519:       31 c0                   xorl   %eax, %eax
  40051b:       c3                      retq
[ ... ]
DYNAMIC RELOCATION RECORDS
OFFSET           TYPE              VALUE
[ ... ]
00000000005008c0 R_X86_64_JUMP_SLOT  printf

I.e. this still gives the two-step redirection, first transfer execution to the PLT hook, then jump into the library entry point.

Is there a way to instruct the compiler/assembler/linker to - in this example - "inline" the jump slot target at address 0x400511?

I.e. replace the "local" (resolved at program link time by ld) R_X86_64_64 reloc with the "remote" (resolved at program load time by ld.so) R_X86_64_JUMP_SLOT one (and force non-lazy-load for this section of code) ? Maybe linker mapfiles might make this possible - if so, how?

Edit:
To make this clear, the question is about how to achieve this in a dynamically-linked executable / for an external function that's only available in a dynamic library. Yes, it's true static linking resolves this in a simpler way, but:

  • There are systems (like Solaris) where static libraries are generally not shipped by the vendor
  • There are libraries that aren't available as either source code or static versions

Hence static linking is not helpful here :(

Edit2:
I've found that on some architectures (SPARC, notably, see the section on SPARC relocations in the GNU as manual), GNU as is able to create certain types of relocation references for the linker in-place using modifiers. The quoted SPARC one would use %gdop(symbolname) to make the assembler emit instructions to the linker stating "create that relocation right here". Intel's assembler on Itanium knows the @fptr(symbol) link-relocation operator for the same kind of thing (see also section 4 in the Itanium psABI). But does an equivalent mechanism - something to instruct the assembler to emit a specific linker relocation type at a specific position in the code - exist for x86_64?

I've also found that the GNU assembler has a .reloc directive which supposedly is to be used for this purpose; still, if I try:

#include <stdio.h>
int main(int argc, char **argv)
{
        asm ("push %%rax\n\t"
             "lea 1f(%%rip), %%rax\n\t"
             "xchg %%rax, (%%rsp)\n\t"
             "jmp *0f\n\t"
             ".reloc 0f, R_X86_64_JUMP_SLOT, printf\n\t"
             "0: .quad 0\n"
             "1:\n\t"
             : : "D"("Hello, World!\n"));
        return 0;
}

I get an error from the linker (note that 7 == R_X86_64_JUMP_SLOT):

error: /tmp/cc6BUEZh.o: unexpected reloc 7 in object file
The assembler creates an object file for which readelf reports:
Relocation section '.rela.text.startup' at offset 0x5e8 contains 2 entries:
Offset             Info             Type               Symbol's Value  Symbol's Name + Addend
0000000000000001  000000050000000a R_X86_64_32            0000000000000000 .rodata.str1.1 + 0
0000000000000017  0000000b00000007 R_X86_64_JUMP_SLOT     0000000000000000 printf + 0

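This is what I want - but the linker doesn't accept it.
The linker does accept plain R_X86_64_64 in place of the above; doing that creates the same kind of binary as in the first case ... one that redirects to printf@plt, not the "resolved" one.

Best answer

This optimization has since been implemented in GCC. It can be enabled with the -fno-plt option and the noplt function attribute: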

Do not use the PLT for external function calls in position-independent code. Instead, load the callee address at call sites from the GOT and branch to it. This leads to more efficient code by eliminating PLT stubs and exposing GOT loads to optimizations. On architectures such as 32-bit x86 where PLT stubs expect the GOT pointer in a specific register, this gives more register allocation freedom to the compiler. Lazy binding requires use of the PLT; with -fno-plt all external symbols are resolved at load time.

Alternatively, the function attribute noplt can be used to avoid calls through the PLT for specific external functions.

In position-dependent code, a few targets also convert calls to functions that are marked to not use the PLT to use the GOT instead.
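
As a minimal sketch of the attribute form described in the quoted documentation, the snippet below redeclares one libc function with noplt. The macro NOPLT and the helper parse_decimal are hypothetical names introduced only for this illustration, and strtol was picked arbitrarily as an external function. Note that, per the documentation above, the callee address is still loaded from a GOT slot; the slot is not placed inline in the code, so this removes the PLT stub rather than the indirection itself.

```c
#include <stdlib.h>

/* NOPLT (hypothetical macro) expands to the attribute only where the
   compiler reports support for it; elsewhere it expands to nothing and
   calls go through the PLT as usual. */
#if defined(__has_attribute)
#  if __has_attribute(noplt)
#    define NOPLT __attribute__((noplt))
#  endif
#endif
#ifndef NOPLT
#  define NOPLT
#endif

/* Redeclare an external library function with the attribute.
   Assumption: strtol is used here purely as an example callee. */
extern long strtol(const char *nptr, char **endptr, int base) NOPLT;

/* Hypothetical helper whose call to strtol should bypass the PLT stub
   when the attribute is in effect. */
long parse_decimal(const char *text)
{
    return strtol(text, NULL, 10);
}
```

To check the effect, one could compile the file with something like `gcc -O2 -c example.c` (the file name is an assumption) and inspect it with `objdump -dr example.o`; with the attribute active, the call site should appear as an indirect call or jump through a GOT-relative relocation against strtol instead of a direct call to the PLT entry. Building the whole translation unit with -fno-plt should give the same result for every external call without touching the declarations.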

This question and answer originate from Stack Overflow: https://stackoverflow.com/questions/10849308/
