c - Relationship between VMAs and ELF segments

Tags: c unix memory execution elf

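I need to determine the VMAs for the loadable segments of an ELF executable. The VMAs can be printed from /proc/pid/maps. The relationship between the VMAs shown in maps and the loadable segments is also clear to me: each segment consists of one or more VMAs. What method does the kernel use to form VMAs out of ELF segments: does it only consider the permissions/flags, or is something else required as well? As I understand it, a segment with the flags Read, Execute (code) goes into a separate VMA with the same permissions, and the next segment, with the permissions Read, Write (data), should go into another VMA. But this is not the case for the second loadable segment, which is usually split into two or more VMAs: some with read and write, others with read only. So my assumption that the flags are the only culprit for VMA generation seems to be wrong. I need help understanding this relationship between segments and VMAs.

What I want to do is programmatically determine the VMAs for the loadable segments of an ELF file without loading it into memory. So any pointer/help in this direction is the main objective of this post.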

Best answer

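A VMA is a homogeneous region of virtual memory with:

  • the same permissions (PROT_EXEC, etc.);
  • the same type (MAP_SHARED/MAP_PRIVATE);
  • the same backing file (if any);
  • a consistent offset within the file.

For example, if you have a RW VMA and you mprotect(PROT_READ) a region in the middle of it (i.e. you remove the write permission), the kernel splits the VMA into three VMAs (the first one RW, the second one R and the last one RW).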
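The splitting rule can be illustrated with a minimal sketch. The `Range` type, the flag constants and `split_for_mprotect` are hypothetical names introduced here, not kernel APIs, and the model is deliberately simplified: it assumes page-aligned bounds that lie inside a single VMA, and it ignores the merging of adjacent VMAs with identical attributes that a real kernel also performs.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified VMA model: a half-open range [start, end)
// plus its protection bits.
struct Range {
  std::uint64_t start, end;
  int prot;
};

// Illustrative flag values only (not the ones from <sys/mman.h>).
constexpr int kRead = 1, kWrite = 2;

// Change the protection of [start, end) inside `vma` and return the
// resulting pieces in address order (one, two or three ranges).
std::vector<Range> split_for_mprotect(Range vma, std::uint64_t start,
                                      std::uint64_t end, int prot) {
  assert(vma.start <= start && start < end && end <= vma.end);
  std::vector<Range> out;
  if (prot == vma.prot) {  // nothing changes, so nothing is split
    out.push_back(vma);
    return out;
  }
  if (vma.start < start) out.push_back({vma.start, start, vma.prot});
  out.push_back({start, end, prot});
  if (end < vma.end) out.push_back({end, vma.end, vma.prot});
  return out;
}
```

Calling it on a RW range 0x1000-0x5000 with the read-only window 0x2000-0x3000 yields the three pieces described above: RW, R, RW.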

    Let us look at the typical VMAs from an executable:
    $ cat /proc/$$/maps
    00400000-004f2000 r-xp 00000000 08:01 524453     /bin/bash
    006f1000-006f2000 r--p 000f1000 08:01 524453     /bin/bash
    006f2000-006fb000 rw-p 000f2000 08:01 524453     /bin/bash
    006fb000-00702000 rw-p 00000000 00:00 0
    [...]
    

    The first VMA is the text segment. The second, third and fourth VMAs are the data segment.

    Anonymous mapping for .bss

    At the beginning of the process, you will have something like this:

    $ cat /proc/$$/maps
    00400000-004f2000 r-xp 00000000 08:01 524453     /bin/bash
    006f1000-006fb000 rw-p 000f1000 08:01 524453     /bin/bash
    006fb000-00702000 rw-p 00000000 00:00 0
    [...]
    
    • 006f1000-006fb000 is the part of the data segment which comes from the executable file.

    • 006fb000-00702000 is not present in the executable file because it is initially filled with zeroes. The uninitialized variables of the process are all grouped together (in the .bss section) and are not represented in the executable file in order to save space (1).

    These come from the PT_LOAD entries of the program header table of the executable file (readelf -l), which describe the segments to map into memory:

    Type    Offset             VirtAddr           PhysAddr
            FileSiz            MemSiz              Flags  Align
    [...]
    LOAD    0x0000000000000000 0x0000000000400000 0x0000000000400000
            0x00000000000f1a74 0x00000000000f1a74  R E    200000
    LOAD    0x00000000000f1de0 0x00000000006f1de0 0x00000000006f1de0
            0x0000000000009068 0x000000000000f298  RW     200000
    [...]
    

    If you look at the corresponding PT_LOAD entry, you will notice that a part of the segment is not represented in the file (because the file size is smaller than the memory size).

    The part of the data segment which is not in the executable file is initialized with zeros: the loader (the kernel for the main executable, the dynamic linker for shared libraries) uses a MAP_ANONYMOUS mapping for this part of the data segment. This is why it appears as a separate VMA (it does not have the same backing file).
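As a rough sketch of this step, the page-rounded ranges of the file-backed and anonymous parts can be computed directly from the Offset, VirtAddr, FileSiz and MemSiz fields. `LoadLayout` and `layout_for_load` are hypothetical names, and the 4 KiB page size is an assumption that matches the x86_64 example here.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kPageSize = 0x1000;  // assumption: 4 KiB pages

constexpr std::uint64_t page_floor(std::uint64_t a) {
  return a & ~(kPageSize - 1);
}
constexpr std::uint64_t page_ceil(std::uint64_t a) {
  return page_floor(a + kPageSize - 1);
}

// Hypothetical result type: where one PT_LOAD entry ends up in memory.
struct LoadLayout {
  std::uint64_t file_start, file_end;  // file-backed part, [start, end)
  std::uint64_t file_offset;           // page-aligned offset, as shown in maps
  std::uint64_t anon_start, anon_end;  // zero-filled part; empty when equal
};

LoadLayout layout_for_load(std::uint64_t offset, std::uint64_t vaddr,
                           std::uint64_t filesz, std::uint64_t memsz) {
  assert(memsz >= filesz);  // the ELF format requires this for PT_LOAD
  LoadLayout l{};
  l.file_start = page_floor(vaddr);
  l.file_end = page_ceil(vaddr + filesz);
  l.file_offset = page_floor(offset);
  // Whatever MemSiz needs beyond the last file-backed page is anonymous.
  l.anon_start = l.file_end;
  l.anon_end = page_ceil(vaddr + memsz);
  return l;
}
```

For the second LOAD entry above (Offset 0xf1de0, VirtAddr 0x6f1de0, FileSiz 0x9068, MemSiz 0xf298) this gives a file-backed range 0x6f1000-0x6fb000 at offset 0xf1000 and an anonymous range 0x6fb000-0x702000, which agrees with the maps output at process start.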

    Relocation protection (PT_GNU_RELRO)

    When the dynamic linker has finished doing the relocations (2), it might mark some part of the data segment (the .got section among others) as read-only in order to avoid GOT-poisoning attacks or bugs. The part of the data segment which should be protected after the relocations is described by the PT_GNU_RELRO entry of the program header table: the dynamic linker calls mprotect(addr, len, PROT_READ) on the given region after finishing the relocations (3). This mprotect call splits the second VMA into two VMAs (the first one R and the second one RW).

    Type        Offset             VirtAddr           PhysAddr
                FileSiz            MemSiz             Flags  Align
    [...]
    GNU_RELRO   0x00000000000f1de0 0x00000000006f1de0 0x00000000006f1de0
                0x0000000000000220 0x0000000000000220  R
    [...]
    

    Summary

    The VMAs

    00400000-004f2000 r-xp 00000000 08:01 524453     /bin/bash
    006f1000-006f2000 r--p 000f1000 08:01 524453     /bin/bash
    006f2000-006fb000 rw-p 000f2000 08:01 524453     /bin/bash
    006fb000-00702000 rw-p 00000000 00:00 0
    

    are derived from the VirtAddr, MemSiz and Flags fields of the PT_LOAD and PT_GNU_RELRO entries:

    Type       Offset             VirtAddr           PhysAddr
               FileSiz            MemSiz              Flags  Align
    [...]
    LOAD       0x0000000000000000 0x0000000000400000 0x0000000000400000
               0x00000000000f1a74 0x00000000000f1a74  R E    200000
    LOAD       0x00000000000f1de0 0x00000000006f1de0 0x00000000006f1de0
               0x0000000000009068 0x000000000000f298  RW     200000
    [...]
    GNU_RELRO 0x00000000000f1de0 0x00000000006f1de0 0x00000000006f1de0
              0x0000000000000220 0x0000000000000220  R
    [...]
    
    1. First, all PT_LOAD entries are processed. Each of them triggers the creation of one VMA by using mmap(). In addition, if MemSiz > FileSiz, it might create an additional anonymous VMA.

    2. Then all PT_GNU_RELRO entries (well, there is only one in practice) are processed. Each of them triggers a mprotect() call which might split an existing VMA into several VMAs.

    In order to do what you want, the correct way is probably to simulate the mmap and mprotect calls:

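    #include <cstdint>
    #include <list>
    #include <string>
    #include <elf.h>        // Elf64_Phdr, PT_LOAD, PT_GNU_RELRO
    #include <sys/types.h>  // off_t

    // Virtual Memory Area:
    struct Vma {
      std::uint64_t addr, length;
      std::string file_name;
      int prot;
      int flags;
      std::uint64_t offset;
    };
    
    // Virtual Address Space: mmap() and mprotect() do not touch real
    // memory here, they only update the list of simulated VMAs.
    class Vas {
    private:
      std::list<Vma> vmas_;
    public:
      Vma& mmap(
        std::uint64_t addr, std::uint64_t length, int prot,
        int flags, int fd, off_t offset);
      int mprotect(std::uint64_t addr, std::uint64_t len, int prot);
      std::list<Vma> const& vmas() const { return vmas_; }
    };
    
    // phdrs holds the program headers read from the ELF file
    // (Elf64_Phdr for a 64-bit file such as the one shown here).
    // Pass 1: one file-backed mapping per PT_LOAD, plus an anonymous
    // mapping for the pages needed by MemSiz beyond the file-backed
    // ones (anon_size is the size of that part, if any).
    for (Elf64_Phdr const& h : phdrs)
      if (h.p_type == PT_LOAD) {
        vas.mmap(...);
        if (anon_size)
          vas.mmap(...);
      }
    // Pass 2: the RELRO range becomes read-only, which may split a VMA.
    for (Elf64_Phdr const& h : phdrs)
      if (h.p_type == PT_GNU_RELRO)
        vas.mprotect(...);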
    

    Some computation examples

    The addresses differ slightly because the VMAs are page-aligned (3) (with 4 KiB = 0x1000 pages on x86 and x86_64):

    The first VMA is described by the first PT_LOAD entry:
    vma[0].start = page_floor(load[0].virt_addr)
                 = 0x400000
    
    vma[0].end = page_ceil(load[0].virt_addr + load[0].mem_size)
               = page_ceil(0x400000 + 0xf1a74)
               = page_ceil(0x4f1a74)
               = 0x4f2000
    

    The next VMA is the protected part of the data segment, described by PT_GNU_RELRO:
    vma[1].start = page_floor(relro[0].virt_addr)
                 = page_floor(0x6f1de0)
                 = 0x6f1000
    
    vma[1].end = page_ceil(relro[0].virt_addr + relro[0].mem_size)
               = page_ceil(0x6f1de0 + 0x220)
               = page_ceil(0x6f2000)
               = 0x6f2000
    

    [...]
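To check these calculations end to end, here is a minimal sketch that predicts all the VMAs from the program header values alone. `Phdr`, `Vma` and `predict_vmas` are hypothetical, trimmed-down names (a real tool would read `Elf64_Phdr` entries from the file), the page size is assumed to be 4 KiB, and only one RELRO entry is handled. The end of the RELRO range is rounded with page_ceil to follow the worked example; this is an assumption, and a real dynamic linker may round that end differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

constexpr std::uint64_t kPage = 0x1000;  // assumption: 4 KiB pages
std::uint64_t page_floor(std::uint64_t a) { return a & ~(kPage - 1); }
std::uint64_t page_ceil(std::uint64_t a) { return page_floor(a + kPage - 1); }

// Hypothetical, trimmed-down program header: only the fields used here.
struct Phdr {
  std::uint64_t vaddr, filesz, memsz;
  bool writable;
};

// Simplified VMA: [start, end), its backing and whether it is writable.
struct Vma {
  std::uint64_t start, end;
  bool file_backed, writable;
};

std::vector<Vma> predict_vmas(const std::vector<Phdr>& loads,
                              const Phdr* relro) {
  std::vector<Vma> vmas;
  // Step 1: each PT_LOAD gives a file-backed VMA, plus an anonymous one
  // when MemSiz reaches past the last file-backed page.
  for (const Phdr& l : loads) {
    assert(l.memsz >= l.filesz);
    std::uint64_t file_end = page_ceil(l.vaddr + l.filesz);
    std::uint64_t mem_end = page_ceil(l.vaddr + l.memsz);
    vmas.push_back({page_floor(l.vaddr), file_end, true, l.writable});
    if (mem_end > file_end)
      vmas.push_back({file_end, mem_end, false, l.writable});
  }
  if (!relro) return vmas;
  // Step 2: the RELRO range becomes read-only, splitting what it overlaps.
  std::uint64_t rs = page_floor(relro->vaddr);
  std::uint64_t re = page_ceil(relro->vaddr + relro->memsz);
  std::vector<Vma> out;
  for (const Vma& v : vmas) {
    std::uint64_t s = std::max(v.start, rs), e = std::min(v.end, re);
    if (s >= e || !v.writable) {  // no overlap, or already read-only
      out.push_back(v);
      continue;
    }
    if (v.start < s) out.push_back({v.start, s, v.file_backed, true});
    out.push_back({s, e, v.file_backed, false});
    if (e < v.end) out.push_back({e, v.end, v.file_backed, true});
  }
  return out;
}
```

Fed with the two LOAD entries and the GNU_RELRO entry shown earlier, this produces four ranges, 00400000-004f2000, 006f1000-006f2000, 006f2000-006fb000 and 006fb000-00702000, the last one anonymous, which matches the maps output in the summary.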

    Correspondence with the sections

    Section Headers:
      [Nr] Name              Type             Address           Offset
           Size              EntSize          Flags  Link  Info  Align
      [ 0]                   NULL             0000000000000000  00000000
           0000000000000000  0000000000000000           0     0     0
      [ 1] .interp           PROGBITS         0000000000400238  00000238
           000000000000001c  0000000000000000   A       0     0     1
      [ 2] .note.ABI-tag     NOTE             0000000000400254  00000254
           0000000000000020  0000000000000000   A       0     0     4
      [ 3] .note.gnu.build-i NOTE             0000000000400274  00000274
           0000000000000024  0000000000000000   A       0     0     4
      [ 4] .gnu.hash         GNU_HASH         0000000000400298  00000298
           0000000000004894  0000000000000000   A       5     0     8
      [ 5] .dynsym           DYNSYM           0000000000404b30  00004b30
           000000000000d6c8  0000000000000018   A       6     1     8
      [ 6] .dynstr           STRTAB           00000000004121f8  000121f8
           0000000000008c25  0000000000000000   A       0     0     1
      [ 7] .gnu.version      VERSYM           000000000041ae1e  0001ae1e
           00000000000011e6  0000000000000002   A       5     0     2
      [ 8] .gnu.version_r    VERNEED          000000000041c008  0001c008
           00000000000000b0  0000000000000000   A       6     2     8
      [ 9] .rela.dyn         RELA             000000000041c0b8  0001c0b8
           00000000000000c0  0000000000000018   A       5     0     8
      [10] .rela.plt         RELA             000000000041c178  0001c178
           00000000000013f8  0000000000000018  AI       5    12     8
      [11] .init             PROGBITS         000000000041d570  0001d570
           000000000000001a  0000000000000000  AX       0     0     4
      [12] .plt              PROGBITS         000000000041d590  0001d590
           0000000000000d60  0000000000000010  AX       0     0     16
      [13] .text             PROGBITS         000000000041e2f0  0001e2f0
           0000000000099c42  0000000000000000  AX       0     0     16
      [14] .fini             PROGBITS         00000000004b7f34  000b7f34
           0000000000000009  0000000000000000  AX       0     0     4
      [15] .rodata           PROGBITS         00000000004b7f40  000b7f40
           000000000001ebb0  0000000000000000   A       0     0     64
      [16] .eh_frame_hdr     PROGBITS         00000000004d6af0  000d6af0
           000000000000407c  0000000000000000   A       0     0     4
      [17] .eh_frame         PROGBITS         00000000004dab70  000dab70
           0000000000016f04  0000000000000000   A       0     0     8
      [18] .init_array       INIT_ARRAY       00000000006f1de0  000f1de0
           0000000000000008  0000000000000000  WA       0     0     8
      [19] .fini_array       FINI_ARRAY       00000000006f1de8  000f1de8
           0000000000000008  0000000000000000  WA       0     0     8
      [20] .jcr              PROGBITS         00000000006f1df0  000f1df0
           0000000000000008  0000000000000000  WA       0     0     8
      [21] .dynamic          DYNAMIC          00000000006f1df8  000f1df8
           0000000000000200  0000000000000010  WA       6     0     8
      [22] .got              PROGBITS         00000000006f1ff8  000f1ff8
           0000000000000008  0000000000000008  WA       0     0     8
      [23] .got.plt          PROGBITS         00000000006f2000  000f2000
           00000000000006c0  0000000000000008  WA       0     0     8
      [24] .data             PROGBITS         00000000006f26c0  000f26c0
           0000000000008788  0000000000000000  WA       0     0     64
      [25] .bss              NOBITS           00000000006fae80  000fae48
           00000000000061f8  0000000000000000  WA       0     0     64
      [26] .shstrtab         STRTAB           0000000000000000  000fae48
           00000000000000ef  0000000000000000           0     0     1

    If you compare the addresses of the sections (readelf -S) with the ranges of the VMAs, you find the mapping:

    00400000-004f2000 r-xp /bin/bash : .interp, .note.ABI-tag, .note.gnu.build-id, .gnu.hash, .dynsym, .dynstr, .gnu.version, .gnu.version_r, .rela.dyn, .rela.plt, .init, .plt, .text, .fini, .rodata, .eh_frame_hdr, .eh_frame
    006f1000-006f2000 r--p /bin/bash : .init_array, .fini_array, .jcr, .dynamic, .got
    006f2000-006fb000 rw-p /bin/bash : .got.plt, .data, beginning of .bss
    006fb000-00702000 rw-p - : the rest of .bss

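    Notes

    (1): In fact, it is more complicated: a part of the .bss section might be represented in the executable file for page-alignment reasons.

    (2): In fact, when it has finished doing the non-lazy relocations.

    (3): MMU operations use page granularity, so the memory ranges of mmap(), mprotect() and munmap() calls are extended to cover whole pages.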

    Regarding "c - Relationship between VMAs and ELF segments", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/33756119/
