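I need to determine the VMAs of the loadable segments of an ELF executable. The VMAs can be printed from /proc/pid/maps. The relationship between the VMAs shown by maps and the loadable segments is also clear to me: each segment consists of one or more VMAs. What method does the kernel use to form VMAs from ELF segments: does it consider only the permissions/flags, or is something else needed as well? As I understand it, a segment with the flags Read, Execute (code) goes into a separate VMA with the same permissions, while the next segment, with the permissions Read, Write (data), should go into another VMA. But this is not what happens with the second loadable segment: it is usually split into two or more VMAs, some read and write and others read only. So my assumption that the flags are the only culprit for VMA generation seems to be wrong. I need help understanding this relationship between segments and VMAs.
What I want to do is programmatically determine the VMAs of the loadable segments of an ELF file without loading it into memory. So any pointers/help in this direction are the main goal of this post.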
Best answer
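A VMA is a homogeneous region of virtual memory with:
the same permissions (PROT_EXEC, etc.);
the same type (MAP_SHARED/MAP_PRIVATE);
the same backing file (if any);
a consistent offset within that file.
For example, if you have a VMA which is RW and you mprotect a part in the middle of it to PROT_READ (you remove the write permission), the kernel will split the VMA into three VMAs (the first being RW, the second R and the last RW).
Let's look at the typical VMAs of an executable: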
$ cat /proc/$$/maps
00400000-004f2000 r-xp 00000000 08:01 524453 /bin/bash
006f1000-006f2000 r--p 000f1000 08:01 524453 /bin/bash
006f2000-006fb000 rw-p 000f2000 08:01 524453 /bin/bash
006fb000-00702000 rw-p 00000000 00:00 0
[...]
The first VMA is the text segment. The second, third and fourth VMAs are the data segment.
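To compare such a listing with values computed from the ELF file, it helps to turn each line into a record. The following is only a sketch under the assumption that lines follow the usual "start-end perms offset dev inode path" layout; MapsEntry and parse_maps_line are illustrative names, not part of any library.

```cpp
#include <cstddef>
#include <cstdint>
#include <exception>
#include <sstream>
#include <string>

// One parsed line of /proc/<pid>/maps (hypothetical struct for illustration).
struct MapsEntry {
    std::uint64_t start, end;  // [start, end) virtual address range
    std::string perms;         // e.g. "r-xp"
    std::uint64_t offset;      // offset into the backing file
    std::string path;          // empty for anonymous mappings
};

// Parses one maps line into `out`; returns false if the line does not match.
bool parse_maps_line(const std::string& line, MapsEntry& out) {
    std::istringstream in(line);
    std::string range, perms, dev, path;
    std::uint64_t offset = 0, inode = 0;
    if (!(in >> range >> perms >> std::hex >> offset >> dev >> std::dec >> inode))
        return false;
    const std::size_t dash = range.find('-');
    if (dash == std::string::npos) return false;
    // The path is optional; if nothing follows the inode, `path` stays empty.
    std::getline(in >> std::ws, path);
    try {
        const std::uint64_t start = std::stoull(range.substr(0, dash), nullptr, 16);
        const std::uint64_t end = std::stoull(range.substr(dash + 1), nullptr, 16);
        out = MapsEntry{start, end, perms, offset, path};
    } catch (const std::exception&) {
        return false;  // the address range was not valid hexadecimal
    }
    return true;
}
```

Calling parse_maps_line on the second line of the listing above should yield start 0x6f1000, end 0x6f2000, perms "r--p" and offset 0xf1000.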
Anonymous mapping for .bss
At the beginning of the process, you will have something like this:
$ cat /proc/$$/maps
00400000-004f2000 r-xp 00000000 08:01 524453 /bin/bash
006f1000-006fb000 rw-p 000f1000 08:01 524453 /bin/bash
006fb000-00702000 rw-p 00000000 00:00 0
[...]
006f1000-006fb000 is the part of the data segment which comes from the executable file. 006fb000-00702000 is not present in the executable file because it is initially filled with zeros. The uninitialized variables of the process are all grouped together (in the .bss section) and are not represented in the executable file in order to save space (1).
These come from the PT_LOAD entries of the program header table of the executable file (readelf -l), which describe the segments to map into memory:
Type      Offset             VirtAddr           PhysAddr           FileSiz            MemSiz             Flags Align
[...]
LOAD      0x0000000000000000 0x0000000000400000 0x0000000000400000 0x00000000000f1a74 0x00000000000f1a74 R E   200000
LOAD      0x00000000000f1de0 0x00000000006f1de0 0x00000000006f1de0 0x0000000000009068 0x000000000000f298 RW    200000
[...]
If you look at the corresponding PT_LOAD entry (the second one), you will notice that a part of the segment is not represented in the file (because the file size, 0x9068, is smaller than the memory size, 0xf298).
The part of the data segment which is not in the executable file is initialized with zeros: the loader (the kernel for the main executable, the dynamic linker for shared libraries) uses an anonymous (MAP_ANONYMOUS) mapping for this part of the data segment. This is why it appears as a separate VMA (it does not have the same backing file).
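The split between the file-backed part and the anonymous part can be computed from VirtAddr, FileSiz and MemSiz alone. A minimal sketch, assuming 4 KiB pages; Range, LoadLayout and layout_of are hypothetical names introduced for illustration:

```cpp
#include <cstdint>

constexpr std::uint64_t kPageSize = 0x1000;  // assumption: 4 KiB pages
constexpr std::uint64_t page_floor(std::uint64_t a) { return a & ~(kPageSize - 1); }
constexpr std::uint64_t page_ceil(std::uint64_t a) { return page_floor(a + kPageSize - 1); }

struct Range { std::uint64_t start, end; };  // [start, end)

struct LoadLayout {
    Range file_backed;  // pages mapped from the executable file
    Range anonymous;    // zero-filled pages; empty when start == end
};

// Splits one PT_LOAD entry into its file-backed and anonymous page ranges.
LoadLayout layout_of(std::uint64_t vaddr, std::uint64_t filesz, std::uint64_t memsz) {
    const std::uint64_t file_end = page_ceil(vaddr + filesz);
    const std::uint64_t mem_end = page_ceil(vaddr + memsz);
    return LoadLayout{{page_floor(vaddr), file_end},
                      {file_end, mem_end > file_end ? mem_end : file_end}};
}
```

For the second PT_LOAD entry above, layout_of(0x6f1de0, 0x9068, 0xf298) gives 0x6f1000-0x6fb000 for the file-backed part and 0x6fb000-0x702000 for the anonymous part, which matches the two rw-p lines of the initial listing.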
Relocation protection (PT_GNU_RELRO)
When the dynamic linker has finished doing the relocations (2), it might mark some part of the data segment (the .got section among others) as read-only in order to avoid GOT-poisoning attacks or bugs. The part of the data segment which should be protected after the relocations is described by the PT_GNU_RELRO entry of the program header table: the dynamic linker calls mprotect(addr, len, PROT_READ) on the given region after finishing the relocations (3). This mprotect call splits the second VMA into two VMAs (the first one R and the second one RW).
Type      Offset             VirtAddr           PhysAddr           FileSiz            MemSiz             Flags Align
[...]
GNU_RELRO 0x00000000000f1de0 0x00000000006f1de0 0x00000000006f1de0 0x0000000000000220 0x0000000000000220 R
[...]
Summary
The VMAs
00400000-004f2000 r-xp 00000000 08:01 524453 /bin/bash
006f1000-006f2000 r--p 000f1000 08:01 524453 /bin/bash
006f2000-006fb000 rw-p 000f2000 08:01 524453 /bin/bash
006fb000-00702000 rw-p 00000000 00:00 0
are derived from the VirtAddr, MemSiz and Flags fields of the PT_LOAD and PT_GNU_RELRO entries:
Type      Offset             VirtAddr           PhysAddr           FileSiz            MemSiz             Flags Align
[...]
LOAD      0x0000000000000000 0x0000000000400000 0x0000000000400000 0x00000000000f1a74 0x00000000000f1a74 R E   200000
LOAD      0x00000000000f1de0 0x00000000006f1de0 0x00000000006f1de0 0x0000000000009068 0x000000000000f298 RW    200000
[...]
GNU_RELRO 0x00000000000f1de0 0x00000000006f1de0 0x00000000006f1de0 0x0000000000000220 0x0000000000000220 R
[...]
First, all PT_LOAD entries are processed. Each of them triggers the creation of one VMA by means of an mmap() call. In addition, if MemSiz > FileSiz, it might create an additional anonymous VMA.
Then all PT_GNU_RELRO entries (well, there is only one in practice) are processed. Each of them triggers an mprotect() call which might split an existing VMA into several VMAs.
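The PT_LOAD and PT_GNU_RELRO entries can be read straight from the file, without loading it. Below is a rough sketch using the types from <elf.h>. It assumes a 64-bit ELF file whose byte order matches the host and that the program runs on Linux; read_program_headers is a hypothetical helper name.

```cpp
#include <elf.h>

#include <cstring>
#include <fstream>
#include <ios>
#include <string>
#include <vector>

// Returns the program headers of a 64-bit ELF file, or an empty vector on error.
std::vector<Elf64_Phdr> read_program_headers(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    Elf64_Ehdr eh{};
    if (!in.read(reinterpret_cast<char*>(&eh), sizeof eh)) return {};
    if (std::memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0) return {};  // not an ELF file
    if (eh.e_ident[EI_CLASS] != ELFCLASS64) return {};             // 32-bit not handled
    if (eh.e_phentsize != sizeof(Elf64_Phdr)) return {};

    // The program header table is an array of e_phnum entries at offset e_phoff.
    std::vector<Elf64_Phdr> phdrs(eh.e_phnum);
    in.seekg(static_cast<std::streamoff>(eh.e_phoff));
    const std::streamsize bytes =
        static_cast<std::streamsize>(phdrs.size() * sizeof(Elf64_Phdr));
    if (!in.read(reinterpret_cast<char*>(phdrs.data()), bytes)) return {};
    return phdrs;
}
```

Filtering the result on p_type == PT_LOAD or PT_GNU_RELRO then gives the p_offset, p_vaddr, p_filesz, p_memsz and p_flags values that readelf -l prints.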
In order to do what you want, the correct way is probably to simulate the mmap and mprotect calls:
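#include <elf.h>        // Elf32_Phdr, PT_LOAD, PT_GNU_RELRO
#include <sys/types.h>  // off_t
#include <cstdint>
#include <list>
#include <string>

// Virtual Memory Area:
struct Vma {
  std::uint64_t addr, length;
  std::string file_name;
  int prot;
  int flags;
  std::uint64_t offset;
};

// Virtual Address Space:
class Vas {
private:
  std::list<Vma> vmas_;
public:
  // Records a new VMA, as mmap() would create it.
  Vma& mmap(
    std::uint64_t addr, std::uint64_t length, int prot,
    int flags, int fd, off_t offset);
  // Changes the protection of a range, splitting VMAs where needed.
  int mprotect(std::uint64_t addr, std::uint64_t len, int prot);
  std::list<Vma> const& vmas() const { return vmas_; }
};

// Pseudocode: phdrs holds the program headers (Elf64_Phdr for a 64-bit file);
// the call arguments are left elided.
for (Elf32_Phdr const& h : phdrs)
  if (h.p_type == PT_LOAD) {
    vas.mmap(...);    // file-backed part
    if (anon_size)    // MemSiz exceeds FileSiz
      vas.mmap(...);  // anonymous, zero-filled part
  }

for (Elf32_Phdr const& h : phdrs)
  if (h.p_type == PT_GNU_RELRO)
    vas.mprotect(...);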
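The outline above leaves the bodies and arguments open. One possible way to fill them in is sketched below; it is not the kernel's or the dynamic linker's actual logic. It assumes 4 KiB pages, takes the segments in ascending address order, rounds the RELRO range the same way as the PT_LOAD ranges, and never merges adjacent VMAs. Segment, simulate and the k* constants are hypothetical names.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

constexpr std::uint64_t kPageSize = 0x1000;  // assumption: 4 KiB pages
constexpr std::uint64_t page_floor(std::uint64_t a) { return a & ~(kPageSize - 1); }
constexpr std::uint64_t page_ceil(std::uint64_t a) { return page_floor(a + kPageSize - 1); }

constexpr int kExec = 1, kWrite = 2, kRead = 4;  // same bit values as PF_X, PF_W, PF_R

struct Vma {
    std::uint64_t start, end;  // [start, end), page-aligned
    int prot;
    std::string file;          // empty for anonymous memory
    std::uint64_t offset;      // offset into the file, 0 if anonymous
};

// The program header fields the simulation needs (hypothetical struct).
struct Segment {
    std::uint64_t offset, vaddr, filesz, memsz;
    int flags;
};

class Vas {
    std::vector<Vma> vmas_;  // in mapping order; assumed sorted and non-overlapping
public:
    void map(Vma v) { vmas_.push_back(std::move(v)); }

    // Sets the protection of [start, end), splitting any VMA that straddles
    // a boundary. Unlike the kernel, this never merges compatible neighbours.
    void protect(std::uint64_t start, std::uint64_t end, int prot) {
        std::vector<Vma> out;
        for (const Vma& v : vmas_) {
            const std::uint64_t lo = std::max(v.start, start);
            const std::uint64_t hi = std::min(v.end, end);
            if (lo >= hi) { out.push_back(v); continue; }  // not affected
            auto off = [&v](std::uint64_t a) -> std::uint64_t {
                return v.file.empty() ? 0 : v.offset + (a - v.start);
            };
            if (v.start < lo) out.push_back(Vma{v.start, lo, v.prot, v.file, v.offset});
            out.push_back(Vma{lo, hi, prot, v.file, off(lo)});
            if (hi < v.end) out.push_back(Vma{hi, v.end, v.prot, v.file, off(hi)});
        }
        vmas_ = std::move(out);
    }

    const std::vector<Vma>& vmas() const { return vmas_; }
};

// Replays the PT_LOAD mappings, then the PT_GNU_RELRO protections.
Vas simulate(const std::string& file, const std::vector<Segment>& loads,
             const std::vector<Segment>& relros) {
    Vas vas;
    for (const Segment& s : loads) {
        const std::uint64_t start = page_floor(s.vaddr);
        const std::uint64_t file_end = page_ceil(s.vaddr + s.filesz);
        const std::uint64_t mem_end = page_ceil(s.vaddr + s.memsz);
        if (file_end > start) vas.map(Vma{start, file_end, s.flags, file, page_floor(s.offset)});
        if (mem_end > file_end) vas.map(Vma{file_end, mem_end, s.flags, "", 0});
    }
    for (const Segment& r : relros)
        vas.protect(page_floor(r.vaddr), page_ceil(r.vaddr + r.memsz), kRead);
    return vas;
}
```

Fed with the two LOAD entries and the GNU_RELRO entry of /bin/bash shown above, simulate should produce four VMAs whose ranges and file offsets match the /proc/$$/maps listing in the summary.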
Some example computations
The addresses differ slightly because VMAs are page-aligned (3) (x86 and x86_64 use 4 KiB = 0x1000 pages):
The first VMA is described by the first PT_LOAD entry:
vma[0].start = page_floor(load[0].virt_addr)
             = 0x400000
vma[0].end   = page_ceil(load[0].virt_addr + load[0].mem_size)
             = page_ceil(0x400000 + 0xf1a74)
             = page_ceil(0x4f1a74)
             = 0x4f2000
The next VMA is the protected part of the data segment, described by PT_GNU_RELRO:
vma[1].start = page_floor(relro[0].virt_addr)
             = page_floor(0x6f1de0)
             = 0x6f1000
vma[1].end   = page_ceil(relro[0].virt_addr + relro[0].mem_size)
             = page_ceil(0x6f1de0 + 0x220)
             = page_ceil(0x6f2000)
             = 0x6f2000
[...]
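The page_floor and page_ceil helpers used in these computations are not defined in the answer. One plausible definition, assuming a page size of 0x1000 (a power of two), is given below, with the computations above checked at compile time:

```cpp
#include <cstdint>

constexpr std::uint64_t kPageSize = 0x1000;  // assumption: 4 KiB pages

// Rounds an address down to the start of its page.
constexpr std::uint64_t page_floor(std::uint64_t a) { return a & ~(kPageSize - 1); }

// Rounds an address up to the next page boundary (unchanged if already aligned).
constexpr std::uint64_t page_ceil(std::uint64_t a) { return page_floor(a + kPageSize - 1); }

// First VMA, from the first PT_LOAD entry.
static_assert(page_floor(0x400000) == 0x400000, "vma[0].start");
static_assert(page_ceil(0x400000 + 0xf1a74) == 0x4f2000, "vma[0].end");

// Second VMA, from the PT_GNU_RELRO entry.
static_assert(page_floor(0x6f1de0) == 0x6f1000, "vma[1].start");
static_assert(page_ceil(0x6f1de0 + 0x220) == 0x6f2000, "vma[1].end");
```

The bit-mask form only works because the page size is a power of two; a different page size would need the constant changed accordingly.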
Correspondence with sections
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[ 0] NULL 0000000000000000 00000000
0000000000000000 0000000000000000 0 0 0
[ 1] .interp PROGBITS 0000000000400238 00000238
000000000000001c 0000000000000000 A 0 0 1
[ 2] .note.ABI-tag NOTE 0000000000400254 00000254
0000000000000020 0000000000000000 A 0 0 4
[ 3] .note.gnu.build-i NOTE 0000000000400274 00000274
0000000000000024 0000000000000000 A 0 0 4
[ 4] .gnu.hash GNU_HASH 0000000000400298 00000298
0000000000004894 0000000000000000 A 5 0 8
[ 5] .dynsym DYNSYM 0000000000404b30 00004b30
000000000000d6c8 0000000000000018 A 6 1 8
[ 6] .dynstr STRTAB 00000000004121f8 000121f8
0000000000008c25 0000000000000000 A 0 0 1
[ 7] .gnu.version VERSYM 000000000041ae1e 0001ae1e
00000000000011e6 0000000000000002 A 5 0 2
[ 8] .gnu.version_r VERNEED 000000000041c008 0001c008
00000000000000b0 0000000000000000 A 6 2 8
[ 9] .rela.dyn RELA 000000000041c0b8 0001c0b8
00000000000000c0 0000000000000018 A 5 0 8
[10] .rela.plt RELA 000000000041c178 0001c178
00000000000013f8 0000000000000018 AI 5 12 8
[11] .init PROGBITS 000000000041d570 0001d570
000000000000001a 0000000000000000 AX 0 0 4
[12] .plt PROGBITS 000000000041d590 0001d590
0000000000000d60 0000000000000010 AX 0 0 16
[13] .text PROGBITS 000000000041e2f0 0001e2f0
0000000000099c42 0000000000000000 AX 0 0 16
[14] .fini PROGBITS 00000000004b7f34 000b7f34
0000000000000009 0000000000000000 AX 0 0 4
[15] .rodata PROGBITS 00000000004b7f40 000b7f40
000000000001ebb0 0000000000000000 A 0 0 64
[16] .eh_frame_hdr PROGBITS 00000000004d6af0 000d6af0
000000000000407c 0000000000000000 A 0 0 4
[17] .eh_frame PROGBITS 00000000004dab70 000dab70
0000000000016f04 0000000000000000 A 0 0 8
[18] .init_array INIT_ARRAY 00000000006f1de0 000f1de0
0000000000000008 0000000000000000 WA 0 0 8
[19] .fini_array FINI_ARRAY 00000000006f1de8 000f1de8
0000000000000008 0000000000000000 WA 0 0 8
[20] .jcr PROGBITS 00000000006f1df0 000f1df0
0000000000000008 0000000000000000 WA 0 0 8
[21] .dynamic DYNAMIC 00000000006f1df8 000f1df8
0000000000000200 0000000000000010 WA 6 0 8
[22] .got PROGBITS 00000000006f1ff8 000f1ff8
0000000000000008 0000000000000008 WA 0 0 8
[23] .got.plt PROGBITS 00000000006f2000 000f2000
00000000000006c0 0000000000000008 WA 0 0 8
[24] .data PROGBITS 00000000006f26c0 000f26c0
0000000000008788 0000000000000000 WA 0 0 64
[25] .bss NOBITS 00000000006fae80 000fae48
00000000000061f8 0000000000000000 WA 0 0 64
[26] .shstrtab STRTAB 0000000000000000 000fae48
00000000000000ef 0000000000000000 0 0 1
If you compare the addresses of the sections (readelf -S) with the ranges of the VMAs, you find the following mapping:
00400000-004f2000 r-xp /bin/bash : .interp, .note.ABI-tag, .note.gnu.build-id, .gnu.hash, .dynsym, .dynstr, .gnu.version, .gnu.version_r, .rela.dyn, .rela.plt, .init, .plt, .text, .fini, .rodata, .eh_frame_hdr, .eh_frame
006f1000-006f2000 r--p /bin/bash : .init_array, .fini_array, .jcr, .dynamic, .got
006f2000-006fb000 rw-p /bin/bash : .got.plt, .data, beginning of .bss
006fb000-00702000 rw-p - : rest of .bss
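The comparison of section addresses with VMA ranges can be automated with a simple overlap test. This is a sketch only; the Vma and Section structs and the vmas_of function are hypothetical names, and the values would come from readelf -S and /proc/pid/maps (or from a simulation).

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct Vma {
    std::uint64_t start, end;  // [start, end)
    std::string perms;         // e.g. "rw-p"
};

struct Section {
    std::string name;
    std::uint64_t addr, size;  // Address and Size columns of readelf -S
};

// Returns the indices of all VMAs overlapping [addr, addr + size).
// A section can span several VMAs, as .bss does in this example.
std::vector<std::size_t> vmas_of(const Section& s, const std::vector<Vma>& vmas) {
    std::vector<std::size_t> hits;
    for (std::size_t i = 0; i < vmas.size(); ++i)
        if (s.addr < vmas[i].end && s.addr + s.size > vmas[i].start)
            hits.push_back(i);
    return hits;
}
```

With the four VMAs of /bin/bash, .got (0x6f1ff8, size 0x8) falls entirely in the r--p VMA, .got.plt (0x6f2000) starts exactly at the following rw-p VMA, and .bss (0x6fae80, size 0x61f8) overlaps both the last file-backed VMA and the anonymous one.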
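Notes
(1): In fact, it is more complicated: a part of the .bss section may be represented in the executable file for page-alignment reasons.
(2): In fact, when it has finished doing the non-lazy relocations.
(3): MMU operations work at page granularity, so the memory ranges of mmap(), mprotect() and munmap() calls are extended to cover whole pages.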
Regarding c - the relationship between VMAs and ELF segments, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/33756119/