《程序员的自我修养》 - ELF File Prerequisites


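Basic Tools and the Basic Structure of an ELF File

The basic tools are radare2, objdump, and readelf.

At the top level, an ELF file consists of a file header (the ELF header) and sections.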

Sections vs. Segments

Wikipedia distinguishes the two concepts as follows:

The segments contain information that is necessary for runtime execution of the file, while sections contain important data for linking and relocation. Any byte in the entire file can be owned by at most one section, and there can be orphan bytes which are not owned by any section.

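As the quote shows, the main difference is that sections hold the information needed for linking and are consumed by the linker, while segments hold the data needed at run time and are consumed by the operating system. Note that "linking" here can mean static linking, which is finished before the executable is produced, or dynamic linking, which happens at run time. Sections and segments therefore do not conflict; they are simply different concepts. To avoid ambiguity, this article always uses the English terms "section" and "segment".

Let's use readelf to look at the sections and segments of /bin/sh: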

# sections of /bin/sh
$ readelf -S /bin/sh
There are 28 section headers, starting at offset 0x1d378:
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[ 0] NULL 0000000000000000 00000000
0000000000000000 0000000000000000 0 0 0
[ 1] .interp PROGBITS 00000000000002a8 000002a8
000000000000001c 0000000000000000 A 0 0 1
...
[26] .gnu_debuglink PROGBITS 0000000000000000 0001d240
0000000000000034 0000000000000000 0 0 4
[27] .shstrtab STRTAB 0000000000000000 0001d274
0000000000000101 0000000000000000 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
L (link order), O (extra OS processing required), G (group), T (TLS),
C (compressed), x (unknown), o (OS specific), E (exclude),
l (large), p (processor specific)
# segments of /bin/sh
$ readelf -l /bin/sh
Elf file type is DYN (Shared object file)
Entry point 0x4760
There are 11 program headers, starting at offset 64
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
PHDR 0x0000000000000040 0x0000000000000040 0x0000000000000040
0x0000000000000268 0x0000000000000268 R 0x8
INTERP 0x00000000000002a8 0x00000000000002a8 0x00000000000002a8
0x000000000000001c 0x000000000000001c R 0x1
[Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
LOAD 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x00000000000033d0 0x00000000000033d0 R 0x1000
LOAD 0x0000000000004000 0x0000000000004000 0x0000000000004000
0x0000000000011e6d 0x0000000000011e6d R E 0x1000
LOAD 0x0000000000016000 0x0000000000016000 0x0000000000016000
0x0000000000005798 0x0000000000005798 R 0x1000
LOAD 0x000000000001bf30 0x000000000001cf30 0x000000000001cf30
0x0000000000001310 0x0000000000003f40 RW 0x1000
DYNAMIC 0x000000000001cb08 0x000000000001db08 0x000000000001db08
0x00000000000001f0 0x00000000000001f0 RW 0x8
NOTE 0x00000000000002c4 0x00000000000002c4 0x00000000000002c4
0x0000000000000044 0x0000000000000044 R 0x4
GNU_EH_FRAME 0x0000000000017e44 0x0000000000017e44 0x0000000000017e44
0x00000000000007fc 0x00000000000007fc R 0x4
GNU_STACK 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 RW 0x10
GNU_RELRO 0x000000000001bf30 0x000000000001cf30 0x000000000001cf30
0x00000000000010d0 0x00000000000010d0 R 0x1
Section to Segment mapping:
Segment Sections...
00
01 .interp
02 .interp .note.gnu.build-id .note.ABI-tag .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt
03 .init .plt .plt.got .text .fini
04 .rodata .eh_frame_hdr .eh_frame
05 .init_array .fini_array .data.rel.ro .dynamic .got .data .bss
06 .dynamic
07 .note.gnu.build-id .note.ABI-tag
08 .eh_frame_hdr
09
10 .init_array .fini_array .data.rel.ro .dynamic .got

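Note the Section Headers and Program Headers in the two outputs: the former are the entries that describe sections, the latter the entries that describe segments. The readelf -l output also shows that one segment can contain several sections. It is during linking that the linker places one or more sections into a segment.

The table header in the readelf -S output contains both Address and Offset. To understand the difference between the two, we first need to look at how the information in a section header is stored.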
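The "Section to Segment mapping" can be approximated with a simple containment test. The sketch below is a simplification, not readelf's actual algorithm: it assumes a section belongs to a segment when it has the A (alloc) flag and its address range lies inside the segment's virtual address range. The function name `segments_containing` is made up for illustration; the numbers are copied from the two readelf outputs above.

```python
SHF_ALLOC = 0x2

# (name, sh_addr, sh_size, sh_flags), copied from the readelf -S output above
sections = [
    (".interp", 0x2A8, 0x1C, SHF_ALLOC),
    (".shstrtab", 0x0, 0x101, 0),
]

# (index, type, p_vaddr, p_memsz) for the first three program headers
segments = [
    (0, "PHDR", 0x40, 0x268),
    (1, "INTERP", 0x2A8, 0x1C),
    (2, "LOAD", 0x0, 0x33D0),
]


def segments_containing(section, segments):
    """Return the indexes of the segments whose address range covers the section."""
    _name, addr, size, flags = section
    if not flags & SHF_ALLOC:
        # Sections without the A flag are never loaded, so they map to no segment.
        return []
    return [
        idx
        for idx, _type, vaddr, memsz in segments
        if vaddr <= addr and addr + size <= vaddr + memsz
    ]


for sec in sections:
    print(sec[0], segments_containing(sec, segments))
```

For .interp this yields segments 1 and 2, which agrees with lines 01 and 02 of the mapping printed by readelf -l. .shstrtab has Address 0 but no A flag, so it maps to no segment even though address 0 numerically falls inside the first LOAD segment; that is why the flag check comes first.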

Section Header Table

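The section headers form an array of one specific data structure, called the section header table, which is used to locate every section in the file. An index into this array is called a section header table index. The details of this array (where it starts, how many entries it has, and how large each entry is) are stored in the ELF header and can be viewed with readelf -h: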

$ readelf -h /bin/sh
ELF Header:
Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
Class: ELF64
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: DYN (Shared object file)
Machine: Advanced Micro Devices X86-64
Version: 0x1
Entry point address: 0x4760
Start of program headers: 64 (bytes into file)
Start of section headers: 119672 (bytes into file) # offset of section header table
Flags: 0x0
Size of this header: 64 (bytes)
Size of program headers: 56 (bytes)
Number of program headers: 11 # number of segments
Size of section headers: 64 (bytes)
Number of section headers: 28 # number of sections
Section header string table index: 27
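To make the link between the ELF header and the section header table concrete, here is a minimal sketch that decodes an ELF64 little-endian header with Python's `struct` module. It assumes the standard `Elf64_Ehdr` field order; the helper names (`parse_ehdr`, `section_header_table_range`) are made up for illustration, and the sample bytes are rebuilt from the readelf -h values above rather than read from a real file.

```python
import struct

# Elf64_Ehdr for a little-endian ELF64 file: e_ident[16], e_type, e_machine,
# e_version, e_entry, e_phoff, e_shoff, e_flags, e_ehsize, e_phentsize,
# e_phnum, e_shentsize, e_shnum, e_shstrndx.
EHDR = struct.Struct("<16sHHIQQQIHHHHHH")
FIELDS = (
    "e_ident e_type e_machine e_version e_entry e_phoff e_shoff e_flags "
    "e_ehsize e_phentsize e_phnum e_shentsize e_shnum e_shstrndx"
).split()


def parse_ehdr(data):
    """Decode the first 64 bytes of an ELF64 little-endian file into a dict."""
    header = dict(zip(FIELDS, EHDR.unpack_from(data)))
    if header["e_ident"][:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    return header


def section_header_table_range(header):
    """Return the (start, end) file offsets of the section header table."""
    start = header["e_shoff"]
    return start, start + header["e_shnum"] * header["e_shentsize"]


# Synthetic header rebuilt from the readelf -h values above (not read from disk).
ident = bytes.fromhex("7f454c46020101000000000000000000")
sample = EHDR.pack(ident, 3, 62, 1, 0x4760, 64, 119672, 0, 64, 56, 11, 64, 28, 27)

hdr = parse_ehdr(sample)
start, end = section_header_table_range(hdr)
print(hex(start), end - start)  # 0x1d378 1792
```

The start offset 0x1d378 is the same value readelf -S reports ("starting at offset 0x1d378"), and the table occupies 28 entries of 64 bytes each. To try it on a real file, pass the first 64 bytes of the file to `parse_ehdr`.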

Some section header table indexes are reserved. The most important ones are:

| Value | Name | Explanation |
| --- | --- | --- |
| 0x0000 | SHN_UNDEF | Marks a meaningless section reference |
| 0xfff1 | SHN_ABS | Specifies absolute values for the corresponding reference |
| 0xfff2 | SHN_COMMON | Symbols defined relative to it are common symbols |

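Note: common symbols exist only in relocatable object files.

At this point we have a rough idea of how section-related data is organized in an ELF file.

[Figure: simple diagram of how section data is organized in an ELF file]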
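The reserved indexes matter whenever something refers to a section by index, for example the `st_shndx` field of a symbol table entry. The following is a small sketch of how such an index could be interpreted; `describe_shndx` is a hypothetical helper, and it only knows the three reserved values from the table above.

```python
SHN_UNDEF = 0x0000
SHN_ABS = 0xFFF1
SHN_COMMON = 0xFFF2

RESERVED = {
    SHN_UNDEF: "SHN_UNDEF",
    SHN_ABS: "SHN_ABS",
    SHN_COMMON: "SHN_COMMON",
}


def describe_shndx(shndx, shnum):
    """Classify a section index; shnum is the number of section headers."""
    if shndx in RESERVED:
        return RESERVED[shndx]
    if 0 < shndx < shnum:
        return "section %d" % shndx
    # Other reserved values exist but are not covered by this sketch.
    return "out of range or other reserved value"


print(describe_shndx(27, 28))      # section 27
print(describe_shndx(0xFFF1, 28))  # SHN_ABS
```

With the 28 section headers of /bin/sh, index 27 is an ordinary section (.shstrtab), while 0xfff1 is never looked up in the table at all.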

The Section Header Data Structure

In the Linux kernel, the section header data structure is defined as follows:

typedef struct elf64_shdr {
    Elf64_Word  sh_name;       /* Section name, index in string tbl */
    Elf64_Word  sh_type;       /* Type of section */
    Elf64_Xword sh_flags;      /* Miscellaneous section attributes */
    Elf64_Addr  sh_addr;       /* Section virtual addr at execution */
    Elf64_Off   sh_offset;     /* Section file offset */
    Elf64_Xword sh_size;       /* Size of section in bytes */
    Elf64_Word  sh_link;       /* Index of another section */
    Elf64_Word  sh_info;       /* Additional section information */
    Elf64_Xword sh_addralign;  /* Section alignment */
    Elf64_Xword sh_entsize;    /* Entry size if section holds table */
} Elf64_Shdr;
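As a rough illustration of this layout, the sketch below decodes one `Elf64_Shdr` with Python's `struct` module and resolves `sh_name` against a section name string table. It assumes a little-endian file; `parse_shdr` and `section_name` are illustrative helpers, the string table contents are made up, and the field values are those of the .interp entry in the readelf -S output above.

```python
import struct
from collections import namedtuple

# Two 4-byte words, four 8-byte fields, two 4-byte words, two 8-byte fields.
SHDR = struct.Struct("<IIQQQQIIQQ")
Shdr = namedtuple(
    "Shdr",
    "sh_name sh_type sh_flags sh_addr sh_offset sh_size "
    "sh_link sh_info sh_addralign sh_entsize",
)


def parse_shdr(data, offset=0):
    """Decode one section header starting at the given byte offset."""
    return Shdr._make(SHDR.unpack_from(data, offset))


def section_name(shstrtab, sh_name):
    """sh_name is a byte offset into the string table; names end with NUL."""
    end = shstrtab.index(b"\0", sh_name)
    return shstrtab[sh_name:end].decode("ascii")


# Made-up string table for this example; a real one comes from the section
# whose index is stored in the ELF header (27 for the /bin/sh shown above).
shstrtab = b"\0.interp\0.shstrtab\0"

# The .interp row from readelf -S: PROGBITS (1), flag A (0x2), align 1.
raw = SHDR.pack(1, 1, 0x2, 0x2A8, 0x2A8, 0x1C, 0, 0, 1, 0)
interp = parse_shdr(raw)
print(section_name(shstrtab, interp.sh_name), hex(interp.sh_addr), hex(interp.sh_size))
```

The packed size of the structure is 64 bytes, matching the "Size of section headers: 64 (bytes)" line reported by readelf -h.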

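The section header table of /bin/sh lets us see this directly: each column of the readelf -S output above corresponds to one field of this structure.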

sh_type can take the following values:

| Value | Name | Explanation |
| --- | --- | --- |
| 0 | SHT_NULL | The section header is inactive |
| 1 | SHT_PROGBITS | Information defined completely by the program |
| 2 | SHT_SYMTAB | The full symbol table: all symbols needed for link editing, possibly including many that dynamic linking does not need |
| 3 | SHT_STRTAB | A string table (there can be multiple) |
| 4 | SHT_RELA | Relocation entries with explicit addends (there can be multiple) |
| 9 | SHT_REL | Relocation entries without explicit addends (there can be multiple) |
| 5 | SHT_HASH | A symbol hash table used in dynamic linking |
| 6 | SHT_DYNAMIC | Information for dynamic linking |
| 11 | SHT_DYNSYM | Only the symbols needed for dynamic linking |
| 7 | SHT_NOTE | Information that marks the file in some way |
| 8 | SHT_NOBITS | Looks like SHT_PROGBITS but occupies no space in the file |
| 10 | SHT_SHLIB | (Reserved but semantics not specified) |
| 0x70000000 | SHT_LOPROC | Lower bound of the range reserved for processor-specific semantics |
| 0x7fffffff | SHT_HIPROC | Upper bound of the range reserved for processor-specific semantics |
| 0x80000000 | SHT_LOUSER | Lower bound of the range reserved for application programs |
| 0xffffffff | SHT_HIUSER | Upper bound of the range reserved for application programs |

sh_flags can take the following values:

| Name | Value | Explanation |
| --- | --- | --- |
| SHF_WRITE | 0x1 | Data in this section should be writable during process execution |
| SHF_ALLOC | 0x2 | The section occupies memory during process execution |
| SHF_EXECINSTR | 0x4 | This section contains executable machine instructions |
| SHF_MASKPROC | 0xf0000000 | All bits in this mask are reserved for processor-specific semantics |
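The two tables translate naturally into lookups. The sketch below maps an sh_type value to its name and an sh_flags value to the letters readelf prints in its Flags column. It only covers the values listed in the tables above (so flags such as M and S are ignored), and the function names are made up for illustration.

```python
SHT_NAMES = {
    0: "SHT_NULL", 1: "SHT_PROGBITS", 2: "SHT_SYMTAB", 3: "SHT_STRTAB",
    4: "SHT_RELA", 5: "SHT_HASH", 6: "SHT_DYNAMIC", 7: "SHT_NOTE",
    8: "SHT_NOBITS", 9: "SHT_REL", 10: "SHT_SHLIB", 11: "SHT_DYNSYM",
}

# (bit, letter) pairs in the order readelf prints them: W, A, X.
SHF_LETTERS = ((0x1, "W"), (0x2, "A"), (0x4, "X"))


def type_name(sh_type):
    """Name an sh_type value, including the two reserved ranges."""
    if 0x70000000 <= sh_type <= 0x7FFFFFFF:
        return "processor-specific"
    if 0x80000000 <= sh_type <= 0xFFFFFFFF:
        return "application-specific"
    return SHT_NAMES.get(sh_type, "not in the table above")


def flag_letters(sh_flags):
    """Render the W/A/X bits of sh_flags the way readelf does."""
    return "".join(letter for bit, letter in SHF_LETTERS if sh_flags & bit)


print(type_name(8), flag_letters(0x3))  # SHT_NOBITS WA
print(type_name(1), flag_letters(0x6))  # SHT_PROGBITS AX
```

The first line describes a typical .bss (NOBITS, writable and allocated), the second a typical code section (PROGBITS, allocated and executable).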

The meaning of sh_link and sh_info depends on sh_type:

| sh_type | sh_link | sh_info |
| --- | --- | --- |
| SHT_DYNAMIC | The section header index of the string table used by entries in the section | 0 |
| SHT_HASH | The section header index of the symbol table to which the hash table applies | 0 |
| SHT_REL, SHT_RELA | The section header index of the associated symbol table | The section header index of the section to which the relocation applies |
| SHT_SYMTAB, SHT_DYNSYM | The section header index of the associated string table | One greater than the symbol table index of the last local symbol (binding STB_LOCAL) |
| other | SHN_UNDEF | 0 |
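As a worked example of the SHT_SYMTAB row, the sketch below uses the .symtab entry from the readelf -S excerpt later in this article (Size 0x450, EntSize 0x18, Link 23, Info 41). The helper `symtab_layout` is hypothetical; it only applies the rules from the table above.

```python
SHT_SYMTAB, SHT_DYNSYM = 2, 11


def symtab_layout(sh_type, sh_size, sh_entsize, sh_link, sh_info):
    """Interpret sh_link and sh_info for a symbol table section."""
    if sh_type not in (SHT_SYMTAB, SHT_DYNSYM):
        raise ValueError("not a symbol table section")
    count = sh_size // sh_entsize  # number of fixed-size symbol entries
    return {
        "strtab_index": sh_link,                     # section holding the symbol names
        "symbol_count": count,
        "local_symbols": range(0, sh_info),          # indexes of STB_LOCAL symbols
        "nonlocal_symbols": range(sh_info, count),   # everything after the locals
    }


layout = symtab_layout(SHT_SYMTAB, 0x450, 0x18, 23, 41)
print(layout["strtab_index"], layout["symbol_count"],
      len(layout["local_symbols"]), len(layout["nonlocal_symbols"]))
```

For that entry the symbol names live in section 23 (.strtab in the excerpt), and the table holds 46 entries: indexes 0 to 40 are local and the remaining 5 are not.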

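As we can see, sh_link tells a section of a particular type where to find the information it depends on.

With this background, let's finally look at the difference between sh_addr and sh_offset.

When sh_addr is 0, the section does not appear in the address space of the running program (it is not loaded at all); otherwise its value is the address of the section at run time.

sh_offset is the offset of the section from the beginning of the ELF file. As the possible values of sh_type show, a section of type SHT_NOBITS occupies no actual space in the file; in that case sh_offset is a conceptual position, that is, the offset the section would have in theory.

The following excerpt of readelf -S output helps make the difference concrete: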

[Nr] Name              Type             Address           Offset
Size EntSize Flags Link Info Align
[20] .bss NOBITS 0000000000004020 00003020
0000000000000008 0000000000000000 WA 0 0 1
[21] .comment PROGBITS 0000000000000000 00003020
0000000000000026 0000000000000001 MS 0 0 1
[22] .symtab SYMTAB 0000000000000000 00003048
0000000000000450 0000000000000018 23 41 8
[23] .strtab STRTAB 0000000000000000 00003498
0000000000000160 0000000000000000 0 0 1
[24] .shstrtab STRTAB 0000000000000000 000035f8
00000000000000cb 0000000000000000 0 0 1

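The .bss section has type NOBITS and its Flags include A (alloc). This means the section takes up no space in the ELF file, but space must be allocated for it when the program runs. Its Address is the address of the section at run time, and its Offset is the conceptual offset in the ELF file.

The four sections after it have one thing in common: none of them has the A flag, so they do not appear in the address space of the running program. Their Address is therefore 0, while their Offset is still meaningful.

These are the prerequisites for understanding ELF sections; the next post covers the sections of an ELF file in detail.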
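The excerpt can be checked numerically. The sketch below computes how many bytes each section takes in the file and in memory, then verifies that each section starts where the previous one ends in the file (rounded up to the next section's alignment). That back-to-back packing is an assumption that happens to hold for this excerpt, not a rule of the format, and the helper names are made up for illustration.

```python
SHT_NOBITS = 8
SHF_ALLOC = 0x2

# (name, sh_type, sh_flags, sh_offset, sh_size, sh_addralign) from the excerpt.
rows = [
    (".bss", 8, 0x3, 0x3020, 0x8, 1),        # NOBITS, flags WA
    (".comment", 1, 0x30, 0x3020, 0x26, 1),  # PROGBITS, flags MS
    (".symtab", 2, 0x0, 0x3048, 0x450, 8),
    (".strtab", 3, 0x0, 0x3498, 0x160, 1),
    (".shstrtab", 3, 0x0, 0x35F8, 0xCB, 1),
]


def bytes_in_file(sh_type, sh_size):
    """NOBITS sections have a size but store nothing in the file."""
    return 0 if sh_type == SHT_NOBITS else sh_size


def bytes_in_memory(sh_flags, sh_size):
    """Only sections with the A flag occupy memory at run time."""
    return sh_size if sh_flags & SHF_ALLOC else 0


def align_up(value, alignment):
    return (value + alignment - 1) // alignment * alignment


def packed_back_to_back(rows):
    """True if every section begins right after the previous one's file bytes."""
    for prev, nxt in zip(rows, rows[1:]):
        end = prev[3] + bytes_in_file(prev[1], prev[4])
        if align_up(end, nxt[5]) != nxt[3]:
            return False
    return True


for name, sh_type, flags, _offset, size, _align in rows:
    print(name, bytes_in_file(sh_type, size), bytes_in_memory(flags, size))
print(packed_back_to_back(rows))
```

.bss contributes 0 bytes to the file but 8 bytes to memory, which is why .comment can start at the same offset 0x3020; the other four sections are the opposite case, with bytes in the file and none in memory.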

