
《程序员的自我修养》: ELF File Preliminaries


Basic Tools and the Basic Structure of ELF Files

The basic tools are radare2, objdump, and readelf. An ELF file basically consists of a file header (Header) and sections.

sections and segments

Wikipedia distinguishes the two concepts as follows:

The segments contain information that is necessary for runtime execution of the file, while sections contain important data for linking and relocation. Any byte in the entire file can be owned by at most one section, and there can be orphan bytes which are not owned by any section.

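As the quote shows, the key difference is that sections hold the information needed for linking and are consumed by the linker, whereas segments hold the data needed at run time and are consumed by the operating system. Note that linking here can be static linking, completed before the executable is produced, or dynamic linking at run time. Sections and segments therefore do not conflict; they differ only conceptually. To avoid ambiguity, this article uses the English terms section and segment throughout.

Let's use readelf to explore the sections and segments of /bin/sh: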

# Section information for /bin/sh
$ readelf -S /bin/sh
There are 28 section headers, starting at offset 0x1d378:
Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .interp           PROGBITS         00000000000002a8  000002a8
       000000000000001c  0000000000000000   A       0     0     1
  ...
  [26] .gnu_debuglink    PROGBITS         0000000000000000  0001d240
       0000000000000034  0000000000000000           0     0     4
  [27] .shstrtab         STRTAB           0000000000000000  0001d274
       0000000000000101  0000000000000000           0     0     1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
  L (link order), O (extra OS processing required), G (group), T (TLS),
  C (compressed), x (unknown), o (OS specific), E (exclude),
  l (large), p (processor specific)
# Segment information for /bin/sh
$ readelf -l /bin/sh
Elf file type is DYN (Shared object file)
Entry point 0x4760
There are 11 program headers, starting at offset 64
Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  PHDR           0x0000000000000040 0x0000000000000040 0x0000000000000040
                 0x0000000000000268 0x0000000000000268  R      0x8
  INTERP         0x00000000000002a8 0x00000000000002a8 0x00000000000002a8
                 0x000000000000001c 0x000000000000001c  R      0x1
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
  LOAD           0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x00000000000033d0 0x00000000000033d0  R      0x1000
  LOAD           0x0000000000004000 0x0000000000004000 0x0000000000004000
                 0x0000000000011e6d 0x0000000000011e6d  R E    0x1000
  LOAD           0x0000000000016000 0x0000000000016000 0x0000000000016000
                 0x0000000000005798 0x0000000000005798  R      0x1000
  LOAD           0x000000000001bf30 0x000000000001cf30 0x000000000001cf30
                 0x0000000000001310 0x0000000000003f40  RW     0x1000
  DYNAMIC        0x000000000001cb08 0x000000000001db08 0x000000000001db08
                 0x00000000000001f0 0x00000000000001f0  RW     0x8
  NOTE           0x00000000000002c4 0x00000000000002c4 0x00000000000002c4
                 0x0000000000000044 0x0000000000000044  R      0x4
  GNU_EH_FRAME   0x0000000000017e44 0x0000000000017e44 0x0000000000017e44
                 0x00000000000007fc 0x00000000000007fc  R      0x4
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000  RW     0x10
  GNU_RELRO      0x000000000001bf30 0x000000000001cf30 0x000000000001cf30
                 0x00000000000010d0 0x00000000000010d0  R      0x1
 Section to Segment mapping:
  Segment Sections...
   00
   01     .interp
   02     .interp .note.gnu.build-id .note.ABI-tag .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt
   03     .init .plt .plt.got .text .fini
   04     .rodata .eh_frame_hdr .eh_frame
   05     .init_array .fini_array .data.rel.ro .dynamic .got .data .bss
   06     .dynamic
   07     .note.gnu.build-id .note.ABI-tag
   08     .eh_frame_hdr
   09
   10     .init_array .fini_array .data.rel.ro .dynamic .got
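
The "Section to Segment mapping" at the end of the output can be approximated by a simple range check: a section belongs to a segment when its address range lies inside the segment's range. The sketch below is a simplification (readelf applies additional rules, for example for sections that are not loaded), and the `Segment` and `Section` tuples and the labels `LOAD#0` and `LOAD#1` are hypothetical names introduced only for illustration. The numbers are copied from the output above; only `.interp` is used because the other section rows are elided.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    name: str
    vaddr: int   # VirtAddr column of readelf -l
    memsz: int   # MemSiz column of readelf -l


class Section(NamedTuple):
    name: str
    addr: int    # Address column of readelf -S
    size: int    # Size column of readelf -S


def sections_in_segment(segment, sections):
    """Return the names of sections whose address range lies inside the segment."""
    start = segment.vaddr
    end = segment.vaddr + segment.memsz
    return [s.name for s in sections
            if s.addr >= start and s.addr + s.size <= end]


# Values copied from the readelf output above.
segments = [
    Segment("INTERP", 0x2a8, 0x1c),
    Segment("LOAD#0", 0x0, 0x33d0),      # first LOAD segment (R)
    Segment("LOAD#1", 0x4000, 0x11e6d),  # second LOAD segment (R E)
]
sections = [Section(".interp", 0x2a8, 0x1c)]

for seg in segments:
    print(seg.name, sections_in_segment(seg, sections))
```

This agrees with the mapping printed by readelf: `.interp` is listed under both segment 01 (INTERP) and segment 02 (the first LOAD), but not under segment 03.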

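Note the Section Headers and Program Headers in the two outputs: the former are the entries that describe sections, and the latter are the entries that describe segments. The readelf -l output also shows that one segment can contain several sections. It is during linking that the linker places one or more sections into a segment.

The header of the readelf -S output lists both Address and Offset. To understand the difference between the two, we first need to see how the kernel's ELF definitions represent the information in a section header.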

Section Header Table

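The section headers form an array of one fixed data structure, called the section header table, which indexes the location of every section in the file. An index into this array is called a section header table index. The details of the table (its file offset, entry size, and entry count) are stored in the ELF file header, which readelf -h displays: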

$ readelf -h /bin/sh
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              DYN (Shared object file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x4760
  Start of program headers:          64 (bytes into file)
  Start of section headers:          119672 (bytes into file)  # offset of section header table
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         11  # number of segments
  Size of section headers:           64 (bytes)
  Number of section headers:         28  # number of sections
  Section header string table index: 27
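
As a rough sketch of where these numbers come from, the following Python snippet decodes the fixed 64-byte ELF64 header with the standard `struct` module. It assumes a little-endian ELF64 file, as in the output above; the function name `parse_elf64_header` and the returned dictionary are illustrative choices, not an existing API. To keep the example self-contained, the header bytes are rebuilt from the values readelf -h printed rather than read from disk.

```python
import struct

# Little-endian ELF64 file header: e_ident[16] followed by 13 fixed-width fields.
ELF64_EHDR = struct.Struct("<16sHHIQQQIHHHHHH")


def parse_elf64_header(data: bytes) -> dict:
    """Decode the header fields that locate the program and section header tables."""
    (ident, _type, _machine, _version, entry, phoff, shoff, _flags,
     ehsize, phentsize, phnum, shentsize, shnum, shstrndx) = ELF64_EHDR.unpack_from(data)
    if ident[:4] != b"\x7fELF":
        raise ValueError("missing ELF magic")
    if ident[4] != 2 or ident[5] != 1:
        raise ValueError("this sketch only handles little-endian ELF64")
    return {
        "e_entry": entry,
        "e_phoff": phoff, "e_phentsize": phentsize, "e_phnum": phnum,
        "e_shoff": shoff, "e_shentsize": shentsize, "e_shnum": shnum,
        "e_ehsize": ehsize, "e_shstrndx": shstrndx,
    }


# Rebuild the header bytes from the values shown by readelf -h above
# (type 3 = DYN, machine 62 = x86-64).
sample = ELF64_EHDR.pack(
    bytes.fromhex("7f454c46020101000000000000000000"),
    3, 62, 1, 0x4760, 64, 119672, 0, 64, 56, 11, 64, 28, 27)

hdr = parse_elf64_header(sample)
# File offset of the section header table and its total size in bytes.
print(hex(hdr["e_shoff"]), hdr["e_shnum"] * hdr["e_shentsize"])
```

On a real file the same function could be fed the first 64 bytes, for example the result of `open("/bin/sh", "rb").read(64)`.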

Some section header table indexes are reserved. The most important ones are:

| Value | Name | Explanation |
| --- | --- | --- |
| 0x0000 | SHN_UNDEF | Marks a meaningless section reference |
| 0xfff1 | SHN_ABS | Specifies absolute values for the corresponding reference |
| 0xfff2 | SHN_COMMON | Symbols defined relative to it are common symbols |

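Note: common symbols exist only in relocatable object files.

At this point we have a rough picture of how section-related data is organized in an ELF file.

[Figure: schematic of how section data is organized in an ELF file]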

The Section Header Data Structure

In the kernel, the section header is defined by the following structure:

typedef struct elf64_shdr {
    Elf64_Word sh_name;     /* Section name, index in string tbl */
    Elf64_Word sh_type;     /* Type of section */
    Elf64_Xword sh_flags;       /* Miscellaneous section attributes */
    Elf64_Addr sh_addr;     /* Section virtual addr at execution */
    Elf64_Off sh_offset;        /* Section file offset */
    Elf64_Xword sh_size;        /* Size of section in bytes */
    Elf64_Word sh_link;     /* Index of another section */
    Elf64_Word sh_info;     /* Additional section information */
    Elf64_Xword sh_addralign;   /* Section alignment */
    Elf64_Xword sh_entsize; /* Entry size if section holds table */
} Elf64_Shdr;
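
To make the layout concrete, here is a minimal Python sketch that unpacks one `Elf64_Shdr` entry with the `struct` module. It assumes little-endian encoding and mirrors the field order of the C structure above; `parse_shdr` is a hypothetical helper name. The sample bytes encode the `.interp` row of the earlier readelf -S output, except for `sh_name`, whose real value is not shown in that output and is set to a placeholder.

```python
import struct
from collections import namedtuple

# Same field order as the C struct: two 32-bit words, four 64-bit values,
# two 32-bit words, two 64-bit values (64 bytes in total).
SHDR = struct.Struct("<IIQQQQIIQQ")
Elf64_Shdr = namedtuple(
    "Elf64_Shdr",
    "sh_name sh_type sh_flags sh_addr sh_offset sh_size "
    "sh_link sh_info sh_addralign sh_entsize")


def parse_shdr(table: bytes, index: int = 0) -> Elf64_Shdr:
    """Decode entry number `index` of a section header table."""
    return Elf64_Shdr._make(SHDR.unpack_from(table, index * SHDR.size))


# The .interp row from readelf -S: PROGBITS (1), flag A (0x2),
# address and offset 0x2a8, size 0x1c, alignment 1.
name_offset = 0  # placeholder: the real sh_name is not shown in the output
raw = SHDR.pack(name_offset, 1, 0x2, 0x2a8, 0x2a8, 0x1c, 0, 0, 1, 0)

shdr = parse_shdr(raw)
print(SHDR.size, hex(shdr.sh_addr), hex(shdr.sh_offset), hex(shdr.sh_size))
```

The packed size of 64 bytes matches the "Size of section headers" line reported by readelf -h.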

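The section header table of /bin/sh makes this concrete: each column of the readelf -S output shown earlier corresponds to one field of this structure.

sh_type can take the following values: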

| Value | Name | Explanation |
| --- | --- | --- |
| 0 | SHT_NULL | The section header is inactive |
| 1 | SHT_PROGBITS | Information defined completely by the program |
| 2 | SHT_SYMTAB | All symbols needed during linking, possibly including some that dynamic linking does not need |
| 3 | SHT_STRTAB | A string table (there can be multiple) |
| 4 | SHT_RELA | Relocation entries with explicit addends (there can be multiple) |
| 9 | SHT_REL | Relocation entries without explicit addends (there can be multiple) |
| 5 | SHT_HASH | A symbol hash table used in dynamic linking |
| 6 | SHT_DYNAMIC | Information for dynamic linking |
| 11 | SHT_DYNSYM | Only the symbols needed for dynamic linking |
| 7 | SHT_NOTE | Information that marks the file in some way |
| 8 | SHT_NOBITS | Looks like SHT_PROGBITS but occupies no space in the file |
| 10 | SHT_SHLIB | (Reserved but semantics not specified) |
| 0x70000000 | SHT_LOPROC | Reserved for processor-specific semantics (lower bound) |
| 0x7fffffff | SHT_HIPROC | Reserved for processor-specific semantics (upper bound) |
| 0x80000000 | SHT_LOUSER | The lower bound of the range of indexes reserved for application programs |
| 0xffffffff | SHT_HIUSER | The upper bound of the range of indexes reserved for application programs |

The values of sh_flags are as follows:

| Name | Value | Explanation |
| --- | --- | --- |
| SHF_WRITE | 0x1 | Data in this section should be writable during process execution |
| SHF_ALLOC | 0x2 | The section occupies memory during process execution |
| SHF_EXECINSTR | 0x4 | This section contains executable machine instructions |
| SHF_MASKPROC | 0xf0000000 | All bits in this mask are reserved for processor-specific semantics |
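
Since sh_flags is a bit mask, the letters that readelf prints in its Flags column can be derived by testing each bit. The following sketch covers only the three flags listed above; `flag_letters` is a hypothetical helper, and the remaining letters in readelf's "Key to Flags" legend are deliberately left out.

```python
SHF_WRITE = 0x1
SHF_ALLOC = 0x2
SHF_EXECINSTR = 0x4
SHF_MASKPROC = 0xf0000000


def flag_letters(sh_flags: int) -> str:
    """Render the W/A/X subset of sh_flags the way readelf -S prints it."""
    letters = ""
    for bit, letter in ((SHF_WRITE, "W"), (SHF_ALLOC, "A"), (SHF_EXECINSTR, "X")):
        if sh_flags & bit:
            letters += letter
    return letters


def has_processor_specific_bits(sh_flags: int) -> bool:
    """True if any bit inside the SHF_MASKPROC range is set."""
    return bool(sh_flags & SHF_MASKPROC)


print(flag_letters(SHF_ALLOC))                  # a read-only loaded section such as .interp
print(flag_letters(SHF_WRITE | SHF_ALLOC))      # a writable loaded section such as .bss
print(flag_letters(SHF_ALLOC | SHF_EXECINSTR))  # a code section
```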

The values of sh_link and sh_info depend on sh_type:

| sh_type | sh_link | sh_info |
| --- | --- | --- |
| SHT_DYNAMIC | The section header index of the string table used by entries in the section | 0 |
| SHT_HASH | The section header index of the symbol table to which the hash table applies | 0 |
| SHT_REL, SHT_RELA | The section header index of the associated symbol table | The section header index of the section to which the relocation applies |
| SHT_SYMTAB, SHT_DYNSYM | The section header index of the associated string table | One greater than the symbol table index of the last local symbol (binding STB_LOCAL) |
| other | SHN_UNDEF | 0 |
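
As a worked example of the SHT_SYMTAB row, the sketch below interprets the Size, EntSize, Link, and Info values of the `.symtab` entry that appears in the readelf -S excerpt later in this article (Size 0x450, EntSize 0x18, Link 23, Info 41). The helper `symtab_summary` and its dictionary keys are hypothetical names chosen for illustration.

```python
def symtab_summary(sh_size: int, sh_entsize: int, sh_link: int, sh_info: int) -> dict:
    """Interpret the header fields of a SHT_SYMTAB or SHT_DYNSYM section."""
    total = sh_size // sh_entsize          # number of symbol table entries
    return {
        "strtab_index": sh_link,           # section holding the symbol names
        "total_symbols": total,
        "local_symbols": sh_info,          # entries 0 .. sh_info - 1 are local
        "first_nonlocal_index": sh_info,   # one past the last STB_LOCAL symbol
        "nonlocal_symbols": total - sh_info,
    }


# Values from the .symtab row: Size 0x450, EntSize 0x18, Link 23, Info 41.
summary = symtab_summary(0x450, 0x18, 23, 41)
for key, value in summary.items():
    print(key, value)
```

The Link value 23 is the index of the `.strtab` row in the same excerpt, which is where the names of these symbols are stored.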

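As the table shows, sh_link tells a section of a given type where to find the additional information it needs.

With that background, we can finally look at the difference between sh_addr and sh_offset.

When sh_addr is 0, the section does not appear in the process's address space at run time; otherwise its value is the address of the section when the program runs. sh_offset is the offset of the section from the beginning of the ELF file. As the list of sh_type values shows, a section of type SHT_NOBITS occupies no actual space in the file; in that case sh_offset is a conceptual position, that is, the offset the section would have in theory.

The following excerpt of readelf -S output helps illustrate the difference: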

  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [20] .bss              NOBITS           0000000000004020  00003020
       0000000000000008  0000000000000000  WA       0     0     1
  [21] .comment          PROGBITS         0000000000000000  00003020
       0000000000000026  0000000000000001  MS       0     0     1
  [22] .symtab           SYMTAB           0000000000000000  00003048
       0000000000000450  0000000000000018          23    41     8
  [23] .strtab           STRTAB           0000000000000000  00003498
       0000000000000160  0000000000000000           0     0     1
  [24] .shstrtab         STRTAB           0000000000000000  000035f8
       00000000000000cb  0000000000000000           0     0     1
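
The Offset column of this excerpt can be checked with a little arithmetic. The sketch below assumes the sections in the excerpt are packed back to back in the file, which happens to hold here but is not guaranteed in general; `file_size`, `align_up`, and `predicted_offsets` are hypothetical helpers. Each section's offset is predicted from the previous section's offset plus the space it occupies in the file, rounded up to the next section's alignment.

```python
SHT_NOBITS = 8


def file_size(sh_type: int, sh_size: int) -> int:
    """Bytes a section occupies in the file: a NOBITS section occupies none."""
    return 0 if sh_type == SHT_NOBITS else sh_size


def align_up(offset: int, align: int) -> int:
    """Round offset up to a multiple of align (0 and 1 mean no constraint)."""
    if align <= 1:
        return offset
    return (offset + align - 1) // align * align


def predicted_offsets(rows):
    """Predict each section's offset from the section that precedes it."""
    result = []
    for prev, cur in zip(rows, rows[1:]):
        _, prev_type, prev_offset, prev_size, _ = prev
        cur_align = cur[4]
        result.append(align_up(prev_offset + file_size(prev_type, prev_size), cur_align))
    return result


# (name, sh_type, sh_offset, sh_size, sh_addralign) copied from the excerpt above.
rows = [
    (".bss", 8, 0x3020, 0x8, 1),
    (".comment", 1, 0x3020, 0x26, 1),
    (".symtab", 2, 0x3048, 0x450, 8),
    (".strtab", 3, 0x3498, 0x160, 1),
    (".shstrtab", 3, 0x35f8, 0xcb, 1),
]

for (name, _, actual, _, _), predicted in zip(rows[1:], predicted_offsets(rows)):
    print(name, hex(predicted), predicted == actual)
```

Because `.bss` contributes no bytes to the file, `.comment` starts at the same offset, 0x3020, and the 8-byte alignment of `.symtab` explains the small gap between the end of `.comment` and offset 0x3048.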

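The .bss section has type NOBITS, and its Flags include A (alloc). This means it occupies no space in the ELF file but must be given memory when the program runs: its Address is the section's run-time address, and its Offset is the conceptual offset within the ELF file.

The sections that follow it share one property: none of them has the A flag, so they do not appear in the address space when the file is executed. Their Address is therefore 0, while their Offset is a real position in the file.

That covers the preliminaries for ELF sections. The next post will discuss the sections of an ELF file in detail.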