pwnlib.elf.elf
— ELF Files
Exposes functionality for manipulating ELF files.
Stop hard-coding things! Look them up at runtime with pwnlib.elf.
Example Usage
>>> e = ELF('/bin/cat')
>>> print(hex(e.address))
0x400000
>>> print(hex(e.symbols['write']))
0x401680
>>> print(hex(e.got['write']))
0x60b070
>>> print(hex(e.plt['write']))
0x401680
You can even patch and save the files.
>>> e = ELF('/bin/cat')
>>> e.read(e.address+1, 3)
b'ELF'
>>> e.asm(e.address, 'ret')
>>> e.save('/tmp/quiet-cat')
>>> disasm(open('/tmp/quiet-cat','rb').read(1))
' 0: c3 ret'
Module Members
- class pwnlib.elf.elf.ELF(path, checksec=True)[source]
Bases: ELFFile
Encapsulates information about an ELF file.
Example
>>> bash = ELF(which('bash'))
>>> hex(bash.symbols['read'])
0x41dac0
>>> hex(bash.plt['read'])
0x41dac0
>>> u32(bash.read(bash.got['read'], 4))
0x41dac6
>>> print(bash.disasm(bash.plt.read, 16))
0:   ff 25 1a 18 2d 00       jmp    QWORD PTR [rip+0x2d181a]        # 0x2d1820
6:   68 59 00 00 00          push   0x59
b:   e9 50 fa ff ff          jmp    0xfffffffffffffa60
- static _decompress_dwarf_section(section)[source]
Returns the uncompressed contents of the provided DWARF section.
- _get_section_header_stringtable()[source]
Get the string table section corresponding to the section header table.
- _get_section_name(section_header)[source]
Given a section header, find this section’s name in the file’s string table
- _make_symbol_table_index_section(section_header, name)[source]
Create a SymbolTableIndexSection object
- _parse_elf_header()[source]
Parses the ELF file header and assigns the result to attributes of this object.
- _patch_elf_and_read_maps()[source]
patch_elf_and_read_maps(self) -> dict
Read /proc/self/maps as if the ELF were executing.
This is done by replacing the code at the entry point with shellcode which dumps /proc/self/maps and exits, and actually executing the binary.
- Returns
A dict mapping file paths to the lowest address they appear at. Does not do any translation for e.g. QEMU emulation; the raw results are returned.
If there is not enough space to inject the shellcode in the segment which contains the entry point, returns {}.
Doctests:
These tests are just to ensure that our shellcode is correct.
>>> for arch in CAT_PROC_MAPS_EXIT:
...     context.clear()
...     with context.local(arch=arch):
...         sc = shellcraft.cat2("/proc/self/maps")
...         sc += shellcraft.exit()
...         sc = asm(sc)
...         sc = enhex(sc)
...         assert sc == CAT_PROC_MAPS_EXIT[arch], (arch, sc)
- _populate_functions()[source]
Builds a dict of ‘functions’ (i.e. symbols of type ‘STT_FUNC’) by function name that map to a tuple consisting of the func address and size in bytes.
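As a rough illustration of the mapping described above, the sketch below builds such a dictionary from a list of symbol records. The `Symbol` tuple and `build_function_table` are hypothetical names invented for this example; they are not part of pwnlib, which reads symbols through elftools.

```python
from collections import namedtuple

# Hypothetical symbol record for illustration only.
Symbol = namedtuple("Symbol", "name type value size")


def build_function_table(symbols):
    """Map function name -> (address, size) for STT_FUNC symbols only."""
    table = {}
    for sym in symbols:
        # Skip non-function symbols and unnamed entries.
        if sym.type != "STT_FUNC" or not sym.name:
            continue
        table[sym.name] = (sym.value, sym.size)
    return table


symbols = [
    Symbol("main", "STT_FUNC", 0x401136, 42),
    Symbol("counter", "STT_OBJECT", 0x404028, 4),
]
functions = build_function_table(symbols)
```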
- _populate_got()[source]
Loads the symbols for all relocations.
>>> libc = ELF(which('bash')).libc
>>> assert 'strchrnul' in libc.got
>>> assert 'memcpy' in libc.got
>>> assert libc.got.strchrnul != libc.got.memcpy
- _populate_libraries()[source]
>>> from os.path import exists
>>> bash = ELF(which('bash'))
>>> all(map(exists, bash.libs.keys()))
True
>>> any(map(lambda x: 'libc' in x, bash.libs.keys()))
True
- _populate_plt()[source]
Loads the PLT symbols.
>>> path = pwnlib.data.elf.path
>>> for test in glob(os.path.join(path, 'test-*')):
...     test = ELF(test)
...     assert '__stack_chk_fail' in test.got, test
...     if test.arch != 'ppc':
...         assert '__stack_chk_fail' in test.plt, test
- _populate_symbols()[source]
>>> bash = ELF(which('bash'))
>>> bash.symbols['_start'] == bash.entry
True
- _populate_synthetic_symbols()[source]
Adds symbols from the GOT and PLT to the symbols dictionary.
Does not overwrite any existing symbols, and prefers PLT symbols.
Synthetic plt.xxx and got.xxx symbols are added for each PLT and GOT entry, respectively.
Example:
>>> bash = ELF(which('bash'))
>>> bash.symbols.wcscmp == bash.plt.wcscmp
True
>>> bash.symbols.wcscmp == bash.symbols.plt.wcscmp
True
>>> bash.symbols.stdin == bash.got.stdin
True
>>> bash.symbols.stdin == bash.symbols.got.stdin
True
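The merge rules above (never overwrite, prefer the PLT, add dotted `plt.xxx` and `got.xxx` names) can be sketched with plain dictionaries. The function name `add_synthetic_symbols` and the sample addresses are assumptions made for illustration; this is not pwnlib's implementation.

```python
def add_synthetic_symbols(symbols, plt, got):
    """Return a copy of `symbols` extended with PLT and GOT entries."""
    merged = dict(symbols)
    # The PLT is processed first, so it wins over a GOT entry of the same name.
    for source, prefix in ((plt, "plt."), (got, "got.")):
        for name, address in source.items():
            # setdefault never replaces a symbol that is already present.
            merged.setdefault(name, address)
            # Synthetic dotted name for every PLT/GOT entry.
            merged.setdefault(prefix + name, address)
    return merged


symbols = {"main": 0x1000}
plt = {"puts": 0x2000, "main": 0x2222}
got = {"puts": 0x3000, "stdin": 0x3008}
merged = add_synthetic_symbols(symbols, plt, got)
```

With these inputs, `main` keeps its original address, `puts` resolves to the PLT entry, and `stdin` falls back to the GOT entry.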
- _read_dwarf_section(section, relocate_dwarf_sections)[source]
Read the contents of a DWARF section from the stream and return a DebugSectionDescriptor. Apply relocations if asked to.
- asm(address, assembly)[source]
Assembles the specified instructions and inserts them into the ELF at the specified address.
This modifies the ELF in-place. The resulting binary can be saved with ELF.save().
- checksec(banner=True, color=True)[source]
Prints out information in the binary, similar to checksec.sh.
- debug(argv=[], *a, **kw) tube [source]
Debug the ELF with gdb.debug().
- Parameters
argv (list) – List of arguments to the binary
*args – Extra arguments to gdb.debug()
**kwargs – Extra arguments to gdb.debug()
- Returns
tube – See gdb.debug()
- disable_nx()[source]
Disables NX for the ELF.
Zeroes out the PT_GNU_STACK program header p_type field.
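To make the effect of zeroing `p_type` concrete, here is a minimal sketch that operates on a raw little-endian ELF64 image using only the standard library. The helper name `zero_gnu_stack_type` and the hand-built test image are assumptions for this example; the field offsets follow the ELF64 header layout, and the code is not taken from pwnlib.

```python
import struct

PT_LOAD = 1
PT_GNU_STACK = 0x6474E551


def zero_gnu_stack_type(image):
    """Zero p_type of each PT_GNU_STACK header; return how many were patched."""
    (e_phoff,) = struct.unpack_from("<Q", image, 0x20)        # program header table offset
    e_phentsize, e_phnum = struct.unpack_from("<HH", image, 0x36)
    patched = 0
    for index in range(e_phnum):
        offset = e_phoff + index * e_phentsize
        (p_type,) = struct.unpack_from("<I", image, offset)   # p_type is the first field
        if p_type == PT_GNU_STACK:
            struct.pack_into("<I", image, offset, 0)
            patched += 1
    return patched


# Hand-built image: a 64-byte ELF64 header followed by two 56-byte program headers.
image = bytearray(64 + 2 * 56)
image[:4] = b"\x7fELF"
struct.pack_into("<Q", image, 0x20, 64)       # e_phoff
struct.pack_into("<HH", image, 0x36, 56, 2)   # e_phentsize, e_phnum
struct.pack_into("<I", image, 64, PT_LOAD)
struct.pack_into("<I", image, 64 + 56, PT_GNU_STACK)

patched = zero_gnu_stack_type(image)
```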
- disasm(address, n_bytes) str [source]
Returns a string of disassembled instructions at the specified virtual memory address
- dynamic_by_tag(tag) tag [source]
- Parameters
tag (str) – Named DT_XXX tag (e.g. 'DT_STRTAB').
- Returns
elftools.elf.dynamic.DynamicTag
- dynamic_value_by_tag(tag) int [source]
Retrieve the value from a dynamic tag a la DT_XXX.
If the tag is missing, returns None.
- fit(address, *a, **kw)[source]
Writes fitted data into the specified address.
See: packing.fit()
- flat(address, *a, **kw)[source]
Writes a full array of values to the specified address.
See: packing.flat()
- static from_assembly(assembly) ELF [source]
Given an assembly listing, return a fully loaded ELF object which contains that assembly at its entry point.
- Parameters
Example
>>> e = ELF.from_assembly('nop; foo: int 0x80', vma = 0x400000)
>>> e.symbols['foo'] = 0x400001
>>> e.disasm(e.entry, 1)
' 400000: 90 nop'
>>> e.disasm(e.symbols['foo'], 2)
' 400001: cd 80 int 0x80'
- static from_bytes(bytes) ELF [source]
Given a sequence of bytes, return a fully loaded ELF object which contains those bytes at its entry point.
Example
>>> e = ELF.from_bytes(b'\x90\xcd\x80', vma=0xc000)
>>> print(e.disasm(e.entry, 3))
    c000:       90                      nop
    c001:       cd 80                   int    0x80
- get_ehabi_infos()[source]
Generally, a shared library or executable contains one .ARM.exidx section, while an object file contains many. We must therefore traverse every section and keep those whose type is SHT_ARM_EXIDX.
- get_section_by_name(name)[source]
Get a section from the file, by name. Return None if no such section exists.
- get_section_index(section_name)[source]
Gets the index of the section by name. Return None if no such section name exists.
- get_segment_for_address(address, size=1) Segment [source]
Given a virtual address described by a PT_LOAD segment, return the first segment which describes the virtual address. An optional size may be provided to ensure the entire range falls into the same segment.
- get_supplementary_dwarfinfo(dwarfinfo)[source]
Read supplementary dwarfinfo, from either the standard .debug_sup section or the GNU proprietary .gnu_debugaltlink.
- has_ehabi_info()[source]
Check whether this file appears to have an ARM exception handler index table.
- has_phantom_bytes()[source]
The XC16 compiler for the PIC microcontrollers emits DWARF where all odd bytes in all DWARF sections are to be discarded (“phantom”).
We don’t know where the phantom byte discarding fits into the usual chain of section content transforms. There are no XC16/PIC binaries in the corpus with relocations against DWARF, and DWARF section compression seems to be unsupported by XC16.
- iter_notes()[source]
- Yields
All the notes in the PT_NOTE segments. Each result is a dictionary-like object with n_name, n_type, and n_desc fields, amongst others.
- iter_properties()[source]
- Yields
All the GNU properties in the PT_NOTE segments. Each result is a dictionary-like object with pr_type, pr_datasz, and pr_data fields.
- classmethod load_from_path(path)[source]
Takes a path to a file on the local filesystem, and returns an ELFFile from it, setting up a correct stream_loader relative to the original file.
- offset_to_vaddr(offset) int [source]
Translates the specified offset to a virtual address.
- Parameters
offset (int) – Offset to translate
- Returns
int – Virtual address which corresponds to the file offset, or None.
Examples
This example shows that regardless of changes to the virtual address layout by modifying ELF.address, the offset for any given address doesn’t change.
>>> bash = ELF('/bin/bash')
>>> bash.address == bash.offset_to_vaddr(0)
True
>>> bash.address += 0x123456
>>> bash.address == bash.offset_to_vaddr(0)
True
- ELF.patch_custom_libraries(str, str, bool, str) -> ELF[source]
Looks for the interpreter binary in the given path and patches the binary to use it if available. Also patches the RUNPATH to the given path using the patchelf utility.
- Parameters
- Returns
A new ELF instance is returned after patching the binary with the external patchelf tool.
Example
>>> tmpdir = tempfile.mkdtemp()
>>> linker_path = os.path.join(tmpdir, 'ld-mock.so')
>>> write(linker_path, b'loader')
>>> ls_path = os.path.join(tmpdir, 'ls')
>>> _ = shutil.copy(which('ls'), ls_path)
>>> e = ELF.patch_custom_libraries(ls_path, tmpdir)
>>> e.runpath.decode() == tmpdir
True
>>> e.linker.decode() == linker_path
True
- process(argv=[], *a, **kw) process [source]
Execute the binary with process. Note that argv is a list of arguments, and should not include argv[0].
- read(address, count) bytes [source]
Read data from the specified virtual address
- Parameters
- Returns
A bytes object, or None.
Examples
The simplest example is just to read the ELF header.
>>> bash = ELF(which('bash'))
>>> bash.read(bash.address, 4)
b'\x7fELF'
ELF segments do not have to contain all of the data on-disk that gets loaded into memory.
First, let’s create an ELF file that has some code in two sections.
>>> assembly = '''
... .section .A,"awx"
... .global A
... A: nop
... .section .B,"awx"
... .global B
... B: int3
... '''
>>> e = ELF.from_assembly(assembly, vma=False)
By default, these come right after each other in memory.
>>> e.read(e.symbols.A, 2)
b'\x90\xcc'
>>> e.symbols.B - e.symbols.A
1
Let’s move the sections so that B is a little bit further away.
>>> objcopy = pwnlib.asm._objcopy()
>>> objcopy += [
...     '--change-section-vma', '.B+5',
...     '--change-section-lma', '.B+5',
...     e.path
... ]
>>> subprocess.check_call(objcopy)
0
Now let’s re-load the ELF, and check again.
>>> e = ELF(e.path)
>>> e.symbols.B - e.symbols.A
6
>>> e.read(e.symbols.A, 2)
b'\x90\x00'
>>> e.read(e.symbols.A, 7)
b'\x90\x00\x00\x00\x00\x00\xcc'
>>> e.read(e.symbols.A, 10)
b'\x90\x00\x00\x00\x00\x00\xcc\x00\x00\x00'
Everything is relative to the user-selected base address, so moving things around keeps everything working.
>>> e.address += 0x1000
>>> e.read(e.symbols.A, 10)
b'\x90\x00\x00\x00\x00\x00\xcc\x00\x00\x00'
- save(path=None)[source]
Save the ELF to a file.
>>> bash = ELF(which('bash'))
>>> bash.save('/tmp/bash_copy')
>>> copy = open('/tmp/bash_copy', 'rb')
>>> bash = open(which('bash'), 'rb')
>>> bash.read() == copy.read()
True
- search(needle, writable=False, executable=False) generator [source]
Search the ELF’s virtual address space for the specified string.
Notes
Does not search empty space between segments, or uninitialized data. This will only return data that actually exists in the ELF file. Searching for a long string of NULL bytes probably won’t work.
- Parameters
- Yields
An iterator for each virtual address that matches.
Examples
An ELF header starts with the bytes \x7fELF, so we should be able to find it easily.
>>> bash = ELF('/bin/bash')
>>> bash.address + 1 == next(bash.search(b'ELF'))
True
We can also search for strings in the binary.
>>> len(list(bash.search(b'GNU bash'))) > 0
True
It is also possible to search for instructions in executable sections.
>>> binary = ELF.from_assembly('nop; mov eax, 0; jmp esp; ret')
>>> jmp_addr = next(binary.search(asm('jmp esp'), executable = True))
>>> binary.read(jmp_addr, 2) == asm('jmp esp')
True
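The search behaviour described in the notes (only file-backed data is scanned, and every match is reported as a virtual address) can be sketched as a small generator. The `(vaddr, data)` segment pairs and the name `search_segments` are assumptions for this illustration, not pwnlib internals.

```python
def search_segments(segments, needle):
    """Yield each virtual address at which `needle` occurs in file-backed data."""
    for vaddr, data in segments:
        offset = data.find(needle)
        while offset != -1:
            yield vaddr + offset
            # Continue one byte later so overlapping matches are also reported.
            offset = data.find(needle, offset + 1)


# Two hypothetical segments: (virtual address, bytes stored in the file).
segments = [
    (0x400000, b"\x7fELF\x02\x01\x01"),
    (0x601000, b"GNU bash\x00ELF\x00"),
]
hits = list(search_segments(segments, b"ELF"))
```

Because the gap between the two segments is never materialised, a needle that would only match in that gap is not found, which matches the caveat about empty space between segments.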
- ELF.set_interpreter(str, str) -> ELF[source]
Patches the interpreter of the ELF to the given binary using the patchelf utility.
When running the binary, the new interpreter will be used to load the ELF.
- Parameters
- Returns
A new ELF instance is returned after patching the binary with the external patchelf tool.
Example
>>> tmpdir = tempfile.mkdtemp()
>>> ls_path = os.path.join(tmpdir, 'ls')
>>> _ = shutil.copy(which('ls'), ls_path)
>>> e = ELF.set_interpreter(ls_path, '/tmp/correct_ld.so')
>>> e.linker == b'/tmp/correct_ld.so'
True
- ELF.set_runpath(str, str) -> ELF[source]
Patches the RUNPATH of the ELF to the given path using the patchelf utility.
The dynamic loader will look for any needed shared libraries in the given path first, before trying the system library paths. This is useful to run a binary with a different libc binary.
- Parameters
- Returns
A new ELF instance is returned after patching the binary with the external patchelf tool.
Example
>>> tmpdir = tempfile.mkdtemp()
>>> ls_path = os.path.join(tmpdir, 'ls')
>>> _ = shutil.copy(which('ls'), ls_path)
>>> e = ELF.set_runpath(ls_path, './libs')
>>> e.runpath == b'./libs'
True
- string(address) str [source]
Reads a null-terminated string from the specified address.
- Returns
A str with the string contents (NUL terminator is omitted), or an empty string if no NUL terminator could be found.
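The return rule above (contents without the terminator, or an empty result when no terminator exists) can be shown on a plain byte buffer. `read_c_string` is a hypothetical helper written for this example and works on `bytes` offsets, not on ELF virtual addresses.

```python
def read_c_string(data, offset):
    """Return the bytes from `offset` up to, but not including, the next NUL.

    Returns an empty bytes object when no NUL terminator follows `offset`.
    """
    end = data.find(b"\x00", offset)
    if end == -1:
        return b""
    return data[offset:end]


buffer = b"abc\x00def"
first = read_c_string(buffer, 0)    # terminated by the NUL at index 3
second = read_c_string(buffer, 4)   # "def" has no terminator
```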
- vaddr_to_offset(address) int [source]
Translates the specified virtual address to a file offset
- Parameters
address (int) – Virtual address to translate
- Returns
int – Offset within the ELF file which corresponds to the address, or None.
Examples
>>> bash = ELF(which('bash'))
>>> bash.vaddr_to_offset(bash.address)
0
>>> bash.address += 0x123456
>>> bash.vaddr_to_offset(bash.address)
0
>>> bash.vaddr_to_offset(0) is None
True
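One way to picture this translation is a walk over the loadable segments, subtracting any rebase before comparing against the linked addresses. The sketch below is written under that assumption; `LoadSegment`, the `load_delta` parameter, and the sample layout are hypothetical and do not reflect pwnlib's actual code.

```python
from collections import namedtuple

# Hypothetical, simplified view of a PT_LOAD program header.
LoadSegment = namedtuple("LoadSegment", "p_vaddr p_offset p_filesz")


def vaddr_to_offset(segments, address, load_delta=0):
    """Translate a virtual address to a file offset, or None if unmapped.

    `load_delta` models a user-selected rebase (new base minus linked base).
    """
    linked = address - load_delta
    for seg in segments:
        if seg.p_vaddr <= linked < seg.p_vaddr + seg.p_filesz:
            return linked - seg.p_vaddr + seg.p_offset
    return None


segments = [
    LoadSegment(p_vaddr=0x400000, p_offset=0x0, p_filesz=0x1000),
    LoadSegment(p_vaddr=0x601000, p_offset=0x1000, p_filesz=0x200),
]
base_offset = vaddr_to_offset(segments, 0x400000)
rebased_offset = vaddr_to_offset(segments, 0x400000 + 0x123456, load_delta=0x123456)
```

Both lookups resolve to the same file offset, mirroring the doctest in which changing the base address leaves the offset unchanged.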
- write(address, data)[source]
Writes data to the specified virtual address
Note
This routine does not check the bounds on the write to ensure that it stays in the same segment.
Examples
>>> bash = ELF(which('bash'))
>>> bash.read(bash.address+1, 3)
b'ELF'
>>> bash.write(bash.address, b"HELO")
>>> bash.read(bash.address, 4)
b'HELO'
- property address[source]
Address of the lowest segment loaded in the ELF.
When updated, the addresses of the following fields are also updated:
However, the following fields are NOT updated:
Example
>>> bash = ELF('/bin/bash')
>>> read = bash.symbols['read']
>>> text = bash.get_section_by_name('.text').header.sh_addr
>>> bash.address += 0x1000
>>> read + 0x1000 == bash.symbols['read']
True
>>> text == bash.get_section_by_name('.text').header.sh_addr
True
- Type
- arch[source]
Architecture of the file (e.g. 'i386', 'arm').
See: ContextType.arch
- Type
- property execstack[source]
Whether dynamically loading the current binary will make the stack executable.
This is based on the presence of a program header PT_GNU_STACK, its setting, and the default stack permissions for the architecture.
If PT_GNU_STACK is present, the stack permissions are set according to it:
case PT_GNU_STACK:
  stack_flags = ph->p_flags;
  break;
Else, the stack permissions are set according to the architecture defaults as defined by DEFAULT_STACK_PERMS:
/* On most platforms presume that PT_GNU_STACK is absent and the stack is
 * executable. Other platforms default to a nonexecutable stack and don't
 * need PT_GNU_STACK to do so. */
uint_fast16_t stack_flags = DEFAULT_STACK_PERMS;
By searching the source for DEFAULT_STACK_PERMS, we can see which architectures have which settings.
$ git grep '#define DEFAULT_STACK_PERMS' | grep -v PF_X
sysdeps/aarch64/stackinfo.h: #define DEFAULT_STACK_PERMS (PF_R|PF_W)
sysdeps/arc/stackinfo.h: #define DEFAULT_STACK_PERMS (PF_R|PF_W)
sysdeps/csky/stackinfo.h: #define DEFAULT_STACK_PERMS (PF_R|PF_W)
sysdeps/ia64/stackinfo.h: #define DEFAULT_STACK_PERMS (PF_R|PF_W)
sysdeps/loongarch/stackinfo.h: #define DEFAULT_STACK_PERMS (PF_R | PF_W)
sysdeps/nios2/stackinfo.h: #define DEFAULT_STACK_PERMS (PF_R|PF_W)
sysdeps/riscv/stackinfo.h: #define DEFAULT_STACK_PERMS (PF_R | PF_W)
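The decision described above reduces to two cases, which the following sketch encodes. The function name `stack_is_executable` is hypothetical, and the architecture names are the glibc `sysdeps` directory names from the grep output, which is an assumption since pwnlib may label architectures differently.

```python
PF_X = 0x1  # "executable" bit of a program header's p_flags

# Architectures whose DEFAULT_STACK_PERMS lacks PF_X, per the grep output above.
NONEXEC_DEFAULT_ARCHES = {
    "aarch64", "arc", "csky", "ia64", "loongarch", "nios2", "riscv",
}


def stack_is_executable(arch, gnu_stack_flags=None):
    """Return True if the stack would be mapped executable.

    `gnu_stack_flags` is the p_flags of PT_GNU_STACK, or None if the
    header is absent.
    """
    if gnu_stack_flags is not None:
        # PT_GNU_STACK present: its flags decide.
        return bool(gnu_stack_flags & PF_X)
    # PT_GNU_STACK absent: fall back to the architecture default.
    return arch not in NONEXEC_DEFAULT_ARCHES


missing_on_i386 = stack_is_executable("i386")
missing_on_aarch64 = stack_is_executable("aarch64")
rw_header_on_i386 = stack_is_executable("i386", 0x6)  # PF_R | PF_W
```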
- Type
- property fortify[source]
Whether the current binary was built with Fortify Source (-DFORTIFY).
- Type
- property libc[source]
If this ELF imports any libraries which contain 'libc[.-]<version>.so', and we can determine the appropriate path to it on the local system, returns a new ELF object pertaining to that library.
If not found, the value will be None.
- Type
- property libc_start_main_return[source]
Address of the return address into __libc_start_main from main.
>>> bash = ELF(which('bash'))
>>> libc = bash.libc
>>> libc.libc_start_main_return > 0
True
Try to find the return address from main into __libc_start_main. The heuristic to find the call to the function pointer of main is to list all calls inside __libc_start_main, find the call to exit after the call to main and select the previous call.
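The heuristic can be sketched on an already-extracted list of call instructions. The `Call` record, the function `find_main_return`, and the sample addresses are hypothetical; the sketch also assumes that the return address is the address of the instruction following the selected call, and it is not pwnlib's implementation.

```python
from collections import namedtuple

# Hypothetical record for one call instruction inside __libc_start_main.
# `target` is a symbol name, or None for an indirect call through a pointer.
Call = namedtuple("Call", "address size target")


def find_main_return(calls):
    """Return the address following the call that precedes the call to exit."""
    for index, call in enumerate(calls):
        if call.target == "exit" and index > 0:
            previous = calls[index - 1]  # assumed to be the call to main
            return previous.address + previous.size
    return None


calls = [
    Call(0x1000, 5, "__cxa_atexit"),
    Call(0x1020, 2, None),     # indirect call through the pointer to main
    Call(0x1022, 5, "exit"),
]
main_return = find_main_return(calls)
```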
- Type
- property nx[source]
Whether the current binary uses NX protections.
Specifically, we are checking for READ_IMPLIES_EXEC being set by the kernel, as a result of honoring PT_GNU_STACK in the kernel.
READ_IMPLIES_EXEC is set according to architecture-specific rules that depend on the CPU features and the presence of PT_GNU_STACK.
Unfortunately, ELF is not context-aware, so it’s not always possible to determine whether the process of a binary that’s missing PT_GNU_STACK will have NX or not.
The rules are as follows:
ELF arch   linux        GNU_STACK                  other                                           NX
---------  -----------  -------------------------  ----------------------------------------------  --------
i386       < 5.8 [1]    non-exec                                                                   enabled
                        exec / missing                                                             disabled
           >= 5.8 [2]   exec / non-exec                                                            enabled
                        missing                                                                    disabled
amd64      < 5.8 [1]    non-exec                                                                   enabled
                        exec / missing                                                             disabled
           >= 5.8 [2]   exec / non-exec / missing                                                  enabled
arm        < 5.8 [3]    non-exec*                                                                  enabled
                        exec / missing                                                             disabled
           >= 5.8 [4]   exec / non-exec*                                                           enabled
                        missing                                                                    disabled
mips       < 5.18 [5]   non-exec*                                                                  enabled
                        exec / missing                                                             disabled
           >= 5.18 [6]  exec / non-exec*                                                           enabled
                        missing                                                                    disabled
powerpc    [7]          non-exec / exec                                                            enabled
                        missing                                                                    disabled
powerpc64  [7]          exec / non-exec / missing                                                  enabled
ia64       [8]          non-exec                                                                   enabled
                        exec / missing             e_flags & EF_IA_64_LINUX_EXECUTABLE_STACK == 0  enabled
                                                   e_flags & EF_IA_64_LINUX_EXECUTABLE_STACK != 0  disabled
the rest   [9]          exec / non-exec / missing                                                  enabled
* Hardware limitations are ignored.
If READ_IMPLIES_EXEC is set, then all readable pages are executable.
if (elf_read_implies_exec(loc->elf_ex, executable_stack))
    current->personality |= READ_IMPLIES_EXEC;
[1]
#define elf_read_implies_exec(ex, executable_stack) \
    (executable_stack != EXSTACK_DISABLE_X)
[2]
#define elf_read_implies_exec(ex, executable_stack) \
    (mmap_is_ia32() && executable_stack == EXSTACK_DEFAULT)
mmap_is_ia32():
/*
 * True on X86_32 or when emulating IA32 on X86_64
 */
static inline int mmap_is_ia32(void)
[3]
int arm_elf_read_implies_exec(int executable_stack)
{
    if (executable_stack != EXSTACK_DISABLE_X)
        return 1;
    if (cpu_architecture() < CPU_ARCH_ARMv6)
        return 1;
    return 0;
}
[4]
int arm_elf_read_implies_exec(int executable_stack)
{
    if (executable_stack == EXSTACK_DEFAULT)
        return 1;
    if (cpu_architecture() < CPU_ARCH_ARMv6)
        return 1;
    return 0;
}
[5]
int mips_elf_read_implies_exec(void *elf_ex, int exstack)
{
    if (exstack != EXSTACK_DISABLE_X) {
        /* The binary doesn't request a non-executable stack */
        return 1;
    }
    if (!cpu_has_rixi) {
        /* The CPU doesn't support non-executable memory */
        return 1;
    }
    return 0;
}
[6]
int mips_elf_read_implies_exec(void *elf_ex, int exstack)
{
    /*
     * Set READ_IMPLIES_EXEC only on non-NX systems that
     * do not request a specific state via PT_GNU_STACK.
     */
    return (!cpu_has_rixi && exstack == EXSTACK_DEFAULT);
}
[7]
#ifdef __powerpc64__
/* stripped */
# define elf_read_implies_exec(ex, exec_stk) (is_32bit_task() ? \
        (exec_stk == EXSTACK_DEFAULT) : 0)
#else
# define elf_read_implies_exec(ex, exec_stk) (exec_stk == EXSTACK_DEFAULT)
#endif /* __powerpc64__ */
[8]
#define elf_read_implies_exec(ex, executable_stack) \
    ((executable_stack!=EXSTACK_DISABLE_X) && ((ex).e_flags & EF_IA_64_LINUX_EXECUTABLE_STACK) != 0)
EF_IA_64_LINUX_EXECUTABLE_STACK:
#define EF_IA_64_LINUX_EXECUTABLE_STACK 0x1 /* is stack (& heap) executable by default? */
[9]
# define elf_read_implies_exec(ex, have_pt_gnu_stack) 0
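For the two x86 rows of the table, the kernel macros quoted above translate fairly directly into Python. The sketch below covers only `i386` and `amd64`; the function name `x86_nx_enabled`, the `(major, minor)` kernel tuple, and the treatment of an `amd64` binary as never being an IA32 task are assumptions made for illustration.

```python
# Values of the kernel's EXSTACK_* constants.
EXSTACK_DEFAULT = 0    # PT_GNU_STACK missing
EXSTACK_DISABLE_X = 1  # PT_GNU_STACK present, non-executable
EXSTACK_ENABLE_X = 2   # PT_GNU_STACK present, executable


def x86_nx_enabled(arch, kernel, exstack):
    """Return True if NX is effective, i.e. READ_IMPLIES_EXEC is not set.

    arch: 'i386' or 'amd64'; kernel: (major, minor) tuple.
    """
    if kernel < (5, 8):
        # Older rule: anything but an explicit non-exec stack implies exec.
        read_implies_exec = exstack != EXSTACK_DISABLE_X
    else:
        # Newer rule: only 32-bit tasks without PT_GNU_STACK imply exec.
        is_ia32 = arch == "i386"
        read_implies_exec = is_ia32 and exstack == EXSTACK_DEFAULT
    return not read_implies_exec


old_kernel_missing = x86_nx_enabled("amd64", (5, 4), EXSTACK_DEFAULT)
new_kernel_missing = x86_nx_enabled("amd64", (5, 10), EXSTACK_DEFAULT)
```

The two sample calls show the case where the same `amd64` binary without `PT_GNU_STACK` loses NX on a kernel older than 5.8 but keeps it on a newer one.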
- Type
- property relro[source]
Whether the current binary uses RELRO protections.
This requires both the presence of the dynamic tag DT_BIND_NOW, and a GNU_RELRO program header.
The ELF Specification describes how the linker should resolve symbols immediately, as soon as a binary is loaded. This can be emulated with the LD_BIND_NOW=1 environment variable.
DT_BIND_NOW
If present in a shared object or executable, this entry instructs the dynamic linker to process all relocations for the object containing this entry before transferring control to the program. The presence of this entry takes precedence over a directive to use lazy binding for this object when specified through the environment or via dlopen(BA_LIB).
(page 81)
Separately, an extension to the GNU linker allows a binary to specify a PT_GNU_RELRO program header, which describes the region of memory which is to be made read-only after relocations are complete.
Finally, a new-ish extension which doesn’t seem to have a canonical source of documentation is DF_BIND_NOW, which has supposedly superseded DT_BIND_NOW.
DF_BIND_NOW
If set in a shared object or executable, this flag instructs the dynamic linker to process all relocations for the object containing this entry before transferring control to the program. The presence of this entry takes precedence over a directive to use lazy binding for this object when specified through the environment or via dlopen(BA_LIB).
>>> path = pwnlib.data.elf.relro.path
>>> for test in glob(os.path.join(path, 'test-*')):
...     e = ELF(test)
...     expected = os.path.basename(test).split('-')[2]
...     actual = str(e.relro).lower()
...     assert actual == expected
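The combination of a GNU_RELRO header with either form of the bind-now marker can be sketched as a small classifier. This assumes the common three-way result (no RELRO, partial, full); the function name `classify_relro`, the `'Partial'` and `'Full'` labels, and the dictionary of dynamic tags are illustrative assumptions rather than pwnlib's API.

```python
DF_BIND_NOW = 0x8  # bit in the DT_FLAGS dynamic entry


def classify_relro(has_gnu_relro, dynamic_tags):
    """Classify RELRO from a program-header check and a dict of dynamic tags.

    `dynamic_tags` maps tag names (e.g. 'DT_FLAGS') to integer values.
    """
    if not has_gnu_relro:
        return None
    bind_now = (
        "DT_BIND_NOW" in dynamic_tags
        or bool(dynamic_tags.get("DT_FLAGS", 0) & DF_BIND_NOW)
    )
    # Without immediate binding, part of the GOT must stay writable.
    return "Full" if bind_now else "Partial"


full_legacy = classify_relro(True, {"DT_BIND_NOW": 0})
full_flags = classify_relro(True, {"DT_FLAGS": DF_BIND_NOW})
partial = classify_relro(True, {})
```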
- Type
- property sections[source]
A list of elftools.elf.sections.Section objects for the sections in the ELF.
- Type
- property segments[source]
A list of elftools.elf.segments.Segment objects for the segments in the ELF.
- Type
- property sym[source]
Alias for ELF.symbols
- Type
- property ubsan[source]
Whether the current binary was built with Undefined Behavior Sanitizer (UBSAN).
- Type
- class pwnlib.elf.elf.Function(name, address, size, elf=None)[source]
Encapsulates information about a function in an ELF binary.
- Parameters
- class pwnlib.elf.elf.dotdict[source]
Wrapper to allow dotted access to dictionary elements.
Is a real dict object, but also serves up keys as attributes when reading attributes.
Supports recursive instantiation for keys which contain dots.
Example
>>> x = pwnlib.elf.elf.dotdict()
>>> isinstance(x, dict)
True
>>> x['foo'] = 3
>>> x.foo
3
>>> x['bar.baz'] = 4
>>> x.bar.baz
4
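A minimal sketch of how such a wrapper could work is shown below. `DotDict` is a hypothetical re-implementation written for this example, not the class shipped in pwnlib; it resolves dotted keys lazily by building a nested view on attribute access.

```python
class DotDict(dict):
    """A dict whose keys can also be read as attributes."""

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails, so dict
        # methods such as .items() keep working.
        if name in self:
            return self[name]
        # Collect keys of the form "name.rest" into a nested view.
        prefix = name + "."
        nested = DotDict(
            (key[len(prefix):], value)
            for key, value in self.items()
            if key.startswith(prefix)
        )
        if nested:
            return nested
        raise AttributeError(name)


x = DotDict()
x["foo"] = 3
x["bar.baz"] = 4
```

Here `x.foo` reads the plain key, while `x.bar.baz` goes through the nested view built from the dotted key `'bar.baz'`.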