Article by Ayman Alheraki on January 11 2026 10:37 AM

x86-64 Machine Code Encoding Overview of Machine Code Format

Machine code encoding in the x86-64 architecture is a variable-length, multi-component format: a single instruction occupies anywhere from 1 to 15 bytes. Each instruction's binary encoding consists of several fields that specify the operation, the operands, the addressing mode, and additional modifiers, and most of these fields are optional. Understanding these components is essential for designing an assembler that translates human-readable assembly into correct machine code.

This section details the fundamental elements of x86-64 machine code encoding: prefixes, REX prefixes, opcode bytes, ModR/M byte, SIB byte, displacement values, and immediate data.

1. Instruction Prefixes

Prefixes are optional one-byte values placed before the opcode to modify or extend instruction behavior. They serve various purposes:

  • Lock and Repeat prefixes:

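    • 0xF0 (LOCK): Makes the instruction's read-modify-write of its memory operand atomic with respect to other processors. It is valid only with certain instructions (such as ADD, INC, XCHG, and CMPXCHG) whose destination is a memory operand.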

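    • 0xF2 (REPNE/REPNZ) and 0xF3 (REP/REPE/REPZ): Repeat a string instruction (MOVS, STOS, CMPS, SCAS, and so on) using RCX as the counter; REPE/REPZ and REPNE/REPNZ also stop early depending on the zero flag. The same bytes also serve as mandatory prefixes that select SSE instructions (for example, F3 0F 10 is MOVSS).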

  • Segment override prefixes:

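    • 0x2E (CS), 0x36 (SS), 0x3E (DS), 0x26 (ES), 0x64 (FS), 0x65 (GS): These override the default segment used for memory addressing. In 64-bit mode the CS, SS, DS, and ES overrides are ignored because those segments have a base of zero, while FS and GS remain in use, typically for thread-local storage and per-CPU data.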

  • Operand-size override prefix:

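    • 0x66: Switches between 16-bit and 32-bit operand size. In 64-bit mode the default operand size for most instructions is 32 bits, so 0x66 selects 16-bit operands (e.g., add ax, bx is 66 01 D8); if REX.W is also present, REX.W takes precedence. 0x66 also acts as a mandatory prefix for many SSE instructions.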

  • Address-size override prefix:

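    • 0x67: Switches the address size. In 64-bit mode the default address size is 64 bits and 0x67 selects 32-bit addressing; in 32-bit code it toggles between 32-bit and 16-bit addressing.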

Key Notes:

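  • Several legacy prefixes can precede an instruction, but only one from each of the four groups (lock/repeat, segment override, operand-size, address-size) is meaningful. Their relative order does not matter; a REX prefix, if present, must come after them, immediately before the opcode, and the complete instruction must not exceed 15 bytes.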

  • The assembler must detect and encode necessary prefixes for correct operand/address size or special behavior.
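
Legacy prefixes fall into four groups (lock/repeat, segment override, operand-size, address-size), and only one prefix from each group is meaningful, which makes a simple validation pass possible. The sketch below is written in Python and assumes a hypothetical assembler that collects legacy prefix bytes into a bytes object before emitting them; the names LEGACY_PREFIX_GROUPS and check_legacy_prefixes are illustrative only. It conservatively rejects two prefixes from the same group while accepting any ordering across groups.

```python
# Hypothetical helper: maps each legacy prefix byte to its group number.
LEGACY_PREFIX_GROUPS = {
    0xF0: 1, 0xF2: 1, 0xF3: 1,                              # lock / repeat
    0x2E: 2, 0x36: 2, 0x3E: 2, 0x26: 2, 0x64: 2, 0x65: 2,  # segment override
    0x66: 3,                                                # operand-size override
    0x67: 4,                                                # address-size override
}


def check_legacy_prefixes(prefixes: bytes) -> None:
    """Raise ValueError for a non-prefix byte or for two prefixes from one group."""
    seen = {}  # group number -> first prefix byte seen in that group
    for b in prefixes:
        group = LEGACY_PREFIX_GROUPS.get(b)
        if group is None:
            raise ValueError(f"0x{b:02X} is not a legacy prefix")
        if group in seen:
            raise ValueError(
                f"0x{seen[group]:02X} and 0x{b:02X} both belong to group {group}")
        seen[group] = b
```

For example, check_legacy_prefixes(bytes([0xF0, 0x66])) and check_legacy_prefixes(bytes([0x66, 0xF0])) both pass, while bytes([0x64, 0x65]) (FS and GS together) raises ValueError.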

2. REX Prefix

The REX prefix is a one-byte prefix introduced in x86-64 to extend register addressing and operand size. Its bit layout is 0100WRXB: the upper four bits are fixed at 0100, and the lower four bits are:

  • W: Operand size extension bit (1 = 64-bit operand).

  • R: Extension of the ModR/M reg field.

  • X: Extension of the SIB index field.

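  • B: Extension of the ModR/M r/m field, the SIB base field, or the register number encoded in the low three bits of the opcode (as in PUSH r64 or MOV r64, imm64).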

Purpose:

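  • Extends the general-purpose register set from 8 registers (RAX through RDI) to 16 by adding R8–R15; the same bits also reach XMM8–XMM15. Any REX prefix also makes SPL, BPL, SIL, and DIL addressable as byte registers, at the cost of AH, BH, CH, and DH, which cannot be encoded in an instruction that has a REX prefix.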

  • Enables 64-bit operand size explicitly (when W=1), overriding default operand sizes.

Encoding Details:

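  • REX prefix byte range: 0x40 to 0x4F. In 64-bit mode these bytes no longer encode the one-byte INC and DEC register instructions of 32-bit x86.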

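  • Required when an instruction uses R8–R15 or XMM8–XMM15, uses SPL, BPL, SIL, or DIL, or needs a 64-bit operand size that is not already the instruction's default. Some instructions, such as PUSH, POP, and near CALL and JMP, default to 64-bit operands in 64-bit mode and need no REX.W. The REX byte must immediately precede the opcode.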

Assembler Role:

  • Detect operand size and register usage to determine if REX prefix is required.

  • Encode bits accordingly to ensure correct register addressing.
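
As a minimal sketch of that assembler role, the hypothetical Python helper rex_prefix below assumes registers are numbered 0 to 15 in hardware order (RAX=0, RCX=1, RDX=2, RBX=3, RSP=4, RBP=5, RSI=6, RDI=7, R8=8, ..., R15=15). Bit 3 of each register number goes into R, X, or B, and the prefix is omitted entirely when W, R, X, and B are all zero.

```python
def rex_prefix(w: bool, reg: int = 0, index: int = 0, base: int = 0,
               force: bool = False) -> bytes:
    """Build a REX prefix (0100WRXB), or return b"" when none is needed.

    reg, index and base are 4-bit register numbers (RAX=0 ... R15=15).
    Bit 3 of each goes into R, X and B respectively; the low 3 bits are
    encoded elsewhere (ModR/M, SIB, or the opcode). `force` emits a bare
    0x40, which is needed to select SPL, BPL, SIL or DIL.
    """
    for n in (reg, index, base):
        if not 0 <= n <= 15:
            raise ValueError(f"register number {n} out of range")
    r = (reg >> 3) & 1
    x = (index >> 3) & 1
    b = (base >> 3) & 1
    rex = 0x40 | (int(w) << 3) | (r << 2) | (x << 1) | b
    if rex == 0x40 and not force:
        return b""  # no extension bits set: REX can be omitted
    return bytes([rex])
```

For mov rax, rbx encoded as 89 /r (RBX in Reg, RAX in R/M), rex_prefix(True, reg=3, base=0) gives 0x48; for mov r8, rax, rex_prefix(True, reg=0, base=8) gives 0x49; and for the 32-bit mov eax, ebx no prefix is produced.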

3. Opcode Bytes

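The opcode specifies the operation to be performed by the processor. In the legacy encoding it is one, two, or three bytes long; a mandatory prefix (0x66, 0xF2, or 0xF3) or the Reg field of the ModR/M byte can further distinguish instructions that share the same opcode bytes.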

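  • One-byte opcodes: Common instructions such as add (0x01 for ADD r/m32, r32), mov (0x89 for MOV r/m32, r32), and jmp (0xE9 for JMP rel32).

  • Two-byte opcodes: Begin with the escape byte 0x0F, followed by a second opcode byte (e.g., near conditional jumps 0x0F 0x80 through 0x0F 0x8F, or SYSCALL, 0x0F 0x05).

  • Three-byte opcodes: Use the escape sequences 0x0F 0x38 or 0x0F 0x3A followed by a third byte (e.g., many SSSE3, SSE4, and AES-NI instructions). AVX and AVX-512 instructions instead use the VEX (0xC4/0xC5) and EVEX (0x62) prefixes, which encode the opcode map inside the prefix rather than with explicit escape bytes.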

Important Points:

  • Opcode bytes alone often do not specify all operand information; further encoding (ModR/M, SIB) is needed.

  • The assembler must select the correct opcode based on the mnemonic, operand types, and operand sizes.
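
One way to organize opcode selection is a lookup table keyed by mnemonic and operand form. The Python sketch below is a hypothetical, heavily trimmed table (the names OPCODE_TABLE, select_add_imm, and opcode_length are illustrative), although the opcode values themselves (01 /r, 81 /0 id, 83 /0 ib, 89 /r, E9 cd, 0F 05) are the standard encodings. select_add_imm shows size-driven selection that prefers the shorter sign-extended imm8 form, and opcode_length shows how the escape bytes determine opcode length in the legacy maps; VEX/EVEX-encoded instructions are out of scope.

```python
from typing import Dict, Optional, Tuple

# Hypothetical table: (mnemonic, operand form) -> (opcode bytes, ModR/M /digit or None)
OPCODE_TABLE: Dict[Tuple[str, str], Tuple[bytes, Optional[int]]] = {
    ("add", "r/m32, r32"):   (b"\x01", None),
    ("add", "r/m32, imm8"):  (b"\x83", 0),    # 83 /0 ib: imm8 sign-extended
    ("add", "r/m32, imm32"): (b"\x81", 0),    # 81 /0 id
    ("mov", "r/m32, r32"):   (b"\x89", None),
    ("jmp", "rel32"):        (b"\xE9", None),
    ("syscall", ""):         (b"\x0F\x05", None),
}


def select_add_imm(value: int) -> Tuple[bytes, Optional[int]]:
    """Pick the shorter ADD r/m32, imm form when the value fits in a signed byte."""
    if -128 <= value <= 127:
        return OPCODE_TABLE[("add", "r/m32, imm8")]
    if -2**31 <= value < 2**32:
        return OPCODE_TABLE[("add", "r/m32, imm32")]
    raise ValueError("immediate does not fit in 32 bits")


def opcode_length(code: bytes) -> int:
    """Length of a legacy-map opcode at the start of `code` (VEX/EVEX not handled)."""
    if code[0] != 0x0F:
        return 1                      # one-byte map
    if code[1] in (0x38, 0x3A):
        return 3                      # 0F 38 xx or 0F 3A xx
    return 2                          # 0F xx
```

A real assembler would key the table on richer operand descriptors, but the selection logic stays the same: match the operand types first, then choose among the matching forms by size.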

4. ModR/M Byte

The ModR/M byte is critical for operand specification. It encodes the addressing mode, register operand, and the register or memory operand.

Format (from most significant to least significant bits):

  • Mod (2 bits): Specifies addressing mode:

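    • 00: Memory operand with no displacement (register indirect, e.g., [rbx]). Exception: R/M=101 means a 32-bit displacement with no base register, which in 64-bit mode is RIP-relative addressing ([rip + disp32]).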

    • 01: Memory with 8-bit signed displacement.

    • 10: Memory with 32-bit signed displacement.

    • 11: Register-direct addressing mode.

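  • Reg/Opcode (3 bits): Specifies either a register operand or a 3-bit opcode extension, written /digit in Intel's manuals (e.g., 83 /0 is ADD r/m32, imm8). REX.R supplies a fourth bit when the field names a register.

  • R/M (3 bits): Specifies a register operand when Mod=11, or, together with Mod, the memory addressing form. REX.B supplies a fourth bit. With Mod other than 11, the value 100 indicates that a SIB byte follows.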

Function:

  • Specifies the operands, whether they are registers or memory addresses.

  • Works closely with SIB byte for complex addressing.

Assembler Tasks:

  • Determine proper Mod bits based on operand addressing mode.

  • Encode correct Reg and R/M fields for involved registers or memory operands.
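
The packing itself is simple bit arithmetic. This Python sketch uses hypothetical helper names and assumes registers are numbered 0 to 15 in hardware order (RAX=0 ... R15=15). It builds a ModR/M byte and uses it for the register-to-register form of MOV r/m64, r64 (REX.W 89 /r), where Mod=11, the source goes in Reg, and the destination goes in R/M.

```python
def modrm(mod: int, reg: int, rm: int) -> int:
    """Pack Mod (bits 7-6), Reg/Opcode (bits 5-3) and R/M (bits 2-0) into one byte.

    Only the low 3 bits of reg and rm are stored; bit 3 belongs in REX.R / REX.B.
    """
    if not 0 <= mod <= 3:
        raise ValueError("mod must be 0-3")
    return (mod << 6) | ((reg & 7) << 3) | (rm & 7)


def encode_mov_r64_r64(dst: int, src: int) -> bytes:
    """MOV r/m64, r64 (REX.W 89 /r): src goes in Reg, dst in R/M, Mod=11."""
    rex = 0x48 | (((src >> 3) & 1) << 2) | ((dst >> 3) & 1)  # W=1, plus R and B
    return bytes([rex, 0x89, modrm(0b11, src, dst)])
```

encode_mov_r64_r64(0, 3) produces 48 89 D8, which is mov rax, rbx. The same instruction can also be encoded as 48 8B C3 using MOV r64, r/m64 (8B /r) with the operands swapped between the fields, so an assembler has to choose one form consistently.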

5. SIB (Scale-Index-Base) Byte

The SIB byte provides more flexible memory addressing using scaled index registers and base registers.

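Format (from most significant to least significant bits):

  • Scale (2 bits): Scale factor applied to the index register; the values 00, 01, 10, and 11 mean 1, 2, 4, and 8.

  • Index (3 bits): Register used as the index, extended to 4 bits by REX.X. The value 100 without REX.X means no index, so RSP cannot serve as an index register.

  • Base (3 bits): Base register for memory addressing, extended to 4 bits by REX.B.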

Usage Conditions:

  • Present only when ModR/M's Mod field is 00, 01, or 10 and its R/M field is 100 (in 32-bit or 64-bit addressing). This is why any memory operand based on RSP or R12 requires a SIB byte.

  • Enables addressing like [base + index*scale + displacement].

Assembler Role:

  • Encode SIB byte when operand memory addressing involves scaled indexing.

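  • Correctly handle cases where the index or base is absent: no index is encoded as Index=100 (without REX.X), and no base is encoded as Mod=00 with Base=101, followed by a 32-bit displacement. Because of that special case, RBP or R13 as a base with no displacement must be encoded with Mod=01 and an 8-bit displacement of 0.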
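
The absent-index and absent-base cases, together with the RSP/R12 and RBP/R13 quirks, are where hand-written encoders most often go wrong. The following Python sketch, encode_mem (a hypothetical helper, with registers numbered 0 to 15 in hardware order), produces the ModR/M byte, an optional SIB byte, and the displacement for a memory operand of the form [base + index*scale + disp]. It assumes the caller emits REX.X and REX.B separately and does not cover RIP-relative operands.

```python
import struct
from typing import Optional

SCALE_BITS = {1: 0b00, 2: 0b01, 4: 0b10, 8: 0b11}


def modrm(mod: int, reg: int, rm: int) -> int:
    return (mod << 6) | ((reg & 7) << 3) | (rm & 7)


def encode_mem(reg: int, base: Optional[int], index: Optional[int] = None,
               scale: int = 1, disp: int = 0) -> bytes:
    """Encode ModR/M [+ SIB] [+ displacement] for [base + index*scale + disp].

    Registers are numbered 0-15 (RAX=0 ... R15=15); bit 3 of each number must
    still be placed in REX by the caller. RIP-relative addressing is not handled.
    """
    if index == 4:
        raise ValueError("RSP cannot be an index register")
    if scale not in SCALE_BITS:
        raise ValueError("scale must be 1, 2, 4 or 8")
    idx = 0b100 if index is None else index & 7  # index bits 100 = "no index"

    if base is None:
        # No base register: Mod=00, R/M=100, SIB.base=101, always a disp32.
        sib = (SCALE_BITS[scale] << 6) | (idx << 3) | 0b101
        return bytes([modrm(0b00, reg, 0b100), sib]) + struct.pack("<i", disp)

    # RBP/R13 (low bits 101) cannot use Mod=00, which would mean "no base"
    # (or RIP-relative without SIB), so they get an explicit disp8 of 0.
    if disp == 0 and (base & 7) != 0b101:
        mod, disp_bytes = 0b00, b""
    elif -128 <= disp <= 127:
        mod, disp_bytes = 0b01, struct.pack("<b", disp)
    else:
        mod, disp_bytes = 0b10, struct.pack("<i", disp)

    if index is None and (base & 7) != 0b100:
        # Plain [base + disp]: the base register goes straight into R/M.
        return bytes([modrm(mod, reg, base)]) + disp_bytes

    # SIB required: a scaled index is present, or the base is RSP/R12 (low bits 100).
    sib = (SCALE_BITS[scale] << 6) | (idx << 3) | (base & 7)
    return bytes([modrm(mod, reg, 0b100), sib]) + disp_bytes
```

With Reg=0, [rbx + rsi*4 + 0x1234] yields 84 B3 34 12 00 00, [rsp] yields 04 24, and [rbp] yields 45 00, showing the forced SIB byte for RSP and the forced zero displacement for RBP.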

6. Displacement

Displacement is an optional signed value added to the effective address calculation in memory operands.

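  • Can be 0 bytes (no displacement), 1 byte (8-bit, sign-extended), or 4 bytes (32-bit, sign-extended to 64 bits), as selected by the Mod field. (The only 8-byte displacement is the absolute-address form of MOV with opcodes 0xA0 to 0xA3.)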

  • Used with ModR/M and SIB bytes to form complete memory addresses.

Examples:

  • [rbx] — no displacement.

  • [rbx + 0x10] — the value fits in a signed byte, so an 8-bit displacement (Mod=01) is enough; the longer 32-bit form (Mod=10) is also legal.

  • [rbx + rsi*4 + 0x1234] — uses SIB and displacement.

Assembler Responsibilities:

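  • Choose the displacement size from its value: an 8-bit displacement if it fits in the range -128 to 127, otherwise a 32-bit one. When the value depends on a label that is not yet resolved, the assembler must either reserve the 32-bit form or revisit the choice in a later pass.

  • Encode displacement bytes in little-endian order, immediately after the ModR/M byte (and the SIB byte, if present).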
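
In 64-bit mode, Mod=00 with R/M=101 selects RIP-relative addressing, where the 32-bit displacement is measured from the address of the next instruction. An assembler therefore needs the instruction's full length before it can compute the displacement. The Python sketch below (a hypothetical helper, encode_lea_rax_rip) shows this for lea rax, [rip + disp32], encoded as REX.W 8D /r, which is always 7 bytes long.

```python
import struct


def encode_lea_rax_rip(insn_addr: int, target: int) -> bytes:
    """Encode LEA RAX, [RIP + disp32] located at insn_addr and pointing at target.

    Encoding: REX.W (0x48), opcode 0x8D, ModR/M with Mod=00, Reg=000 (RAX),
    R/M=101 (RIP-relative in 64-bit mode), then a 4-byte displacement.
    """
    length = 7                            # REX (1) + opcode (1) + ModR/M (1) + disp32 (4)
    disp = target - (insn_addr + length)  # relative to the *next* instruction
    if not -2**31 <= disp < 2**31:
        raise ValueError("target is out of reach of a 32-bit displacement")
    modrm = (0b00 << 6) | (0b000 << 3) | 0b101
    return bytes([0x48, 0x8D, modrm]) + struct.pack("<i", disp)
```

Placed at 0x1000 and pointing at 0x2000, the instruction encodes a displacement of 0x2000 - 0x1007 = 0xFF9, giving 48 8D 05 F9 0F 00 00.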

7. Immediate Data

Immediate data represents constant values encoded directly in the instruction after all other components.

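  • Size is 1, 2, 4, or 8 bytes depending on the instruction and operand size. The main instruction with a full 8-byte immediate is MOV r64, imm64 (REX.W B8+r); most other instructions with 64-bit operands take a 32-bit immediate that is sign-extended to 64 bits.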

  • Examples include immediate operands for mov, add, cmp, or jump offsets in relative jumps.

Characteristics:

  • Always appear at the end of instruction encoding.

  • May be signed or unsigned.

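  • Some instructions have shorter forms for particular values: shifting by 1 has a dedicated encoding with no immediate byte (D1 /4 for SHL r/m32, 1), and many arithmetic instructions accept an 8-bit immediate that is sign-extended to the operand size (83 /0 for ADD r/m32, imm8).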

Assembler Tasks:

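  • Encode immediate values in the correct size and in little-endian byte order.

  • Validate that the value fits the encoded size, taking sign extension into account: for example, 0xFFFFFFFF cannot be encoded as the sign-extended 32-bit immediate of a 64-bit MOV, because it would be read back as 0xFFFFFFFFFFFFFFFF.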
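
The sketch below, in Python with a hypothetical helper encode_mov_r64_imm, applies both tasks to MOV with a 64-bit destination register. It prefers the 7-byte REX.W C7 /0 form when the value survives sign extension from 32 bits and falls back to the 10-byte REX.W B8+r form with a full 64-bit immediate otherwise. Real assemblers often go further, for example emitting the zero-extending mov r32, imm32 (B8+r id) for values between 0 and 0xFFFFFFFF, which this sketch does not attempt.

```python
import struct


def encode_mov_r64_imm(reg: int, value: int) -> bytes:
    """Encode MOV r64, imm, choosing between two forms.

    - REX.W C7 /0 id: 32-bit immediate, sign-extended to 64 bits (7 bytes)
    - REX.W B8+r io:  full 64-bit immediate (10 bytes)
    reg is 0-15 (RAX=0 ... R15=15).
    """
    if not -2**63 <= value < 2**64:
        raise ValueError("immediate does not fit in 64 bits")
    rex = 0x48 | ((reg >> 3) & 1)       # REX.W, plus REX.B for R8-R15
    if -2**31 <= value < 2**31:
        # The value survives sign extension from 32 bits unchanged.
        modrm = 0xC0 | (reg & 7)         # Mod=11, Reg=/0, R/M=reg
        return bytes([rex, 0xC7, modrm]) + struct.pack("<i", value)
    # Otherwise store all 8 bytes, little-endian, as an unsigned bit pattern.
    return bytes([rex, 0xB8 + (reg & 7)]) + struct.pack("<Q", value & 0xFFFF_FFFF_FFFF_FFFF)
```

encode_mov_r64_imm(0, -1) gives 48 C7 C0 FF FF FF FF, while encode_mov_r64_imm(0, 0xFFFFFFFF) needs the long form 48 B8 FF FF FF FF 00 00 00 00.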

Summary

An x86-64 instruction's machine code is a carefully structured concatenation of:

  1. Optional legacy prefixes that modify behavior and operand/address sizes.

  2. An optional REX prefix extending register encoding and operand size to 64 bits.

  3. The opcode bytes that identify the instruction.

  4. A ModR/M byte (when the instruction has one) specifying operand addressing modes and registers.

  5. An optional SIB byte for complex scaled addressing.

  6. An optional displacement value extending memory addressing.

  7. Optional immediate operand data encoding constants.
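
Because the components always appear in this fixed order, the final assembly step reduces to concatenation plus a length check against the 15-byte architectural limit. The Python dataclass below is a hypothetical container, not part of any particular assembler; the caller is assumed to have encoded each field already.

```python
from dataclasses import dataclass

MAX_INSN_LEN = 15  # architectural limit on x86-64 instruction length


@dataclass
class Encoding:
    """Hypothetical container for the fields of one instruction, in encoding order."""
    prefixes: bytes = b""
    rex: bytes = b""
    opcode: bytes = b""
    modrm: bytes = b""
    sib: bytes = b""
    disp: bytes = b""
    imm: bytes = b""

    def to_bytes(self) -> bytes:
        out = (self.prefixes + self.rex + self.opcode + self.modrm
               + self.sib + self.disp + self.imm)
        if len(out) > MAX_INSN_LEN:
            raise ValueError(f"instruction is {len(out)} bytes; the limit is {MAX_INSN_LEN}")
        return out
```

For example, lock add dword [rax], 1 becomes Encoding(prefixes=b"\xf0", opcode=b"\x83", modrm=b"\x00", imm=b"\x01"), which serializes to F0 83 00 01, and add qword [r8 + 0x10], 5 (REX 0x49, opcode 0x83, ModR/M 0x40, disp8 0x10, imm8 0x05) serializes to 49 83 40 10 05.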

Mastery of these components and their interrelations is essential for accurate encoding and decoding of instructions. For assembler design, this means implementing precise logic for selecting, encoding, and validating each field to ensure the binary output precisely reflects the intended operations.

 
