Article by Ayman Alheraki on January 11 2026 10:37 AM
In the architecture of an x86-64 assembler, the opcode table forms the backbone for instruction encoding and decoding. To correctly translate assembly mnemonics into machine code, the assembler must maintain a comprehensive and fully expanded listing of opcodes, accounting for all possible instruction variants.
A complete opcode listing includes every base opcode along with its associated variants. These variants emerge due to differences in operand types, operand sizes, addressing modes, prefixes, and instruction extensions (such as SIMD or AVX). The assembler uses this expanded listing to:
Identify the precise opcode bytes corresponding to an instruction variant.
Apply correct prefixes and operand encodings.
Validate operand compatibility.
Enable accurate disassembly and debugging features.
Maintaining this expanded table is critical for generating correct and optimized machine code, particularly in the complex x86-64 environment with its vast instruction set.
Instruction variants arise primarily from:
Operand Size Variations: Many instructions support multiple operand sizes, e.g., 8-bit (byte), 16-bit (word), 32-bit (double word), and 64-bit (quad word). The operand-size override prefix (0x66) and the REX.W bit select the 16-bit and 64-bit forms without changing the opcode byte, while 8-bit forms usually have an opcode of their own.
Register vs. Memory Operands: Instructions often have distinct encodings depending on whether operands are registers, memory addresses, or immediate values.
Addressing Modes: Different modes such as direct, indirect, indexed, or based addressing require variant encodings in ModR/M and SIB bytes.
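Immediate Operand Sizes: Immediate values may be encoded in 8, 16, or 32 bits depending on instruction and operand size; a full 64-bit immediate is accepted only by MOV r64, imm64.
Segment Overrides and Lock Prefixes: Certain instructions accept segment override prefixes or the LOCK prefix (0xF0). These are extra bytes placed before the opcode rather than different opcodes, but the table must record which instructions permit them.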
Instruction Extensions: Extended instruction sets like SSE, AVX, AVX-512 introduce new opcode variants with additional prefixes (VEX, EVEX), wider operands, and masking features.
Opcode Maps and Escape Codes: Some instructions use escape sequences (0x0F, 0x0F 0x38, 0x0F 0x3A) to select additional opcode maps, extending the opcode space and creating more variants.
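As a rough illustration of how operand size alone multiplies table rows, the sketch below expands one "r/m, r" template into its four size variants for 64-bit mode. The `Variant` type and the `expand_operand_sizes` helper are hypothetical names invented for this example; only the ADD opcodes (0x00 and 0x01) and the prefix values come from the instruction set itself.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Variant:
    form: str        # human-readable operand pattern
    prefixes: bytes  # bytes emitted before the opcode
    opcode: int      # primary opcode byte


def expand_operand_sizes(mnemonic: str, byte_opcode: int, full_opcode: int) -> List[Variant]:
    """Expand one 'r/m, r' template into 8/16/32/64-bit rows (64-bit mode)."""
    return [
        Variant(f"{mnemonic} r/m8, r8", b"", byte_opcode),        # separate byte opcode
        Variant(f"{mnemonic} r/m16, r16", b"\x66", full_opcode),  # operand-size override
        Variant(f"{mnemonic} r/m32, r32", b"", full_opcode),      # default operand size
        Variant(f"{mnemonic} r/m64, r64", b"\x48", full_opcode),  # REX.W
    ]


for v in expand_operand_sizes("ADD", 0x00, 0x01):
    prefix_text = " ".join(f"{b:02X}" for b in v.prefixes)
    print(f"{v.form:<18} prefixes=[{prefix_text}] opcode={v.opcode:02X}")
```

A real table would also need rows for memory operands, immediates, and extended registers; this sketch covers only the operand-size axis.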
Each entry in the opcode table typically contains:
Mnemonic: Instruction name with operands specified.
Opcode Bytes: One or more bytes representing the base opcode and possible escape codes.
Operands: Detailed operand types and constraints (register, immediate, memory, segment).
Prefixes: Required prefixes, including operand size override, REX, VEX, or EVEX prefixes.
ModR/M and SIB Information: Indicates if ModR/M or SIB bytes are needed and their encoding rules.
Encoding Notes: Special handling or exceptions (e.g., opcode extensions, reserved fields).
Instruction Flags: Indications of privileged instructions, side effects, or special behavior.
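One possible way to hold these fields is a small record type, sketched here under the assumption that operands are matched as plain strings. The `OpcodeEntry` class, its field names, and `find_entry` are hypothetical and chosen for illustration; the opcode values in the sample rows are standard encodings.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class OpcodeEntry:
    mnemonic: str
    operands: Tuple[str, ...]        # operand types, e.g. ("r/m32", "r32")
    opcode: bytes                    # escape bytes followed by the primary opcode
    prefixes: bytes = b""            # mandatory legacy prefixes, e.g. b"\x66"
    rex_w: bool = False              # True when REX.W is required
    has_modrm: bool = False          # True when a ModR/M byte follows the opcode
    modrm_ext: Optional[int] = None  # /digit opcode extension stored in ModR/M.reg
    notes: str = ""                  # encoding notes
    privileged: bool = False         # instruction flag


# A handful of sample rows; a real table would hold thousands.
TABLE = [
    OpcodeEntry("MOV", ("r/m32", "r32"), b"\x89", has_modrm=True),
    OpcodeEntry("MOV", ("r32", "r/m32"), b"\x8B", has_modrm=True),
    OpcodeEntry("NOT", ("r/m32",), b"\xF7", has_modrm=True, modrm_ext=2, notes="F7 /2"),
    OpcodeEntry("MOVZX", ("r32", "r/m8"), b"\x0F\xB6", has_modrm=True),
    OpcodeEntry("HLT", (), b"\xF4", privileged=True),
]


def find_entry(mnemonic: str, operands: Sequence[str]) -> Optional[OpcodeEntry]:
    """Return the first entry matching the mnemonic and operand pattern, if any."""
    wanted = tuple(operands)
    for entry in TABLE:
        if entry.mnemonic == mnemonic and entry.operands == wanted:
            return entry
    return None


print(find_entry("NOT", ["r/m32"]))
```

A linear scan keeps the sketch short; a production assembler would more likely index entries by mnemonic.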
Consider the instruction MOV. Its opcode expands into many variants depending on source and destination operands:
Register to Register: MOV r32, r32 encoded with opcode 0x89 and ModR/M byte specifying registers.
Immediate to Register: MOV r64, imm64 uses a REX.W prefix and opcode 0xB8 + register index, followed by the 64-bit immediate value.
Memory to Register: MOV r32, [mem] uses opcode 0x8B plus addressing bytes.
Register to Memory: MOV [mem], r32 uses opcode 0x89 plus addressing bytes.
Byte Variants: MOV r8, r8 with opcode 0x88 or 0x8A depending on direction.
Segment Registers: MOV sr, r16 uses opcode 0x8E, and the reverse direction (MOV r16, sr) uses opcode 0x8C.
Each variant requires specific opcode bytes, prefixes, and operand encodings, all of which must be represented in the opcode table.
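To make the MOV variants above concrete, here is a minimal sketch that encodes four of them. It assumes only the eight legacy registers and, for memory operands, plain register-indirect addressing with no displacement. The function names are hypothetical, and bases that need a SIB byte or displacement (rsp, rbp) are rejected rather than handled.

```python
import struct

REG32 = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3, "esp": 4, "ebp": 5, "esi": 6, "edi": 7}
REG64 = {"rax": 0, "rcx": 1, "rdx": 2, "rbx": 3, "rsp": 4, "rbp": 5, "rsi": 6, "rdi": 7}


def modrm(mod: int, reg: int, rm: int) -> int:
    """Pack the three ModR/M fields into one byte."""
    return (mod << 6) | (reg << 3) | rm


def simple_base(base: str) -> int:
    """Return the r/m value for [base]; reject bases needing SIB or displacement."""
    rm = REG64[base]
    if rm in (4, 5):
        raise ValueError(f"[{base}] needs a SIB byte or displacement (not handled here)")
    return rm


def mov_r32_r32(dst: str, src: str) -> bytes:
    # 89 /r: MOV r/m32, r32 (destination in r/m, source in reg), mod=11
    return bytes([0x89, modrm(0b11, REG32[src], REG32[dst])])


def mov_r32_mem(dst: str, base: str) -> bytes:
    # 8B /r: MOV r32, r/m32 with a [base] memory operand, mod=00
    return bytes([0x8B, modrm(0b00, REG32[dst], simple_base(base))])


def mov_mem_r32(base: str, src: str) -> bytes:
    # 89 /r: MOV r/m32, r32 with a [base] memory operand, mod=00
    return bytes([0x89, modrm(0b00, REG32[src], simple_base(base))])


def mov_r64_imm64(dst: str, value: int) -> bytes:
    # REX.W + (B8 + register index), then the immediate in little-endian order
    return bytes([0x48, 0xB8 + REG64[dst]]) + struct.pack("<Q", value & 0xFFFFFFFFFFFFFFFF)


print(mov_r32_r32("eax", "ebx").hex(" "))                   # 89 d8
print(mov_r32_mem("eax", "rbx").hex(" "))                   # 8b 03
print(mov_mem_r32("rbx", "eax").hex(" "))                   # 89 03
print(mov_r64_imm64("rax", 0x1122334455667788).hex(" "))    # 48 b8 88 77 66 55 44 33 22 11
```

Note that the register-to-register and register-to-memory forms share opcode 0x89; only the mod field of the ModR/M byte distinguishes them.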
In x86-64, instruction variants heavily depend on prefix bytes:
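Operand-Size Override Prefix (0x66): Switches the operand size from the 32-bit default to 16-bit, both in 64-bit mode and in 32-bit protected mode.
REX Prefix (0x40–0x4F): Its four low bits are W (selects 64-bit operand size), R (extends the ModR/M reg field), X (extends the SIB index field), and B (extends the ModR/M r/m field, the SIB base, or a register encoded in the opcode byte). The R, X, and B bits give access to registers R8–R15, and the presence of any REX prefix makes SPL, BPL, SIL, and DIL addressable as byte registers. Each bit changes how the following opcode and ModR/M bytes are interpreted, multiplying the number of variants the table must describe.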
VEX/EVEX Prefixes: Used in SIMD/AVX instructions to encode operand size, vector length, masking, and instruction variants.
The opcode table must encode these prefix dependencies explicitly, ensuring the assembler emits correct prefix bytes and sequences.
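The prefix rules can be sketched for a single form, MOV r/m, r (opcode 0x89) with register-direct operands. The helper names `rex` and `encode_mov_rm_r` are hypothetical, and registers are passed as their numbers 0 to 15 rather than by name to keep the example short. The 0x66 prefix, when present, is emitted before REX, since REX must immediately precede the opcode.

```python
def rex(w: int = 0, r: int = 0, x: int = 0, b: int = 0) -> int:
    """Build a REX byte: 0100WRXB."""
    return 0x40 | (w << 3) | (r << 2) | (x << 1) | b


def encode_mov_rm_r(size: int, dst: int, src: int) -> bytes:
    """Encode MOV dst, src (both registers 0..15) for size 16, 32, or 64."""
    if size not in (16, 32, 64):
        raise ValueError("size must be 16, 32, or 64")
    out = bytearray()
    if size == 16:
        out.append(0x66)              # operand-size override comes first
    w = 1 if size == 64 else 0
    r = src >> 3                      # high bit of the source register (ModR/M.reg)
    b = dst >> 3                      # high bit of the destination register (ModR/M.r/m)
    if w or r or b:
        out.append(rex(w, r, 0, b))   # REX only when some bit is needed
    out.append(0x89)                  # MOV r/m, r
    out.append(0xC0 | ((src & 7) << 3) | (dst & 7))  # mod=11, reg=src, r/m=dst
    return bytes(out)


print(encode_mov_rm_r(32, 0, 3).hex(" "))  # mov eax, ebx  -> 89 d8
print(encode_mov_rm_r(16, 0, 3).hex(" "))  # mov ax, bx    -> 66 89 d8
print(encode_mov_rm_r(64, 0, 3).hex(" "))  # mov rax, rbx  -> 48 89 d8
print(encode_mov_rm_r(64, 8, 0).hex(" "))  # mov r8, rax   -> 49 89 c0
print(encode_mov_rm_r(32, 0, 8).hex(" "))  # mov eax, r8d  -> 44 89 c0
```

The opcode byte is 0x89 in every case; all of the variation comes from the prefixes and the ModR/M byte, which is why the table has to record prefix requirements per variant.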
Due to the large number of variants (often thousands for modern x86-64), maintaining opcode tables manually is error-prone and inefficient. Contemporary assembler designs employ:
Data-Driven Approaches: Opcode entries defined as data structures, enabling automated parsing and generation.
Specification-Based Generation: Using instruction set specifications (in XML, JSON, or domain-specific languages) to generate opcode tables and encoding logic automatically.
Hierarchical Encoding: Grouping instructions by opcode maps and variant classes to reduce redundancy.
Testing Frameworks: Automated verification against official instruction set references to validate opcode correctness.
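A specification-driven generator with a built-in check might look roughly like the following. The JSON layout (`mnemonic`, `operands`, `opcode`, `byte_opcode`, `sizes`) is an invented format for this sketch, not a real specification schema, and the reference vectors are a small hand-written sample standing in for an official reference.

```python
import json

# Hypothetical compact spec: one record per instruction template.
SPEC = """
[
  {"mnemonic": "MOV", "operands": ["r/m", "r"], "opcode": "89", "byte_opcode": "88",
   "sizes": [8, 16, 32, 64]},
  {"mnemonic": "XOR", "operands": ["r/m", "r"], "opcode": "31", "byte_opcode": "30",
   "sizes": [8, 16, 32, 64]}
]
"""

SIZE_PREFIX = {16: "66", 64: "48"}  # 0x66 override and REX.W; 8- and 32-bit need none


def generate_table(spec_text: str) -> dict:
    """Expand each template into one row per operand size (64-bit mode)."""
    table = {}
    for item in json.loads(spec_text):
        for size in item["sizes"]:
            opcode = item["byte_opcode"] if size == 8 else item["opcode"]
            operands = tuple(f"{op}{size}" for op in item["operands"])
            encoding = bytes.fromhex(SIZE_PREFIX.get(size, "") + opcode)
            table[(item["mnemonic"], operands)] = encoding
    return table


def verify(table: dict, expected: dict) -> list:
    """Return the keys whose generated bytes differ from the reference bytes."""
    return [key for key, hexbytes in expected.items()
            if table.get(key) != bytes.fromhex(hexbytes)]


# Hand-written reference vectors (prefix and opcode bytes only, no ModR/M).
EXPECTED = {
    ("MOV", ("r/m8", "r8")): "88",
    ("MOV", ("r/m16", "r16")): "66 89",
    ("MOV", ("r/m64", "r64")): "48 89",
    ("XOR", ("r/m32", "r32")): "31",
}

table = generate_table(SPEC)
print(len(table), "rows generated; mismatches:", verify(table, EXPECTED))
```

Two template records expand to eight table rows here, which hints at how a compact specification can grow into thousands of entries for the full instruction set.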
The full expanded opcode listing with instruction variants is essential for the precise encoding of x86-64 assembly instructions. It captures every permutation of operand size, addressing mode, prefixes, and instruction extensions. An effective opcode table is comprehensive, well-structured, and supported by automated tools to handle the scale and complexity of the x86-64 ISA. Mastery of this aspect directly impacts assembler correctness, efficiency, and maintainability.