Article by Ayman Alheraki on January 11 2026 10:37 AM

Designing an x86-64 Assembler Categorized Instruction Reference

Data Movement

1. Data Movement Instructions Overview

Data movement instructions are fundamental to the x86-64 architecture, enabling transfer of data between registers, memory, and I/O ports. These instructions form the backbone of all computation by facilitating operand setup, intermediate result handling, and final data storage.

In the x86-64 ISA, data movement encompasses a broad range of instructions that differ in operand types, sizes, addressing modes, and semantics. This section gives an overview of the categories and key instructions related to data movement.

2. Register-to-Register Moves

  • The simplest and most common data movement operations involve moving data between general-purpose registers (GPRs) or SIMD registers.

  • The core instruction is MOV, which copies data from the source operand to the destination operand.

  • For a register-to-register MOV, both operands must have the same size (e.g., MOV RAX, RBX copies the full 64-bit value from RBX to RAX).

  • Operand sizes vary from 8-bit (AL, BL), 16-bit (AX, BX), 32-bit (EAX, EBX), to 64-bit (RAX, RBX).

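  • Partial-register writes behave differently depending on size: writing a 32-bit register (e.g., MOV EAX, EBX) zero-extends the result into the full 64-bit register, whereas 8-bit and 16-bit writes leave the upper bits of the destination unchanged. This is CPU behavior; the assembler only has to select the correct operand-size encoding.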

Notable behavior:

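  • The general-purpose MOV cannot name XMM, YMM, or ZMM registers; moves between SIMD registers use dedicated instructions (e.g., MOVAPS, VMOVDQA). Alignment requirements apply only when one operand is in memory.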
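
As a rough illustration of what the assembler emits for a 64-bit register-to-register move, the sketch below encodes MOV with opcode 89 /r (MOV r/m64, r64). The table name REG64 and the function encode_mov_r64_r64 are hypothetical names chosen for this example, not part of any existing assembler. The alternative opcode 8B /r, with the ModR/M roles swapped, is equally valid.

```python
# Hypothetical register table: name -> 4-bit register number.
REG64 = {name: i for i, name in enumerate(
    ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
     "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"])}


def encode_mov_r64_r64(dst: str, src: str) -> bytes:
    """Encode MOV dst, src for 64-bit GPRs using opcode 89 /r."""
    d, s = REG64[dst], REG64[src]
    # REX = 0100WRXB. W=1 selects 64-bit operand size,
    # R extends ModR/M.reg (source), B extends ModR/M.rm (destination).
    rex = 0x48 | ((s >> 3) << 2) | (d >> 3)
    # ModR/M: mod=11 (register direct), reg=source, rm=destination.
    modrm = 0xC0 | ((s & 7) << 3) | (d & 7)
    return bytes([rex, 0x89, modrm])


print(encode_mov_r64_r64("rax", "rbx").hex(" "))  # 48 89 d8
```

Registers R8 to R15 only change the REX.R and REX.B bits; the ModR/M byte still carries the low three bits of each register number.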

3. Register-to-Memory and Memory-to-Register Moves

  • The MOV instruction is also used to transfer data between memory and registers. At most one operand may be a memory reference; MOV has no memory-to-memory form.

  • The source or destination operand can be a memory address specified through complex addressing modes (base register, index register, scale, displacement).

  • Memory operands can be byte-sized, word-sized, doubleword, quadword, or vector types, depending on the instruction.

Examples:

  • MOV RAX, [RBX+8] loads a 64-bit value from the memory address RBX + 8 into RAX.

  • MOV [RCX], EDX stores the 32-bit contents of EDX into the memory location pointed to by RCX.

Key considerations:

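  • For general-purpose MOV, alignment of the memory operand affects performance but not correctness. Some SIMD instructions are stricter: MOVAPS, for example, raises a general-protection fault if its memory operand is not aligned to the vector width.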

  • The assembler must encode correct ModR/M and SIB bytes to support all addressing modes.
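
The ModR/M and SIB rules can be sketched for the simplest memory form, [base + disp] with no index register. This is a minimal sketch, not a complete addressing-mode encoder: the function names encode_base_disp and mov_load64 are hypothetical, registers are passed as numbers 0 to 15, and RIP-relative and scaled-index forms are left out. It does cover two special cases that often trip up new assembler writers: RSP/R12 as a base always need a SIB byte, and RBP/R13 as a base always need an explicit displacement.

```python
import struct


def encode_base_disp(reg: int, base: int, disp: int = 0):
    """Return (REX R/B bits, ModR/M [+ SIB] [+ disp] bytes) for [base + disp]."""
    rex_bits = ((reg >> 3) << 2) | (base >> 3)
    rm = base & 7
    # mod=00 with rm=101 means RIP-relative, so RBP/R13 need a disp8 of 0.
    if disp == 0 and rm != 5:
        mod, disp_bytes = 0, b""
    elif -128 <= disp <= 127:
        mod, disp_bytes = 1, struct.pack("<b", disp)
    else:
        mod, disp_bytes = 2, struct.pack("<i", disp)
    out = bytes([(mod << 6) | ((reg & 7) << 3) | rm])
    # rm=100 announces a SIB byte, so RSP/R12 as base use SIB 0x24
    # (scale=1, index=100 meaning "no index", base=100).
    if rm == 4:
        out += bytes([0x24])
    return rex_bits, out + disp_bytes


def mov_load64(dst: int, base: int, disp: int = 0) -> bytes:
    """Encode MOV r64, [base + disp] using REX.W + 8B /r."""
    rex_bits, tail = encode_base_disp(dst, base, disp)
    return bytes([0x48 | rex_bits, 0x8B]) + tail


print(mov_load64(0, 3, 8).hex(" "))  # MOV RAX, [RBX+8] -> 48 8b 43 08
```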

4. Immediate to Register/Memory Moves

  • The MOV instruction supports moving immediate constant values directly into registers or memory locations.

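  • Immediate sizes generally match the operand size for 8-, 16-, and 32-bit operands. For 64-bit operands, only the register form MOV r64, imm64 carries a full 64-bit immediate; MOV r/m64, imm32 takes a 32-bit immediate that the CPU sign-extends to 64 bits, and there is no form that stores a 64-bit immediate directly to memory.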

Examples:

  • MOV RAX, 0x123456789ABCDEF0 loads a 64-bit immediate into RAX.

  • MOV BYTE PTR [RDX], 0xFF stores an 8-bit immediate value to memory.

  • No separate zero- or sign-extension instruction is needed for immediates: the constant is encoded in the instruction itself, and where a 32-bit immediate is paired with a 64-bit operand the CPU sign-extends it automatically.
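
One way an assembler might pick the shortest encoding for a register immediate load is sketched below. The function name encode_mov_r64_imm and the selection policy are assumptions made for illustration; real assemblers differ in whether they apply this shortening automatically. The three candidate encodings themselves (B8+rd with imm32, REX.W C7 /0 with imm32, and REX.W B8+rd with imm64) are standard.

```python
import struct


def encode_mov_r64_imm(reg: int, value: int) -> bytes:
    """Encode a load of `value` into 64-bit register number `reg` (0..15)."""
    value &= 0xFFFFFFFFFFFFFFFF          # work with the 64-bit bit pattern
    rex_b, low = reg >> 3, reg & 7
    if value <= 0xFFFFFFFF:
        # MOV r32, imm32: the 32-bit write zero-extends into the full register.
        prefix = bytes([0x41]) if rex_b else b""
        return prefix + bytes([0xB8 + low]) + struct.pack("<I", value)
    if value >= 0xFFFFFFFF80000000:
        # MOV r/m64, imm32 (REX.W C7 /0): the CPU sign-extends the immediate.
        return (bytes([0x48 | rex_b, 0xC7, 0xC0 | low])
                + struct.pack("<I", value & 0xFFFFFFFF))
    # MOV r64, imm64 (REX.W B8+rd): the only form with a full 64-bit immediate.
    return bytes([0x48 | rex_b, 0xB8 + low]) + struct.pack("<Q", value)


print(encode_mov_r64_imm(0, 0x123456789ABCDEF0).hex(" "))
# 48 b8 f0 de bc 9a 78 56 34 12
```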

5. Specialized Data Movement Instructions

Beyond the basic MOV, the x86-64 ISA provides specialized instructions for specific use cases:

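  • MOVZX (Move with Zero-Extend): Moves data from a smaller 8-bit or 16-bit source operand to a larger destination register, zero-filling the upper bits. There is no MOVZX form with a 32-bit source, because a plain 32-bit MOV already zero-extends into the 64-bit register.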

    Example: MOVZX RAX, BYTE PTR [RBX] loads an 8-bit value from memory and zero-extends to 64 bits.

  • MOVSX (Move with Sign-Extend): Similar to MOVZX, but sign-extends the source operand. For a 32-bit source and a 64-bit destination the mnemonic is MOVSXD.

    Example: MOVSX RAX, WORD PTR [RCX] loads a 16-bit signed value and sign-extends it to 64 bits.

  • LEA (Load Effective Address): Calculates the address of a memory operand and loads it into a register without accessing memory.

    Example: LEA RAX, [RBX+RCX*4+16] computes RBX + RCX*4 + 16 and stores the result in RAX.

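  • XLAT (Translate Byte): Replaces AL with the byte at address RBX + AL, treating AL as an unsigned index into a lookup table whose base address is in RBX.

  • MOVS (Move String): Moves data from the memory address pointed to by RSI to the memory address pointed to by RDI. The variants MOVSB, MOVSW, MOVSD, and MOVSQ move bytes, words, doublewords, or quadwords respectively, and automatically update both pointers according to the Direction Flag (incrementing when DF = 0, decrementing when DF = 1). A REP prefix repeats the move RCX times. Because MOVSD is also the mnemonic of the SSE2 scalar double-precision move, the assembler must disambiguate the two by their operands.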
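
The arithmetic behind MOVZX, MOVSX, and LEA can be modeled in a few lines. The helper names movzx, movsx, and lea below are hypothetical and only mimic the result written to the destination register; they are a sketch of the semantics, useful for instance when an assembler or its test suite needs to reason about expected values, and not an emulator.

```python
MASK64 = (1 << 64) - 1


def movzx(value: int, src_bits: int) -> int:
    """Zero-extend: keep the low src_bits, upper bits become 0."""
    return value & ((1 << src_bits) - 1)


def movsx(value: int, src_bits: int, dst_bits: int = 64) -> int:
    """Sign-extend: replicate the top bit of the source into the upper bits."""
    value &= (1 << src_bits) - 1
    if value >> (src_bits - 1):          # sign bit set -> negative value
        value -= 1 << src_bits
    return value & ((1 << dst_bits) - 1)


def lea(base: int, index: int, scale: int, disp: int) -> int:
    """Effective address base + index*scale + disp, wrapped to 64 bits."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4, or 8")
    return (base + index * scale + disp) & MASK64


print(hex(movzx(0xFF, 8)))         # 0xff
print(hex(movsx(0xFF, 8)))         # 0xffffffffffffffff
print(hex(lea(0x1000, 3, 4, 16)))  # 0x101c
```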

6. SIMD Data Movement Instructions

  • Data movement in SIMD registers is handled with instructions specific to vector data.

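  • MOVAPS/MOVUPS: Move packed single-precision floating-point data between XMM registers, or between an XMM register and memory. MOVAPS requires a 16-byte-aligned memory operand; MOVUPS does not.

  • VMOVDQA/VMOVDQU: Move aligned/unaligned packed integer data in XMM or YMM registers (VEX encoding). The AVX-512 forms that reach ZMM registers use element-size-specific mnemonics such as VMOVDQA32, VMOVDQA64, VMOVDQU8, and VMOVDQU64 (EVEX encoding).

  • VMOVAPS/VMOVUPS (AVX): VEX-encoded versions of MOVAPS/MOVUPS operate on XMM and YMM registers; EVEX-encoded versions (AVX-512) extend them to ZMM registers.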

  • Broadcast Moves: Instructions like VBROADCASTSS replicate a single scalar value across all elements of a vector register.

  • The assembler must recognize these instructions and encode appropriate prefixes for AVX and AVX-512 support.
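
To make the prefix work concrete, here is a minimal sketch that encodes the register-to-register load form of VMOVAPS (VEX, opcode 0F 28 /r) for XMM and YMM registers. The function name encode_vmovaps_reg is hypothetical, and EVEX (ZMM) encoding is deliberately omitted. The sketch uses the 2-byte VEX prefix (C5) when possible and falls back to the 3-byte form (C4) when the source register is 8 or higher; a production assembler might instead switch to the store opcode 29 to keep the shorter prefix.

```python
def encode_vmovaps_reg(dst: int, src: int, bits: int = 256) -> bytes:
    """Encode VMOVAPS dst, src for XMM (bits=128) or YMM (bits=256) numbers 0..15."""
    if bits not in (128, 256):
        raise ValueError("this sketch only covers VEX-encoded XMM/YMM forms")
    if not (0 <= dst <= 15 and 0 <= src <= 15):
        raise ValueError("VEX can only address registers 0..15")
    vex_l = 1 if bits == 256 else 0
    not_r = (dst >> 3) ^ 1               # VEX stores R, X, B inverted
    not_b = (src >> 3) ^ 1
    modrm = 0xC0 | ((dst & 7) << 3) | (src & 7)
    # Low 7 bits shared by both forms: vvvv=1111 (unused), L, pp=00 (no prefix).
    tail = (0b1111 << 3) | (vex_l << 2)
    if not_b:
        # 2-byte VEX: C5, then [~R vvvv L pp]. Only valid when X, B, W are unused.
        return bytes([0xC5, (not_r << 7) | tail, 0x28, modrm])
    # 3-byte VEX: C4, [~R ~X ~B mmmmm=00001 (0F map)], [W=0 vvvv L pp].
    byte1 = (not_r << 7) | (1 << 6) | (not_b << 5) | 0x01
    return bytes([0xC4, byte1, tail, 0x28, modrm])


print(encode_vmovaps_reg(0, 1).hex(" "))  # VMOVAPS YMM0, YMM1 -> c5 fc 28 c1
```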

7. I/O Port Data Movement

  • The IN and OUT instructions move data between the CPU and hardware I/O ports.

  • These are primarily used for low-level hardware communication.

  • The port number is given either as an 8-bit immediate (ports 0 to 255) or in the DX register (ports 0 to 65535). The data is always transferred through AL, AX, or EAX, so the data size is 8, 16, or 32 bits.
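
Because IN and OUT have only a handful of encodings, they make a compact example of operand validation in an assembler. The sketch below uses hypothetical function names (encode_in, encode_out) and assumes 64-bit mode, where the 16-bit forms need the 66 operand-size prefix.

```python
def _port_io(width: int, port, imm_opcode: int, dx_opcode: int) -> bytes:
    if width not in (8, 16, 32):
        raise ValueError("port I/O transfers 8, 16, or 32 bits")
    prefix = b"\x66" if width == 16 else b""   # operand-size override for AX
    wide = 0 if width == 8 else 1              # low opcode bit: byte vs. word/dword
    if port is None:                           # port number taken from DX
        return prefix + bytes([dx_opcode | wide])
    if not 0 <= port <= 0xFF:
        raise ValueError("immediate port must fit in 8 bits; use DX instead")
    return prefix + bytes([imm_opcode | wide, port])


def encode_in(width: int, port=None) -> bytes:
    """IN AL/AX/EAX, imm8 when port is given; IN AL/AX/EAX, DX otherwise."""
    return _port_io(width, port, 0xE4, 0xEC)


def encode_out(width: int, port=None) -> bytes:
    """OUT imm8, AL/AX/EAX when port is given; OUT DX, AL/AX/EAX otherwise."""
    return _port_io(width, port, 0xE6, 0xEE)


print(encode_in(8, 0x60).hex(" "))  # IN AL, 0x60 -> e4 60
print(encode_out(32).hex(" "))      # OUT DX, EAX -> ef
```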

8. Atomic Data Movement

  • Atomic moves are critical in multithreaded environments to avoid data races.

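  • The XCHG instruction swaps the contents of two operands. When one operand is in memory, the exchange is always performed atomically: the processor applies locking semantics whether or not a LOCK prefix is present.

  • A naturally aligned MOV load or store of up to 64 bits is itself atomic, but MOV cannot perform an atomic read-modify-write. Operations that must read and update memory in one indivisible step use XCHG or a LOCK-prefixed instruction such as CMPXCHG or XADD.

  • The assembler should accept the LOCK prefix only on instructions that permit it, and only when the destination is a memory operand; any other use raises an invalid-opcode exception at run time.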
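
A simple validation pass for the LOCK prefix might look like the sketch below. The names LOCKABLE, lock_allowed, and implicitly_locked are hypothetical; the mnemonic set follows the list of lockable instructions documented for the LOCK prefix in the Intel architecture manuals.

```python
# Instructions that accept a LOCK prefix when their destination is memory.
LOCKABLE = frozenset({
    "add", "adc", "and", "btc", "btr", "bts", "cmpxchg", "cmpxchg8b",
    "cmpxchg16b", "dec", "inc", "neg", "not", "or", "sbb", "sub",
    "xor", "xadd", "xchg",
})


def lock_allowed(mnemonic: str, dest_is_memory: bool) -> bool:
    """True if `LOCK mnemonic ...` is a legal combination."""
    return dest_is_memory and mnemonic.lower() in LOCKABLE


def implicitly_locked(mnemonic: str, has_memory_operand: bool) -> bool:
    """XCHG with a memory operand is atomic even without a LOCK prefix."""
    return mnemonic.lower() == "xchg" and has_memory_operand


print(lock_allowed("add", True))    # True
print(lock_allowed("mov", True))    # False
print(implicitly_locked("xchg", True))  # True
```

An assembler could call lock_allowed while parsing prefixes and report an error at assembly time instead of letting the program fault at run time.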

9. Instruction Encoding Notes

  • Data movement instructions use complex encoding involving opcode bytes, ModR/M bytes, SIB bytes (for scale-index-base addressing), and immediate values.

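  • Proper handling of REX prefixes in 64-bit mode is essential to access the extended registers (R8-R15), the byte registers SPL, BPL, SIL, and DIL, and 64-bit operand sizes. A REX prefix must immediately precede the opcode, and when one is present the legacy high-byte registers AH, CH, DH, and BH cannot be encoded.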

  • AVX and AVX-512 instructions introduce VEX and EVEX prefixes, changing encoding significantly for SIMD data movement.

  • The assembler design must modularize encoding logic to correctly produce valid machine code for all addressing modes and operand combinations.
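
One possible way to modularize the encoding logic is to build each instruction from separately computed fields and serialize them in the fixed order the architecture requires: legacy prefixes, REX, opcode, ModR/M, SIB, displacement, immediate. The class Encoding and the helpers make_modrm and make_sib are hypothetical names for this sketch; VEX and EVEX prefixes would need an additional field.

```python
from dataclasses import dataclass
from typing import Optional


def make_modrm(mod: int, reg: int, rm: int) -> int:
    """Pack the ModR/M byte from its three fields (low 3 bits of reg and rm)."""
    return (mod << 6) | ((reg & 7) << 3) | (rm & 7)


def make_sib(scale: int, index: int, base: int) -> int:
    """Pack the SIB byte; scale is the multiplier 1, 2, 4, or 8."""
    ss = {1: 0, 2: 1, 4: 2, 8: 3}[scale]
    return (ss << 6) | ((index & 7) << 3) | (base & 7)


@dataclass
class Encoding:
    legacy_prefixes: bytes = b""
    rex: Optional[int] = None        # None means "emit no REX prefix"
    opcode: bytes = b""
    modrm: Optional[int] = None
    sib: Optional[int] = None
    displacement: bytes = b""
    immediate: bytes = b""

    def to_bytes(self) -> bytes:
        """Serialize the fields in architectural order."""
        out = bytearray(self.legacy_prefixes)
        if self.rex is not None:
            out.append(self.rex)
        out += self.opcode
        for field in (self.modrm, self.sib):
            if field is not None:
                out.append(field)
        out += self.displacement + self.immediate
        return bytes(out)


# LEA RAX, [RBX+RCX*4+16]: REX.W, opcode 8D, mod=01 (disp8), rm=100 (SIB follows).
lea = Encoding(rex=0x48, opcode=b"\x8d",
               modrm=make_modrm(1, 0, 4),
               sib=make_sib(4, 1, 3),
               displacement=b"\x10")
print(lea.to_bytes().hex(" "))  # 48 8d 44 8b 10
```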

10. Summary

Data movement instructions are the essential building blocks of assembly programming on x86-64. Mastery of these instructions and their nuances—including size extensions, addressing modes, specialized SIMD variants, and atomic operations—is indispensable for designing a robust assembler.

An assembler must accurately support the rich and evolving set of data movement instructions while selecting compact, valid encodings for the CPU features and operating modes it targets. This foundation enables efficient program execution and correct data handling in x86-64 applications.

 
