Article by Ayman Alheraki on January 11 2026 10:37 AM
String instructions in the x86-64 ISA provide specialized operations designed to efficiently manipulate sequences of bytes or words in memory, often referred to as "strings." These instructions are optimized for bulk data operations such as copying, comparing, scanning, or modifying contiguous memory blocks. Although modern software often favors SIMD and optimized library functions for string handling, string instructions remain foundational for low-level programming, OS development, bootstrapping, and embedded systems.
String instructions leverage implicit operands via the RSI (source index), RDI (destination index), and RCX (counter) registers, operating on memory locations with automatic pointer increment or decrement depending on the direction flag (DF) in the RFLAGS register.
RSI (Source Index): Points to the source string location in memory.
RDI (Destination Index): Points to the destination string location in memory.
RCX (Count): Used as a repetition counter for operations, controlling the number of elements processed.
RFLAGS (Direction Flag - DF): Controls pointer increment (DF=0) or decrement (DF=1) during string operations.
MOVS (Move String) Family:
MOVSB — Move byte from [RSI] to [RDI].
MOVSW — Move word (2 bytes).
MOVSD — Move doubleword (4 bytes).
MOVSQ — Move quadword (8 bytes).
Behavior:
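Copies data element-by-element from source to destination, then adjusts RSI and RDI by the element size: upward when DF = 0, downward when DF = 1. When combined with the REP prefix (for example, REP MOVSB), the operation repeats RCX times for bulk memory copy. Note that the mnemonic MOVSD is shared with the SSE2 scalar double-precision move; the string form is the one written without operands.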
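The behavior described above can be modeled in a few lines of portable C. This is only a sketch of the semantics, not the hardware implementation: the `StringRegs` struct, the `rep_movs` function, and the idea of treating memory as a flat byte array indexed by register values are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical register file for the model: RSI and RDI are offsets
   into a flat byte array rather than real addresses. */
typedef struct {
    uint64_t rsi;   /* source offset */
    uint64_t rdi;   /* destination offset */
    uint64_t rcx;   /* remaining element count */
    int      df;    /* direction flag: 0 = forward, 1 = backward */
} StringRegs;

/* Model of REP MOVSB/MOVSW/MOVSD/MOVSQ; elem_size is 1, 2, 4, or 8.
   The caller must keep every accessed offset inside mem. */
void rep_movs(uint8_t *mem, StringRegs *r, size_t elem_size)
{
    while (r->rcx != 0) {                       /* RCX is tested first */
        memmove(mem + r->rdi, mem + r->rsi, elem_size);
        if (r->df) {                            /* DF = 1: step down */
            r->rsi -= elem_size;
            r->rdi -= elem_size;
        } else {                                /* DF = 0: step up */
            r->rsi += elem_size;
            r->rdi += elem_size;
        }
        r->rcx--;
    }
}
```

With RSI = 0, RDI = 8, RCX = 4, DF = 0 and an element size of 2, the model copies eight bytes and leaves RSI = 8, RDI = 16, RCX = 0, matching what REP MOVSW would leave in the registers.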
CMPS (Compare String) Family:
CMPSB, CMPSW, CMPSD, CMPSQ
Behavior:
Compares the byte/word/doubleword/quadword at [RSI] and [RDI].
Sets the status flags (ZF, CF, SF, OF, AF, PF) as if the element at [RDI] were subtracted from the element at [RSI]; neither operand is modified.
Automatically increments/decrements RSI and RDI according to DF.
Can be repeated with REPE/REPZ to compare until the first mismatch, or with REPNE/REPNZ to compare until the first matching pair; in both cases repetition also ends when RCX reaches zero.
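To make the stopping rule concrete, here is a small C model of REPE CMPSB with DF = 0. It is a sketch under stated assumptions: the `CmpsResult` struct and `repe_cmpsb` function are hypothetical names, only ZF and CF are modeled, and the flags reported for a zero count are a modeling choice (the real instruction leaves the flags untouched in that case).

```c
#include <stddef.h>
#include <stdint.h>

/* Outcome of the modeled REPE CMPSB. */
typedef struct {
    uint64_t rcx;  /* elements left uncompared */
    int      zf;   /* 1 if the last compared pair was equal */
    int      cf;   /* 1 if the last [RSI] byte was below the [RDI] byte */
} CmpsResult;

CmpsResult repe_cmpsb(const uint8_t *rsi, const uint8_t *rdi, uint64_t rcx)
{
    /* Assumption: with RCX = 0 nothing runs; report ZF = 1, CF = 0. */
    CmpsResult res = { rcx, 1, 0 };
    while (res.rcx != 0) {
        uint8_t a = *rsi++;          /* element at [RSI], then RSI += 1 */
        uint8_t b = *rdi++;          /* element at [RDI], then RDI += 1 */
        res.rcx--;                   /* RCX is decremented every iteration */
        res.zf = (a == b);           /* flags come from a - b */
        res.cf = (a < b);            /* unsigned borrow */
        if (!res.zf)
            break;                   /* REPE stops on the first mismatch */
    }
    return res;
}
```

Because RCX is decremented even on the mismatching iteration, the index of the first difference is the original count minus the final RCX minus one. A caller that sees ZF = 1 with RCX = 0 knows the two blocks were equal over the whole length.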
SCAS (Scan String) Family:
SCASB, SCASW, SCASD, SCASQ
Behavior:
Compares the value in AL, AX, EAX, or RAX with the memory byte/word/doubleword/quadword at [RDI].
Sets flags based on comparison.
Adjusts RDI per DF.
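When prefixed with REPNE/REPNZ, scans until an element equal to the accumulator is found; when prefixed with REPE/REPZ, scans until an element that differs from the accumulator is found. Either scan also stops when RCX decrements to zero.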
Commonly used for searching a value within a memory block.
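The search use case can be sketched in C as a model of REPNE SCASB with DF = 0. The function name `repne_scasb_index` and its convention of returning the buffer length when nothing matches are assumptions for illustration, not part of any library.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of: load AL with the value, RCX with the length, clear DF,
   then REPNE SCASB. Returns the index of the first matching byte,
   or the original length if no byte matched. */
size_t repne_scasb_index(const uint8_t *rdi, uint64_t rcx, uint8_t al)
{
    const uint64_t len = rcx;
    int zf = 0;   /* assumed clear on entry; untouched when RCX = 0 */

    while (rcx != 0) {
        zf = (al == *rdi);   /* compare AL with the byte at [RDI] */
        rdi++;               /* RDI advances even on the matching byte */
        rcx--;
        if (zf)
            break;           /* REPNE stops once ZF = 1 */
    }
    /* After a hit, RDI is one past the match, so index = len - rcx - 1. */
    return zf ? (size_t)(len - rcx - 1) : (size_t)len;
}
```

The detail worth noticing is that RDI ends one element past the match, which is why assembly code using this idiom typically subtracts one from RDI (or derives the index from RCX) after the scan.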
LODS (Load String) Family:
LODSB, LODSW, LODSD, LODSQ
Behavior:
Loads a byte/word/doubleword/quadword from [RSI] into the accumulator register (AL, AX, EAX, RAX), then advances RSI by the operand size in the direction selected by DF.
Used to sequentially read from memory into registers.
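LODS is usually placed inside an explicit loop, because each loaded value needs to be processed before the next one overwrites the accumulator. The following C sketch models such a loop around LODSB with DF = 0; the function name `sum_bytes_lodsb` and the choice of a byte sum as the loop body are purely illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of a counted loop built on LODSB: each pass loads [RSI] into AL,
   advances RSI, and the loop body adds AL to a running sum. */
uint32_t sum_bytes_lodsb(const uint8_t *rsi, uint64_t rcx)
{
    uint32_t sum = 0;
    while (rcx != 0) {
        uint8_t al = *rsi;   /* lodsb: AL = byte at [RSI] */
        rsi++;               /*        RSI += 1 (DF = 0)  */
        sum += al;           /* loop body consumes the loaded value */
        rcx--;               /* explicit loop counter */
    }
    return sum;
}
```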
STOS (Store String) Family:
STOSB, STOSW, STOSD, STOSQ
Behavior:
Stores the accumulator register value (AL, AX, EAX, RAX) into the memory location addressed by RDI, then adjusts RDI by the operand size in the direction selected by DF.
Often used for memory initialization or filling.
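A fill with REP STOS can be modeled as follows. This is a minimal sketch assuming DF = 0; `rep_stos` is a hypothetical name, and the bytes are written in little-endian order explicitly so the model matches x86-64 memory layout regardless of the host machine.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of REP STOSB/STOSW/STOSD/STOSQ with DF = 0.
   elem_size is 1, 2, 4, or 8; only the low elem_size bytes of rax
   are stored. Returns the final value of RDI. */
uint8_t *rep_stos(uint8_t *rdi, uint64_t rax, uint64_t rcx, size_t elem_size)
{
    while (rcx != 0) {
        for (size_t i = 0; i < elem_size; i++)
            rdi[i] = (uint8_t)(rax >> (8 * i));  /* least significant byte first */
        rdi += elem_size;                         /* RDI advances by one element */
        rcx--;
    }
    return rdi;
}
```

Filling with an element size of 4 and a value of 0x11223344 writes the byte pattern 44 33 22 11 repeatedly, which is a useful reminder that wider STOS forms store multi-byte patterns, not a repeated single byte.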
String instructions support repetition prefixes that facilitate bulk operations without explicit loop control:
REP (Repeat): Repeats the instruction while RCX ≠ 0. Decrements RCX each iteration.
REPE / REPZ (Repeat While Equal / Zero): Repeats while RCX ≠ 0 and Zero Flag (ZF) = 1.
REPNE / REPNZ (Repeat While Not Equal / Not Zero): Repeats while RCX ≠ 0 and ZF = 0.
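REP is used with MOVS, STOS, and LODS, which do not modify the flags; REPE/REPZ and REPNE/REPNZ are meaningful with CMPS and SCAS, which update ZF on every element. In all cases RCX is tested before each iteration, so a count of zero executes nothing, and for the conditional forms ZF is tested after each comparison. Together these prefixes accelerate operations such as memory copying, comparison, scanning, and filling across large data blocks.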
The DF in the RFLAGS register controls whether RSI and RDI are incremented or decremented after each element operation:
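DF = 0 (Clear): RSI and RDI increment by the size of the processed element, enabling forward traversal of memory. This is the state that the common x86-64 calling conventions (System V and Microsoft x64) require on function entry and return.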
DF = 1 (Set): RSI and RDI decrement by the operand size, allowing backward traversal.
The CLD (Clear Direction Flag) and STD (Set Direction Flag) instructions explicitly control the DF.
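One classic reason to set DF is copying between overlapping regions, where a forward copy would overwrite source bytes before they are read. The C sketch below models that choice; `movsb_overlap_safe` is a hypothetical name, the function assumes both pointers refer to the same underlying buffer, and the comments indicate which instruction sequence each branch stands for.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of an overlap-safe byte copy in the style of memmove.
   Assumes dst and src point into the same buffer. */
void movsb_overlap_safe(uint8_t *dst, const uint8_t *src, size_t n)
{
    if (n == 0 || dst == src)
        return;

    if (dst < src) {
        /* Models: cld, then rep movsb starting at the first byte. */
        for (size_t i = 0; i < n; i++)
            dst[i] = src[i];
    } else {
        /* Models: std, RSI and RDI set to the LAST byte of each block,
           rep movsb, then cld to restore the expected DF = 0 state. */
        for (size_t i = n; i-- > 0; )
            dst[i] = src[i];
    }
}
```

Two details matter when writing the backward case in assembly: the pointers must start at the last element rather than one past it, and DF should be cleared again afterwards so that later code sees the forward direction it expects.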
While string instructions simplify certain repetitive memory operations, they have limitations in performance and flexibility on modern processors:
They operate serially on elements and do not inherently exploit data-level parallelism offered by SIMD instructions (e.g., AVX-512).
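REP-prefixed string operations are microcoded and incur a startup cost, so for short blocks they can be slower than an ordinary loop; for large copies and fills, processors with fast-string support (such as Enhanced REP MOVSB/STOSB) narrow or close that gap.
Compilers and C libraries often implement copies, fills, and searches with SIMD routines or tuned loops rather than string instructions to better leverage hardware capabilities.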
Nevertheless, understanding and correctly assembling string instructions is critical in low-level system programming, bootloaders, BIOS/UEFI development, and OS kernel routines, where minimal runtime dependencies and direct hardware control are paramount.
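For readers approaching this from the assembler side, the following C sketch shows how the plain forms of these instructions could be encoded. It assumes 64-bit mode with the default address size and no segment override; the function and enum names are hypothetical. The byte values themselves are the standard one-byte opcodes (A4/A5 for MOVS, A6/A7 for CMPS, AA/AB for STOS, AC/AD for LODS, AE/AF for SCAS), the F3 and F2 repeat prefixes, the 66 operand-size prefix, and REX.W (48).

```c
#include <stddef.h>
#include <stdint.h>

/* Repeat prefixes: F3 encodes REP and REPE/REPZ, F2 encodes REPNE/REPNZ. */
typedef enum { PREFIX_NONE = 0, PREFIX_REP = 0xF3, PREFIX_REPNE = 0xF2 } RepPrefix;

/* Opcodes of the byte-sized forms; the wider forms use opcode + 1. */
enum { OP_MOVS = 0xA4, OP_CMPS = 0xA6, OP_STOS = 0xAA, OP_LODS = 0xAC, OP_SCAS = 0xAE };

/* Writes the encoding into out (at least 3 bytes) and returns its length,
   or 0 if elem_size is not 1, 2, 4, or 8. */
size_t encode_string_insn(uint8_t *out, uint8_t base_opcode,
                          size_t elem_size, RepPrefix rep)
{
    size_t n = 0;
    if (elem_size != 1 && elem_size != 2 && elem_size != 4 && elem_size != 8)
        return 0;
    if (rep != PREFIX_NONE)
        out[n++] = (uint8_t)rep;   /* legacy repeat prefix comes first */
    if (elem_size == 2)
        out[n++] = 0x66;           /* operand-size override selects 16-bit */
    if (elem_size == 8)
        out[n++] = 0x48;           /* REX.W must directly precede the opcode */
    out[n++] = (uint8_t)(base_opcode + (elem_size == 1 ? 0 : 1));
    return n;
}
```

Under these assumptions, REP MOVSB encodes as F3 A4, REP MOVSQ as F3 48 A5, REPNE SCASB as F2 AE, and STOSW as 66 AB.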
Copy 64 bytes from source to destination:
    ; assumes rsi = source address, rdi = destination address
    mov rcx, 64
    cld              ; Clear direction flag for forward movement
    rep movsb        ; Repeat move byte from [rsi] to [rdi] rcx times
Scan for a byte value 0x0A in a buffer:
    ; assumes rdi = buffer address
    mov al, 0x0A
    mov rcx, buffer_length
    cld
    repne scasb      ; Repeat scan byte until AL matches or rcx=0
x86-64 string instructions provide a compact and expressive set of instructions for operating on sequences of bytes or larger elements in memory, controlled implicitly via dedicated registers and the direction flag. Though modern software frequently replaces these with SIMD or compiler intrinsics for performance, string instructions remain a vital part of the architecture’s instruction set, essential for assembler designers to implement with precise encoding and semantic fidelity.