
Article by Ayman Alheraki on January 11 2026 10:37 AM

Currently exploring the design of a native code generator for ForgeVM

This article sketches a modular code generator for ForgeVM that translates a high-level representation (AST → IR) directly into native assembly or machine code for x86-64 and ARM64.

 

Part 1: Abstract Architecture Overview

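The design skips the virtual machine and its bytecode entirely: the front end lowers the AST to a small IR, and a per-architecture backend turns that IR into native code (AST → IR → backend → assembly or machine code).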

 

Part 2: Designing ForgeVM Intermediate Representation (IR)

The IR is a minimal, portable instruction set, general enough to map onto both target CPUs (x86-64 and ARM64).

Goals:

  • Easy to parse and serialize (JSON or binary)

  • Directly translatable to native CPU instructions

  • Deliberately not Turing-complete in Phase 1: only straight-line actions are representable, with no loops or conditions unless they are explicitly encoded as instructions

IR Instruction Format (JSON Example):
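As a minimal sketch of what such a format could look like, the C++ below models one IR instruction and serializes it to JSON. The field names `op` and `args`, the `IrInstr` type, and both helper functions are assumptions made for illustration, not the article's actual schema.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical IR instruction: an opcode plus zero or more operands.
// Operands are kept as strings ("r1", "42", "print") for simplicity.
struct IrInstr {
    std::string op;
    std::vector<std::string> args;
};

// Serialize one instruction as a JSON object, for example
// {"op":"mov","args":["r1","42"]}.
// Assumes opcodes and operands contain no characters that need JSON escaping.
std::string to_json(const IrInstr& in) {
    std::ostringstream out;
    out << "{\"op\":\"" << in.op << "\",\"args\":[";
    for (std::size_t i = 0; i < in.args.size(); ++i) {
        if (i != 0) out << ',';
        out << '"' << in.args[i] << '"';
    }
    out << "]}";
    return out.str();
}

// Serialize a whole program as a JSON array of instruction objects.
std::string program_to_json(const std::vector<IrInstr>& program) {
    std::string out = "[";
    for (std::size_t i = 0; i < program.size(); ++i) {
        if (i != 0) out += ',';
        out += to_json(program[i]);
    }
    return out + "]";
}
```

A real implementation would more likely use one of the JSON libraries listed near the end of the article; the hand-written serializer only keeps the sketch dependency-free.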

Supported Instructions in Phase 1:

| IR Instruction | Description |
| --- | --- |
| mov | Move constant or register value to another register |
| add / sub | Arithmetic operations |
| mul / div | Optional: arithmetic operations |
| call | Call external function |
| ret | Return from function |
| cmp, jmp, je, jne | Conditional execution (Phase 2) |

 

Part 3: Backend Design – x86-64 Code Generator

Register Mapping:

| IR Reg | x86-64 Reg |
| --- | --- |
| r1 | rax |
| r2 | rbx |
| r3 | rcx |
| ... | r8–r15 |

 

IR to Assembly Translation:

| IR | x86-64 ASM |
| --- | --- |
| mov r1, 42 | mov rax, 42 |
| mov r2, 10 | mov rbx, 10 |
| add r1, r2 | add rax, rbx |
| call print | call print (must be declared) |
| ret | ret |

 

Output:
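A minimal sketch, assuming the register mapping and translation rules shown above are the whole rule set, of a C++ function that produces the x86-64 listing for the sample program. `IrInstr`, `x86_operand`, and `emit_x86_64` are hypothetical names, and the sketch emits only the instruction lines, without section or symbol directives.

```cpp
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical IR instruction: opcode plus operands as strings.
struct IrInstr {
    std::string op;
    std::vector<std::string> args;
};

// IR registers go through the mapping table; any other operand
// (an immediate such as "42") passes through unchanged.
std::string x86_operand(const std::string& s) {
    static const std::map<std::string, std::string> regs = {
        {"r1", "rax"}, {"r2", "rbx"}, {"r3", "rcx"}};
    std::map<std::string, std::string>::const_iterator it = regs.find(s);
    return it == regs.end() ? s : it->second;
}

// Translate each IR instruction to one line of Intel-syntax assembly.
std::string emit_x86_64(const std::vector<IrInstr>& prog) {
    std::ostringstream out;
    for (const IrInstr& in : prog) {
        if ((in.op == "mov" || in.op == "add" || in.op == "sub") && in.args.size() == 2) {
            out << in.op << ' ' << x86_operand(in.args[0]) << ", "
                << x86_operand(in.args[1]) << '\n';
        } else if (in.op == "call" && in.args.size() == 1) {
            out << "call " << in.args[0] << '\n';  // symbol must be declared elsewhere
        } else if (in.op == "ret" && in.args.empty()) {
            out << "ret\n";
        } else {
            throw std::invalid_argument("unsupported IR instruction: " + in.op);
        }
    }
    return out.str();
}
```

For the five-instruction sample program this returns the five assembly lines from the translation table, one per line.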

Assembling:

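  • Use nasm or gas to assemble the listing into an object file, then link it into an executable (gas needs the .intel_syntax noprefix directive, since the listing uses Intel syntax)

  • Or link against the C runtime (for example via gcc or clang) if the code uses a main: entry point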

 

Part 4: Backend Design – ARM64 Code Generator

Register Mapping:

| IR Reg | ARM64 Reg |
| --- | --- |
| r1 | x0 |
| r2 | x1 |
| r3 | x2 |
| ... | x3–x28 |

 

IR to Assembly Translation:

| IR | ARM64 ASM |
| --- | --- |
| mov r1, 42 | mov x0, #42 |
| mov r2, 10 | mov x1, #10 |
| add r1, r2 | add x0, x0, x1 |
| call print | bl print (branch with link) |
| ret | ret |

 

Output:
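A minimal sketch of the ARM64 side, using hypothetical names (`IrInstr`, `arm64_operand`, `emit_arm64`). It follows the translation table with one addition that is not in the article: `bl` overwrites the link register x30, so when the program contains a call the sketch saves x29/x30 on entry and restores them before `ret`. Without that, the `ret` after `bl print` would branch back to itself.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical IR instruction: opcode plus operands as strings.
struct IrInstr {
    std::string op;
    std::vector<std::string> args;
};

// IR registers map to x0..x2 as in the table; any other operand is treated
// as a small immediate and gets the '#' prefix. Large immediates would need
// a movz/movk sequence, which this sketch does not handle.
std::string arm64_operand(const std::string& s) {
    if (s == "r1") return "x0";
    if (s == "r2") return "x1";
    if (s == "r3") return "x2";
    return "#" + s;
}

std::string emit_arm64(const std::vector<IrInstr>& prog) {
    bool has_call = false;
    for (const IrInstr& in : prog) {
        if (in.op == "call") has_call = true;
    }

    std::ostringstream out;
    if (has_call) out << "stp x29, x30, [sp, #-16]!\n";  // save frame pointer and link register
    for (const IrInstr& in : prog) {
        if (in.op == "mov" && in.args.size() == 2) {
            out << "mov " << arm64_operand(in.args[0]) << ", "
                << arm64_operand(in.args[1]) << '\n';
        } else if ((in.op == "add" || in.op == "sub") && in.args.size() == 2) {
            // Two-operand IR becomes three-operand ARM64: dst = dst op src.
            const std::string dst = arm64_operand(in.args[0]);
            out << in.op << ' ' << dst << ", " << dst << ", "
                << arm64_operand(in.args[1]) << '\n';
        } else if (in.op == "call" && in.args.size() == 1) {
            out << "bl " << in.args[0] << '\n';
        } else if (in.op == "ret" && in.args.empty()) {
            if (has_call) out << "ldp x29, x30, [sp], #16\n";  // restore before returning
            out << "ret\n";
        } else {
            throw std::invalid_argument("unsupported IR instruction: " + in.op);
        }
    }
    return out.str();
}
```

For a program without `call`, the output is exactly the rows of the translation table; with `call print`, the body is wrapped in the `stp`/`ldp` pair.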

Part 5: Code Emission Approaches

Option 1: Emit Text-Based Assembly

  • Output .s file

  • Assemble with:

    • nasm for x86_64

    • as or clang -c for ARM64

  • Link with ld or clang

Option 2: Emit Machine Code Directly
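A minimal sketch of what direct emission involves on x86-64: each instruction is written as raw bytes instead of text. The helper names and the `Reg64` enum are hypothetical, and only the low eight registers are handled (r8–r15 would need extra REX bits).

```cpp
#include <cstdint>
#include <vector>

// Hardware register numbers for a few x86-64 registers.
enum Reg64 : std::uint8_t { RAX = 0, RCX = 1, RDX = 2, RBX = 3 };

typedef std::vector<std::uint8_t> Bytes;

// mov r64, imm32 (sign-extended): REX.W (0x48), opcode 0xC7 /0, ModRM, imm32.
void emit_mov_imm32(Bytes& out, Reg64 dst, std::int32_t imm) {
    out.push_back(0x48);
    out.push_back(0xC7);
    out.push_back(static_cast<std::uint8_t>(0xC0 | dst));  // mod=11, reg=0, rm=dst
    const std::uint32_t u = static_cast<std::uint32_t>(imm);
    for (int i = 0; i < 4; ++i) {
        out.push_back(static_cast<std::uint8_t>(u >> (8 * i)));  // little-endian
    }
}

// add r/m64, r64: REX.W (0x48), opcode 0x01, ModRM with reg=src, rm=dst.
void emit_add_reg(Bytes& out, Reg64 dst, Reg64 src) {
    out.push_back(0x48);
    out.push_back(0x01);
    out.push_back(static_cast<std::uint8_t>(0xC0 | (src << 3) | dst));
}

// Near return: single byte 0xC3.
void emit_ret(Bytes& out) {
    out.push_back(0xC3);
}
```

Calling `emit_mov_imm32(code, RAX, 42)`, `emit_mov_imm32(code, RBX, 10)`, `emit_add_reg(code, RAX, RBX)`, and `emit_ret(code)` in order yields 18 bytes. A `call print` is left out because it requires a relocation or a known target address, which is where the object-file format or a library such as AsmJit comes in.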

Option 3: Emit In-Memory Executable Code

  • For a JIT-style design, allocate RWX memory, write machine code, and execute it

  • On Linux: mmap + mprotect

  • On Windows: VirtualAlloc
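A minimal sketch of the Linux path, assuming an x86-64 host. The function name `run_jit_constant` is hypothetical. Instead of mapping the page read-write-execute in one step, the sketch maps it writable, copies the code in, and then switches it to read-execute with `mprotect`, which is what hardened systems generally expect. On any other platform it simply reports that nothing was run.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

#if defined(__linux__) && defined(__x86_64__)
#include <sys/mman.h>
#endif

// Builds "mov eax, value; ret" in fresh memory, runs it, and stores the
// returned value in `result`. Returns false if the code could not be run.
bool run_jit_constant(std::int32_t value, int& result) {
#if defined(__linux__) && defined(__x86_64__)
    unsigned char code[6] = {0xB8, 0, 0, 0, 0, 0xC3};  // mov eax, imm32 ; ret
    std::memcpy(code + 1, &value, sizeof value);       // patch the immediate

    const std::size_t size = 4096;
    void* mem = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return false;

    std::memcpy(mem, code, sizeof code);
    if (mprotect(mem, size, PROT_READ | PROT_EXEC) != 0) {  // writable -> executable
        munmap(mem, size);
        return false;
    }

    typedef int (*JitFn)();
    JitFn fn = reinterpret_cast<JitFn>(mem);
    result = fn();
    munmap(mem, size);
    return true;
#else
    (void)value;
    (void)result;
    return false;  // not implemented for this platform in the sketch
#endif
}
```

On Windows the same shape would use `VirtualAlloc` followed by `VirtualProtect`; on ARM64 the instruction cache also has to be flushed before the new code is executed.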


Part 6: Minimal C++ Code Generator Skeleton
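One possible shape for such a skeleton, as a sketch rather than the article's own code: a `Backend` interface with one implementation per architecture, a factory that selects a backend by target name, and a driver loop. All class and function names, the flat `IrInstr` layout, and the target strings `"x86_64"` and `"arm64"` are assumptions. Prologue and epilogue handling is omitted.

```cpp
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical flat IR instruction; unused fields stay empty.
struct IrInstr {
    std::string op, dst, src;
};

// Each backend turns one IR instruction into one line of assembly.
class Backend {
public:
    virtual ~Backend() {}
    virtual std::string emit(const IrInstr& in) const = 0;
};

class X86_64Backend : public Backend {
    static std::string operand(const std::string& s) {
        if (s == "r1") return "rax";
        if (s == "r2") return "rbx";
        if (s == "r3") return "rcx";
        return s;  // immediate
    }
public:
    std::string emit(const IrInstr& in) const override {
        if (in.op == "mov" || in.op == "add")
            return in.op + " " + operand(in.dst) + ", " + operand(in.src);
        if (in.op == "call") return "call " + in.dst;
        if (in.op == "ret") return "ret";
        throw std::invalid_argument("x86-64: unsupported op " + in.op);
    }
};

class Arm64Backend : public Backend {
    static std::string operand(const std::string& s) {
        if (s == "r1") return "x0";
        if (s == "r2") return "x1";
        if (s == "r3") return "x2";
        return "#" + s;  // immediate
    }
public:
    std::string emit(const IrInstr& in) const override {
        if (in.op == "mov") return "mov " + operand(in.dst) + ", " + operand(in.src);
        if (in.op == "add")
            return "add " + operand(in.dst) + ", " + operand(in.dst) + ", " + operand(in.src);
        if (in.op == "call") return "bl " + in.dst;
        if (in.op == "ret") return "ret";
        throw std::invalid_argument("arm64: unsupported op " + in.op);
    }
};

// Select a backend by target name.
std::unique_ptr<Backend> make_backend(const std::string& target) {
    if (target == "x86_64") return std::unique_ptr<Backend>(new X86_64Backend());
    if (target == "arm64") return std::unique_ptr<Backend>(new Arm64Backend());
    throw std::invalid_argument("unknown target: " + target);
}

// Driver: the only code that walks the IR; it knows nothing about the target.
std::string generate(const std::vector<IrInstr>& prog, const Backend& backend) {
    std::string out;
    for (const IrInstr& in : prog) out += backend.emit(in) + "\n";
    return out;
}
```

The point of the split is that adding a third architecture means adding one class and one line in `make_backend`, while the IR and the driver stay unchanged.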

Part 7: Future Directions

Add Support For:

  • Conditional branches: cmp, jmp, je, etc.

  • Function calls & stack frame layout

  • Calling convention support (System V ABI on Linux, Windows x64 ABI)

  • Register allocation algorithm

  • Live variable analysis for optimization

Static Analysis Integration

  • Ensure instructions do not break calling conventions

  • Analyze for register clobbering or invalid instructions
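As a small illustration of such a check, the sketch below flags writes to registers that the System V AMD64 ABI requires a callee to preserve (rbx, rbp, r12–r15). The `AsmInstr` type and the function name are hypothetical, and the analysis is deliberately shallow: a register counts as saved once it has been pushed, and the matching pop is not verified.

```cpp
#include <algorithm>
#include <set>
#include <string>
#include <vector>

// One emitted instruction, reduced to what the check needs:
// the mnemonic and the register it names first (empty when none).
struct AsmInstr {
    std::string op;
    std::string reg;
};

// Returns the callee-saved registers that the body writes without
// having pushed them first, in order of first offence.
std::vector<std::string> clobbered_callee_saved(const std::vector<AsmInstr>& body) {
    static const std::set<std::string> callee_saved = {
        "rbx", "rbp", "r12", "r13", "r14", "r15"};
    std::set<std::string> saved;
    std::vector<std::string> clobbered;
    for (const AsmInstr& in : body) {
        if (in.op == "push") {
            saved.insert(in.reg);
            continue;
        }
        const bool writes = in.op == "mov" || in.op == "add" || in.op == "sub";
        if (writes && callee_saved.count(in.reg) != 0 && saved.count(in.reg) == 0 &&
            std::find(clobbered.begin(), clobbered.end(), in.reg) == clobbered.end()) {
            clobbered.push_back(in.reg);
        }
    }
    return clobbered;
}
```

Run against the Part 3 sample (`mov rax, 42`, `mov rbx, 10`, `add rax, rbx`), the check reports `rbx`, because the register mapping places IR register r2 in a callee-saved register.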

 

Optional: ForgeVM CLI Tool

Example usage:

 

Bonus: Use Existing Libraries

| Purpose | Library |
| --- | --- |
| Emit machine code | AsmJit, Keystone |
| Parse JSON IR | RapidJSON, nlohmann/json |
| Assembler | NASM, GAS, Clang |
| Optional LLVM link | Use LLVM MC layer if needed |

 


Summary

Your ForgeVM code generator:

  • Accepts high-level source or IR

  • Targets native code directly (x86-64, ARM64)

  • Avoids bytecode completely

  • Enables modular backend expansion

  • Long-term: can evolve into an optimizing native compiler

 

 

When You Don't Need an IR

You can skip the IR entirely and go straight from the AST (Abstract Syntax Tree) or even the parser output directly to native code (assembly or machine code). This approach is called direct code generation.

Valid Use Cases:

  1. Tiny Language / DSL (Domain-Specific Language)

    • If you're designing a small language (e.g., configuration language, scripting for a game engine), you can generate native code directly from syntax trees.

    • No complex transformations or optimizations are needed.

  2. Educational Compilers

    • In tutorials or university projects, IR may be unnecessary. Teaching can focus on the basics of parsing and code generation.

  3. 1:1 Language Translation (Source-to-Source)

    • Translators that map language A to language B (e.g., transpiling Pascal to C++) may not need an IR if the grammar and semantics align well.

  4. Extremely Simple VMs

    • If you're writing a VM that executes instructions directly from a high-level language like Lisp or BASIC, you might interpret the AST or token stream directly.

  5. Special-Purpose Ahead-of-Time Compilers

    • If you compile only a small subset of a language, or compile fixed templates (e.g., SQL query compilers), you can emit machine code directly.


What You Lose by Skipping IR

  1. No Cross-Architecture Abstraction

    • Without IR, you must write a separate code generator for every CPU architecture directly against the AST or parser output, which makes multi-targeting harder.

  2. Limited Optimizations

    • You lose a centralized phase where optimizations like dead-code elimination, constant folding, inlining, or register allocation could happen.

  3. Difficult to Reuse Logic Across Architectures

    • If you want to support both x86 and ARM, it’s harder without a neutral IR between the parser and the backend.

  4. Less Debugging Flexibility

    • You cannot inspect or analyze intermediate steps, which hurts debugging and testing.
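To make the second point concrete, here is a minimal sketch of a constant-folding pass that runs once on the IR, before any backend sees it. The flat `IrInstr` layout and the function names are hypothetical, and the pass is a simple peephole: it only merges an `add` or `sub` of an immediate into the `mov` of an immediate that directly precedes it on the same register. Integer overflow is not handled.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical flat IR instruction; unused fields stay empty.
struct IrInstr {
    std::string op, dst, src;
};

// True for decimal integers such as "42" or "-7".
static bool is_int(const std::string& s) {
    std::size_t i = (!s.empty() && s[0] == '-') ? 1 : 0;
    if (i == s.size()) return false;
    for (; i < s.size(); ++i) {
        if (!std::isdigit(static_cast<unsigned char>(s[i]))) return false;
    }
    return true;
}

// Folds "mov rX, a" followed by "add/sub rX, b" into "mov rX, a±b".
std::vector<IrInstr> fold_constants(const std::vector<IrInstr>& in) {
    std::vector<IrInstr> out;
    for (const IrInstr& cur : in) {
        const bool foldable =
            (cur.op == "add" || cur.op == "sub") && is_int(cur.src) &&
            !out.empty() && out.back().op == "mov" &&
            out.back().dst == cur.dst && is_int(out.back().src);
        if (foldable) {
            const long a = std::stol(out.back().src);
            const long b = std::stol(cur.src);
            out.back().src = std::to_string(cur.op == "add" ? a + b : a - b);
        } else {
            out.push_back(cur);
        }
    }
    return out;
}
```

Because the pass works on IR, both the x86-64 and the ARM64 backend benefit from it without any change; without an IR, the same logic would have to live in each code generator.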


When It's Worth Skipping IR

| Criteria | Skip IR? |
| --- | --- |
| Language size: Small | Yes |
| Performance: Low demand | Yes |
| Portability: Not required | Yes |
| You control all target platforms | Yes |
| You need optimization or architecture independence | No |

 

Example

Let’s say you have a very small scripting language:

You can generate x86-64 directly:

You don’t need to convert 1 + 2 into IR like:

Just generate the machine code or assembly on the fly.
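A minimal sketch of direct code generation for the expression 1 + 2, assuming a tiny hand-built AST. The `Expr` type and the helpers `lit`, `bin`, `emit`, and `compile` are hypothetical. The generator walks the tree and writes x86-64 assembly (Intel syntax) straight away, using the stack for temporaries and leaving the result in rax; no IR is built at any point.

```cpp
#include <memory>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>

// AST node: op is '+' or '-' for a binary node, 0 for an integer literal.
struct Expr {
    char op;
    long value;
    std::unique_ptr<Expr> lhs, rhs;
};

std::unique_ptr<Expr> lit(long v) {
    std::unique_ptr<Expr> e(new Expr());
    e->op = 0;
    e->value = v;
    return e;
}

std::unique_ptr<Expr> bin(char op, std::unique_ptr<Expr> l, std::unique_ptr<Expr> r) {
    std::unique_ptr<Expr> e(new Expr());
    e->op = op;
    e->value = 0;
    e->lhs = std::move(l);
    e->rhs = std::move(r);
    return e;
}

// Emit code that leaves the value of `e` in rax.
void emit(const Expr& e, std::ostream& out) {
    if (e.op == 0) {
        out << "mov rax, " << e.value << "\n";
        return;
    }
    emit(*e.lhs, out);
    out << "push rax\n";        // park the left operand on the stack
    emit(*e.rhs, out);
    out << "mov rcx, rax\n";    // right operand -> rcx
    out << "pop rax\n";         // left operand -> rax
    out << (e.op == '+' ? "add" : "sub") << " rax, rcx\n";
}

// Compile a whole expression into a function body ending in ret.
std::string compile(const Expr& e) {
    std::ostringstream out;
    emit(e, out);
    out << "ret\n";
    return out.str();
}
```

Calling `compile(*bin('+', lit(1), lit(2)))` produces seven lines of assembly. The output is far from optimal (a compiler with an IR would fold it to `mov rax, 3`), which is exactly the trade-off this section describes.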


Conclusion

You can skip IR if:

  • Your language is small, simple, or static.

  • You are targeting only one platform.

  • You don’t need cross-architecture support or aggressive optimizations.

But for large, portable, or optimized languages, an IR is strongly recommended.
