SimplifyC++

Article by Ayman Alheraki on January 11 2026 10:37 AM


#9 Foundation and Architecture: Language Implementation Project Structure -> Dependency Management — Lexer, Parser, Runtime

GitHub: https://github.com/ForgeLang/LearnSeries

When building a modular interpreter in modern C++20/23, one of the most important engineering tasks is defining and managing the dependencies between the core components: the lexer, the parser, and the runtime. These three modules form the heart of any language engine, and tight coupling among them leads to architectural rigidity, untestable code, and difficult feature expansion.

This section describes how to architect dependencies between these components using clear boundaries, modern C++ idioms, and compiler-level guarantees. The goal is to achieve modular cohesion, minimal coupling, and maximum clarity, using idiomatic C++20/23 design principles.

1. Why Dependency Management Matters in a Language Project

In an interpreter or compiler, each phase of the pipeline must consume only what it needs, and produce outputs suitable for the next stage, without relying on runtime details of other phases. Poor separation typically manifests in:

Lexer calling parser logic (bad design)
Runtime logic embedded into AST nodes
Cyclical dependencies between semantic analysis and evaluation
Direct variable sharing or implicit globals

A clear dependency model eliminates these problems and enables unit testing, parallel development, and future extensions, such as JIT compilation, static analysis, or embedded REPLs.

2. High-Level Component Boundaries

The architecture follows a layered structure:


          ┌────────────┐
          │   Source   │
          └────┬───────┘
               ▼
        ┌──────────────┐
        │   Lexer      │  → TokenStream
        └────┬─────────┘
             ▼
        ┌──────────────┐
        │   Parser     │  → AST
        └────┬─────────┘
             ▼
        ┌──────────────┐
        │ Semantics    │  → Typed AST / Checked AST
        └────┬─────────┘
             ▼
        ┌──────────────┐
        │   Runtime    │  → Evaluation, Built-ins, Stack Frames
        └──────────────┘

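Dependencies point in one direction only: toward earlier stages and the shared core utilities. In the diagram, each layer may depend on the layers above it (the stages that produce its input), never on the layers below it (the stages that consume its output).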

3. Lexer: Input Tokenization Layer

Purpose:

Accepts a source buffer (std::string_view)
Produces a linear sequence of tokens (TokenStream)
Does not depend on AST, parser, or runtime

Dependencies:

core/SourceManager.hpp for span tracking
core/ErrorReporter.hpp for structured errors
Standard library only

Output:

Token structure with type, lexeme, source location
TokenStream: typically std::vector<Token> or an iterator view

Modern C++ Tools:

std::string_view, std::optional, std::variant for token values
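std::source_location (C++20) for diagnostics about the interpreter's own code (it records C++ call sites, not positions in the interpreted source, which come from SourceManager spans)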
constexpr lexers for testing and compile-time evaluation
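
As a rough illustration of the lexer contract described above, here is a minimal sketch of a tokenizer that takes a std::string_view and returns a std::vector<Token>. The TokenKind values, the Token fields, and the tokenize function are hypothetical names chosen for this example, not the project's actual API, and the sketch omits SourceManager and ErrorReporter integration.

```cpp
#include <cctype>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical token kinds for a tiny expression language.
enum class TokenKind { Number, Identifier, Plus, Star, LParen, RParen, Error, EndOfFile };

struct Token {
    TokenKind kind;
    std::string_view lexeme;  // view into the source buffer, which must outlive the token
    std::size_t offset;       // byte offset of the token's first character
};

// Tokenizes the whole buffer. Knows nothing about the parser, AST, or runtime.
std::vector<Token> tokenize(std::string_view source) {
    std::vector<Token> tokens;
    std::size_t i = 0;
    // <cctype> functions require a value representable as unsigned char.
    auto at = [&](std::size_t k) { return static_cast<unsigned char>(source[k]); };

    while (i < source.size()) {
        const std::size_t start = i;
        if (std::isspace(at(i))) { ++i; continue; }

        TokenKind kind = TokenKind::Error;
        if (std::isdigit(at(i))) {
            while (i < source.size() && std::isdigit(at(i))) ++i;
            kind = TokenKind::Number;
        } else if (std::isalpha(at(i)) || source[i] == '_') {
            while (i < source.size() && (std::isalnum(at(i)) || source[i] == '_')) ++i;
            kind = TokenKind::Identifier;
        } else {
            switch (source[i]) {
                case '+': kind = TokenKind::Plus; break;
                case '*': kind = TokenKind::Star; break;
                case '(': kind = TokenKind::LParen; break;
                case ')': kind = TokenKind::RParen; break;
                default: break;  // an unknown character becomes an Error token
            }
            ++i;
        }
        tokens.push_back(Token{kind, source.substr(start, i - start), start});
    }
    tokens.push_back(Token{TokenKind::EndOfFile, source.substr(source.size()), source.size()});
    return tokens;
}
```

Because the lexemes are views, the caller is responsible for keeping the source buffer alive for as long as the tokens are in use; a real implementation would likely delegate that ownership to the SourceManager.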

4. Parser: Syntax Construction Layer

Purpose:

Receives a stream of tokens from the lexer
Produces an Abstract Syntax Tree (AST)
Does not call or rely on runtime evaluation

Dependencies:

lexer/Token.hpp
core/SourceManager.hpp
core/ErrorReporter.hpp
Internal AST definitions (ast/Expr.hpp, ast/Stmt.hpp)

Output:

AST nodes represented using std::variant or algebraic types

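For example (recursive alternatives such as BinaryExpr must hold their children through an indirection like std::unique_ptr, because a variant cannot contain itself by value):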


using Expr = std::variant<BinaryExpr, LiteralExpr, CallExpr, VarExpr>;

Design Notes:

Use recursive descent with backtracking or lookahead
Report syntax errors gracefully via std::expected (C++23) or error accumulation

Modern C++ Tools:

std::visit for variant traversal; std::monostate as an explicit empty alternative (for example, a placeholder node after a syntax error)
Concepts and constraints for AST transformations
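
To make the variant-based AST concrete, the following is a minimal sketch of one way to lay it out, together with a std::visit traversal that prints the tree. It assumes a wrapper struct Expr around the variant so that BinaryExpr can own its children through std::unique_ptr; the node fields and the helper names (literal, var, binary, print_expr) are hypothetical and exist only for this example.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <variant>

struct Expr;                          // completed below, after the node types
using ExprPtr = std::unique_ptr<Expr>;

struct LiteralExpr { long value; };
struct VarExpr { std::string name; };
struct BinaryExpr { char op; ExprPtr lhs; ExprPtr rhs; };  // children held by pointer

// The wrapper struct gives the recursive node types a name to point at.
struct Expr { std::variant<LiteralExpr, VarExpr, BinaryExpr> node; };

// Small construction helpers (hypothetical, for readability only).
ExprPtr literal(long v) { return std::make_unique<Expr>(Expr{LiteralExpr{v}}); }
ExprPtr var(std::string name) { return std::make_unique<Expr>(Expr{VarExpr{std::move(name)}}); }
ExprPtr binary(char op, ExprPtr l, ExprPtr r) {
    return std::make_unique<Expr>(Expr{BinaryExpr{op, std::move(l), std::move(r)}});
}

// A visitor with one overload per alternative; std::visit picks the right one.
struct Printer {
    std::string operator()(const LiteralExpr& e) const { return std::to_string(e.value); }
    std::string operator()(const VarExpr& e) const { return e.name; }
    std::string operator()(const BinaryExpr& e) const {
        return "(" + std::visit(*this, e.lhs->node) + " " + e.op + " " +
               std::visit(*this, e.rhs->node) + ")";
    }
};

std::string print_expr(const Expr& e) { return std::visit(Printer{}, e.node); }
```

With these definitions, print_expr applied to binary('+', literal(1), binary('*', var("x"), literal(2))) yields the string "(1 + (x * 2))". Adding a new node type to the variant turns every visitor that lacks a matching overload into a compile error, which is the main appeal of this design over a class hierarchy.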

5. Runtime: Execution Layer

Purpose:

Receives a fully parsed and semantically valid AST
Executes the program in an environment model
Provides built-in functions, memory, control flow, stack

Dependencies:

AST definitions
Symbol table or semantic results
core/Environment.hpp, core/Value.hpp, and runtime data structures

Key Responsibilities:

Interprets AST using a visitor or evaluation engine
Manages a scoped environment (stack frames, variables, closures)
Provides built-in I/O and standard library functions

Output:

Result of program execution (Value)
Can be used for REPL or scripting interface

Modern C++ Tools:

std::variant to represent runtime values:


using Value = std::variant<IntValue, FloatValue, BoolValue, StringValue, FunctionValue>;

std::function or lambdas for native function calls
std::jthread, std::future, or coroutine-based async features for concurrency (optional)
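
The scoped environment mentioned above could look roughly like the sketch below: each scope holds its own variables and a pointer to its enclosing scope, and lookup walks outward until it finds a binding. This is an assumption-laden illustration rather than the project's core/Environment.hpp; for brevity the Value alternatives are plain C++ types instead of the IntValue, FloatValue, and similar wrappers named in the article.

```cpp
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>
#include <variant>

// Simplified stand-in for the runtime Value type.
using Value = std::variant<long, double, bool, std::string>;

class Environment {
public:
    // A null parent marks the global scope. The parent must outlive this scope.
    explicit Environment(const Environment* parent = nullptr) : parent_(parent) {}

    // Defines or overwrites a variable in this scope only.
    void define(const std::string& name, Value value) { vars_[name] = std::move(value); }

    // Searches this scope first, then each enclosing scope in turn.
    std::optional<Value> lookup(const std::string& name) const {
        for (const Environment* env = this; env != nullptr; env = env->parent_) {
            const auto it = env->vars_.find(name);
            if (it != env->vars_.end()) return it->second;
        }
        return std::nullopt;
    }

private:
    const Environment* parent_;
    std::unordered_map<std::string, Value> vars_;
};
```

An inner scope created with Environment inner(&global) shadows outer names without modifying them, which is the behavior a function call frame or block scope needs. Closures that outlive their defining scope would require shared ownership of the parent rather than a raw pointer.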

6. Example of Dependency Direction (CMake and Code)

Let’s say we define libraries like this in CMakeLists.txt:


add_library(ForgeCore ...)
add_library(ForgeLexer ...)
add_library(ForgeParser ...)
add_library(ForgeRuntime ...)

# Dependencies
target_link_libraries(ForgeLexer PUBLIC ForgeCore)
target_link_libraries(ForgeParser PUBLIC ForgeLexer ForgeCore)
target_link_libraries(ForgeRuntime PUBLIC ForgeParser ForgeCore)

This ensures that:

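Lexer depends only on Core and knows nothing about the parser or runtime
Parser depends only on Lexer and Core
Runtime links directly against Parser and Core only (Lexer is reachable transitively because the links are PUBLIC)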
No module introduces upward or cyclic dependencies

7. AST and Value Boundary

A key architectural boundary lies between:

Parser output (AST)
Runtime input (Value system)

The AST must never carry runtime values. This enforces purity and makes the AST reusable for:

Static analysis
Code formatting
Type checking
Compilation or transpilation

All runtime data is generated and stored after parsing, within the evaluation engine.
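
A minimal sketch of this boundary follows: the AST stores only syntax (a literal, a name, an operator), while every Value is created inside the evaluator and never written back into the tree. The node shapes, the two-alternative Value, and the flat map used as an environment are simplifying assumptions made for this example.

```cpp
#include <memory>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <variant>

// AST side: syntax only (hypothetical node shapes).
struct Expr;
using ExprPtr = std::unique_ptr<Expr>;
struct LiteralExpr { long literal; };
struct VarExpr { std::string name; };
struct BinaryExpr { char op; ExprPtr lhs; ExprPtr rhs; };
struct Expr { std::variant<LiteralExpr, VarExpr, BinaryExpr> node; };

// Runtime side: values and bindings exist only here.
using Value = std::variant<long, bool>;
using Environment = std::unordered_map<std::string, Value>;

struct Evaluator {
    const Environment& env;

    Value operator()(const LiteralExpr& e) const { return Value{e.literal}; }
    Value operator()(const VarExpr& e) const {
        const auto it = env.find(e.name);
        if (it == env.end()) throw std::runtime_error("undefined variable: " + e.name);
        return it->second;
    }
    Value operator()(const BinaryExpr& e) const {
        const long l = as_int(std::visit(*this, e.lhs->node));
        const long r = as_int(std::visit(*this, e.rhs->node));
        switch (e.op) {
            case '+': return Value{l + r};
            case '*': return Value{l * r};
            case '<': return Value{l < r};
            default: throw std::runtime_error(std::string("unknown operator: ") + e.op);
        }
    }
    static long as_int(const Value& v) {
        if (const long* p = std::get_if<long>(&v)) return *p;
        throw std::runtime_error("expected an integer operand");
    }
};

// The tree is taken by const reference: evaluation cannot alter it.
Value evaluate(const Expr& expr, const Environment& env) {
    return std::visit(Evaluator{env}, expr.node);
}
```

Because evaluate only reads the tree, the same AST can be evaluated repeatedly against different environments, or handed to a formatter or type checker, without any risk of stale runtime state leaking between uses.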

8. Semantic Analysis as an Optional Intermediate Layer

To maintain decoupling between syntax and execution, a semantic layer can act as an intermediate verifier:

Ensures type correctness
Infers return types
Validates function signatures
Annotates AST nodes with resolved types or symbol bindings

This layer produces either:

A typed AST (decorated with type info)
Or a symbol table used by the runtime

It can optionally cache or transform parts of the AST for optimization.
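
One possible shape for such a layer is sketched below: a type checker that walks the AST and records each node's type in a side table keyed by node address, accumulating error messages instead of stopping at the first one. The node types, the two-member Type enum, and the SemanticInfo and TypeChecker names are hypothetical; the rule that every binary operator requires integer operands is an assumption made to keep the example small.

```cpp
#include <memory>
#include <optional>
#include <string>
#include <unordered_map>
#include <variant>
#include <vector>

struct Expr;
using ExprPtr = std::unique_ptr<Expr>;
struct IntLiteral { long literal; };
struct BoolLiteral { bool literal; };
struct BinaryExpr { char op; ExprPtr lhs; ExprPtr rhs; };
struct Expr { std::variant<IntLiteral, BoolLiteral, BinaryExpr> node; };

enum class Type { Int, Bool };

// Analysis results live beside the AST, so the tree itself is never modified.
struct SemanticInfo {
    std::unordered_map<const Expr*, Type> types;
    std::vector<std::string> errors;
};

class TypeChecker {
public:
    explicit TypeChecker(SemanticInfo& info) : info_(info) {}

    // Returns the node's type, or std::nullopt if it or one of its children is ill-typed.
    std::optional<Type> check(const Expr& expr) {
        std::optional<Type> type =
            std::visit([&](const auto& node) { return infer(node); }, expr.node);
        if (type) info_.types[&expr] = *type;
        return type;
    }

private:
    std::optional<Type> infer(const IntLiteral&) { return Type::Int; }
    std::optional<Type> infer(const BoolLiteral&) { return Type::Bool; }
    std::optional<Type> infer(const BinaryExpr& e) {
        const auto lhs = check(*e.lhs);
        const auto rhs = check(*e.rhs);
        if (!lhs || !rhs) return std::nullopt;  // already reported deeper in the tree
        if (*lhs != Type::Int || *rhs != Type::Int) {
            info_.errors.push_back(std::string("operator '") + e.op +
                                   "' requires integer operands");
            return std::nullopt;
        }
        return e.op == '<' ? Type::Bool : Type::Int;  // comparison yields Bool
    }

    SemanticInfo& info_;
};
```

Keeping the annotations in a side table means the parser's output type does not change when the semantic layer is added or removed, which fits the idea of semantic analysis as an optional stage. The trade-off is that the table is only valid while the AST nodes stay at the same addresses.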

9. Runtime Extensions and Isolation

To maintain a clean runtime interface:

Built-in functions (e.g., print, input, len) are registered explicitly
External native libraries can be loaded dynamically or statically linked
Runtime APIs are defined through interfaces and Value conversions

In C++20/23, this is aided by:

std::function<Value(const std::vector<Value>&)> for native function wrappers
std::span for passing slices safely
Optional use of modules to isolate standard library extensions
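
Explicit registration of built-ins might be sketched as follows, using the std::function<Value(const std::vector<Value>&)> signature mentioned above. The BuiltinRegistry class, its register_fn and call members, the arity check, and the len built-in shown here are hypothetical illustrations, not the project's runtime API.

```cpp
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>
#include <variant>
#include <vector>

// Simplified stand-in for the runtime Value type.
using Value = std::variant<long, bool, std::string>;
using NativeFn = std::function<Value(const std::vector<Value>&)>;

class BuiltinRegistry {
public:
    // Registers a native function under a name, with a fixed argument count.
    void register_fn(std::string name, std::size_t arity, NativeFn fn) {
        builtins_[std::move(name)] = Entry{arity, std::move(fn)};
    }

    // Looks the function up, validates the argument count, then invokes it.
    Value call(const std::string& name, const std::vector<Value>& args) const {
        const auto it = builtins_.find(name);
        if (it == builtins_.end()) throw std::runtime_error("unknown built-in: " + name);
        if (args.size() != it->second.arity)
            throw std::runtime_error("wrong argument count for: " + name);
        return it->second.fn(args);
    }

private:
    struct Entry { std::size_t arity = 0; NativeFn fn; };
    std::unordered_map<std::string, Entry> builtins_;
};

// Built-ins are added in one visible place instead of being hard-wired
// into the evaluator.
void register_core_builtins(BuiltinRegistry& registry) {
    registry.register_fn("len", 1, [](const std::vector<Value>& args) -> Value {
        const auto* text = std::get_if<std::string>(&args[0]);
        if (!text) throw std::runtime_error("len expects a string");
        return Value{static_cast<long>(text->size())};
    });
}
```

Checking the arity in call before invoking the function lets each native function index into args without repeating the bounds check. In a C++20 build, the parameter could be a std::span<const Value> instead of a vector reference so that callers can pass a slice of an evaluation stack without copying.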

10. Testing Each Component in Isolation

Modular dependency design enables:

Lexer unit tests using raw source strings
Parser tests using token streams
AST tests using manually constructed nodes
Runtime tests using evaluation contexts and mock environments

Example: test the parser, using the lexer only to produce its input tokens:


Lexer lexer(source);
Parser parser(lexer.tokenize());
AST ast = parser.parse_expression();
// Assertions on structure
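
To test the parser with no lexer involved at all, the token stream can be built by hand. The sketch below assumes a deliberately tiny grammar (numbers, + and *) and hypothetical Token, Parser, and AST definitions; it is meant to show the shape of such a test, not the project's real parser. It does not check for leftover tokens after the expression.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <variant>
#include <vector>

enum class TokenKind { Number, Plus, Star, EndOfFile };
struct Token { TokenKind kind; std::string lexeme; };

struct Expr;
using ExprPtr = std::unique_ptr<Expr>;
struct LiteralExpr { std::string digits; };  // syntax only: the literal's text
struct BinaryExpr { char op; ExprPtr lhs; ExprPtr rhs; };
struct Expr { std::variant<LiteralExpr, BinaryExpr> node; };

// Recursive descent over: expr := term ('+' term)* ; term := number ('*' number)*
class Parser {
public:
    explicit Parser(std::vector<Token> tokens) : tokens_(std::move(tokens)) {}

    // Returns nullptr on a syntax error; messages accumulate in errors().
    ExprPtr parse_expression() {
        ExprPtr lhs = parse_term();
        while (lhs && peek().kind == TokenKind::Plus) {
            ++pos_;
            ExprPtr rhs = parse_term();
            if (!rhs) return nullptr;
            lhs = make_binary('+', std::move(lhs), std::move(rhs));
        }
        return lhs;
    }
    const std::vector<std::string>& errors() const { return errors_; }

private:
    ExprPtr parse_term() {
        ExprPtr lhs = parse_number();
        while (lhs && peek().kind == TokenKind::Star) {
            ++pos_;
            ExprPtr rhs = parse_number();
            if (!rhs) return nullptr;
            lhs = make_binary('*', std::move(lhs), std::move(rhs));
        }
        return lhs;
    }
    ExprPtr parse_number() {
        if (peek().kind != TokenKind::Number) {
            errors_.push_back("expected a number at token " + std::to_string(pos_));
            return nullptr;
        }
        return std::make_unique<Expr>(Expr{LiteralExpr{tokens_[pos_++].lexeme}});
    }
    static ExprPtr make_binary(char op, ExprPtr l, ExprPtr r) {
        return std::make_unique<Expr>(Expr{BinaryExpr{op, std::move(l), std::move(r)}});
    }
    // Treats running past the end of the input as EndOfFile.
    const Token& peek() const {
        static const Token eof{TokenKind::EndOfFile, ""};
        return pos_ < tokens_.size() ? tokens_[pos_] : eof;
    }

    std::vector<Token> tokens_;
    std::vector<std::string> errors_;
    std::size_t pos_ = 0;
};
```

A test then constructs a std::vector<Token> for an input such as 1 + 2 * 3, calls parse_expression, and inspects the result with std::get_if: the root should be a BinaryExpr with op '+', and its right child a BinaryExpr with op '*'. A lexer bug cannot break this test, because no lexer runs.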

Conclusion

Managing dependencies between the lexer, parser, and runtime is a foundational engineering principle when building a modern interpreter. With modern C++20/23, developers can enforce type-safe boundaries, minimize coupling, and structure their interpreter as a collection of focused, testable units.

By keeping components strictly layered and avoiding runtime dependencies in syntax and analysis phases, we enable a clean, composable, and scalable language infrastructure that supports both growth and correctness.
