Advanced Topics Adding Debug Info

Article by Ayman Alheraki on January 11 2026 10:37 AM

1. Introduction to Debug Information in Assemblers

Debug information is a critical part of modern software development: it lets developers trace execution, inspect variables, and map machine code back to source code. Without it, even the most powerful debuggers cannot associate binary instructions with meaningful source-level constructs. For an assembler, supporting debugging therefore means emitting debug sections in an established format that linkers and debuggers already understand.

DWARF (a name often expanded as Debugging With Attributed Record Formats) is the standard debug information format on ELF-based systems such as Linux and the BSDs, and it is supported by modern debuggers including GDB and LLDB. This section explains how DWARF works, which components are needed to generate it, and how an x86-64 assembler can emit useful DWARF debug sections.

2. What is DWARF?

DWARF is a standardized, extensible, and hierarchical debugging data format. It describes the relationship between machine-level instructions and the original source code constructs such as:

Source file names and line numbers
Function names and boundaries
Variable names, types, and locations (registers or memory)
Stack frame layout and call information

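DWARF is designed to be language-agnostic and largely architecture-independent (architecture-specific details, such as the mapping of machine registers to DWARF register numbers, are defined by each platform's ABI), which makes it a robust basis for debugging across toolchains.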

3. DWARF Versions and Structure

DWARF Version 5, published in 2017, is the current released version. It introduced improvements over earlier versions such as:

More compact encodings and a faster name-lookup index (.debug_names)
Split DWARF support (debug info separated from binaries)
Improved descriptions for optimized code

A DWARF-enabled object file contains multiple dedicated sections, including:

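.debug_info: Main tree of debugging information entries (DIEs)
.debug_abbrev: Abbreviation tables describing the layout of each kind of DIE
.debug_line: Line number program mapping addresses to source lines
.debug_str: String table for names and file paths
.debug_loc (up to DWARF 4) or .debug_loclists (DWARF 5): Location lists for variables whose location changes across the code
.debug_ranges (up to DWARF 4) or .debug_rnglists (DWARF 5): Non-contiguous address ranges associated with entries
.debug_frame: Call frame information used for stack unwinding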

Each section uses a compact binary encoding, and fields that refer to code addresses or to offsets in other sections must be resolved through relocations at link time.
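
Much of that compactness comes from LEB128, the variable-length integer encoding DWARF uses for abbreviation codes, attribute and form numbers, and many opcode operands. A minimal Python sketch of the encoders and a decoder follows; the function names are illustrative, not taken from any library.

```python
def encode_uleb128(value: int) -> bytes:
    """Unsigned LEB128: 7 bits per byte, low bits first, high bit set on all but the last byte."""
    if value < 0:
        raise ValueError("ULEB128 encodes non-negative values only")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)   # more bytes follow
        else:
            out.append(byte)          # final byte
            return bytes(out)


def encode_sleb128(value: int) -> bytes:
    """Signed LEB128: like ULEB128, but stops once the rest is just sign extension."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7                   # Python's >> is arithmetic, so negatives stay negative
        sign_bit = byte & 0x40        # top bit of this 7-bit group
        if (value == 0 and not sign_bit) or (value == -1 and sign_bit):
            out.append(byte)
            return bytes(out)
        out.append(byte | 0x80)


def decode_uleb128(data: bytes, offset: int = 0) -> tuple[int, int]:
    """Return (value, offset just past the encoded number)."""
    result = shift = 0
    while True:
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return result, offset


print(encode_uleb128(624485).hex(" "))   # e5 8e 26
print(encode_sleb128(-128).hex(" "))     # 80 7f
```

Small values cost a single byte, which is why abbreviation codes and attribute numbers rarely take more than one or two bytes each.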

4. Debugging Information Entries (DIEs)

The core of DWARF is the Debugging Information Entry (DIE). A DIE represents a source-level entity such as a function, variable, type, or scope. DIEs form a tree rooted at a compilation unit DIE (DW_TAG_compile_unit).

Example DIE hierarchy:


Compilation Unit DIE
├── Subprogram DIE (function)
│   ├── Formal Parameter DIE
│   └── Variable DIE
└── Another Subprogram DIE
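
In .debug_info, a tree like this is flattened in pre-order: each DIE is followed by its children, and a DIE whose abbreviation declares children has its child list closed by a null entry (a single zero byte). The sketch below models that layout in Python; the DIE class and the abbreviation codes 1 to 5 are hypothetical, attribute values are assumed to be pre-encoded, and the unit header that precedes the root DIE is omitted (real DW_FORM_ref offsets would also count it).

```python
from dataclasses import dataclass, field


def uleb128(value: int) -> bytes:
    """Unsigned LEB128, the encoding used for abbreviation codes."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


@dataclass
class DIE:
    abbrev_code: int                   # hypothetical code into the .debug_abbrev table
    attr_bytes: bytes = b""            # attribute values, already encoded per their forms
    has_children: bool = False         # must match the abbreviation's DW_CHILDREN flag
    children: list["DIE"] = field(default_factory=list)


def serialize(die: DIE, out: bytearray, offsets: dict) -> None:
    """Append a DIE subtree to `out` in pre-order, the order .debug_info uses."""
    offsets[id(die)] = len(out)        # where reference attributes would point
    out += uleb128(die.abbrev_code)
    out += die.attr_bytes
    if die.has_children:
        for child in die.children:
            serialize(child, out, offsets)
        out.append(0)                  # null entry: ends this DIE's list of children


# The tree from the diagram, with empty attributes to keep the bytes readable.
unit = DIE(1, has_children=True, children=[
    DIE(2, has_children=True, children=[DIE(3), DIE(4)]),  # subprogram: parameter + variable
    DIE(5),                                                # another subprogram, no children
])
buf, offsets = bytearray(), {}
serialize(unit, buf, offsets)
print(buf.hex(" "))   # 01 02 03 04 00 05 00
```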

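Each DIE includes the following (in the encoded form, the tag and the attribute layout are stored once in .debug_abbrev, and each DIE in .debug_info starts with an abbreviation code referring to them):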

A tag (e.g., DW_TAG_subprogram, DW_TAG_variable)
Attributes (name, type, location, low_pc, high_pc, etc.)
Optional children forming nested scopes
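
To show how the tag, attributes, and children flag are split between .debug_abbrev and .debug_info, here is a minimal Python sketch. The DW_TAG, DW_AT, and DW_FORM numbers are the values defined by the DWARF specification; the helper names, the abbreviation codes, and the choice of forms are illustrative assumptions. DW_AT_high_pc uses a constant form, which in DWARF 4 and later means "length from low_pc" rather than an absolute address.

```python
import struct

# Constants from the DWARF specification.
DW_TAG_compile_unit, DW_TAG_subprogram = 0x11, 0x2E
DW_CHILDREN_no, DW_CHILDREN_yes = 0x00, 0x01
DW_AT_name, DW_AT_stmt_list, DW_AT_low_pc, DW_AT_high_pc, DW_AT_language = 0x03, 0x10, 0x11, 0x12, 0x13
DW_FORM_addr, DW_FORM_data2, DW_FORM_data8, DW_FORM_string, DW_FORM_sec_offset = 0x01, 0x05, 0x07, 0x08, 0x17


def uleb128(value: int) -> bytes:
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def abbrev_entry(code: int, tag: int, has_children: bool, attr_forms) -> bytes:
    """One .debug_abbrev entry: code, tag, children flag, (attribute, form) pairs, then 0, 0."""
    out = bytearray(uleb128(code) + uleb128(tag))
    out.append(DW_CHILDREN_yes if has_children else DW_CHILDREN_no)
    for attr, form in attr_forms:
        out += uleb128(attr) + uleb128(form)
    out += b"\x00\x00"                 # end of this entry's attribute list
    return bytes(out)


SUBPROGRAM_ATTRS = [
    (DW_AT_name, DW_FORM_string),      # inline NUL-terminated name
    (DW_AT_low_pc, DW_FORM_addr),      # start address (needs a relocation)
    (DW_AT_high_pc, DW_FORM_data8),    # constant form: length from low_pc
]

abbrev_section = (
    abbrev_entry(1, DW_TAG_compile_unit, True, [
        (DW_AT_name, DW_FORM_string),
        (DW_AT_stmt_list, DW_FORM_sec_offset),
        (DW_AT_low_pc, DW_FORM_addr),
        (DW_AT_high_pc, DW_FORM_data8),
        (DW_AT_language, DW_FORM_data2),
    ])
    + abbrev_entry(2, DW_TAG_subprogram, False, SUBPROGRAM_ATTRS)
    + b"\x00"                          # a zero code ends the abbreviation table
)


def subprogram_die(name: str, size: int) -> bytes:
    """A .debug_info DIE using abbreviation 2; values follow SUBPROGRAM_ATTRS in order."""
    return (uleb128(2)
            + name.encode("utf-8") + b"\x00"   # DW_FORM_string
            + struct.pack("<Q", 0)             # DW_FORM_addr, filled in via a relocation
            + struct.pack("<Q", size))         # DW_FORM_data8 length
```

Because the DIE itself carries only the abbreviation code and raw values, every subprogram with the same shape shares one abbreviation entry.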

5. Line Number Program (LNP)

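The .debug_line section maps program counter values to source file and line numbers, which is essential for breakpoints, stepping, and tracing. Rather than storing the table row by row, DWARF encodes it as a line number program: a sequence of opcodes that a simple state machine defined by the standard executes to reconstruct the rows of instruction addresses and their associated file/line pairs. Most rows cost a single byte, because one "special opcode" advances both the address and the line by small amounts.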
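
To make the state machine concrete, the sketch below decodes a line number program body into rows. It is a minimal Python model, assuming the header has already been parsed and using illustrative header values (line_base = -5, line_range = 14, opcode_base = 13, minimum_instruction_length = 1); it handles only the opcodes it names, and the class and function names are hypothetical.

```python
from dataclasses import dataclass

# Illustrative header values; a real decoder reads them from the .debug_line header.
MIN_INST_LENGTH, LINE_BASE, LINE_RANGE, OPCODE_BASE = 1, -5, 14, 13

# Standard and extended opcode numbers from the DWARF specification.
DW_LNS_copy, DW_LNS_advance_pc, DW_LNS_advance_line = 1, 2, 3
DW_LNS_set_file, DW_LNS_const_add_pc = 4, 8
DW_LNE_end_sequence, DW_LNE_set_address = 1, 2


@dataclass(frozen=True)
class Row:
    address: int
    file: int
    line: int
    end_sequence: bool = False


def read_leb(data: bytes, pos: int, signed: bool) -> tuple[int, int]:
    """Decode a ULEB128/SLEB128 value; return (value, position after it)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            if signed and byte & 0x40:
                result -= 1 << shift       # sign-extend negative values
            return result, pos


def run_line_program(program: bytes) -> list[Row]:
    """Execute a line number program body and return the rows it produces."""
    rows = []
    address, file, line = 0, 1, 1          # initial register values defined by the standard
    pos = 0
    while pos < len(program):
        op = program[pos]
        pos += 1
        if op >= OPCODE_BASE:              # special opcode: bump address and line, emit a row
            adjusted = op - OPCODE_BASE
            address += (adjusted // LINE_RANGE) * MIN_INST_LENGTH
            line += LINE_BASE + adjusted % LINE_RANGE
            rows.append(Row(address, file, line))
        elif op == 0:                      # extended: 0, ULEB128 length, sub-opcode, operands
            length, pos = read_leb(program, pos, signed=False)
            sub = program[pos]
            if sub == DW_LNE_set_address:
                address = int.from_bytes(program[pos + 1:pos + length], "little")
            elif sub == DW_LNE_end_sequence:
                rows.append(Row(address, file, line, end_sequence=True))
                address, file, line = 0, 1, 1   # registers reset for the next sequence
            pos += length
        elif op == DW_LNS_copy:
            rows.append(Row(address, file, line))
        elif op == DW_LNS_advance_pc:
            delta, pos = read_leb(program, pos, signed=False)
            address += delta * MIN_INST_LENGTH
        elif op == DW_LNS_advance_line:
            delta, pos = read_leb(program, pos, signed=True)
            line += delta
        elif op == DW_LNS_set_file:
            file, pos = read_leb(program, pos, signed=False)
        elif op == DW_LNS_const_add_pc:    # address advance of special opcode 255, no row
            address += ((255 - OPCODE_BASE) // LINE_RANGE) * MIN_INST_LENGTH
        else:
            raise NotImplementedError(f"standard opcode {op} is not handled in this sketch")
    return rows


# set_address 0x1000; copy; special opcode 61 (+3 bytes, +1 line); advance_pc 4; end_sequence
program = (bytes([0, 9, DW_LNE_set_address]) + (0x1000).to_bytes(8, "little")
           + bytes([DW_LNS_copy, 61, DW_LNS_advance_pc, 4, 0, 1, DW_LNE_end_sequence]))
for row in run_line_program(program):
    print(row)
```

A complete decoder would also use the header's standard_opcode_lengths array to skip opcodes it does not understand instead of raising.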

A line number entry includes:

File index (from file table)
Line number
Instruction address (PC)
Flags like is_stmt or end_sequence

To implement the .debug_line section:

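Write the program header, including the opcode parameters and the directory and file tables.
Track the current source file and line during assembly.
Emit line number program opcodes that encode each transition (address delta and line delta) as compactly as possible.
Terminate each sequence with DW_LNE_end_sequence at the end of every contiguous block of code, typically the end of each code section.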
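
The encoding step can be sketched in Python as the inverse of the state machine: for each new row, try to express the (address, line) change as one special opcode, and fall back to DW_LNS_advance_line, DW_LNS_advance_pc, and DW_LNS_copy when it does not fit. This sketch assumes a single source file, ascending addresses, minimum_instruction_length = 1 (as on x86-64), and the same illustrative header values as a typical producer (line_base = -5, line_range = 14, opcode_base = 13); it emits only the program body, not the header, and its function names are hypothetical.

```python
LINE_BASE, LINE_RANGE, OPCODE_BASE = -5, 14, 13      # illustrative header values
DW_LNS_copy, DW_LNS_advance_pc, DW_LNS_advance_line = 1, 2, 3
DW_LNE_end_sequence, DW_LNE_set_address = 1, 2


def uleb(value: int) -> bytes:
    out = bytearray()
    while True:
        byte, value = value & 0x7F, value >> 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def sleb(value: int) -> bytes:
    out = bytearray()
    while True:
        byte, value = value & 0x7F, value >> 7
        if (value == 0 and not byte & 0x40) or (value == -1 and byte & 0x40):
            out.append(byte)
            return bytes(out)
        out.append(byte | 0x80)


def special_opcode(addr_delta: int, line_delta: int):
    """Return the special opcode for this step, or None if it cannot be encoded as one."""
    if addr_delta < 0 or not LINE_BASE <= line_delta < LINE_BASE + LINE_RANGE:
        return None
    opcode = (line_delta - LINE_BASE) + LINE_RANGE * addr_delta + OPCODE_BASE
    return opcode if opcode <= 255 else None


def encode_sequence(rows, end_address: int) -> bytes:
    """Encode one sequence from (address, line) rows in ascending address order.
    end_address is the first address past the sequence, e.g. the end of .text."""
    out = bytearray()
    address, line = rows[0][0], 1                      # the line register starts at 1
    out += bytes([0, 9, DW_LNE_set_address]) + address.to_bytes(8, "little")
    for row_address, row_line in rows:
        addr_delta, line_delta = row_address - address, row_line - line
        if addr_delta < 0:
            raise ValueError("rows must be in ascending address order")
        op = special_opcode(addr_delta, line_delta)
        if op is not None:
            out.append(op)                             # one byte: advance both, emit a row
        else:
            if line_delta:
                out += bytes([DW_LNS_advance_line]) + sleb(line_delta)
            if addr_delta:
                out += bytes([DW_LNS_advance_pc]) + uleb(addr_delta)
            out.append(DW_LNS_copy)                    # emit the row
        address, line = row_address, row_line
    if end_address > address:
        out += bytes([DW_LNS_advance_pc]) + uleb(end_address - address)
    out += bytes([0, 1, DW_LNE_end_sequence])          # extended opcode, length 1
    return bytes(out)


body = encode_sequence([(0x1000, 1), (0x1003, 2), (0x1040, 120)], end_address=0x1050)
print(body.hex(" "))
```

Choosing line_base and line_range is a trade-off: a wider line range covers bigger line jumps in one byte but leaves fewer special opcodes for address advances.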

6. Emitting DWARF from an Assembler

To emit DWARF in an assembler, you must:

Track and store symbolic debug information during parsing (labels, line numbers, symbols).
Construct DWARF sections as binary streams using your own encoders or existing libraries.
Maintain abbreviation and string tables so that identical entries are stored once and shared.
Resolve addresses and section offsets at output or relocation time.
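
Sharing entries in the abbreviation and string tables amounts to two small interning maps. The Python sketch below assumes 32-bit DWARF, where a DW_FORM_strp attribute stores a 4-byte offset into .debug_str (which itself needs a relocation in an object file); the class and method names are hypothetical.

```python
class StringTable:
    """Hypothetical .debug_str builder: each distinct string is stored once."""

    def __init__(self) -> None:
        self._data = bytearray()
        self._offsets: dict[str, int] = {}

    def add(self, text: str) -> int:
        """Return the offset to store in a DW_FORM_strp attribute."""
        if text not in self._offsets:
            self._offsets[text] = len(self._data)
            self._data += text.encode("utf-8") + b"\x00"   # strings are NUL-terminated
        return self._offsets[text]

    def contents(self) -> bytes:
        return bytes(self._data)


class AbbrevTable:
    """Hypothetical abbreviation registry: DIEs with the same shape share one code."""

    def __init__(self) -> None:
        self._codes: dict[tuple, int] = {}

    def code_for(self, tag: int, has_children: bool, attr_forms) -> int:
        key = (tag, has_children, tuple(attr_forms))
        if key not in self._codes:
            self._codes[key] = len(self._codes) + 1    # codes start at 1; 0 is the null entry
        return self._codes[key]


strings = StringTable()
for name in ["main", "start.asm", "main"]:
    print(name, "->", strings.add(name))
```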

The assembler must maintain internal data structures to:

Map source files to index values.
Track current function and variable scopes.
Store variable locations (register or memory offset).
Manage ranges and location lists.
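
A minimal Python sketch of two of these structures follows: a file table that assigns line-table indices on first use, and a stack of open function scopes whose end address is filled in when the scope closes. The names are hypothetical; note that DWARF 4 line tables number files from 1, while DWARF 5 reserves index 0 for the primary source file, so the starting index is a parameter here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Scope:
    """A function scope that will become a DW_TAG_subprogram DIE."""
    name: str
    low_pc: int
    high_pc: Optional[int] = None     # known only when the scope is closed


class DebugState:
    """Hypothetical bookkeeping an assembler keeps while parsing one source file."""

    def __init__(self, first_file_index: int = 1) -> None:
        self.next_file_index = first_file_index
        self.file_indices: dict[str, int] = {}
        self.open_scopes: list[Scope] = []      # a stack: nested scopes close innermost-first
        self.closed_scopes: list[Scope] = []

    def file_index(self, path: str) -> int:
        """Map a source path to its line-table file index, assigning a new one if needed."""
        if path not in self.file_indices:
            self.file_indices[path] = self.next_file_index
            self.next_file_index += 1
        return self.file_indices[path]

    def open_scope(self, name: str, pc: int) -> None:
        self.open_scopes.append(Scope(name, pc))

    def close_scope(self, pc: int) -> Scope:
        scope = self.open_scopes.pop()
        scope.high_pc = pc
        self.closed_scopes.append(scope)
        return scope


state = DebugState()
state.file_index("main.asm")
state.open_scope("_start", 0x0)     # e.g. at a function's entry label
state.close_scope(0x2A)             # e.g. at the label marking its end
```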

7. Integration with Linkers and Debuggers

DWARF information is typically consumed by linkers and debuggers. At link time, symbolic addresses are resolved, and final layout adjustments are made. The assembler must:

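Emit relocations for every DWARF field that refers to a code address (such as DW_AT_low_pc) or to an offset in another debug section (such as DW_AT_stmt_list or a DW_FORM_strp string offset), because the linker both relocates code and concatenates debug sections.
Emit the debug sections as non-allocated sections (no SHF_ALLOC flag), with alignment appropriate to their contents, so they take no memory at run time.
Optionally support writing debug information to separate files for Split DWARF.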
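
The sketch below shows one way an assembler might record those relocations while writing .debug_info. It assumes ELF x86-64, which uses RELA relocations (the addend lives in the relocation record, so the field itself is left as zero), with R_X86_64_64 for 64-bit addresses and R_X86_64_32 for 32-bit section offsets; the writer class is hypothetical, and the collected records would be written out as .rela.debug_info.

```python
import struct
from dataclasses import dataclass

# x86-64 ELF relocation types (System V AMD64 psABI).
R_X86_64_64 = 1    # 64-bit absolute address
R_X86_64_32 = 10   # 32-bit absolute value, zero-extended


@dataclass
class Reloc:
    offset: int    # position of the field inside .debug_info
    rtype: int
    symbol: str    # symbol or section the field refers to
    addend: int


class DebugInfoWriter:
    """Hypothetical .debug_info writer that records RELA relocations as it emits fields."""

    def __init__(self) -> None:
        self.data = bytearray()
        self.relocs: list[Reloc] = []

    def address(self, symbol: str, addend: int = 0) -> None:
        """A DW_FORM_addr field such as DW_AT_low_pc; the linker fills in the value."""
        self.relocs.append(Reloc(len(self.data), R_X86_64_64, symbol, addend))
        self.data += struct.pack("<Q", 0)

    def section_offset(self, section: str, offset: int) -> None:
        """A 32-bit DW_FORM_sec_offset or DW_FORM_strp field pointing into another section."""
        self.relocs.append(Reloc(len(self.data), R_X86_64_32, section, offset))
        self.data += struct.pack("<I", 0)


w = DebugInfoWriter()
w.section_offset(".debug_line", 0)   # DW_AT_stmt_list of the compilation unit
w.address(".text", 0x40)             # DW_AT_low_pc of a function at .text+0x40
for r in w.relocs:
    print(r)
```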

After linking, tools like objdump -W, readelf --debug-dump, gdb, and lldb can verify or use the debug sections.

8. Advanced Topics

Support for Inlined Functions

DWARF can represent inlined code using DW_TAG_inlined_subroutine, mapping multiple PC ranges back to a single source function.
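
As a simplified model of how a consumer uses these entries, the Python sketch below treats each DW_TAG_inlined_subroutine as a PC range plus a DW_AT_abstract_origin reference to the abstract DW_TAG_subprogram DIE, so two inlined copies resolve to the same source function. The offsets, addresses, and names are hypothetical, and a real debugger would also walk nested inlined entries to find the innermost one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InlinedInstance:
    """Simplified DW_TAG_inlined_subroutine entry."""
    low_pc: int
    high_pc: int            # exclusive end of this inlined copy
    abstract_origin: int    # DW_AT_abstract_origin: offset of the abstract subprogram DIE


def function_for_pc(pc: int, instances: list, names: dict) -> Optional[str]:
    """Return the source function whose inlined code contains pc, or None."""
    for inst in instances:
        if inst.low_pc <= pc < inst.high_pc:
            return names[inst.abstract_origin]
    return None


# Two inlined copies of one function (hypothetical offsets and addresses).
names = {0x2A: "clamp"}
instances = [InlinedInstance(0x1010, 0x1018, 0x2A), InlinedInstance(0x1040, 0x1048, 0x2A)]
print(function_for_pc(0x1044, instances, names))   # clamp
```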

Compressed Debug Sections

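Debug sections can be compressed to reduce object file size. The legacy GNU convention renames them with a .zdebug_ prefix (for example, .zdebug_info), while current ELF toolchains keep the normal section name, set the SHF_COMPRESSED section flag, and prefix the data with a compression header (zlib, or zstd in newer toolchains). Assemblers may optionally support compression flags or emit already-compressed debug sections.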
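
Both layouts can be sketched with Python's standard zlib and struct modules. The SHF_COMPRESSED form starts with an Elf64_Chdr (ch_type, ch_reserved, ch_size, ch_addralign) using ELFCOMPRESS_ZLIB = 1, and the section header would also need the SHF_COMPRESSED flag (0x800); the legacy .zdebug_ form starts with the bytes "ZLIB" and an 8-byte big-endian uncompressed size. The function names are hypothetical.

```python
import struct
import zlib

ELFCOMPRESS_ZLIB = 1    # ch_type value for zlib-compressed sections (ELF gABI)


def compress_shf(data: bytes, addralign: int = 1) -> bytes:
    """Contents of an SHF_COMPRESSED section: an Elf64_Chdr followed by a zlib stream."""
    # Elf64_Chdr: ch_type (u32), ch_reserved (u32), ch_size (u64), ch_addralign (u64)
    header = struct.pack("<IIQQ", ELFCOMPRESS_ZLIB, 0, len(data), addralign)
    return header + zlib.compress(data)


def compress_zdebug(data: bytes) -> bytes:
    """Legacy GNU .zdebug_* contents: b"ZLIB", 8-byte big-endian size, zlib stream."""
    return b"ZLIB" + struct.pack(">Q", len(data)) + zlib.compress(data)


raw = bytes(range(16)) * 64                 # stand-in for an uncompressed .debug_info
packed = compress_shf(raw)
print(len(raw), "->", len(packed))
```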

Split DWARF

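Split DWARF moves most debug information out of the object files into separate .dwo files, which reduces the amount of data the linker must process and keeps the linked executable smaller. The assembler must emit a skeleton compilation unit in the main object file that names the .dwo file and carries a DWO ID matching the split unit. The linker does not merge .dwo files; a separate packaging tool (such as dwp) can later combine them into a single .dwp file.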
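
In DWARF 5 the DWO ID lives in the unit header itself: skeleton and split compilation units add an 8-byte dwo_id after the usual fields. The Python sketch below builds a 32-bit DWARF 5 unit header (unit_length, version, unit_type, address_size, debug_abbrev_offset, then dwo_id for skeleton and split units). The function name and the sample DWO ID are hypothetical; producers commonly derive the ID from a hash of the unit.

```python
import struct

# Unit types from the DWARF 5 specification.
DW_UT_compile, DW_UT_skeleton, DW_UT_split_compile = 0x01, 0x04, 0x05


def unit_header_v5(unit_type, abbrev_offset, body_len, dwo_id=None, address_size=8):
    """Build a 32-bit DWARF 5 unit header for a unit whose DIEs take body_len bytes."""
    # version (2 bytes), unit_type (1), address_size (1), debug_abbrev_offset (4)
    fields = struct.pack("<HBBI", 5, unit_type, address_size, abbrev_offset)
    if unit_type in (DW_UT_skeleton, DW_UT_split_compile):
        if dwo_id is None:
            raise ValueError("skeleton and split units must carry a DWO ID")
        fields += struct.pack("<Q", dwo_id)   # same ID in the skeleton and in the .dwo
    unit_length = len(fields) + body_len      # counts everything after the length field
    return struct.pack("<I", unit_length) + fields


skeleton = unit_header_v5(DW_UT_skeleton, abbrev_offset=0, body_len=10,
                          dwo_id=0x1122334455667788)
print(skeleton.hex(" "))
```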

9. Challenges and Considerations

Complexity: Generating correct and compliant DWARF is non-trivial. It requires deep understanding of both the format and the assembler’s internal symbol and instruction state.
Toolchain Compatibility: Your assembler's DWARF output must align with expectations from linkers and debuggers. Testing against multiple tools is essential.
Incremental Generation: Emitting DWARF inline during assembly requires buffering and backpatching since many addresses are only known after code emission.
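
Buffering and backpatching can be as simple as reserving zeroed placeholder bytes and overwriting them once the value is known, as in this Python sketch. It assumes little-endian output (as on x86-64); the buffer class and the label addresses are hypothetical, and the filler bytes stand in for real header fields and DIEs.

```python
import struct


class PatchableBuffer:
    """Hypothetical output buffer with fixed-size placeholders that are filled in later."""

    def __init__(self) -> None:
        self.data = bytearray()

    def reserve(self, size: int) -> int:
        """Append size zero bytes and return their offset for later patching."""
        offset = len(self.data)
        self.data += bytes(size)
        return offset

    def patch_u32(self, offset: int, value: int) -> None:
        struct.pack_into("<I", self.data, offset, value)

    def patch_u64(self, offset: int, value: int) -> None:
        struct.pack_into("<Q", self.data, offset, value)


buf = PatchableBuffer()
length_at = buf.reserve(4)                 # unit_length, unknown until the unit is complete
buf.data += b"\x02\x00\x00\x00"            # ...stand-in for header fields and DIEs...
high_pc_at = buf.reserve(8)                # a DW_AT_high_pc length (DW_FORM_data8)
buf.data += b"\x00"                        # ...more DIEs...

# Later, once the function's end label has an address (hypothetical values):
func_start, func_end = 0x40, 0x7A
buf.patch_u64(high_pc_at, func_end - func_start)
buf.patch_u32(length_at, len(buf.data) - (length_at + 4))   # unit_length excludes itself
```

Values that depend on the final link (absolute addresses) are still left to relocations; backpatching only covers values the assembler itself learns later, such as lengths and label differences within a section.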

10. Conclusion

Adding DWARF support to your x86-64 assembler significantly enhances its utility in real-world software development. It enables source-level debugging, symbolic inspection, and integration into modern toolchains. While complex, implementing DWARF solidifies your assembler's maturity and aligns it with professional standards. Begin with .debug_line, together with a minimal .debug_info and the .debug_abbrev table it requires to describe the compilation unit; then progressively add .debug_str, location lists, and richer DIEs as your assembler evolves.