Exporters & deliverables

warden export converts the knowledge base for a given version into one of six deliverable formats. All six are deterministic: given the same KB state, the output is byte-identical on every run. That means they diff cleanly in git and compose naturally with CI pipelines.

The six formats

headers

A C header of recovered function prototypes. Use this to feed names into a downstream C/C++ toolchain or as a human-readable symbol sheet.

pseudo

Per-function listings: the recovered name, type signature, agent summary, provenance, and lifted pseudo-C (requires the original .wasm to be present; falls back to a mnemonic count otherwise). Use this for manual review.

kb-text

A columnar, stable text dump of every symbol (index, stable id, lock flag, provenance, confidence, and name), sorted by function index. The primary format for committing alongside source and reading diffs across versions.

ghidra

A Python script that pushes recovered names back into a Ghidra project. Run it from Ghidra’s Python console after loading the same module.

csv

A neutral CSV file of every defined function: index, stable id, name, type, provenance, and confidence. Edit it in any spreadsheet or script and import it back with warden import.

json

The same content as csv but in JSON array form, one object per function. Useful when automating annotation workflows or feeding into IDA scripts.

Basic usage

warden export <label> --format <fmt>

By default, output goes to stdout. Pass --out <file> to write to a file instead.

warden export v1 --format kb-text                        # stdout
warden export v1 --format kb-text   --out v1.kb.txt      # file
warden export v1 --format headers   --out recovered.h
warden export v1 --format pseudo    --out v1.pseudo.txt
warden export v1 --format ghidra    --out rename_v1.py
warden export v1 --format csv       --out v1.csv         # round-trip bridge
warden export v1 --format json      --out v1.json

--db selects the project database when it is not the default warden.db:

warden export v1 --format kb-text --db /path/to/project.db --out v1.kb.txt
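
When exports are driven from a script rather than typed by hand, the same invocations can be assembled programmatically. The following is a minimal sketch, assuming warden is on PATH; the helper names export_argv and run_export are hypothetical and not part of WARDEN. It uses only the flags shown above.

```python
import subprocess

def export_argv(label, fmt, out=None, db=None):
    """Build the argument vector for `warden export` from the documented flags."""
    argv = ["warden", "export", label, "--format", fmt]
    if out is not None:
        argv += ["--out", out]
    if db is not None:
        argv += ["--db", db]
    return argv

def run_export(label, fmt, out=None, db=None):
    """Run the export (assumes `warden` is on PATH) and return captured stdout."""
    done = subprocess.run(export_argv(label, fmt, out, db),
                          check=True, capture_output=True, text=True)
    return done.stdout

# Building the argv does not require warden to be installed.
print(export_argv("v1", "kb-text", out="v1.kb.txt"))
```

Keeping argv construction separate from execution makes the command easy to log or inspect before anything runs.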

Format details

`headers`

Emits a C header wrapped in an include guard. Each defined function gets a comment line with its function index, Wasm type signature, provenance, and confidence score, followed by a skeleton declaration. Only defined (non-imported) functions are emitted; imports are excluded.

/* WARDEN recovered header */
/* version: v1  emscripten: 3.1.55 */
#ifndef WARDEN_RECOVERED_H
#define WARDEN_RECOVERED_H
#include <stdint.h>

/* idx=3 (i32) -> (i32)  [oracle 0.94] */
void malloc(void); /* TODO: real prototype from type sig */

/* idx=7 () -> ()  [human 1.00] */
void verify_license(void); /* TODO: real prototype from type sig */

#endif /* WARDEN_RECOVERED_H */

The prototypes are stubs. Parameter types are not yet recovered from the Wasm type signature. The comment carries the full signature string so you can fill it in. This is a known limitation; proper C prototype reconstruction is on the roadmap.

When to use it: feeding into a C toolchain, generating a symbol cheat-sheet, or as a starting point for writing a real header by hand.
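
Because the comment line carries the full signature string, filling in a stub by script is straightforward. The sketch below is one possible approach, not WARDEN's planned prototype reconstruction: the function c_prototype and the type mapping are assumptions, and Wasm integers carry no signedness, so the signed stdint types are a guess that you would adjust by hand.

```python
import re

# Assumed mapping. Wasm i32/i64 have no signedness, so signed types are a guess.
C_TYPES = {"i32": "int32_t", "i64": "int64_t", "f32": "float", "f64": "double"}

SIG_RE = re.compile(r"^\(([^)]*)\)\s*->\s*\(([^)]*)\)$")

def c_prototype(name, sig):
    """Turn a signature string like '(i32, i32) -> (i32)' into a C declaration."""
    m = SIG_RE.match(sig.strip())
    if m is None:
        raise ValueError("unrecognized signature: %r" % sig)
    params = [p.strip() for p in m.group(1).split(",") if p.strip()]
    results = [r.strip() for r in m.group(2).split(",") if r.strip()]
    if len(results) > 1:
        raise ValueError("multi-value results have no direct C equivalent")
    ret = C_TYPES[results[0]] if results else "void"
    args = ", ".join("%s p%d" % (C_TYPES[t], i) for i, t in enumerate(params))
    return "%s %s(%s);" % (ret, name, args or "void")

print(c_prototype("parse_token", "(i32, i32) -> (i32)"))
print(c_prototype("verify_license", "() -> ()"))
```

The parameter names p0, p1 follow the naming seen in the lifter's pseudo-C output; pointer-typed parameters still need manual review, since an i32 may be either an integer or a linear-memory address.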

`pseudo`

Emits a readable listing of every defined function with its recovered name, type signature, agent-generated summary, and provenance/confidence. When the original .wasm path recorded in the KB is still accessible, the function body is included inline; otherwise the listing notes the instruction count without disassembly. The sample below shows the raw-mnemonic form of the body; with the built-in decompiler described later, the body is lifted pseudo-C instead.

// WARDEN pseudocode: v1

// ---- verify_license  () -> () ----
// Checks the license key against a hardcoded salt and calls abort if invalid.
// provenance=human confidence=1.00
function verify_license {  // stable=a3f9e1c042d8
    i32.const
    call
    i32.eqz
    if
    call
    end
}

// ---- malloc  (i32) -> (i32) ----
// provenance=oracle confidence=0.94
function malloc {  // stable=7b2d88fc901a
    // 142 instructions (disassembly not loaded)
}

When to use it: manual review of agent and Oracle output, writing a report, or orienting a second analyst. The stable ID truncated to 12 hex characters appears in every function header so cross-referencing the KB is easy.

`kb-text`

A columnar dump of every function in index order (including imports) with a fixed-width layout designed for git diff. The format is:

# WARDEN KB export (version_id=1)
index  stable_id          lk provenance  conf   name
4a9f31b2c8e60d17    oracle      0.94  malloc
7d1e55a3f0b2c8e6    import      1.00  emscripten_memcpy_big
a3f9e1c042d8b917  L human       1.00  verify_license
0c8b47f21e93da55    agent       0.61  process_frame
3d7f1a82bc094e61    -           -     -

Columns: index (Wasm function index), stable_id (first 16 hex chars of the full stable identity hash), lk (L when the symbol is locked; space otherwise), provenance, confidence, name (dash when unnamed).

Commit kb-text output alongside your source code. When the vendor ships a new .wasm, run warden ingest, warden diff, and then warden export v2 --format kb-text. The resulting git diff shows exactly which functions changed, were added, or were dropped, and which annotations carried over automatically.

When to use it: source-controlled annotation snapshots, CI regression detection, sharing the current KB state without giving someone access to the database.

`ghidra`

Emits a Python script for Ghidra’s built-in scripting console. The script iterates over every defined function that has a recovered name and calls fn.setName(name, SourceType.USER_DEFINED) to apply it.

# WARDEN -> Ghidra rename script (run in Ghidra's Python console).
# Assumes the nneonneo/ghidra-wasm-plugin loaded the same module.
from ghidra.program.model.symbol import SourceType
fm = currentProgram.getFunctionManager()
renames = [
    (3, 'malloc'),
    (7, 'verify_license'),
    (12, 'process_frame'),
]
for idx, name in renames:
    # Map wasm function index -> Ghidra function (plugin-specific helper).
    fn = getFunctionByWasmIndex(idx) if 'getFunctionByWasmIndex' in dir() else None
    if fn is not None:
        fn.setName(name, SourceType.USER_DEFINED)
print('WARDEN: applied %d renames' % len(renames))

The generated script targets the nneonneo/ghidra-wasm-plugin and relies on a getFunctionByWasmIndex helper from it. If that helper is not present in your Ghidra environment, the rename loop silently skips every function, and the final print still reports the full length of the rename list rather than the number of renames actually applied. Verify the plugin is loaded before running the script.
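
If you want the script to tell you when nothing was applied, one option is to count successes and collect the indices that failed to resolve. The sketch below is a hypothetical variant of the rename loop, not what warden export emits. It takes the lookup and rename operations as parameters so it can be exercised outside Ghidra; inside Ghidra you would pass getFunctionByWasmIndex as lookup and a wrapper around fn.setName(name, SourceType.USER_DEFINED) as rename.

```python
def apply_renames(renames, lookup, rename):
    """Apply (index, name) pairs; return (applied_count, skipped_indices)."""
    applied, skipped = 0, []
    for idx, name in renames:
        fn = lookup(idx)
        if fn is None:
            skipped.append(idx)   # index did not resolve to a function
            continue
        rename(fn, name)
        applied += 1
    return applied, skipped

# Stand-alone demonstration with a fake lookup table instead of Ghidra.
class FakeFn(object):
    def __init__(self):
        self.name = None

table = {3: FakeFn(), 7: FakeFn()}
applied, skipped = apply_renames(
    [(3, "malloc"), (7, "verify_license"), (12, "process_frame")],
    lookup=table.get,
    rename=lambda fn, name: setattr(fn, "name", name),
)
print("applied %d, skipped %s" % (applied, skipped))  # applied 2, skipped [12]
```

The code avoids Python 3 only syntax so that it also runs under Ghidra's Jython console.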

When to use it: you already have a Ghidra project open for the same module and want WARDEN’s recovered names applied without re-doing the work interactively. The index-based mapping is stable as long as the loaded .wasm is the same binary that WARDEN ingested for that version.

Built-in decompiler

The warden.lift module contains a pure-Python stack-machine lifter that re-folds Wasm stack operations back into readable pseudo-C. It handles the integer subset comprehensively, including infix arithmetic, memory loads and stores, local and global variables, and function calls.

It also renders f32 and f64 arithmetic, comparisons, and constants as readable expressions, so floating-point code reads the same way the integer subset does. Float min, max, and copysign render as named calls (fminf, fmax, copysign). Unary ops render as name(x), and neg as -(x). f32.const and f64.const decode their raw IEEE-754 bytes to clean literals like 0.5. Conversions render as a C cast ((float)x) or a named call.

For anything unmodeled, the lifter degrades gracefully by emitting a /* mnemonic */ comment and an opaque temporary instead of crashing.

// f_avg: (a, b) -> (a + b) * 0.5
f32 f_avg(f32 p0, f32 p1) {
    return ((p0 + p1) * 0.5);
}

Load it with samples.float_demo() to lift the float sample interactively.
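
The folding idea itself is compact: keep a stack of expression strings instead of values, and let each operator pop its operands and push a parenthesized infix expression. The toy below illustrates that technique under simplifying assumptions; it is not the warden.lift implementation. The (mnemonic, operand) tuple format and the fold function are hypothetical, and unlike a real lifter it does not model how many operands an unknown opcode consumes.

```python
BINOPS = {"i32.add": "+", "i32.sub": "-", "i32.mul": "*",
          "f32.add": "+", "f32.sub": "-", "f32.mul": "*"}

def fold(instrs):
    """Fold a straight-line list of (mnemonic, operand) pairs into statements."""
    stack, lines, temps = [], [], 0
    for op, arg in instrs:
        if op == "local.get":
            stack.append("p%d" % arg)
        elif op.endswith(".const"):
            stack.append(repr(arg))
        elif op in BINOPS:
            rhs = stack.pop()            # top of stack is the right operand
            lhs = stack.pop()
            stack.append("(%s %s %s)" % (lhs, BINOPS[op], rhs))
        elif op == "return":
            lines.append("return %s;" % stack.pop())
        else:
            # Unmodeled opcode: emit a comment and push an opaque temporary.
            lines.append("/* %s */" % op)
            stack.append("t%d" % temps)
            temps += 1
    return lines

body = fold([("local.get", 0), ("local.get", 1), ("f32.add", None),
             ("f32.const", 0.5), ("f32.mul", None), ("return", None)])
print("\n".join(body))   # return ((p0 + p1) * 0.5);
```

The instruction sequence fed in here is the stack form of the f_avg body shown above, which is why the result matches its return statement.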

Structured control flow

The lifter now reconstructs structured control flow, not just straight-line code. WebAssembly’s block, loop, and if constructs are already well-nested, so the lifter builds a control-flow tree and emits proper C constructs from it:

if/else. Result-typed ifs assign each branch into a fresh temp so the value is available after the closing brace.
while loops with break/continue. The common block + loop idiom renders as a clean while (1) { ... if (cond) break; ... } with no goto. A labeled goto is the fallback for control flow that does not fit the innermost-loop/break pattern, so output is always correct.
switch for br_table. Multi-way branches become a switch statement with a case per target and a default.

An unmodeled opcode still degrades to a /* mnemonic */ comment and pushes an opaque temp. The lifter never crashes; every function produces valid pseudo-C. Two samples ship with WARDEN and illustrate the first two constructs, if/else and while:

// abs_demo: if/else with a result
i32 abs_demo(i32 p0) {
    i32 t0;
    if ((p0 < 0)) {
        t0 = (0 - p0);
    } else {
        t0 = p0;
    }
    return t0;
}

// sum_to_n: while loop with break
i32 sum_to_n(i32 p0) {
    v1 = 0;
    v2 = 0;
    while (1) {
        if ((v2 > p0)) break;
        v1 = (v1 + v2);
        v2 = (v2 + 1);
    }
    return v1;
}

Load the samples with samples.control_flow() to lift them interactively.
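
Because block, loop, and if are always closed by a matching end, recovering the nesting needs no graph analysis: a single recursive pass over the instruction list yields the tree. The sketch below shows only that tree-building step, assuming a flat list of mnemonic strings as input; parse_block and render are hypothetical names. It treats else and branch instructions as plain leaves, so it is far simpler than what the lifter has to do to emit while, break, and switch.

```python
def parse_block(ops, pos=0):
    """Parse ops[pos:] into a nested list, stopping at the matching 'end'."""
    body = []
    while pos < len(ops):
        op = ops[pos]
        pos += 1
        if op in ("block", "loop", "if"):
            children, pos = parse_block(ops, pos)   # recurse into the construct
            body.append((op, children))
        elif op == "end":
            return body, pos
        else:
            body.append(op)
    return body, pos

def render(tree, depth=0):
    """Pretty-print the tree with one indentation level per nesting level."""
    pad = "    " * depth
    out = []
    for node in tree:
        if isinstance(node, tuple):
            kind, children = node
            out.append("%s%s {" % (pad, kind))
            out.extend(render(children, depth + 1))
            out.append(pad + "}")
        else:
            out.append(pad + node)
    return out

tree, _ = parse_block(["block", "loop", "br_if", "end", "end", "return"])
print("\n".join(render(tree)))
```

The block wrapped around a loop in this input is the same block + loop idiom described above; a lifter would recognize that shape in the tree and print it as while (1) with a break.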

How `--format pseudo` uses it

When you run warden export --format pseudo and the original .wasm is available, the exporter now calls the lifter instead of dumping raw instruction mnemonics. Each function block contains a proper pseudo-C body:

// WARDEN pseudocode: v1

// ---- parse_token  (i32, i32) -> (i32) ----
// Parses a token from the input buffer.
// provenance=oracle confidence=0.91
i32 parse_token(i32 p0, i32 p1) {
    return ((p0 + p1) * 7);
}  // stable=c4a8f21d903b

// ---- verify_license  () -> () ----
// Checks the license key against a hardcoded salt and calls abort if invalid.
// provenance=human confidence=1.00
void verify_license() {
    /* unreachable */
}  // stable=a3f9e1c042d8

When the .wasm is not on disk, functions fall back to the previous mnemonic-count note; the switch is automatic.

Targeting a single function

Use warden lift to decompile one function by name without running a full export:

warden lift v1 parse_token                # first match by name
warden lift v1 parse_token --index 7      # disambiguate by function index
warden lift v1 parse_token --out out.c    # write to file instead of stdout

The --index N flag is useful when the KB state is ambiguous and more than one function carries the same recovered name.

Python API

from warden.lift import lift_function, lift_module

pseudo_one = lift_function(module, func)   # -> str  (one function)
pseudo_all = lift_module(module)           # -> str  (all defined functions, index order)

lift_module skips imports (they have no body) and concatenates in function-index order so the result diffs cleanly across builds.

The lifter covers the integer, floating-point, and control-flow subset that Emscripten-compiled C/C++ produces in practice. SIMD (v128) opcodes are not modeled and degrade to a single /* v128 op 0xNN */ comment with an opaque temp, so a SIMD-using function still lifts without a crash or a stack desync. The output is always valid pseudo-C, never a partial file.

Round-trip bridge

Together, warden export --format csv|json and the new warden import command let annotations round-trip. Export a version’s names to a neutral file, edit them in Ghidra, IDA, or a text editor, and import the result back into the knowledge base.

Exporting to CSV or JSON

Two new format values are accepted by warden export:

warden export v1 --format csv  --out v1.csv
warden export v1 --format json --out v1.json

Each row contains: index, stable_id, name, type, provenance, confidence. The stable_id column is the key used on import. The type column carries the Wasm type signature and is round-tripped as-is. The Python API:

from warden.bridge import export_symbols

csv_text  = export_symbols(kb, version_id, fmt="csv")
json_text = export_symbols(kb, version_id, fmt="json")
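
Between export and import, the editing step can itself be scripted. The following is a minimal sketch, assuming the CSV starts with a header row naming the six columns listed above. The function rename_rows and the sample stable_id are placeholders for illustration, not part of warden.bridge. Note that the type column contains commas, so the file should be read and written with a real CSV parser rather than by splitting on commas.

```python
import csv
import io

COLUMNS = ["index", "stable_id", "name", "type", "provenance", "confidence"]

def rename_rows(csv_text, new_names):
    """Return CSV text with `name` replaced for each stable_id in new_names."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        if row["stable_id"] in new_names:
            row["name"] = new_names[row["stable_id"]]
        writer.writerow(row)   # all other columns pass through untouched
    return out.getvalue()

# Placeholder data shaped like an export; the stable_id is illustrative only.
sample = (
    "index,stable_id,name,type,provenance,confidence\n"
    '12,0c8b47f21e93da55,process_frame,"(i32, i32) -> (i32)",agent,0.61\n'
)
edited = rename_rows(sample, {"0c8b47f21e93da55": "decode_frame"})
print(edited)
```

Keying the edit on stable_id rather than index mirrors how the import resolves rows, so the edited file stays valid even if it is imported against a later build.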

Importing back

After editing the file, import it with the new warden import command:

warden import v1 v1.csv                          # CSV, provenance from file
warden import v1 v1.json --format json           # JSON
warden import v1 v1.csv --provenance human       # override provenance for all rows
warden import v1 v1.csv --provenance human --lock  # also lock every written symbol

The command prints a summary: matched, written, skipped (no name), rejected by the economy, and unmatched (row did not resolve to a known function). The Python API:

from warden.bridge import import_symbols, ImportResult

result: ImportResult = import_symbols(
    kb,
    version_id,
    text,           # CSV or JSON string
    fmt="csv",
    provenance=None,   # None = use the file's value; "human" overrides every row
    confidence=None,   # None = use the file's value; a float overrides every row
    lock=False,        # True = lock every symbol that is written
)
# result fields: matched, written, rejected_by_economy, skipped, unmatched, details

How identity and the economy interact

The bridge resolves each row to a function by stable_id first. This means a name recovered against one build lands on the same logical function in a later build, even when the function index shifts. If the stable_id is absent or unrecognized, the bridge falls back to the index column.

Writes go through the same provenance/confidence economy as every other annotation source. An import never clobbers a higher-authority annotation. For example, a human 1.00 name already in the KB is not overwritten by an oracle 0.94 row from the file. Pass --provenance human to assert human authority for all rows in the file.
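
The two rules described here, stable_id before index and no overwrite by a lower authority, can be sketched as follows. This is an illustration under stated assumptions rather than the bridge's actual logic: the names resolve and outranks are hypothetical, the authority order human > oracle > agent is inferred from the example in the text, and using confidence as the tie-breaker within one provenance is a guess.

```python
# Assumed authority order, highest first; the real economy may rank differently.
RANK = ["human", "oracle", "agent"]

def resolve(row, by_stable_id, by_index):
    """Return the function a row refers to: stable_id first, then index."""
    fn = by_stable_id.get(row.get("stable_id") or "")
    if fn is None and row.get("index") not in (None, ""):
        fn = by_index.get(int(row["index"]))
    return fn   # None means the row is unmatched

def outranks(new, old):
    """True when annotation `new` may replace `old` under the assumed order."""
    def key(a):
        return (-RANK.index(a["provenance"]), float(a["confidence"]))
    return key(new) > key(old)

existing = {"provenance": "human", "confidence": 1.00}
incoming = {"provenance": "oracle", "confidence": 0.94}
print(outranks(incoming, existing))   # False: the human name is kept
```

With this shape, passing --provenance human corresponds to rewriting every incoming row's provenance before the comparison is made.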

The existing warden export --format ghidra script is the push side of the Ghidra round-trip. The CSV/JSON formats are the neutral pull side: they work with any tool that can read a spreadsheet or JSON file, including IDA’s name-import scripts and manual annotation sessions.

Format details

Column	Description
`index`	Wasm function index in the exported version. Fallback key on import.
`stable_id`	Full stable identity hash. Primary key on import; survives rebuilds.
`name`	Recovered function name. Empty rows are skipped on import.
`type`	Wasm type signature string (e.g. `(i32, i32) -> (i32)`).
`provenance`	Annotation source (`human`, `oracle`, `agent`, etc.).
`confidence`	Confidence score in the range 0.0 to 1.0.

HTML report

warden report writes a self-contained HTML file: no server, no CDN, no build step. Everything is inlined so the file opens from any clone with a double-click, and the output is deterministic (same KB state in, byte-identical HTML out) so it diffs cleanly in git.

warden report v1                          # writes warden-report-v1.html to cwd
warden report v1 --out reports/v1.html   # explicit path
warden report v1 --db /path/to/project.db --out v1.html

What the report contains

Section	Description
Coverage summary	Named / total defined functions with a progress bar broken down by provenance (oracle, human, agent).
Confidence heatmap	Every defined function in index order. Row background hue encodes provenance; alpha encodes confidence. Solid green rows are human-verified; fading amber rows are agent guesses that need review.
Thread and memory model	Atomic sites, pthread markers, and shared-memory facts recorded by `warden analyze`. Hidden when the KB has no thread facts.
Changelog	The diff from the nearest earlier version: a chip summary (unchanged / moved / modified / new / deleted) followed by a “needs review” list of genuine app-level deltas. Hidden for the first version.

The heatmap color key:

Color	Provenance	Trust level
Emerald	`human`	Verified by hand
Blue	`oracle`	Matched against a known corpus
Cyan / teal	`export` / `import`	Free fact from the binary
Violet	`string-xref`	Inferred from a string reference
Amber	`diff-carry`	Carried across a version bump
Dark amber	`agent`	Model guess (lowest trust)
Zinc (desaturated)	(unnamed)	No symbol recovered
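
The encoding described above, hue from provenance and alpha from confidence, amounts to a small lookup plus a clamp. The sketch below illustrates the idea only: the RGB values are placeholders chosen to resemble the named colors, the report's real palette is not documented here, and row_background is a hypothetical name.

```python
# Placeholder RGB triples; the report's actual palette may differ.
HUES = {
    "human": (16, 185, 129),    # emerald-like
    "oracle": (59, 130, 246),   # blue-like
    "agent": (180, 83, 9),      # dark-amber-like
}
UNNAMED = (113, 113, 122)       # zinc-like, used when no symbol was recovered

def row_background(provenance, confidence):
    """Return a CSS rgba() string: hue from provenance, alpha from confidence."""
    r, g, b = HUES.get(provenance, UNNAMED)
    alpha = max(0.0, min(1.0, float(confidence or 0.0)))   # clamp to [0, 1]
    return "rgba(%d, %d, %d, %.2f)" % (r, g, b, alpha)

print(row_background("human", 1.0))    # rgba(16, 185, 129, 1.00)
print(row_background("agent", 0.61))   # rgba(180, 83, 9, 0.61)
```

Formatting alpha with a fixed number of decimals keeps the generated markup stable across runs, which matters for a report that is meant to be byte-identical for the same KB state.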

Python API

from warden.report import render_report, write_report

html: str = render_report(kb, version_id)              # returns HTML string
write_report(kb, version_id, "reports/v1.html")        # writes UTF-8 file

Pass module=<Module> to either function if you have the parsed .wasm on hand; it is optional and reserved for future inline disassembly views. The report is fully driven by the KB without it.

Commit the HTML report alongside kb-text snapshots. The report is byte-identical for the same KB state, so git diff --stat will tell you at a glance whether anything actually changed between runs. This is useful in CI to detect spurious annotation drift.
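
In a CI job, the byte-identity guarantee can be checked directly by hashing the freshly generated report against the committed one. A minimal sketch using only the standard library follows; the function names and file paths are illustrative.

```python
import hashlib

def sha256_of(path):
    """Hash a file in chunks so large reports do not need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def same_bytes(path_a, path_b):
    """True when both files have identical contents."""
    return sha256_of(path_a) == sha256_of(path_b)

# Example use in CI (paths are placeholders):
#   if not same_bytes("reports/v1.html", "build/v1.html"):
#       raise SystemExit("report drifted for an unchanged KB")
```

A mismatch for an unchanged KB points to annotation drift worth investigating before the new snapshot is committed.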

Comparing across versions

Because all formats are deterministic, you can snapshot them at each version and use standard diff tooling to review what changed:

warden export v1 --format kb-text --out snapshots/v1.kb.txt
warden export v2 --format kb-text --out snapshots/v2.kb.txt
diff snapshots/v1.kb.txt snapshots/v2.kb.txt

Functions with unchanged stable_id and annotations appear as unchanged lines. New functions, dropped functions, and any confidence or provenance changes are visible immediately.
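
The same comparison can be made structurally on the csv exports, keyed on stable_id so that a shifted function index alone does not count as a change. The sketch below assumes a header row with the documented column names; load and compare are hypothetical helpers, and the rows are made-up sample data.

```python
import csv
import io

def load(csv_text):
    """Index exported rows by stable_id."""
    return {row["stable_id"]: row for row in csv.DictReader(io.StringIO(csv_text))}

def compare(old_text, new_text):
    """Return (added, dropped, changed) lists of stable_ids."""
    old, new = load(old_text), load(new_text)
    added = sorted(set(new) - set(old))
    dropped = sorted(set(old) - set(new))
    changed = sorted(
        sid for sid in set(old) & set(new)
        if any(old[sid][k] != new[sid][k]
               for k in ("name", "provenance", "confidence"))
    )
    return added, dropped, changed

HEADER = "index,stable_id,name,type,provenance,confidence\n"
OLD = HEADER + "3,aaa,malloc,(i32) -> (i32),oracle,0.94\n7,bbb,verify_license,() -> (),human,1.00\n"
NEW = HEADER + '4,aaa,malloc,(i32) -> (i32),oracle,0.97\n9,ccc,process_frame,"(i32, i32) -> (i32)",agent,0.61\n'
print(compare(OLD, NEW))   # (['ccc'], ['bbb'], ['aaa'])
```

Here the function with stable_id aaa moved from index 3 to 4, but it is reported as changed only because its confidence differs.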

For a richer semantic changelog (which functions are new, removed, carried over, or only partially matched), use warden diff before exporting. The diff engine runs the same fingerprinting that export relies on, so the two views are consistent.

Reference

`warden export` flags

Flag	Default	Description
`--format, -f`	`kb-text`	Output format: `headers`, `pseudo`, `kb-text`, `ghidra`, `csv`, or `json`.
`--out, -o`	(stdout)	Write output to a file instead of printing to stdout.
`--db`	`warden.db`	Project database path (or `WARDEN_DB` env var).

`warden import` flags

warden import <label> <file> [--format csv|json] [--provenance <prov>] [--lock] [--db <path>]

Flag	Default	Description
`--format, -f`	`csv`	Input format: `csv` or `json`.
`--provenance`	(from file)	Override provenance for every row in the file (e.g. `human`).
`--lock`	false	Lock every symbol that is written.
`--db`	`warden.db`	Project database path (or `WARDEN_DB` env var).
