Roadmap

WARDEN is organized as a collaborative, iterative loop, and each capability in that loop is built to be independently useful. You get value before the whole system exists. This page maps every capability to its honest status: what ships and is tested, what is wired end to end but needs optional native tooling, and what is still open. For how these capabilities fit together in practice, see the reverse-engineering loop. The status key used throughout this page:

Status	Meaning
Implemented and tested	Code ships, exercised by the test suite, runs on `warden demo`.
Scaffolded	Module and interface exist, the seam is wired end-to-end, but the implementation is a stub or requires optional native tooling to activate.

Ingestion

Goal. A queryable model of any Emscripten module: parse the binary and its JS glue, compute stable function identities, seed obvious symbols, and store everything in a versioned knowledge base.

Deliverables. `warden init`, `warden ingest`, `warden funcs`, `warden show`, `warden coverage`, and `warden set-name` are all fully operational. The KB can round-trip a module and answer “what do we know about func[N]?” without any external tooling.

Status: implemented and tested.

Component	Source	What it does
WASM section parser	`src/warden/ingest/`	Reads type, import, function, export, code, name, element/table sections in pure Python.
JS-glue parser	`src/warden/ingest/`	Reads Emscripten’s export-index map, `dynCall` signatures, and `PROXY_TO_PTHREAD` shape from the `.js` glue file.
Knowledge base	`src/warden/kb/`	SQLite schema with `module_versions`, `functions`, `symbols`, and `diffs` tables; provenance/confidence/locked columns; `upsert_symbol` economy (human > oracle > agent).
CLI	`src/warden/cli.py`	`init`, `ingest`, `versions`, `coverage`, `funcs`, `show`, `set-name`, `verify`, `export`, and `demo`.
`warden demo`	`src/warden/samples.py`	Runs the entire loop end-to-end on generated sample modules with no network or native toolchain.
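The "human > oracle > agent" ordering behind `upsert_symbol` can be pictured as a ranked write gate. The following is a minimal sketch under stated assumptions, not WARDEN's actual code: the `Symbol` type, the `RANK` table, and the `should_overwrite` rule (including how locks and confidence ties are handled) are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed authority ranking, taken from the "human > oracle > agent" ordering.
RANK = {"agent": 0, "oracle": 1, "human": 2}


@dataclass(frozen=True)
class Symbol:
    name: str
    provenance: str
    confidence: float
    locked: bool = False


def should_overwrite(existing: Optional[Symbol], incoming: Symbol) -> bool:
    """Return True if `incoming` may replace `existing` (hypothetical rule)."""
    if existing is None:
        return True  # an empty slot can be filled by any provenance
    old, new = RANK[existing.provenance], RANK[incoming.provenance]
    if new != old:
        return new > old  # only strictly higher authority wins across ranks
    if existing.locked:
        return False  # assumption: a lock blocks writes of equal authority
    return incoming.confidence > existing.confidence
```

Under this rule an agent proposal with confidence 0.99 still cannot replace an Oracle name with confidence 0.9, which is the property the economy is meant to guarantee.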

The disassembler stays in sync with the constructs real Emscripten modules emit: exception handling (`try`, `catch`, `throw`, `rethrow`, `delegate`, `catch_all`, `try_table`), tail calls (`return_call`, `return_call_indirect`), and typed (multi-value) block types. Unknown sections (data count, tag, and custom) and unmodeled opcodes are consumed and skipped rather than crashing, so ingestion always finishes and records a per-function `disasm_error` note instead of raising an error. Covered by `tests/test_ingest.py`, `tests/test_ingest_edge.py`, `tests/test_fingerprint.py`, `tests/test_kb.py`, `tests/test_cli.py`, and `tests/test_pipeline.py`.
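The "consume and skip" behavior for unknown sections follows directly from the WebAssembly binary layout, where every section is an id byte, a LEB128 size, and a payload. The sketch below illustrates that idea only; `iter_sections` and the `MODELED` table are hypothetical names, and the real parser in `src/warden/ingest/` may be organized differently (for example, it must look inside custom sections to find the name section).

```python
from typing import Iterator, Tuple

# Section ids from the WebAssembly binary format that this sketch "models".
MODELED = {1: "type", 2: "import", 3: "function", 4: "table",
           7: "export", 9: "element", 10: "code"}


def read_u32_leb(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode an unsigned LEB128 integer; return (value, next position)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return result, pos
        shift += 7


def iter_sections(buf: bytes) -> Iterator[Tuple[str, bytes]]:
    """Yield (label, payload) per section; unknown ids are skipped by size."""
    if buf[:4] != b"\0asm" or buf[4:8] != b"\x01\x00\x00\x00":
        raise ValueError("not a WebAssembly 1.0 binary")
    pos = 8
    while pos < len(buf):
        section_id = buf[pos]
        size, pos = read_u32_leb(buf, pos + 1)
        payload = buf[pos:pos + size]
        pos += size  # the declared size lets us step over anything unknown
        yield MODELED.get(section_id, f"skipped({section_id})"), payload


module = (b"\0asm\x01\x00\x00\x00"
          b"\x01\x04\x01\x60\x00\x00"   # type section: one func type () -> ()
          b"\x0c\x01\x00")              # data count section (id 12)
labels = [label for label, _ in iter_sections(module)]
print(labels)  # ['type', 'skipped(12)']
```

Because the size prefix is authoritative, an unrecognized section never desynchronizes the reader, which is what lets ingestion finish on modules that use newer proposals.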

Stable identity and fingerprints

Goal. Key every annotation on a content identity that survives a rebuild, so names, types, and notes carry across binary updates even when the table index shifts.

Deliverable. A deterministic fingerprinter that computes a `stable_id` composite per function, and the single similarity engine that both the Oracle and the diff reuse.

Status: implemented and tested.

Component	Source	What it does
Identity fingerprinter	`src/warden/identity/fingerprint.py`	Computes exact body hash, structural skeleton, opcode-class histogram, call-target set, and type signature per function. The `stable_id` composite is what annotations are keyed to.
Similarity engine	`src/warden/identity/fingerprint.py`	Combines exact-body equality, structural-skeleton equality, fuzzy MinHash Jaccard, opcode-histogram cosine, and call-neighborhood overlap into one score.

Fingerprint determinism is tested explicitly: same bytes in, same stable_id out. This is the foundational guarantee the entire carry-over mechanism depends on. See core concepts for why one engine powers both identification and diff.
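To make the similarity components concrete, here is a small sketch that blends two of the listed signals: opcode-histogram cosine and call-target overlap, with exact-body equality as a short circuit. The function name `combined_score` and the 0.6 / 0.4 weights are assumptions chosen for illustration; the real engine also uses a structural skeleton and MinHash, which are omitted here.

```python
import math
from collections import Counter
from typing import Iterable, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Set overlap in [0, 1]; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two opcode histograms."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def combined_score(body_a: bytes, ops_a: Iterable[str], calls_a: Set[str],
                   body_b: bytes, ops_b: Iterable[str], calls_b: Set[str]) -> float:
    if body_a == body_b:
        return 1.0  # exact-body equality dominates every fuzzy signal
    # Weights are illustrative assumptions, not WARDEN's tuned values.
    return (0.6 * cosine(Counter(ops_a), Counter(ops_b))
            + 0.4 * jaccard(calls_a, calls_b))


score = combined_score(
    b"\x01", ["i32.add", "i32.add", "call"], {"malloc", "free"},
    b"\x02", ["i32.add", "call", "call"], {"malloc"},
)
```

Every input is a pure function of the function's bytes, so the score is deterministic in the same way the fingerprint is.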

The Emscripten Oracle

Goal. Auto-identify 40 to 80 percent of any Emscripten module as known musl / libc++ / dlmalloc / Emscripten-runtime code, instantly, with real upstream names, so agent and human effort concentrates on the application-specific remainder.

Deliverables. `warden oracle build` and `warden oracle identify`. A corpus of labeled `.wasm` artifacts (an emsdk × build-flag matrix) backed by the signature store; version inference from the distribution of Oracle matches.

Status: Oracle engine, MinHash-LSH index, and the manifest-driven corpus farm implemented and tested. Running the farm at scale still needs emsdk.

Component	Source	What it does
Signature extraction	`src/warden/oracle/corpus.py`	`extract_signatures` fingerprints every named defined function in a labeled module and classifies it by library (musl, libc++, dlmalloc, emscripten, wasi-libc, musl-pthread).
Signature store	`src/warden/oracle/signatures.py`	JSON-serialisable store; `load` / `save` / `extend` / `libraries()`.
Identification pass	`src/warden/oracle/match.py`	Fingerprints every defined function in the target, scores against each corpus signature using `similarity()`, and writes matches above threshold as `oracle`-provenance symbols into the KB.
MinHash-LSH index	`src/warden/oracle/index.py`	`SignatureIndex.build(store, bands=8)` builds a sublinear candidate index; `index.candidates(fp)` returns approximate neighbors; `identify_indexed(kb, version_id, store, threshold=0.82, write=True)` is a drop-in replacement for the linear `identify()` pass. CLI: `warden oracle identify <label> --store s --indexed`.
Version inference	`src/warden/oracle/`	`infer_version` reads the distribution of `emscripten_version` fields across matches and returns the plurality winner with a calibrated confidence score.
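The banding idea behind a MinHash-LSH index can be shown in a few lines. This is a generic sketch of the technique, not the contents of `src/warden/oracle/index.py`: the `BandIndex` class, the 32-value signature length, and the use of salted BLAKE2b as the hash family are all assumptions. It expects a non-empty token list per function.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Set, Tuple

NUM_HASHES = 32  # assumed signature length (divisible by the band count)


def minhash(tokens: Iterable[str], num_hashes: int = NUM_HASHES) -> Tuple[int, ...]:
    """One minimum per salted hash function; deterministic across runs."""
    toks = [t.encode() for t in tokens]
    sig = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(8, "little")
        sig.append(min(
            int.from_bytes(hashlib.blake2b(t, digest_size=8, salt=salt).digest(), "big")
            for t in toks
        ))
    return tuple(sig)


class BandIndex:
    """Hypothetical LSH index: signatures sharing any band land in one bucket."""

    def __init__(self, bands: int = 8) -> None:
        self.bands = bands
        self.buckets: Dict[Tuple[int, Tuple[int, ...]], Set[str]] = defaultdict(set)

    def _keys(self, sig: Tuple[int, ...]) -> Iterator[Tuple[int, Tuple[int, ...]]]:
        rows = len(sig) // self.bands
        for band in range(self.bands):
            yield band, sig[band * rows:(band + 1) * rows]

    def add(self, name: str, sig: Tuple[int, ...]) -> None:
        for key in self._keys(sig):
            self.buckets[key].add(name)

    def candidates(self, sig: Tuple[int, ...]) -> Set[str]:
        found: Set[str] = set()
        for key in self._keys(sig):
            found |= self.buckets.get(key, set())
        return found


index = BandIndex(bands=8)
tokens = ["local.get", "i32.load", "i32.store", "br_if"]
index.add("memcpy", minhash(tokens))
hits = index.candidates(minhash(tokens))
```

Only the returned candidates need a full similarity score, which is where the sublinear lookup comes from. More bands with fewer rows each raise recall at the cost of more candidates to score.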

The corpus farm works end to end. `scripts/corpus/build_matrix.sh` builds the reference programs across an Emscripten-version × opt-level matrix (it compiles `.c` with `emcc` and `.cpp` with `em++`) and writes both the per-build signatures and a `manifest.json`. `harvest_directory(root)` (or `warden oracle harvest <dir>`) reads that manifest and builds one `oracle.json` in a single call. `classify_library` now spans musl, musl-pthread, libc++, libc++abi, compiler-rt, dlmalloc, emscripten, and wasi-libc; `SignatureStore.extend` deduplicates so a re-run stays idempotent; and `SignatureIndex.build(bands=None)` auto-picks the band count from the store size.

What still needs tooling: actually running the farm requires emsdk (Docker), which produces the multi-thousand-signature corpus that gives the Oracle the 40 to 80 percent identification rate. The seed store shipped with the repo (src/warden/oracle/seed_signatures.json) is a small hand-crafted fixture used by tests. The harvesting, classification, dedup, and indexing around that build all run today with no native toolchain.

The decompiler and lifter

Goal. Render readable pseudo-C for every function with no native tooling, and bridge recovered names to and from existing RE tools so analysts keep their current workflow.

Deliverables. `warden lift` decompiles to pseudo-C. `warden export --format ghidra` emits a runnable Python rename script; `warden export --format headers` emits a C header; `warden export --format pseudo` emits readable per-function listings; `warden export --format kb-text` emits a git-diffable plain-text snapshot. `warden export --format csv|json` exports annotations to a neutral file; `warden import` reads them back.

Status: structured lifter, neutral round-trip bridge (CSV/JSON), and the Ghidra rename script all implemented and tested. The four text export formats (`headers`, `pseudo`, `kb-text`, `ghidra`) in `src/warden/export/text.py` are covered by `tests/test_cli.py`. The `ghidra` format emits a valid Python snippet that calls `getFunctionByWasmIndex` (from the nneonneo/ghidra-wasm-plugin) and `fn.setName(name, SourceType.USER_DEFINED)` for every named function in the KB.

Built-in lifter (structured control flow). `warden.lift` is a pure-Python stack-machine lifter that renders readable pseudo-C without any native tooling. `lift_function(module, func)` returns a string; `lift_module(module)` lifts every function. `warden export --format pseudo` emits real pseudo-C, and `warden lift <label> [--index N] [--out FILE]` exposes the lifter directly. The lifter reconstructs structured control flow, not just straight-line arithmetic:

Real if/else: result-typed ifs assign each branch into a temp variable.
while loops with break/continue: the common block-plus-loop idiom renders as a clean while (1) { ... if (cond) break; ... } with no goto. A labeled goto is the fallback for control flow that does not fit the innermost-loop break pattern, so output stays correct even for unusual shapes.
switch for br_table.
Expression folding for infix arithmetic, memory loads, and calls.

The lifter degrades gracefully: an unmodeled opcode becomes a `/* mnemonic */` comment, never a crash. Examples from `samples.control_flow()`: `abs_demo` lifts to an if/else that returns a value; `sum_to_n` lifts to a while loop with a break. A straight-line function such as `parse_token` lifts to:

i32 parse_token(i32 p0, i32 p1) { return ((p0 + p1) * 7); }
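Expression folding on a stack machine is what turns a flat instruction list into the nested expression above. The sketch below shows only that step for three arithmetic opcodes; `fold_expression` and the `(opcode, argument)` tuple encoding are hypothetical, and unlike the real lifter it raises on anything it does not model instead of emitting a comment.

```python
from typing import List, Tuple

# Infix spelling for the few binary opcodes this sketch understands.
BINOPS = {"i32.add": "+", "i32.sub": "-", "i32.mul": "*"}


def fold_expression(instrs: List[Tuple[str, int]]) -> str:
    """Fold straight-line stack code into one parenthesized expression."""
    stack: List[str] = []
    for op, arg in instrs:
        if op == "local.get":
            stack.append(f"p{arg}")          # parameters render as p0, p1, ...
        elif op == "i32.const":
            stack.append(str(arg))
        elif op in BINOPS:
            rhs = stack.pop()                # top of stack is the right operand
            lhs = stack.pop()
            stack.append(f"({lhs} {BINOPS[op]} {rhs})")
        else:
            raise ValueError(f"opcode not modeled in this sketch: {op}")
    return stack.pop()


body = [("local.get", 0), ("local.get", 1), ("i32.add", 0),
        ("i32.const", 7), ("i32.mul", 0)]
lifted = f"return {fold_expression(body)};"
print(lifted)  # return ((p0 + p1) * 7);
```

Keeping operands as strings on a symbolic stack means no temporaries are needed for pure expressions; temporaries only become necessary once control flow or side effects intervene.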

Round-trip symbol bridge (warden.bridge). Export a version’s annotations to a neutral CSV or JSON file, edit them in Ghidra, IDA, or by hand, then import them back. The bridge keys matches on the stable function identity first (so a name recovered against one build lands on the same logical function in another build, even when the table index shifts), then falls back to the function index. All imports go through the provenance/confidence economy, so an import never clobbers higher-authority work. Key API in src/warden/bridge/:

export_symbols(kb, version_id, fmt="csv") -> str (fmt: "csv" or "json").
import_symbols(kb, version_id, text, fmt="csv", *, provenance=None, confidence=None, lock=False) -> ImportResult (fields: matched, written, rejected_by_economy, skipped, unmatched, details).

CLI additions:

warden export --format csv|json          # export annotations to a neutral file
warden import <label> <file> [--format csv|json] [--provenance human] [--lock]

The --provenance human flag overrides the file’s stored provenance for the import run. --lock marks imported symbols so no lower-authority pass can overwrite them. The --format ghidra command is the push side of the Ghidra workflow; these neutral formats complete the round trip.
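The "stable identity first, index second" matching rule is the part of the bridge most worth seeing in code. This is a simplified sketch, not the `import_symbols` implementation: the column names `stable_id`, `index`, and `name`, the dict-based function records, and the `import_names` helper are assumptions, and the provenance economy is left out entirely.

```python
import csv
import io
from typing import Dict, List


def import_names(functions: List[Dict], text: str) -> Dict[str, int]:
    """Apply names from CSV text, matching on stable_id before index."""
    by_id = {f["stable_id"]: f for f in functions}
    by_index = {f["index"]: f for f in functions}
    result = {"matched": 0, "unmatched": 0}
    for row in csv.DictReader(io.StringIO(text)):
        # Identity match survives index shifts; index is only the fallback.
        target = by_id.get(row["stable_id"]) or by_index.get(int(row["index"]))
        if target is None:
            result["unmatched"] += 1
            continue
        target["name"] = row["name"]
        result["matched"] += 1
    return result


# New build: the function with stable_id "aa" moved from index 3 to index 5.
functions = [
    {"stable_id": "aa", "index": 5, "name": None},
    {"stable_id": "bb", "index": 6, "name": None},
]
exported = (
    "stable_id,index,name\n"
    "aa,3,parse_token\n"    # matched by identity despite the index shift
    "zz,6,internal_crc\n"   # unknown identity, falls back to index 6
    "qq,99,ghost\n"         # neither identity nor index exists
)
result = import_names(functions, exported)
print(result)  # {'matched': 2, 'unmatched': 1}
```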

A deeper Ghidra integration is not yet automated: launching Ghidra headlessly, running the rename script via `analyzeHeadless`, and reading decompiled p-code back through pyghidra all remain manual steps. Activating that flow requires Ghidra and the nneonneo/ghidra-wasm-plugin installed locally. The generated rename script and the CSV/JSON import cover the common case without those dependencies.

Agents and the deep engine

Goal. A propose, verify, write-back loop that populates the KB so human effort is spent only on what the agents cannot resolve confidently, plus a per-function deep engine that recovers a full understanding, not just a name.

Deliverables. `warden agent <label>` running a multi-backend crew; `warden deep <label>` running one agent per function bottom-up; `warden mcp` serving the KB as an MCP tool surface so any capable model can drive it from outside.

Status: agent loop, deep per-function engine, MCP server, specialized analyzers as autonomous proposers, and multi-round bottom-up naming all implemented and tested. Per-role LLM specialists are still open.

Implemented:

Offline heuristic backend (src/warden/agents/backends.py): deterministic, zero-dependency; uses string xrefs and call-neighborhood context to produce proposals. Works with no API key.
OpenRouter / Kimi backend (src/warden/agents/backends.py): the cheap default. Naming from grounded facts is bulk work, not frontier reasoning, so an inexpensive model like Kimi K2.6 does it. Model moonshotai/kimi-k2.6 by default; auto-selected when OPENROUTER_API_KEY is set and openrouter is installed (pip install -e '.[agents]'), and tried first.
OpenAI backend (src/warden/agents/backends.py): structured JSON output via the OpenAI Responses API, model gpt-5.3-codex by default. Auto-selected when OPENAI_API_KEY is set and openai is installed, after OpenRouter. codex and oai are provider aliases.
Anthropic backend (src/warden/agents/backends.py): structured JSON output via the Anthropic Messages API, model claude-opus-4-8. Auto-selected when ANTHROPIC_API_KEY is set and anthropic is installed, if OpenRouter and OpenAI are not available.
Crew loop (src/warden/agents/crew.py): gather_facts seeds each call with hard evidence (type signature, call targets, string xrefs, opcodes) to constrain hallucination; verify_proposal is a cheap syntactic gate; run_agent_pass iterates bottom-up (fewest call targets first), skips already-confident symbols, gates through the verifier and KB economy.
Deep per-function engine (src/warden/agents/deep.py): warden deep runs one agent per function, walked bottom-up over the call graph. Each agent gets the function’s facts, its decompiled C, and the recovered understanding of its callees (read from the KB, not raw bytes), and returns a name, an understanding, a variable rename map, and cleaned C. Leaf functions use the cheap leaf --backend; high-fan-in parents escalate to a stronger --parent-model. Identical functions are deduped by stable identity. Progress streams to agent_events and, with --watch, to the terminal.
MCP server (src/warden/mcp/server.py): FastMCP server exposing project reads, function facts, agent backend discovery, server-side agent runs, and economy-gated symbol proposals. Agent writes are economy-gated at the KB layer, so they cannot overwrite human or higher-confidence Oracle annotations. Activate with pip install -e '.[mcp]' then warden mcp.
Concurrency analyzer (src/warden/analysis/concurrency.py): analyze_concurrency(module, kb, version_id) returns a ConcurrencyReport with .shared_memory, .atomic_sites, .pthread_markers, and .facts. Populates the thread_model KB table via kb.add_thread_fact. Deterministic; zero external dependencies.
Struct analyzer (src/warden/analysis/structs.py): analyze_structs(...) returns a list of StructLayout values (each carrying .name, .fields as StructField(offset, size, type, name), and .source_function). Populates the structs KB table via kb.upsert_struct. CLI: warden analyze <label> runs both analyzers and persists all facts.

The concurrency and struct analyzers run inside the call-graph crew as autonomous proposers: each emits conservative, economy-gated naming proposals (low confidence, so they only fill empty slots) on top of the notes they route to the namer. run_agent_pass(..., rounds=N) walks the call graph bottom-up for up to N rounds and stops early at a fixpoint, so a caller named in one round informs its callers in the next. The MCP surface includes search_symbols, get_diff, export_kb_text, and analyze_version, all economy-gated.
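The multi-round, stop-at-fixpoint behavior described for `run_agent_pass(..., rounds=N)` can be sketched without any LLM. Everything here is illustrative: `run_rounds`, the `propose` callback, and the rule that a proposer declines until all callees are named are assumptions, not the crew's actual logic.

```python
from typing import Callable, Dict, List, Optional

Proposer = Callable[[int, List[Optional[str]]], Optional[str]]


def run_rounds(calls: Dict[int, List[int]], names: Dict[int, str],
               propose: Proposer, rounds: int = 3) -> int:
    """Name functions bottom-up; return how many rounds were actually run."""
    order = sorted(calls, key=lambda f: len(calls[f]))  # fewest call targets first
    for done in range(1, rounds + 1):
        changed = False
        for func in order:
            if func in names:
                continue  # already named: leave it alone
            guess = propose(func, [names.get(c) for c in calls[func]])
            if guess is not None:
                names[func] = guess
                changed = True
        if not changed:
            return done  # fixpoint: another round could not add anything
    return rounds


def propose(func: int, callee_names: List[Optional[str]]) -> Optional[str]:
    """Toy proposer: only commits once every callee already has a name."""
    if any(name is None for name in callee_names):
        return None
    return f"wraps_{callee_names[0]}" if callee_names else f"fn_{func}"


calls = {0: [], 1: [0, 3], 2: [1], 3: []}
names: Dict[int, str] = {}
rounds_used = run_rounds(calls, names, propose, rounds=5)
```

Function 2 has fewer call targets than its callee 1, so it is visited first and must wait for the second round; the third round changes nothing and ends the pass early.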

What is still open: the design describes six specialized agents (Oracle, Concurrency, Type/Struct, Naming/Summarization, Diff, and Verifier) as distinct LLM-backed roles. The current crew runs the concurrency and struct analyzers as deterministic proposers plus one naming backend, not six independent LLM agents. Routing each role to its own specialist model is the next step.

Diff and carry-over

Goal. Turn reverse engineering from Sisyphean to incremental. When a new `.wasm` ships, classify every function as unchanged / moved / modified / new / deleted, carry all annotations forward automatically for unchanged and moved, apply a confidence penalty for fuzzy matches, and emit a semantic changelog that separates genuine application deltas from runtime churn caused by an Emscripten version bump.

Deliverable. `warden diff <from> <to>` carries annotations forward and prints a human-readable changelog; the diff report is stored in the KB for time-travel queries.

Status: fully implemented and tested. `src/warden/diff/engine.py` runs a three-pass matching algorithm:

Exact-body hash match

Functions with the same exact_hash are unchanged if the table index stayed, or moved if it shifted.

Stable-identity match

Functions with the same stable_id but a different body are treated as unchanged/moved too, because the stable identity intentionally tolerates relocations the exact hash would miss.

Greedy fuzzy match

Among the remainder, functions are paired by similarity().overall. Score >= 0.6 is classified modified; the rest become new or deleted.
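The three passes above can be sketched as one function over two lists of function records. This is an illustration under assumptions, not `src/warden/diff/engine.py`: the dict keys, the `classify` name, and the toy `score` (fraction of matching opcode positions, standing in for `similarity().overall`) are all hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Func = Dict[str, object]
Row = Tuple[str, Optional[int], Optional[int]]  # (status, old index, new index)


def classify(old: List[Func], new: List[Func],
             score: Callable[[Func, Func], float], threshold: float = 0.6) -> List[Row]:
    out: List[Row] = []
    old_left, new_left = list(old), list(new)
    for key in ("exact_hash", "stable_id"):          # passes 1 and 2
        lookup = {f[key]: f for f in old_left}
        for f in list(new_left):
            match = lookup.pop(f[key], None)
            if match is None:
                continue
            status = "unchanged" if match["index"] == f["index"] else "moved"
            out.append((status, match["index"], f["index"]))
            old_left.remove(match)
            new_left.remove(f)
    # Pass 3: greedy fuzzy pairing, best score first.
    pairs = sorted(((score(o, n), i, j) for i, o in enumerate(old_left)
                    for j, n in enumerate(new_left)), reverse=True)
    used_old, used_new = set(), set()
    for s, i, j in pairs:
        if s < threshold:
            break
        if i in used_old or j in used_new:
            continue
        used_old.add(i)
        used_new.add(j)
        out.append(("modified", old_left[i]["index"], new_left[j]["index"]))
    out += [("deleted", o["index"], None) for i, o in enumerate(old_left) if i not in used_old]
    out += [("new", None, n["index"]) for j, n in enumerate(new_left) if j not in used_new]
    return out


def score(a: Func, b: Func) -> float:
    """Toy similarity: share of positions where the opcode strings agree."""
    x, y = str(a["ops"]), str(b["ops"])
    return sum(p == q for p, q in zip(x, y)) / max(len(x), len(y))


old = [{"index": 0, "exact_hash": "h0", "stable_id": "s0", "ops": "aabb"},
       {"index": 1, "exact_hash": "h1", "stable_id": "s1", "ops": "ccdd"},
       {"index": 2, "exact_hash": "h2", "stable_id": "s2", "ops": "eeff"},
       {"index": 3, "exact_hash": "h3", "stable_id": "s3", "ops": "zzzz"}]
new = [{"index": 0, "exact_hash": "h0", "stable_id": "s0", "ops": "aabb"},
       {"index": 2, "exact_hash": "h1", "stable_id": "s1", "ops": "ccdd"},
       {"index": 1, "exact_hash": "h2x", "stable_id": "s2x", "ops": "eefg"},
       {"index": 4, "exact_hash": "h9", "stable_id": "s9", "ops": "qqqq"}]
report = classify(old, new, score)
```

In this example the report contains one row of each kind: function 0 is unchanged, 1 moved to index 2, 2 was modified (score 0.75), 3 was deleted, and 4 is new.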

Annotation carry-over copies oracle / agent / human symbols to the new `stable_id` with a 0.7 confidence multiplier for fuzzy matches; `diff-carry` provenance is recorded. `render_changelog` separates `runtime_churn` from `app_modified` using the `_RUNTIME_PREFIXES` table. Covered by `tests/test_pipeline.py` and `tests/test_cli.py`.

Time-travel queries answer history questions over the same stored data. `when_first_seen`, `evolution_of` (with a `body_changed` flag per appearance), `symbol_history`, and `find_by_name` on `KnowledgeBase` let you ask when a function first appeared, when its body actually changed, and who named it. The `warden history <name-or-id>` command surfaces all three. See diff and carry-over for the API.

Verification

Goal. Make “understood” provable. Lift target functions via wasm2c/w2c2 to C, recompile the agent reconstruction the same way, differentially execute both over a fuzzer-generated corpus, and require I/O and memory match.

Deliverable. `warden verify <wasm>` reports determinism and differential-readiness. The verifier gate in the agent loop activates the behavioral check when the required tooling is present.

Status: determinism verification, an extended mini-interpreter, a deterministic input corpus, an interpreter-based behavioral check across versions, and the gated wasm2c orchestration all implemented and tested. The wasm2c path activates only when a C toolchain is present.

Implemented in `src/warden/verify/harness.py`:

verify_determinism re-ingests the same bytes twice and confirms every function’s stable_id is bit-identical across runs. This is the foundational guarantee that the entire carry-over mechanism depends on.
tooling_status probes PATH for wasm2c, w2c2, a C compiler, and wasm-validate; reports can_differential truthfully.
differential_plan returns the concrete shell steps for the wasm2c lift, recompile, differential execution pipeline, and whether the environment can run them. warden verify <wasm> surfaces this output.
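Probing `PATH` for optional tools is straightforward with the standard library. The sketch below shows the general shape of such a check; `probe_tooling` and the list of compiler names it tries are assumptions, not the actual `tooling_status` implementation.

```python
import shutil
from typing import Dict


def probe_tooling() -> Dict[str, bool]:
    """Report which optional native tools are discoverable on PATH."""
    found = {tool: shutil.which(tool) is not None
             for tool in ("wasm2c", "w2c2", "wasm-validate")}
    # Assumed compiler candidates; a real probe might also honor $CC.
    found["cc"] = any(shutil.which(c) is not None for c in ("cc", "gcc", "clang"))
    # The differential path needs one lifter plus a C compiler.
    found["can_differential"] = (found["wasm2c"] or found["w2c2"]) and found["cc"]
    return found


status = probe_tooling()
for name, present in status.items():
    print(f"{name:>16}: {'yes' if present else 'no'}")
```

Deriving `can_differential` from the individual probes, rather than storing it separately, keeps the report truthful on machines where only part of the toolchain is installed.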

Mini interpreter. warden.interp is a zero-dependency interpreter for the integer subset of WebAssembly that makes behavioral equivalence runnable without any native toolchain.

execute_function(module, func, args, *, host=None, memory=None, fuel=100000) executes a single function and returns a list of integer results. Raises UnsupportedExecution for instructions outside the integer subset.
differential_execute(mod_a, fn_a, mod_b, fn_b, inputs) runs both functions over a list of argument tuples and returns a per-input list of dicts reporting whether the outputs matched. For example, it proves parse_token v1 and v2 are behaviorally equivalent (v2’s bounds-check result is dropped from the return), while flagging that internal_crc differs.
CLI: warden exec <label> <index> [args...] prints the result of executing a function by index directly from the KB.
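A toy version of the interpreter and the differential check shows how behavioral equivalence can be tested with no toolchain. This sketch covers only straight-line i32 code; `run`, `differential`, `Trap`, and the tuple instruction encoding are hypothetical and much smaller than `warden.interp`.

```python
from typing import Dict, List, Sequence, Tuple

MASK = 0xFFFFFFFF  # i32 values wrap modulo 2**32
Instr = Tuple[str, int]


class Trap(Exception):
    """Raised where WebAssembly execution would trap."""


def run(body: List[Instr], args: Sequence[int], fuel: int = 1000) -> int:
    stack: List[int] = []
    for op, arg in body:
        fuel -= 1
        if fuel < 0:
            raise Trap("out of fuel")
        if op == "local.get":
            stack.append(args[arg] & MASK)
        elif op == "i32.const":
            stack.append(arg & MASK)
        elif op == "drop":
            stack.pop()
        elif op in ("i32.add", "i32.mul", "i32.div_u"):
            b, a = stack.pop(), stack.pop()
            if op == "i32.add":
                stack.append((a + b) & MASK)
            elif op == "i32.mul":
                stack.append((a * b) & MASK)
            elif b == 0:
                raise Trap("integer divide by zero")
            else:
                stack.append(a // b)
        else:
            raise NotImplementedError(op)  # outside this sketch's subset
    return stack.pop()


def differential(body_a: List[Instr], body_b: List[Instr],
                 inputs: List[Tuple[int, ...]]) -> List[Dict[str, object]]:
    """Run both bodies on every input and record whether results agree."""
    return [{"args": args, "match": run(body_a, args) == run(body_b, args)}
            for args in inputs]


v1 = [("local.get", 0), ("local.get", 1), ("i32.add", 0),
      ("i32.const", 7), ("i32.mul", 0)]
# v2 computes an extra value and drops it before doing the same arithmetic.
v2 = [("local.get", 0), ("i32.const", 100), ("i32.add", 0), ("drop", 0)] + v1
report = differential(v1, v2, [(1, 2), (0, 0), (0xFFFFFFFF, 1)])
```

The two bodies differ byte for byte, yet every input in the report matches, which is the kind of evidence a purely structural diff cannot provide.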

Two pieces make behavior checkable. verify.corpus.generate_inputs produces a deterministic input corpus (boundary values first, then a hand-written seeded recurrence, no random module). differential_versions(kb, from_id, to_id) pairs that corpus with the interpreter to prove two versions of a function behave the same, with zero native toolchain; warden equiv <from> <to> runs it. The interpreter models signed and unsigned division and remainder, shifts and rotates, bit-counting, unsigned comparisons, select, and narrow 8 and 16 bit loads and stores, and traps cleanly on division by zero. run_differential orchestrates the wasm2c/w2c2 lift, compile, and compare pipeline through an injectable runner, and returns an honest plan instead of a result when the C toolchain is absent.
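A deterministic corpus of the kind described (boundary values first, then a seeded recurrence, no `random` module) could look like the sketch below. The specific boundary list, the `make_corpus` name, and the choice of a 32-bit linear congruential generator with the Numerical Recipes constants are assumptions; the text does not say which recurrence `generate_inputs` uses.

```python
from typing import List, Tuple

# Assumed i32 boundary values: zero, one, INT_MAX, INT_MIN, and all ones.
BOUNDARIES = [0, 1, 0x7FFFFFFF, 0x80000000, 0xFFFFFFFF]


def make_corpus(arity: int, count: int, seed: int = 1) -> List[Tuple[int, ...]]:
    """Return `count` argument tuples; identical for identical parameters."""
    inputs = [tuple([b] * arity) for b in BOUNDARIES][:count]
    state = seed & 0xFFFFFFFF
    while len(inputs) < count:
        row = []
        for _ in range(arity):
            # 32-bit LCG step (multiplier and increment from Numerical Recipes).
            state = (1664525 * state + 1013904223) & 0xFFFFFFFF
            row.append(state)
        inputs.append(tuple(row))
    return inputs


corpus = make_corpus(arity=2, count=8)
```

Because the generator holds no hidden global state, two runs on different machines produce the same tuples, so a reported mismatch can always be reproduced from its position in the corpus.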

What still needs tooling: the wasm2c path itself runs only when wasm2c or w2c2 plus a C compiler are on PATH. The crew verifier gate (crew.py:verify_proposal) remains the cheap syntactic check. SeeWasm symbolic checks and Wasabi/Frida dynamic tracing are not yet wired.

The UI and the MCP surface

Goal. A “RE-as-version-control” interface: diff view, confidence heatmap, time-travel query (“when did this function first appear?”), thread/memory map, one-click export to pseudocode or headers, plus a programmatic surface any external model can drive.

Deliverable. Usable surfaces for analysts who want WARDEN’s power without running CLI commands manually, and an MCP server that exposes the same knowledge base to external models.

Status: static HTML report, a rich terminal diff view, a read-only HTTP dashboard, and the MCP server all implemented and tested. A full IDE-grade UI is still future work.

All the data a UI consumes is present in the KB today: versioned functions, per-function confidence and provenance, diff reports stored with `kb.store_diff`, and the `kb-text` export format designed to diff cleanly in git. The `warden demo` output already produces a human-readable coverage progression and changelog in the terminal.

Static HTML report. `warden.report` generates a self-contained HTML file (inline CSS, no server required) that captures an analysis session as a shareable artifact.

render_report(kb, version_id, module=None) returns the HTML as a string; write_report(kb, version_id, path, module=None) writes it to disk.
The report includes a coverage summary, a confidence heatmap of functions colored by provenance and confidence score, a thread/memory model section drawn from the concurrency analyzer’s facts, and the diff changelog.
CLI: warden report <label> [--out FILE].

Two interactive surfaces sit on top of the report. `warden.ui.terminal` renders a rich terminal diff view (`warden ui diff <from> <to>`) that colors functions by classification and provenance. `warden.ui.server` is a read-only HTTP dashboard (`warden serve`) built on the standard library: it serves a single self-contained page with a confidence heatmap, a coverage summary, and a from/to diff view, backed by a small `/api/*` JSON surface. The connection runs in `query_only` mode, so the whole HTTP surface is structurally incapable of writing. See the UI reference.

The MCP server (`src/warden/mcp/server.py`) exposes the same knowledge base over the Model Context Protocol, so Claude or any MCP-capable model can drive the loop alongside a human. Every write is economy-gated at the KB layer. See the MCP reference.

No IDE plugin has been built yet. This is the natural integration point for a richer browser frontend over the JSON or MCP surface, or a Ghidra panel that highlights confidence with color.
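The read-only guarantee of the dashboard connection can be demonstrated with SQLite itself. This sketch assumes that "query_only mode" refers to SQLite's `PRAGMA query_only`, which is a reasonable reading given the KB is SQLite-backed; the `symbols` table here is a stand-in, not the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE symbols (name TEXT)")  # stand-in table
conn.execute("INSERT INTO symbols VALUES ('parse_token')")
conn.commit()

# From here on, this connection refuses every statement that would write.
conn.execute("PRAGMA query_only = ON")

rows = conn.execute("SELECT name FROM symbols").fetchall()

try:
    conn.execute("INSERT INTO symbols VALUES ('injected')")
    write_blocked = False
except sqlite3.OperationalError:
    write_blocked = True  # SQLite rejects the write as read-only

print(rows, write_blocked)  # [('parse_token',)] True
```

Enforcing the restriction on the connection, rather than in each request handler, means a bug in a handler cannot turn a read endpoint into a write path.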

Status at a glance

Capability	Status
Ingestion	Implemented and tested
Stable identity and fingerprints	Implemented and tested
The Emscripten Oracle	Engine, MinHash-LSH index, and manifest-driven corpus farm implemented; running the farm needs emsdk
The decompiler and lifter	Structured lifter (if/else, while, switch), round-trip bridge (CSV/JSON import/export, economy-gated, stable-identity keyed), and Ghidra rename script all implemented and tested
Agents and the deep engine	Loop, deep per-function engine, MCP server, analyzers as autonomous proposers, and multi-round naming implemented; per-role LLM specialists still open
Diff and carry-over	Implemented and tested, including time-travel queries
Verification	Determinism, extended interpreter, input corpus, cross-version behavioral check, and gated wasm2c orchestration implemented; wasm2c path needs a C toolchain
The UI and the MCP surface	Static HTML report, terminal diff view, read-only HTTP dashboard, and MCP server implemented; IDE-grade UI still future work

How to contribute

The project is alpha. Every capability has a concrete gap where a focused contribution lands quickly.

Ingestion (best entry point)

Fix edge cases in the WASM section parser (src/warden/ingest/), add fingerprint properties to src/warden/identity/fingerprint.py, or improve the JS-glue parser to handle additional Emscripten output shapes. Every change can be validated against the existing test suite with pytest.

The decompiler and lifter

The lifter renders f32 and f64 arithmetic, comparisons, and constants as readable expressions, and SIMD (v128) opcodes degrade to a /* mnemonic */ comment. Open work: model the SIMD opcodes properly instead of commenting them; add a Ghidra headless launch wrapper that runs the generated rename script via analyzeHeadless; or wire pyghidra to pull p-code back into the lifter as an optional enrichment path. The Ghidra paths require Ghidra and the nneonneo/ghidra-wasm-plugin installed locally.

The Oracle corpus

Run the emsdk matrix build (scripts/corpus/build_matrix.sh) and contribute the resulting oracle.json as a versioned artifact; harvest_directory and the reference programs make this a single command once the matrix is built. Add more entries to classify_library for runtime prefixes it does not yet recognize, or tune recommended_bands for precision and recall on cross-opt-level matching.

Diff and carry-over

Improve the fuzzy similarity score in src/warden/identity/fingerprint.py (call-graph anchoring, dominator-tree comparison) to reduce false modified/new classifications on large modules. The time-travel helpers (when_first_seen, evolution_of, symbol_history, find_by_name) are in; building richer history views on top of them is open.

Agents and the deep engine

The concurrency and struct analyzers run as economy-gated proposers and run_agent_pass supports multi-round bottom-up naming. The next step is routing each crew role (Oracle, Concurrency, Type/Struct, Naming) to its own LLM specialist instead of one shared backend, and adding more MCP tools on top of search_symbols, get_diff, export_kb_text, and analyze_version.

Verification

The interpreter-based differential_versions check and the gated run_differential orchestration are in, and the interpreter executes f32 functions. Open work: wire tooling_status().can_differential into the crew loop so verify_proposal runs the behavioral check for high-confidence proposals, extend the interpreter to f64 and memory-growing programs, and add SeeWasm symbolic or Wasabi/Frida dynamic tracing.

The UI and the MCP surface

The terminal diff view (warden ui diff) and the read-only HTTP dashboard (warden serve) are in. Build a richer browser frontend on the /api/* JSON surface, or a Ghidra panel that highlights confidence with color. The static HTML report and the MCP server are also good programmatic surfaces to build on.

See the contributing guide for how to open a PR, and the limitations page for an honest accounting of current gaps. To understand the full architecture behind these capabilities, start with core concepts and the reverse-engineering loop.
