Changelog

All notable changes to WARDEN are documented here. The format follows Keep a Changelog and the project aims to adhere to Semantic Versioning.

Scaffolded seams finished, then every capability deepened

This work turns the scaffolded seams across the Oracle, diff, the agent crew, verification, and the UI into working, tested code, then deepens each one. Everything below runs with no native toolchain, except where it explicitly activates optional tools.

Improved

Parser robustness. The disassembler now handles the constructs real Emscripten modules emit without desyncing: exception-handling opcodes (try, catch, throw, rethrow, delegate, catch_all, try_table), tail calls (return_call, return_call_indirect), and typed block types. Unknown sections (data count, tag, custom) and unmodeled opcodes are consumed and skipped rather than crashing.
Float decompilation. The lifter renders f32 and f64 arithmetic, comparisons, and constants as readable expressions, and degrades SIMD (v128) opcodes to a comment.
Oracle calibration. identify now breaks ties deterministically, independent of store order, and infer_version reports calibrated confidence. SignatureStore.stats() is exposed through the warden oracle inspect command, and an evaluate_identification harness measures precision and recall over a corpus without emsdk.
Sharper similarity. The fuzzy score blends the MinHash term with call-neighborhood, opcode-histogram, type-signature, and instruction-count terms, and it exposes each sub-score. The diff now requires a shared structural skeleton before it accepts a fuzzy "modified" pairing. Stable identity and determinism are unchanged.
Per-role specialists. A SpecialistRouter routes the naming and analyzer roles to separate configured backends or models. The default is one auto-selected backend, unchanged and offline-first.
Float execution. The interpreter executes f32 functions, and differential_versions generates deterministic float inputs and compares them NaN-safely, so behavioral equivalence now covers float functions with no native toolchain.
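To illustrate what a NaN-safe comparison can look like, here is a minimal sketch. It is not WARDEN's implementation: the helper name floats_equivalent is hypothetical, and the policy shown (any two NaNs count as equal, everything else is compared bit for bit as a 64-bit float) is an assumption made for the example.

```python
import math
import struct


def float_bits(value: float) -> int:
    """Return the IEEE 754 binary64 bit pattern of a Python float."""
    return struct.unpack("<Q", struct.pack("<d", value))[0]


def floats_equivalent(a: float, b: float) -> bool:
    """Compare two results so that NaN never causes a false mismatch.

    Assumed policy for this sketch: NaN matches NaN regardless of
    payload; all other values must agree bit for bit, so +0.0 and
    -0.0 are treated as different results.
    """
    if math.isnan(a) or math.isnan(b):
        return math.isnan(a) and math.isnan(b)
    return float_bits(a) == float_bits(b)
```

A plain a == b would report a mismatch whenever both sides return NaN, which is why a differential harness usually needs an explicit rule like this one.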
Richer dashboard. The read-only dashboard gained symbol search (GET /api/search), a function detail panel with evidence, and a time-travel history panel (GET /api/history/{stable_id}), all under the same query_only read-only guarantee.

Added

Oracle corpus farm. harvest_directory(root) and warden oracle harvest <dir> build one oracle.json from a built matrix directory and its manifest.json. build_corpus_from_manifest reads a manifest of labeled modules. classify_library now spans musl, musl-pthread, libc++, libc++abi, compiler-rt, dlmalloc, emscripten, and wasi-libc. SignatureStore.extend deduplicates so re-runs stay idempotent, and SignatureIndex.build accepts bands=None to auto-pick the band count via recommended_bands. build_matrix.sh now compiles .cpp with em++ and writes a manifest, with new pthread, allocation, and C++ reference programs.
Time-travel queries. KnowledgeBase.when_first_seen, evolution_of (with a body_changed flag), symbol_history, find_by_name, and resolve_stable_id. The warden history <name-or-id> command shows when a function first appeared, how it evolved, and who named it. diff_versions gained a store=False flag for pure-read diffs.
Specialist crew and MCP tools. The concurrency and struct analyzers run inside the call-graph crew as economy-gated proposers, and run_agent_pass(..., rounds=N) walks the call graph bottom-up to a fixpoint. New MCP tools: search_symbols, get_diff, export_kb_text, and analyze_version.
Behavioral verification. verify.corpus.generate_inputs builds a deterministic input corpus. differential_versions proves two versions of a function behave the same using the interpreter, with no native toolchain; warden equiv <from> <to> runs it. The interpreter now models signed and unsigned division and remainder, shifts and rotates, bit-counting, unsigned comparisons, select, and narrow 8-bit and 16-bit loads and stores, and traps cleanly on division by zero. run_differential orchestrates the wasm2c/w2c2 lift, compile, and compare pipeline through an injectable runner, and reports an honest plan when the C toolchain is absent.
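The division rules mentioned above follow the WebAssembly semantics for 32-bit integers. The sketch below shows one way to model them in Python; the names Trap, i32_div_u, and i32_div_s are hypothetical and are not taken from warden.interp.

```python
MASK32 = 0xFFFFFFFF


class Trap(Exception):
    """Raised where a WebAssembly instruction would trap."""


def to_signed32(value: int) -> int:
    """Reinterpret the low 32 bits of value as a two's-complement integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def i32_div_u(a: int, b: int) -> int:
    """Unsigned division: both operands are read as unsigned 32-bit values."""
    a &= MASK32
    b &= MASK32
    if b == 0:
        raise Trap("integer divide by zero")
    return a // b


def i32_div_s(a: int, b: int) -> int:
    """Signed division: truncates toward zero and traps on zero or overflow."""
    sa, sb = to_signed32(a), to_signed32(b)
    if sb == 0:
        raise Trap("integer divide by zero")
    if sa == -(1 << 31) and sb == -1:
        raise Trap("integer overflow")
    quotient = abs(sa) // abs(sb)  # Python's // floors, so divide magnitudes
    if (sa < 0) != (sb < 0):
        quotient = -quotient
    return quotient & MASK32
```

The same bit pattern gives different answers under the two opcodes: 0xFFFFFFF9 divided by 2 is 0x7FFFFFFC unsigned, but as the signed value -7 it yields -3, stored as 0xFFFFFFFD.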
Interactive UX. A new warden.ui package: a rich terminal diff view (warden ui diff <from> <to>) and a read-only HTTP dashboard (warden serve) built on the standard library, with a confidence heatmap, coverage summary, and from/to diff over a small /api/* JSON surface. The connection runs in query_only mode, so the HTTP surface cannot write.
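The read-only guarantee can be illustrated with SQLite's query_only pragma, which is a standard SQLite feature. The sketch below is an assumption about how such a guard might be wired up, not the dashboard's actual code; the table name symbols and both helper names are made up for the example.

```python
import sqlite3


def make_read_only(conn: sqlite3.Connection) -> None:
    """Put the connection into query_only mode so writes are refused."""
    conn.execute("PRAGMA query_only = ON")


def write_is_rejected(conn: sqlite3.Connection) -> bool:
    """Try a write on a hypothetical 'symbols' table; report whether SQLite refused it."""
    try:
        conn.execute("INSERT INTO symbols (name) VALUES ('clobbered')")
    except sqlite3.OperationalError:
        return True
    return False


def demo() -> tuple[list[tuple[str]], bool]:
    """Build a tiny database, lock it down, then read and attempt a write."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE symbols (name TEXT)")
    conn.execute("INSERT INTO symbols (name) VALUES ('parse_token')")
    conn.commit()
    make_read_only(conn)
    rows = conn.execute("SELECT name FROM symbols").fetchall()
    return rows, write_is_rejected(conn)
```

Because the pragma is set on the connection itself, every handler that shares the connection inherits the restriction, without each endpoint having to remember to check.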

First public alpha, released 2026-06-07

The first public alpha. WARDEN runs end-to-end with zero native dependencies. warden demo walks the whole pipeline on generated sample modules: ingest, Oracle identification, agent crew, a new version, and diff with carry-over. Every capability across ingestion, the Oracle, diff, the agent crew, verification, and the UI has a working, tested implementation.

Added

Ingestion. Pure-Python WebAssembly binary parser covering LEB128, all standard sections, and full opcode disassembly including sign-ext, bulk-memory, reference types, threads/atomics, and SIMD immediates. Includes the name custom-section parser and an Emscripten JS-glue parser (version, dynCall signatures, pthread/PROXY_TO_PTHREAD markers).
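For readers unfamiliar with LEB128, the variable-length integer encoding used throughout the WebAssembly binary format, the following is a minimal decoding sketch. The function names are illustrative and are not WARDEN's parser API. A truncated input raises IndexError here; a real parser would likely report a clearer error.

```python
def decode_uleb128(data: bytes, offset: int = 0) -> tuple[int, int]:
    """Decode an unsigned LEB128 integer; return (value, next_offset)."""
    result = 0
    shift = 0
    while True:
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift  # low 7 bits carry payload
        shift += 7
        if byte & 0x80 == 0:  # high bit clear marks the last byte
            return result, offset


def decode_sleb128(data: bytes, offset: int = 0) -> tuple[int, int]:
    """Decode a signed LEB128 integer; return (value, next_offset)."""
    result = 0
    shift = 0
    while True:
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if byte & 0x80 == 0:
            if byte & 0x40:  # sign bit of the final byte: sign-extend
                result -= 1 << shift
            return result, offset
```

For example, the three bytes E5 8E 26 decode to the unsigned value 624485, and C0 BB 78 decode to the signed value -123456.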
Knowledge base. SQLite-backed, versioned symbol store keyed to stable function identities, with provenance, confidence, evidence, struct layouts, a thread/memory model, an audit log, and the provenance/confidence economy enforced at the write layer.
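The idea of an economy enforced at the write layer can be sketched as a single gate that every write passes through. Everything in the example below is hypothetical: the provenance names, their ranking, the tie-break on confidence, and the lock rule are assumptions chosen to show the shape of such a gate, not WARDEN's actual policy.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical authority ranking; the real provenance levels may differ.
AUTHORITY = {"heuristic": 0, "llm": 1, "oracle": 2, "human": 3}


@dataclass
class Annotation:
    name: str
    provenance: str
    confidence: float
    locked: bool = False


def accept_write(current: Optional[Annotation], proposed: Annotation) -> bool:
    """Decide whether a proposed annotation may replace the current one.

    Assumed rules for this sketch: empty slots accept anything, locked
    entries accept nothing, higher authority wins, and equal authority
    is settled by strictly higher confidence.
    """
    if current is None:
        return True
    if current.locked:
        return False
    current_rank = AUTHORITY[current.provenance]
    proposed_rank = AUTHORITY[proposed.provenance]
    if proposed_rank != current_rank:
        return proposed_rank > current_rank
    return proposed.confidence > current.confidence
```

Placing the check at the write layer means importers, agents, and manual edits all obey the same rule, which is what lets lower-authority sources run freely without risking higher-authority names.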
Stable identity and fingerprinting. Structural skeleton hash, opcode-class histogram, call-neighborhood, surviving type signature, and a deterministic MinHash fuzzy signature. One composite similarity engine reused by both the Oracle and the diff.
Emscripten Oracle. Signature store, corpus builder from labeled modules, identification pass, and Emscripten-version inference. The warden.oracle.index module adds SignatureIndex.build(store, *, bands=8), index.candidates(fp), and identify_indexed(kb, version_id, store, *, threshold=0.82, write=True), a band-based MinHash-LSH structure for sublinear candidate lookup that matches the linear identify() at the same threshold. CLI: pass --indexed to warden oracle identify. Containerized emsdk matrix scaffold under scripts/corpus/.
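Band-based MinHash-LSH can be sketched in a few lines. The code below illustrates the general technique only, under stated assumptions: the function names, the BLAKE2b hash, and the 32-value signature are choices made for this example, and the real SignatureIndex may differ in all of them. Token sets are assumed to be non-empty.

```python
import hashlib
from collections import defaultdict


def minhash(tokens: set[str], num_hashes: int = 32) -> list[int]:
    """Deterministic MinHash: for each seed, keep the smallest token hash."""
    signature = []
    for seed in range(num_hashes):
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(f"{seed}:{t}".encode(), digest_size=8).digest(),
                "big",
            )
            for t in tokens
        ))
    return signature


def band_keys(signature: list[int], bands: int) -> list[tuple]:
    """Split a signature into equal bands; each band becomes one bucket key."""
    rows = len(signature) // bands
    return [(b, tuple(signature[b * rows:(b + 1) * rows])) for b in range(bands)]


def build_band_index(signatures: dict[str, list[int]], bands: int = 8) -> dict:
    """Map every band key to the set of names whose signature contains it."""
    index = defaultdict(set)
    for name, signature in signatures.items():
        for key in band_keys(signature, bands):
            index[key].add(name)
    return index


def candidates(index: dict, signature: list[int], bands: int = 8) -> set[str]:
    """Return names sharing at least one full band with the query signature."""
    found = set()
    for key in band_keys(signature, bands):
        found |= index.get(key, set())
    return found
```

Only entries that collide with the query in at least one band are scored in full, which is where the sublinear lookup comes from. More bands with fewer rows each admit more candidates; fewer, wider bands admit fewer.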
Cross-version diff and carry-over. Match/classify pipeline (unchanged / moved / modified / new / deleted), automatic annotation carry-over (verbatim for shared identities, penalized for fuzzy matches), and a semantic changelog separating app changes from runtime churn.
Built-in decompiler / lifter. warden.lift (lift_function, lift_module): a pure-Python stack-machine lifter that renders readable pseudo-C. CLI: warden lift <label> [--index N] [--out FILE]. It also backs warden export --format pseudo, so pseudocode export emits real pseudo-C instead of a mnemonic dump. Example: i32 parse_token(i32 p0, i32 p1) { return ((p0 + p1) * 7); }.
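The core of a stack-machine lifter is expression folding: each instruction pops its operands as strings and pushes a combined expression. The sketch below reproduces the body of the parse_token example under simplifying assumptions. The tuple-based instruction format and the name fold_expression are hypothetical, and only three arithmetic opcodes are modeled.

```python
# Hypothetical mapping from a few i32 opcodes to C operators.
BINARY_OPS = {"i32.add": "+", "i32.sub": "-", "i32.mul": "*"}


def fold_expression(instructions: list[tuple]) -> str:
    """Fold a straight-line instruction sequence into one pseudo-C expression."""
    stack: list[str] = []
    for op, *args in instructions:
        if op == "local.get":
            stack.append(f"p{args[0]}")  # parameters render as p0, p1, ...
        elif op == "i32.const":
            stack.append(str(args[0]))
        elif op in BINARY_OPS:
            rhs = stack.pop()  # the right operand was pushed last
            lhs = stack.pop()
            stack.append(f"({lhs} {BINARY_OPS[op]} {rhs})")
        else:
            # Simplification for this sketch: an unmodeled opcode
            # becomes a comment placeholder instead of raising.
            stack.append(f"/* {op} */")
    return stack.pop()


body = fold_expression([
    ("local.get", 0),
    ("local.get", 1),
    ("i32.add",),
    ("i32.const", 7),
    ("i32.mul",),
])
```

Here body evaluates to "((p0 + p1) * 7)". Parenthesizing every binary operation keeps the output correct without tracking operator precedence, at the cost of some redundant parentheses.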
Structured control-flow decompiler and round-trip symbol bridge. warden.lift now reconstructs structured control flow on top of the existing expression folding. Real if/else blocks are emitted (result-typed ifs assign each branch into a temp). Loops built from the common block+loop idiom render as while (1) { ... if (cond) break; ... } with no goto. A labeled goto is the fallback only for control flow that does not fit the innermost-loop/break pattern, so output stays correct. br_table becomes a switch. An unmodeled opcode degrades to a /* mnemonic */ comment, never a crash. Example outputs: abs_demo lifts to an if/else that returns; sum_to_n lifts to a while loop with break. The public API is unchanged (lift_function(module, func) -> str, lift_module(module) -> str), as is the CLI (warden lift <label> [--index N]). A new samples.control_flow() helper provides these two demo functions. warden.bridge adds a neutral-format round-trip for annotations through two functions: export_symbols(kb, version_id, fmt="csv") -> str (fmt: "csv" or "json") and import_symbols(kb, version_id, text, fmt="csv", *, provenance=None, confidence=None, lock=False) -> ImportResult (fields: matched, written, rejected_by_economy, skipped, unmatched, details). The import keys on the stable function identity first, then falls back to the function index, so a name recovered against one build lands on the same logical function in another build even when the index shifts. Imports pass through the provenance/confidence economy, so a file never clobbers higher-authority work. CLI: warden export --format csv|json (added alongside the existing headers, pseudo, kb-text, and ghidra formats) and a new warden import <label> <file> [--format csv|json] [--provenance human] [--lock] command.
Agent crew. A propose, verify, write-back loop with a deterministic offline heuristic backend (no API key required), an optional OpenAI backend (gpt-5.3-codex by default, with codex and oai aliases), and an optional Anthropic backend (claude-opus-4-8 by default). LLM backends use structured JSON output.
Call-graph agent strategy. warden.analysis.callgraph provides build_call_graph(module) (returns a CallGraph with .edges, .imports_called, .indirect_callers, .table_targets, and .callees(index)) and layered_schedule(module, graph=None) (bottom-up layers of function indices, with strongly-connected components condensed via iterative Tarjan). run_agent_pass gained two keyword arguments: strategy ("call-graph" by default, or "flat") and concurrency (int, default 8). CLI: warden agent <label> [--strategy call-graph|flat]. The call-graph strategy works in five steps: (1) build a static intra-module call graph (direct calls are exact; call_indirect / dynCall indirect calls are over-approximated to table targets of the matching type from module.elements); (2) condense SCCs and sort into bottom-up layers so every function’s defined callees are in earlier layers; (3) run the concurrency and struct analyzers first and route their findings into per-function notes (atomic sites, struct layouts); (4) process layers bottom-up, giving each function a FunctionFacts.callee_names list of its callees’ recovered names so the naming LLM sees callee meanings before producing a name for the caller; (5) functions within the same layer are independent and are proposed concurrently in-process via asyncio (blocking LLM backends run in worker threads capped by concurrency). Writes still go through the provenance/confidence economy, so concurrent branches sharing a callee cannot clobber each other. FunctionFacts gained two new fields: callee_names (list[str]) and notes (list[str]). The "flat" strategy preserves the original single-pass, leaves-first ordering.
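Step (2) above, condensing strongly-connected components and sorting them into bottom-up layers, can be sketched independently of WARDEN's data structures. The example below works on a plain dict mapping each function index to its callee indices. The names strongly_connected_components and schedule_layers are hypothetical, and the real layered_schedule takes a module and a CallGraph instead.

```python
def strongly_connected_components(graph: dict[int, list[int]]) -> list[list[int]]:
    """Iterative Tarjan; components come out callees-first."""
    index_of: dict[int, int] = {}
    lowlink: dict[int, int] = {}
    stack: list[int] = []
    on_stack: set[int] = set()
    components: list[list[int]] = []

    def visit(node: int) -> tuple:
        index_of[node] = lowlink[node] = len(index_of)
        stack.append(node)
        on_stack.add(node)
        return node, iter(graph.get(node, []))

    for root in graph:
        if root in index_of:
            continue
        work = [visit(root)]
        while work:
            node, callees = work[-1]
            descended = False
            for callee in callees:
                if callee not in index_of:
                    work.append(visit(callee))  # descend; resume this node later
                    descended = True
                    break
                if callee in on_stack:
                    lowlink[node] = min(lowlink[node], index_of[callee])
            if descended:
                continue
            work.pop()
            if work:
                parent = work[-1][0]
                lowlink[parent] = min(lowlink[parent], lowlink[node])
            if lowlink[node] == index_of[node]:  # node is the root of an SCC
                component = []
                while True:
                    member = stack.pop()
                    on_stack.discard(member)
                    component.append(member)
                    if member == node:
                        break
                components.append(sorted(component))
    return components


def schedule_layers(graph: dict[int, list[int]]) -> list[list[int]]:
    """Group functions into layers so every callee outside a function's own cycle sits in an earlier layer."""
    components = strongly_connected_components(graph)
    comp_of = {n: i for i, comp in enumerate(components) for n in comp}
    layer_of: list[int] = []
    for i, comp in enumerate(components):
        deps = {comp_of[c] for n in comp for c in graph.get(n, []) if comp_of[c] != i}
        layer_of.append(1 + max((layer_of[d] for d in deps), default=-1))
    layers: list[list[int]] = [[] for _ in range(max(layer_of, default=-1) + 1)]
    for i, comp in enumerate(components):
        layers[layer_of[i]].extend(comp)
    return [sorted(layer) for layer in layers]
```

With the graph {0: [1, 2], 1: [3], 2: [3], 3: [], 4: [5], 5: [4, 3]}, functions 4 and 5 form a cycle and are scheduled together, giving the layers [[3], [1, 2, 4, 5], [0]]. Functions within one layer have no dependencies on each other outside their own cycle, which is what makes them safe to propose concurrently.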
Specialized concurrency and struct analyzers. warden.analysis.concurrency (analyze_concurrency returns a ConcurrencyReport with .shared_memory, .atomic_sites, .pthread_markers, .facts) and warden.analysis.structs (analyze_structs returns StructLayout objects with .name, .fields, .source_function). Both populate the previously-empty thread_model and structs KB tables. CLI: warden analyze <label>.
Verification. Determinism verification (runs today) plus a detected, optional wasm2c/w2c2 differential-equivalence plan. A zero-dependency interpreter (warden.interp: execute_function, differential_execute) makes behavioral-equivalence checking runnable without external tooling. CLI: warden exec <label> <index> [args...].
Static HTML report generator. warden.report (render_report, write_report): a self-contained HTML file (inline CSS, no server required) with a coverage summary, a confidence heatmap colored by provenance and confidence, a thread/memory model section, and the diff changelog. CLI: warden report <label> [--out FILE].
Exporters. C headers, readable pseudocode, a git-diffable KB text dump, and a Ghidra rename script.
MCP server. Optional warden mcp tool surface mirroring the GhidraMCP pattern, including backend discovery, grounded function facts, server-side agent runs, and economy-gated symbol proposals.
CLI and UX. warden init, ingest, versions, coverage, funcs, show, set-name, oracle, agent, agent-backends, diff, lift, exec, analyze, report, export, verify, mcp, and demo with rich terminal output.
Test suite (247 tests), CI (lint, types, tests, wasm validation), a PyPI Trusted Publishing release workflow, a Mintlify documentation site, a Docker image, pre-commit hooks, and a full documentation set.
