Today’s WebAssembly reverse engineering is a pile of disconnected tools, and every annotation you make dies when the vendor ships a new .wasm, so you redo the work. That’s the real pain: not the first decompile, but the second through the hundredth. WARDEN treats reverse engineering as a living, versioned knowledge base keyed to stable function identities rather than file offsets, so your names, types, and notes carry across binary updates automatically.

The Emscripten Oracle

Emscripten, musl, dlmalloc, and libc++ are open source, so WARDEN compiles them itself to build its own ground truth, then automatically identifies those runtime functions in a target and attaches their real upstream names. You stop reversing code whose source is already public.
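To make the Oracle idea concrete, here is a minimal sketch of ground-truth matching. It assumes each function has already been reduced to an opcode sequence with immediates stripped. `Func`, `fingerprint`, `build_ground_truth`, and `identify` are hypothetical names, not WARDEN's API. Exact hashing is also a simplification: it only catches functions whose opcodes are identical to a reference build.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Func:
    """Hypothetical representation: a name plus opcodes with immediates removed."""
    name: str
    opcodes: tuple


def fingerprint(func: Func) -> str:
    """Hash the opcode sequence so identical opcode sequences get identical keys."""
    return hashlib.sha256(" ".join(func.opcodes).encode()).hexdigest()


def build_ground_truth(reference_funcs):
    """Index functions compiled from known upstream source by fingerprint."""
    return {fingerprint(f): f.name for f in reference_funcs}


def identify(target_funcs, ground_truth):
    """Return {target name: upstream name} for every fingerprint hit."""
    matches = {}
    for f in target_funcs:
        upstream = ground_truth.get(fingerprint(f))
        if upstream is not None:
            matches[f.name] = upstream
    return matches


# Toy corpus standing in for functions compiled from musl/libc++ source.
reference = [
    Func("strlen", ("local.get", "i32.load8_u", "i32.eqz", "br_if")),
    Func("memcpy", ("local.get", "local.get", "local.get", "memory.copy")),
]
# Toy stripped target: one runtime function, one application function.
target = [
    Func("func_412", ("local.get", "local.get", "local.get", "memory.copy")),
    Func("func_413", ("call", "drop")),
]
print(identify(target, build_ground_truth(reference)))  # {'func_412': 'memcpy'}
```

Application code such as `func_413` simply produces no hit, which is why the Oracle only removes the runtime portion of the work.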

A persistent symbol KB

Every name, type, struct, and comment lives in a database keyed to a content identity, with provenance and a confidence score. It survives rebuilds instead of dying with the file.
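As a rough illustration of what "keyed to a content identity, with provenance and a confidence score" can look like on disk, here is a sketch using the standard-library `sqlite3` module. The table layout and the `open_kb`, `annotate`, and `lookup` helpers are assumptions for illustration, not WARDEN's actual schema.

```python
import sqlite3


def open_kb(path=":memory:"):
    """Create a toy KB with one row per (identity, field)."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS annotations (
               identity   TEXT NOT NULL,  -- content identity, never a file offset
               field      TEXT NOT NULL,  -- 'name', 'type', 'struct', 'comment'
               value      TEXT NOT NULL,
               provenance TEXT NOT NULL,  -- who wrote it: 'human', 'oracle', 'agent', ...
               confidence REAL NOT NULL,  -- 0.0 to 1.0
               PRIMARY KEY (identity, field)
           )"""
    )
    return conn


def annotate(conn, identity, field, value, provenance, confidence):
    """Unconditionally store an annotation (this sketch has no overwrite policy)."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO annotations VALUES (?, ?, ?, ?, ?)",
            (identity, field, value, provenance, confidence),
        )


def lookup(conn, identity, field):
    """Return (value, provenance, confidence), or None if nothing is recorded."""
    return conn.execute(
        "SELECT value, provenance, confidence FROM annotations "
        "WHERE identity = ? AND field = ?",
        (identity, field),
    ).fetchone()


kb = open_kb()
annotate(kb, "a3f9c1d2", "name", "parse_header", "human", 1.0)
print(lookup(kb, "a3f9c1d2", "name"))  # ('parse_header', 'human', 1.0)
```

Because the primary key is the identity rather than a table index or byte offset, a rebuilt binary only needs to reproduce the same identity for the row to apply again.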

Cross-version carry-over

When a new .wasm drops, WARDEN diffs it, ports annotations to unchanged and moved functions, and surfaces only the genuine deltas. Reversing becomes incremental.
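A minimal sketch of the carry-over step, assuming both versions have already been reduced to `{identity: table_index}` maps. `carry_over` is a hypothetical helper. It only ports annotations on exact identity matches; a function whose body changed gets a new identity and shows up here as one added plus one removed entry, and pairing those up would need a similarity measure that this sketch omits.

```python
def carry_over(old_index, new_index, old_annotations):
    """Port annotations by identity and report what genuinely changed.

    old_index, new_index: {identity: table_index} for each module version.
    old_annotations: {identity: name} taken from the KB for the old version.
    """
    ported, moved = {}, []
    for identity, new_idx in new_index.items():
        if identity not in old_index:
            continue  # brand-new code: nothing to port
        if identity in old_annotations:
            ported[identity] = old_annotations[identity]
        if old_index[identity] != new_idx:
            moved.append(identity)  # same code, shifted table index
    return {
        "ported": ported,
        "moved": sorted(moved),
        "added": sorted(set(new_index) - set(old_index)),
        "removed": sorted(set(old_index) - set(new_index)),
    }


v1 = {"id_a": 10, "id_b": 11, "id_c": 12}
v2 = {"id_a": 10, "id_b": 14, "id_d": 15}
notes = {"id_a": "inflate_block", "id_b": "crc32_update"}
report = carry_over(v1, v2, notes)
print(report["ported"])                    # {'id_a': 'inflate_block', 'id_b': 'crc32_update'}
print(report["added"], report["removed"])  # ['id_d'] ['id_c']
```

Only `added` and `removed` need a human or agent to look at them; everything in `ported` arrives already named.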

See the whole thing in one command

The core path is pure Python plus the standard library. No Ghidra, no Emscripten, and no native toolchain are required to fork and run it.
git clone https://github.com/purpshell/warden.git
cd warden
python -m venv .venv && source .venv/bin/activate
pip install -e .

warden demo            # runs the entire pipeline, offline
warden demo generates sample modules and walks them through the whole system end to end: ingest, Oracle identification, the agent crew, shipping a new version, and diff with carry-over. You watch coverage climb to 100% and WARDEN produce a v1 → v2 semantic changelog with zero manual work.

Start here: the 60-second quickstart

Install WARDEN and run the full pipeline on your own module.

How it works

Stable identity

Each function gets a content identity (structural skeleton + call-neighborhood + type signature) that stays constant across rebuilds even when its table index shifts. Annotations attach to that, not to an offset. Read the concepts →
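The sketch below shows why such an identity survives a shifted table index. How WARDEN actually weights and combines the three components is not described here, so the exact-hash combination, and treating the call-neighborhood as the skeletons of a function's callees, are assumptions. `skeleton` and `content_identity` are hypothetical names.

```python
import hashlib


def skeleton(instructions):
    """Keep only opcode mnemonics, dropping immediates such as call targets."""
    return tuple(instr.split()[0] for instr in instructions)


def content_identity(instructions, callee_skeletons, signature):
    """Hash skeleton + call-neighborhood + type signature into a stable key.

    callee_skeletons: skeletons of the functions this one calls, used instead of
    their table indices; sorted so the key ignores the order callees are listed.
    """
    payload = repr((skeleton(instructions), sorted(callee_skeletons), signature))
    return hashlib.sha256(payload.encode()).hexdigest()[:16]


callees = [("local.get", "i32.const", "i32.mul")]
v1 = content_identity(["local.get 0", "call 7", "i32.add"], callees, "(i32) -> i32")
v2 = content_identity(["local.get 0", "call 9", "i32.add"], callees, "(i32) -> i32")
v3 = content_identity(["local.get 0", "call 9", "i32.add"], callees, "(i64) -> i64")
print(v1 == v2, v1 == v3)  # True False
```

The callee moving from index 7 to 9 leaves the key unchanged, while a different type signature produces a different one.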

One engine, two jobs

The same fingerprint/similarity engine powers both the Oracle (match against compiled ground truth) and the diff carry-over (match against the previous version).
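A minimal sketch of one matcher serving both jobs, assuming Jaccard similarity over opcode trigrams as the score. That metric is an illustrative stand-in, not necessarily what WARDEN uses; `ngrams`, `similarity`, and `best_match` are hypothetical names. The only thing that changes between the two jobs is which candidate set you pass in.

```python
def ngrams(opcodes, n=3):
    """Set of opcode n-grams: a crude, order-aware feature set."""
    ops = tuple(opcodes)
    return {ops[i:i + n] for i in range(len(ops) - n + 1)}


def similarity(a, b, n=3):
    """Jaccard similarity of two functions' n-gram sets, in [0.0, 1.0]."""
    fa, fb = ngrams(a, n), ngrams(b, n)
    if not fa and not fb:  # both shorter than n: fall back to exact comparison
        return 1.0 if tuple(a) == tuple(b) else 0.0
    return len(fa & fb) / len(fa | fb)


def best_match(opcodes, candidates, threshold=0.8):
    """Most similar candidate as (name, score), or None below the threshold.

    candidates maps name -> opcodes. Feed it the compiled ground-truth corpus
    for Oracle identification, or the previous version's functions for carry-over.
    """
    scored = [(name, similarity(opcodes, ops)) for name, ops in candidates.items()]
    best = max(scored, key=lambda pair: pair[1], default=None)
    if best is None or best[1] < threshold:
        return None
    return best


corpus = {
    "memcpy": ["local.get", "local.get", "local.get", "memory.copy", "end"],
    "strlen": ["local.get", "i32.load8_u", "i32.eqz", "br_if", "end"],
}
print(best_match(["local.get", "local.get", "local.get", "memory.copy", "end"], corpus))  # ('memcpy', 1.0)
print(best_match(["call", "drop", "end"], corpus))  # None
```

Sharing the engine means any improvement to the score or threshold benefits runtime identification and version diffing at the same time.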

A provenance economy

Every write records its source (human, oracle, export, agent, or diff-carry) together with a confidence score. Human edits are sovereign; agents only overwrite lower-confidence agent output. That’s what makes it safe to re-run the whole crew on every update.
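The two stated rules can be written as a small write gate. The sketch below encodes them directly; how oracle, export, and diff-carry writes interact with each other is not specified, so the "at least as confident" rule for those sources, and letting a human always revise, are assumptions. `Write`, `may_overwrite`, and `write` are hypothetical names, and the KB is a plain dict.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Write:
    value: str
    source: str        # 'human', 'oracle', 'export', 'agent', or 'diff-carry'
    confidence: float  # 0.0 to 1.0


def may_overwrite(existing: Optional[Write], incoming: Write) -> bool:
    """Decide whether an incoming write may replace the current entry."""
    if existing is None:
        return True  # nothing to protect yet
    if incoming.source == "human":
        return True  # assumption: a human may always revise, including their own edits
    if existing.source == "human":
        return False  # human edits are sovereign
    if incoming.source == "agent":
        # Agents only overwrite lower-confidence agent output.
        return existing.source == "agent" and incoming.confidence > existing.confidence
    # Assumed policy for oracle / export / diff-carry: replace non-human
    # entries when at least as confident.
    return incoming.confidence >= existing.confidence


def write(kb, identity, incoming):
    """Apply the write if the policy allows it; report whether it landed."""
    if may_overwrite(kb.get(identity), incoming):
        kb[identity] = incoming
        return True
    return False


kb = {}
print(write(kb, "id1", Write("parse_hdr", "agent", 0.6)))          # True
print(write(kb, "id1", Write("parse_header", "agent", 0.7)))       # True
print(write(kb, "id1", Write("read_hdr", "agent", 0.65)))          # False
print(write(kb, "id1", Write("parse_http_header", "human", 1.0)))  # True
print(write(kb, "id1", Write("oracle_guess", "oracle", 1.0)))      # False
```

Under these rules, re-running the crew can never clobber a human name; an agent pass can only replace agent-owned names with more confident ones.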

Agents do the labor

A propose/verify/write-back crew of agents fills the KB without a proportional increase in human time. It runs with zero dependencies using an offline heuristic, or upgrades to a real LLM crew. Read about agents →
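One pass of the propose, verify, write-back loop can be sketched with a toy offline heuristic. Everything here is an assumption for illustration: the `HINTS` table (naming a function after a WASI import it calls), the verification rule (valid identifier, not already taken), and the `propose`, `verify`, and `run_crew` names. A real LLM crew would replace `propose` while keeping the same loop shape.

```python
import re

# Hypothetical hint table: imported callee -> (proposed name, confidence).
HINTS = {"fd_write": ("write_output", 0.6), "fd_close": ("close_handle", 0.5)}
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def propose(func):
    """Offline heuristic: name a function after the first hinted import it calls."""
    for callee in func["calls"]:
        if callee in HINTS:
            return HINTS[callee]
    return None


def verify(name, taken):
    """Reject malformed identifiers and names already bound elsewhere."""
    return IDENT.fullmatch(name) is not None and name not in taken


def run_crew(funcs, kb):
    """One propose -> verify -> write-back pass over unbound functions.

    kb maps identity -> (name, source, confidence). Already-bound functions are
    skipped, a simplification of the provenance rules.
    """
    taken = {entry[0] for entry in kb.values()}
    written = 0
    for func in funcs:
        if func["identity"] in kb:
            continue
        proposal = propose(func)
        if proposal is None:
            continue
        name, confidence = proposal
        if verify(name, taken):
            kb[func["identity"]] = (name, "agent", confidence)
            taken.add(name)
            written += 1
    return written


funcs = [
    {"identity": "id1", "calls": ["fd_write"]},
    {"identity": "id2", "calls": ["fd_write"]},  # same proposal: rejected as a duplicate
    {"identity": "id3", "calls": []},            # no hint: left for a stronger agent
]
kb = {}
print(run_crew(funcs, kb), kb)  # 1 {'id1': ('write_output', 'agent', 0.6)}
```

Keeping verification separate from proposal means a weak proposer can be swapped for a stronger one without loosening what reaches the KB.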

What “100% reverse engineered” means here

Perfect source recovery is impossible. The compiler destroyed that information. WARDEN targets a rigorous, achievable 100% along three axes.
Every function has some binding: an Oracle-confirmed real name, a recovered name, or an agent proposal with a confidence score. No anonymous func_412 ever remains.
Reconstructions are differentially executed against the original until outputs match. Determinism verification runs today; the wasm2c differential harness activates when a C toolchain is present. Read about verification →
Across versions, every byte-level delta is mapped to a function and a semantic explanation. Nothing changes silently. Read about diffing →
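The verification axis boils down to two checks: the same input always gives the same output, and the reconstruction agrees with the original on every input tried. In WARDEN the original side would be the real module (run through wasm2c when a C toolchain is present); in this sketch both sides are plain Python callables so it runs anywhere, and `is_deterministic` and `first_divergence` are hypothetical names. Note that differential execution can only find disagreements on the inputs you feed it; it cannot prove equivalence.

```python
import random


def is_deterministic(fn, inputs, runs=3):
    """True if every input produces the same output on every run."""
    return all(len({fn(*args) for _ in range(runs)}) == 1 for args in inputs)


def first_divergence(original, reconstruction, inputs):
    """Return the first input on which the two implementations disagree, else None."""
    for args in inputs:
        if original(*args) != reconstruction(*args):
            return args
    return None


def original_add(a, b):
    """Stand-in for the compiled function: wasm i32.add wraps modulo 2**32."""
    return (a + b) & 0xFFFFFFFF


def naive_add(a, b):
    """A plausible but wrong reconstruction that forgets the wraparound."""
    return a + b


def fixed_add(a, b):
    """Corrected reconstruction (inputs treated as unsigned 32-bit values)."""
    return (a + b) % 2**32


rng = random.Random(0)  # seeded so the input set is reproducible
inputs = [(0, 0), (0xFFFFFFFF, 1)] + [
    (rng.getrandbits(32), rng.getrandbits(32)) for _ in range(100)
]
print(is_deterministic(original_add, inputs))             # True
print(first_divergence(original_add, naive_add, inputs))  # (4294967295, 1)
print(first_divergence(original_add, fixed_add, inputs))  # None
```

Seeding the input set with edge cases such as the overflow pair matters as much as the random inputs; random 32-bit pairs would also overflow often here, but boundary values make the failure deterministic and easy to report.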
WARDEN is alpha. The spine (ingest, KB, identity, diff, Oracle matching, exporters, the agent loop, and determinism verification) runs today. The deeper integrations (Ghidra round-trip, the full emsdk corpus farm, the wasm2c verifier, and a UX) are scaffolded with clear interfaces. See the roadmap and honest limits.
Last modified on June 7, 2026