Three honest axes of “100%”
“100% reverse engineered” is a precise, three-part claim. Each axis addresses a different failure mode.| Axis | Claim | Failure mode it guards |
|---|---|---|
| Symbol coverage | Every defined function has a binding: a name, a provenance, and a confidence score. No func_412 stays anonymous. | Dark spots that conceal unknown behavior. |
| Behavioral equivalence | Reconstructions are differentially executed against the original until their outputs and side effects match on a corpus of inputs. | Plausible-sounding names that describe the wrong behavior. |
| Change accountability | Across any two versions, every delta is mapped to a function and a semantic explanation. Nothing changes silently. | Missed security-relevant changes in an update. |
Running the check
verify command reads the file, runs the determinism check, and prints a colored summary.
It then calls differential_plan and reports whether the optional behavioral harness is ready:
stable_id is unstable, the exit line prints in red and the offending
function indices are listed. verify takes no options beyond the .wasm path. The determinism
check always runs; differential readiness is always reported.
What runs today: verify_determinism
verify_determinism(wasm_bytes: bytes) -> DeterminismReport is the zero-dependency check that
runs on every call to warden verify. It requires only the Python standard library.
How it works
The function parses the same raw bytes twice viaparse_module, calls fingerprint_function on
each pair of defined functions, and compares the resulting stable_id strings byte-exactly. The
returned DeterminismReport carries:
total: number of defined (non-import) functions examined.stable: count whosestable_idmatched across both parse runs.unstable: list of function indices where the id differed.ok:Truewhenunstableis empty.summary: a one-liner such as"42/42 functions fingerprint deterministically".
Why it matters for carry-over
Thestable_id is WARDEN’s stable function identity: a composite of the structural fingerprint,
call-neighborhood, and type signature that stays constant across rebuilds even when table indices
shift. The entire annotation carry-over mechanism (the feature that makes RE incremental rather
than Sisyphean) depends on every stable_id being reproducible from the binary bytes alone.
Because the check operates on raw bytes with no external tooling, it runs identically in CI, in a
sandboxed environment, and on a laptop with nothing native installed.
Differential execution (runs today)
Thewarden.interp mini-interpreter is the middle tier: it makes behavioral equivalence
runnable right now, with zero native toolchain required, for the integer subset that covers the
vast majority of Emscripten arithmetic and glue code.
What the interpreter covers
execute_function is a pure-Python stack-machine evaluator for the i32 integer subset. It models:
- All i32 arithmetic and bitwise opcodes (
add,sub,mul,and,or,xor,shl). - All i32 comparisons, signed and unsigned (
eq,ne,lt_s,gt_s,le_s,ge_s,eqz). - Structured control:
block,loop,if/else/end,br,br_if,return. - Local variable operations:
local.get,local.set,local.tee. - Direct function calls (with recursive fuel tracking).
- Linear memory loads and stores (
i32.load,i32.store).
UnsupportedExecution. The harness catches this and records
the pair as “undecided” rather than crashing, so a partially-modeled module still yields useful
results for the functions that are covered.
Execution is deterministic by construction: the result depends only on the module bytes, the
function index, and the argument vector. No wall-clock, no RNG, no external state.
Running one function
exec command looks up the named project, parses its .wasm, locates function <index>,
and executes it on the provided arguments.
execute_function accepts optional keyword arguments:
| Argument | Default | Purpose |
|---|---|---|
host | None | Callback for imported function calls; absent → imports are no-ops |
memory | Zeroed 64 KiB | Pre-seeded bytearray; observe stores after the call |
fuel | 100000 | Instruction budget; exceeded → UnsupportedExecution |
Differential execution
differential_execute runs two functions from two modules over the same input corpus and reports
per-input agreement:
match field is True when both sides return identical result stacks. If either side raises
UnsupportedExecution, that side records None and match is False (undecided, not wrong).
Concrete example. parse_token v1 and v2 differ structurally (v2 adds a bounds check), but
the bounds-check result is dropped before the return. differential_execute proves they are
behaviorally equivalent across the full input corpus: every row shows match: True.
internal_crc, by contrast, shows match: False on most inputs, which is a genuine behavioral change.
differential_execute never mutates the modules or functions it receives, and each input gets a
fresh zeroed memory. You can run it in a loop across a large corpus without side effects.Three-tier verification model
With the interpreter in place, WARDEN now has three stacked tiers for behavioral equivalence:| Tier | Dependency | Coverage |
|---|---|---|
warden.interp (mini-interpreter) | Pure Python, stdlib only | i32 integer subset; runs today on any machine |
wasm2c / w2c2 + cc (native harness) | WABT + C compiler | Full numeric and memory behavior; corpus-bounded |
| SeeWasm (symbolic cross-check) | SeeWasm (future) | Path-condition soundness for security-critical branches |
The optional differential harness
The behavioral harness activates only when the right native toolchain is present. Nothing silently pretends a check ran. WARDEN reports readiness honestly.Tooling detection: tooling_status()
tooling_status() uses shutil.which to probe PATH for four tools:
| Field | Binary probed | Role |
|---|---|---|
wasm2c | wasm2c | WABT’s WebAssembly-to-C transpiler |
w2c2 | w2c2 | Alternative wasm-to-C transpiler (faster compile) |
cc | cc, clang, or gcc (first found) | C compiler to build the lifted output |
wabt_validate | wasm-validate | Structural validation of the .wasm before lifting |
can_differential property is True when (wasm2c or w2c2) and cc. That is the minimum
required to execute the harness. wabt_validate is independent, useful for confirming the input
is well-formed before spending time lifting it.
The plan: differential_plan(wasm_path)
differential_plan(wasm_path: str) -> dict calls tooling_status() and returns a dict
describing what would run and whether it can:
ready is False, the note field tells you exactly what to install. The concrete steps
it describes, in order:
Transpile the original to C
wasm2c (from WABT) produces a self-contained C file that is functionally equivalent to the
original by construction.Compile the lifted C to a native executable
wasm-rt-impl.c is included with WABT.Compile the agent/LLM reconstruction the same way
The reconstruction is compiled identically to produce a second native executable.
Activating the harness
warden verify app_v1.wasm will report
differential equivalence ready: True.
What behavioral equivalence actually claims
When the differential harness reports a function as verified, the precise claim is: corpus-bounded behavioral equivalence: the reconstruction agrees with the original on every input tried, to the depth the fuzzer reached. This is strong, automatable evidence. It is not a formal proof. The corpus is finite, and an adversarial input outside it could in principle expose a divergence. Formal equivalence checking via SMT-based symbolic execution over all inputs is a future direction for cryptographically critical functions, but it is not what the current harness provides.For most reverse-engineering purposes (understanding behavior, tracking changes, writing
accurate documentation), corpus-bounded evidence is sufficient and is considerably stronger
than “the name sounds right.”
Future dynamic ground-truth hooks
Three additional sources of behavioral evidence are identified in the architecture vision and scaffolded as future work.SeeWasm: symbolic cross-check
SeeWasm: symbolic cross-check
For specific claimed path conditions (“this branch fires when
magic == 0xCAFE”), SeeWasm
can confirm the condition symbolically rather than relying on the model’s assertion. This
would plug in as an additional gate in the verifier after the differential harness, providing
path-condition soundness for security-critical branches.Wasabi / Frida / Chrome DevTools: dynamic ground truth
Wasabi / Frida / Chrome DevTools: dynamic ground truth
Wasabi is a WebAssembly instrumentation framework; Frida and Chrome DevTools expose runtime
traces. Live execution traces confirm which element-table targets actually fire, which atomic
operations guard which memory regions, and what values flow through indirect calls. This is
precisely the information hardest to recover statically. These hooks would validate the thread
model and dynCall attribution produced by the concurrency agent.
Oracle-as-oracle: free verification for matched functions
Oracle-as-oracle: free verification for matched functions
Any function matched by the Emscripten Oracle to a known upstream source (musl, dlmalloc, the
Emscripten runtime) is verified by definition: its behavior is already documented in the
public source. The reconstruction can be checked directly against that real source code. This
is available today for Oracle-matched functions, at no additional toolchain cost.
Verification in the confidence economy
Every symbol written to the KB carries aprovenance and a confidence score. The verifier
controls whether an agent proposal is promoted:
- Oracle matches (
provenance="oracle") are inherently verified against compiled ground truth. They receive the highest confidence. - Agent proposals (
provenance="agent") are gated by the verifier before write-back. Until behavioral equivalence is confirmed, they carry a confidence below 1.0 and are marked unverified. - Human names (
provenance="human",locked=True) are sovereign. The verifier does not touch them and they cannot be overwritten by agent passes.
Summary of checks
| Check | Dependency | Runs today | Guarantee |
|---|---|---|---|
verify_determinism | Pure Python, stdlib only | Yes | stable_id byte-stability, which guards all carry-over |
warden.interp differential execution | Pure Python, stdlib only | Yes | Corpus-bounded behavioral equivalence for the i32 subset |
| Structural validation | wasm-validate (WABT) | When installed | Input is a well-formed WebAssembly module |
| Differential harness | wasm2c or w2c2 + cc | When installed | Corpus-bounded behavioral equivalence (full instruction set) |
| Oracle source verification | Emscripten Oracle corpus | Today, Oracle-matched functions only | Exact behavioral match to upstream source |
| Symbolic cross-check | SeeWasm | Future | Path-condition soundness for critical branches |
| Dynamic trace validation | Wasabi / Frida / DevTools | Future | Runtime dynCall and atomic ground truth |
Phase 4: agents
Where the proposals the verifier gates actually come from.
Phase 3: diff
How verified annotations carry forward when a new version ships.