Core concepts

Three facts drive every decision in WARDEN. Internalize these and the whole tool follows.

1. Annotations attach to a stable identity, not an index

The hard problem behind “keep it defined” is deciding what to key annotations on when offsets and table indices shift on every build. WARDEN computes a content identity per function: a composite of its structural skeleton (control-flow and call shape with literals stripped), its call-neighborhood (which imports it calls), and its surviving type signature. The same logical function keeps the same stable_id across rebuilds even when its table index changes.

Annotations are stored keyed on stable_id, not on a per-version row. That is the entire carry-over mechanism: when a new .wasm contains a function with a stable_id you’ve already named, the name is simply already there. Nothing to port.
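Here is a minimal sketch of that carry-over mechanism. It assumes a hypothetical `FunctionView` record and combines the three identity components with SHA-256; the source does not say how WARDEN actually composes them. What matters is that `table_index` never enters the key:

```python
import hashlib
from dataclasses import dataclass

@dataclass(frozen=True)
class FunctionView:
    """Hypothetical, simplified view of one decoded WASM function."""
    table_index: int                # position in the module; shifts between builds
    skeleton: tuple[str, ...]       # control-flow/call opcodes, literals stripped
    imports_called: frozenset[str]  # call-neighborhood: imports this function calls
    type_sig: str                   # e.g. "(i32, i32) -> i32"

def stable_id(f: FunctionView) -> str:
    """Composite content identity; table_index is deliberately left out."""
    h = hashlib.sha256()
    for part in (" ".join(f.skeleton), ",".join(sorted(f.imports_called)), f.type_sig):
        h.update(part.encode())
        h.update(b"\x00")  # separator so adjacent parts cannot run together
    return h.hexdigest()

# Annotations are keyed on stable_id, never on table_index.
annotations: dict[str, str] = {}

old = FunctionView(12, ("block", "loop", "call", "br_if", "end", "end"),
                   frozenset({"env.memcpy"}), "(i32, i32) -> i32")
annotations[stable_id(old)] = "copy_buffer"

# Next build: the same logical function now sits at a different table index.
new = FunctionView(57, old.skeleton, old.imports_called, old.type_sig)
print(annotations.get(stable_id(new)))  # copy_buffer
```

Moving the function from slot 12 to slot 57 changes nothing, because the lookup never touches `table_index`.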

A function carries four fingerprints, each suited to detecting a different kind of change:

Structural

A hash of the normalized control-flow/call skeleton (the backbone), with constants and call targets dropped.

Semantic

An opcode-class histogram, the call-out signature, and the type signature (which WASM provides for free, because every function declares an explicit type).

Fuzzy

A deterministic MinHash over normalized instruction n-grams, so near-matches across -O levels still score highly.

Exact

SHA-256 of the raw body, for verbatim “unchanged” detection.
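The four fingerprints can be sketched in plain Python. Treat everything here as illustrative, not WARDEN's documented internals: the `(opcode, operand)` input format, the opcode classes, the n-gram size of 3, the 64 MinHash slots, and the seeded BLAKE2b hash are all assumptions.

```python
import hashlib
import math
from collections import Counter

# Hypothetical input format: a decoded body as (opcode, operand) pairs.
CONTROL_OPS = {"block", "loop", "if", "else", "end", "br", "br_if", "br_table", "return"}
CALL_OPS = {"call", "call_indirect"}

def structural(body):
    """Hash of the control-flow/call skeleton; constants and call targets are dropped."""
    skeleton = [op for op, _ in body if op in CONTROL_OPS or op in CALL_OPS]
    return hashlib.sha256(" ".join(skeleton).encode()).hexdigest()

def opcode_class(op):
    """Coarse, hypothetical opcode classes used for the histogram."""
    if op in CALL_OPS:
        return "call"
    if op in CONTROL_OPS:
        return "control"
    if ".load" in op or ".store" in op:
        return "memory"
    if op.endswith(".const"):
        return "const"
    if op.startswith(("local.", "global.")):
        return "variable"
    return "numeric"

def semantic(body, imports_called, type_sig):
    """Opcode-class histogram, call-out signature, and the declared type signature."""
    histogram = Counter(opcode_class(op) for op, _ in body)
    return histogram, frozenset(imports_called), type_sig

def cosine(a, b):
    """Cosine similarity of two histograms (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def fuzzy(body, num_perm=64, n=3):
    """Deterministic MinHash signature over opcode n-grams (operands normalized away)."""
    ops = [op for op, _ in body]
    grams = {" ".join(ops[i:i + n]) for i in range(len(ops) - n + 1)} or {" ".join(ops)}
    signature = []
    for seed in range(num_perm):
        # Seeded blake2b rather than hash(): Python randomizes str hashing per process.
        signature.append(min(
            int.from_bytes(hashlib.blake2b(seed.to_bytes(4, "little") + g.encode(),
                                           digest_size=8).digest(), "big")
            for g in grams))
    return signature

def jaccard_estimate(sig_a, sig_b):
    """The fraction of agreeing MinHash slots estimates the n-gram Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)

def exact(raw_body):
    """SHA-256 of the raw body bytes: equal only for verbatim-identical bodies."""
    return hashlib.sha256(raw_body).hexdigest()

# Two builds of one function: a constant and a call target changed.
v1 = [("block", None), ("local.get", 0), ("i32.const", 7), ("i32.add", None),
      ("call", 3), ("br_if", 0), ("end", None)]
v2 = [("block", None), ("local.get", 0), ("i32.const", 99), ("i32.add", None),
      ("call", 41), ("br_if", 0), ("end", None)]
print(structural(v1) == structural(v2))        # True
print(jaccard_estimate(fuzzy(v1), fuzzy(v2)))  # 1.0

# Raw bodies for (i32, i32) -> i32: no locals, local.get 0, local.get 1, i32.add / i32.sub, end.
ADD = bytes([0x00, 0x20, 0x00, 0x20, 0x01, 0x6A, 0x0B])
SUB = bytes([0x00, 0x20, 0x00, 0x20, 0x01, 0x6B, 0x0B])
print(exact(ADD) == exact(SUB))                # False
```

The structural and fuzzy fingerprints treat `v1` and `v2` as the same function, while the exact hash of their raw bytes would differ. WASM `call` instructions encode a function index directly in the body, which is one reason exact hashes break when a rebuild renumbers functions.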

2. One fingerprint engine powers both the Oracle and the diff

“Identify against compiled ground truth” and “match against the previous version” are the same operation pointed at different corpora. WARDEN has a single similarity engine that combines exact-body equality, structural-skeleton equality, fuzzy MinHash Jaccard, opcode-histogram cosine, and call-neighborhood overlap into one score.

The Oracle runs that engine against a corpus of labeled runtime functions compiled from the open-source toolchain.
The diff engine runs the very same engine against the previous version of the target to classify and carry annotations.

Improve the engine once and both get sharper.
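The following sketch shows how one scoring function can serve both corpora. The `Signals` record, the weights, and the 0.6 threshold are hypothetical. The source says only that these five signals are combined into one score.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Signals:
    """Per-pair similarity signals; each float is assumed normalized to [0, 1]."""
    exact_equal: bool
    structural_equal: bool
    minhash_jaccard: float
    histogram_cosine: float
    neighborhood_overlap: float

# Hypothetical weights (they sum to 1.0); the real weighting is not documented here.
WEIGHTS = {"structural": 0.35, "fuzzy": 0.30, "histogram": 0.15, "neighborhood": 0.20}

def score(s: Signals) -> float:
    if s.exact_equal:
        return 1.0  # a verbatim body match is conclusive on its own
    return (WEIGHTS["structural"] * float(s.structural_equal)
            + WEIGHTS["fuzzy"] * s.minhash_jaccard
            + WEIGHTS["histogram"] * s.histogram_cosine
            + WEIGHTS["neighborhood"] * s.neighborhood_overlap)

def best_match(candidates: dict[str, Signals],
               threshold: float = 0.6) -> Optional[tuple[str, float]]:
    """Same call whether candidates come from the Oracle corpus or the previous version."""
    if not candidates:
        return None
    name, signals = max(candidates.items(), key=lambda kv: score(kv[1]))
    best = score(signals)
    return (name, best) if best >= threshold else None

oracle_candidates = {
    "memcpy": Signals(False, True, 0.9, 0.95, 1.0),
    "memset": Signals(False, False, 0.2, 0.5, 0.0),
}
print(best_match(oracle_candidates)[0])  # memcpy
```

Pointing `best_match` at Oracle candidates or at previous-version candidates is the same call with a different dictionary. That is why improving `score` sharpens both.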

3. Every write has a provenance and a confidence

This is what makes it safe to re-run the entire agent crew on every update without clobbering verified work. Each symbol records who wrote it and how sure they are, and the knowledge base enforces an economy at the write layer.

Provenance	Rank	Meaning
`human`	highest	A person set or confirmed it. Sovereign; can be locked.
`oracle`	high	Matched to known upstream runtime/libc code.
`export` / `import`	medium	Recovered from the name section, exports, or imports.
`string-xref`	medium	Implied by a referenced string.
`diff-carry`	low	Ported from a fuzzy match on the previous version (penalized confidence).
`agent`	lowest	Proposed by the agent crew, with a calibrated confidence score.

The rules, enforced in KnowledgeBase.upsert_symbol:

A human write always wins

And a human can lock a symbol, making it immutable to every automated source.

An agent may only fill an empty slot or overwrite lower-confidence agent output

It can never clobber human, Oracle, export/import, string-xref, or diff-carry work, nor agent output of equal or higher confidence.

Other automated sources resolve by rank, then confidence

So a name-section name beats a string-xref guess, and a high-score Oracle match beats both.
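These rules reduce to a small amount of logic. Below is a hypothetical in-memory stand-in for `KnowledgeBase.upsert_symbol`, not its real signature. The numeric ranks mirror the table. Export, import, and string-xref share one medium rank, so this sketch assumes confidence breaks ties among them.

```python
from dataclasses import dataclass

# Ranks mirror the provenance table; equal-rank sources share a value.
RANK = {"agent": 1, "diff-carry": 2, "export": 3, "import": 3,
        "string-xref": 3, "oracle": 4, "human": 5}

@dataclass
class Symbol:
    name: str
    provenance: str
    confidence: float
    locked: bool = False

class KnowledgeBase:
    """Hypothetical stand-in; only the write-precedence logic is sketched."""

    def __init__(self):
        self.symbols: dict[str, Symbol] = {}

    def upsert_symbol(self, stable_id: str, new: Symbol) -> bool:
        old = self.symbols.get(stable_id)
        if old is None or new.provenance == "human":
            self.symbols[stable_id] = new  # empty slot, or a human write: always accepted
            return True
        if old.locked:
            return False  # locked symbols are immutable to every automated source
        # Resolve by rank, then confidence; ties keep the existing symbol.
        if (RANK[new.provenance], new.confidence) > (RANK[old.provenance], old.confidence):
            self.symbols[stable_id] = new
            return True
        return False

kb = KnowledgeBase()
kb.upsert_symbol("f1", Symbol("guess", "agent", 0.4))
kb.upsert_symbol("f1", Symbol("memcpy", "oracle", 0.95))
kb.upsert_symbol("f1", Symbol("better_guess", "agent", 0.99))  # rejected: outranked
print(kb.symbols["f1"].name)  # memcpy
```

Because `agent` is the lowest rank, the general rank-then-confidence comparison already implies the agent rule: an agent write can only displace lower-confidence agent output.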

When scripting against the library, always write symbols through upsert_symbol so the economy is enforced. Never insert into the symbols table directly: that bypasses these rules and can silently clobber a human’s verified name.

Putting it together

Ingest seeds for free

Exports, imports, and the name section give immediate partial coverage on load.

The Oracle collapses runtime

A large fraction of a real module becomes known libc/runtime code, with real names.

Agents fill the remainder

Human effort concentrates only on the application-specific, low-confidence functions.
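Export seeding needs nothing beyond the binary itself. The following minimal sketch reads function names from the export section (section id 7) using only the standard library. `exported_functions` is a hypothetical helper. For brevity it skips imports and the name section, which is a custom section named "name". Export indices live in the function index space, where imported functions come first.

```python
def read_uleb128(data: bytes, pos: int) -> tuple[int, int]:
    """Decode an unsigned LEB128 integer; return (value, next position)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte < 0x80:
            return result, pos
        shift += 7

def exported_functions(wasm: bytes) -> dict[int, str]:
    """Map function index -> export name, read from the export section (id 7)."""
    if wasm[:4] != b"\x00asm" or wasm[4:8] != b"\x01\x00\x00\x00":
        raise ValueError("not a version-1 WASM binary")
    names: dict[int, str] = {}
    pos = 8
    while pos < len(wasm):
        section_id = wasm[pos]
        size, pos = read_uleb128(wasm, pos + 1)
        end = pos + size
        if section_id == 7:
            count, p = read_uleb128(wasm, pos)
            for _ in range(count):
                name_len, p = read_uleb128(wasm, p)
                name = wasm[p:p + name_len].decode("utf-8")
                p += name_len
                kind = wasm[p]  # 0 = function, 1 = table, 2 = memory, 3 = global
                index, p = read_uleb128(wasm, p + 1)
                if kind == 0:
                    names[index] = name
        pos = end  # skip every other section, including custom ones like "name"
    return names

# A complete module exporting add: (i32, i32) -> i32.
ADD_MODULE = (b"\x00asm\x01\x00\x00\x00"
              b"\x01\x07\x01\x60\x02\x7f\x7f\x01\x7f"        # type section
              b"\x03\x02\x01\x00"                            # function section
              b"\x07\x07\x01\x03add\x00\x00"                 # export section
              b"\x0a\x09\x01\x07\x00\x20\x00\x20\x01\x6a\x0b")  # code section
print(exported_functions(ADD_MODULE))  # {0: 'add'}
```

In WARDEN, names recovered this way would then be written through upsert_symbol with `export` provenance rather than inserted directly.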

The payoff arrives on the next version: the diff carries everything forward and hands you only the handful of functions that genuinely changed.
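The following sketch shows the classification the diff performs on a new version. It assumes a hypothetical `Fn` record, a caller-supplied `fuzzy_score` (the combined similarity score would be a natural choice), a 0.8 match threshold, and a 0.8 confidence penalty for diff-carried names. Only the `changed` bucket needs human attention.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Fn:
    """Hypothetical minimal record: the two fingerprints the diff checks first."""
    stable_id: str
    exact: str  # SHA-256 of the raw body

DIFF_CARRY_PENALTY = 0.8  # hypothetical discount applied to fuzzy-carried names

def classify(old: list[Fn], new: list[Fn],
             fuzzy_score: Callable[[Fn, Fn], float],
             threshold: float = 0.8) -> dict[str, list]:
    old_exact = {f.exact for f in old}
    old_ids = {f.stable_id for f in old}
    report: dict[str, list] = {"unchanged": [], "carried": [], "fuzzy": [], "changed": []}
    for f in new:
        if f.exact in old_exact:
            report["unchanged"].append(f)  # verbatim body: nothing to review
        elif f.stable_id in old_ids:
            report["carried"].append(f)    # annotations already attached via stable_id
        else:
            scored = [(fuzzy_score(f, o), o) for o in old]
            best_score, best = max(scored, key=lambda t: t[0], default=(0.0, None))
            if best is not None and best_score >= threshold:
                # diff-carry: port the name, but at reduced confidence
                report["fuzzy"].append((f, best, best_score * DIFF_CARRY_PENALTY))
            else:
                report["changed"].append(f)  # genuinely new or changed: needs review
    return report

old = [Fn("a", "h1"), Fn("b", "h2"), Fn("c", "h3")]
new = [Fn("a", "h1"), Fn("b", "h2x"), Fn("z", "h9"), Fn("y", "h8")]
report = classify(old, new, lambda f, o: 0.9 if (f.stable_id, o.stable_id) == ("z", "c") else 0.1)
print([f.stable_id for f in report["changed"]])  # ['y']
```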

Next: the reverse-engineering loop

See these three ideas at work in the practical, command-by-command loop you run on a real module.
