1. Annotations attach to a stable identity, not an index
The hard problem in “keep it defined” is: what do you key annotations on when offsets and table indices shift on every build? WARDEN computes a content identity per function: a composite of its structural skeleton (control-flow and call shape with literals stripped), its call-neighborhood (which imports it calls), and its surviving type signature. The same logical function keeps the same stable_id across rebuilds even when its table index changes.
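As a rough illustration of why a content identity survives index shifts, here is a minimal sketch. The Func type, the call_import pseudo-opcode, and the exact way the three components are combined are assumptions made for this example, not WARDEN's actual implementation.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Func:
    index: int     # table index; expected to shift between builds
    type_sig: str  # surviving WASM type signature
    body: tuple    # (opcode, operand) pairs; operand may be None


def stable_id(fn: Func) -> str:
    """Hypothetical composite identity: skeleton + call-neighborhood + type."""
    # Structural skeleton: opcodes only, so literals and call targets drop out.
    skeleton = " ".join(op for op, _ in fn.body)
    # Call-neighborhood: the sorted set of imports this function calls.
    imports = ",".join(sorted({arg for op, arg in fn.body if op == "call_import"}))
    # The table index is deliberately NOT part of the hashed material.
    material = "\n".join([skeleton, imports, fn.type_sig])
    return hashlib.sha256(material.encode()).hexdigest()[:16]


# The same logical function in two builds: new index, new constant.
old = Func(12, "(i32)->i32", (
    ("local.get", 0), ("i32.const", 4096), ("i32.add", None),
    ("call_import", "env.malloc"), ("return", None),
))
new = Func(57, "(i32)->i32", (
    ("local.get", 0), ("i32.const", 8192), ("i32.add", None),
    ("call_import", "env.malloc"), ("return", None),
))
other = Func(3, "(i32)->i32", (("local.get", 0), ("return", None)))

same_across_builds = stable_id(old) == stable_id(new)
differs_from_other = stable_id(old) != stable_id(other)
```

Because neither the index nor the literal feeds the hash, the two builds produce the same key, while a structurally different function does not.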
Annotations are stored keyed on stable_id, not on a per-version row. That is the entire carry-over mechanism: when a new .wasm contains a function with a stable_id you’ve already named, the name is simply already there. Nothing to port.
Structural
A hash of the normalized control-flow/call skeleton (the backbone), with constants and call targets dropped.
Semantic
An opcode-class histogram, the call-out signature, and the type signature (free in WASM).
Fuzzy
A deterministic MinHash over normalized instruction n-grams, so near-matches across -O levels still score highly.
Exact
SHA-256 of the raw body, for verbatim “unchanged” detection.
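The four layers can be sketched roughly as follows. This is an illustration under stated assumptions: the BACKBONE opcode set, the n-gram size, the number of hash slots, and the use of the opcode prefix as its "class" are all invented for the example, and the call-out signature is omitted for brevity.

```python
import hashlib
from collections import Counter

# Hypothetical set of opcodes treated as the control-flow/call backbone.
BACKBONE = {"block", "loop", "if", "else", "end", "br", "br_if", "return", "call"}


def _h64(text: str) -> int:
    """Stable 64-bit hash (the built-in hash() is randomized per process)."""
    return int.from_bytes(hashlib.sha256(text.encode()).digest()[:8], "big")


def minhash(ops, n=3, num_hashes=64):
    """MinHash signature over opcode n-grams; fixed seeds keep it deterministic."""
    grams = {" ".join(ops[i:i + n]) for i in range(len(ops) - n + 1)} or {" ".join(ops)}
    return [min(_h64(f"{seed}:{g}") for g in grams) for seed in range(num_hashes)]


def jaccard_estimate(sig_a, sig_b):
    """The fraction of matching slots estimates the Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def fingerprint(raw_body: bytes, ops: list, type_sig: str) -> dict:
    """ops are normalized opcode names with operands already stripped."""
    backbone = [op for op in ops if op in BACKBONE]
    return {
        "exact": hashlib.sha256(raw_body).hexdigest(),
        "structural": hashlib.sha256(" ".join(backbone).encode()).hexdigest(),
        "semantic": (Counter(op.split(".")[0] for op in ops), type_sig),
        "fuzzy": minhash(ops),
    }


# Two made-up encodings of the same function at different optimization levels.
o0 = ["local.get", "local.get", "i32.add", "local.set", "local.get",
      "if", "call", "end", "local.get", "return"]
o2 = ["local.get", "local.get", "i32.add", "local.tee",
      "if", "call", "end", "local.get", "return"]

# The joined opcode text stands in for the raw function bytes here.
fp_o0 = fingerprint(" ".join(o0).encode(), o0, "(i32,i32)->i32")
fp_o2 = fingerprint(" ".join(o2).encode(), o2, "(i32,i32)->i32")
fuzzy_score = jaccard_estimate(fp_o0["fuzzy"], fp_o2["fuzzy"])
```

In this toy pair the exact hashes differ, the structural hashes agree because the backbone is unchanged, and the fuzzy score lands strictly between 0 and 1, which is the kind of graded signal the layers are meant to provide.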
2. One fingerprint engine powers both the Oracle and the diff
“Identify against compiled ground truth” and “match against the previous version” are the same operation pointed at different corpora. WARDEN has a single similarity engine that combines exact-body equality, structural-skeleton equality, fuzzy MinHash Jaccard, opcode-histogram cosine, and call-neighborhood overlap into one score.
- The Oracle runs that engine against a corpus of labeled runtime functions compiled from the open-source toolchain.
- The diff engine runs the very same engine against the previous version of the target to classify and carry annotations.
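One way to picture "one engine, two corpora" is the sketch below. The Fingerprint fields, the weights, and the exact-match shortcut are assumptions chosen for illustration; the source does not specify how the five signals are weighted.

```python
import math
from dataclasses import dataclass


@dataclass
class Fingerprint:
    exact: str          # hash of the raw body
    structural: str     # hash of the control-flow/call skeleton
    minhash: list       # MinHash signature
    opcode_hist: dict   # opcode class -> count
    callees: set        # call-neighborhood


# Hypothetical weights; they sum to 1.0 so the score stays in [0, 1].
WEIGHTS = {"structural": 0.35, "fuzzy": 0.35, "histogram": 0.15, "calls": 0.15}


def cosine(a: dict, b: dict) -> float:
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def jaccard(a: set, b: set) -> float:
    # Two empty neighborhoods are treated as identical (an assumption).
    return len(a & b) / len(a | b) if (a or b) else 1.0


def similarity(a: Fingerprint, b: Fingerprint) -> float:
    if a.exact == b.exact:  # verbatim match short-circuits everything else
        return 1.0
    signals = {
        "structural": 1.0 if a.structural == b.structural else 0.0,
        "fuzzy": sum(x == y for x, y in zip(a.minhash, b.minhash)) / len(a.minhash),
        "histogram": cosine(a.opcode_hist, b.opcode_hist),
        "calls": jaccard(a.callees, b.callees),
    }
    return sum(WEIGHTS[k] * v for k, v in signals.items())


def best_match(query: Fingerprint, corpus: dict):
    """Return (label, score) of the closest corpus entry."""
    return max(((name, similarity(query, fp)) for name, fp in corpus.items()),
               key=lambda pair: pair[1])


query = Fingerprint("e1", "s1", [1, 2, 3, 4], {"i32": 4, "local": 6}, {"env.malloc"})
oracle_corpus = {
    "memcpy": Fingerprint("e9", "s9", [9, 9, 9, 9], {"i32": 1, "memory": 5}, set()),
    "dlmalloc": Fingerprint("e2", "s1", [1, 2, 3, 7], {"i32": 4, "local": 5}, {"env.malloc"}),
}
previous_version = {"func_12": query}  # identical body carried over from the last build

oracle_hit = best_match(query, oracle_corpus)
diff_hit = best_match(query, previous_version)
```

The same best_match call serves both roles: only the corpus argument changes between identifying runtime code and carrying annotations across versions.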
3. Every write has a provenance and a confidence
This is what makes it safe to re-run the entire agent crew on every update without clobbering verified work. Each symbol records who wrote it and how sure they are, and the knowledge base enforces an economy at the write layer.

| Provenance | Rank | Meaning |
|---|---|---|
| human | highest | A person set or confirmed it. Sovereign; can be locked. |
| oracle | high | Matched to known upstream runtime/libc code. |
| export / import | medium | Recovered from the name section, exports, or imports. |
| string-xref | medium | Implied by a referenced string. |
| diff-carry | low | Ported from a fuzzy match on the previous version (penalized confidence). |
| agent | lowest | Proposed by the agent crew, with a calibrated confidence score. |

The write rules are enforced in KnowledgeBase.upsert_symbol:
A human write always wins
And a human can lock a symbol, making it immutable to every automated source.
An agent may only fill an empty slot or overwrite lower-confidence agent output
It can never clobber Oracle, human, or higher-confidence work.
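The write rules above could be sketched as follows. Only the name KnowledgeBase.upsert_symbol comes from the text; the parameters, the numeric ranks, the Symbol type, and the tie-breaking between non-agent automated sources (higher rank wins, then higher confidence) are assumptions for this illustration.

```python
from dataclasses import dataclass

# Numeric ranks mirroring the provenance table (the values are assumed).
RANK = {"agent": 0, "diff-carry": 1, "string-xref": 2,
        "export": 2, "import": 2, "oracle": 3, "human": 4}


@dataclass
class Symbol:
    name: str
    provenance: str
    confidence: float
    locked: bool = False


class KnowledgeBase:
    def __init__(self):
        self.symbols = {}  # stable_id -> Symbol

    def upsert_symbol(self, stable_id, name, provenance, confidence=1.0, lock=False):
        """Return True if the write was accepted, False if it was rejected."""
        current = self.symbols.get(stable_id)
        if provenance == "human":
            # A human write always wins and may lock the symbol.
            self.symbols[stable_id] = Symbol(name, provenance, confidence, lock)
            return True
        if current is not None:
            if current.locked:
                return False  # locked symbols are immutable to automated sources
            incoming = (RANK[provenance], confidence)
            existing = (RANK[current.provenance], current.confidence)
            if incoming <= existing:
                return False  # never clobber equal or better work
        self.symbols[stable_id] = Symbol(name, provenance, confidence)
        return True


kb = KnowledgeBase()
results = [
    kb.upsert_symbol("f1", "maybe_parse", "agent", 0.4),        # fills empty slot
    kb.upsert_symbol("f1", "parse_header", "agent", 0.7),       # beats weaker agent
    kb.upsert_symbol("f1", "guess", "agent", 0.5),              # rejected: weaker
    kb.upsert_symbol("f1", "dlmalloc", "oracle", 0.9),          # outranks agent
    kb.upsert_symbol("f1", "wild_guess", "agent", 0.99),        # rejected: outranked
    kb.upsert_symbol("f1", "my_alloc", "human", lock=True),     # human wins, locks
    kb.upsert_symbol("f1", "dlmalloc", "oracle", 1.0),          # rejected: locked
]
```

Because every rejection happens inside the write path, callers such as the agent crew can propose names freely and rely on the knowledge base to discard anything that would overwrite stronger evidence.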
Putting it together
Ingest seeds for free
Exports, imports, and the name section give immediate partial coverage on load.
The Oracle collapses runtime
A large fraction of a real module becomes known libc/runtime code, with real names.
Agents fill the remainder
Human effort concentrates only on the application-specific, low-confidence functions.