Oracle quality is bounded by corpus coverage
The Emscripten Oracle is only as good as the flag-matrix corpus you give it. The design calls for a CI farm that sweeps every tagged emsdk release across the full build matrix: -O0
through -Oz, -pthread on/off, LTO on/off, and exception modes. In practice, the corpus you
supply with warden oracle build is whatever labeled .wasm modules you have on hand. Gaps
in that matrix produce gaps in identification.
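To make "gaps in that matrix" concrete, here is a minimal sketch that enumerates the build matrix described above and reports which cells a corpus leaves uncovered. It assumes each corpus module carries a label like the hypothetical `BuildConfig` below. The exception-mode labels and the example emsdk tag are illustrative, and none of this is WARDEN's actual corpus format.

```python
from dataclasses import dataclass
from itertools import product

# Axes of the build matrix named in the text. The exception-mode labels are
# illustrative; any consistent labeling of the corpus would work.
OPT_LEVELS = ("-O0", "-O1", "-O2", "-O3", "-Os", "-Oz")
EXCEPTION_MODES = ("disabled", "-fexceptions", "-fwasm-exceptions")


@dataclass(frozen=True)
class BuildConfig:
    """Hypothetical label attached to each corpus module."""
    emsdk: str         # emsdk release tag, e.g. "3.1.50"
    opt: str           # one of OPT_LEVELS
    pthread: bool
    lto: bool
    exceptions: str    # one of EXCEPTION_MODES


def full_matrix(emsdk_versions):
    """Every cell a complete sweep would build for the given emsdk releases."""
    return {
        BuildConfig(version, opt, pthread, lto, exc)
        for version, opt, pthread, lto, exc in product(
            emsdk_versions, OPT_LEVELS, (False, True), (False, True), EXCEPTION_MODES
        )
    }


def coverage_gaps(corpus_configs, emsdk_versions):
    """Matrix cells with no labeled corpus module: where identification gets weaker."""
    return full_matrix(emsdk_versions) - set(corpus_configs)
```

With a single emsdk release the matrix already has 6 × 2 × 2 × 3 = 72 cells, so a corpus assembled from whatever modules are on hand can easily leave most of it uncovered.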
Concretely:
- A function compiled with -O3 -flto that does not appear in your corpus at those flags may
  fail to match even if the same function compiled at -O2 is present, because LTO inlining and
  cross-module constant folding can materially change the structural skeleton and MinHash
  signature.
- An Emscripten version you have not built corpus modules for falls back to fuzzy matching
  against the nearest version, which degrades precision on functions whose codegen changed
  between those releases.
- Custom toolchains (forks of Emscripten, emscripten-fastcomp artifacts, or modules compiled
  with a non-Emscripten WASM backend) are outside the Oracle’s scope entirely. Fuzzy
  fingerprinting (minhash_jaccard + histogram cosine) still runs, so partial matches are
  surfaced rather than nothing, but confidence scores will be low and version inference will
  be unreliable.
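The fuzzy fallback named in the last bullet combines a set-overlap measure with a frequency measure. Below is a minimal sketch of both, assuming functions are compared as lists of opcode mnemonics. `minhash_jaccard` here estimates Jaccard similarity over 3-instruction shingles, and `histogram_cosine` compares opcode counts. The shingle size, the 64 keyed BLAKE2b hash functions, and the equal weighting in `fuzzy_score` are assumptions; WARDEN's own implementation may choose differently.

```python
import hashlib
import math
from collections import Counter


def shingles(opcodes, k=3):
    """Set of overlapping k-grams of opcode mnemonics."""
    return {tuple(opcodes[i:i + k]) for i in range(len(opcodes) - k + 1)}


def minhash_signature(items, num_perm=64):
    """Minimum hash per seeded hash function; keyed BLAKE2b stands in for random permutations."""
    signature = []
    for seed in range(num_perm):
        key = seed.to_bytes(8, "little")
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(repr(item).encode(), digest_size=8, key=key).digest(), "little"
            )
            for item in items
        ))
    return signature


def minhash_jaccard(a_ops, b_ops, num_perm=64):
    """Share of agreeing signature slots estimates |A ∩ B| / |A ∪ B| over shingle sets."""
    a, b = shingles(a_ops), shingles(b_ops)
    if not a or not b:  # too short to shingle: fall back to exact set equality
        return 1.0 if a == b else 0.0
    sig_a, sig_b = minhash_signature(a, num_perm), minhash_signature(b, num_perm)
    return sum(x == y for x, y in zip(sig_a, sig_b)) / num_perm


def histogram_cosine(a_ops, b_ops):
    """Cosine similarity of opcode-count vectors; ignores instruction order entirely."""
    ha, hb = Counter(a_ops), Counter(b_ops)
    dot = sum(count * hb[op] for op, count in ha.items())
    norm = math.sqrt(sum(c * c for c in ha.values())) * math.sqrt(sum(c * c for c in hb.values()))
    return dot / norm if norm else 0.0


def fuzzy_score(a_ops, b_ops):
    """Equal-weight blend of the two measures (the weighting is an assumption)."""
    return 0.5 * minhash_jaccard(a_ops, b_ops) + 0.5 * histogram_cosine(a_ops, b_ops)
```

The two measures fail differently: shingles are sensitive to local instruction order, while the histogram ignores order entirely, so blending them is a reasonable hedge when codegen differs from anything in the corpus.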
Obfuscation, custom allocators, and whole-program LTO inlining
WARDEN’s function identity (stable_id) is keyed to structural skeleton, call-neighborhood,
type signature, and instruction count. All are derived from function boundaries that the
WebAssembly binary exposes explicitly. Techniques that blur those boundaries weaken matching
proportionally.
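As a rough illustration of why boundary-blurring hurts, the sketch below derives a hypothetical `stable_id` from the four inputs listed above by hashing a canonical JSON encoding of them. The field names, the sorted callee list, and truncating SHA-256 to 16 hex characters are assumptions, not WARDEN's actual scheme.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class FunctionFacts:
    """Hypothetical per-function inputs; every one presupposes known function boundaries."""
    skeleton: str                # structural outline, e.g. "block(loop(br_if))"
    callees: tuple               # call neighborhood: identifiers of called functions
    type_signature: str          # e.g. "(i32, i32) -> i32"
    instruction_count: int


def stable_id(facts: FunctionFacts) -> str:
    """Digest a canonical encoding so that callee ordering cannot perturb the id."""
    canonical = json.dumps(
        {
            "skeleton": facts.skeleton,
            "callees": sorted(set(facts.callees)),
            "type": facts.type_signature,
            "insns": facts.instruction_count,
        },
        sort_keys=True,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
```

In this sketch the id is an exact digest, so a helper inlined by LTO (which changes both the skeleton and the instruction count) produces a different id outright; graded similarity has to come from the fuzzy measures instead.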
Whole-program LTO inlining. An upstream library function aggressively inlined into its
callers no longer appears as a discrete function in the binary. The Oracle cannot match what is
not there; the diff engine cannot carry over an annotation for a function that was absorbed into
several others.
Custom allocators. A module that replaces dlmalloc/emmalloc with a bespoke heap
implementation may look nothing like the Oracle’s memory-management signatures. Manual
annotation with warden set-name is the fallback.
Deliberate obfuscation. Opaque predicates, control-flow flattening, and instruction
substitution degrade structural-hash and MinHash matching; on maximally scrambled code, their
results approach chance. The exact hash (SHA-256 of the raw body) remains accurate for
“unchanged” detection even under obfuscation, but semantic identification fails.
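The split between exact-hash and semantic matching can be sketched at the byte level. Below, a hypothetical `unchanged_bodies` helper pairs functions across two releases purely by SHA-256 of the raw body, so a byte-identical function is found even if its index moved and no matter how scrambled its contents are. The flip side is that changing a single immediate yields an unrelated digest: the hash is an equality test, not a similarity score.

```python
import hashlib


def exact_hash(body: bytes) -> str:
    """SHA-256 of the raw function body bytes."""
    return hashlib.sha256(body).hexdigest()


def unchanged_bodies(old_release: dict, new_release: dict) -> dict:
    """Map old function index -> new index for byte-identical bodies, even if reordered.

    If a body occurs more than once in the new release, the last index wins; a real
    diff engine would need an explicit tie-break rule.
    """
    new_by_hash = {exact_hash(body): idx for idx, body in new_release.items()}
    matches = {}
    for old_idx, body in old_release.items():
        digest = exact_hash(body)
        if digest in new_by_hash:
            matches[old_idx] = new_by_hash[digest]
    return matches
```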
There is no principled workaround for heavy obfuscation short of manual RE. WARDEN reduces the
surface area of unknown code; it does not eliminate it.
“Behavioral equivalence” is corpus-bounded evidence, not formal proof
WARDEN uses the phrase “100% behavioral equivalence” to mean: differential execution against the original .wasm produced matching outputs and side effects on the inputs tested. This is
strong evidence, not a mathematical guarantee.
- warden verify currently provides determinism checking (does re-parsing the binary produce
  the same KB?) and a readiness report for the wasm2c differential harness. The harness itself
  activates only when wasm2c or w2c2 and a C toolchain are present at runtime.
- Input coverage is finite. A differential test corpus that does not exercise a code path
  cannot confirm or deny equivalence on that path.
- Formal verification via symbolic execution or deductive proof is not implemented. SeeWasm
  integration is listed in the vision as a future capability for specific path conditions,
  not a shipped feature.
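Differential execution, as used above, amounts to running the original and the candidate on the same inputs and comparing results and observable side effects. In this minimal sketch both sides are ordinary Python callables returning a hypothetical `Outcome`; the wasm2c harness itself is not modeled. The docstring states the limit the list describes: an empty mismatch list speaks only for the inputs that were tried.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable


@dataclass(frozen=True)
class Outcome:
    result: Any
    effects: tuple  # observable side effects, e.g. ("write", fd, data) records


def differential_check(
    original: Callable[[Any], Outcome],
    candidate: Callable[[Any], Outcome],
    inputs: Iterable[Any],
) -> list:
    """Return (input, original_outcome, candidate_outcome) for every disagreement.

    An empty list is evidence of equivalence on these inputs only; code paths that
    no input reaches are neither confirmed nor refuted.
    """
    mismatches = []
    for x in inputs:
        a, b = original(x), candidate(x)
        if a != b:
            mismatches.append((x, a, b))
    return mismatches
```

For example, a candidate that differs from the original only at n = -2**31 passes every input in range(-100, 100) and is caught only when that specific value is in the input corpus.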
When WARDEN records confidence=1.0 for an Oracle-matched function, that reflects the
identity match against a labeled ground-truth corpus, not a differential execution result. The
two are complementary; neither subsumes the other.
Heavy refactors legitimately produce large review queues
WARDEN’s diff engine (warden diff) classifies each function as identical, moved,
modified, new, or deleted using a composite similarity scorer (exact hash, structural
hash, fuzzy MinHash Jaccard, histogram cosine, call-neighborhood overlap). It carries
annotations forward automatically for unchanged and moved functions, and flags modified
functions for review.
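A rough sketch of how a composite scorer could sort functions into those five classes. It assumes the per-signal similarities are already computed as numbers in [0, 1], treats "moved" as a byte-identical body at a different index, and uses made-up weights and a made-up 0.6 threshold; none of these values come from WARDEN.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set

# Hypothetical weights for the fuzzy signals named in the text; they sum to 1.0.
WEIGHTS = {"structural": 0.35, "minhash": 0.25, "histogram": 0.15, "neighborhood": 0.25}
MATCH_THRESHOLD = 0.6  # hypothetical: below this, a pair is not treated as the same function


@dataclass
class Candidate:
    """Best-match pairing of an old function with a new one."""
    old_index: int
    new_index: int
    exact_equal: bool  # SHA-256 of both bodies matches
    signals: Dict[str, float] = field(default_factory=dict)  # each in [0, 1]


def composite(signals: Dict[str, float]) -> float:
    return sum(weight * signals.get(name, 0.0) for name, weight in WEIGHTS.items())


def classify_pair(c: Candidate) -> Optional[str]:
    if c.exact_equal:
        return "identical" if c.old_index == c.new_index else "moved"
    return "modified" if composite(c.signals) >= MATCH_THRESHOLD else None


def classify_release(old_indices: Set[int], new_indices: Set[int], best_pairs) -> dict:
    """Label every function; unmatched old ones are deleted, unmatched new ones are new."""
    labels, matched_old, matched_new = {}, set(), set()
    for c in best_pairs:
        label = classify_pair(c)
        if label is None:
            continue
        labels[("old", c.old_index)] = label
        matched_old.add(c.old_index)
        matched_new.add(c.new_index)
    for i in old_indices - matched_old:
        labels[("old", i)] = "deleted"
    for j in new_indices - matched_new:
        labels[("new", j)] = "new"
    return labels


CARRY_FORWARD = {"identical", "moved"}  # annotations copied automatically
NEEDS_REVIEW = {"modified"}             # surfaced in the review queue
```

In this sketch NEEDS_REVIEW holds only "modified", mirroring the text; a pair whose composite score falls below the threshold is split into one deleted and one new function rather than flagged as modified.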
On a release where the vendor did a large refactor (split a module, merged hot paths, changed
the calling convention of a widely-used helper), the diff engine will correctly surface a large
fraction of functions for review, because they genuinely changed. WARDEN reduces that queue by
filtering out toolchain churn (Emscripten version bumps auto-attributed by the Oracle) and by
carrying forward the large unchanged majority.
It does not reduce the queue to zero when the application itself changed significantly.
Expect a non-trivial review list for major version jumps; that list is accurate, not inflated.
The offline agent backend is a heuristic placeholder
Without pip install -e '.[agents]' and either OPENAI_API_KEY or ANTHROPIC_API_KEY, the
agent crew runs the OfflineHeuristicBackend. Its logic is deliberate and documented:
String cross-reference
If the function references strings, derive a name from the first one. Confidence ceiling: 0.45.
Call neighborhood
If the function has identifiable import call targets, name it after the first. Confidence ceiling: 0.30.
SIMD immediate decoding covers documented ranges; unknown opcodes degrade gracefully
The opcode decoder handles the WASM MVP plus the extensions an Emscripten module is expected to emit: sign-extension, non-trapping float-to-int (0xFC prefix), bulk memory, reference types,
tail calls, threads/atomics (0xFE prefix), and SIMD (0xFD prefix). SIMD sub-opcode
immediates are decoded by documented ranges; not every sub-opcode in the 0xFD
space has a fully characterized immediate layout in the current implementation.
When an unknown or undocumented sub-opcode is encountered, an UnsupportedOpcode exception is
raised and caught at the function level. The function is marked with a disasm_error note in
its fingerprint and the error surfaces in warden show. The function is still
byte-fingerprintable via its exact SHA-256 hash and structural skeleton up to the unknown opcode,
so it participates in “unchanged” detection in the diff engine, but instruction-level
analysis (opcode histogram, MinHash, call extraction) is incomplete for that function. Oracle
and diff matching degrade to exact-hash and partial structural comparison for affected functions.
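The failure mode above can be sketched with a toy decoder. It knows only a handful of 0xFD sub-opcodes (the table is a deliberately tiny subset), reads the sub-opcode as an unsigned LEB128 integer as the WebAssembly binary format specifies, and raises a stand-in `UnsupportedOpcode` class (not WARDEN's own) for anything else. `fingerprint_function` hashes the body first and catches the exception per function, so the exact hash and the mnemonics decoded before the failure survive alongside a disasm_error note. For simplicity the body is assumed to contain only SIMD instructions, which real function bodies never do.

```python
import hashlib
from dataclasses import dataclass, field


class UnsupportedOpcode(Exception):
    """Raised when a sub-opcode's immediate layout is not known to the decoder."""

    def __init__(self, prefix: int, sub_opcode: int, offset: int):
        super().__init__(f"unsupported opcode 0x{prefix:02X} 0x{sub_opcode:X} at byte {offset}")
        self.prefix, self.sub_opcode, self.offset = prefix, sub_opcode, offset


def read_uleb128(buf: bytes, pos: int):
    """Decode an unsigned LEB128 integer; return (value, next position)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


# Deliberately tiny subset of the 0xFD space: sub-opcode -> (mnemonic, immediate kind).
SIMD_TABLE = {
    0x00: ("v128.load", "memarg"),
    0x0C: ("v128.const", "bytes16"),
    0x0D: ("i8x16.shuffle", "bytes16"),
    0x0F: ("i8x16.splat", "none"),
    0x15: ("i8x16.extract_lane_s", "lane"),
}


def decode_simd(body: bytes, pos: int):
    """Decode one 0xFD-prefixed instruction at body[pos]; return (mnemonic, next position)."""
    start = pos
    if body[pos] != 0xFD:
        raise ValueError("this sketch only decodes 0xFD-prefixed instructions")
    sub, pos = read_uleb128(body, pos + 1)  # the sub-opcode is LEB128, not a single byte
    if sub not in SIMD_TABLE:
        raise UnsupportedOpcode(0xFD, sub, start)
    name, kind = SIMD_TABLE[sub]
    if kind == "memarg":      # alignment, then offset
        _, pos = read_uleb128(body, pos)
        _, pos = read_uleb128(body, pos)
    elif kind == "bytes16":   # 16 immediate bytes
        pos += 16
    elif kind == "lane":      # one lane-index byte
        pos += 1
    return name, pos


@dataclass
class Fingerprint:
    index: int
    sha256: str
    mnemonics: list = field(default_factory=list)
    notes: list = field(default_factory=list)


def fingerprint_function(index: int, body: bytes) -> Fingerprint:
    """Hash first, then disassemble; an unknown opcode becomes a note, not a crash."""
    fp = Fingerprint(index, hashlib.sha256(body).hexdigest())
    pos = 0
    try:
        while pos < len(body):
            name, pos = decode_simd(body, pos)
            fp.mnemonics.append(name)
    except UnsupportedOpcode as err:
        fp.notes.append(f"disasm_error: {err}")
    return fp
```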
This is a bounded, diagnosed failure mode rather than a silent corruption. If you encounter a
disasm_error note, the opcode byte and affected function index give you everything needed to
file a targeted issue.
Summary
| Limit | Scope | Workaround |
|---|---|---|
| Oracle hit rate depends on corpus coverage | Oracle identification | Build corpus for target toolchain and opt flags |
| LTO inlining hides function boundaries | Oracle + diff matching | Manual warden set-name for inlined survivors |
| Custom allocators diverge from known signatures | Oracle identification | Manual annotation; add custom allocator to corpus |
| Heavy obfuscation defeats structural matching | Oracle + diff + agent | Manual RE; exact hash still detects “unchanged” |
| Behavioral equivalence is corpus-bounded evidence | Verification | Add more input coverage; use differential harness when available |
| Large refactors produce large review queues | Diff carry-over | Expected: the queue is accurate, not inflated |
| Offline backend produces low-confidence names | Agent crew | Install [agents] extra and set OPENAI_API_KEY or ANTHROPIC_API_KEY |
| Unrecognized SIMD sub-opcodes skip disassembly | Ingest + fingerprint | Exact hash still works; file an issue with the opcode byte |