Cross-version diff & carry-over

When the vendor ships a new .wasm, most of your reverse-engineering work is still valid. The same functions exist, doing the same things, at shifted table indices. The diff engine recovers that work automatically, classifies every function in the new binary, ports annotations forward, and hands you a focused changelog of what actually changed in the application. This is the diff and carry-over stage of the WARDEN loop: diff/engine.py, driven by warden diff.

The diff engine reuses the same fingerprint and similarity engine as the Oracle: same hash compositions, same similarity() function, different corpus. Any improvement to the fingerprinting algorithm benefits both. See core concepts for the full fingerprint breakdown.

The fuzzy similarity score is sharper than a plain MinHash comparison. It blends the MinHash term with the call-neighborhood, opcode-histogram, type-signature, and instruction-count terms, and exposes every sub-score so you can see exactly why a pair scored the way it did. This reduces false modified and new classifications. Stable identity and determinism are unchanged: the same input still yields the same stable_id and the same fingerprint, and similarity() remains a pure function of its two fingerprints.

How it works

diff_versions() loads all defined functions for both versions from the KB, reconstructs their fingerprints from the stored rows, and runs three passes in sequence. Each pass consumes functions from a shared “unmatched” set so a function is never counted twice.

Pass 1: exact-body match

Functions with an identical exact_hash (SHA-256 of the raw body bytes) are matched first. This is O(n) via a dictionary lookup. No similarity math is needed.

Same index in both versions → classified unchanged.
Different index → classified moved.

Score is 1.0 for both. These functions already share a stable_id row in the symbols table, so their annotations are already there. There is nothing to port.
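A minimal Python sketch of Pass 1, assuming each function is reduced to a hypothetical Func record with index, exact_hash, and stable_id fields (the engine's real row types are not shown on this page). Bucketing the older version by hash once and doing one lookup per newer function keeps the pass O(n):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Func:
    # Hypothetical minimal record; real KB rows carry more fields.
    index: int
    exact_hash: str
    stable_id: str


def exact_body_pass(old_funcs, new_funcs):
    """Pass 1 sketch: pair functions whose exact_hash is identical.

    Returns (changes, unmatched_old, unmatched_new), where each change is a
    (classification, old_index, new_index, score) tuple.
    """
    # Bucket older functions by body hash: one dict build, then O(1) lookups.
    by_hash = {}
    for f in old_funcs:
        by_hash.setdefault(f.exact_hash, []).append(f)
    changes, unmatched_new = [], []
    for g in new_funcs:
        bucket = by_hash.get(g.exact_hash)
        if not bucket:
            unmatched_new.append(g)  # left for Pass 2
            continue
        f = bucket.pop(0)  # duplicate bodies: first come, first paired
        cls = "unchanged" if f.index == g.index else "moved"
        changes.append((cls, f.index, g.index, 1.0))
    unmatched_old = [f for bucket in by_hash.values() for f in bucket]
    return changes, unmatched_old, unmatched_new
```

Ties between several identical bodies (for example, duplicate stubs) are broken here by order of appearance; how the real engine breaks such ties is not specified above.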

Pass 2: stable identity match

Functions that share a stable_id but whose raw body differs slightly (for example, a literal constant was patched in a way that structural_hash absorbs) are matched by identity key lookup. Because the KB’s symbols table is keyed on stable_id (not on a version or function index), these functions also already share an annotation row.

Classified unchanged or moved by the same index-comparison rule.
Score is 0.99 to distinguish it from a literal exact-body match.

Both Pass 1 and Pass 2 carry annotations verbatim and for free: the functions literally point at the same symbol row.
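Pass 2 can be sketched the same way, keyed on stable_id instead of exact_hash. This assumes functions are plain dicts with hypothetical index and stable_id keys, and that a stable_id occurs at most once among each version's leftovers:

```python
def stable_identity_pass(old_left, new_left):
    """Pass 2 sketch: pair Pass 1 leftovers that share a stable_id.

    Functions are plain dicts with hypothetical "index" and "stable_id" keys.
    Returns (changes, still_unmatched_old, still_unmatched_new).
    """
    # Assumes each stable_id occurs at most once among one version's leftovers.
    by_id = {f["stable_id"]: f for f in old_left}
    changes, rest_new = [], []
    for g in new_left:
        f = by_id.pop(g["stable_id"], None)
        if f is None:
            rest_new.append(g)  # goes on to the fuzzy pass
            continue
        cls = "unchanged" if f["index"] == g["index"] else "moved"
        # 0.99 keeps identity matches distinguishable from an exact-body 1.0.
        changes.append((cls, f["index"], g["index"], 0.99))
    return changes, list(by_id.values()), rest_new
```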

Pass 3: greedy fuzzy match

Remaining functions (those that changed meaningfully enough to get a new stable_id) are paired greedily: each is matched with the remaining candidate from the older version that gives the highest similarity().overall score. The overall score is a sharper blend than a plain MinHash comparison. A single fuzzy term on its own pairs functions that share opcode shape but do different work, which inflates the modified count and hides genuine deltas. The blend therefore combines the MinHash term with four corroborating signals: call-neighborhood overlap, opcode-histogram cosine, a type-signature match, and an instruction-count closeness term. Two functions that look alike only at the opcode level no longer clear the threshold unless their call targets, type signatures, and sizes line up too. The structural-skeleton hash still contributes a small bonus when it matches exactly.

overall = 0.40 × fuzzy_jaccard          # MinHash over 4-gram opcode tokens
        + 0.16 × histogram_cosine        # opcode-class distribution
        + 0.10 × call_neighborhood_jaccard  # shared import call targets
        + 0.16 × type_signature_match    # 1 if type signatures are equal else 0
        + 0.10 × instruction_count_ratio # min/max of the two instruction counts
        + 0.08 × structural_match        # 1 if structural_hash matches else 0
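The blend translates directly into code. This sketch assumes a hypothetical fingerprint dict with ngrams, histogram, calls, type_sig, insn_count, and structural_hash fields, and uses exact set Jaccard as a stand-in for the MinHash estimate the engine uses:

```python
import math

# Weights from the formula above; they sum to 1.0.
WEIGHTS = {
    "fuzzy": 0.40,         # stand-in for the MinHash 4-gram Jaccard estimate
    "histogram": 0.16,     # cosine of opcode-class histograms
    "call_overlap": 0.10,  # Jaccard of import call targets
    "type_match": 0.16,    # 1.0 if type signatures are equal
    "count_ratio": 0.10,   # min/max of instruction counts
    "structural": 0.08,    # 1.0 if structural_hash matches
}


def _jaccard(a, b):
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def _cosine(h1, h2):
    dot = sum(v * h2.get(k, 0) for k, v in h1.items())
    n1 = math.sqrt(sum(v * v for v in h1.values()))
    n2 = math.sqrt(sum(v * v for v in h2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0


def sub_scores(a, b):
    """Compute the six terms for two hypothetical fingerprint dicts."""
    big = max(a["insn_count"], b["insn_count"])
    small = min(a["insn_count"], b["insn_count"])
    return {
        # Exact set Jaccard here; the engine estimates it with MinHash.
        "fuzzy": _jaccard(a["ngrams"], b["ngrams"]),
        "histogram": _cosine(a["histogram"], b["histogram"]),
        "call_overlap": _jaccard(a["calls"], b["calls"]),
        "type_match": 1.0 if a["type_sig"] == b["type_sig"] else 0.0,
        "count_ratio": small / big if big else 1.0,
        "structural": 1.0 if a["structural_hash"] == b["structural_hash"] else 0.0,
    }


def overall(scores):
    """Weighted blend of the six terms."""
    return sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)
```

With illustrative fingerprints that share 3 of 5 opcode 4-grams (fuzzy = 0.6) and have instruction counts of 40 and 50 (count ratio = 0.8), but agree on every other term, overall works out to 0.24 + 0.16 + 0.10 + 0.16 + 0.08 + 0.08 = 0.82.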

The weights sum to 1.0, so overall stays in [0, 1]. Each term is symmetric in its two fingerprints, so similarity(a, b) equals similarity(b, a). The returned MatchScore exposes each term as a sub-score (fuzzy, histogram, call_overlap, type_match, count_ratio, and the boolean structural), plus a sub_scores dict that carries all six as numbers, so you can read off exactly which signals carried a pairing and which dragged it down. An exact-body match still short-circuits to 1.0.

A pair is accepted if the best score is at or above MODIFIED_THRESHOLD = 0.6 and the two functions share a structural skeleton (structural is true). A cosmetic edit (a bounds check, an extra arithmetic op) keeps the control-flow and call skeleton, so this structural floor rejects unrelated functions that scored highly only on the coarse shape terms, without losing genuine modifications. Accepted pairs are classified modified. Functions left unmatched after all three passes become new (present only in the newer version) or deleted (present only in the older version). Because each accepted pair is corroborated by more signals, lookalikes no longer steal a match slot from the function that genuinely evolved.
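One way to read "greedy" is sketched below: score every remaining old/new pair, keep the pairs that clear MODIFIED_THRESHOLD with a structural match, and accept them in descending score order so each function is used at most once. The score callable and the tie-breaking rule are assumptions; the engine may iterate differently:

```python
MODIFIED_THRESHOLD = 0.6


def greedy_fuzzy_pass(old_ids, new_ids, score):
    """Greedy Pass 3 sketch over the functions left after Passes 1 and 2.

    `score(o, n)` is a hypothetical callable returning (overall, structural).
    Returns (modified, new, deleted).
    """
    candidates = []
    for o in old_ids:
        for n in new_ids:
            overall, structural = score(o, n)
            # Acceptance rule from the text: threshold AND structural match.
            if overall >= MODIFIED_THRESHOLD and structural:
                candidates.append((overall, o, n))
    # Best score first; ids break ties so the result is deterministic.
    candidates.sort(key=lambda c: (-c[0], c[1], c[2]))
    used_old, used_new, modified = set(), set(), []
    for overall, o, n in candidates:
        if o in used_old or n in used_new:
            continue  # one side already claimed by a better-scoring pair
        used_old.add(o)
        used_new.add(n)
        modified.append((o, n, overall))
    new = [n for n in new_ids if n not in used_new]
    deleted = [o for o in old_ids if o not in used_old]
    return modified, new, deleted
```

Scoring every pair is O(n·m) in the number of leftovers, which stays manageable when Passes 1 and 2 have already consumed most functions. In a toy run where A scores 0.9 against X and B scores 0.8 against X and 0.65 against Y, A claims X first and B falls back to Y; a 0.95 pair whose structural check fails is still rejected.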

The 0.6 threshold is deliberately lenient. A modified function that keeps its general call pattern and opcode character but gained a bounds check or an extra branch will typically score 0.65–0.85. A higher threshold risks losing carry-over for legitimately modified functions; a lower one creates false pairings. The value is defined as MODIFIED_THRESHOLD in diff/engine.py and can be overridden if you are working with a heavily optimized corpus.

The sharper blend keeps the threshold meaningful. A real modification still clears 0.6 because its call neighborhood, type signature, and size travel with it, while a coincidental opcode lookalike that could once scrape past on the fuzzy term alone now falls short on the corroborating terms.
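To see why the corroborating terms matter at the 0.6 threshold, here is the blend applied to two hypothetical sub-score sets (the numbers are illustrative, not measured): an opcode lookalike with high fuzzy and histogram scores but no shared calls or signature, and a genuine modification:

```python
WEIGHTS = {"fuzzy": 0.40, "histogram": 0.16, "call_overlap": 0.10,
           "type_match": 0.16, "count_ratio": 0.10, "structural": 0.08}
MODIFIED_THRESHOLD = 0.6


def blend(scores):
    """Weighted sum of the six sub-scores, as in the overall formula."""
    return sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)


# Illustrative sub-scores only.
lookalike = {"fuzzy": 0.9, "histogram": 0.8, "call_overlap": 0.0,
             "type_match": 0.0, "count_ratio": 0.5, "structural": 0.0}
genuine = {"fuzzy": 0.7, "histogram": 0.9, "call_overlap": 0.8,
           "type_match": 1.0, "count_ratio": 0.9, "structural": 1.0}

for label, s in (("lookalike", lookalike), ("genuine", genuine)):
    score = blend(s)
    print(f"{label}: {score:.3f} accepted={score >= MODIFIED_THRESHOLD}")
```

The lookalike lands at 0.538 despite a 0.9 fuzzy score (and would also fail the structural requirement), while the genuine modification scores 0.834, inside the typical 0.65–0.85 band.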

Classification summary

| Class | Meaning | Index change | Annotation ported? |
|---|---|---|---|
| `unchanged` | Identical body (`exact_hash` match) or shared `stable_id`, same index | No | Already shared |
| `moved` | Identical body or shared `stable_id`, different index | Yes | Already shared |
| `modified` | Fuzzy match at or above threshold; body changed meaningfully | Maybe | Copied with penalty |
| `new` | No match found in the older version | N/A | None; queued for analysis |
| `deleted` | No match found in the newer version | N/A | Archived |

Annotation carry-over: identical vs. fuzzy

Unchanged and moved: zero work

Functions classified unchanged or moved share the same stable_id between both versions. Because the symbols table is keyed to stable_id, not to a version row or function index, these functions already point at the same symbol. There is nothing to copy. The name, type signature, summary, provenance, and confidence from your v1 work are immediately visible in v2 with zero intervention.

Modified: copied with a confidence penalty

When a fuzzy match is accepted, _carry_symbol() runs:

1. Look up the older function’s symbol by its stable_id.
2. Check whether the newer function’s stable_id already has a symbol. If it does, leave it alone: a pre-existing annotation from a higher-authority source takes precedence.
3. Write a new symbol for the newer stable_id with:
   - The same name, type signature, and summary as the source.
   - provenance = "diff-carry" (rank 40 in the provenance economy, below oracle (90) but above agent (30)).
   - confidence = old.confidence × CARRY_PENALTY, where CARRY_PENALTY = 0.7.

A function named parseToken with confidence 0.92 after the Oracle and human review will arrive in v2 as parseToken with confidence 0.644 and provenance diff-carry. The penalty signals “probably still right, worth a second look.” Agents will not overwrite this (their rank is lower); Oracle re-identification can upgrade it if the function still hits a corpus signature. The evidence field records the carry trail:

{"kind": "carry-over", "detail": "from a3f1b2c4d5e6 score=0.78"}
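The carry step can be sketched as follows. The Symbol dataclass and the plain dict standing in for the symbols table are hypothetical, as is truncating the source stable_id to 12 characters in the evidence detail (inferred from the example above):

```python
from dataclasses import dataclass

CARRY_PENALTY = 0.7


@dataclass(frozen=True)
class Symbol:
    # Hypothetical symbol row; fields mirror the prose, not the real schema.
    name: str
    type_sig: str
    summary: str
    provenance: str
    confidence: float
    evidence: tuple = ()


def carry_symbol(symbols, old_stable, new_stable, score):
    """Copy the old symbol onto new_stable with a confidence penalty.

    `symbols` is a dict keyed by stable_id standing in for the symbols table.
    Returns the written Symbol, or None if nothing was written.
    """
    old = symbols.get(old_stable)
    if old is None:
        return None  # the older function was never annotated: nothing to carry
    if new_stable in symbols:
        return None  # a pre-existing annotation takes precedence; leave it alone
    carried = Symbol(
        name=old.name,
        type_sig=old.type_sig,
        summary=old.summary,
        provenance="diff-carry",
        confidence=old.confidence * CARRY_PENALTY,
        # Truncating to 12 characters is an assumption based on the example.
        evidence=({"kind": "carry-over",
                   "detail": f"from {old_stable[:12]} score={score:.2f}"},),
    )
    symbols[new_stable] = carried
    return carried
```

Carrying the 0.92-confidence parseToken example yields confidence 0.92 × 0.7 = 0.644 with provenance diff-carry; a second call for the same target returns None because the symbol now exists.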

diff_versions() carries only when carry=True (the default). Pass --no-carry to warden diff to produce a classification report without touching the symbols table. This is useful for a dry-run assessment of what changed.

The semantic changelog

After classification, render_changelog() produces a human-readable report that does two things ordinary binary diff tools cannot: it counts only app-code changes and explains the rest as runtime/toolchain churn. A function is tagged as runtime churn if its name (from the current or previous version) starts with any of a list of known prefixes:

emscripten_  __em_  wasi_  dlmalloc  memcpy  memset  malloc  free
__cxa_  pthread_  stackSave  stackRestore  __wasm_call_ctors  ...

The changelog separates the two buckets so a 300-function change caused by an Emscripten version bump does not bury the 6 genuine application changes you actually need to review.
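The prefix rule can be sketched as a pair of small functions. RUNTIME_PREFIXES copies only the prefixes listed above (the real list continues past the "..."), and the change dicts with name and prev_name keys are a hypothetical shape:

```python
# Only the prefixes shown above; the engine's list is longer.
RUNTIME_PREFIXES = (
    "emscripten_", "__em_", "wasi_", "dlmalloc", "memcpy", "memset",
    "malloc", "free", "__cxa_", "pthread_", "stackSave", "stackRestore",
    "__wasm_call_ctors",
)


def is_runtime_churn(current_name, previous_name=None):
    """True if either version's name starts with a known runtime prefix."""
    return any(
        name is not None and name.startswith(RUNTIME_PREFIXES)
        for name in (current_name, previous_name)
    )


def split_modified(changes):
    """Split modified changes into (app, runtime) buckets for the changelog.

    `changes` is a list of dicts with hypothetical "name"/"prev_name" keys.
    """
    app, runtime = [], []
    for c in changes:
        churn = is_runtime_churn(c.get("name"), c.get("prev_name"))
        (runtime if churn else app).append(c)
    return app, runtime
```

Plain prefix matching is coarse: in this sketch the free prefix would also tag an application function named, say, freelist_push as runtime churn.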

Sample changelog

# WARDEN changelog: v1 -> v2

- unchanged: 241
- moved:      18
- modified:   47  (6 app, 41 runtime/toolchain churn)
- new:         3
- deleted:     1
- annotations carried forward: 5

## Needs review (genuine app deltas)

  [MODIFIED] parseToken (score 0.78)
  [MODIFIED] verify_license (score 0.81)
  [MODIFIED] crypto_init (score 0.71)
  [MODIFIED] handle_request (score 0.67)
  [MODIFIED] dispatch_message (score 0.74)
  [MODIFIED] build_response (score 0.69)
  [NEW] verifyLicense_v2
  [NEW] audit_log_append
  [NEW] rate_limit_check

The “41 runtime/toolchain churn” count covers modified functions whose names matched Emscripten or musl runtime prefixes (for example, after an Emscripten 3.1.55→3.1.61 upgrade). These are expected side effects of the toolchain change and require no human attention.

The `warden diff` command

warden diff <from-label> <to-label> [--no-carry] [--db <path>]

Run this after ingesting both versions. The result is stored in the diffs table as a JSON DiffReport and printed as the semantic changelog.

# Ingest the new version (existing version already in the KB)
warden ingest app_v2.wasm --label v2

# Diff, carry annotations forward, and print the changelog
warden diff v1 v2

# Classification report only, no annotation writes
warden diff v1 v2 --no-carry

After warden diff completes, you can inspect carry-over results directly:

# See which v2 functions have diff-carry provenance
warden funcs v2

# Inspect a specific function
warden show v2 <index>

Check coverage immediately after diffing. For a typical update where only a few functions changed, v2 coverage will land close to wherever v1 left off, with zero manual work.

warden coverage v2

Full pipeline: v2 in practice

Ingest the new version

warden ingest app_v2.wasm --glue app_v2.js --label v2

Seeds any names the new binary exposes via its exports, imports, or name section.

Diff and carry

warden diff v1 v2

Runs all three passes, writes carried annotations, prints the semantic changelog.

Review only the app deltas

The changelog’s “Needs review” section lists the functions that actually changed in application code. Inspect each:

warden show v2 <index>

If the carried name is still correct, lock it:

warden set-name v2 <index> <name>

Re-run the Oracle and agents on new functions

New functions have no annotation yet, and that includes heavily modified ones that fell below MODIFIED_THRESHOLD and were classified new. The Oracle may identify runtime additions from a toolchain bump; agents cover the rest.

warden oracle identify v2 --store oracle.json
warden agent v2

Export a deliverable

warden export v2 --format pseudo
warden export v2 --format ghidra --out v2_rename.py

What the `DiffReport` contains

The full report is stored in the diffs table as JSON (the result of DiffReport.as_dict()) and contains every Change record:

{
  "from": "v1",
  "to": "v2",
  "summary": {
    "unchanged": 241,
    "moved": 18,
    "modified": 47,
    "new": 3,
    "deleted": 1,
    "app_modified": 6,
    "runtime_churn": 41,
    "carried_symbols": 5
  },
  "changes": [
    {
      "classification": "modified",
      "from_index": 112,
      "to_index": 114,
      "name": "parseToken",
      "stable_from": "a3f1b2c4d5e6...",
      "stable_to": "f9e8d7c6b5a4...",
      "score": 0.78,
      "review": true,
      "runtime": false,
      "carried_name": "parseToken"
    }
  ]
}

review: true marks modified functions that are not runtime churn; these are exactly the [MODIFIED] entries under “Needs review” in the changelog. carried_name is non-null when _carry_symbol() wrote a symbol for this pairing.
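Because the stored report is plain JSON, pulling the review queue out of it takes only a few lines. This sketch works on a JSON string shaped like the example above; fetching that string from the diffs table is left out, since the table's columns are not documented here:

```python
import json


def needs_review(report_json):
    """Return (name, score) for every change flagged review=true, best first."""
    report = json.loads(report_json)
    flagged = [c for c in report["changes"] if c.get("review")]
    return sorted(((c["name"], c["score"]) for c in flagged),
                  key=lambda pair: pair[1], reverse=True)


def churn_summary(report_json):
    """Render the changelog's modified line from the stored summary counts."""
    s = json.loads(report_json)["summary"]
    return (f"modified: {s['modified']} ({s['app_modified']} app, "
            f"{s['runtime_churn']} runtime/toolchain churn)")
```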

Provenance economy position of diff-carry

diff-carry sits at rank 40 in the provenance hierarchy: below oracle (90), export (60), and import (55), but above agent (30). In practice this means:

An Oracle re-identification pass on v2 will upgrade a diff-carry annotation if the function still hits a corpus signature at score ≥ 0.82.
An agent pass will not overwrite a diff-carry annotation, regardless of claimed confidence.
A warden set-name call (provenance human) always wins.

This ordering ensures carried knowledge is never silently destroyed by a lower-authority source, while higher-authority passes can still refine or correct it.
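The ordering rules above can be expressed as a small rank check. The numeric ranks come from the text, except human, which the text says always wins but gives no number; the 100 below is a placeholder. Treating equal ranks as "do not overwrite" is also an assumption:

```python
# Ranks from the text. "human" has no number in the prose ("always wins"),
# so 100 is a placeholder that simply sits above every other source.
PROVENANCE_RANK = {
    "human": 100,
    "oracle": 90,
    "export": 60,
    "import": 55,
    "diff-carry": 40,
    "agent": 30,
}
ORACLE_UPGRADE_SCORE = 0.82


def may_overwrite(existing, incoming):
    """True if a write with `incoming` provenance may replace `existing`."""
    if incoming == "human":
        return True  # warden set-name always wins
    # Assumption: equal ranks do not overwrite each other.
    return PROVENANCE_RANK[incoming] > PROVENANCE_RANK[existing]


def oracle_may_upgrade(existing, match_score):
    """An Oracle hit replaces `existing` only at or above the corpus threshold."""
    return match_score >= ORACLE_UPGRADE_SCORE and may_overwrite(existing, "oracle")
```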

Time-travel queries

Once you have ingested and diffed several versions, the KB holds the full history of every function across the timeline. The KnowledgeBase class exposes read-only methods that answer the common history questions directly, without re-running a diff:

when_first_seen(stable_id) answers “when did this function first appear?” It returns the earliest version label that contains the function’s stable identity.
evolution_of(stable_id) answers “when did its body actually change?” It walks the versions the function appears in and reports each point where the body changed against where it stayed the same.
symbol_history(stable_id) answers “who named it and when?” It returns the chain of symbol rows for the identity, with the name, provenance, and confidence recorded at each step.
find_by_name(name) resolves a human-facing name back to the stable identities that have carried it, so you can start from a name and look up its history.
resolve_stable_id(query) takes a full stable_id, a unique stable_id prefix, or a name and returns a single stable_id string (or None), so you can feed a user query straight into the history methods.

from warden.kb.database import KnowledgeBase

with KnowledgeBase("project.warden.db") as kb:
    stable_id = kb.resolve_stable_id("parseToken")
    if stable_id is not None:
        print(kb.when_first_seen(stable_id))   # earliest version label
        for step in kb.evolution_of(stable_id):
            print(step)                        # one row per version, body changed or not

These methods are the library surface for time-travel questions. A warden history command that exposes the same answers on the command line is being wired separately; it will let you ask for a function’s first appearance, body-change timeline, and naming history without writing any Python.

Core concepts

Stable identity, the shared fingerprint engine, and the provenance economy: the three ideas behind how carry-over works.

CLI reference

Full flag documentation for warden diff and every other command.
