The agent crew

The agent crew is the labor force that keeps the knowledge base full without proportional human time. It sweeps unnamed functions bottom-up through the call graph, proposes names and summaries grounded in hard facts, gates every proposal through a verifier, and writes back under the KB’s provenance/confidence economy. The result: you can re-run the crew on every vendor update without clobbering Oracle matches, exports, or anything you’ve verified by hand.

Alpha status. Three things are implemented and running: the propose → verify → write-back loop; the offline, OpenRouter/Kimi, OpenAI/Codex, and Anthropic backends; and the verifier gate. The concurrency and type/struct roles are implemented as deterministic, zero-dependency analyzer passes (see Specialized analyzers below) and now run inside the call-graph crew as autonomous proposers that emit conservative, economy-gated names and notes for the namer. The remaining specialized agent roles (Oracle adjudicator, diff, behavioral verifier) are the intended next steps; the naming crew currently runs a single generic backend pass alongside these specialist proposers.

The core loop: propose → verify → write-back

run_agent_pass in src/warden/agents/crew.py drives one full sweep over a module version.

Gather hard facts

For every defined function in the version, gather_facts assembles a FunctionFacts object (the hallucination constraint). Every field is derived mechanically from the binary and the KB; the backend sees only what is actually in the binary. See FunctionFacts below.

Sort bottom-up

Functions are sorted by number of call targets, ascending. Leaf functions run first. As callees acquire names, the next caller’s context is richer when the backend sees it.

Skip already-confident entries

If a function already has a symbol with confidence >= 0.5 (the SKIP_CONFIDENCE constant in crew.py), or if the symbol is locked (human or Oracle), the function is skipped. This is the mechanism that makes re-running safe: the crew never overwrites high-confidence or locked work.

Backend proposes

The selected backend receives the FunctionFacts and returns a Proposal (a name, summary, confidence score, and optional refined type signature), or None to abstain.

Verifier gate

verify_proposal runs cheap sanity checks before anything reaches the KB. It rejects proposals with invalid identifiers, names shorter than two characters, confidence outside [0, 1], or string-xref claims that aren’t backed by actual referenced strings. See the verifier gate for detail.

Write-back under the economy

Accepted proposals are submitted to kb.upsert_symbol with provenance="agent". The KB’s economy decides whether to actually write: an agent proposal may only overwrite a lower-confidence prior agent entry. Human, Oracle, and higher-confidence agent entries are never touched. Each write stores the provenance trail and evidence list alongside the symbol.

After the sweep, run_agent_pass returns an AgentRunResult with counters for considered, proposed, written, rejected_by_verifier, rejected_by_economy, and skipped_existing. The call-graph strategy adds specialist_proposed and specialist_written (the specialist proposals that went through the verifier and economy) and rounds_run (how many bottom-up rounds actually ran before a fixpoint).
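The six steps above can be sketched as one small loop. This is a minimal illustration, not the code in crew.py: the `sweep` function, the `Symbol` and `Counters` classes, and the dictionary-based stand-in for the KB are all assumptions made for the example. Only the ordering, the 0.5 skip threshold, and the counter names come from the description above.

```python
from dataclasses import dataclass

SKIP_CONFIDENCE = 0.5  # threshold named in the text


@dataclass
class Proposal:
    name: str
    summary: str
    confidence: float


@dataclass
class Symbol:  # hypothetical stand-in for a KB symbol row
    name: str
    confidence: float
    provenance: str = "agent"
    locked: bool = False


@dataclass
class Counters:
    considered: int = 0
    proposed: int = 0
    written: int = 0
    rejected_by_verifier: int = 0
    rejected_by_economy: int = 0
    skipped_existing: int = 0


def sweep(call_counts, symbols, propose, verify):
    """call_counts: {func_id: number of call targets}; symbols: {func_id: Symbol}."""
    c = Counters()
    # Bottom-up: fewest call targets first, ties broken by id for determinism.
    for fid in sorted(call_counts, key=lambda f: (call_counts[f], f)):
        c.considered += 1
        prior = symbols.get(fid)
        if prior and (prior.locked or prior.confidence >= SKIP_CONFIDENCE):
            c.skipped_existing += 1
            continue
        proposal = propose(fid)
        if proposal is None:  # the backend abstained
            continue
        c.proposed += 1
        if not verify(proposal):
            c.rejected_by_verifier += 1
            continue
        # Economy: only a lower-confidence prior agent entry may be overwritten.
        if prior and (prior.provenance != "agent"
                      or prior.confidence >= proposal.confidence):
            c.rejected_by_economy += 1
            continue
        symbols[fid] = Symbol(proposal.name, proposal.confidence)
        c.written += 1
    return c
```

Because the skip check runs before `propose`, a locked or confident entry never costs a backend call, which is what makes re-runs cheap as well as safe.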

FunctionFacts: the hallucination constraint

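from dataclasses import dataclass


@dataclass
class FunctionFacts:
    func_index: int                    # function index
    stable_id: str                     # stable identity of the function
    type_signature: str                # exact WASM type section entry
    call_targets: list[str]            # direct call names; "<indirect>" for call_indirect
    referenced_strings: list[str]      # strings reached through i32.const immediates
    raw_name: str | None               # raw name hint, if any
    instruction_mnemonics: list[str]   # opcodes of the function body
    is_exported: bool                  # export status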

Every field is derived mechanically from the binary and the KB. There is no inference, no LLM input. Backends receive only this struct. This is the contract that constrains hallucination: a backend cannot claim a function references a string that referenced_strings does not contain.

referenced_strings is built by walking the function’s i32.const instructions and looking up each immediate in the module’s data-section string map.
call_targets contains direct call names (with imports resolved to their import names) and <indirect> for call_indirect sites.
type_signature is the WASM type section entry. It is exact, not guessed.
instruction_mnemonics contains every opcode in the function body; LLM backends truncate it to the first 40 when building the user message.
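The referenced_strings lookup described above can be sketched as follows. The instruction representation (a list of `(mnemonic, immediate)` tuples), the shape of the string map (address to string), and the de-duplication are assumptions for illustration; the real gather_facts may differ.

```python
def referenced_strings(instructions, string_map):
    """Collect strings whose data-section address appears as an i32.const immediate.

    instructions: list of (mnemonic, immediate) tuples (assumed shape).
    string_map:   {address: string} built from the data section (assumed shape).
    """
    found = []
    for mnemonic, immediate in instructions:
        if mnemonic != "i32.const":
            continue
        text = string_map.get(immediate)
        # Keep first-seen order and drop repeats (an assumption, not confirmed).
        if text is not None and text not in found:
            found.append(text)
    return found
```

An immediate that is not a key in the string map is ignored, so an ordinary integer constant never turns into a string claim.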

The backends

Offline heuristic backend

The zero-dependency default. Deterministic, no API key, no network. Runs immediately after pip install -e . with no extras. Applies three heuristics in priority order:

Priority	Heuristic	Confidence	Trigger
1	String xref	0.45	`referenced_strings` is non-empty
2	Call-neighborhood	0.30	Function makes at least one direct call
3	Placeholder	0.12	Neither heuristic fires

The string xref heuristic is the strongest cheap signal because Emscripten modules are full of format strings, error messages, and symbol names. The placeholder heuristic ensures nothing stays anonymous; 100% symbol coverage is achievable offline. Even if an LLM backend is selected but fails at construction time (missing key, import error), make_backend silently falls back to the offline backend.
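A minimal sketch of the priority order in the table above. The confidences and triggers are the documented ones; the generated name formats (`str_…`, `calls_…`, `func_N`) and the helper names are hypothetical and chosen only to make the example concrete.

```python
import re


def slug(text):
    """Hypothetical helper: reduce arbitrary text to identifier-safe characters."""
    cleaned = re.sub(r"[^A-Za-z0-9_]+", "_", text).strip("_").lower()
    return cleaned or "unnamed"


def offline_propose(func_index, referenced_strings, call_targets):
    """Return (name, confidence) from the first heuristic that fires."""
    direct = [t for t in call_targets if t != "<indirect>"]
    if referenced_strings:                       # priority 1: string xref
        return ("str_" + slug(referenced_strings[0]), 0.45)
    if direct:                                   # priority 2: call neighborhood
        return ("calls_" + slug(direct[0]), 0.30)
    return (f"func_{func_index}", 0.12)          # priority 3: placeholder
```

All three confidences sit below the 0.5 skip threshold, so anything the offline backend names remains eligible for a later, stronger proposal.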

OpenRouter / Kimi backend (the cheap default)

Naming a function from grounded facts is bulk, high-volume work, not frontier reasoning, and it is gated by the verifier and the confidence economy. So the crew defaults to the cheapest capable backend. The OpenRouter backend uses the official openrouter Python SDK, so an inexpensive model like Kimi K2.6 does the naming instead of a frontier model. Selected automatically when OPENROUTER_API_KEY is set and the openrouter package is installed (pip install -e '.[agents]'). It is tried first, before OpenAI and Anthropic.

Default model: moonshotai/kimi-k2.6 (override with WARDEN_OPENROUTER_MODEL).
Aliases: --backend openrouter, --backend kimi, and --backend or all select this backend.
Routing: each call passes provider={"sort": "price"}, so OpenRouter sends it to the cheapest provider serving the chosen model.
Output: the model is asked for a JSON object and parsed leniently (the first balanced {...} object is extracted). name is run through slugify; confidence is clamped to [0.0, 1.0].
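The lenient parsing step might look like the sketch below: scan for the first balanced `{...}` span that decodes as JSON, then normalize the fields. The function names `first_json_object` and `parse_reply`, and this particular `slugify` body, are stand-ins written for the example rather than the project's own implementations.

```python
import json
import re


def first_json_object(text):
    """Return the first balanced {...} span that parses as JSON, else None."""
    start = text.find("{")
    while start != -1:
        depth, in_string, escaped = 0, False, False
        for i in range(start, len(text)):
            ch = text[i]
            if in_string:                      # braces inside strings do not count
                if escaped:
                    escaped = False
                elif ch == "\\":
                    escaped = True
                elif ch == '"':
                    in_string = False
            elif ch == '"':
                in_string = True
            elif ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
                if depth == 0:
                    try:
                        return json.loads(text[start:i + 1])
                    except json.JSONDecodeError:
                        break                  # try the next opening brace
        start = text.find("{", start + 1)
    return None


def slugify(name):
    """Stand-in slugifier: lowercase, identifier-safe, never starts with a digit."""
    cleaned = re.sub(r"[^A-Za-z0-9_]+", "_", name).strip("_").lower()
    if not cleaned or cleaned[0].isdigit():
        cleaned = "_" + cleaned
    return cleaned


def parse_reply(text):
    obj = first_json_object(text)
    if not isinstance(obj, dict) or "name" not in obj:
        return None
    try:
        confidence = float(obj.get("confidence", 0.0))
    except (TypeError, ValueError):
        confidence = 0.0
    return {
        "name": slugify(str(obj["name"])),
        "summary": str(obj.get("summary", "")),
        "confidence": min(1.0, max(0.0, confidence)),  # clamp to [0.0, 1.0]
    }
```

Tracking string state while counting braces matters because a summary can legitimately contain `{` or `}`.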

To keep naming cheap but escalate a harder role to a stronger model, route per role: run_agent_pass(kb, version_id, roles={"naming": "openrouter", "struct": "anthropic"}).

OpenAI / Codex backend

Selected automatically when OPENAI_API_KEY is set, the openai package is installed (pip install -e '.[agents]'), and OpenRouter is not selected. Uses the OpenAI Responses API with structured JSON output so the model returns {name, summary, confidence}.

Default model: gpt-5.3-codex (override with WARDEN_OPENAI_MODEL or WARDEN_AGENT_MODEL).
Aliases: --backend openai, --backend codex, and --backend oai all select this backend.
Reasoning effort: defaults to medium and can be changed with WARDEN_OPENAI_REASONING_EFFORT.
System prompt: the same RE prompt used by the Anthropic backend.
User message: contains only fields from FunctionFacts: function index, type signature, export status, call targets, referenced strings, raw name hint, and up to 40 opcode mnemonics.
Output: the response is validated and clamped. name is run through slugify; confidence is clamped to [0.0, 1.0].

Anthropic backend

Selected automatically when ANTHROPIC_API_KEY is set, the anthropic package is installed (pip install -e '.[agents]'), and neither the OpenRouter nor the OpenAI backend is available. Uses the Anthropic Messages API with structured JSON output via a JSON schema constraint so the model always returns {name, summary, confidence} and nothing else.

Default model: claude-opus-4-8 (override with the WARDEN_AGENT_MODEL environment variable).
System prompt: instructs the model to act as a RE assistant, propose a concise snake_case C-style identifier, write a one-sentence purpose, and emit a calibrated confidence in [0, 1]. The model is told explicitly to prefer low confidence when evidence is thin and never invent behavior unsupported by the facts.
User message: contains only fields from FunctionFacts: function index, type signature, export status, call targets, referenced strings, raw name hint, and up to 40 opcode mnemonics. No content outside these facts is sent.
Output: the response is validated and clamped. name is run through slugify to guarantee a valid identifier; confidence is clamped to [0.0, 1.0].

The JSON schema constraint used for structured output:

{
  "type": "object",
  "properties": {
    "name":       { "type": "string" },
    "summary":    { "type": "string" },
    "confidence": { "type": "number" }
  },
  "required": ["name", "summary", "confidence"],
  "additionalProperties": false
}

Backend selection

make_backend(prefer) in backends.py resolves which backend runs:

Condition	Backend chosen
`--backend offline`	`OfflineHeuristicBackend`
`--backend openrouter`, `--backend kimi`, or `--backend or`	`OpenRouterBackend` (falls back to offline if unavailable)
`--backend openai`, `--backend codex`, or `--backend oai`	`OpenAIBackend` (falls back to offline if unavailable)
`--backend anthropic`	`AnthropicBackend` (falls back to offline if unavailable)
No flag, `OPENROUTER_API_KEY` set, `openrouter` installed	`OpenRouterBackend`
No flag, OpenRouter unavailable, `OPENAI_API_KEY` set, `openai` installed	`OpenAIBackend`
No flag, OpenAI unavailable, `ANTHROPIC_API_KEY` set, `anthropic` installed	`AnthropicBackend`
No flag, no key or package	`OfflineHeuristicBackend`

An explicit --backend flag always beats auto-detection.
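The resolution table can be read as the following sketch. It returns a backend label rather than constructing anything, and `choose_backend`, its injectable `env` and `has_package` parameters, and the label strings are assumptions made so the logic can be exercised without real keys. The package names follow the backend descriptions earlier in this page.

```python
import importlib.util
import os

# (label, environment variable, package), in cheapest-first auto-detect order.
AUTO_ORDER = [
    ("openrouter", "OPENROUTER_API_KEY", "openrouter"),
    ("openai", "OPENAI_API_KEY", "openai"),
    ("anthropic", "ANTHROPIC_API_KEY", "anthropic"),
]
ALIASES = {"kimi": "openrouter", "or": "openrouter",
           "codex": "openai", "oai": "openai"}


def choose_backend(prefer=None, env=None, has_package=None):
    env = os.environ if env is None else env
    if has_package is None:
        has_package = lambda pkg: importlib.util.find_spec(pkg) is not None
    usable = {label: bool(env.get(key)) and has_package(pkg)
              for label, key, pkg in AUTO_ORDER}
    if prefer is not None:
        # An explicit choice wins, but falls back to offline if unavailable.
        label = ALIASES.get(prefer, prefer)
        return label if usable.get(label, False) else "offline"
    for label, _, _ in AUTO_ORDER:
        if usable[label]:
            return label
    return "offline"
```

Note that an explicit but unavailable backend falls back to offline rather than to the next provider in the auto-detect order, matching the "falls back to offline if unavailable" rows.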

Per-role specialist routing

By default the crew runs one auto-selected backend for everything: the same backend that make_backend resolves above names functions and serves every analyzer role. This is the unchanged behavior, and you do not have to configure anything to get it. When you want more control, the crew can route individual roles to different backends or models. There are two roles you can route independently:

The naming role proposes human-readable names and summaries for unnamed functions.
The analyzer roles are the deterministic concurrency and struct passes that emit economy-gated specialist proposals and per-function notes (see Call-graph strategy and Specialized analyzers).

Each role can be pointed at its own configured backend or model. For example, you can name with one provider while the analyzer roles run a cheaper or fully offline backend, or you can pin a specific model per role without changing the global default. A role that is not configured falls back to the single auto-selected backend, so partial configuration is fine: set only the roles you care about and the rest behave exactly as before. The routing decision is deterministic. Given the same configuration and the same environment, the crew always assigns the same backend and model to each role, with no clock and no randomness. The design stays offline-first: if a role is routed to a backend that cannot be constructed (missing key or missing package), it falls back to the offline heuristic backend the same way make_backend does, so every role still works with no API key and no optional package.

Running the agent crew

Store your key and model once with warden config (no environment variables to manage), then just run warden agent:

# One-time: store the key and a model (global default).
warden config set openrouter_api_key sk-or-...
warden config set openrouter_model xiaomi/mimo-v2.5-pro

# Auto-detect backend, cheapest-first (OpenRouter, then OpenAI, then Anthropic, then offline):
warden agent v1

# Force a specific backend:
warden agent v1 --backend offline      # no key needed
warden agent v1 --backend kimi         # OpenRouter, Kimi K2.6
warden agent v1 --backend codex        # OpenAI
warden agent v1 --backend anthropic    # Anthropic

# A per-project override (this project uses a different model):
warden config set --project openrouter_model moonshotai/kimi-k2.6

# Check provider availability:
warden agent-backends

# Point at a non-default database:
warden agent v1 --db /path/to/project.db

The command prints a summary table on completion:

Agent pass: v1 (anthropic)
considered                  42
proposed                    38
written                     31
skipped (already confident)  9
rejected by verifier         2
rejected by economy          5

Re-running the command at any time is safe. Already-confident and locked entries are skipped before the backend is even called, and the economy rejects any proposal that would overwrite a stronger entry.

After running the crew, use warden coverage v1 to see how symbol coverage is split between oracle, export, agent, and human sources. Use warden funcs v1 --unnamed to find functions the crew could not name (or named only at very low confidence).

Deep analysis: one agent per function

warden agent names functions. warden deep goes further: it runs one agent per function, walked bottom-up over the call graph (leaves first), to recover not just a name but a full understanding, a variable rename map, and cleaned C.

warden deep v1 --backend kimi --parent-model moonshotai/kimi-k2.6 --watch

Each function’s agent is fed three things: its disassembly facts, its decompiled C (with generic variable names), and the recovered understanding of the functions it calls. That callee context is read from the knowledge base, not from the callees’ raw bytes, so a parent’s context stays bounded even in a module with thousands of functions. A function closes once it is understood, after which only its distilled understanding flows upward; any raw detail a parent needs is pulled from the database on demand.

Tiered models. Leaf functions are the bulk, so name them with the cheapest capable model and escalate the harder, high-fan-in parents. A good default at scale is MiMo-V2.5-Pro for leaves (cheapest output) and Kimi K2.6 for parents:
WARDEN_OPENROUTER_MODEL=xiaomi/mimo-v2.5-pro warden deep v1 --backend kimi --parent-model moonshotai/kimi-k2.6 --watch
Measure before you commit: a warden deep run is tracked and reversible, so run two models on a slice, compare the names and cleaned C, and keep the winner.
Dedup. Identical functions (same stable identity) are analyzed once and reused, the lever that makes 13k-function modules tractable.
Reversible. Every name and variable rename is logged to rename_history, so any change can be undone.
Live. Progress streams to agent_events (and, with --watch, to the terminal). The UI renders the raw and cleaned C side by side and lets you jump between functions.

The whole loop runs on the deterministic offline backend with no API key (a dry run of the bottom-up flow, context-drop, events, and storage). Set OPENROUTER_API_KEY and a real --backend to make the agents real. Results land in the function_analysis table and the symbol economy.

Call-graph strategy

By default, run_agent_pass walks the call graph bottom-up instead of running a single flat sweep. Pass --strategy flat to get the original behavior.

# Default: bottom-up call-graph walk (recommended)
warden agent v1 --strategy call-graph

# Original flat pass (leaves-first ordering, no concurrency within a layer)
warden agent v1 --strategy flat

The concurrency parameter (default 8) caps how many proposals are in-flight at once within a single layer. Set it programmatically via run_agent_pass(..., concurrency=N).

How the call-graph walk works

Build the call graph

build_call_graph(module) in warden.analysis.callgraph constructs a CallGraph with direct and indirect edges for every defined function. Direct call and return_call instructions are exact. call_indirect and return_call_indirect instructions carry only a type index at the static level, so their targets are over-approximated: every defined function in the module’s element table whose type matches the call’s type index is included as a potential callee. The resulting graph is a conservative static skeleton.

Condense recursion into layers

strongly_connected_components (iterative Tarjan) groups mutually recursive functions into SCCs. layered_schedule then condenses the SCC graph into a DAG and assigns a depth to each component: layer 0 holds leaves, and every later layer holds functions whose defined callees all appear in earlier layers. Mutual recursion lands in the same layer and is treated as a single unit. All traversals are sorted, so the schedule is deterministic.
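To make the condensation concrete, here is a self-contained sketch of an iterative Tarjan pass followed by depth assignment. The function names mirror the ones above, but the bodies, the plain-dict graph representation (`{caller: [callees]}`), and the flat list-of-layers return value are assumptions, not the code in warden.analysis.callgraph.

```python
def strongly_connected_components(graph):
    """Iterative Tarjan. graph: {node: iterable of successors}."""
    index, low, on_stack = {}, {}, set()
    stack, sccs = [], []
    counter = 0
    for root in sorted(graph):
        if root in index:
            continue
        index[root] = low[root] = counter
        counter += 1
        stack.append(root)
        on_stack.add(root)
        work = [(root, iter(sorted(graph[root])))]
        while work:
            node, successors = work[-1]
            descended = False
            for nxt in successors:
                if nxt not in index:            # tree edge: descend
                    index[nxt] = low[nxt] = counter
                    counter += 1
                    stack.append(nxt)
                    on_stack.add(nxt)
                    work.append((nxt, iter(sorted(graph.get(nxt, ())))))
                    descended = True
                    break
                if nxt in on_stack:             # back edge into the open SCC
                    low[node] = min(low[node], index[nxt])
            if descended:
                continue
            work.pop()
            if work:
                parent = work[-1][0]
                low[parent] = min(low[parent], low[node])
            if low[node] == index[node]:        # node is the root of an SCC
                component = []
                while True:
                    member = stack.pop()
                    on_stack.discard(member)
                    component.append(member)
                    if member == node:
                        break
                sccs.append(sorted(component))
    return sccs


def layered_schedule(graph):
    """Return layers of nodes; layer 0 holds leaves, recursion shares a layer."""
    sccs = strongly_connected_components(graph)  # emitted callees-first
    comp_of = {n: i for i, comp in enumerate(sccs) for n in comp}
    depth = {}
    for i, comp in enumerate(sccs):
        callee_comps = {comp_of[s] for n in comp for s in graph.get(n, ())
                        if comp_of[s] != i}
        depth[i] = 1 + max((depth[c] for c in callee_comps), default=-1)
    layers = {}
    for i, comp in enumerate(sccs):
        layers.setdefault(depth[i], []).extend(comp)
    return [sorted(layers[d]) for d in sorted(layers)]
```

Tarjan emits each component only after every component it can reach, so the depth of all callee components is already known when a component is visited and a single forward pass suffices.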

Route to specialist proposers

Before processing any layer, the concurrency and struct analyzers run as autonomous crew proposers (the same passes that warden analyze runs). They do two things at once. First, specialist_proposals turns each finding into its own conservative, economy-gated proposal that goes through verify_proposal and kb.upsert_symbol like any other write:

A function whose distinctive evidence is an atomic read-modify-write gets the name atomic_rmw_site and a summary noting a synchronization primitive.
A function the struct analyzer attributes a layout to gets the name struct_accessor and a summary noting which field offsets it touches through a base pointer.

Second, _specialist_notes routes every finding into per-function hint lists that feed the namer:

Atomic sites from the concurrency analyzer produce notes such as "atomic i32.atomic.rmw.add at offset 8; likely a synchronization primitive".
Struct layouts from the struct analyzer produce notes describing which field offsets the function accesses through a base pointer.

The specialists stay conservative on purpose. Their proposals carry a low confidence (0.35), so the economy only ever lets a specialist name fill an empty slot; it never overrides a backend, Oracle, or human entry, and most specialist proposals are expected to be rejected. The same findings still appear in FunctionFacts.notes so the namer backend sees them when it proposes. Pass run_specialists=False to run_agent_pass to skip the specialist proposals (the notes still enrich the facts).

Enrich each function with callee names

When a function is about to be processed, _enrich looks up the KB names of all its direct defined callees and attaches them as FunctionFacts.callee_names. Because layers are processed bottom-up, the callees have already been named (or skipped) before the caller is reached. A backend that sees callee_names=["parse_header", "validate_checksum"] has far richer context than one that sees only raw opcodes.

Propose each layer concurrently

All functions in a layer are independent (no intra-layer edges by construction), so their proposals can safely run in parallel. _propose_concurrently uses asyncio.gather with a semaphore capped at concurrency. Backends that block (every current backend) are dispatched via asyncio.to_thread so the event loop stays responsive. A single-function layer skips the async path entirely and calls backend.propose directly.
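The pattern described here (gather, a semaphore cap, and asyncio.to_thread for blocking backends) can be sketched as below. `propose_layer` and `propose_concurrently` are illustrative names, and `propose` stands in for any blocking backend call; the real _propose_concurrently may be organized differently.

```python
import asyncio


async def propose_concurrently(propose, facts_list, concurrency=8):
    """Run a blocking propose() over a layer, at most `concurrency` at a time."""
    semaphore = asyncio.Semaphore(concurrency)

    async def one(facts):
        async with semaphore:
            # Blocking backends run in a worker thread so the loop stays free.
            return await asyncio.to_thread(propose, facts)

    # gather returns results in the same order as its inputs.
    return await asyncio.gather(*(one(f) for f in facts_list))


def propose_layer(propose, facts_list, concurrency=8):
    if len(facts_list) == 1:
        # A single-function layer skips the async machinery entirely.
        return [propose(facts_list[0])]
    return asyncio.run(propose_concurrently(propose, facts_list, concurrency))
```

Because gather preserves input order, results can be zipped back onto the layer's functions without any extra bookkeeping, which keeps the write-back order deterministic even though the calls overlap.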

Write back under the economy

Proposals from each layer go through the same verify_proposal gate and kb.upsert_symbol call as the flat pass. Because functions in the same layer cannot be each other’s callees, concurrent branches in one layer never share a callee that is being written at the same time. The KB’s provenance/confidence economy rejects any write that would overwrite a higher-confidence or locked entry, so concurrent branches are safe.

FunctionFacts fields added by the call-graph strategy

The call-graph strategy attaches two fields that the flat pass leaves empty:

Field	Type	Source
`callee_names`	`list[str]`	KB names of defined callees, looked up after each prior layer is written
`notes`	`list[str]`	Per-function hints from the concurrency and struct analyzers

Both fields are part of the FunctionFacts dataclass and are forwarded to the backend as additional context in the user message.

Multi-round naming to a fixpoint

One bottom-up pass names callees before callers, but a caller proposed in that pass only sees the names its callees had at that moment; if a later pass gives those callees better names, the caller does not benefit unless it is proposed again. run_agent_pass takes a rounds argument (default 1) that repeats the whole bottom-up walk until the names stop changing.

# Run up to three bottom-up rounds, stopping early at a fixpoint
warden agent v1 --strategy call-graph --rounds 3

# The same thing programmatically
run_agent_pass(kb, module, version_id, rounds=3)

Each round walks every layer again. A function that was named in round one with thin context can be re-proposed in round two now that its callees carry real names, and the economy still gates the write so a stronger prior name is never lost. The loop ends as soon as a round produces no new write, which is the fixpoint, or when rounds is reached, whichever comes first. Because the walk, the analyzers, and the offline backend are all deterministic, the fixpoint is reproducible: the same module and the same rounds cap always converge to the same names with no wall clock and no randomness. Set rounds=1 (the default) to keep the single-pass behaviour.
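The stopping rule is small enough to state as code. In this sketch `run_one_round` is a hypothetical callable that performs one full bottom-up walk and returns how many symbols it wrote; the real loop lives inside run_agent_pass.

```python
def run_rounds(run_one_round, rounds=1):
    """Repeat the walk until a round writes nothing or the cap is reached.

    Returns (rounds_run, total_written).
    """
    rounds_run = 0
    total_written = 0
    for _ in range(rounds):
        written = run_one_round()
        rounds_run += 1
        total_written += written
        if written == 0:  # fixpoint: nothing changed, so another round cannot help
            break
    return rounds_run, total_written
```

The round that detects the fixpoint still counts toward rounds_run, since it did walk every layer before finding nothing to write.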

When to use each strategy

Use --strategy call-graph (the default) for any module where naming quality matters. The bottom-up order means callers are named in light of what their callees do, which is the main quality improvement over a flat pass. Use --strategy flat when you want a quick, fully sequential sweep, for example in CI environments where deterministic single-threaded output is easier to diff, or when debugging the backend in isolation.

The verifier gate

verify_proposal(proposal, facts) in crew.py sits between the backend’s output and the KB. It returns (accepted, reason). Currently it performs cheap structural checks:

The name must match ^[A-Za-z_][A-Za-z0-9_]*$ (valid C identifier).
The name must be at least two characters.
Confidence must be in [0.0, 1.0].
A summary that claims string evidence must be backed by non-empty facts.referenced_strings.
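The four checks can be sketched as a single function returning (accepted, reason). This is an illustration, not the code in crew.py: the flat argument list, the reason strings, and especially the way a "string evidence" claim is detected (here, the summary mentioning the word "string") are assumptions.

```python
import re

IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def check_proposal(name, summary, confidence, referenced_strings):
    """Cheap structural checks; returns (accepted, reason)."""
    # fullmatch also rejects a trailing newline, which "$" alone would allow.
    if not IDENTIFIER.fullmatch(name):
        return (False, "invalid identifier")
    if len(name) < 2:
        return (False, "name too short")
    if not 0.0 <= confidence <= 1.0:
        return (False, "confidence out of range")
    # Assumed detection rule: a summary that mentions strings claims string evidence.
    if "string" in summary.lower() and not referenced_strings:
        return (False, "string claim without referenced strings")
    return (True, "ok")
```

Every check is a pure function of the proposal and the facts, so the gate needs no toolchain and adds negligible cost per proposal.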

This is intentionally minimal: it catches obviously broken output without requiring a toolchain. The function’s signature is the plug-in point for the full behavioral verifier described in the design. That verifier uses differential re-execution via wasm2c, where a lifted C reconstruction is recompiled and executed against the original WASM under a fuzzer corpus. warden verify <wasm> reports whether the current environment has the toolchain needed to activate it.

The behavioral verifier (wasm2c differential re-execution) is scaffolded but not yet active. The verify_proposal call site is where it plugs in when a C toolchain is available.

The provenance/confidence economy

Every write to the KB carries three fields that together make re-running the entire crew safe. The full economy is explained in core concepts; here is how the agent crew interacts with it.

provenance is set to "agent" for every crew write. This places agent output at the lowest authority tier, below human, Oracle, export, and string-xref entries.
confidence is the calibrated score returned by the backend. The offline backend emits 0.45, 0.30, or 0.12 depending on which heuristic fired. LLM backends are instructed to self-calibrate and their output is clamped to [0.0, 1.0].
locked is never set by the crew. Only warden set-name (human writes) sets locked=True, which makes an entry immutable to every automated actor.

The crew enforces the economy at two points:

Before the backend is called: entries at confidence >= 0.5 or marked locked are skipped. The threshold SKIP_CONFIDENCE = 0.5 is the boundary between “confident enough to leave alone” and “fair game.”
After the verifier passes: kb.upsert_symbol enforces that an agent proposal may only land if no higher-confidence agent entry (or any higher-authority entry) already exists. If one does, the proposal is counted as rejected_by_economy and no write happens.

The practical effect: running warden agent on a module that already has Oracle matches and a prior agent pass at confidence 0.45 will re-propose only the functions that are still below threshold, and only overwrite those where the new proposal is stronger.

Specialized analyzers

Beyond the naming crew, WARDEN ships two deterministic analyzers that populate first-class KB facts with no LLM and no API key. They cover the concurrency and type/struct roles from the intended crew architecture and run together under a single command:

warden analyze <label>

Both passes persist their findings to the KB immediately. Re-running is idempotent: the KB upsert semantics apply the same provenance/confidence economy as any other write.

Concurrency analyzer

warden.analysis.concurrency.analyze_concurrency(module, kb, version_id) recovers the thread model from three byte-level fossils that survive Emscripten stripping:

Signal	What it means
Shared memory flag	The WASM `limits` field has the `shared` bit set (atomics require it)
Atomic opcodes (`0xFE` family)	Every `rmw`, `cmpxchg`, `wait`, `notify`, or `fence` instruction is an atomic site
pthread-named imports/exports	`pthread_`, `emscripten_thread`, `_emscripten_proxy*`, `atomic`-tagged helpers surviving in the symbol table

The pass returns a ConcurrencyReport with .shared_memory, .atomic_sites, .pthread_markers, and .facts. When a KB and version ID are supplied, each atomic site is written to the thread_model table via kb.add_thread_fact as kind='atomic', with the memarg offset as the best-effort “guarded data” pointer and a confidence of 0.6. This is high enough to be a fact but below the human/Oracle tier, because the exact guarded data is a best-effort guess.

A module is considered multithreaded when any of the three signals is present. Shared memory or atomic opcodes are conclusive; pthread-named symbols are a weaker hint (a module may import them without actually spawning threads), but they are still recorded.
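The decision rule in the paragraph above can be sketched as follows. The shared flag is bit 0x02 of the memory limits flags byte in the WebAssembly threads proposal; the marker-matching rule (prefix match, plus any name containing "atomic") and the function names are assumptions for illustration, not the behavior of analyze_concurrency.

```python
PTHREAD_PREFIXES = ("pthread_", "emscripten_thread", "_emscripten_proxy")


def memory_is_shared(limits_flags):
    """True when the shared bit (0x02) is set in a memory's limits flags byte."""
    return bool(limits_flags & 0x02)


def pthread_markers(names):
    """Import/export names that hint at threading (assumed matching rule)."""
    return sorted(n for n in names
                  if n.startswith(PTHREAD_PREFIXES) or "atomic" in n)


def classify(limits_flags, atomic_site_count, names):
    shared = memory_is_shared(limits_flags)
    markers = pthread_markers(names)
    conclusive = shared or atomic_site_count > 0   # strong signals
    return {
        "multithreaded": conclusive or bool(markers),
        "conclusive": conclusive,                  # False when only names matched
        "pthread_markers": markers,
    }
```

Keeping "conclusive" separate from "multithreaded" preserves the distinction the text draws: name-based markers are recorded, but a consumer can still tell them apart from byte-level evidence.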

Struct-layout analyzer

warden.analysis.structs.analyze_structs(module, kb, version_id) reconstructs candidate struct shapes from memory-access patterns. Emscripten compiles a C struct field access into a recognizable two-instruction sequence:

local.get N          # push base pointer
i32.load offset=K    # dereference at fixed displacement K

The pass walks every defined function looking for exactly this adjacency. Each unique (base local, offset) pair is one candidate field; multiple accesses to the same offset are deduped. Fields are grouped by base local into a StructLayout named <func>_arg<N>_t, and the recovered fields are sorted by offset so the output is deterministic. Each StructLayout has a .name, .fields (a list of StructField(offset, size, type, name)), and .source_function. When a KB and version ID are supplied, every layout is persisted via kb.upsert_struct at provenance agent, confidence 0.5, so the recovered shapes become queryable KB facts and carry forward on the next ingest.

The struct analyzer is deliberately conservative: it only fires on the literal local.get → load/store adjacency. Non-trivial address arithmetic (pointer arithmetic, GEP chains) is not modeled. This keeps the results deterministic and avoids false positives at the cost of recall.
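A minimal sketch of the adjacency scan, under stated assumptions: instructions are `(mnemonic, operand)` tuples where the operand is the local index for local.get and the static offset for a memory access, and field sizes are inferred from a small opcode table. The `recover_layouts` function and its return shape are hypothetical; only the pairing rule, the de-duplication, the sort by offset, and the `<func>_arg<N>_t` naming come from the description above.

```python
from collections import defaultdict

# Access width in bytes for the memory opcodes this sketch recognizes.
ACCESS_SIZES = {
    "i32.load": 4, "i32.store": 4,
    "i64.load": 8, "i64.store": 8,
    "f32.load": 4, "f32.store": 4,
    "f64.load": 8, "f64.store": 8,
}


def recover_layouts(func_name, instructions):
    """Return {layout_name: [(offset, size), ...]} for one function body."""
    fields = defaultdict(dict)  # base local -> {offset: size}
    # Look at each adjacent pair: local.get N immediately followed by an access.
    for (op_a, local), (op_b, offset) in zip(instructions, instructions[1:]):
        if op_a == "local.get" and op_b in ACCESS_SIZES:
            fields[local].setdefault(offset, ACCESS_SIZES[op_b])  # dedupe by offset
    return {
        f"{func_name}_arg{local}_t": sorted(by_offset.items())
        for local, by_offset in sorted(fields.items())
    }
```

For example, a body that reads offsets 8 and 0 through local 0 and offset 16 through local 1 yields two layouts, `parse_arg0_t` with fields at 0 and 8 and `parse_arg1_t` with one field at 16. Any access whose address is computed rather than taken directly from a local is skipped, which is the recall trade-off the text describes.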

Running both passes

warden analyze <label> runs both analyzers in sequence and prints a summary:

Concurrency: shared_memory=True  atomic_sites=14  pthread_markers=6
Structs:     layouts=9

The KB is updated atomically. If either pass fails, no partial facts are written for that pass. Both sets of facts are then available to the naming crew (the next warden agent run sees thread_model and structs entries when assembling FunctionFacts) and to the HTML report generator.

Intended crew architecture

The current implementation runs a single generic naming pass, with the concurrency and struct analyzers running alongside it as deterministic proposers. The target architecture from the design is a crew of specialized agents, each owning a distinct domain:

Oracle agent

Adjudicates fuzzy Oracle matches and attaches upstream Emscripten/musl source links to matched symbols.

Concurrency agent

Owns atomic/lock/TLS analysis; labels lock-to-guarded-data relationships and worker entry points discovered via dynCall/elem tables. Implemented as a deterministic pass in warden.analysis.concurrency. warden analyze runs it and persists findings to the thread_model table. It also runs inside the call-graph crew as an autonomous proposer that emits conservative, economy-gated names and notes for the namer.

Type/struct agent

Reconstructs struct layouts from memory access patterns and propagates types to callers. Implemented as a deterministic pass in warden.analysis.structs. warden analyze runs it and persists findings to the structs table. It also runs inside the call-graph crew as an autonomous proposer that emits conservative, economy-gated names and notes for the namer.

Naming/summarization agent

Proposes human-readable names and writes pseudocode summaries. This is what the current implementation does.

Diff agent

On each new version, explains modified functions and writes the semantic changelog.

Verifier agent

Builds differential test harnesses and triages mismatches between the WASM and its reconstruction.

Each specialized agent would expose the KB and verifier as MCP tools via warden mcp, so any MCP-capable model can drive the loop. The current warden agent command is the starting spine for this architecture.

Relation to other pipeline stages

Oracle identification runs before the agent crew and pre-populates the KB with high-confidence names for runtime/libc functions. In a real Emscripten module, 40–80% of functions may already be named by the time the crew runs. Those are skipped, so the crew concentrates effort on the application-specific remainder.
Diff carry-over runs on ingest of a new version and ports annotations from the previous version. After carry-over, only genuinely changed or new functions are below threshold, so the crew only touches what actually needs attention.
warden demo runs the full pipeline end-to-end offline and shows all three stages feeding each other (Oracle → agent (offline) → diff carry-over), with no API key.
