MCP server

The WARDEN MCP server exposes the knowledge base and agent crew as a set of tools. Any Model Context Protocol client can call them: Claude, Cursor, Continue, a custom agent loop, or anything else that speaks the protocol. The design follows the GhidraMCP pattern: a thin, tool-per-operation surface where reads are always safe and writes go through the same provenance/confidence economy as every other WARDEN write path. An agent calling propose_symbol through MCP cannot clobber a human-verified name any more than an agent calling the library directly can.

The MCP server is an optional dependency. Nothing in the rest of WARDEN requires it. If the mcp package is absent, warden mcp prints a clear error and exits. No other command is affected.

Installation

pip install "warden-re[mcp]"

This adds mcp>=1.2 (the official MCP Python SDK). To install WARDEN with every optional dependency at once:

pip install "warden-re[all]"

Starting the server

warden mcp

By default the server opens warden.db in the current directory. To point it at a specific project database:

warden mcp --db /path/to/project.db

The server speaks MCP over stdio (stdin/stdout). There is no HTTP or SSE transport yet; it must run as a subprocess managed by the client.

Wiring into an MCP client

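Most MCP clients accept a server block in a JSON configuration file. The exact key names vary by client, so two variants follow: one keyed by server name under mcpServers, and one that lists servers with an explicit transport. The pattern is the same in both: point at the warden binary, pass mcp as the subcommand, and optionally pass --db: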

{
  "mcpServers": {
    "warden": {
      "command": "warden",
      "args": ["mcp", "--db", "/path/to/project.db"]
    }
  }
}

{
  "servers": [
    {
      "name": "warden",
      "transport": "stdio",
      "command": ["warden", "mcp", "--db", "/path/to/project.db"]
    }
  ]
}

The server registers itself with FastMCP under the name "warden", which is how it appears in the client’s tool namespace.
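As a minimal sketch, the two configuration shapes above can be generated from Python, which is handy when a script sets up several projects. The helper names (warden_args, map_style_config, list_style_config) are hypothetical and not part of WARDEN; only the JSON keys and the warden mcp --db arguments come from the examples above.

```python
import json


def warden_args(db_path=None):
    """Arguments after the `warden` binary: the `mcp` subcommand plus optional --db."""
    args = ["mcp"]
    if db_path is not None:
        args += ["--db", db_path]
    return args


def map_style_config(db_path=None):
    """First shape above: servers keyed by name under "mcpServers"."""
    return {"mcpServers": {"warden": {"command": "warden", "args": warden_args(db_path)}}}


def list_style_config(db_path=None):
    """Second shape above: a list of servers, each with an explicit stdio transport."""
    return {
        "servers": [
            {"name": "warden", "transport": "stdio", "command": ["warden", *warden_args(db_path)]}
        ]
    }


print(json.dumps(map_style_config("/path/to/project.db"), indent=2))
```

Omitting db_path leaves out --db, so the server falls back to warden.db in the current directory as described under Starting the server.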

Exposed tools

All tools are safe to call concurrently. Reads carry no side effects. Writes are economy-gated (see The provenance/confidence economy below).

agent_backends()

List available agent backends, aliases, required credentials, and default models. Use this before calling run_agent_pass with backend: "auto".

Inputs: none.

Returns: array of objects, one per backend.

Field	Type	Description
`name`	string	User-facing selector (`offline`, `openai`, `anthropic`).
`backend`	string	Internal backend name written into evidence.
`aliases`	array	Optional alternate selectors, such as `codex` and `oai`.
`available`	bool	Whether the SDK and key are present in the server environment.
`requires`	array	Required packages and environment variables.
`default_model`	string \| null	Default model for the backend. OpenAI defaults to `gpt-5.3-codex`.
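A client might use this payload to choose a concrete backend instead of relying on "auto". The sketch below is one way to do that, not WARDEN's own selection logic: pick_backend and its preference order are hypothetical, and the sample payload is illustrative (the backend and requires values are made up).

```python
def pick_backend(backends, preferred=("anthropic", "openai", "offline")):
    """Return the `name` of the first preferred selector that reports available=True.

    Selectors are matched against both `name` and `aliases`.
    """
    by_selector = {}
    for entry in backends:
        by_selector[entry["name"]] = entry
        for alias in entry.get("aliases") or []:
            by_selector[alias] = entry
    for selector in preferred:
        entry = by_selector.get(selector)
        if entry is not None and entry["available"]:
            return entry["name"]
    return None


# Illustrative payload only; the `backend` and `requires` values are assumptions.
sample = [
    {"name": "offline", "backend": "offline", "aliases": [], "available": True,
     "requires": [], "default_model": None},
    {"name": "openai", "backend": "openai", "aliases": ["codex", "oai"], "available": False,
     "requires": ["OPENAI_API_KEY"], "default_model": "gpt-5.3-codex"},
]
print(pick_backend(sample))  # falls through to "offline" for this sample
```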

list_versions()

List all ingested module versions in the project. Call this first to discover the version_id values needed by every other tool.

Inputs: none.

Returns: array of objects, one per version.

Field	Type	Description
`id`	int	Internal version ID. Pass this to `coverage`, `list_functions`, etc.
`label`	string	Human label (e.g. `"v1"`, `"v2"`).
`emscripten_version`	string \| null	Inferred Emscripten version, if detected.
`functions`	int	Total function count (imported + defined).
`shared_memory`	bool	Whether shared memory was detected in the module.

Example response:

[
  {
    "id": 1,
    "label": "v1",
    "emscripten_version": "3.1.55",
    "functions": 412,
    "shared_memory": false
  }
]

coverage(version_id)

Symbol-coverage statistics for a specific version. Use this to decide whether to run an agent pass or how much human review remains.

Inputs:

Parameter	Type	Description
`version_id`	int	From `list_versions`.

Returns: a single object.

Field	Type	Description
`defined`	int	Number of defined (non-imported) functions.
`named`	int	Number that have any name binding.
`coverage_pct`	float	`named / defined * 100`.
`oracle_named`	int	Names contributed by the Emscripten Oracle.
`human_named`	int	Names set by a human operator.
`agent_named`	int	Names proposed by an agent (including via MCP).

Example response:

{
  "defined": 380,
  "named": 201,
  "coverage_pct": 52.9,
  "oracle_named": 148,
  "human_named": 12,
  "agent_named": 41
}
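The coverage_pct formula can be checked against the example response with a few lines of Python. This is a sketch of the arithmetic only; rounding to one decimal and returning 0.0 for an empty module are assumptions made here, not documented server behaviour.

```python
def coverage_pct(defined, named):
    """named / defined * 100, rounded to one decimal (0.0 when nothing is defined)."""
    if defined == 0:
        return 0.0
    return round(named / defined * 100, 1)


def unnamed_remaining(report):
    """Defined functions that still have no name binding."""
    return report["defined"] - report["named"]


report = {"defined": 380, "named": 201, "oracle_named": 148, "human_named": 12, "agent_named": 41}
print(coverage_pct(report["defined"], report["named"]))  # 52.9, matching the example
print(unnamed_remaining(report))                          # 179
```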

list_functions(version_id, include_imports)

List every function in a version with its current annotation state. To find work remaining for an agent pass, filter for name == null or confidence < threshold.

Inputs:

Parameter	Type	Default	Description
`version_id`	int	required	Version to query.
`include_imports`	bool	`false`	Include imported (host-provided) functions.

Returns: array of objects, one per function.

Field	Type	Description
`index`	int	Wasm function table index.
`stable_id`	string	Stable content identity. Use this as the key to `get_symbol` and `propose_symbol`.
`type`	string \| null	Wasm type signature (e.g. `"(i32, i32) -> i32"`).
`name`	string \| null	Current name, or null if unnamed.
`provenance`	string \| null	Who assigned the name: `oracle`, `human`, `agent`, `export`, `diff-carry`.
`confidence`	float \| null	Confidence score (0.0–1.0), or null if unnamed.

stable_id is the cross-version key. The same logical function carries the same stable_id across rebuilds even when its table index shifts. See core concepts for how stable identity is computed.
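The "work remaining" filter mentioned above might look like the following on the client side. This is a sketch: work_remaining is a hypothetical helper, the 0.5 default threshold is borrowed from the run_agent_pass default, and the sample rows (including the stable_id values) are invented.

```python
def work_remaining(functions, threshold=0.5):
    """Keep rows that are unnamed or whose confidence is below `threshold`."""
    todo = []
    for row in functions:
        unnamed = row["name"] is None
        # confidence is null for unnamed rows, so guard before comparing.
        weak = row["confidence"] is not None and row["confidence"] < threshold
        if unnamed or weak:
            todo.append(row)
    return todo


rows = [
    {"index": 7, "stable_id": "sid-a", "name": None, "provenance": None, "confidence": None},
    {"index": 8, "stable_id": "sid-b", "name": "parse_header", "provenance": "agent", "confidence": 0.4},
    {"index": 9, "stable_id": "sid-c", "name": "malloc", "provenance": "oracle", "confidence": 0.95},
]
print([r["stable_id"] for r in work_remaining(rows)])  # ['sid-a', 'sid-b']
```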

get_function_facts(version_id, func_index)

Fetch the grounded FunctionFacts object for one defined function. Use this when an external MCP client wants to produce its own proposal while staying constrained to evidence from the binary and KB.

Inputs:

Parameter	Type	Description
`version_id`	int	From `list_versions`.
`func_index`	int	Wasm function table index.

Returns: null if the function is missing or imported, otherwise:

Field	Type	Description
`func_index`	int	Wasm function table index.
`stable_id`	string	Stable function identity.
`type_signature`	string	Wasm type signature.
`call_targets`	array	Direct import call targets and `<indirect>` sites.
`referenced_strings`	array	Strings referenced through decoded constants.
`raw_name`	string \| null	Name-section hint, if present.
`instruction_mnemonics`	array	Decoded opcode mnemonics.
`is_exported`	bool	Whether the function is exported.
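An external client that builds its own proposal could render these facts as plain text for a model prompt. The sketch below assumes the array fields hold strings and uses a hypothetical helper name (facts_to_prompt); the output layout is arbitrary and not something WARDEN prescribes.

```python
def facts_to_prompt(facts, max_mnemonics=40):
    """Render a FunctionFacts-shaped dict as plain text, or None when there are no facts."""
    if facts is None:  # the tool returns null for missing or imported functions
        return None
    lines = [
        f"function index: {facts['func_index']}",
        f"stable id: {facts['stable_id']}",
        f"signature: {facts['type_signature']}",
        f"exported: {facts['is_exported']}",
        f"name-section hint: {facts['raw_name'] or '(none)'}",
        "calls: " + (", ".join(facts["call_targets"]) or "(none)"),
        "strings: " + (", ".join(repr(s) for s in facts["referenced_strings"]) or "(none)"),
        # Truncate long bodies so the prompt stays small.
        "opcodes: " + " ".join(facts["instruction_mnemonics"][:max_mnemonics]),
    ]
    return "\n".join(lines)


# Invented example values, shaped like the table above.
example = {
    "func_index": 42, "stable_id": "sid-a", "type_signature": "(i32) -> i32",
    "call_targets": [], "referenced_strings": ["bad header"], "raw_name": None,
    "instruction_mnemonics": ["local.get", "i32.load", "return"], "is_exported": False,
}
print(facts_to_prompt(example))
```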

get_symbol(stable_id)

Fetch the complete annotation record for a single function by its stable identity. Returns null if no annotation exists yet.

Inputs:

Parameter	Type	Description
`stable_id`	string	From `list_functions`.

Returns: null, or a single object.

Field	Type	Description
`stable_id`	string	Echo of the input key.
`name`	string \| null	Current name.
`summary`	string \| null	Free-text description of purpose, parameters, and behaviour.
`type_signature`	string \| null	Wasm type signature.
`provenance`	string	Source of the annotation (`oracle`, `human`, `agent`, `export`, `diff-carry`).
`confidence`	float	Confidence score (0.0–1.0).
`locked`	bool	If true, only a human operator can overwrite this entry.
`evidence`	array	Evidence trail: a list of objects describing why this name was assigned.

locked: true means a human called warden set-name with the default --lock flag. propose_symbol refuses to write to a locked symbol and returns written: false.

search_symbols(query, limit)

Search every version for function symbols whose name matches a substring. Use this to locate a function by name across the whole project without scanning each version’s list_functions payload. The match is case-insensitive and results are sorted by name. This is a read; it has no side effects.

Inputs:

Parameter	Type	Default	Description
`query`	string	required	Case-insensitive substring matched against the symbol name.
`limit`	int	`50`	Maximum number of matches to return.

Returns: array of objects, one per matching symbol, sorted by name.

Field	Type	Description
`stable_id`	string	Stable function identity.
`name`	string	Current name (only named symbols are matched).
`provenance`	string	Source of the name (`oracle`, `human`, `agent`, `export`, `diff-carry`).
`confidence`	float	Confidence score (0.0–1.0).
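The matching rules can be modelled in memory as follows. This is a sketch of the documented semantics (case-insensitive substring, named symbols only, sorted by name, capped at limit), not the server's query; search_symbols_model is a hypothetical name, and the plain string sort is an assumption because the documentation does not say how the sort treats case.

```python
def search_symbols_model(symbols, query, limit=50):
    """Case-insensitive substring match on `name`, sorted by name, at most `limit` rows."""
    needle = query.lower()
    hits = [s for s in symbols if s["name"] and needle in s["name"].lower()]
    hits.sort(key=lambda s: s["name"])  # assumption: ordinary string ordering
    return hits[:limit]


symbols = [
    {"stable_id": "sid-a", "name": "zlib_inflate", "provenance": "oracle", "confidence": 0.9},
    {"stable_id": "sid-b", "name": "inflate_fast", "provenance": "agent", "confidence": 0.6},
    {"stable_id": "sid-c", "name": "memcpy", "provenance": "oracle", "confidence": 0.95},
    {"stable_id": "sid-d", "name": None, "provenance": None, "confidence": None},
]
print([s["name"] for s in search_symbols_model(symbols, "INFLATE")])
```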

get_diff(from_version_id, to_version_id)

Compare two ingested versions and report how functions map across the rebuild. Use this after ingesting a new module drop to see what carried over, what changed, and what is new before you run another agent pass. It returns the stored diff report if one was saved; otherwise it computes the diff on the fly with carry=False and store=False, so this stays a pure read with no side effects.

Inputs:

Parameter	Type	Description
`from_version_id`	int	Older version (the baseline).
`to_version_id`	int	Newer version to compare against the baseline.

Returns: a single object (the stored or freshly computed diff report).

Field	Type	Description
`from`	string	Label of the baseline version.
`to`	string	Label of the newer version.
`summary`	object	Per-classification counts (unchanged, modified, added, removed, runtime churn).
`changes`	array	One entry per function change, with its classification, indices, stable ids, score, review flag, runtime flag, and any carried name.

export_kb_text(version_id)

Export the knowledge base for a version as a single plain-text deliverable. Use this to hand a model or a human a readable dump of every named function, its summary, provenance, and confidence without parsing structured rows. This is a read; it has no side effects.

Inputs:

Parameter	Type	Description
`version_id`	int	Version to export.

Returns: a single string: the full plain-text export, one row per function, with a header line and a leading # WARDEN KB export comment. It is git-diffable and stable across runs.

analyze_version(version_id)

Run the deterministic concurrency and struct analyzers over a version, the same passes that warden analyze runs. The recovered thread model and struct layouts are written to the KB, so this tool performs economy-gated writes and is not a pure read. It needs no API key and no network.

Inputs:

Parameter	Type	Description
`version_id`	int	Version to analyze.

Returns: a single object with the persisted facts as JSON-friendly lists.

Field	Type	Description
`thread_facts`	array	One object per recorded thread-model fact (atomic sites and the like) read back from the KB.
`structs`	array	One object per recovered struct layout, each with its `name`, `source_function`, and `fields` (offset, size, type, name).

The atomic sites and struct layouts written here go through the same provenance/confidence economy as every other write, so re-running analyze_version is idempotent and never overwrites a higher-confidence or locked entry.
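A client can do simple sanity checks on the returned structs entries. The sketch below assumes each field is an object with integer offset and size keys, as the table above suggests; struct_extent and field_gaps are hypothetical helpers, and the sample struct is invented.

```python
def struct_extent(struct):
    """Bytes covered by the recovered fields: the largest offset + size, or 0 if empty."""
    return max((f["offset"] + f["size"] for f in struct["fields"]), default=0)


def field_gaps(struct):
    """(start, end) byte ranges below the extent that no recovered field covers."""
    gaps, cursor = [], 0
    for f in sorted(struct["fields"], key=lambda f: f["offset"]):
        if f["offset"] > cursor:
            gaps.append((cursor, f["offset"]))
        cursor = max(cursor, f["offset"] + f["size"])
    return gaps


# Invented example, shaped like one `structs` entry.
layout = {
    "name": "struct_0",
    "source_function": "parse_header",
    "fields": [
        {"offset": 0, "size": 4, "type": "i32", "name": "field_0"},
        {"offset": 8, "size": 8, "type": "i64", "name": "field_8"},
    ],
}
print(struct_extent(layout), field_gaps(layout))  # 16 [(4, 8)]
```

A gap does not necessarily mean padding; it may simply be a field the analyzer never saw accessed.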

run_agent_pass(version_id, backend, only_unconfident, rounds)

Run the same propose → verify → write-back pass that warden agent runs. This lets an MCP client trigger WARDEN’s built-in offline, OpenAI/Codex, or Anthropic backend instead of reimplementing the loop.

Inputs:

Parameter	Type	Default	Description
`version_id`	int	required	Version to annotate.
`backend`	string	`"auto"`	`auto`, `offline`, `openai`, `codex`, `oai`, or `anthropic`.
`only_unconfident`	bool	`true`	Skip symbols that are locked or already at confidence `>= 0.5`.
`rounds`	int	`1`	Number of bottom-up rounds to run. Each later round re-enriches facts so a function named earlier informs its callers; the loop stops early at a fixpoint.

Returns: one summary object.

Field	Type	Description
`backend`	string	Backend that actually ran.
`considered`	int	Functions visited.
`proposed`	int	Proposals returned by the backend.
`written`	int	Proposals written to the KB.
`rejected_by_verifier`	int	Proposals blocked by `verify_proposal`.
`rejected_by_economy`	int	Proposals blocked by `KnowledgeBase.upsert_symbol`.
`skipped_existing`	int	Locked or already-confident functions skipped before backend call.
`specialist_proposed`	int	Specialist (atomics, struct) proposals that went through the verifier and economy.
`specialist_written`	int	Specialist proposals that the economy actually wrote.
`rounds_run`	int	How many bottom-up rounds actually ran (may be fewer than `rounds` at a fixpoint).
`details`	array	Per-function write, verifier reject, or economy reject details.

Provider-backed runs use the server process environment. Set OPENAI_API_KEY for the OpenAI/Codex backend or ANTHROPIC_API_KEY for the Anthropic backend before the MCP client starts the server.
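Before launching the server, a wrapper script could check that the variable a given backend needs is present, and a client could predict which symbols only_unconfident will skip. Both helpers below are hypothetical sketches built from the rules stated in this section; "auto" is deliberately left out of the mapping because its resolution is not documented here.

```python
# Selector -> required environment variable, per the paragraph above.
# "auto" is intentionally absent: how it resolves is not documented here.
REQUIRED_ENV = {
    "offline": None,
    "openai": "OPENAI_API_KEY",
    "codex": "OPENAI_API_KEY",
    "oai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
}


def missing_credential(backend, env):
    """Return the variable name that `backend` needs but `env` lacks, else None."""
    var = REQUIRED_ENV[backend]
    if var is None or env.get(var):
        return None
    return var


def skipped_as_existing(symbol, only_unconfident=True, threshold=0.5):
    """Mirror the documented skip rule: locked, or already at confidence >= 0.5."""
    if not only_unconfident or symbol is None:
        return False
    return bool(symbol["locked"]) or symbol["confidence"] >= threshold


print(missing_credential("codex", {}))  # OPENAI_API_KEY
print(skipped_as_existing({"locked": False, "confidence": 0.5}))  # True
```

In a real script, pass os.environ as env; a plain dict is used here so the example is deterministic.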

propose_symbol(stable_id, name, summary, confidence)

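Propose a name and optional summary for a function. This is the only tool that writes a caller-supplied annotation; analyze_version and run_agent_pass also write to the KB, but only with facts and proposals that WARDEN generates itself. It always records provenance: "agent" and actor: "agent:mcp" in the evidence trail. These values are injected by the server and cannot be supplied by the caller.

Inputs: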

Parameter	Type	Default	Description
`stable_id`	string	required	Target function’s stable identity.
`name`	string	required	Proposed function name.
`summary`	string	`""`	Optional description of purpose, parameters, and behaviour.
`confidence`	float	`0.5`	Agent’s self-assessed confidence (0.0–1.0).

Returns: a single object.

Field	Type	Description
`written`	bool	Whether the KB was updated.
`reason`	string	Human-readable explanation of the decision.

Example: accepted write

{ "written": true, "reason": "no prior annotation; slot was empty" }

Example: rejected write

{ "written": false, "reason": "symbol is locked by a human operator" }

The provenance/confidence economy

propose_symbol goes through exactly the same economy gate as every other write path in WARDEN. The CLI, the library, and the agent pipeline all converge on KnowledgeBase.upsert_symbol. There is no separate bypass for MCP. The authority ordering from highest to lowest is:

human (locked)  >  human (unlocked)  >  oracle  >  agent  >  diff-carry  >  (unnamed)

The rules propose_symbol must satisfy:

Unnamed slots are always accepted

If no annotation exists yet, the write goes through unconditionally.

A locked human annotation blocks every agent write

The response is written: false, regardless of the incoming confidence.

Any human annotation (unlocked) blocks every agent write

Agents sit below humans in the hierarchy.

An oracle annotation blocks an agent write unless the incoming confidence exceeds the existing score

An agent that is very confident about a name can displace a weak oracle match, but this is intentionally rare.

An existing agent annotation is overwritten only if the incoming confidence is strictly higher

Running the same agent crew twice on the same KB converges; it does not thrash.

This means you can run an MCP-driven agent crew repeatedly (on every version drop, for instance) without requiring a human gating step. Work that humans or the Oracle have already done is never regressed.
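The five rules above can be sketched as a single decision function. This is an illustration of the documented rules, not the code behind KnowledgeBase.upsert_symbol: agent_write_decision is a hypothetical name, the first two reason strings are taken from the propose_symbol examples and the rest are invented, and the handling of diff-carry and of provenances with no stated rule (such as export) is an assumption.

```python
def agent_write_decision(existing, incoming_confidence):
    """Decide whether an agent proposal may replace `existing`.

    `existing` is a get_symbol-shaped dict, or None when no annotation exists.
    """
    if existing is None or existing.get("name") is None:
        return {"written": True, "reason": "no prior annotation; slot was empty"}
    if existing["locked"]:
        return {"written": False, "reason": "symbol is locked by a human operator"}
    provenance = existing["provenance"]
    if provenance == "human":
        return {"written": False, "reason": "human annotations outrank agent proposals"}
    if provenance in ("oracle", "agent"):
        # Strictly higher confidence is required, so a rerun with equal scores is a no-op.
        if incoming_confidence > existing["confidence"]:
            return {"written": True, "reason": f"incoming confidence exceeds existing {provenance} score"}
        return {"written": False, "reason": f"existing {provenance} annotation has equal or higher confidence"}
    if provenance == "diff-carry":
        # Assumption: agent outranks diff-carry in the ordering, so the write is accepted.
        return {"written": True, "reason": "agent outranks diff-carry"}
    # Assumption: provenances without a documented rule (such as export) are left alone.
    return {"written": False, "reason": f"no documented rule for provenance {provenance!r}"}


prior = {"name": "parse_header", "provenance": "agent", "confidence": 0.5, "locked": False}
print(agent_write_decision(prior, 0.5))  # same confidence again: not written
```

The strict comparison is what makes repeated runs converge: proposing the same name at the same confidence a second time changes nothing.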

See core concepts for the full rationale behind the provenance/confidence economy.

Library usage

If you want to embed the server inside a larger Python process rather than launching it as a subprocess, use build_server directly:

from warden.mcp import build_server

server = build_server("project.db")
server.run()   # blocks; serves over stdio

build_server raises RuntimeError with an actionable message if the mcp package is not installed, and FileNotFoundError if the project database does not exist.
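An embedding process will usually want to turn those two errors into something its caller can act on. The wrapper below is a sketch: serve, its build parameter, and the exit codes are hypothetical, while build_server, RuntimeError, and FileNotFoundError are the documented pieces. The build parameter exists only so the wrapper can be exercised without WARDEN installed.

```python
import sys


def serve(db_path, build=None):
    """Build and run the server, mapping the two documented errors to exit codes."""
    if build is None:
        # Imported lazily so this module loads even when WARDEN is not installed.
        from warden.mcp import build_server as build
    try:
        server = build(db_path)
    except RuntimeError as exc:  # documented: the mcp package is not installed
        print(f"cannot start MCP server: {exc}", file=sys.stderr)
        return 2
    except FileNotFoundError as exc:  # documented: the project database does not exist
        print(f"project database not found: {exc}", file=sys.stderr)
        return 1
    server.run()  # blocks; serves over stdio
    return 0
```

A caller would typically end with sys.exit(serve("project.db")).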

Limitations (alpha)

The MCP server is alpha. The tool surface is functional but incomplete.

Limitation	Status
stdio only. No HTTP or SSE transport. The server must run as a subprocess managed by the client.	Current
Function-oriented. Struct layouts can be read back through `analyze_version` but not edited through MCP, and globals and memory-region annotations are not yet exposed as MCP tools.	Planned
No pagination. `list_functions` returns all rows in one response; large modules may produce large payloads.	Planned
Per-call connections. Each tool call opens and closes a fresh `KnowledgeBase` connection. Safe for concurrent callers; not optimized for high-throughput loops.	Planned

These are tracked in roadmap.
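Until pagination exists, a client can split the list_functions payload into fixed-size batches itself, for example to keep each model prompt small. A minimal sketch with a hypothetical helper name:

```python
def batches(rows, size):
    """Yield consecutive slices of `rows`, each holding at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


rows = list(range(5))  # stands in for the rows returned by list_functions
for batch in batches(rows, 2):
    print(batch)
```

This only bounds the work done per step on the client; the full response is still transferred in one piece.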
