Not sure where to start? See the roadmap for the planned phases and their
current status, or open an issue on GitHub and ask.
The maintainers are happy to point you at something appropriately scoped.
Dev setup
Python 3.10 or later is required. The core path has no native dependencies.
Install with dev extras
make dev is equivalent to pip install -e '.[all]' followed by pre-commit install. The
[all] group is shorthand for [agents,mcp,dev]:
| Extra | What it adds |
|---|---|
| agents | openai>=1.68, anthropic>=0.40 (LLM crew; falls back to offline heuristic without a key) |
| mcp | mcp>=1.2 (Model Context Protocol tool surface) |
| dev | pytest, pytest-cov, ruff, mypy, pre-commit |
For the core path alone, pip install -e . is
sufficient. The hard dependencies are just typer and rich.
Pre-commit hooks
make dev installs pre-commit hooks that run ruff, ruff-format, mypy, and a set of
standard file-health checks (trailing whitespace, YAML/TOML syntax, merge-conflict markers, a
binary-file size guard at 512 KB) on every commit. If a hook rejects your commit, fix the
reported issue and re-stage. Do not skip hooks with --no-verify.
Development workflow
Run make help to see all targets. make check is the gate: CI runs the same target, so
passing it locally means your PR will be green.
Project layout
Each package under src/warden/ maps to one conceptual pipeline stage. New functionality
should go in the matching package; cross-cutting utilities go in the nearest sensible parent.
Testing
The suite is intentionally runnable on a bare checkout with no native toolchain installed.
The samples fixture (no toolchain needed)
src/warden/samples.py is a pure-Python .wasm emitter. It produces three related modules
(reference_module(), app_v1(), app_v2()) that exercise the entire pipeline end-to-end:
ingest, fingerprinting, Oracle match, diff, and carry-over. These are generated at test time in
memory. No Emscripten, WABT, or wasm-tools is required.
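To illustrate why no toolchain is needed, here is a minimal sketch of emitting a WebAssembly binary from pure Python. It is not the code in src/warden/samples.py: the helper names (uleb128, section, custom_section, minimal_module) are hypothetical, and the only facts assumed are the standard binary format (the \0asm magic, a little-endian version 1, and sections framed as an id byte plus a LEB128 size).

```python
def uleb128(value: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    if value < 0:
        raise ValueError("uleb128 requires a non-negative integer")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def section(section_id: int, payload: bytes) -> bytes:
    """Frame a payload as a section: id byte, LEB128 size, then contents."""
    return bytes([section_id]) + uleb128(len(payload)) + payload


def custom_section(name: str, data: bytes) -> bytes:
    """Custom sections use id 0 and start with a length-prefixed UTF-8 name."""
    encoded = name.encode("utf-8")
    return section(0, uleb128(len(encoded)) + encoded + data)


def minimal_module(*sections: bytes) -> bytes:
    """Magic number and version 1 header followed by the given sections."""
    return b"\x00asm" + (1).to_bytes(4, "little") + b"".join(sections)


# A module containing a single (hypothetical) custom section named "demo".
module = minimal_module(custom_section("demo", b"\x2a"))
```

Because the output is plain bytes built in memory, a fixture along these lines can feed the pipeline directly without touching disk or invoking external tools.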
Shared conftest fixtures
tests/conftest.py exposes the following fixtures. Use these rather than creating ad-hoc
modules in individual test files.
| Fixture | Type | Provides |
|---|---|---|
| reference_wasm | bytes | Labeled runtime module with a name section |
| app_v1_wasm | bytes | Stripped v1 target |
| app_v2_wasm | bytes | v2 target with one modified and one new function |
| demo_glue | str | Sample Emscripten JS glue |
| sample_dir | dict[str, Path] | All artifacts written to tmp_path |
| kb | KnowledgeBase | Isolated in-memory-equivalent DB in tmp_path, auto-closed |
Writing tests
- Every non-trivial change should come with a test. PRs that add behavior without tests will be asked to add them before merge.
- Name test files test_<module>.py, mirroring the package they cover.
- Use pytest’s built-in tmp_path fixture for any filesystem work; never write into the repo tree.
- Tests that require optional extras (openai, anthropic, mcp) must be guarded with pytest.importorskip("openai"), pytest.importorskip("anthropic"), or an equivalent skip so CI passes without those extras.
- Avoid network calls in tests. If a test genuinely requires them, mark it with a custom marker and document that it is skipped in standard CI.
Code style
Style is enforced by ruff (lint and format) and mypy (static types). make fmt applies
ruff’s auto-fixes; make check runs both tools and treats failures as errors.
Key ruff settings from pyproject.toml:
| Setting | Value |
|---|---|
| Line length | 100 |
| Target | Python 3.10 |
| Selected rule sets | E, F, I (isort), UP, B, W |
| Ignored | E501 (long lines tolerated in data/tables), B008 (Typer’s Argument/Option-in-defaults pattern is intentional) |
mypy is configured with
ignore_missing_imports = true and warn_unused_ignores = true. Match the annotation density
of the file you are editing. If the surrounding code is fully annotated, your additions must
be too.
Docstrings follow the same density rule: match what is already in the module. Most public
functions have a one-line summary; complex functions include a short description of parameters
and return value. Do not add docstrings to private helpers that lack them in existing code.
from __future__ import annotations appears at the top of every source file; keep that
convention.
The two invariants
Any change that touches identity/ or kb/ must preserve these properties.
Stable identity determinism
The stable_id computed for a given function body must be identical across Python
interpreter runs, platforms, and WARDEN versions. It is the primary key of the knowledge
base: if it drifts, every annotation stored under the old key becomes orphaned.
If you change the fingerprint algorithm in a way that would alter existing stable_id
values, you must write a migration in kb/ and bump the schema version. Include a test
that verifies the new algorithm produces the same stable_id for all fixtures in
tests/conftest.py (or documents the intentional remap if a migration is present).
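The determinism requirement can be sketched as follows. This is not WARDEN's fingerprint algorithm: stable_id_sketch, GOLDEN, and check_golden are hypothetical names, and hashing the raw body with SHA-256 is an assumption chosen only because it is deterministic. The point is the contrast with Python's built-in hash(), which is salted per process for str and bytes and therefore cannot serve as a persistent key.

```python
import hashlib


def stable_id_sketch(body: bytes) -> str:
    """Hypothetical fingerprint: truncated SHA-256 of the function body bytes.

    A cryptographic digest depends only on its input, so the result is the
    same on every interpreter run and platform. The built-in hash() is
    randomized per process for bytes and must not be used for stored keys.
    """
    return hashlib.sha256(body).hexdigest()[:16]


# Golden values pinned in the test suite. If an algorithm change alters any of
# them, the test fails and signals that a KB migration is required.
GOLDEN = {
    b"": "e3b0c44298fc1c14",
}


def check_golden() -> None:
    for body, expected in GOLDEN.items():
        assert stable_id_sketch(body) == expected, body


check_golden()
```

In a real regression test the golden table would presumably be keyed by the conftest fixtures rather than by a literal empty body.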
The provenance economy
Every write to the symbols table must go through KnowledgeBase.upsert_symbol. That
method enforces the rank/confidence ordering described in concepts: human writes
are sovereign; agents may only fill empty slots or overwrite lower-confidence agent output;
other automated sources resolve by rank and then confidence.
Do not insert or update the symbols table directly. A PR that bypasses upsert_symbol
will not merge.
PR etiquette
Branch from main
Use a short, descriptive branch name:
feat/diff-carry-weights, fix/leb128-signed-overflow, docs/mcp-tool-surface.
Keep commits focused
One logical change per commit. The commit message subject should complete the sentence
“This commit…” and use the imperative mood. Squash fixup commits before opening the PR.
Pass make check
Run make check locally before pushing. CI runs the same target (ruff check, mypy,
and pytest), so a local green means a CI green.
Include tests and docs
PRs that add behavior without tests will be asked to add them before merge. Bug-fix PRs
should add a regression test. If your change affects a CLI flag, a KB schema column, a
public API, or a documented pipeline stage, update the relevant file in docs/ or the
inline help strings in cli.py.
Where to start
Good first contributions map to the early phases in the roadmap.
Parser completeness
src/warden/ingest/ handles the common section types. Edge-case sections, import table
coverage, and data segments are good targets that do not require architecture changes.
Oracle corpus expansion
Adding Emscripten runtime signatures to seed_signatures.json in src/warden/oracle/ is
high-value work with a clear interface and no architectural dependency.
Diff carry-over heuristics
The similarity classifier in src/warden/identity/ has room for better structural and
type-sensitive weights. Improvements there sharpen both the Oracle and the diff.
Export formats
src/warden/export/ accepts new emitters: a Ghidra script export, DWARF-like annotation
output, or a structured JSON schema for third-party tooling.
Test coverage
make cov shows what is not yet exercised. Any module below 80% line coverage is a
reasonable, self-contained target.
Documentation
docs/LIMITATIONS.md tracks known gaps. Fixing a limitation and removing it from that list
is a clean, contained contribution.