Fix pipeline — deterministic test-suite reorganiser
The axm-audit fix subcommand and the audit_fix MCP tool
(AuditFixTool in
axm_audit.tools.audit_fix) drive a deterministic pipeline that
batch-relocates, splits, merges and renames test files to satisfy the
AXM 3-level pyramid + canonical naming conventions. It consumes findings from three rules:
TEST_QUALITY_PYRAMID_LEVEL(unit vs integration vs e2e)TEST_QUALITY_FILE_NAMING(test_{symA}__{symB}.pyfor integration/e2e)TEST_QUALITY_NO_PACKAGE_SYMBOL(out-of-pipeline; surfaced for manual review)
The pipeline runs in dry-run by default and mutates only on --apply.
Pipeline architecture
0.5 NON-CANONICAL-RELOCATE tests/functional/* → tests/integration/
0. FLATTEN heterogeneous Test* classes → top-level funcs
1. RELOCATE PYRAMID_LEVEL mismatch → git mv across tiers
1.5 FLATTEN_LAYOUT tests/<tier>/<subdir>/ → flat layout
2. SPLIT FILE_NAMING verdict=SPLIT → anvil moves units
3. COLLIDE / MERGE FILE_NAMING verdict=COLLIDE → anvil moves units
4. RENAME FILE_NAMING verdict=NAME_MISMATCH → git mv
The whole pipeline runs inside a fixed-point loop (MAX_ITERATIONS=6),
since each mutation can expose new findings the audit could not see on
the previous iteration. Iteration stops early when a pass emits zero ops.
In dry-run mode the loop runs exactly once (no mutation).
NO_PACKAGE_SYMBOL findings are out-of-pipeline — the verdict is
context-dependent (legitimate formal check vs. candidate for deletion)
and surfaced in a separate report section pointing the user to
/scenario-rename or manual inspection.
Module layout (hexagonal split)
The applicator lives at src/axm_audit/core/fix/, 13 modules organised by
hexagonal layer:
core/fix/
├── __init__.py — public API: run, format_report, PipelineReport, FileOp
├── models.py — FileOp, OpKind, PipelineReport + constants
│ (NON_DETERMINISTIC_RULES, CANONICAL_TIERS, MAX_ITERATIONS, TOP_K)
├── io_primitives.py — _cst_load/save/top_level/unwrap + _git_mv
├── paths.py — _tier_for_path, _retier, _safe_filename,
│ _module_path_for_test_file, _file_depth_from_project
├── tests_ast.py — read-only AST: tests, classes (pathological detection),
│ helpers, markers (usefixtures), imports analysis
├── cst_rewrite.py — write CST: flatten class, rename, delete, reorder,
│ depth patch (__file__), imports (insert/dedupe/
│ backfill) + project import index cache
├── findings.py — audit ingestion, canonical filename, collect_unfixable
├── layout_and_move.py — relocate_non_canonical_tiers (Stage 0.5),
│ flatten_tier_layout (Stage 1.5),
│ _rewrite_cross_test_imports,
│ _safe_move_units (wraps anvil),
│ _resolve_helper_conflicts / _resolve_conftest_shadowing
├── stages_plan.py — plan_flatten / plan_relocate / plan_naming (pure)
├── stages_execute.py — _execute_flatten/_relocate/_rename/_split/_merge
│ + execute() dispatcher
├── extract_helpers.py — post-pipeline helper extraction to
│ tests/<tier>/_helpers.py or conftest.py
├── pipeline.py — run() + fixed-point loop + _ruff_format_tests
└── report.py — format_report CLI output
Dependency layers
report → models
pipeline → models, stages_plan, stages_execute, extract_helpers,
layout_and_move, cst_rewrite, findings
extract_helpers → cst_rewrite, io_primitives, paths, tests_ast
stages_execute → cst_rewrite, findings, io_primitives, layout_and_move,
models, paths, tests_ast
stages_plan → findings, models, paths, tests_ast
layout_and_move → cst_rewrite, io_primitives, models, paths, tests_ast
findings → models, paths, tests_ast, (lazy: stages_plan)
cst_rewrite → io_primitives, paths, tests_ast
tests_ast → (stdlib only)
paths → (stdlib only)
io_primitives → libcst
models → (stdlib only)
One lazy cycle: findings.collect_unfixable → stages_plan.plan_flatten
(needed to surface pathological FILE_NAMING cases the pipeline can't
auto-fix).
Real-world surprises (documented in the code)
-
Anvil's
rename=param doesn't help cross-file collisions — it validates target absence under the original name before applying the rename. The pipeline works around it by renaming the symbol in source first (via_rename_top_level_in_sourceincst_rewrite.py), then handing anvil a clean conflict-free move. -
audit_project()cache breaks after in-flight mutations — calling it post-apply raisesFileNotFoundErroron cached paths.collect_unfixable()infindings.pyswallows the exception defensively. -
if TYPE_CHECKING:imports are invisible to anvil —MockerFixtureimported only inside that block is treated as not-imported when the symbol using it (only via type annotation) is moved._backfill_missing_imports()incst_rewrite.pywalksif TYPE_CHECKING:blocks and reproduces the wrapper at the target.
Bug-class history
The pipeline went through 14 iteration passes during productisation. The four resolved bug classes worth remembering:
B1 — pathological-AND-heterogeneous Test* classes
A class with divergent canonicals AND a feature blocking flatten
(self.<attr>, custom base, __init__) survives Stage 0 silently and
breaks Stage 2 SPLIT (which can only route 4 of 5 canonical targets).
_file_has_pathological_class() (tests_ast.py) is a cheap pre-check
used by plan_naming (stages_plan.py) to skip SPLIT planning for
affected files. Pathological cases now surface as collect_unfixable
entries pointing to /scenario-rename.
B2 — SPLIT/MERGE/RENAME planned for non-canonical tier paths
Planner emitted ops for tests/functional/test_X.py but executor
skipped them silently. plan_naming (stages_plan.py) now filters
so SPLIT/COLLIDE/RENAME only land on canonical paths; non-canonical
files are routed through Stage 0.5 RELOCATE first.
B3 — convergence (unanimity rule)
The fixed-point loop oscillated on files containing tests with mixed
tier verdicts (one integration + one unit). plan_relocate used
to skip cur == lvl findings and only count mismatches, so when the
file lived in integration/ it saw "1 finding → unit", relocated,
then on the next pass saw "1 finding → integration", and oscillated
forever.
Resolution: plan_relocate (stages_plan.py) now counts every test's
target level (including cur == lvl ones). A file is relocated only
when all tests agree on a single target distinct from the current
tier. Mixed files are left for manual /scenario-rename or hand-splitting.
Conservative on purpose: sacrifices some auto-resolution for guaranteed
convergence.
B4 — non-canonical tier directories ignored
Files under tests/functional/ (or any non-canonical tier) were tier-less,
so SPLIT/MERGE/RENAME silently refused them. relocate_non_canonical_tiers()
in layout_and_move.py now runs as Stage 0.5 (before Stage 1 RELOCATE)
and moves every non-canonical tier file into tests/integration/.
Subsequent stages then see only canonical paths. CANONICAL_TIERS in
models.py documents the allow-list (unit, integration, e2e).
tests/fixtures/ is surgically excluded via the private
_NON_TEST_DIR_NAMES set in layout_and_move.py — by AXM convention
that directory holds static test data (corpora, snapshots, baselines)
consumed by real tests, not test files. Other non-canonical tiers
(tests/functional/, tests/hooks/, ...) are still relocated.
B5 — non-atomic apply could corrupt a real suite
run(apply=True) mutated tests/ in place with no transaction: any
exception mid-cascade (e.g. an unguarded KeyError on a runtime
suite-derived key in the import backfill, surfacing as an opaque
"test_basic") left the tree half-written, and a botched move/split
could emit a module that no longer parses. The skill's parity/idempotence
invariants were only checked on the corpus, never enforced on the live
tree.
run(apply=True) is now wrapped by _run_apply_atomic():
- snapshot —
_snapshot_tests()copies the wholetests/tree into a temp dir outside the project (so a botched in-place mutation can never touch the backup); - mutate — the fixed-point loop + post-polish run as before;
- collect gate —
_collect_only_gate()does an in-processcompile()sweep of everytest_*.py(the cheap, deterministic equivalent ofpytest --collect-onlyfor the dominant failure class: a file that no longer parses — a real subprocess per apply made the suite time out); - promote / rollback — on success the snapshot is discarded; on
any exception or a failed gate,
_restore_tests()restores the tree byte-identical and aFixApplyErrorwith an actionable message is raised (never the bare"test_basic"token; the original cause is chained via__cause__).
Import-backfill hardening (cst_rewrite.py): _extend_recoverable()
guards both donor lookups against KeyError (a miss degrades to
"unresolved", never crashes), and a name that genuinely resolves to no
donor is reported as a structured unresolved import for <name> warning
(stashed under the _UNRESOLVED_KEY sentinel by
_resolve_recoverable_imports and re-emitted by
_backfill_missing_imports) instead of being dropped silently.
TYPE_CHECKING-only donor imports (e.g. MockerFixture used solely in a
method-parameter annotation under if TYPE_CHECKING:) are carried into
every split/relocate target that references them, so collection no longer
fails with a post-fix NameError.
Hardened edge cases (collateral fixes)
Bug classes uncovered during cross-corpus validation. Each entry explains a non-obvious code path you'd otherwise wonder about:
_make_pkgsignature drift across files —_resolve_helper_conflicts()(layout_and_move.py) renames the source-side helper toH__from_<stem>when source and target define helpers with the same name but different bodies.shared_helpers="duplicate"then copies cleanly.@pytest.mark.usefixtures("X")invisible to anvil — anvil walks AST refs on moving symbols, but marker arguments are string literals, so fixtures injected via marker used to stay in source and disappear when source was stripped._collect_marker_fixtures_to_move()(tests_ast.py) now scans moving units forusefixturesmarkers and adds the referenced fixtures to the move list when they're source-defined and not target-visible.- Decorator-referenced module-level constants dropped on SPLIT — a
test decorated with
@_aliaswhere_alias = pytest.mark.skipif(...)is a top-level Assign used to produce un-collectable target files: anvil moved the test but left_alias(and any constants it referenced likeCASES) behind, then_finalize_split_anchorgit-mv-renamed source to the anchor target with the moved constants already stripped from source — so even the anchor lost them._collect_module_level_deps_to_copy()(tests_ast.py) computes the transitive closure of free-name references inside decorators of the moving units, restricted to top-levelAssign/AnnAssigndefs in source._copy_module_level_deps_to_target()(layout_and_move.py) splices those statements into the target as text after anvil runs, without touching source — so the anchor's source-rename keeps the constants too. Closure follows chains (B = A + 1→ also carryA = 1), preserves source order, and is surgical: unrelated top-level names are not carried. - SPLIT / MERGE module-level load-time order — when a moving unit
references names from its decorator chain (e.g.
def helper(): ...+CONST = helper()+_alias = pytest.mark.skipif(not CONST, ...)+@_alias @pytest.mark.parametrize("c", CONST)), the combination of anvil'sshared_helpers="duplicate"mode and_copy_module_level_deps_to_targetcan leave the target with top-level statements in an order that breaks module load (_alias = ... not CONST ...evaluating beforeCONST = helper(), orCONST = helper()evaluating beforedef helper)._topological_reorder_decorator_deps()(layout_and_move.py) runs as the last step of_safe_move_unitsand is target-centric: it builds a top-level dependency graph from the target tree alone, spanningAssign/AnnAssign/FunctionDef/AsyncFunctionDef/ClassDef. FreeName(Load)references are walked inAssign.value/AnnAssign.value+annotationandFunctionDef.decorator_list/ClassDef.decorator_list+bases+keywords— bodies of defs are NOT load-evaluated and are intentionally skipped. The hoist set is the transitive closure of names referenced by any module-level decorator; hoisted statements are topologically sorted (deps before dependents) and reinserted before the first remaining statement that load-time-references any hoisted name. The pre-anvil source tree is used only as a stable tiebreaker between valid linearisations — not as the source of truth for ordering, which matters when a target receives content from multipleFileOps (split-with-multiple-targets, merge, or both): no single source tree describes all contributors. Statements outside the hoist set keep their existing relative position — the pass is surgical and idempotent (running it twice on a correctly-ordered target is a no-op). Path(__file__).parents[N]drift after relocate — a file moved fromtests/unit/core/test_X.py(4-deep) totests/integration/test_X.py(3-deep) keeps itsFIXTURES = Path(__file__).parents[2] / "fixtures"constant, which now points to project root instead oftests/._patch_file_dunder_depth()(cst_rewrite.py) detects the depth delta from_file_depth_from_project()and rewrites both surface forms (parents[N]subscript and.parent.parent...chains) via libcst. The id-collection pre-pass MUST run after_DunderPatchersince libcst rebuilds nodes during any visit (even no-op transforms), so originalmoduleids are stale by the time_PatchChainOnceruns.- Conftest fixture shadowing on MERGE — when source has no local
fixture
X(relies on conftest's body) but target has a localXwith a different body, the moved tests bind to target's local at runtime and fail withSymbol not found._resolve_helper_conflictsalso renames the source-side helper when target lacks it BUT a conftest on target's ancestor chain provides a fixture of the same name._collect_conftest_fixtures()(tests_ast.py) walks the chain. - Helper-rename misses transitive references —
_resolve_helper_conflictsused to check helper collisions only against names directly referenced by the moving units. A test consuming a fixture which itself calls a helper would miss the helper at rename time and bind to target's same-named but body-different version. Fix: extendreferencedwith a fixed-point closure that walks names → source helpers → names-they-reference until no new helpers appear. - Promoted helper's dependencies left behind — when
_extract_shared_helpers_in_tier()(extract_helpers.py) promotes a fixture totests/conftest.py, its helper dependencies (already living intests/<tier>/_helpers.py) are not imported by the destination conftest._synth_import_from_helpers()incst_rewrite.pyis a last-resort backfill in_backfill_missing_imports()that scans everytests/<tier>/_helpers.pyfor a top-level def of the missing name and synthesises afrom tests.<tier>._helpers import <name>statement. _git_mvsilent overwrite — theshutil.movefallback used whengit mvrefused (target exists) used to destroy pre-existing files._git_mv()inio_primitives.pynow raisesFileExistsErroron pre-existing targets;_execute_relocate/_execute_rename(stages_execute.py) re-route through_safe_move_units(via the_reroute_through_safe_movehelper). The pre-call existence check isexists()(notis_file()) so a pre-existing directory at the target doesn't slip through._git_mvrace when two ops target the same destination in one iteration —shutil.Error("Destination path already exists")is caught and translated toFileExistsError, then re-routed through_safe_move_units(same path as the silent-overwrite case above)._retierate the.pyextension for tests attests/root — fortests/test_X.py(no tier subdir yet) the function used to doparts[1] = target_lvland return a directory path, then_safe_move_unitscrashed withIsADirectoryErroronast.parse(target.read_text()).paths.py:_retiernow branches onlen(parts) == 2: inject the tier betweentestsand the file instead of substituting at index 1.- Overlong renamed identifiers (E501) —
_bounded_renamein_safe_move_units(layout_and_move.py) falls back to a 6-char sha1 digest of the stem (__from_<digest>) when the verbose form would push the def line past 88 chars. Stem hashing keeps cross-source-file uniqueness; the verbose form is preferred when it fits. - Split docstring placement (E402) — SPLIT seeds a new file with a
module docstring before anvil prepends imports.
_reorder_module_statements(cst_rewrite.py) used to treat any leadingast.Expr/Constant/stras part of the import head, leaving the docstring after the imports (E402 trigger on every subsequent import). It now detects the docstring as a separate statement (PEP 257: must be the very first body element to count as one) and unconditionally promotes it to position 0 in the rewritten body.
Forward-looking — possible migration to axm-ast
The higher-level helpers in tests_ast.py (_top_level_test_classes,
_top_level_helpers, _collect_imported_names) could move to axm-ast
if/when that package exposes raw ast.Module access (it currently wraps
everything in Pydantic models). The fine-grained walkers
(_class_is_pathological, _marker_fixtures_in_unit, _func_body_hash)
are too pytest-specific to belong in a general library — keep them here.
Convergence + parity invariants
A correct pipeline run must satisfy:
- Idempotence — a second dry-run after
--applyplans zero ops. - Parity — pass count and coverage % are unchanged across
--apply. Drift in either direction is a red flag (a test was silently dropped/duplicated, or a fixture extraction broke isolation). - Monotonicity — pyramid score never decreases across iterations.
These invariants are enforced by source-level tests under tests/. They
are not a runtime burden — the pipeline runs the same regardless.
Out of pipeline (agent-driven follow-ups)
TEST_QUALITY_DUPLICATE_TESTS→/dedup-testsTEST_QUALITY_PRIVATE_IMPORTS→/private-imports-clearTEST_QUALITY_PYRAMID_LEVEL(residual mixed-tier files) →/pyramid-relocateTEST_QUALITY_TAUTOLOGY→/tautology-clearPRACTICE_TEST_MIRROR→/mirror-fixTEST_QUALITY_NO_PACKAGE_SYMBOL→/scenario-renameor manual deletion review.