Test Quality Rules

The test_quality category surfaces rules that reason about the test tree itself — what it imports, how it asserts, which fixtures do real I/O. Unlike testing (which just gates coverage), test_quality rules guard against brittleness: tests that couple to implementation details, tautological asserts, or mock patterns that drift from production behavior.

CLI

Text Only

axm-audit test-quality [PATH] [--json] [--mismatches-only] [--agent]

Runs the test_quality category and prints the four sections (private imports → pyramid → duplicates → tautologies). Use --json for the machine-readable superset (format_test_quality_json), --agent for the compact agent renderer, or --mismatches-only to focus on pyramid folder↔level violations. Exits 1 when the aggregate score falls below PASS_THRESHOLD. See the CLI reference for details.

Private Imports

Rule ID: TEST_QUALITY_PRIVATE_IMPORTS Class: axm_audit.core.rules.test_quality.PrivateImportsRule Severity: ERROR Score: max(0, 100 - n_violations * 5)

Flags tests/**/test_*.py imports of _prefixed symbols from first-party packages. Importing private helpers couples the test suite to implementation details, so a simple refactor of a private function turns into a multi-file chore.

DELETE / REFACTOR / PROMOTE triage

Every private-import finding falls into one of three buckets — the same taxonomy used by the original DECISION_PRIVATE_IMPORTS.md roadmap note:

Bucket	Meaning	AXM example
DELETE	The test asserts a private helper directly; the scenario is already covered by a public-API test. Drop the redundant test.	`axm-engine/tests/unit/test_hooks_internal.py::test__normalize_params` — removed after `test_hook_run.py` covered the same path via the public `Hook.run()` entry point.
REFACTOR	The test is valuable but reaches through a private surface. Replace the private symbol with a fixture, factory, or public seam.	`axm-audit/tests/unit/core/test_coupling_scoring.py` stopped importing `_compute_fan_out` and now drives scoring via a `CouplingMetricRule` instance.
PROMOTE	The private helper is de-facto public; the right fix is to drop the `_` prefix and export it.	`axm-nexus/tests/test_registry.py` triggered promoting `_ResourceCatalog._load` to `ResourceCatalog.load` plus an `__all__` entry.

What it flags

For every test file the rule walks ast.ImportFrom nodes whose module starts with a package under src/. Each imported symbol is inspected:

Symbol shape	Flagged?	Notes
`_private`	yes	Classified via `axm_ast.extract_module_info`
`_PrivateClass`	yes	Kind = `class`
`_UPPER_CASE`	no¹	`_[A-Z][A-Z0-9_]+` matches — constants only
`__dunder__`	no	Always skipped
public name	no	Not `_`-prefixed

¹ Set include_constants=True on the rule to surface _UPPER_CASE constants as well. Each finding records test_file, line, import_module, private_symbol and a symbol_kind (function, class, constant, variable, unknown).

Same-package submodule exemption

Imports of _prefixed modules (not symbols) from the same top-level first-party package as the test file are not flagged. For example, in a package mypkg, from mypkg.sub import _helper resolves to the submodule mypkg/sub/_helper.py and is allowed when the test lives under that package's test tree. Cross-package imports of private submodules (e.g. pkg_b test importing from pkg_a import _helper) remain flagged.

Owning-package detection:

Single-package projects: every test belongs to the lone package.
Multi-package projects: the owner is inferred from the test path under tests/ (e.g. tests/pkg_b/test_x.py → pkg_b); tests not nested under a recognized package directory are treated as cross-package.

Fix recipes

Export the symbol: drop the _ prefix and add it to the package __all__ / __init__.py.
Pull the helper into a test-only fixture or factory.
Move the assertion one layer up so it exercises the public entry point.

Pyramid v6

Rule ID: TEST_QUALITY_PYRAMID_LEVEL Class: axm_audit.core.rules.test_quality.pyramid_level.PyramidLevelRule Severity: WARNING Score: max(0, 100 - n_mismatches * 2)

Classifies every tests/**/test_*.py function into unit, integration, or e2e based on five soft-signal rules (R1–R5) and reports findings when the classified level does not match the folder the test lives in.

Scoping rules

Rule	What it catches	Example signal
R1 — import-IO attribution	Module-level `import httpx` only counts as I/O when the function body references `httpx`. Cuts false positives from shared imports in pure-function tests.	`imports httpx`
R2 — public-only rescue	Tests that import only public (`__all__`) symbols and do no I/O stay `unit`, even under `tests/integration/`. Fires before the generic `has_public → integration` branch.	reason `"public API import, no real I/O"`
R3 — per-function attr-IO + guards	Attr-IO (`.write_text`, `.mkdir`, `open()`) is traced through helper calls up to depth 2; a hard-writer-attr guardrail keeps `mock-neutralized` from downgrading a test whose body genuinely writes. Also covers `tmp_path`-as-arg taint and fixture-arg tracking.	`attr:.write_text()`, `fixture-arg:tmp_path_factory`, `tmp_path-as-arg`
R4 — conftest-aware fixtures	Fixtures defined in ancestor `conftest.py` files (walked up to `tests/` or the package root) are resolved and flagged when they perform real I/O.	`conftest-fixture-io:tmp_db`
R5 — mock neutralization	When `@patch` / `mock.patch` targets an I/O symbol and no hard signal fires (`tmp_path+write/read`, writer `attr:`), the test's `has_real_io` flips back to `False`. Never applies under subprocess / CLI runner.	`mock-neutralized:module.open,module.write_text`

The hard-writer-attr guardrail is explicit in R3: if the function body contains a writer attribute call like .write_text, R5 cannot neutralize it, even under @patch("module.open"). This prevents a misplaced patch decorator from lying about real I/O.

Classification branches

`has_real_io`	`has_subprocess`	`imports_public`	`imports_internal`	Level	Reason
*	True	*	*	`e2e`	subprocess / CLI runner invocation
False	False	True	False	`unit`	public API import, no real I/O (pure function)
True	False	*	*	`integration`	real I/O (with/without imports)
False	False	False	True	`unit`	internal import, no real I/O
False	False	False	False	`unit`	no real I/O, no package import

Duplicates

Rule ID: TEST_QUALITY_DUPLICATE_TESTS Class: axm_audit.core.rules.test_quality.DuplicateTestsRule Severity: WARNING Score: max(0, 100 - n_clustered_pairs * 5)

Clusters likely-duplicate test functions across the tests/**/test_*.py tree using three structural signals and four rescue anti-signals. A "clustered pair" counts against the score only when no rescue fires; ambiguous clusters are surfaced but do not dock points.

Signals

Signal	What it catches	AXM example
S1 — call + assert fingerprint	Same SUT call signature (`mod.func(2)`) and same normalized assert pattern across the tree.	`axm-ticket/tests/unit/test_parse_symbols.py::test_parses_single_symbol` vs `..._parses_two_symbols` — both reduced to `parse(STR) == LIST`; S1 clustered, P1 rescued them on distinct literals.
S2 — cross-file same-name + high similarity	Tests with identical names across files whose statement-set Jaccard ≥ `0.95`.	Two `test_handles_empty_input` — one in `test_parser.py`, one in `test_lexer.py` — with identical bodies; S2 flagged a real copy-paste.
S3 — intra-file Jaccard ≥ threshold	Statement-set similarity ≥ `ast_similarity_threshold` (default `0.8`) within the same file.	`axm-mail/tests/test_format.py::test_format_plain` vs `test_format_html` — 0.92 Jaccard; S3 clustered, P2 rescued on different `@patch` targets.

Anti-signals (rescues)

Rescue	Trigger	AXM example
P1 — distinct literals	Pair differs on ≥ 2 distinct str/bytes literals per side → `ambiguous_distinct_literals`.	Varying-input parametrize-like tests kept as separate behaviors.
P2 — patch context	Pair exercises different `(decorator, with, mocker)` patch shapes → `ambiguous_patch_context`.	`test_retry_on_5xx` vs `test_retry_on_timeout` — same body shape, different `mocker.patch` targets.
P3 — template pair	Cross-file pair with a ≥ 4-char token diff in filename stem and body ≤ 4 child nodes → `ambiguous_template_pair`.	Per-adapter smoke tests (`test_postgres.py` vs `test_sqlite.py`).
P4 — body size	Intra-file pair whose largest body has ≤ 8 child nodes → `ambiguous_body_size`.	Trivially small smoke tests that look alike by accident.

Tautology Triage v4

Rule ID: TEST_QUALITY_TAUTOLOGY Class: axm_audit.core.rules.test_quality.tautology.TautologyRule Severity: WARNING Score: max(0, 100 - n_findings * 2)

Detects test functions whose asserts can never fail, then triages each finding into DELETE / STRENGTHEN / UNKNOWN by walking a fixed 22-step order over delete-side, precondition, and strengthen-side checks. The rule emits one entry per finding in metadata["verdicts"]; no source rewriting happens here — downstream tooling consumes the verdicts.

Detection patterns

Pattern	Example	Trigger
`trivially_true`	`assert True`, `assert [1]`	Constant truthy / non-empty literal
`self_compare`	`assert x == x`, `assertEqual(x, x)`	Both sides AST-equal
`isinstance_only`	`assert isinstance(r, dict)`	All asserts are shallow `isinstance`
`none_check_only`	`assert x is not None`	All asserts are not-None
`len_tautology`	`assert len(r) >= 0`	Length comparison always true
`mock_echo`	`mock.f.return_value = 1; assert f() == 1`	Asserts the value just stubbed

The 22-step triage ladder

Steps fire in order; the first matching step wins. The ladder has three bands: delete-side (N-prefixed precondition checks that force DELETE), precondition (structural rescues that short-circuit before the strengthen ladder), and strengthen-side (uniqueness / edge-case signals that keep the test).

Delete-side preconditions

Step	Verdict	Fires when	AXM example
`step_n2_import_smoke`	DELETE	Body is `from X import Y; assert Y is not None`-shaped	`axm-engine/tests/services/tools/test_protocol_tools.py::test_imports_ok` — redundant with `test_protocol_tools_init`.
`step_n2b_lazy_import_sut`	STRENGTHEN	Same shape, but test sits in a `test_init.py` lazy-import surface	`axm-nexus/tests/test_package_init.py::test_lazy_imports` — kept: guards boot ordering.
`step_n2c_toplevel_import_not_none`	DELETE	`assert X is not None` where X is top-level-imported AND used by ≥ 1 sibling	`axm-audit/tests/unit/core/test_rules_loaded.py::test_rule_loaded` — sibling already exercises the rule.
`step_n1_no_siblings`	STRENGTHEN	File has a single test — nothing to dedupe against	`axm-commons/tests/test_retry.py::test_retry_once` — sole test, kept.

Precondition rescues

Step	Verdict	Fires when	AXM example
`step_0_self_compare`	STRENGTHEN	`self_compare` pattern — always rescued (author signals intent)	`axm-market/tests/test_ohlc.py::test_bar_equals_itself` — contract conformance.
`step_0c_contract_conformance`	STRENGTHEN	`isinstance(x, T)` where T is a local Protocol / stdlib ABC	`axm-portfolio/tests/test_positions.py::test_position_is_mapping`.
`step_0d_explicit_contract_name`	STRENGTHEN	Test name encodes a contract (`_satisfies_`, `_is_a_`, …)	`axm-ticket/tests/test_schema.py::test_satisfies_ticket_protocol`.

Strengthen-side uniqueness signals

Step	Verdict	Fires when	AXM example
`step_1a_unique_fn`	STRENGTHEN	SUT is not exercised by any sibling	`axm-sentiment/tests/test_lexicon.py::test_load_lexicon`.
`step_2_unique_io`	STRENGTHEN	Uses `tmp_path`/filesystem I/O not exercised by any sibling	`axm-bib/tests/integration/test_pdf_extract.py::test_extract_from_real_pdf`.
`step_3_unique_parametrize`	STRENGTHEN	Carries `@parametrize` while no sibling does	`axm-screener/tests/test_filters.py::test_filter_matrix`.
`step_4_boundary_literal`	STRENGTHEN	Exercises a boundary literal (`0`, `-1`, `""`, `b""`) unseen in siblings	`axm-backtest/tests/test_pnl.py::test_pnl_on_zero_volume`.
`step_4c_significant_setup`	STRENGTHEN	≥ 4 non-trivial setup statements combined with a weak assert	`axm-broker/tests/test_router.py::test_complex_routing_setup`.
`step_1b_different_args`	STRENGTHEN	Same SUT, different literal args — runs before `step_0b` to rescue varying-args cases	`axm-ast/tests/test_parser.py::test_parses_single_line` vs `test_parses_multiline`.
`step_4b_name_edge`	STRENGTHEN	Name mentions an edge-case keyword (`empty`, `null`, `overflow`, …)	`axm-mail/tests/test_threading.py::test_handles_empty_thread`.
`step_4f_intentional_weakness`	STRENGTHEN	Docstring/comment explicitly flags a deliberately weak assertion	`axm-smelt/tests/test_rewrite.py::test_smoke_pass`.
`step_4d_mocked_sut_contract`	STRENGTHEN	Mocked SUT is invoked and the result is `isinstance`-checked	`axm-n8n/tests/test_client.py::test_client_returns_dict`.
`step_4e_homogeneity_check`	STRENGTHEN	`isinstance()` runs inside a loop / `all()` / `any()` — homogeneity contract	`axm-office/tests/test_docx.py::test_all_paragraphs_are_runs`.

Delete-side constructor checks (after strengthen rescues)

Step	Verdict	Fires when	AXM example
`step_0b_n_copies_constructor`	DELETE	Pure constructor + weak assert with ≥ 1 identical-args sibling	`axm-word/tests/test_doc.py::test_new_doc` duplicated by `test_new_empty_doc`.
`step_0b2_impure_sibling_covers_ctor`	DELETE	Pure-ctor test whose constructor is already exercised by an impure sibling	`axm-anvil/tests/test_forge.py::test_forge_init`.
`step_5_default_unknown`	UNKNOWN	Terminator — no step matched	`axm-formal/tests/test_proof.py::test_trivially_true` — left for human review.

The step order is load-bearing: strengthen-side rescues (step_2–step_4e) fire before the delete-side constructor checks (step_0b / step_0b2) so that a weak constructor test carrying real edge-case signal is kept rather than deleted. Note: the triage code itself uses the names without the second underscore (e.g. step0b_n_copies_constructor); the step_ forms above are the documentation convention for this page.

Finding shape

metadata["verdicts"] is a list[dict]; each entry exposes:

file — path relative to the project root
test — test function name
line — line number of the triggering assert
pattern — one of the six detection patterns above
rule — triage step that fired (e.g. step_0b_n_copies_constructor)
verdict — DELETE / STRENGTHEN / UNKNOWN
reason — human-readable explanation from the triage step

Validation

The v6 / v4 stacks were validated against the internal AXM corpus and an external open-source corpus. Results:

Corpus	Findings	DELETE verdicts	False positives
AXM internal (`axm-workspaces/**`)	169	17	0
External corpus	126	—	1

Key numbers: 169 total findings on the internal corpus, 17 DELETE verdicts confirmed by manual review with 0 false positives; 126 findings on the external corpus with exactly 1 false positive flagged during triage (step_0b_n_copies_constructor on a factory-style constructor test). The delete-side ladder is therefore safe to apply automatically on AXM packages; external application should keep the DELETE verdict gated behind a human-in-the-loop review.