Scoring & Grades
Composite Quality Score
The quality score is a weighted average across the 8 categories below, on a 100-point scale:
| Category | Tool | Weight |
|---|---|---|
| Linting | Ruff | 20% |
| Type Safety | mypy | 15% |
| Complexity | radon | 15% |
| Security | Bandit | 10% |
| Dependencies | pip-audit + deptry | 10% |
| Testing | pytest-cov | 15% |
| Architecture | AST analysis | 10% |
| Practices | AST analysis | 5% |
Figure: pie chart "Category Weights" (Linting 20, Type Safety 15, Complexity 15, Testing 15, Security 10, Dependencies 10, Architecture 10, Practices 5).
Each category produces a score from 0 to 100. The composite score is:
score = lint × 0.20 + type × 0.15 + complexity × 0.15
+ security × 0.10 + deps × 0.10 + testing × 0.15
+ architecture × 0.10 + practices × 0.05
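As a quick check of the formula, the weighted sum can be sketched in a few lines of Python. The names `WEIGHTS` and `composite_score` are illustrative and are not taken from axm-audit itself.

```python
# Category weights from the table above; they sum to 1.0.
WEIGHTS = {
    "lint": 0.20,
    "type": 0.15,
    "complexity": 0.15,
    "security": 0.10,
    "deps": 0.10,
    "testing": 0.15,
    "architecture": 0.10,
    "practices": 0.05,
}


def composite_score(category_scores: dict[str, float]) -> float:
    """Weighted sum of per-category scores, each on a 0-100 scale."""
    missing = WEIGHTS.keys() - category_scores.keys()
    if missing:
        raise ValueError(f"missing category scores: {sorted(missing)}")
    return sum(category_scores[name] * weight for name, weight in WEIGHTS.items())


# Hypothetical category scores, chosen only for illustration.
example = {
    "lint": 90, "type": 80, "complexity": 70, "security": 100,
    "deps": 100, "testing": 85, "architecture": 95, "practices": 60,
}
print(round(composite_score(example), 2))
```

Because the weights sum to 1.0, a project scoring 100 in every category gets a composite of 100.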
Why no Structure category?
Structure validation (project layout, pyproject.toml completeness) is handled
by axm-init with 16 dedicated checks. axm-audit focuses on code quality.
Category Scoring
Lint Score
Pass threshold: ≥ 80 (≤ 10 issues).
Format Score
Pass threshold: ≥ 80 (≤ 4 unformatted files).
Diff Size Score
score = 100 if lines ≤ ideal
score = 0 if lines ≥ max
score = 100 − (lines − ideal) × 100 / (max − ideal) otherwise
Defaults: ideal = 400, max = 1200. Configurable via pyproject.toml:
[tool.axm-audit]
diff_size_ideal = 400 # lines — perfect score ceiling
diff_size_max = 1200 # lines — zero score floor
Pass threshold: ≥ 80 (≤ 560 lines with defaults).
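The piecewise formula above translates directly into a small function. This is a minimal sketch with the documented defaults; the function name `diff_size_score` and its parameter names are illustrative, not the tool's actual API.

```python
def diff_size_score(lines: int, ideal: int = 400, maximum: int = 1200) -> float:
    """Linear falloff from 100 at `ideal` changed lines to 0 at `maximum`."""
    if lines <= ideal:
        return 100.0
    if lines >= maximum:
        return 0.0
    return 100 - (lines - ideal) * 100 / (maximum - ideal)


# With the defaults, 560 changed lines sits exactly on the pass threshold.
print(diff_size_score(560))
```

With the defaults, each line beyond 400 costs 0.125 points, which is why the pass threshold of 80 falls at 560 lines.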
Type Score
Pass threshold: ≥ 80 (≤ 4 errors).
Complexity Score
High complexity = cyclomatic complexity ≥ 10. Pass threshold: ≥ 80 (≤ 2 complex functions).
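The count-based thresholds in the Lint, Format, Type, and Complexity sections are each consistent with a linear per-item penalty subtracted from 100. The sketch below assumes that shape; the penalty values (2, 5, 5, and 10 points) are inferred from the stated thresholds and are not confirmed by this page, and all names are illustrative.

```python
PASS_THRESHOLD = 80

# ASSUMPTION: per-item penalties inferred from the documented thresholds,
# e.g. 10 lint issues -> 80 implies 2 points per issue.
ASSUMED_PENALTY = {"lint": 2, "format": 5, "type": 5, "complexity": 10}


def penalty_score(count: int, penalty: int) -> int:
    """Subtract a fixed penalty per item, never dropping below zero."""
    return max(0, 100 - count * penalty)


def high_complexity_count(complexities: list[int], limit: int = 10) -> int:
    """Count functions whose cyclomatic complexity is at or above `limit`."""
    return sum(1 for value in complexities if value >= limit)


# Hypothetical per-function complexities: two of them reach the limit of 10.
complex_functions = high_complexity_count([3, 10, 7, 14])
score = penalty_score(complex_functions, ASSUMED_PENALTY["complexity"])
print(complex_functions, score, score >= PASS_THRESHOLD)
```

Under these assumed penalties, each documented limit (10 issues, 4 unformatted files, 4 type errors, 2 complex functions) lands exactly on a score of 80.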
Security Score
Average of two sub-scores:
- Bandit: max(0, 100 − high_count × 15 − medium_count × 5), vulnerability scanning
- Hardcoded secrets: max(0, 100 − count × 25), regex pattern detection
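The two sub-scores and their average can be sketched as follows. The function names are illustrative, and a plain arithmetic mean without rounding is assumed, since the page does not say how the average is rounded.

```python
def bandit_score(high_count: int, medium_count: int) -> int:
    """15 points per high-severity finding, 5 per medium-severity finding."""
    return max(0, 100 - high_count * 15 - medium_count * 5)


def secrets_score(count: int) -> int:
    """25 points per hardcoded secret detected."""
    return max(0, 100 - count * 25)


def security_score(high_count: int, medium_count: int, secret_count: int) -> float:
    """ASSUMPTION: unrounded arithmetic mean of the two sub-scores."""
    return (bandit_score(high_count, medium_count) + secrets_score(secret_count)) / 2


# One high finding, two medium findings, one hardcoded secret.
print(security_score(1, 2, 1))
```

A single hardcoded secret costs as much as one high and two medium Bandit findings combined, so secrets dominate this category quickly.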
Dependencies Score
Average of two sub-scores:
- pip-audit: max(0, 100 − vuln_count × 15), known CVEs
- deptry: max(0, 100 − issue_count × 10), unused/missing deps
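A minimal sketch of the same calculation, with illustrative function names and an unrounded mean assumed:

```python
def pip_audit_score(vuln_count: int) -> int:
    """15 points per known vulnerability reported by pip-audit."""
    return max(0, 100 - vuln_count * 15)


def deptry_score(issue_count: int) -> int:
    """10 points per unused or missing dependency reported by deptry."""
    return max(0, 100 - issue_count * 10)


def dependencies_score(vuln_count: int, issue_count: int) -> float:
    """ASSUMPTION: unrounded arithmetic mean of the two sub-scores."""
    return (pip_audit_score(vuln_count) + deptry_score(issue_count)) / 2


# Two known CVEs and three deptry issues.
print(dependencies_score(2, 3))
```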
Testing Score
Uses pytest-cov to measure line coverage. Pass threshold: ≥ 80%.
Architecture Score
Average of four sub-scores:
- Circular imports: max(0, 100 − cycle_count × 20)
- God classes: max(0, 100 − god_class_count × 15)
- Coupling: max(0, 100 − N(modules > threshold) × 5), fan-out exceeding 10 imports
- Duplication: max(0, 100 − duplicate_pair_count × 10)
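The four sub-scores can be combined as in the sketch below. The function names and the `fan_out` mapping (module name to number of imports) are hypothetical; "exceeding 10" is read as strictly greater than 10, and the mean is left unrounded.

```python
def coupling_score(fan_out: dict[str, int], threshold: int = 10) -> int:
    """5 points per module whose fan-out exceeds the threshold."""
    over_threshold = sum(1 for imports in fan_out.values() if imports > threshold)
    return max(0, 100 - over_threshold * 5)


def architecture_score(
    cycle_count: int,
    god_class_count: int,
    fan_out: dict[str, int],
    duplicate_pair_count: int,
) -> float:
    """ASSUMPTION: unrounded arithmetic mean of the four sub-scores."""
    sub_scores = [
        max(0, 100 - cycle_count * 20),          # circular imports
        max(0, 100 - god_class_count * 15),      # god classes
        coupling_score(fan_out),                 # coupling
        max(0, 100 - duplicate_pair_count * 10), # duplication
    ]
    return sum(sub_scores) / len(sub_scores)


# Hypothetical project: one import cycle, two highly coupled modules,
# three duplicated code pairs.
fan_out = {"core": 3, "cli": 12, "report": 11}
print(architecture_score(1, 0, fan_out, 3))
```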
Practices Score
Average of five sub-scores:
- Docstring coverage: int(coverage_pct × 100)
- Bare excepts: max(0, 100 − count × 20)
- Blocking I/O: max(0, 100 − count × 15), detects time.sleep in async contexts and HTTP calls without a timeout parameter
- Logging presence: int(coverage_pct × 100)
- Test mirroring: max(0, 100 − missing_count × 15)
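The five sub-scores can be sketched as below. This assumes `coverage_pct` is a fraction between 0 and 1 (so that multiplying by 100 stays on the 0-100 scale) and that the average is an unrounded mean; the function and parameter names are illustrative.

```python
def practices_score(
    docstring_coverage: float,  # ASSUMPTION: fraction in [0, 1]
    bare_except_count: int,
    blocking_io_count: int,
    logging_coverage: float,    # ASSUMPTION: fraction in [0, 1]
    missing_test_count: int,
) -> float:
    """Unrounded mean of the five practice sub-scores."""
    sub_scores = [
        int(docstring_coverage * 100),            # docstring coverage
        max(0, 100 - bare_except_count * 20),     # bare excepts
        max(0, 100 - blocking_io_count * 15),     # blocking I/O
        int(logging_coverage * 100),              # logging presence
        max(0, 100 - missing_test_count * 15),    # test mirroring
    ]
    return sum(sub_scores) / len(sub_scores)


# 75% docstring coverage, one bare except, no blocking I/O,
# logging in half the modules, two modules without a mirrored test file.
print(practices_score(0.75, 1, 0, 0.5, 2))
```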
Grading Scale
| Grade | Score | Meaning |
|---|---|---|
| A | ≥ 90 | Excellent — production-ready |
| B | ≥ 80 | Good — minor issues |
| C | ≥ 70 | Acceptable — needs attention |
| D | ≥ 60 | Poor — significant issues |
| F | < 60 | Failing — critical problems |
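The grading scale amounts to a lookup over descending lower bounds. A minimal sketch, with `grade` as an illustrative name:

```python
# Lower bounds from the grading table, highest first.
GRADE_BOUNDS = (("A", 90), ("B", 80), ("C", 70), ("D", 60))


def grade(score: float) -> str:
    """Return the letter grade for a composite score on the 0-100 scale."""
    for letter, minimum in GRADE_BOUNDS:
        if score >= minimum:
            return letter
    return "F"


print(grade(85.75))
```

Bounds are inclusive, so a score of exactly 90 is an A and exactly 60 is a D.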
Severity Levels
Each individual check carries a severity:
| Severity | Effect | Example |
|---|---|---|
| error | Blocks audit pass | Missing pyproject.toml |
| warning | Non-blocking | High complexity function |
| info | Informational only | Docstring coverage stats |
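One way to read the table: an audit passes unless at least one failed check has severity error. The sketch below illustrates that rule with a stand-in `Severity` enum and plain `(severity, passed)` tuples; both are hypothetical simplifications, not the tool's real models.

```python
from enum import Enum


class Severity(str, Enum):
    """Illustrative stand-in for the severity levels in the table."""
    ERROR = "error"
    WARNING = "warning"
    INFO = "info"


def audit_passes(checks: list[tuple[Severity, bool]]) -> bool:
    """Only a failed check with severity ERROR blocks the audit."""
    return not any(
        severity is Severity.ERROR and not passed for severity, passed in checks
    )


# A failed warning does not block; a failed error does.
print(audit_passes([(Severity.WARNING, False), (Severity.INFO, True)]))
print(audit_passes([(Severity.ERROR, False), (Severity.INFO, True)]))
```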
Type Safety
All results use Pydantic models (AuditResult, CheckResult, Severity) with extra = "forbid" for strict validation — safe for both human and agent consumption.
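To illustrate what extra = "forbid" buys, here is a minimal sketch assuming Pydantic v2. The fields `name` and `passed` are hypothetical, chosen only for the example; the real CheckResult schema is not described on this page.

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class CheckResult(BaseModel):
    """Hypothetical model shape; only the extra="forbid" setting is from the docs."""
    model_config = ConfigDict(extra="forbid")

    name: str
    passed: bool


def is_rejected(payload: dict) -> bool:
    """Return True if the payload fails validation."""
    try:
        CheckResult(**payload)
    except ValidationError:
        return True
    return False


# A well-formed payload validates; one with an unknown key is rejected.
print(is_rejected({"name": "lint", "passed": True}))
print(is_rejected({"name": "lint", "passed": True, "unexpected": 1}))
```

Rejecting unknown keys means a typo or a stale field in a producer fails loudly instead of being silently dropped, which is what makes the output dependable for automated consumers.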