Skip to content

Scoring & Grades

Composite Quality Score

The quality score is a weighted average across 10 categories, on a 100-point scale:

Category Tool Weight
Linting Ruff 20%
Type Safety mypy 15%
Complexity radon 15%
Security Bandit 10%
Dependencies pip-audit + deptry 10%
Testing pytest-cov 15%
Architecture AST analysis 10%
Practices AST analysis 5%
pie title Category Weights
    "Linting" : 20
    "Type Safety" : 15
    "Complexity" : 15
    "Testing" : 15
    "Security" : 10
    "Dependencies" : 10
    "Architecture" : 10
    "Practices" : 5

Each category produces a score from 0 to 100. The composite score is:

score = lint × 0.20 + type × 0.15 + complexity × 0.15
      + security × 0.10 + deps × 0.10 + testing × 0.15
      + architecture × 0.10 + practices × 0.05

Why no Structure category?

Structure validation (project layout, pyproject.toml completeness) is handled by axm-init with 16 dedicated checks. axm-audit focuses on code quality.

Category Scoring

Lint Score

score = max(0, 100 − issue_count × 2)

Pass threshold: ≥ 80 (≤ 10 issues).

Format Score

score = max(0, 100 − unformatted_count × 5)

Pass threshold: ≥ 80 (≤ 4 unformatted files).

Diff Size Score

score = 100                    if lines ≤ ideal
score = 0                      if lines ≥ max
score = 100 − (lines − ideal) × 100 / (max − ideal)   otherwise

Defaults: ideal = 400, max = 1200. Configurable via pyproject.toml:

[tool.axm-audit]
diff_size_ideal = 400   # lines — perfect score ceiling
diff_size_max = 1200    # lines — zero score floor

Pass threshold: ≥ 80 (≤ 560 lines with defaults).

Type Score

score = max(0, 100 − error_count × 5)

Pass threshold: ≥ 80 (≤ 4 errors).

Complexity Score

score = max(0, 100 − high_complexity_count × 10)

High complexity = cyclomatic complexity ≥ 10. Pass threshold: ≥ 80 (≤ 2 complex functions).

Security Score

Average of two sub-scores:

  • Bandit: max(0, 100 − high_count × 15 − medium_count × 5) — vulnerability scanning
  • Hardcoded secrets: max(0, 100 − count × 25) — regex pattern detection

Dependencies Score

Average of two sub-scores:

  • pip-audit: max(0, 100 − vuln_count × 15) — known CVEs
  • deptry: max(0, 100 − issue_count × 10) — unused/missing deps

Testing Score

score = coverage_percentage

Uses pytest-cov to measure line coverage. Pass threshold: ≥ 80%.

Architecture Score

Average of four sub-scores:

  • Circular imports: max(0, 100 − cycle_count × 20)
  • God classes: max(0, 100 − god_class_count × 15)
  • Coupling: max(0, 100 − N(modules > threshold) × 5) — fan-out exceeding 10 imports
  • Duplication: max(0, 100 − duplicate_pair_count × 10)

Practices Score

Average of five sub-scores:

  • Docstring coverage: int(coverage_pct × 100)
  • Bare excepts: max(0, 100 − count × 20)
  • Blocking I/O: max(0, 100 − count × 15) — detects time.sleep in async contexts and HTTP calls without timeout parameter
  • Logging presence: int(coverage_pct × 100)
  • Test mirroring: max(0, 100 − missing_count × 15)

Grading Scale

Grade Score Meaning
A ≥ 90 Excellent — production-ready
B ≥ 80 Good — minor issues
C ≥ 70 Acceptable — needs attention
D ≥ 60 Poor — significant issues
F < 60 Failing — critical problems

Severity Levels

Each individual check carries a severity:

Severity Effect Example
error Blocks audit pass Missing pyproject.toml
warning Non-blocking High complexity function
info Informational only Docstring coverage stats

Type Safety

All results use Pydantic models (AuditResult, CheckResult, Severity) with extra = "forbid" for strict validation — safe for both human and agent consumption.