Architecture

Overview

axm-smelt follows a layered architecture with clear separation of concerns:

```mermaid
graph TB
    CLI["CLI (cli.py)"] --> Pipeline["smelt() / check()"]
    MCP["SmeltTool (AXMTool)"] --> Pipeline
    Pipeline --> Detector["detect_format()"]
    Pipeline --> Counter["count() — tiktoken"]
    Pipeline --> Strategies["Strategy pipeline"]
    Strategies --> Registry["_REGISTRY / _PRESETS"]
    Pipeline --> Report["SmeltReport"]
```

Layers

1. Public API (`__init__.py`)

Three exported functions:

- `smelt(text?, strategies?, preset?, *, parsed?)`: run the pipeline and return a SmeltReport. Accepts either text (str) or parsed (dict/list); at least one is required.
- `check(text?, *, parsed?)`: dry-run every registered strategy and return per-strategy savings estimates. Same input contract as smelt.
- `count(text, model?)`: count tokens via tiktoken (o200k_base by default).

2. CLI (`cli.py`)

Four commands via cyclopts: compact, check, count, version. All read from stdin or --file. compact also accepts --strategies, --preset, and --output. The CLI calls the same core functions as the Python API — no business logic lives in the CLI layer.

3. MCP Tool (`tools/smelt.py`)

SmeltTool(AXMTool) and SmeltCheckTool(AXMTool) expose the pipeline as MCP tools registered under the axm.tools entry point group. When data is already a dict or list, the tools pass it via parsed= to skip the serialize→deserialize round-trip.
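
The text-versus-parsed routing in the tool layer can be sketched as a small dispatch. This is an illustration only. `smelt_kwargs` is a hypothetical helper name and is not part of the documented SmeltTool interface. Structured input is handed over by reference, so it is never dumped to a string and parsed back.

```python
from typing import Any, Dict


def smelt_kwargs(data: Any) -> Dict[str, Any]:
    """Hypothetical helper: choose text= or parsed= for the pipeline call."""
    if isinstance(data, (dict, list)):
        # Already structured: pass it through untouched via parsed=,
        # skipping the serialize -> deserialize round-trip.
        return {"parsed": data}
    if isinstance(data, str):
        return {"text": data}
    raise TypeError(f"unsupported input type: {type(data).__name__}")
```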

4. Pipeline (`core/pipeline.py`)

smelt() composes three helpers (resolve_input and resolve_strategies are public module-level functions; _apply_strategies is private):

- `resolve_input(text, parsed)`: normalizes the inputs into a (text, parsed) pair. If parsed is provided, it is JSON-serialized; if neither argument is given, it raises ValueError.
- `resolve_strategies(strategies, preset)`: returns strategy instances from explicit names, a preset name, or the "safe" default.
- `_apply_strategies(ctx, strats, current_tokens)`: applies the strategies in order behind a token-count guard. Each strategy.apply(ctx) receives and returns a SmeltContext. A result is accepted only if it strictly reduces the token count, or, at an equal token count, shortens the text. Strategies that regress or bring no improvement are silently discarded.
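
The helper contracts above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the library's code:

- A strategy here is a hypothetical (name, str -> str function) pair instead of a SmeltStrategy operating on a SmeltContext.
- A whitespace-split counter stands in for the tiktoken-based count().
- When both text and parsed are given, this sketch prefers parsed. That choice is also an assumption.

```python
import json
from typing import Any, Callable, List, Optional, Sequence, Tuple

# Hypothetical simplification: a strategy is a (name, transform) pair.
Strategy = Tuple[str, Callable[[str], str]]


def count_tokens(text: str) -> int:
    # Stand-in for tiktoken: count whitespace-separated chunks.
    return len(text.split())


def resolve_input(text: Optional[str] = None, parsed: Any = None) -> Tuple[str, Any]:
    """Normalize the inputs into (text, parsed); at least one is required."""
    if parsed is not None:
        # Assumption: parsed wins if both are given. Serialize canonically.
        dumped = json.dumps(parsed, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
        return dumped, parsed
    if text is None:
        raise ValueError("either text or parsed is required")
    return text, None


def apply_strategies(
    text: str, strats: Sequence[Strategy], current_tokens: int
) -> Tuple[str, List[str]]:
    """Apply strategies in order, keeping only results that pass the guard."""
    applied: List[str] = []
    for name, transform in strats:
        candidate = transform(text)
        tokens = count_tokens(candidate)
        fewer_tokens = tokens < current_tokens
        shorter_at_equal = tokens == current_tokens and len(candidate) < len(text)
        if fewer_tokens or shorter_at_equal:
            text, current_tokens = candidate, tokens
            applied.append(name)
        # Otherwise the candidate is dropped; the next strategy sees the old text.
    return text, applied
```

Because the guard compares against the running count, each strategy is judged on the text left by the previously accepted ones, not on the original input.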

Between helper calls, smelt() detects the format via detect_format_parsed() (JSON is probed inline to capture the already-parsed object; the remaining probes are try_xml, try_yaml, try_markdown), counts input tokens, and builds the initial SmeltContext. After _apply_strategies returns, it counts output tokens and computes savings_pct.

The savings baseline depends on how the input arrived. For the text= path the baseline is the provided raw string, unchanged. For the parsed= path there is no canonical textual form, so the baseline is the pretty serialization json.dumps(parsed, indent=2, ensure_ascii=False) rather than the compact dump produced by resolve_input. Measuring against the already-minified compact form would structurally under-report the reduction; the pretty baseline reflects the indented form a user would otherwise have read, so savings_pct matches the perceived reduction. report.original still carries the compact serialization (the pipeline's working text).
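
The effect of the baseline choice can be checked with a small sketch. Two assumptions keep the arithmetic easy to follow:

- Character length stands in for the token count.
- savings_pct is taken to be the usual percentage reduction relative to the baseline.

The helper names are illustrative.

```python
import json
from typing import Any


def count_tokens(text: str) -> int:
    # Stand-in: characters instead of tiktoken tokens.
    return len(text)


def savings_pct(baseline: str, compacted: str) -> float:
    # Assumed formula: percentage reduction relative to the baseline.
    original = count_tokens(baseline)
    if original == 0:
        return 0.0
    return 100.0 * (original - count_tokens(compacted)) / original


def pretty_baseline(parsed: Any) -> str:
    # Baseline on the parsed= path: the indented form a reader would see.
    return json.dumps(parsed, indent=2, ensure_ascii=False)


def compact_dump(parsed: Any) -> str:
    # Working text on the parsed= path (canonical compact form).
    return json.dumps(parsed, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


data = {"a": 1, "b": 2}
working = compact_dump(data)  # '{"a":1,"b":2}', 13 characters
print(savings_pct(working, working))  # 0.0: a compact baseline hides the gain
print(round(savings_pct(pretty_baseline(data), working), 1))  # 40.9: 22 -> 13 characters
```

Even with no strategy applied, the parsed= path has already removed the indentation a user would otherwise have read. Only the pretty baseline makes that reduction visible.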

check() runs every registered strategy independently on the original SmeltContext and records per-strategy savings without chaining. Only strategies with positive savings (> 0%) are included in strategy_estimates; strategies that regress or break even are omitted.
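
The non-chaining behaviour of check() can be sketched as follows. `estimate_all` is a hypothetical name, strategies are simplified to (name, str -> str) pairs, and a whitespace-split counter stands in for tiktoken. Each strategy sees the original input, so the estimates need not add up to what the strategies would save when run together.

```python
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical simplification: a strategy is a (name, transform) pair.
Strategy = Tuple[str, Callable[[str], str]]


def count_tokens(text: str) -> int:
    # Stand-in for tiktoken: count whitespace-separated chunks.
    return len(text.split())


def estimate_all(text: str, strats: Sequence[Strategy]) -> Dict[str, float]:
    """Dry-run each strategy on the ORIGINAL text; results are never chained."""
    original = count_tokens(text)
    estimates: Dict[str, float] = {}
    for name, transform in strats:
        tokens = count_tokens(transform(text))
        pct = 100.0 * (original - tokens) / original if original else 0.0
        if pct > 0:  # regressions and break-even results are omitted
            estimates[name] = pct
    return estimates
```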

5. Strategies (`strategies/`)

Each strategy is a class implementing SmeltStrategy (name, category, apply(ctx) -> SmeltContext). Strategies are registered in _REGISTRY and composed into presets via _PRESETS.

Serialization key ordering: every JSON re-serialization site uses a single canonical policy, json.dumps(..., sort_keys=True, separators=(",", ":"), ensure_ascii=False). The JSON-rewriting strategies (minify, drop_nulls, round_numbers, flatten, dedup_values_with_refs) therefore emit object keys in sorted order, matching SmeltContext.text. As a result, the serialized output does not depend on which code path produced it: a value round-tripped through a strategy has the same byte layout as the same value rendered via SmeltContext.text. Sorting keys, rather than preserving their insertion order, gives this stronger determinism guarantee.
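
The canonical policy can be illustrated with the standard library alone. `canonical_dumps` is a hypothetical name for the shared json.dumps call. `drop_nulls` is a simplified stand-in that removes only dict keys whose value is None; the real strategy may behave differently. The point is that key order never depends on how a value was built.

```python
import json
from typing import Any


def canonical_dumps(value: Any) -> str:
    """Single serialization policy: sorted keys, no spaces, non-ASCII kept."""
    return json.dumps(value, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


def drop_nulls(value: Any) -> Any:
    """Illustrative only: recursively drop dict keys whose value is None."""
    if isinstance(value, dict):
        return {k: drop_nulls(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        return [drop_nulls(v) for v in value]
    return value


# Same value, different insertion order: identical bytes after serialization.
print(canonical_dumps({"b": 1, "a": "é"}))  # {"a":"é","b":1}
print(canonical_dumps({"a": "é", "b": 1}))  # {"a":"é","b":1}
```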

| Preset | Strategies |
|---|---|
| `safe` | `minify`, `collapse_whitespace` |
| `moderate` | `minify`, `drop_nulls`, `flatten`, `dedup_values_with_refs`, `tabular`, `strip_quotes`, `collapse_whitespace`, `compact_tables`, `strip_html_comments` |
| `aggressive` | `minify`, `drop_nulls`, `flatten`, `tabular`, `round_numbers`, `dedup_values_with_refs`, `strip_quotes`, `collapse_whitespace`, `compact_tables`, `strip_html_comments` |

| Strategy class | Name | Category |
|---|---|---|
| `MinifyStrategy` | `minify` | whitespace |
| `CollapseWhitespaceStrategy` | `collapse_whitespace` | whitespace |
| `CompactTablesStrategy` | `compact_tables` | whitespace |
| `DropNullsStrategy` | `drop_nulls` | structural |
| `FlattenStrategy` | `flatten` | structural |
| `TabularStrategy` | `tabular` | structural |
| `DedupValuesStrategy` | `dedup_values_with_refs` | structural |
| `StripQuotesStrategy` | `strip_quotes` | cosmetic |
| `StripHtmlCommentsStrategy` | `strip_html_comments` | cosmetic |
| `RoundNumbersStrategy` | `round_numbers` | cosmetic |
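
A name-keyed registry with presets, and the resolution order of resolve_strategies (explicit names, then a preset, then "safe"), might look like the sketch below. It rests on several simplifications:

- The two strategy classes are toy versions with hypothetical names.
- They operate on str rather than SmeltContext.
- Only the "safe" preset is reproduced.
- Treating an empty name list as "use the default" is an assumption.

```python
import json
from typing import Dict, List, Optional, Protocol, Sequence


class SmeltStrategyLike(Protocol):
    name: str
    category: str

    def apply(self, text: str) -> str: ...


class MinifySketch:
    name = "minify"
    category = "whitespace"

    def apply(self, text: str) -> str:
        stripped = text.lstrip()
        if not stripped or stripped[0] not in "{[":
            return text  # cheap first-character check: not JSON-like
        try:
            value = json.loads(text)
        except json.JSONDecodeError:
            return text
        return json.dumps(value, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


class CollapseWhitespaceSketch:
    name = "collapse_whitespace"
    category = "whitespace"

    def apply(self, text: str) -> str:
        return " ".join(text.split())


_REGISTRY: Dict[str, type] = {cls.name: cls for cls in (MinifySketch, CollapseWhitespaceSketch)}
_PRESETS: Dict[str, List[str]] = {"safe": ["minify", "collapse_whitespace"]}


def resolve_strategies(
    strategies: Optional[Sequence[str]] = None, preset: Optional[str] = None
) -> List[SmeltStrategyLike]:
    # Explicit names win; otherwise the named preset; otherwise "safe".
    names = list(strategies) if strategies else _PRESETS[preset or "safe"]
    unknown = [n for n in names if n not in _REGISTRY]
    if unknown:
        raise KeyError(f"unknown strategies: {unknown}")
    return [_REGISTRY[n]() for n in names]  # fresh instances, in order
```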

6. Format Detection (`core/detector.py`)

Heuristic detection returns a Format enum value (JSON, YAML, XML, TOML, CSV, MARKDOWN, TEXT). Strategies that are format-specific (e.g., minify for JSON) check the first character before attempting to parse.
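
A probe-order detector in the spirit described in the Pipeline section (JSON first, keeping the parsed object, then XML, then Markdown) can be sketched with the standard library. Everything here is hypothetical:

- The function name, the enum values, and the Markdown heuristic are all invented for illustration.
- YAML, TOML, and CSV probes are omitted, because YAML in particular needs a third-party parser.

```python
import json
import xml.etree.ElementTree as ET
from enum import Enum
from typing import Any, Optional, Tuple


class FormatSketch(str, Enum):
    JSON = "json"
    XML = "xml"
    MARKDOWN = "markdown"
    TEXT = "text"


def detect_format_sketch(text: str) -> Tuple[FormatSketch, Optional[Any]]:
    stripped = text.strip()
    if not stripped:
        return FormatSketch.TEXT, None
    # JSON first, gated on the first character; return the parsed object
    # so the caller does not have to parse the text a second time.
    if stripped[0] in "{[":
        try:
            return FormatSketch.JSON, json.loads(stripped)
        except json.JSONDecodeError:
            pass
    if stripped[0] == "<":
        try:
            ET.fromstring(stripped)
            return FormatSketch.XML, None
        except ET.ParseError:
            pass
    # Hypothetical Markdown heuristic: any heading, table row, or bullet line.
    markers = ("#", "|", "- ", "* ")
    if any(line.lstrip().startswith(markers) for line in stripped.splitlines()):
        return FormatSketch.MARKDOWN, None
    return FormatSketch.TEXT, None
```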

7. Models (`core/models.py`)

- SmeltContext: frozen dataclass carrying the detected format plus one source-of-truth representation (text or parsed); the other is derived deterministically and cached on first access. SmeltContext.text derives from parsed via the canonical sorted-key serialization (sort_keys=True, compact separators), the same policy every strategy applies. Strategies build a new SmeltContext instead of mutating the existing one, so the two representations cannot drift.
- SmeltReport: Pydantic model carrying the compaction metrics plus the counter_backend used.
- Format: string enum.
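
The "one source of truth, derived and cached on demand" design can be sketched with a frozen dataclass and functools.cached_property. cached_property stores its result directly in the instance `__dict__`, so it works on a frozen dataclass. The class, field, and method names below are hypothetical. The choice to return None for non-JSON text is an assumption of this sketch, not documented behaviour.

```python
import json
from dataclasses import dataclass
from functools import cached_property
from typing import Any

_MISSING: Any = object()  # sentinel, so that a JSON null can still be a parsed value


@dataclass(frozen=True)
class ContextSketch:
    format: str
    source_text: Any = _MISSING
    source_parsed: Any = _MISSING

    def __post_init__(self) -> None:
        if (self.source_text is _MISSING) == (self.source_parsed is _MISSING):
            raise ValueError("provide exactly one of source_text or source_parsed")

    @cached_property
    def text(self) -> str:
        if self.source_text is not _MISSING:
            return self.source_text
        # Same canonical policy the strategies use: sorted keys, compact separators.
        return json.dumps(self.source_parsed, sort_keys=True, separators=(",", ":"), ensure_ascii=False)

    @cached_property
    def parsed(self) -> Any:
        if self.source_parsed is not _MISSING:
            return self.source_parsed
        try:
            return json.loads(self.source_text)
        except json.JSONDecodeError:
            return None  # assumption: non-JSON text has no parsed form

    def with_parsed(self, value: Any) -> "ContextSketch":
        # Strategies return a NEW context; this one is never mutated.
        return ContextSketch(format=self.format, source_parsed=value)
```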

Data Flow

```mermaid
sequenceDiagram
    participant User
    participant API as smelt() / CLI
    participant Pipeline
    participant Strategies

    User->>API: smelt(text, preset="moderate")
    API->>Pipeline: resolve_input(text, parsed)
    Pipeline-->>API: (text, parsed)
    API->>Pipeline: detect_format(text)
    API->>Pipeline: count(text) -> original_tokens
    API->>Pipeline: resolve_strategies(None, "moderate")
    Pipeline-->>API: strats
    Pipeline->>Pipeline: SmeltContext(text, format)
    API->>Pipeline: _apply_strategies(ctx, strats, original_tokens)
    loop For each strategy in strats
        Pipeline->>Strategies: strategy.apply(ctx)
        Strategies-->>Pipeline: ctx (kept only if tokens drop, or text shrinks at equal tokens)
    end
    Pipeline-->>API: (ctx, applied)
    Pipeline->>Pipeline: count(ctx.text) -> compacted_tokens
    Pipeline-->>User: SmeltReport
```
