Stop Burning Money on AI Coding

AISmush is a drop-in proxy that makes Claude Code 90% cheaper — without changing how you work.

$3 sessions that used to cost $30.

Get AISmush Free · See How It Works
New in v0.7

Multi-Provider Routing

Route tasks across Claude, DeepSeek, OpenRouter's 290+ models, and your own local servers — Ollama, LM Studio, llama.cpp, vLLM. AISmush picks the cheapest model that can handle each turn.

You: add authentication to the API
AISmush: Planning → Claude ($3/M)
AISmush: Tool results → Ollama (free)
AISmush: Code gen → DeepSeek ($0.27/M)
Done. 90% saved. Local models handled the bulk.

Claude Code is incredible. The bill isn't.

A heavy coding session burns through $20-50 in API costs. Most of those tokens are spent on mechanical tasks — reading files, processing tool results, making simple edits — that don't need Claude's $15/M token brain.

$30+  typical session, 100% Claude
$3  same session, with AISmush

Eight Weapons Against Token Waste

Game Changer

AI-Generated Project Agents

One command scans your codebase, sends it to AI for deep analysis, and generates Claude Code agents customized to YOUR project — your patterns, your frameworks, your architecture.

Not generic templates. Agents that know your specific file structure, your naming conventions, your test framework, your build commands.

  • Scans your codebase in seconds
  • 5-7 AI calls for deep analysis (~$0.03)
  • Generates agents, skills, and CLAUDE.md
  • Each agent assigned the cheapest model that can do the job
  • Resumes where it left off if interrupted
$ aismush --scan


# Analyzing your codebase...
Detected: Rust + TypeScript + React
Type: fullstack web app (complex)

# Generating project-specific agents:
├─ rust-expert (sonnet) ✓
├─ frontend-engineer (sonnet) ✓
├─ test-runner (haiku) ✓
├─ debugger (sonnet) ✓
└─ explorer (haiku) ✓

Created 5 agents, 8 skills, CLAUDE.md
Core Feature

Smart Model Routing + Blast-Radius Analysis

AISmush automatically detects what kind of work each turn requires and routes it to the cheapest model that can handle it — across every provider you have configured.

Routes across Claude, DeepSeek, OpenRouter (290+ models), and local servers: Ollama, LM Studio, llama.cpp, vLLM, Jan. Planning and architecture? Claude. Tool results and edits? Local Ollama — free.

  • Zero latency overhead — pure heuristic routing
  • Claude for reasoning, local models for execution
  • Blast-radius aware — parses imports to know which files are critical
  • Editing a shared type? Claude. Editing a leaf file? Ollama (free).
  • Automatic failover between providers
  • Error recovery detection (3+ errors → Claude)
# What happens behind the scenes:

"Plan the auth system" → Claude ($0.45)
Tool result: Read file → Ollama (free)
Tool result: Edit file → Ollama (free)
Tool result: Run tests → DeepSeek ($0.001)
"Debug this error" → Claude ($0.12)
Tool result: Grep → Ollama (free)

Session: $0.58 instead of $12.40
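The routing rules above can be sketched as a plain heuristic. This is an illustrative sketch only, not AISmush's actual code — the function name, parameters, and provider labels are assumptions drawn from the bullets (error escalation at 3+ errors, blast-radius check for shared files, local models for mechanical work):

```python
# Hypothetical sketch of turn routing; real AISmush rules are internal.
def route_turn(turn_type: str, recent_errors: int, is_shared_file: bool) -> str:
    """Pick the cheapest provider that can handle this turn."""
    if recent_errors >= 3:
        return "claude"           # error-recovery escalation (3+ errors)
    if turn_type in ("planning", "debugging"):
        return "claude"           # reasoning-heavy work
    if turn_type == "edit" and is_shared_file:
        return "claude"           # blast radius: a shared type is critical
    if turn_type in ("tool_result", "edit"):
        return "ollama"           # mechanical work runs locally for free
    return "deepseek"             # cheap cloud fallback, e.g. code gen

print(route_turn("planning", 0, False))     # claude
print(route_turn("edit", 0, True))          # claude
print(route_turn("tool_result", 0, False))  # ollama
```

Because it is pure branching on metadata already known per turn, a heuristic like this adds no latency — the "zero latency overhead" claim above.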
Biggest Token Saver

Structural Summarization — 3-5x Fewer Tokens

The biggest single improvement to token usage. Older tool results in your conversation get replaced with compact structural summaries — just function signatures, type definitions, and imports.

Your last 4 messages stay fully intact. Only older code results get summarized. JSON, YAML, and error results are never touched.

  • 200-line file becomes ~30 lines (3-5x reduction)
  • Saves thousands of tokens per request in long sessions
  • Content-type aware — only summarizes code, never data
  • Supports Rust, TypeScript, Python, Go, and more
  • Combined with standard compression: 60-80% total reduction
# Old message tool_result (6,000 tokens):
use std::collections::HashMap;
use crate::db::Db;
// ... 180 lines of implementation ...
// comments, function bodies, tests ...

# After structural summary (1,200 tokens):
[Structural summary (200 lines -> 28 lines)]
use std::collections::HashMap;
use crate::db::Db;
pub struct ProxyState { ... }
impl ProxyState { ... }
pub async fn handle() -> Response { ... }
fn compress_text() -> String { ... }

5x reduction. API surface preserved.
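The mechanics of a structural summary can be sketched in a few lines: keep imports and top-level signatures, elide bodies. This is a minimal illustration for Rust-style code — the real summarizer is language-aware and far more thorough; the regex and elision format here are assumptions:

```python
import re

# Illustrative sketch: keep imports and signatures, drop function bodies.
# Matches common Rust item openers; not a real parser.
SIG = re.compile(r"^\s*(use |pub |fn |struct |impl |trait |enum |async fn )")

def summarize(source: str) -> str:
    kept = []
    for line in source.splitlines():
        if SIG.match(line):
            # Replace an opening brace (and everything after) with an
            # elided-body marker, preserving the API surface.
            head = line.split("{")[0].rstrip()
            kept.append(head + (" { ... }" if "{" in line else ""))
    return "\n".join(kept)
```

Run on a 200-line file, only the handful of signature lines survive — which is why the ratio lands in the 3-5x range quoted above.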
Token Saver

Context Compression

Every tool result passes through our compression engine before reaching the AI. We strip what the AI doesn't need while keeping what it does.

Content-type aware — we know the difference between code (strip comments), JSON (never touch), and logs (deduplicate aggressively). Inspired by RTK's approach.

  • 20-50% fewer tokens on code results
  • Never corrupts JSON, YAML, or data formats
  • Smart truncation preserves function signatures
  • Aggressive log deduplication
# Before compression (2,400 tokens):
// Helper function for auth
// TODO: refactor this later
/* Old implementation
   removed in v2 */
fn validate(token: &str) {
    ...
}

# After compression (1,200 tokens):
fn validate(token: &str) {
    ...
}

50% saved. The code itself is untouched.
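Two of the passes described above — comment stripping for code and log deduplication — can be sketched as follows. This is a simplified illustration under stated assumptions (it ignores edge cases like `//` inside string literals, which a content-type-aware engine must handle):

```python
import re

# Sketch of comment stripping for C-style comments (not the real engine).
def strip_comments(code: str) -> str:
    code = re.sub(r"/\*.*?\*/", "", code, flags=re.DOTALL)  # block comments
    lines = [l for l in code.splitlines() if not l.lstrip().startswith("//")]
    return "\n".join(l for l in lines if l.strip())          # drop blank lines

# Sketch of aggressive log deduplication: collapse consecutive repeats.
def dedup_logs(log: str) -> str:
    out, prev, repeats = [], None, 0
    for line in log.splitlines():
        if line == prev:
            repeats += 1
            continue
        if repeats:
            out.append(f"  [previous line repeated {repeats}x]")
        out.append(line)
        prev, repeats = line, 0
    if repeats:
        out.append(f"  [previous line repeated {repeats}x]")
    return "\n".join(out)
```

JSON and YAML skip both passes entirely, since whitespace and repetition can be semantically meaningful in data formats.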
Nobody Has Solved This

Deep Memory — Full Conversation Capture

Every developer's frustration: "I already told you this yesterday."

Other tools remember tool names. AISmush captures entire conversations — your questions, the AI's answers, the reasoning, the decisions. Searchable by meaning, not just keywords.

  • Full conversation capture (not just tool names)
  • Local semantic search — MiniLM-L6-v2, ~10ms, no cloud
  • "auth bug" finds conversations about "JWT validation"
  • Search via dashboard, CLI, or API
  • Auto-injects relevant past context into new sessions
  • Configurable retention (30 days default, or forever)
$ aismush --search "auth token bug"

# Found 3 relevant conversations:

1. [3 days ago] score: 0.92
  You: "Fix the JWT token expiry bug"
  AI: "The issue is in validate_token().
      The expiry check uses > instead of >=..."
  Tools: Read(jwt.rs), Edit(jwt.rs), Bash(cargo test)

2. [5 days ago] score: 0.87
  You: "Add refresh token support"
  AI: "I'll create a refresh endpoint that..."
  - Added WebSocket broadcast server
Reliability

Context Window Management

Claude handles 200K tokens. DeepSeek handles 64K. Long sessions blow past DeepSeek's limit, causing failures and lost work.

AISmush automatically manages the mismatch. Old tool results get trimmed, large contexts route to Claude, and your work is never blocked.

  • Under 55K: both providers work fine
  • 55-64K: trim old tool results for DeepSeek
  • Over 64K: auto-route to Claude (200K window)
  • Never breaks tool_use/tool_result pairing
# Context growing during long session:

Turn 1: 5K tokens → DeepSeek
Turn 10: 25K tokens → DeepSeek
Turn 20: 48K tokens → DeepSeek
Turn 25: 58K tokens → compress + DeepSeek
Turn 30: 72K tokens → auto-route to Claude

# Without AISmush: DeepSeek fails at 64K.
# With AISmush: seamless handoff.
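The threshold logic above is simple enough to state directly. The constants come from the bullets (55K soft limit, 64K DeepSeek window); the function and return labels are illustrative, not AISmush's API:

```python
# Sketch of context-window management; names are illustrative.
SOFT_LIMIT = 55_000   # below this, every configured provider fits
HARD_LIMIT = 64_000   # DeepSeek's context window

def plan_request(context_tokens: int) -> str:
    if context_tokens < SOFT_LIMIT:
        return "deepseek"
    if context_tokens <= HARD_LIMIT:
        return "trim-then-deepseek"   # compress old tool results first
    return "claude"                   # 200K window absorbs the overflow

for tokens in (5_000, 48_000, 58_000, 72_000):
    print(tokens, "->", plan_request(tokens))
```

The trimming step only touches old tool results, which is what keeps tool_use/tool_result pairing intact.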
Transparency

Real-Time Cost Dashboard

See exactly what you're saving. Every request tracked: which provider, how many tokens, what it cost, what it would have cost on Claude alone.

  • Live dashboard at localhost:1849/dashboard
  • Per-request cost breakdown
  • Savings percentage with all-Claude comparison
  • Request history with full detail
  • Memory viewer
  • Stats persist across sessions in SQLite
Session Stats

Requests: 142
Claude turns: 12 (planning/debugging)
DeepSeek turns: 130 (execution)

Actual cost: $1.82
All-Claude cost: $18.40
Saved: $16.58 (90.1%)
NEW

Plan Orchestrator — Say "Go" and Walk Away

Ask Claude to make a plan, then say "run plan". AISmush analyzes every step, maps each one to the best specialized agent, figures out what can run in parallel, and executes the entire thing autonomously.

  • Reads plans Claude already generates — no custom format needed
  • Maps steps to specialized agents (rust-expert, data-engineer, etc.)
  • Independent steps run in parallel for maximum speed
  • Context from completed steps feeds forward automatically
  • Verifies results with cargo check/test after completion
  • Always asks for confirmation before executing
You: make a plan to add auth

Claude writes plan with 5 steps...

You: go

PLAN: Add authentication (5 steps)
Wave 1: Step 1 → rust-expert, Step 2 → data-engineer
Wave 2: Step 3 → backend-engineer
Wave 3: Steps 4, 5 → test-runner

Ready to execute? [Go / No]
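Grouping steps into parallel waves is a classic dependency-levelling problem: a step runs as soon as everything it depends on has finished. A minimal sketch, assuming a dependency graph like the 5-step auth plan above (the graph itself and the function name are assumptions for illustration):

```python
# Sketch of wave scheduling: level a dependency graph into parallel batches.
def schedule_waves(deps: dict[int, set[int]]) -> list[list[int]]:
    done: set[int] = set()
    waves: list[list[int]] = []
    remaining = set(deps)
    while remaining:
        # Every step whose dependencies are all complete joins this wave.
        wave = sorted(s for s in remaining if deps[s] <= done)
        if not wave:
            raise ValueError("cycle in plan dependencies")
        waves.append(wave)
        done |= set(wave)
        remaining -= set(wave)
    return waves

# Steps 1 and 2 are independent; 3 needs both; 4 and 5 both need 3.
plan = {1: set(), 2: set(), 3: {1, 2}, 4: {3}, 5: {3}}
print(schedule_waves(plan))  # [[1, 2], [3], [4, 5]]
```

Context from each completed wave is what feeds forward into the next, matching the "context feeds forward automatically" bullet above.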

Three Steps. That's It.

1

Install

One command. Single binary. No dependencies.

2

Scan

aismush --scan generates agents for your project.

3

Code

aismush-start launches Claude Code. You save 90%.

Three Ways to Run

Smart Routing (Default)

Routes across Claude, DeepSeek, and OpenRouter's 290+ models. Max cloud savings (~90%). Configure providers with aismush --setup.

aismush-start

Local + Cloud (Best Savings)

Local models (Ollama, LM Studio, llama.cpp, vLLM) handle tool results and edits for free. Cloud only when the task needs it. Up to 95%+ savings.

aismush-start --local

Direct Mode (Claude Only)

No secondary provider needed. Still get compression, memory, agents, and tracking. Dashboard shows potential savings.

aismush-start --direct

Install in 10 Seconds

Pure Rust. No C dependencies. Native builds for every platform.

curl -fsSL https://raw.githubusercontent.com/Skunk-Tech/aismush/main/install.sh | bash
Downloads: Linux x86_64 · macOS (Apple Silicon) · macOS (Intel) · Windows · GitHub

Then run aismush --setup for interactive provider configuration — connects and tests each provider (Claude, DeepSeek, OpenRouter, Ollama, and more).

Works on Debian 12+, Ubuntu 22.04+, any modern Linux, macOS (Intel & ARM), and Windows.

What's Coming Next

Team Dashboard

Shared savings dashboard for engineering teams. Track ROI across developers.

Cost Budgets

Set hourly and daily spend limits per provider. Auto-switch to cheaper models when budgets are hit.

Model Benchmarking

Auto-test model quality per task type against your own codebase. Know which model is actually best for your work.