AISmush is a drop-in proxy that makes Claude Code 90% cheaper — without changing how you work.
$0.03 sessions that used to cost $30.
Route tasks across Claude, DeepSeek, OpenRouter's 290+ models, and your own local servers — Ollama, LM Studio, llama.cpp, vLLM. AISmush picks the cheapest model that can handle each turn.
A heavy coding session burns through $20-50 in API costs. Most of those tokens are spent on mechanical tasks — reading files, processing tool results, making simple edits — that don't need Claude's $15/M token brain.
One command scans your codebase, sends it to AI for deep analysis, and generates Claude Code agents customized to YOUR project — your patterns, your frameworks, your architecture.
Not generic templates. Agents that know your specific file structure, your naming conventions, your test framework, your build commands.
AISmush automatically detects what kind of work each turn requires and routes it to the cheapest model that can handle it — across every provider you have configured.
Routes across Claude, DeepSeek, OpenRouter (290+ models), and local servers: Ollama, LM Studio, llama.cpp, vLLM, Jan. Planning and architecture? Claude. Tool results and edits? Local Ollama — free.
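The routing idea can be sketched in a few lines. Everything here is illustrative, not AISmush's actual tables: the model names, prices, and turn-classification rules are hypothetical stand-ins for "pick the cheapest model that can handle this turn."

```python
# Hypothetical capability/price table (illustrative only, not AISmush's real config).
MODELS = [
    # (name, capability tier, $ per 1M input tokens)
    ("ollama/qwen2.5-coder", 1, 0.0),   # local model: free
    ("deepseek-chat",        2, 0.27),  # cheap cloud mid-tier
    ("claude-sonnet",        3, 3.00),  # strongest, most expensive
]

def classify_turn(turn: str) -> int:
    """Map a turn to the minimum capability tier it needs (toy heuristic)."""
    if turn.startswith("tool_result") or turn.startswith("edit"):
        return 1            # mechanical work: tool results, simple edits
    if "plan" in turn or "architecture" in turn:
        return 3            # planning and architecture go to the top tier
    return 2                # everything else: a mid-tier model

def route(turn: str) -> str:
    """Pick the cheapest model whose tier covers what the turn needs."""
    needed = classify_turn(turn)
    capable = [m for m in MODELS if m[1] >= needed]
    return min(capable, key=lambda m: m[2])[0]
```

With this table, a tool result routes to the free local model, a planning request to the strongest model, and ordinary questions to the cheapest capable cloud tier.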
The single biggest reduction in token usage. Older tool results in your conversation are replaced with compact structural summaries: just function signatures, type definitions, and imports.
Your last 4 messages stay fully intact. Only older code results get summarized. JSON, YAML, and error results are never touched.
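To make "structural summary" concrete, here is a minimal sketch of the idea for Python code, using the standard `ast` module: keep imports and signatures, drop function bodies. It is an assumption about the shape of the output, not AISmush's actual summarizer.

```python
import ast

def summarize_code(source: str) -> str:
    """Reduce a code tool-result to its structure:
    imports plus function/class signatures, bodies dropped."""
    tree = ast.parse(source)
    lines = []
    for node in tree.body:
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            # Keep import lines verbatim.
            lines.append(ast.get_source_segment(source, node))
        elif isinstance(node, ast.FunctionDef):
            args = ", ".join(a.arg for a in node.args.args)
            lines.append(f"def {node.name}({args}): ...")
        elif isinstance(node, ast.ClassDef):
            lines.append(f"class {node.name}: ...")
    return "\n".join(lines)
```

A 400-line file read collapses to a handful of signature lines, which is usually all the model needs to keep reasoning about code it already processed.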
Every tool result passes through our compression engine before reaching the AI. We strip what the AI doesn't need while keeping what it does.
Content-type aware — we know the difference between code (strip comments), JSON (never touch), and logs (deduplicate aggressively). Inspired by RTK's approach.
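The content-type dispatch described above can be sketched like this. The detection and compression rules here are deliberately naive stand-ins (a real comment stripper must not touch `#` inside strings); only the three-way policy mirrors the text: strip comments from code, never touch JSON, deduplicate logs.

```python
import json
import re

def compress(content: str, ctype: str) -> str:
    """Content-type aware compression (toy version of the policy above)."""
    if ctype == "json":
        json.loads(content)          # validate only: JSON is never rewritten
        return content
    if ctype == "code":
        # Naive sketch: drop blank lines, full-line comments,
        # and trailing '#' comments. Not string-literal safe.
        return "\n".join(
            re.sub(r"\s*#.*$", "", line)
            for line in content.splitlines()
            if line.strip() and not line.lstrip().startswith("#")
        )
    if ctype == "log":
        # Collapse consecutive duplicate lines.
        out, prev = [], None
        for line in content.splitlines():
            if line != prev:
                out.append(line)
            prev = line
        return "\n".join(out)
    return content
```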
Every developer's frustration: "I already told you this yesterday."
Other tools remember tool names. AISmush captures entire conversations — your questions, the AI's answers, the reasoning, the decisions. Searchable by meaning, not just keywords.
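"Searchable by meaning" usually means embedding-based retrieval. A minimal sketch, assuming conversations are stored alongside embedding vectors (the two-dimensional toy vectors and the memory format here are hypothetical):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def search(memory, query_vec, top_k=1):
    """memory: list of (embedding, conversation snippet).
    Rank snippets by semantic similarity to the query, not keywords."""
    ranked = sorted(memory, key=lambda e: cosine(e[0], query_vec), reverse=True)
    return [snippet for _, snippet in ranked[:top_k]]
```

A question phrased completely differently from the stored answer still retrieves it, because nearby embeddings capture shared meaning rather than shared words.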
Claude handles 200K tokens. DeepSeek handles 64K. Long sessions blow past DeepSeek's limit, causing failures and lost work.
AISmush automatically manages the mismatch. Old tool results get trimmed, large contexts route to Claude, and your work is never blocked.
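The two escape valves described above, trim old tool results first, fall back to the large-context provider second, can be sketched as one function. The 4-characters-per-token estimate, the keep-last-4 rule, and the message shape are assumptions for illustration:

```python
def fit_context(messages, limit_tokens, keep_recent=4):
    """Trim older tool results until the conversation fits a small
    context window; return None if it still doesn't fit, meaning
    the request should route to the large-context provider instead."""
    est = lambda m: len(m["text"]) // 4      # rough token estimate
    msgs = [dict(m) for m in messages]       # don't mutate the caller's list
    for m in msgs[:-keep_recent]:            # the last N messages stay intact
        if m["role"] == "tool" and sum(map(est, msgs)) > limit_tokens:
            m["text"] = "[trimmed tool result]"
    if sum(map(est, msgs)) > limit_tokens:
        return None                          # too big even trimmed: use Claude
    return msgs
```

So a session that would blow past a 64K window on DeepSeek either shrinks to fit or transparently moves to the 200K provider, instead of failing.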
See exactly what you're saving. Every request tracked: which provider, how many tokens, what it cost, what it would have cost on Claude alone.
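The savings math itself is simple: sum what each request actually cost, and what the same tokens would have cost at Claude's rate. A sketch, with the $15/M figure taken from above and the other per-request prices hypothetical:

```python
def session_cost(requests, claude_price_per_m=15.0):
    """Per-request ledger: actual spend vs. a Claude-only baseline.
    Each request is {'tokens': int, 'price_per_m': float} (assumed shape)."""
    actual = sum(r["tokens"] / 1e6 * r["price_per_m"] for r in requests)
    baseline = sum(r["tokens"] / 1e6 * claude_price_per_m for r in requests)
    saved_pct = 100 * (1 - actual / baseline)
    return round(actual, 4), round(baseline, 4), round(saved_pct, 1)
```

With most tokens on a free local model and only a sliver on Claude, the baseline dwarfs the actual spend, which is where the headline savings numbers come from.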
Ask Claude to make a plan, then say "run plan". AISmush analyzes every step, maps each one to the best specialized agent, figures out what can run in parallel, and executes the entire thing autonomously.
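"Figures out what can run in parallel" is a dependency-ordering problem. A minimal sketch of the scheduling part, assuming a plan is a list of steps with a prerequisite map (the data shape is an illustration, not AISmush's plan format):

```python
def parallel_batches(steps, deps):
    """Group plan steps into batches: every step in a batch has all of
    its prerequisites already done, so the batch can run in parallel.
    deps maps step -> set of prerequisite steps."""
    done, batches = set(), []
    remaining = set(steps)
    while remaining:
        ready = {s for s in remaining if deps.get(s, set()) <= done}
        if not ready:
            raise ValueError("cycle in plan")
        batches.append(sorted(ready))
        done |= ready
        remaining -= ready
    return batches
```

Each batch then gets dispatched to the specialized agents, with independent steps running side by side instead of serially.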
One command. Single binary. No dependencies.
aismush --scan generates agents for your project.
aismush-start launches Claude Code. You save 90%.
Routes across Claude, DeepSeek, and OpenRouter's 290+ models. Max cloud savings (~90%). Configure providers with aismush --setup.
aismush-start
Local models (Ollama, LM Studio, llama.cpp, vLLM) handle tool results and edits for free. Cloud only when the task needs it. Savings can exceed 95%.
aismush-start --local
No secondary provider needed. Still get compression, memory, agents, and tracking. Dashboard shows potential savings.
aismush-start --direct
Pure Rust. No C dependencies. Native builds for every platform.
After installing, run aismush --setup for interactive provider configuration; it connects to and tests each provider (Claude, DeepSeek, OpenRouter, Ollama, and more).
Works on Debian 12+, Ubuntu 22.04+, any modern Linux, macOS (Intel & ARM), and Windows.
Shared savings dashboard for engineering teams. Track ROI across developers.
Set hourly and daily spend limits per provider. Auto-switch to cheaper models when budgets are hit.
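The budget behavior can be sketched as a small spend ledger plus a fallback rule. Limits, provider names, and the API shape here are hypothetical:

```python
import time

class Budget:
    """Per-provider spend limiter: records each request's cost and
    reports when the rolling hourly or daily cap has been exceeded."""
    def __init__(self, hourly, daily):
        self.hourly, self.daily = hourly, daily
        self.events = []                     # list of (timestamp, cost)

    def record(self, cost, now=None):
        self.events.append((now if now is not None else time.time(), cost))

    def over_limit(self, now=None):
        now = now if now is not None else time.time()
        hour = sum(c for t, c in self.events if now - t < 3600)
        day = sum(c for t, c in self.events if now - t < 86400)
        return hour > self.hourly or day > self.daily

def pick_provider(primary, fallback, budget, now=None):
    """Auto-switch to the cheaper provider once the budget is hit."""
    return fallback if budget.over_limit(now) else primary
```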
Auto-test model quality per task type against your own codebase. Know which model is actually best for your work.