What is IncryptRouter?
IncryptRouter (Incrypt Smart Router) is a single, production-ready proxy that sits between your OpenClaw agents — or any OpenAI-compatible client — and multiple LLM providers. Instead of sending every request to one expensive model, it classifies each request by complexity, compresses conversation history to cut tokens, and routes to the cheapest capable model. You get one OpenAI-compatible HTTP endpoint, your own API keys (free-tier first), optional response caching, and automatic failover. No wallet or x402 required.
Why routing matters
If you use OpenClaw or any single-model API with a premium model as default, most traffic is overkill: autocomplete, short Q&A, and syntax fixes get sent to a $15–25/M-token model. IncryptRouter fixes that by sending each request to the cheapest model that can handle it: simple tasks go to free or low-cost providers (Groq, Cerebras, HuggingFace), standard tasks to mid-tier models, and only hard reasoning hits premium models. You add API keys as you need them and scale by editing .env.
Core capabilities
- Smart routing — Classifies each request as simple, standard, or complex and sends it to the cheapest capable provider. Free tiers are used first when keys are present.
- Context compression — Compacts conversation history before calling the model (rule-based cleanup, dictionary coding, RLE-style shortening), reducing token usage and cost. Inspired by claw-compactor.
- Caching — Caches responses by request fingerprint; repeated or near-identical requests are served from cache with configurable TTL and size.
- Fallback — If the chosen provider times out or errors, the router tries the next provider in the same tier automatically.
- Observability — Per-request metadata (tier, model, provider, latency, cache hit) and a `/stats` endpoint for dashboards or debugging.
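The caching behavior can be pictured as a TTL'd, size-bounded map keyed by a request fingerprint. The sketch below is a reader's illustration, not the router's actual implementation; the class and method names are invented, and SHA-256 over the JSON body is an assumed fingerprint scheme:

```typescript
import { createHash } from "node:crypto";

// Hypothetical sketch of fingerprint-based response caching with a TTL
// and a max-size bound; names do not come from the actual codebase.
interface CacheEntry {
  value: string;
  expiresAt: number;
}

class ResponseCache {
  private store = new Map<string, CacheEntry>();

  constructor(private ttlMs = 3600_000, private maxSize = 1000) {}

  // Fingerprint: a stable hash of the request body (messages + options).
  key(body: unknown): string {
    return createHash("sha256").update(JSON.stringify(body)).digest("hex");
  }

  get(k: string): string | undefined {
    const entry = this.store.get(k);
    if (!entry) return undefined;
    if (Date.now() > entry.expiresAt) {
      this.store.delete(k); // expired: drop and report a miss
      return undefined;
    }
    return entry.value;
  }

  set(k: string, value: string): void {
    if (this.store.size >= this.maxSize) {
      // Evict the oldest insertion (Map iterates in insertion order).
      const oldest = this.store.keys().next().value;
      if (oldest !== undefined) this.store.delete(oldest);
    }
    this.store.set(k, { value, expiresAt: Date.now() + this.ttlMs });
  }
}
```

Keying on the full body means any change to messages or options is a cache miss, which matches the "repeated or near-identical requests" behavior only after compaction normalizes the history.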
How the pipeline works
Every request to /v1/chat/completions goes through a deterministic pipeline before hitting any provider.
- Cache gate — The request (messages + options) is hashed. On a cache hit, the stored response is returned with `meta.cacheHit: true`; no compaction or provider call occurs.
- Compaction — If compression is enabled, the message list is run through rule-based cleanup (dedupe, collapse newlines/spaces), optional dictionary coding (frequent phrases → short codes), and RLE-style shortening to reduce token count.
- Classifier — A heuristic (length, keywords like "analyze", "report", thresholds) maps the request to simple, standard, or complex. No external API is used.
- Routing policy — For the chosen tier, the router picks from the configured providers that have an API key in `.env`. Free-first: the simple tier uses free providers; standard and complex tiers use paid providers when keys are present.
- Provider mesh + failover — The router calls the first provider in the tier; on timeout or error it tries the next in order. The response is normalized to an OpenAI-shaped structure.
- Response and telemetry — The response is cached, and metadata (tier, model, provider, latency, cache hit, token counts) is appended and stored for `/stats`.
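The classifier step can be sketched as a pure function. The keyword list and length threshold below are illustrative assumptions, not the router's actual values:

```typescript
type Tier = "simple" | "standard" | "complex";

// Illustrative heuristic: keyword match first, then total message length.
// The real router's keywords and thresholds may differ.
const COMPLEX_KEYWORDS = ["analyze", "report", "architecture", "refactor"];

function classifyTier(messages: { role: string; content: string }[]): Tier {
  const text = messages.map((m) => m.content).join(" ").toLowerCase();
  if (COMPLEX_KEYWORDS.some((kw) => text.includes(kw))) return "complex";
  if (text.length > 800) return "standard"; // hypothetical length threshold
  return "simple";
}
```

Because classification is purely local (string checks, no external API), it adds effectively zero latency before routing.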
Request flow
Client request → cache gate → compaction → classifier → routing policy → provider call (with failover) → response + telemetry.
Routing tiers
Classification is heuristic (message length and keywords). You can override it with `forceTier` in the request body.
| Tier | Typical tasks | Providers (when key in .env) |
|---|---|---|
| Simple | Short Q&A, formatting, lookups, "what is X" | Groq, Cerebras, HuggingFace, Together (free-tier) |
| Standard | Email drafting, summaries, medium-length reasoning | Together, OpenAI gpt-4o-mini, Anthropic Haiku, Gemini Flash |
| Complex | Analysis, reports, coding, deep reasoning | OpenAI gpt-4o, Anthropic Sonnet |
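For example, a request body pinning the complex tier might look like this (the `model` value is illustrative; any OpenAI-compatible body works):

```json
{
  "model": "incrypt/auto",
  "messages": [
    { "role": "user", "content": "Draft a migration plan for our database." }
  ],
  "forceTier": "complex"
}
```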
vs ClawRouter, claw-compactor, TokenWatch
IncryptRouter unifies routing, compression, and caching in one service you host yourself, with your API keys. ClawRouter uses x402/USDC and BlockRun's model list; claw-compactor is offline scripts; TokenWatch is a proxy with rate limits. Use IncryptRouter when you want one process, your own keys, free-first routing, built-in compaction and caching, and a single OpenAI-compatible endpoint.
API reference
Base URL: http://localhost:3140 (or your host). All responses JSON.
- POST /v1/chat/completions — OpenAI-compatible chat completion. Body: `messages` (required); optional `model`, `max_tokens`, `temperature`, `forceTier` (simple | standard | complex). Streaming is not supported. The response includes `meta`: tier, model, provider, cacheHit, compressedFromTokens?, latencyMs?, fallbackUsed?.
- GET /health — Liveness. Returns `{ status: "ok", service: "incrypt-smart-router" }`.
- GET /stats — Cache size and the last 50 requests (tier, model, provider, latency, tokens, cache hit) for debugging or dashboards.
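A client can read the `meta` block from each response. The interface below is a reader's sketch assembled from the field list above, not an exported type from the project; the helper function is likewise hypothetical:

```typescript
// Shape of the router's per-response meta block, as described in the
// API reference above. Optional fields are marked "?" per the docs.
interface RouterMeta {
  tier: "simple" | "standard" | "complex";
  model: string;
  provider: string;
  cacheHit: boolean;
  compressedFromTokens?: number;
  latencyMs?: number;
  fallbackUsed?: boolean;
}

// Hypothetical helper: render a one-line summary for logs or dashboards.
function describeMeta(meta: RouterMeta): string {
  const cache = meta.cacheHit ? "cache hit" : "cache miss";
  return `${meta.tier} -> ${meta.provider}/${meta.model} (${cache})`;
}
```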
Configuration
Set in `.env`. At least one free-tier key (e.g. `GROQ_API_KEY`) is recommended.
- Server: `PORT` (default 3140), `COMPRESSION_ENABLED` (default true), `CACHE_TTL_SECONDS` (default 3600), `CACHE_MAX_SIZE` (default 1000), `LOG_LEVEL` (default info)
- Provider keys: `GROQ_API_KEY`, `CEREBRAS_API_KEY`, `HF_TOKEN`, `TOGETHER_API_KEY`, `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `GOOGLE_AI_API_KEY` (adapter not yet implemented)
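A minimal `.env` using the defaults above and one free-tier key might look like this (the key value is a placeholder):

```
PORT=3140
COMPRESSION_ENABLED=true
CACHE_TTL_SECONDS=3600
CACHE_MAX_SIZE=1000
LOG_LEVEL=info
GROQ_API_KEY=your_groq_key_here
```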
Install and quick start
The one-command install clones the repo (default `~/incrypt_smart_router`), runs `npm install` and `npm run build`, and creates `.env` from `.env.example`. Add at least one API key (e.g. a Groq free-tier key from console.groq.com), then run `npm start`. The server listens on http://localhost:3140.
OpenClaw setup
After the router is running, add a custom model provider with `baseUrl: "http://localhost:3140/v1"`. In `~/.openclaw/openclaw.json` under `models.providers`, add `"incrypt-router"` with that baseUrl and an OpenAI-compatible model entry (e.g. `id: "incrypt/auto"`). Set the agent's primary model to `incrypt-router/incrypt/auto`, then restart the gateway: `openclaw gateway restart`. You can also tell your OpenClaw agent in natural language: "Install and use https://github.com/GHX5T-SOL/incrypt_smart_router" or "Set up Incrypt Smart Router and use it as my LLM backend".
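Based on the description above, the relevant fragment of `~/.openclaw/openclaw.json` might look like the following. The exact schema depends on your OpenClaw version, so treat this as a sketch rather than a copy-paste config:

```json
{
  "models": {
    "providers": {
      "incrypt-router": {
        "baseUrl": "http://localhost:3140/v1",
        "models": [{ "id": "incrypt/auto" }]
      }
    }
  }
}
```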
Development
Source lives in `src/` (router, compaction, classifier, cache, failover, providers, server). Tests are in `src/**/*.test.ts`. Commands: `npm run build`, `npm run test`, `npm run dev`, `npm run typecheck`. See the repo docs for OPENCLAW_INTEGRATION and ARCHITECTURE.