Technical Guide
Everything engineers need to know about Clawback: setup, audit mechanics, proxy architecture, and security model.
1. Overview
Clawback is an automated LLM cost audit and optimization proxy. It intercepts your existing LLM API calls via a single environment variable change, logs a statistically significant sample of requests, replays them against alternative models, and delivers a report showing exactly where you can reduce spend without degrading output quality. After the audit, an optional paid proxy continuously routes each call to the most cost-effective model that meets your quality threshold.
Architecture
- Log each request/response pair to Cloudflare KV
- At call threshold, send logs to webhook server
- Replay each call against alternative models
- Score output quality with 95% confidence intervals
- Generate HTML savings report
- Email report to customer
2. Setup
Prerequisites
- Node.js 20+ (for local tooling and the setup script)
- A .env file in your project root with your LLM API keys
- At least one active provider: OpenAI, Anthropic, or Google
Option A: Browser setup
Walk through the guided setup at /onboarding. It detects your providers, generates the correct env vars, and triggers your first audit automatically.
Option B: Terminal (one command)
Option C: Manual setup
Add the proxy base URL for each provider you use. Your API keys stay in your local .env and are never sent to Clawback.
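For example, with all three providers active, the additions to .env look like this (URLs from the Supported Providers table in section 6):

```shell
# Added alongside your existing keys; the keys themselves are unchanged.
OPENAI_BASE_URL=https://proxy.clawback.run/openai/v1
ANTHROPIC_BASE_URL=https://proxy.clawback.run/anthropic
GOOGLE_API_BASE_URL=https://proxy.clawback.run/google
```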
What the setup script does
- Detects your project root and locates the .env file
- Backs up .env to .env.clawback-backup-YYYYMMDD-HHMMSS
- Scans for existing OPENAI_API_KEY, ANTHROPIC_API_KEY, and GOOGLE_API_KEY
- Adds the corresponding *_BASE_URL env vars pointing to the Clawback proxy
- Sends a registration webhook with a SHA-256 hash of your API key (not the key itself) so the proxy can associate your traffic
- Runs a single test call to confirm the proxy is reachable
3. How the Audit Works
Call logging
The Clawback proxy intercepts each LLM request, logs the request/response metadata to Cloudflare KV, and passes the original call through to the provider unchanged. Your application receives the same response it would without the proxy. Latency overhead is sub-millisecond (edge routing only, no payload inspection in the hot path).
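Clawback's KV schema is not public; as an illustration, a metadata-only log record built from an OpenAI-style response body might carry fields like these (names are ours, not Clawback's):

```typescript
// Illustrative shape of a per-call log record; the real KV schema is not public.
interface CallLog {
  model: string;
  promptTokens: number;
  completionTokens: number;
  latencyMs: number;
  timestamp: string;
}

interface CompletionBody {
  model: string;
  usage?: { prompt_tokens?: number; completion_tokens?: number };
}

// Build a metadata-only log entry from an OpenAI-style response body.
// Note that no prompt or completion text is captured here.
function toCallLog(body: CompletionBody, latencyMs: number): CallLog {
  return {
    model: body.model,
    promptTokens: body.usage?.prompt_tokens ?? 0,
    completionTokens: body.usage?.completion_tokens ?? 0,
    latencyMs,
    timestamp: new Date().toISOString(),
  };
}
```

Because only fields like these are touched in the hot path, the proxy never needs to parse prompt or completion text while forwarding.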
150-call trigger
Once 150 calls have been logged for your account, the audit kicks off automatically. No manual trigger needed. 150 calls provides a statistically significant sample across your usage patterns. Most teams hit this threshold within the first day.
Replay process
The audit engine replays your logged calls against cost-optimized alternatives from each provider:
- GPT-4o-mini (OpenAI)
- Claude Haiku 4.5 (Anthropic)
- Gemini 2.5 Flash (Google)
- Llama 3.3 70B (via Groq)
- DeepSeek Chat (DeepSeek)
Each call is replayed with identical inputs: same system prompt, same user message, same parameters. Replay runs on Clawback's infrastructure using our API accounts. You are never billed for replay calls.
Quality scoring
Each alternative model's output is scored against your original response on three dimensions:
- Semantic similarity: does the meaning match?
- Format preservation: does the structure (JSON, markdown, lists) match?
- Task completion: does the output accomplish what the prompt asked for?
Results are reported with 95% confidence intervals so you see statistical reliability, not just averages.
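The scoring internals aren't published, but a 95% interval over per-call quality scores can be understood as a standard normal-approximation interval, sketched here:

```typescript
// Mean quality score with a 95% confidence interval (normal approximation).
// Assumes at least two per-call scores, each in [0, 1].
function confidenceInterval95(scores: number[]): { mean: number; lo: number; hi: number } {
  const n = scores.length;
  const mean = scores.reduce((a, b) => a + b, 0) / n;
  const variance = scores.reduce((a, b) => a + (b - mean) ** 2, 0) / (n - 1);
  const halfWidth = 1.96 * Math.sqrt(variance / n); // z = 1.96 for 95%
  return { mean, lo: mean - halfWidth, hi: mean + halfWidth };
}
```

A wide interval signals that the alternative model is inconsistent on your workload even if its average looks good.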
Report generation and delivery
After replay and scoring complete, the audit engine generates a per-endpoint report showing which calls can safely use an alternative model and which ones genuinely need the model you are paying for. The report is emailed to the address associated with your account and is also available in the dashboard.
4. How the Paid Proxy Works
Model routing
Based on the audit results, the proxy builds a per-endpoint routing table. Each endpoint (identified by system prompt hash + model + path) is mapped to the cheapest model that met your quality threshold. You can pin any endpoint to a specific model to override the automatic routing.
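The routing rule reduces to a filter-then-sort over audited candidates; a sketch, with illustrative model names, prices, and field names (not Clawback's actual table):

```typescript
type Candidate = { model: string; costPer1kTokens: number; qualityScore: number };

// Cheapest model whose audited quality met the threshold for this endpoint.
// Prices and scores would come from the audit; values here are illustrative.
function pickModel(candidates: Candidate[], qualityThreshold: number): string {
  const qualifying = candidates
    .filter((c) => c.qualityScore >= qualityThreshold)
    .sort((a, b) => a.costPer1kTokens - b.costPer1kTokens);
  if (qualifying.length === 0) throw new Error("no model meets the quality threshold");
  return qualifying[0].model;
}
```

Pinning an endpoint simply bypasses this selection for that endpoint's key.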
Response caching
The proxy uses hash-based deduplication for repeat prompts. If the same input (system prompt + user message + parameters) is seen within the cache TTL, the cached response is returned instantly. Cache hit rates of 10 to 25% are typical for production workloads with templated prompts.
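A minimal in-process sketch of the dedup scheme, assuming the hash covers system prompt, user message, and parameters as described (Clawback's real cache runs at the Cloudflare edge, not in your process):

```typescript
import { createHash } from "node:crypto";

// Dedup key over the documented inputs: system prompt + user message + parameters.
function cacheKey(system: string, user: string, params: Record<string, unknown>): string {
  return createHash("sha256").update(JSON.stringify([system, user, params])).digest("hex");
}

// Minimal TTL cache for illustration only.
class TtlCache {
  private store = new Map<string, { value: string; expires: number }>();
  constructor(private ttlMs: number) {}
  get(key: string): string | undefined {
    const hit = this.store.get(key);
    return hit && hit.expires > Date.now() ? hit.value : undefined;
  }
  set(key: string, value: string): void {
    this.store.set(key, { value, expires: Date.now() + this.ttlMs });
  }
}
```

Because the key includes parameters, changing temperature or max tokens produces a different key and bypasses the cache.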
Provider fallback
If the target model returns a rate limit (429) or server error (5xx), the proxy automatically retries with the next cheapest qualifying model from the routing table. Failover is transparent to your application. No code changes, no retry logic on your side.
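The failover rule can be sketched as follows; the synchronous call function is a stand-in for the real upstream HTTP request:

```typescript
// Try models cheapest-first; fall through to the next one on 429 or 5xx.
function callWithFallback(
  models: string[],
  call: (model: string) => { status: number; body?: string }
): { model: string; body?: string } {
  for (const model of models) {
    const res = call(model);
    if (res.status === 429 || res.status >= 500) continue; // retry next model
    return { model, body: res.body };
  }
  throw new Error("all candidate models failed");
}
```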
Cost attribution
Tag calls with an x-clawback-feature header to get per-feature cost breakdowns in the dashboard. See exactly what each part of your product costs in LLM spend.
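For example, merging the tag into your request headers (the header name is from this guide; the helper itself is illustrative):

```typescript
// Attach a per-feature tag so the dashboard can attribute cost.
function withFeatureTag(headers: Record<string, string>, feature: string): Record<string, string> {
  return { ...headers, "x-clawback-feature": feature };
}

// Usage (illustrative): fetch(url, { headers: withFeatureTag(baseHeaders, "summarizer") })
```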
Budget caps and spend alerts
Set a monthly budget cap per feature or account-wide. When spend reaches 80% of the cap, you receive an email alert. At 100%, the proxy can either hard-stop (reject calls) or soft-stop (continue but alert on every call). Configure this in the dashboard.
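The documented thresholds reduce to a simple check; a sketch, with mode and action names of our own choosing:

```typescript
type BudgetAction = "ok" | "alert" | "hard-stop" | "soft-stop";

// Mirrors the documented thresholds: email alert at 80% of cap, stop at 100%.
function budgetAction(spend: number, cap: number, mode: "hard" | "soft"): BudgetAction {
  if (spend >= cap) return mode === "hard" ? "hard-stop" : "soft-stop";
  if (spend >= 0.8 * cap) return "alert";
  return "ok";
}
```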
5. Security
API keys never leave your machine
Your LLM provider API keys remain in your local .env file. The SDK reads them at call time and sends them directly to the proxy over HTTPS. The proxy forwards them to the provider in the same request. Keys are never logged, stored, or persisted by Clawback.
What we see vs. what we don't
| We See | We Don't See |
|---|---|
| Request/response metadata (model, token count, latency) | Your raw API keys (only a SHA-256 hash for identification) |
| Prompt/completion content during replay (in-memory only) | Any data after the audit completes (discarded from memory) |
| Aggregated cost and quality metrics | Your source code, infrastructure, or internal systems |
Request/response handling
During the audit replay phase, request/response pairs are held in memory for scoring. After the quality scores are computed, the raw content is discarded. No prompts or completions are written to disk or persistent storage in plaintext.
Customer identification
Clawback identifies your account using a SHA-256 hash of your API key. The hash is computed locally by the setup script and sent during registration. The actual key is never transmitted to Clawback infrastructure.
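In Node, that fingerprint is a one-liner. Hex output is an assumption on our part; the guide doesn't specify the digest encoding:

```typescript
import { createHash } from "node:crypto";

// The setup script transmits this digest, never the key itself.
// Hex encoding is assumed; the actual encoding is not documented.
function keyFingerprint(apiKey: string): string {
  return createHash("sha256").update(apiKey).digest("hex");
}
```

SHA-256 is one-way, so the fingerprint identifies your traffic without letting anyone recover the key.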
Transport security
HTTPS is enforced on all proxy endpoints. The proxy runs on Cloudflare Workers, which terminates TLS at the edge. Connections to upstream LLM providers also use HTTPS exclusively.
6. Supported Providers
| Provider | Env Var | Proxy URL |
|---|---|---|
| OpenAI | OPENAI_BASE_URL | https://proxy.clawback.run/openai/v1 |
| Anthropic | ANTHROPIC_BASE_URL | https://proxy.clawback.run/anthropic |
| Google | GOOGLE_API_BASE_URL | https://proxy.clawback.run/google |
7. Undo / Uninstall
Removing Clawback takes one command. The setup script creates a timestamped backup of your .env before making any changes.
Restore from backup
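Assuming the default backup naming from the setup script, restoring is a single copy (substitute the actual timestamp of your backup file):

```shell
# Overwrite the modified .env with the pre-Clawback backup.
cp .env.clawback-backup-YYYYMMDD-HHMMSS .env
```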
Or manually remove the proxy URLs
Delete these lines from your .env:
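Assuming all three providers were configured, these are the lines the setup script added:

```shell
OPENAI_BASE_URL=https://proxy.clawback.run/openai/v1
ANTHROPIC_BASE_URL=https://proxy.clawback.run/anthropic
GOOGLE_API_BASE_URL=https://proxy.clawback.run/google
```

Your API keys were never modified, so removing these lines fully restores direct provider access.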
8. Additional FAQ
What counts as a "call" for the audit threshold?
Any completion request that passes through the proxy. Chat completions, function calls, embeddings. Each request/response pair counts as one call. Most teams hit the threshold within the first day.
How does quality scoring work?
Clawback replays your exact production inputs against each alternative model, then scores the output against your original response using semantic similarity, format preservation, and task completion. Results are reported with 95% confidence intervals so you see statistical reliability, not just averages.
What if my use case is too specialized for alternative models?
That happens. Roughly 20 to 30% of endpoints show no viable alternative. The audit report tells you which endpoints can save money and which ones genuinely need the model you're using. Both findings are valuable.
Does the proxy add latency?
The routing decision adds sub-millisecond overhead. The proxy runs on Cloudflare's edge network. If an alternative model responds faster (Haiku and Flash typically do), you may see lower total latency.
Can I exclude specific endpoints from optimization?
Yes. Clawback Pro lets you pin any endpoint to a specific model. If your summarization pipeline needs Claude Opus 4, lock it. Clawback only optimizes the endpoints you allow.
9. FAQ
Common questions from customers are answered on the landing page. See the full FAQ section for details on pricing, data handling, latency impact, model pinning, and more.