Technical Guide

Everything engineers need to know about Clawback: setup, audit mechanics, proxy architecture, and security model.

1. Overview

Clawback is an automated LLM cost audit and optimization proxy. It intercepts your existing LLM API calls via a single environment variable change, logs a statistically significant sample of requests, replays them against alternative models, and delivers a report showing exactly where you can reduce spend without degrading output quality. After the audit, an optional paid proxy continuously routes each call to the most cost-effective model that meets your quality threshold.

Architecture

Your App (existing code, unchanged)
    |  API request (unchanged) / response (unchanged)
    v
Clawback Proxy (Cloudflare Worker, edge) -- logs the call, forwards it unmodified
    |  forwarded request / provider response
    v
LLM Provider (OpenAI, Anthropic, Google)

Async (does not block requests): the proxy hands logged calls to the Audit Engine.
Audit Engine
  • Log each request/response pair to Cloudflare KV
  • At call threshold, send logs to webhook server
  • Replay each call against alternative models
  • Score output quality with 95% confidence intervals
  • Generate HTML savings report
  • Email report to customer

2. Setup

Option A: Browser setup

Walk through the guided setup at /onboarding. It detects your providers, generates the correct env vars, and triggers your first audit automatically.

Option B: Terminal (one command)

curl -sL clawback.run | bash

Option C: Manual setup

Add the proxy base URL for each provider you use. Your API keys stay in your local .env and are never sent to Clawback.

# OpenAI
OPENAI_BASE_URL=https://proxy.clawback.run/openai/v1

# Anthropic
ANTHROPIC_BASE_URL=https://proxy.clawback.run/anthropic

# Google
GOOGLE_API_BASE_URL=https://proxy.clawback.run/google

What the setup script does

  1. Detects your project root and locates the .env file
  2. Backs up .env to .env.clawback-backup-YYYYMMDD-HHMMSS
  3. Scans for existing OPENAI_API_KEY, ANTHROPIC_API_KEY, and GOOGLE_API_KEY
  4. Adds the corresponding *_BASE_URL env vars pointing to the Clawback proxy
  5. Sends a registration webhook with a SHA-256 hash of your API key (not the key itself) so the proxy can associate your traffic
  6. Runs a single test call to confirm the proxy is reachable
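Step 5's local key hashing can be sketched in a few lines of Python. The payload shape and function name here are illustrative, not Clawback's actual registration format:

```python
import hashlib

def registration_payload(api_key: str, provider: str) -> dict:
    # Hash locally; only the digest leaves the machine, never the key.
    key_hash = hashlib.sha256(api_key.encode("utf-8")).hexdigest()
    return {"provider": provider, "key_hash": key_hash}

payload = registration_payload("sk-example-not-a-real-key", "openai")
print(payload["key_hash"][:12], "...")  # hex digest, not the key
```

Because SHA-256 is one-way, the digest identifies your account without revealing the key itself.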

3. How the Audit Works

Call logging

The Clawback proxy intercepts each LLM request, logs the request/response metadata to Cloudflare KV, and passes the original call through to the provider unchanged. Your application receives the same response it would without the proxy. Latency overhead is sub-millisecond (edge routing only, no payload inspection in the hot path).
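A per-call metadata record might look like the following sketch. The field names and function are hypothetical; the actual KV schema isn't documented here:

```python
# Hypothetical shape of a per-call metadata record (illustrative only).
def build_log_record(request: dict, response: dict, latency_ms: float) -> dict:
    usage = response.get("usage", {})
    return {
        "model": request.get("model"),
        "path": request.get("path"),
        "prompt_tokens": usage.get("prompt_tokens"),
        "completion_tokens": usage.get("completion_tokens"),
        "latency_ms": latency_ms,
        # Note: no API keys and no raw prompt text in the hot-path record.
    }

record = build_log_record(
    {"model": "gpt-4o", "path": "/v1/chat/completions"},
    {"usage": {"prompt_tokens": 120, "completion_tokens": 45}},
    latency_ms=842.0,
)
print(record["model"], record["prompt_tokens"])  # gpt-4o 120
```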

150-call trigger

Once 150 calls have been logged for your account, the audit kicks off automatically. No manual trigger needed. 150 calls provides a statistically significant sample across your usage patterns. Most teams hit this threshold within the first day.
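The threshold behavior amounts to a simple counter; this sketch (with a hypothetical `log_call` helper) shows that only the 150th logged call trips the trigger:

```python
AUDIT_THRESHOLD = 150  # calls logged before the audit kicks off

def log_call(store: list, record: dict) -> bool:
    """Append a call record; report whether the audit threshold is hit."""
    store.append(record)
    return len(store) >= AUDIT_THRESHOLD

calls: list = []
triggered = [log_call(calls, {"call_id": i}) for i in range(AUDIT_THRESHOLD)]
print(triggered.count(True))  # only the 150th call trips the trigger -> 1
```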

Replay process

The audit engine replays your logged calls against cost-optimized alternatives from each provider.

Each call is replayed with identical inputs: same system prompt, same user message, same parameters. Replay runs on Clawback's infrastructure using our API accounts. You are never billed for replay calls.

Quality scoring

Each alternative model's output is scored against your original response on three dimensions: semantic similarity, format preservation, and task completion.

Results are reported with 95% confidence intervals so you see statistical reliability, not just averages.
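Reporting a score with a 95% confidence interval might look like the sketch below, using a normal approximation. The scores and function are illustrative; Clawback's exact statistical method isn't specified here:

```python
import math
import statistics

def mean_ci95(scores):
    """Return (mean, lower, upper) for a normal-approximation 95% CI."""
    mean = statistics.fmean(scores)
    sem = statistics.stdev(scores) / math.sqrt(len(scores))
    half = 1.96 * sem  # z critical value for 95% coverage
    return mean, mean - half, mean + half

scores = [0.92, 0.88, 0.95, 0.90, 0.91, 0.87, 0.93, 0.89]
mean, lower, upper = mean_ci95(scores)
print(f"quality: {mean:.3f} (95% CI {lower:.3f}-{upper:.3f})")
```

A wide interval signals that the sample disagrees with itself, which is exactly the case where an average alone would be misleading.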

Report generation and delivery

After replay and scoring complete, the audit engine generates a per-endpoint report showing which calls can safely use an alternative model and which ones genuinely need the model you are paying for. The report is emailed to the address associated with your account and is also available in the dashboard.

4. How the Paid Proxy Works

Model routing

Based on the audit results, the proxy builds a per-endpoint routing table. Each endpoint (identified by system prompt hash + model + path) is mapped to the cheapest model that met your quality threshold. You can pin any endpoint to a specific model to override the automatic routing.
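A minimal sketch of that routing table, assuming the key shape described above (system prompt hash + model + path); the function names and fallback behavior here are illustrative:

```python
import hashlib

def endpoint_key(system_prompt: str, model: str, path: str) -> str:
    prompt_hash = hashlib.sha256(system_prompt.encode()).hexdigest()[:16]
    return f"{prompt_hash}:{model}:{path}"

def route(key: str, table: dict, pins: dict) -> str:
    if key in pins:
        return pins[key]          # a manual pin overrides automatic routing
    return table.get(key, key.split(":")[1])  # default: the original model

key = endpoint_key("You are a summarizer.", "gpt-4o", "/v1/chat/completions")
table = {key: "gpt-4o-mini"}      # audit found a cheaper qualifying model
print(route(key, table, pins={}))               # gpt-4o-mini
print(route(key, table, pins={key: "gpt-4o"}))  # gpt-4o (pinned)
```

Unseen endpoints fall back to the model the caller asked for, so routing never degrades an un-audited call.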

Response caching

The proxy uses hash-based deduplication for repeat prompts. If the same input (system prompt + user message + parameters) is seen within the cache TTL, the cached response is returned instantly. Cache hit rates of 10 to 25% are typical for production workloads with templated prompts.
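Hash-based deduplication with a TTL can be sketched as follows; the cache class and key derivation are illustrative, not Clawback's implementation:

```python
import hashlib
import json

def cache_key(system_prompt, user_message, params) -> str:
    blob = json.dumps([system_prompt, user_message, params], sort_keys=True)
    return hashlib.sha256(blob.encode()).hexdigest()

class ResponseCache:
    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (stored_at, response)

    def get(self, key: str, now: float):
        entry = self._store.get(key)
        if entry and now - entry[0] < self.ttl:
            return entry[1]
        return None  # miss or expired

    def put(self, key: str, response, now: float) -> None:
        self._store[key] = (now, response)

cache = ResponseCache(ttl_seconds=300)
k = cache_key("You are a summarizer.", "Summarize this.", {"temperature": 0})
cache.put(k, {"text": "A summary."}, now=0.0)
print(cache.get(k, now=120.0))  # hit: within the TTL
print(cache.get(k, now=600.0))  # None: expired
```

Serializing the inputs with `sort_keys=True` makes the key insensitive to parameter ordering, so logically identical requests collide as intended.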

Provider fallback

If the target model returns a rate limit (429) or server error (5xx), the proxy automatically retries with the next cheapest qualifying model from the routing table. Failover is transparent to your application. No code changes, no retry logic on your side.
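The fallback loop is roughly the following; the stub `send` function stands in for a real provider call, and the names are illustrative:

```python
# Retry on rate limits (429) and server errors (5xx) only.
RETRYABLE = {429} | set(range(500, 600))

def call_with_fallback(models, send):
    """Try each qualifying model, cheapest first; fall through on 429/5xx."""
    last_status = None
    for model in models:
        status, body = send(model)
        if status not in RETRYABLE:
            return model, status, body
        last_status = status  # rate-limited or server error: try the next
    raise RuntimeError(f"all models failed (last status {last_status})")

# Cheapest model is rate-limited; the next one succeeds.
responses = {"gpt-4o-mini": (429, None), "gpt-4o": (200, "ok")}
model, status, body = call_with_fallback(["gpt-4o-mini", "gpt-4o"],
                                         lambda m: responses[m])
print(model, status)  # gpt-4o 200
```

Note that 4xx errors other than 429 (e.g. a malformed request) are returned to the caller rather than retried, since a different model would fail the same way.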

Cost attribution

Tag calls with an x-clawback-feature header to get per-feature cost breakdowns in the dashboard. See exactly what each part of your product costs in LLM spend.

# Example: tag a call with a feature name
curl https://proxy.clawback.run/openai/v1/chat/completions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "x-clawback-feature: summarization" \
  -d '{ "model": "gpt-4o", "messages": [...] }'

Budget caps and spend alerts

Set a monthly budget cap per feature or account-wide. When spend reaches 80% of the cap, you receive an email alert. At 100%, the proxy can either hard-stop (reject calls) or soft-stop (continue but alert on every call). Configure this in the dashboard.
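The 80%/100% behavior reduces to a small decision function; this sketch uses hypothetical action and mode names, not Clawback's API:

```python
def budget_action(spend: float, cap: float, mode: str = "hard") -> str:
    """Decide what the proxy does at the current spend level."""
    if spend >= cap:
        # hard-stop rejects calls; soft-stop lets them through but alerts
        return "reject" if mode == "hard" else "allow_with_alert"
    if spend >= 0.8 * cap:
        return "email_alert"  # 80% warning threshold
    return "allow"

print(budget_action(50, 100))                # allow
print(budget_action(85, 100))                # email_alert
print(budget_action(120, 100, mode="soft"))  # allow_with_alert
```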

5. Security

API keys are never stored by Clawback

Your LLM provider API keys remain in your local .env file. The SDK reads them at call time and sends them directly to the proxy over HTTPS. The proxy forwards them to the provider in the same request. Keys are never logged, stored, or persisted by Clawback.

What we see vs. what we don't

We see:
  • Request/response metadata (model, token count, latency)
  • Prompt/completion content during replay (in-memory only)
  • Aggregated cost and quality metrics

We don't see:
  • Your raw API keys (only a SHA-256 hash for identification)
  • Any data after the audit completes (discarded from memory)
  • Your source code, infrastructure, or internal systems

Request/response handling

During the audit replay phase, request/response pairs are held in memory for scoring. After the quality scores are computed, the raw content is discarded. No prompts or completions are written to disk or persistent storage in plaintext.

Customer identification

Clawback identifies your account using a SHA-256 hash of your API key. The hash is computed locally by the setup script and sent during registration. The actual key is never transmitted to Clawback infrastructure.

Transport security

HTTPS is enforced on all proxy endpoints. The proxy runs on Cloudflare Workers, which terminates TLS at the edge. Connections to upstream LLM providers also use HTTPS exclusively.

6. Supported Providers

Provider    Env Var               Proxy URL
OpenAI      OPENAI_BASE_URL       https://proxy.clawback.run/openai/v1
Anthropic   ANTHROPIC_BASE_URL    https://proxy.clawback.run/anthropic
Google      GOOGLE_API_BASE_URL   https://proxy.clawback.run/google

How it works with your SDK

All three major SDKs (openai, @anthropic-ai/sdk, @google/generative-ai) respect the base URL environment variable. Setting the env var is all that's needed. No code changes, no wrapper functions, no monkey-patching.
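A simplified sketch of the resolution order such SDKs typically follow: an explicit argument wins, then the environment variable, then the provider default. The function here is illustrative, not any SDK's actual code:

```python
import os

def resolve_base_url(explicit=None,
                     env_var="OPENAI_BASE_URL",
                     default="https://api.openai.com/v1"):
    # Explicit argument > environment variable > provider default.
    return explicit or os.environ.get(env_var) or default

os.environ["OPENAI_BASE_URL"] = "https://proxy.clawback.run/openai/v1"
print(resolve_base_url())  # the Clawback proxy URL, picked up from the env
```

This is why setting the env var reroutes traffic without touching application code: the SDK consults the environment on every client construction.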

7. Undo / Uninstall

Removing Clawback takes one command. The setup script creates a timestamped backup of your .env before making any changes.

Restore from backup

cp .env.clawback-backup-YYYYMMDD-HHMMSS .env

Or manually remove the proxy URLs

Delete these lines from your .env:

OPENAI_BASE_URL=https://proxy.clawback.run/openai/v1
ANTHROPIC_BASE_URL=https://proxy.clawback.run/anthropic
GOOGLE_API_BASE_URL=https://proxy.clawback.run/google

No residual processes

Clawback does not install any background services, daemons, or agents on your machine. The proxy runs entirely on Cloudflare's edge network. Removing the env vars is a complete uninstall.

8. Additional FAQ

What counts as a "call" for the audit threshold?

Any completion request that passes through the proxy. Chat completions, function calls, embeddings. Each request/response pair counts as one call. Most teams hit the threshold within the first day.

How does quality scoring work?

Clawback replays your exact production inputs against each alternative model, then scores the output against your original response using semantic similarity, format preservation, and task completion. Results are reported with 95% confidence intervals so you see statistical reliability, not just averages.

What if my use case is too specialized for alternative models?

That happens. Roughly 20 to 30% of endpoints show no viable alternative. The audit report tells you which endpoints can save money and which ones genuinely need the model you're using. Both findings are valuable.

Does the proxy add latency?

The routing decision adds sub-millisecond overhead. The proxy runs on Cloudflare's edge network. If an alternative model responds faster (Haiku and Flash typically do), you may see lower total latency.

Can I exclude specific endpoints from optimization?

Yes. Clawback Pro lets you pin any endpoint to a specific model. If your summarization pipeline needs Claude Opus 4, lock it. Clawback only optimizes the endpoints you allow.

9. FAQ

Common questions from customers are answered on the landing page. See the full FAQ section for details on pricing, data handling, latency impact, model pinning, and more.