Observability, Replays & Compression for App Builders

Observe, Replay and
Compress LLM traffic.

TokenBee sits between your RAG applications and your LLM providers. Get high-fidelity query-aware compression, session replays, and complete observability with our unified SDKs.

App or Agent

One gateway, many providers.

Your application sends requests to TokenBee, where we can compress your prompts and reduce token usage automatically, then forward them to the LLM provider of your choice.

  • Normalize responses across models so you can switch providers easily
  • Observe and debug production AI traffic end-to-end
  • Control costs with query-aware token compression and caching
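To make the "normalize responses" point concrete, here is a minimal sketch of the idea: two provider-shaped payloads mapped onto one unified interface. The interfaces, field names, and mapping functions below are illustrative assumptions, not the actual `@tokenbee/sdk` types.

```typescript
// Illustrative only: a minimal provider-agnostic response shape.
// These types and mappers are hypothetical, not the real SDK internals.

interface UnifiedResponse {
  provider: string;
  text: string;
  inputTokens: number;
  outputTokens: number;
}

// Raw payloads differ per provider; two mocked examples:
const openAIStyle = {
  choices: [{ message: { content: 'Hello from GPT' } }],
  usage: { prompt_tokens: 12, completion_tokens: 4 },
};

const anthropicStyle = {
  content: [{ type: 'text', text: 'Hello from Claude' }],
  usage: { input_tokens: 12, output_tokens: 4 },
};

// A gateway maps both shapes onto one interface, so application
// code never has to branch on the provider:
function fromOpenAI(raw: typeof openAIStyle): UnifiedResponse {
  return {
    provider: 'openai',
    text: raw.choices[0].message.content,
    inputTokens: raw.usage.prompt_tokens,
    outputTokens: raw.usage.completion_tokens,
  };
}

function fromAnthropic(raw: typeof anthropicStyle): UnifiedResponse {
  return {
    provider: 'anthropic',
    text: raw.content[0].text,
    inputTokens: raw.usage.input_tokens,
    outputTokens: raw.usage.output_tokens,
  };
}

console.log(fromOpenAI(openAIStyle).text);
console.log(fromAnthropic(anthropicStyle).text);
```

Because both mappers return the same `UnifiedResponse`, switching providers is a one-line change in the caller.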

Supported Providers

OpenAI
Anthropic
Gemini
xAI
Mistral
$ npm install @tokenbee/sdk
import { TokenBee, TokenBeeModel, CompressionRate } from '@tokenbee/sdk';

// Initialize with your TokenBee key + provider key (BYOK)
const bee = new TokenBee({
  apiKey: 'tb_live_...',
  llmKey: 'sk-your-provider-key'
});

const res = await bee.send({
  model: TokenBeeModel.OpenAIGPT4o,
  input: {
    messages: [{ role: 'user', content: 'Explain token compression' }],
    compression: 'auto',
    rate: CompressionRate.High,
  }
});

console.log(res);
~50% Token Savings
~2ms Gateway Latency
Unified SDK Experience
BYOK: Your Keys, Your Control

Why TokenBee?

Intelligence at the edge.

Query-Aware Compression

We use a proprietary semantic compression engine optimized for LLM calls. It detects and removes redundant semantic tokens without degrading output quality.

Unlike naive token removal, TokenBee extracts your actual intent and compresses the surrounding context dynamically. Save up to 50% of tokens without losing reasoning quality.
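A back-of-envelope sketch of what a ~50% compression rate means in practice. The compression ratio and per-token price below are illustrative assumptions, not TokenBee's measured numbers or any provider's published pricing.

```typescript
// Estimate token and dollar savings from prompt compression.
// All inputs are assumed example values.
function estimateSavings(
  promptTokens: number,
  compressionRatio: number, // fraction of tokens removed, e.g. 0.5
  pricePerMillionUSD: number,
) {
  const compressed = Math.round(promptTokens * (1 - compressionRatio));
  const saved = promptTokens - compressed;
  const dollarsSaved = (saved / 1_000_000) * pricePerMillionUSD;
  return { compressed, saved, dollarsSaved };
}

// Example: 10M prompt tokens/month, ~50% compression,
// $5 per 1M input tokens (assumed price):
const result = estimateSavings(10_000_000, 0.5, 5);
console.log(result); // { compressed: 5000000, saved: 5000000, dollarsSaved: 25 }
```

Because gateway latency is roughly constant (~2ms) while savings scale linearly with traffic, the economics improve as your prompt volume grows.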

App Session Replays

RAG pipelines often fail silently. TokenBee records every step, prompt, and tool call, so you can play back any session frame by frame. Strict privacy controls let you opt out at any time.

BYOK Architecture

Bring Your Own Key. Your provider API keys pass through the SDK and are forwarded directly to the LLM provider. TokenBee never stores them.

Maintain full control over your provider billing, rate limits, and security posture.

Upcoming: Semantic Agent Compression

We are building a specialized compression mode for autonomous coding agents. Minimize context bloat in deep RAG loops without losing reasoning precision.

Coming Soon
Free while in Beta

Simple, transparent pricing

Choose the plan that's right for your traffic. All paid features are currently unlocked for early adopters.

Free

$0 / mo

Start optimizing your LLM costs

1M tokens/mo
  • Automatic prompt compression
  • Multi-provider routing
  • Basic observability
Current Plan

Starter

$9 / mo

For indie developers

20M tokens/mo
  • Automatic compression (balanced)
  • Multi-provider support
  • 30-day log retention
  • Cost & token savings insights
Claim Free Beta Access
⭐ Popular

Pro

$29 / mo

For growing apps

100M tokens/mo
  • Advanced compression (aggressive)
  • Full observability dashboard
  • Priority support
  • Detailed savings analytics
Claim Free Beta Access

Scale

Custom

Pay as you grow

$0.20 / 1M tokens
  • Unlimited usage
  • Volume discounts
  • Dedicated support
Contact sales
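The Scale rate is linear, so the monthly bill is easy to estimate. A quick sketch using the $0.20 / 1M token rate from the table above; the traffic figure is an example, and this ignores any volume discounts.

```typescript
// Scale-plan cost estimate at $0.20 per 1M tokens (rate from the
// pricing table). The monthly traffic figure is an assumed example.
const ratePerMillionUSD = 0.2;
const monthlyTokens = 500_000_000; // example: 500M tokens/month
const monthlyCost = (monthlyTokens / 1_000_000) * ratePerMillionUSD;
console.log(monthlyCost); // 100
```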

Ready to ship faster?

Join developers who are cutting costs and debugging faster with TokenBee.