first commit

2026-04-26 21:52:23 +03:00
commit 880f412e2c
2662 changed files with 866266 additions and 0 deletions
--- a/docs/features/observability/default.mdx
+++ b/docs/features/observability/default.mdx
@@ -0,0 +1,600 @@
+---
+title: "Built-in Observability"
+description: "Monitor and analyze every AI request and response in real-time. Track performance, debug issues, and gain insights into your AI application's behavior with comprehensive request tracing."
+icon: "cube"
+---
+
+## Overview
+
+Bifrost includes **built-in observability**, a powerful feature that automatically captures and stores detailed information about every AI request and response that flows through your system. This provides structured, searchable data with real-time monitoring capabilities, making it easy to debug issues, analyze performance patterns, and understand your AI application's behavior at scale.
+
+All LLM interactions are captured with comprehensive metadata including inputs, outputs, tokens, costs, and latency. The logging plugin operates **asynchronously**, so it adds negligible overhead to request latency (under 0.1 ms in benchmarks; see [Performance](#performance)).
+
+![Live Log Stream Interface](../../media/ui-live-log-stream.gif)
+
+---
+
+## What's Captured
+
+Bifrost traces comprehensive information for every request, without any changes to your application code.
+
+![Complete Request Tracing Overview](../../media/ui-request-tracing-overview.png)
+
+### **Request Data**
+- **Input Messages**: Complete conversation history and user prompts
+- **Model Parameters**: Temperature, max tokens, tools, and all other parameters
+- **Provider Context**: Which provider and model handled the request
+- **Prompt Tracking**: When the [Prompts plugin](/features/prompt-repository/prompts-plugin) is active, the log captures the selected prompt name, version number, and ID for full traceability
+
+### **Response Data**
+- **Output Messages**: AI responses, tool calls, and function results
+- **Performance Metrics**: Latency and token usage
+- **Status Information**: Success or error details
+
+### **Retry & Key Selection** <sup>v1.5.0-prerelease4+</sup>
+
+When Bifrost retries a request (after a rate-limit or network error), the following fields are recorded:
+
+| Field | Meaning |
+|-------|---------|
+| `selected_key_id` / `selected_key_name` | The API key that **successfully** served the request. `null` when all attempts failed — use `attempt_trail` to see which keys were tried. |
+| `number_of_retries` | Total number of attempts minus one. **Does not indicate which key was used on each attempt.** |
+| `attempt_trail` | Ordered array of every attempt, with key used and failure reason. `fail_reason` is `null` on the final attempt. |
+
+**Example `attempt_trail`** — two rate-limit rotations then success on a third key:
+
+```json
+"attempt_trail": [
+  { "attempt": 0, "key_id": "key-a", "key_name": "Key A", "fail_reason": "rate_limit_error" },
+  { "attempt": 1, "key_id": "key-b", "key_name": "Key B", "fail_reason": "rate_limit_error" },
+  { "attempt": 2, "key_id": "key-c", "key_name": "Key C", "fail_reason": null }
+]
+```
+
+Network-error retries reuse the same key; only rate-limit errors rotate to a different key:
+
+```json
+"attempt_trail": [
+  { "attempt": 0, "key_id": "key-a", "key_name": "Key A", "fail_reason": "network_error" },
+  { "attempt": 1, "key_id": "key-a", "key_name": "Key A", "fail_reason": "rate_limit_error" },
+  { "attempt": 2, "key_id": "key-b", "key_name": "Key B", "fail_reason": null }
+]
+```
+
+`attempt_trail` is `null` / absent when the request succeeded on the first try without retries.
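+
+As a worked illustration of how these fields relate, the first trail above (three attempts, success on `key-c`) would correspond to values along these lines. This is a sketch: only the fields from the table are shown, and the rest of the log entry is omitted.
+
+```json
+{
+  "selected_key_id": "key-c",
+  "selected_key_name": "Key C",
+  "number_of_retries": 2,
+  "attempt_trail": [
+    { "attempt": 0, "key_id": "key-a", "key_name": "Key A", "fail_reason": "rate_limit_error" },
+    { "attempt": 1, "key_id": "key-b", "key_name": "Key B", "fail_reason": "rate_limit_error" },
+    { "attempt": 2, "key_id": "key-c", "key_name": "Key C", "fail_reason": null }
+  ]
+}
+```
+
+Here `number_of_retries` is `2` because three attempts were made, and the selected key matches the last entry in the trail.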
+
+### **Custom Metadata**
+- **Logging Headers**: Capture configured request headers (e.g., `X-Tenant-ID`) into log metadata
+- **Ad-hoc Headers**: Any `x-bf-lh-*` prefixed header is automatically captured into metadata
+- See [Logging Headers](#logging-headers) below for full details
+
+### **Multimodal & Tool Support**
+- **Audio Processing**: Speech synthesis and transcription inputs/outputs
+- **Vision Analysis**: Image URLs and vision model responses
+- **Tool Execution**: Function calling arguments and results
+
+![Multimodal Request Tracing](../../media/ui-multimodal-tracing.png)
+
+---
+
+## How It Works
+
+The logging plugin intercepts all requests flowing through Bifrost using the plugin architecture:
+
+1. **PreLLMHook**: Captures request metadata (provider, model, input messages, parameters).
+2. **Async Processing**: Logs are written in background goroutines with `sync.Pool` optimization.
+3. **PostLLMHook**: Updates log entry with response data (output, tokens, cost, latency, errors).
+4. **Real-time Updates**: WebSocket broadcasts keep the UI synchronized.
+
+All logging operations are non-blocking, ensuring your LLM requests maintain optimal performance.
+
+---
+
+## Configuration
+
+Configure request tracing to control what gets logged and where it's stored.
+
+<Tabs group="tracing-config">
+
+<Tab title="Using Web UI">
+
+![Tracing Configuration Interface](../../media/ui-tracing-config.png)
+
+1. Navigate to **http://localhost:8080**
+2. Go to **"Settings"**
+3. Toggle **"Enable Logs"** 
+
+</Tab>
+
+<Tab title="Using API">
+
+**Enable/Disable Tracing:**
+```bash
+curl --location 'http://localhost:8080/api/config' \
+--header 'Content-Type: application/json' \
+--request PUT \
+--data '{
+    "client_config": {
+        "enable_logging": true,
+        "disable_content_logging": false,
+        "drop_excess_requests": false,
+        "initial_pool_size": 300,
+        "enforce_auth_on_inference": false,
+        "allow_direct_keys": false,
+        "prometheus_labels": [],
+        "allowed_origins": []
+    }
+}'
+```
+
+**Check Current Configuration:**
+```bash
+curl --location 'http://localhost:8080/api/config'
+```
+
+**Response includes tracing status:**
+```json
+{
+    "client_config": {
+        "enable_logging": true,
+        "disable_content_logging": false,
+        "drop_excess_requests": false
+    },
+    "is_db_connected": true,
+    "is_cache_connected": true, 
+    "is_logs_connected": true
+}
+```
+
+</Tab>
+
+<Tab title="Using config.json">
+
+In your `config.json` file, you can enable logging and configure the log store:
+```json
+{
+    "client": {
+        "enable_logging": true,
+        "disable_content_logging": false,
+        "drop_excess_requests": false,
+        "initial_pool_size": 300,
+        "allow_direct_keys": false
+    },
+    "logs_store": {
+        "enabled": true,
+        "type": "sqlite",
+        "config": {
+            "path": "./logs.db"
+        }
+    }
+}
+```
+- **`enable_logging`**: Master toggle for request tracing.
+- **`disable_content_logging`**: Disable logging of request/response content, but still log usage metadata (latency, cost, token count, etc.).
+- **`logs_store`**: Check [Log Store Options](#log-store-options) for more details.
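+
+For example, a deployment that must not persist prompts or completions could keep tracing on but drop content. A minimal sketch using only the two fields described above; other `client` fields are omitted here and assumed to keep their existing values:
+
+```json
+{
+    "client": {
+        "enable_logging": true,
+        "disable_content_logging": true
+    }
+}
+```
+
+With this setting, usage metadata such as latency, cost, and token counts is still logged.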
+
+</Tab>
+
+<Tab title="Using Go SDK">
+
+When using Bifrost as a Go SDK, initialize the logging plugin manually:
+
+```go
+package main
+
+import (
+    "context"
+    bifrost "github.com/maximhq/bifrost/core"
+    "github.com/maximhq/bifrost/core/schemas"
+    "github.com/maximhq/bifrost/framework/logstore"
+    "github.com/maximhq/bifrost/framework/pricing"
+    "github.com/maximhq/bifrost/plugins/logging"
+)
+
+func main() {
+    ctx := context.Background()
+    logger := schemas.NewLogger()
+    
+    // Initialize log store (SQLite)
+    store, err := logstore.NewLogStore(ctx, &logstore.Config{
+        Enabled: true,
+        Type:    logstore.LogStoreTypeSQLite,
+        Config: &logstore.SQLiteConfig{
+            Path: "./logs.db",
+        },
+    }, logger)
+    if err != nil {
+        panic(err)
+    }
+    
+    // Initialize pricing manager (required for cost calculation)
+    pricingManager := pricing.NewPricingManager(logger)
+    
+    // Initialize logging plugin
+    loggingPlugin, err := logging.Init(ctx, logger, store, pricingManager)
+    if err != nil {
+        panic(err)
+    }
+    
+    // Initialize Bifrost with logging plugin
+    client, err := bifrost.Init(ctx, schemas.BifrostConfig{
+        Account: &yourAccount,
+        LLMPlugins: []schemas.LLMPlugin{loggingPlugin},
+    })
+    if err != nil {
+        panic(err)
+    }
+    defer client.Shutdown()
+    
+    // All requests are now logged automatically
+}
+```
+
+</Tab>
+
+</Tabs>
+
+---
+
+## Accessing & Filtering Logs
+
+Retrieve and analyze logs with powerful filtering capabilities via the UI, API, and WebSockets.
+
+![Advanced Log Filtering Interface](/media/ui-log-filtering.gif)
+
+### Web UI
+
+When running the Gateway, access the built-in dashboard at `http://localhost:8080`. The UI provides:
+- Real-time log streaming
+- Advanced filtering and search
+- Detailed request/response inspection
+- Token and cost analytics
+
+### API Endpoints
+
+Query logs programmatically with a `GET` request to `/api/logs`.
+
+```bash
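+# --get sends each --data-urlencode pair as part of the query string
+curl --get 'http://localhost:8080/api/logs' \
+  --data-urlencode 'providers=openai,anthropic' \
+  --data-urlencode 'models=gpt-4o-mini' \
+  --data-urlencode 'status=success,error' \
+  --data-urlencode 'start_time=2024-01-15T00:00:00Z' \
+  --data-urlencode 'end_time=2024-01-15T23:59:59Z' \
+  --data-urlencode 'min_latency=1000' \
+  --data-urlencode 'max_latency=5000' \
+  --data-urlencode 'min_tokens=10' \
+  --data-urlencode 'max_tokens=1000' \
+  --data-urlencode 'min_cost=0.001' \
+  --data-urlencode 'max_cost=10' \
+  --data-urlencode 'content_search=python' \
+  --data-urlencode 'limit=100' \
+  --data-urlencode 'offset=0'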
+```
+**Available Filters:**
+
+| Filter | Description | Example |
+|--------|-------------|---------|
+| `providers` | Filter by AI providers | `openai,anthropic` |
+| `models` | Filter by specific models | `gpt-4o-mini,claude-3-sonnet` |
+| `status` | Request status | `success,error,processing` |
+| `objects` | Request types | `chat.completion,embedding` |
+| `start_time` / `end_time` | Time range (RFC3339) | `2024-01-15T10:00:00Z` |
+| `min_latency` / `max_latency` | Response time (ms) | `1000` to `5000` |
+| `min_tokens` / `max_tokens` | Token usage range | `10` to `1000` |
+| `min_cost` / `max_cost` | Cost range (USD) | `0.001` to `10` |
+| `content_search` | Search in messages | `"error handling"` |
+| `limit` / `offset` | Pagination | `100`, `200` |
+
+**Response Format**
+
+```json
+{
+    "logs": [...],
+    "pagination": {
+        "limit": 100,
+        "offset": 0,
+        "sort_by": "timestamp",
+        "order": "desc"
+    },
+    "stats": {
+        "total_requests": 1234,
+        "success_rate": 0.85,
+        "average_latency": 100,
+        "total_tokens": 10000,
+        "total_cost": 100
+    }
+}
+```
+
+The logs API is well suited to analytics, debugging specific issues, and building custom monitoring dashboards.
+
+### WebSocket
+
+Subscribe to real-time log updates for live monitoring:
+
+```javascript
+const ws = new WebSocket('ws://localhost:8080/ws')
+
+ws.onmessage = (event) => {
+  const logUpdate = JSON.parse(event.data)
+  console.log('New log entry:', logUpdate)
+}
+```
+
+---
+
+## Log Store Options
+
+Choose the right storage backend for your scale and requirements.
+
+The logging plugin is **automatically enabled** in Gateway mode with SQLite storage by default. You can configure it to use PostgreSQL by setting the `logs_store` configuration in your `config.json` file.
+
+### **Current Support**
+
+<Tabs group="log-store-types">
+<Tab title="SQLite (Default)">
+
+- **Best for**: Development, small-medium deployments
+- **Performance**: Excellent for read-heavy workloads
+- **Setup**: Zero configuration, single file storage
+- **Limits**: Single-writer, local filesystem only
+
+```json
+{
+    "logs_store": {
+        "enabled": true,
+        "type": "sqlite",
+        "config": {
+            "path": "./logs.db"
+        }
+    }
+}
+```
+
+</Tab>
+<Tab title="PostgreSQL">
+
+- **Best for**: High-volume production deployments
+- **Performance**: Excellent concurrent writes and complex queries
+- **Features**: Advanced indexing, partitioning, replication
+- **Requirement**: PostgreSQL database must be UTF8 encoded (see [PostgreSQL UTF8 Requirement](../../quickstart/gateway/setting-up#postgresql-utf8-requirement))
+
+```json
+{
+    "logs_store": {
+        "enabled": true,
+        "type": "postgres",
+        "config": {
+            "host": "localhost",
+            "port": "5432",
+            "user": "bifrost",
+            "password": "postgres",
+            "db_name": "bifrost",
+            "ssl_mode": "disable"
+        }
+    }
+}
+```
+
+</Tab>
+</Tabs>
+
+### **Planned Support**
+
+- **MySQL**: For traditional MySQL environments.
+- **ClickHouse**: For large-scale analytics and time-series workloads.
+
+---
+
+## Supported Request Types
+
+The logging plugin captures all Bifrost request types:
+
+- Text Completion (streaming and non-streaming)
+- Chat Completion (streaming and non-streaming)
+- Responses (streaming and non-streaming)
+- Embeddings
+- Speech Generation (streaming and non-streaming)
+- Transcription (streaming and non-streaming)
+- Video Generation
+
+---
+
+## Logging Headers
+
+Capture specific HTTP request headers into the **metadata** field of every LLM and MCP log entry. This enables request tracing, tenant identification, and custom debugging without modifying your application code.
+
+### How It Works
+
+There are two ways headers get captured into log metadata:
+
+**1. Configured Logging Headers** — Define a list of header names in the configuration. The logging plugin looks up each configured header (case-insensitive) and stores its value in the metadata.
+
+**2. `x-bf-lh-*` Prefix (Automatic)** — Any request header with the `x-bf-lh-` prefix is automatically captured into metadata with no configuration needed. The prefix is stripped and the remainder becomes the metadata key.
+
+| Request Header | Metadata Key | Metadata Value |
+|----------------|-------------|----------------|
+| `x-bf-lh-tenant-id: acme` | `tenant-id` | `acme` |
+| `x-bf-lh-env: production` | `env` | `production` |
+| `x-bf-lh-region: us-east-1` | `region` | `us-east-1` |
+
+Both methods can be used together — configured headers and `x-bf-lh-*` headers are merged into the same metadata map.
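+
+To illustrate the merge, suppose `logging_headers` contains `X-Tenant-ID` and a request carries both `X-Tenant-ID: acme` and `x-bf-lh-env: production`. Going by the key formats shown in the usage examples on this page (configured header names appear lowercased, prefixed headers lose the prefix), the resulting metadata would be expected to look like this:
+
+```json
+{
+  "x-tenant-id": "acme",
+  "env": "production"
+}
+```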
+
+### Configuring Logging Headers
+
+<Tabs group="logging-headers-config">
+<Tab title="Web UI">
+
+1. Navigate to **Config** > **Logging**
+2. Ensure **Enable Logs** is toggled on
+3. Scroll to **Logging Headers**
+
+![Logging Headers Configuration](../../media/ui-logging-headers-setting.png)
+
+4. Enter a comma-separated list of header names (e.g., `X-Tenant-ID, X-Correlation-ID`)
+5. Click **Save Changes**
+
+Changes take effect immediately — no restart required.
+
+</Tab>
+<Tab title="API">
+
+Include `logging_headers` in the `client_config` when updating the configuration:
+
+```bash
+curl -X PUT http://localhost:8080/api/config \
+  -H "Content-Type: application/json" \
+  -d '{
+    "client_config": {
+      "logging_headers": ["X-Tenant-ID", "X-Correlation-ID"]
+    }
+  }'
+```
+
+</Tab>
+<Tab title="config.json">
+
+Add `logging_headers` to the `client` section:
+
+```json
+{
+  "client": {
+    "enable_logging": true,
+    "logging_headers": ["X-Tenant-ID", "X-Correlation-ID"]
+  }
+}
+```
+
+| Field | Type | Required | Description |
+|-------|------|----------|-------------|
+| `logging_headers` | `string[]` | No | List of header names to capture in log metadata. Case-insensitive. No restart required. |
+
+</Tab>
+</Tabs>
+
+### Usage Examples
+
+**Configured headers:**
+
+```bash
+# Config has: logging_headers: ["X-Tenant-ID", "X-Correlation-ID"]
+curl http://localhost:8080/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -H "X-Tenant-ID: tenant-123" \
+  -H "X-Correlation-ID: req-abc-456" \
+  -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello"}]}'
+```
+
+Log metadata: `{"x-tenant-id": "tenant-123", "x-correlation-id": "req-abc-456"}`
+
+**Ad-hoc `x-bf-lh-*` headers (no config needed):**
+
+```bash
+curl http://localhost:8080/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -H "x-bf-lh-env: production" \
+  -H "x-bf-lh-version: v2.1.0" \
+  -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello"}]}'
+```
+
+Log metadata: `{"env": "production", "version": "v2.1.0"}`
+
+### Viewing Metadata in the UI
+
+Metadata is displayed in the log detail view for both LLM and MCP logs as individual key-value entries alongside other request details.
+
+![Log Entry with Metadata](../../media/ui-log-metadata-display.png)
+
+### Combining with Required Headers
+
+[Required headers](../governance/required-headers) and logging headers serve different purposes and can be used together:
+
+| Feature | Purpose | Effect on Request |
+|---------|---------|-------------------|
+| **Required Headers** | Enforce header presence | Rejects request if missing (400) |
+| **Logging Headers** | Capture header values | No effect on request — only logs metadata |
+
+A common pattern is to require a header **and** log it:
+
+```json
+{
+  "client": {
+    "required_headers": ["X-Tenant-ID"],
+    "logging_headers": ["X-Tenant-ID"]
+  }
+}
+```
+
+---
+
+## When to Use
+
+### Built-in Observability
+
+Use the built-in logging plugin for:
+
+- **Local Development**: Quick setup with SQLite, no external dependencies
+- **Self-hosted Deployments**: Full control over your data with PostgreSQL
+- **Simple Use Cases**: Basic monitoring and debugging needs
+- **Privacy-sensitive Workloads**: Keep all logs on your infrastructure
+
+### vs. Maxim Plugin
+
+Switch to the [Maxim plugin](./maxim) for:
+
+- Advanced evaluation and testing workflows
+- Prompt engineering and experimentation
+- Multi-team governance and collaboration
+- Production monitoring with alerts and SLAs
+- Dataset management and annotation pipelines
+
+### vs. OTel Plugin
+
+Switch to the [OTel plugin](./otel) for:
+
+- Integration with existing observability infrastructure
+- Correlation with application traces and metrics
+- Custom collector configurations
+- Compliance and enterprise requirements
+
+---
+
+## Performance
+
+The logging plugin is designed for **near-zero-impact observability**:
+
+- **Async Operations**: All database writes happen in background goroutines
+- **`sync.Pool`**: Reuses memory allocations for `LogMessage` and `UpdateLogData` structs
+- **Batch Processing**: Efficiently handles high request volumes
+- **Automatic Cleanup**: Removes stale processing logs every 30 seconds
+
+In benchmarks, the logging plugin adds **< 0.1ms overhead** to request processing time.
+
+---
+
+## Connectors
+
+<CardGroup cols={2}>
+  <Card title="Maxim AI" icon="infinity" href="/features/observability/maxim">
+    Comprehensive LLM observability and evaluation.
+  </Card>
+  <Card title="OpenTelemetry" icon="bolt" href="/features/observability/otel">
+    OTLP integration for distributed tracing.
+  </Card>
+  <Card title="Prometheus" icon="chart-line" href="/features/observability/prometheus">
+    Native Prometheus metrics.
+  </Card>
+  <Card title="Datadog" icon="dog" href="/enterprise/datadog-connector">
+    Native APM, LLM Observability, and metrics.
+  </Card>
+</CardGroup>
+
+---
+
+## Next Steps
+
+- **[Gateway Setup](../../quickstart/gateway/setting-up)** - Get Bifrost running with tracing enabled
+- **[Provider Configuration](../../quickstart/gateway/provider-configuration)** - Configure multiple providers for better insights
+- **[Telemetry](../telemetry)** - Prometheus metrics and dashboards
+- **[Governance](../governance)** - Virtual keys and usage limits
--- a/docs/features/observability/maxim.mdx
+++ b/docs/features/observability/maxim.mdx
@@ -0,0 +1,225 @@
+---
+title: "Maxim AI"
+description: "Integrate Maxim SDK for comprehensive LLM observability, tracing, and evaluation."
+icon: "infinity"
+---
+
+## Overview
+
+Bifrost provides comprehensive LLM observability through the **Maxim plugin**, enabling seamless tracking, evaluation, and analysis of AI interactions. The plugin automatically forwards all LLM requests and responses to Maxim's platform for detailed monitoring and performance insights.
+
+![Maxim Logs](https://github.com/maximhq/bifrost/blob/main/docs/media/maxim-logs.png?raw=true)
+
+---
+
+## Setup
+
+Register the plugin through the Go SDK or, for the Gateway, through `config.json`:
+
+<Tabs group="setup-method">
+<Tab title="Go SDK">
+
+```go
+package main
+
+import (
+    "context"
+    bifrost "github.com/maximhq/bifrost/core"
+    "github.com/maximhq/bifrost/core/schemas"
+    maxim "github.com/maximhq/bifrost/plugins/maxim"
+)
+
+func main() {
+    // Initialize Maxim plugin
+    maximPlugin, err := maxim.Init(maxim.Config{
+        ApiKey:    "your_maxim_api_key",
+        LogRepoId: "your_default_repo_id", // Optional: fallback repository
+    })
+    if err != nil {
+        panic(err)
+    }
+
+    // Initialize Bifrost with the plugin
+    client, err := bifrost.Init(context.Background(), schemas.BifrostConfig{
+        Account: &yourAccount,
+        LLMPlugins: []schemas.LLMPlugin{maximPlugin},
+    })
+    if err != nil {
+        panic(err)
+    }
+    defer client.Shutdown()
+
+    // All requests will now be traced to Maxim
+}
+```
+
+</Tab>
+<Tab title="config.json">
+
+For HTTP transport (Gateway mode), configure the plugin in `config.json`:
+
+```json
+{
+  "plugins": [
+    {
+      "enabled": true,
+      "name": "maxim",
+      "config": {        
+        "api_key": "your_maxim_api_key",
+        "log_repo_id": "your_default_repo_id"
+      }
+    }
+  ]
+}
+```
+
+</Tab>
+</Tabs>
+
+## Configuration
+
+| Field | Type | Required | Description |
+|-------|------|----------|-------------|
+| `ApiKey` | `string` | ✅ Yes | Your Maxim API key for authentication |
+| `LogRepoId` | `string` | ❌ No | Default log repository ID (can be overridden per request) |
+
+## Repository Selection
+
+The plugin selects the log repository in the following priority order:
+
+1. **Header/Context Repository** - Highest priority
+2. **Default Repository** (from plugin config) - Fallback
+3. **Skip Logging** - If neither is available
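+
+As an illustration of the third case, a Gateway configuration that omits `log_repo_id` has no fallback repository, so by the priority order above only requests that carry a repository header or context value would be logged. A sketch, with a placeholder API key:
+
+```json
+{
+  "plugins": [
+    {
+      "enabled": true,
+      "name": "maxim",
+      "config": {
+        "api_key": "your_maxim_api_key"
+      }
+    }
+  ]
+}
+```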
+
+<Tabs group="repository-selection">
+<Tab title="Go SDK">
+
+```go
+ctx := context.Background()
+
+// Use specific repository for this request
+ctx = context.WithValue(ctx, maxim.LogRepoIDKey, "project-specific-repo")
+```
+
+</Tab>
+<Tab title="Gateway">
+
+```bash
+# Use default repository (from config)
+curl -X POST http://localhost:8080/v1/chat/completions \
+  -d '{"model": "gpt-4", "messages": [...]}'
+
+# Override with specific repository
+curl -X POST http://localhost:8080/v1/chat/completions \
+  -H "x-bf-maxim-log-repo-id: project-specific-repo" \
+  -d '{"model": "gpt-4", "messages": [...]}'
+```
+
+</Tab>
+</Tabs>
+
+
+## Custom Trace Management
+
+### Trace Propagation
+
+The plugin supports custom session, trace, and generation IDs for advanced tracing scenarios:
+
+<Tabs group="trace-propagation">
+<Tab title="Go SDK">
+```go
+ctx := context.Background()
+
+// Prefer typed keys from the Maxim plugin
+ctx = context.WithValue(ctx, maxim.TraceIDKey, "custom-trace-123")
+ctx = context.WithValue(ctx, maxim.GenerationIDKey, "custom-gen-456")
+ctx = context.WithValue(ctx, maxim.SessionIDKey, "user-session-789")
+
+// Optionally set human-friendly names
+ctx = context.WithValue(ctx, maxim.TraceNameKey, "checkout-flow")
+ctx = context.WithValue(ctx, maxim.GenerationNameKey, "rerank-step")
+```
+</Tab>
+<Tab title="Gateway">
+```bash
+curl -X POST http://localhost:8080/v1/chat/completions \
+  -H "x-bf-maxim-trace-id: custom-trace-123" \
+  -H "x-bf-maxim-generation-id: custom-gen-456" \
+  -H "x-bf-maxim-session-id: user-session-789" \
+  -H "x-bf-maxim-trace-name: checkout-flow" \
+  -H "x-bf-maxim-generation-name: rerank-step" \
+  -d '{"model": "gpt-4", "messages": [...]}'
+```
+</Tab>
+</Tabs>
+
+### Custom Tags
+
+You can add custom tags to traces for enhanced filtering and analytics:
+
+<Tabs group="custom-tags">
+<Tab title="Go SDK">
+
+```go
+ctx := context.Background()
+
+// Pass arbitrary tag key-values via context map
+tags := map[string]string{
+    "environment":  "production",
+    "user-id":      "user-123",
+    "feature-flag": "new-ui",
+}
+ctx = context.WithValue(ctx, maxim.TagsKey, tags)
+```
+
+</Tab>
+<Tab title="Gateway">
+
+```bash
+curl -X POST http://localhost:8080/v1/chat/completions \
+  -H "x-bf-maxim-environment: production" \
+  -H "x-bf-maxim-user-id: user-123" \
+  -H "x-bf-maxim-feature-flag: new-ui" \
+  -d '{"model": "gpt-4", "messages": [...]}'
+```
+
+Reserved keys are `session-id`, `trace-id`, `trace-name`, `generation-id`, `generation-name`, `log-repo-id`. All other `x-bf-maxim-*` headers are treated as tags.
+
+</Tab>
+</Tabs>
+
+## Supported Request Types
+
+The plugin supports the following Bifrost request types:
+
+- Text Completion
+- Chat Completion
+
+## Monitoring & Analytics
+
+Once configured, monitor your AI apps in the [Maxim Dashboard](https://getmaxim.ai/). Maxim is an end-to-end evaluation & observability platform built to help teams ship AI agents faster while maintaining high quality.
+
+* **Experiment / Prompt Engineering**
+  Playground++ for prompt design: versioning, comparison (A/B), visual chaining, low-code tooling.
+
+* **Simulation & Evaluation**
+  Test agents over thousands of scenarios, both automated (statistical, programmatic) and human-in-the-loop for edge cases. Custom and off-the-shelf evaluators.
+
+* **Observability / Monitoring**
+  Real-time traces, logging, debugging of multi-agent workflows, live issue tracking, alerts when quality or performance degrade.
+
+* **Data Engine & Dataset Management**
+  Support for multi-modal datasets, import & continuous curation, feedback/annotation pipelines, data splitting for experiments.
+
+* **Governance, Security & Compliance**
+  Features like SOC 2 Type II compliance, enterprise security controls, permissions, auditability.
+
+* **Alerts & SLAs**
+  Threshold-based notifications that keep quality and latency within guardrails.
+
+## Next Steps
+
+Now that you have observability set up with the Maxim plugin, explore these related topics:
+
+- **[Tracing](./default)** - Deep-dive into request/response logging and correlation
+- **[Telemetry](../telemetry)** - Prometheus metrics, dashboards, and alerting
+- **[Governance](../governance/virtual-keys)** - Virtual keys, per-team controls, and usage limits
--- a/docs/features/observability/otel.mdx
+++ b/docs/features/observability/otel.mdx
@@ -0,0 +1,978 @@
+---
+title: "OpenTelemetry (OTel)"
+description: "Integrate with OpenTelemetry collectors for enterprise observability and distributed tracing"
+icon: "bolt"
+---
+
+## Overview
+
+<Frame>
+  <img src="/media/grafana-otel-traces.png" alt="Bifrost OTel traces in Grafana" />
+</Frame>
+
+The **OTel plugin** enables seamless integration with OpenTelemetry Protocol (OTLP) collectors, allowing you to send LLM traces to your existing observability infrastructure. Connect Bifrost to platforms like Grafana Cloud, Datadog, New Relic, Honeycomb, or self-hosted collectors.
+
+All traces follow OpenTelemetry semantic conventions, making it easy to correlate LLM operations with your broader application telemetry.
+
+---
+
+## Supported Trace Formats
+
+The plugin supports multiple trace formats to match your observability platform:
+
+| Format | Description | Use Case | Status |
+|--------|-------------|----------|----------|
+| `genai_extension` | OpenTelemetry GenAI semantic conventions | **Recommended** - Standard OTel format with rich LLM metadata | ✅ Released |
+| `vercel` | Vercel AI SDK format | For Vercel AI SDK compatibility | 🔄 Coming soon |
+| `open_inference` | Arize OpenInference format | For Arize Phoenix and OpenInference tools | 🔄 Coming soon | 
+
+---
+
+## Configuration
+
+### Configuration Fields
+
+| Field | Type | Required | Description |
+|-------|------|----------|-------------|
+| `service_name` | `string` | ❌ No | Service name used for tracing. Defaults to `bifrost` |
+| `collector_url` | `string` | ✅ Yes | OTLP collector endpoint URL |
+| `trace_type` | `string` | ✅ Yes | One of: `genai_extension`, `vercel`, `open_inference` |
+| `protocol` | `string` | ✅ Yes | Transport protocol: `http` or `grpc` |
+| `headers` | `object` | ❌ No | Custom headers for authentication (supports `env.VAR_NAME`) |
+| `tls_ca_cert` | `string` | ❌ No | File path to the CA certificate used for TLS. Works with both the `grpc` and `http` protocols |
+
+### Environment Variable Substitution
+
+Headers support environment variable substitution using the `env.` prefix:
+
+```json
+{
+  "headers": {
+    "Authorization": "env.OTEL_API_KEY",
+    "X-Custom-Header": "env.CUSTOM_VALUE"
+  }
+}
+```
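+
+As a sketch of the effect, assume the process environment has `OTEL_API_KEY` set to `Bearer abc123` and `CUSTOM_VALUE` set to `team-a` (both values are made up for illustration). The headers sent to the collector would then resolve to:
+
+```json
+{
+  "headers": {
+    "Authorization": "Bearer abc123",
+    "X-Custom-Header": "team-a"
+  }
+}
+```
+
+Keeping the `env.` reference in the configuration, rather than the literal value, keeps credentials out of `config.json`.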
+
+### Resource Attributes
+
+The plugin supports the standard `OTEL_RESOURCE_ATTRIBUTES` environment variable. Any attributes defined in this variable will be automatically attached to every span emitted by the plugin.
+
+```bash
+export OTEL_RESOURCE_ATTRIBUTES="deployment.environment=production,service.version=1.2.3,team.name=platform"
+```
+
+These attributes appear as resource-level metadata on all traces:
+
+```json
+{
+  "resource": {
+    "attributes": {
+      "service.name": "bifrost",
+      "deployment.environment": "production",
+      "service.version": "1.2.3",
+      "team.name": "platform"
+    }
+  }
+}
+```
+
+This is useful for:
+- **Environment identification** - Distinguish between production, staging, and development traces
+- **Service versioning** - Track which version of your service generated the trace
+- **Team attribution** - Tag traces with team ownership for filtering and alerting
+- **Custom metadata** - Add any key-value pairs relevant to your observability needs
+
+---
+
+## Setup
+
+<Tabs group="setup-method">
+<Tab title="UI">
+![Otel UI setup](../../media/otel-ui-setup.png)
+</Tab>
+<Tab title="Go SDK">
+
+```go
+package main
+
+import (
+    "context"
+    bifrost "github.com/maximhq/bifrost/core"
+    "github.com/maximhq/bifrost/core/schemas"
+    "github.com/maximhq/bifrost/framework/pricing"
+    otel "github.com/maximhq/bifrost/plugins/otel"
+)
+
+func main() {
+    ctx := context.Background()
+    logger := schemas.NewLogger()
+    
+    // Initialize pricing manager (required for cost calculation)
+    pricingManager := pricing.NewPricingManager(logger)
+    
+    // Initialize OTel plugin
+    otelPlugin, err := otel.Init(ctx, &otel.Config{
+        ServiceName:  "bifrost",
+        CollectorURL: "http://localhost:4318",
+        TraceType:    otel.TraceTypeGenAIExtension,
+        Protocol:     otel.ProtocolHTTP,
+        Headers: map[string]string{
+            "Authorization": "env.OTEL_API_KEY",
+        },
+    }, logger, pricingManager)
+    if err != nil {
+        panic(err)
+    }
+    
+    // Initialize Bifrost with the plugin
+    client, err := bifrost.Init(ctx, schemas.BifrostConfig{
+        Account: &yourAccount,
+        LLMPlugins: []schemas.LLMPlugin{otelPlugin},
+    })
+    if err != nil {
+        panic(err)
+    }
+    defer client.Shutdown()
+    
+    // All requests are now traced to OTel collector
+}
+```
+
+</Tab>
+<Tab title="config.json">
+
+For Gateway mode, configure via `config.json`:
+
+```json
+{
+  "plugins": [
+    {
+      "enabled": true,
+      "name": "otel",
+      "config": {
+        "service_name": "bifrost",
+        "collector_url": "http://localhost:4318",
+        "trace_type": "genai_extension",
+        "protocol": "http",
+        "headers": {
+          "Authorization": "env.OTEL_API_KEY"
+        }
+      }
+    }
+  ]
+}
+```
+
+If you need to connect to an OTel collector that requires TLS, configure `tls_ca_cert`:
+
+```json
+{
+  "plugins": [
+    {
+      "enabled": true,
+      "name": "otel",
+      "config": {
+        "service_name": "bifrost",
+        "collector_url": "localhost:4317",
+        "trace_type": "genai_extension",
+        "protocol": "grpc",
+        "tls_ca_cert": "/path/to/your/ca.cert",
+        "headers": {
+          "Authorization": "env.OTEL_API_KEY"
+        }
+      }
+    }
+  ]
+}
+```
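+
+The same option applies to the HTTP protocol. A sketch of that variant follows; the `https://` scheme and port `4318` in `collector_url` are assumptions for a TLS-enabled OTLP/HTTP endpoint, not values confirmed by this page:
+
+```json
+{
+  "plugins": [
+    {
+      "enabled": true,
+      "name": "otel",
+      "config": {
+        "service_name": "bifrost",
+        "collector_url": "https://localhost:4318",
+        "trace_type": "genai_extension",
+        "protocol": "http",
+        "tls_ca_cert": "/path/to/your/ca.cert"
+      }
+    }
+  ]
+}
+```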
+
+</Tab>
+</Tabs>
+
+---
+
+## Quick Start with Docker
+
+Get started quickly with a complete observability stack using the included Docker Compose configuration:
+
+```yml
+services:
+  otel-collector:
+    image: otel/opentelemetry-collector-contrib:latest
+    container_name: otel-collector
+    command: ["--config=/etc/otelcol/config.yaml"]
+    configs:
+      - source: otel-collector-config
+        target: /etc/otelcol/config.yaml
+    ports:
+      - "4317:4317"   # OTLP gRPC
+      - "4318:4318"   # OTLP HTTP
+      - "8888:8888"   # Collector /metrics
+      - "9464:9464"   # Prometheus scrape endpoint
+      - "13133:13133" # Health check
+      - "1777:1777"   # pprof
+      - "55679:55679" # zpages
+    restart: unless-stopped
+    depends_on:
+      - tempo
+
+  tempo:
+    image: grafana/tempo:latest
+    container_name: tempo
+    command: [ "-config.file=/etc/tempo.yaml" ]
+    configs:
+      - source: tempo-config
+        target: /etc/tempo.yaml
+    ports:
+      - "3200:3200"   # tempo HTTP API
+    expose:
+      - "4317"        # OTLP gRPC (internal)
+    volumes:
+      - tempo-data:/var/tempo
+    restart: unless-stopped
+
+  prometheus:
+    image: prom/prometheus:latest
+    container_name: prometheus
+    depends_on:
+      - otel-collector
+    command:
+      - "--config.file=/etc/prometheus/prometheus.yml"
+      - "--storage.tsdb.path=/prometheus"
+      - "--web.console.libraries=/usr/share/prometheus/console_libraries"
+      - "--web.console.templates=/usr/share/prometheus/consoles"
+      - "--web.enable-remote-write-receiver"
+    ports:
+      - "9090:9090"
+    volumes:
+      - prometheus-data:/prometheus
+    configs:
+      - source: prometheus-config
+        target: /etc/prometheus/prometheus.yml
+    restart: unless-stopped
+
+  grafana:
+    image: grafana/grafana:latest
+    container_name: grafana
+    depends_on:
+      - prometheus
+      - tempo
+    environment:
+      GF_SECURITY_ADMIN_USER: admin
+      GF_SECURITY_ADMIN_PASSWORD: admin
+      GF_AUTH_ANONYMOUS_ENABLED: "true"
+      GF_AUTH_ANONYMOUS_ORG_ROLE: Viewer
+      GF_PLUGINS_ALLOW_LOADING_UNSIGNED_PLUGINS: "grafana-pyroscope-app,grafana-exploretraces-app,grafana-metricsdrilldown-app"
+      GF_PLUGINS_ENABLE_ALPHA: "true"
+      GF_INSTALL_PLUGINS: ""
+    ports:
+      - "4000:3000"
+    volumes:
+      - grafana-data:/var/lib/grafana
+    configs:
+      - source: grafana-datasources
+        target: /etc/grafana/provisioning/datasources/datasources.yml
+    restart: unless-stopped
+
+configs:
+  otel-collector-config:
+    content: |
+      receivers:
+        otlp:
+          protocols:
+            grpc:
+              endpoint: 0.0.0.0:4317
+            http:
+              endpoint: 0.0.0.0:4318
+
+      processors:
+        batch:
+
+      exporters:
+        prometheus:
+          endpoint: 0.0.0.0:9464
+          namespace: otel
+          const_labels:
+            source: otelcol
+            
+        otlp/tempo:
+          endpoint: tempo:4317
+          tls:
+            insecure: true
+            
+        debug:
+          verbosity: detailed
+
+      extensions:
+        health_check:
+          endpoint: 0.0.0.0:13133
+        pprof:
+          endpoint: 0.0.0.0:1777
+        zpages:
+          endpoint: 0.0.0.0:55679
+
+      service:
+        extensions: [health_check, pprof, zpages]
+        telemetry:
+          logs:
+            level: debug
+          metrics:
+            level: detailed
+        pipelines:
+          traces:
+            receivers: [otlp]
+            processors: [batch]
+            exporters: [debug, otlp/tempo]
+          metrics:
+            receivers: [otlp]
+            processors: [batch]
+            exporters: [debug, prometheus]
+          logs:
+            receivers: [otlp]
+            processors: [batch]
+            exporters: [debug]
+
+  tempo-config:
+    content: |
+      server:
+        http_listen_port: 3200
+        log_level: info
+
+      distributor:
+        receivers:
+          otlp:
+            protocols:
+              grpc:
+                endpoint: 0.0.0.0:4317
+
+      ingester:
+        max_block_duration: 5m
+        trace_idle_period: 10s
+
+      compactor:
+        compaction:
+          block_retention: 1h
+
+      storage:
+        trace:
+          backend: local
+          wal:
+            path: /var/tempo/wal
+          local:
+            path: /var/tempo/blocks
+
+      metrics_generator:
+        registry:
+          external_labels:
+            source: tempo
+        storage:
+          path: /var/tempo/generator/wal
+          remote_write:
+            - url: http://prometheus:9090/api/v1/write
+
+  prometheus-config:
+    content: |
+      global:
+        scrape_interval: 15s
+      scrape_configs:
+        - job_name: "otelcol-internal"
+          static_configs:
+            - targets: ["otel-collector:8888"]
+        - job_name: "otelcol-exporter"
+          static_configs:
+            - targets: ["otel-collector:9464"]
+        - job_name: "tempo"
+          static_configs:
+            - targets: ["tempo:3200"]
+
+  grafana-datasources:
+    content: |
+      apiVersion: 1
+      datasources:
+        - name: Prometheus
+          uid: prometheus
+          type: prometheus
+          access: proxy
+          orgId: 1
+          url: http://prometheus:9090
+          isDefault: true
+          editable: true
+        - name: Tempo
+          uid: tempo
+          type: tempo
+          access: proxy
+          orgId: 1
+          url: http://tempo:3200
+          editable: true
+          jsonData:
+            tracesToMetrics:
+              datasourceUid: prometheus
+            nodeGraph:
+              enabled: true
+
+volumes:
+  prometheus-data:
+  grafana-data:
+  tempo-data:
+```
+
+This launches:
+- **OTel Collector** - Receives traces on ports 4317 (gRPC) and 4318 (HTTP)
+- **Tempo** - Distributed tracing backend
+- **Prometheus** - Metrics collection
+- **Grafana** - Visualization dashboard
+
+Access Grafana at `http://localhost:4000` (the Compose file maps host port 4000 to Grafana's container port 3000; default credentials: admin/admin)
+
+<Frame>
+  <img src="/media/grafana-otel-traces.png" alt="Bifrost OTel traces in Grafana" />
+</Frame>
+
+---
+
+## Popular Platform Integrations
+
+<Tabs group="platforms">
+<Tab title="Grafana Cloud">
+
+```json
+{
+  "plugins": [
+    {
+      "enabled": true,
+      "name": "otel",
+      "config": {
+        "service_name": "bifrost",
+        "collector_url": "https://otlp-gateway-prod-us-central-0.grafana.net/otlp",
+        "trace_type": "genai_extension",
+        "protocol": "http",
+        "headers": {
+          "Authorization": "env.GRAFANA_CLOUD_API_KEY"
+        }
+      }
+    }
+  ]
+}
+```
+
+Set environment variable:
+```bash
+export GRAFANA_CLOUD_API_KEY="Basic <your-base64-encoded-token>"
+```
+
+</Tab>
+<Tab title="Datadog">
+
+```json
+{
+  "plugins": [
+    {
+      "enabled": true,
+      "name": "otel",
+      "config": {
+        "service_name": "bifrost",
+        "collector_url": "https://trace.agent.datadoghq.com",
+        "trace_type": "genai_extension",
+        "protocol": "http",
+        "headers": {
+          "DD-API-KEY": "env.DATADOG_API_KEY"
+        }
+      }
+    }
+  ]
+}
+```
+
+Set environment variable:
+```bash
+export DATADOG_API_KEY="your-datadog-api-key"
+```
+
+</Tab>
+<Tab title="New Relic">
+
+```json
+{
+  "plugins": [
+    {
+      "enabled": true,
+      "name": "otel",
+      "config": {
+        "service_name": "bifrost",
+        "collector_url": "https://otlp.nr-data.net:4318",
+        "trace_type": "genai_extension",
+        "protocol": "http",
+        "headers": {
+          "api-key": "env.NEW_RELIC_LICENSE_KEY"
+        }
+      }
+    }
+  ]
+}
+```
+
+Set environment variable:
+```bash
+export NEW_RELIC_LICENSE_KEY="your-license-key"
+```
+
+</Tab>
+<Tab title="Honeycomb">
+
+```json
+{
+  "plugins": [
+    {
+      "enabled": true,
+      "name": "otel",
+      "config": {
+        "service_name": "bifrost",
+        "collector_url": "https://api.honeycomb.io",
+        "trace_type": "genai_extension",
+        "protocol": "http",
+        "headers": {
+          "x-honeycomb-team": "env.HONEYCOMB_API_KEY",
+          "x-honeycomb-dataset": "bifrost-traces"
+        }
+      }
+    }
+  ]
+}
+```
+
+Set environment variable:
+```bash
+export HONEYCOMB_API_KEY="your-api-key"
+```
+
+</Tab>
+<Tab title="Langfuse">
+
+[Langfuse](https://langfuse.com) is an open-source LLM observability platform that accepts OpenTelemetry traces via its OTLP endpoint.
+
+<Tabs>
+<Tab title="UI">
+
+Configure the OTel plugin with the following settings:
+
+| Field | Value |
+|-------|-------|
+| **Collector URL** | `https://cloud.langfuse.com/api/public/otel` (EU) or `https://us.cloud.langfuse.com/api/public/otel` (US) |
+| **Trace Type** | `genai_extension` |
+| **Protocol** | `http` (required - Langfuse does not support gRPC) |
+| **Headers** | `Authorization`: `env.LANGFUSE_AUTH` |
+
+</Tab>
+<Tab title="config.json">
+
+```json
+{
+  "plugins": [
+    {
+      "enabled": true,
+      "name": "otel",
+      "config": {
+        "service_name": "bifrost",
+        "collector_url": "https://cloud.langfuse.com/api/public/otel",
+        "trace_type": "genai_extension",
+        "protocol": "http",
+        "headers": {
+          "Authorization": "env.LANGFUSE_AUTH"
+        }
+      }
+    }
+  ]
+}
+```
+
+For US region, use `https://us.cloud.langfuse.com/api/public/otel` instead.
+
+</Tab>
+</Tabs>
+
+Set up the environment variable with your Langfuse API keys:
+
+```bash
+# Generate base64 auth string from your Langfuse API keys
+export LANGFUSE_AUTH="Basic $(echo -n 'pk-lf-xxx:sk-lf-xxx' | base64)"
+```
+
+Replace `pk-lf-xxx` and `sk-lf-xxx` with your Langfuse public and secret keys from your project settings.
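+
+If you prefer to build the header value in Go rather than in the shell, the same Basic auth string can be produced with the standard library. This is a minimal sketch: `basicAuthHeader` is a hypothetical helper (not part of Bifrost or Langfuse) and the keys are placeholders.
+
+```go
+package main
+
+import (
+    "encoding/base64"
+    "fmt"
+)
+
+// basicAuthHeader is a hypothetical helper for this sketch.
+// It joins the public and secret keys with ":" and base64-encodes the result,
+// mirroring the shell command above.
+func basicAuthHeader(publicKey, secretKey string) string {
+    credentials := publicKey + ":" + secretKey
+    return "Basic " + base64.StdEncoding.EncodeToString([]byte(credentials))
+}
+
+func main() {
+    // Placeholder keys; substitute the values from your project settings.
+    fmt.Println(basicAuthHeader("pk-lf-xxx", "sk-lf-xxx"))
+}
+```
+
+The printed value is what `LANGFUSE_AUTH` should contain.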
+
+<Note>
+Langfuse only supports HTTP protocol. Do not use gRPC.
+</Note>
+
+See the [Langfuse OpenTelemetry documentation](https://langfuse.com/integrations/native/opentelemetry) for more details.
+
+</Tab>
+<Tab title="Self-Hosted">
+
+Use the included Docker Compose stack or point to your own collector:
+
+```json
+{
+  "plugins": [
+    {
+      "enabled": true,
+      "name": "otel",
+      "config": {
+        "service_name": "bifrost",
+        "collector_url": "http://your-collector:4318",
+        "trace_type": "genai_extension",
+        "protocol": "http"
+      }
+    }
+  ]
+}
+```
+
+</Tab>
+</Tabs>
+
+---
+
+## Captured Data
+
+Each trace includes comprehensive LLM operation metadata following OpenTelemetry semantic conventions:
+
+### Span Attributes
+
+- **Span Name**: Based on request type (`gen_ai.chat`, `gen_ai.text`, `gen_ai.embedding`, etc.)
+- **Service Info**: `service.name=bifrost`, `service.version`
+- **Provider & Model**: `gen_ai.provider.name`, `gen_ai.request.model`
+
+### Request Parameters
+
+- Temperature, max_tokens, top_p, stop sequences
+- Presence/frequency penalties
+- Tool configurations and parallel tool calls
+- Custom parameters via `ExtraParams`
+
+### Input/Output Data
+
+- Complete chat history with role-based messages
+- Prompt text for completions
+- Response content with role attribution
+- Tool calls and results
+
+### Performance Metrics
+
+- Token usage (prompt, completion, total)
+- Cost calculations in dollars
+- Latency and timing (start/end timestamps)
+- Error details with status codes
+
+### Example Span
+
+```json
+{
+  "name": "gen_ai.chat",
+  "attributes": {
+    "gen_ai.provider.name": "openai",
+    "gen_ai.request.model": "gpt-4",
+    "gen_ai.request.temperature": 0.7,
+    "gen_ai.request.max_tokens": 1000,
+    "gen_ai.usage.prompt_tokens": 45,
+    "gen_ai.usage.completion_tokens": 128,
+    "gen_ai.usage.total_tokens": 173,
+    "gen_ai.usage.cost": 0.0052
+  }
+}
+```
+
+<Frame>
+  <img src="/media/grafana-otel-span-details.png" alt="OTel span details in Grafana" />
+</Frame>
+
+---
+
+## Supported Request Types
+
+The OTel plugin captures all Bifrost request types:
+
+- **Chat Completion** (streaming and non-streaming) → `gen_ai.chat`
+- **Text Completion** (streaming and non-streaming) → `gen_ai.text`
+- **Embeddings** → `gen_ai.embedding`
+- **Speech Generation** (streaming and non-streaming) → `gen_ai.speech`
+- **Transcription** (streaming and non-streaming) → `gen_ai.transcription`
+- **Responses API** → `gen_ai.responses`
+
+---
+
+## Protocol Support
+
+### HTTP (OTLP/HTTP)
+
+Uses HTTP/1.1 or HTTP/2 with JSON or Protobuf encoding:
+
+```json
+{
+  "collector_url": "http://localhost:4318",
+  "protocol": "http"
+}
+```
+
+Default port: **4318**
+
+### gRPC (OTLP/gRPC)
+
+Uses gRPC with Protobuf encoding for lower latency:
+
+```json
+{
+  "collector_url": "localhost:4317",
+  "protocol": "grpc"
+}
+```
+
+Default port: **4317**
+
+---
+
+## Metrics Push (Cluster Mode)
+
+<Note>
+**Multi-node deployments**: If you are running multiple Bifrost nodes, use push-based metrics for accurate aggregation. Pull-based `/metrics` scraping may miss nodes behind a load balancer.
+</Note>
+
+The OTel plugin supports **push-based metrics export** via OTLP, which is essential for multi-node cluster deployments. Instead of relying on Prometheus to scrape each node's `/metrics` endpoint, every node actively pushes its metrics to a central OTEL Collector.
+
+### Configuration
+
+| Field | Type | Required | Description |
+|-------|------|----------|-------------|
+| `metrics_enabled` | `boolean` | ❌ No | Enable push-based metrics export (default: `false`) |
+| `metrics_endpoint` | `string` | ✅ Yes (if enabled) | OTLP metrics endpoint URL |
+| `metrics_push_interval` | `integer` | ❌ No | Push interval in seconds (default: `15`, range: 1-300) |
+
+### Example Configuration
+
+<Tabs group="metrics-config">
+<Tab title="HTTP Protocol">
+
+```json
+{
+  "plugins": [
+    {
+      "enabled": true,
+      "name": "otel",
+      "config": {
+        "service_name": "bifrost",
+        "collector_url": "http://otel-collector:4318/v1/traces",
+        "trace_type": "genai_extension",
+        "protocol": "http",
+        "metrics_enabled": true,
+        "metrics_endpoint": "http://otel-collector:4318/v1/metrics",
+        "metrics_push_interval": 15
+      }
+    }
+  ]
+}
+```
+
+</Tab>
+<Tab title="gRPC Protocol">
+
+```json
+{
+  "plugins": [
+    {
+      "enabled": true,
+      "name": "otel",
+      "config": {
+        "service_name": "bifrost",
+        "collector_url": "otel-collector:4317",
+        "trace_type": "genai_extension",
+        "protocol": "grpc",
+        "metrics_enabled": true,
+        "metrics_endpoint": "otel-collector:4317",
+        "metrics_push_interval": 15
+      }
+    }
+  ]
+}
+```
+
+</Tab>
+</Tabs>
+
+### Pushed Metrics
+
+These are the same **Prometheus-style metrics** that the telemetry plugin exposes, pushed via OTLP to a central collector:
+
+| Metric | Type | Description |
+|--------|------|-------------|
+| `bifrost_upstream_requests_total` | Counter | Total requests to upstream providers |
+| `bifrost_success_requests_total` | Counter | Successful upstream requests |
+| `bifrost_error_requests_total` | Counter | Error requests with status code labels |
+| `bifrost_input_tokens_total` | Counter | Total input tokens |
+| `bifrost_output_tokens_total` | Counter | Total output tokens |
+| `bifrost_cache_hits_total` | Counter | Cache hits |
+| `bifrost_cost_total` | Counter | Total cost in USD |
+| `bifrost_upstream_latency_seconds` | Histogram | Upstream request latency |
+| `bifrost_stream_first_token_latency_seconds` | Histogram | Time to first token |
+| `bifrost_stream_inter_token_latency_seconds` | Histogram | Inter-token latency |
+| `http_requests_total` | Counter | Total HTTP requests |
+| `http_request_duration_seconds` | Histogram | HTTP request duration |
+
+### OTEL Collector Configuration
+
+Configure your OTEL Collector to receive OTLP metrics and export to your preferred backend (Datadog, Prometheus, etc.):
+
+```yaml
+receivers:
+  otlp:
+    protocols:
+      grpc:
+        endpoint: 0.0.0.0:4317
+      http:
+        endpoint: 0.0.0.0:4318
+
+processors:
+  batch:
+    timeout: 10s
+    send_batch_size: 1000
+
+exporters:
+  # For Datadog
+  datadog:
+    api:
+      key: ${DD_API_KEY}
+  
+  # Or for Prometheus remote write
+  prometheusremotewrite:
+    endpoint: "http://prometheus:9090/api/v1/write"
+
+service:
+  pipelines:
+    metrics:
+      receivers: [otlp]
+      processors: [batch]
+      exporters: [datadog]  # or prometheusremotewrite
+```
+
+### Why Push vs Pull?
+
+| Aspect | Pull (`/metrics` scrape) | Push (OTEL metrics) |
+|--------|--------------------------|---------------------|
+| Load balancer | May miss nodes | All nodes push |
+| Service discovery | Required | Not required |
+| Scraper configuration | Per-node endpoints | Single collector |
+| Cluster aggregation | Query-side `sum()` | Collector handles it |
+
+For **single-node deployments**, pull-based `/metrics` scraping works well. For **multi-node clusters**, push-based export ensures that every node's metrics are captured.
+
+---
+
+## Advanced Features
+
+### Automatic Span Management
+
+- Spans are tracked in a `sync.Map` with a **20-minute TTL**
+- Expired spans are cleaned up automatically, which prevents memory leaks in long-running processes
+- Streaming requests are handled by an accumulator that collects chunked responses
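+
+The TTL tracking described above can be sketched roughly as follows. This is an illustration of the general technique, not Bifrost's actual code: `ttlStore`, `entry`, and `Sweep` are hypothetical names, and the current time is passed in explicitly so the behavior is easy to test.
+
+```go
+package main
+
+import (
+    "fmt"
+    "sync"
+    "time"
+)
+
+// entry pairs a stored value with its expiry time (hypothetical type).
+type entry struct {
+    value     string
+    expiresAt time.Time
+}
+
+// ttlStore keeps entries in a sync.Map and treats them as gone after ttl.
+type ttlStore struct {
+    ttl  time.Duration
+    data sync.Map // key: string, value: entry
+}
+
+// Set stores a value that expires ttl after now.
+func (s *ttlStore) Set(key, value string, now time.Time) {
+    s.data.Store(key, entry{value: value, expiresAt: now.Add(s.ttl)})
+}
+
+// Get returns the value if it exists and has not expired.
+func (s *ttlStore) Get(key string, now time.Time) (string, bool) {
+    v, ok := s.data.Load(key)
+    if !ok {
+        return "", false
+    }
+    e := v.(entry)
+    if now.After(e.expiresAt) {
+        s.data.Delete(key)
+        return "", false
+    }
+    return e.value, true
+}
+
+// Sweep deletes every expired entry and reports how many were removed.
+func (s *ttlStore) Sweep(now time.Time) int {
+    removed := 0
+    s.data.Range(func(k, v any) bool {
+        if now.After(v.(entry).expiresAt) {
+            s.data.Delete(k)
+            removed++
+        }
+        return true // keep iterating
+    })
+    return removed
+}
+
+func main() {
+    store := &ttlStore{ttl: 20 * time.Minute}
+    start := time.Now()
+    store.Set("request-1", "span-a", start)
+
+    _, ok := store.Get("request-1", start.Add(19*time.Minute))
+    fmt.Println("found before TTL:", ok)
+    fmt.Println("removed by sweep:", store.Sweep(start.Add(21*time.Minute)))
+}
+```
+
+In a long-running process, a background ticker would typically call a sweep like this periodically so that entries for abandoned requests do not accumulate.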
+
+### Async Emission
+
+All span emissions happen asynchronously in background goroutines:
+
+```go
+// Emitting in a goroutine keeps span export off the request path
+go func() {
+    p.client.Emit(ctx, spans)
+}()
+```
+
+### Streaming Support
+
+The plugin accumulates streaming chunks and emits a single complete span when the stream finishes, providing accurate token counts and costs.
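+
+As a rough illustration of the accumulate-then-emit idea, the sketch below buffers chunks and produces one summary when the final chunk arrives. The types (`streamChunk`, `streamSummary`, `accumulator`) are hypothetical, and the assumption that token usage arrives on the final chunk is made for this example only.
+
+```go
+package main
+
+import (
+    "fmt"
+    "strings"
+)
+
+// streamChunk is a hypothetical shape for one streamed piece of a response.
+type streamChunk struct {
+    Content      string
+    OutputTokens int  // assumed to be reported on the final chunk only
+    Final        bool // marks the end of the stream
+}
+
+// streamSummary is what would be recorded once the stream has finished.
+type streamSummary struct {
+    Content      string
+    OutputTokens int
+    ChunkCount   int
+}
+
+// accumulator buffers chunk content until the stream completes.
+type accumulator struct {
+    parts  []string
+    tokens int
+}
+
+// Add buffers one chunk. It returns a summary and true only for the final chunk.
+func (a *accumulator) Add(c streamChunk) (streamSummary, bool) {
+    a.parts = append(a.parts, c.Content)
+    if c.OutputTokens > 0 {
+        a.tokens = c.OutputTokens
+    }
+    if !c.Final {
+        return streamSummary{}, false
+    }
+    return streamSummary{
+        Content:      strings.Join(a.parts, ""),
+        OutputTokens: a.tokens,
+        ChunkCount:   len(a.parts),
+    }, true
+}
+
+func main() {
+    acc := &accumulator{}
+    chunks := []streamChunk{
+        {Content: "Hello"},
+        {Content: ", "},
+        {Content: "world", OutputTokens: 3, Final: true},
+    }
+    for _, c := range chunks {
+        if summary, done := acc.Add(c); done {
+            fmt.Printf("%q tokens=%d chunks=%d\n", summary.Content, summary.OutputTokens, summary.ChunkCount)
+        }
+    }
+}
+```
+
+Emitting once at the end means the recorded token count and content describe the whole response rather than any single chunk.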
+
+### Environment Variable Security
+
+Reference credentials as `env.VARIABLE_NAME` so that secrets stay out of config files:
+
+```json
+{
+  "headers": {
+    "Authorization": "env.OTEL_API_KEY"
+  }
+}
+```
+
+The plugin reads `OTEL_API_KEY` from the environment at runtime.
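+
+One way such an `env.` reference could be resolved is sketched below. This is an assumption about the general approach, not Bifrost's implementation: `resolveEnvValue` is a hypothetical helper that replaces `env.NAME` with the value of `NAME` and leaves any other string untouched.
+
+```go
+package main
+
+import (
+    "fmt"
+    "os"
+    "strings"
+)
+
+// resolveEnvValue is a hypothetical helper for this sketch.
+// "env.NAME" is replaced by the value of the environment variable NAME;
+// any other value is returned unchanged.
+func resolveEnvValue(v string) (string, error) {
+    const prefix = "env."
+    if !strings.HasPrefix(v, prefix) {
+        return v, nil
+    }
+    name := strings.TrimPrefix(v, prefix)
+    val, set := os.LookupEnv(name)
+    if !set {
+        return "", fmt.Errorf("environment variable %s is not set", name)
+    }
+    return val, nil
+}
+
+func main() {
+    // Example value only; in practice the variable is set outside the process.
+    os.Setenv("OTEL_API_KEY", "Bearer example-token")
+
+    headers := map[string]string{"Authorization": "env.OTEL_API_KEY"}
+    for key, raw := range headers {
+        resolved, err := resolveEnvValue(raw)
+        if err != nil {
+            fmt.Println("error:", err)
+            continue
+        }
+        fmt.Printf("%s: %s\n", key, resolved)
+    }
+}
+```
+
+Returning an error for an unset variable, rather than sending an empty header, makes a missing credential visible at startup instead of as an authentication failure later.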
+
+---
+
+## When to Use
+
+### OTel Plugin
+
+Choose the OTel plugin when you:
+
+- Have existing OpenTelemetry infrastructure
+- Need to correlate LLM traces with application traces
+- Require compliance with enterprise observability standards
+- Want vendor flexibility (switch backends without code changes)
+- Need multi-service distributed tracing
+
+### vs. Built-in Observability
+
+Use [Built-in Observability](./default) for:
+
+- Local development and testing
+- Simple self-hosted deployments
+- No external dependencies
+- Direct database access to logs
+
+### vs. Maxim Plugin
+
+Use the [Maxim Plugin](./maxim) for:
+
+- Advanced LLM evaluation and testing
+- Prompt engineering and experimentation
+- Team collaboration and governance
+- Production monitoring with alerts
+- Dataset management and curation
+
+---
+
+## Troubleshooting
+
+### Connection Issues
+
+Verify that the collector is reachable:
+
+```bash
+# Test HTTP endpoint (OTLP expects POST, so a 405 reply to this GET still shows the collector is reachable)
+curl -v http://localhost:4318/v1/traces
+
+# Test gRPC endpoint (requires grpcurl)
+grpcurl -plaintext localhost:4317 list
+```
+
+### Missing Traces
+
+Check Bifrost logs for emission errors:
+
+```bash
+# Enable debug logging
+bifrost-http --log-level debug
+```
+
+### Authentication Failures
+
+Verify environment variables are set:
+
+```bash
+echo $OTEL_API_KEY
+```
+
+---
+
+## Next Steps
+
+- **[Built-in Observability](./default)** - Local logging for development
+- **[Maxim Plugin](./maxim)** - Advanced LLM evaluation and monitoring
+- **[Telemetry](../telemetry)** - Prometheus metrics and dashboards
--- a/docs/features/observability/prometheus.mdx
+++ b/docs/features/observability/prometheus.mdx
@@ -0,0 +1,306 @@
+---
+title: "Prometheus"
+description: "Monitor Bifrost metrics with Prometheus scraping or Push Gateway for multi-node deployments"
+icon: "chart-line"
+---
+
+## Overview
+
+Bifrost exposes Prometheus metrics via two methods:
+
+1. **Pull-based (Scraping)**: Traditional `/metrics` endpoint that Prometheus can scrape
+2. **Push-based (Push Gateway)**: Push metrics to a Prometheus Push Gateway for cluster deployments
+
+<Note>
+  **For multi-node deployments**: Use the Push Gateway method to ensure accurate metric aggregation. Traditional scraping may miss nodes behind load balancers.
+</Note>
+
+---
+
+## Pull-based Scraping
+
+Bifrost automatically exposes a `/metrics` endpoint when the telemetry plugin is enabled (enabled by default). No additional configuration is needed.
+
+<Info>
+  When Bifrost's authentication is enabled (`auth_config.is_enabled = true`), the `/metrics` endpoint requires Basic auth credentials. You must include the same `admin_username` and `admin_password` from your `auth_config` in the Prometheus scrape configuration. Without this, Prometheus will receive `401 Unauthorized` responses and scraping will silently fail.
+</Info>
+
+### Prometheus Configuration
+
+Add Bifrost to your Prometheus `prometheus.yml`:
+
+```yaml
+scrape_configs:
+  - job_name: 'bifrost'
+    static_configs:
+      - targets: ['bifrost-host:8080']
+    scrape_interval: 15s
+```
+
+If Bifrost authentication is enabled, add `basic_auth` to your scrape config:
+
+```yaml
+scrape_configs:
+  - job_name: 'bifrost'
+    static_configs:
+      - targets: ['bifrost-host:8080']
+    scrape_interval: 15s
+    basic_auth:
+      username: '<admin_username>'
+      password: '<admin_password>'
+```
+
+### Endpoint
+
+```
+GET /metrics
+```
+
+Returns metrics in Prometheus exposition format.
+
+---
+
+## Push-based (Push Gateway)
+
+For multi-node cluster deployments, the Prometheus plugin pushes metrics to a [Prometheus Push Gateway](https://github.com/prometheus/pushgateway). This ensures all nodes' metrics are captured regardless of load balancer routing.
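+
+To make the push model concrete, the sketch below sends one metric to a Push Gateway over its HTTP API, using the `/metrics/job/<job>/instance/<instance>` grouping path. It does not use Bifrost's plugin code; `pushURL` and `pushMetrics` are hypothetical helpers, the metric value is made up, and a local test server stands in for the gateway so the example runs without one. Job and instance names are assumed to contain no `/`.
+
+```go
+package main
+
+import (
+    "fmt"
+    "net/http"
+    "net/http/httptest"
+    "net/url"
+    "strings"
+    "time"
+)
+
+// pushURL builds the Push Gateway grouping URL for a job and instance.
+func pushURL(baseURL, job, instance string) string {
+    return fmt.Sprintf("%s/metrics/job/%s/instance/%s",
+        strings.TrimRight(baseURL, "/"), url.PathEscape(job), url.PathEscape(instance))
+}
+
+// pushMetrics sends metrics in the Prometheus text exposition format.
+// PUT replaces all metrics previously pushed for the same job/instance group.
+func pushMetrics(client *http.Client, baseURL, job, instance, body string) error {
+    req, err := http.NewRequest(http.MethodPut, pushURL(baseURL, job, instance), strings.NewReader(body))
+    if err != nil {
+        return err
+    }
+    req.Header.Set("Content-Type", "text/plain; version=0.0.4")
+    resp, err := client.Do(req)
+    if err != nil {
+        return err
+    }
+    defer resp.Body.Close()
+    if resp.StatusCode < 200 || resp.StatusCode >= 300 {
+        return fmt.Errorf("push gateway returned %s", resp.Status)
+    }
+    return nil
+}
+
+func main() {
+    // A local test server stands in for the Push Gateway in this sketch.
+    gateway := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+        fmt.Println(r.Method, r.URL.Path)
+        w.WriteHeader(http.StatusOK)
+    }))
+    defer gateway.Close()
+
+    client := &http.Client{Timeout: 5 * time.Second}
+    body := "bifrost_upstream_requests_total 42\n" // illustrative value
+    if err := pushMetrics(client, gateway.URL, "bifrost", "bifrost-node-1", body); err != nil {
+        fmt.Println("push failed:", err)
+    }
+}
+```
+
+Because each node pushes under its own `instance` value, metrics from different nodes land in separate groups on the gateway instead of overwriting one another.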
+
+### Configuration
+
+| Field | Type | Required | Default | Description |
+|-------|------|----------|---------|-------------|
+| `push_gateway_url` | `string` | ✅ Yes | - | Push Gateway URL (e.g., `http://pushgateway:9091`) |
+| `job_name` | `string` | ❌ No | `bifrost` | Job label for pushed metrics |
+| `instance_id` | `string` | ❌ No | hostname | Instance identifier for metric grouping |
+| `push_interval` | `integer` | ❌ No | `15` | Push interval in seconds (1-300) |
+| `basic_auth` | `object` | ❌ No | - | Basic auth credentials |
+
+### Basic Auth Configuration
+
+| Field | Type | Required | Description |
+|-------|------|----------|-------------|
+| `username` | `string` | ✅ Yes | Basic auth username |
+| `password` | `string` | ✅ Yes | Basic auth password |
+
+---
+
+## Setup
+
+<Tabs group="setup-method">
+<Tab title="UI">
+
+1. Navigate to **Observability** → **Prometheus** in the Bifrost UI
+2. The `/metrics` endpoint is shown at the top for scraping configuration
+3. To enable Push Gateway:
+   - Enter the **Push Gateway URL**
+   - Configure **Job Name** and **Push Interval** as needed
+   - Optionally set a custom **Instance ID**
+   - Enable **Basic Authentication** if required
+   - Toggle **Enable Push Gateway** on
+   - Click **Save Prometheus Configuration**
+
+</Tab>
+<Tab title="Config File">
+
+```json
+{
+  "plugins": [
+    {
+      "name": "telemetry",
+      "enabled": true,
+      "config": {
+        "push_gateway": {
+          "enabled": true,
+          "push_gateway_url": "http://pushgateway:9091",
+          "job_name": "bifrost",
+          "push_interval": 15
+        }
+      }
+    }
+  ]
+}
+```
+
+### With Basic Auth
+
+```json
+{
+  "plugins": [
+    {
+      "name": "telemetry",
+      "enabled": true,
+      "config": {
+        "push_gateway": {
+          "enabled": true,
+          "push_gateway_url": "http://pushgateway:9091",
+          "job_name": "bifrost",
+          "push_interval": 15,
+          "instance_id": "bifrost-node-1",
+          "basic_auth": {
+            "username": "admin",
+            "password": "secret"
+          }
+        }
+      }
+    }
+  ]
+}
+```
+
+</Tab>
+</Tabs>
+
+---
+
+## Available Metrics
+
+The following metrics are available from both the `/metrics` endpoint and Push Gateway:
+
+### HTTP Metrics
+
+| Metric | Type | Description |
+|--------|------|-------------|
+| `http_requests_total` | Counter | Total HTTP requests by path, method, status |
+| `http_request_duration_seconds` | Histogram | HTTP request latency |
+| `http_request_size_bytes` | Histogram | Request body size |
+| `http_response_size_bytes` | Histogram | Response body size |
+
+### Bifrost LLM Metrics
+
+| Metric | Type | Description |
+|--------|------|-------------|
+| `bifrost_upstream_requests_total` | Counter | Total requests to LLM providers |
+| `bifrost_upstream_latency_seconds` | Histogram | Provider request latency |
+| `bifrost_success_requests_total` | Counter | Successful provider requests |
+| `bifrost_error_requests_total` | Counter | Failed provider requests |
+| `bifrost_input_tokens_total` | Counter | Total input tokens processed |
+| `bifrost_output_tokens_total` | Counter | Total output tokens generated |
+| `bifrost_cost_total` | Counter | Total cost in USD |
+| `bifrost_cache_hits_total` | Counter | Cache hits by type |
+| `bifrost_stream_first_token_latency_seconds` | Histogram | Time to first token (streaming) |
+| `bifrost_stream_inter_token_latency_seconds` | Histogram | Inter-token latency (streaming) |
+| `bifrost_key_rotation_events_total` | Counter | Per-attempt retry/rotation events with key identifiers (see below) <sup>v1.5.0-prerelease4+</sup> |
+
+### Default Labels
+
+All Bifrost metrics include these labels:
+
+- `provider` - LLM provider name
+- `model` - Model identifier
+- `method` - Request type (chat, completion, embedding, etc.)
+- `virtual_key_id` / `virtual_key_name` - Virtual key identifiers
+- `selected_key_id` / `selected_key_name` - API key that successfully served the request (`""` when all attempts failed)
+- `number_of_retries` - Total attempts minus one (across all keys)
+- `fallback_index` - Fallback position
+- `team_id` / `team_name` - Team identifiers (if governance enabled)
+- `customer_id` / `customer_name` - Customer identifiers (if governance enabled)
+
+<Note>
+  **v1.5.0-prerelease4+**: `selected_key_id` / `selected_key_name` are only populated when the request succeeds. On final errors both are empty — use `bifrost_key_rotation_events_total` or the `attempt_trail` log field to see which keys were tried.
+</Note>
+
+### Key Rotation Events <sup>v1.5.0-prerelease4+</sup>
+
+`bifrost_key_rotation_events_total` is incremented once per **failed attempt** (not per request), giving you time-series visibility into retry pressure:
+
+| Label | Values | Description |
+|-------|--------|-------------|
+| `provider` | e.g. `openai` | LLM provider |
+| `requested_model` | e.g. `gpt-4o` | Model as requested (before any alias resolution) |
+| `key_id` | UUID | The provider API key that failed on this attempt |
+| `key_name` | string | Human-readable name of the provider API key |
+| `fail_reason` | error type string | Provider error type (e.g. `rate_limit_error`, `network_error`) |
+
+**Example queries:**
+
+```promql
+# Rate-limit events per provider over time
+sum by (provider, fail_reason) (
+  rate(bifrost_key_rotation_events_total[5m])
+)
+
+# Which specific keys are hitting rate limits most often
+topk(5, sum by (provider, key_name, fail_reason) (
+  rate(bifrost_key_rotation_events_total{fail_reason="rate_limit_error"}[1h])
+))
+```
+
+---
+
+## Push Gateway Setup
+
+If you don't have a Push Gateway running, deploy one:
+
+### Docker
+
+```bash
+docker run -d -p 9091:9091 prom/pushgateway
+```
+
+### Kubernetes (Helm)
+
+```bash
+helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
+helm install pushgateway prometheus-community/prometheus-pushgateway
+```
+
+### Configure Prometheus to Scrape Push Gateway
+
+Add to your `prometheus.yml`:
+
+```yaml
+scrape_configs:
+  - job_name: 'pushgateway'
+    honor_labels: true
+    static_configs:
+      - targets: ['pushgateway:9091']
+```
+
+<Note>
+  The `honor_labels: true` setting is important - it preserves the `job` and `instance` labels pushed by Bifrost instead of overwriting them with the Push Gateway's labels.
+</Note>
+
+---
+
+## Pull vs Push: When to Use Each
+
+| Scenario | Recommended Method |
+|----------|-------------------|
+| Single Bifrost instance | Pull (scraping) |
+| Multiple instances, direct access | Pull (scraping) |
+| Multiple instances behind load balancer | **Push (Push Gateway)** |
+| Kubernetes with service mesh | Pull or Push |
+| Serverless / ephemeral instances | **Push (Push Gateway)** |
+
+### Why Push for Clusters?
+
+When multiple Bifrost instances run behind a load balancer:
+
+1. **Scraping randomness**: Each scrape may hit different nodes, missing metrics from others
+2. **Instance tracking**: Push Gateway properly tracks per-instance metrics via `instance` label
+3. **Aggregation**: Downstream tools (Grafana, Datadog) can aggregate across all instances
+
+---
+
+## Troubleshooting
+
+### Push Gateway Connection Failed
+
+```
+failed to push metrics to push gateway: connection refused
+```
+
+- Verify the Push Gateway URL is correct and reachable from Bifrost
+- Check firewall rules between Bifrost and Push Gateway
+- Ensure Push Gateway is running: `curl http://pushgateway:9091/metrics`
+
+### Metrics Not Appearing
+
+- Verify the telemetry plugin is enabled (required for metrics collection)
+- Check Bifrost logs for push errors
+- Verify Prometheus is scraping the Push Gateway with `honor_labels: true`
+
+### Authentication Failed
+
+- Double-check username and password
+- Ensure basic auth is configured on the Push Gateway side
+- Check for special characters that may need escaping