first commit

2026-04-26 21:52:23 +03:00
commit 880f412e2c
2662 changed files with 866266 additions and 0 deletions
--- a/docs/quickstart/README.mdx
+++ b/docs/quickstart/README.mdx
--- a/docs/quickstart/cli/getting-started.mdx
+++ b/docs/quickstart/cli/getting-started.mdx
@@ -0,0 +1,340 @@
+---
+title: "Bifrost CLI"
+description: "Launch any coding agent through Bifrost with a single command. Automatic configuration, model selection, and MCP integration — no environment variables needed."
+icon: "laptop-code"
+---
+
+{/* TODO: Add hero screenshot of the CLI TUI showing the summary/launch screen */}
+
+## What is Bifrost CLI?
+
+<Frame>
+<iframe 
+  src="https://drive.google.com/file/d/1Sh1-xCJrVeccKscaowznopHNdyoGl2zu/preview" 
+  title="Embedded content"
+  className="w-full h-96 rounded-md"
+></iframe>
+</Frame>
+
+Bifrost CLI is an interactive terminal tool that connects your favorite coding agents — Claude Code, Codex CLI, Gemini CLI, and Opencode — to your Bifrost gateway with zero manual configuration.
+
+Instead of setting environment variables, editing config files, and looking up provider paths, you run `bifrost`, pick your agent and model, and go.
+
+**What it does for you:**
+
+- Automatically configures base URLs, API keys, and model settings for each agent
+- Fetches available models from your Bifrost gateway's `/v1/models` endpoint
+- Installs missing agents via npm if needed
+- Auto-attaches Bifrost's MCP server to Claude Code for tool access
+- Launches agents inside a persistent tabbed terminal UI so you can switch sessions without re-running the CLI
+- Shows per-tab activity badges so you can tell when a session is progressing, idle, or has sent an alert
+- Stores your selections securely (virtual keys go to your OS keyring, never plaintext on disk)
+
+## Installation
+
+```bash
+npx -y @maximhq/bifrost-cli
+```
+
+This downloads and runs the latest Bifrost CLI. No global install required — npx handles everything.
+
+<Note>
+Bifrost CLI requires **Node.js 18+** (for npx) and a running Bifrost gateway. See [Gateway Setup](/quickstart/gateway/setting-up) if you haven't started one yet.
+</Note>
+
+### Install a Specific Version
+
+```bash
+npx -y @maximhq/bifrost-cli --cli-version v1.0.0
+```
+
+## Quick Start
+
+### 1. Start your Bifrost gateway
+
+Make sure your Bifrost gateway is running. The default is `http://localhost:8080`:
+
+```bash
+npx -y @maximhq/bifrost
+```
+
+### 2. Launch the CLI
+
+In another terminal:
+
+To install and run the CLI:
+
+```bash
+npx -y @maximhq/bifrost-cli
+```
+
+If you have already installed it, run:
+
+```bash
+bifrost
+```
+
+<Frame>
+  <img src="/media/cli/welcome-screen.png" alt="CLI welcome screen" />
+</Frame>
+
+### 3. Walk through the setup
+
+The CLI guides you through an interactive setup flow:
+
+**Step 1 — Base URL**
+
+Enter your Bifrost gateway URL. If you're running locally, this is typically `http://localhost:8080`.
+
+{/* TODO: Add screenshot of the base URL input screen */}
+
+**Step 2 — Virtual Key (optional)**
+
+If your Bifrost gateway has [virtual key authentication](/features/governance/virtual-keys) enabled, enter your virtual key here. Otherwise, press Enter to skip.
+
+{/* TODO: Add screenshot of the virtual key input screen */}
+
+**Step 3 — Choose a Harness**
+
+Select which coding agent you want to launch. The CLI shows installation status and version for each:
+
+{/* TODO: Add screenshot of the harness selection screen */}
+
+| Harness | Binary | Provider Path | Notes |
+|---------|--------|---------------|-------|
+| Claude Code | `claude` | `/anthropic` | MCP auto-attach, worktree support |
+| Codex CLI | `codex` | `/openai` | Sets `OPENAI_BASE_URL` to `{base}/openai/v1`; model override via `--model` |
+| Gemini CLI | `gemini` | `/genai` | Model override via `--model` flag |
+| Opencode | `opencode` | `/openai` | Custom models configured automatically through generated OpenCode config |
+
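As a rough illustration of how the provider paths in the table combine with the gateway base URL, here is a minimal Python sketch. The function names are hypothetical and this is not the CLI's actual code; only the paths, the harness IDs, and the `{base}/openai/v1` rule for Codex CLI come from this page.

```python
# Provider paths copied from the harness table; keys are the harness IDs
# listed in the config reference (claude, codex, gemini, opencode).
PROVIDER_PATHS = {
    "claude": "/anthropic",
    "codex": "/openai",
    "gemini": "/genai",
    "opencode": "/openai",
}


def provider_url(base_url: str, harness: str) -> str:
    """Join the gateway base URL with the harness's provider path."""
    return base_url.rstrip("/") + PROVIDER_PATHS[harness]


def codex_openai_base_url(base_url: str) -> str:
    """Codex CLI is documented as using OPENAI_BASE_URL = {base}/openai/v1."""
    return provider_url(base_url, "codex") + "/v1"


if __name__ == "__main__":
    print(provider_url("http://localhost:8080", "claude"))
    print(codex_openai_base_url("http://localhost:8080"))
```
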
+If a harness isn't installed, the CLI will offer to install it via npm for you.
+
+{/* TODO: Add screenshot of the install confirmation dialog */}
+
+**Step 4 — Select a Model**
+
+The CLI fetches available models from your Bifrost gateway and presents a searchable list. Type to filter and use the arrow keys to navigate:
+
+{/* TODO: Add screenshot of the model search/selection screen */}
+
+<Tip>
+You can type any model name manually — even if it's not in the list. Just type the full model identifier and press Enter.
+</Tip>
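
The type-to-filter behaviour can be pictured with a short Python sketch. It assumes the `/v1/models` response follows the OpenAI-style shape `{"data": [{"id": ...}]}` and that filtering is a case-insensitive substring match; both are assumptions for illustration, and the CLI's real search may work differently.

```python
def model_ids(models_response: dict) -> list[str]:
    """Extract model identifiers from an OpenAI-style model list (assumed shape)."""
    return [entry["id"] for entry in models_response.get("data", [])]


def filter_models(ids: list[str], typed: str) -> list[str]:
    """Keep models whose identifier contains the typed text, ignoring case."""
    needle = typed.strip().lower()
    return [model for model in ids if needle in model.lower()]


if __name__ == "__main__":
    # Hypothetical sample response; a real gateway returns its own model list.
    sample = {
        "data": [
            {"id": "openai/gpt-4o"},
            {"id": "anthropic/claude-sonnet-4-5-20250929"},
        ]
    }
    print(filter_models(model_ids(sample), "claude"))
```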
+
+**Step 5 — Launch**
+
+Review your configuration in the summary screen. Press Enter to launch, or use the shortcut keys to adjust any setting:
+
+{/* TODO: Add screenshot of the summary/ready-to-launch screen */}
+
+| Key | Action |
+|-----|--------|
+| `Enter` | Launch the harness |
+| `u` | Change base URL |
+| `v` | Change virtual key |
+| `h` | Change harness |
+| `m` | Change model |
+| `w` | Set worktree name (Claude Code only) |
+| `d` | Open Bifrost dashboard |
+| `r` | Open documentation |
+| `l` | Toggle harness exit logs |
+| `i` | Report an issue on GitHub |
+| `s` | Star the repo on GitHub |
+| `q` | Quit |
+
+The CLI then launches your agent with all the correct environment variables and configuration set automatically.
+
+{/* TODO: Add screenshot of the launch banner showing endpoint, model, vk status */}
+
+## Tabbed Session UI
+
+After launch, Bifrost CLI keeps you inside a tabbed terminal UI instead of exiting after the first session. The bottom tab bar shows:
+
+- The `Bifrost CLI` label and current CLI version
+- One tab per running or recent agent session
+- A status badge for each tab:
+  - `🧠` — the visible screen is actively changing, so the agent is still working
+  - `✅` — the session looks idle and ready
+  - `🔔` — the PTY emitted a real terminal alert
+
+Use `Ctrl+B` at any time to focus the tab bar. From tab mode you can:
+
+| Key | Action |
+|-----|--------|
+| `n` | Open a new tab and launch another agent session |
+| `x` | Close the current tab |
+| `h` / `l` | Move left or right across tabs |
+| `1`-`9` | Jump directly to a tab |
+| `Esc` / `Enter` / `Ctrl+B` | Return to the active session |
+
+<Tip>
+If you press `Ctrl+B` before launching your first session, Bifrost CLI stays open on the Home tab bar so you can create a new tab from there.
+</Tip>
+
+## Configuration
+
+### Config File
+
+The CLI stores its configuration at `~/.bifrost/config.json`. This file is created automatically on first run and updated when you change settings through the TUI.
+
+```json
+{
+  "base_url": "http://localhost:8080",
+  "default_harness": "claude",
+  "default_model": "anthropic/claude-sonnet-4-5-20250929"
+}
+```
+
+| Field | Description |
+|-------|-------------|
+| `base_url` | Your Bifrost gateway URL |
+| `default_harness` | Last used harness ID (`claude`, `codex`, `gemini`, `opencode`) |
+| `default_model` | Last used model identifier |
+
+<Warning>
+Never put your virtual key in the config file. The CLI stores virtual keys securely in your OS keyring (macOS Keychain, Windows Credential Manager, or Linux Secret Service).
+</Warning>
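
If you want to read this file from your own tooling, a minimal Python sketch could look like the following. The helper name `load_config` is hypothetical; the path and the three field names are taken from this page. The sketch deliberately ignores any other keys, in line with the warning above that virtual keys do not belong in the file.

```python
import json
from pathlib import Path

# Default location and fields as documented on this page.
DEFAULT_CONFIG_PATH = Path.home() / ".bifrost" / "config.json"
KNOWN_FIELDS = ("base_url", "default_harness", "default_model")


def load_config(path=DEFAULT_CONFIG_PATH) -> dict:
    """Return only the documented fields; an absent file yields an empty dict."""
    path = Path(path)
    if not path.exists():
        return {}
    raw = json.loads(path.read_text(encoding="utf-8"))
    return {key: raw[key] for key in KNOWN_FIELDS if key in raw}


if __name__ == "__main__":
    print(load_config())
```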
+
+### CLI Flags
+
+```bash
+npx -y @maximhq/bifrost-cli [flags]
+```
+
+| Flag | Description |
+|------|-------------|
+| `-config <path>` | Path to a custom `config.json` file |
+| `-no-resume` | Skip resume flow and open fresh setup |
+| `-worktree <name>` | Create a git worktree for the session (Claude Code only) |
+
+### Using a Custom Config Path
+
+Point the CLI to a project-specific config:
+
+```bash
+npx -y @maximhq/bifrost-cli -config ./my-project/bifrost.json
+```
+
+This is useful when working across multiple Bifrost gateways or projects with different configurations.
+
+## Switching Harnesses and Models
+
+The CLI handles all environment variables, API keys, and provider-specific configuration automatically — you never need to set them yourself. To change your setup, use the shortcut keys from the summary screen:
+
+{/* TODO: Add screenshot of the summary/ready-to-launch screen with shortcut keys highlighted */}
+
+| Key | Action |
+|-----|--------|
+| `h` | Switch to a different harness (Claude Code, Codex, Gemini CLI, Opencode) |
+| `m` | Pick a different model from the gateway's available models |
+| `u` | Change the Bifrost gateway URL |
+| `v` | Update your virtual key |
+| `w` | Set worktree name (Claude Code only) |
+| `d` | Open the Bifrost dashboard in your browser |
+| `r` | Open the CLI documentation |
+| `l` | Toggle harness exit logs |
+| `i` | Report an issue on GitHub |
+| `s` | Star the repo on GitHub |
+| `Enter` | Launch with current settings |
+| `q` | Quit |
+
+When you switch harnesses, the CLI reconfigures everything for the new agent — base URLs, API keys, model flags, and any agent-specific setup. You can switch as many times as you like before launching.
+
+<Tip>
+After a session ends, the CLI returns to the summary screen with your previous configuration intact. Press `h` to switch agents or `m` to try a different model, then `Enter` to re-launch.
+</Tip>
+
+### Opencode Notes
+
+Bifrost CLI applies two OpenCode-specific behaviors automatically:
+
+- **Custom model selection**: when you pick a model in Bifrost CLI, OpenCode is launched with the correct provider-qualified model reference and a generated OpenCode runtime config.
+- **Theme handling**: if your OpenCode `tui.json` already defines a theme, Bifrost preserves it. If not, Bifrost supplies the adaptive `system` theme so OpenCode does not fall back to its default dark-only appearance.
+
+## Session Flow
+
+The CLI is designed for iterative development sessions:
+
+1. **Launch** — Select your agent and model, press Enter
+2. **Work** — Use your agent as normal, all traffic routes through Bifrost
+3. **Switch** — Press `Ctrl+B` any time to open the tab bar and jump to another session or start a new one
+4. **Return** — When an agent exits (or you quit it), the CLI returns to the chooser with your previous configuration intact
+5. **Re-launch** — Change model, switch harness, or re-launch the same setup instantly
+
+Your last selections are remembered across sessions. The next time you run `bifrost`, you'll see the summary screen with your previous configuration ready to go. Press Enter to re-launch immediately, or adjust any setting.
+
+{/* TODO: Add screenshot showing the re-launch flow with a previous session message */}
+
+## Worktree Support
+
+<Note>
+Worktree support is currently available for **Claude Code** only.
+</Note>
+
+Launch Claude Code in an isolated git worktree for parallel development:
+
+```bash
+npx -y @maximhq/bifrost-cli -worktree feature-branch
+```
+
+Or select worktree mode from the TUI during the setup flow. The CLI passes the `--worktree` flag to Claude Code automatically.
+
+## MCP Integration
+
+When launching **Claude Code**, the CLI automatically registers Bifrost's MCP server endpoint (`/mcp`) so all your configured MCP tools are available inside the agent.
+
+If a virtual key is configured, the CLI sets up authenticated MCP access with the correct `Authorization` header — no manual `claude mcp add-json` commands needed.
+
+For other harnesses, the CLI prints the MCP server URL so you can configure it manually in your agent's settings.
+
+## Troubleshooting
+
+### "npm not found in path"
+
+The CLI needs npm to install harnesses. Make sure Node.js is installed:
+
+```bash
+node --version  # Should be 18+
+npm --version
+```
+
+### Agent not found after install
+
+If a harness was installed but the binary still isn't found, you may need to restart your terminal or add npm's global bin directory to your `PATH`:
+
+```bash
+# Check npm global bin path
+npm config get prefix
+
+# Add to PATH (add to ~/.zshrc or ~/.bashrc for persistence)
+export PATH="$(npm config get prefix)/bin:$PATH"
+```
+
+### Models not loading
+
+If the model list doesn't load, check that:
+
+1. Your Bifrost gateway is running and accessible at the configured base URL
+2. You have at least one provider configured in Bifrost
+3. If using virtual keys, your key has permission to list models
+
+### Virtual key not persisting
+
+The CLI stores virtual keys in your OS keyring. On Linux, ensure `gnome-keyring` or `kwallet` is running. If keyring access fails, the CLI will log a warning but continue working — the key will need to be re-entered next session.
+
+## Next Steps
+
+<CardGroup cols={2}>
+  <Card title="Gateway Setup" icon="server" href="/quickstart/gateway/setting-up">
+    Set up a Bifrost gateway if you haven't already
+  </Card>
+  <Card title="Provider Configuration" icon="gear" href="/quickstart/gateway/provider-configuration">
+    Configure AI providers in your Bifrost gateway
+  </Card>
+  <Card title="Virtual Keys" icon="key" href="/features/governance/virtual-keys">
+    Set up authentication and usage limits
+  </Card>
+  <Card title="MCP Gateway" icon="toolbox" href="/mcp/overview">
+    Configure MCP tools for your agents
+  </Card>
+</CardGroup>
--- a/docs/quickstart/gateway/integrations.mdx
+++ b/docs/quickstart/gateway/integrations.mdx
@@ -0,0 +1,70 @@
+---
+title: "Integrations"
+description: "Use Bifrost as a drop-in replacement for existing AI provider SDKs with zero code changes. Just change the base URL and unlock advanced features."
+icon: "plug"
+---
+
+## What are Integrations?
+
+Integrations are protocol adapters that make Bifrost **100% compatible** with existing AI provider SDKs. They translate between provider-specific API formats (OpenAI, Anthropic, Google GenAI) and Bifrost's unified API, giving you:
+
+- **Drop-in replacement** - Change only the base URL in your existing code
+- **Zero migration effort** - Keep your current SDK and request/response handling
+- **Instant feature access** - Get governance, caching, fallbacks, and monitoring without code changes
+
+## Quick Example
+
+### Before (Direct Provider)
+```python
+import openai
+
+client = openai.OpenAI(
+    api_key="your-openai-key"
+)
+```
+
+### After (Bifrost Integration)
+```python
+import openai
+
+client = openai.OpenAI(
+    base_url="http://localhost:8080/openai",  # Point to Bifrost
+    api_key="dummy-key"  # Keys handled by Bifrost
+)
+```
+
+**That's it!** Your application now has automatic fallbacks, governance, monitoring, and all Bifrost features.
+
+## Available Integrations
+
+Bifrost provides complete compatibility with these popular AI SDKs:
+
+- **[OpenAI SDK](../../integrations/openai-sdk)**
+- **[Anthropic SDK](../../integrations/anthropic-sdk)**
+- **[Google GenAI SDK](../../integrations/genai-sdk)**
+- **[LiteLLM](../../integrations/litellm-sdk)**
+- **[LangChain](../../integrations/langchain-sdk)**
+- **[Passthrough Endpoints](../../integrations/passthrough)**
+
+## Learn More
+
+For detailed setup guides, compatibility information, and advanced usage:
+
+**➜ [Complete Integration Documentation](../../integrations/what-is-an-integration)**
+
+## Next Steps
+
+Now that you understand integrations, explore these related topics:
+
+### Essential Topics
+
+- **[Provider Configuration](./provider-configuration)** - Set up multiple AI providers for redundancy
+- **[Tool Calling](./tool-calling)** - Enable AI models to use external functions
+- **[Streaming Responses](./streaming)** - Real-time response generation
+- **[Multimodal AI](./multimodal)** - Process images, audio, and multimedia content
+
+### Advanced Topics
+
+- **[Core Features](../../features/)** - Governance, caching, and observability
+- **[Architecture](../../architecture/)** - How Bifrost works internally
+- **[Deployment](../../deployment-guides)** - Production setup and scaling
--- a/docs/quickstart/gateway/multimodal.mdx
+++ b/docs/quickstart/gateway/multimodal.mdx
@@ -0,0 +1,356 @@
+---
+title: "Multimodal Support"
+description: "Process multiple types of content including images, audio, and text with AI models. Bifrost supports vision analysis, image generation, speech synthesis, and audio transcription across various providers."
+icon: "images"
+---
+
+## Vision: Analyzing Images with AI
+
+Send images to vision-capable models for analysis, description, and understanding. This example shows how to analyze an image from a URL using GPT-4o with high detail processing for better accuracy.
+
+```bash
+curl --location 'http://localhost:8080/v1/chat/completions' \
+--header 'Content-Type: application/json' \
+--data '{
+    "model": "openai/gpt-4o",
+    "messages": [
+        {
+            "role": "user",
+            "content": [
+                {
+                    "type": "text",
+                    "text": "What do you see in this image? Please describe it in detail."
+                },
+                {
+                    "type": "image_url",
+                    "image_url": {
+                        "url": "https://pub-cdead89c2f004d8f963fd34010c479d0.r2.dev/Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
+                        "detail": "high"
+                    }
+                }
+            ]
+        }
+    ]
+}'
+```
+
+**Response includes detailed image analysis:**
+```json
+{
+    "choices": [{
+        "message": {
+            "role": "assistant",
+            "content": "I can see a beautiful wooden boardwalk extending through a natural landscape..."
+        }
+    }]
+}
+```
+
+## Image Generation: Generating Images with AI
+
+Generate images from text prompts using OpenAI-compatible image generation models. 
+
+### Basic Image Generation
+
+Generate an image from a text prompt using `dall-e-3`.
+
+```bash
+curl --location 'http://localhost:8080/v1/images/generations' \
+--header 'Content-Type: application/json' \
+--data '{
+    "model": "openai/dall-e-3",
+    "prompt": "A futuristic city skyline at sunset with flying cars",
+    "size": "1024x1024",
+    "response_format": "url"
+}'
+```
+
+**Response format:**
+```json
+{
+    "created": 1713833628,
+    "data": [
+        {
+            "url": "https://oaidalleapiprodscus.blob.core.windows.net/...",
+            "revised_prompt": "A futuristic city skyline at sunset featuring advanced architecture and flying vehicles.",
+            "index": 0
+        }
+    ],
+    "background": "opaque",
+    "output_format": "png",
+    "quality": "standard",
+    "size": "1024x1024",
+    "usage": {
+        "input_tokens": 15,
+        "output_tokens": 1,
+        "total_tokens": 16
+    },
+    "extra_fields": {
+        "request_type": "image_generation",
+        "provider": "openai",
+        "model_requested": "dall-e-3",
+        "latency": 15265,
+        "chunk_index": 0
+    }
+}
+```
+
+## Audio Understanding: Analyzing Audio with AI
+
+If your chat application already supports text input, you can add audio input and output: use an audio-capable model such as `gpt-4o-audio-preview`, and include `audio` in the `modalities` array when you want audio output.
+
+### Audio Input to Model
+
+```bash
+curl --location 'http://localhost:8080/v1/chat/completions' \
+--header 'Content-Type: application/json' \
+--data '{
+    "model": "openai/gpt-4o-audio-preview",
+    "modalities": ["text"],
+    "messages": [
+        {
+            "role": "user",
+            "content": [
+                {
+                    "type": "text",
+                    "text": "Please analyze this audio recording and summarize what was discussed."
+                },
+                {
+                    "type": "input_audio",
+                    "input_audio": {
+                        "data": "<base64-encoded audio data containing the word 'Affirmative'>",
+                        "format": "wav"
+                    }
+                }
+            ]
+        }
+    ]
+}'
+```
+
+### Audio Output from Model
+
+```json
+{
+    "choices": [
+        {
+            "index": 0,
+            "finish_reason": "stop",
+            "message": {
+                "role": "assistant",
+                "content": "The audio recording captured a brief segment where a speaker simply said \"Affirmative\" in response. There wasn't any detailed discussion or context provided beyond that one-word affirmation. If you have more audio or specific questions, feel free to share!"
+            }
+        }
+    ]
+}
+```
+
+## Text-to-Speech: Converting Text to Audio
+
+Convert text into natural-sounding speech using AI voice models. This example demonstrates generating an MP3 audio file from text using the "alloy" voice. The result is returned as binary audio data.
+
+```bash
+curl --location 'http://localhost:8080/v1/audio/speech' \
+--header 'Content-Type: application/json' \
+--data '{
+    "model": "openai/tts-1",
+    "input": "Hello! This is a sample text that will be converted to speech using Bifrost speech synthesis capabilities. The weather today is wonderful, and I hope you are having a great day!",
+    "voice": "alloy",
+    "response_format": "mp3"
+}' \
+--output "output.mp3"
+```
+
+**Save audio to file:**
+```bash
+# The --output flag saves the binary audio data directly to a file
+# File size will vary based on input text length
+```
+
+## Speech-to-Text: Transcribing Audio Files
+
+Convert audio files into text using AI transcription models. This example shows how to transcribe an MP3 file using OpenAI's Whisper model, with an optional context prompt to improve accuracy.
+
+```bash
+curl --location 'http://localhost:8080/v1/audio/transcriptions' \
+--form 'file=@"output.mp3"' \
+--form 'model="openai/whisper-1"' \
+--form 'prompt="This is a sample audio transcription from Bifrost speech synthesis."'
+```
+
+**Response format:**
+```json
+{
+    "text": "Hello! This is a sample text that will be converted to speech using Bifrost speech synthesis capabilities. The weather today is wonderful, and I hope you are having a great day!"
+}
+```
+
+## Advanced Vision Examples
+
+### Multiple Images
+
+Send multiple images in a single request for comparison or analysis. This is useful for comparing products, analyzing changes over time, or understanding relationships between different visual elements.
+
+```bash
+curl --location 'http://localhost:8080/v1/chat/completions' \
+--header 'Content-Type: application/json' \
+--data '{
+    "model": "openai/gpt-4o",
+    "messages": [
+        {
+            "role": "user",
+            "content": [
+                {
+                    "type": "text",
+                    "text": "Compare these two images. What are the differences?"
+                },
+                {
+                    "type": "image_url",
+                    "image_url": {
+                        "url": "https://example.com/image1.jpg"
+                    }
+                },
+                {
+                    "type": "image_url",
+                    "image_url": {
+                        "url": "https://example.com/image2.jpg"
+                    }
+                }
+            ]
+        }
+    ]
+}'
+```
+
+### Base64 Images
+
+Process local images by encoding them as base64 data URLs. This approach is ideal when you need to analyze images stored locally on your system without uploading them to external URLs first.
+
+```bash
+# First, encode your local image to base64 (newlines are stripped so the data URL stays on one line)
+base64_image=$(base64 < local_image.jpg | tr -d '\n')
+data_url="data:image/jpeg;base64,$base64_image"
+
+curl --location 'http://localhost:8080/v1/chat/completions' \
+--header 'Content-Type: application/json' \
+--data '{
+    "model": "openai/gpt-4o",
+    "messages": [
+        {
+            "role": "user",
+            "content": [
+                {
+                    "type": "text",
+                    "text": "Analyze this image and describe what you see."
+                },
+                {
+                    "type": "image_url",
+                    "image_url": {
+                        "url": "'$data_url'",
+                        "detail": "high"
+                    }
+                }
+            ]
+        }
+    ]
+}'
+```
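
The same data URL can be built in Python, which avoids shell quoting and differences between `base64` implementations. This is a minimal sketch: the helper names are hypothetical, and the payload simply mirrors the request body shown in the curl example above.

```python
import base64
import json


def to_data_url(image_bytes: bytes, mime_type: str = "image/jpeg") -> str:
    """Encode raw image bytes as a base64 data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def vision_payload(data_url: str, prompt: str, model: str = "openai/gpt-4o") -> dict:
    """Build the chat completions body used in the curl example."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": data_url, "detail": "high"}},
                ],
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder bytes; in practice read them with open("local_image.jpg", "rb").
    body = vision_payload(to_data_url(b"placeholder"), "Analyze this image and describe what you see.")
    print(json.dumps(body, indent=2))
```

The resulting dictionary can be sent as the JSON body of a POST to `/v1/chat/completions` with any HTTP client.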
+
+## Audio Configuration Options
+
+### Voice Selection for Speech Synthesis
+
+OpenAI's text-to-speech models offer several voices, including these six, each with different characteristics:
+
+- `alloy` - Balanced, natural voice
+- `echo` - Deep, resonant voice
+- `fable` - Expressive, storytelling voice
+- `onyx` - Strong, confident voice
+- `nova` - Bright, energetic voice
+- `shimmer` - Gentle, soothing voice
+
+```bash
+# Example with different voice
+curl --location 'http://localhost:8080/v1/audio/speech' \
+--header 'Content-Type: application/json' \
+--data '{
+    "model": "openai/tts-1",
+    "input": "This is the nova voice speaking.",
+    "voice": "nova",
+    "response_format": "mp3"
+}' \
+--output "sample_nova.mp3"
+```
+
+### Audio Formats
+
+Generate audio in different formats depending on your use case: use MP3 for general use, Opus for web streaming, AAC for mobile apps, and FLAC for high-quality audio applications.
+
+```bash
+# MP3 format (default)
+"response_format": "mp3"
+
+# Opus format for web streaming
+"response_format": "opus"
+
+# AAC format for mobile apps
+"response_format": "aac"
+
+# FLAC format for high-quality audio
+"response_format": "flac"
+```
+
+## Transcription Options
+
+### Language Specification
+
+Improve transcription accuracy by specifying the source language. This is particularly helpful for non-English audio or when the audio contains technical terms or specific domain vocabulary.
+
+```bash
+curl --location 'http://localhost:8080/v1/audio/transcriptions' \
+--form 'file=@"spanish_audio.mp3"' \
+--form 'model="openai/whisper-1"' \
+--form 'language="es"' \
+--form 'prompt="This is a Spanish audio recording about technology."'
+```
+
+### Response Formats
+
+Choose between simple text output or detailed JSON responses with timestamps. The verbose JSON format provides word-level and segment-level timing information, useful for creating subtitles or analyzing speech patterns.
+
+```bash
+# Text only response
+curl --location 'http://localhost:8080/v1/audio/transcriptions' \
+--form 'file=@"audio.mp3"' \
+--form 'model="openai/whisper-1"' \
+--form 'response_format="text"'
+
+# JSON with timestamps
+curl --location 'http://localhost:8080/v1/audio/transcriptions' \
+--form 'file=@"audio.mp3"' \
+--form 'model="openai/whisper-1"' \
+--form 'response_format="verbose_json"' \
+--form 'timestamp_granularities[]=word' \
+--form 'timestamp_granularities[]=segment'
+```
+
+<Info>
+Check the [Supported Providers](/providers/supported-providers/overview) page for more information on multimodal capabilities supported by each provider.
+</Info>
+
+## Next Steps
+
+Now that you understand multimodal capabilities, explore these related topics:
+
+### Essential Topics
+
+- **[Streaming Responses](./streaming)** - Real-time multimodal processing
+- **[Tool Calling](./tool-calling)** - Combine with external tools
+- **[Provider Configuration](./provider-configuration)** - Multiple providers for different capabilities
+- **[Integrations](./integrations)** - Drop-in compatibility with existing SDKs
+
+### Advanced Topics
+
+- **[Core Features](../../features/)** - Advanced Bifrost capabilities
+- **[Architecture](../../architecture/)** - How Bifrost works internally
+- **[Deployment](../../deployment-guides)** - Production setup and scaling
--- a/docs/quickstart/gateway/provider-configuration.mdx
+++ b/docs/quickstart/gateway/provider-configuration.mdx
--- a/docs/quickstart/gateway/reranking.mdx
+++ b/docs/quickstart/gateway/reranking.mdx
@@ -0,0 +1,124 @@
+---
+title: "Reranking"
+description: "Reorder documents by relevance to a query using /v1/rerank."
+icon: "book-open-cover"
+---
+
+Use reranking to sort documents by relevance for search, retrieval, and context selection.
+
+## Provider Model Examples
+
+- Cohere: `cohere/rerank-v3.5`
+- vLLM: `vllm/BAAI/bge-reranker-v2-m3`
+- Bedrock: `bedrock/<rerank-model-or-arn>`
+- Vertex AI: `vertex/<ranking-model>`
+
+## Basic Request
+
+```bash
+curl --location 'http://localhost:8080/v1/rerank' \
+--header 'Content-Type: application/json' \
+--data '{
+  "model": "cohere/rerank-v3.5",
+  "query": "What is Bifrost?",
+  "documents": [
+    {"text": "Bifrost is an AI gateway that unifies many LLM providers."},
+    {"text": "Paris is the capital of France."},
+    {"text": "Bifrost exposes an OpenAI-compatible API."}
+  ]
+}'
+```
+
+## Request Parameters
+
+- `model` (required): model in `provider/model` format
+- `query` (required): query used for ranking
+- `documents` (required): array of documents with `text` (optional `id`, `meta`)
+- `top_n` (optional): maximum number of results
+- `max_tokens_per_doc` (optional): provider-dependent document token cap
+- `priority` (optional): provider-dependent priority hint
+- `return_documents` (optional): include matched document content in each result
+- `fallbacks` (optional): fallback models in `provider/model` format
+
+## Example with Options
+
+```bash
+curl --location 'http://localhost:8080/v1/rerank' \
+--header 'Content-Type: application/json' \
+--data '{
+  "model": "cohere/rerank-v3.5",
+  "query": "gateway observability",
+  "top_n": 2,
+  "return_documents": true,
+  "documents": [
+    {"id": "a", "text": "Bifrost supports observability plugins like OTEL and Maxim."},
+    {"id": "b", "text": "Bifrost can run in Kubernetes and ECS."},
+    {"id": "c", "text": "Token counting is available at /v1/responses/input_tokens."}
+  ]
+}'
+```
+
+## vLLM Endpoint Compatibility
+
+When using a `vllm/...` model, Bifrost sends rerank requests to `/v1/rerank` first and automatically retries `/rerank` when the upstream endpoint responds with `404`, `405`, or `501`.
+
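The retry rule can be sketched in a few lines of Python. This is only an illustration of the behaviour described above, not Bifrost's implementation: `send` is a hypothetical callable that posts the rerank request to a path on the upstream vLLM server and returns the HTTP status code.

```python
from typing import Callable

# Upstream statuses that trigger the retry, as listed on this page.
FALLBACK_STATUSES = {404, 405, 501}


def rerank_with_fallback(send: Callable[[str], int]) -> tuple[str, int]:
    """Try /v1/rerank first; retry on /rerank when the endpoint looks unsupported."""
    status = send("/v1/rerank")
    if status in FALLBACK_STATUSES:
        return "/rerank", send("/rerank")
    return "/v1/rerank", status


if __name__ == "__main__":
    # Fake upstream that only implements the unversioned path.
    def fake_send(path: str) -> int:
        return 200 if path == "/rerank" else 404

    print(rerank_with_fallback(fake_send))
```

Other failures, such as a `500`, are not retried on the alternate path in this sketch, since the page lists only the three statuses above.
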
+## Response Shape
+
+```json
+{
+  "results": [
+    {
+      "index": 0,
+      "relevance_score": 0.98,
+      "document": {
+        "id": "a",
+        "text": "Bifrost supports observability plugins like OTEL and Maxim."
+      }
+    },
+    {
+      "index": 2,
+      "relevance_score": 0.63,
+      "document": {
+        "id": "c",
+        "text": "Token counting is available at /v1/responses/input_tokens."
+      }
+    }
+  ],
+  "model": "rerank-v3.5",
+  "usage": {
+    "prompt_tokens": 52,
+    "completion_tokens": 0,
+    "total_tokens": 52
+  },
+  "extra_fields": {
+    "request_type": "rerank",
+    "provider": "cohere",
+    "latency": 245,
+    "chunk_index": 0
+  }
+}
+```
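
In the example above, each result's `index` lines up with the position of the document in the request (`0` is document `a`, `2` is document `c`). A small Python sketch, with a hypothetical helper name, shows how to map results back to the original documents when `return_documents` is not set:

```python
def ranked_documents(documents: list[dict], response: dict) -> list[tuple[float, str]]:
    """Pair each relevance score with the text of the request document it refers to."""
    ranked = []
    for result in response["results"]:
        source = documents[result["index"]]
        ranked.append((result["relevance_score"], source["text"]))
    return ranked


if __name__ == "__main__":
    documents = [
        {"id": "a", "text": "Bifrost supports observability plugins like OTEL and Maxim."},
        {"id": "b", "text": "Bifrost can run in Kubernetes and ECS."},
        {"id": "c", "text": "Token counting is available at /v1/responses/input_tokens."},
    ]
    response = {
        "results": [
            {"index": 0, "relevance_score": 0.98},
            {"index": 2, "relevance_score": 0.63},
        ]
    }
    for score, text in ranked_documents(documents, response):
        print(f"{score:.2f}  {text}")
```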
+
+## Common Validation Errors
+
+- Missing `query` -> `query is required for rerank`
+- Empty `documents` -> `documents are required for rerank`
+- Blank document text -> `document text is required for rerank at index N`
+- `top_n < 1` -> `top_n must be at least 1`
+
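To catch these errors before sending a request, you could mirror the checks on the client side. The sketch below is a hypothetical pre-flight check in Python: the messages are copied from the list above, but the order in which the checks run is an assumption and may not match the gateway.

```python
from typing import Optional


def validate_rerank_request(body: dict) -> Optional[str]:
    """Return the first validation error message, or None when the body looks valid."""
    if not body.get("query"):
        return "query is required for rerank"
    documents = body.get("documents") or []
    if not documents:
        return "documents are required for rerank"
    for index, document in enumerate(documents):
        if not str(document.get("text", "")).strip():
            return f"document text is required for rerank at index {index}"
    top_n = body.get("top_n")
    if top_n is not None and top_n < 1:
        return "top_n must be at least 1"
    return None


if __name__ == "__main__":
    print(validate_rerank_request({"query": "What is Bifrost?", "documents": []}))
```
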
+## Next Steps
+
+Now that you understand reranking, explore these related topics:
+
+### Essential Topics
+
+- **[Multimodal AI](./multimodal)** - Process images, audio, and multimedia content
+- **[Tool Calling](./tool-calling)** - Enable AI models to use external tools and functions
+- **[Provider Configuration](./provider-configuration)** - Multiple providers for redundancy
+- **[Integrations](./integrations)** - Drop-in compatibility with existing SDKs
+
+### Advanced Topics
+
+- **[Core Features](../../features/)** - Advanced Bifrost capabilities
+- **[Architecture](../../architecture/)** - How Bifrost works internally
+- **[Deployment](../../deployment-guides)** - Production setup and scaling
--- a/docs/quickstart/gateway/setting-up-auth.mdx
+++ b/docs/quickstart/gateway/setting-up-auth.mdx
@@ -0,0 +1,143 @@
+---
+title: "Setting up auth"
+description: "Learn how to enable basic authentication for the Bifrost dashboard to secure your admin interface and API endpoints."
+icon: "lock"
+---
+
+<Note>This feature is only available in OSS. For enterprise builds, you can set up [SCIM](/enterprise/scim).</Note>
+
+## Overview
+
+Bifrost provides built-in authentication to protect your dashboard and admin API endpoints. When enabled, users must log in with credentials before accessing the dashboard or making admin API calls. This feature helps secure your Bifrost instance, especially when deployed in production environments.
+
+## Enabling Authentication
+
+### Step 1: Navigate to Security Settings
+
+1. Open your Bifrost dashboard
+2. Go to **Workspace** → **Config** → **Security** tab
+3. Scroll to the **Password protect the dashboard** section
+
+![Setting up auth](../../media/setting-up-dashboard-auth.png)
+
+### Step 2: Enable Authentication
+
+1. Toggle the **Password protect the dashboard** switch to enable authentication
+2. Enter your **Username** in the admin username field
+3. Enter your **Password** in the admin password field
+
+<Note>
+  The username and password fields are only enabled when the authentication toggle is turned on. Make sure to use a
+  strong password for security.
+</Note>
+
+### Step 3: Configure Inference Call Authentication (Optional)
+
+By default, when authentication is enabled, all API calls (including inference calls) require authentication. You can optionally disable authentication for inference calls while keeping it enabled for the dashboard and admin API:
+
+1. Enable the **Disable authentication on inference calls** toggle
+2. When enabled:
+   - Dashboard and admin API calls will still require authentication
+   - Inference API calls (chat completions, embeddings, etc.) will not require authentication
+   - MCP tool execution calls will still require authentication
+
+<Note>
+  This option is useful if you want to protect your dashboard and admin functions while allowing public access to
+  inference endpoints.
+</Note>
+
+### Step 4: Configure Whitelisted Routes (Optional)
+
+You can configure specific routes that bypass the authentication middleware entirely. Requests to these routes will not require authentication, even when auth is enabled.
+
+1. Scroll to the **Whitelisted Routes** section
+2. Enter a comma-separated list of routes in the textarea
+
+![Whitelisted Routes Configuration](../../media/ui-security-whitelisted-routes.png)
+
+**Wildcard support:** Routes ending with `*` are treated as prefix matches. For example, `/api/webhook*` will match `/api/webhook`, `/api/webhook/v1`, `/api/webhook/github`, etc.
+
+**Example values:**
+
+```
+/api/custom-webhook, /api/public-endpoint, /api/webhook*
+```
+
+<Note>
+  System routes like `/health`, `/api/session/login`, `/api/session/is-auth-enabled`, `/api/oauth/callback`, and
+  `/api/info` are always whitelisted regardless of this setting. Whitelisted routes only apply to dashboard and admin
+  API endpoints — inference endpoints have their own toggle (see Step 3).
+</Note>
+
+### Step 5: Save Changes
+
+1. Click **Save Changes** to apply your authentication settings
+2. Changes take effect immediately - no restart required
+
+## Logging In
+
+Once authentication is enabled:
+
+1. Navigate to your Bifrost dashboard URL
+2. You will be automatically redirected to the login page
+3. Enter your configured username and password
+4. Click **Sign in**
+
+After successful login, you'll be redirected to the dashboard. Your session will remain active for 30 days, and you'll need to log in again after the session expires.
+
+## Authentication Methods
+
+Bifrost supports different authentication methods depending on the type of request:
+
+### Dashboard Access
+
+- **Bearer Token Authentication**: The dashboard uses Bearer token authentication
+- Tokens are automatically managed through the login session
+- Tokens are stored in browser localStorage and sent with each API request
+
+### API Calls
+
+When authentication is enabled, API calls can be made using:
+
+- **Basic Authentication**: Username and password encoded as base64 via HTTP Basic auth
+- **Bearer Token**: Session token issued after login
+
+When authentication is enabled for inference calls (i.e., the "Disable authentication on inference calls" toggle is OFF), inference calls can be made using:
+
+- **Basic Authentication**: Username and password sent via HTTP Basic auth
+- **Bearer Token**: Base64-encoded `username:password` string sent as the bearer token
+
+### Whitelisted Routes
+
+When a route is added to the whitelisted routes list in Security settings, requests to that path bypass authentication entirely — no Basic Auth or Bearer Token is required. This applies only to dashboard and admin API endpoints. Inference endpoints are controlled separately via the "Disable authentication on inference calls" toggle.
+
+### Example: Using Basic Auth for Inference Calls
+
+```bash
+# Using curl with Basic Auth
+curl -X POST http://localhost:8080/v1/chat/completions \
+  -u "your-username:your-password" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "openai/gpt-4o",
+    "messages": [{"role": "user", "content": "Hello!"}]
+  }'
+```
+
+## Important Notes
+
+- **No Restart Required**: Authentication changes take effect immediately without requiring a server restart
+- **Session Duration**: Login sessions last for 30 days
+- **Password Security**: Passwords are hashed and stored securely in the database
+- **Inference Calls**: If you disable authentication on inference calls, the dashboard, admin API endpoints, and MCP tool execution calls remain protected
+- **Whitelisted Routes**: Routes configured in Security settings bypass auth on dashboard/admin API endpoints only. Use `*` at the end of a route for prefix matching (e.g., `/api/webhook*`)
+
+## Disabling Authentication
+
+To disable authentication:
+
+1. Navigate to **Workspace** → **Config** → **Security**
+2. Toggle off the **Password protect the dashboard** switch
+3. Click **Save Changes**
+
+After disabling, the dashboard will be accessible without authentication immediately.
--- a/docs/quickstart/gateway/setting-up.mdx
+++ b/docs/quickstart/gateway/setting-up.mdx
@@ -0,0 +1,262 @@
+---
+title: "Setting Up"
+description: "Get Bifrost running as an HTTP API gateway in 30 seconds with zero configuration. Perfect for any programming language."
+icon: "play"
+---
+
+![Bifrost Gateway Installation](../../media/getting-started.png)
+
+## 30-Second Setup
+
+Get Bifrost running as a blazing-fast HTTP API gateway with **zero configuration**. Connect to any AI provider (OpenAI, Anthropic, Bedrock, and more) through a unified API that follows **OpenAI request/response format**.
+
+### 1. Choose Your Setup Method
+
+Both options work perfectly - choose what fits your workflow:
+
+#### NPX Binary
+
+<video width="100%" controls>
+  <source src="https://github.com/maximhq/bifrost/raw/refs/heads/main/docs/media/run-npx.mp4" type="video/mp4" />
+  Your browser does not support the video tag.
+</video>
+
+```bash
+# Install and run locally
+npx -y @maximhq/bifrost
+
+# Install a specific version
+npx -y @maximhq/bifrost --transport-version v1.3.9
+```
+
+#### Docker
+
+```bash
+# Pull and run Bifrost HTTP API
+docker pull maximhq/bifrost
+docker run -p 8080:8080 maximhq/bifrost
+
+# Pull a specific version
+docker pull maximhq/bifrost:v1.3.9
+docker pull maximhq/bifrost:v1.3.9-amd64
+docker pull maximhq/bifrost:v1.3.9-arm64
+```
+
+**For Data Persistence**
+
+```bash
+# For configuration persistence across restarts
+docker run -p 8080:8080 -v $(pwd)/data:/app/data maximhq/bifrost
+```
+
+### 2. Configuration Flags
+
+| Flag      | Default   | NPX               | Docker                          | Description                          |
+| --------- | --------- | ----------------- | ------------------------------- | ------------------------------------ |
+| port      | 8080      | `-port 8080`      | `-e APP_PORT=8080 -p 8080:8080` | HTTP server port                     |
+| host      | localhost | `-host 0.0.0.0`   | `-e APP_HOST=0.0.0.0`           | Host to bind server to               |
+| log-level | info      | `-log-level info` | `-e LOG_LEVEL=info`             | Log level (debug, info, warn, error) |
+| log-style | json      | `-log-style json` | `-e LOG_STYLE=json`             | Log style (pretty, json)             |
+
+**Understanding App Directory**
+
+The `-app-dir` flag determines where Bifrost stores all its data:
+
+```bash
+# Specify custom directory
+npx -y @maximhq/bifrost -app-dir ./my-bifrost-data
+
+# If not specified, creates in your OS config directory:
+# • Linux/macOS: ~/.config/bifrost
+# • Windows: %APPDATA%\bifrost
+```
+
+**What's stored in app-dir:**
+
+- `config.json` - Configuration file (optional)
+- `config.db` - SQLite database for UI configuration
+- `logs.db` - Request logs database
+
+**Note:** When using Bifrost via Docker, the volume you mount will be used as the app-dir.
+
+### 3. Open the Web Interface
+
+Navigate to **http://localhost:8080** in your browser:
+
+```bash
+# macOS
+open http://localhost:8080
+
+# Linux
+xdg-open http://localhost:8080
+
+# Windows
+start http://localhost:8080
+```
+
+🖥️ **The Web UI provides:**
+
+- **Visual provider setup** - Add API keys with clicks, not code
+- **Real-time configuration** - Changes apply immediately
+- **Live monitoring** - Request logs, metrics, and analytics
+- **Governance management** - Virtual keys, usage budgets, and more
+
+### 4. Test Your First API Call
+
+```bash
+curl -X POST http://localhost:8080/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "openai/gpt-4o-mini",
+    "messages": [{"role": "user", "content": "Hello, Bifrost!"}]
+  }'
+```
+
+**🎉 That's it!** Bifrost is running and ready to route AI requests.
+
+### What Just Happened?
+
+1. **Zero Configuration Start**: Bifrost launched without any config files - everything can be configured through the Web UI or API
+2. **OpenAI-Compatible API**: All Bifrost APIs follow OpenAI request/response format for seamless integration
+3. **Unified API Endpoint**: `/v1/chat/completions` works with any provider (OpenAI, Anthropic, Bedrock, etc.)
+4. **Provider Resolution**: `openai/gpt-4o-mini` tells Bifrost to use OpenAI's GPT-4o Mini model
+5. **Automatic Routing**: Bifrost handles authentication, rate limiting, and request routing automatically
+
+---
+
+## Two Configuration Modes
+
+Bifrost supports **two configuration approaches** - you cannot use both simultaneously:
+
+### Mode 1: Web UI Configuration
+
+![Configuration via UI](../../media/ui-config.png)
+
+**When the UI is available:**
+
+- No `config.json` file exists (Bifrost auto-creates SQLite database)
+- `config.json` exists with `config_store` configured
+
+### Mode 2: File-based Configuration
+
+<Note>You can view the entire config schema [here](https://www.getbifrost.ai/schema).</Note>
+
+**When to use:** Advanced setups, GitOps workflows, or when UI is not needed
+
+Create `config.json` in your app directory:
+
+```json
+{
+  "$schema": "https://www.getbifrost.ai/schema",
+  "client": {
+    "drop_excess_requests": false
+  },
+  "providers": {
+    "openai": {
+      "keys": [
+        {
+          "name": "openai-key-1",
+          "value": "env.OPENAI_API_KEY",
+          "models": ["gpt-4o-mini", "gpt-4o"],
+          "weight": 1.0
+        }
+      ]
+    }
+  },
+  "config_store": {
+    "enabled": true,
+    "type": "sqlite",
+    "config": {
+      "path": "./config.db"
+    }
+  }
+}
+```
+
+**Without `config_store` in `config.json`:**
+
+- **UI is disabled** - no real-time configuration possible
+- **Read-only mode** - `config.json` is never modified
+- **Memory-only** - all configurations loaded into memory at startup
+- **Restart required** - changes to `config.json` only apply after restart
+
+**With `config_store` in `config.json`:**
+
+- **UI is enabled** - full real-time configuration via web interface
+- **Database check** - Bifrost checks if config store database exists and has data
+  - **Empty DB**: Bootstraps database with `config.json` settings, then uses DB exclusively
+  - **Existing DB**: Uses database directly, **ignores** `config.json` configurations
+- **Persistent storage** - all changes saved to database immediately
+
+**Important for Advanced Users:**
+If you want database persistence but prefer not to use the UI, note that modifying `config.json` after initial bootstrap has no effect when `config_store` is enabled. Use the public HTTP APIs to make configuration changes instead.
+
+**The Three Stores Explained:**
+
+- **Config Store**: Stores provider configs, API keys, MCP settings - Required for UI functionality
+- **Logs Store**: Stores request logs shown in UI - Optional, can be disabled
+- **Vector Store**: Used for semantic caching - Optional, can be disabled
+
+## PostgreSQL UTF8 Requirement
+
+<Note>
+  Bifrost requires PostgreSQL 16 or later.
+</Note>
+
+<Note>
+  For the log store, Bifrost creates materialized views to improve analytics performance. Ensure that the PostgreSQL user
+  has the necessary permissions to perform these operations on the target schema.
+</Note>
+
+If you use PostgreSQL for `config_store` or `logs_store`, the target database must use `UTF8` encoding.
+
+Use `template0` when creating the database so PostgreSQL applies UTF8 and locale settings explicitly:
+
+```sql
+CREATE DATABASE bifrost
+  WITH TEMPLATE template0
+       ENCODING 'UTF8'
+       LC_COLLATE '<your-locale>'
+       LC_CTYPE '<your-locale>';
+```
+
+Use locale names that exist in your Postgres image/host (for example, `en_US.UTF-8`, `C.UTF-8`, or another installed UTF-8 locale).
+
+Verify the database encoding:
+
+```sql
+SELECT datname, pg_encoding_to_char(encoding) AS encoding
+FROM pg_database
+WHERE datname = 'bifrost';
+```
+
+If the database is not UTF8, Bifrost startup/migrations can fail with:
+
+```text
+simple protocol queries must be run with client_encoding=UTF8
+```
+
+If you already created a SQL_ASCII database, create a new UTF8 database and update your Bifrost DB config to point to it.
+
+---
+
+## Next Steps
+
+Now that you have Bifrost running, explore these focused guides:
+
+### Essential Topics
+
+- **[Provider Configuration](./provider-configuration)** - Multiple providers, automatic failovers & load balancing
+- **[Integrations](../../integrations/what-is-an-integration)** - Drop-in replacements for OpenAI, Anthropic, and GenAI SDKs
+- **[Multimodal Support](./multimodal)** - Support for text, images, audio, and streaming, all behind a common interface
+
+### Advanced Topics
+
+- **[Tracing](../../features/observability/default)** - Logging requests for monitoring and debugging
+- **[MCP Tools](../../mcp/overview)** - Enable AI models to use external tools (filesystem, web search, databases)
+- **[Governance](../../features/governance/virtual-keys)** - Usage tracking, rate limiting, and cost control
+- **[Deployment](../../deployment-guides/k8s)** - Production setup and scaling
+
+---
+
+**Happy building with Bifrost!** 🚀
--- a/docs/quickstart/gateway/streaming.mdx
+++ b/docs/quickstart/gateway/streaming.mdx
@@ -0,0 +1,174 @@
+---
+title: "Streaming Responses"
+description: "Receive AI responses in real-time via Server-Sent Events. Perfect for chat applications, audio processing, and real-time transcription where you want immediate results."
+icon: "water"
+---
+
+
+## Streaming Text Completion
+
+Request text completions with streaming enabled to receive partial `text` chunks as they are generated.
+
+```bash
+curl --location 'http://localhost:8080/v1/completions' \
+--header 'Content-Type: application/json' \
+--data '{
+    "model": "openai/gpt-4o-mini",
+    "prompt": "Write a short haiku about the ocean",
+    "stream": true
+}'
+```
+
+**Response Format (Server-Sent Events):**
+```
+data: {"choices":[{"text":"Waves whisper soft"}],"model":"gpt-4o-mini"}
+
+data: {"choices":[{"text":" on distant shores, the moon calls"}],"model":"gpt-4o-mini"}
+
+data: {"choices":[{"text":" tides to rise."}],"model":"gpt-4o-mini"}
+
+data: [DONE]
+```
+
+## Streaming Chat Responses
+
+Receive AI responses in real-time as they're generated. Perfect for chat applications where you want to show responses as they're being typed, improving user experience.
+
+```bash
+curl --location 'http://localhost:8080/v1/chat/completions' \
+--header 'Content-Type: application/json' \
+--data '{
+    "model": "openai/gpt-4o-mini",
+    "messages": [
+        {"role": "user", "content": "Tell me a story about a robot learning to paint"}
+    ],
+    "stream": true
+}'
+```
+
+**Response Format (Server-Sent Events):**
+```
+data: {"choices":[{"delta":{"content":"Once"}}],"model":"gpt-4o-mini"}
+
+data: {"choices":[{"delta":{"content":" upon"}}],"model":"gpt-4o-mini"}
+
+data: {"choices":[{"delta":{"content":" a"}}],"model":"gpt-4o-mini"}
+
+data: [DONE]
+```
+
+Each chunk contains partial content that you can append to build the complete response in real-time.
+
+> **Note:** Streaming requests also follow the default timeout setting defined in provider configuration, which defaults to **30 seconds**.
+
+<Note>
+Bifrost standardizes all stream responses to send usage and finish reason only in the last chunk, and content in the previous chunks.
+</Note>
+
+## Responses API Streaming
+
+Stream the OpenAI-style Responses API with event-based SSE. This includes `event:` lines and does not use the `[DONE]` marker; the stream ends when the connection closes.
+
+```bash
+curl --location 'http://localhost:8080/v1/responses' \
+--header 'Content-Type: application/json' \
+--data '{
+    "model": "openai/gpt-4o-mini",
+    "input": "Tell me one interesting fact about Mars",
+    "stream": true
+}'
+```
+
+**Response Format (Server-Sent Events):**
+```
+event: response.created
+data: {"type":"response.created"}
+
+event: response.output_text.delta
+data: {"type":"response.output_text.delta","delta": /* partial text delta payload */ }
+
+event: response.output_text.delta
+data: {"type":"response.output_text.delta","delta": /* more text delta */ }
+
+event: response.completed
+data: {"type":"response.completed","response":{ /* usage, finish_reason, etc. */ }}
+```
+
+## Text-to-Speech Streaming: Real-time Audio Generation
+
+Stream audio generation in real-time as text is converted to speech. Ideal for long texts or when you need immediate audio playback.
+
+```bash
+curl --location 'http://localhost:8080/v1/audio/speech' \
+--header 'Content-Type: application/json' \
+--data '{
+    "model": "openai/gpt-4o-mini-tts",
+    "input": "Hello this is a sample test, respond with hello for my Bifrost",
+    "voice": "alloy",
+    "stream_format": "sse"
+}'
+```
+
+**Response:** Audio chunks are delivered via Server-Sent Events. Each chunk contains base64-encoded audio data that you can decode and play or save progressively.
+
+```
+data: {"audio":"UklGRigAAABXQVZFZm10IBAAAAABAAEA..."}
+
+data: {"audio":"AKlFQVZFZm10IBAAAAABAAEAq..."}
+
+data: [DONE]
+```
+
+**To save the stream:** Add `> audio_stream.txt` to redirect output to a file.
+
+## Speech-to-Text Streaming: Real-time Audio Transcription
+
+Stream audio transcription results as they're processed. Get immediate text output for real-time applications or long audio files.
+
+```bash
+curl --location 'http://localhost:8080/v1/audio/transcriptions' \
+--form 'file=@"/path/to/your/audio.mp3"' \
+--form 'model="openai/gpt-4o-transcribe"' \
+--form 'stream="true"' \
+--form 'response_format="json"'
+```
+
+**Response Format:**
+```
+data: {"text":"Hello"}
+
+data: {"text":" this"}
+
+data: {"text":" is"}
+
+data: {"text":" a sample"}
+
+data: [DONE]
+```
+
+**Additional options:** Add `--form 'language="en"'` or `--form 'prompt="context hint"'` for better accuracy.
+
+## Audio Format Support
+
+**Speech Synthesis:** Supports `"response_format": "mp3"` (default) and `"response_format": "wav"`
+
+**Transcription Input:** Accepts MP3, WAV, M4A, and other common audio formats
+
+> **Note:** Streaming capabilities vary by provider and model. Check each provider's documentation for specific streaming support and limitations.
+
+## Next Steps
+
+Now that you understand streaming responses, explore these related topics:
+
+### Essential Topics
+
+- **[Tool Calling](./tool-calling)** - Enable AI models to use external tools and functions
+- **[Multimodal AI](./multimodal)** - Process images, audio, and multimedia content
+- **[Provider Configuration](./provider-configuration)** - Multiple providers for redundancy
+- **[Integrations](./integrations)** - Drop-in compatibility with existing SDKs
+
+### Advanced Topics
+
+- **[Core Features](../../features/)** - Advanced Bifrost capabilities
+- **[Architecture](../../architecture/)** - How Bifrost works internally
+- **[Deployment](../../deployment-guides)** - Production setup and scaling
--- a/docs/quickstart/gateway/tool-calling.mdx
+++ b/docs/quickstart/gateway/tool-calling.mdx
@@ -0,0 +1,165 @@
+---
+title: "Tool Calling"
+description: "Enable AI models to use external functions and services by defining tool schemas or connecting to Model Context Protocol (MCP) servers. This allows AI to interact with databases, APIs, file systems, and more."
+icon: "wrench"
+---
+
+## Function Calling with Custom Tools
+
+Enable AI models to use external functions by defining tool schemas in the OpenAI format. Models can then call these functions automatically based on user requests.
+
+```bash
+curl --location 'http://localhost:8080/v1/chat/completions' \
+--header 'Content-Type: application/json' \
+--data '{
+    "model": "openai/gpt-4o-mini",
+    "messages": [
+        {"role": "user", "content": "What is 15 + 27? Use the calculator tool."}
+    ],
+    "tools": [
+        {
+            "type": "function",
+            "function": {
+                "name": "calculator",
+                "description": "A calculator tool for basic arithmetic operations",
+                "parameters": {
+                    "type": "object",
+                    "properties": {
+                        "operation": {
+                            "type": "string",
+                            "description": "The operation to perform",
+                            "enum": ["add", "subtract", "multiply", "divide"]
+                        },
+                        "a": {
+                            "type": "number",
+                            "description": "The first number"
+                        },
+                        "b": {
+                            "type": "number",
+                            "description": "The second number"
+                        }
+                    },
+                    "required": ["operation", "a", "b"]
+                }
+            }
+        }
+    ],
+    "tool_choice": "auto"
+}'
+```
+
+**Response includes tool calls:**
+```json
+{
+    "choices": [{
+        "message": {
+            "role": "assistant", 
+            "tool_calls": [{
+                "id": "call_abc123",
+                "type": "function",
+                "function": {
+                    "name": "calculator",
+                    "arguments": "{\"operation\":\"add\",\"a\":15,\"b\":27}"
+                }
+            }]
+        }
+    }]
+}
+```
+
+## Connecting to MCP Servers
+
+Connect to Model Context Protocol (MCP) servers to give AI models access to external tools and services without manually defining each function.
+
+<Tabs group="tool-calling">
+<Tab title="Using Web UI">
+![MCP Configuration Interface](../../media/ui-mcp-config.png)
+
+1. Go to **http://localhost:8080**
+2. Navigate to **"MCP Clients"** in the sidebar
+3. Click **"Add MCP Client"**
+4. Enter server details and save
+</Tab>
+
+<Tab title="Using API">
+```bash
+curl --location 'http://localhost:8080/api/mcp/client' \
+--header 'Content-Type: application/json' \
+--data '{
+    "name": "filesystem",
+    "connection_type": "stdio",
+    "stdio_config": {
+        "command": "npx",
+        "args": ["@modelcontextprotocol/server-filesystem", "/tmp"]
+    }
+}'
+```
+
+**List configured MCP clients:**
+```bash
+curl --location 'http://localhost:8080/api/mcp/clients'
+```
+</Tab>
+
+<Tab title="Using config.json">
+```json
+{
+    "mcp": {
+        "client_configs": [
+            {
+                "name": "filesystem",
+                "connection_type": "stdio",
+                "stdio_config": {
+                    "command": "npx",
+                    "args": ["@modelcontextprotocol/server-filesystem", "/tmp"]
+                }
+            },
+            {
+                "name": "youtube-search",
+                "connection_type": "http",
+                "connection_string": "http://your-youtube-mcp-url"
+            }
+        ]
+    }
+}
+```
+</Tab>
+
+</Tabs>
+
+Read more about MCP connections and advanced end-to-end tool execution in the [MCP Features](../../mcp/overview) section.
+
+## Tool Choice Options
+
+Control how the AI uses tools:
+
+```bash
+# Force use of specific tool
+"tool_choice": {
+    "type": "function",
+    "function": {"name": "calculator"}
+}
+
+# Let AI decide automatically (default)
+"tool_choice": "auto"
+
+# Disable tool usage
+"tool_choice": "none"
+```
+
+## Next Steps
+
+Now that you understand tool calling, explore these related topics:
+
+### Essential Topics
+
+- **[Multimodal AI](./multimodal)** - Process images, audio, and multimedia content
+- **[Streaming Responses](./streaming)** - Real-time response generation with tool calls
+- **[Provider Configuration](./provider-configuration)** - Multiple providers for redundancy
+- **[Integrations](./integrations)** - Drop-in compatibility with existing SDKs
+
+### Advanced Topics
+
+- **[MCP Features](../../mcp/overview)** - Advanced MCP server management and configuration
+- **[Core Features](../../features/drop-in-replacement)** - Advanced Bifrost capabilities
+- **[Architecture](../../architecture/core/request-flow)** - How Bifrost works internally
--- a/docs/quickstart/go-sdk/context-keys.mdx
+++ b/docs/quickstart/go-sdk/context-keys.mdx
@@ -0,0 +1,388 @@
+---
+title: "Context Keys"
+description: "Use context keys to configure request behavior, pass metadata, and access response information throughout the request lifecycle."
+icon: "key"
+---
+
+Bifrost uses `BifrostContext` — a custom `context.Context` — to pass configuration and metadata through the request lifecycle. Context keys allow you to customize request behavior, pass request-specific settings, and read metadata set by Bifrost.
+
+The idiomatic pattern is to create a `BifrostContext` and call `SetValue` (or the chainable `WithValue`) directly on it:
+
+```go
+bfCtx := schemas.NewBifrostContext(context.Background(), schemas.NoDeadline)
+bfCtx.SetValue(schemas.BifrostContextKeyRequestID, "req-001")
+
+response, err := client.ChatCompletionRequest(bfCtx, &schemas.BifrostChatRequest{...})
+```
+
+## Request Configuration Keys
+
+These keys can be set before making a request to customize behavior.
+
+### Virtual Key
+
+Pass a virtual key identifier to the governance plugin for budget and rate-limit enforcement.
+
+```go
+bfCtx := schemas.NewBifrostContext(context.Background(), schemas.NoDeadline)
+bfCtx.SetValue(schemas.BifrostContextKeyVirtualKey, "vk-my-team")
+```
+
+### Extra Headers
+
+Pass custom headers with individual requests. Headers are automatically propagated to the provider.
+
+```go
+bfCtx := schemas.NewBifrostContext(context.Background(), schemas.NoDeadline)
+bfCtx.SetValue(schemas.BifrostContextKeyExtraHeaders, map[string][]string{
+    "user-id":    {"user-123"},
+    "session-id": {"session-abc"},
+})
+
+response, err := client.ChatCompletionRequest(bfCtx, &schemas.BifrostChatRequest{
+    Provider: schemas.OpenAI,
+    Model:    "gpt-4o-mini",
+    Input:    messages,
+})
+```
+
+<Note>
+See [Custom Headers Per Request](./provider-configuration#custom-headers-per-request) for detailed information on header handling and security restrictions.
+</Note>
+
+### API Key Selection
+
+Bifrost supports selecting a specific key by **ID** or **name**. When both are present, ID takes priority.
+
+#### By ID
+
+Explicitly select a key by its unique ID.
+
+```go
+bfCtx.SetValue(schemas.BifrostContextKeyAPIKeyID, "key-uuid-1234")
+```
+
+#### By Name
+
+Explicitly select a named API key from your configured keys.
+
+```go
+bfCtx.SetValue(schemas.BifrostContextKeyAPIKeyName, "premium-key")
+```
+
+### Direct Key
+
+Provide credentials directly, bypassing Bifrost's key selection entirely. Useful for dynamic or per-request key scenarios.
+
+```go
+bfCtx.SetValue(schemas.BifrostContextKeyDirectKey, schemas.Key{
+    Value:  "sk-direct-api-key",
+    Models: []string{"gpt-4o"},
+    Weight: 1.0,
+})
+```
+
+### Skip Key Selection
+
+Skip key selection entirely and pass an empty key to the provider. Useful for providers that don't require authentication or when using ambient credentials (e.g., IAM roles).
+
+```go
+bfCtx.SetValue(schemas.BifrostContextKeySkipKeySelection, true)
+```
+
+### Session Stickiness (Session ID)
+
+Bind a session to a specific API key so that requests with the same session ID consistently use the same key. Useful for predictable rate-limit buckets, cost attribution per user, and consistent model routing per session.
+
+On the first request for a session ID, Bifrost selects a key (via weighted random) and caches the binding in the KV store. Subsequent requests with the same session ID reuse the cached key as long as it remains valid. If the cached key is no longer in the supported set (disabled, removed, or model support changed), Bifrost re-selects and overwrites the cache.
+
+<Note>
+Session stickiness requires a `KVStore` to be configured in `BifrostConfig`.
+</Note>
+
+```go
+bfCtx.SetValue(schemas.BifrostContextKeySessionID, "user-123-session-abc")
+```
+
+### Session TTL
+
+Controls how long the session-to-key binding is cached. If not set, Bifrost uses a default TTL of 1 hour. The TTL is refreshed on each request so active sessions do not expire.
+
+```go
+bfCtx.SetValue(schemas.BifrostContextKeySessionTTL, 30*time.Minute)
+```
+
+### Request ID
+
+Set a custom request ID for tracking and correlation.
+
+```go
+bfCtx.SetValue(schemas.BifrostContextKeyRequestID, "req-12345-abc")
+```
+
+### Custom URL Path
+
+Append a custom path to the provider's base URL. Useful for accessing provider-specific endpoints.
+
+```go
+bfCtx.SetValue(schemas.BifrostContextKeyURLPath, "/custom/endpoint")
+```
+
+### Stream Idle Timeout
+
+Set a per-chunk idle timeout for streaming responses. If no chunk arrives within this duration, the stream is considered stalled and cancelled.
+
+```go
+bfCtx.SetValue(schemas.BifrostContextKeyStreamIdleTimeout, 10*time.Second)
+```
+
+### Raw Request Body
+
+Send a raw request body instead of Bifrost's standardized format. The provider receives your payload as-is. You must both set the context key AND populate `RawRequestBody` on the request.
+
+```go
+rawPayload := []byte(`{
+    "model": "gpt-4o",
+    "messages": [{"role": "user", "content": "Hello!"}],
+    "custom_field": "provider-specific-value"
+}`)
+
+bfCtx := schemas.NewBifrostContext(context.Background(), schemas.NoDeadline)
+bfCtx.SetValue(schemas.BifrostContextKeyUseRawRequestBody, true)
+
+response, err := client.ChatCompletionRequest(bfCtx, &schemas.BifrostChatRequest{
+    Provider:       schemas.OpenAI,
+    Model:          "gpt-4o",
+    RawRequestBody: rawPayload,
+})
+```
+
+<Note>
+When using raw request body, Bifrost bypasses its request conversion and sends your payload directly to the provider. You are responsible for ensuring the payload matches the provider's expected format.
+</Note>
+
+### Send Back Raw Request/Response
+
+Include the original request or response bytes in `ExtraFields` for debugging.
+
+```go
+bfCtx := schemas.NewBifrostContext(context.Background(), schemas.NoDeadline)
+bfCtx.SetValue(schemas.BifrostContextKeySendBackRawRequest, true)
+bfCtx.SetValue(schemas.BifrostContextKeySendBackRawResponse, true)
+
+response, err := client.ChatCompletionRequest(bfCtx, request)
+if err == nil && response != nil && response.ChatResponse != nil {
+    rawReq := response.ChatResponse.ExtraFields.RawRequest
+    rawResp := response.ChatResponse.ExtraFields.RawResponse
+}
+```
+
+### Passthrough Extra Parameters
+
+When enabled, any parameters in `ExtraParams` are merged directly into the JSON body sent to the provider, bypassing Bifrost's parameter filtering. Useful for provider-specific parameters that Bifrost doesn't natively support.
+
+```go
+bfCtx := schemas.NewBifrostContext(context.Background(), schemas.NoDeadline)
+bfCtx.SetValue(schemas.BifrostContextKeyPassthroughExtraParams, true)
+
+response, err := client.ChatCompletionRequest(bfCtx, &schemas.BifrostChatRequest{
+    Provider: schemas.OpenAI,
+    Model:    "gpt-4o-mini",
+    Input:    messages,
+    Params: &schemas.ChatParameters{
+        ExtraParams: map[string]interface{}{
+            "custom_param": "value",
+            "another_param": 123,
+            "nested_param": map[string]interface{}{
+                "nested_key": "nested_value",
+            },
+        },
+    },
+})
+```
+
+<Note>
+- This feature only works for JSON requests, not multipart/form-data requests
+- Parameters already handled by Bifrost are not duplicated — they appear in their proper location
+- Nested parameters are merged recursively with existing nested structures
+</Note>
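+
+The recursive merge mentioned in the note can be pictured with a small standalone sketch. `mergeExtraParams` is a hypothetical helper, not a Bifrost function, and the conflict rule it uses (keep the value already in the body unless both sides are nested objects) is an assumption made for illustration.
+
+```go
+package main
+
+import (
+    "encoding/json"
+    "fmt"
+)
+
+// mergeExtraParams is a hypothetical helper, not part of the Bifrost API.
+// It copies extra into body, recursing when both sides hold a nested object.
+// Assumption: on any other conflict the value already in body is kept.
+func mergeExtraParams(body, extra map[string]any) {
+    for key, extraValue := range extra {
+        existing, found := body[key]
+        if !found {
+            body[key] = extraValue
+            continue
+        }
+        existingMap, existingIsMap := existing.(map[string]any)
+        extraMap, extraIsMap := extraValue.(map[string]any)
+        if existingIsMap && extraIsMap {
+            mergeExtraParams(existingMap, extraMap)
+        }
+    }
+}
+
+func main() {
+    body := map[string]any{
+        "model":        "gpt-4o-mini",
+        "nested_param": map[string]any{"existing_key": "existing_value"},
+    }
+    extra := map[string]any{
+        "custom_param": "value",
+        "nested_param": map[string]any{"nested_key": "nested_value"},
+    }
+
+    mergeExtraParams(body, extra)
+
+    out, err := json.MarshalIndent(body, "", "  ")
+    if err != nil {
+        panic(err)
+    }
+    fmt.Println(string(out))
+}
+```
+
+After the merge, `nested_param` holds both `existing_key` and `nested_key`, which is the behavior the third bullet of the note describes.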
+
+## MCP Context Keys
+
+These keys control MCP tool execution behavior on a per-request basis. Request-level filtering takes priority over client-level configuration.
+
+### Include Clients
+
+Restrict which MCP clients can provide tools for this request. Pass `[]string{"*"}` to include all clients, or an empty slice to exclude all.
+
+```go
+bfCtx.SetValue(schemas.MCPContextKeyIncludeClients, []string{"github", "filesystem"})
+```
+
+### Include Tools
+
+Restrict which tools are available for this request. Use `"clientName-toolName"` format for individual tools or `"clientName-*"` as a wildcard for all tools from a client.
+
+```go
+// Allow only the search tool from the github client
+bfCtx.SetValue(schemas.MCPContextKeyIncludeTools, []string{"github-search_repositories"})
+
+// Allow all tools from filesystem client
+bfCtx.SetValue(schemas.MCPContextKeyIncludeTools, []string{"filesystem-*"})
+```
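+
+As a rough illustration of how the two documented entry forms behave, the sketch below checks a tool against an allowlist. `toolAllowed` is a hypothetical function written for this page; it only mirrors the `"clientName-toolName"` and `"clientName-*"` formats described above and makes no claim about how Bifrost matches entries internally.
+
+```go
+package main
+
+import "fmt"
+
+// toolAllowed reports whether the tool from the given client matches an
+// allowlist entry, either exactly ("client-tool") or by wildcard ("client-*").
+// Hypothetical helper for illustration only.
+func toolAllowed(allowlist []string, client, tool string) bool {
+    for _, entry := range allowlist {
+        if entry == client+"-*" || entry == client+"-"+tool {
+            return true
+        }
+    }
+    return false
+}
+
+func main() {
+    allowlist := []string{"github-search_repositories", "filesystem-*"}
+
+    fmt.Println(toolAllowed(allowlist, "github", "search_repositories"))
+    fmt.Println(toolAllowed(allowlist, "github", "create_issue"))
+    fmt.Println(toolAllowed(allowlist, "filesystem", "read_file"))
+}
+```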
+
+### MCP Extra Headers
+
+Forward additional headers to MCP servers during tool execution. Only headers present in the MCP client's configured allowlist are forwarded.
+
+```go
+bfCtx.SetValue(schemas.BifrostContextKeyMCPExtraHeaders, map[string][]string{
+    "x-user-id":    {"user-123"},
+    "x-session-id": {"session-abc"},
+})
+```
+
+## Response Metadata Keys
+
+These keys are set by Bifrost and can be read from the context after a request completes. They are particularly useful in plugins and post-hooks.
+
+### Selected Key Information
+
+After Bifrost selects an API key, it stores the selection details in the context.
+
+```go
+keyID := ctx.Value(schemas.BifrostContextKeySelectedKeyID).(string)
+keyName := ctx.Value(schemas.BifrostContextKeySelectedKeyName).(string)
+```
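+
+A bare type assertion such as `.(string)` panics when the key was never set, for example when a request fails before key selection. A minimal sketch of a safer lookup follows; it uses a plain `context.Context` and a local `ctxKey` type as stand-ins for the Bifrost context and its keys, and `valueOr` is a hypothetical helper rather than part of the SDK.
+
+```go
+package main
+
+import (
+    "context"
+    "fmt"
+)
+
+// ctxKey stands in for the schemas context key type in this sketch.
+type ctxKey string
+
+const selectedKeyID ctxKey = "selected-key-id"
+
+// valueOr returns the context value stored under key when it is present and
+// has type T, and fallback otherwise. It never panics.
+func valueOr[T any](ctx context.Context, key any, fallback T) T {
+    if value, ok := ctx.Value(key).(T); ok {
+        return value
+    }
+    return fallback
+}
+
+func main() {
+    ctx := context.WithValue(context.Background(), selectedKeyID, "key-123")
+
+    fmt.Println(valueOr(ctx, selectedKeyID, "unknown"))  // key is set
+    fmt.Println(valueOr(ctx, ctxKey("retries"), 0))      // key is missing
+}
+```
+
+The same comma-ok pattern applies to the other read-only keys in this section.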
+
+### Retry and Fallback Information
+
+Track retry attempts and fallback progression.
+
+```go
+// Number of retries attempted (0 = first attempt)
+retries := ctx.Value(schemas.BifrostContextKeyNumberOfRetries).(int)
+
+// Fallback index (0 = primary, 1 = first fallback, etc.)
+fallbackIdx := ctx.Value(schemas.BifrostContextKeyFallbackIndex).(int)
+
+// Request ID used for the fallback attempt
+fallbackReqID := ctx.Value(schemas.BifrostContextKeyFallbackRequestID).(string)
+```
+
+### Stream End Indicator
+
+For streaming responses, indicates when the stream has completed. Set by Bifrost automatically.
+
+```go
+isStreamEnd := ctx.Value(schemas.BifrostContextKeyStreamEndIndicator).(bool)
+```
+
+<Note>
+Plugin developers: When implementing a short-circuit streaming response in `PreLLMHook` or `PostLLMHook`, set `BifrostContextKeyStreamEndIndicator` to `true` on the last chunk to trigger proper cleanup.
+</Note>
+
+### Integration Type
+
+Identifies which SDK integration format is in use (useful in gateway plugins).
+
+```go
+integrationType := ctx.Value(schemas.BifrostContextKeyIntegrationType).(string)
+// e.g., "openai", "anthropic", "bedrock"
+```
+
+## Complete Example
+
+```go
+package main
+
+import (
+    "context"
+    "fmt"
+    "log"
+    "time"
+
+    "github.com/maximhq/bifrost"
+    "github.com/maximhq/bifrost/core/schemas"
+)
+
+func makeRequest(client *bifrost.Bifrost) {
+    bfCtx := schemas.NewBifrostContext(context.Background(), schemas.NoDeadline)
+
+    // Request tracking
+    bfCtx.SetValue(schemas.BifrostContextKeyRequestID, "req-001")
+
+    // Custom headers forwarded to the provider
+    bfCtx.SetValue(schemas.BifrostContextKeyExtraHeaders, map[string][]string{
+        "x-correlation-id": {"corr-12345"},
+        "x-tenant-id":      {"tenant-abc"},
+    })
+
+    // Include raw provider response for debugging
+    bfCtx.SetValue(schemas.BifrostContextKeySendBackRawResponse, true)
+
+    // Restrict MCP tools to a specific client
+    bfCtx.SetValue(schemas.MCPContextKeyIncludeClients, []string{"filesystem"})
+
+    messages := []schemas.ChatMessage{
+        {
+            Role:    schemas.ChatMessageRoleUser,
+            Content: &schemas.ChatMessageContent{ContentStr: schemas.Ptr("Hello!")},
+        },
+    }
+
+    response, err := client.ChatCompletionRequest(bfCtx, &schemas.BifrostChatRequest{
+        Provider: schemas.OpenAI,
+        Model:    "gpt-4o-mini",
+        Input:    messages,
+    })
+    if err != nil {
+        log.Printf("Request failed: %v", err)
+        return
+    }
+
+    if response.ChatResponse != nil {
+        extra := response.ChatResponse.ExtraFields
+        fmt.Printf("Provider: %s\n", extra.Provider)
+        fmt.Printf("Latency: %dms\n", extra.Latency)
+        if extra.RawResponse != nil {
+            fmt.Println("Raw response captured for debugging")
+        }
+    }
+}
+```
+
+## Context Keys Reference
+
+| Key | Type | Direction | Description |
+|-----|------|-----------|-------------|
+| `BifrostContextKeyVirtualKey` | `string` | Set | Virtual key identifier for governance |
+| `BifrostContextKeyAPIKeyName` | `string` | Set | Explicit API key name selection |
+| `BifrostContextKeyAPIKeyID` | `string` | Set | Explicit API key ID selection (priority over name) |
+| `BifrostContextKeyRequestID` | `string` | Set | Custom request ID for tracking |
+| `BifrostContextKeyFallbackRequestID` | `string` | Read | Request ID used for fallback attempt |
+| `BifrostContextKeyDirectKey` | `schemas.Key` | Set | Provide credentials directly, bypassing key selection |
+| `BifrostContextKeySkipKeySelection` | `bool` | Set | Skip key selection entirely |
+| `BifrostContextKeySessionID` | `string` | Set | Session ID for key stickiness (requires KV store) |
+| `BifrostContextKeySessionTTL` | `time.Duration` | Set | TTL for session-to-key cache (default: 1 hour) |
+| `BifrostContextKeyExtraHeaders` | `map[string][]string` | Set | Custom headers forwarded to the provider |
+| `BifrostContextKeyURLPath` | `string` | Set | Custom URL path appended to provider base URL |
+| `BifrostContextKeyStreamIdleTimeout` | `time.Duration` | Set | Per-chunk idle timeout for streaming responses |
+| `BifrostContextKeyUseRawRequestBody` | `bool` | Set | Send raw request body directly to provider |
+| `BifrostContextKeySendBackRawRequest` | `bool` | Set | Include raw request in `ExtraFields` |
+| `BifrostContextKeySendBackRawResponse` | `bool` | Set | Include raw provider response in `ExtraFields` |
+| `BifrostContextKeyPassthroughExtraParams` | `bool` | Set | Merge `ExtraParams` directly into provider request |
+| `MCPContextKeyIncludeClients` | `[]string` | Set | Allowlist of MCP client names for this request |
+| `MCPContextKeyIncludeTools` | `[]string` | Set | Allowlist of MCP tools (`"client-tool"` or `"client-*"`) |
+| `BifrostContextKeyMCPExtraHeaders` | `map[string][]string` | Set | Extra headers forwarded to MCP servers during tool execution |
+| `BifrostContextKeySelectedKeyID` | `string` | Read | ID of the key selected by Bifrost |
+| `BifrostContextKeySelectedKeyName` | `string` | Read | Name of the key selected by Bifrost |
+| `BifrostContextKeyNumberOfRetries` | `int` | Read | Number of retry attempts made |
+| `BifrostContextKeyFallbackIndex` | `int` | Read | Current fallback index (0 = primary) |
+| `BifrostContextKeyStreamEndIndicator` | `bool` | Read | Whether the stream has completed |
+| `BifrostContextKeyIntegrationType` | `string` | Read | SDK integration format in use (e.g. `"openai"`) |
+| `BifrostContextKeyUserAgent` | `string` | Read | User agent of the incoming request |
+
+## Next Steps
+
+- **[Provider Configuration](./provider-configuration)** - Configure providers and keys
+- **[Streaming Responses](./streaming)** - Real-time response handling
+- **[Tool Calling](./tool-calling)** - Enable AI function calling
+- **[Core Features](../../features/)** - Advanced Bifrost capabilities
--- a/docs/quickstart/go-sdk/logger.mdx
+++ b/docs/quickstart/go-sdk/logger.mdx
@@ -0,0 +1,303 @@
+---
+title: "Logging"
+description: "Configure logging for debugging, monitoring, and troubleshooting your Bifrost integration."
+icon: "file-lines"
+---
+
+Bifrost provides a flexible logging system with configurable log levels and output formats. You can use the built-in default logger or implement your own custom logger.
+
+## Using the Default Logger
+
+Bifrost includes a `DefaultLogger` that writes to stdout/stderr with timestamps. Create one with your desired log level:
+
+```go
+import (
+    "github.com/maximhq/bifrost"
+    "github.com/maximhq/bifrost/core/schemas"
+)
+
+func main() {
+    // Create logger with desired level
+    logger := bifrost.NewDefaultLogger(schemas.LogLevelInfo)
+
+    // Initialize Bifrost with the logger
+    client, err := bifrost.Init(schemas.BifrostConfig{
+        Account: &MyAccount{},
+        Logger:  logger,
+    })
+    if err != nil {
+        panic(err)
+    }
+}
+```
+
+## Log Levels
+
+Bifrost supports four log levels, from most to least verbose:
+
+| Level | Constant | Description |
+|-------|----------|-------------|
+| Debug | `schemas.LogLevelDebug` | Detailed debugging information for development |
+| Info | `schemas.LogLevelInfo` | General operational messages |
+| Warn | `schemas.LogLevelWarn` | Potentially harmful situations |
+| Error | `schemas.LogLevelError` | Serious problems requiring attention |
+
+```go
+// Debug level - most verbose, includes all messages
+logger := bifrost.NewDefaultLogger(schemas.LogLevelDebug)
+
+// Info level - general operational messages
+logger := bifrost.NewDefaultLogger(schemas.LogLevelInfo)
+
+// Warn level - only warnings and errors
+logger := bifrost.NewDefaultLogger(schemas.LogLevelWarn)
+
+// Error level - only errors (least verbose)
+logger := bifrost.NewDefaultLogger(schemas.LogLevelError)
+```
+
+You can change the log level at runtime:
+
+```go
+logger.SetLevel(schemas.LogLevelDebug)
+```
+
+## Output Formats
+
+The default logger supports two output formats:
+
+### JSON Output (Default)
+
+Structured JSON logs, ideal for log aggregation systems:
+
+```go
+logger := bifrost.NewDefaultLogger(schemas.LogLevelInfo)
+logger.SetOutputType(schemas.LoggerOutputTypeJSON)
+```
+
+Output example:
+```json
+{"level":"info","time":"2024-01-15T10:30:00Z","message":"Request completed"}
+```
+
+### Pretty Output
+
+Human-readable colored output, ideal for development:
+
+```go
+logger := bifrost.NewDefaultLogger(schemas.LogLevelInfo)
+logger.SetOutputType(schemas.LoggerOutputTypePretty)
+```
+
+Output example:
+```
+10:30:00 INF Request completed
+```
+
+## Custom Logger Implementation
+
+Implement the `Logger` interface to integrate with your existing logging infrastructure:
+
+```go
+type Logger interface {
+    Debug(msg string, args ...any)
+    Info(msg string, args ...any)
+    Warn(msg string, args ...any)
+    Error(msg string, args ...any)
+    Fatal(msg string, args ...any)
+    SetLevel(level schemas.LogLevel)
+    SetOutputType(outputType schemas.LoggerOutputType)
+}
+```
+
+### Example: Zap Logger Integration
+
+```go
+import (
+    "go.uber.org/zap"
+    "github.com/maximhq/bifrost/core/schemas"
+)
+
+type ZapLogger struct {
+    logger *zap.SugaredLogger
+    level  zap.AtomicLevel
+}
+
+func NewZapLogger() *ZapLogger {
+    level := zap.NewAtomicLevelAt(zap.InfoLevel)
+    config := zap.NewProductionConfig()
+    config.Level = level
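+    // AddCallerSkip(1) makes zap report the caller of this wrapper, not the wrapper itself.
+    logger, err := config.Build(zap.AddCallerSkip(1))
+    if err != nil {
+        panic(err)
+    }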
+    return &ZapLogger{
+        logger: logger.Sugar(),
+        level:  level,
+    }
+}
+
+func (l *ZapLogger) Debug(msg string, args ...any) {
+    l.logger.Debugf(msg, args...)
+}
+
+func (l *ZapLogger) Info(msg string, args ...any) {
+    l.logger.Infof(msg, args...)
+}
+
+func (l *ZapLogger) Warn(msg string, args ...any) {
+    l.logger.Warnf(msg, args...)
+}
+
+func (l *ZapLogger) Error(msg string, args ...any) {
+    l.logger.Errorf(msg, args...)
+}
+
+func (l *ZapLogger) Fatal(msg string, args ...any) {
+    l.logger.Fatalf(msg, args...)
+}
+
+func (l *ZapLogger) SetLevel(level schemas.LogLevel) {
+    switch level {
+    case schemas.LogLevelDebug:
+        l.level.SetLevel(zap.DebugLevel)
+    case schemas.LogLevelInfo:
+        l.level.SetLevel(zap.InfoLevel)
+    case schemas.LogLevelWarn:
+        l.level.SetLevel(zap.WarnLevel)
+    case schemas.LogLevelError:
+        l.level.SetLevel(zap.ErrorLevel)
+    }
+}
+
+func (l *ZapLogger) SetOutputType(outputType schemas.LoggerOutputType) {
+    // Zap handles output format via encoder configuration
+}
+```
+
+### Example: Logrus Integration
+
+```go
+import (
+    "github.com/sirupsen/logrus"
+    "github.com/maximhq/bifrost/core/schemas"
+)
+
+type LogrusLogger struct {
+    logger *logrus.Logger
+}
+
+func NewLogrusLogger() *LogrusLogger {
+    logger := logrus.New()
+    logger.SetLevel(logrus.InfoLevel)
+    return &LogrusLogger{logger: logger}
+}
+
+func (l *LogrusLogger) Debug(msg string, args ...any) {
+    l.logger.Debugf(msg, args...)
+}
+
+func (l *LogrusLogger) Info(msg string, args ...any) {
+    l.logger.Infof(msg, args...)
+}
+
+func (l *LogrusLogger) Warn(msg string, args ...any) {
+    l.logger.Warnf(msg, args...)
+}
+
+func (l *LogrusLogger) Error(msg string, args ...any) {
+    l.logger.Errorf(msg, args...)
+}
+
+func (l *LogrusLogger) Fatal(msg string, args ...any) {
+    l.logger.Fatalf(msg, args...)
+}
+
+func (l *LogrusLogger) SetLevel(level schemas.LogLevel) {
+    switch level {
+    case schemas.LogLevelDebug:
+        l.logger.SetLevel(logrus.DebugLevel)
+    case schemas.LogLevelInfo:
+        l.logger.SetLevel(logrus.InfoLevel)
+    case schemas.LogLevelWarn:
+        l.logger.SetLevel(logrus.WarnLevel)
+    case schemas.LogLevelError:
+        l.logger.SetLevel(logrus.ErrorLevel)
+    }
+}
+
+func (l *LogrusLogger) SetOutputType(outputType schemas.LoggerOutputType) {
+    switch outputType {
+    case schemas.LoggerOutputTypeJSON:
+        l.logger.SetFormatter(&logrus.JSONFormatter{})
+    case schemas.LoggerOutputTypePretty:
+        l.logger.SetFormatter(&logrus.TextFormatter{
+            FullTimestamp: true,
+        })
+    }
+}
+```
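+
+### Example: slog Integration (sketch)
+
+If you would rather avoid a third-party dependency, the same pattern can be sketched with the standard library's `log/slog`. To keep the block self-contained, `LogLevel` and its constants below are local stand-ins for the `schemas` types, and `SetOutputType` is omitted; treat this as an illustration of the adapter shape, not a drop-in implementation.
+
+```go
+package main
+
+import (
+    "bytes"
+    "fmt"
+    "io"
+    "log/slog"
+    "os"
+)
+
+// LogLevel stands in for schemas.LogLevel so the sketch compiles on its own.
+type LogLevel string
+
+const (
+    LogLevelDebug LogLevel = "debug"
+    LogLevelInfo  LogLevel = "info"
+    LogLevelWarn  LogLevel = "warn"
+    LogLevelError LogLevel = "error"
+)
+
+type SlogLogger struct {
+    logger *slog.Logger
+    level  *slog.LevelVar // allows the level to change at runtime
+}
+
+func NewSlogLogger(w io.Writer) *SlogLogger {
+    level := new(slog.LevelVar) // zero value is Info
+    handler := slog.NewJSONHandler(w, &slog.HandlerOptions{Level: level})
+    return &SlogLogger{logger: slog.New(handler), level: level}
+}
+
+// The interface uses printf-style arguments, so each method formats first.
+func (l *SlogLogger) Debug(msg string, args ...any) { l.logger.Debug(fmt.Sprintf(msg, args...)) }
+func (l *SlogLogger) Info(msg string, args ...any)  { l.logger.Info(fmt.Sprintf(msg, args...)) }
+func (l *SlogLogger) Warn(msg string, args ...any)  { l.logger.Warn(fmt.Sprintf(msg, args...)) }
+func (l *SlogLogger) Error(msg string, args ...any) { l.logger.Error(fmt.Sprintf(msg, args...)) }
+
+// Fatal logs at error level and exits, since slog has no fatal level.
+func (l *SlogLogger) Fatal(msg string, args ...any) {
+    l.logger.Error(fmt.Sprintf(msg, args...))
+    os.Exit(1)
+}
+
+func (l *SlogLogger) SetLevel(level LogLevel) {
+    switch level {
+    case LogLevelDebug:
+        l.level.Set(slog.LevelDebug)
+    case LogLevelInfo:
+        l.level.Set(slog.LevelInfo)
+    case LogLevelWarn:
+        l.level.Set(slog.LevelWarn)
+    case LogLevelError:
+        l.level.Set(slog.LevelError)
+    }
+}
+
+func main() {
+    var buf bytes.Buffer
+    logger := NewSlogLogger(&buf)
+
+    logger.Debug("dropped at the default info level")
+    logger.SetLevel(LogLevelDebug)
+    logger.Debug("retry %d of %d", 1, 3)
+
+    fmt.Print(buf.String())
+}
+```
+
+In a real integration you would replace the local `LogLevel` with `schemas.LogLevel` and add a `SetOutputType` method that swaps between a JSON and a text handler.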
+
+## Using Your Custom Logger
+
+Pass your custom logger to Bifrost during initialization:
+
+```go
+client, err := bifrost.Init(schemas.BifrostConfig{
+    Account: &MyAccount{},
+    Logger:  NewZapLogger(),  // or NewLogrusLogger()
+})
+```
+
+## Disabling Logging
+
+To disable logging, implement a no-op logger:
+
+```go
+type NoOpLogger struct{}
+
+func (l *NoOpLogger) Debug(msg string, args ...any)                   {}
+func (l *NoOpLogger) Info(msg string, args ...any)                    {}
+func (l *NoOpLogger) Warn(msg string, args ...any)                    {}
+func (l *NoOpLogger) Error(msg string, args ...any)                   {}
+func (l *NoOpLogger) Fatal(msg string, args ...any)                   {}
+func (l *NoOpLogger) SetLevel(level schemas.LogLevel)                 {}
+func (l *NoOpLogger) SetOutputType(outputType schemas.LoggerOutputType) {}
+
+// Use it
+client, err := bifrost.Init(schemas.BifrostConfig{
+    Account: &MyAccount{},
+    Logger:  &NoOpLogger{},
+})
+```
+
+## Best Practices
+
+### Development vs Production
+
+```go
+func createLogger(env string) schemas.Logger {
+    logger := bifrost.NewDefaultLogger(schemas.LogLevelInfo)
+
+    if env == "development" {
+        logger.SetLevel(schemas.LogLevelDebug)
+        logger.SetOutputType(schemas.LoggerOutputTypePretty)
+    } else {
+        logger.SetLevel(schemas.LogLevelInfo)
+        logger.SetOutputType(schemas.LoggerOutputTypeJSON)
+    }
+
+    return logger
+}
+```
+
+### Log Level Guidelines
+
+- **Debug**: Use during development to trace request flow, inspect payloads, and diagnose issues
+- **Info**: Use for normal operational events such as successful requests and provider switches
+- **Warn**: Use for recoverable issues such as retries, fallback activations, and deprecated usage
+- **Error**: Use for failures that need attention but don't crash the application
+
+## Next Steps
+
+- **[Context Keys](./context-keys)** - Pass metadata through requests
+- **[Provider Configuration](./provider-configuration)** - Configure multiple providers
+- **[Streaming Responses](./streaming)** - Real-time response handling
+- **[Core Features](../../features/)** - Advanced Bifrost capabilities
--- a/docs/quickstart/go-sdk/multimodal.mdx
+++ b/docs/quickstart/go-sdk/multimodal.mdx
@@ -0,0 +1,393 @@
+---
+title: "Multimodal Support"
+description: "Process multiple types of content including images, audio, and text with AI models. Bifrost supports vision analysis, image generation, speech synthesis, and audio transcription across various providers."
+icon: "images"
+---
+
+## Vision: Analyzing Images with AI
+
+Send images to vision-capable models for analysis, description, and understanding. This example shows how to analyze an image from a URL using GPT-4o with high detail processing for better accuracy.
+
+```go
+response, err := client.ChatCompletionRequest(schemas.NewBifrostContext(context.Background(), schemas.NoDeadline), &schemas.BifrostChatRequest{
+	Provider: schemas.OpenAI,
+	Model:    "gpt-4o", // Using vision-capable model
+	Input: []schemas.ChatMessage{
+		{
+			Role: schemas.ChatMessageRoleUser,
+			Content: &schemas.ChatMessageContent{
+				ContentBlocks: []schemas.ChatContentBlock{
+					{
+						Type: schemas.ChatContentBlockTypeText,
+						Text: schemas.Ptr("What do you see in this image? Please describe it in detail."),
+					},
+					{
+						Type: schemas.ChatContentBlockTypeImage,
+						ImageURLStruct: &schemas.ChatInputImage{
+							URL:    "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
+							Detail: schemas.Ptr("high"), // Optional: can be "low", "high", or "auto"
+						},
+					},
+				},
+			},
+		},
+	},
+})
+
+if err != nil {
+	panic(err)
+}
+
+fmt.Println("Response:", *response.Choices[0].Message.Content.ContentStr)
+```
+
+## Image Generation: Generating Images with AI
+
+Generate images from text prompts using OpenAI-compatible image generation models via the Go SDK.
+
+```go
+response, err := client.ImageGenerationRequest(schemas.NewBifrostContext(context.Background(), schemas.NoDeadline), &schemas.BifrostImageGenerationRequest{
+	Provider: schemas.OpenAI,
+	Model:    "dall-e-3",
+	Input: &schemas.ImageGenerationInput{
+		Prompt: "A futuristic city skyline at sunset with flying cars",
+	},
+	Params: &schemas.ImageGenerationParameters{
+		Size:           schemas.Ptr("1024x1024"),
+		ResponseFormat: schemas.Ptr("url"),
+	},
+})
+
+if err != nil {
+	panic(err)
+}
+
+// Handle image generation response
+if len(response.Data) > 0 {
+	imageData := response.Data[0]
+	
+	// Handle URL response (when response_format is "url")
+	if imageData.URL != "" {
+		fmt.Printf("Generated image URL: %s\n", imageData.URL)
+	}
+	
+	// Handle base64-encoded response (when response_format is "b64_json")
+	if imageData.B64JSON != "" {
+		fmt.Printf("Generated base64 image (length: %d)\n", len(imageData.B64JSON))
+	}
+	
+	// Handle revised prompt if present
+	if imageData.RevisedPrompt != "" {
+		fmt.Printf("Revised prompt: %s\n", imageData.RevisedPrompt)
+	}
+}
+
+// Handle usage metrics
+// Note: For image generation endpoints, response.Usage and Usage.TotalTokens may be empty/not populated
+// as token-based usage metrics are not provided by some image-generation providers
+if response.Usage != nil {
+	fmt.Printf("Usage: %d tokens\n", response.Usage.TotalTokens)
+}
+```
+
+## Audio Understanding: Analyzing Audio with AI
+
+If your chat application already supports text input, you can add audio input and output: include audio in the request's modalities and use an audio-capable model such as `gpt-4o-audio-preview`. The example below sends a WAV clip as an input-audio content block.
+
+### Audio Input to Model
+
+```go
+response, err := client.ChatCompletionRequest(schemas.NewBifrostContext(context.Background(), schemas.NoDeadline), &schemas.BifrostChatRequest{
+	Provider: schemas.OpenAI,
+	Model:    "gpt-4o-audio-preview",
+	Input: []schemas.ChatMessage{
+		{
+			Role: schemas.ChatMessageRoleUser,
+			Content: &schemas.ChatMessageContent{
+				ContentBlocks: []schemas.ChatContentBlock{
+					{
+						Type: schemas.ChatContentBlockTypeText,
+						Text: schemas.Ptr("Please analyze this audio recording and summarize what was discussed."),
+					},
+					{
+						Type: schemas.ChatContentBlockTypeInputAudio,
+						InputAudio: &schemas.ChatInputAudio{
+							Data:   []byte("base64-encoded audio data containing the word 'Affirmative'"),
+							Format: "wav",
+						},
+					},
+				},
+			},
+		},
+	},
+})
+```
+
+## Text-to-Speech: Converting Text to Audio
+
+Convert text into natural-sounding speech using AI voice models. This example demonstrates generating an MP3 audio file from text using the "alloy" voice. The result is saved to a local file for playback.
+
+```go
+response, err := client.SpeechRequest(schemas.NewBifrostContext(context.Background(), schemas.NoDeadline), &schemas.BifrostSpeechRequest{
+	Provider: schemas.OpenAI,
+	Model:    "tts-1", // Using text-to-speech model
+	Input: &schemas.SpeechInput{
+		Input: "Hello! This is a sample text that will be converted to speech using Bifrost's speech synthesis capabilities. The weather today is wonderful, and I hope you're having a great day!",
+	},
+	Params: &schemas.SpeechParameters{
+		VoiceConfig: &schemas.SpeechVoiceInput{
+			Voice: schemas.Ptr("alloy"),
+		},
+		ResponseFormat: schemas.Ptr("mp3"),
+	},
+})
+
+if err != nil {
+	panic(err)
+}
+
+// Handle speech synthesis response
+if response.Speech != nil && len(response.Speech.Audio) > 0 {
+	// Save the audio to a file
+	filename := "output.mp3"
+	if err := os.WriteFile(filename, response.Speech.Audio, 0644); err != nil {
+		panic(fmt.Sprintf("Failed to save audio file: %v", err))
+	}
+
+	fmt.Printf("Speech synthesis successful! Audio saved to %s, file size: %d bytes\n", filename, len(response.Speech.Audio))
+}
+```
+
+## Speech-to-Text: Transcribing Audio Files
+
+Convert audio files into text using AI transcription models. This example shows how to transcribe an MP3 file using OpenAI's Whisper model, with an optional context prompt to improve accuracy.
+
+```go
+// Read the audio file for transcription
+audioFilename := "output.mp3"
+audioData, err := os.ReadFile(audioFilename)
+if err != nil {
+	panic(fmt.Sprintf("Failed to read audio file %s: %v. Please make sure the file exists.", audioFilename, err))
+}
+
+fmt.Printf("Loaded audio file %s (%d bytes) for transcription...\n", audioFilename, len(audioData))
+
+response, err := client.TranscriptionRequest(schemas.NewBifrostContext(context.Background(), schemas.NoDeadline), &schemas.BifrostTranscriptionRequest{
+	Provider: schemas.OpenAI,
+	Model:    "whisper-1", // Using Whisper model for transcription
+	Input: &schemas.TranscriptionInput{
+		File: audioData,
+	},
+	Params: &schemas.TranscriptionParameters{
+		Prompt: schemas.Ptr("This is a sample audio transcription from Bifrost speech synthesis."), // Optional: provide context
+	},
+})
+
+if err != nil {
+	panic(err)
+}
+
+fmt.Printf("Transcription Result: %s\n", response.Transcribe.Text)
+```
+
+## Advanced Vision Examples
+
+### Multiple Images
+
+Send multiple images in a single request for comparison or analysis. This is useful for comparing products, analyzing changes over time, or understanding relationships between different visual elements.
+
+```go
+response, err := client.ChatCompletionRequest(schemas.NewBifrostContext(context.Background(), schemas.NoDeadline), &schemas.BifrostChatRequest{
+	Provider: schemas.OpenAI,
+	Model:    "gpt-4o",
+	Input: []schemas.ChatMessage{
+		{
+			Role: schemas.ChatMessageRoleUser,
+			Content: &schemas.ChatMessageContent{
+				ContentBlocks: []schemas.ChatContentBlock{
+					{
+						Type: schemas.ChatContentBlockTypeText,
+						Text: schemas.Ptr("Compare these two images. What are the differences?"),
+					},
+					{
+						Type: schemas.ChatContentBlockTypeImage,
+						ImageURLStruct: &schemas.ChatInputImage{
+							URL: "https://example.com/image1.jpg",
+						},
+					},
+					{
+						Type: schemas.ChatContentBlockTypeImage,
+						ImageURLStruct: &schemas.ChatInputImage{
+							URL: "https://example.com/image2.jpg",
+						},
+					},
+				},
+			},
+		},
+	},
+})
+```
+
+### Base64 Images
+
+Process local images by encoding them as base64 data URLs. This approach is ideal when you need to analyze images stored locally on your system without uploading them to external URLs first.
+
+```go
+// Read and encode image
+imageData, err := os.ReadFile("local_image.jpg")
+if err != nil {
+	panic(err)
+}
+base64Image := base64.StdEncoding.EncodeToString(imageData)
+dataURL := fmt.Sprintf("data:image/jpeg;base64,%s", base64Image)
+
+response, err := client.ChatCompletionRequest(schemas.NewBifrostContext(context.Background(), schemas.NoDeadline), &schemas.BifrostChatRequest{
+	Provider: schemas.OpenAI,
+	Model:    "gpt-4o",
+	Input: []schemas.ChatMessage{
+		{
+			Role: schemas.ChatMessageRoleUser,
+			Content: &schemas.ChatMessageContent{
+				ContentBlocks: []schemas.ChatContentBlock{
+					{
+						Type: schemas.ChatContentBlockTypeText,
+						Text: schemas.Ptr("Analyze this image and describe what you see."),
+					},
+					{
+						Type: schemas.ChatContentBlockTypeImage,
+						ImageURLStruct: &schemas.ChatInputImage{
+							URL:    dataURL,
+							Detail: schemas.Ptr("high"),
+						},
+					},
+				},
+			},
+		},
+	},
+})
+```
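+
+The example above hardcodes `image/jpeg` in the data URL. If your images come in mixed formats, one option is to let the standard library sniff the MIME type from the file's leading bytes. The sketch below assumes a hypothetical helper named `toDataURL`; it is not part of the Bifrost SDK.
+
+```go
+package main
+
+import (
+	"encoding/base64"
+	"fmt"
+	"net/http"
+)
+
+// toDataURL builds a base64 data URL, detecting the MIME type from the
+// content itself instead of trusting the file extension.
+func toDataURL(data []byte) string {
+	mimeType := http.DetectContentType(data)
+	encoded := base64.StdEncoding.EncodeToString(data)
+	return fmt.Sprintf("data:%s;base64,%s", mimeType, encoded)
+}
+
+func main() {
+	// The eight-byte PNG signature is enough for detection in this demo.
+	pngSignature := []byte{0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A}
+	fmt.Println(toDataURL(pngSignature))
+}
+```
+
+The result can be passed as the `URL` field of `schemas.ChatInputImage` in the same way as `dataURL` above.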
+
+## Audio Configuration Options
+
+### Voice Selection for Speech Synthesis
+
+OpenAI provides six distinct voice options, each with different characteristics. This example generates sample audio files for each voice so you can compare and choose the one that best fits your application.
+
+```go
+// Available voices: alloy, echo, fable, onyx, nova, shimmer
+voices := []string{"alloy", "echo", "fable", "onyx", "nova", "shimmer"}
+
+for _, voice := range voices {
+	response, err := client.SpeechRequest(schemas.NewBifrostContext(context.Background(), schemas.NoDeadline), &schemas.BifrostSpeechRequest{
+		Provider: schemas.OpenAI,
+		Model:    "tts-1",
+		Input: &schemas.SpeechInput{
+			Input: fmt.Sprintf("This is the %s voice speaking.", voice),
+		},
+		Params: &schemas.SpeechParameters{
+			VoiceConfig: &schemas.SpeechVoiceInput{
+				Voice: schemas.Ptr(voice),
+			},
+			ResponseFormat: schemas.Ptr("mp3"),
+		},
+	})
+	
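+	if err != nil || response.Speech == nil {
+		fmt.Printf("Skipping voice %s: %v\n", voice, err)
+		continue
+	}
+	filename := fmt.Sprintf("sample_%s.mp3", voice)
+	if err := os.WriteFile(filename, response.Speech.Audio, 0644); err != nil {
+		fmt.Printf("Failed to write %s: %v\n", filename, err)
+		continue
+	}
+	fmt.Printf("Generated %s\n", filename)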
+}
+```
+
+### Audio Formats
+
+Generate audio in different formats depending on your use case: MP3 for general use, Opus for web streaming, AAC for mobile apps, and FLAC for high-quality audio applications.
+
+```go
+formats := []string{"mp3", "opus", "aac", "flac"}
+
+for _, format := range formats {
+	response, err := client.SpeechRequest(schemas.NewBifrostContext(context.Background(), schemas.NoDeadline), &schemas.BifrostSpeechRequest{
+		Provider: schemas.OpenAI,
+		Model:    "tts-1",
+		Input: &schemas.SpeechInput{
+			Input: "Testing different audio formats.",
+		},
+		Params: &schemas.SpeechParameters{
+			VoiceConfig: &schemas.SpeechVoiceInput{
+				Voice: schemas.Ptr("alloy"),
+			},
+			ResponseFormat: schemas.Ptr(format),
+		},
+	})
+	
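+	if err != nil || response.Speech == nil {
+		fmt.Printf("Skipping format %s: %v\n", format, err)
+		continue
+	}
+	filename := fmt.Sprintf("output.%s", format)
+	if err := os.WriteFile(filename, response.Speech.Audio, 0644); err != nil {
+		fmt.Printf("Failed to write %s: %v\n", filename, err)
+	}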
+}
+```
+
+## Transcription Options
+
+### Language Specification
+
+Improve transcription accuracy by specifying the source language. This is particularly helpful for non-English audio or when the audio contains technical terms or specific domain vocabulary.
+
+```go
+response, err := client.TranscriptionRequest(schemas.NewBifrostContext(context.Background(), schemas.NoDeadline), &schemas.BifrostTranscriptionRequest{
+	Provider: schemas.OpenAI,
+	Model:    "whisper-1",
+	Input: &schemas.TranscriptionInput{
+		File: audioData,
+	},
+	Params: &schemas.TranscriptionParameters{
+		Language: schemas.Ptr("es"), // Spanish
+		Prompt:   schemas.Ptr("This is a Spanish audio recording about technology."),
+	},
+})
+```
+
+### Response Formats
+
+Choose between simple text output or detailed JSON responses with timestamps. The verbose JSON format provides word-level and segment-level timing information, useful for creating subtitles or analyzing speech patterns.
+
+```go
+// Text only
+textResponse, err := client.TranscriptionRequest(schemas.NewBifrostContext(context.Background(), schemas.NoDeadline), &schemas.BifrostTranscriptionRequest{
+	Provider: schemas.OpenAI,
+	Model:    "whisper-1",
+	Input: &schemas.TranscriptionInput{
+		File: audioData,
+	},
+	Params: &schemas.TranscriptionParameters{
+		ResponseFormat: schemas.Ptr("text"),
+	},
+})
+
+// JSON with timestamps
+verboseResponse, err := client.TranscriptionRequest(schemas.NewBifrostContext(context.Background(), schemas.NoDeadline), &schemas.BifrostTranscriptionRequest{
+	Provider: schemas.OpenAI,
+	Model:    "whisper-1",
+	Input: &schemas.TranscriptionInput{
+		File: audioData,
+	},
+	Params: &schemas.TranscriptionParameters{
+		ResponseFormat:         schemas.Ptr("verbose_json"),
+		TimestampGranularities: []string{"word", "segment"},
+	},
+})
+```
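+
+As an example of the subtitle use case, the sketch below turns timed segments into SRT text. The `segment` struct is a hypothetical stand-in for whatever the verbose response exposes (its field names are assumptions), and `srtTimestamp` and `toSRT` are illustrative helpers rather than SDK functions.
+
+```go
+package main
+
+import (
+	"fmt"
+	"strings"
+	"time"
+)
+
+// segment is a hypothetical stand-in for one timed segment of a verbose
+// transcription; the real response type may use different field names.
+type segment struct {
+	Start float64 // seconds from the beginning of the audio
+	End   float64
+	Text  string
+}
+
+// srtTimestamp formats seconds as the HH:MM:SS,mmm form used by SRT files.
+func srtTimestamp(seconds float64) string {
+	d := time.Duration(seconds * float64(time.Second)).Round(time.Millisecond)
+	hours := d / time.Hour
+	d -= hours * time.Hour
+	minutes := d / time.Minute
+	d -= minutes * time.Minute
+	secs := d / time.Second
+	millis := (d - secs*time.Second) / time.Millisecond
+	return fmt.Sprintf("%02d:%02d:%02d,%03d", int(hours), int(minutes), int(secs), int(millis))
+}
+
+// toSRT renders numbered subtitle cues separated by blank lines.
+func toSRT(segments []segment) string {
+	var b strings.Builder
+	for i, s := range segments {
+		fmt.Fprintf(&b, "%d\n%s --> %s\n%s\n\n",
+			i+1, srtTimestamp(s.Start), srtTimestamp(s.End), strings.TrimSpace(s.Text))
+	}
+	return b.String()
+}
+
+func main() {
+	segments := []segment{
+		{Start: 0, End: 1.5, Text: " Hello and welcome."},
+		{Start: 1.5, End: 4.25, Text: " Today we look at transcription."},
+	}
+	fmt.Print(toSRT(segments))
+}
+```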
+
+<Info>
+Check the [Supported Providers](/providers/supported-providers/overview) page for more information on multimodal capabilities supported by each provider.
+</Info>
+
+## Next Steps
+
+- **[Streaming Responses](./streaming)** - Real-time multimodal processing
+- **[Tool Calling](./tool-calling)** - Combine with external tools
+- **[Provider Configuration](./provider-configuration)** - Multiple providers for different capabilities
+- **[Core Features](../../features/)** - Advanced Bifrost capabilities
--- a/docs/quickstart/go-sdk/provider-configuration.mdx
+++ b/docs/quickstart/go-sdk/provider-configuration.mdx
@@ -0,0 +1,504 @@
+---
+title: "Provider Configuration"
+description: "Configure multiple AI providers for custom concurrency, queue sizes, proxy settings, and more."
+icon: "sliders"
+---
+
+## Multi-Provider Setup
+
+Configure multiple providers to seamlessly switch between them. This example shows how to configure OpenAI, Anthropic, and Mistral providers.
+
+```go
+type MyAccount struct{}
+
+func (a *MyAccount) GetConfiguredProviders() ([]schemas.ModelProvider, error) {
+    return []schemas.ModelProvider{schemas.OpenAI, schemas.Anthropic, schemas.Mistral}, nil
+}
+
+func (a *MyAccount) GetKeysForProvider(ctx *context.Context, provider schemas.ModelProvider) ([]schemas.Key, error) {
+    switch provider {
+    case schemas.OpenAI:
+        return []schemas.Key{{
+            Value:  os.Getenv("OPENAI_API_KEY"),
+            Models: []string{},
+            Weight: 1.0,
+        }}, nil
+    case schemas.Anthropic:
+        return []schemas.Key{{
+            Value:  os.Getenv("ANTHROPIC_API_KEY"),
+            Models: []string{},
+            Weight: 1.0,
+        }}, nil
+    case schemas.Mistral:
+        return []schemas.Key{{
+            Value:  os.Getenv("MISTRAL_API_KEY"),
+            Models: []string{},
+            Weight: 1.0,
+        }}, nil
+    }
+    return nil, fmt.Errorf("provider %s not supported", provider)
+}
+
+func (a *MyAccount) GetConfigForProvider(provider schemas.ModelProvider) (*schemas.ProviderConfig, error) {
+    // Return the same config for all providers
+    return &schemas.ProviderConfig{
+        NetworkConfig:            schemas.DefaultNetworkConfig,
+        ConcurrencyAndBufferSize: schemas.DefaultConcurrencyAndBufferSize,
+    }, nil
+}
+```
+
+> If Bifrost receives a new provider at runtime (i.e., one that is not returned by `GetConfiguredProviders()` initially on `bifrost.Init()`), it will set up the provider at runtime using `GetConfigForProvider()`, which may cause a delay in the first request to that provider.
+
+## Making Requests
+
+Once providers are configured, you can make requests to any specific provider. This example shows how to send a request directly to Mistral's latest vision model. Bifrost handles the provider-specific API formatting automatically.
+
+```go
+response, err := client.ChatCompletionRequest(schemas.NewBifrostContext(context.Background(), schemas.NoDeadline), &schemas.BifrostChatRequest{
+    Provider: schemas.Mistral,
+    Model:    "pixtral-12b-latest",
+    Input:    messages,
+})
+```
+
+## Environment Variables
+
+Set up your API keys for the providers you want to use:
+
+```bash
+export OPENAI_API_KEY="your-openai-api-key"
+export ANTHROPIC_API_KEY="your-anthropic-api-key"
+export CEREBRAS_API_KEY="your-cerebras-api-key"
+export MISTRAL_API_KEY="your-mistral-api-key"
+export GROQ_API_KEY="your-groq-api-key"
+export COHERE_API_KEY="your-cohere-api-key"
+```
+
+## Advanced Configuration
+
+### Weighted Load Balancing
+
+Distribute requests across multiple API keys or providers based on custom weights. This example shows how to split traffic 70/30 between two OpenAI keys, useful for managing rate limits or costs across different accounts.
+
+```go
+func (a *MyAccount) GetKeysForProvider(ctx *context.Context, provider schemas.ModelProvider) ([]schemas.Key, error) {
+    switch provider {
+    case schemas.OpenAI:
+        return []schemas.Key{{
+            Value:  os.Getenv("OPENAI_API_KEY_1"),
+            Models: []string{},
+            Weight: 0.7, // 70% of requests
+        },
+        {
+            Value:  os.Getenv("OPENAI_API_KEY_2"),
+            Models: []string{},
+            Weight: 0.3, // 30% of requests
+        },
+        }, nil
+    }
+    return nil, fmt.Errorf("provider %s not supported", provider)
+}
+```
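+
+To make the effect of `Weight` concrete, here is a minimal, self-contained sketch of weighted random selection. It is not Bifrost's internal implementation; the `weightedKey` type and `pickKey` function are hypothetical names used only for illustration, and the sketch assumes a key is chosen with probability proportional to its weight.
+
+```go
+package main
+
+import (
+	"fmt"
+	"math/rand"
+)
+
+// weightedKey is a hypothetical stand-in for schemas.Key, reduced to the
+// two fields that matter for selection.
+type weightedKey struct {
+	Value  string
+	Weight float64
+}
+
+// pickKey selects a key with probability proportional to its weight.
+// r must be in [0, 1); passing it in keeps the function deterministic.
+func pickKey(keys []weightedKey, r float64) (weightedKey, bool) {
+	total := 0.0
+	for _, k := range keys {
+		total += k.Weight
+	}
+	if len(keys) == 0 || total <= 0 {
+		return weightedKey{}, false
+	}
+	target := r * total
+	for _, k := range keys {
+		target -= k.Weight
+		if target < 0 {
+			return k, true
+		}
+	}
+	return keys[len(keys)-1], true // guards against floating-point drift
+}
+
+func main() {
+	keys := []weightedKey{{"key-1", 0.7}, {"key-2", 0.3}}
+	counts := map[string]int{}
+	for i := 0; i < 10000; i++ {
+		k, _ := pickKey(keys, rand.Float64())
+		counts[k.Value]++
+	}
+	fmt.Println(counts) // the split should be close to 70/30
+}
+```
+
+Because the weights are normalized by their sum, values such as `7` and `3` would produce the same split as `0.7` and `0.3` in this sketch.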
+
+### Model-Specific Keys
+
+Use different API keys for specific models, allowing you to manage access controls and billing separately. This example uses a premium key for advanced reasoning models (o1-preview, o1-mini) and a standard key for regular GPT models.
+
+```go
+func (a *MyAccount) GetKeysForProvider(ctx *context.Context, provider schemas.ModelProvider) ([]schemas.Key, error) {
+    switch provider {
+    case schemas.OpenAI:
+        return []schemas.Key{
+            {
+                Value:  os.Getenv("OPENAI_API_KEY"),
+                Models: []string{"gpt-4o", "gpt-4o-mini"},
+                Weight: 1.0,
+            },
+            {
+                Value:  os.Getenv("OPENAI_API_KEY_PREMIUM"),
+                Models: []string{"o1-preview", "o1-mini"},
+                Weight: 1.0,
+            },
+        }, nil
+    }
+    return nil, fmt.Errorf("provider %s not supported", provider)
+}
+```
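+
+The routing idea behind model-specific keys can be sketched as a simple filter. This is an illustration under stated assumptions, not Bifrost's code: `modelKey` and `keysForModel` are hypothetical names, and the sketch only matches exact model names (it does not model how Bifrost treats an empty `Models` list or a wildcard).
+
+```go
+package main
+
+import "fmt"
+
+// modelKey is a hypothetical stand-in for schemas.Key.
+type modelKey struct {
+	Value  string
+	Models []string
+}
+
+// keysForModel returns the keys whose Models list names the requested model.
+func keysForModel(keys []modelKey, model string) []modelKey {
+	var out []modelKey
+	for _, k := range keys {
+		for _, m := range k.Models {
+			if m == model {
+				out = append(out, k)
+				break
+			}
+		}
+	}
+	return out
+}
+
+func main() {
+	keys := []modelKey{
+		{Value: "standard-key", Models: []string{"gpt-4o", "gpt-4o-mini"}},
+		{Value: "premium-key", Models: []string{"o1-preview", "o1-mini"}},
+	}
+	for _, k := range keysForModel(keys, "o1-mini") {
+		fmt.Println(k.Value)
+	}
+}
+```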
+
+### Custom Base URL
+
+Override the default API endpoint for a provider. This is useful for connecting to self-hosted models, local development servers, or OpenAI-compatible APIs like vLLM, Ollama, or LiteLLM.
+
+```go
+func (a *MyAccount) GetConfigForProvider(provider schemas.ModelProvider) (*schemas.ProviderConfig, error) {
+	switch provider {
+	case schemas.OpenAI:
+		return &schemas.ProviderConfig{
+			NetworkConfig: schemas.NetworkConfig{
+				BaseURL: "http://localhost:8000/v1", // Custom endpoint
+			},
+			ConcurrencyAndBufferSize: schemas.DefaultConcurrencyAndBufferSize,
+		}, nil
+	}
+	return nil, fmt.Errorf("provider %s not supported", provider)
+}
+```
+
+<Note>
+For self-hosted providers like Ollama and SGL, `BaseURL` is required. For standard providers, it's optional and overrides the default endpoint.
+</Note>
+
+### Managing Retries
+
+Configure retry behavior for handling temporary failures and rate limits. This example sets up exponential backoff with up to 5 retries, starting with a 1 ms delay and capping at 10 seconds, which suits transient network issues.
+
+<Info>
+For a full explanation of how retries work, key rotation on rate limits, and how retries connect with fallbacks, see [Retries & Fallbacks](/features/retries-and-fallbacks).
+</Info>
+
+```go
+func (a *MyAccount) GetConfigForProvider(provider schemas.ModelProvider) (*schemas.ProviderConfig, error) {
+	switch provider {
+	case schemas.OpenAI:
+		return &schemas.ProviderConfig{
+			NetworkConfig: schemas.NetworkConfig{
+				MaxRetries:          5,
+				RetryBackoffInitial: 1 * time.Millisecond,
+				RetryBackoffMax:     10 * time.Second,
+			},
+			ConcurrencyAndBufferSize: schemas.DefaultConcurrencyAndBufferSize,
+		}, nil
+	}
+	return nil, fmt.Errorf("provider %s not supported", provider)
+}
+```
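+
+To see how the initial and maximum backoff values interact, the following sketch computes a delay schedule. It assumes the delay doubles on each attempt and applies no jitter; both are assumptions made for illustration, and Bifrost's actual schedule may differ. `backoffDelay` is a hypothetical helper, not part of the SDK.
+
+```go
+package main
+
+import (
+	"fmt"
+	"time"
+)
+
+// backoffDelay returns the wait before retry number attempt (0-based),
+// doubling from initial and never exceeding maxDelay.
+func backoffDelay(attempt int, initial, maxDelay time.Duration) time.Duration {
+	d := initial
+	for i := 0; i < attempt; i++ {
+		d *= 2
+		if d >= maxDelay {
+			return maxDelay
+		}
+	}
+	if d > maxDelay {
+		return maxDelay
+	}
+	return d
+}
+
+func main() {
+	// Same values as RetryBackoffInitial and RetryBackoffMax above.
+	for attempt := 0; attempt < 5; attempt++ {
+		fmt.Println(attempt, backoffDelay(attempt, time.Millisecond, 10*time.Second))
+	}
+}
+```
+
+Under these assumptions, five retries starting at 1 ms stay far below the 10-second cap; the cap only matters with a larger initial delay or more retries.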
+
+### Custom Concurrency and Buffer Size
+
+Fine-tune performance by adjusting worker concurrency and queue sizes per provider (defaults are 1000 workers and 5000 queue size). This example gives OpenAI larger limits (100 workers, 500 queue) than Anthropic (25 workers, 100 queue), which is kept conservative to respect its rate limits.
+
+```go
+func (a *MyAccount) GetConfigForProvider(provider schemas.ModelProvider) (*schemas.ProviderConfig, error) {
+    switch provider {
+    case schemas.OpenAI:
+        return &schemas.ProviderConfig{
+            NetworkConfig: schemas.DefaultNetworkConfig,
+            ConcurrencyAndBufferSize: schemas.ConcurrencyAndBufferSize{
+                MaxConcurrency: 100, // Max number of concurrent requests (number of workers)
+                BufferSize:     500, // Max number of requests in the buffer (queue size)
+            },
+        }, nil
+    case schemas.Anthropic:
+        return &schemas.ProviderConfig{
+            NetworkConfig: schemas.DefaultNetworkConfig,
+            ConcurrencyAndBufferSize: schemas.ConcurrencyAndBufferSize{
+                MaxConcurrency: 25,
+                BufferSize:     100,
+            },
+        }, nil
+    }
+    return nil, fmt.Errorf("provider %s not supported", provider)
+}
+```
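+
+The relationship between workers and queue size can be pictured with a generic Go worker pool. This is a common pattern shown for intuition only, not Bifrost's internal scheduler; `runPool` is a hypothetical function in which `maxConcurrency` workers drain a channel that holds at most `bufferSize` pending jobs.
+
+```go
+package main
+
+import (
+	"fmt"
+	"sync"
+)
+
+// runPool sums the jobs using maxConcurrency workers that read from a queue
+// holding at most bufferSize pending jobs.
+func runPool(jobs []int, maxConcurrency, bufferSize int) int {
+	queue := make(chan int, bufferSize)
+	var wg sync.WaitGroup
+	var mu sync.Mutex
+	sum := 0
+
+	for w := 0; w < maxConcurrency; w++ {
+		wg.Add(1)
+		go func() {
+			defer wg.Done()
+			for j := range queue {
+				mu.Lock()
+				sum += j
+				mu.Unlock()
+			}
+		}()
+	}
+
+	for _, j := range jobs {
+		queue <- j // blocks while the buffer is full
+	}
+	close(queue)
+	wg.Wait()
+	return sum
+}
+
+func main() {
+	fmt.Println(runPool([]int{1, 2, 3, 4}, 2, 1))
+}
+```
+
+In this pattern a small buffer makes producers wait sooner when workers fall behind, while a large buffer absorbs bursts at the cost of memory and queueing delay.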
+
+### Custom Headers
+
+Bifrost supports two ways to add custom headers to provider requests: **static headers** configured at the provider level, and **dynamic headers** passed per-request via context.
+
+#### Static Headers (Provider Level)
+
+Configure headers that are automatically included in every request to a specific provider using `NetworkConfig.ExtraHeaders`:
+
+```go
+func (a *MyAccount) GetConfigForProvider(provider schemas.ModelProvider) (*schemas.ProviderConfig, error) {
+	switch provider {
+	case schemas.OpenAI:
+		return &schemas.ProviderConfig{
+			NetworkConfig: schemas.NetworkConfig{
+				ExtraHeaders: map[string]string{
+					"x-custom-org":   "my-organization",
+					"x-environment":  "production",
+				},
+			},
+			ConcurrencyAndBufferSize: schemas.DefaultConcurrencyAndBufferSize,
+		}, nil
+	}
+	return nil, fmt.Errorf("provider %s not supported", provider)
+}
+```
+
+#### Dynamic Headers (Per Request)
+
+Send custom headers with individual requests by adding them to the request context. Headers are automatically propagated to the provider:
+
+```go
+import (
+    "context"
+    "github.com/maximhq/bifrost/core/schemas"
+)
+
+func makeRequestWithCustomHeaders() {
+    // Create base context
+    ctx := context.Background()
+
+    // Add custom headers using BifrostContextKeyExtraHeaders
+    extraHeaders := map[string][]string{
+        "user-id":         {"user-123"},
+        "session-id":      {"session-abc"},
+        "custom-metadata": {"value1", "value2"}, // Multiple values supported
+    }
+    ctx = context.WithValue(ctx, schemas.BifrostContextKeyExtraHeaders, extraHeaders)
+
+    // Make request with custom headers
+    response, err := client.ChatCompletionRequest(schemas.NewBifrostContext(ctx, schemas.NoDeadline), &schemas.BifrostChatRequest{
+        Provider: schemas.OpenAI,
+        Model:    "gpt-4o-mini",
+        Input:    messages,
+    })
+    if err != nil {
+        // Handle error
+    }
+}
+```
+
+**How it works:**
+- Headers are stored as `map[string][]string` in the context
+- Multiple values per header name are supported
+- Header names are case-insensitive and normalized to lowercase
+- Headers are accessible throughout the request lifecycle
+
+**Example use cases:**
+- User identification: `user-id`, `tenant-id`
+- Request tracking: `correlation-id`, `trace-id`
+- Custom metadata: `department`, `cost-center`
+- A/B testing: `experiment-id`, `variant`
+
+#### Security Denylist
+
+Bifrost maintains a security denylist of headers that are never forwarded to providers, regardless of configuration:
+
+```go
+denylist := map[string]bool{
+    "proxy-authorization": true,
+    "cookie":              true,
+    "host":                true,
+    "content-length":      true,
+    "connection":          true,
+    "transfer-encoding":   true,
+
+    // prevent auth/key overrides
+    "x-api-key":      true,
+    "x-goog-api-key": true,
+    "x-bf-api-key":   true,
+    "x-bf-vk":        true,
+}
+```
+
+This denylist is applied to both static and dynamic headers to prevent security vulnerabilities.
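+
+The normalization and denylist rules described above can be combined in one small filter. The sketch below is an illustration, not Bifrost's implementation: `filterHeaders` is a hypothetical function, and the denylist entries are copied from the list shown in this section.
+
+```go
+package main
+
+import (
+	"fmt"
+	"strings"
+)
+
+// headerDenylist mirrors the denylist documented above.
+var headerDenylist = map[string]bool{
+	"proxy-authorization": true,
+	"cookie":              true,
+	"host":                true,
+	"content-length":      true,
+	"connection":          true,
+	"transfer-encoding":   true,
+	"x-api-key":           true,
+	"x-goog-api-key":      true,
+	"x-bf-api-key":        true,
+	"x-bf-vk":             true,
+}
+
+// filterHeaders lowercases header names and drops any denylisted header.
+func filterHeaders(in map[string][]string) map[string][]string {
+	out := make(map[string][]string, len(in))
+	for name, values := range in {
+		key := strings.ToLower(strings.TrimSpace(name))
+		if key == "" || headerDenylist[key] {
+			continue
+		}
+		out[key] = append(out[key], values...)
+	}
+	return out
+}
+
+func main() {
+	headers := map[string][]string{
+		"User-ID":   {"user-123"},
+		"Cookie":    {"session=abc"},
+		"X-API-Key": {"should-not-be-forwarded"},
+	}
+	fmt.Println(filterHeaders(headers))
+}
+```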
+
+### Setting Up a Proxy
+
+Route requests through proxies for compliance, security, or geographic requirements. This example shows both HTTP proxy for OpenAI and authenticated SOCKS5 proxy for Anthropic, useful for corporate environments or regional access.
+
+```go
+func (a *MyAccount) GetConfigForProvider(provider schemas.ModelProvider) (*schemas.ProviderConfig, error) {
+	switch provider {
+	case schemas.OpenAI:
+		return &schemas.ProviderConfig{
+			NetworkConfig:            schemas.DefaultNetworkConfig,
+			ConcurrencyAndBufferSize: schemas.DefaultConcurrencyAndBufferSize,
+			ProxyConfig: &schemas.ProxyConfig{
+				Type: schemas.HttpProxy,
+				URL:  "http://localhost:8000", // Proxy URL
+			},
+		}, nil
+	case schemas.Anthropic:
+		return &schemas.ProviderConfig{
+			NetworkConfig:            schemas.DefaultNetworkConfig,
+			ConcurrencyAndBufferSize: schemas.DefaultConcurrencyAndBufferSize,
+			ProxyConfig: &schemas.ProxyConfig{
+				Type:     schemas.Socks5Proxy,
+				URL:      "http://localhost:8000", // Proxy URL
+				Username: "user",
+				Password: "password",
+			},
+		}, nil
+	}
+	return nil, fmt.Errorf("provider %s not supported", provider)
+}
+```
+
+### Send Back Raw Response
+
+Include the original provider response alongside Bifrost's standardized response format. Useful for debugging and accessing provider-specific metadata.
+
+**Provider-level default** (applies to all requests for this provider):
+
+```go
+func (a *MyAccount) GetConfigForProvider(provider schemas.ModelProvider) (*schemas.ProviderConfig, error) {
+    return &schemas.ProviderConfig{
+        NetworkConfig: schemas.DefaultNetworkConfig,
+        ConcurrencyAndBufferSize: schemas.DefaultConcurrencyAndBufferSize,
+        SendBackRawResponse: true,
+    }, nil
+}
+```
+
+**Per-request override** (overrides the provider default for a single request):
+
+```go
+ctx := context.Background()
+ctx = context.WithValue(ctx, schemas.BifrostContextKeySendBackRawResponse, true) // or false to suppress
+
+response, err := client.ChatCompletionRequest(schemas.NewBifrostContext(ctx, schemas.NoDeadline), &schemas.BifrostChatRequest{
+    Provider: schemas.OpenAI,
+    Model:    "gpt-4o-mini",
+    Input:    messages,
+})
+
+if response.ChatResponse != nil {
+    rawResp := response.ChatResponse.ExtraFields.RawResponse // original provider JSON
+}
+```
+
+When enabled, the raw provider response appears in `ExtraFields.RawResponse`:
+
+```go
+type BifrostChatResponse struct {
+	ID                string                     `json:"id"`
+	Choices           []BifrostResponseChoice    `json:"choices"`
+	Created           int                        `json:"created"` // The Unix timestamp (in seconds).
+	Model             string                     `json:"model"`
+	Object            string                     `json:"object"` // "chat.completion" or "chat.completion.chunk"
+	ServiceTier       string                     `json:"service_tier"`
+	SystemFingerprint string                     `json:"system_fingerprint"`
+	Usage             *BifrostLLMUsage           `json:"usage"`
+	ExtraFields       BifrostResponseExtraFields `json:"extra_fields"`
+}
+
+type BifrostResponseExtraFields struct {
+	RequestType    RequestType        `json:"request_type"`
+	Provider       ModelProvider      `json:"provider"`
+	ModelRequested string             `json:"model_requested"`
+	Latency        int64              `json:"latency"`     // in milliseconds (for streaming responses this will be each chunk latency, and the last chunk latency will be the total latency)
+	ChunkIndex     int                `json:"chunk_index"` // used for streaming responses to identify the chunk index, will be 0 for non-streaming responses
+	RawResponse    interface{}        `json:"raw_response,omitempty"`
+	CacheDebug     *BifrostCacheDebug `json:"cache_debug,omitempty"`
+}
+```
+
+### Send Back Raw Request
+
+Include the original request sent to the provider alongside Bifrost's response. Useful for debugging request transformations and verifying what was actually sent to the provider.
+
+**Provider-level default** (applies to all requests for this provider):
+
+```go
+func (a *MyAccount) GetConfigForProvider(provider schemas.ModelProvider) (*schemas.ProviderConfig, error) {
+    return &schemas.ProviderConfig{
+        NetworkConfig: schemas.DefaultNetworkConfig,
+        ConcurrencyAndBufferSize: schemas.DefaultConcurrencyAndBufferSize,
+        SendBackRawRequest: true,
+    }, nil
+}
+```
+
+**Per-request override** (overrides the provider default for a single request):
+
+```go
+ctx := context.Background()
+ctx = context.WithValue(ctx, schemas.BifrostContextKeySendBackRawRequest, true) // or false to suppress
+
+response, err := client.ChatCompletionRequest(schemas.NewBifrostContext(ctx, schemas.NoDeadline), &schemas.BifrostChatRequest{
+    Provider: schemas.OpenAI,
+    Model:    "gpt-4o-mini",
+    Input:    messages,
+})
+
+if response.ChatResponse != nil {
+    rawReq := response.ChatResponse.ExtraFields.RawRequest // exact JSON sent to the provider
+}
+```
+
+When enabled, the raw provider request appears in `ExtraFields.RawRequest`:
+
+```go
+type BifrostResponseExtraFields struct {
+	// ... other fields
+	RawRequest     interface{}        `json:"raw_request,omitempty"`
+	RawResponse    interface{}        `json:"raw_response,omitempty"`
+}
+```
+
+### Store Raw Request/Response
+
+Persist the raw provider request and response in the log record without necessarily returning them in the API response. This is orthogonal to the send-back flags — enabling this does not affect what the caller receives, and enabling send-back does not automatically store data in logs. Enable both to do both.
+
+**Provider-level default** (applies to all requests for this provider):
+
+```go
+func (a *MyAccount) GetConfigForProvider(provider schemas.ModelProvider) (*schemas.ProviderConfig, error) {
+    return &schemas.ProviderConfig{
+        NetworkConfig: schemas.DefaultNetworkConfig,
+        ConcurrencyAndBufferSize: schemas.DefaultConcurrencyAndBufferSize,
+        StoreRawRequestResponse: true,
+    }, nil
+}
+```
+
+**Per-request override** (overrides the provider default for a single request):
+
+```go
+ctx := context.Background()
+ctx = context.WithValue(ctx, schemas.BifrostContextKeyStoreRawRequestResponse, true) // or false to disable
+
+response, err := client.ChatCompletionRequest(schemas.NewBifrostContext(ctx, schemas.NoDeadline), &schemas.BifrostChatRequest{
+    Provider: schemas.OpenAI,
+    Model:    "gpt-4o-mini",
+    Input:    messages,
+})
+// Raw data is persisted in the log record.
+// ExtraFields.RawRequest/RawResponse are nil unless send-back flags are also enabled.
+```
+
+<Note>
+`StoreRawRequestResponse` only has an effect when the logging plugin is active, because the logging plugin is what writes raw data to the log record. Without it, enabling this flag captures the data but nothing persists it.
+
+`StoreRawRequestResponse`, `SendBackRawRequest`, and `SendBackRawResponse` are orthogonal controls — enabling any one does not imply the others. Enable any combination depending on whether you need raw data in logs, in the response, or both.
+</Note>
+
+## Best Practices
+
+### Performance Considerations
+
+Keys are fetched from your `GetKeysForProvider` implementation on every request. Ensure your implementation is optimized for speed to avoid adding latency:
+
+```go
+func (a *MyAccount) GetKeysForProvider(ctx *context.Context, provider schemas.ModelProvider) ([]schemas.Key, error) {
+    // ✅ Good: Fast in-memory lookup
+    switch provider {
+    case schemas.OpenAI:
+        return a.cachedOpenAIKeys, nil  // Pre-cached keys
+    }
+    
+    // ❌ Avoid: Database queries, API calls, complex algorithms
+    // This will add latency to every AI request
+    // keys := fetchKeysFromDatabase(provider)  // Too slow!
+    // return processWithComplexLogic(keys)     // Too slow!
+    
+    return nil, fmt.Errorf("provider %s not supported", provider)
+}
+```
+
+**Recommendations:**
+- Cache keys in memory during application startup
+- Use simple switch statements or map lookups
+- Avoid database queries, file I/O, or network calls
+- Keep complex key processing logic outside the request path
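+
+One way to follow these recommendations is to read keys once at startup and serve every later lookup from a map. The sketch below assumes keys come from environment variables; `cachedKey`, `keyCache`, `newKeyCache`, and `keysFor` are hypothetical names, and in a real `Account` implementation the cached values would be `schemas.Key` entries.
+
+```go
+package main
+
+import (
+	"fmt"
+	"os"
+)
+
+// cachedKey is a hypothetical stand-in for schemas.Key.
+type cachedKey struct {
+	Value  string
+	Weight float64
+}
+
+// keyCache holds keys loaded once at startup.
+type keyCache struct {
+	byProvider map[string][]cachedKey
+}
+
+// newKeyCache resolves each provider's key through lookup (for example
+// os.Getenv) a single time, so request-time lookups never touch the source.
+func newKeyCache(envByProvider map[string]string, lookup func(string) string) *keyCache {
+	c := &keyCache{byProvider: make(map[string][]cachedKey)}
+	for provider, envVar := range envByProvider {
+		if v := lookup(envVar); v != "" {
+			c.byProvider[provider] = []cachedKey{{Value: v, Weight: 1.0}}
+		}
+	}
+	return c
+}
+
+// keysFor is a plain map lookup, cheap enough to run on every request.
+func (c *keyCache) keysFor(provider string) ([]cachedKey, error) {
+	keys, ok := c.byProvider[provider]
+	if !ok {
+		return nil, fmt.Errorf("provider %s not supported", provider)
+	}
+	return keys, nil
+}
+
+func main() {
+	cache := newKeyCache(map[string]string{"openai": "OPENAI_API_KEY"}, os.Getenv)
+	keys, err := cache.keysFor("openai")
+	fmt.Println(len(keys), err)
+}
+```
+
+If keys can rotate while the application runs, a background refresh that swaps the map under a lock keeps the request path just as cheap.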
+
+## Next Steps
+
+- **[Streaming Responses](./streaming)** - Real-time response generation
+- **[Tool Calling](./tool-calling)** - Enable AI to use external functions
+- **[Multimodal AI](./multimodal)** - Process images, audio, and text
+- **[Core Features](../../features/)** - Advanced Bifrost capabilities
--- a/docs/quickstart/go-sdk/reranking.mdx
+++ b/docs/quickstart/go-sdk/reranking.mdx
@@ -0,0 +1,88 @@
+---
+title: "Reranking"
+description: "Rerank documents with Bifrost Go SDK using client.RerankRequest."
+icon: "book-open-cover"
+---
+
+Use the Go SDK to rank candidate documents by relevance to a query.
+
+Provider/model examples:
+- Cohere: `Provider: schemas.Cohere`, `Model: "rerank-v3.5"`
+- vLLM: `Provider: schemas.VLLM`, `Model: "BAAI/bge-reranker-v2-m3"`
+
+## Basic Example
+
+```go
+package main
+
+import (
+	"context"
+	"fmt"
+
+	bifrost "github.com/maximhq/bifrost/core"
+	"github.com/maximhq/bifrost/core/schemas"
+)
+
+func main() {
+	client, err := bifrost.Init(context.Background(), schemas.BifrostConfig{
+		Account: &MyAccount{},
+	})
+	if err != nil {
+		panic(err)
+	}
+	defer client.Shutdown()
+
+	request := &schemas.BifrostRerankRequest{
+		Provider: schemas.Cohere,
+		Model:    "rerank-v3.5",
+		Query:    "What is Bifrost?",
+		Documents: []schemas.RerankDocument{
+			{Text: "Bifrost is an AI gateway that unifies many LLM providers."},
+			{Text: "Paris is the capital of France."},
+			{Text: "Bifrost exposes an OpenAI-compatible API."},
+		},
+		Params: &schemas.RerankParameters{
+			TopN:            bifrost.Ptr(2),
+			ReturnDocuments: bifrost.Ptr(true),
+		},
+	}
+
+	resp, bfErr := client.RerankRequest(schemas.NewBifrostContext(context.Background(), schemas.NoDeadline), request)
+	if bfErr != nil {
+		panic(bfErr.Error.Message)
+	}
+
+	for _, result := range resp.Results {
+		fmt.Printf("index=%d score=%.4f\n", result.Index, result.RelevanceScore)
+	}
+}
+```
+
+## Parameters
+
+- `Provider`, `Model`: provider/model to use for rerank
+- `Query`: query text
+- `Documents`: documents to score (`text`, optional `id`, `meta`)
+- `Params.TopN`: max result count
+- `Params.MaxTokensPerDoc`: provider-dependent token cap
+- `Params.Priority`: provider-dependent priority hint
+- `Params.ReturnDocuments`: include source document in each result
+- `Fallbacks`: fallback provider/model choices
+
+For vLLM, set `Provider` to `schemas.VLLM` and use the upstream model ID as `Model` (without the `vllm/` prefix that is used in Gateway HTTP requests).
+
+## Response
+
+`BifrostRerankResponse` includes:
+
+- `Results []RerankResult` (`index`, `relevance_score`, optional `document`)
+- `Model`
+- optional `Usage`
+- `ExtraFields` metadata (`provider`, `latency`, `request_type`, etc.)
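+
+When `ReturnDocuments` is not set, results can be mapped back to the input by position. The sketch below assumes that `index` refers to the document's position in the request's `Documents` slice; `rerankResult` and `rankedDocuments` are hypothetical names, and the scores are illustrative values rather than real output.
+
+```go
+package main
+
+import "fmt"
+
+// rerankResult is a hypothetical stand-in for schemas.RerankResult.
+type rerankResult struct {
+	Index          int
+	RelevanceScore float64
+}
+
+// rankedDocuments returns the document text for each result, in result
+// order, skipping any index that falls outside the input slice.
+func rankedDocuments(docs []string, results []rerankResult) []string {
+	out := make([]string, 0, len(results))
+	for _, r := range results {
+		if r.Index < 0 || r.Index >= len(docs) {
+			continue
+		}
+		out = append(out, docs[r.Index])
+	}
+	return out
+}
+
+func main() {
+	docs := []string{
+		"Bifrost is an AI gateway that unifies many LLM providers.",
+		"Paris is the capital of France.",
+		"Bifrost exposes an OpenAI-compatible API.",
+	}
+	// Illustrative scores only.
+	results := []rerankResult{
+		{Index: 0, RelevanceScore: 0.91},
+		{Index: 2, RelevanceScore: 0.87},
+	}
+	for i, d := range rankedDocuments(docs, results) {
+		fmt.Printf("%d. %s\n", i+1, d)
+	}
+}
+```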
+
+## Next Steps
+
+- **[Streaming Responses](./streaming)** - Real-time response processing
+- **[Tool Calling](./tool-calling)** - Enable AI to use external functions
+- **[Multimodal AI](./multimodal)** - Process images and multimedia content
+- **[Core Features](../../features/)** - Advanced Bifrost capabilities
--- a/docs/quickstart/go-sdk/setting-up.mdx
+++ b/docs/quickstart/go-sdk/setting-up.mdx
@@ -0,0 +1,144 @@
+---
+title: "Setting Up"
+description: "Get Bifrost running in your Go application in 30 seconds with minimal setup and direct code integration."
+icon: "play"
+---
+
+<video width="100%" controls>
+  <source src="https://github.com/maximhq/bifrost/raw/refs/heads/main/docs/media/package-demo.mp4" type="video/mp4" />
+  Your browser does not support the video tag.
+</video>
+
+
+## 30-Second Setup
+
+Get Bifrost running in your Go application with minimal setup. This guide shows you how to integrate multiple AI providers through a single, unified interface.
+
+### 1. Install Package
+
+```bash
+go mod init my-bifrost-app
+go get github.com/maximhq/bifrost/core
+```
+
+### 2. Set Environment Variable
+
+```bash
+export OPENAI_API_KEY="your-openai-api-key"
+```
+
+### 3. Create `main.go`
+
+```go
+package main
+
+import (
+    "context"
+    "fmt"
+    "os"
+
+    "github.com/maximhq/bifrost/core"
+    "github.com/maximhq/bifrost/core/schemas"
+)
+
+type MyAccount struct{}
+
+// MyAccount must implement these 3 methods to satisfy the Account interface
+func (a *MyAccount) GetConfiguredProviders() ([]schemas.ModelProvider, error) {
+    return []schemas.ModelProvider{schemas.OpenAI}, nil
+}
+
+func (a *MyAccount) GetKeysForProvider(ctx *context.Context, provider schemas.ModelProvider) ([]schemas.Key, error) {
+    if provider == schemas.OpenAI {
+        return []schemas.Key{{
+            Value:  os.Getenv("OPENAI_API_KEY"),
+            Models: schemas.WhiteList{"*"}, // Keep Models ["*"] to use any model
+            Weight: 1.0,
+        }}, nil
+    }
+    return nil, fmt.Errorf("provider %s not supported", provider)
+}
+
+func (a *MyAccount) GetConfigForProvider(provider schemas.ModelProvider) (*schemas.ProviderConfig, error) {
+    if provider == schemas.OpenAI {
+        // Return default config (can be customized for advanced use cases)
+        return &schemas.ProviderConfig{
+            NetworkConfig:            schemas.DefaultNetworkConfig,
+            ConcurrencyAndBufferSize: schemas.DefaultConcurrencyAndBufferSize,
+        }, nil
+    }
+    return nil, fmt.Errorf("provider %s not supported", provider)
+}
+
+// main initializes Bifrost and makes a request
+func main() {
+	client, initErr := bifrost.Init(context.Background(), schemas.BifrostConfig{
+		Account: &MyAccount{},
+	})
+	if initErr != nil {
+		panic(initErr)
+	}
+	defer client.Shutdown()
+
+	messages := []schemas.ChatMessage{
+		{
+			Role: schemas.ChatMessageRoleUser,
+			Content: &schemas.ChatMessageContent{
+				ContentStr: schemas.Ptr("Hello, Bifrost!"),
+			},
+		},
+	}
+
+	response, err := client.ChatCompletionRequest(schemas.NewBifrostContext(context.Background(), schemas.NoDeadline), &schemas.BifrostChatRequest{
+		Provider: schemas.OpenAI,
+		Model:    "gpt-4o-mini",
+		Input:    messages,
+	})
+
+	if err != nil {
+		panic(err)
+	}
+
+	fmt.Println("Response:", *response.Choices[0].Message.Content.ContentStr)
+}
+```
+
+### 4. Run Your App
+
+```bash
+go run main.go
+# Output: Response: Hello! I'm Bifrost, your AI model gateway...
+```
+
+**🎉 That's it!** You're now running Bifrost in your Go application.
+
+### What Just Happened?
+
+1. **Account Interface**: `MyAccount` provides API keys and a list of providers to Bifrost for initialization and key lookups.
+2. **Provider Resolution**: `schemas.OpenAI` tells Bifrost to use OpenAI as the provider.
+3. **Model Selection**: `"gpt-4o-mini"` specifies which model to use.
+4. **Unified API**: The same interface works for any provider/model combination (OpenAI, Anthropic, Vertex, etc.).
+
+---
+
+## Next Steps
+
+Now that you have Bifrost running, explore these focused guides:
+
+### Essential Topics
+
+- **[Provider Configuration](./provider-configuration)** - Multiple providers & automatic failovers
+- **[Streaming Responses](./streaming)** - Real-time chat, audio, and transcription
+- **[Tool Calling](./tool-calling)** - Functions & MCP server integration  
+- **[Multimodal AI](./multimodal)** - Images, speech synthesis, and vision
+
+### Advanced Topics
+
+- **[Core Features](../../features/)** - Caching, observability, and governance
+- **[Integrations](../../integrations/)** - Drop-in replacements for existing SDKs
+- **[Architecture](../../architecture/)** - How Bifrost works internally
+- **[Deployment](../../deployment-guides)** - Production setup and scaling
+
+---
+
+**Happy coding with Bifrost!** 🚀
--- a/docs/quickstart/go-sdk/streaming.mdx
+++ b/docs/quickstart/go-sdk/streaming.mdx
@@ -0,0 +1,300 @@
+---
+title: "Streaming Responses"
+description: "Receive AI responses in real-time as they're generated. Perfect for chat applications, audio processing, and real-time transcription where you want immediate results."
+icon: "water"
+---
+
+## Streaming Text Completion
+
+Stream plain text completions as they are generated, ideal for autocomplete, summaries, and single-output generation.
+
+```go
+stream, err := client.TextCompletionStreamRequest(schemas.NewBifrostContext(context.Background(), schemas.NoDeadline), &schemas.BifrostTextCompletionRequest{
+	Provider: schemas.OpenAI,
+	Model:    "gpt-4o-mini",
+	Input: &schemas.TextCompletionInput{
+		PromptStr: bifrost.Ptr("A for apple and B for"),
+	},
+})
+
+if err != nil {
+	log.Printf("Streaming request failed: %v", err)
+	return
+}
+
+for chunk := range stream {
+	// Handle errors in stream
+	if chunk.BifrostError != nil {
+		log.Printf("Stream error: %v", chunk.BifrostError)
+		break
+	}
+
+	// Process response chunks
+	if chunk.BifrostTextCompletionResponse != nil && len(chunk.BifrostTextCompletionResponse.Choices) > 0 {
+		choice := chunk.BifrostTextCompletionResponse.Choices[0]
+		
+		// Check for streaming content
+		if choice.TextCompletionResponseChoice != nil &&
+			choice.TextCompletionResponseChoice.Text != nil {
+			content := *choice.TextCompletionResponseChoice.Text
+			fmt.Print(content) // Print content as it arrives
+		}
+	}
+}
+```
+
+## Streaming Chat Responses
+
+Receive incremental chat deltas in real-time. Append delta content to progressively render assistant messages.
+
+```go
+stream, err := client.ChatCompletionStreamRequest(schemas.NewBifrostContext(context.Background(), schemas.NoDeadline), &schemas.BifrostChatRequest{
+	Provider: schemas.OpenAI,
+	Model:    "gpt-4o-mini",
+	Input:    messages,
+})
+
+if err != nil {
+	log.Printf("Streaming request failed: %v", err)
+	return
+}
+
+for chunk := range stream {
+	// Handle errors in stream
+	if chunk.BifrostError != nil {
+		log.Printf("Stream error: %v", chunk.BifrostError)
+		break
+	}
+
+	// Process response chunks
+	if chunk.BifrostChatResponse != nil && len(chunk.BifrostChatResponse.Choices) > 0 {
+		choice := chunk.BifrostChatResponse.Choices[0]
+
+		// Check for streaming content
+		if choice.ChatStreamResponseChoice != nil &&
+			choice.ChatStreamResponseChoice.Delta != nil &&
+			choice.ChatStreamResponseChoice.Delta.Content != nil {
+
+			content := *choice.ChatStreamResponseChoice.Delta.Content
+			fmt.Print(content) // Print content as it arrives
+		}
+	}
+}
+```
+
+> **Note:** Streaming requests also follow the default timeout setting defined in provider configuration, which defaults to **30 seconds**.
+
+<Note>
+Bifrost standardizes all stream responses to send usage and finish reason only in the last chunk, and content in the previous chunks.
+</Note>
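+
+Because content arrives in earlier chunks and the finish reason only in the last one, a consumer can accumulate deltas and read the finish reason when the stream closes. The sketch below uses a plain channel and a hypothetical `streamChunk` type in place of Bifrost's stream types, so it shows the accumulation pattern only.
+
+```go
+package main
+
+import (
+	"fmt"
+	"strings"
+)
+
+// streamChunk is a hypothetical, simplified stand-in for a stream chunk.
+type streamChunk struct {
+	Delta        *string
+	FinishReason *string
+	Err          error
+}
+
+// collect appends every delta in arrival order and records the finish
+// reason. It stops at the first error and returns the text received so far.
+func collect(stream <-chan streamChunk) (string, string, error) {
+	var sb strings.Builder
+	finish := ""
+	for c := range stream {
+		if c.Err != nil {
+			return sb.String(), finish, c.Err
+		}
+		if c.Delta != nil {
+			sb.WriteString(*c.Delta)
+		}
+		if c.FinishReason != nil {
+			finish = *c.FinishReason
+		}
+	}
+	return sb.String(), finish, nil
+}
+
+func main() {
+	hello, world, stop := "Hello, ", "Bifrost!", "stop"
+	stream := make(chan streamChunk, 3)
+	stream <- streamChunk{Delta: &hello}
+	stream <- streamChunk{Delta: &world}
+	stream <- streamChunk{FinishReason: &stop}
+	close(stream)
+
+	text, finish, err := collect(stream)
+	fmt.Println(text, finish, err)
+}
+```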
+
+## Responses API Streaming
+
+Use the OpenAI-style Responses API with streaming for unified flows. Events arrive via SSE; accumulate text deltas until completion.
+
+```go
+messages := []schemas.ResponsesMessage{
+	{
+		Role: bifrost.Ptr(schemas.ResponsesInputMessageRoleUser),
+		Content: &schemas.ResponsesMessageContent{
+			ContentStr: bifrost.Ptr("Hello, Bifrost!"),
+		},
+	},
+}
+
+stream, err := client.ResponsesStreamRequest(schemas.NewBifrostContext(context.Background(), schemas.NoDeadline), &schemas.BifrostResponsesRequest{
+	Provider: schemas.OpenAI,
+	Model:    "gpt-4o-mini",
+	Input:    messages,
+})
+
+if err != nil {
+	log.Printf("Streaming request failed: %v", err)
+	return
+}
+
+for chunk := range stream {
+	// Handle errors in stream
+	if chunk.BifrostError != nil {
+		log.Printf("Stream error: %v", chunk.BifrostError)
+		break
+	}
+
+	// Process response chunks
+	if chunk.BifrostResponsesStreamResponse != nil {
+		delta := chunk.BifrostResponsesStreamResponse.Delta
+
+		// Check for streaming content
+		if delta != nil {
+			fmt.Print(*delta) // Print content as it arrives
+		}
+	}
+}
+```
+
+## Text-to-Speech Streaming: Real-time Audio Generation
+
+Stream audio generation in real-time as text is converted to speech. Ideal for long texts or when you need immediate audio playback.
+
+```go
+stream, err := client.SpeechStreamRequest(schemas.NewBifrostContext(context.Background(), schemas.NoDeadline), &schemas.BifrostSpeechRequest{
+	Provider: schemas.OpenAI,
+	Model:    "tts-1", // Using text-to-speech model
+	Input: &schemas.SpeechInput{
+		Input: "Hello! This is a sample text that will be converted to speech using Bifrost's speech synthesis capabilities. The weather today is wonderful, and I hope you're having a great day!",
+	},
+	Params: &schemas.SpeechParameters{
+		VoiceConfig: &schemas.SpeechVoiceInput{
+			Voice: schemas.Ptr("alloy"),
+		},
+		ResponseFormat: schemas.Ptr("mp3"),
+	},
+})
+
+if err != nil {
+	panic(err)
+}
+
+// Handle speech synthesis stream
+var audioData []byte
+var totalChunks int
+filename := "output.mp3"
+
+for chunk := range stream {
+	if chunk.BifrostError != nil {
+		panic(fmt.Sprintf("Stream error: %s", chunk.BifrostError.Error.Message))
+	}
+
+	if chunk.BifrostSpeechStreamResponse != nil {
+		// Accumulate audio data from each chunk
+		audioData = append(audioData, chunk.BifrostSpeechStreamResponse.Audio...)
+		totalChunks++
+		fmt.Printf("Received chunk %d, size: %d bytes\n", totalChunks, len(chunk.BifrostSpeechStreamResponse.Audio))
+	}
+}
+
+if len(audioData) > 0 {
+	// Save the accumulated audio to a file
+	err := os.WriteFile(filename, audioData, 0644)
+	if err != nil {
+		panic(fmt.Sprintf("Failed to save audio file: %v", err))
+	}
+
+	fmt.Printf("Speech synthesis streaming complete! Audio saved to %s\n", filename)
+	fmt.Printf("Total chunks received: %d, final file size: %d bytes\n", totalChunks, len(audioData))
+}
+```
+
+## Speech-to-Text Streaming: Real-time Audio Transcription
+
+Stream audio transcription results as they're processed. Get immediate text output for real-time applications or long audio files.
+
+```go
+// Read the audio file for transcription
+audioFilename := "output.mp3"
+audioData, err := os.ReadFile(audioFilename)
+if err != nil {
+	panic(fmt.Sprintf("Failed to read audio file %s: %v. Please make sure the file exists.", audioFilename, err))
+}
+
+fmt.Printf("Loaded audio file %s (%d bytes) for transcription...\n", audioFilename, len(audioData))
+
+stream, err := client.TranscriptionStreamRequest(schemas.NewBifrostContext(context.Background(), schemas.NoDeadline), &schemas.BifrostTranscriptionRequest{
+	Provider: schemas.OpenAI,
+	Model:    "whisper-1", // Using Whisper model for transcription
+	Input: &schemas.TranscriptionInput{
+		File: audioData,
+	},
+	Params: &schemas.TranscriptionParameters{
+		Prompt: schemas.Ptr("This is a sample audio transcription from Bifrost speech synthesis."), // Optional: provide context
+	},
+})
+
+if err != nil {
+	panic(err)
+}
+
+for chunk := range stream {
+	if chunk.BifrostError != nil {
+		panic(fmt.Sprintf("Stream error: %s", chunk.BifrostError.Error.Message))
+	}
+
+	if chunk.BifrostTranscriptionStreamResponse != nil && chunk.BifrostTranscriptionStreamResponse.Delta != nil {
+		// Print each chunk of text as it arrives
+		fmt.Print(*chunk.BifrostTranscriptionStreamResponse.Delta)
+	}
+}
+```
+
+## Streaming Best Practices
+
+### Buffering for Audio
+
+For audio streaming, consider buffering chunks before saving:
+
+```go
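+const bufferSize = 1024 * 1024 // 1MB buffer
+
+var audioBuffer bytes.Buffer
+lastSave := time.Now() // Start the timer now so the first chunk is not flushed immediately
+
+// flush appends the buffered audio to the file and resets the buffer
+flush := func() error {
+	if audioBuffer.Len() == 0 {
+		return nil
+	}
+	file, err := os.OpenFile("streaming_audio.mp3", os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0644)
+	if err != nil {
+		return err
+	}
+	defer file.Close()
+	if _, err := file.Write(audioBuffer.Bytes()); err != nil {
+		return err
+	}
+	audioBuffer.Reset()
+	lastSave = time.Now()
+	return nil
+}
+
+for chunk := range stream {
+	if chunk.BifrostSpeechStreamResponse != nil {
+		audioBuffer.Write(chunk.BifrostSpeechStreamResponse.Audio)
+
+		// Save every second or when buffer is full
+		if time.Since(lastSave) > time.Second || audioBuffer.Len() > bufferSize {
+			if err := flush(); err != nil {
+				panic(fmt.Sprintf("Failed to save audio: %v", err))
+			}
+		}
+	}
+}
+
+// Write any audio still buffered when the stream ends
+if err := flush(); err != nil {
+	panic(fmt.Sprintf("Failed to save audio: %v", err))
+}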
+```
+
+### Context and Cancellation
+
+Use context to control streaming duration:
+
+```go
+ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
+defer cancel()
+
+stream, err := client.ChatCompletionStreamRequest(schemas.NewBifrostContext(ctx, schemas.NoDeadline), &schemas.BifrostChatRequest{
+	// ... your request
+})
+
+if err != nil {
+	panic(err)
+}
+
+// The stream stops automatically once the 30-second timeout expires
+```
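+
+The consumer can also watch the context itself instead of relying only on the stream closing. The following is a minimal standard-library sketch of that pattern: `chunk`, `produce`, and `consume` are hypothetical stand-ins for a streamed response and its reader, not part of the Bifrost SDK.
+
+```go
+package main
+
+import (
+	"context"
+	"fmt"
+	"time"
+)
+
+// chunk is a hypothetical stand-in for one streamed response item.
+type chunk struct{ text string }
+
+// produce emits n chunks, one per interval, and stops early if ctx is done.
+func produce(ctx context.Context, n int, interval time.Duration) <-chan chunk {
+	out := make(chan chunk)
+	go func() {
+		defer close(out)
+		for i := 0; i < n; i++ {
+			select {
+			case <-ctx.Done():
+				return
+			case <-time.After(interval):
+			}
+			select {
+			case <-ctx.Done():
+				return
+			case out <- chunk{text: fmt.Sprintf("chunk-%d", i)}:
+			}
+		}
+	}()
+	return out
+}
+
+// consume reads chunks until the stream closes or ctx is cancelled.
+// It returns the number of chunks read and ctx.Err(), which is nil
+// when the context was still live at the point the reader stopped.
+func consume(ctx context.Context, stream <-chan chunk) (int, error) {
+	count := 0
+	for {
+		select {
+		case <-ctx.Done():
+			return count, ctx.Err()
+		case _, ok := <-stream:
+			if !ok {
+				return count, ctx.Err()
+			}
+			count++
+		}
+	}
+}
+
+func main() {
+	ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
+	defer cancel()
+
+	n, err := consume(ctx, produce(ctx, 1000, 20*time.Millisecond))
+	fmt.Printf("read %d chunks, err: %v\n", n, err)
+}
+```
+
+Returning `ctx.Err()` lets the caller tell a stream that finished normally from one that was cut short by the timeout.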
+
+## Voice Options
+
+OpenAI TTS supports these voices:
+
+- `alloy` - Balanced, natural voice
+- `echo` - Deep, resonant voice
+- `fable` - Expressive, storytelling voice
+- `onyx` - Strong, confident voice
+- `nova` - Bright, energetic voice
+- `shimmer` - Gentle, soothing voice
+
+```go
+// Different voice example
+VoiceConfig: &schemas.SpeechVoiceInput{
+	Voice: schemas.Ptr("nova"),
+},
+```
+
+> **Note:** Streaming support varies by provider and model. Check each model's documentation to confirm it supports the streaming feature you need.
+
+## Next Steps
+
+- **[Tool Calling](./tool-calling)** - Enable AI to use external functions
+- **[Multimodal AI](./multimodal)** - Process images and multimedia content
+- **[Provider Configuration](./provider-configuration)** - Multiple providers for redundancy
+- **[Core Features](../../features/)** - Advanced Bifrost capabilities
--- a/docs/quickstart/go-sdk/tool-calling.mdx
+++ b/docs/quickstart/go-sdk/tool-calling.mdx
@@ -0,0 +1,268 @@
+---
+title: "Tool Calling"
+description: "Enable AI models to use external functions and services by defining tool schemas or connecting to Model Context Protocol (MCP) servers. This allows AI to interact with databases, APIs, file systems, and more."
+icon: "wrench"
+---
+
+## Function Calling with Custom Tools
+
+Enable AI models to use external functions by defining tool schemas. Models can then call these functions automatically based on user requests.
+
+```go
+// Define a tool for the calculator
+calculatorTool := schemas.ChatTool{
+	Type: schemas.ChatToolTypeFunction,
+	Function: &schemas.ChatToolFunction{
+		Name:        "calculator",
+		Description: schemas.Ptr("A calculator tool"),
+		Parameters: &schemas.ToolFunctionParameters{
+			Type: "object",
+			Properties: map[string]interface{}{
+				"operation": map[string]interface{}{
+					"type":        "string",
+					"description": "The operation to perform",
+					"enum":        []string{"add", "subtract", "multiply", "divide"},
+				},
+				"a": map[string]interface{}{
+					"type":        "number",
+					"description": "The first number",
+				},
+				"b": map[string]interface{}{
+					"type":        "number",
+					"description": "The second number",
+				},
+			},
+			Required: []string{"operation", "a", "b"},
+		},
+	},
+}
+
+response, err := client.ChatCompletionRequest(schemas.NewBifrostContext(context.Background(), schemas.NoDeadline), &schemas.BifrostChatRequest{
+	Provider: schemas.OpenAI,
+	Model:    "gpt-4o-mini",
+	Input: []schemas.ChatMessage{
+		{
+			Role: schemas.ChatMessageRoleUser,
+			Content: &schemas.ChatMessageContent{
+				ContentStr: schemas.Ptr("What is 2+2? Use the calculator tool."),
+			},
+		},
+	},
+	Params: &schemas.ChatParameters{
+		Tools: []schemas.ChatTool{calculatorTool},
+	},
+})
+
+if err != nil {
+	panic(err)
+}
+
+if len(response.Choices) > 0 && response.Choices[0].Message.ChatAssistantMessage != nil && response.Choices[0].Message.ChatAssistantMessage.ToolCalls != nil {
+	for _, toolCall := range response.Choices[0].Message.ChatAssistantMessage.ToolCalls {
+		fmt.Printf("Tool call in response - %s: %s\n", *toolCall.ID, *toolCall.Function.Name)
+		fmt.Printf("Tool call arguments - %s\n", toolCall.Function.Arguments)
+	}
+}
+```
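+
+The example above only prints the tool call; running the tool is left to your application. The following is a minimal sketch of that next step, assuming the `Arguments` value is a JSON-encoded string that matches the calculator schema. `calculatorArgs` and `runCalculator` are hypothetical helper names, not part of the Bifrost SDK.
+
+```go
+package main
+
+import (
+	"encoding/json"
+	"errors"
+	"fmt"
+)
+
+// calculatorArgs mirrors the "calculator" tool schema (hypothetical helper type).
+type calculatorArgs struct {
+	Operation string  `json:"operation"`
+	A         float64 `json:"a"`
+	B         float64 `json:"b"`
+}
+
+// runCalculator decodes the JSON arguments of a tool call and evaluates them.
+func runCalculator(argsJSON string) (float64, error) {
+	var args calculatorArgs
+	if err := json.Unmarshal([]byte(argsJSON), &args); err != nil {
+		return 0, fmt.Errorf("invalid calculator arguments: %w", err)
+	}
+
+	switch args.Operation {
+	case "add":
+		return args.A + args.B, nil
+	case "subtract":
+		return args.A - args.B, nil
+	case "multiply":
+		return args.A * args.B, nil
+	case "divide":
+		if args.B == 0 {
+			return 0, errors.New("division by zero")
+		}
+		return args.A / args.B, nil
+	default:
+		return 0, fmt.Errorf("unsupported operation %q", args.Operation)
+	}
+}
+
+func main() {
+	// Example arguments a model might produce for "What is 2+2?"
+	result, err := runCalculator(`{"operation":"add","a":2,"b":2}`)
+	if err != nil {
+		panic(err)
+	}
+	fmt.Println(result) // prints 4
+}
+```
+
+Validating the operation and guarding against division by zero matters here because the arguments come from the model and are not guaranteed to follow the schema.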
+
+## Connecting to MCP Servers
+
+Connect to Model Context Protocol (MCP) servers to give AI models access to external tools and services without manually defining each function.
+
+```go
+client, initErr := bifrost.Init(context.Background(), schemas.BifrostConfig{
+	Account: &MyAccount{},
+	MCPConfig: &schemas.MCPConfig{
+		ClientConfigs: []schemas.MCPClientConfig{
+			// Sample youtube-mcp server
+			{
+				Name:             "youtube-mcp",
+				ConnectionType:   schemas.MCPConnectionTypeHTTP,
+				ConnectionString: schemas.Ptr("http://your-youtube-mcp-url"),
+				ToolsToExecute:   []string{"*"}, // Allow all tools from this client
+			},
+		},
+	},
+})
+if initErr != nil {
+	panic(initErr)
+}
+defer client.Shutdown()
+
+response, err := client.ChatCompletionRequest(schemas.NewBifrostContext(context.Background(), schemas.NoDeadline), &schemas.BifrostChatRequest{
+	Provider: schemas.OpenAI,
+	Model:    "gpt-4o-mini",
+	Input: []schemas.ChatMessage{
+		{
+			Role: schemas.ChatMessageRoleUser,
+			Content: &schemas.ChatMessageContent{
+				ContentStr: schemas.Ptr("What do you see when you search for 'bifrost' on youtube?"),
+			},
+		},
+	},
+})
+
+if err != nil {
+	panic(err)
+}
+
+if len(response.Choices) > 0 && response.Choices[0].Message.ChatAssistantMessage != nil && response.Choices[0].Message.ChatAssistantMessage.ToolCalls != nil {
+	for _, toolCall := range response.Choices[0].Message.ChatAssistantMessage.ToolCalls {
+		fmt.Printf("Tool call in response - %s: %s\n", *toolCall.ID, *toolCall.Function.Name)
+		fmt.Printf("Tool call arguments - %s\n", toolCall.Function.Arguments)
+	}
+}
+```
+
+Read more about MCP connections and registering in-house tools through a local MCP server in the [MCP Features](../../mcp/overview) section.
+
+## Advanced Tool Examples
+
+### Weather API Tool
+
+```go
+weatherTool := schemas.ChatTool{
+	Type: schemas.ChatToolTypeFunction,
+	Function: &schemas.ChatToolFunction{
+		Name:        "get_weather",
+		Description: schemas.Ptr("Get the current weather for a location"),
+		Parameters: &schemas.ToolFunctionParameters{
+			Type: "object",
+			Properties: map[string]interface{}{
+				"location": map[string]interface{}{
+					"type":        "string",
+					"description": "The city and state, e.g. San Francisco, CA",
+				},
+				"unit": map[string]interface{}{
+					"type":        "string",
+					"description": "Temperature unit",
+					"enum":        []string{"celsius", "fahrenheit"},
+				},
+			},
+			Required: []string{"location"},
+		},
+	},
+}
+```
+
+### Database Query Tool
+
+```go
+databaseTool := schemas.ChatTool{
+	Type: schemas.ChatToolTypeFunction,
+	Function: &schemas.ChatToolFunction{
+		Name:        "query_database",
+		Description: schemas.Ptr("Execute a SQL query on the customer database"),
+		Parameters: &schemas.ToolFunctionParameters{
+			Type: "object",
+			Properties: map[string]interface{}{
+				"query": map[string]interface{}{
+					"type":        "string",
+					"description": "The SQL query to execute",
+				},
+				"table": map[string]interface{}{
+					"type":        "string",
+					"description": "The table to query",
+					"enum":        []string{"customers", "orders", "products"},
+				},
+			},
+			Required: []string{"query", "table"},
+		},
+	},
+}
+```
+
+### File System Tool
+
+```go
+fileSystemTool := schemas.ChatTool{
+	Type: schemas.ChatToolTypeFunction,
+	Function: &schemas.ChatToolFunction{
+		Name:        "read_file",
+		Description: schemas.Ptr("Read the contents of a file"),
+		Parameters: &schemas.ToolFunctionParameters{
+			Type: "object",
+			Properties: map[string]interface{}{
+				"path": map[string]interface{}{
+					"type":        "string",
+					"description": "The file path to read",
+				},
+				"encoding": map[string]interface{}{
+					"type":        "string",
+					"description": "File encoding",
+					"enum":        []string{"utf-8", "ascii", "base64"},
+					"default":     "utf-8",
+				},
+			},
+			Required: []string{"path"},
+		},
+	},
+}
+```
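+
+A schema's `enum`, `default`, and required fields describe what the model should send, but a handler may still want to check them itself. The following is a minimal sketch of doing so for the `read_file` tool, assuming the arguments arrive as a JSON string. `readFileArgs` and `parseReadFileArgs` are hypothetical names, not part of the Bifrost SDK.
+
+```go
+package main
+
+import (
+	"encoding/json"
+	"errors"
+	"fmt"
+)
+
+// readFileArgs mirrors the "read_file" tool schema (hypothetical helper type).
+type readFileArgs struct {
+	Path     string `json:"path"`
+	Encoding string `json:"encoding"`
+}
+
+// allowedEncodings matches the enum declared in the schema.
+var allowedEncodings = map[string]bool{"utf-8": true, "ascii": true, "base64": true}
+
+// parseReadFileArgs decodes the arguments, applies the schema default,
+// and rejects values outside the declared enum.
+func parseReadFileArgs(argsJSON string) (readFileArgs, error) {
+	var args readFileArgs
+	if err := json.Unmarshal([]byte(argsJSON), &args); err != nil {
+		return readFileArgs{}, fmt.Errorf("invalid read_file arguments: %w", err)
+	}
+	if args.Path == "" {
+		return readFileArgs{}, errors.New(`missing required field "path"`)
+	}
+	if args.Encoding == "" {
+		args.Encoding = "utf-8" // default declared in the schema
+	}
+	if !allowedEncodings[args.Encoding] {
+		return readFileArgs{}, fmt.Errorf("unsupported encoding %q", args.Encoding)
+	}
+	return args, nil
+}
+
+func main() {
+	args, err := parseReadFileArgs(`{"path":"notes.txt"}`)
+	if err != nil {
+		panic(err)
+	}
+	fmt.Printf("%s (%s)\n", args.Path, args.Encoding) // prints "notes.txt (utf-8)"
+}
+```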
+
+## Multiple Tool Support
+
+Use multiple tools in a single request:
+
+```go
+response, err := client.ChatCompletionRequest(schemas.NewBifrostContext(context.Background(), schemas.NoDeadline), &schemas.BifrostChatRequest{
+	Provider: schemas.OpenAI,
+	Model:    "gpt-4o-mini",
+	Input: []schemas.ChatMessage{
+		{
+			Role: schemas.ChatMessageRoleUser,
+			Content: &schemas.ChatMessageContent{
+				ContentStr: schemas.Ptr("What's the weather in New York and calculate 15% tip for a $50 bill?"),
+			},
+		},
+	},
+	Params: &schemas.ChatParameters{
+		Tools: []schemas.ChatTool{weatherTool, calculatorTool},
+		ToolChoice: &schemas.ChatToolChoice{
+			ChatToolChoiceStr: schemas.Ptr("auto"), // Let AI decide which tools to use
+		},
+	},
+})
+```
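+
+When a response can contain calls to several tools, a small registry keyed by tool name keeps the handling code flat. The following is a minimal standard-library sketch of that idea: `toolCall`, `handler`, `handlers`, and `dispatch` are hypothetical names that simplify the response types, and both handlers are stubs rather than real integrations.
+
+```go
+package main
+
+import (
+	"encoding/json"
+	"fmt"
+	"strconv"
+)
+
+// toolCall is a hypothetical, simplified view of one tool call in a response.
+type toolCall struct {
+	ID        string
+	Name      string
+	Arguments string // JSON-encoded arguments
+}
+
+// handler executes one tool and returns its result as text.
+type handler func(argsJSON string) (string, error)
+
+// getWeather is a stub; a real handler would call a weather service.
+func getWeather(argsJSON string) (string, error) {
+	var args struct {
+		Location string `json:"location"`
+	}
+	if err := json.Unmarshal([]byte(argsJSON), &args); err != nil {
+		return "", err
+	}
+	return "weather requested for " + args.Location, nil
+}
+
+// calculate supports a subset of the calculator operations for brevity.
+func calculate(argsJSON string) (string, error) {
+	var args struct {
+		Operation string  `json:"operation"`
+		A         float64 `json:"a"`
+		B         float64 `json:"b"`
+	}
+	if err := json.Unmarshal([]byte(argsJSON), &args); err != nil {
+		return "", err
+	}
+	switch args.Operation {
+	case "add":
+		return strconv.FormatFloat(args.A+args.B, 'g', -1, 64), nil
+	case "multiply":
+		return strconv.FormatFloat(args.A*args.B, 'g', -1, 64), nil
+	default:
+		return "", fmt.Errorf("unsupported operation %q", args.Operation)
+	}
+}
+
+// handlers maps tool names (as declared in the tool schemas) to their handlers.
+var handlers = map[string]handler{
+	"get_weather": getWeather,
+	"calculator":  calculate,
+}
+
+// dispatch runs each call through its handler and collects results by call ID.
+// Failures are recorded as text so they can be reported back to the model.
+func dispatch(calls []toolCall, handlers map[string]handler) map[string]string {
+	results := make(map[string]string, len(calls))
+	for _, call := range calls {
+		h, ok := handlers[call.Name]
+		if !ok {
+			results[call.ID] = fmt.Sprintf("error: unknown tool %q", call.Name)
+			continue
+		}
+		out, err := h(call.Arguments)
+		if err != nil {
+			results[call.ID] = "error: " + err.Error()
+			continue
+		}
+		results[call.ID] = out
+	}
+	return results
+}
+
+func main() {
+	calls := []toolCall{
+		{ID: "call_1", Name: "get_weather", Arguments: `{"location":"New York"}`},
+		{ID: "call_2", Name: "calculator", Arguments: `{"operation":"multiply","a":50,"b":0.15}`},
+	}
+	results := dispatch(calls, handlers)
+	for _, call := range calls {
+		fmt.Printf("%s -> %s\n", call.ID, results[call.ID])
+	}
+}
+```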
+
+## Tool Choice Options
+
+Control how the AI uses tools:
+
+```go
+// Force use of a specific tool
+Params: &schemas.ChatParameters{
+	Tools: []schemas.ChatTool{calculatorTool},
+	ToolChoice: &schemas.ChatToolChoice{
+		ChatToolChoiceStruct: &schemas.ChatToolChoiceStruct{
+			Type: schemas.ChatToolChoiceTypeFunction,
+			Function: &schemas.ChatToolChoiceFunction{
+				Name: "calculator",
+			},
+		},
+	},
+}
+
+// Let AI decide automatically
+Params: &schemas.ChatParameters{
+	Tools: []schemas.ChatTool{calculatorTool, weatherTool},
+	ToolChoice: &schemas.ChatToolChoice{
+		ChatToolChoiceStr: schemas.Ptr("auto"),
+	},
+}
+
+// Disable tool usage
+Params: &schemas.ChatParameters{
+	Tools: []schemas.ChatTool{calculatorTool},
+	ToolChoice: &schemas.ChatToolChoice{
+		ChatToolChoiceStr: schemas.Ptr("none"),
+	},
+}
+```
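+
+The two `ChatToolChoice` fields correspond to the two shapes that OpenAI-style chat APIs accept for `tool_choice`: a plain string such as `"auto"` or `"none"`, or an object naming one function. The following sketch illustrates that string-or-object encoding with the standard library only. The `toolChoice` types are hypothetical and are not Bifrost's implementation.
+
+```go
+package main
+
+import (
+	"encoding/json"
+	"errors"
+	"fmt"
+)
+
+// toolChoiceFunction names the function the model must call.
+type toolChoiceFunction struct {
+	Name string `json:"name"`
+}
+
+// toolChoiceStruct is the object form of a tool choice.
+type toolChoiceStruct struct {
+	Type     string             `json:"type"`
+	Function toolChoiceFunction `json:"function"`
+}
+
+// toolChoice holds either the string form or the object form (hypothetical type).
+type toolChoice struct {
+	Str    *string
+	Struct *toolChoiceStruct
+}
+
+// MarshalJSON emits whichever form is set and rejects ambiguous values.
+func (c toolChoice) MarshalJSON() ([]byte, error) {
+	switch {
+	case c.Str != nil && c.Struct != nil:
+		return nil, errors.New("tool choice: set either Str or Struct, not both")
+	case c.Str != nil:
+		return json.Marshal(*c.Str)
+	case c.Struct != nil:
+		return json.Marshal(c.Struct)
+	default:
+		return []byte("null"), nil
+	}
+}
+
+func main() {
+	auto := "auto"
+	choices := []toolChoice{
+		{Str: &auto},
+		{Struct: &toolChoiceStruct{Type: "function", Function: toolChoiceFunction{Name: "calculator"}}},
+	}
+	for _, c := range choices {
+		b, err := json.Marshal(c)
+		if err != nil {
+			panic(err)
+		}
+		fmt.Println(string(b))
+	}
+}
+```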
+
+## Next Steps
+
+- **[Multimodal AI](./multimodal)** - Process images, audio, and multimedia content
+- **[Streaming Responses](./streaming)** - Real-time response generation
+- **[Provider Configuration](./provider-configuration)** - Multiple providers for redundancy
+- **[MCP Features](../../mcp/overview)** - Advanced MCP server management