first commit

2026-04-26 21:52:23 +03:00
commit 880f412e2c
2662 changed files with 866266 additions and 0 deletions
--- a/docs/quickstart/gateway/integrations.mdx
+++ b/docs/quickstart/gateway/integrations.mdx
@@ -0,0 +1,70 @@
+---
+title: "Integrations"
+description: "Use Bifrost as a drop-in replacement for existing AI provider SDKs with zero code changes. Just change the base URL and unlock advanced features."
+icon: "plug"
+---
+
+## What are Integrations?
+
+Integrations are protocol adapters that make Bifrost **100% compatible** with existing AI provider SDKs. They translate between provider-specific API formats (OpenAI, Anthropic, Google GenAI) and Bifrost's unified API, giving you:
+
+- **Drop-in replacement** - Change only the base URL in your existing code
+- **Zero migration effort** - Keep your current SDK and request/response handling
+- **Instant feature access** - Get governance, caching, fallbacks, and monitoring without code changes
+
+## Quick Example
+
+### Before (Direct Provider)
+```python
+import openai
+
+client = openai.OpenAI(
+    api_key="your-openai-key"
+)
+```
+
+### After (Bifrost Integration)
+```python
+import openai
+
+client = openai.OpenAI(
+    base_url="http://localhost:8080/openai",  # Point to Bifrost
+    api_key="dummy-key"  # Keys handled by Bifrost
+)
+```
+
+**That's it!** Your application now has automatic fallbacks, governance, monitoring, and all Bifrost features.
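One way to keep the switch reversible is to read the base URL from the environment, so the same code runs with or without Bifrost. This is a minimal sketch: the `BIFROST_BASE_URL` variable name and the `client_kwargs` helper are assumptions made for illustration, not part of Bifrost or the OpenAI SDK.

```python
import os


def client_kwargs(env=None):
    """Build keyword arguments for openai.OpenAI(...).

    BIFROST_BASE_URL is a hypothetical variable name chosen for this sketch.
    """
    env = os.environ if env is None else env
    base_url = env.get("BIFROST_BASE_URL")
    if base_url:
        # Route through Bifrost; provider keys are handled by the gateway.
        return {"base_url": base_url, "api_key": "dummy-key"}
    # Otherwise call the provider directly with its own key.
    return {"api_key": env.get("OPENAI_API_KEY", "")}


print(client_kwargs({"BIFROST_BASE_URL": "http://localhost:8080/openai"}))
```

With this in place, the client would be created as `openai.OpenAI(**client_kwargs())`.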
+
+## Available Integrations
+
+Bifrost provides complete compatibility with these popular AI SDKs:
+
+- **[OpenAI SDK](../../integrations/openai-sdk)**
+- **[Anthropic SDK](../../integrations/anthropic-sdk)**
+- **[Google GenAI SDK](../../integrations/genai-sdk)**
+- **[LiteLLM](../../integrations/litellm-sdk)**
+- **[LangChain](../../integrations/langchain-sdk)**
+- **[Passthrough Endpoints](../../integrations/passthrough)**
+
+## Learn More
+
+For detailed setup guides, compatibility information, and advanced usage:
+
+**➜ [Complete Integration Documentation](../../integrations/what-is-an-integration)**
+
+## Next Steps
+
+Now that you understand integrations, explore these related topics:
+
+### Essential Topics
+
+- **[Provider Configuration](./provider-configuration)** - Set up multiple AI providers for redundancy
+- **[Tool Calling](./tool-calling)** - Enable AI models to use external functions
+- **[Streaming Responses](./streaming)** - Real-time response generation
+- **[Multimodal AI](./multimodal)** - Process images, audio, and multimedia content
+
+### Advanced Topics
+
+- **[Core Features](../../features/)** - Governance, caching, and observability
+- **[Architecture](../../architecture/)** - How Bifrost works internally
+- **[Deployment](../../deployment-guides)** - Production setup and scaling
--- a/docs/quickstart/gateway/multimodal.mdx
+++ b/docs/quickstart/gateway/multimodal.mdx
@@ -0,0 +1,356 @@
+---
+title: "Multimodal Support"
+description: "Process multiple types of content including images, audio, and text with AI models. Bifrost supports vision analysis, image generation, speech synthesis, and audio transcription across various providers."
+icon: "images"
+---
+
+## Vision: Analyzing Images with AI
+
+Send images to vision-capable models for analysis, description, and understanding. This example shows how to analyze an image from a URL using GPT-4o with high detail processing for better accuracy.
+
+```bash
+curl --location 'http://localhost:8080/v1/chat/completions' \
+--header 'Content-Type: application/json' \
+--data '{
+    "model": "openai/gpt-4o",
+    "messages": [
+        {
+            "role": "user",
+            "content": [
+                {
+                    "type": "text",
+                    "text": "What do you see in this image? Please describe it in detail."
+                },
+                {
+                    "type": "image_url",
+                    "image_url": {
+                        "url": "https://pub-cdead89c2f004d8f963fd34010c479d0.r2.dev/Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
+                        "detail": "high"
+                    }
+                }
+            ]
+        }
+    ]
+}'
+```
+
+**Response includes detailed image analysis:**
+```json
+{
+    "choices": [{
+        "message": {
+            "role": "assistant",
+            "content": "I can see a beautiful wooden boardwalk extending through a natural landscape..."
+        }
+    }]
+}
+```
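The same vision request can be assembled from Python using only the standard library. This is a sketch under the assumption that the gateway listens on `localhost:8080` as in the curl example; the helper names (`build_vision_request`, `first_message_text`, `send`) are illustrative.

```python
import json
import urllib.request

BIFROST_URL = "http://localhost:8080/v1/chat/completions"


def build_vision_request(prompt, image_url, detail="high", model="openai/gpt-4o"):
    """Return a urllib Request carrying one text part and one image part."""
    payload = {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": image_url, "detail": detail}},
            ],
        }],
    }
    return urllib.request.Request(
        BIFROST_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def first_message_text(response_body):
    """Extract the assistant text from a chat-completions response dict."""
    return response_body["choices"][0]["message"]["content"]


def send(request):
    """POST the request; needs a running gateway, so it is not called here."""
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)


req = build_vision_request("What do you see in this image?",
                           "https://example.com/image1.jpg")
print(req.full_url, len(req.data))
```

Calling `first_message_text(send(req))` against a running gateway would return the description text shown in the response above.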
+
+## Image Generation: Generating Images with AI
+
+Generate images from text prompts using OpenAI-compatible image generation models. 
+
+### Basic Image Generation
+
+Generate an image from a text prompt using `dall-e-3`.
+
+```bash
+curl --location 'http://localhost:8080/v1/images/generations' \
+--header 'Content-Type: application/json' \
+--data '{
+    "model": "openai/dall-e-3",
+    "prompt": "A futuristic city skyline at sunset with flying cars",
+    "size": "1024x1024",
+    "response_format": "url"
+}'
+```
+
+**Response format:**
+```json
+{
+    "created": 1713833628,
+    "data": [
+        {
+            "url": "https://oaidalleapiprodscus.blob.core.windows.net/...",
+            "revised_prompt": "A futuristic city skyline at sunset featuring advanced architecture and flying vehicles.",
+            "index": 0
+        }
+    ],
+    "background": "opaque",
+    "output_format": "png",
+    "quality": "standard",
+    "size": "1024x1024",
+    "usage": {
+        "input_tokens": 15,
+        "output_tokens": 1,
+        "total_tokens": 16
+    },
+    "extra_fields": {
+        "request_type": "image_generation",
+        "provider": "openai",
+        "model_requested": "dall-e-3",
+        "latency": 15265,
+        "chunk_index": 0
+    }
+}
+```
+
+## Audio Understanding: Analyzing Audio with AI
+
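+If your chat application already handles text input, you can add audio input by using an audio-capable model such as `gpt-4o-audio-preview` and sending an `input_audio` content part. The `modalities` array controls what the model returns: the example below requests text only, and including `audio` in that array asks the model for audio output as well.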
+
+### Audio Input to Model
+
+```bash
+curl --location 'http://localhost:8080/v1/chat/completions' \
+--header 'Content-Type: application/json' \
+--data '{
+    "model": "openai/gpt-4o-audio-preview",
+    "modalities": ["text"],
+    "messages": [
+        {
+            "role": "user",
+            "content": [
+                {
+                    "type": "text",
+                    "text": "Please analyze this audio recording and summarize what was discussed."
+                },
+                {
+                    "type": "input_audio",
+                    "input_audio": {
+                        "data": "<base64-encoded audio data containing the word 'Affirmative'>",
+                        "format": "wav"
+                    }
+                }
+            ]
+        }
+    ]
+}'
+```
+
+### Model Response
+
+With `"modalities": ["text"]`, the model returns its analysis of the audio as text:
+
+```json
+    "choices": [
+        {
+            "index": 0,
+            "finish_reason": "stop",
+            "message": {
+                "role": "assistant",
+                "content": "The audio recording captured a brief segment where a speaker simply said \"Affirmative\" in response. There wasn't any detailed discussion or context provided beyond that one-word affirmation. If you have more audio or specific questions, feel free to share!"
+            }
+        }
+    ]
+}
+```
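The `input_audio` part expects base64-encoded audio. The sketch below shows one way to build that message in Python; the generated silent WAV is only a placeholder for a real recording, and the helper names are illustrative.

```python
import base64
import io
import wave


def silent_wav(seconds=0.1, rate=16000):
    """Create a tiny mono 16-bit WAV in memory (stand-in for a real recording)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 16-bit samples
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()


def audio_message(prompt, wav_bytes):
    """Build a user message with a text part and a base64 audio part."""
    encoded = base64.b64encode(wav_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "input_audio",
             "input_audio": {"data": encoded, "format": "wav"}},
        ],
    }


payload = {
    "model": "openai/gpt-4o-audio-preview",
    "modalities": ["text"],
    "messages": [audio_message("Please summarize this recording.", silent_wav())],
}
print(payload["messages"][0]["content"][1]["input_audio"]["format"])
```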
+
+## Text-to-Speech: Converting Text to Audio
+
+Convert text into natural-sounding speech using AI voice models. This example demonstrates generating an MP3 audio file from text using the "alloy" voice. The result is returned as binary audio data.
+
+```bash
+curl --location 'http://localhost:8080/v1/audio/speech' \
+--header 'Content-Type: application/json' \
+--data '{
+    "model": "openai/tts-1",
+    "input": "Hello! This is a sample text that will be converted to speech using Bifrost speech synthesis capabilities. The weather today is wonderful, and I hope you are having a great day!",
+    "voice": "alloy",
+    "response_format": "mp3"
+}' \
+--output "output.mp3"
+```
+
+**Save audio to file:**
+
+The `--output` flag saves the binary audio data directly to a file. File size varies with the length of the input text.
+
+## Speech-to-Text: Transcribing Audio Files
+
+Convert audio files into text using AI transcription models. This example shows how to transcribe an MP3 file using OpenAI's Whisper model, with an optional context prompt to improve accuracy.
+
+```bash
+curl --location 'http://localhost:8080/v1/audio/transcriptions' \
+--form 'file=@"output.mp3"' \
+--form 'model="openai/whisper-1"' \
+--form 'prompt="This is a sample audio transcription from Bifrost speech synthesis."'
+```
+
+**Response format:**
+```json
+{
+    "text": "Hello! This is a sample text that will be converted to speech using Bifrost speech synthesis capabilities. The weather today is wonderful, and I hope you are having a great day!"
+}
+```
+
+## Advanced Vision Examples
+
+### Multiple Images
+
+Send multiple images in a single request for comparison or analysis. This is useful for comparing products, analyzing changes over time, or understanding relationships between different visual elements.
+
+```bash
+curl --location 'http://localhost:8080/v1/chat/completions' \
+--header 'Content-Type: application/json' \
+--data '{
+    "model": "openai/gpt-4o",
+    "messages": [
+        {
+            "role": "user",
+            "content": [
+                {
+                    "type": "text",
+                    "text": "Compare these two images. What are the differences?"
+                },
+                {
+                    "type": "image_url",
+                    "image_url": {
+                        "url": "https://example.com/image1.jpg"
+                    }
+                },
+                {
+                    "type": "image_url",
+                    "image_url": {
+                        "url": "https://example.com/image2.jpg"
+                    }
+                }
+            ]
+        }
+    ]
+}'
+```
+
+### Base64 Images
+
+Process local images by encoding them as base64 data URLs. This approach is ideal when you need to analyze images stored locally on your system without uploading them to external URLs first.
+
+```bash
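+# First, encode your local image to base64 on a single line
+# (tr strips the line wraps that GNU base64 inserts, which would break the JSON)
+base64_image=$(base64 < local_image.jpg | tr -d '\n')
+data_url="data:image/jpeg;base64,$base64_image"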
+
+curl --location 'http://localhost:8080/v1/chat/completions' \
+--header 'Content-Type: application/json' \
+--data '{
+    "model": "openai/gpt-4o",
+    "messages": [
+        {
+            "role": "user",
+            "content": [
+                {
+                    "type": "text",
+                    "text": "Analyze this image and describe what you see."
+                },
+                {
+                    "type": "image_url",
+                    "image_url": {
+                        "url": "'$data_url'",
+                        "detail": "high"
+                    }
+                }
+            ]
+        }
+    ]
+}'
+```
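The data URL can also be built in Python, which avoids shell quoting and line-wrap issues. This is a minimal sketch with illustrative helper names; the MIME type is guessed from the file extension.

```python
import base64
import mimetypes


def encode_data_url(data, mime):
    """Wrap raw bytes in a base64 data URL with the given MIME type."""
    return f"data:{mime};base64,{base64.b64encode(data).decode('ascii')}"


def file_to_data_url(path):
    """Read a local image and return it as a data URL."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"not a recognised image type: {path}")
    with open(path, "rb") as f:
        return encode_data_url(f.read(), mime)


# The first three bytes of a JPEG file, used here as a tiny demonstration input.
print(encode_data_url(b"\xff\xd8\xff", "image/jpeg"))
```

The returned string goes into the `url` field of an `image_url` content part, exactly where the curl example places `$data_url`.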
+
+## Audio Configuration Options
+
+### Voice Selection for Speech Synthesis
+
+OpenAI's text-to-speech models offer several voices, each with different characteristics. Commonly used options include:
+
+- `alloy` - Balanced, natural voice
+- `echo` - Deep, resonant voice  
+- `fable` - Expressive, storytelling voice
+- `onyx` - Strong, confident voice
+- `nova` - Bright, energetic voice
+- `shimmer` - Gentle, soothing voice
+
+```bash
+# Example with different voice
+curl --location 'http://localhost:8080/v1/audio/speech' \
+--header 'Content-Type: application/json' \
+--data '{
+    "model": "openai/tts-1",
+    "input": "This is the nova voice speaking.",
+    "voice": "nova",
+    "response_format": "mp3"
+}' \
+--output "sample_nova.mp3"
+```
+
+### Audio Formats
+
+Generate audio in different formats depending on your use case: MP3 for general use, Opus for web streaming, AAC for mobile apps, and FLAC for high-quality audio applications.
+
+```bash
+# MP3 format (default)
+"response_format": "mp3"
+
+# Opus format for web streaming
+"response_format": "opus"
+
+# AAC format for mobile apps
+"response_format": "aac"
+
+# FLAC format for high-quality audio
+"response_format": "flac"
+```
+
+## Transcription Options
+
+### Language Specification
+
+Improve transcription accuracy by specifying the source language. This is particularly helpful for non-English audio or when the audio contains technical terms or specific domain vocabulary.
+
+```bash
+curl --location 'http://localhost:8080/v1/audio/transcriptions' \
+--form 'file=@"spanish_audio.mp3"' \
+--form 'model="openai/whisper-1"' \
+--form 'language="es"' \
+--form 'prompt="This is a Spanish audio recording about technology."'
+```
+
+### Response Formats
+
+Choose between simple text output or detailed JSON responses with timestamps. The verbose JSON format provides word-level and segment-level timing information, useful for creating subtitles or analyzing speech patterns.
+
+```bash
+# Text only response
+curl --location 'http://localhost:8080/v1/audio/transcriptions' \
+--form 'file=@"audio.mp3"' \
+--form 'model="openai/whisper-1"' \
+--form 'response_format="text"'
+
+# JSON with timestamps
+curl --location 'http://localhost:8080/v1/audio/transcriptions' \
+--form 'file=@"audio.mp3"' \
+--form 'model="openai/whisper-1"' \
+--form 'response_format="verbose_json"' \
+--form 'timestamp_granularities[]=word' \
+--form 'timestamp_granularities[]=segment'
+```
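Segment timestamps are what make subtitle generation possible. The sketch below converts a `verbose_json` transcription into SRT text; it assumes the response carries a `segments` array whose items have `start`, `end` (in seconds) and `text` fields, as in OpenAI's verbose transcription format.

```python
def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def segments_to_srt(transcription):
    """Turn a verbose_json transcription dict into SRT subtitle text."""
    blocks = []
    for n, seg in enumerate(transcription.get("segments", []), start=1):
        blocks.append(
            f"{n}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)


sample = {"segments": [{"start": 0.0, "end": 1.25, "text": " Hello!"}]}
print(segments_to_srt(sample))
```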
+
+<Info>
+Check the [Supported Providers](/providers/supported-providers/overview) page for more information on multimodal capabilities supported by each provider.
+</Info>
+
+## Next Steps
+
+Now that you understand multimodal capabilities, explore these related topics:
+
+### Essential Topics
+
+- **[Streaming Responses](./streaming)** - Real-time multimodal processing
+- **[Tool Calling](./tool-calling)** - Combine with external tools
+- **[Provider Configuration](./provider-configuration)** - Multiple providers for different capabilities
+- **[Integrations](./integrations)** - Drop-in compatibility with existing SDKs
+
+### Advanced Topics
+
+- **[Core Features](../../features/)** - Advanced Bifrost capabilities
+- **[Architecture](../../architecture/)** - How Bifrost works internally
+- **[Deployment](../../deployment-guides)** - Production setup and scaling
--- a/docs/quickstart/gateway/provider-configuration.mdx
+++ b/docs/quickstart/gateway/provider-configuration.mdx
--- a/docs/quickstart/gateway/reranking.mdx
+++ b/docs/quickstart/gateway/reranking.mdx
@@ -0,0 +1,124 @@
+---
+title: "Reranking"
+description: "Reorder documents by relevance to a query using /v1/rerank."
+icon: "book-open-cover"
+---
+
+Use reranking to sort documents by relevance for search, retrieval, and context selection.
+
+## Provider Model Examples
+
+- Cohere: `cohere/rerank-v3.5`
+- vLLM: `vllm/BAAI/bge-reranker-v2-m3`
+- Bedrock: `bedrock/<rerank-model-or-arn>`
+- Vertex AI: `vertex/<ranking-model>`
+
+## Basic Request
+
+```bash
+curl --location 'http://localhost:8080/v1/rerank' \
+--header 'Content-Type: application/json' \
+--data '{
+  "model": "cohere/rerank-v3.5",
+  "query": "What is Bifrost?",
+  "documents": [
+    {"text": "Bifrost is an AI gateway that unifies many LLM providers."},
+    {"text": "Paris is the capital of France."},
+    {"text": "Bifrost exposes an OpenAI-compatible API."}
+  ]
+}'
+```
+
+## Request Parameters
+
+- `model` (required): model in `provider/model` format
+- `query` (required): query used for ranking
+- `documents` (required): array of documents with `text` (optional `id`, `meta`)
+- `top_n` (optional): maximum number of results
+- `max_tokens_per_doc` (optional): provider-dependent document token cap
+- `priority` (optional): provider-dependent priority hint
+- `return_documents` (optional): include matched document content in each result
+- `fallbacks` (optional): fallback models in `provider/model` format
+
+## Example with Options
+
+```bash
+curl --location 'http://localhost:8080/v1/rerank' \
+--header 'Content-Type: application/json' \
+--data '{
+  "model": "cohere/rerank-v3.5",
+  "query": "gateway observability",
+  "top_n": 2,
+  "return_documents": true,
+  "documents": [
+    {"id": "a", "text": "Bifrost supports observability plugins like OTEL and Maxim."},
+    {"id": "b", "text": "Bifrost can run in Kubernetes and ECS."},
+    {"id": "c", "text": "Token counting is available at /v1/responses/input_tokens."}
+  ]
+}'
+```
+
+## vLLM Endpoint Compatibility
+
+When using a `vllm/...` model, Bifrost sends rerank requests to `/v1/rerank` first and automatically retries `/rerank` when the upstream endpoint responds with `404`, `405`, or `501`.
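The retry behaviour described above can be pictured with the following sketch. It illustrates the described logic only and is not Bifrost's actual implementation; `send` is a hypothetical caller-supplied function that returns a `(status, response)` pair.

```python
RETRYABLE_STATUSES = {404, 405, 501}


def rerank_with_fallback(send, body):
    """Try /v1/rerank first, then /rerank if the upstream rejects the path."""
    status, response = send("/v1/rerank", body)
    if status in RETRYABLE_STATUSES:
        status, response = send("/rerank", body)
    return status, response


def fake_upstream(path, body):
    """Stand-in for a vLLM server that only exposes /rerank."""
    if path == "/rerank":
        return 200, {"results": []}
    return 404, None


print(rerank_with_fallback(fake_upstream, {"query": "q"}))
```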
+
+## Response Shape
+
+```json
+{
+  "results": [
+    {
+      "index": 0,
+      "relevance_score": 0.98,
+      "document": {
+        "id": "a",
+        "text": "Bifrost supports observability plugins like OTEL and Maxim."
+      }
+    },
+    {
+      "index": 2,
+      "relevance_score": 0.63,
+      "document": {
+        "id": "c",
+        "text": "Token counting is available at /v1/responses/input_tokens."
+      }
+    }
+  ],
+  "model": "rerank-v3.5",
+  "usage": {
+    "prompt_tokens": 52,
+    "completion_tokens": 0,
+    "total_tokens": 52
+  },
+  "extra_fields": {
+    "request_type": "rerank",
+    "provider": "cohere",
+    "latency": 245,
+    "chunk_index": 0
+  }
+}
+```
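When `return_documents` is not set, results can be mapped back to the request through `index`. The sketch below assumes `index` is the zero-based position in the request's `documents` array, which is consistent with the sample above (index 0 is document `a`, index 2 is document `c`).

```python
def ranked_documents(documents, response):
    """Pair each result's score with its document text, in ranked order."""
    ranked = []
    for result in response["results"]:
        # Prefer the echoed document; otherwise look it up by position.
        doc = result.get("document") or documents[result["index"]]
        ranked.append((result["relevance_score"], doc["text"]))
    return ranked


docs = [
    {"id": "a", "text": "Bifrost supports observability plugins like OTEL and Maxim."},
    {"id": "b", "text": "Bifrost can run in Kubernetes and ECS."},
    {"id": "c", "text": "Token counting is available at /v1/responses/input_tokens."},
]
resp = {"results": [{"index": 0, "relevance_score": 0.98},
                    {"index": 2, "relevance_score": 0.63}]}
for score, text in ranked_documents(docs, resp):
    print(score, text)
```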
+
+## Common Validation Errors
+
+- Missing `query` -> `query is required for rerank`
+- Empty `documents` -> `documents are required for rerank`
+- Blank document text -> `document text is required for rerank at index N`
+- `top_n < 1` -> `top_n must be at least 1`
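Checking a request before sending it can surface these errors earlier. The sketch below mirrors the messages listed above on the client side; the order of the checks is an assumption, and the gateway remains the source of truth.

```python
def validate_rerank_request(body):
    """Return the first validation error message, or None if the body looks valid."""
    if not body.get("query"):
        return "query is required for rerank"
    documents = body.get("documents") or []
    if not documents:
        return "documents are required for rerank"
    for i, doc in enumerate(documents):
        if not (doc.get("text") or "").strip():
            return f"document text is required for rerank at index {i}"
    top_n = body.get("top_n")
    if top_n is not None and top_n < 1:
        return "top_n must be at least 1"
    return None


print(validate_rerank_request({"query": "q", "documents": [{"text": " "}]}))
```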
+
+## Next Steps
+
+Now that you understand reranking, explore these related topics:
+
+### Essential Topics
+
+- **[Multimodal AI](./multimodal)** - Process images, audio, and multimedia content
+- **[Tool Calling](./tool-calling)** - Enable AI models to use external tools and functions
+- **[Provider Configuration](./provider-configuration)** - Multiple providers for redundancy
+- **[Integrations](./integrations)** - Drop-in compatibility with existing SDKs
+
+### Advanced Topics
+
+- **[Core Features](../../features/)** - Advanced Bifrost capabilities
+- **[Architecture](../../architecture/)** - How Bifrost works internally
+- **[Deployment](../../deployment-guides)** - Production setup and scaling
--- a/docs/quickstart/gateway/setting-up-auth.mdx
+++ b/docs/quickstart/gateway/setting-up-auth.mdx
@@ -0,0 +1,143 @@
+---
+title: "Setting up auth"
+description: "Learn how to enable basic authentication for the Bifrost dashboard to secure your admin interface and API endpoints."
+icon: "lock"
+---
+
+<Note>This feature is only available in OSS builds. For enterprise builds, you can set up [SCIM](/enterprise/scim) instead.</Note>
+
+## Overview
+
+Bifrost provides built-in authentication to protect your dashboard and admin API endpoints. When enabled, users must log in with credentials before accessing the dashboard or making admin API calls. This feature helps secure your Bifrost instance, especially when deployed in production environments.
+
+## Enabling Authentication
+
+### Step 1: Navigate to Security Settings
+
+1. Open your Bifrost dashboard
+2. Go to **Workspace** → **Config** → **Security** tab
+3. Scroll to the **Password protect the dashboard** section
+
+![Setting up auth](../../media/setting-up-dashboard-auth.png)
+
+### Step 2: Enable Authentication
+
+1. Toggle the **Password protect the dashboard** switch to enable authentication
+2. Enter your **Username** in the admin username field
+3. Enter your **Password** in the admin password field
+
+<Note>
+  The username and password fields are only enabled when the authentication toggle is turned on. Make sure to use a
+  strong password for security.
+</Note>
+
+### Step 3: Configure Inference Call Authentication (Optional)
+
+By default, when authentication is enabled, all API calls (including inference calls) require authentication. You can optionally disable authentication for inference calls while keeping it enabled for the dashboard and admin API:
+
+1. Enable the **Disable authentication on inference calls** toggle
+2. When enabled:
+   - Dashboard and admin API calls will still require authentication
+   - Inference API calls (chat completions, embeddings, etc.) will not require authentication
+   - MCP tool execution calls will still require authentication
+
+<Note>
+  This option is useful if you want to protect your dashboard and admin functions while allowing public access to
+  inference endpoints.
+</Note>
+
+### Step 4: Configure Whitelisted Routes (Optional)
+
+You can configure specific routes that bypass the authentication middleware entirely. Requests to these routes will not require authentication, even when auth is enabled.
+
+1. Scroll to the **Whitelisted Routes** section
+2. Enter a comma-separated list of routes in the textarea
+
+![Whitelisted Routes Configuration](../../media/ui-security-whitelisted-routes.png)
+
+**Wildcard support:** Routes ending with `*` are treated as prefix matches. For example, `/api/webhook*` will match `/api/webhook`, `/api/webhook/v1`, `/api/webhook/github`, etc.
+
+**Example values:**
+
+```
+/api/custom-webhook, /api/public-endpoint, /api/webhook*
+```
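The matching rule can be expressed in a few lines. This sketch illustrates the described behaviour (exact match, or prefix match for entries ending in `*`) and is not Bifrost's implementation; the function names are illustrative.

```python
def parse_routes(raw):
    """Split the comma-separated textarea value into clean route entries."""
    return [r.strip() for r in raw.split(",") if r.strip()]


def is_whitelisted(path, routes):
    """Exact match, or prefix match for routes ending in '*'."""
    for route in routes:
        if route.endswith("*"):
            if path.startswith(route[:-1]):
                return True
        elif path == route:
            return True
    return False


routes = parse_routes("/api/custom-webhook, /api/public-endpoint, /api/webhook*")
print(is_whitelisted("/api/webhook/github", routes))
```

Under plain prefix semantics like these, `/api/webhook*` would also match a path such as `/api/webhooks-internal`, so a pattern like `/api/webhook/*` may be the safer choice when only sub-paths should be exempt.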
+
+<Note>
+  System routes like `/health`, `/api/session/login`, `/api/session/is-auth-enabled`, `/api/oauth/callback`, and
+  `/api/info` are always whitelisted regardless of this setting. Whitelisted routes only apply to dashboard and admin
+  API endpoints — inference endpoints have their own toggle (see Step 3).
+</Note>
+
+### Step 5: Save Changes
+
+1. Click **Save Changes** to apply your authentication settings
+2. Changes take effect immediately - no restart required
+
+## Logging In
+
+Once authentication is enabled:
+
+1. Navigate to your Bifrost dashboard URL
+2. You will be automatically redirected to the login page
+3. Enter your configured username and password
+4. Click **Sign in**
+
+After successful login, you'll be redirected to the dashboard. Your session will remain active for 30 days, and you'll need to log in again after the session expires.
+
+## Authentication Methods
+
+Bifrost supports different authentication methods depending on the type of request:
+
+### Dashboard Access
+
+- **Bearer Token Authentication**: The dashboard uses Bearer token authentication
+- Tokens are automatically managed through the login session
+- Tokens are stored in browser localStorage and sent with each API request
+
+### API Calls
+
+When authentication is enabled, API calls can be made using:
+
+- **Basic Authentication**: Username and password sent via HTTP Basic auth (base64-encoded `username:password`)
+- **Bearer Token**: Session token issued after login
+
+When authentication is enabled for inference calls (i.e., the "Disable authentication on inference calls" toggle is OFF), inference calls can be made using:
+
+- **Basic Authentication**: Username and password sent via HTTP Basic auth
+- **Bearer Token**: Base64-encoded `username:password` string sent as the bearer token
+
+### Whitelisted Routes
+
+When a route is added to the whitelisted routes list in Security settings, requests to that path bypass authentication entirely — no Basic Auth or Bearer Token is required. This applies only to dashboard and admin API endpoints. Inference endpoints are controlled separately via the "Disable authentication on inference calls" toggle.
+
+### Example: Using Basic Auth for Inference Calls
+
+```bash
+# Using curl with Basic Auth
+curl -X POST http://localhost:8080/v1/chat/completions \
+  -u "your-username:your-password" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "openai/gpt-4o",
+    "messages": [{"role": "user", "content": "Hello!"}]
+  }'
+```
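The two header forms described for inference calls can be built as follows. This is a sketch with illustrative helper names; both headers carry the same base64-encoded `username:password` value and differ only in the scheme.

```python
import base64


def _encode_credentials(username, password):
    """Base64-encode 'username:password' as used by HTTP Basic auth."""
    return base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")


def basic_auth_header(username, password):
    """Header equivalent to curl's -u "username:password"."""
    return {"Authorization": f"Basic {_encode_credentials(username, password)}"}


def inference_bearer_header(username, password):
    """Bearer variant for inference calls: the same base64 string as the token."""
    return {"Authorization": f"Bearer {_encode_credentials(username, password)}"}


print(basic_auth_header("user", "pass"))
print(inference_bearer_header("user", "pass"))
```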
+
+## Important Notes
+
+- **No Restart Required**: Authentication changes take effect immediately without requiring a server restart
+- **Session Duration**: Login sessions last for 30 days
+- **Password Security**: Passwords are hashed and stored securely in the database
+- **Inference Calls**: If you disable authentication on inference calls, only dashboard and admin API endpoints will be protected
+- **Whitelisted Routes**: Routes configured in Security settings bypass auth on dashboard/admin API endpoints only. Use `*` at the end of a route for prefix matching (e.g., `/api/webhook*`)
+
+## Disabling Authentication
+
+To disable authentication:
+
+1. Navigate to **Workspace** → **Config** → **Security**
+2. Toggle off the **Password protect the dashboard** switch
+3. Click **Save Changes**
+
+After disabling, the dashboard will be accessible without authentication immediately.
--- a/docs/quickstart/gateway/setting-up.mdx
+++ b/docs/quickstart/gateway/setting-up.mdx
@@ -0,0 +1,262 @@
+---
+title: "Setting Up"
+description: "Get Bifrost running as an HTTP API gateway in 30 seconds with zero configuration. Perfect for any programming language."
+icon: "play"
+---
+
+![Bifrost Gateway Installation](../../media/getting-started.png)
+
+## 30-Second Setup
+
+Get Bifrost running as a blazing-fast HTTP API gateway with **zero configuration**. Connect to any AI provider (OpenAI, Anthropic, Bedrock, and more) through a unified API that follows **OpenAI request/response format**.
+
+### 1. Choose Your Setup Method
+
+Both options work perfectly - choose what fits your workflow:
+
+#### NPX Binary
+
+<video width="100%" controls>
+  <source src="https://github.com/maximhq/bifrost/raw/refs/heads/main/docs/media/run-npx.mp4" type="video/mp4" />
+  Your browser does not support the video tag.
+</video>
+
+```bash
+# Install and run locally
+npx -y @maximhq/bifrost
+
+# Install a specific version
+npx -y @maximhq/bifrost --transport-version v1.3.9
+```
+
+#### Docker
+
+```bash
+# Pull and run Bifrost HTTP API
+docker pull maximhq/bifrost
+docker run -p 8080:8080 maximhq/bifrost
+
+# Pull a specific version
+docker pull maximhq/bifrost:v1.3.9
+docker pull maximhq/bifrost:v1.3.9-amd64
+docker pull maximhq/bifrost:v1.3.9-arm64
+```
+
+**For Data Persistence**
+
+```bash
+# For configuration persistence across restarts
+docker run -p 8080:8080 -v $(pwd)/data:/app/data maximhq/bifrost
+```
+
+### 2. Configuration Flags
+
+| Flag      | Default   | NPX               | Docker                          | Description                          |
+| --------- | --------- | ----------------- | ------------------------------- | ------------------------------------ |
+| port      | 8080      | `-port 8080`      | `-e APP_PORT=8080 -p 8080:8080` | HTTP server port                     |
+| host      | localhost | `-host 0.0.0.0`   | `-e APP_HOST=0.0.0.0`           | Host to bind server to               |
+| log-level | info      | `-log-level info` | `-e LOG_LEVEL=info`             | Log level (debug, info, warn, error) |
+| log-style | json      | `-log-style json` | `-e LOG_STYLE=json`             | Log style (pretty, json)             |
+
+**Understanding App Directory**
+
+The `-app-dir` flag determines where Bifrost stores all its data:
+
+```bash
+# Specify custom directory
+npx -y @maximhq/bifrost -app-dir ./my-bifrost-data
+
+# If not specified, creates in your OS config directory:
+# • Linux/macOS: ~/.config/bifrost
+# • Windows: %APPDATA%\bifrost
+```
+
+**What's stored in app-dir:**
+
+- `config.json` - Configuration file (optional)
+- `config.db` - SQLite database for UI configuration
+- `logs.db` - Request logs database
+
+**Note:** When using Bifrost via Docker, the volume you mount will be used as the app-dir.
+
+### 3. Open the Web Interface
+
+Navigate to **http://localhost:8080** in your browser:
+
+```bash
+# macOS
+open http://localhost:8080
+
+# Linux
+xdg-open http://localhost:8080
+
+# Windows
+start http://localhost:8080
+```
+
+🖥️ **The Web UI provides:**
+
+- **Visual provider setup** - Add API keys with clicks, not code
+- **Real-time configuration** - Changes apply immediately
+- **Live monitoring** - Request logs, metrics, and analytics
+- **Governance management** - Virtual keys, usage budgets, and more
+
+### 4. Test Your First API Call
+
+```bash
+curl -X POST http://localhost:8080/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "openai/gpt-4o-mini",
+    "messages": [{"role": "user", "content": "Hello, Bifrost!"}]
+  }'
+```
+
+**🎉 That's it!** Bifrost is running and ready to route AI requests.
+
+### What Just Happened?
+
+1. **Zero Configuration Start**: Bifrost launched without any config files - everything can be configured through the Web UI or API
+2. **OpenAI-Compatible API**: All Bifrost APIs follow OpenAI request/response format for seamless integration
+3. **Unified API Endpoint**: `/v1/chat/completions` works with any provider (OpenAI, Anthropic, Bedrock, etc.)
+4. **Provider Resolution**: `openai/gpt-4o-mini` tells Bifrost to use OpenAI's GPT-4o Mini model
+5. **Automatic Routing**: Bifrost handles authentication, rate limiting, and request routing automatically
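The `provider/model` convention can be illustrated with a small parser. Splitting on the first slash is an assumption of this sketch (it keeps model names that contain slashes, such as `vllm/BAAI/bge-reranker-v2-m3`, intact); it is not taken from Bifrost's source.

```python
def split_model(model):
    """Split 'provider/model' on the first slash into (provider, model_name)."""
    provider, sep, name = model.partition("/")
    if not sep or not provider or not name:
        raise ValueError(f"expected 'provider/model', got {model!r}")
    return provider, name


print(split_model("openai/gpt-4o-mini"))
print(split_model("vllm/BAAI/bge-reranker-v2-m3"))
```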
+
+---
+
+## Two Configuration Modes
+
+Bifrost supports **two configuration approaches** - you cannot use both simultaneously:
+
+### Mode 1: Web UI Configuration
+
+![Configuration via UI](../../media/ui-config.png)
+
+**The UI is available when either of the following is true:**
+
+- No `config.json` file exists (Bifrost auto-creates SQLite database)
+- `config.json` exists with `config_store` configured
+
+### Mode 2: File-based Configuration
+
+<Note>You can view the entire config schema [here](https://www.getbifrost.ai/schema).</Note>
+
+**When to use:** Advanced setups, GitOps workflows, or when UI is not needed
+
+Create `config.json` in your app directory:
+
+```json
+{
+  "$schema": "https://www.getbifrost.ai/schema",
+  "client": {
+    "drop_excess_requests": false
+  },
+  "providers": {
+    "openai": {
+      "keys": [
+        {
+          "name": "openai-key-1",
+          "value": "env.OPENAI_API_KEY",
+          "models": ["gpt-4o-mini", "gpt-4o"],
+          "weight": 1.0
+        }
+      ]
+    }
+  },
+  "config_store": {
+    "enabled": true,
+    "type": "sqlite",
+    "config": {
+      "path": "./config.db"
+    }
+  }
+}
+```
+
+**Without `config_store` in `config.json`:**
+
+- **UI is disabled** - no real-time configuration possible
+- **Read-only mode** - `config.json` is never modified
+- **Memory-only** - all configurations loaded into memory at startup
+- **Restart required** - changes to `config.json` only apply after restart
+
+**With `config_store` in `config.json`:**
+
+- **UI is enabled** - full real-time configuration via web interface
+- **Database check** - Bifrost checks if config store database exists and has data
+  - **Empty DB**: Bootstraps database with `config.json` settings, then uses DB exclusively
+  - **Existing DB**: Uses database directly, **ignores** `config.json` configurations
+- **Persistent storage** - all changes saved to database immediately
+
+**Important for Advanced Users:**
+If you want database persistence but prefer not to use the UI, note that modifying `config.json` after initial bootstrap has no effect when `config_store` is enabled. Use the public HTTP APIs to make configuration changes instead.
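The rules above amount to a small decision table. The sketch below restates them in code to make the precedence explicit; it is a summary of the documented behaviour, not Bifrost's startup code, and the function name and return strings are illustrative.

```python
def resolve_config_source(config_json_exists, has_config_store, db_has_data):
    """Return (ui_enabled, source) for the startup situation described above."""
    if not config_json_exists:
        # No config.json: Bifrost auto-creates a SQLite database.
        return True, "database (auto-created)"
    if not has_config_store:
        # File-only mode: loaded into memory at startup, never modified.
        return False, "config.json (read-only, in memory)"
    if db_has_data:
        # Existing database wins; config.json configurations are ignored.
        return True, "database (config.json ignored)"
    # Empty database: bootstrap from config.json, then use the DB exclusively.
    return True, "database (bootstrapped from config.json)"


for case in [(False, False, False), (True, False, False),
             (True, True, True), (True, True, False)]:
    print(case, "->", resolve_config_source(*case))
```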
+
+**The Three Stores Explained:**
+
+- **Config Store**: Stores provider configs, API keys, MCP settings - Required for UI functionality
+- **Logs Store**: Stores request logs shown in UI - Optional, can be disabled
+- **Vector Store**: Used for semantic caching - Optional, can be disabled
+
+## PostgreSQL UTF8 Requirement
+
+<Note>
+  Bifrost requires PostgreSQL 16 or later.
+</Note>
+
+<Note>
+  For the log store, Bifrost creates materialized views to improve analytics performance. Ensure that the PostgreSQL user
+  has the necessary permissions to perform these operations on the target schema.
+</Note>
+
+If you use PostgreSQL for `config_store` or `logs_store`, the target database must use `UTF8` encoding.
+
+Use `template0` when creating the database so PostgreSQL applies UTF8 and locale settings explicitly:
+
+```sql
+CREATE DATABASE bifrost
+  WITH TEMPLATE template0
+       ENCODING 'UTF8'
+       LC_COLLATE '<your-locale>'
+       LC_CTYPE '<your-locale>';
+```
+
+Use locale names that exist in your Postgres image/host (for example, `en_US.UTF-8`, `C.UTF-8`, or another installed UTF-8 locale).
+
+Verify the database encoding:
+
+```sql
+SELECT datname, pg_encoding_to_char(encoding) AS encoding
+FROM pg_database
+WHERE datname = 'bifrost';
+```
+
+If the database is not UTF8, Bifrost startup/migrations can fail with:
+
+```text
+simple protocol queries must be run with client_encoding=UTF8
+```
+
+If you already created a SQL_ASCII database, create a new UTF8 database and update your Bifrost DB config to point to it.
+
+---
+
+## Next Steps
+
+Now that you have Bifrost running, explore these focused guides:
+
+### Essential Topics
+
+- **[Provider Configuration](./provider-configuration)** - Multiple providers, automatic failovers & load balancing
+- **[Integrations](../../integrations/what-is-an-integration)** - Drop-in replacements for OpenAI, Anthropic, and GenAI SDKs
+- **[Multimodal Support](./multimodal)** - Support for text, images, audio, and streaming, all behind a common interface
+
+### Advanced Topics
+
+- **[Tracing](../../features/observability/default)** - Logging requests for monitoring and debugging
+- **[MCP Tools](../../mcp/overview)** - Enable AI models to use external tools (filesystem, web search, databases)
+- **[Governance](../../features/governance/virtual-keys)** - Usage tracking, rate limiting, and cost control
+- **[Deployment](../../deployment-guides/k8s)** - Production setup and scaling
+
+---
+
+**Happy building with Bifrost!** 🚀
--- a/docs/quickstart/gateway/streaming.mdx
+++ b/docs/quickstart/gateway/streaming.mdx
@@ -0,0 +1,174 @@
+---
+title: "Streaming Responses"
+description: "Receive AI responses in real-time via Server-Sent Events. Perfect for chat applications, audio processing, and real-time transcription where you want immediate results."
+icon: "water"
+---
+
+
+## Streaming Text Completion
+
+Request text completions with streaming enabled to receive partial `text` chunks as they are generated.
+
+```bash
+curl --location 'http://localhost:8080/v1/completions' \
+--header 'Content-Type: application/json' \
+--data '{
+    "model": "openai/gpt-4o-mini",
+    "prompt": "Write a short haiku about the ocean",
+    "stream": true
+}'
+```
+
+**Response Format (Server-Sent Events):**
+```
+data: {"choices":[{"text":"Waves whisper soft"}],"model":"gpt-4o-mini"}
+
+data: {"choices":[{"text":" on distant shores, the moon calls"}],"model":"gpt-4o-mini"}
+
+data: {"choices":[{"text":" tides to rise."}],"model":"gpt-4o-mini"}
+
+data: [DONE]
+```
+
+## Streaming Chat Responses
+
+Receive AI responses in real-time as they're generated. Perfect for chat applications where you want to show responses as they're being typed, improving user experience.
+
+```bash
+curl --location 'http://localhost:8080/v1/chat/completions' \
+--header 'Content-Type: application/json' \
+--data '{
+    "model": "openai/gpt-4o-mini",
+    "messages": [
+        {"role": "user", "content": "Tell me a story about a robot learning to paint"}
+    ],
+    "stream": true
+}'
+```
+
+**Response Format (Server-Sent Events):**
+```
+data: {"choices":[{"delta":{"content":"Once"}}],"model":"gpt-4o-mini"}
+
+data: {"choices":[{"delta":{"content":" upon"}}],"model":"gpt-4o-mini"}
+
+data: {"choices":[{"delta":{"content":" a"}}],"model":"gpt-4o-mini"}
+
+data: [DONE]
+```
+
+Each chunk contains partial content that you can append to build the complete response in real-time.
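+
+As a minimal sketch of that append step, the helper below joins the `delta.content` pieces from SSE lines until the `[DONE]` marker arrives. `collect_chat_stream` is a hypothetical name, and the sample lines are copied from the response format above; a real client would pass in lines read from its HTTP library instead.
+
+```python
+import json
+
+
+def collect_chat_stream(lines):
+    """Join the delta.content pieces from an iterable of SSE lines."""
+    parts = []
+    for raw in lines:
+        line = raw.strip()
+        if not line.startswith("data:"):
+            continue  # skip blank separator lines and anything that is not data
+        payload = line[len("data:"):].strip()
+        if payload == "[DONE]":
+            break  # end-of-stream marker
+        chunk = json.loads(payload)
+        for choice in chunk.get("choices", []):
+            content = (choice.get("delta") or {}).get("content")
+            if content:
+                parts.append(content)
+    return "".join(parts)
+
+
+sample = [
+    'data: {"choices":[{"delta":{"content":"Once"}}],"model":"gpt-4o-mini"}',
+    "",
+    'data: {"choices":[{"delta":{"content":" upon"}}],"model":"gpt-4o-mini"}',
+    "",
+    'data: {"choices":[{"delta":{"content":" a"}}],"model":"gpt-4o-mini"}',
+    "",
+    "data: [DONE]",
+]
+print(collect_chat_stream(sample))  # prints: Once upon a
+```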
+
+> **Note:** Streaming requests are subject to the timeout set in the provider configuration, which defaults to **30 seconds**.
+
+<Note>
+Bifrost standardizes all streamed responses: content arrives in the earlier chunks, and usage and finish reason are sent only in the last chunk.
+</Note>
+
+## Responses API Streaming
+
+Stream the OpenAI-style Responses API with event-based SSE. This includes `event:` lines and does not use the `[DONE]` marker; the stream ends when the connection closes.
+
+```bash
+curl --location 'http://localhost:8080/v1/responses' \
+--header 'Content-Type: application/json' \
+--data '{
+    "model": "openai/gpt-4o-mini",
+    "input": "Tell me one interesting fact about Mars",
+    "stream": true
+}'
+```
+
+**Response Format (Server-Sent Events):**
+```
+event: response.created
+data: {"type":"response.created"}
+
+event: response.output_text.delta
+data: {"type":"response.output_text.delta","delta": /* partial text delta payload */ }
+
+event: response.output_text.delta
+data: {"type":"response.output_text.delta","delta": /* more text delta */ }
+
+event: response.completed
+data: {"type":"response.completed","response":{ /* usage, finish_reason, etc. */ }}
+```
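+
+Because this stream has no `[DONE]` marker, a client has to dispatch on the `event:` name instead. The sketch below groups `event:`/`data:` lines into events and joins the text deltas. It assumes that `delta` carries a plain text string, as in the OpenAI Responses API; the helper names and the sample payloads are made up for illustration.
+
+```python
+import json
+
+
+def iter_sse_events(lines):
+    """Yield (event_name, data) pairs; a blank line ends each event."""
+    event, data = None, []
+    for raw in lines:
+        line = raw.rstrip("\n")
+        if line == "":
+            if data:
+                yield event, "\n".join(data)
+            event, data = None, []
+        elif line.startswith("event:"):
+            event = line[len("event:"):].strip()
+        elif line.startswith("data:"):
+            data.append(line[len("data:"):].strip())
+    if data:  # the stream may close without a trailing blank line
+        yield event, "\n".join(data)
+
+
+def collect_output_text(lines):
+    """Join text deltas until the response.completed event arrives."""
+    text = []
+    for event, data in iter_sse_events(lines):
+        if event == "response.output_text.delta":
+            text.append(json.loads(data).get("delta", ""))
+        elif event == "response.completed":
+            break
+    return "".join(text)
+
+
+# Hypothetical payloads; real events carry more fields.
+sample = [
+    "event: response.created",
+    'data: {"type":"response.created"}',
+    "",
+    "event: response.output_text.delta",
+    'data: {"type":"response.output_text.delta","delta":"Mars has"}',
+    "",
+    "event: response.output_text.delta",
+    'data: {"type":"response.output_text.delta","delta":" two moons."}',
+    "",
+    "event: response.completed",
+    'data: {"type":"response.completed","response":{}}',
+]
+print(collect_output_text(sample))  # prints: Mars has two moons.
+```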
+
+## Text-to-Speech Streaming: Real-time Audio Generation
+
+Stream audio generation in real-time as text is converted to speech. Ideal for long texts or when you need immediate audio playback.
+
+```bash
+curl --location 'http://localhost:8080/v1/audio/speech' \
+--header 'Content-Type: application/json' \
+--data '{
+    "model": "openai/gpt-4o-mini-tts",
+    "input": "Hello this is a sample test, respond with hello for my Bifrost",
+    "voice": "alloy",
+    "stream_format": "sse"
+}'
+```
+
+**Response:** Audio chunks are delivered via Server-Sent Events. Each chunk contains base64-encoded audio data that you can decode and play or save progressively.
+
+```
+data: {"audio":"UklGRigAAABXQVZFZm10IBAAAAABAAEA..."}
+
+data: {"audio":"AKlFQVZFZm10IBAAAAABAAEAq..."}
+
+data: [DONE]
+```
+
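+**To save the stream:** Append `> audio_stream.txt` to the curl command to redirect the raw SSE output to a file. The saved file holds the event text, not playable audio, until the base64 chunks are decoded.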
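+
+A minimal sketch of the decoding step follows. It assumes each `audio` field is an independently base64-encoded chunk whose bytes can be concatenated in order. `decode_audio_stream` is a hypothetical helper, and the sample chunks are placeholder bytes, not real audio.
+
+```python
+import base64
+import json
+
+
+def decode_audio_stream(lines):
+    """Decode the base64 audio field of each SSE data line into one byte string."""
+    audio = bytearray()
+    for raw in lines:
+        line = raw.strip()
+        if not line.startswith("data:"):
+            continue  # skip blank separator lines
+        payload = line[len("data:"):].strip()
+        if payload == "[DONE]":
+            break
+        audio.extend(base64.b64decode(json.loads(payload)["audio"]))
+    return bytes(audio)
+
+
+# Placeholder chunks built from known bytes so the result is easy to check.
+sample = [
+    'data: {"audio":"%s"}' % base64.b64encode(b"RIFF").decode("ascii"),
+    "",
+    'data: {"audio":"%s"}' % base64.b64encode(b"WAVE").decode("ascii"),
+    "",
+    "data: [DONE]",
+]
+print(decode_audio_stream(sample))  # prints: b'RIFFWAVE'
+```
+
+Writing the returned bytes to a file opened in binary mode (`"wb"`) would produce the audio file, with the extension matching the requested `response_format`.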
+
+## Speech-to-Text Streaming: Real-time Audio Transcription
+
+Stream audio transcription results as they're processed. Get immediate text output for real-time applications or long audio files.
+
+```bash
+curl --location 'http://localhost:8080/v1/audio/transcriptions' \
+--form 'file=@"/path/to/your/audio.mp3"' \
+--form 'model="openai/gpt-4o-transcribe"' \
+--form 'stream="true"' \
+--form 'response_format="json"'
+```
+
+**Response Format:**
+```
+data: {"text":"Hello"}
+
+data: {"text":" this"}
+
+data: {"text":" is"}
+
+data: {"text":" a sample"}
+
+data: [DONE]
+```
+
+**Additional options:** Add `--form 'language="en"'` or `--form 'prompt="context hint"'` for better accuracy.
+
+## Audio Format Support
+
+**Speech Synthesis:** Supports `"response_format": "mp3"` (default) and `"response_format": "wav"`
+
+**Transcription Input:** Accepts MP3, WAV, M4A, and other common audio formats
+
+> **Note:** Streaming capabilities vary by provider and model. Check each provider's documentation for specific streaming support and limitations.
+
+## Next Steps
+
+Now that you understand streaming responses, explore these related topics:
+
+### Essential Topics
+
+- **[Tool Calling](./tool-calling)** - Enable AI models to use external tools and functions
+- **[Multimodal AI](./multimodal)** - Process images, audio, and multimedia content
+- **[Provider Configuration](./provider-configuration)** - Multiple providers for redundancy
+- **[Integrations](./integrations)** - Drop-in compatibility with existing SDKs
+
+### Advanced Topics
+
+- **[Core Features](../../features/)** - Advanced Bifrost capabilities
+- **[Architecture](../../architecture/)** - How Bifrost works internally
+- **[Deployment](../../deployment-guides)** - Production setup and scaling
--- a/docs/quickstart/gateway/tool-calling.mdx
+++ b/docs/quickstart/gateway/tool-calling.mdx
@@ -0,0 +1,165 @@
+---
+title: "Tool Calling"
+description: "Enable AI models to use external functions and services by defining tool schemas or connecting to Model Context Protocol (MCP) servers. This allows AI to interact with databases, APIs, file systems, and more."
+icon: "wrench"
+---
+
+## Function Calling with Custom Tools
+
+Enable AI models to use external functions by defining tool schemas in the OpenAI format. Models can then call these functions automatically based on user requests.
+
+```bash
+curl --location 'http://localhost:8080/v1/chat/completions' \
+--header 'Content-Type: application/json' \
+--data '{
+    "model": "openai/gpt-4o-mini",
+    "messages": [
+        {"role": "user", "content": "What is 15 + 27? Use the calculator tool."}
+    ],
+    "tools": [
+        {
+            "type": "function",
+            "function": {
+                "name": "calculator",
+                "description": "A calculator tool for basic arithmetic operations",
+                "parameters": {
+                    "type": "object",
+                    "properties": {
+                        "operation": {
+                            "type": "string",
+                            "description": "The operation to perform",
+                            "enum": ["add", "subtract", "multiply", "divide"]
+                        },
+                        "a": {
+                            "type": "number",
+                            "description": "The first number"
+                        },
+                        "b": {
+                            "type": "number",
+                            "description": "The second number"
+                        }
+                    },
+                    "required": ["operation", "a", "b"]
+                }
+            }
+        }
+    ],
+    "tool_choice": "auto"
+}'
+```
+
+**Response includes tool calls:**
+```json
+{
+    "choices": [{
+        "message": {
+            "role": "assistant", 
+            "tool_calls": [{
+                "id": "call_abc123",
+                "type": "function",
+                "function": {
+                    "name": "calculator",
+                    "arguments": "{\"operation\":\"add\",\"a\":15,\"b\":27}"
+                }
+            }]
+        }
+    }]
+}
+```
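+
+The model only returns the call; your application runs the function and sends the result back. The sketch below parses the `arguments` string from the response above, runs a local calculator, and builds the follow-up `tool` message in the OpenAI message format. `calculator` and `run_tool_calls` are hypothetical helpers written for this example.
+
+```python
+import json
+import operator
+
+# Map the operation names from the tool schema to Python functions.
+OPERATIONS = {
+    "add": operator.add,
+    "subtract": operator.sub,
+    "multiply": operator.mul,
+    "divide": operator.truediv,
+}
+
+
+def calculator(operation, a, b):
+    return OPERATIONS[operation](a, b)
+
+
+def run_tool_calls(message):
+    """Execute each calculator tool call and return the tool messages to send back."""
+    results = []
+    for call in message.get("tool_calls", []):
+        function = call["function"]
+        if function["name"] != "calculator":
+            raise ValueError("unknown tool: " + function["name"])
+        args = json.loads(function["arguments"])  # arguments arrive as a JSON string
+        results.append({
+            "role": "tool",
+            "tool_call_id": call["id"],
+            "content": json.dumps(calculator(**args)),
+        })
+    return results
+
+
+response = {
+    "choices": [{
+        "message": {
+            "role": "assistant",
+            "tool_calls": [{
+                "id": "call_abc123",
+                "type": "function",
+                "function": {
+                    "name": "calculator",
+                    "arguments": '{"operation":"add","a":15,"b":27}',
+                },
+            }],
+        }
+    }]
+}
+print(run_tool_calls(response["choices"][0]["message"]))
+```
+
+To get the final answer, append the assistant message and the returned tool messages to `messages` and send the request again.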
+
+## Connecting to MCP Servers
+
+Connect to Model Context Protocol (MCP) servers to give AI models access to external tools and services without manually defining each function.
+
+<Tabs group="tool-calling">
+<Tab title="Using Web UI">
+![MCP Configuration Interface](../../media/ui-mcp-config.png)
+
+1. Go to **http://localhost:8080**
+2. Navigate to **"MCP Clients"** in the sidebar
+3. Click **"Add MCP Client"**
+4. Enter server details and save
+</Tab>
+
+<Tab title="Using API">
+```bash
+curl --location 'http://localhost:8080/api/mcp/client' \
+--header 'Content-Type: application/json' \
+--data '{
+    "name": "filesystem",
+    "connection_type": "stdio",
+    "stdio_config": {
+        "command": "npx",
+        "args": ["@modelcontextprotocol/server-filesystem", "/tmp"]
+    }
+}'
+```
+
+**List configured MCP clients:**
+```bash
+curl --location 'http://localhost:8080/api/mcp/clients'
+```
+</Tab>
+
+<Tab title="Using config.json">
+```json
+{
+    "mcp": {
+        "client_configs": [
+            {
+                "name": "filesystem",
+                "connection_type": "stdio",
+                "stdio_config": {
+                    "command": "npx",
+                    "args": ["@modelcontextprotocol/server-filesystem", "/tmp"]
+                }
+            },
+            {
+                "name": "youtube-search",
+                "connection_type": "http",
+                "connection_string": "http://your-youtube-mcp-url"
+            }
+        ]
+    }
+}
+```
+</Tab>
+
+</Tabs>
+
+Read more about MCP connections and advanced end-to-end tool execution in the [MCP Features](../../mcp/overview) section.
+
+## Tool Choice Options
+
+Control how the AI uses tools:
+
+```bash
+# Force use of specific tool
+"tool_choice": {
+    "type": "function",
+    "function": {"name": "calculator"}
+}
+
+# Let AI decide automatically (default)
+"tool_choice": "auto"
+
+# Disable tool usage
+"tool_choice": "none"
+```
+
+## Next Steps
+
+Now that you understand tool calling, explore these related topics:
+
+### Essential Topics
+
+- **[Multimodal AI](./multimodal)** - Process images, audio, and multimedia content
+- **[Streaming Responses](./streaming)** - Real-time response generation with tool calls
+- **[Provider Configuration](./provider-configuration)** - Multiple providers for redundancy
+- **[Integrations](./integrations)** - Drop-in compatibility with existing SDKs
+
+### Advanced Topics
+
+- **[MCP Features](../../mcp/overview)** - Advanced MCP server management and configuration
+- **[Core Features](../../features/drop-in-replacement)** - Advanced Bifrost capabilities
+- **[Architecture](../../architecture/core/request-flow)** - How Bifrost works internally