first commit

2026-04-26 21:52:23 +03:00
commit 880f412e2c
2662 changed files with 866266 additions and 0 deletions
--- a/docs/integrations/openai-sdk/files-and-batch.mdx
+++ b/docs/integrations/openai-sdk/files-and-batch.mdx
@@ -0,0 +1,669 @@
+---
+title: "Files and Batch API"
+description: "Upload files and create batch jobs for asynchronous processing using the OpenAI SDK through Bifrost across multiple providers."
+tag: "Beta"
+icon: "folder-open"
+---
+
+## Overview
+
+Bifrost supports the OpenAI Files API and Batch API with **cross-provider routing**. This means you can use the familiar OpenAI SDK to manage files and batch jobs across multiple providers including OpenAI, Anthropic, Bedrock, and Gemini.
+
+The provider is specified per request: pass it in `extra_body` for POST requests (upload, create, cancel) or in `extra_query` for GET and DELETE requests (list, retrieve, download, delete).
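+
+As a minimal sketch of that rule, the helper below picks the keyword argument from the HTTP method. `provider_kwargs` is a hypothetical name (not part of Bifrost or the OpenAI SDK); it follows the examples on this page, where POST calls use `extra_body` and all other calls, including delete, use `extra_query`:
+
+```python
+def provider_kwargs(http_method, provider, **extra):
+    """Return the SDK keyword argument that carries the provider (hypothetical helper)."""
+    payload = {"provider": provider, **extra}
+    if http_method.upper() == "POST":
+        # POST requests (upload, create, cancel) carry the provider in the body
+        return {"extra_body": payload}
+    # GET and DELETE requests carry the provider in the query string
+    return {"extra_query": payload}
+
+
+list_kwargs = provider_kwargs("GET", "gemini")
+create_kwargs = provider_kwargs("POST", "gemini", model="gemini-1.5-flash")
+```
+
+The result can be unpacked into any SDK call, for example `client.files.list(**list_kwargs)`.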
+
+---
+
+## Client Setup
+
+The base client setup is the same for all providers. The provider is specified per-request:
+
+```python
+from openai import OpenAI
+
+client = OpenAI(
+    base_url="http://localhost:8080/openai",
+    api_key="your-api-key"  # Your actual API key
+)
+```
+
+---
+
+## Files API
+
+### Upload a File
+
+<Note>
+**Bedrock** requires S3 storage configuration. OpenAI and Gemini use their native file storage. Anthropic uses inline requests (no file upload).
+</Note>
+
+<Tabs group="provider">
+<Tab title="OpenAI Provider">
+
+```python
+from openai import OpenAI
+
+client = OpenAI(
+    base_url="http://localhost:8080/openai",
+    api_key="your-openai-api-key"
+)
+
+# Create JSONL content for OpenAI batch format
+jsonl_content = '''{"custom_id": "request-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 100}}
+{"custom_id": "request-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "How are you?"}], "max_tokens": 100}}'''
+
+# Upload file (uses OpenAI's native file storage)
+response = client.files.create(
+    file=("batch_input.jsonl", jsonl_content.encode(), "application/jsonl"),
+    purpose="batch",
+    extra_body={"provider": "openai"},
+)
+
+print(f"Uploaded file ID: {response.id}")
+```
+
+</Tab>
+<Tab title="Bedrock Provider">
+
+For Bedrock, you need to provide S3 storage configuration:
+
+```python
+from openai import OpenAI
+
+client = OpenAI(
+    base_url="http://localhost:8080/openai",
+    api_key="your-api-key"
+)
+
+# Create JSONL content using OpenAI-style format (Bifrost converts to Bedrock format internally)
+jsonl_content = '''{"custom_id": "request-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "anthropic.claude-3-sonnet-20240229-v1:0", "messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 100}}
+{"custom_id": "request-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "anthropic.claude-3-sonnet-20240229-v1:0", "messages": [{"role": "user", "content": "How are you?"}], "max_tokens": 100}}'''
+
+# Upload file with S3 storage configuration
+response = client.files.create(
+    file=("batch_input.jsonl", jsonl_content.encode(), "application/jsonl"),
+    purpose="batch",
+    extra_body={
+        "provider": "bedrock",
+        "storage_config": {
+            "s3": {
+                "bucket": "your-s3-bucket",
+                "region": "us-west-2",
+                "prefix": "bifrost-batch-output",
+            },
+        },
+    },
+)
+
+print(f"Uploaded file ID: {response.id}")
+```
+
+</Tab>
+<Tab title="Anthropic Provider">
+
+Anthropic uses inline requests for batching (no file upload needed). See the Batch API section below.
+
+</Tab>
+<Tab title="Gemini Provider">
+
+```python
+from openai import OpenAI
+
+client = OpenAI(
+    base_url="http://localhost:8080/openai",
+    api_key="your-api-key"
+)
+
+# Create JSONL content using OpenAI-style format (Bifrost converts to Gemini format internally)
+jsonl_content = '''{"custom_id": "request-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gemini-1.5-flash", "messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 100}}
+{"custom_id": "request-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gemini-1.5-flash", "messages": [{"role": "user", "content": "How are you?"}], "max_tokens": 100}}'''
+
+# Upload file (uses Gemini's native file storage)
+response = client.files.create(
+    file=("batch_input.jsonl", jsonl_content.encode(), "application/jsonl"),
+    purpose="batch",
+    extra_body={"provider": "gemini"},
+)
+
+print(f"Uploaded file ID: {response.id}")
+```
+
+</Tab>
+</Tabs>
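+
+The upload examples above hand-write the JSONL payload as a string literal. As a minimal sketch, the same OpenAI-style lines can be generated with the standard `json` module. `build_batch_jsonl` is a hypothetical helper (not part of Bifrost or the OpenAI SDK), and the `request-N` ID scheme is an assumption copied from the examples:
+
+```python
+import json
+
+
+def build_batch_jsonl(model, prompts, max_tokens=100):
+    """Return OpenAI-style batch JSONL, one request per line (hypothetical helper)."""
+    lines = []
+    for index, prompt in enumerate(prompts, start=1):
+        request = {
+            "custom_id": f"request-{index}",  # distinct ID per request
+            "method": "POST",
+            "url": "/v1/chat/completions",
+            "body": {
+                "model": model,
+                "messages": [{"role": "user", "content": prompt}],
+                "max_tokens": max_tokens,
+            },
+        }
+        # json.dumps handles quoting and escaping inside the prompt text
+        lines.append(json.dumps(request))
+    return "\n".join(lines)
+
+
+jsonl_content = build_batch_jsonl("gpt-4o-mini", ["Hello!", "How are you?"])
+```
+
+The resulting string can be passed to `client.files.create` as in the tabs above, for example `file=("batch_input.jsonl", jsonl_content.encode(), "application/jsonl")`.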
+
+### List Files
+
+```python
+# List files for OpenAI or Gemini (no S3 config needed)
+response = client.files.list(
+    extra_query={"provider": "openai"}  # or "gemini"
+)
+
+for file in response.data:
+    print(f"File ID: {file.id}, Name: {file.filename}")
+
+# For Bedrock (requires S3 config)
+response = client.files.list(
+    extra_query={
+        "provider": "bedrock",
+        "storage_config": {
+            "s3": {
+                "bucket": "your-s3-bucket",
+                "region": "us-west-2",
+                "prefix": "bifrost-batch-output",
+            },
+        },
+    }
+)
+```
+
+### Retrieve File Metadata
+
+```python
+# Retrieve file metadata (specify provider)
+file_id = "file-abc123"
+response = client.files.retrieve(
+    file_id,
+    extra_query={"provider": "bedrock"}  # or "openai", "gemini"
+)
+
+print(f"File ID: {response.id}")
+print(f"Filename: {response.filename}")
+print(f"Purpose: {response.purpose}")
+print(f"Bytes: {response.bytes}")
+```
+
+### Delete a File
+
+```python
+# Delete file (specify provider)
+file_id = "file-abc123"
+response = client.files.delete(
+    file_id,
+    extra_query={"provider": "bedrock"}  # or "openai", "gemini"
+)
+
+print(f"Deleted: {response.deleted}")
+```
+
+### Download File Content
+
+```python
+# Download file content (specify provider)
+file_id = "file-abc123"
+response = client.files.content(
+    file_id,
+    extra_query={"provider": "bedrock"}  # or "openai", "gemini"
+)
+
+# Handle different response types
+if hasattr(response, "read"):
+    content = response.read()
+elif hasattr(response, "content"):
+    content = response.content
+else:
+    content = response
+
+# Decode bytes to string if needed
+if isinstance(content, bytes):
+    content = content.decode("utf-8")
+
+print(f"File content:\n{content}")
+```
+
+---
+
+## Batch API
+
+### Create a Batch
+
+<Tabs group="provider">
+<Tab title="OpenAI Provider">
+
+For native OpenAI batching:
+
+```python
+from openai import OpenAI
+
+client = OpenAI(
+    base_url="http://localhost:8080/openai",
+    api_key="your-openai-api-key"
+)
+
+# First upload a file (see Files API section)
+# Then create batch using the file ID
+
+batch = client.batches.create(
+    input_file_id="file-abc123",
+    endpoint="/v1/chat/completions",
+    completion_window="24h",
+    extra_body={"provider": "openai"},
+)
+
+print(f"Batch ID: {batch.id}")
+print(f"Status: {batch.status}")
+```
+
+</Tab>
+<Tab title="Bedrock Provider">
+
+For Bedrock, you need to provide an output S3 URI:
+
+```python
+from openai import OpenAI
+
+client = OpenAI(
+    base_url="http://localhost:8080/openai",
+    api_key="your-api-key"
+)
+
+# First upload a file with S3 config (see Files API section)
+# Then create batch using the file ID
+
+batch = client.batches.create(
+    input_file_id="file-abc123",
+    endpoint="/v1/chat/completions",
+    completion_window="24h",
+    extra_body={
+        "provider": "bedrock",
+        "model": "anthropic.claude-3-sonnet-20240229-v1:0",
+        "output_s3_uri": "s3://your-bucket/batch-output",
+    },
+)
+
+print(f"Batch ID: {batch.id}")
+print(f"Status: {batch.status}")
+```
+
+</Tab>
+<Tab title="Anthropic Provider">
+
+Anthropic supports inline requests (no file upload required):
+
+```python
+from openai import OpenAI
+
+client = OpenAI(
+    base_url="http://localhost:8080/openai",
+    api_key="your-anthropic-api-key"
+)
+
+# Create inline requests for Anthropic
+requests = [
+    {
+        "custom_id": "request-1",
+        "params": {
+            "model": "claude-3-sonnet-20240229",
+            "max_tokens": 100,
+            "messages": [{"role": "user", "content": "Hello!"}]
+        }
+    },
+    {
+        "custom_id": "request-2",
+        "params": {
+            "model": "claude-3-sonnet-20240229",
+            "max_tokens": 100,
+            "messages": [{"role": "user", "content": "How are you?"}]
+        }
+    }
+]
+
+# Create batch with inline requests (no file ID needed)
+batch = client.batches.create(
+    input_file_id="",  # Empty for inline requests
+    endpoint="/v1/chat/completions",
+    completion_window="24h",
+    extra_body={
+        "provider": "anthropic",
+        "requests": requests,
+    },
+)
+
+print(f"Batch ID: {batch.id}")
+print(f"Status: {batch.status}")
+```
+
+</Tab>
+<Tab title="Gemini Provider">
+
+```python
+from openai import OpenAI
+
+client = OpenAI(
+    base_url="http://localhost:8080/openai",
+    api_key="your-api-key"
+)
+
+# First upload a file (see Files API section; Bifrost converts the OpenAI-style JSONL to Gemini format)
+# Then create batch using the file ID
+
+batch = client.batches.create(
+    input_file_id="file-abc123",
+    endpoint="/v1/chat/completions",
+    completion_window="24h",
+    extra_body={
+        "provider": "gemini",
+        "model": "gemini-1.5-flash",
+    },
+)
+
+print(f"Batch ID: {batch.id}")
+print(f"Status: {batch.status}")
+```
+
+</Tab>
+</Tabs>
+
+### List Batches
+
+```python
+# List batches (specify provider)
+response = client.batches.list(
+    limit=10,
+    extra_query={
+        "provider": "bedrock",  # or "openai", "anthropic", "gemini"
+        "model": "anthropic.claude-3-sonnet-20240229-v1:0",  # Required for bedrock
+    }
+)
+
+for batch in response.data:
+    print(f"Batch ID: {batch.id}, Status: {batch.status}")
+```
+
+### Retrieve Batch Status
+
+```python
+# Retrieve batch status (specify provider)
+batch_id = "batch-abc123"
+batch = client.batches.retrieve(
+    batch_id,
+    extra_query={"provider": "bedrock"}  # or "openai", "anthropic", "gemini"
+)
+
+print(f"Batch ID: {batch.id}")
+print(f"Status: {batch.status}")
+
+if batch.request_counts:
+    print(f"Total: {batch.request_counts.total}")
+    print(f"Completed: {batch.request_counts.completed}")
+    print(f"Failed: {batch.request_counts.failed}")
+```
+
+### Cancel a Batch
+
+```python
+# Cancel batch (specify provider)
+batch_id = "batch-abc123"
+batch = client.batches.cancel(
+    batch_id,
+    extra_body={"provider": "bedrock"}  # or "openai", "anthropic", "gemini"
+)
+
+print(f"Batch ID: {batch.id}")
+print(f"Status: {batch.status}")  # "cancelling" or "cancelled"
+```
+
+---
+
+## End-to-End Workflows
+
+### OpenAI Batch Workflow
+
+```python
+import time
+from openai import OpenAI
+
+client = OpenAI(
+    base_url="http://localhost:8080/openai",
+    api_key="your-openai-api-key"
+)
+
+# Configuration
+provider = "openai"
+
+# Step 1: Create OpenAI JSONL content
+jsonl_content = '''{"custom_id": "req-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "What is 2+2?"}], "max_tokens": 100}}
+{"custom_id": "req-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "What is the capital of France?"}], "max_tokens": 100}}'''
+
+# Step 2: Upload file (uses OpenAI's native file storage)
+print("Step 2: Uploading batch input file...")
+uploaded_file = client.files.create(
+    file=("batch_e2e.jsonl", jsonl_content.encode(), "application/jsonl"),
+    purpose="batch",
+    extra_body={"provider": provider},
+)
+print(f"  Uploaded file: {uploaded_file.id}")
+
+# Step 3: Create batch
+print("Step 3: Creating batch job...")
+batch = client.batches.create(
+    input_file_id=uploaded_file.id,
+    endpoint="/v1/chat/completions",
+    completion_window="24h",
+    extra_body={"provider": provider},
+)
+print(f"  Created batch: {batch.id}, status: {batch.status}")
+
+# Step 4: Poll for completion
+print("Step 4: Polling batch status...")
+for i in range(10):
+    batch = client.batches.retrieve(batch.id, extra_query={"provider": provider})
+    print(f"  Poll {i+1}: status = {batch.status}")
+
+    if batch.status in ["completed", "failed", "expired", "cancelled"]:
+        break
+
+    if batch.request_counts:
+        print(f"    Completed: {batch.request_counts.completed}/{batch.request_counts.total}")
+
+    time.sleep(5)
+
+print(f"\nPolling finished. Batch {batch.id} last observed status: {batch.status}")
+```
+
+### Bedrock Batch Workflow
+
+```python
+import time
+from openai import OpenAI
+
+client = OpenAI(
+    base_url="http://localhost:8080/openai",
+    api_key="your-api-key"
+)
+
+# Configuration
+provider = "bedrock"
+s3_bucket = "your-s3-bucket"
+s3_region = "us-west-2"
+model = "anthropic.claude-3-sonnet-20240229-v1:0"
+
+# Step 1: Create JSONL content using OpenAI-style format (Bifrost converts to Bedrock format internally)
+jsonl_content = '''{"custom_id": "req-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "anthropic.claude-3-sonnet-20240229-v1:0", "messages": [{"role": "user", "content": "What is 2+2?"}], "max_tokens": 100}}
+{"custom_id": "req-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "anthropic.claude-3-sonnet-20240229-v1:0", "messages": [{"role": "user", "content": "What is the capital of France?"}], "max_tokens": 100}}'''
+
+# Step 2: Upload file
+print("Step 2: Uploading batch input file...")
+uploaded_file = client.files.create(
+    file=("batch_e2e.jsonl", jsonl_content.encode(), "application/jsonl"),
+    purpose="batch",
+    extra_body={
+        "provider": provider,
+        "storage_config": {
+            "s3": {"bucket": s3_bucket, "region": s3_region, "prefix": "batch-input"},
+        },
+    },
+)
+print(f"  Uploaded file: {uploaded_file.id}")
+
+# Step 3: Create batch
+print("Step 3: Creating batch job...")
+batch = client.batches.create(
+    input_file_id=uploaded_file.id,
+    endpoint="/v1/chat/completions",
+    completion_window="24h",
+    extra_body={
+        "provider": provider,
+        "model": model,
+        "output_s3_uri": f"s3://{s3_bucket}/batch-output",
+    },
+)
+print(f"  Created batch: {batch.id}, status: {batch.status}")
+
+# Step 4: Poll for completion
+print("Step 4: Polling batch status...")
+for i in range(10):
+    batch = client.batches.retrieve(batch.id, extra_query={"provider": provider})
+    print(f"  Poll {i+1}: status = {batch.status}")
+    
+    if batch.status in ["completed", "failed", "expired", "cancelled"]:
+        break
+    
+    if batch.request_counts:
+        print(f"    Completed: {batch.request_counts.completed}/{batch.request_counts.total}")
+    
+    time.sleep(5)
+
+print(f"\nPolling finished. Batch {batch.id} last observed status: {batch.status}")
+```
+
+### Anthropic Inline Batch Workflow
+
+```python
+import time
+from openai import OpenAI
+
+client = OpenAI(
+    base_url="http://localhost:8080/openai",
+    api_key="your-anthropic-api-key"
+)
+
+provider = "anthropic"
+
+# Step 1: Create inline requests
+print("Step 1: Creating inline requests...")
+requests = [
+    {
+        "custom_id": "math-question",
+        "params": {
+            "model": "claude-3-sonnet-20240229",
+            "max_tokens": 100,
+            "messages": [{"role": "user", "content": "What is 15 * 7?"}]
+        }
+    },
+    {
+        "custom_id": "geography-question",
+        "params": {
+            "model": "claude-3-sonnet-20240229",
+            "max_tokens": 100,
+            "messages": [{"role": "user", "content": "What is the largest ocean?"}]
+        }
+    }
+]
+print(f"  Created {len(requests)} inline requests")
+
+# Step 2: Create batch
+print("Step 2: Creating batch job...")
+batch = client.batches.create(
+    input_file_id="",
+    endpoint="/v1/chat/completions",
+    completion_window="24h",
+    extra_body={"provider": provider, "requests": requests},
+)
+print(f"  Created batch: {batch.id}, status: {batch.status}")
+
+# Step 3: Poll for completion
+print("Step 3: Polling batch status...")
+for i in range(10):
+    batch = client.batches.retrieve(batch.id, extra_query={"provider": provider})
+    print(f"  Poll {i+1}: status = {batch.status}")
+    
+    if batch.status in ["completed", "failed", "expired", "cancelled", "ended"]:
+        break
+    
+    time.sleep(5)
+
+print(f"\nPolling finished. Batch {batch.id} last observed status: {batch.status}")
+```
+
+### Gemini Batch Workflow
+
+```python
+import time
+from openai import OpenAI
+
+client = OpenAI(
+    base_url="http://localhost:8080/openai",
+    api_key="your-api-key"
+)
+
+# Configuration
+provider = "gemini"
+model = "gemini-1.5-flash"
+
+# Step 1: Create JSONL content using OpenAI-style format (Bifrost converts to Gemini format internally)
+jsonl_content = '''{"custom_id": "req-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gemini-1.5-flash", "messages": [{"role": "user", "content": "What is 2+2?"}], "max_tokens": 100}}
+{"custom_id": "req-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gemini-1.5-flash", "messages": [{"role": "user", "content": "What is the capital of France?"}], "max_tokens": 100}}'''
+
+# Step 2: Upload file (uses Gemini's native file storage)
+print("Step 2: Uploading batch input file...")
+uploaded_file = client.files.create(
+    file=("batch_e2e.jsonl", jsonl_content.encode(), "application/jsonl"),
+    purpose="batch",
+    extra_body={"provider": provider},
+)
+print(f"  Uploaded file: {uploaded_file.id}")
+
+# Step 3: Create batch
+print("Step 3: Creating batch job...")
+batch = client.batches.create(
+    input_file_id=uploaded_file.id,
+    endpoint="/v1/chat/completions",
+    completion_window="24h",
+    extra_body={
+        "provider": provider,
+        "model": model,
+    },
+)
+print(f"  Created batch: {batch.id}, status: {batch.status}")
+
+# Step 4: Poll for completion
+print("Step 4: Polling batch status...")
+for i in range(10):
+    batch = client.batches.retrieve(batch.id, extra_query={"provider": provider})
+    print(f"  Poll {i+1}: status = {batch.status}")
+
+    if batch.status in ["completed", "failed", "expired", "cancelled"]:
+        break
+
+    if batch.request_counts:
+        print(f"    Completed: {batch.request_counts.completed}/{batch.request_counts.total}")
+
+    time.sleep(5)
+
+print(f"\nPolling finished. Batch {batch.id} last observed status: {batch.status}")
+```
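+
+The polling loops in the workflows above stop after ten attempts whether or not the batch has finished. A minimal sketch of a reusable alternative follows. `wait_for_batch` is a hypothetical helper, the default interval and timeout are arbitrary assumptions, and the terminal statuses are the ones used in the workflows on this page (including Anthropic's `ended`):
+
+```python
+import time
+
+# Statuses treated as terminal in the workflows above
+TERMINAL_STATUSES = {"completed", "failed", "expired", "cancelled", "ended"}
+
+
+def wait_for_batch(retrieve, poll_interval=5.0, timeout=600.0, sleep=time.sleep):
+    """Call retrieve() until the batch is terminal or the timeout elapses.
+
+    retrieve is a zero-argument callable returning an object with a
+    .status attribute, such as a wrapped client.batches.retrieve call.
+    """
+    deadline = time.monotonic() + timeout
+    while True:
+        batch = retrieve()
+        if batch.status in TERMINAL_STATUSES:
+            return batch
+        if time.monotonic() >= deadline:
+            raise TimeoutError(f"batch still {batch.status!r} after {timeout}s")
+        sleep(poll_interval)
+```
+
+With a real client this could be called as `wait_for_batch(lambda: client.batches.retrieve(batch.id, extra_query={"provider": provider}))`. A terminal status is not necessarily a success, so check `batch.status == "completed"` on the returned object before reading results.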
+
+---
+
+## Provider-Specific Notes
+
+| Provider | File Upload | Batch Creation | Extra Configuration |
+|----------|-------------|----------------|---------------------|
+| **OpenAI** | ✅ Native storage | ✅ File-based | None |
+| **Bedrock** | ✅ S3-based | ✅ File-based | `storage_config`, `output_s3_uri`, `model` |
+| **Anthropic** | ❌ Not supported | ✅ Inline requests | `requests` array in `extra_body` |
+| **Gemini** | ✅ Native storage | ✅ File-based | `model` in `extra_body` |
+
+<Note>
+- **OpenAI** and **Gemini** use their native file storage - no S3 configuration needed
+- **Bedrock** requires S3 storage configuration (`storage_config`, `output_s3_uri`)
+- **Anthropic** does not support file-based batch operations - use inline requests instead
+</Note>
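+
+To catch a missing field before the request is sent, the batch-creation requirements can be encoded as a small lookup. This is a sketch only: `REQUIRED_BATCH_EXTRAS` and `missing_batch_extras` are hypothetical names, and the required keys are inferred from the batch-creation examples on this page rather than from a Bifrost schema:
+
+```python
+# extra_body keys used by each provider's "Create a Batch" example above
+REQUIRED_BATCH_EXTRAS = {
+    "openai": set(),
+    "bedrock": {"model", "output_s3_uri"},
+    "anthropic": {"requests"},
+    "gemini": {"model"},
+}
+
+
+def missing_batch_extras(extra_body):
+    """Return the sorted list of keys the provider needs but extra_body lacks."""
+    provider = extra_body.get("provider")
+    if provider not in REQUIRED_BATCH_EXTRAS:
+        raise ValueError(f"unknown provider: {provider!r}")
+    return sorted(REQUIRED_BATCH_EXTRAS[provider] - set(extra_body))
+
+
+missing = missing_batch_extras({"provider": "bedrock", "model": "example-model"})
+```
+
+Here `missing` lists `output_s3_uri`, the one Bedrock field the example dictionary leaves out.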
+
+---
+
+## Next Steps
+
+- **[Overview](./overview)** - OpenAI SDK integration basics
+- **[Configuration](../../quickstart/gateway/provider-configuration)** - Bifrost setup and configuration
+- **[Core Features](../../features/)** - Governance, semantic caching, and more
--- a/docs/integrations/openai-sdk/overview.mdx
+++ b/docs/integrations/openai-sdk/overview.mdx
@@ -0,0 +1,563 @@
+---
+title: "Overview"
+description: "Use Bifrost as a drop-in replacement for OpenAI API with full compatibility and enhanced features."
+icon: "book"
+---
+
+## Overview
+
+Bifrost provides complete OpenAI API compatibility through protocol adaptation. The integration handles request transformation, response normalization, and error mapping between OpenAI's API specification and Bifrost's internal processing pipeline.
+
+This integration lets you use Bifrost features such as governance, load balancing, semantic caching, and multi-provider support while preserving your existing OpenAI SDK-based architecture.
+
+**Endpoint:** `/openai`
+
+---
+
+## Setup
+
+<Tabs group="openai-sdk">
+<Tab title="Python">
+
+```python {5}
+import openai
+
+# Configure client to use Bifrost
+client = openai.OpenAI(
+    base_url="http://localhost:8080/openai",
+    api_key="dummy-key"  # Keys handled by Bifrost
+)
+
+# Make requests as usual
+response = client.chat.completions.create(
+    model="gpt-4o-mini",
+    messages=[{"role": "user", "content": "Hello!"}]
+)
+
+print(response.choices[0].message.content)
+```
+
+</Tab>
+<Tab title="JavaScript">
+
+```javascript {5}
+import OpenAI from "openai";
+
+// Configure client to use Bifrost
+const openai = new OpenAI({
+  baseURL: "http://localhost:8080/openai",
+  apiKey: "dummy-key", // Keys handled by Bifrost
+});
+
+// Make requests as usual
+const response = await openai.chat.completions.create({
+  model: "gpt-4o-mini",
+  messages: [{ role: "user", content: "Hello!" }],
+});
+
+console.log(response.choices[0].message.content);
+```
+
+</Tab>
+</Tabs>
+
+---
+
+## Provider/Model Usage Examples
+
+Use multiple providers through the same OpenAI SDK format by prefixing model names with the provider:
+
+<Tabs group="openai-sdk">
+<Tab title="Python">
+
+```python
+import openai
+
+client = openai.OpenAI(
+    base_url="http://localhost:8080/openai",
+    api_key="dummy-key"
+)
+
+# OpenAI models (default)
+openai_response = client.chat.completions.create(
+    model="gpt-4o-mini",
+    messages=[{"role": "user", "content": "Hello from OpenAI!"}]
+)
+
+# Anthropic models via OpenAI SDK format
+anthropic_response = client.chat.completions.create(
+    model="anthropic/claude-3-sonnet-20240229",
+    messages=[{"role": "user", "content": "Hello from Claude!"}]
+)
+
+# Google Vertex models via OpenAI SDK format
+vertex_response = client.chat.completions.create(
+    model="vertex/gemini-pro",
+    messages=[{"role": "user", "content": "Hello from Gemini!"}]
+)
+
+# Azure models
+azure_response = client.chat.completions.create(
+    model="azure/gpt-4o",
+    messages=[{"role": "user", "content": "Hello from Azure!"}]
+)
+
+# Local Ollama models
+ollama_response = client.chat.completions.create(
+    model="ollama/llama3.1:8b",
+    messages=[{"role": "user", "content": "Hello from Ollama!"}]
+)
+```
+
+</Tab>
+<Tab title="JavaScript">
+
+```javascript
+import OpenAI from "openai";
+
+const openai = new OpenAI({
+  baseURL: "http://localhost:8080/openai",
+  apiKey: "dummy-key",
+});
+
+// OpenAI models (default)
+const openaiResponse = await openai.chat.completions.create({
+  model: "gpt-4o-mini",
+  messages: [{ role: "user", content: "Hello from OpenAI!" }],
+});
+
+// Anthropic models via OpenAI SDK format
+const anthropicResponse = await openai.chat.completions.create({
+  model: "anthropic/claude-3-sonnet-20240229",
+  messages: [{ role: "user", content: "Hello from Claude!" }],
+});
+
+// Google Vertex models via OpenAI SDK format
+const vertexResponse = await openai.chat.completions.create({
+  model: "vertex/gemini-pro",
+  messages: [{ role: "user", content: "Hello from Gemini!" }],
+});
+
+// Azure models
+const azureResponse = await openai.chat.completions.create({
+  model: "azure/gpt-4o",
+  messages: [{ role: "user", content: "Hello from Azure!" }],
+});
+
+// Local Ollama models
+const ollamaResponse = await openai.chat.completions.create({
+  model: "ollama/llama3.1:8b",
+  messages: [{ role: "user", content: "Hello from Ollama!" }],
+});
+```
+
+</Tab>
+</Tabs>
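+
+The naming convention in these examples is `provider/model`, with a bare model name treated as OpenAI. The sketch below only illustrates that convention; `split_model` is a hypothetical helper and is not Bifrost's actual routing logic:
+
+```python
+def split_model(model, default_provider="openai"):
+    """Split a 'provider/model' string; bare names fall back to the default provider."""
+    provider, separator, name = model.partition("/")
+    if not separator:
+        # No prefix present, e.g. "gpt-4o-mini"
+        return default_provider, model
+    return provider, name
+
+
+pairs = [split_model(m) for m in ("gpt-4o-mini", "anthropic/claude-3-sonnet-20240229", "ollama/llama3.1:8b")]
+```
+
+Only the first `/` is treated as the separator, so the colon in `llama3.1:8b` stays part of the model name.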
+
+---
+
+## Adding Custom Headers
+
+Pass custom headers required by Bifrost plugins (like governance, telemetry, etc.):
+
+<Tabs group="openai-sdk">
+<Tab title="Python">
+
+```python
+import openai
+
+client = openai.OpenAI(
+    base_url="http://localhost:8080/openai",
+    api_key="dummy-key",
+    default_headers={
+        "x-bf-vk": "vk_12345",  # Virtual key for governance
+    }
+)
+
+response = client.chat.completions.create(
+    model="gpt-4o-mini",
+    messages=[{"role": "user", "content": "Hello with custom headers!"}]
+)
+```
+
+</Tab>
+<Tab title="JavaScript">
+
+```javascript
+import OpenAI from "openai";
+
+const openai = new OpenAI({
+  baseURL: "http://localhost:8080/openai",
+  apiKey: "dummy-key",
+  defaultHeaders: {
+    "x-bf-vk": "vk_12345", // Virtual key for governance
+  },
+});
+
+const response = await openai.chat.completions.create({
+  model: "gpt-4o-mini",
+  messages: [{ role: "user", content: "Hello with custom headers!" }],
+});
+```
+
+</Tab>
+</Tabs>
+
+---
+
+## Using Direct Keys
+
+Pass API keys directly in requests to bypass Bifrost's load balancing. You can pass any provider's API key (OpenAI, Anthropic, Mistral, etc.): Bifrost reads the key from the `Authorization` or `x-api-key` header, and the Gemini example below sends it in `x-goog-api-key`. This requires the **Allow Direct API keys** option to be enabled in Bifrost configuration.
+
+> **Learn more:** See [Key Management](../../features/keys-management#direct-key-bypass) for enabling direct API key usage.
+
+<Tabs group="openai-sdk">
+<Tab title="Python">
+
+```python
+import openai
+
+# Using OpenAI's API key directly
+client_with_direct_key = openai.OpenAI(
+    base_url="http://localhost:8080/openai",
+    api_key="sk-your-openai-key"  # OpenAI's API key works
+)
+
+openai_response = client_with_direct_key.chat.completions.create(
+    model="openai/gpt-4o-mini",
+    messages=[{"role": "user", "content": "Hello from GPT!"}]
+)
+
+# Or pass different provider keys per request
+client = openai.OpenAI(
+    base_url="http://localhost:8080/openai",
+    api_key="dummy-key"
+)
+
+# Use OpenAI key for GPT models
+openai_response = client.chat.completions.create(
+    model="gpt-4o-mini",
+    messages=[{"role": "user", "content": "Hello GPT!"}],
+    extra_headers={
+        "Authorization": "Bearer sk-your-openai-key"
+    }
+)
+
+# Use Anthropic key for Claude models
+anthropic_response = client.chat.completions.create(
+    model="anthropic/claude-3-sonnet-20240229",
+    messages=[{"role": "user", "content": "Hello Claude!"}],
+    extra_headers={
+        "x-api-key": "sk-ant-your-anthropic-key"
+    }
+)
+
+# Use Gemini key for Gemini models
+gemini_response = client.chat.completions.create(
+    model="gemini/gemini-2.5-flash",
+    messages=[{"role": "user", "content": "Hello Gemini!"}],
+    extra_headers={
+        "x-goog-api-key": "sk-gemini-your-gemini-key"
+    }
+)
+```
+
+</Tab>
+<Tab title="JavaScript">
+
+```javascript
+import OpenAI from "openai";
+
+// Using OpenAI's API key directly
+const openaiWithDirectKey = new OpenAI({
+  baseURL: "http://localhost:8080/openai",
+  apiKey: "sk-your-openai-key", // OpenAI's API key works
+});
+
+const openaiResponse = await openaiWithDirectKey.chat.completions.create({
+  model: "openai/gpt-4o-mini",
+  messages: [{ role: "user", content: "Hello from GPT!" }],
+});
+
+// Or pass different provider keys per request
+const openai = new OpenAI({
+  baseURL: "http://localhost:8080/openai",
+  apiKey: "dummy-key",
+});
+
+// Use OpenAI key for GPT models (per-request headers go in the second argument)
+const openaiResponseWithHeader = await openai.chat.completions.create(
+  {
+    model: "gpt-4o-mini",
+    messages: [{ role: "user", content: "Hello GPT!" }],
+  },
+  { headers: { Authorization: "Bearer sk-your-openai-key" } }
+);
+
+// Use Anthropic key for Claude models
+const anthropicResponseWithHeader = await openai.chat.completions.create(
+  {
+    model: "anthropic/claude-3-sonnet-20240229",
+    messages: [{ role: "user", content: "Hello Claude!" }],
+  },
+  { headers: { "x-api-key": "sk-ant-your-anthropic-key" } }
+);
+
+// Use Gemini key for Gemini models
+const geminiResponseWithHeader = await openai.chat.completions.create(
+  {
+    model: "gemini/gemini-2.5-flash",
+    messages: [{ role: "user", content: "Hello Gemini!" }],
+  },
+  { headers: { "x-goog-api-key": "sk-gemini-your-gemini-key" } }
+);
+```
+
+</Tab>
+</Tabs>
+
+For Azure, you can use the `AzureOpenAI` client and point it at the Bifrost integration endpoint. The `x-bf-azure-endpoint` header is required to specify your Azure resource endpoint.
+
+<Tabs group="openai-sdk">
+<Tab title="Python">
+
+```python
+from openai import AzureOpenAI
+
+azure_client = AzureOpenAI(
+    api_key="your-azure-api-key",
+    api_version="2024-02-01",
+    azure_endpoint="http://localhost:8080/openai",  # Point to Bifrost
+    default_headers={
+        "x-bf-azure-endpoint": "https://your-resource.openai.azure.com"
+    }
+)
+
+azure_response = azure_client.chat.completions.create(
+    model="gpt-4-deployment",  # Your deployment name
+    messages=[{"role": "user", "content": "Hello from Azure!"}]
+)
+
+print(azure_response.choices[0].message.content)
+```
+
+</Tab>
+<Tab title="JavaScript">
+
+```javascript
+import { AzureOpenAI } from "openai";
+
+const azureClient = new AzureOpenAI({
+  apiKey: "your-azure-api-key",
+  apiVersion: "2024-02-01",
+  baseURL: "http://localhost:8080/openai", // Point to Bifrost
+  defaultHeaders: {
+    "x-bf-azure-endpoint": "https://your-resource.openai.azure.com"
+  }
+});
+
+const azureResponse = await azureClient.chat.completions.create({
+  model: "gpt-4-deployment", // Your deployment name
+  messages: [{ role: "user", content: "Hello from Azure!" }],
+});
+
+console.log(azureResponse.choices[0].message.content);
+```
+
+</Tab>
+</Tabs>
+
+---
+
+## Async Inference
+
+Submit inference requests asynchronously and poll for results later using the `x-bf-async` header. This is useful for long-running requests where you don't want to hold a connection open. See [Async Inference](../../features/async-inference) for full details.
+
+<Note>
+Async inference requires a [Logs Store](../../features/observability/default) to be configured and is not compatible with streaming.
+</Note>
+
+### Chat Completions
+
+<Tabs group="openai-sdk">
+<Tab title="Python">
+
+```python
+import openai
+import time
+
+client = openai.OpenAI(
+    base_url="http://localhost:8080/openai",
+    api_key="dummy-key"
+)
+
+# Submit async request
+initial = client.chat.completions.create(
+    model="openai/gpt-4o-mini",
+    messages=[{"role": "user", "content": "Tell me a short story."}],
+    extra_headers={"x-bf-async": "true"}
+)
+
+# If choices are present, the request completed synchronously
+if initial.choices:
+    print(initial.choices[0].message.content)
+else:
+    # Poll until completed
+    while True:
+        time.sleep(2)
+        poll = client.chat.completions.create(
+            model="openai/gpt-4o-mini",
+            messages=[{"role": "user", "content": "Tell me a short story."}],
+            extra_headers={"x-bf-async-id": initial.id}
+        )
+        if poll.choices:
+            print(poll.choices[0].message.content)
+            break
+```
+
+</Tab>
+<Tab title="JavaScript">
+
+```javascript
+import OpenAI from "openai";
+
+const openai = new OpenAI({
+  baseURL: "http://localhost:8080/openai",
+  apiKey: "dummy-key",
+});
+
+// Submit async request
+const initial = await openai.chat.completions.create(
+  {
+    model: "openai/gpt-4o-mini",
+    messages: [{ role: "user", content: "Tell me a short story." }],
+  },
+  { headers: { "x-bf-async": "true" } }
+);
+
+// If choices are present, the request completed synchronously
+if (initial.choices?.length > 0) {
+  console.log(initial.choices[0].message.content);
+} else {
+  // Poll until completed
+  while (true) {
+    await new Promise((r) => setTimeout(r, 2000));
+    const poll = await openai.chat.completions.create(
+      {
+        model: "openai/gpt-4o-mini",
+        messages: [{ role: "user", content: "Tell me a short story." }],
+      },
+      { headers: { "x-bf-async-id": initial.id } }
+    );
+    if (poll.choices?.length > 0) {
+      console.log(poll.choices[0].message.content);
+      break;
+    }
+  }
+}
+```
+
+</Tab>
+</Tabs>
+
+### Responses API
+
+<Tabs group="openai-sdk">
+<Tab title="Python">
+
+```python
+import openai
+import time
+
+client = openai.OpenAI(
+    base_url="http://localhost:8080/openai",
+    api_key="dummy-key"
+)
+
+# Submit async request
+initial = client.responses.create(
+    model="openai/gpt-4o-mini",
+    input="Tell me a short story.",
+    extra_headers={"x-bf-async": "true"}
+)
+
+# If status is "completed", the request completed synchronously
+if initial.status == "completed":
+    print(initial.output_text)
+else:
+    # Poll until completed
+    while True:
+        time.sleep(2)
+        poll = client.responses.create(
+            model="openai/gpt-4o-mini",
+            input="Tell me a short story.",
+            extra_headers={"x-bf-async-id": initial.id}
+        )
+        if poll.status == "completed":
+            print(poll.output_text)
+            break
+```
+
+</Tab>
+<Tab title="JavaScript">
+
+```javascript
+import OpenAI from "openai";
+
+const openai = new OpenAI({
+  baseURL: "http://localhost:8080/openai",
+  apiKey: "dummy-key",
+});
+
+// Submit async request
+const initial = await openai.responses.create(
+  { model: "openai/gpt-4o-mini", input: "Tell me a short story." },
+  { headers: { "x-bf-async": "true" } }
+);
+
+// If status is "completed", the request completed synchronously
+if (initial.status === "completed") {
+  console.log(initial.output_text);
+} else {
+  // Poll until completed
+  while (true) {
+    await new Promise((r) => setTimeout(r, 2000));
+    const poll = await openai.responses.create(
+      { model: "openai/gpt-4o-mini", input: "Tell me a short story." },
+      { headers: { "x-bf-async-id": initial.id } }
+    );
+    if (poll.status === "completed") {
+      console.log(poll.output_text);
+      break;
+    }
+  }
+}
+```
+
+</Tab>
+</Tabs>
+
+### Async Headers
+
+| Header | Description |
+|---|---|
+| `x-bf-async: true` | Submit the request as an async job. Returns immediately with a job ID. |
+| `x-bf-async-id: <job-id>` | Poll for results of a previously submitted async job. |
+| `x-bf-async-job-result-ttl: <seconds>` | Override the default result TTL (default: 3600s). |
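+
+The examples above set these headers by hand. As a minimal sketch, a small builder keeps the submit and poll cases apart. `async_headers` is a hypothetical helper, and sending the TTL header on the submit request (rather than on the poll) is an assumption, since the table does not say which request it belongs on:
+
+```python
+def async_headers(job_id=None, result_ttl_seconds=None):
+    """Build x-bf-async* headers for a submit (no job_id) or a poll (job_id set)."""
+    if job_id is not None:
+        # Polling an existing job only needs its ID
+        return {"x-bf-async-id": job_id}
+    headers = {"x-bf-async": "true"}
+    if result_ttl_seconds is not None:
+        # Assumption: the TTL override accompanies the submit request
+        headers["x-bf-async-job-result-ttl"] = str(result_ttl_seconds)
+    return headers
+
+
+submit_headers = async_headers(result_ttl_seconds=600)
+poll_headers = async_headers(job_id="job-123")
+```
+
+Either dictionary can be passed as `extra_headers=` to `client.chat.completions.create` or `client.responses.create`, as in the polling examples above.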
+
+---
+
+## Supported Features
+
+The OpenAI integration supports every feature that is available in both the OpenAI SDK and Bifrost core: if the SDK can send a request and Bifrost implements the feature, it works through this endpoint.
+
+---
+
+## Next Steps
+
+- **[Files and Batch API](./files-and-batch)** - File uploads and batch processing
+- **[Anthropic SDK](../anthropic-sdk/overview)** - Claude integration patterns
+- **[Google GenAI SDK](../genai-sdk)** - Gemini integration patterns
+- **[Configuration](../../quickstart/README)** - Bifrost setup and configuration
+- **[Core Features](../../features/)** - Advanced Bifrost capabilities
+