first commit

This commit is contained in:
Beyhan Oğur
2026-04-26 21:52:23 +03:00
commit 880f412e2c
2662 changed files with 866266 additions and 0 deletions

@@ -0,0 +1,669 @@
---
title: "Files and Batch API"
description: "Upload files and create batch jobs for asynchronous processing using the OpenAI SDK through Bifrost across multiple providers."
tag: "Beta"
icon: "folder-open"
---
## Overview
Bifrost supports the OpenAI Files API and Batch API with **cross-provider routing**. This means you can use the familiar OpenAI SDK to manage files and batch jobs across multiple providers including OpenAI, Anthropic, Bedrock, and Gemini.
The provider is specified using `extra_body` (for POST requests) or `extra_query` (for GET and DELETE requests) parameters.
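As a rough sketch of that convention, the helper below picks the keyword argument for a given HTTP method. The name `provider_kwargs` is hypothetical (it is not part of Bifrost or the OpenAI SDK), and treating every non-POST method as a query-parameter request is an assumption drawn from the examples on this page.

```python
def provider_kwargs(provider: str, method: str = "POST") -> dict:
    """Return the SDK keyword argument that carries the provider name.

    Hypothetical helper: POST requests carry the provider in the request
    body (extra_body); all other methods carry it in the query string
    (extra_query), mirroring the examples in this guide.
    """
    if method.upper() == "POST":
        return {"extra_body": {"provider": provider}}
    return {"extra_query": {"provider": provider}}


# Usage (not executed here):
#   client.files.list(**provider_kwargs("openai", "GET"))
```

Keeping the choice in one place avoids passing `extra_body` to a GET call, where the provider would not reach the query string.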
---
## Client Setup
The base client setup is the same for all providers. The provider is specified per-request:
```python
from openai import OpenAI
client = OpenAI(
base_url="http://localhost:8080/openai",
api_key="your-api-key" # Your actual API key
)
```
---
## Files API
### Upload a File
<Note>
**Bedrock** requires S3 storage configuration. OpenAI and Gemini use their native file storage. Anthropic uses inline requests (no file upload).
</Note>
<Tabs group="provider">
<Tab title="OpenAI Provider">
```python
from openai import OpenAI
client = OpenAI(
base_url="http://localhost:8080/openai",
api_key="your-openai-api-key"
)
# Create JSONL content for OpenAI batch format
jsonl_content = '''{"custom_id": "request-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 100}}
{"custom_id": "request-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "How are you?"}], "max_tokens": 100}}'''
# Upload file (uses OpenAI's native file storage)
response = client.files.create(
file=("batch_input.jsonl", jsonl_content.encode(), "application/jsonl"),
purpose="batch",
extra_body={"provider": "openai"},
)
print(f"Uploaded file ID: {response.id}")
```
</Tab>
<Tab title="Bedrock Provider">
For Bedrock, you need to provide S3 storage configuration:
```python
from openai import OpenAI
client = OpenAI(
base_url="http://localhost:8080/openai",
api_key="your-api-key"
)
# Create JSONL content using OpenAI-style format (Bifrost converts to Bedrock format internally)
jsonl_content = '''{"custom_id": "request-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "anthropic.claude-3-sonnet-20240229-v1:0", "messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 100}}
{"custom_id": "request-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "anthropic.claude-3-sonnet-20240229-v1:0", "messages": [{"role": "user", "content": "How are you?"}], "max_tokens": 100}}'''
# Upload file with S3 storage configuration
response = client.files.create(
file=("batch_input.jsonl", jsonl_content.encode(), "application/jsonl"),
purpose="batch",
extra_body={
"provider": "bedrock",
"storage_config": {
"s3": {
"bucket": "your-s3-bucket",
"region": "us-west-2",
"prefix": "bifrost-batch-output",
},
},
},
)
print(f"Uploaded file ID: {response.id}")
```
</Tab>
<Tab title="Anthropic Provider">
Anthropic uses inline requests for batching (no file upload needed). See the Batch API section below.
</Tab>
<Tab title="Gemini Provider">
```python
from openai import OpenAI
client = OpenAI(
base_url="http://localhost:8080/openai",
api_key="your-api-key"
)
# Create JSONL content using OpenAI-style format (Bifrost converts to Gemini format internally)
jsonl_content = '''{"custom_id": "request-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gemini-1.5-flash", "messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 100}}
{"custom_id": "request-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gemini-1.5-flash", "messages": [{"role": "user", "content": "How are you?"}], "max_tokens": 100}}'''
# Upload file (uses Gemini's native file storage)
response = client.files.create(
file=("batch_input.jsonl", jsonl_content.encode(), "application/jsonl"),
purpose="batch",
extra_body={"provider": "gemini"},
)
print(f"Uploaded file ID: {response.id}")
```
</Tab>
</Tabs>
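Hand-written JSONL is easy to break with an unescaped quote or newline. The sketch below builds the same OpenAI-style request lines with `json.dumps` instead. `build_batch_jsonl` is a hypothetical helper name, and the `request-N` ID scheme simply copies the examples above.

```python
import json


def build_batch_jsonl(model: str, prompts: list[str], max_tokens: int = 100) -> str:
    """Serialize prompts into OpenAI-style batch JSONL, one request per line."""
    lines = []
    for index, prompt in enumerate(prompts, start=1):
        request = {
            "custom_id": f"request-{index}",
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": model,
                "messages": [{"role": "user", "content": prompt}],
                "max_tokens": max_tokens,
            },
        }
        # json.dumps escapes quotes and newlines, so each request stays on one line.
        lines.append(json.dumps(request))
    return "\n".join(lines)


jsonl_content = build_batch_jsonl("gpt-4o-mini", ["Hello!", "How are you?"])
```

The resulting `jsonl_content` string can be passed to `client.files.create` exactly as in the upload examples.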
### List Files
```python
# List files for OpenAI or Gemini (no S3 config needed)
response = client.files.list(
extra_query={"provider": "openai"} # or "gemini"
)
for file in response.data:
    print(f"File ID: {file.id}, Name: {file.filename}")
# For Bedrock (requires S3 config)
response = client.files.list(
extra_query={
"provider": "bedrock",
"storage_config": {
"s3": {
"bucket": "your-s3-bucket",
"region": "us-west-2",
"prefix": "bifrost-batch-output",
},
},
}
)
```
### Retrieve File Metadata
```python
# Retrieve file metadata (specify provider)
file_id = "file-abc123"
response = client.files.retrieve(
file_id,
extra_query={"provider": "bedrock"} # or "openai", "gemini"
)
print(f"File ID: {response.id}")
print(f"Filename: {response.filename}")
print(f"Purpose: {response.purpose}")
print(f"Bytes: {response.bytes}")
```
### Delete a File
```python
# Delete file (specify provider)
file_id = "file-abc123"
response = client.files.delete(
file_id,
extra_query={"provider": "bedrock"} # or "openai", "gemini"
)
print(f"Deleted: {response.deleted}")
```
### Download File Content
```python
# Download file content (specify provider)
file_id = "file-abc123"
response = client.files.content(
file_id,
extra_query={"provider": "bedrock"} # or "openai", "gemini"
)
# Handle different response types
if hasattr(response, "read"):
    content = response.read()
elif hasattr(response, "content"):
    content = response.content
else:
    content = response
# Decode bytes to string if needed
if isinstance(content, bytes):
    content = content.decode("utf-8")
print(f"File content:\n{content}")
```
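When the downloaded file is a batch output, it is usually easier to work with once parsed. The sketch below assumes each line is a JSON object carrying a `custom_id` field, as in OpenAI's batch output format. Whether Bifrost returns that shape for every provider is an assumption, and `index_by_custom_id` is a hypothetical helper name.

```python
import json


def index_by_custom_id(content: str) -> dict:
    """Parse JSONL text and key each record by its custom_id."""
    records = {}
    for line in content.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines, e.g. a trailing newline
        record = json.loads(line)
        records[record["custom_id"]] = record
    return records
```

Keying by `custom_id` matters because batch results are not guaranteed to come back in the order the requests were submitted.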
---
## Batch API
### Create a Batch
<Tabs group="provider">
<Tab title="OpenAI Provider">
For native OpenAI batching:
```python
from openai import OpenAI
client = OpenAI(
base_url="http://localhost:8080/openai",
api_key="your-openai-api-key"
)
# First upload a file (see Files API section)
# Then create batch using the file ID
batch = client.batches.create(
input_file_id="file-abc123",
endpoint="/v1/chat/completions",
completion_window="24h",
extra_body={"provider": "openai"},
)
print(f"Batch ID: {batch.id}")
print(f"Status: {batch.status}")
```
</Tab>
<Tab title="Bedrock Provider">
For Bedrock, you need to provide an output S3 URI:
```python
from openai import OpenAI
client = OpenAI(
base_url="http://localhost:8080/openai",
api_key="your-api-key"
)
# First upload a file with S3 config (see Files API section)
# Then create batch using the file ID
batch = client.batches.create(
input_file_id="file-abc123",
endpoint="/v1/chat/completions",
completion_window="24h",
extra_body={
"provider": "bedrock",
"model": "anthropic.claude-3-sonnet-20240229-v1:0",
"output_s3_uri": "s3://your-bucket/batch-output",
},
)
print(f"Batch ID: {batch.id}")
print(f"Status: {batch.status}")
```
</Tab>
<Tab title="Anthropic Provider">
Anthropic supports inline requests (no file upload required):
```python
from openai import OpenAI
client = OpenAI(
base_url="http://localhost:8080/openai",
api_key="your-anthropic-api-key"
)
# Create inline requests for Anthropic
requests = [
{
"custom_id": "request-1",
"params": {
"model": "claude-3-sonnet-20240229",
"max_tokens": 100,
"messages": [{"role": "user", "content": "Hello!"}]
}
},
{
"custom_id": "request-2",
"params": {
"model": "claude-3-sonnet-20240229",
"max_tokens": 100,
"messages": [{"role": "user", "content": "How are you?"}]
}
}
]
# Create batch with inline requests (no file ID needed)
batch = client.batches.create(
input_file_id="", # Empty for inline requests
endpoint="/v1/chat/completions",
completion_window="24h",
extra_body={
"provider": "anthropic",
"requests": requests,
},
)
print(f"Batch ID: {batch.id}")
print(f"Status: {batch.status}")
```
</Tab>
<Tab title="Gemini Provider">
```python
from openai import OpenAI
client = OpenAI(
base_url="http://localhost:8080/openai",
api_key="your-api-key"
)
# First upload a file in OpenAI-style format (see Files API section)
# Then create batch using the file ID
batch = client.batches.create(
input_file_id="file-abc123",
endpoint="/v1/chat/completions",
completion_window="24h",
extra_body={
"provider": "gemini",
"model": "gemini-1.5-flash",
},
)
print(f"Batch ID: {batch.id}")
print(f"Status: {batch.status}")
```
</Tab>
</Tabs>
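For the Anthropic tab, the inline `requests` list can be generated rather than typed out. A minimal sketch, assuming the `custom_id` / `params` shape shown in that tab; `build_inline_requests` is a hypothetical helper name.

```python
def build_inline_requests(model: str, prompts: dict, max_tokens: int = 100) -> list:
    """Build the inline request list from a {custom_id: prompt} mapping."""
    return [
        {
            "custom_id": custom_id,
            "params": {
                "model": model,
                "max_tokens": max_tokens,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        for custom_id, prompt in prompts.items()
    ]


requests = build_inline_requests(
    "claude-3-sonnet-20240229",
    {"request-1": "Hello!", "request-2": "How are you?"},
)
```

Using a mapping keeps every `custom_id` unique by construction, which makes it possible to match results back to requests later.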
### List Batches
```python
# List batches (specify provider)
response = client.batches.list(
limit=10,
extra_query={
"provider": "bedrock", # or "openai", "anthropic", "gemini"
"model": "anthropic.claude-3-sonnet-20240229-v1:0", # Required for bedrock
}
)
for batch in response.data:
    print(f"Batch ID: {batch.id}, Status: {batch.status}")
```
### Retrieve Batch Status
```python
# Retrieve batch status (specify provider)
batch_id = "batch-abc123"
batch = client.batches.retrieve(
batch_id,
extra_query={"provider": "bedrock"} # or "openai", "anthropic", "gemini"
)
print(f"Batch ID: {batch.id}")
print(f"Status: {batch.status}")
if batch.request_counts:
    print(f"Total: {batch.request_counts.total}")
    print(f"Completed: {batch.request_counts.completed}")
    print(f"Failed: {batch.request_counts.failed}")
```
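The three counters can be folded into a single progress line. This is only a sketch: it assumes `completed` counts successful requests and `failed` counts unsuccessful ones, so that the remainder is still pending. `summarize_progress` is a hypothetical helper name.

```python
def summarize_progress(total: int, completed: int, failed: int) -> str:
    """Format request counts as a one-line progress summary."""
    pending = total - completed - failed
    # Guard against division by zero before any requests are registered.
    percent = 100.0 * (completed + failed) / total if total else 0.0
    return (
        f"{completed} completed, {failed} failed, "
        f"{pending} pending ({percent:.0f}% processed)"
    )
```

For example, it could be called with `batch.request_counts.total`, `.completed`, and `.failed` inside a polling loop.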
### Cancel a Batch
```python
# Cancel batch (specify provider)
batch_id = "batch-abc123"
batch = client.batches.cancel(
batch_id,
extra_body={"provider": "bedrock"} # or "openai", "anthropic", "gemini"
)
print(f"Batch ID: {batch.id}")
print(f"Status: {batch.status}") # "cancelling" or "cancelled"
```
---
## End-to-End Workflows
### OpenAI Batch Workflow
```python
import time
from openai import OpenAI
client = OpenAI(
base_url="http://localhost:8080/openai",
api_key="your-openai-api-key"
)
# Configuration
provider = "openai"
# Step 1: Create OpenAI JSONL content
jsonl_content = '''{"custom_id": "req-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "What is 2+2?"}], "max_tokens": 100}}
{"custom_id": "req-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "What is the capital of France?"}], "max_tokens": 100}}'''
# Step 2: Upload file (uses OpenAI's native file storage)
print("Step 2: Uploading batch input file...")
uploaded_file = client.files.create(
    file=("batch_e2e.jsonl", jsonl_content.encode(), "application/jsonl"),
    purpose="batch",
    extra_body={"provider": provider},
)
print(f" Uploaded file: {uploaded_file.id}")
# Step 3: Create batch
print("Step 3: Creating batch job...")
batch = client.batches.create(
    input_file_id=uploaded_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
    extra_body={"provider": provider},
)
print(f" Created batch: {batch.id}, status: {batch.status}")
# Step 4: Poll for completion (gives up after 10 polls, 5 seconds apart)
print("Step 4: Polling batch status...")
for i in range(10):
    batch = client.batches.retrieve(batch.id, extra_query={"provider": provider})
    print(f" Poll {i+1}: status = {batch.status}")
    if batch.status in ["completed", "failed", "expired", "cancelled"]:
        break
    if batch.request_counts:
        print(f" Completed: {batch.request_counts.completed}/{batch.request_counts.total}")
    time.sleep(5)
# Report the actual outcome: the loop also ends on failure or when polls run out
if batch.status == "completed":
    print(f"\nSuccess! Batch {batch.id} workflow completed.")
else:
    print(f"\nBatch {batch.id} stopped with status: {batch.status}")
```
### Bedrock Batch Workflow
```python
import time
from openai import OpenAI
client = OpenAI(
base_url="http://localhost:8080/openai",
api_key="your-api-key"
)
# Configuration
provider = "bedrock"
s3_bucket = "your-s3-bucket"
s3_region = "us-west-2"
model = "anthropic.claude-3-sonnet-20240229-v1:0"
# Step 1: Create JSONL content using OpenAI-style format (Bifrost converts to Bedrock format internally)
jsonl_content = '''{"custom_id": "req-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "anthropic.claude-3-sonnet-20240229-v1:0", "messages": [{"role": "user", "content": "What is 2+2?"}], "max_tokens": 100}}
{"custom_id": "req-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "anthropic.claude-3-sonnet-20240229-v1:0", "messages": [{"role": "user", "content": "What is the capital of France?"}], "max_tokens": 100}}'''
# Step 2: Upload file
print("Step 2: Uploading batch input file...")
uploaded_file = client.files.create(
    file=("batch_e2e.jsonl", jsonl_content.encode(), "application/jsonl"),
    purpose="batch",
    extra_body={
        "provider": provider,
        "storage_config": {
            "s3": {"bucket": s3_bucket, "region": s3_region, "prefix": "batch-input"},
        },
    },
)
print(f" Uploaded file: {uploaded_file.id}")
# Step 3: Create batch
print("Step 3: Creating batch job...")
batch = client.batches.create(
    input_file_id=uploaded_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
    extra_body={
        "provider": provider,
        "model": model,
        "output_s3_uri": f"s3://{s3_bucket}/batch-output",
    },
)
print(f" Created batch: {batch.id}, status: {batch.status}")
# Step 4: Poll for completion (gives up after 10 polls, 5 seconds apart)
print("Step 4: Polling batch status...")
for i in range(10):
    batch = client.batches.retrieve(batch.id, extra_query={"provider": provider})
    print(f" Poll {i+1}: status = {batch.status}")
    if batch.status in ["completed", "failed", "expired", "cancelled"]:
        break
    if batch.request_counts:
        print(f" Completed: {batch.request_counts.completed}/{batch.request_counts.total}")
    time.sleep(5)
# Report the actual outcome: the loop also ends on failure or when polls run out
if batch.status == "completed":
    print(f"\nSuccess! Batch {batch.id} workflow completed.")
else:
    print(f"\nBatch {batch.id} stopped with status: {batch.status}")
```
### Anthropic Inline Batch Workflow
```python
import time
from openai import OpenAI
client = OpenAI(
base_url="http://localhost:8080/openai",
api_key="your-anthropic-api-key"
)
provider = "anthropic"
# Step 1: Create inline requests
print("Step 1: Creating inline requests...")
requests = [
{
"custom_id": "math-question",
"params": {
"model": "claude-3-sonnet-20240229",
"max_tokens": 100,
"messages": [{"role": "user", "content": "What is 15 * 7?"}]
}
},
{
"custom_id": "geography-question",
"params": {
"model": "claude-3-sonnet-20240229",
"max_tokens": 100,
"messages": [{"role": "user", "content": "What is the largest ocean?"}]
}
}
]
print(f" Created {len(requests)} inline requests")
# Step 2: Create batch
print("Step 2: Creating batch job...")
batch = client.batches.create(
input_file_id="",
endpoint="/v1/chat/completions",
completion_window="24h",
extra_body={"provider": provider, "requests": requests},
)
print(f" Created batch: {batch.id}, status: {batch.status}")
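# Step 3: Poll for completion (gives up after 10 polls, 5 seconds apart)
print("Step 3: Polling batch status...")
for i in range(10):
    batch = client.batches.retrieve(batch.id, extra_query={"provider": provider})
    print(f" Poll {i+1}: status = {batch.status}")
    if batch.status in ["completed", "failed", "expired", "cancelled", "ended"]:
        break
    time.sleep(5)
# Report the actual outcome: the loop also ends on failure or when polls run out
if batch.status in ["completed", "ended"]:
    print(f"\nSuccess! Batch {batch.id} workflow completed.")
else:
    print(f"\nBatch {batch.id} stopped with status: {batch.status}")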
```
### Gemini Batch Workflow
```python
import time
from openai import OpenAI
client = OpenAI(
base_url="http://localhost:8080/openai",
api_key="your-api-key"
)
# Configuration
provider = "gemini"
model = "gemini-1.5-flash"
# Step 1: Create JSONL content using OpenAI-style format (Bifrost converts to Gemini format internally)
jsonl_content = '''{"custom_id": "req-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gemini-1.5-flash", "messages": [{"role": "user", "content": "What is 2+2?"}], "max_tokens": 100}}
{"custom_id": "req-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gemini-1.5-flash", "messages": [{"role": "user", "content": "What is the capital of France?"}], "max_tokens": 100}}'''
# Step 2: Upload file (uses Gemini's native file storage)
print("Step 2: Uploading batch input file...")
uploaded_file = client.files.create(
    file=("batch_e2e.jsonl", jsonl_content.encode(), "application/jsonl"),
    purpose="batch",
    extra_body={"provider": provider},
)
print(f" Uploaded file: {uploaded_file.id}")
# Step 3: Create batch
print("Step 3: Creating batch job...")
batch = client.batches.create(
    input_file_id=uploaded_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
    extra_body={
        "provider": provider,
        "model": model,
    },
)
print(f" Created batch: {batch.id}, status: {batch.status}")
# Step 4: Poll for completion (gives up after 10 polls, 5 seconds apart)
print("Step 4: Polling batch status...")
for i in range(10):
    batch = client.batches.retrieve(batch.id, extra_query={"provider": provider})
    print(f" Poll {i+1}: status = {batch.status}")
    if batch.status in ["completed", "failed", "expired", "cancelled"]:
        break
    if batch.request_counts:
        print(f" Completed: {batch.request_counts.completed}/{batch.request_counts.total}")
    time.sleep(5)
# Report the actual outcome: the loop also ends on failure or when polls run out
if batch.status == "completed":
    print(f"\nSuccess! Batch {batch.id} workflow completed.")
else:
    print(f"\nBatch {batch.id} stopped with status: {batch.status}")
```
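The fixed ten-iteration loops above are fine for a demo, but real batch jobs can run far longer. The sketch below shows one way to poll with a deadline instead. `wait_for_batch` is a hypothetical helper, the terminal status set is taken from the workflows above, and the default interval and timeout are arbitrary.

```python
import time

# Statuses the workflows above treat as final ("ended" appears in the Anthropic one).
TERMINAL_STATUSES = {"completed", "failed", "expired", "cancelled", "ended"}


def wait_for_batch(retrieve, interval=5.0, timeout=600.0, sleep=time.sleep):
    """Call retrieve() until the batch reaches a terminal status.

    retrieve: zero-argument callable returning an object with a .status attribute.
    Raises TimeoutError if no terminal status is seen before the deadline.
    """
    deadline = time.monotonic() + timeout
    while True:
        batch = retrieve()
        if batch.status in TERMINAL_STATUSES:
            return batch
        if time.monotonic() >= deadline:
            raise TimeoutError(f"batch still {batch.status!r} after {timeout}s")
        sleep(interval)


# Usage (not executed here):
#   batch = wait_for_batch(
#       lambda: client.batches.retrieve(batch_id, extra_query={"provider": provider})
#   )
```

Taking a callable rather than a client keeps the helper independent of the provider-specific arguments, and the caller still has to inspect `batch.status` to distinguish success from failure.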
---
## Provider-Specific Notes
| Provider | File Upload | Batch Creation | Extra Configuration |
|----------|-------------|----------------|---------------------|
| **OpenAI** | ✅ Native storage | ✅ File-based | None |
| **Bedrock** | ✅ S3-based | ✅ File-based | `storage_config`, `output_s3_uri` |
| **Anthropic** | ❌ Not supported | ✅ Inline requests | `requests` array in `extra_body` |
| **Gemini** | ✅ Native storage | ✅ File-based | `model` in `extra_body` |
<Note>
- **OpenAI** and **Gemini** use their native file storage - no S3 configuration needed
- **Bedrock** requires S3 storage configuration (`storage_config`, `output_s3_uri`)
- **Anthropic** does not support file-based batch operations - use inline requests instead
</Note>
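The per-provider requirements for batch creation can be checked before a request is sent. This is a sketch only: `batch_extra_body` is a hypothetical helper, and the required fields are inferred from the batch creation examples on this page rather than from a Bifrost schema.

```python
# Fields each provider's batch creation example passes in extra_body
# (in addition to "provider"). Inferred from this page, not authoritative.
REQUIRED_BATCH_FIELDS = {
    "openai": [],
    "bedrock": ["model", "output_s3_uri"],
    "anthropic": ["requests"],
    "gemini": ["model"],
}


def batch_extra_body(provider, *, model=None, output_s3_uri=None, requests=None):
    """Build extra_body for client.batches.create, failing fast on missing fields."""
    if provider not in REQUIRED_BATCH_FIELDS:
        raise ValueError(f"unknown provider: {provider!r}")
    supplied = {"model": model, "output_s3_uri": output_s3_uri, "requests": requests}
    missing = [name for name in REQUIRED_BATCH_FIELDS[provider] if supplied[name] is None]
    if missing:
        raise ValueError(f"{provider} batch requires: {', '.join(missing)}")
    body = {"provider": provider}
    body.update({name: supplied[name] for name in REQUIRED_BATCH_FIELDS[provider]})
    return body
```

Failing locally with a clear message is cheaper than discovering a missing `output_s3_uri` from a gateway error.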
---
## Next Steps
- **[Overview](./overview)** - OpenAI SDK integration basics
- **[Configuration](../../quickstart/gateway/provider-configuration)** - Bifrost setup and configuration
- **[Core Features](../../features/)** - Governance, semantic caching, and more

@@ -0,0 +1,563 @@
---
title: "Overview"
description: "Use Bifrost as a drop-in replacement for OpenAI API with full compatibility and enhanced features."
icon: "book"
---
## Overview
Bifrost provides complete OpenAI API compatibility through protocol adaptation. The integration handles request transformation, response normalization, and error mapping between OpenAI's API specification and Bifrost's internal processing pipeline.
This integration lets you use Bifrost features such as governance, load balancing, semantic caching, and multi-provider support while keeping your existing OpenAI SDK-based architecture.
**Endpoint:** `/openai`
---
## Setup
<Tabs group="openai-sdk">
<Tab title="Python">
```python {5}
import openai
# Configure client to use Bifrost
client = openai.OpenAI(
base_url="http://localhost:8080/openai",
api_key="dummy-key" # Keys handled by Bifrost
)
# Make requests as usual
response = client.chat.completions.create(
model="gpt-4o-mini",
messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)
```
</Tab>
<Tab title="JavaScript">
```javascript {5}
import OpenAI from "openai";
// Configure client to use Bifrost
const openai = new OpenAI({
baseURL: "http://localhost:8080/openai",
apiKey: "dummy-key", // Keys handled by Bifrost
});
// Make requests as usual
const response = await openai.chat.completions.create({
model: "gpt-4o-mini",
messages: [{ role: "user", content: "Hello!" }],
});
console.log(response.choices[0].message.content);
```
</Tab>
</Tabs>
---
## Provider/Model Usage Examples
Use multiple providers through the same OpenAI SDK format by prefixing model names with the provider:
<Tabs group="openai-sdk">
<Tab title="Python">
```python
import openai
client = openai.OpenAI(
base_url="http://localhost:8080/openai",
api_key="dummy-key"
)
# OpenAI models (default)
openai_response = client.chat.completions.create(
model="gpt-4o-mini",
messages=[{"role": "user", "content": "Hello from OpenAI!"}]
)
# Anthropic models via OpenAI SDK format
anthropic_response = client.chat.completions.create(
model="anthropic/claude-3-sonnet-20240229",
messages=[{"role": "user", "content": "Hello from Claude!"}]
)
# Google Vertex models via OpenAI SDK format
vertex_response = client.chat.completions.create(
model="vertex/gemini-pro",
messages=[{"role": "user", "content": "Hello from Gemini!"}]
)
# Azure models
azure_response = client.chat.completions.create(
model="azure/gpt-4o",
messages=[{"role": "user", "content": "Hello from Azure!"}]
)
# Local Ollama models
ollama_response = client.chat.completions.create(
model="ollama/llama3.1:8b",
messages=[{"role": "user", "content": "Hello from Ollama!"}]
)
```
</Tab>
<Tab title="JavaScript">
```javascript
import OpenAI from "openai";
const openai = new OpenAI({
baseURL: "http://localhost:8080/openai",
apiKey: "dummy-key",
});
// OpenAI models (default)
const openaiResponse = await openai.chat.completions.create({
model: "gpt-4o-mini",
messages: [{ role: "user", content: "Hello from OpenAI!" }],
});
// Anthropic models via OpenAI SDK format
const anthropicResponse = await openai.chat.completions.create({
model: "anthropic/claude-3-sonnet-20240229",
messages: [{ role: "user", content: "Hello from Claude!" }],
});
// Google Vertex models via OpenAI SDK format
const vertexResponse = await openai.chat.completions.create({
model: "vertex/gemini-pro",
messages: [{ role: "user", content: "Hello from Gemini!" }],
});
// Azure models
const azureResponse = await openai.chat.completions.create({
model: "azure/gpt-4o",
messages: [{ role: "user", content: "Hello from Azure!" }],
});
// Local Ollama models
const ollamaResponse = await openai.chat.completions.create({
model: "ollama/llama3.1:8b",
messages: [{ role: "user", content: "Hello from Ollama!" }],
});
```
</Tab>
</Tabs>
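To make the naming convention concrete, the sketch below splits a model string the way the examples above read: everything before the first `/` is the provider, and an unprefixed name falls back to OpenAI. This illustrates the convention only. Bifrost's actual parsing is not shown in this guide and may differ, and `split_model` is a hypothetical helper name.

```python
def split_model(model: str, default_provider: str = "openai") -> tuple[str, str]:
    """Split "provider/model" into (provider, model_name)."""
    provider, separator, name = model.partition("/")
    if not separator:
        # No prefix: the examples treat this as an OpenAI model.
        return default_provider, model
    return provider, name
```

Splitting on the first slash only keeps suffixes such as `:8b` in `ollama/llama3.1:8b` attached to the model name.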
---
## Adding Custom Headers
Pass custom headers required by Bifrost plugins (like governance, telemetry, etc.):
<Tabs group="openai-sdk">
<Tab title="Python">
```python
import openai
client = openai.OpenAI(
base_url="http://localhost:8080/openai",
api_key="dummy-key",
default_headers={
"x-bf-vk": "vk_12345", # Virtual key for governance
}
)
response = client.chat.completions.create(
model="gpt-4o-mini",
messages=[{"role": "user", "content": "Hello with custom headers!"}]
)
```
</Tab>
<Tab title="JavaScript">
```javascript
import OpenAI from "openai";
const openai = new OpenAI({
baseURL: "http://localhost:8080/openai",
apiKey: "dummy-key",
defaultHeaders: {
"x-bf-vk": "vk_12345", // Virtual key for governance
},
});
const response = await openai.chat.completions.create({
model: "gpt-4o-mini",
messages: [{ role: "user", content: "Hello with custom headers!" }],
});
```
</Tab>
</Tabs>
---
## Using Direct Keys
Pass API keys directly in requests to bypass Bifrost's load balancing. You can pass any provider's API key (OpenAI, Anthropic, Mistral, etc.) since Bifrost only looks for `Authorization` or `x-api-key` headers. This requires the **Allow Direct API keys** option to be enabled in Bifrost configuration.
> **Learn more:** See [Key Management](../../features/keys-management#direct-key-bypass) for enabling direct API key usage.
<Tabs group="openai-sdk">
<Tab title="Python">
```python
import openai
# Using OpenAI's API key directly
client_with_direct_key = openai.OpenAI(
base_url="http://localhost:8080/openai",
api_key="sk-your-openai-key" # OpenAI's API key works
)
openai_response = client_with_direct_key.chat.completions.create(
model="openai/gpt-4o-mini",
messages=[{"role": "user", "content": "Hello from GPT!"}]
)
# Or pass different provider keys per request
client = openai.OpenAI(
base_url="http://localhost:8080/openai",
api_key="dummy-key"
)
# Use OpenAI key for GPT models
openai_response = client.chat.completions.create(
model="gpt-4o-mini",
messages=[{"role": "user", "content": "Hello GPT!"}],
extra_headers={
"Authorization": "Bearer sk-your-openai-key"
}
)
# Use Anthropic key for Claude models
anthropic_response = client.chat.completions.create(
model="anthropic/claude-3-sonnet-20240229",
messages=[{"role": "user", "content": "Hello Claude!"}],
extra_headers={
"x-api-key": "sk-ant-your-anthropic-key"
}
)
# Use Gemini key for Gemini models
gemini_response = client.chat.completions.create(
model="gemini/gemini-2.5-flash",
messages=[{"role": "user", "content": "Hello Gemini!"}],
extra_headers={
"x-goog-api-key": "sk-gemini-your-gemini-key"
}
)
```
</Tab>
<Tab title="JavaScript">
```javascript
import OpenAI from "openai";
// Using OpenAI's API key directly
const openaiWithDirectKey = new OpenAI({
baseURL: "http://localhost:8080/openai",
apiKey: "sk-your-openai-key", // OpenAI's API key works
});
const openaiResponse = await openaiWithDirectKey.chat.completions.create({
model: "openai/gpt-4o-mini",
messages: [{ role: "user", content: "Hello from GPT!" }],
});
// Or pass different provider keys per request
const openai = new OpenAI({
baseURL: "http://localhost:8080/openai",
apiKey: "dummy-key",
});
// Use OpenAI key for GPT models (per-request headers go in the second argument)
const openaiResponseWithHeader = await openai.chat.completions.create(
  {
    model: "gpt-4o-mini",
    messages: [{ role: "user", content: "Hello GPT!" }],
  },
  { headers: { "Authorization": "Bearer sk-your-openai-key" } }
);
// Use Anthropic key for Claude models
const anthropicResponseWithHeader = await openai.chat.completions.create(
  {
    model: "anthropic/claude-3-sonnet-20240229",
    messages: [{ role: "user", content: "Hello Claude!" }],
  },
  { headers: { "x-api-key": "sk-ant-your-anthropic-key" } }
);
// Use Gemini key for Gemini models
const geminiResponseWithHeader = await openai.chat.completions.create(
  {
    model: "gemini/gemini-2.5-flash",
    messages: [{ role: "user", content: "Hello Gemini!" }],
  },
  { headers: { "x-goog-api-key": "sk-gemini-your-gemini-key" } }
);
```
</Tab>
</Tabs>
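The per-request examples above use a different header for each provider. A small sketch that captures that mapping follows. `direct_key_headers` is a hypothetical helper, the Anthropic and Gemini header names are copied from the examples, and falling back to a Bearer `Authorization` header for every other provider is an assumption.

```python
def direct_key_headers(provider: str, api_key: str) -> dict:
    """Return the header that carries a direct provider key, per the examples above."""
    if provider == "anthropic":
        return {"x-api-key": api_key}
    if provider == "gemini":
        return {"x-goog-api-key": api_key}
    # Assumed default: a Bearer token, as in the OpenAI example.
    return {"Authorization": f"Bearer {api_key}"}


# Usage (not executed here):
#   client.chat.completions.create(
#       model="anthropic/claude-3-sonnet-20240229",
#       messages=[{"role": "user", "content": "Hello Claude!"}],
#       extra_headers=direct_key_headers("anthropic", "sk-ant-your-anthropic-key"),
#   )
```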
For Azure, you can use the `AzureOpenAI` client and point it to the Bifrost integration endpoint. The `x-bf-azure-endpoint` header is required to specify your Azure resource endpoint.
<Tabs group="openai-sdk">
<Tab title="Python">
```python
from openai import AzureOpenAI
azure_client = AzureOpenAI(
api_key="your-azure-api-key",
api_version="2024-02-01",
azure_endpoint="http://localhost:8080/openai", # Point to Bifrost
default_headers={
"x-bf-azure-endpoint": "https://your-resource.openai.azure.com"
}
)
azure_response = azure_client.chat.completions.create(
model="gpt-4-deployment", # Your deployment name
messages=[{"role": "user", "content": "Hello from Azure!"}]
)
print(azure_response.choices[0].message.content)
```
</Tab>
<Tab title="JavaScript">
```javascript
import { AzureOpenAI } from "openai";
const azureClient = new AzureOpenAI({
apiKey: "your-azure-api-key",
apiVersion: "2024-02-01",
baseURL: "http://localhost:8080/openai", // Point to Bifrost
defaultHeaders: {
"x-bf-azure-endpoint": "https://your-resource.openai.azure.com"
}
});
const azureResponse = await azureClient.chat.completions.create({
model: "gpt-4-deployment", // Your deployment name
messages: [{ role: "user", content: "Hello from Azure!" }],
});
console.log(azureResponse.choices[0].message.content);
```
</Tab>
</Tabs>
---
## Async Inference
Submit inference requests asynchronously and poll for results later using the `x-bf-async` header. This is useful for long-running requests where you don't want to hold a connection open. See [Async Inference](../../features/async-inference) for full details.
<Note>
Async inference requires a [Logs Store](../../features/observability/default) to be configured and is not compatible with streaming.
</Note>
### Chat Completions
<Tabs group="openai-sdk">
<Tab title="Python">
```python
import openai
import time
client = openai.OpenAI(
base_url="http://localhost:8080/openai",
api_key="dummy-key"
)
# Submit async request
initial = client.chat.completions.create(
model="openai/gpt-4o-mini",
messages=[{"role": "user", "content": "Tell me a short story."}],
extra_headers={"x-bf-async": "true"}
)
# If choices are present, the request completed synchronously
if initial.choices:
    print(initial.choices[0].message.content)
else:
    # Poll until completed
    while True:
        time.sleep(2)
        poll = client.chat.completions.create(
            model="openai/gpt-4o-mini",
            messages=[{"role": "user", "content": "Tell me a short story."}],
            extra_headers={"x-bf-async-id": initial.id}
        )
        if poll.choices:
            print(poll.choices[0].message.content)
            break
```
</Tab>
<Tab title="JavaScript">
```javascript
import OpenAI from "openai";
const openai = new OpenAI({
baseURL: "http://localhost:8080/openai",
apiKey: "dummy-key",
});
// Submit async request
const initial = await openai.chat.completions.create(
{
model: "openai/gpt-4o-mini",
messages: [{ role: "user", content: "Tell me a short story." }],
},
{ headers: { "x-bf-async": "true" } }
);
// If choices are present, the request completed synchronously
if (initial.choices?.length > 0) {
console.log(initial.choices[0].message.content);
} else {
// Poll until completed
while (true) {
await new Promise((r) => setTimeout(r, 2000));
const poll = await openai.chat.completions.create(
{
model: "openai/gpt-4o-mini",
messages: [{ role: "user", content: "Tell me a short story." }],
},
{ headers: { "x-bf-async-id": initial.id } }
);
if (poll.choices?.length > 0) {
console.log(poll.choices[0].message.content);
break;
}
}
}
```
</Tab>
</Tabs>
### Responses API
<Tabs group="openai-sdk">
<Tab title="Python">
```python
import openai
import time
client = openai.OpenAI(
base_url="http://localhost:8080/openai",
api_key="dummy-key"
)
# Submit async request
initial = client.responses.create(
model="openai/gpt-4o-mini",
input="Tell me a short story.",
extra_headers={"x-bf-async": "true"}
)
# If status is "completed", the request completed synchronously
if initial.status == "completed":
    print(initial.output_text)
else:
    # Poll until completed
    while True:
        time.sleep(2)
        poll = client.responses.create(
            model="openai/gpt-4o-mini",
            input="Tell me a short story.",
            extra_headers={"x-bf-async-id": initial.id}
        )
        if poll.status == "completed":
            print(poll.output_text)
            break
```
</Tab>
<Tab title="JavaScript">
```javascript
import OpenAI from "openai";
const openai = new OpenAI({
baseURL: "http://localhost:8080/openai",
apiKey: "dummy-key",
});
// Submit async request
const initial = await openai.responses.create(
{ model: "openai/gpt-4o-mini", input: "Tell me a short story." },
{ headers: { "x-bf-async": "true" } }
);
// If status is "completed", the request completed synchronously
if (initial.status === "completed") {
console.log(initial.output_text);
} else {
// Poll until completed
while (true) {
await new Promise((r) => setTimeout(r, 2000));
const poll = await openai.responses.create(
{ model: "openai/gpt-4o-mini", input: "Tell me a short story." },
{ headers: { "x-bf-async-id": initial.id } }
);
if (poll.status === "completed") {
console.log(poll.output_text);
break;
}
}
}
```
</Tab>
</Tabs>
### Async Headers
| Header | Description |
|---|---|
| `x-bf-async: true` | Submit the request as an async job. Returns immediately with a job ID. |
| `x-bf-async-id: <job-id>` | Poll for results of a previously submitted async job. |
| `x-bf-async-job-result-ttl: <seconds>` | Override the default result TTL (default: 3600s). |
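
The headers in this table can be assembled with a small helper. A minimal sketch: `async_headers` is a hypothetical name, and sending the TTL override together with the submission request (rather than with a poll) is an assumption.

```python
def async_headers(job_id=None, result_ttl_seconds=None) -> dict:
    """Build Bifrost async headers for a submit (no job_id) or a poll (job_id given)."""
    if job_id is not None:
        # Poll for a previously submitted job.
        return {"x-bf-async-id": job_id}
    headers = {"x-bf-async": "true"}
    if result_ttl_seconds is not None:
        # Header values must be strings.
        headers["x-bf-async-job-result-ttl"] = str(result_ttl_seconds)
    return headers
```

The result can be passed as `extra_headers` in Python, for example `extra_headers=async_headers(result_ttl_seconds=600)` on submit and `extra_headers=async_headers(job_id=initial.id)` on each poll.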
---
## Supported Features
The OpenAI integration supports every feature that is available in both the OpenAI SDK and Bifrost core. If both sides support a feature, it works through the integration without additional setup.
---
## Next Steps
- **[Files and Batch API](./files-and-batch)** - File uploads and batch processing
- **[Anthropic SDK](../anthropic-sdk/overview)** - Claude integration patterns
- **[Google GenAI SDK](../genai-sdk)** - Gemini integration patterns
- **[Configuration](../../quickstart/README)** - Bifrost setup and configuration
- **[Core Features](../../features/)** - Advanced Bifrost capabilities