first commit

Beyhan Oğur
2026-04-26 21:52:23 +03:00
commit 880f412e2c
2662 changed files with 866266 additions and 0 deletions


@@ -0,0 +1,618 @@
---
title: "Files and Batch API"
tag: "Beta"
description: "Upload files and create batch jobs for asynchronous processing using the Anthropic SDK through Bifrost across multiple providers."
icon: "folder-open"
---
## Overview
Bifrost supports the Anthropic Files API and Batch API (via the `beta` namespace) with **cross-provider routing**. This means you can use the Anthropic SDK to manage files and batch jobs across multiple providers including Anthropic, OpenAI, and Gemini.
The provider is specified using the `x-model-provider` header in `default_headers`.
<Note>
**Bedrock Limitation:** Bedrock batch operations require file-based input with S3 storage, which is not supported via the Anthropic SDK's inline batch API. For Bedrock batch operations, use the [Bedrock SDK](../bedrock-sdk/files-and-batch) directly.
</Note>
---
## Client Setup
<Note>
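For the `api_key` argument, pass either a Bifrost virtual key or any dummy value. The SDK rejects an empty key on the client side, so a placeholder is needed even when Bifrost manages the provider keys.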
</Note>
### Anthropic Provider (Default)
```python
import anthropic
client = anthropic.Anthropic(
base_url="http://localhost:8080/anthropic",
api_key="virtual-key-or-dummy-key"
)
```
### Cross-Provider Client
To route requests to a different provider, set the `x-model-provider` header:
<Tabs group="provider">
<Tab title="OpenAI Provider">
```python
import anthropic
client = anthropic.Anthropic(
base_url="http://localhost:8080/anthropic",
api_key="virtual-key-or-dummy-key",
default_headers={"x-model-provider": "openai"}
)
```
</Tab>
<Tab title="Bedrock Provider">
```python
import anthropic
client = anthropic.Anthropic(
base_url="http://localhost:8080/anthropic",
api_key="virtual-key-or-dummy-key",
default_headers={"x-model-provider": "bedrock"}
)
```
<Warning>
Bedrock can be used for chat completions via the Anthropic SDK, but **batch operations are not supported**. Bedrock requires file-based batch input with S3 storage. Use the [Bedrock SDK](../bedrock-sdk/files-and-batch) for batch operations.
</Warning>
</Tab>
<Tab title="Gemini Provider">
```python
import anthropic
client = anthropic.Anthropic(
base_url="http://localhost:8080/anthropic",
api_key="virtual-key-or-dummy-key",
default_headers={"x-model-provider": "gemini"}
)
```
</Tab>
</Tabs>
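The clients in the tabs above differ only in the `x-model-provider` value, so the constructor arguments can be assembled in one place. The following is a minimal sketch, not part of Bifrost or the Anthropic SDK: `client_kwargs` and `KNOWN_PROVIDERS` are hypothetical names, and the base URL assumes the local gateway address used throughout this page.

```python
# Hypothetical helper: builds keyword arguments for anthropic.Anthropic(...).
# The provider set mirrors the providers named on this page.
KNOWN_PROVIDERS = {"anthropic", "openai", "gemini", "bedrock"}


def client_kwargs(provider="anthropic",
                  base_url="http://localhost:8080/anthropic",
                  api_key="virtual-key-or-dummy-key"):
    """Return constructor kwargs that route requests to `provider`."""
    if provider not in KNOWN_PROVIDERS:
        raise ValueError(f"unknown provider: {provider!r}")
    kwargs = {"base_url": base_url, "api_key": api_key}
    # Anthropic is the default, so the header is omitted for it.
    if provider != "anthropic":
        kwargs["default_headers"] = {"x-model-provider": provider}
    return kwargs


print(client_kwargs("openai"))
```

With the SDK installed, `anthropic.Anthropic(**client_kwargs("gemini"))` should yield a client equivalent to the one in the Gemini tab.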
---
## Files API
The Files API is accessed through the `beta.files` namespace. Note that file support varies by provider.
### Upload a File
<Tabs group="provider">
<Tab title="Anthropic Provider">
Upload a text file for use with Anthropic:
```python
import anthropic
client = anthropic.Anthropic(
base_url="http://localhost:8080/anthropic",
api_key="virtual-key-or-dummy-key"
)
# Upload a text file
text_content = b"This is a test file for Files API integration."
response = client.beta.files.upload(
file=("test_upload.txt", text_content, "text/plain"),
)
print(f"File ID: {response.id}")
print(f"Filename: {response.filename}")
```
</Tab>
<Tab title="OpenAI Provider">
Upload a JSONL file for OpenAI batch processing:
```python
import anthropic
# Client configured for OpenAI provider
client = anthropic.Anthropic(
base_url="http://localhost:8080/anthropic",
api_key="virtual-key-or-dummy-key",
default_headers={"x-model-provider": "openai"}
)
# Create JSONL content in OpenAI batch format
jsonl_content = b'''{"custom_id": "request-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 100}}
{"custom_id": "request-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "How are you?"}], "max_tokens": 100}}'''
response = client.beta.files.upload(
file=("batch_input.jsonl", jsonl_content, "application/jsonl"),
)
print(f"File ID: {response.id}")
```
</Tab>
</Tabs>
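Hand-written JSONL is easy to break with an unescaped quote or a stray newline. As a hedged alternative, the sketch below generates the same OpenAI batch-format lines with `json.dumps`. The function name `build_batch_jsonl` and its defaults are assumptions made for illustration; the record fields are the ones shown in the OpenAI tab above.

```python
import json


def build_batch_jsonl(prompts, model="gpt-4o-mini", max_tokens=100):
    """Serialize prompts into OpenAI batch-format JSONL bytes."""
    lines = []
    for index, prompt in enumerate(prompts, start=1):
        record = {
            "custom_id": f"request-{index}",
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": model,
                "messages": [{"role": "user", "content": prompt}],
                "max_tokens": max_tokens,
            },
        }
        # json.dumps escapes quotes and newlines, keeping one record per line.
        lines.append(json.dumps(record))
    return "\n".join(lines).encode("utf-8")


jsonl_content = build_batch_jsonl(["Hello!", "How are you?"])
print(jsonl_content.decode("utf-8"))
```

The resulting bytes can be passed as the second element of the `file=(name, content, mime_type)` tuple in place of the literal used above.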
### List Files
<Tabs group="provider">
<Tab title="Anthropic Provider">
```python
import anthropic
client = anthropic.Anthropic(
base_url="http://localhost:8080/anthropic",
api_key="virtual-key-or-dummy-key"
)
# List all files
response = client.beta.files.list()
for file in response.data:
    print(f"File ID: {file.id}")
    print(f"Filename: {file.filename}")
    # The Anthropic SDK's file metadata exposes the size as `size_bytes`
    print(f"Size: {file.size_bytes} bytes")
    print("---")
```
</Tab>
<Tab title="OpenAI Provider">
```python
import anthropic
# Client configured for OpenAI provider
client = anthropic.Anthropic(
base_url="http://localhost:8080/anthropic",
api_key="virtual-key-or-dummy-key",
default_headers={"x-model-provider": "openai"}
)
# List all files from OpenAI
response = client.beta.files.list()
for file in response.data:
    print(f"File ID: {file.id}, Name: {file.filename}")
```
</Tab>
</Tabs>
### Delete a File
```python
import anthropic
client = anthropic.Anthropic(
base_url="http://localhost:8080/anthropic",
api_key="virtual-key-or-dummy-key",
default_headers={"x-model-provider": "openai"} # or omit for anthropic
)
# Delete a file
file_id = "file-abc123"
response = client.beta.files.delete(file_id)
print(f"Deleted file: {file_id}")
```
### Download File Content
Note: Anthropic only allows downloading files created by certain tools (like code execution). OpenAI allows downloading batch output files.
```python
import anthropic
client = anthropic.Anthropic(
base_url="http://localhost:8080/anthropic",
api_key="virtual-key-or-dummy-key",
default_headers={"x-model-provider": "openai"}
)
# Download file content
file_id = "file-abc123"
response = client.beta.files.download(file_id)
content = response.text()
print(f"File content:\n{content}")
```
---
## Batch API
The Anthropic Batch API is accessed through `beta.messages.batches`. Anthropic's batch API uses **inline requests** rather than file uploads.
### Create a Batch with Inline Requests
<Tabs group="provider">
<Tab title="Anthropic Provider">
```python
import anthropic
client = anthropic.Anthropic(
base_url="http://localhost:8080/anthropic",
api_key="virtual-key-or-dummy-key"
)
# Create batch with inline requests
batch_requests = [
{
"custom_id": "request-1",
"params": {
"model": "claude-3-sonnet-20240229",
"max_tokens": 100,
"messages": [
{"role": "user", "content": "What is 2+2?"}
]
}
},
{
"custom_id": "request-2",
"params": {
"model": "claude-3-sonnet-20240229",
"max_tokens": 100,
"messages": [
{"role": "user", "content": "What is the capital of France?"}
]
}
}
]
batch = client.beta.messages.batches.create(requests=batch_requests)
print(f"Batch ID: {batch.id}")
print(f"Status: {batch.processing_status}")
```
</Tab>
<Tab title="OpenAI Provider">
When routing to OpenAI, use OpenAI-compatible models:
```python
import anthropic
# Client configured for OpenAI provider
client = anthropic.Anthropic(
base_url="http://localhost:8080/anthropic",
api_key="virtual-key-or-dummy-key",
default_headers={"x-model-provider": "openai"}
)
# Create batch with inline requests (using OpenAI models)
batch_requests = [
{
"custom_id": "request-1",
"params": {
"model": "gpt-4o-mini",
"max_tokens": 100,
"messages": [
{"role": "user", "content": "What is 2+2?"}
]
}
},
{
"custom_id": "request-2",
"params": {
"model": "gpt-4o-mini",
"max_tokens": 100,
"messages": [
{"role": "user", "content": "What is the capital of France?"}
]
}
}
]
batch = client.beta.messages.batches.create(requests=batch_requests)
print(f"Batch ID: {batch.id}")
print(f"Status: {batch.processing_status}")
```
</Tab>
<Tab title="Gemini Provider">
When routing to Gemini:
```python
import anthropic
# Client configured for Gemini provider
client = anthropic.Anthropic(
base_url="http://localhost:8080/anthropic",
api_key="virtual-key-or-dummy-key",
default_headers={"x-model-provider": "gemini"}
)
# Create batch with inline requests (using Gemini models)
batch_requests = [
{
"custom_id": "request-1",
"params": {
"model": "gemini-1.5-flash",
"max_tokens": 100,
"messages": [
{"role": "user", "content": "What is 2+2?"}
]
}
},
{
"custom_id": "request-2",
"params": {
"model": "gemini-1.5-flash",
"max_tokens": 100,
"messages": [
{"role": "user", "content": "What is the capital of France?"}
]
}
}
]
batch = client.beta.messages.batches.create(requests=batch_requests)
print(f"Batch ID: {batch.id}")
print(f"Status: {batch.processing_status}")
```
</Tab>
</Tabs>
<Note>
**Bedrock Note:** Bedrock requires file-based batch creation with S3 storage, so batch jobs cannot be routed to Bedrock through the Anthropic SDK. Use the Bedrock SDK directly for batch operations. See the [Bedrock SDK documentation](../bedrock-sdk/files-and-batch) for details.
</Note>
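The three tabs above build the same request list and differ only in the model name. A small builder can remove that duplication. This is a sketch under stated assumptions: `make_batch_requests` is a hypothetical helper, and the `custom_id` numbering scheme is just one reasonable choice.

```python
def make_batch_requests(model, prompts, max_tokens=100, prefix="request"):
    """Build the inline `requests` list accepted by batches.create()."""
    return [
        {
            "custom_id": f"{prefix}-{index}",
            "params": {
                "model": model,
                "max_tokens": max_tokens,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        for index, prompt in enumerate(prompts, start=1)
    ]


batch_requests = make_batch_requests(
    "gpt-4o-mini",
    ["What is 2+2?", "What is the capital of France?"],
)
print([request["custom_id"] for request in batch_requests])
```

The list can then be passed as `client.beta.messages.batches.create(requests=batch_requests)`, with the model chosen to match the provider set in `x-model-provider`.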
### List Batches
```python
import anthropic
client = anthropic.Anthropic(
base_url="http://localhost:8080/anthropic",
api_key="virtual-key-or-dummy-key",
default_headers={"x-model-provider": "anthropic"} # or "openai", "gemini"
)
# List batches
response = client.beta.messages.batches.list(limit=10)
for batch in response.data:
    print(f"Batch ID: {batch.id}")
    print(f"Status: {batch.processing_status}")
    if batch.request_counts:
        print(f"Processing: {batch.request_counts.processing}")
        print(f"Succeeded: {batch.request_counts.succeeded}")
        print(f"Errored: {batch.request_counts.errored}")
    print("---")
```
### Retrieve Batch Status
```python
import anthropic
client = anthropic.Anthropic(
base_url="http://localhost:8080/anthropic",
api_key="virtual-key-or-dummy-key",
default_headers={"x-model-provider": "anthropic"} # or "openai", "gemini"
)
# Retrieve batch status
batch_id = "batch-abc123"
batch = client.beta.messages.batches.retrieve(batch_id)
print(f"Batch ID: {batch.id}")
print(f"Status: {batch.processing_status}")
if batch.request_counts:
    print(f"Processing: {batch.request_counts.processing}")
    print(f"Succeeded: {batch.request_counts.succeeded}")
    print(f"Errored: {batch.request_counts.errored}")
```
### Cancel a Batch
```python
import anthropic
client = anthropic.Anthropic(
base_url="http://localhost:8080/anthropic",
api_key="virtual-key-or-dummy-key",
default_headers={"x-model-provider": "anthropic"} # or "openai", "gemini"
)
# Cancel batch
batch_id = "batch-abc123"
batch = client.beta.messages.batches.cancel(batch_id)
print(f"Batch ID: {batch.id}")
print(f"Status: {batch.processing_status}") # "canceling" or "ended"
```
### Get Batch Results
```python
import anthropic
client = anthropic.Anthropic(
base_url="http://localhost:8080/anthropic",
api_key="virtual-key-or-dummy-key"
)
# Get batch results (only available after batch is completed)
batch_id = "batch-abc123"
results = client.beta.messages.batches.results(batch_id)
# Iterate over results
for result in results:
    print(f"Custom ID: {result.custom_id}")
    if result.result.type == "succeeded":
        message = result.result.message
        print(f"Response: {message.content[0].text}")
    elif result.result.type == "errored":
        print(f"Error: {result.result.error}")
    print("---")
```
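When results need to be joined back to the original requests, keying them by `custom_id` avoids relying on the order in which they arrive. The sketch below assumes result objects shaped like the ones iterated above (`custom_id`, `result.type`, `result.message`, `result.error`); `index_results` is a hypothetical helper, and the `SimpleNamespace` objects are hand-made stand-ins, not real API output.

```python
from types import SimpleNamespace


def index_results(results):
    """Map custom_id -> (status, payload) for each batch result."""
    indexed = {}
    for item in results:
        outcome = item.result
        if outcome.type == "succeeded":
            payload = outcome.message.content[0].text
        else:
            # Covers "errored" and any other non-success type.
            payload = getattr(outcome, "error", None)
        indexed[item.custom_id] = (outcome.type, payload)
    return indexed


# Stand-in data for demonstration only.
fake_results = [
    SimpleNamespace(
        custom_id="request-2",
        result=SimpleNamespace(type="errored", error="rate limited"),
    ),
    SimpleNamespace(
        custom_id="request-1",
        result=SimpleNamespace(
            type="succeeded",
            message=SimpleNamespace(content=[SimpleNamespace(text="4")]),
        ),
    ),
]
by_id = index_results(fake_results)
print(by_id["request-1"])
```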
---
## End-to-End Workflows
### Anthropic Batch Workflow
```python
import time
import anthropic
client = anthropic.Anthropic(
base_url="http://localhost:8080/anthropic",
api_key="virtual-key-or-dummy-key"
)
# Step 1: Create batch with inline requests
print("Step 1: Creating batch...")
batch_requests = [
{
"custom_id": "math-question",
"params": {
"model": "claude-3-sonnet-20240229",
"max_tokens": 100,
"messages": [{"role": "user", "content": "What is 15 * 7?"}]
}
},
{
"custom_id": "geography-question",
"params": {
"model": "claude-3-sonnet-20240229",
"max_tokens": 100,
"messages": [{"role": "user", "content": "What is the largest ocean?"}]
}
}
]
batch = client.beta.messages.batches.create(requests=batch_requests)
print(f" Created batch: {batch.id}, status: {batch.processing_status}")
# Step 2: Poll for completion
print("Step 2: Polling batch status...")
for i in range(20):
    batch = client.beta.messages.batches.retrieve(batch.id)
    print(f" Poll {i+1}: status = {batch.processing_status}")
    if batch.processing_status == "ended":
        print(" Batch completed!")
        break
    if batch.request_counts:
        print(f" Processing: {batch.request_counts.processing}")
        print(f" Succeeded: {batch.request_counts.succeeded}")
    time.sleep(5)
# Step 3: Verify batch is in list
print("Step 3: Verifying batch in list...")
batch_list = client.beta.messages.batches.list(limit=20)
batch_ids = [b.id for b in batch_list.data]
assert batch.id in batch_ids, f"Batch {batch.id} should be in list"
print(f" Verified batch {batch.id} is in list")
# Step 4: Get results (if completed)
if batch.processing_status == "ended":
    print("Step 4: Getting results...")
    try:
        results = client.beta.messages.batches.results(batch.id)
        for result in results:
            print(f" {result.custom_id}: ", end="")
            if result.result.type == "succeeded":
                print(result.result.message.content[0].text[:50] + "...")
            else:
                print(f"Error: {result.result.error}")
    except Exception as e:
        print(f" Results not yet available: {e}")
print(f"\nSuccess! Batch {batch.id} workflow completed.")
```
### Cross-Provider Batch Workflow (OpenAI via Anthropic SDK)
```python
import time
import anthropic
# Create client with OpenAI provider header
client = anthropic.Anthropic(
base_url="http://localhost:8080/anthropic",
api_key="virtual-key-or-dummy-key",
default_headers={"x-model-provider": "openai"}
)
# Step 1: Create batch with OpenAI models
print("Step 1: Creating batch for OpenAI provider...")
batch_requests = [
{
"custom_id": "openai-request-1",
"params": {
"model": "gpt-4o-mini",
"max_tokens": 100,
"messages": [{"role": "user", "content": "Explain AI in one sentence."}]
}
},
{
"custom_id": "openai-request-2",
"params": {
"model": "gpt-4o-mini",
"max_tokens": 100,
"messages": [{"role": "user", "content": "What is machine learning?"}]
}
}
]
batch = client.beta.messages.batches.create(requests=batch_requests)
print(f" Created batch: {batch.id}, status: {batch.processing_status}")
# Step 2: Poll for completion
print("Step 2: Polling batch status...")
for i in range(10):
    batch = client.beta.messages.batches.retrieve(batch.id)
    print(f" Poll {i+1}: status = {batch.processing_status}")
    if batch.processing_status in ["ended", "completed"]:
        break
    time.sleep(5)
print(f"\nSuccess! Cross-provider batch {batch.id} completed via Anthropic SDK.")
```
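Both workflows above poll a fixed number of times and then continue whether or not the batch has finished. One possible alternative is a helper that polls until a terminal status is seen or a timeout expires. This is a sketch, not Bifrost behavior: `wait_for_batch`, its parameters, and the default timeout are assumptions, and the terminal states are the two strings checked in the workflows above.

```python
import time


def wait_for_batch(fetch_status, done_states=("ended", "completed"),
                   interval=5.0, timeout=600.0, sleep=time.sleep):
    """Poll fetch_status() until it returns a done state.

    Returns the final status string, or raises TimeoutError if the next
    poll would fall past the deadline.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status in done_states:
            return status
        if time.monotonic() + interval > deadline:
            raise TimeoutError(f"batch still {status!r} after {timeout}s")
        sleep(interval)


# Demonstration with canned statuses and a no-op sleep.
statuses = iter(["in_progress", "in_progress", "ended"])
final = wait_for_batch(lambda: next(statuses), sleep=lambda _: None)
print(final)
```

Against a real client, the callable would be something like `lambda: client.beta.messages.batches.retrieve(batch.id).processing_status`. Injecting `sleep` keeps the helper testable without real delays.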
---
## Provider-Specific Notes
| Provider | Header Value | File Upload | Batch Type | Models |
|----------|--------------|-------------|------------|--------|
| **Anthropic** | `anthropic` or omit | ✅ Beta API | Inline requests | `claude-3-*` |
| **OpenAI** | `openai` | ✅ Beta API | Inline requests | `gpt-4o-*`, `gpt-4-*` |
| **Gemini** | `gemini` | ✅ Beta API | Inline requests | `gemini-1.5-*` |
| **Bedrock** | `bedrock` | ❌ Use Bedrock SDK | File-based (S3) | `anthropic.claude-*` |
---
## Next Steps
- **[Overview](./overview)** - Anthropic SDK integration basics
- **[Configuration](../../quickstart/gateway/provider-configuration)** - Bifrost setup and configuration
- **[Core Features](../../features/)** - Governance, semantic caching, and more


@@ -0,0 +1,449 @@
---
title: "Overview"
description: "Use Bifrost as a drop-in replacement for Anthropic API with full compatibility and enhanced features."
icon: "book"
---
## Overview
Bifrost provides complete Anthropic API compatibility through protocol adaptation. The integration handles request transformation, response normalization, and error mapping between Anthropic's Messages API specification and Bifrost's internal processing pipeline.
This integration enables you to utilize Bifrost's features like governance, load balancing, semantic caching, multi-provider support, and more, all while preserving your existing Anthropic SDK-based architecture.
**Endpoint:** `/anthropic`
<Note>
**Enabling the beta header**: Anthropic frequently uses the `anthropic-beta` header to gate access to new features.
Clients like Vercel's AI SDK send it. By default, Bifrost blocks unrecognized headers for security reasons.
To enable the beta header for full compatibility, add `anthropic-beta` to the AllowList under Settings -> Client Settings in the UI.
</Note>
---
## Setup
<Tabs group="anthropic-sdk">
<Tab title="Python">
```python {5}
import anthropic
# Configure client to use Bifrost
client = anthropic.Anthropic(
base_url="http://localhost:8080/anthropic",
api_key="dummy-key" # Keys handled by Bifrost
)
# Make requests as usual
response = client.messages.create(
model="claude-3-sonnet-20240229",
max_tokens=1000,
messages=[{"role": "user", "content": "Hello!"}]
)
print(response.content[0].text)
```
</Tab>
<Tab title="JavaScript">
```javascript {5}
import Anthropic from "@anthropic-ai/sdk";
// Configure client to use Bifrost
const anthropic = new Anthropic({
baseURL: "http://localhost:8080/anthropic",
apiKey: "dummy-key", // Keys handled by Bifrost
});
// Make requests as usual
const response = await anthropic.messages.create({
model: "claude-3-sonnet-20240229",
max_tokens: 1000,
messages: [{ role: "user", content: "Hello!" }],
});
console.log(response.content[0].text);
```
</Tab>
</Tabs>
---
## Provider/Model Usage Examples
Use multiple providers through the same Anthropic SDK format by prefixing model names with the provider:
<Tabs group="anthropic-sdk">
<Tab title="Python">
```python
import anthropic
client = anthropic.Anthropic(
base_url="http://localhost:8080/anthropic",
api_key="dummy-key"
)
# Anthropic models (default)
anthropic_response = client.messages.create(
model="claude-3-sonnet-20240229",
max_tokens=1000,
messages=[{"role": "user", "content": "Hello from Claude!"}]
)
# OpenAI models via Anthropic SDK format
openai_response = client.messages.create(
model="openai/gpt-4o-mini",
max_tokens=1000,
messages=[{"role": "user", "content": "Hello from OpenAI!"}]
)
# Google Vertex models via Anthropic SDK format
vertex_response = client.messages.create(
model="vertex/gemini-pro",
max_tokens=1000,
messages=[{"role": "user", "content": "Hello from Gemini!"}]
)
# Azure models
azure_response = client.messages.create(
model="azure/gpt-4o",
max_tokens=1000,
messages=[{"role": "user", "content": "Hello from Azure!"}]
)
# Local Ollama models
ollama_response = client.messages.create(
model="ollama/llama3.1:8b",
max_tokens=1000,
messages=[{"role": "user", "content": "Hello from Ollama!"}]
)
```
</Tab>
<Tab title="JavaScript">
```javascript
import Anthropic from "@anthropic-ai/sdk";
const anthropic = new Anthropic({
baseURL: "http://localhost:8080/anthropic",
apiKey: "dummy-key",
});
// Anthropic models (default)
const anthropicResponse = await anthropic.messages.create({
model: "claude-3-sonnet-20240229",
max_tokens: 1000,
messages: [{ role: "user", content: "Hello from Claude!" }],
});
// OpenAI models via Anthropic SDK format
const openaiResponse = await anthropic.messages.create({
model: "openai/gpt-4o-mini",
max_tokens: 1000,
messages: [{ role: "user", content: "Hello from OpenAI!" }],
});
// Google Vertex models via Anthropic SDK format
const vertexResponse = await anthropic.messages.create({
model: "vertex/gemini-pro",
max_tokens: 1000,
messages: [{ role: "user", content: "Hello from Gemini!" }],
});
// Azure models
const azureResponse = await anthropic.messages.create({
model: "azure/gpt-4o",
max_tokens: 1000,
messages: [{ role: "user", content: "Hello from Azure!" }],
});
// Local Ollama models
const ollamaResponse = await anthropic.messages.create({
model: "ollama/llama3.1:8b",
max_tokens: 1000,
messages: [{ role: "user", content: "Hello from Ollama!" }],
});
```
</Tab>
</Tabs>
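To make the naming convention concrete, the sketch below splits a model string into a provider and a model name, falling back to Anthropic when no prefix is present. It only illustrates the convention used in the examples above; `split_model` is a hypothetical helper, and Bifrost's own parsing may differ.

```python
def split_model(model, default_provider="anthropic"):
    """Split "provider/model" into (provider, model_name)."""
    provider, separator, name = model.partition("/")
    if not separator:
        # No prefix: treat the whole string as a model of the default provider.
        return default_provider, model
    return provider, name


for model in ("claude-3-sonnet-20240229", "openai/gpt-4o-mini", "ollama/llama3.1:8b"):
    print(model, "->", split_model(model))
```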
---
## Adding Custom Headers
Pass custom headers required by Bifrost plugins (like governance, telemetry, etc.):
<Tabs group="anthropic-sdk">
<Tab title="Python">
```python
import anthropic
client = anthropic.Anthropic(
base_url="http://localhost:8080/anthropic",
api_key="dummy-key",
default_headers={
"x-bf-vk": "vk_12345", # Virtual key for governance
}
)
response = client.messages.create(
model="claude-3-sonnet-20240229",
max_tokens=1000,
messages=[{"role": "user", "content": "Hello with custom headers!"}]
)
```
</Tab>
<Tab title="JavaScript">
```javascript
import Anthropic from "@anthropic-ai/sdk";
const anthropic = new Anthropic({
baseURL: "http://localhost:8080/anthropic",
apiKey: "dummy-key",
defaultHeaders: {
"x-bf-vk": "vk_12345", // Virtual key for governance
},
});
const response = await anthropic.messages.create({
model: "claude-3-sonnet-20240229",
max_tokens: 1000,
messages: [{ role: "user", content: "Hello with custom headers!" }],
});
```
</Tab>
</Tabs>
---
## Using Direct Keys
Pass API keys directly in requests to bypass Bifrost's load balancing. You can pass any provider's API key (OpenAI, Anthropic, Mistral, etc.) since Bifrost only looks for `Authorization` or `x-api-key` headers. This requires the **Allow Direct API keys** option to be enabled in Bifrost configuration.
> **Learn more:** See [Key Management](../../features/keys-management#direct-key-bypass) for enabling direct API key usage.
<Tabs group="anthropic-sdk">
<Tab title="Python">
```python
import anthropic
# Using Anthropic's API key directly
client_with_direct_key = anthropic.Anthropic(
base_url="http://localhost:8080/anthropic",
api_key="sk-your-anthropic-key" # Anthropic's API key works
)
anthropic_response = client_with_direct_key.messages.create(
model="claude-3-sonnet-20240229",
max_tokens=1000,
messages=[{"role": "user", "content": "Hello from Claude!"}]
)
# or pass different provider keys per request using headers
client = anthropic.Anthropic(
base_url="http://localhost:8080/anthropic",
api_key="dummy-key"
)
# Use Anthropic key for Claude
anthropic_response = client.messages.create(
model="claude-3-sonnet-20240229",
max_tokens=1000,
messages=[{"role": "user", "content": "Hello Claude!"}],
extra_headers={
"x-api-key": "sk-ant-your-anthropic-key"
}
)
# Use OpenAI key for GPT models
openai_response = client.messages.create(
model="openai/gpt-4o-mini",
max_tokens=1000,
messages=[{"role": "user", "content": "Hello GPT!"}],
extra_headers={
"Authorization": "Bearer sk-your-openai-key"
}
)
```
</Tab>
<Tab title="JavaScript">
```javascript
import Anthropic from "@anthropic-ai/sdk";
// Using Anthropic's API key directly
const anthropicWithDirectKey = new Anthropic({
baseURL: "http://localhost:8080/anthropic",
apiKey: "sk-your-anthropic-key", // Anthropic's API key works
});
const anthropicResponse = await anthropicWithDirectKey.messages.create({
model: "claude-3-sonnet-20240229",
max_tokens: 1000,
messages: [{ role: "user", content: "Hello from Claude!" }],
});
// or pass different provider keys per request using headers
const anthropic = new Anthropic({
baseURL: "http://localhost:8080/anthropic",
apiKey: "dummy-key",
});
// Use Anthropic key for Claude
// Per-request headers go in the second (options) argument, not in the body.
const anthropicResponseWithHeader = await anthropic.messages.create(
  {
    model: "claude-3-sonnet-20240229",
    max_tokens: 1000,
    messages: [{ role: "user", content: "Hello Claude!" }],
  },
  { headers: { "x-api-key": "sk-ant-your-anthropic-key" } }
);
// Use OpenAI key for GPT models
const openaiResponseWithHeader = await anthropic.messages.create(
  {
    model: "openai/gpt-4o-mini",
    max_tokens: 1000,
    messages: [{ role: "user", content: "Hello GPT!" }],
  },
  { headers: { Authorization: "Bearer sk-your-openai-key" } }
);
```
</Tab>
</Tabs>
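The per-request examples above send Anthropic keys in `x-api-key` and OpenAI keys in `Authorization`. A small helper can keep that choice in one place. This is a sketch under an assumption: using the `Bearer` form for every non-Anthropic provider is a guess extrapolated from the OpenAI example, and `direct_key_headers` is a hypothetical name.

```python
def direct_key_headers(provider, api_key):
    """Return the header dict carrying a direct provider key."""
    if not api_key:
        raise ValueError("api_key must be a non-empty string")
    if provider == "anthropic":
        return {"x-api-key": api_key}
    # Assumption: other providers use the Authorization: Bearer form.
    return {"Authorization": f"Bearer {api_key}"}


print(direct_key_headers("anthropic", "sk-ant-your-anthropic-key"))
print(direct_key_headers("openai", "sk-your-openai-key"))
```

The returned dict can be passed as `extra_headers=...` in the Python SDK calls shown above.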
---
## Async Inference
Submit inference requests asynchronously and poll for results later using the `x-bf-async` header. This is useful for long-running requests where you don't want to hold a connection open. See [Async Inference](../../features/async-inference) for full details.
<Note>
Async inference requires a [Logs Store](../../features/observability/default) to be configured and is not compatible with streaming.
</Note>
### Messages
<Tabs group="anthropic-sdk">
<Tab title="Python">
```python
import anthropic
import time
client = anthropic.Anthropic(
base_url="http://localhost:8080/anthropic",
api_key="dummy-key"
)
# Submit async request
initial = client.messages.create(
model="anthropic/claude-sonnet-4-20250514",
max_tokens=256,
messages=[{"role": "user", "content": "Tell me a short story."}],
extra_headers={"x-bf-async": "true"}
)
# If content is present, the request completed synchronously
if initial.content:
    print(initial.content[0].text)
else:
    # Poll until completed
    while True:
        time.sleep(2)
        poll = client.messages.create(
            model="anthropic/claude-sonnet-4-20250514",
            max_tokens=256,
            messages=[{"role": "user", "content": "Tell me a short story."}],
            extra_headers={"x-bf-async-id": initial.id}
        )
        if poll.content:
            print(poll.content[0].text)
            break
```
</Tab>
<Tab title="JavaScript">
```javascript
import Anthropic from "@anthropic-ai/sdk";
const anthropic = new Anthropic({
baseURL: "http://localhost:8080/anthropic",
apiKey: "dummy-key",
});
// Submit async request
const initial = await anthropic.messages.create(
{
model: "anthropic/claude-sonnet-4-20250514",
max_tokens: 256,
messages: [{ role: "user", content: "Tell me a short story." }],
},
{ headers: { "x-bf-async": "true" } }
);
// If content is present, the request completed synchronously
if (initial.content?.length > 0) {
console.log(initial.content[0].text);
} else {
// Poll until completed
while (true) {
await new Promise((r) => setTimeout(r, 2000));
const poll = await anthropic.messages.create(
{
model: "anthropic/claude-sonnet-4-20250514",
max_tokens: 256,
messages: [{ role: "user", content: "Tell me a short story." }],
},
{ headers: { "x-bf-async-id": initial.id } }
);
if (poll.content?.length > 0) {
console.log(poll.content[0].text);
break;
}
}
}
```
</Tab>
</Tabs>
### Async Headers
| Header | Description |
|---|---|
| `x-bf-async: true` | Submit the request as an async job. Returns immediately with a job ID. |
| `x-bf-async-id: <job-id>` | Poll for results of a previously submitted async job. |
| `x-bf-async-job-result-ttl: <seconds>` | Override the default result TTL (default: 3600s). |
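The headers in the table can be built with two small functions, which keeps the string values consistent between the submit and poll calls. The header names come from the table above; the function names and the TTL validation are assumptions made for illustration.

```python
def async_submit_headers(result_ttl_seconds=None):
    """Headers for submitting an async job, optionally overriding the TTL."""
    headers = {"x-bf-async": "true"}
    if result_ttl_seconds is not None:
        if result_ttl_seconds <= 0:
            raise ValueError("result_ttl_seconds must be positive")
        # Header values are strings; the TTL is expressed in whole seconds.
        headers["x-bf-async-job-result-ttl"] = str(int(result_ttl_seconds))
    return headers


def async_poll_headers(job_id):
    """Headers for polling a previously submitted async job."""
    if not job_id:
        raise ValueError("job_id must be a non-empty string")
    return {"x-bf-async-id": job_id}


print(async_submit_headers(result_ttl_seconds=600))
print(async_poll_headers("job-123"))  # "job-123" is a placeholder ID
```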
---
## Supported Features
The Anthropic integration supports all features that are available in both the Anthropic SDK and Bifrost core functionality. If the Anthropic SDK supports a feature and Bifrost supports it, the integration will work seamlessly.
---
## Next Steps
- **[Files and Batch API](./files-and-batch)** - File uploads and batch processing
- **[OpenAI SDK](../openai-sdk/overview)** - GPT integration patterns
- **[Google GenAI SDK](../genai-sdk)** - Gemini integration patterns
- **[Configuration](../../quickstart/README)** - Bifrost setup and configuration
- **[Core Features](../../features/)** - Advanced Bifrost capabilities

File diff suppressed because it is too large


@@ -0,0 +1,269 @@
---
title: "Overview"
description: "Use Bifrost as a Bedrock-compatible gateway for the Converse and Invoke APIs, with Bifrost features on top."
icon: "book"
---
## Overview
Bifrost provides a Bedrock-compatible endpoint for the **Converse** and **Invoke** APIs via protocol adaptation. The integration handles request transformation, response normalization, and error mapping between AWS Bedrock's API specification and Bifrost's internal processing pipeline.
This integration enables you to utilize Bifrost's features like governance, load balancing, semantic caching, multi-provider support, and more, all while preserving your existing Bedrock SDK-based architecture.
**Endpoint:** `/bedrock`
## Setup
<Tabs group="bedrock-sdk">
<Tab title="Python">
```python {6}
import boto3
# Configure boto3 Bedrock client to use Bifrost
# Note: When using Bifrost keys, dummy credentials are required
# because boto3 needs credentials to sign requests, even though
# Bifrost will use its own configured keys.
client = boto3.client(
service_name="bedrock-runtime",
endpoint_url="http://localhost:8080/bedrock",
region_name="us-west-2",
aws_access_key_id="bifrost-dummy-key", # Required when using Bifrost keys
aws_secret_access_key="bifrost-dummy-secret" # Required when using Bifrost keys
)
# Make requests as usual
response = client.converse(
modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",
messages=[
{
"role": "user",
"content": [{"text": "Hello!"}]
}
]
)
print(response)
```
</Tab>
</Tabs>
## Provider/Model Usage Examples
Because Bedrock itself is a multi-provider platform, you can use any Bedrock-supported model ID and still route through Bifrost. Bifrost will handle governance, observability, and other cross-cutting concerns.
```python
import boto3
import json
client = boto3.client(
service_name="bedrock-runtime",
endpoint_url="http://localhost:8080/bedrock",
region_name="us-west-2",
aws_access_key_id="bifrost-dummy-key",
aws_secret_access_key="bifrost-dummy-secret"
)
# Anthropic via Bedrock (Converse API)
anthropic_response = client.converse(
modelId="anthropic.claude-3-sonnet-20240229",
messages=[{"role": "user", "content": [{"text": "Hello from Claude!"}]}]
)
# Mistral via Bedrock (Converse API)
mistral_response = client.converse(
modelId="mistral.mistral-large-2407",
messages=[{"role": "user", "content": [{"text": "Hello from Mistral!"}]}]
)
# Mistral via Bedrock (Invoke API)
mistral_invoke_response = client.invoke_model(
modelId="mistral.mistral-large-2407",
contentType="application/json",
accept="application/json",
body=json.dumps({
"prompt": "Say hello from Mistral using Invoke API.",
"max_tokens": 50,
"temperature": 0.7
}),
)
```
---
## Adding Custom Headers
Pass custom headers required by Bifrost plugins (like governance, telemetry, etc.) using boto3's event system:
<Tabs group="bedrock-sdk">
<Tab title="Python">
```python
import boto3
def add_bifrost_headers(request, **kwargs):
    """Add custom Bifrost headers to the request before signing."""
    request.headers.add_header("x-bf-vk", "vk_12345")  # Virtual key for governance
    request.headers.add_header("x-bf-env", "production")  # Environment tag
client = boto3.client(
service_name="bedrock-runtime",
endpoint_url="http://localhost:8080/bedrock",
region_name="us-west-2",
aws_access_key_id="bifrost-dummy-key",
aws_secret_access_key="bifrost-dummy-secret"
)
# Register the header injection for all Bedrock API calls
client.meta.events.register_first(
"before-sign.bedrock-runtime.*",
add_bifrost_headers,
)
# Now make requests with custom headers
response = client.converse(
modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",
messages=[{"role": "user", "content": [{"text": "Hello with custom headers!"}]}]
)
```
> **Note:** Use `register_first` to ensure headers are added before request signing. The event name format is `before-sign.<service-name>.<operation-name>`. The `*` wildcard used above matches every operation; to target only specific operations, register the handler once per operation name (Converse, ConverseStream, InvokeModel, etc.).
</Tab>
</Tabs>
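The handler above hard-codes its header values. If different clients need different headers, one option is a factory that returns a handler bound to a given header set. This is a sketch: `make_header_injector` is a hypothetical helper, and the demonstration uses a stand-in request object built from the standard library rather than a real boto3 request.

```python
from email.message import Message
from types import SimpleNamespace


def make_header_injector(headers):
    """Return an event handler that adds `headers` to the outgoing request."""
    def inject(request, **kwargs):
        for name, value in headers.items():
            request.headers.add_header(name, value)
    return inject


# Stand-in request for demonstration; boto3 passes its own request object.
fake_request = SimpleNamespace(headers=Message())
inject = make_header_injector({"x-bf-vk": "vk_12345", "x-bf-env": "production"})
inject(fake_request, operation_name="Converse")
print(dict(fake_request.headers.items()))
```

The returned function takes the same `(request, **kwargs)` arguments as `add_bifrost_headers`, so it can be passed to `client.meta.events.register_first(...)` in its place.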
---
## Streaming Examples
### Converse Stream
Use `converse_stream` for chat-based streaming with a unified interface across models.
```python
import boto3
client = boto3.client(
service_name="bedrock-runtime",
endpoint_url="http://localhost:8080/bedrock",
region_name="us-west-2",
aws_access_key_id="bifrost-dummy-key",
aws_secret_access_key="bifrost-dummy-secret"
)
response = client.converse_stream(
modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",
messages=[{"role": "user", "content": [{"text": "Tell me a story about a brave knight."}]}],
inferenceConfig={"maxTokens": 512, "temperature": 0.5}
)
print("Response:")
for chunk in response["stream"]:
    if "contentBlockDelta" in chunk:
        text = chunk["contentBlockDelta"]["delta"]["text"]
        print(text, end="", flush=True)
```
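The loop above indexes `["delta"]["text"]` directly, which raises `KeyError` if a delta carries something other than text (for example, tool-use input). A more defensive variant is sketched below. `iter_text_deltas` is a hypothetical helper, and the event list is hand-written sample data in the Converse stream shape, not captured output.

```python
def iter_text_deltas(stream):
    """Yield text fragments from Converse stream events, skipping the rest."""
    for event in stream:
        delta = event.get("contentBlockDelta", {}).get("delta", {})
        text = delta.get("text")
        if text:
            yield text


# Hand-written sample events for demonstration only.
events = [
    {"messageStart": {"role": "assistant"}},
    {"contentBlockDelta": {"delta": {"text": "Once upon"}, "contentBlockIndex": 0}},
    {"contentBlockDelta": {"delta": {"toolUse": {"input": "{}"}}, "contentBlockIndex": 1}},
    {"contentBlockDelta": {"delta": {"text": " a time"}, "contentBlockIndex": 0}},
    {"messageStop": {"stopReason": "end_turn"}},
]
story = "".join(iter_text_deltas(events))
print(story)
```

With a live response, `iter_text_deltas(response["stream"])` would replace the hand-written list.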
### Invoke Stream
Use `invoke_model_with_response_stream` for model-specific streaming payloads.
```python
import boto3
import json
client = boto3.client(
service_name="bedrock-runtime",
endpoint_url="http://localhost:8080/bedrock",
region_name="us-west-2",
aws_access_key_id="bifrost-dummy-key",
aws_secret_access_key="bifrost-dummy-secret"
)
# Example for Claude 3 (Messages API format)
body = json.dumps({
"anthropic_version": "bedrock-2023-05-31",
"max_tokens": 1024,
"messages": [
{"role": "user", "content": "Write a haiku about coding."}
]
})
response = client.invoke_model_with_response_stream(
modelId="anthropic.claude-3-haiku-20240307-v1:0",
body=body,
contentType="application/json",
accept="application/json"
)
print("Response:")
for event in response.get("body"):
    if "chunk" in event:
        chunk = event["chunk"]
        if "bytes" in chunk:
            # The chunk bytes contain the model-specific JSON response
            result = json.loads(chunk["bytes"].decode("utf-8"))
            # Extract content based on model (e.g., Claude)
            if "delta" in result and "text" in result["delta"]:
                print(result["delta"]["text"], end="", flush=True)
            elif "completion" in result:
                print(result["completion"], end="", flush=True)
```
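The per-event decoding in the loop above can be pulled into a small function so it can be exercised without a live stream. This is a sketch under assumptions: `extract_text` is a hypothetical name, and it only covers the two payload shapes handled above (a `delta.text` field, or a top-level `completion` field).

```python
import json
from typing import Any, Dict


def extract_text(event: Dict[str, Any]) -> str:
    """Return the text carried by one invoke-stream event, or "" if none."""
    raw = event.get("chunk", {}).get("bytes")
    if raw is None:
        return ""
    payload = json.loads(raw.decode("utf-8"))
    # Messages-style payloads carry streamed text inside a `delta` object.
    delta = payload.get("delta")
    if isinstance(delta, dict) and "text" in delta:
        return delta["text"]
    # Text-completion style payloads use a top-level `completion` field.
    return payload.get("completion", "")


# Hand-built events for illustration; real ones come from response["body"].
sample_body = [
    {"chunk": {"bytes": json.dumps({"type": "message_start"}).encode("utf-8")}},
    {"chunk": {"bytes": json.dumps({"delta": {"text": "Code flows "}}).encode("utf-8")}},
    {"chunk": {"bytes": json.dumps({"completion": "like water"}).encode("utf-8")}},
]
haiku = "".join(extract_text(event) for event in sample_body)
print(haiku)
```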
## Using Direct Keys
Pass AWS credentials or Bedrock API keys directly in requests to bypass Bifrost's load balancing. This requires the **Allow Direct API keys** option to be enabled in Bifrost configuration.
> **Learn more:** See [Key Management](../../features/keys-management#direct-key-bypass) for enabling direct API key usage.
When direct keys are enabled, you can pass your AWS credentials directly to the boto3 client instead of using dummy credentials.
<Tabs group="bedrock-sdk">
<Tab title="Python">
```python
import boto3
# When direct keys are enabled, pass real AWS credentials to boto3
client = boto3.client(
service_name="bedrock-runtime",
endpoint_url="http://localhost:8080/bedrock",
region_name="us-west-2",
aws_access_key_id="your-aws-access-key", # Real credentials when direct keys enabled
aws_secret_access_key="your-aws-secret-key" # Real credentials when direct keys enabled
)
response = client.converse(
modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",
messages=[{"role": "user", "content": [{"text": "Hello!"}]}]
)
```
</Tab>
</Tabs>
> **Note:** When using Bifrost's configured keys (not direct keys), you must provide dummy AWS credentials (`aws_access_key_id` and `aws_secret_access_key`) to the boto3 client. This is because boto3 requires credentials to sign requests, even though Bifrost will use its own configured keys. The dummy values can be any string (e.g., `"bifrost-dummy-key"` and `"bifrost-dummy-secret"`).
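
The choice between dummy and real credentials can be kept in one place. The sketch below is an illustration only: `bedrock_client_kwargs` is a hypothetical helper, and it assumes real credentials are available under the standard `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variable names.

```python
from typing import Dict, Mapping


def bedrock_client_kwargs(env: Mapping[str, str], direct_keys: bool = False) -> Dict[str, str]:
    """Build keyword arguments for boto3.client(...) pointed at Bifrost."""
    kwargs = {
        "service_name": "bedrock-runtime",
        "endpoint_url": "http://localhost:8080/bedrock",
        "region_name": "us-west-2",
    }
    if direct_keys:
        # Real credentials are only meaningful when direct keys are enabled.
        try:
            kwargs["aws_access_key_id"] = env["AWS_ACCESS_KEY_ID"]
            kwargs["aws_secret_access_key"] = env["AWS_SECRET_ACCESS_KEY"]
        except KeyError as missing:
            raise ValueError(f"direct keys requested but {missing} is not set") from None
    else:
        # Placeholder values: boto3 needs some credentials to sign requests.
        kwargs["aws_access_key_id"] = "bifrost-dummy-key"
        kwargs["aws_secret_access_key"] = "bifrost-dummy-secret"
    return kwargs


print(bedrock_client_kwargs({})["aws_access_key_id"])
```

In real use this would be called as `boto3.client(**bedrock_client_kwargs(os.environ, direct_keys=True))`.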
---
## Supported Features
The Bedrock integration currently supports:
- **Converse** API (`/bedrock/model/{modelId}/converse`) for text/chat-style workloads
- **Invoke** API (`/bedrock/model/{modelId}/invoke`) for model-specific text completion workloads
- **Streaming** via `converse_stream` and `invoke_model_with_response_stream`
- **Tools** via `toolConfig`, `toolUse`, and `toolResult` inside Converse requests
- **Image and multimodal** responses where supported by the underlying Bedrock model
- All Bifrost core features that apply to these flows (governance, load balancing, semantic cache, observability, etc.)
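
As an illustration of the tools support listed above, the following sketch builds the `toolConfig` and `toolResult` structures used by Converse requests. The `get_weather` tool, its schema, and the helper names are hypothetical and exist only for this example.

```python
from typing import Any, Dict


def make_tool_config() -> Dict[str, Any]:
    """Return a toolConfig describing one hypothetical tool."""
    return {
        "tools": [
            {
                "toolSpec": {
                    "name": "get_weather",
                    "description": "Look up the current weather for a city.",
                    "inputSchema": {
                        "json": {
                            "type": "object",
                            "properties": {"city": {"type": "string"}},
                            "required": ["city"],
                        }
                    },
                }
            }
        ]
    }


def make_tool_result_message(tool_use_id: str, result: Dict[str, Any]) -> Dict[str, Any]:
    """Wrap a tool's output as the user message that answers a toolUse block."""
    return {
        "role": "user",
        "content": [
            {"toolResult": {"toolUseId": tool_use_id, "content": [{"json": result}]}}
        ],
    }


tool_config = make_tool_config()
reply = make_tool_result_message("tool-123", {"temperature_c": 21})
print(tool_config["tools"][0]["toolSpec"]["name"])
```

The config would be passed as `client.converse(..., toolConfig=tool_config)`, and the result message appended to `messages` after the model returns a `toolUse` block.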
---
## Next Steps
- **[Files and Batch API](./files-and-batch)** - S3-based file operations and batch processing
- **[What is an integration?](../what-is-an-integration)** - Core integration concepts
- **[Configuration](../../quickstart/gateway/provider-configuration)** - Bedrock provider setup and API key management
- **[Core Features](../../features/)** - Governance, semantic caching, and more

View File

@@ -0,0 +1,317 @@
---
title: "Overview"
description: "Use Bifrost as a drop-in replacement for Google GenAI API with full compatibility and enhanced features."
icon: "book"
---
## Overview
Bifrost provides complete Google GenAI API compatibility through protocol adaptation. The integration handles request transformation, response normalization, and error mapping between Google's GenAI API specification and Bifrost's internal processing pipeline.
This integration enables you to utilize Bifrost's features like governance, load balancing, semantic caching, multi-provider support, and more, all while preserving your existing Google GenAI SDK-based architecture.
**Endpoint:** `/genai`
---
## Setup
<Tabs group="genai-sdk">
<Tab title="Python">
```python {7}
from google import genai
from google.genai.types import HttpOptions
# Configure client to use Bifrost
client = genai.Client(
api_key="dummy-key", # Keys handled by Bifrost
http_options=HttpOptions(base_url="http://localhost:8080/genai")
)
# Make requests as usual
response = client.models.generate_content(
model="gemini-1.5-flash",
contents="Hello!"
)
print(response.text)
```
</Tab>
<Tab title="JavaScript">
```javascript {5}
import { GoogleGenerativeAI } from "@google/generative-ai";
// Configure client to use Bifrost
const genAI = new GoogleGenerativeAI("dummy-key"); // Keys handled by Bifrost
const requestOptions = {
  baseUrl: "http://localhost:8080/genai", // Request option, passed to getGenerativeModel
};
// Make requests as usual
const model = genAI.getGenerativeModel({ model: "gemini-1.5-flash" }, requestOptions);
const response = await model.generateContent("Hello!");
console.log(response.response.text());
```
</Tab>
</Tabs>
---
## Provider/Model Usage Examples
Use multiple providers through the same GenAI SDK format by prefixing model names with the provider:
<Tabs group="genai-sdk">
<Tab title="Python">
```python
from google import genai
from google.genai.types import HttpOptions
client = genai.Client(
api_key="dummy-key",
http_options=HttpOptions(base_url="http://localhost:8080/genai")
)
# Google Vertex models (default)
vertex_response = client.models.generate_content(
model="gemini-1.5-flash",
contents="Hello from Gemini!"
)
# OpenAI models via GenAI SDK format
openai_response = client.models.generate_content(
model="openai/gpt-4o-mini",
contents="Hello from OpenAI!"
)
# Anthropic models via GenAI SDK format
anthropic_response = client.models.generate_content(
model="anthropic/claude-3-sonnet-20240229",
contents="Hello from Claude!"
)
# Azure models
azure_response = client.models.generate_content(
model="azure/gpt-4o",
contents="Hello from Azure!"
)
# Local Ollama models
ollama_response = client.models.generate_content(
model="ollama/llama3.1:8b",
contents="Hello from Ollama!"
)
```
</Tab>
<Tab title="JavaScript">
```javascript
import { GoogleGenerativeAI } from "@google/generative-ai";
const genAI = new GoogleGenerativeAI("dummy-key");
// The base URL is a request option, passed as the second argument to getGenerativeModel
const requestOptions = { baseUrl: "http://localhost:8080/genai" };
// Google Vertex models (default)
const geminiModel = genAI.getGenerativeModel({ model: "gemini-1.5-flash" }, requestOptions);
const vertexResponse = await geminiModel.generateContent("Hello from Gemini!");
// OpenAI models via GenAI SDK format
const openaiModel = genAI.getGenerativeModel({ model: "openai/gpt-4o-mini" }, requestOptions);
const openaiResponse = await openaiModel.generateContent("Hello from OpenAI!");
// Anthropic models via GenAI SDK format
const anthropicModel = genAI.getGenerativeModel({ model: "anthropic/claude-3-sonnet-20240229" }, requestOptions);
const anthropicResponse = await anthropicModel.generateContent("Hello from Claude!");
// Azure models
const azureModel = genAI.getGenerativeModel({ model: "azure/gpt-4o" }, requestOptions);
const azureResponse = await azureModel.generateContent("Hello from Azure!");
// Local Ollama models
const ollamaModel = genAI.getGenerativeModel({ model: "ollama/llama3.1:8b" }, requestOptions);
const ollamaResponse = await ollamaModel.generateContent("Hello from Ollama!");
```
</Tab>
</Tabs>
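The `provider/model` naming convention used in these examples can be pictured as a split on the first slash. The sketch below only illustrates the convention as it appears in this page; `split_model` is a hypothetical helper and is not Bifrost's actual parsing code.

```python
from typing import Optional, Tuple


def split_model(model: str) -> Tuple[Optional[str], str]:
    """Split "provider/model" on the first slash.

    Returns (None, model) when there is no prefix, which corresponds to the
    default provider in the examples above.
    """
    provider, sep, name = model.partition("/")
    if not sep:
        return None, model
    return provider, name


for model_id in ("gemini-1.5-flash", "openai/gpt-4o-mini", "ollama/llama3.1:8b"):
    print(split_model(model_id))
```

Splitting on the first slash only keeps any later slashes or colons inside the model name itself.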
---
## Adding Custom Headers
Pass custom headers required by Bifrost plugins (like governance, telemetry, etc.):
<Tabs group="genai-sdk">
<Tab title="Python">
```python
from google import genai
from google.genai.types import HttpOptions
# Configure client with custom headers
client = genai.Client(
api_key="dummy-key",
http_options=HttpOptions(
base_url="http://localhost:8080/genai",
headers={
"x-bf-vk": "vk_12345", # Virtual key for governance
}
)
)
response = client.models.generate_content(
model="gemini-1.5-flash",
contents="Hello with custom headers!"
)
```
</Tab>
<Tab title="JavaScript">
```javascript
import { GoogleGenerativeAI } from "@google/generative-ai";
// Configure request options with custom headers
const genAI = new GoogleGenerativeAI("dummy-key");
const requestOptions = {
  baseUrl: "http://localhost:8080/genai",
  customHeaders: {
    "x-bf-vk": "vk_12345", // Virtual key for governance
  },
};
const model = genAI.getGenerativeModel({ model: "gemini-1.5-flash" }, requestOptions);
const response = await model.generateContent("Hello with custom headers!");
```
</Tab>
</Tabs>
---
## Using Direct Keys
Pass API keys directly in requests to bypass Bifrost's load balancing. You can pass any provider's API key (OpenAI, Anthropic, Mistral, etc.) since Bifrost only looks for `Authorization`, `x-api-key` and `x-goog-api-key` headers. This requires the **Allow Direct API keys** option to be enabled in Bifrost configuration.
> **Learn more:** See [Key Management](../../features/keys-management#direct-key-bypass) for enabling direct API key usage.
<Tabs group="genai-sdk">
<Tab title="Python">
```python
from google import genai
from google.genai.types import HttpOptions
# Pass different provider keys per request using headers
client = genai.Client(
api_key="gemini-key",
http_options=HttpOptions(base_url="http://localhost:8080/genai")
)
# Use Gemini key directly
gemini_response = client.models.generate_content(
model="gemini-1.5-flash",
contents="Hello Gemini!"
)
# Use Anthropic key for Claude models (per-request headers go in config.http_options)
anthropic_response = client.models.generate_content(
    model="anthropic/claude-3-sonnet-20240229",
    contents="Hello Claude!",
    config={
        "http_options": HttpOptions(headers={"x-api-key": "your-anthropic-api-key"})
    }
)
# Use OpenAI key for GPT models
openai_response = client.models.generate_content(
    model="openai/gpt-4o-mini",
    contents="Hello GPT!",
    config={
        "http_options": HttpOptions(headers={"Authorization": "Bearer sk-your-openai-key"})
    }
)
```
</Tab>
<Tab title="JavaScript">
```javascript
import { GoogleGenerativeAI } from "@google/generative-ai";
// Pass different provider keys per model using request options
const genAI = new GoogleGenerativeAI("gemini-key");
const baseUrl = "http://localhost:8080/genai";
// Use Gemini key directly
const geminiModel = genAI.getGenerativeModel(
  { model: "gemini-1.5-flash" },
  { baseUrl }
);
const geminiResponse = await geminiModel.generateContent("Hello Gemini!");
// Use Anthropic key for Claude models
const anthropicModel = genAI.getGenerativeModel(
  { model: "anthropic/claude-3-sonnet-20240229" },
  { baseUrl, customHeaders: { "x-api-key": "your-anthropic-api-key" } }
);
const anthropicResponse = await anthropicModel.generateContent("Hello Claude!");
// Use OpenAI key for GPT models
const gptModel = genAI.getGenerativeModel(
  { model: "openai/gpt-4o-mini" },
  { baseUrl, customHeaders: { "Authorization": "Bearer sk-your-openai-key" } }
);
const gptResponse = await gptModel.generateContent("Hello GPT!");
```
</Tab>
</Tabs>
---
## Dynamic Thinking Budget
When `thinkingConfig.thinkingBudget` is set to `-1`, Bifrost handles it differently per provider:
- **Gemini**: Preserves `-1` for native dynamic thinking support
- **Anthropic**, **Bedrock**, **Cohere**: Converts to minimum reasoning budget value (1024)
- **OpenAI**: Converts to medium reasoning effort
```python
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Complex reasoning task",
    config={
        "thinking_config": {
            "include_thoughts": True,
            "thinking_budget": -1  # Dynamic thinking
        }
    }
)
```
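The per-provider handling listed above can be summarized as a lookup. This is a sketch of the documented behavior only, not Bifrost's implementation: the function name and the output keys (`thinking_budget`, `reasoning_effort`) are assumptions chosen for illustration.

```python
from typing import Any, Dict

DYNAMIC_BUDGET = -1
MIN_REASONING_BUDGET = 1024  # Minimum budget value stated in the list above


def map_dynamic_budget(provider: str) -> Dict[str, Any]:
    """Describe what a thinking budget of -1 becomes for a given provider."""
    provider = provider.lower()
    if provider == "gemini":
        # Gemini supports dynamic thinking natively, so -1 is preserved.
        return {"thinking_budget": DYNAMIC_BUDGET}
    if provider in {"anthropic", "bedrock", "cohere"}:
        return {"thinking_budget": MIN_REASONING_BUDGET}
    if provider == "openai":
        return {"reasoning_effort": "medium"}
    raise ValueError(f"no documented mapping for provider {provider!r}")


for name in ("gemini", "anthropic", "openai"):
    print(name, map_dynamic_budget(name))
```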
---
## Supported Features
The Google GenAI integration supports every feature that is available in both the Google GenAI SDK and Bifrost core: if the SDK exposes a feature and Bifrost supports it, the feature works through this integration.
---
## Next Steps
- **[OpenAI SDK](../openai-sdk/overview)** - GPT integration patterns
- **[Configuration](../../quickstart/gateway/provider-configuration)** - Bifrost setup and configuration
- **[Core Features](../../features/)** - Advanced Bifrost capabilities

View File

@@ -0,0 +1,104 @@
---
title: "AWS Bedrock Guardrails"
description: "Integrate AWS Bedrock Guardrails with Bifrost for enterprise-grade content filtering, PII protection, prompt attack detection, and image content analysis."
icon: "aws"
---
Bifrost integrates with **Amazon Bedrock Guardrails** to provide enterprise-grade content filtering and safety features with deep AWS integration. This page covers the configuration and capabilities of the AWS Bedrock guardrail provider.
![AWS Bedrock Guardrails configuration form](/media/guardrails/bedrock-guardrails-provider-details.png)
## Capabilities
- **Content Filters**: Hate speech, insults, sexual content, violence, misconduct
- **Denied Topics**: Block specific topics or categories
- **Word Filters**: Custom profanity and sensitive word blocking
- **PII Protection**: Detect and redact 50+ PII entity types
- **Contextual Grounding**: Verify responses against source documents
- **Prompt Attack Detection**: Identify injection and jailbreak attempts
- **Image Content Support**: Analyze images in addition to text (PNG, JPEG)
## Configuration Fields
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `access_key` | string | Yes | - | AWS Access Key ID |
| `secret_key` | string | Yes | - | AWS Secret Access Key |
| `guardrail_arn` | string | Yes | - | ARN of the Bedrock guardrail |
| `guardrail_version` | string | Yes | - | Version of the guardrail (e.g., "1", "DRAFT") |
| `region` | string | Yes | - | AWS region |
## Authentication
Uses AWS SDK with static credentials:
```json
{
"access_key": "AKIAXXXXXXXXXXXXXXXXXX",
"secret_key": "your-secret-access-key",
"guardrail_arn": "arn:aws:bedrock:us-east-1:123456789:guardrail/abc123",
"guardrail_version": "1",
"region": "us-east-1"
}
```
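Because every field in the configuration table is required, a quick pre-flight check can catch an incomplete config before it is submitted. The helper below is a hypothetical sketch; the field names come from the table above, and the sample values are placeholders.

```python
from typing import Any, Dict, List

# Required fields, as listed in the Configuration Fields table.
REQUIRED_FIELDS = (
    "access_key",
    "secret_key",
    "guardrail_arn",
    "guardrail_version",
    "region",
)


def missing_fields(config: Dict[str, Any]) -> List[str]:
    """Return the required fields that are absent or empty in `config`."""
    return [field for field in REQUIRED_FIELDS if not config.get(field)]


sample_config = {
    "access_key": "AKIAXXXXXXXXXXXXXXXXXX",
    "secret_key": "your-secret-access-key",
    "guardrail_arn": "arn:aws:bedrock:us-east-1:123456789012:guardrail/abc123",
    "guardrail_version": "1",
}
print(missing_fields(sample_config))
```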
## Supported AWS Regions
| Region Code | Region Name |
|-------------|-------------|
| `us-east-1` | US East (N. Virginia) |
| `us-east-2` | US East (Ohio) |
| `us-west-1` | US West (N. California) |
| `us-west-2` | US West (Oregon) |
| `ap-south-1` | Asia Pacific (Mumbai) |
| `ap-northeast-1` | Asia Pacific (Tokyo) |
| `ap-northeast-2` | Asia Pacific (Seoul) |
| `ap-southeast-1` | Asia Pacific (Singapore) |
| `ap-southeast-2` | Asia Pacific (Sydney) |
| `eu-central-1` | Europe (Frankfurt) |
| `eu-west-1` | Europe (Ireland) |
| `eu-west-2` | Europe (London) |
| `eu-west-3` | Europe (Paris) |
## Supported Content Types
- Text content
- Images (PNG, JPEG formats)
## Usage Metrics Returned
Bedrock guardrails return detailed usage metrics for cost tracking and monitoring:
| Metric | Description |
|--------|-------------|
| `content_policy_units` | Units consumed by content policy evaluation |
| `contextual_grounding_policy_units` | Units for grounding checks |
| `sensitive_information_policy_units` | Units for PII detection |
| `topic_policy_units` | Units for topic filtering |
| `word_policy_units` | Units for word filtering |
| `automated_reasoning_policy_units` | Units for reasoning checks |
| `content_policy_image_units` | Units for image content analysis |
## Supported PII Types
- Personal identifiers (SSN, passport, driver's license)
- Financial information (credit cards, bank accounts)
- Contact information (email, phone, address)
- Medical information (health records, insurance)
- Device identifiers (IP addresses, MAC addresses)
## Provider Capabilities Comparison
| Capability | AWS Bedrock | Azure Content Safety | GraySwan | Patronus AI |
|------------|-------------|---------------------|----------|-------------|
| PII Detection | Yes | No | No | Yes |
| Content Filtering | Yes | Yes | Yes | Yes |
| Prompt Injection | Yes | Yes | Yes | Yes |
| Hallucination Detection | No | No | No | Yes |
| Toxicity Screening | Yes | Yes | Yes | Yes |
| Custom Policies | Yes | Yes | Yes | Yes |
| Custom Natural Language Rules | No | No | Yes | No |
| Image Support | Yes | No | No | No |
| IPI Detection | No | Yes | Yes | No |
| Mutation Detection | No | No | Yes | No |
For information on configuring guardrail rules and profiles, see [Guardrails](/enterprise/guardrails).

View File

@@ -0,0 +1,80 @@
---
title: "Azure Content Safety"
description: "Integrate Azure AI Content Safety with Bifrost for multi-modal content moderation, severity-based filtering, prompt shield, and custom blocklist support."
icon: "microsoft"
---
Bifrost integrates with **Azure AI Content Safety** to provide multi-modal content moderation powered by Microsoft's advanced AI models. This page covers the configuration and capabilities of the Azure Content Safety guardrail provider.
![Azure Content Safety configuration form](/media/guardrails/azure-config-on-bifrost.png)
## Capabilities
- **Severity-Based Filtering**: 4-level severity classification (Safe, Low, Medium, High)
- **Multi-Category Detection**: Hate, sexual, violence, self-harm content
- **Prompt Shield**: Advanced jailbreak and injection detection
- **Indirect Attack Detection**: Identify hidden malicious instructions
- **Protected Material**: Detect copyrighted content (output only)
- **Custom Blocklists**: Define organization-specific blocked terms
## Configuration Fields
| Field | Type | Required | Default | Description |
| -------------------------------- | ------- | -------- | -------- | ------------------------------------------------------------ |
| `endpoint` | string | Yes | - | Azure Content Safety endpoint URL |
| `api_key` | string | Yes | - | Azure subscription key |
| `analyze_enabled` | boolean | No | true | Enable content analysis for Hate, Sexual, Violence, SelfHarm |
| `analyze_severity_threshold` | enum | No | "medium" | Severity level to trigger: `low`, `medium`, or `high` |
| `jailbreak_shield_enabled` | boolean | No | false | Enable jailbreak detection (input only) |
| `indirect_attack_shield_enabled` | boolean | No | false | Enable indirect prompt attack detection (input only) |
| `copyright_enabled` | boolean | No | false | Enable copyrighted content detection (output only) |
| `text_blocklist_enabled` | boolean | No | false | Enable custom blocklist filtering |
| `blocklist_names` | array | No | - | List of Azure blocklist names to apply |
## Collecting your API key and URL
Navigate to the Azure AI Foundry dashboard:
<Frame>
<img src="/media/guardrails/azure-api-key.png" alt="Azure foundry dashboard" />
</Frame>
- Copy the API key and paste it into the Azure Content Safety configuration form.
- Copy the project endpoint and use its base URL as the endpoint in the form, e.g. `https://xxx-resource.services.ai.azure.com`.
## Severity Threshold Levels
| Threshold | Numeric Value | Behavior |
| --------- | ------------- | ----------------------------------------- |
| `low` | 2 | Most strict - blocks severity 2 and above |
| `medium` | 4 | Balanced - blocks severity 4 and above |
| `high` | 6 | Least strict - blocks only severity 6 |
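
The threshold table reads as a simple comparison: content is blocked when its severity is at or above the numeric value of the configured threshold. A minimal sketch of that rule follows; `is_blocked` is a hypothetical function, and the default of `"medium"` mirrors the `analyze_severity_threshold` default in the configuration table.

```python
# Numeric values from the Severity Threshold Levels table.
THRESHOLDS = {"low": 2, "medium": 4, "high": 6}


def is_blocked(severity: int, threshold: str = "medium") -> bool:
    """Return True when `severity` meets or exceeds the named threshold."""
    if threshold not in THRESHOLDS:
        raise ValueError(f"unknown threshold {threshold!r}")
    return severity >= THRESHOLDS[threshold]


for level in ("low", "medium", "high"):
    print(level, is_blocked(4, level))
```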
## Detection Categories
- Hate and fairness
- Sexual content
- Violence
- Self-harm
<Note>
**Input-only features:** Jailbreak Shield and Indirect Attack Shield only apply to input validation. **Output-only
features:** Copyright detection only applies to output validation.
</Note>
## Provider Capabilities Comparison
| Capability | AWS Bedrock | Azure Content Safety | GraySwan | Patronus AI |
| ----------------------------- | ----------- | -------------------- | -------- | ----------- |
| PII Detection | Yes | No | No | Yes |
| Content Filtering | Yes | Yes | Yes | Yes |
| Prompt Injection | Yes | Yes | Yes | Yes |
| Hallucination Detection | No | No | No | Yes |
| Toxicity Screening | Yes | Yes | Yes | Yes |
| Custom Policies | Yes | Yes | Yes | Yes |
| Custom Natural Language Rules | No | No | Yes | No |
| Image Support | Yes | No | No | No |
| IPI Detection | No | Yes | Yes | No |
| Mutation Detection | No | No | Yes | No |
For information on configuring guardrail rules and profiles, see [Guardrails](/enterprise/guardrails).

View File

@@ -0,0 +1,70 @@
---
title: "GraySwan Cygnal"
description: "Integrate GraySwan Cygnal Monitor with Bifrost for AI safety monitoring with natural language rule definitions, violation scoring, and advanced threat detection."
icon: "shield-check"
---
Bifrost integrates with **GraySwan Cygnal Monitor** to provide AI safety monitoring with natural language rule definitions and advanced threat detection capabilities. This page covers the configuration and capabilities of the GraySwan Cygnal guardrail provider.
![GraySwan configuration form](/media/guardrails/gray-swan-config-on-bifrost.png)
## Capabilities
- **Violation Scoring**: Continuous 0-1 scale violation detection with configurable thresholds
- **Custom Natural Language Rules**: Define safety rules in plain English without code
- **Policy Management**: Use pre-built policies from GraySwan platform or create custom ones
- **Indirect Prompt Injection (IPI) Detection**: Identify hidden instructions in user inputs
- **Mutation Detection**: Detect attempts to manipulate or alter content
- **Reasoning Modes**: Choose from fast ("off"), balanced ("hybrid"), or thorough ("thinking") analysis
## Configuration Fields
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `api_key` | string | Yes | - | GraySwan API key |
| `violation_threshold` | number | No | 0.5 | Score threshold (0-1) for triggering intervention. Lower values are more strict. |
| `reasoning_mode` | enum | No | "off" | Analysis depth: `off` (fastest), `hybrid` (balanced), or `thinking` (most thorough) |
| `policy_id` | string | No | - | Single custom policy ID from GraySwan platform |
| `policy_ids` | array | No | - | Multiple policy IDs for aggregated rule evaluation |
| `rules` | object | No | - | Custom natural language rules as key-value pairs |
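
To make the effect of `violation_threshold` concrete, the sketch below compares a score against the threshold. It is an assumption-laden illustration: `should_intervene` is a hypothetical name, and treating a score equal to the threshold as a violation is a guess, since the table does not say whether the comparison is inclusive.

```python
def should_intervene(score: float, violation_threshold: float = 0.5) -> bool:
    """Return True when a 0-1 violation score reaches the threshold.

    Assumption: the comparison is inclusive (score >= threshold).
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if not 0.0 <= violation_threshold <= 1.0:
        raise ValueError("violation_threshold must be between 0 and 1")
    return score >= violation_threshold


# The same score passes the default threshold but trips a stricter one.
print(should_intervene(0.3))
print(should_intervene(0.3, violation_threshold=0.2))
```

Lowering the threshold makes more scores qualify, which is why lower values are described as more strict.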
## Custom Rules Example
![GraySwan custom rules](/media/guardrails/gray-swan-custom-rule.png)
Rules are defined as key-value pairs where the key is the rule name and the value is a natural language description:
```json
{
"rules": {
"no_profanity": "Do not allow profanity or vulgar language",
"no_pii": "Do not allow personally identifiable information",
"professional_tone": "Ensure all responses maintain a professional tone"
}
}
```
## Detection Features
- Real-time violation scoring
- Multi-rule evaluation
- IPI attack detection
- Content mutation monitoring
- Detailed violation descriptions with rule attribution
## Provider Capabilities Comparison
| Capability | AWS Bedrock | Azure Content Safety | GraySwan | Patronus AI |
|------------|-------------|---------------------|----------|-------------|
| PII Detection | Yes | No | No | Yes |
| Content Filtering | Yes | Yes | Yes | Yes |
| Prompt Injection | Yes | Yes | Yes | Yes |
| Hallucination Detection | No | No | No | Yes |
| Toxicity Screening | Yes | Yes | Yes | Yes |
| Custom Policies | Yes | Yes | Yes | Yes |
| Custom Natural Language Rules | No | No | Yes | No |
| Image Support | Yes | No | No | No |
| IPI Detection | No | Yes | Yes | No |
| Mutation Detection | No | No | Yes | No |
For information on configuring guardrail rules and profiles, see [Guardrails](/enterprise/guardrails).

View File

@@ -0,0 +1,40 @@
---
title: "Patronus AI"
description: "Integrate Patronus AI with Bifrost for LLM security and safety including hallucination detection, PII identification, toxicity screening, and custom evaluators."
icon: "brain"
---
Bifrost integrates with **Patronus AI** to provide specialized LLM security and safety with advanced evaluation capabilities. This page covers the configuration and capabilities of the Patronus AI guardrail provider.
## Capabilities
- **Hallucination Detection**: Identify factually incorrect responses
- **PII Detection**: Comprehensive personal data identification
- **Toxicity Screening**: Multi-language toxic content detection
- **Prompt Injection Defense**: Advanced attack pattern recognition
- **Custom Evaluators**: Build organization-specific safety checks
- **Real-Time Monitoring**: Continuous safety validation
## Advanced Features
- Context-aware evaluation
- Multi-turn conversation analysis
- Custom policy templates
- Integration with existing safety workflows
## Provider Capabilities Comparison
| Capability | AWS Bedrock | Azure Content Safety | GraySwan | Patronus AI |
|------------|-------------|---------------------|----------|-------------|
| PII Detection | Yes | No | No | Yes |
| Content Filtering | Yes | Yes | Yes | Yes |
| Prompt Injection | Yes | Yes | Yes | Yes |
| Hallucination Detection | No | No | No | Yes |
| Toxicity Screening | Yes | Yes | Yes | Yes |
| Custom Policies | Yes | Yes | Yes | Yes |
| Custom Natural Language Rules | No | No | Yes | No |
| Image Support | Yes | No | No | No |
| IPI Detection | No | Yes | Yes | No |
| Mutation Detection | No | No | Yes | No |
For information on configuring guardrail rules and profiles, see [Guardrails](/enterprise/guardrails).

View File

@@ -0,0 +1,724 @@
---
title: "Langchain SDK"
description: "Use Bifrost as a drop-in proxy for Langchain applications with zero code changes."
icon: "crow"
---
Langchain already provides multi-provider abstraction and chaining capabilities. Bifrost adds enterprise features such as governance, semantic caching, MCP tools, and observability on top of your existing setup.
**Endpoint:** `/langchain`
<Warning>
**Provider Compatibility:** This integration only works for AI providers that both Langchain and Bifrost support. If you're using a provider specific to Langchain that Bifrost doesn't support (or vice versa), those requests will fail.
</Warning>
---
## Setup
<Tabs group="langchain-sdk">
<Tab title="Python">
```python {7}
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage
# Configure client to use Bifrost
llm = ChatOpenAI(
model="gpt-4o-mini",
openai_api_base="http://localhost:8080/langchain", # Point to Bifrost
openai_api_key="dummy-key" # Keys managed by Bifrost
)
response = llm.invoke([HumanMessage(content="Hello!")])
print(response.content)
```
</Tab>
<Tab title="JavaScript">
```javascript {7}
import { ChatOpenAI } from "@langchain/openai";
// Configure client to use Bifrost
const llm = new ChatOpenAI({
model: "gpt-4o-mini",
configuration: {
baseURL: "http://localhost:8080/langchain", // Point to Bifrost
},
openAIApiKey: "dummy-key" // Keys managed by Bifrost
});
const response = await llm.invoke("Hello!");
console.log(response.content);
```
</Tab>
</Tabs>
---
## Provider/Model Usage Examples
Your existing Langchain provider switching works unchanged through Bifrost:
<Tabs group="langchain-sdk">
<Tab title="Python">
```python
from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_core.messages import HumanMessage
base_url = "http://localhost:8080/langchain"
# OpenAI models via Langchain
openai_llm = ChatOpenAI(
model="gpt-4o-mini",
openai_api_base=base_url
)
# Anthropic models via Langchain
anthropic_llm = ChatAnthropic(
model="claude-3-sonnet-20240229",
anthropic_api_url=base_url
)
# Google models via Langchain
google_llm = ChatGoogleGenerativeAI(
model="gemini-1.5-flash",
google_api_base=base_url
)
# All work the same way
openai_response = openai_llm.invoke([HumanMessage(content="Hello GPT!")])
anthropic_response = anthropic_llm.invoke([HumanMessage(content="Hello Claude!")])
google_response = google_llm.invoke([HumanMessage(content="Hello Gemini!")])
```
</Tab>
<Tab title="JavaScript">
```javascript
import { ChatOpenAI } from "@langchain/openai";
import { ChatAnthropic } from "@langchain/anthropic";
import { ChatGoogleGenerativeAI } from "@langchain/google-genai";
const baseURL = "http://localhost:8080/langchain";
// OpenAI models via Langchain
const openaiLlm = new ChatOpenAI({
model: "gpt-4o-mini",
configuration: { baseURL }
});
// Anthropic models via Langchain
const anthropicLlm = new ChatAnthropic({
model: "claude-3-sonnet-20240229",
clientOptions: { baseURL }
});
// Google models via Langchain
const googleLlm = new ChatGoogleGenerativeAI({
model: "gemini-1.5-flash",
baseURL
});
// All work the same way
const openaiResponse = await openaiLlm.invoke("Hello GPT!");
const anthropicResponse = await anthropicLlm.invoke("Hello Claude!");
const googleResponse = await googleLlm.invoke("Hello Gemini!");
```
</Tab>
</Tabs>
---
## Adding Custom Headers
Add Bifrost-specific headers for governance and tracking. Different LangChain provider classes support different methods for adding custom headers:
<Tabs group="langchain-sdk">
<Tab title="Python">
### ChatOpenAI
Use the `default_headers` parameter for OpenAI models:
```python
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage
llm = ChatOpenAI(
model="gpt-4o-mini",
openai_api_base="http://localhost:8080/langchain",
default_headers={
"x-bf-vk": "your-virtual-key",
}
)
response = llm.invoke([HumanMessage(content="Hello!")])
print(response.content)
```
### ChatAnthropic
Use the `default_headers` parameter for Anthropic models:
```python
from langchain_anthropic import ChatAnthropic
from langchain_core.messages import HumanMessage
llm = ChatAnthropic(
model="claude-3-sonnet-20240229",
anthropic_api_url="http://localhost:8080/langchain",
default_headers={
"x-bf-vk": "your-virtual-key", # Virtual key for governance
}
)
response = llm.invoke([HumanMessage(content="Hello!")])
print(response.content)
```
### ChatGoogleGenerativeAI
Use the `additional_headers` parameter for Google/Gemini models:
```python
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_core.messages import HumanMessage
llm = ChatGoogleGenerativeAI(
model="gemini-2.5-flash",
google_api_base="http://localhost:8080/langchain",
additional_headers={
"x-bf-vk": "your-virtual-key", # Virtual key for governance
}
)
response = llm.invoke([HumanMessage(content="Hello!")])
print(response.content)
```
### ChatBedrockConverse
For Bedrock models, there are two approaches:
**Method 1: Using the client's event system (after initialization)**
```python
from langchain_aws import ChatBedrockConverse
from langchain_core.messages import HumanMessage
llm = ChatBedrockConverse(
model="us.anthropic.claude-haiku-4-5-20251001-v1:0",
region_name="us-west-2",
endpoint_url="http://localhost:8080/langchain",
aws_access_key_id="dummy-access-key",
aws_secret_access_key="dummy-secret-key",
max_tokens=2000
)
def add_bifrost_headers(request, **kwargs):
    """Add custom headers to Bedrock requests"""
    request.headers.add_header("x-bf-vk", "your-virtual-key")
# Register header injection for all Bedrock operations
llm.client.meta.events.register_first(
"before-sign.bedrock-runtime.*",
add_bifrost_headers
)
response = llm.invoke([HumanMessage(content="Hello!")])
print(response.content)
```
**Method 2: Pre-configuring a boto3 client**
```python
from langchain_aws import ChatBedrockConverse
from langchain_core.messages import HumanMessage
import boto3
# Create and configure boto3 client
bedrock_client = boto3.client(
service_name="bedrock-runtime",
region_name="us-west-2",
endpoint_url="http://localhost:8080/langchain",
aws_access_key_id="dummy-access-key",
aws_secret_access_key="dummy-secret-key"
)
def add_bifrost_headers(request, **kwargs):
    """Add custom headers to Bedrock requests"""
    request.headers.add_header("x-bf-vk", "your-virtual-key")
# Register header injection before creating LLM
bedrock_client.meta.events.register_first(
"before-sign.bedrock-runtime.*",
add_bifrost_headers
)
# Pass the configured client to ChatBedrockConverse
llm = ChatBedrockConverse(
model="us.anthropic.claude-haiku-4-5-20251001-v1:0",
client=bedrock_client,
max_tokens=2000
)
response = llm.invoke([HumanMessage(content="Hello!")])
print(response.content)
```
</Tab>
<Tab title="JavaScript">
### ChatOpenAI
Use `defaultHeaders` in configuration for OpenAI models:
```javascript
import { ChatOpenAI } from "@langchain/openai";
const llm = new ChatOpenAI({
model: "gpt-4o-mini",
configuration: {
baseURL: "http://localhost:8080/langchain",
defaultHeaders: {
"x-bf-vk": "your-virtual-key", // Virtual key for governance
}
}
});
const response = await llm.invoke("Hello!");
console.log(response.content);
```
### ChatAnthropic
Use `defaultHeaders` in clientOptions for Anthropic models:
```javascript
import { ChatAnthropic } from "@langchain/anthropic";
const llm = new ChatAnthropic({
model: "claude-3-sonnet-20240229",
clientOptions: {
baseURL: "http://localhost:8080/langchain",
defaultHeaders: {
"x-bf-vk": "your-virtual-key", // Virtual key for governance
}
}
});
const response = await llm.invoke("Hello!");
console.log(response.content);
```
### ChatGoogleGenerativeAI
Use `additionalHeaders` for Google/Gemini models:
```javascript
import { ChatGoogleGenerativeAI } from "@langchain/google-genai";
const llm = new ChatGoogleGenerativeAI({
model: "gemini-2.5-flash",
baseURL: "http://localhost:8080/langchain",
additionalHeaders: {
"x-bf-vk": "your-virtual-key", // Virtual key for governance
}
});
const response = await llm.invoke("Hello!");
console.log(response.content);
```
</Tab>
</Tabs>
---
## Using Direct Keys
Pass API keys directly to bypass Bifrost's key management. You can pass any provider's API key since Bifrost only looks for `Authorization` or `x-api-key` headers. This requires the **Allow Direct API keys** option to be enabled in Bifrost configuration.
> **Learn more:** See [Key Management](../features/keys-management#direct-key-bypass) for enabling direct API key usage.
<Tabs group="langchain-sdk">
<Tab title="Python">
```python
from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic
from langchain_core.messages import HumanMessage
# Using OpenAI key directly
openai_llm = ChatOpenAI(
model="gpt-4o-mini",
openai_api_base="http://localhost:8080/langchain",
default_headers={
"Authorization": "Bearer sk-your-openai-key"
}
)
# Using Anthropic key for Claude models
anthropic_llm = ChatAnthropic(
model="claude-3-sonnet-20240229",
anthropic_api_url="http://localhost:8080/langchain",
default_headers={
"x-api-key": "sk-ant-your-anthropic-key"
}
)
# Using Azure with direct Azure key
from langchain_openai import AzureChatOpenAI
azure_llm = AzureChatOpenAI(
deployment_name="gpt-4o-aug",
api_key="your-azure-api-key",
azure_endpoint="http://localhost:8080/langchain",
api_version="2024-05-01-preview",
max_tokens=100,
default_headers={
"x-bf-azure-endpoint": "https://your-resource.openai.azure.com",
}
)
openai_response = openai_llm.invoke([HumanMessage(content="Hello GPT!")])
anthropic_response = anthropic_llm.invoke([HumanMessage(content="Hello Claude!")])
azure_response = azure_llm.invoke([HumanMessage(content="Hello from Azure!")])
```
</Tab>
<Tab title="JavaScript">
```javascript
import { ChatOpenAI } from "@langchain/openai";
import { ChatAnthropic } from "@langchain/anthropic";
// Using OpenAI key directly
const openaiLlm = new ChatOpenAI({
model: "gpt-4o-mini",
configuration: {
baseURL: "http://localhost:8080/langchain",
defaultHeaders: {
"Authorization": "Bearer sk-your-openai-key"
}
}
});
// Using Anthropic key for Claude models
const anthropicLlm = new ChatAnthropic({
model: "claude-3-sonnet-20240229",
clientOptions: {
baseURL: "http://localhost:8080/langchain",
defaultHeaders: {
"x-api-key": "sk-ant-your-anthropic-key"
}
}
});
// Using Azure with direct Azure key
import { AzureChatOpenAI } from "@langchain/openai";
const azureLlm = new AzureChatOpenAI({
deploymentName: "gpt-4o-aug",
apiKey: "your-azure-api-key",
azureOpenAIEndpoint: "http://localhost:8080/langchain",
apiVersion: "2024-05-01-preview",
maxTokens: 100,
configuration: {
defaultHeaders: {
"x-bf-azure-endpoint": "https://your-resource.openai.azure.com",
}
}
});
const openaiResponse = await openaiLlm.invoke("Hello GPT!");
const anthropicResponse = await anthropicLlm.invoke("Hello Claude!");
const azureResponse = await azureLlm.invoke("Hello from Azure!");
```
</Tab>
</Tabs>
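
The header choice in the examples above can be captured in a small helper. This is only a sketch: `direct_key_headers` is a hypothetical name (not part of Bifrost or LangChain), and it assumes the header conventions used on this page, namely `Authorization: Bearer ...` for OpenAI keys and `x-api-key` for Anthropic keys. The Azure branch mirrors the `ChatOpenAI` usage shown under Reasoning/Thinking Models, where the key travels in `Authorization` and the resource URL in `x-bf-azure-endpoint`. Pass the result as `default_headers` (Python) when constructing the LLM.

```python
from typing import Dict, Optional


def direct_key_headers(provider: str, api_key: str,
                       azure_endpoint: Optional[str] = None) -> Dict[str, str]:
    """Return the headers that carry a direct provider key (hypothetical helper)."""
    if provider == "anthropic":
        # Anthropic keys travel in x-api-key
        return {"x-api-key": api_key}
    if provider in ("openai", "azure"):
        headers = {"Authorization": f"Bearer {api_key}"}
        if provider == "azure":
            if not azure_endpoint:
                raise ValueError("azure requires azure_endpoint")
            # Tells Bifrost which Azure resource to call
            headers["x-bf-azure-endpoint"] = azure_endpoint
        return headers
    raise ValueError(f"no header convention known for {provider!r}")


print(direct_key_headers("openai", "sk-your-openai-key"))
print(direct_key_headers("anthropic", "sk-ant-your-anthropic-key"))
```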
---
## Reasoning/Thinking Models
Control extended reasoning capabilities for models that support thinking/reasoning modes.
### Azure OpenAI Models
For Azure OpenAI reasoning models, use `ChatOpenAI` with the `reasoning` parameter and Azure-specific headers:
<Tabs group="langchain-sdk">
<Tab title="Python">
```python
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage
# Azure OpenAI with reasoning control
llm = ChatOpenAI(
model="azure/gpt-5.1", # Azure deployment name
base_url="http://localhost:8080/langchain",
api_key="dummy-key",
reasoning={
"effort": "high", # "minimal" | "low" | "medium" | "high"
"summary": "detailed" # "auto" | "concise" | "detailed"
},
default_headers={
"authorization": "Bearer your-azure-api-key",
"x-bf-azure-endpoint": "https://your-resource.openai.azure.com"
}
)
response = llm.invoke([HumanMessage(content="Solve this complex problem...")])
```
</Tab>
<Tab title="JavaScript">
```javascript
import { ChatOpenAI } from "@langchain/openai";
// Azure OpenAI with reasoning control
const llm = new ChatOpenAI({
model: "azure/gpt-5.1", // Azure deployment name
configuration: {
baseURL: "http://localhost:8080/langchain",
defaultHeaders: {
"authorization": "Bearer your-azure-api-key",
"x-bf-azure-endpoint": "https://your-resource.openai.azure.com"
}
},
openAIApiKey: "dummy-key",
reasoning: {
effort: "high",
summary: "detailed"
}
});
const response = await llm.invoke("Solve this complex problem...");
```
</Tab>
</Tabs>
### OpenAI Models
For OpenAI reasoning models, use `ChatOpenAI` with the `reasoning` parameter:
<Tabs group="langchain-sdk">
<Tab title="Python">
```python
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage
# OpenAI with reasoning control
llm = ChatOpenAI(
model="gpt-5",
base_url="http://localhost:8080/langchain",
api_key="dummy-key",
max_tokens=2000,
reasoning={
"effort": "high",
"summary": "detailed"
}
)
response = llm.invoke([HumanMessage(content="Solve this complex problem...")])
```
</Tab>
<Tab title="JavaScript">
```javascript
import { ChatOpenAI } from "@langchain/openai";
const llm = new ChatOpenAI({
model: "gpt-5",
configuration: {
baseURL: "http://localhost:8080/langchain"
},
openAIApiKey: "dummy-key",
reasoning: {
effort: "high",
summary: "detailed"
}
});
const response = await llm.invoke("Solve this complex problem...");
```
</Tab>
</Tabs>
### Bedrock Models (Anthropic & Nova)
Both Anthropic Claude and Amazon Nova models support reasoning/thinking capabilities via Bedrock. Use `ChatBedrockConverse` with model-specific configuration formats.
<Tabs group="langchain-sdk">
<Tab title="Python">
#### Anthropic Claude Models
```python
from langchain_aws import ChatBedrockConverse
from langchain_core.messages import HumanMessage
# Bedrock Claude with reasoning control
llm = ChatBedrockConverse(
model="us.anthropic.claude-opus-4-5-20251101-v1:0",
region_name="dummy-region",
endpoint_url="http://localhost:8080/langchain",
aws_access_key_id="dummy-access-key",
aws_secret_access_key="dummy-secret-key",
max_tokens=2000,
additional_model_request_fields={ # Anthropic format
"reasoning_config": {
"type": "enabled",
"budget_tokens": 1500, # Control thinking token budget
}
}
)
response = llm.invoke([HumanMessage(content="Reason through this problem...")])
```
#### Amazon Nova Models
```python
from langchain_aws import ChatBedrockConverse
from langchain_core.messages import HumanMessage
# Bedrock Nova with reasoning control
llm = ChatBedrockConverse(
model="global.amazon.nova-2-lite-v1:0",
region_name="dummy-region",
endpoint_url="http://localhost:8080/langchain",
aws_access_key_id="dummy-access-key",
aws_secret_access_key="dummy-secret-key",
max_tokens=2000,
additional_model_request_fields={ # Nova format
"reasoningConfig": {
"type": "enabled",
"maxReasoningEffort": "high", # "low" | "medium" | "high"
}
}
)
response = llm.invoke([HumanMessage(content="Reason through this problem...")])
```
</Tab>
</Tabs>
<Note>
**Model-Specific Configuration:**
- **Anthropic Claude models** use `reasoning_config` (snake_case) with `budget_tokens` to control the token budget for reasoning
- **Amazon Nova models** use `reasoningConfig` (camelCase) with `maxReasoningEffort` to control reasoning intensity ("low", "medium", "high")
</Note>
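
Because the two model families expect differently shaped fields, a small helper can pick the right one from the model ID. This is a minimal sketch: `bedrock_reasoning_fields` is a hypothetical name, and matching on the substrings `anthropic.` and `amazon.nova` is an assumption based on the model IDs used in the examples above, not something Bifrost or LangChain provides.

```python
from typing import Any, Dict


def bedrock_reasoning_fields(model_id: str, *, budget_tokens: int = 1500,
                             effort: str = "high") -> Dict[str, Any]:
    """Build additional_model_request_fields for a Bedrock model ID (hypothetical helper)."""
    if "anthropic." in model_id:
        # Claude: snake_case key with an explicit thinking-token budget
        return {"reasoning_config": {"type": "enabled", "budget_tokens": budget_tokens}}
    if "amazon.nova" in model_id:
        if effort not in ("low", "medium", "high"):
            raise ValueError(f"unsupported effort: {effort!r}")
        # Nova: camelCase key with a coarse effort level
        return {"reasoningConfig": {"type": "enabled", "maxReasoningEffort": effort}}
    raise ValueError(f"no known reasoning format for {model_id!r}")


print(bedrock_reasoning_fields("us.anthropic.claude-opus-4-5-20251101-v1:0"))
print(bedrock_reasoning_fields("global.amazon.nova-2-lite-v1:0", effort="medium"))
```

The returned dictionary can be passed as `additional_model_request_fields` when constructing `ChatBedrockConverse`.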
### Google/Vertex AI Models
For Google Gemini 2.5 models (Pro, Flash) and Gemini 3, use `ChatGoogleGenerativeAI` with the `thinking_budget` parameter:
<Tabs group="langchain-sdk">
<Tab title="Python">
```python
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_core.messages import HumanMessage
# Gemini with thinking budget control
llm = ChatGoogleGenerativeAI(
model="gemini/gemini-2.5-flash", # or "vertex/gemini-2.5-flash"
base_url="http://localhost:8080/langchain",
api_key="dummy-key",
max_tokens=4000,
thinking_budget=1024, # 0=disable, -1=dynamic, >0=constrained token budget
include_thoughts=True, # Include reasoning in response
)
response = llm.invoke([HumanMessage(content="Reason through this problem...")])
```
</Tab>
</Tabs>
<Warning>
**Experimental Module:** `ChatGoogleGenerativeAI` is a recently released module that deprecates `ChatVertexAI`. It may have some issues or breaking changes. If you encounter problems, you can use `ChatAnthropic` with `model="gemini/..."` or `model="vertex/..."` as an alternative, which provides stable access to Gemini and Vertex AI models through Bifrost.
</Warning>
---
## Embeddings
LangChain's `OpenAIEmbeddings` class can be used to generate embeddings through Bifrost:
```python
from langchain_openai import OpenAIEmbeddings
# Create embeddings instance
embeddings = OpenAIEmbeddings(
model="text-embedding-3-small",
base_url="http://localhost:8080/langchain",
api_key="dummy-key"
)
# Embed a single query
query_embedding = embeddings.embed_query("What is machine learning?")
# Embed multiple documents
doc_embeddings = embeddings.embed_documents([
"Machine learning is a subset of AI",
"Deep learning uses neural networks",
"NLP helps computers understand text"
])
```
<Warning>
**Provider Compatibility Limitation:** LangChain's `OpenAIEmbeddings` class tokenizes text and sends it to the API as an array of integer token IDs. OpenAI's API accepts both text strings and token arrays as input, but other providers such as Cohere, Bedrock, and Gemini accept only text strings.
**This means `OpenAIEmbeddings` only works reliably with OpenAI embedding models.** Using it with other providers (e.g., `model="cohere/embed-v4.0"`) will fail because those providers cannot process token-array inputs.
</Warning>
### Cross-Provider Embeddings
For embedding models from other providers (Cohere, Bedrock, Gemini, etc.), you can use `GoogleGenerativeAIEmbeddings` from the `langchain_google_genai` package. This module sends text strings directly and works across multiple providers:
```python
from langchain_google_genai import GoogleGenerativeAIEmbeddings
# Works with any provider's embedding models
embeddings = GoogleGenerativeAIEmbeddings(
model="cohere/cohere-embed-v4.0", # or bedrock/..., gemini/..., etc.
base_url="http://localhost:8080/langchain",
api_key="dummy-key"
)
query_embedding = embeddings.embed_query("What is machine learning?")
doc_embeddings = embeddings.embed_documents([
"Machine learning is a subset of AI",
"Deep learning uses neural networks"
])
```
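
The choice between the two embedding classes can be expressed as a simple routing rule. The sketch below is illustrative only: `embeddings_class_for` is a hypothetical helper, and it assumes the `provider/model` naming seen in the examples on this page, with an unprefixed model name (or an `openai/` prefix) meaning an OpenAI model. It returns the class name as a string so that it runs without LangChain installed.

```python
def embeddings_class_for(model: str) -> str:
    """Name the LangChain embeddings class to use for a model string (hypothetical helper)."""
    provider, sep, _ = model.partition("/")
    if not sep or provider == "openai":
        # OpenAI accepts the token arrays that OpenAIEmbeddings sends
        return "OpenAIEmbeddings"
    # Other providers need plain text input
    return "GoogleGenerativeAIEmbeddings"


for name in ("text-embedding-3-small", "cohere/embed-v4.0"):
    print(name, "->", embeddings_class_for(name))
```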
---
## Supported Features
The LangChain integration supports all features that are available in both the LangChain SDK and Bifrost core functionality. Your existing LangChain chains and workflows work seamlessly with Bifrost's enterprise features. 😄
---
## Next Steps
- **[Governance Features](../features/governance)** - Virtual keys and team management
- **[Semantic Caching](../features/semantic-caching)** - Intelligent response caching
- **[Configuration](../quickstart/README)** - Provider setup and API key management


@@ -0,0 +1,180 @@
---
title: "LiteLLM SDK"
description: "Use Bifrost as a drop-in proxy for LiteLLM applications with zero code changes."
icon: "train"
---
Since LiteLLM already provides multi-provider abstraction, Bifrost adds enterprise features such as governance, semantic caching, MCP tools, and observability on top of your existing setup.
**Endpoint:** `/litellm`
<Warning>
**Provider Compatibility:** This integration only works for AI providers that both LiteLLM and Bifrost support. If you're using a provider specific to LiteLLM that Bifrost doesn't support (or vice versa), those requests will fail.
</Warning>
---
## Setup
<Tabs group="litellm-sdk">
<Tab title="Python">
```python {7}
from litellm import completion
# Configure client to use Bifrost
response = completion(
model="gpt-4o-mini",
messages=[{"role": "user", "content": "Hello!"}],
base_url="http://localhost:8080/litellm" # Point to Bifrost
)
print(response.choices[0].message.content)
```
</Tab>
</Tabs>
---
## Provider/Model Usage Examples
Your existing LiteLLM provider switching works unchanged through Bifrost:
<Tabs group="litellm-sdk">
<Tab title="Python">
```python {4}
from litellm import completion
# All your existing LiteLLM patterns work the same
base_url = "http://localhost:8080/litellm"
# OpenAI models
openai_response = completion(
model="gpt-4o-mini",
messages=[{"role": "user", "content": "Hello GPT!"}],
base_url=base_url
)
# Anthropic models
anthropic_response = completion(
model="claude-3-sonnet-20240229",
messages=[{"role": "user", "content": "Hello Claude!"}],
base_url=base_url
)
# Google models
google_response = completion(
model="gemini/gemini-1.5-flash",
messages=[{"role": "user", "content": "Hello Gemini!"}],
base_url=base_url
)
# Azure models
azure_response = completion(
model="azure/gpt-4o",
messages=[{"role": "user", "content": "Hello Azure!"}],
base_url=base_url
)
```
</Tab>
</Tabs>
---
## Adding Custom Headers
Add Bifrost-specific headers for governance and tracking:
<Tabs group="litellm-sdk">
<Tab title="Python">
```python
from litellm import completion
# Add custom headers for Bifrost features
response = completion(
model="gpt-4o-mini",
messages=[{"role": "user", "content": "Hello!"}],
base_url="http://localhost:8080/litellm",
extra_headers={
"x-bf-vk": "your-virtual-key", # Virtual key for governance
}
)
print(response.choices[0].message.content)
```
</Tab>
</Tabs>
---
## Using Direct Keys
Pass API keys directly to bypass Bifrost's key management. You can pass any provider's API key since Bifrost only looks for `Authorization` or `x-api-key` headers. This requires the **Allow Direct API keys** option to be enabled in Bifrost configuration.
> **Learn more:** See [Key Management](../features/keys-management#direct-key-bypass) for enabling direct API key usage.
<Tabs group="litellm-sdk">
<Tab title="Python">
```python
from litellm import completion
# Using OpenAI key directly
openai_response = completion(
model="gpt-4o-mini",
messages=[{"role": "user", "content": "Hello GPT!"}],
base_url="http://localhost:8080/litellm",
extra_headers={
"Authorization": "Bearer sk-your-openai-key"
}
)
# Using Anthropic key for Claude models
anthropic_response = completion(
model="claude-3-sonnet-20240229",
messages=[{"role": "user", "content": "Hello Claude!"}],
base_url="http://localhost:8080/litellm",
extra_headers={
"x-api-key": "sk-ant-your-anthropic-key"
}
)
# Using Azure with direct Azure key
import os
deployment = os.getenv("AZURE_OPENAI_DEPLOYMENT", "my-azure-deployment")
model = f"azure/{deployment}"
azure_response = completion(
model=model,
messages=[{"role": "user", "content": "Hello from LiteLLM (Azure demo)!"}],
base_url="http://localhost:8080/litellm",
api_key=os.getenv("AZURE_API_KEY", "your-azure-api-key"),
deployment_id=deployment,  # same deployment name used to build `model` above
max_tokens=100,
extra_headers={
"x-bf-azure-endpoint": "https://your-resource.openai.azure.com",
}
)
```
</Tab>
</Tabs>
---
## Supported Features
The LiteLLM integration supports all features that are available in both the LiteLLM SDK and Bifrost core functionality. Your existing LiteLLM code works seamlessly with Bifrost's enterprise features. 😄
---
## Next Steps
- **[Governance Features](../features/governance)** - Virtual keys and team management
- **[Semantic Caching](../features/semantic-caching)** - Intelligent response caching
- **[Configuration](../quickstart/README)** - Provider setup and API key management


@@ -0,0 +1,669 @@
---
title: "Files and Batch API"
description: "Upload files and create batch jobs for asynchronous processing using the OpenAI SDK through Bifrost across multiple providers."
tag: "Beta"
icon: "folder-open"
---
## Overview
Bifrost supports the OpenAI Files API and Batch API with **cross-provider routing**. This means you can use the familiar OpenAI SDK to manage files and batch jobs across multiple providers including OpenAI, Anthropic, Bedrock, and Gemini.
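The provider is specified per request: use `extra_body` for requests that send a body (POST, such as file upload, batch creation, and batch cancellation) and `extra_query` for requests that do not (GET and DELETE, such as list, retrieve, download, and delete).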
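
The body-versus-query rule can be wrapped in a small helper so that call sites do not repeat it. This is a sketch under stated assumptions: `provider_kwargs` is a hypothetical name, and it relies only on the convention used in the examples on this page (POST calls carry the provider in `extra_body`; GET and DELETE calls carry it in `extra_query`). A call would then look like `client.files.list(**provider_kwargs("GET", "openai"))`.

```python
from typing import Any, Dict


def provider_kwargs(http_method: str, provider: str, **extra: Any) -> Dict[str, Dict[str, Any]]:
    """Return the keyword argument that carries the provider (hypothetical helper)."""
    payload = {"provider": provider, **extra}
    if http_method.upper() == "POST":
        # Requests with a body: upload file, create batch, cancel batch
        return {"extra_body": payload}
    # Requests without a body: list, retrieve, download, delete
    return {"extra_query": payload}


print(provider_kwargs("POST", "gemini", model="gemini-1.5-flash"))
print(provider_kwargs("GET", "openai"))
```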
---
## Client Setup
The base client setup is the same for all providers. The provider is specified per-request:
```python
from openai import OpenAI
client = OpenAI(
base_url="http://localhost:8080/openai",
api_key="your-api-key" # Your actual API key
)
```
---
## Files API
### Upload a File
<Note>
**Bedrock** requires S3 storage configuration. OpenAI and Gemini use their native file storage. Anthropic uses inline requests (no file upload).
</Note>
<Tabs group="provider">
<Tab title="OpenAI Provider">
```python
from openai import OpenAI
client = OpenAI(
base_url="http://localhost:8080/openai",
api_key="your-openai-api-key"
)
# Create JSONL content for OpenAI batch format
jsonl_content = '''{"custom_id": "request-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 100}}
{"custom_id": "request-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "How are you?"}], "max_tokens": 100}}'''
# Upload file (uses OpenAI's native file storage)
response = client.files.create(
file=("batch_input.jsonl", jsonl_content.encode(), "application/jsonl"),
purpose="batch",
extra_body={"provider": "openai"},
)
print(f"Uploaded file ID: {response.id}")
```
</Tab>
<Tab title="Bedrock Provider">
For Bedrock, you need to provide S3 storage configuration:
```python
from openai import OpenAI
client = OpenAI(
base_url="http://localhost:8080/openai",
api_key="your-api-key"
)
# Create JSONL content using OpenAI-style format (Bifrost converts to Bedrock format internally)
jsonl_content = '''{"custom_id": "request-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "anthropic.claude-3-sonnet-20240229-v1:0", "messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 100}}
{"custom_id": "request-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "anthropic.claude-3-sonnet-20240229-v1:0", "messages": [{"role": "user", "content": "How are you?"}], "max_tokens": 100}}'''
# Upload file with S3 storage configuration
response = client.files.create(
file=("batch_input.jsonl", jsonl_content.encode(), "application/jsonl"),
purpose="batch",
extra_body={
"provider": "bedrock",
"storage_config": {
"s3": {
"bucket": "your-s3-bucket",
"region": "us-west-2",
"prefix": "bifrost-batch-output",
},
},
},
)
print(f"Uploaded file ID: {response.id}")
```
</Tab>
<Tab title="Anthropic Provider">
Anthropic uses inline requests for batching (no file upload needed). See the Batch API section below.
</Tab>
<Tab title="Gemini Provider">
```python
from openai import OpenAI
client = OpenAI(
base_url="http://localhost:8080/openai",
api_key="your-api-key"
)
# Create JSONL content using OpenAI-style format (Bifrost converts to Gemini format internally)
jsonl_content = '''{"custom_id": "request-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gemini-1.5-flash", "messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 100}}
{"custom_id": "request-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gemini-1.5-flash", "messages": [{"role": "user", "content": "How are you?"}], "max_tokens": 100}}'''
# Upload file (uses Gemini's native file storage)
response = client.files.create(
file=("batch_input.jsonl", jsonl_content.encode(), "application/jsonl"),
purpose="batch",
extra_body={"provider": "gemini"},
)
print(f"Uploaded file ID: {response.id}")
```
</Tab>
</Tabs>
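
Hand-written JSONL strings are easy to break with a stray quote or newline. As an alternative, the input file can be generated with `json.dumps`. The sketch below is a minimal example: `build_batch_jsonl` is a hypothetical helper, and the `request-N` IDs simply follow the naming used in the examples above.

```python
import json
from typing import Iterable, List


def build_batch_jsonl(model: str, prompts: Iterable[str], *, max_tokens: int = 100,
                      url: str = "/v1/chat/completions") -> bytes:
    """Serialize prompts into OpenAI-style batch JSONL (hypothetical helper)."""
    lines: List[str] = []
    for i, prompt in enumerate(prompts, start=1):
        lines.append(json.dumps({
            "custom_id": f"request-{i}",  # must be unique within the file
            "method": "POST",
            "url": url,
            "body": {
                "model": model,
                "messages": [{"role": "user", "content": prompt}],
                "max_tokens": max_tokens,
            },
        }))
    # One JSON object per line, UTF-8 encoded for upload
    return "\n".join(lines).encode("utf-8")


payload = build_batch_jsonl("gpt-4o-mini", ["Hello!", "How are you?"])
print(payload.decode("utf-8"))
```

The returned bytes can be passed as the second element of the `file=` tuple in `client.files.create(...)`.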
### List Files
```python
# List files for OpenAI or Gemini (no S3 config needed)
response = client.files.list(
extra_query={"provider": "openai"} # or "gemini"
)
for file in response.data:
    print(f"File ID: {file.id}, Name: {file.filename}")
# For Bedrock (requires S3 config)
response = client.files.list(
extra_query={
"provider": "bedrock",
"storage_config": {
"s3": {
"bucket": "your-s3-bucket",
"region": "us-west-2",
"prefix": "bifrost-batch-output",
},
},
}
)
```
### Retrieve File Metadata
```python
# Retrieve file metadata (specify provider)
file_id = "file-abc123"
response = client.files.retrieve(
file_id,
extra_query={"provider": "bedrock"} # or "openai", "gemini"
)
print(f"File ID: {response.id}")
print(f"Filename: {response.filename}")
print(f"Purpose: {response.purpose}")
print(f"Bytes: {response.bytes}")
```
### Delete a File
```python
# Delete file (specify provider)
file_id = "file-abc123"
response = client.files.delete(
file_id,
extra_query={"provider": "bedrock"} # or "openai", "gemini"
)
print(f"Deleted: {response.deleted}")
```
### Download File Content
```python
# Download file content (specify provider)
file_id = "file-abc123"
response = client.files.content(
file_id,
extra_query={"provider": "bedrock"} # or "openai", "gemini"
)
# Handle different response types
if hasattr(response, "read"):
    content = response.read()
elif hasattr(response, "content"):
    content = response.content
else:
    content = response
# Decode bytes to string if needed
if isinstance(content, bytes):
    content = content.decode("utf-8")
print(f"File content:\n{content}")
```
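
When the downloaded file is a batch output file, each line is a JSON record that can be indexed by `custom_id`. The following sketch assumes the OpenAI batch output layout (`custom_id`, `response`, `error` per line); whether other providers return exactly this shape through Bifrost is not confirmed here. `index_batch_output` is a hypothetical helper and the sample records are made up for illustration.

```python
import json
from typing import Any, Dict


def index_batch_output(content: str) -> Dict[str, Dict[str, Any]]:
    """Map custom_id -> output record for a JSONL batch output (hypothetical helper)."""
    results: Dict[str, Dict[str, Any]] = {}
    for line in content.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        results[record["custom_id"]] = record
    return results


# Made-up sample records in the assumed OpenAI output layout
sample = "\n".join([
    json.dumps({"custom_id": "request-1",
                "response": {"status_code": 200, "body": {"choices": []}},
                "error": None}),
    json.dumps({"custom_id": "request-2",
                "response": None,
                "error": {"message": "rate limited"}}),
])
by_id = index_batch_output(sample)
for custom_id, record in by_id.items():
    print(custom_id, "failed" if record["error"] else "ok")
```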
---
## Batch API
### Create a Batch
<Tabs group="provider">
<Tab title="OpenAI Provider">
For native OpenAI batching:
```python
from openai import OpenAI
client = OpenAI(
base_url="http://localhost:8080/openai",
api_key="your-openai-api-key"
)
# First upload a file (see Files API section)
# Then create batch using the file ID
batch = client.batches.create(
input_file_id="file-abc123",
endpoint="/v1/chat/completions",
completion_window="24h",
extra_body={"provider": "openai"},
)
print(f"Batch ID: {batch.id}")
print(f"Status: {batch.status}")
```
</Tab>
<Tab title="Bedrock Provider">
For Bedrock, you need to provide the model and an output S3 URI:
```python
from openai import OpenAI
client = OpenAI(
base_url="http://localhost:8080/openai",
api_key="your-api-key"
)
# First upload a file with S3 config (see Files API section)
# Then create batch using the file ID
batch = client.batches.create(
input_file_id="file-abc123",
endpoint="/v1/chat/completions",
completion_window="24h",
extra_body={
"provider": "bedrock",
"model": "anthropic.claude-3-sonnet-20240229-v1:0",
"output_s3_uri": "s3://your-bucket/batch-output",
},
)
print(f"Batch ID: {batch.id}")
print(f"Status: {batch.status}")
```
</Tab>
<Tab title="Anthropic Provider">
Anthropic supports inline requests (no file upload required):
```python
from openai import OpenAI
client = OpenAI(
base_url="http://localhost:8080/openai",
api_key="your-anthropic-api-key"
)
# Create inline requests for Anthropic
requests = [
{
"custom_id": "request-1",
"params": {
"model": "claude-3-sonnet-20240229",
"max_tokens": 100,
"messages": [{"role": "user", "content": "Hello!"}]
}
},
{
"custom_id": "request-2",
"params": {
"model": "claude-3-sonnet-20240229",
"max_tokens": 100,
"messages": [{"role": "user", "content": "How are you?"}]
}
}
]
# Create batch with inline requests (no file ID needed)
batch = client.batches.create(
input_file_id="", # Empty for inline requests
endpoint="/v1/chat/completions",
completion_window="24h",
extra_body={
"provider": "anthropic",
"requests": requests,
},
)
print(f"Batch ID: {batch.id}")
print(f"Status: {batch.status}")
```
</Tab>
<Tab title="Gemini Provider">
```python
from openai import OpenAI
client = OpenAI(
base_url="http://localhost:8080/openai",
api_key="your-api-key"
)
# First upload a file with the provider set to "gemini" (see Files API section)
# Then create batch using the file ID
batch = client.batches.create(
input_file_id="file-abc123",
endpoint="/v1/chat/completions",
completion_window="24h",
extra_body={
"provider": "gemini",
"model": "gemini-1.5-flash",
},
)
print(f"Batch ID: {batch.id}")
print(f"Status: {batch.status}")
```
</Tab>
</Tabs>
### List Batches
```python
# List batches (specify provider)
response = client.batches.list(
limit=10,
extra_query={
"provider": "bedrock", # or "openai", "anthropic", "gemini"
"model": "anthropic.claude-3-sonnet-20240229-v1:0", # Required for bedrock
}
)
for batch in response.data:
    print(f"Batch ID: {batch.id}, Status: {batch.status}")
```
### Retrieve Batch Status
```python
# Retrieve batch status (specify provider)
batch_id = "batch-abc123"
batch = client.batches.retrieve(
batch_id,
extra_query={"provider": "bedrock"} # or "openai", "anthropic", "gemini"
)
print(f"Batch ID: {batch.id}")
print(f"Status: {batch.status}")
if batch.request_counts:
    print(f"Total: {batch.request_counts.total}")
    print(f"Completed: {batch.request_counts.completed}")
    print(f"Failed: {batch.request_counts.failed}")
```
### Cancel a Batch
```python
# Cancel batch (specify provider)
batch_id = "batch-abc123"
batch = client.batches.cancel(
batch_id,
extra_body={"provider": "bedrock"} # or "openai", "anthropic", "gemini"
)
print(f"Batch ID: {batch.id}")
print(f"Status: {batch.status}") # "cancelling" or "cancelled"
```
---
## End-to-End Workflows
### OpenAI Batch Workflow
```python
import time
from openai import OpenAI
client = OpenAI(
base_url="http://localhost:8080/openai",
api_key="your-openai-api-key"
)
# Configuration
provider = "openai"
# Step 1: Create OpenAI JSONL content
jsonl_content = '''{"custom_id": "req-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "What is 2+2?"}], "max_tokens": 100}}
{"custom_id": "req-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "What is the capital of France?"}], "max_tokens": 100}}'''
# Step 2: Upload file (uses OpenAI's native file storage)
print("Step 2: Uploading batch input file...")
uploaded_file = client.files.create(
file=("batch_e2e.jsonl", jsonl_content.encode(), "application/jsonl"),
purpose="batch",
extra_body={"provider": provider},
)
print(f" Uploaded file: {uploaded_file.id}")
# Step 3: Create batch
print("Step 3: Creating batch job...")
batch = client.batches.create(
input_file_id=uploaded_file.id,
endpoint="/v1/chat/completions",
completion_window="24h",
extra_body={"provider": provider},
)
print(f" Created batch: {batch.id}, status: {batch.status}")
# Step 4: Poll for completion
print("Step 4: Polling batch status...")
for i in range(10):
    batch = client.batches.retrieve(batch.id, extra_query={"provider": provider})
    print(f"  Poll {i+1}: status = {batch.status}")
    if batch.status in ["completed", "failed", "expired", "cancelled"]:
        break
    if batch.request_counts:
        print(f"    Completed: {batch.request_counts.completed}/{batch.request_counts.total}")
    time.sleep(5)
# Polling stops on a terminal status or after 10 attempts, so report the actual status
print(f"\nBatch {batch.id} finished polling with status: {batch.status}")
```
### Bedrock Batch Workflow
```python
import time
from openai import OpenAI
client = OpenAI(
base_url="http://localhost:8080/openai",
api_key="your-api-key"
)
# Configuration
provider = "bedrock"
s3_bucket = "your-s3-bucket"
s3_region = "us-west-2"
model = "anthropic.claude-3-sonnet-20240229-v1:0"
# Step 1: Create JSONL content using OpenAI-style format (Bifrost converts to Bedrock format internally)
jsonl_content = '''{"custom_id": "req-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "anthropic.claude-3-sonnet-20240229-v1:0", "messages": [{"role": "user", "content": "What is 2+2?"}], "max_tokens": 100}}
{"custom_id": "req-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "anthropic.claude-3-sonnet-20240229-v1:0", "messages": [{"role": "user", "content": "What is the capital of France?"}], "max_tokens": 100}}'''
# Step 2: Upload file
print("Step 2: Uploading batch input file...")
uploaded_file = client.files.create(
file=("batch_e2e.jsonl", jsonl_content.encode(), "application/jsonl"),
purpose="batch",
extra_body={
"provider": provider,
"storage_config": {
"s3": {"bucket": s3_bucket, "region": s3_region, "prefix": "batch-input"},
},
},
)
print(f" Uploaded file: {uploaded_file.id}")
# Step 3: Create batch
print("Step 3: Creating batch job...")
batch = client.batches.create(
input_file_id=uploaded_file.id,
endpoint="/v1/chat/completions",
completion_window="24h",
extra_body={
"provider": provider,
"model": model,
"output_s3_uri": f"s3://{s3_bucket}/batch-output",
},
)
print(f" Created batch: {batch.id}, status: {batch.status}")
# Step 4: Poll for completion
print("Step 4: Polling batch status...")
for i in range(10):
    batch = client.batches.retrieve(batch.id, extra_query={"provider": provider})
    print(f"  Poll {i+1}: status = {batch.status}")
    if batch.status in ["completed", "failed", "expired", "cancelled"]:
        break
    if batch.request_counts:
        print(f"    Completed: {batch.request_counts.completed}/{batch.request_counts.total}")
    time.sleep(5)
# Polling stops on a terminal status or after 10 attempts, so report the actual status
print(f"\nBatch {batch.id} finished polling with status: {batch.status}")
```
### Anthropic Inline Batch Workflow
```python
import time
from openai import OpenAI
client = OpenAI(
base_url="http://localhost:8080/openai",
api_key="your-anthropic-api-key"
)
provider = "anthropic"
# Step 1: Create inline requests
print("Step 1: Creating inline requests...")
requests = [
{
"custom_id": "math-question",
"params": {
"model": "claude-3-sonnet-20240229",
"max_tokens": 100,
"messages": [{"role": "user", "content": "What is 15 * 7?"}]
}
},
{
"custom_id": "geography-question",
"params": {
"model": "claude-3-sonnet-20240229",
"max_tokens": 100,
"messages": [{"role": "user", "content": "What is the largest ocean?"}]
}
}
]
print(f" Created {len(requests)} inline requests")
# Step 2: Create batch
print("Step 2: Creating batch job...")
batch = client.batches.create(
input_file_id="",
endpoint="/v1/chat/completions",
completion_window="24h",
extra_body={"provider": provider, "requests": requests},
)
print(f" Created batch: {batch.id}, status: {batch.status}")
# Step 3: Poll for completion
print("Step 3: Polling batch status...")
for i in range(10):
    batch = client.batches.retrieve(batch.id, extra_query={"provider": provider})
    print(f"  Poll {i+1}: status = {batch.status}")
    if batch.status in ["completed", "failed", "expired", "cancelled", "ended"]:
        break
    time.sleep(5)
# Polling stops on a terminal status or after 10 attempts, so report the actual status
print(f"\nBatch {batch.id} finished polling with status: {batch.status}")
```
### Gemini Batch Workflow
```python
import time
from openai import OpenAI
client = OpenAI(
base_url="http://localhost:8080/openai",
api_key="your-api-key"
)
# Configuration
provider = "gemini"
model = "gemini-1.5-flash"
# Step 1: Create JSONL content using OpenAI-style format (Bifrost converts to Gemini format internally)
jsonl_content = '''{"custom_id": "req-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gemini-1.5-flash", "messages": [{"role": "user", "content": "What is 2+2?"}], "max_tokens": 100}}
{"custom_id": "req-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gemini-1.5-flash", "messages": [{"role": "user", "content": "What is the capital of France?"}], "max_tokens": 100}}'''
# Step 2: Upload file (uses Gemini's native file storage)
print("Step 2: Uploading batch input file...")
uploaded_file = client.files.create(
file=("batch_e2e.jsonl", jsonl_content.encode(), "application/jsonl"),
purpose="batch",
extra_body={"provider": provider},
)
print(f" Uploaded file: {uploaded_file.id}")
# Step 3: Create batch
print("Step 3: Creating batch job...")
batch = client.batches.create(
input_file_id=uploaded_file.id,
endpoint="/v1/chat/completions",
completion_window="24h",
extra_body={
"provider": provider,
"model": model,
},
)
print(f" Created batch: {batch.id}, status: {batch.status}")
# Step 4: Poll for completion
print("Step 4: Polling batch status...")
for i in range(10):
    batch = client.batches.retrieve(batch.id, extra_query={"provider": provider})
    print(f"  Poll {i+1}: status = {batch.status}")
    if batch.status in ["completed", "failed", "expired", "cancelled"]:
        break
    if batch.request_counts:
        print(f"    Completed: {batch.request_counts.completed}/{batch.request_counts.total}")
    time.sleep(5)
# Polling stops on a terminal status or after 10 attempts, so report the actual status
print(f"\nBatch {batch.id} finished polling with status: {batch.status}")
```
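
The polling loops in the workflows above share the same shape and could be factored into one helper. The sketch below is one possible version, not part of Bifrost or the OpenAI SDK: `wait_for_batch` is a hypothetical name, the doubling backoff is a design choice rather than a requirement, and the terminal status set is taken from the workflows above. In real use, `retrieve` would be something like `lambda: client.batches.retrieve(batch_id, extra_query={"provider": provider})`. The demo at the end uses a fake `retrieve` with made-up statuses so that it runs without a network connection.

```python
import time
from types import SimpleNamespace
from typing import Any, Callable

TERMINAL_STATUSES = frozenset({"completed", "failed", "expired", "cancelled", "ended"})


def wait_for_batch(retrieve: Callable[[], Any], *, max_polls: int = 10,
                   interval: float = 5.0, max_interval: float = 60.0,
                   sleep: Callable[[float], None] = time.sleep) -> Any:
    """Poll retrieve() until the batch reaches a terminal status (hypothetical helper)."""
    if max_polls < 1:
        raise ValueError("max_polls must be at least 1")
    delay = interval
    for _ in range(max_polls):
        batch = retrieve()
        if batch.status in TERMINAL_STATUSES:
            return batch
        sleep(delay)
        # Back off so long-running batches are not polled too often
        delay = min(delay * 2, max_interval)
    raise TimeoutError(f"batch still {batch.status!r} after {max_polls} polls")


# Demo with a fake retrieve function; the status sequence is made up
fake_statuses = iter(["validating", "in_progress", "completed"])
slept = []
final = wait_for_batch(lambda: SimpleNamespace(status=next(fake_statuses)),
                       sleep=slept.append)
print(final.status, slept)
```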
---
## Provider-Specific Notes
| Provider | File Upload | Batch Creation | Extra Configuration |
|----------|-------------|----------------|---------------------|
| **OpenAI** | ✅ Native storage | ✅ File-based | None |
| **Bedrock** | ✅ S3-based | ✅ File-based | `storage_config`, `output_s3_uri` |
| **Anthropic** | ❌ Not supported | ✅ Inline requests | `requests` array in `extra_body` |
| **Gemini** | ✅ Native storage | ✅ File-based | `model` in `extra_body` |
<Note>
- **OpenAI** and **Gemini** use their native file storage - no S3 configuration needed
- **Bedrock** requires S3 storage configuration (`storage_config`, `output_s3_uri`)
- **Anthropic** does not support file-based batch operations - use inline requests instead
</Note>
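As a rough pre-flight check, the table can be turned into a lookup that reports which documented extra fields a request is still missing. This is only a sketch: `BATCH_EXTRA_FIELDS` and `missing_batch_extras` are hypothetical names, and the check tracks field names only, not where each provider expects them to be placed.

```python
# Extra fields per provider, transcribed from the table above.
BATCH_EXTRA_FIELDS = {
    "openai": [],
    "bedrock": ["storage_config", "output_s3_uri"],
    "anthropic": ["requests"],
    "gemini": ["model"],
}


def missing_batch_extras(provider, extras):
    """Return the documented extra fields that `extras` lacks (hypothetical helper)."""
    required = BATCH_EXTRA_FIELDS[provider]
    return [field for field in required if field not in extras]


print(missing_batch_extras("gemini", {"provider": "gemini"}))
```

Running such a check before `client.batches.create` can surface a missing field locally instead of as a provider-side error.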
---
## Next Steps
- **[Overview](./overview)** - OpenAI SDK integration basics
- **[Configuration](../../quickstart/gateway/provider-configuration)** - Bifrost setup and configuration
- **[Core Features](../../features/)** - Governance, semantic caching, and more

@@ -0,0 +1,563 @@
---
title: "Overview"
description: "Use Bifrost as a drop-in replacement for OpenAI API with full compatibility and enhanced features."
icon: "book"
---
## Overview
Bifrost provides complete OpenAI API compatibility through protocol adaptation. The integration handles request transformation, response normalization, and error mapping between OpenAI's API specification and Bifrost's internal processing pipeline.
This integration enables you to utilize Bifrost's features like governance, load balancing, semantic caching, multi-provider support, and more, all while preserving your existing OpenAI SDK-based architecture.
**Endpoint:** `/openai`
---
## Setup
<Tabs group="openai-sdk">
<Tab title="Python">
```python {5}
import openai
# Configure client to use Bifrost
client = openai.OpenAI(
base_url="http://localhost:8080/openai",
api_key="dummy-key" # Keys handled by Bifrost
)
# Make requests as usual
response = client.chat.completions.create(
model="gpt-4o-mini",
messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)
```
</Tab>
<Tab title="JavaScript">
```javascript {5}
import OpenAI from "openai";
// Configure client to use Bifrost
const openai = new OpenAI({
baseURL: "http://localhost:8080/openai",
apiKey: "dummy-key", // Keys handled by Bifrost
});
// Make requests as usual
const response = await openai.chat.completions.create({
model: "gpt-4o-mini",
messages: [{ role: "user", content: "Hello!" }],
});
console.log(response.choices[0].message.content);
```
</Tab>
</Tabs>
---
## Provider/Model Usage Examples
Use multiple providers through the same OpenAI SDK format by prefixing model names with the provider:
<Tabs group="openai-sdk">
<Tab title="Python">
```python
import openai
client = openai.OpenAI(
base_url="http://localhost:8080/openai",
api_key="dummy-key"
)
# OpenAI models (default)
openai_response = client.chat.completions.create(
model="gpt-4o-mini",
messages=[{"role": "user", "content": "Hello from OpenAI!"}]
)
# Anthropic models via OpenAI SDK format
anthropic_response = client.chat.completions.create(
model="anthropic/claude-3-sonnet-20240229",
messages=[{"role": "user", "content": "Hello from Claude!"}]
)
# Google Vertex models via OpenAI SDK format
vertex_response = client.chat.completions.create(
model="vertex/gemini-pro",
messages=[{"role": "user", "content": "Hello from Gemini!"}]
)
# Azure models
azure_response = client.chat.completions.create(
model="azure/gpt-4o",
messages=[{"role": "user", "content": "Hello from Azure!"}]
)
# Local Ollama models
ollama_response = client.chat.completions.create(
model="ollama/llama3.1:8b",
messages=[{"role": "user", "content": "Hello from Ollama!"}]
)
```
</Tab>
<Tab title="JavaScript">
```javascript
import OpenAI from "openai";
const openai = new OpenAI({
baseURL: "http://localhost:8080/openai",
apiKey: "dummy-key",
});
// OpenAI models (default)
const openaiResponse = await openai.chat.completions.create({
model: "gpt-4o-mini",
messages: [{ role: "user", content: "Hello from OpenAI!" }],
});
// Anthropic models via OpenAI SDK format
const anthropicResponse = await openai.chat.completions.create({
model: "anthropic/claude-3-sonnet-20240229",
messages: [{ role: "user", content: "Hello from Claude!" }],
});
// Google Vertex models via OpenAI SDK format
const vertexResponse = await openai.chat.completions.create({
model: "vertex/gemini-pro",
messages: [{ role: "user", content: "Hello from Gemini!" }],
});
// Azure models
const azureResponse = await openai.chat.completions.create({
model: "azure/gpt-4o",
messages: [{ role: "user", content: "Hello from Azure!" }],
});
// Local Ollama models
const ollamaResponse = await openai.chat.completions.create({
model: "ollama/llama3.1:8b",
messages: [{ role: "user", content: "Hello from Ollama!" }],
});
```
</Tab>
</Tabs>
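The `provider/model` naming convention used above can be illustrated with a small client-side helper. This is a sketch of the convention only, not Bifrost's actual parser; `split_model` is a hypothetical name, and the default of `openai` is assumed from the "(default)" comments in the examples.

```python
def split_model(model, default_provider="openai"):
    """Split 'provider/model' into (provider, model); illustrative only."""
    provider, separator, name = model.partition("/")
    if not separator:
        # No prefix: the endpoint's default provider applies
        return default_provider, model
    return provider, name


for model in ["gpt-4o-mini", "anthropic/claude-3-sonnet-20240229", "ollama/llama3.1:8b"]:
    print(split_model(model))
```

Only the first `/` is treated as the separator, so a model name that itself contains a slash would need an explicit provider prefix under this scheme.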
---
## Adding Custom Headers
Pass custom headers required by Bifrost plugins (like governance, telemetry, etc.):
<Tabs group="openai-sdk">
<Tab title="Python">
```python
import openai
client = openai.OpenAI(
base_url="http://localhost:8080/openai",
api_key="dummy-key",
default_headers={
"x-bf-vk": "vk_12345", # Virtual key for governance
}
)
response = client.chat.completions.create(
model="gpt-4o-mini",
messages=[{"role": "user", "content": "Hello with custom headers!"}]
)
```
</Tab>
<Tab title="JavaScript">
```javascript
import OpenAI from "openai";
const openai = new OpenAI({
baseURL: "http://localhost:8080/openai",
apiKey: "dummy-key",
defaultHeaders: {
"x-bf-vk": "vk_12345", // Virtual key for governance
},
});
const response = await openai.chat.completions.create({
model: "gpt-4o-mini",
messages: [{ role: "user", content: "Hello with custom headers!" }],
});
```
</Tab>
</Tabs>
---
## Using Direct Keys
Pass API keys directly in requests to bypass Bifrost's load balancing. You can pass any provider's API key (OpenAI, Anthropic, Mistral, etc.) since Bifrost only looks for `Authorization` or `x-api-key` headers. This requires the **Allow Direct API keys** option to be enabled in Bifrost configuration.
> **Learn more:** See [Key Management](../../features/keys-management#direct-key-bypass) for enabling direct API key usage.
<Tabs group="openai-sdk">
<Tab title="Python">
```python
import openai
# Using OpenAI's API key directly
client_with_direct_key = openai.OpenAI(
base_url="http://localhost:8080/openai",
api_key="sk-your-openai-key" # OpenAI's API key works
)
openai_response = client_with_direct_key.chat.completions.create(
model="openai/gpt-4o-mini",
messages=[{"role": "user", "content": "Hello from GPT!"}]
)
# Or pass different provider keys per request
client = openai.OpenAI(
base_url="http://localhost:8080/openai",
api_key="dummy-key"
)
# Use OpenAI key for GPT models
openai_response = client.chat.completions.create(
model="gpt-4o-mini",
messages=[{"role": "user", "content": "Hello GPT!"}],
extra_headers={
"Authorization": "Bearer sk-your-openai-key"
}
)
# Use Anthropic key for Claude models
anthropic_response = client.chat.completions.create(
model="anthropic/claude-3-sonnet-20240229",
messages=[{"role": "user", "content": "Hello Claude!"}],
extra_headers={
"x-api-key": "sk-ant-your-anthropic-key"
}
)
# Use Gemini key for Gemini models
gemini_response = client.chat.completions.create(
model="gemini/gemini-2.5-flash",
messages=[{"role": "user", "content": "Hello Gemini!"}],
extra_headers={
"x-goog-api-key": "sk-gemini-your-gemini-key"
}
)
```
</Tab>
<Tab title="JavaScript">
```javascript
import OpenAI from "openai";
// Using OpenAI's API key directly
const openaiWithDirectKey = new OpenAI({
baseURL: "http://localhost:8080/openai",
apiKey: "sk-your-openai-key", // OpenAI's API key works
});
const openaiResponse = await openaiWithDirectKey.chat.completions.create({
model: "openai/gpt-4o-mini",
messages: [{ role: "user", content: "Hello from GPT!" }],
});
// Or pass different provider keys per request
const openai = new OpenAI({
baseURL: "http://localhost:8080/openai",
apiKey: "dummy-key",
});
// Use OpenAI key for GPT models
// (per-request headers go in the second argument, not in the request body)
const openaiResponseWithHeader = await openai.chat.completions.create(
  {
    model: "gpt-4o-mini",
    messages: [{ role: "user", content: "Hello GPT!" }],
  },
  { headers: { Authorization: "Bearer sk-your-openai-key" } }
);
// Use Anthropic key for Claude models
const anthropicResponseWithHeader = await openai.chat.completions.create(
  {
    model: "anthropic/claude-3-sonnet-20240229",
    messages: [{ role: "user", content: "Hello Claude!" }],
  },
  { headers: { "x-api-key": "sk-ant-your-anthropic-key" } }
);
// Use Gemini key for Gemini models
const geminiResponseWithHeader = await openai.chat.completions.create(
  {
    model: "gemini/gemini-2.5-flash",
    messages: [{ role: "user", content: "Hello Gemini!" }],
  },
  { headers: { "x-goog-api-key": "sk-gemini-your-gemini-key" } }
);
```
</Tab>
</Tabs>
For Azure, you can use the `AzureOpenAI` client and point it to the Bifrost integration endpoint. The `x-bf-azure-endpoint` header is required to specify your Azure resource endpoint.
<Tabs group="openai-sdk">
<Tab title="Python">
```python
from openai import AzureOpenAI
azure_client = AzureOpenAI(
api_key="your-azure-api-key",
api_version="2024-02-01",
azure_endpoint="http://localhost:8080/openai", # Point to Bifrost
default_headers={
"x-bf-azure-endpoint": "https://your-resource.openai.azure.com"
}
)
azure_response = azure_client.chat.completions.create(
model="gpt-4-deployment", # Your deployment name
messages=[{"role": "user", "content": "Hello from Azure!"}]
)
print(azure_response.choices[0].message.content)
```
</Tab>
<Tab title="JavaScript">
```javascript
import { AzureOpenAI } from "openai";
const azureClient = new AzureOpenAI({
apiKey: "your-azure-api-key",
apiVersion: "2024-02-01",
baseURL: "http://localhost:8080/openai", // Point to Bifrost
defaultHeaders: {
"x-bf-azure-endpoint": "https://your-resource.openai.azure.com"
}
});
const azureResponse = await azureClient.chat.completions.create({
model: "gpt-4-deployment", // Your deployment name
messages: [{ role: "user", content: "Hello from Azure!" }],
});
console.log(azureResponse.choices[0].message.content);
```
</Tab>
</Tabs>
---
## Async Inference
Submit inference requests asynchronously and poll for results later using the `x-bf-async` header. This is useful for long-running requests where you don't want to hold a connection open. See [Async Inference](../../features/async-inference) for full details.
<Note>
Async inference requires a [Logs Store](../../features/observability/default) to be configured and is not compatible with streaming.
</Note>
### Chat Completions
<Tabs group="openai-sdk">
<Tab title="Python">
```python
import openai
import time
client = openai.OpenAI(
base_url="http://localhost:8080/openai",
api_key="dummy-key"
)
# Submit async request
initial = client.chat.completions.create(
model="openai/gpt-4o-mini",
messages=[{"role": "user", "content": "Tell me a short story."}],
extra_headers={"x-bf-async": "true"}
)
# If choices are present, the request completed synchronously
if initial.choices:
    print(initial.choices[0].message.content)
else:
    # Poll until completed
    while True:
        time.sleep(2)
        poll = client.chat.completions.create(
            model="openai/gpt-4o-mini",
            messages=[{"role": "user", "content": "Tell me a short story."}],
            extra_headers={"x-bf-async-id": initial.id}
        )
        if poll.choices:
            print(poll.choices[0].message.content)
            break
```
</Tab>
<Tab title="JavaScript">
```javascript
import OpenAI from "openai";
const openai = new OpenAI({
baseURL: "http://localhost:8080/openai",
apiKey: "dummy-key",
});
// Submit async request
const initial = await openai.chat.completions.create(
{
model: "openai/gpt-4o-mini",
messages: [{ role: "user", content: "Tell me a short story." }],
},
{ headers: { "x-bf-async": "true" } }
);
// If choices are present, the request completed synchronously
if (initial.choices?.length > 0) {
console.log(initial.choices[0].message.content);
} else {
// Poll until completed
while (true) {
await new Promise((r) => setTimeout(r, 2000));
const poll = await openai.chat.completions.create(
{
model: "openai/gpt-4o-mini",
messages: [{ role: "user", content: "Tell me a short story." }],
},
{ headers: { "x-bf-async-id": initial.id } }
);
if (poll.choices?.length > 0) {
console.log(poll.choices[0].message.content);
break;
}
}
}
```
</Tab>
</Tabs>
### Responses API
<Tabs group="openai-sdk">
<Tab title="Python">
```python
import openai
import time
client = openai.OpenAI(
base_url="http://localhost:8080/openai",
api_key="dummy-key"
)
# Submit async request
initial = client.responses.create(
model="openai/gpt-4o-mini",
input="Tell me a short story.",
extra_headers={"x-bf-async": "true"}
)
# If status is "completed", the request completed synchronously
if initial.status == "completed":
    print(initial.output_text)
else:
    # Poll until completed
    while True:
        time.sleep(2)
        poll = client.responses.create(
            model="openai/gpt-4o-mini",
            input="Tell me a short story.",
            extra_headers={"x-bf-async-id": initial.id}
        )
        if poll.status == "completed":
            print(poll.output_text)
            break
```
</Tab>
<Tab title="JavaScript">
```javascript
import OpenAI from "openai";
const openai = new OpenAI({
baseURL: "http://localhost:8080/openai",
apiKey: "dummy-key",
});
// Submit async request
const initial = await openai.responses.create(
{ model: "openai/gpt-4o-mini", input: "Tell me a short story." },
{ headers: { "x-bf-async": "true" } }
);
// If status is "completed", the request completed synchronously
if (initial.status === "completed") {
console.log(initial.output_text);
} else {
// Poll until completed
while (true) {
await new Promise((r) => setTimeout(r, 2000));
const poll = await openai.responses.create(
{ model: "openai/gpt-4o-mini", input: "Tell me a short story." },
{ headers: { "x-bf-async-id": initial.id } }
);
if (poll.status === "completed") {
console.log(poll.output_text);
break;
}
}
}
```
</Tab>
</Tabs>
### Async Headers
| Header | Description |
|---|---|
| `x-bf-async: true` | Submit the request as an async job. Returns immediately with a job ID. |
| `x-bf-async-id: <job-id>` | Poll for results of a previously submitted async job. |
| `x-bf-async-job-result-ttl: <seconds>` | Override the default result TTL (default: 3600s). |
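The polling examples above loop without an upper bound. The following is a minimal sketch of a bounded variant built around the three headers in the table. `async_headers` and `poll_until` are hypothetical helpers, and sending the TTL override together with the submit request is an assumption:

```python
import time


def async_headers(job_id=None, result_ttl_seconds=None):
    """Build headers for a submit (no job_id) or a poll (job_id set)."""
    if job_id is not None:
        return {"x-bf-async-id": job_id}
    headers = {"x-bf-async": "true"}
    if result_ttl_seconds is not None:
        # Assumption: the TTL override accompanies the submit request
        headers["x-bf-async-job-result-ttl"] = str(result_ttl_seconds)
    return headers


def poll_until(fetch, is_done, interval=2.0, max_attempts=30):
    """Call fetch() until is_done(result) is true; raise TimeoutError otherwise."""
    for _ in range(max_attempts):
        result = fetch()
        if is_done(result):
            return result
        time.sleep(interval)
    raise TimeoutError(f"job not finished after {max_attempts} polls")
```

With the chat completions example, `fetch` would wrap the `client.chat.completions.create(...)` call using `extra_headers=async_headers(job_id=initial.id)`, and `is_done` would be `lambda r: bool(r.choices)`.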
---
## Supported Features
The OpenAI integration supports all features that are available in both the OpenAI SDK and Bifrost core functionality. If the OpenAI SDK supports a feature and Bifrost supports it, the integration will work seamlessly.
---
## Next Steps
- **[Files and Batch API](./files-and-batch)** - File uploads and batch processing
- **[Anthropic SDK](../anthropic-sdk/overview)** - Claude integration patterns
- **[Google GenAI SDK](../genai-sdk)** - Gemini integration patterns
- **[Configuration](../../quickstart/README)** - Bifrost setup and configuration
- **[Core Features](../../features/)** - Advanced Bifrost capabilities

@@ -0,0 +1,298 @@
---
title: "Passthrough"
description: "Forward provider-native requests through Bifrost with full core pipeline processing, including logs and observability."
icon: "route"
---
## Overview
Passthrough integrations let you call provider-native API paths and payloads through Bifrost without route-level request/response conversion.
When you use passthrough endpoints, the request still flows through Bifrost core logic. You keep Bifrost features such as logging and observability while sending provider-native paths and bodies.
---
## Endpoints
- `/openai_passthrough`: default provider `openai`
- `/anthropic_passthrough`: default provider `anthropic`
- `/azure_passthrough`: default provider `azure`
- `/genai_passthrough`: default provider `gemini` (with automatic Vertex detection for clients configured to use Vertex)
---
## How It Works
1. Send your request to a passthrough endpoint (OpenAI, Anthropic, Azure, or GenAI passthrough).
2. The integration strips the passthrough prefix and forwards the remaining provider-native path/body.
3. Bifrost handles provider execution through core inference and plugin pipelines.
4. Response status, headers, and body are returned as passthrough output (for both stream and non-stream requests).
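Step 2 above can be sketched as a simple prefix split. This illustrates the documented behavior rather than Bifrost's implementation; `PASSTHROUGH_DEFAULTS` and `split_passthrough_path` are hypothetical names, and the provider defaults are taken from the endpoint list on this page.

```python
# Default provider per passthrough prefix, as listed under "Endpoints".
PASSTHROUGH_DEFAULTS = {
    "/openai_passthrough": "openai",
    "/anthropic_passthrough": "anthropic",
    "/azure_passthrough": "azure",
    "/genai_passthrough": "gemini",
}


def split_passthrough_path(path):
    """Return (default_provider, provider_native_path) for a passthrough path."""
    for prefix, provider in PASSTHROUGH_DEFAULTS.items():
        if path == prefix or path.startswith(prefix + "/"):
            # Everything after the prefix is forwarded unchanged
            return provider, path[len(prefix):] or "/"
    raise ValueError(f"not a passthrough path: {path}")


print(split_passthrough_path("/openai_passthrough/v1/chat/completions"))
```

For the request shown in the OpenAI cURL example, the forwarded path would be `/v1/chat/completions`.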
---
## Provider Selection Rules
### OpenAI Passthrough
- Uses `openai` as the default provider.
### Anthropic Passthrough
- Uses `anthropic` as the default provider.
### Azure Passthrough
- Uses `azure` as the default provider.
- Requires an Azure key with `endpoint` configured. `api-version` is injected automatically:
  - **Key config `api_version`** takes priority (consistent with how auth is handled).
  - Falls back to any `api-version` the client supplied in the query string.
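The `api-version` precedence can be expressed in a few lines. This is a sketch of the rule as documented here, with `resolve_api_version` as a hypothetical helper name rather than a Bifrost function:

```python
from urllib.parse import parse_qs, urlsplit


def resolve_api_version(key_config_api_version, request_url):
    """Key config first, then the client's query string, else None."""
    if key_config_api_version:
        return key_config_api_version
    query = parse_qs(urlsplit(request_url).query)
    values = query.get("api-version")
    return values[0] if values else None


url = "http://localhost:8080/azure_passthrough/openai/deployments/gpt-4o/chat/completions?api-version=2024-02-01"
print(resolve_api_version("2024-10-21", url))
print(resolve_api_version(None, url))
```

The first call yields the key config value and the second falls back to the value from the query string.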
### GenAI Passthrough
- Uses `gemini` by default.
- Automatically switches to `vertex` when Vertex patterns are detected, such as:
  - URL path containing `/projects/{PROJECT_ID}/locations/{LOCATION}/`
  - Request body `model` containing a Vertex resource path
  - OAuth token pattern typically used for Vertex (`Bearer ya29...`)
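The three detection rules can be approximated as follows. This is a sketch under the assumption that any single match is enough to select `vertex`; the regular expression and the `detect_genai_provider` name are illustrative, not Bifrost's actual logic.

```python
import re

# Matches /projects/<id>/locations/<location>/ anywhere in a path
VERTEX_PATH = re.compile(r"/projects/[^/]+/locations/[^/]+/")


def detect_genai_provider(path, body_model="", authorization=""):
    """Return 'vertex' if any documented Vertex pattern matches, else 'gemini'."""
    if VERTEX_PATH.search(path):
        return "vertex"
    # A Vertex resource path in the body looks like projects/<id>/locations/<loc>/...
    if VERTEX_PATH.search("/" + body_model.lstrip("/")):
        return "vertex"
    if authorization.startswith("Bearer ya29"):
        return "vertex"
    return "gemini"


print(detect_genai_provider("/v1beta/models/gemini-2.5-flash:generateContent"))
print(detect_genai_provider(
    "/v1/projects/my-project/locations/us-central1/publishers/google/models/gemini-2.5-flash:generateContent"
))
```

The two calls use the paths from the Gemini and Vertex-style cURL examples on this page and select `gemini` and `vertex` respectively.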
---
## Usage Examples
### OpenAI Passthrough
<Tabs group="openai-passthrough">
<Tab title="Python SDK">
```python
import openai
client = openai.OpenAI(
base_url="http://localhost:8080/openai_passthrough/v1",
api_key="dummy-key"
)
response = client.chat.completions.create(
model="gpt-4o-mini",
messages=[{"role": "user", "content": "hello from passthrough"}]
)
print(response.choices[0].message.content)
```
</Tab>
<Tab title="cURL">
```bash
curl -X POST "http://localhost:8080/openai_passthrough/v1/chat/completions" \
-H "content-type: application/json" \
-H "authorization: Bearer sk-your-openai-key" \
-d '{
"model": "gpt-4o-mini",
"messages": [{"role":"user","content":"hello from passthrough"}]
}'
```
</Tab>
</Tabs>
### Anthropic Passthrough
<Tabs group="anthropic-passthrough">
<Tab title="Python SDK">
```python
import anthropic
client = anthropic.Anthropic(
base_url="http://localhost:8080/anthropic_passthrough",
api_key="dummy-key"
)
response = client.messages.create(
model="claude-sonnet-4-20250514",
max_tokens=1024,
messages=[{"role": "user", "content": "hello from passthrough"}]
)
print(response.content[0].text)
```
</Tab>
<Tab title="cURL">
```bash
curl -X POST "http://localhost:8080/anthropic_passthrough/v1/messages" \
-H "content-type: application/json" \
-H "x-api-key: your-anthropic-key" \
-H "anthropic-version: 2023-06-01" \
-d '{
"model": "claude-sonnet-4-20250514",
"max_tokens": 1024,
"messages": [{"role":"user","content":"hello from passthrough"}]
}'
```
</Tab>
</Tabs>
### Azure Passthrough
<Tabs group="azure-passthrough">
<Tab title="Azure OpenAI SDK">
```python
from openai import AzureOpenAI
client = AzureOpenAI(
azure_endpoint="http://localhost:8080/azure_passthrough",
api_key="dummy-key",
api_version="2024-10-21", # overridden by key config api_version if set
)
response = client.chat.completions.create(
model="gpt-4o", # your Azure deployment name
messages=[{"role": "user", "content": "hello from azure passthrough"}]
)
print(response.choices[0].message.content)
```
</Tab>
<Tab title="OpenAI SDK">
```python
from openai import OpenAI
client = OpenAI(
base_url="http://localhost:8080/azure_passthrough/openai/v1/",
api_key="dummy-key",
)
response = client.responses.create(
model="gpt-4.1", # your Azure deployment name
input="hello from azure passthrough",
)
print(response.output_text)
```
</Tab>
<Tab title="Anthropic SDK (Anthropic on Azure)">
```python
import anthropic
client = anthropic.Anthropic(
base_url="http://localhost:8080/azure_passthrough",
api_key="dummy-key",
)
response = client.messages.create(
model="claude-sonnet-4-20250514",
max_tokens=1024,
messages=[{"role": "user", "content": "hello from azure passthrough"}]
)
print(response.content[0].text)
```
</Tab>
<Tab title="cURL">
```bash
curl -X POST "http://localhost:8080/azure_passthrough/openai/deployments/gpt-4o/chat/completions" \
-H "content-type: application/json" \
-d '{
"messages": [{"role": "user", "content": "hello from azure passthrough"}]
}'
```
</Tab>
</Tabs>
### GenAI Passthrough (Gemini)
<Tabs group="genai-passthrough">
<Tab title="Python SDK">
```python
from google import genai
from google.genai.types import HttpOptions
client = genai.Client(
api_key="dummy-key",
http_options=HttpOptions(base_url="http://localhost:8080/genai_passthrough")
)
response = client.models.generate_content(
model="gemini-2.5-flash",
contents="hello from passthrough"
)
print(response.text)
```
</Tab>
<Tab title="cURL">
```bash
curl -X POST "http://localhost:8080/genai_passthrough/v1beta/models/gemini-2.5-flash:generateContent" \
-H "content-type: application/json" \
-H "x-goog-api-key: your-gemini-key" \
-d '{
"contents":[{"parts":[{"text":"hello from passthrough"}]}]
}'
```
</Tab>
</Tabs>
### GenAI Passthrough (Vertex-style request)
<Tabs group="vertex-passthrough">
<Tab title="Python SDK">
```python
from google import genai
from google.genai.types import HttpOptions
client = genai.Client(
vertexai=True,
api_key="dummy-key",
http_options=HttpOptions(base_url="http://localhost:8080/genai_passthrough")
)
response = client.models.generate_content(
model="gemini-2.5-flash",
contents="hello from vertex passthrough"
)
print(response.text)
```
</Tab>
<Tab title="cURL">
```bash
curl -X POST "http://localhost:8080/genai_passthrough/v1/projects/my-project/locations/us-central1/publishers/google/models/gemini-2.5-flash:generateContent" \
-H "content-type: application/json" \
-H "authorization: Bearer ya29.your-vertex-token" \
-d '{
"contents":[{"parts":[{"text":"hello from vertex passthrough"}]}]
}'
```
</Tab>
</Tabs>
---
## Notes
- Use passthrough when you need a provider endpoint that is not directly supported by Bifrost integration routes yet.
- For Azure passthrough, auth headers (`api-key`, `x-api-key`, OAuth token) are always sourced from the Bifrost key config and never forwarded from the client request.

@@ -0,0 +1,409 @@
---
title: "Pydantic AI SDK"
description: "Use Bifrost as a drop-in proxy for Pydantic AI agents with zero code changes."
icon: "triangle"
---
Pydantic AI is a Python agent framework that brings FastAPI-like ergonomics to GenAI development. Because Pydantic AI uses standard provider SDKs under the hood, Bifrost can add enterprise features such as governance, semantic caching, MCP tools, and observability on top of your existing agent setup.
**Endpoint:** `/pydanticai`
<Warning>
**Provider Compatibility:** This integration only works for AI providers that both Pydantic AI and Bifrost support. Currently supported: OpenAI, Anthropic, and Google Gemini.
</Warning>
---
## Setup
<Tabs group="pydanticai-sdk">
<Tab title="Python">
```python {7-8}
from pydantic_ai import Agent
from pydantic_ai.models.openai import OpenAIChatModel
from pydantic_ai.providers.openai import OpenAIProvider
# Configure provider to use Bifrost
provider = OpenAIProvider(
base_url="http://localhost:8080/pydanticai/v1", # Point to Bifrost
api_key="dummy-key"  # Keys are managed by Bifrost; a virtual key also works here
)
model = OpenAIChatModel("gpt-4o-mini", provider=provider)
# Create agent with Bifrost-routed model
agent = Agent(model, instructions="Be concise and helpful.")
result = agent.run_sync("Hello! How are you?")
print(result.output)
```
</Tab>
</Tabs>
---
## Provider/Model Usage Examples
Your existing Pydantic AI provider switching works unchanged through Bifrost:
<Tabs group="pydanticai-sdk">
<Tab title="Python">
```python {7,10,14}
from pydantic_ai import Agent
from pydantic_ai.models.openai import OpenAIChatModel
from pydantic_ai.models.anthropic import AnthropicModel
from pydantic_ai.models.google import GoogleModel
from pydantic_ai.providers.openai import OpenAIProvider
from pydantic_ai.providers.anthropic import AnthropicProvider
from pydantic_ai.providers.google import GoogleProvider
base_url = "http://localhost:8080/pydanticai"
# OpenAI models via Pydantic AI
openai_provider = OpenAIProvider(base_url=f"{base_url}/v1")
openai_model = OpenAIChatModel("gpt-4o-mini", provider=openai_provider)
openai_agent = Agent(openai_model)
# Anthropic models via Pydantic AI
# Note: Anthropic SDK adds /v1 internally, so we don't append it here
anthropic_provider = AnthropicProvider(base_url=base_url)
anthropic_model = AnthropicModel("claude-3-haiku-20240307", provider=anthropic_provider)
anthropic_agent = Agent(anthropic_model)
# Google Gemini models via Pydantic AI
google_provider = GoogleProvider(base_url=base_url, api_key="dummy-key")
google_model = GoogleModel("gemini-2.0-flash", provider=google_provider)
google_agent = Agent(google_model)
# All work the same way
openai_result = openai_agent.run_sync("Hello GPT!")
anthropic_result = anthropic_agent.run_sync("Hello Claude!")
gemini_result = google_agent.run_sync("Hello Gemini!")
print(openai_result.output)
print(anthropic_result.output)
print(gemini_result.output)
```
</Tab>
</Tabs>
---
## Tool Calling
Pydantic AI's powerful tool system works seamlessly through Bifrost:
<Tabs group="pydanticai-sdk">
<Tab title="Python">
```python {7}
from pydantic_ai import Agent, RunContext, Tool
from pydantic_ai.models.openai import OpenAIChatModel
from pydantic_ai.providers.openai import OpenAIProvider
from dataclasses import dataclass
# Configure Bifrost
provider = OpenAIProvider(base_url="http://localhost:8080/pydanticai/v1")
model = OpenAIChatModel("gpt-4o-mini", provider=provider)
# Define tools as functions
def get_weather(location: str) -> str:
    """Get the current weather for a location."""
    return f"The weather in {location} is 72°F and sunny."

def calculate(expression: str) -> str:
    """Perform a mathematical calculation."""
    result = eval(expression)  # Use safe evaluation in production
    return f"The result is {result}"
# Create agent with tools
agent = Agent(
model,
tools=[get_weather, calculate],
instructions="You can check weather and do calculations."
)
result = agent.run_sync("What's the weather in Boston?")
print(result.output)
```
</Tab>
</Tabs>
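The `calculate` tool above relies on `eval`, which its own comment flags as unsafe for production. One possible replacement, sketched here with the standard-library `ast` module, accepts only numeric literals and the four basic operators. `safe_eval` is a hypothetical helper, not part of Pydantic AI or Bifrost:

```python
import ast
import operator

# Whitelisted arithmetic operators; anything else is rejected.
_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def safe_eval(expression):
    """Evaluate +, -, *, / arithmetic on numeric literals without calling eval()."""
    def walk(node):
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](walk(node.operand))
        raise ValueError("unsupported expression")

    return walk(ast.parse(expression, mode="eval").body)


def calculate(expression: str) -> str:
    """Perform a mathematical calculation."""
    return f"The result is {safe_eval(expression)}"
```

This version of `calculate` keeps the same signature and docstring, so it can be passed to `Agent(..., tools=[...])` in place of the `eval`-based one. Function calls, attribute access, and names all raise `ValueError`.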
---
## Tools with Dependency Injection
Use `RunContext` to pass dependencies to your tools:
<Tabs group="pydanticai-sdk">
<Tab title="Python">
```python {12}
from pydantic_ai import Agent, RunContext, Tool
from pydantic_ai.models.openai import OpenAIChatModel
from pydantic_ai.providers.openai import OpenAIProvider
from dataclasses import dataclass
@dataclass
class UserContext:
    user_id: int
    user_name: str
# Configure Bifrost
provider = OpenAIProvider(base_url="http://localhost:8080/pydanticai/v1")
model = OpenAIChatModel("gpt-4o-mini", provider=provider)
def get_user_info(ctx: RunContext[UserContext]) -> str:
    """Get information about the current user."""
    return f"User: {ctx.deps.user_name} (ID: {ctx.deps.user_id})"
agent = Agent(
model,
deps_type=UserContext,
tools=[Tool(get_user_info, takes_ctx=True)],
instructions="You can look up user information."
)
# Pass dependencies at runtime
deps = UserContext(user_id=123, user_name="Alice")
result = agent.run_sync("What is my user information?", deps=deps)
print(result.output)
```
</Tab>
</Tabs>
---
## Structured Output
Define response types using Pydantic models:
<Tabs group="pydanticai-sdk">
<Tab title="Python">
```python {13}
from pydantic import BaseModel, Field
from pydantic_ai import Agent
from pydantic_ai.models.openai import OpenAIChatModel
from pydantic_ai.providers.openai import OpenAIProvider
# Define structured output type
class CityInfo(BaseModel):
    city: str = Field(description="Name of the city")
    country: str = Field(description="Country where the city is located")
    population: int = Field(description="Approximate population")
# Configure Bifrost
provider = OpenAIProvider(base_url="http://localhost:8080/pydanticai/v1")
model = OpenAIChatModel("gpt-4o-mini", provider=provider)
# Agent with typed output
agent = Agent(
model,
output_type=CityInfo,
instructions="Extract city information from user queries."
)
result = agent.run_sync("Tell me about Tokyo, Japan")
# result.output is typed as CityInfo
print(f"City: {result.output.city}")
print(f"Country: {result.output.country}")
print(f"Population: {result.output.population}")
```
</Tab>
</Tabs>
---
## Streaming Responses
Stream responses in real-time for better UX:
<Tabs group="pydanticai-sdk">
<Tab title="Python">
```python {7}
import asyncio
from pydantic_ai import Agent
from pydantic_ai.models.openai import OpenAIChatModel
from pydantic_ai.providers.openai import OpenAIProvider
# Configure Bifrost
provider = OpenAIProvider(base_url="http://localhost:8080/pydanticai/v1")
model = OpenAIChatModel("gpt-4o-mini", provider=provider)
agent = Agent(model, instructions="Tell engaging stories.")
async def stream_story():
    async with agent.run_stream("Tell me a short story about a robot.") as response:
        # delta=True yields only the new text each time, so chunks can be printed as-is
        async for chunk in response.stream_text(delta=True):
            print(chunk, end="", flush=True)
    print()  # Newline at end
asyncio.run(stream_story())
```
</Tab>
</Tabs>
---
## Adding Custom Headers
Add Bifrost-specific headers for governance and tracking:
<Tabs group="pydanticai-sdk">
<Tab title="Python">
```python {15}
from httpx import AsyncClient
from pydantic_ai import Agent
from pydantic_ai.models.openai import OpenAIChatModel
from pydantic_ai.providers.openai import OpenAIProvider
# Create HTTP client with custom headers
http_client = AsyncClient(
headers={
"x-bf-vk": "your-virtual-key", # Virtual key for governance
}
)
# Configure provider with custom client
provider = OpenAIProvider(
base_url="http://localhost:8080/pydanticai/v1",
http_client=http_client
)
model = OpenAIChatModel("gpt-4o-mini", provider=provider)
agent = Agent(model)
result = agent.run_sync("Hello!")
print(result.output)
```
</Tab>
</Tabs>
---
## Using Direct Keys
Pass API keys directly to bypass Bifrost's key management. This requires the **Allow Direct API keys** option to be enabled in Bifrost configuration.
> **Learn more:** See [Key Management](../features/keys-management#direct-key-bypass) for enabling direct API key usage.
<Tabs group="pydanticai-sdk">
<Tab title="Python">
```python {8,15,26}
from httpx import AsyncClient
from pydantic_ai import Agent
from pydantic_ai.models.openai import OpenAIChatModel
from pydantic_ai.models.anthropic import AnthropicModel
from pydantic_ai.providers.openai import OpenAIProvider
from pydantic_ai.providers.anthropic import AnthropicProvider
base_url = "http://localhost:8080/pydanticai"
# Using OpenAI key directly
openai_client = AsyncClient(
headers={"Authorization": "Bearer sk-your-openai-key"}
)
openai_provider = OpenAIProvider(
base_url=f"{base_url}/v1",
http_client=openai_client
)
openai_model = OpenAIChatModel("gpt-4o-mini", provider=openai_provider)
openai_agent = Agent(openai_model)
# Using Anthropic key directly
# Note: Anthropic SDK adds /v1 internally, so we don't append it here
anthropic_client = AsyncClient(
headers={"x-api-key": "sk-ant-your-anthropic-key"}
)
anthropic_provider = AnthropicProvider(
base_url=base_url,
http_client=anthropic_client
)
anthropic_model = AnthropicModel("claude-3-haiku-20240307", provider=anthropic_provider)
anthropic_agent = Agent(anthropic_model)
# Both work through Bifrost with your own keys
openai_result = openai_agent.run_sync("Hello GPT!")
anthropic_result = anthropic_agent.run_sync("Hello Claude!")
```
</Tab>
</Tabs>
---
## Multi-turn Conversations
Maintain conversation history across multiple turns:
<Tabs group="pydanticai-sdk">
<Tab title="Python">
```python {6}
from pydantic_ai import Agent
from pydantic_ai.models.openai import OpenAIChatModel
from pydantic_ai.providers.openai import OpenAIProvider
# Configure Bifrost
provider = OpenAIProvider(base_url="http://localhost:8080/pydanticai/v1")
model = OpenAIChatModel("gpt-4o-mini", provider=provider)
agent = Agent(model, instructions="Remember context from previous messages.")
# First turn
result1 = agent.run_sync("My name is Alice and I live in Paris.")
# Second turn - pass message history to maintain context
result2 = agent.run_sync(
"What is my name and where do I live?",
message_history=result1.all_messages()
)
print(result2.output) # Should mention Alice and Paris
```
</Tab>
</Tabs>
---
## Supported Features
The Pydantic AI integration supports all features available in both the Pydantic AI SDK and Bifrost core functionality:
| Feature | Supported |
|---------|-----------|
| Chat Completions | ✅ |
| Tool/Function Calling | ✅ |
| Structured Output | ✅ |
| Streaming | ✅ |
| Multi-turn Conversations | ✅ |
| Dependency Injection | ✅ |
| OpenAI Models | ✅ |
| Anthropic Models | ✅ |
| Google Gemini Models | ✅ |
| Embeddings | ✅ |
| Speech/TTS | ✅ |
| Transcription | ✅ |
Your existing Pydantic AI agents work seamlessly with Bifrost's enterprise features. 😄
---
## Next Steps
- **[Governance Features](../features/governance)** - Virtual keys and team management
- **[Semantic Caching](../features/semantic-caching)** - Intelligent response caching
- **[Configuration](../quickstart/README)** - Provider setup and API key management


@@ -0,0 +1,97 @@
---
title: "Pinecone"
description: "Pinecone vector database integration for semantic caching in Bifrost."
icon: "database"
---
## Pinecone
[Pinecone](https://www.pinecone.io/) is a managed vector database service designed for machine learning applications, offering both serverless and pod-based deployment options.
### Key Features
- **Managed Service**: Fully managed with no infrastructure to maintain
- **Serverless Option**: Pay-per-use pricing with automatic scaling
- **High Performance**: Optimized for low-latency vector search
- **Metadata Filtering**: Advanced filtering on vector metadata
- **Namespaces**: Organize vectors into separate namespaces within an index
### Setup & Installation
**Pinecone Cloud:**
- Sign up at [pinecone.io](https://www.pinecone.io/)
- Create a new index with the desired dimensions
- Get your API key and index host URL from the console
**Local Development (Pinecone Local):**
```bash
docker run -d \
--name pinecone-local \
-p 5081:5081 \
ghcr.io/pinecone-io/pinecone-index:latest
```
### Configuration Options
<Tabs group="pinecone-config">
<Tab title="Go SDK">
```go
vectorConfig := &vectorstore.Config{
Enabled: true,
Type: vectorstore.VectorStoreTypePinecone,
Config: vectorstore.PineconeConfig{
APIKey: "your-pinecone-api-key",
IndexHost: "your-index-host.svc.environment.pinecone.io",
},
}
store, err := vectorstore.NewVectorStore(context.Background(), vectorConfig, logger)
```
</Tab>
<Tab title="config.json">
**Cloud Setup:**
```json
{
"vector_store": {
"enabled": true,
"type": "pinecone",
"config": {
"api_key": "your-pinecone-api-key",
"index_host": "your-index-host.svc.environment.pinecone.io"
}
}
}
```
**Local Development:**
```json
{
"vector_store": {
"enabled": true,
"type": "pinecone",
"config": {
"api_key": "pclocal",
"index_host": "localhost:5081"
}
}
}
```
</Tab>
</Tabs>
<Note>
For local development with Pinecone Local, any API key value works (e.g., "pclocal"). The index host should point to localhost:5081 by default.
</Note>
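If you generate `config.json` from a script, the cloud and local variants above differ only in two values. A minimal sketch, assuming hypothetical environment variable names `PINECONE_API_KEY` and `PINECONE_INDEX_HOST` (Bifrost itself does not read them):

```python
import json
import os


def pinecone_vector_store_config(env=None) -> dict:
    """Build the `vector_store` block shown above.

    Falls back to the Pinecone Local defaults when no index host is set.
    The environment variable names are assumptions made for this sketch.
    """
    env = os.environ if env is None else env
    index_host = env.get("PINECONE_INDEX_HOST", "localhost:5081")
    api_key = env.get("PINECONE_API_KEY", "pclocal")
    return {
        "vector_store": {
            "enabled": True,
            "type": "pinecone",
            "config": {"api_key": api_key, "index_host": index_host},
        }
    }


if __name__ == "__main__":
    print(json.dumps(pinecone_vector_store_config(), indent=2))
```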
<Warning>
Pinecone requires all IDs to be unique strings. Namespaces are created automatically when you first upsert vectors.
</Warning>
For the VectorStore interface API and usage examples, see [Vector Store Architecture](/architecture/framework/vector-store). For semantic caching setup, see [Semantic Caching](/features/semantic-caching).


@@ -0,0 +1,94 @@
---
title: "Qdrant"
description: "Qdrant vector database integration for semantic caching in Bifrost."
icon: "database"
---
## Qdrant
[Qdrant](https://qdrant.tech/) is a high-performance vector search engine built in Rust.
### Setup & Installation
**Local Qdrant:**
```bash
# Using Docker
docker run -d \
--name qdrant \
-p 6333:6333 \
-p 6334:6334 \
-v $(pwd)/qdrant_storage:/qdrant/storage \
qdrant/qdrant:latest
```
**Qdrant Cloud:**
Sign up at [cloud.qdrant.io](https://cloud.qdrant.io)
### Configuration Options
<Tabs group="qdrant-config">
<Tab title="Go SDK">
```go
vectorConfig := &vectorstore.Config{
Enabled: true,
Type: vectorstore.VectorStoreTypeQdrant,
Config: vectorstore.QdrantConfig{
Host: "localhost",
Port: 6334,
APIKey: "",
UseTLS: false,
},
}
store, err := vectorstore.NewVectorStore(context.Background(), vectorConfig, logger)
```
</Tab>
<Tab title="config.json">
**Local Setup:**
```json
{
"vector_store": {
"enabled": true,
"type": "qdrant",
"config": {
"host": "localhost",
"port": 6334
}
}
}
```
**Cloud Setup:**
```json
{
"vector_store": {
"enabled": true,
"type": "qdrant",
"config": {
"host": "your-qdrant-cluster.cloud.qdrant.io",
"port": 6334,
"api_key": "your-qdrant-api-key",
"use_tls": true
}
}
}
```
</Tab>
</Tabs>
<Note>
Qdrant uses port 6334 for gRPC and port 6333 for REST. Bifrost uses the gRPC port.
</Note>
<Warning>
Qdrant requires all IDs to be valid UUIDs. Use `uuid.New().String()` to generate IDs.
</Warning>
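If you also write to the collection from a Python client, or want to validate IDs before they reach Qdrant, the standard `uuid` module covers both cases. A short sketch; `cache_key_to_id` and its namespace choice are illustrative assumptions, not Bifrost's ID scheme:

```python
import uuid


def new_id() -> str:
    """Random (version 4) UUID, the Python counterpart of Go's uuid.New().String()."""
    return str(uuid.uuid4())


def cache_key_to_id(cache_key: str) -> str:
    """Deterministic (version 5) UUID derived from an arbitrary string key."""
    return str(uuid.uuid5(uuid.NAMESPACE_URL, cache_key))


def is_valid_uuid(value: str) -> bool:
    """Return True if `value` parses as a UUID."""
    try:
        uuid.UUID(value)
        return True
    except ValueError:
        return False
```

The deterministic variant is useful when the same logical entry must always map to the same point ID.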
For the VectorStore interface API and usage examples, see [Vector Store Architecture](/architecture/framework/vector-store). For semantic caching setup, see [Semantic Caching](/features/semantic-caching).


@@ -0,0 +1,241 @@
---
title: "Redis / Valkey"
description: "Redis and Valkey vector store integration for semantic caching in Bifrost."
icon: "database"
---
## Redis
Redis provides high-performance in-memory vector storage using RediSearch-compatible APIs, ideal for applications requiring sub-millisecond response times and fast semantic search capabilities. Valkey deployments that expose compatible `FT.*` commands are supported through the same configuration.
### Key Features
- **High Performance**: Sub-millisecond cache retrieval with Redis's in-memory storage
- **Cost Effective**: Open-source solution with no licensing costs
- **HNSW Algorithm**: Fast vector similarity search with excellent recall rates
- **Connection Pooling**: Advanced connection management for high-throughput applications
- **TTL Support**: Automatic expiration of cached entries
- **Streaming Support**: Full streaming response caching with proper chunk ordering
- **Flexible Filtering**: Advanced metadata filtering with exact string matching
### Setup & Installation
**Redis Cloud:**
- Sign up at [cloud.redis.io](https://cloud.redis.io)
- Create a new database with RediSearch module enabled
- Get your connection details
**Local Redis with RediSearch:**
```bash
# Using Docker with Redis Stack (includes RediSearch)
docker run -d --name redis-stack -p 6379:6379 redis/redis-stack:latest
```
**Local Valkey Bundle:**
```bash
# Example Valkey bundle with search/vector support
docker run -d --name valkey-bundle -p 6379:6379 valkey/valkey-bundle:9.0.0
```
### Configuration Options
<Tabs group="redis-config">
<Tab title="Go SDK">
```go
// Configure Redis-compatible vector store (Redis or Valkey endpoint)
vectorConfig := &vectorstore.Config{
Enabled: true,
Type: vectorstore.VectorStoreTypeRedis, // Keep type as "redis" for Valkey too
Config: vectorstore.RedisConfig{
Addr: "localhost:6379", // Redis/Valkey server address - REQUIRED
Username: "", // Optional: Redis username
Password: "", // Optional: Redis password
DB: 0, // Optional: Redis database number (default: 0)
// Optional: TLS and cluster settings
UseTLS: false, // Enable TLS for encrypted connections
InsecureSkipVerify: false, // Skip TLS cert verification
ClusterMode: false, // Use Redis Cluster client for cluster endpoints
// Optional: Connection pool settings
PoolSize: 10, // Maximum socket connections
MaxActiveConns: 10, // Maximum active connections
MinIdleConns: 5, // Minimum idle connections
MaxIdleConns: 10, // Maximum idle connections
// Optional: Timeout settings
DialTimeout: 5 * time.Second, // Connection timeout
ReadTimeout: 3 * time.Second, // Read timeout
WriteTimeout: 3 * time.Second, // Write timeout
ContextTimeout: 10 * time.Second, // Operation timeout
},
}
// Create vector store
store, err := vectorstore.NewVectorStore(context.Background(), vectorConfig, logger)
if err != nil {
log.Fatal("Failed to create vector store:", err)
}
```
</Tab>
<Tab title="config.json">
```json
{
"vector_store": {
"enabled": true,
"type": "redis",
"config": {
"addr": "localhost:6379",
"username": "",
"password": "",
"db": 0,
"use_tls": false,
"insecure_skip_verify": false,
"ca_cert_pem": "",
"cluster_mode": false,
"pool_size": 10,
"max_active_conns": 10,
"min_idle_conns": 5,
"max_idle_conns": 10,
"dial_timeout": "5s",
"read_timeout": "3s",
"write_timeout": "3s",
"context_timeout": "10s"
}
}
}
```
**For Redis Cloud or Valkey service endpoints:**
```json
{
"vector_store": {
"enabled": true,
"type": "redis",
"config": {
"addr": "your-redis-host:port",
"username": "your-username",
"password": "your-password",
"db": 0,
"use_tls": true,
"ca_cert_pem": "-----BEGIN CERTIFICATE-----\n...\n-----END CERTIFICATE-----",
"cluster_mode": false,
"context_timeout": "10s"
}
}
}
```
**For managed Redis Cluster endpoints:**
```json
{
"vector_store": {
"enabled": true,
"type": "redis",
"config": {
"addr": "your-cluster-endpoint:6379",
"username": "your-username",
"password": "your-password",
"db": 0,
"use_tls": true,
"ca_cert_pem": "-----BEGIN CERTIFICATE-----\n...\n-----END CERTIFICATE-----",
"cluster_mode": true,
"context_timeout": "10s"
}
}
}
```
</Tab>
</Tabs>
### Redis-Specific Features
**Vector Search Algorithm:**
Redis uses the **HNSW (Hierarchical Navigable Small World)** algorithm for vector similarity search, which provides:
- **Fast Search**: O(log N) search complexity
- **High Accuracy**: Excellent recall rates for similarity search
- **Memory Efficient**: Optimized for in-memory operations
- **Cosine Similarity**: Uses cosine distance metric for semantic similarity
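To make the distance metric concrete: cosine similarity is the dot product of two embeddings divided by the product of their norms, and cosine distance is one minus that value. The following plain-Python sketch shows the metric itself; it does not implement HNSW or mirror Redis internals.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of a and b divided by the product of their Euclidean norms."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity: 0 for identical directions, 1 for orthogonal vectors."""
    return 1.0 - cosine_similarity(a, b)
```

Because only direction matters, scaling an embedding does not change its similarity to another vector.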
**Connection Pool Management:**
Redis provides extensive connection pool configuration:
```go
config := vectorstore.RedisConfig{
Addr: "localhost:6379",
UseTLS: true, // Enable TLS
ClusterMode: true, // Enable cluster mode
PoolSize: 20, // Max socket connections
MaxActiveConns: 20, // Max active connections
MinIdleConns: 5, // Min idle connections
MaxIdleConns: 10, // Max idle connections
ConnMaxLifetime: 30 * time.Minute, // Connection lifetime
ConnMaxIdleTime: 5 * time.Minute, // Idle connection timeout
DialTimeout: 5 * time.Second, // Connection timeout
ReadTimeout: 3 * time.Second, // Read timeout
WriteTimeout: 3 * time.Second, // Write timeout
ContextTimeout: 10 * time.Second, // Operation timeout
}
```
### Performance Optimization
**Connection Pool Tuning:**
For high-throughput applications, tune the connection pool settings. Raise `pool_size` for high concurrency and set `max_active_conns` to match it. `min_idle_conns` keeps connections warm, `max_idle_conns` caps how many idle connections are kept, `conn_max_lifetime` refreshes connections periodically, and `conn_max_idle_time` closes idle connections:
```json
{
  "vector_store": {
    "config": {
      "pool_size": 50,
      "max_active_conns": 50,
      "min_idle_conns": 10,
      "max_idle_conns": 20,
      "conn_max_lifetime": "1h",
      "conn_max_idle_time": "10m"
    }
  }
}
```
**Memory Optimization:**
- **TTL**: Use appropriate TTL values to prevent memory bloat
- **Namespace Cleanup**: Regularly clean up unused namespaces
**Batch Operations:**
Redis supports efficient batch operations:
```go
// Batch retrieval
results, err := store.GetChunks(ctx, namespace, []string{"id1", "id2", "id3"})
// Batch deletion
deleteResults, err := store.DeleteAll(ctx, namespace, queries)
```
### Production Considerations
<Info>
**TLS and Cluster Mode**: Set `use_tls: true` to enable TLS encryption for the Redis connection, and `insecure_skip_verify: true` if using self-signed certificates. Set `cluster_mode: true` when connecting to a Redis Cluster endpoint. When cluster mode is enabled, the `db` field must be `0` (Redis Cluster does not support database selection).
</Info>
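The constraints in the note above are easy to check before deploying a configuration. A minimal sketch that validates the `config` object from `config.json`; `validate_redis_config` is a hypothetical helper, and the `insecure_skip_verify` check is a general TLS sanity check assumed for this example rather than a documented Bifrost rule:

```python
from typing import List


def validate_redis_config(config: dict) -> List[str]:
    """Return a list of problems found in a Redis vector store `config` block."""
    problems = []
    if not config.get("addr"):
        problems.append("addr is required")
    # Redis Cluster has no database selection, so db must stay at 0.
    if config.get("cluster_mode", False) and config.get("db", 0) != 0:
        problems.append("db must be 0 when cluster_mode is true")
    # Assumption: skipping certificate verification only matters on TLS connections.
    if config.get("insecure_skip_verify", False) and not config.get("use_tls", False):
        problems.append("insecure_skip_verify has no effect unless use_tls is true")
    return problems
```

An empty list means none of these checks failed; it does not guarantee that the server is reachable.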
<Info>
**Search Module Required**: Redis/Valkey integration requires a search module/API that supports `FT.*` commands (index creation and vector search). If `FT.INFO` or `FT.SEARCH` is unavailable, semantic caching will not work.
</Info>
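One way to confirm this before enabling semantic caching is to issue an `FT.*` command and see whether the server recognizes it. The sketch below assumes a redis-py style client that exposes `execute_command`, and uses `FT._LIST` (list indexes) as the probe. `has_search_commands` is a hypothetical helper and is not part of Bifrost.

```python
def has_search_commands(client) -> bool:
    """Return True if the server answers FT._LIST, i.e. a search module is loaded.

    `client` is any object with a redis-py style `execute_command(*args)` method.
    Any error (for example "unknown command", or a failed connection) is treated
    as "not available".
    """
    try:
        client.execute_command("FT._LIST")
        return True
    except Exception:
        return False
```

With redis-py installed, `has_search_commands(redis.Redis(host="localhost", port=6379))` would run the probe against a local server.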
<Warning>
**Production Considerations**:
- Use Redis AUTH for production deployments
- Configure appropriate connection timeouts
- Monitor memory usage and set appropriate TTL values
</Warning>
For the VectorStore interface API and usage examples, see [Vector Store Architecture](/architecture/framework/vector-store). For semantic caching setup, see [Semantic Caching](/features/semantic-caching).


@@ -0,0 +1,146 @@
---
title: "Weaviate"
description: "Weaviate vector database integration for semantic caching in Bifrost."
icon: "database"
---
## Weaviate
Weaviate is a production-ready vector database that provides advanced querying capabilities, gRPC support for high performance, and flexible schema management.
### Key Features
- **gRPC Support**: Enhanced performance with gRPC connections
- **Advanced Filtering**: Complex query operations with multiple conditions
- **Schema Management**: Flexible schema definition for different data types
- **Cloud & Self-Hosted**: Support for both Weaviate Cloud and self-hosted deployments
- **Scalable Storage**: Handle millions of vectors with efficient indexing
### Setup & Installation
**Weaviate Cloud:**
- Sign up at [cloud.weaviate.io](https://cloud.weaviate.io)
- Create a new cluster
- Get your API key and cluster URL
**Local Weaviate:**
```bash
# Using Docker
docker run -d \
--name weaviate \
-p 8080:8080 \
-p 50051:50051 \
-e QUERY_DEFAULTS_LIMIT=25 \
-e AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED='true' \
-e PERSISTENCE_DATA_PATH='/var/lib/weaviate' \
semitechnologies/weaviate:latest
```
### Configuration Options
<Tabs group="weaviate-config">
<Tab title="Go SDK">
```go
// Configure Weaviate vector store
vectorConfig := &vectorstore.Config{
Enabled: true,
Type: vectorstore.VectorStoreTypeWeaviate,
Config: vectorstore.WeaviateConfig{
Scheme: "http", // "http" for local, "https" for cloud
Host: "localhost:8080", // Your Weaviate host
APIKey: "your-weaviate-api-key", // Required for Weaviate Cloud; optional for local/self-hosted
// Enable gRPC for improved performance (optional)
GrpcConfig: &vectorstore.WeaviateGrpcConfig{
Host: "localhost:50051", // gRPC port
Secured: false, // true for TLS
},
},
}
// Create vector store
store, err := vectorstore.NewVectorStore(context.Background(), vectorConfig, logger)
if err != nil {
log.Fatal("Failed to create vector store:", err)
}
```
</Tab>
<Tab title="config.json">
**Local Setup:**
```json
{
"vector_store": {
"enabled": true,
"type": "weaviate",
"config": {
"scheme": "http",
"host": "localhost:8080"
}
}
}
```
**Cloud Setup with gRPC:**
```json
{
"vector_store": {
"enabled": true,
"type": "weaviate",
"config": {
"scheme": "https",
"host": "your-weaviate-host",
"api_key": "your-weaviate-api-key",
"grpc_config": {
"host": "your-weaviate-grpc-host",
"secured": true
}
}
}
}
```
</Tab>
</Tabs>
<Note>
The gRPC host should include the port. If no port is specified, port 80 is used for insecure connections and port 443 for secured connections.
</Note>
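The default-port rule in the note can be expressed as a small function. This is only an illustration of the rule as stated, not Bifrost's own code; `resolve_grpc_address` is a hypothetical name, and IPv6 literals are deliberately out of scope.

```python
def resolve_grpc_address(host: str, secured: bool) -> str:
    """Append the default port (443 if secured, else 80) when `host` has none."""
    # Naive "host:port" detection; IPv6 literals are not handled in this sketch.
    _name, sep, port = host.rpartition(":")
    if sep and port.isdigit():
        return host
    return f"{host}:{443 if secured else 80}"
```

For a local Weaviate instance the gRPC listener is normally on port 50051, so the port should be given explicitly, as in the Go example above.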
### Advanced Features
**gRPC Performance Optimization:**
Enable gRPC for better performance in production:
```go
vectorConfig := &vectorstore.Config{
Type: vectorstore.VectorStoreTypeWeaviate,
Config: vectorstore.WeaviateConfig{
Scheme: "https",
Host: "your-weaviate-host",
APIKey: "your-api-key",
// Enable gRPC for better performance
GrpcConfig: &vectorstore.WeaviateGrpcConfig{
Host: "your-weaviate-grpc-host:443",
Secured: true,
},
},
}
```
### Production Considerations
<Info>
**Performance**: For production environments, consider using gRPC configuration for better performance and enable appropriate authentication mechanisms for your Weaviate deployment.
</Info>
<Warning>
**Authentication**: Always use API keys for Weaviate Cloud deployments and configure proper authentication for self-hosted instances in production.
</Warning>
For the VectorStore interface API and usage examples, see [Vector Store Architecture](/architecture/framework/vector-store). For semantic caching setup, see [Semantic Caching](/features/semantic-caching).


@@ -0,0 +1,332 @@
---
title: "What is an integration?"
description: "Protocol adapters that translate between Bifrost's unified API and provider-specific API formats like OpenAI, Anthropic, and Google GenAI."
icon: "box"
---
## Overview
An integration is a protocol adapter that translates between Bifrost's unified API and provider-specific API formats. Each integration handles request transformation, response normalization, and error mapping between the external API contract and Bifrost's internal processing pipeline.
Integrations let you use Bifrost features such as governance, MCP tools, load balancing, semantic caching, and multi-provider support while preserving your existing SDK-based architecture. Bifrost handles the structure conversion, so switching from direct provider APIs to Bifrost's gateway requires only a single URL change.
Bifrost converts the request/response format of the provider API to the Bifrost API format based on the integration used, so you don't have to.
---
## Quick Migration
### **Before (Direct Provider)**
```python
import openai
client = openai.OpenAI(
api_key="your-openai-key"
)
```
### **After (Bifrost)**
```python {4}
import openai
client = openai.OpenAI(
base_url="http://localhost:8080/openai", # Point to Bifrost
api_key="dummy-key" # Keys are handled in Bifrost now
)
```
**That's it!** Your application now benefits from Bifrost's features with no other changes.
---
## Supported Integrations
1. [OpenAI](./openai-sdk)
2. [Anthropic](./anthropic-sdk)
3. [Google GenAI](./genai-sdk)
4. [LiteLLM](./litellm-sdk)
5. [Langchain](./langchain-sdk)
6. [AWS Bedrock](./bedrock-sdk)
---
## Provider-Prefixed Models
Use multiple providers seamlessly by prefixing model names with the provider:
<Tabs>
<Tab title="OpenAI">
```python
import openai
# Single client, multiple providers
client = openai.OpenAI(
base_url="http://localhost:8080/openai",
api_key="dummy" # API keys configured in Bifrost
)
# OpenAI models
response1 = client.chat.completions.create(
model="gpt-4o-mini", # (default OpenAI since it's OpenAI's SDK)
messages=[{"role": "user", "content": "Hello!"}]
)
```
</Tab>
<Tab title="Anthropic">
```python
import openai
# Anthropic models using OpenAI SDK format
response2 = client.chat.completions.create(
model="anthropic/claude-3-sonnet-20240229",
messages=[{"role": "user", "content": "Hello!"}]
)
```
</Tab>
<Tab title="Azure">
```python
import openai
# Azure models
response4 = client.chat.completions.create(
model="azure/gpt-4o",
messages=[{"role": "user", "content": "Hello!"}]
)
```
</Tab>
<Tab title="Vertex">
```python
import openai
# Google Vertex models
response3 = client.chat.completions.create(
model="vertex/gemini-pro",
messages=[{"role": "user", "content": "Hello!"}]
)
```
</Tab>
<Tab title="Ollama">
```python
import openai
# Local Ollama models
response5 = client.chat.completions.create(
model="ollama/llama3.1:8b",
messages=[{"role": "user", "content": "Hello!"}]
)
```
</Tab>
</Tabs>
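Conceptually, a prefixed model name splits into a provider and a model at the first slash, with the integration's own provider as the fallback. The sketch below illustrates that convention only; `split_model` is a hypothetical helper, and Bifrost's actual parsing may differ (for example, for model names that themselves contain slashes).

```python
from typing import Tuple


def split_model(model: str, default_provider: str) -> Tuple[str, str]:
    """Split "provider/model" on the first slash; fall back to the default provider."""
    provider, sep, name = model.partition("/")
    if not sep:
        return default_provider, model
    return provider, name
```

For instance, `split_model("ollama/llama3.1:8b", "openai")` yields the provider `ollama` and the model `llama3.1:8b`, while an unprefixed `gpt-4o-mini` stays with the default provider.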
---
## Direct API Usage
For custom HTTP clients, or when you have an existing provider-specific setup and want to use the Bifrost gateway without restructuring your codebase:
```python {5,18,31}
import requests
# Fully OpenAI compatible endpoint
response = requests.post(
"http://localhost:8080/openai/v1/chat/completions",
headers={
"Authorization": f"Bearer {openai_key}",
"Content-Type": "application/json"
},
json={
"model": "gpt-4o-mini",
"messages": [{"role": "user", "content": "Hello!"}]
}
)
# Fully Anthropic compatible endpoint
response = requests.post(
"http://localhost:8080/anthropic/v1/messages",
headers={
"Content-Type": "application/json",
},
json={
"model": "claude-3-sonnet-20240229",
"max_tokens": 1000,
"messages": [{"role": "user", "content": "Hello!"}]
}
)
# Fully Google GenAI compatible endpoint
response = requests.post(
"http://localhost:8080/genai/v1beta/models/gemini-1.5-flash/generateContent",
headers={
"Content-Type": "application/json",
},
json={
"contents": [
{"parts": [{"text": "Hello!"}]}
],
"generation_config": {
"max_output_tokens": 1000,
"temperature": 1
}
}
)
```
---
## Listing Models
All integrations support listing available models through their respective list models endpoints (e.g., `/openai/v1/models`, `/anthropic/v1/models`). By default, list models requests return models from **all configured providers** in Bifrost.
### Filtering by Provider
You can control which provider's models to list using the `x-bf-list-models-provider` header:
<Tabs>
<Tab title="Python">
```python
import openai
client = openai.OpenAI(
base_url="http://localhost:8080/openai",
api_key="dummy-key"
)
# List models from all providers (default behavior)
all_models = client.models.list()
# List models from a specific provider only
openai_models = client.models.list(
extra_headers={
"x-bf-list-models-provider": "openai"
}
)
anthropic_models = client.models.list(
extra_headers={
"x-bf-list-models-provider": "anthropic"
}
)
```
</Tab>
<Tab title="JavaScript">
```javascript
import OpenAI from "openai";
const openai = new OpenAI({
baseURL: "http://localhost:8080/openai",
apiKey: "dummy-key",
});
// List models from all providers (default behavior)
const allModels = await openai.models.list();
// List models from a specific provider only
const openaiModels = await openai.models.list({
headers: {
"x-bf-list-models-provider": "openai",
},
});
const anthropicModels = await openai.models.list({
headers: {
"x-bf-list-models-provider": "anthropic",
},
});
```
</Tab>
<Tab title="cURL">
```bash
# List models from all providers (default)
curl http://localhost:8080/openai/v1/models
# List models from specific provider
curl http://localhost:8080/openai/v1/models \
-H "x-bf-list-models-provider: openai"
# Explicitly request all providers
curl http://localhost:8080/openai/v1/models \
-H "x-bf-list-models-provider: all"
```
</Tab>
</Tabs>
### Header Behavior
| Header Value | Behavior |
|--------------|----------|
| Not set (default) | Lists models from **all configured providers** |
| `all` | Lists models from **all configured providers** |
| `openai` | Lists models from **OpenAI provider only** |
| `anthropic` | Lists models from **Anthropic provider only** |
| `vertex` | Lists models from **Vertex AI provider only** |
| Any valid provider | Lists models from that specific provider |
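The table maps directly onto a small helper that builds the header set for a list-models call. A minimal sketch; `list_models_headers` is a hypothetical name introduced here, and only the header name and its values come from the table above.

```python
from typing import Dict, Optional

LIST_MODELS_PROVIDER_HEADER = "x-bf-list-models-provider"


def list_models_headers(provider: Optional[str] = None) -> Dict[str, str]:
    """Headers for a list-models request.

    None means "do not send the header", which lists all configured providers.
    """
    if provider is None:
        return {}
    return {LIST_MODELS_PROVIDER_HEADER: provider}
```

With the OpenAI SDK client from the examples above, this could be passed as `client.models.list(extra_headers=list_models_headers("anthropic"))`.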
### Response Fields
When listing models from all providers, some provider-specific fields may be empty or contain default values if the information is not available from all providers. This is normal behavior as different providers expose different model metadata.
---
## Migration Strategies
### **Gradual Migration**
1. **Start with development** - Test Bifrost in dev environment
2. **Canary deployment** - Route 5% of traffic through Bifrost
3. **Feature-by-feature** - Migrate specific endpoints gradually
4. **Full migration** - Switch all traffic to Bifrost
### **Blue-Green Migration**
```python
import os
import random
# Route traffic based on feature flag
def get_base_url(provider: str) -> str:
    if os.getenv("USE_BIFROST", "false") == "true":
        return f"http://bifrost:8080/{provider}"
    else:
        return f"https://api.{provider}.com"

# Gradual rollout
def should_use_bifrost() -> bool:
    rollout_percentage = int(os.getenv("BIFROST_ROLLOUT", "0"))
    return random.randint(1, 100) <= rollout_percentage
```
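A random draw makes a fresh decision on every call, so the same user may alternate between the direct and Bifrost paths. If sticky routing matters, one option is to hash a stable identifier into a bucket instead. This is an alternative sketch, not something Bifrost provides; `user_id` and the helper name are assumptions.

```python
import hashlib


def in_rollout(user_id: str, rollout_percentage: int) -> bool:
    """Deterministically place `user_id` in one of 100 buckets.

    The same user always lands in the same bucket, so raising the percentage
    only adds users to the Bifrost path and never removes them.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # value in 0..99
    return bucket < rollout_percentage
```

A percentage of 0 routes nobody through Bifrost and 100 routes everyone, matching the semantics of the `BIFROST_ROLLOUT` variable above.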
### **Feature Flag Integration**
```python
# Using feature flags for safe migration
import os

import openai
from feature_flags import get_flag  # placeholder for your feature-flag library


def create_client():
    if get_flag("use_bifrost_openai"):
        base_url = "http://bifrost:8080/openai"
    else:
        # The OpenAI SDK's default base URL includes the /v1 prefix
        base_url = "https://api.openai.com/v1"
    return openai.OpenAI(
        base_url=base_url,
        api_key=os.getenv("OPENAI_API_KEY")
    )
```
---
## Next Steps
- **[HTTP Transport Overview](../quickstart/gateway/setting-up)** - Main HTTP transport guide
- **[Endpoints](../openapi/openapi.json)** - Complete API reference
- **[Configuration](../quickstart/gateway/provider-configuration)** - Provider setup and config