first commit
618
docs/integrations/anthropic-sdk/files-and-batch.mdx
Normal file
@@ -0,0 +1,618 @@
|
||||
---
|
||||
title: "Files and Batch API"
|
||||
tag: "Beta"
|
||||
description: "Upload files and create batch jobs for asynchronous processing using the Anthropic SDK through Bifrost across multiple providers."
|
||||
icon: "folder-open"
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
Bifrost supports the Anthropic Files API and Batch API (via the `beta` namespace) with **cross-provider routing**. This means you can use the Anthropic SDK to manage files and batch jobs across multiple providers including Anthropic, OpenAI, and Gemini.
|
||||
|
||||
The provider is specified using the `x-model-provider` header in `default_headers`.
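
As a rough sketch of how the header value could be chosen in one place, the helper below builds the `default_headers` dict for a provider. The function name `provider_headers` and the `KNOWN_PROVIDERS` set are hypothetical and not part of Bifrost or the Anthropic SDK; the provider names are the ones used in this guide.

```python
# Providers named in this guide; "anthropic" is the default and needs no header.
KNOWN_PROVIDERS = {"anthropic", "openai", "gemini", "bedrock"}


def provider_headers(provider=None):
    """Return a default_headers dict for the given provider (hypothetical helper)."""
    if provider is None or provider == "anthropic":
        # The header can be omitted when targeting Anthropic.
        return {}
    if provider not in KNOWN_PROVIDERS:
        raise ValueError(f"unknown provider: {provider!r}")
    return {"x-model-provider": provider}


print(provider_headers("openai"))
```

The result can be passed as `default_headers=provider_headers("openai")` when constructing `anthropic.Anthropic(...)`.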
|
||||
|
||||
<Note>
|
||||
**Bedrock Limitation:** Bedrock batch operations require file-based input with S3 storage, which is not supported via the Anthropic SDK's inline batch API. For Bedrock batch operations, use the [Bedrock SDK](../bedrock-sdk/files-and-batch) directly.
|
||||
</Note>
|
||||
|
||||
---
|
||||
|
||||
## Client Setup
|
||||
|
||||
<Note>
|
||||
For `api_key`, pass either a Bifrost virtual key or a dummy key; a dummy value is enough to get past the SDK's client-side validation.
|
||||
</Note>
|
||||
|
||||
### Anthropic Provider (Default)
|
||||
|
||||
```python
|
||||
import anthropic
|
||||
|
||||
client = anthropic.Anthropic(
|
||||
base_url="http://localhost:8080/anthropic",
|
||||
api_key="virtual-key-or-dummy-key"
|
||||
)
|
||||
```
|
||||
|
||||
### Cross-Provider Client
|
||||
|
||||
To route requests to a different provider, set the `x-model-provider` header:
|
||||
|
||||
<Tabs group="provider">
|
||||
<Tab title="OpenAI Provider">
|
||||
|
||||
```python
|
||||
import anthropic
|
||||
|
||||
client = anthropic.Anthropic(
|
||||
base_url="http://localhost:8080/anthropic",
|
||||
api_key="virtual-key-or-dummy-key",
|
||||
default_headers={"x-model-provider": "openai"}
|
||||
)
|
||||
```
|
||||
|
||||
</Tab>
|
||||
<Tab title="Bedrock Provider">
|
||||
|
||||
```python
|
||||
import anthropic
|
||||
|
||||
client = anthropic.Anthropic(
|
||||
base_url="http://localhost:8080/anthropic",
|
||||
api_key="virtual-key-or-dummy-key",
|
||||
default_headers={"x-model-provider": "bedrock"}
|
||||
)
|
||||
```
|
||||
|
||||
<Warning>
|
||||
Bedrock can be used for chat completions via the Anthropic SDK, but **batch operations are not supported**. Bedrock requires file-based batch input with S3 storage. Use the [Bedrock SDK](../bedrock-sdk/files-and-batch) for batch operations.
|
||||
</Warning>
|
||||
|
||||
</Tab>
|
||||
<Tab title="Gemini Provider">
|
||||
|
||||
```python
|
||||
import anthropic
|
||||
|
||||
client = anthropic.Anthropic(
|
||||
base_url="http://localhost:8080/anthropic",
|
||||
api_key="virtual-key-or-dummy-key",
|
||||
default_headers={"x-model-provider": "gemini"}
|
||||
)
|
||||
```
|
||||
|
||||
</Tab>
|
||||
</Tabs>
|
||||
|
||||
---
|
||||
|
||||
## Files API
|
||||
|
||||
The Files API is accessed through the `beta.files` namespace. Note that file support varies by provider.
|
||||
|
||||
### Upload a File
|
||||
|
||||
<Tabs group="provider">
|
||||
<Tab title="Anthropic Provider">
|
||||
|
||||
Upload a text file for use with Anthropic:
|
||||
|
||||
```python
|
||||
import anthropic
|
||||
|
||||
client = anthropic.Anthropic(
|
||||
base_url="http://localhost:8080/anthropic",
|
||||
api_key="virtual-key-or-dummy-key"
|
||||
)
|
||||
|
||||
# Upload a text file
|
||||
text_content = b"This is a test file for Files API integration."
|
||||
|
||||
response = client.beta.files.upload(
|
||||
file=("test_upload.txt", text_content, "text/plain"),
|
||||
)
|
||||
|
||||
print(f"File ID: {response.id}")
|
||||
print(f"Filename: {response.filename}")
|
||||
```
|
||||
|
||||
</Tab>
|
||||
<Tab title="OpenAI Provider">
|
||||
|
||||
Upload a JSONL file for OpenAI batch processing:
|
||||
|
||||
```python
|
||||
import anthropic
|
||||
|
||||
# Client configured for OpenAI provider
|
||||
client = anthropic.Anthropic(
|
||||
base_url="http://localhost:8080/anthropic",
|
||||
api_key="virtual-key-or-dummy-key",
|
||||
default_headers={"x-model-provider": "openai"}
|
||||
)
|
||||
|
||||
# Create JSONL content in OpenAI batch format
|
||||
jsonl_content = b'''{"custom_id": "request-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 100}}
|
||||
{"custom_id": "request-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "How are you?"}], "max_tokens": 100}}'''
|
||||
|
||||
response = client.beta.files.upload(
|
||||
file=("batch_input.jsonl", jsonl_content, "application/jsonl"),
|
||||
)
|
||||
|
||||
print(f"File ID: {response.id}")
|
||||
```
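
Hand-written JSONL is easy to get wrong (a stray quote or newline invalidates a line). A minimal sketch, assuming the same OpenAI batch line format shown above, that generates the content with `json.dumps` instead. The helper name `build_openai_batch_jsonl` is hypothetical.

```python
import json


def build_openai_batch_jsonl(prompts, model="gpt-4o-mini", max_tokens=100):
    """Serialize prompts into OpenAI batch-format JSONL bytes (hypothetical helper)."""
    lines = []
    for index, prompt in enumerate(prompts, start=1):
        record = {
            "custom_id": f"request-{index}",
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": model,
                "messages": [{"role": "user", "content": prompt}],
                "max_tokens": max_tokens,
            },
        }
        # One JSON object per line, no trailing newline.
        lines.append(json.dumps(record))
    return "\n".join(lines).encode("utf-8")


jsonl_content = build_openai_batch_jsonl(["Hello!", "How are you?"])
print(jsonl_content.decode("utf-8"))
```

The returned bytes can be used as the `jsonl_content` value in the upload call above.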
|
||||
|
||||
</Tab>
|
||||
</Tabs>
|
||||
|
||||
### List Files
|
||||
|
||||
<Tabs group="provider">
|
||||
<Tab title="Anthropic Provider">
|
||||
|
||||
```python
|
||||
import anthropic
|
||||
|
||||
client = anthropic.Anthropic(
|
||||
base_url="http://localhost:8080/anthropic",
|
||||
api_key="virtual-key-or-dummy-key"
|
||||
)
|
||||
|
||||
# List all files
|
||||
response = client.beta.files.list()
|
||||
|
||||
for file in response.data:
    print(f"File ID: {file.id}")
    print(f"Filename: {file.filename}")
    # The Anthropic SDK's file metadata exposes the size as `size_bytes`
    print(f"Size: {file.size_bytes} bytes")
    print("---")
|
||||
```
|
||||
|
||||
</Tab>
|
||||
<Tab title="OpenAI Provider">
|
||||
|
||||
```python
|
||||
import anthropic
|
||||
|
||||
# Client configured for OpenAI provider
|
||||
client = anthropic.Anthropic(
|
||||
base_url="http://localhost:8080/anthropic",
|
||||
api_key="virtual-key-or-dummy-key",
|
||||
default_headers={"x-model-provider": "openai"}
|
||||
)
|
||||
|
||||
# List all files from OpenAI
|
||||
response = client.beta.files.list()
|
||||
|
||||
for file in response.data:
    print(f"File ID: {file.id}, Name: {file.filename}")
|
||||
```
|
||||
|
||||
</Tab>
|
||||
</Tabs>
|
||||
|
||||
### Delete a File
|
||||
|
||||
```python
|
||||
import anthropic
|
||||
|
||||
client = anthropic.Anthropic(
|
||||
base_url="http://localhost:8080/anthropic",
|
||||
api_key="virtual-key-or-dummy-key",
|
||||
default_headers={"x-model-provider": "openai"} # or omit for anthropic
|
||||
)
|
||||
|
||||
# Delete a file
|
||||
file_id = "file-abc123"
|
||||
response = client.beta.files.delete(file_id)
|
||||
|
||||
print(f"Deleted file: {file_id}")
|
||||
```
|
||||
|
||||
### Download File Content
|
||||
|
||||
Note: Anthropic only allows downloading files created by certain tools (like code execution). OpenAI allows downloading batch output files.
|
||||
|
||||
```python
|
||||
import anthropic
|
||||
|
||||
client = anthropic.Anthropic(
|
||||
base_url="http://localhost:8080/anthropic",
|
||||
api_key="virtual-key-or-dummy-key",
|
||||
default_headers={"x-model-provider": "openai"}
|
||||
)
|
||||
|
||||
# Download file content
|
||||
file_id = "file-abc123"
|
||||
response = client.beta.files.download(file_id)
|
||||
|
||||
content = response.text()
|
||||
print(f"File content:\n{content}")
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Batch API
|
||||
|
||||
The Anthropic Batch API is accessed through `beta.messages.batches`. Anthropic's batch API uses **inline requests** rather than file uploads.
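
Each inline request is a dict with a `custom_id` and a `params` object holding ordinary Messages API arguments. A small sketch of building that list from a mapping of IDs to prompts; the helper name `build_batch_requests` is hypothetical and only mirrors the request shape used in the examples in this section.

```python
def build_batch_requests(prompts, model, max_tokens=100):
    """Turn {custom_id: prompt} into a list of inline batch requests (hypothetical helper)."""
    return [
        {
            "custom_id": custom_id,
            "params": {
                "model": model,
                "max_tokens": max_tokens,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        for custom_id, prompt in prompts.items()
    ]


batch_requests = build_batch_requests(
    {"request-1": "What is 2+2?", "request-2": "What is the capital of France?"},
    model="claude-3-sonnet-20240229",
)
print(len(batch_requests))
```

The resulting list is what the examples pass as `requests=` to `client.beta.messages.batches.create`.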
|
||||
|
||||
### Create a Batch with Inline Requests
|
||||
|
||||
<Tabs group="provider">
|
||||
<Tab title="Anthropic Provider">
|
||||
|
||||
```python
|
||||
import anthropic
|
||||
|
||||
client = anthropic.Anthropic(
|
||||
base_url="http://localhost:8080/anthropic",
|
||||
api_key="virtual-key-or-dummy-key"
|
||||
)
|
||||
|
||||
# Create batch with inline requests
|
||||
batch_requests = [
|
||||
{
|
||||
"custom_id": "request-1",
|
||||
"params": {
|
||||
"model": "claude-3-sonnet-20240229",
|
||||
"max_tokens": 100,
|
||||
"messages": [
|
||||
{"role": "user", "content": "What is 2+2?"}
|
||||
]
|
||||
}
|
||||
},
|
||||
{
|
||||
"custom_id": "request-2",
|
||||
"params": {
|
||||
"model": "claude-3-sonnet-20240229",
|
||||
"max_tokens": 100,
|
||||
"messages": [
|
||||
{"role": "user", "content": "What is the capital of France?"}
|
||||
]
|
||||
}
|
||||
}
|
||||
]
|
||||
|
||||
batch = client.beta.messages.batches.create(requests=batch_requests)
|
||||
|
||||
print(f"Batch ID: {batch.id}")
|
||||
print(f"Status: {batch.processing_status}")
|
||||
```
|
||||
|
||||
</Tab>
|
||||
<Tab title="OpenAI Provider">
|
||||
|
||||
When routing to OpenAI, use OpenAI-compatible models:
|
||||
|
||||
```python
|
||||
import anthropic
|
||||
|
||||
# Client configured for OpenAI provider
|
||||
client = anthropic.Anthropic(
|
||||
base_url="http://localhost:8080/anthropic",
|
||||
api_key="virtual-key-or-dummy-key",
|
||||
default_headers={"x-model-provider": "openai"}
|
||||
)
|
||||
|
||||
# Create batch with inline requests (using OpenAI models)
|
||||
batch_requests = [
|
||||
{
|
||||
"custom_id": "request-1",
|
||||
"params": {
|
||||
"model": "gpt-4o-mini",
|
||||
"max_tokens": 100,
|
||||
"messages": [
|
||||
{"role": "user", "content": "What is 2+2?"}
|
||||
]
|
||||
}
|
||||
},
|
||||
{
|
||||
"custom_id": "request-2",
|
||||
"params": {
|
||||
"model": "gpt-4o-mini",
|
||||
"max_tokens": 100,
|
||||
"messages": [
|
||||
{"role": "user", "content": "What is the capital of France?"}
|
||||
]
|
||||
}
|
||||
}
|
||||
]
|
||||
|
||||
batch = client.beta.messages.batches.create(requests=batch_requests)
|
||||
|
||||
print(f"Batch ID: {batch.id}")
|
||||
print(f"Status: {batch.processing_status}")
|
||||
```
|
||||
|
||||
</Tab>
|
||||
<Tab title="Gemini Provider">
|
||||
|
||||
When routing to Gemini:
|
||||
|
||||
```python
|
||||
import anthropic
|
||||
|
||||
# Client configured for Gemini provider
|
||||
client = anthropic.Anthropic(
|
||||
base_url="http://localhost:8080/anthropic",
|
||||
api_key="virtual-key-or-dummy-key",
|
||||
default_headers={"x-model-provider": "gemini"}
|
||||
)
|
||||
|
||||
# Create batch with inline requests (using Gemini models)
|
||||
batch_requests = [
|
||||
{
|
||||
"custom_id": "request-1",
|
||||
"params": {
|
||||
"model": "gemini-1.5-flash",
|
||||
"max_tokens": 100,
|
||||
"messages": [
|
||||
{"role": "user", "content": "What is 2+2?"}
|
||||
]
|
||||
}
|
||||
},
|
||||
{
|
||||
"custom_id": "request-2",
|
||||
"params": {
|
||||
"model": "gemini-1.5-flash",
|
||||
"max_tokens": 100,
|
||||
"messages": [
|
||||
{"role": "user", "content": "What is the capital of France?"}
|
||||
]
|
||||
}
|
||||
}
|
||||
]
|
||||
|
||||
batch = client.beta.messages.batches.create(requests=batch_requests)
|
||||
|
||||
print(f"Batch ID: {batch.id}")
|
||||
print(f"Status: {batch.processing_status}")
|
||||
```
|
||||
|
||||
</Tab>
|
||||
</Tabs>
|
||||
|
||||
<Note>
|
||||
**Bedrock Note:** Bedrock requires file-based batch creation with S3 storage, so batch operations cannot be routed to Bedrock through the Anthropic SDK. Use the Bedrock SDK directly for batch operations. See the [Bedrock SDK documentation](../bedrock-sdk/files-and-batch) for details.
|
||||
</Note>
|
||||
|
||||
### List Batches
|
||||
|
||||
```python
|
||||
import anthropic
|
||||
|
||||
client = anthropic.Anthropic(
|
||||
base_url="http://localhost:8080/anthropic",
|
||||
api_key="virtual-key-or-dummy-key",
|
||||
default_headers={"x-model-provider": "anthropic"} # or "openai", "gemini"
|
||||
)
|
||||
|
||||
# List batches
|
||||
response = client.beta.messages.batches.list(limit=10)
|
||||
|
||||
for batch in response.data:
    print(f"Batch ID: {batch.id}")
    print(f"Status: {batch.processing_status}")
    if batch.request_counts:
        print(f"Processing: {batch.request_counts.processing}")
        print(f"Succeeded: {batch.request_counts.succeeded}")
        print(f"Errored: {batch.request_counts.errored}")
    print("---")
|
||||
```
|
||||
|
||||
### Retrieve Batch Status
|
||||
|
||||
```python
|
||||
import anthropic
|
||||
|
||||
client = anthropic.Anthropic(
|
||||
base_url="http://localhost:8080/anthropic",
|
||||
api_key="virtual-key-or-dummy-key",
|
||||
default_headers={"x-model-provider": "anthropic"} # or "openai", "gemini"
|
||||
)
|
||||
|
||||
# Retrieve batch status
|
||||
batch_id = "batch-abc123"
|
||||
batch = client.beta.messages.batches.retrieve(batch_id)
|
||||
|
||||
print(f"Batch ID: {batch.id}")
|
||||
print(f"Status: {batch.processing_status}")
|
||||
|
||||
if batch.request_counts:
    print(f"Processing: {batch.request_counts.processing}")
    print(f"Succeeded: {batch.request_counts.succeeded}")
    print(f"Errored: {batch.request_counts.errored}")
|
||||
```
|
||||
|
||||
### Cancel a Batch
|
||||
|
||||
```python
|
||||
import anthropic
|
||||
|
||||
client = anthropic.Anthropic(
|
||||
base_url="http://localhost:8080/anthropic",
|
||||
api_key="virtual-key-or-dummy-key",
|
||||
default_headers={"x-model-provider": "anthropic"} # or "openai", "gemini"
|
||||
)
|
||||
|
||||
# Cancel batch
|
||||
batch_id = "batch-abc123"
|
||||
batch = client.beta.messages.batches.cancel(batch_id)
|
||||
|
||||
print(f"Batch ID: {batch.id}")
|
||||
print(f"Status: {batch.processing_status}") # "canceling" or "ended"
|
||||
```
|
||||
|
||||
### Get Batch Results
|
||||
|
||||
```python
|
||||
import anthropic
|
||||
|
||||
client = anthropic.Anthropic(
|
||||
base_url="http://localhost:8080/anthropic",
|
||||
api_key="virtual-key-or-dummy-key"
|
||||
)
|
||||
|
||||
# Get batch results (only available after batch is completed)
|
||||
batch_id = "batch-abc123"
|
||||
results = client.beta.messages.batches.results(batch_id)
|
||||
|
||||
# Iterate over results
|
||||
for result in results:
    print(f"Custom ID: {result.custom_id}")
    if result.result.type == "succeeded":
        message = result.result.message
        print(f"Response: {message.content[0].text}")
    elif result.result.type == "errored":
        print(f"Error: {result.result.error}")
    print("---")
|
||||
```
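
For larger batches it can help to tally outcomes instead of printing each one. A minimal sketch, assuming each result exposes `custom_id` and `result.type` as in the loop above; `summarize_results` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins for real SDK results so the snippet runs without a gateway.

```python
from collections import Counter
from types import SimpleNamespace


def summarize_results(results):
    """Count results by type and collect errors by custom_id (hypothetical helper)."""
    counts = Counter()
    errors = {}
    for entry in results:
        outcome = entry.result
        counts[outcome.type] += 1
        if outcome.type == "errored":
            errors[entry.custom_id] = outcome.error
    return counts, errors


# Stand-in results for illustration only; real ones come from batches.results(batch_id).
fake_results = [
    SimpleNamespace(custom_id="request-1", result=SimpleNamespace(type="succeeded")),
    SimpleNamespace(custom_id="request-2", result=SimpleNamespace(type="errored", error="boom")),
]
counts, errors = summarize_results(fake_results)
print(dict(counts), errors)
```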
|
||||
|
||||
---
|
||||
|
||||
## End-to-End Workflows
|
||||
|
||||
### Anthropic Batch Workflow
|
||||
|
||||
```python
|
||||
import time
|
||||
import anthropic
|
||||
|
||||
client = anthropic.Anthropic(
|
||||
base_url="http://localhost:8080/anthropic",
|
||||
api_key="virtual-key-or-dummy-key"
|
||||
)
|
||||
|
||||
# Step 1: Create batch with inline requests
|
||||
print("Step 1: Creating batch...")
|
||||
batch_requests = [
|
||||
{
|
||||
"custom_id": "math-question",
|
||||
"params": {
|
||||
"model": "claude-3-sonnet-20240229",
|
||||
"max_tokens": 100,
|
||||
"messages": [{"role": "user", "content": "What is 15 * 7?"}]
|
||||
}
|
||||
},
|
||||
{
|
||||
"custom_id": "geography-question",
|
||||
"params": {
|
||||
"model": "claude-3-sonnet-20240229",
|
||||
"max_tokens": 100,
|
||||
"messages": [{"role": "user", "content": "What is the largest ocean?"}]
|
||||
}
|
||||
}
|
||||
]
|
||||
|
||||
batch = client.beta.messages.batches.create(requests=batch_requests)
|
||||
print(f" Created batch: {batch.id}, status: {batch.processing_status}")
|
||||
|
||||
# Step 2: Poll for completion
|
||||
print("Step 2: Polling batch status...")
|
||||
for i in range(20):
    batch = client.beta.messages.batches.retrieve(batch.id)
    print(f" Poll {i+1}: status = {batch.processing_status}")

    if batch.processing_status == "ended":
        print(" Batch completed!")
        break

    if batch.request_counts:
        print(f" Processing: {batch.request_counts.processing}")
        print(f" Succeeded: {batch.request_counts.succeeded}")

    time.sleep(5)
|
||||
|
||||
# Step 3: Verify batch is in list
|
||||
print("Step 3: Verifying batch in list...")
|
||||
batch_list = client.beta.messages.batches.list(limit=20)
|
||||
batch_ids = [b.id for b in batch_list.data]
|
||||
assert batch.id in batch_ids, f"Batch {batch.id} should be in list"
|
||||
print(f" Verified batch {batch.id} is in list")
|
||||
|
||||
# Step 4: Get results (if completed)
|
||||
if batch.processing_status == "ended":
    print("Step 4: Getting results...")
    try:
        results = client.beta.messages.batches.results(batch.id)
        for result in results:
            print(f" {result.custom_id}: ", end="")
            if result.result.type == "succeeded":
                print(result.result.message.content[0].text[:50] + "...")
            else:
                print(f"Error: {result.result.error}")
    except Exception as e:
        print(f" Results not yet available: {e}")
|
||||
|
||||
print(f"\nSuccess! Batch {batch.id} workflow completed.")
|
||||
```
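
The workflow above polls a fixed number of times at a fixed interval. One possible alternative is a reusable wait helper with a timeout and exponential backoff. This is only a sketch: `wait_for_batch` and its parameters are hypothetical, and the `retrieve` callable is injected so that the helper does not depend on the SDK.

```python
import time


def wait_for_batch(retrieve, batch_id, done_statuses=("ended",), interval=5.0,
                   max_interval=60.0, timeout=600.0, sleep=time.sleep,
                   clock=time.monotonic):
    """Poll retrieve(batch_id) until its processing_status is done (hypothetical helper)."""
    deadline = clock() + timeout
    delay = interval
    while True:
        batch = retrieve(batch_id)
        if batch.processing_status in done_statuses:
            return batch
        if clock() >= deadline:
            raise TimeoutError(f"batch {batch_id} not finished after {timeout} seconds")
        sleep(delay)
        # Double the wait each round, capped at max_interval.
        delay = min(delay * 2, max_interval)
```

With a real client this might be called as `wait_for_batch(client.beta.messages.batches.retrieve, batch.id)`. When routing to another provider, `done_statuses` may need to include other values, such as `"completed"` in the cross-provider workflow on this page.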
|
||||
|
||||
### Cross-Provider Batch Workflow (OpenAI via Anthropic SDK)
|
||||
|
||||
```python
|
||||
import time
|
||||
import anthropic
|
||||
|
||||
# Create client with OpenAI provider header
|
||||
client = anthropic.Anthropic(
|
||||
base_url="http://localhost:8080/anthropic",
|
||||
api_key="virtual-key-or-dummy-key",
|
||||
default_headers={"x-model-provider": "openai"}
|
||||
)
|
||||
|
||||
# Step 1: Create batch with OpenAI models
|
||||
print("Step 1: Creating batch for OpenAI provider...")
|
||||
batch_requests = [
|
||||
{
|
||||
"custom_id": "openai-request-1",
|
||||
"params": {
|
||||
"model": "gpt-4o-mini",
|
||||
"max_tokens": 100,
|
||||
"messages": [{"role": "user", "content": "Explain AI in one sentence."}]
|
||||
}
|
||||
},
|
||||
{
|
||||
"custom_id": "openai-request-2",
|
||||
"params": {
|
||||
"model": "gpt-4o-mini",
|
||||
"max_tokens": 100,
|
||||
"messages": [{"role": "user", "content": "What is machine learning?"}]
|
||||
}
|
||||
}
|
||||
]
|
||||
|
||||
batch = client.beta.messages.batches.create(requests=batch_requests)
|
||||
print(f" Created batch: {batch.id}, status: {batch.processing_status}")
|
||||
|
||||
# Step 2: Poll for completion
|
||||
print("Step 2: Polling batch status...")
|
||||
for i in range(10):
    batch = client.beta.messages.batches.retrieve(batch.id)
    print(f" Poll {i+1}: status = {batch.processing_status}")

    if batch.processing_status in ["ended", "completed"]:
        break

    time.sleep(5)
|
||||
|
||||
print(f"\nSuccess! Cross-provider batch {batch.id} completed via Anthropic SDK.")
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Provider-Specific Notes
|
||||
|
||||
| Provider | Header Value | File Upload | Batch Type | Models |
|
||||
|----------|--------------|-------------|------------|--------|
|
||||
| **Anthropic** | `anthropic` or omit | ✅ Beta API | Inline requests | `claude-3-*` |
|
||||
| **OpenAI** | `openai` | ✅ Beta API | Inline requests | `gpt-4o-*`, `gpt-4-*` |
|
||||
| **Gemini** | `gemini` | ✅ Beta API | Inline requests | `gemini-1.5-*` |
|
||||
| **Bedrock** | `bedrock` | ❌ Use Bedrock SDK | File-based (S3) | `anthropic.claude-*` |
|
||||
|
||||
---
|
||||
|
||||
## Next Steps
|
||||
|
||||
- **[Overview](./overview)** - Anthropic SDK integration basics
|
||||
- **[Configuration](../../quickstart/gateway/provider-configuration)** - Bifrost setup and configuration
|
||||
- **[Core Features](../../features/)** - Governance, semantic caching, and more
|
||||
449
docs/integrations/anthropic-sdk/overview.mdx
Normal file
@@ -0,0 +1,449 @@
|
||||
---
|
||||
title: "Overview"
|
||||
description: "Use Bifrost as a drop-in replacement for the Anthropic API with full compatibility and enhanced features."
|
||||
icon: "book"
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
Bifrost provides complete Anthropic API compatibility through protocol adaptation. The integration handles request transformation, response normalization, and error mapping between Anthropic's Messages API specification and Bifrost's internal processing pipeline.
|
||||
|
||||
This integration enables you to utilize Bifrost's features like governance, load balancing, semantic caching, multi-provider support, and more, all while preserving your existing Anthropic SDK-based architecture.
|
||||
|
||||
**Endpoint:** `/anthropic`
|
||||
|
||||
<Note>
|
||||
**Enabling the beta header**: Anthropic frequently uses the `anthropic-beta` header to gate access to new features.
|
||||
Clients such as Vercel's AI SDK send this header. Bifrost blocks unrecognized headers by default for security purposes.
|
||||
To enable the beta header for full compatibility, add `anthropic-beta` to the AllowList under Settings -> Client Settings in the UI.
|
||||
</Note>
|
||||
|
||||
---
|
||||
|
||||
## Setup
|
||||
|
||||
<Tabs group="anthropic-sdk">
|
||||
<Tab title="Python">
|
||||
|
||||
```python {5}
|
||||
import anthropic
|
||||
|
||||
# Configure client to use Bifrost
|
||||
client = anthropic.Anthropic(
|
||||
base_url="http://localhost:8080/anthropic",
|
||||
api_key="dummy-key" # Keys handled by Bifrost
|
||||
)
|
||||
|
||||
# Make requests as usual
|
||||
response = client.messages.create(
|
||||
model="claude-3-sonnet-20240229",
|
||||
max_tokens=1000,
|
||||
messages=[{"role": "user", "content": "Hello!"}]
|
||||
)
|
||||
|
||||
print(response.content[0].text)
|
||||
```
|
||||
|
||||
</Tab>
|
||||
<Tab title="JavaScript">
|
||||
|
||||
```javascript {5}
|
||||
import Anthropic from "@anthropic-ai/sdk";
|
||||
|
||||
// Configure client to use Bifrost
|
||||
const anthropic = new Anthropic({
|
||||
baseURL: "http://localhost:8080/anthropic",
|
||||
apiKey: "dummy-key", // Keys handled by Bifrost
|
||||
});
|
||||
|
||||
// Make requests as usual
|
||||
const response = await anthropic.messages.create({
|
||||
model: "claude-3-sonnet-20240229",
|
||||
max_tokens: 1000,
|
||||
messages: [{ role: "user", content: "Hello!" }],
|
||||
});
|
||||
|
||||
console.log(response.content[0].text);
|
||||
```
|
||||
|
||||
</Tab>
|
||||
</Tabs>
|
||||
|
||||
---
|
||||
|
||||
## Provider/Model Usage Examples
|
||||
|
||||
Use multiple providers through the same Anthropic SDK format by prefixing model names with the provider:
|
||||
|
||||
<Tabs group="anthropic-sdk">
|
||||
<Tab title="Python">
|
||||
|
||||
```python
|
||||
import anthropic
|
||||
|
||||
client = anthropic.Anthropic(
|
||||
base_url="http://localhost:8080/anthropic",
|
||||
api_key="dummy-key"
|
||||
)
|
||||
|
||||
# Anthropic models (default)
|
||||
anthropic_response = client.messages.create(
|
||||
model="claude-3-sonnet-20240229",
|
||||
max_tokens=1000,
|
||||
messages=[{"role": "user", "content": "Hello from Claude!"}]
|
||||
)
|
||||
|
||||
# OpenAI models via Anthropic SDK format
|
||||
openai_response = client.messages.create(
|
||||
model="openai/gpt-4o-mini",
|
||||
max_tokens=1000,
|
||||
messages=[{"role": "user", "content": "Hello from OpenAI!"}]
|
||||
)
|
||||
|
||||
# Google Vertex models via Anthropic SDK format
|
||||
vertex_response = client.messages.create(
|
||||
model="vertex/gemini-pro",
|
||||
max_tokens=1000,
|
||||
messages=[{"role": "user", "content": "Hello from Gemini!"}]
|
||||
)
|
||||
|
||||
# Azure models
|
||||
azure_response = client.messages.create(
|
||||
model="azure/gpt-4o",
|
||||
max_tokens=1000,
|
||||
messages=[{"role": "user", "content": "Hello from Azure!"}]
|
||||
)
|
||||
|
||||
# Local Ollama models
|
||||
ollama_response = client.messages.create(
|
||||
model="ollama/llama3.1:8b",
|
||||
max_tokens=1000,
|
||||
messages=[{"role": "user", "content": "Hello from Ollama!"}]
|
||||
)
|
||||
```
|
||||
|
||||
</Tab>
|
||||
<Tab title="JavaScript">
|
||||
|
||||
```javascript
|
||||
import Anthropic from "@anthropic-ai/sdk";
|
||||
|
||||
const anthropic = new Anthropic({
|
||||
baseURL: "http://localhost:8080/anthropic",
|
||||
apiKey: "dummy-key",
|
||||
});
|
||||
|
||||
// Anthropic models (default)
|
||||
const anthropicResponse = await anthropic.messages.create({
|
||||
model: "claude-3-sonnet-20240229",
|
||||
max_tokens: 1000,
|
||||
messages: [{ role: "user", content: "Hello from Claude!" }],
|
||||
});
|
||||
|
||||
// OpenAI models via Anthropic SDK format
|
||||
const openaiResponse = await anthropic.messages.create({
|
||||
model: "openai/gpt-4o-mini",
|
||||
max_tokens: 1000,
|
||||
messages: [{ role: "user", content: "Hello from OpenAI!" }],
|
||||
});
|
||||
|
||||
// Google Vertex models via Anthropic SDK format
|
||||
const vertexResponse = await anthropic.messages.create({
|
||||
model: "vertex/gemini-pro",
|
||||
max_tokens: 1000,
|
||||
messages: [{ role: "user", content: "Hello from Gemini!" }],
|
||||
});
|
||||
|
||||
// Azure models
|
||||
const azureResponse = await anthropic.messages.create({
|
||||
model: "azure/gpt-4o",
|
||||
max_tokens: 1000,
|
||||
messages: [{ role: "user", content: "Hello from Azure!" }],
|
||||
});
|
||||
|
||||
// Local Ollama models
|
||||
const ollamaResponse = await anthropic.messages.create({
|
||||
model: "ollama/llama3.1:8b",
|
||||
max_tokens: 1000,
|
||||
messages: [{ role: "user", content: "Hello from Ollama!" }],
|
||||
});
|
||||
```
|
||||
|
||||
</Tab>
|
||||
</Tabs>
|
||||
|
||||
---
|
||||
|
||||
## Adding Custom Headers
|
||||
|
||||
Pass custom headers required by Bifrost plugins (like governance, telemetry, etc.):
|
||||
|
||||
<Tabs group="anthropic-sdk">
|
||||
<Tab title="Python">
|
||||
|
||||
```python
|
||||
import anthropic
|
||||
|
||||
client = anthropic.Anthropic(
|
||||
base_url="http://localhost:8080/anthropic",
|
||||
api_key="dummy-key",
|
||||
default_headers={
|
||||
"x-bf-vk": "vk_12345", # Virtual key for governance
|
||||
}
|
||||
)
|
||||
|
||||
response = client.messages.create(
|
||||
model="claude-3-sonnet-20240229",
|
||||
max_tokens=1000,
|
||||
messages=[{"role": "user", "content": "Hello with custom headers!"}]
|
||||
)
|
||||
```
|
||||
|
||||
</Tab>
|
||||
<Tab title="JavaScript">
|
||||
|
||||
```javascript
|
||||
import Anthropic from "@anthropic-ai/sdk";
|
||||
|
||||
const anthropic = new Anthropic({
|
||||
baseURL: "http://localhost:8080/anthropic",
|
||||
apiKey: "dummy-key",
|
||||
defaultHeaders: {
|
||||
"x-bf-vk": "vk_12345", // Virtual key for governance
|
||||
},
|
||||
});
|
||||
|
||||
const response = await anthropic.messages.create({
|
||||
model: "claude-3-sonnet-20240229",
|
||||
max_tokens: 1000,
|
||||
messages: [{ role: "user", content: "Hello with custom headers!" }],
|
||||
});
|
||||
```
|
||||
|
||||
</Tab>
|
||||
</Tabs>
|
||||
|
||||
---
|
||||
|
||||
## Using Direct Keys
|
||||
|
||||
Pass API keys directly in requests to bypass Bifrost's load balancing. You can pass any provider's API key (OpenAI, Anthropic, Mistral, etc.) since Bifrost only looks for `Authorization` or `x-api-key` headers. This requires the **Allow Direct API keys** option to be enabled in Bifrost configuration.
|
||||
|
||||
> **Learn more:** See [Key Management](../../features/keys-management#direct-key-bypass) for enabling direct API key usage.
|
||||
|
||||
<Tabs group="anthropic-sdk">
|
||||
<Tab title="Python">
|
||||
|
||||
```python
|
||||
import anthropic
|
||||
|
||||
# Using Anthropic's API key directly
|
||||
client_with_direct_key = anthropic.Anthropic(
|
||||
base_url="http://localhost:8080/anthropic",
|
||||
api_key="sk-your-anthropic-key" # Anthropic's API key works
|
||||
)
|
||||
|
||||
anthropic_response = client_with_direct_key.messages.create(
|
||||
model="claude-3-sonnet-20240229",
|
||||
max_tokens=1000,
|
||||
messages=[{"role": "user", "content": "Hello from Claude!"}]
|
||||
)
|
||||
|
||||
# or pass different provider keys per request using headers
|
||||
client = anthropic.Anthropic(
|
||||
base_url="http://localhost:8080/anthropic",
|
||||
api_key="dummy-key"
|
||||
)
|
||||
|
||||
# Use Anthropic key for Claude
|
||||
anthropic_response = client.messages.create(
|
||||
model="claude-3-sonnet-20240229",
|
||||
max_tokens=1000,
|
||||
messages=[{"role": "user", "content": "Hello Claude!"}],
|
||||
extra_headers={
|
||||
"x-api-key": "sk-ant-your-anthropic-key"
|
||||
}
|
||||
)
|
||||
|
||||
# Use OpenAI key for GPT models
|
||||
openai_response = client.messages.create(
|
||||
model="openai/gpt-4o-mini",
|
||||
max_tokens=1000,
|
||||
messages=[{"role": "user", "content": "Hello GPT!"}],
|
||||
extra_headers={
|
||||
"Authorization": "Bearer sk-your-openai-key"
|
||||
}
|
||||
)
|
||||
```
|
||||
|
||||
</Tab>
|
||||
<Tab title="JavaScript">
|
||||
|
||||
```javascript
|
||||
import Anthropic from "@anthropic-ai/sdk";
|
||||
|
||||
// Using Anthropic's API key directly
|
||||
const anthropicWithDirectKey = new Anthropic({
|
||||
baseURL: "http://localhost:8080/anthropic",
|
||||
apiKey: "sk-your-anthropic-key", // Anthropic's API key works
|
||||
});
|
||||
|
||||
|
||||
const anthropicResponse = await anthropicWithDirectKey.messages.create({
|
||||
model: "claude-3-sonnet-20240229",
|
||||
max_tokens: 1000,
|
||||
messages: [{ role: "user", content: "Hello from Claude!" }],
|
||||
});
|
||||
|
||||
|
||||
// or pass different provider keys per request using headers
|
||||
const anthropic = new Anthropic({
|
||||
baseURL: "http://localhost:8080/anthropic",
|
||||
apiKey: "dummy-key",
|
||||
});
|
||||
|
||||
// Use Anthropic key for Claude
|
||||
const anthropicResponseWithHeader = await anthropic.messages.create(
  {
    model: "claude-3-sonnet-20240229",
    max_tokens: 1000,
    messages: [{ role: "user", content: "Hello Claude!" }],
  },
  // Per-request headers go in the second (request options) argument
  { headers: { "x-api-key": "sk-ant-your-anthropic-key" } }
);
|
||||
|
||||
// Use OpenAI key for GPT models
|
||||
const openaiResponseWithHeader = await anthropic.messages.create(
  {
    model: "openai/gpt-4o-mini",
    max_tokens: 1000,
    messages: [{ role: "user", content: "Hello GPT!" }],
  },
  // Per-request headers go in the second (request options) argument
  { headers: { "Authorization": "Bearer sk-your-openai-key" } }
);
|
||||
```
|
||||
|
||||
</Tab>
|
||||
</Tabs>
|
||||
|
||||
---
|
||||
|
||||
## Async Inference
|
||||
|
||||
Submit inference requests asynchronously and poll for results later using the `x-bf-async` header. This is useful for long-running requests where you don't want to hold a connection open. See [Async Inference](../../features/async-inference) for full details.
|
||||
|
||||
<Note>
|
||||
Async inference requires a [Logs Store](../../features/observability/default) to be configured and is not compatible with streaming.
|
||||
</Note>
|
||||
|
||||
### Messages
|
||||
|
||||
<Tabs group="anthropic-sdk">
|
||||
<Tab title="Python">
|
||||
|
||||
```python
|
||||
import anthropic
|
||||
import time
|
||||
|
||||
client = anthropic.Anthropic(
|
||||
base_url="http://localhost:8080/anthropic",
|
||||
api_key="dummy-key"
|
||||
)
|
||||
|
||||
# Submit async request
|
||||
initial = client.messages.create(
|
||||
model="anthropic/claude-sonnet-4-20250514",
|
||||
max_tokens=256,
|
||||
messages=[{"role": "user", "content": "Tell me a short story."}],
|
||||
extra_headers={"x-bf-async": "true"}
|
||||
)
|
||||
|
||||
# If content is present, the request completed synchronously
|
||||
if initial.content:
    print(initial.content[0].text)
else:
    # Poll until completed
    while True:
        time.sleep(2)
        poll = client.messages.create(
            model="anthropic/claude-sonnet-4-20250514",
            max_tokens=256,
            messages=[{"role": "user", "content": "Tell me a short story."}],
            extra_headers={"x-bf-async-id": initial.id}
        )
        if poll.content:
            print(poll.content[0].text)
            break
|
||||
```
|
||||
|
||||
</Tab>
|
||||
<Tab title="JavaScript">
|
||||
|
||||
```javascript
|
||||
import Anthropic from "@anthropic-ai/sdk";
|
||||
|
||||
const anthropic = new Anthropic({
|
||||
baseURL: "http://localhost:8080/anthropic",
|
||||
apiKey: "dummy-key",
|
||||
});
|
||||
|
||||
// Submit async request
|
||||
const initial = await anthropic.messages.create(
|
||||
{
|
||||
model: "anthropic/claude-sonnet-4-20250514",
|
||||
max_tokens: 256,
|
||||
messages: [{ role: "user", content: "Tell me a short story." }],
|
||||
},
|
||||
{ headers: { "x-bf-async": "true" } }
|
||||
);
|
||||
|
||||
// If content is present, the request completed synchronously
|
||||
if (initial.content?.length > 0) {
|
||||
console.log(initial.content[0].text);
|
||||
} else {
|
||||
// Poll until completed
|
||||
while (true) {
|
||||
await new Promise((r) => setTimeout(r, 2000));
|
||||
const poll = await anthropic.messages.create(
|
||||
{
|
||||
model: "anthropic/claude-sonnet-4-20250514",
|
||||
max_tokens: 256,
|
||||
messages: [{ role: "user", content: "Tell me a short story." }],
|
||||
},
|
||||
{ headers: { "x-bf-async-id": initial.id } }
|
||||
);
|
||||
if (poll.content?.length > 0) {
|
||||
console.log(poll.content[0].text);
|
||||
break;
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
</Tab>
|
||||
</Tabs>
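The polling loops above run until content arrives, so a job that never completes would block forever. A minimal sketch of a bounded polling helper follows. The name `poll_until` and its parameters are hypothetical and not part of Bifrost or the Anthropic SDK; unlike the loops above, it fetches once before the first sleep.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def poll_until(
    fetch: Callable[[], T],
    is_done: Callable[[T], bool],
    interval: float = 2.0,
    timeout: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[T]:
    """Call fetch() until is_done(result) is true or the timeout elapses."""
    deadline = clock() + timeout
    while True:
        result = fetch()
        if is_done(result):
            return result
        if clock() + interval > deadline:
            # Another wait would overrun the deadline; let the caller decide
            return None
        sleep(interval)
```

With the Python client shown earlier, `fetch` could wrap the `client.messages.create(...)` call that sends `x-bf-async-id`, and `is_done` could be `lambda message: bool(message.content)`.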
|
||||
|
||||
### Async Headers
|
||||
|
||||
| Header | Description |
|---|---|
| `x-bf-async: true` | Submit the request as an async job. Returns immediately with a job ID. |
| `x-bf-async-id: <job-id>` | Poll for results of a previously submitted async job. |
| `x-bf-async-job-result-ttl: <seconds>` | Override the default result TTL (default: 3600s). |
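The three headers in the table can be assembled with a small helper before being passed as `extra_headers` (Python) or `headers` (JavaScript). This is only a sketch: `build_async_headers` is a hypothetical name, and sending the TTL header together with the submit request is an assumption that the table does not spell out.

```python
from typing import Dict, Optional


def build_async_headers(
    job_id: Optional[str] = None,
    result_ttl_seconds: Optional[int] = None,
) -> Dict[str, str]:
    """Return headers to submit a new async job or to poll an existing one."""
    if job_id is not None:
        # Polling: identify the previously submitted job
        return {"x-bf-async-id": job_id}
    headers = {"x-bf-async": "true"}
    if result_ttl_seconds is not None:
        if result_ttl_seconds <= 0:
            raise ValueError("result_ttl_seconds must be positive")
        # Header values are strings, so the TTL is sent as decimal seconds
        headers["x-bf-async-job-result-ttl"] = str(result_ttl_seconds)
    return headers
```

Keeping the header names in one place avoids typos when the same strings are needed for both the submit and the poll call.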
|
||||
|
||||
---
|
||||
|
||||
## Supported Features
|
||||
|
||||
The Anthropic integration supports every feature that is available in both the Anthropic SDK and Bifrost core. If both support a feature, it works through this integration.
|
||||
|
||||
---
|
||||
|
||||
## Next Steps
|
||||
|
||||
- **[Files and Batch API](./files-and-batch)** - File uploads and batch processing
|
||||
- **[OpenAI SDK](../openai-sdk/overview)** - GPT integration patterns
|
||||
- **[Google GenAI SDK](../genai-sdk)** - Gemini integration patterns
|
||||
- **[Configuration](../../quickstart/README)** - Bifrost setup and configuration
|
||||
- **[Core Features](../../features/)** - Advanced Bifrost capabilities
|
||||
|
||||
1119
docs/integrations/bedrock-sdk/files-and-batch.mdx
Normal file
File diff suppressed because it is too large
269
docs/integrations/bedrock-sdk/overview.mdx
Normal file
@@ -0,0 +1,269 @@
|
||||
---
|
||||
title: "Overview"
|
||||
description: "Use Bifrost as a Bedrock-compatible gateway for the Converse and Invoke APIs, with Bifrost features on top."
|
||||
icon: "book"
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
Bifrost provides a Bedrock-compatible endpoint for the **Converse** and **Invoke** APIs via protocol adaptation. The integration handles request transformation, response normalization, and error mapping between AWS Bedrock's API specification and Bifrost's internal processing pipeline.
|
||||
|
||||
This integration enables you to utilize Bifrost's features like governance, load balancing, semantic caching, multi-provider support, and more, all while preserving your existing Bedrock SDK-based architecture.
|
||||
|
||||
**Endpoint:** `/bedrock`
|
||||
|
||||
## Setup
|
||||
|
||||
<Tabs group="bedrock-sdk">
|
||||
<Tab title="Python">
|
||||
|
||||
```python {6}
|
||||
import boto3
|
||||
|
||||
# Configure boto3 Bedrock client to use Bifrost
|
||||
# Note: When using Bifrost keys, dummy credentials are required
|
||||
# because boto3 needs credentials to sign requests, even though
|
||||
# Bifrost will use its own configured keys.
|
||||
client = boto3.client(
|
||||
service_name="bedrock-runtime",
|
||||
endpoint_url="http://localhost:8080/bedrock",
|
||||
region_name="us-west-2",
|
||||
aws_access_key_id="bifrost-dummy-key", # Required when using Bifrost keys
|
||||
aws_secret_access_key="bifrost-dummy-secret" # Required when using Bifrost keys
|
||||
)
|
||||
|
||||
# Make requests as usual
|
||||
response = client.converse(
|
||||
modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",
|
||||
messages=[
|
||||
{
|
||||
"role": "user",
|
||||
"content": [{"text": "Hello!"}]
|
||||
}
|
||||
]
|
||||
)
|
||||
|
||||
print(response)
|
||||
```
|
||||
</Tab>
|
||||
</Tabs>
|
||||
|
||||
## Provider/Model Usage Examples
|
||||
|
||||
Because Bedrock itself is a multi-provider platform, you can use any Bedrock-supported model ID and still route through Bifrost. Bifrost will handle governance, observability, and other cross-cutting concerns.
|
||||
|
||||
```python
|
||||
import boto3
|
||||
import json
|
||||
|
||||
client = boto3.client(
|
||||
service_name="bedrock-runtime",
|
||||
endpoint_url="http://localhost:8080/bedrock",
|
||||
region_name="us-west-2",
|
||||
aws_access_key_id="bifrost-dummy-key",
|
||||
aws_secret_access_key="bifrost-dummy-secret"
|
||||
)
|
||||
|
||||
# Anthropic via Bedrock (Converse API)
|
||||
anthropic_response = client.converse(
|
||||
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
|
||||
messages=[{"role": "user", "content": [{"text": "Hello from Claude!"}]}]
|
||||
)
|
||||
|
||||
# Mistral via Bedrock (Converse API)
|
||||
mistral_response = client.converse(
|
||||
    modelId="mistral.mistral-large-2407-v1:0",
|
||||
messages=[{"role": "user", "content": [{"text": "Hello from Mistral!"}]}]
|
||||
)
|
||||
|
||||
# Mistral via Bedrock (Invoke API)
|
||||
mistral_invoke_response = client.invoke_model(
|
||||
    modelId="mistral.mistral-large-2407-v1:0",
|
||||
contentType="application/json",
|
||||
accept="application/json",
|
||||
body=json.dumps({
|
||||
"prompt": "Say hello from Mistral using Invoke API.",
|
||||
"max_tokens": 50,
|
||||
"temperature": 0.7
|
||||
}),
|
||||
)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Adding Custom Headers
|
||||
|
||||
Pass custom headers required by Bifrost plugins (like governance, telemetry, etc.) using boto3's event system:
|
||||
|
||||
<Tabs group="bedrock-sdk">
|
||||
<Tab title="Python">
|
||||
|
||||
```python
import boto3

def add_bifrost_headers(request, **kwargs):
    """Add custom Bifrost headers to the request before signing."""
    request.headers.add_header("x-bf-vk", "vk_12345")  # Virtual key for governance
    request.headers.add_header("x-bf-env", "production")  # Environment tag

client = boto3.client(
    service_name="bedrock-runtime",
    endpoint_url="http://localhost:8080/bedrock",
    region_name="us-west-2",
    aws_access_key_id="bifrost-dummy-key",
    aws_secret_access_key="bifrost-dummy-secret"
)

# Register the header injection for all Bedrock API calls
client.meta.events.register_first(
    "before-sign.bedrock-runtime.*",
    add_bifrost_headers,
)

# Now make requests with custom headers
response = client.converse(
    modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",
    messages=[{"role": "user", "content": [{"text": "Hello with custom headers!"}]}]
)
```
|
||||
|
||||
> **Note:** Use `register_first` to ensure headers are added before request signing. The event name format is `before-sign.<service-name>.<operation-name>`; the `*` wildcard in the example matches every operation. To target a single operation, replace `*` with its name (`Converse`, `ConverseStream`, `InvokeModel`, etc.).
|
||||
|
||||
</Tab>
|
||||
</Tabs>
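When several clients need different header sets, the handler can be generated instead of hard-coded. The sketch below is illustrative only: `make_header_injector` and `FakeRequest` are hypothetical names, and the stand-in request uses the standard library's `email.message.Message` purely so the example runs without boto3.

```python
from email.message import Message
from typing import Callable, Mapping


def make_header_injector(headers: Mapping[str, str]) -> Callable[..., None]:
    """Build a handler that adds the given headers to a request object."""
    frozen = dict(headers)  # Copy so later changes by the caller have no effect

    def inject(request, **kwargs) -> None:
        for name, value in frozen.items():
            request.headers.add_header(name, value)

    return inject


class FakeRequest:
    """Stand-in for the request object passed to the handler (assumption)."""

    def __init__(self) -> None:
        self.headers = Message()


request = FakeRequest()
make_header_injector({"x-bf-vk": "vk_12345"})(request)
print(request.headers["x-bf-vk"])
```

The returned `inject` function has the same `(request, **kwargs)` shape as `add_bifrost_headers`, so it could be passed to `client.meta.events.register_first` in its place.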
|
||||
|
||||
---
|
||||
|
||||
## Streaming Examples
|
||||
|
||||
### Converse Stream
|
||||
|
||||
Use `converse_stream` for chat-based streaming with a unified interface across models.
|
||||
|
||||
```python
import boto3

client = boto3.client(
    service_name="bedrock-runtime",
    endpoint_url="http://localhost:8080/bedrock",
    region_name="us-west-2",
    aws_access_key_id="bifrost-dummy-key",
    aws_secret_access_key="bifrost-dummy-secret"
)

response = client.converse_stream(
    modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",
    messages=[{"role": "user", "content": [{"text": "Tell me a story about a brave knight."}]}],
    inferenceConfig={"maxTokens": 512, "temperature": 0.5}
)

print("Response:")
for chunk in response["stream"]:
    if "contentBlockDelta" in chunk:
        text = chunk["contentBlockDelta"]["delta"]["text"]
        print(text, end="", flush=True)
```
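The loop above assumes every `contentBlockDelta` carries a `text` field; a delta that carries tool input instead would raise a `KeyError`. A minimal sketch of a more defensive reader follows. The helper name `iter_text_deltas` is hypothetical, and the sample events are hand-written to mimic the stream shape rather than captured from a real response.

```python
from typing import Any, Iterable, Iterator, Mapping


def iter_text_deltas(stream: Iterable[Mapping[str, Any]]) -> Iterator[str]:
    """Yield text fragments from Converse stream events, skipping the rest."""
    for event in stream:
        delta = event.get("contentBlockDelta", {}).get("delta", {})
        text = delta.get("text")
        if text:
            yield text


# Hand-written events for illustration only
sample_events = [
    {"messageStart": {"role": "assistant"}},
    {"contentBlockDelta": {"delta": {"text": "Once "}, "contentBlockIndex": 0}},
    {"contentBlockDelta": {"delta": {"toolUse": {"input": "{}"}}, "contentBlockIndex": 1}},
    {"contentBlockDelta": {"delta": {"text": "upon a time"}, "contentBlockIndex": 0}},
    {"messageStop": {"stopReason": "end_turn"}},
]
print("".join(iter_text_deltas(sample_events)))
```

In real use, `response["stream"]` from `converse_stream` would be passed in place of `sample_events`.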
|
||||
|
||||
### Invoke Stream
|
||||
|
||||
Use `invoke_model_with_response_stream` for model-specific streaming payloads.
|
||||
|
||||
```python
import boto3
import json

client = boto3.client(
    service_name="bedrock-runtime",
    endpoint_url="http://localhost:8080/bedrock",
    region_name="us-west-2",
    aws_access_key_id="bifrost-dummy-key",
    aws_secret_access_key="bifrost-dummy-secret"
)

# Example for Claude 3 (Messages API format)
body = json.dumps({
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 1024,
    "messages": [
        {"role": "user", "content": "Write a haiku about coding."}
    ]
})

response = client.invoke_model_with_response_stream(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",
    body=body,
    contentType="application/json",
    accept="application/json"
)

print("Response:")
for event in response.get("body"):
    if "chunk" in event:
        chunk = event["chunk"]
        if "bytes" in chunk:
            # The chunk bytes contain the model-specific JSON response
            result = json.loads(chunk["bytes"].decode("utf-8"))

            # Extract content based on model (e.g., Claude)
            if "delta" in result and "text" in result["delta"]:
                print(result["delta"]["text"], end="", flush=True)
            elif "completion" in result:
                print(result["completion"], end="", flush=True)
```
|
||||
|
||||
## Using Direct Keys
|
||||
|
||||
Pass AWS credentials or Bedrock API keys directly in requests to bypass Bifrost's load balancing. This requires the **Allow Direct API keys** option to be enabled in Bifrost configuration.
|
||||
|
||||
> **Learn more:** See [Key Management](../../features/keys-management#direct-key-bypass) for enabling direct API key usage.
|
||||
|
||||
When direct keys are enabled, you can pass your AWS credentials directly to the boto3 client instead of using dummy credentials.
|
||||
|
||||
<Tabs group="bedrock-sdk">
|
||||
<Tab title="Python">
|
||||
|
||||
```python
|
||||
import boto3
|
||||
|
||||
# When direct keys are enabled, pass real AWS credentials to boto3
|
||||
client = boto3.client(
|
||||
service_name="bedrock-runtime",
|
||||
endpoint_url="http://localhost:8080/bedrock",
|
||||
region_name="us-west-2",
|
||||
aws_access_key_id="your-aws-access-key", # Real credentials when direct keys enabled
|
||||
aws_secret_access_key="your-aws-secret-key" # Real credentials when direct keys enabled
|
||||
)
|
||||
|
||||
response = client.converse(
|
||||
modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",
|
||||
messages=[{"role": "user", "content": [{"text": "Hello!"}]}]
|
||||
)
|
||||
```
|
||||
|
||||
</Tab>
|
||||
</Tabs>
|
||||
|
||||
> **Note:** When using Bifrost's configured keys (not direct keys), you must provide dummy AWS credentials (`aws_access_key_id` and `aws_secret_access_key`) to the boto3 client. This is because boto3 requires credentials to sign requests, even though Bifrost will use its own configured keys. The dummy values can be any string (e.g., `"bifrost-dummy-key"` and `"bifrost-dummy-secret"`).
|
||||
|
||||
---
|
||||
|
||||
## Supported Features
|
||||
|
||||
The Bedrock integration currently supports:
|
||||
|
||||
- **Converse** API (`/bedrock/model/{modelId}/converse`) for text/chat-style workloads
|
||||
- **Invoke** API (`/bedrock/model/{modelId}/invoke`) for model-specific text completion workloads
|
||||
- **Streaming** via `converse_stream` and `invoke_model_with_response_stream`
|
||||
- **Tools** via `toolConfig`, `toolUse`, and `toolResult` inside Converse requests
|
||||
- **Image and multimodal** responses where supported by the underlying Bedrock model
|
||||
- All Bifrost core features that apply to these flows (governance, load balancing, semantic cache, observability, etc.)
|
||||
|
||||
---
|
||||
|
||||
## Next Steps
|
||||
|
||||
- **[Files and Batch API](./files-and-batch)** - S3-based file operations and batch processing
|
||||
- **[What is an integration?](../what-is-an-integration)** - Core integration concepts
|
||||
- **[Configuration](../../quickstart/gateway/provider-configuration)** - Bedrock provider setup and API key management
|
||||
- **[Core Features](../../features/)** - Governance, semantic caching, and more
|
||||
|
||||
1022
docs/integrations/genai-sdk/files-and-batch.mdx
Normal file
File diff suppressed because it is too large
317
docs/integrations/genai-sdk/overview.mdx
Normal file
@@ -0,0 +1,317 @@
|
||||
---
|
||||
title: "Overview"
|
||||
description: "Use Bifrost as a drop-in replacement for Google GenAI API with full compatibility and enhanced features."
|
||||
icon: "book"
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
Bifrost provides complete Google GenAI API compatibility through protocol adaptation. The integration handles request transformation, response normalization, and error mapping between Google's GenAI API specification and Bifrost's internal processing pipeline.
|
||||
|
||||
This integration enables you to utilize Bifrost's features like governance, load balancing, semantic caching, multi-provider support, and more, all while preserving your existing Google GenAI SDK-based architecture.
|
||||
|
||||
**Endpoint:** `/genai`
|
||||
|
||||
---
|
||||
|
||||
## Setup
|
||||
|
||||
<Tabs group="genai-sdk">
|
||||
<Tab title="Python">
|
||||
|
||||
```python {7}
|
||||
from google import genai
|
||||
from google.genai.types import HttpOptions
|
||||
|
||||
# Configure client to use Bifrost
|
||||
client = genai.Client(
|
||||
api_key="dummy-key", # Keys handled by Bifrost
|
||||
http_options=HttpOptions(base_url="http://localhost:8080/genai")
|
||||
)
|
||||
|
||||
# Make requests as usual
|
||||
response = client.models.generate_content(
|
||||
model="gemini-1.5-flash",
|
||||
contents="Hello!"
|
||||
)
|
||||
|
||||
print(response.text)
|
||||
```
|
||||
|
||||
</Tab>
|
||||
<Tab title="JavaScript">
|
||||
|
||||
```javascript {5}
|
||||
import { GoogleGenerativeAI } from "@google/generative-ai";
|
||||
|
||||
// Configure client to use Bifrost
|
||||
const genAI = new GoogleGenerativeAI("dummy-key", {
|
||||
baseUrl: "http://localhost:8080/genai", // Keys handled by Bifrost
|
||||
});
|
||||
|
||||
// Make requests as usual
|
||||
const model = genAI.getGenerativeModel({ model: "gemini-1.5-flash" });
|
||||
const response = await model.generateContent("Hello!");
|
||||
|
||||
console.log(response.response.text());
|
||||
```
|
||||
|
||||
</Tab>
|
||||
</Tabs>
|
||||
|
||||
---
|
||||
|
||||
## Provider/Model Usage Examples
|
||||
|
||||
Use multiple providers through the same GenAI SDK format by prefixing model names with the provider:
|
||||
|
||||
<Tabs group="genai-sdk">
|
||||
<Tab title="Python">
|
||||
|
||||
```python
|
||||
from google import genai
|
||||
from google.genai.types import HttpOptions
|
||||
|
||||
client = genai.Client(
|
||||
api_key="dummy-key",
|
||||
http_options=HttpOptions(base_url="http://localhost:8080/genai")
|
||||
)
|
||||
|
||||
# Google Vertex models (default)
|
||||
vertex_response = client.models.generate_content(
|
||||
model="gemini-1.5-flash",
|
||||
contents="Hello from Gemini!"
|
||||
)
|
||||
|
||||
# OpenAI models via GenAI SDK format
|
||||
openai_response = client.models.generate_content(
|
||||
model="openai/gpt-4o-mini",
|
||||
contents="Hello from OpenAI!"
|
||||
)
|
||||
|
||||
# Anthropic models via GenAI SDK format
|
||||
anthropic_response = client.models.generate_content(
|
||||
model="anthropic/claude-3-sonnet-20240229",
|
||||
contents="Hello from Claude!"
|
||||
)
|
||||
|
||||
# Azure models
|
||||
azure_response = client.models.generate_content(
|
||||
model="azure/gpt-4o",
|
||||
contents="Hello from Azure!"
|
||||
)
|
||||
|
||||
# Local Ollama models
|
||||
ollama_response = client.models.generate_content(
|
||||
model="ollama/llama3.1:8b",
|
||||
contents="Hello from Ollama!"
|
||||
)
|
||||
```
|
||||
|
||||
</Tab>
|
||||
<Tab title="JavaScript">
|
||||
|
||||
```javascript
|
||||
import { GoogleGenerativeAI } from "@google/generative-ai";
|
||||
|
||||
const genAI = new GoogleGenerativeAI("dummy-key", {
|
||||
baseUrl: "http://localhost:8080/genai",
|
||||
});
|
||||
|
||||
// Google Vertex models (default)
|
||||
const geminiModel = genAI.getGenerativeModel({ model: "gemini-1.5-flash" });
|
||||
const vertexResponse = await geminiModel.generateContent("Hello from Gemini!");
|
||||
|
||||
// OpenAI models via GenAI SDK format
|
||||
const openaiModel = genAI.getGenerativeModel({ model: "openai/gpt-4o-mini" });
|
||||
const openaiResponse = await openaiModel.generateContent("Hello from OpenAI!");
|
||||
|
||||
// Anthropic models via GenAI SDK format
|
||||
const anthropicModel = genAI.getGenerativeModel({ model: "anthropic/claude-3-sonnet-20240229" });
|
||||
const anthropicResponse = await anthropicModel.generateContent("Hello from Claude!");
|
||||
|
||||
// Azure models
|
||||
const azureModel = genAI.getGenerativeModel({ model: "azure/gpt-4o" });
|
||||
const azureResponse = await azureModel.generateContent("Hello from Azure!");
|
||||
|
||||
// Local Ollama models
|
||||
const ollamaModel = genAI.getGenerativeModel({ model: "ollama/llama3.1:8b" });
|
||||
const ollamaResponse = await ollamaModel.generateContent("Hello from Ollama!");
|
||||
```
|
||||
|
||||
</Tab>
|
||||
</Tabs>
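The `provider/model` naming used in the examples above can be pictured as a split on the first slash. The sketch below only illustrates that convention; `split_model` is a hypothetical helper and not Bifrost's actual parser, and it returns `None` for unprefixed names rather than guessing a default provider.

```python
from typing import Optional, Tuple


def split_model(model: str) -> Tuple[Optional[str], str]:
    """Split 'provider/model' on the first slash; provider is None if absent."""
    provider, separator, name = model.partition("/")
    if not separator:
        return None, model
    return provider, name


for model_id in ("gemini-1.5-flash", "openai/gpt-4o-mini", "ollama/llama3.1:8b"):
    print(split_model(model_id))
```

Splitting only on the first slash keeps model names that contain further slashes or colons intact.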
|
||||
|
||||
---
|
||||
|
||||
## Adding Custom Headers
|
||||
|
||||
Pass custom headers required by Bifrost plugins (like governance, telemetry, etc.):
|
||||
|
||||
<Tabs group="genai-sdk">
|
||||
<Tab title="Python">
|
||||
|
||||
```python
|
||||
from google import genai
|
||||
from google.genai.types import HttpOptions
|
||||
|
||||
# Configure client with custom headers
|
||||
client = genai.Client(
|
||||
api_key="dummy-key",
|
||||
http_options=HttpOptions(
|
||||
base_url="http://localhost:8080/genai",
|
||||
headers={
|
||||
"x-bf-vk": "vk_12345", # Virtual key for governance
|
||||
}
|
||||
)
|
||||
)
|
||||
|
||||
response = client.models.generate_content(
|
||||
model="gemini-1.5-flash",
|
||||
contents="Hello with custom headers!"
|
||||
)
|
||||
```
|
||||
|
||||
</Tab>
|
||||
<Tab title="JavaScript">
|
||||
|
||||
```javascript
|
||||
import { GoogleGenerativeAI } from "@google/generative-ai";
|
||||
|
||||
// Configure client with custom headers
|
||||
const genAI = new GoogleGenerativeAI("dummy-key", {
|
||||
baseUrl: "http://localhost:8080/genai",
|
||||
customHeaders: {
|
||||
"x-bf-vk": "vk_12345", // Virtual key for governance
|
||||
},
|
||||
});
|
||||
|
||||
const model = genAI.getGenerativeModel({ model: "gemini-1.5-flash" });
|
||||
const response = await model.generateContent("Hello with custom headers!");
|
||||
```
|
||||
|
||||
</Tab>
|
||||
</Tabs>
|
||||
|
||||
---
|
||||
|
||||
## Using Direct Keys
|
||||
|
||||
Pass API keys directly in requests to bypass Bifrost's load balancing. You can pass any provider's API key (OpenAI, Anthropic, Mistral, etc.) since Bifrost only looks for `Authorization`, `x-api-key` and `x-goog-api-key` headers. This requires the **Allow Direct API keys** option to be enabled in Bifrost configuration.
|
||||
|
||||
> **Learn more:** See [Key Management](../../features/keys-management#direct-key-bypass) for enabling direct API key usage.
|
||||
|
||||
<Tabs group="genai-sdk">
|
||||
<Tab title="Python">
|
||||
|
||||
```python
|
||||
from google import genai
|
||||
from google.genai.types import HttpOptions
|
||||
|
||||
# Pass different provider keys per request using headers
|
||||
client = genai.Client(
|
||||
api_key="gemini-key",
|
||||
http_options=HttpOptions(base_url="http://localhost:8080/genai")
|
||||
)
|
||||
|
||||
# Use Gemini key directly
|
||||
gemini_response = client.models.generate_content(
|
||||
model="gemini-1.5-flash",
|
||||
contents="Hello Gemini!"
|
||||
)
|
||||
|
||||
# Use Anthropic key for Claude models
|
||||
anthropic_response = client.models.generate_content(
|
||||
model="anthropic/claude-3-sonnet-20240229",
|
||||
contents="Hello Claude!",
|
||||
    config={
        "http_options": {"headers": {"x-api-key": "your-anthropic-api-key"}}
    }
|
||||
)
|
||||
|
||||
# Use OpenAI key for GPT models
|
||||
openai_response = client.models.generate_content(
|
||||
model="openai/gpt-4o-mini",
|
||||
contents="Hello GPT!",
|
||||
    config={
        "http_options": {"headers": {"Authorization": "Bearer sk-your-openai-key"}}
    }
|
||||
)
|
||||
```
|
||||
|
||||
</Tab>
|
||||
<Tab title="JavaScript">
|
||||
|
||||
```javascript
|
||||
import { GoogleGenerativeAI } from "@google/generative-ai";
|
||||
|
||||
// Pass different provider keys per request using headers
|
||||
const genAI = new GoogleGenerativeAI("gemini-key", {
|
||||
baseUrl: "http://localhost:8080/genai",
|
||||
});
|
||||
|
||||
// Use Gemini key directly
|
||||
const geminiModel = genAI.getGenerativeModel({
|
||||
model: "gemini-1.5-flash"
|
||||
});
|
||||
const geminiResponse = await geminiModel.generateContent("Hello Gemini!");
|
||||
|
||||
// Use Anthropic key for Claude models
|
||||
const anthropicModel = genAI.getGenerativeModel({
|
||||
model: "anthropic/claude-3-sonnet-20240229",
|
||||
requestOptions: {
|
||||
customHeaders: { "x-api-key": "your-anthropic-api-key" }
|
||||
}
|
||||
});
|
||||
const anthropicResponse = await anthropicModel.generateContent("Hello Claude!");
|
||||
|
||||
// Use OpenAI key for GPT models
|
||||
const gptModel = genAI.getGenerativeModel({
|
||||
model: "openai/gpt-4o-mini",
|
||||
requestOptions: {
|
||||
customHeaders: { "Authorization": "Bearer sk-your-openai-key" }
|
||||
}
|
||||
});
|
||||
const gptResponse = await gptModel.generateContent("Hello GPT!");
|
||||
```
|
||||
|
||||
</Tab>
|
||||
</Tabs>
|
||||
|
||||
---
|
||||
|
||||
## Dynamic Thinking Budget
|
||||
|
||||
When `thinkingConfig.thinkingBudget` is set to `-1`, Bifrost handles it differently per provider:
|
||||
|
||||
- **Gemini**: Preserves `-1` for native dynamic thinking support
|
||||
- **Anthropic**, **Bedrock**, **Cohere**: Converts to minimum reasoning budget value (1024)
|
||||
- **OpenAI**: Converts to medium reasoning effort
|
||||
|
||||
```python
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Complex reasoning task",
    config={
        "thinking_config": {
            "include_thoughts": True,
            "thinking_budget": -1  # Dynamic thinking
        }
    }
)
```
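The per-provider handling listed above can be pictured as a simple lookup. This is an illustrative sketch, not Bifrost's implementation: the function name, the returned key names (`thinking_budget`, `reasoning_effort`), and the lowercase provider strings are assumptions chosen for the example; only the values (`-1`, `1024`, medium effort) come from the list.

```python
from typing import Dict, Union

DYNAMIC_BUDGET = -1          # Sentinel meaning "let the model decide"
MIN_REASONING_BUDGET = 1024  # Minimum budget named in the list above


def resolve_dynamic_budget(provider: str) -> Dict[str, Union[int, str]]:
    """Describe how a thinking budget of -1 is treated for a given provider."""
    provider = provider.lower()
    if provider == "gemini":
        # Native dynamic thinking: the sentinel is passed through unchanged
        return {"thinking_budget": DYNAMIC_BUDGET}
    if provider in {"anthropic", "bedrock", "cohere"}:
        return {"thinking_budget": MIN_REASONING_BUDGET}
    if provider == "openai":
        return {"reasoning_effort": "medium"}
    raise ValueError(f"no dynamic-budget rule described for provider: {provider!r}")


print(resolve_dynamic_budget("anthropic"))
```

Providers not named in the list raise an error here rather than silently picking a behavior the documentation does not describe.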
|
||||
|
||||
---
|
||||
|
||||
## Supported Features
|
||||
|
||||
The Google GenAI integration supports every feature that is available in both the Google GenAI SDK and Bifrost core. If both support a feature, it works through this integration.
|
||||
|
||||
---
|
||||
|
||||
## Next Steps
|
||||
|
||||
- **[OpenAI SDK](../openai-sdk/overview)** - GPT integration patterns
|
||||
- **[Configuration](../../quickstart/gateway/provider-configuration)** - Bifrost setup and configuration
|
||||
- **[Core Features](../../features/)** - Advanced Bifrost capabilities
|
||||
|
||||
104
docs/integrations/guardrails/aws-bedrock.mdx
Normal file
@@ -0,0 +1,104 @@
|
||||
---
|
||||
title: "AWS Bedrock Guardrails"
|
||||
description: "Integrate AWS Bedrock Guardrails with Bifrost for enterprise-grade content filtering, PII protection, prompt attack detection, and image content analysis."
|
||||
icon: "aws"
|
||||
---
|
||||
|
||||
Bifrost integrates with **Amazon Bedrock Guardrails** to provide enterprise-grade content filtering and safety features with deep AWS integration. This page covers the configuration and capabilities of the AWS Bedrock guardrail provider.
|
||||
|
||||

|
||||
|
||||
## Capabilities
|
||||
|
||||
- **Content Filters**: Hate speech, insults, sexual content, violence, misconduct
|
||||
- **Denied Topics**: Block specific topics or categories
|
||||
- **Word Filters**: Custom profanity and sensitive word blocking
|
||||
- **PII Protection**: Detect and redact 50+ PII entity types
|
||||
- **Contextual Grounding**: Verify responses against source documents
|
||||
- **Prompt Attack Detection**: Identify injection and jailbreak attempts
|
||||
- **Image Content Support**: Analyze images in addition to text (PNG, JPEG)
|
||||
|
||||
## Configuration Fields
|
||||
|
||||
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `access_key` | string | Yes | - | AWS Access Key ID |
| `secret_key` | string | Yes | - | AWS Secret Access Key |
| `guardrail_arn` | string | Yes | - | ARN of the Bedrock guardrail |
| `guardrail_version` | string | Yes | - | Version of the guardrail (e.g., "1", "DRAFT") |
| `region` | string | Yes | - | AWS region |
|
||||
|
||||
## Authentication
|
||||
|
||||
Authentication uses the AWS SDK with static credentials:
|
||||
```json
|
||||
{
|
||||
"access_key": "AKIAXXXXXXXXXXXXXXXXXX",
|
||||
"secret_key": "your-secret-access-key",
|
||||
"guardrail_arn": "arn:aws:bedrock:us-east-1:123456789:guardrail/abc123",
|
||||
"guardrail_version": "1",
|
||||
"region": "us-east-1"
|
||||
}
|
||||
```
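Because all five fields are required, a configuration can be checked for gaps before it is submitted. The sketch below is a hypothetical client-side check, not something Bifrost provides; the field names come from the Configuration Fields table and the function name is an assumption.

```python
from typing import List, Mapping

# Required fields as listed in the Configuration Fields table
REQUIRED_FIELDS = (
    "access_key",
    "secret_key",
    "guardrail_arn",
    "guardrail_version",
    "region",
)


def missing_guardrail_fields(config: Mapping[str, str]) -> List[str]:
    """Return the required field names that are absent or empty."""
    return [name for name in REQUIRED_FIELDS if not config.get(name)]


example = {
    "access_key": "AKIAXXXXXXXXXXXXXXXXXX",
    "secret_key": "your-secret-access-key",
    "guardrail_arn": "arn:aws:bedrock:us-east-1:123456789:guardrail/abc123",
    "region": "us-east-1",
}
print(missing_guardrail_fields(example))
```

An empty list means every required field is present and non-empty.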
|
||||
|
||||
## Supported AWS Regions
|
||||
|
||||
| Region Code | Region Name |
|-------------|-------------|
| `us-east-1` | US East (N. Virginia) |
| `us-east-2` | US East (Ohio) |
| `us-west-1` | US West (N. California) |
| `us-west-2` | US West (Oregon) |
| `ap-south-1` | Asia Pacific (Mumbai) |
| `ap-northeast-1` | Asia Pacific (Tokyo) |
| `ap-northeast-2` | Asia Pacific (Seoul) |
| `ap-southeast-1` | Asia Pacific (Singapore) |
| `ap-southeast-2` | Asia Pacific (Sydney) |
| `eu-central-1` | Europe (Frankfurt) |
| `eu-west-1` | Europe (Ireland) |
| `eu-west-2` | Europe (London) |
| `eu-west-3` | Europe (Paris) |
|
||||
|
||||
## Supported Content Types
|
||||
|
||||
- Text content
|
||||
- Images (PNG, JPEG formats)
|
||||
|
||||
## Usage Metrics Returned
|
||||
|
||||
Bedrock guardrails return detailed usage metrics for cost tracking and monitoring:
|
||||
|
||||
| Metric | Description |
|--------|-------------|
| `content_policy_units` | Units consumed by content policy evaluation |
| `contextual_grounding_policy_units` | Units for grounding checks |
| `sensitive_information_policy_units` | Units for PII detection |
| `topic_policy_units` | Units for topic filtering |
| `word_policy_units` | Units for word filtering |
| `automated_reasoning_policy_units` | Units for reasoning checks |
| `content_policy_image_units` | Units for image content analysis |
|
||||
|
||||
## Supported PII Types
|
||||
|
||||
- Personal identifiers (SSN, passport, driver's license)
|
||||
- Financial information (credit cards, bank accounts)
|
||||
- Contact information (email, phone, address)
|
||||
- Medical information (health records, insurance)
|
||||
- Device identifiers (IP addresses, MAC addresses)
|
||||
|
||||
## Provider Capabilities Comparison
|
||||
|
||||
| Capability | AWS Bedrock | Azure Content Safety | GraySwan | Patronus AI |
|------------|-------------|---------------------|----------|-------------|
| PII Detection | Yes | No | No | Yes |
| Content Filtering | Yes | Yes | Yes | Yes |
| Prompt Injection | Yes | Yes | Yes | Yes |
| Hallucination Detection | No | No | No | Yes |
| Toxicity Screening | Yes | Yes | Yes | Yes |
| Custom Policies | Yes | Yes | Yes | Yes |
| Custom Natural Language Rules | No | No | Yes | No |
| Image Support | Yes | No | No | No |
| IPI Detection | No | Yes | Yes | No |
| Mutation Detection | No | No | Yes | No |
|
||||
|
||||
For information on configuring guardrail rules and profiles, see [Guardrails](/enterprise/guardrails).
|
||||
80
docs/integrations/guardrails/azure-content-safety.mdx
Normal file
@@ -0,0 +1,80 @@
|
||||
---
|
||||
title: "Azure Content Safety"
|
||||
description: "Integrate Azure AI Content Safety with Bifrost for multi-modal content moderation, severity-based filtering, prompt shield, and custom blocklist support."
|
||||
icon: "microsoft"
|
||||
---
|
||||
|
||||
Bifrost integrates with **Azure AI Content Safety** to provide multi-modal content moderation powered by Microsoft's advanced AI models. This page covers the configuration and capabilities of the Azure Content Safety guardrail provider.
|
||||
|
||||

|
||||
|
||||
## Capabilities
|
||||
|
||||
- **Severity-Based Filtering**: 4-level severity classification (Safe, Low, Medium, High)
|
||||
- **Multi-Category Detection**: Hate, sexual, violence, self-harm content
|
||||
- **Prompt Shield**: Advanced jailbreak and injection detection
|
||||
- **Indirect Attack Detection**: Identify hidden malicious instructions
|
||||
- **Protected Material**: Detect copyrighted content (output only)
|
||||
- **Custom Blocklists**: Define organization-specific blocked terms
|
||||
|
||||
## Configuration Fields
|
||||
|
||||
| Field                            | Type    | Required | Default  | Description                                                  |
| -------------------------------- | ------- | -------- | -------- | ------------------------------------------------------------ |
| `endpoint`                       | string  | Yes      | -        | Azure Content Safety endpoint URL                            |
| `api_key`                        | string  | Yes      | -        | Azure subscription key                                       |
| `analyze_enabled`                | boolean | No       | true     | Enable content analysis for Hate, Sexual, Violence, SelfHarm |
| `analyze_severity_threshold`     | enum    | No       | "medium" | Severity level to trigger: `low`, `medium`, or `high`        |
| `jailbreak_shield_enabled`       | boolean | No       | false    | Enable jailbreak detection (input only)                      |
| `indirect_attack_shield_enabled` | boolean | No       | false    | Enable indirect prompt attack detection (input only)         |
| `copyright_enabled`              | boolean | No       | false    | Enable copyrighted content detection (output only)           |
| `text_blocklist_enabled`         | boolean | No       | false    | Enable custom blocklist filtering                            |
| `blocklist_names`                | array   | No       | -        | List of Azure blocklist names to apply                       |
|
||||
|
||||
## Collecting your API key and URL
|
||||
|
||||
Navigate to the Azure AI Foundry dashboard:
|
||||
|
||||
<Frame>
|
||||
<img src="/media/guardrails/azure-api-key.png" alt="Azure foundry dashboard" />
|
||||
</Frame>
|
||||
|
||||
- Copy the API key and paste it into the Azure content moderation config form.
- Copy the project endpoint and use its base URL as the endpoint in the form, e.g. `https://xxx-resource.services.ai.azure.com`.
|
||||
|
||||
## Severity Threshold Levels
|
||||
|
||||
| Threshold | Numeric Value | Behavior                                  |
| --------- | ------------- | ----------------------------------------- |
| `low`     | 2             | Most strict - blocks severity 2 and above |
| `medium`  | 4             | Balanced - blocks severity 4 and above    |
| `high`    | 6             | Least strict - blocks only severity 6     |
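The threshold table amounts to a "block at or above" comparison. A minimal sketch of that rule follows, assuming integer severities as shown in the table; `should_block` and `SEVERITY_THRESHOLDS` are hypothetical names for illustration, not Bifrost or Azure identifiers.

```python
# Numeric values taken from the Severity Threshold Levels table
SEVERITY_THRESHOLDS = {"low": 2, "medium": 4, "high": 6}


def should_block(severity: int, threshold: str = "medium") -> bool:
    """Return True when the reported severity meets or exceeds the threshold."""
    try:
        minimum = SEVERITY_THRESHOLDS[threshold]
    except KeyError:
        raise ValueError(f"unknown threshold: {threshold!r}") from None
    return severity >= minimum


for level in ("low", "medium", "high"):
    print(level, [s for s in (0, 2, 4, 6) if should_block(s, level)])
```

The default of `"medium"` mirrors the default listed for `analyze_severity_threshold`.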
|
||||
|
||||
## Detection Categories
|
||||
|
||||
- Hate and fairness
|
||||
- Sexual content
|
||||
- Violence
|
||||
- Self-harm
|
||||
|
||||
<Note>
|
||||
**Input-only features:** Jailbreak Shield and Indirect Attack Shield only apply to input validation.
**Output-only features:** Copyright detection only applies to output validation.
|
||||
</Note>
|
||||
|
||||
## Provider Capabilities Comparison
|
||||
|
||||
| Capability                    | AWS Bedrock | Azure Content Safety | GraySwan | Patronus AI |
| ----------------------------- | ----------- | -------------------- | -------- | ----------- |
| PII Detection                 | Yes         | No                   | No       | Yes         |
| Content Filtering             | Yes         | Yes                  | Yes      | Yes         |
| Prompt Injection              | Yes         | Yes                  | Yes      | Yes         |
| Hallucination Detection       | No          | No                   | No       | Yes         |
| Toxicity Screening            | Yes         | Yes                  | Yes      | Yes         |
| Custom Policies               | Yes         | Yes                  | Yes      | Yes         |
| Custom Natural Language Rules | No          | No                   | Yes      | No          |
| Image Support                 | Yes         | No                   | No       | No          |
| IPI Detection                 | No          | Yes                  | Yes      | No          |
| Mutation Detection            | No          | No                   | Yes      | No          |
|
||||
|
||||
For information on configuring guardrail rules and profiles, see [Guardrails](/enterprise/guardrails).
|
||||
70
docs/integrations/guardrails/grayswan.mdx
Normal file
@@ -0,0 +1,70 @@
|
||||
---
|
||||
title: "GraySwan Cygnal"
|
||||
description: "Integrate GraySwan Cygnal Monitor with Bifrost for AI safety monitoring with natural language rule definitions, violation scoring, and advanced threat detection."
|
||||
icon: "shield-check"
|
||||
---
|
||||
|
||||
Bifrost integrates with **GraySwan Cygnal Monitor** to provide AI safety monitoring with natural language rule definitions and advanced threat detection capabilities. This page covers the configuration and capabilities of the GraySwan Cygnal guardrail provider.
|
||||
|
||||

|
||||
|
||||
## Capabilities
|
||||
|
||||
- **Violation Scoring**: Continuous 0-1 scale violation detection with configurable thresholds
|
||||
- **Custom Natural Language Rules**: Define safety rules in plain English without code
|
||||
- **Policy Management**: Use pre-built policies from GraySwan platform or create custom ones
|
||||
- **Indirect Prompt Injection (IPI) Detection**: Identify hidden instructions in user inputs
|
||||
- **Mutation Detection**: Detect attempts to manipulate or alter content
|
||||
- **Reasoning Modes**: Choose from fast ("off"), balanced ("hybrid"), or thorough ("thinking") analysis
|
||||
|
||||
## Configuration Fields
|
||||
|
||||
| Field | Type | Required | Default | Description |
|
||||
|-------|------|----------|---------|-------------|
|
||||
| `api_key` | string | Yes | - | GraySwan API key |
|
||||
| `violation_threshold` | number | No | 0.5 | Score threshold (0-1) for triggering intervention. Lower values are more strict. |
|
||||
| `reasoning_mode` | enum | No | "off" | Analysis depth: `off` (fastest), `hybrid` (balanced), or `thinking` (most thorough) |
|
||||
| `policy_id` | string | No | - | Single custom policy ID from GraySwan platform |
|
||||
| `policy_ids` | array | No | - | Multiple policy IDs for aggregated rule evaluation |
|
||||
| `rules` | object | No | - | Custom natural language rules as key-value pairs |
|
||||
|
||||
## Custom Rules Example
|
||||
|
||||

|
||||
|
||||
Rules are defined as key-value pairs where the key is the rule name and the value is a natural language description:
|
||||
|
||||
```json
{
  "rules": {
    "no_profanity": "Do not allow profanity or vulgar language",
    "no_pii": "Do not allow personally identifiable information",
    "professional_tone": "Ensure all responses maintain a professional tone"
  }
}
```
|
||||
|
||||
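As a rough illustration of how the configuration fields fit together, the sketch below assembles and validates a GraySwan config as a plain Python dictionary. `build_grayswan_config` is a hypothetical helper, not a Bifrost API, and the exact place where this configuration is submitted (UI form or config file) is not specified here. The field names, defaults, and allowed values come from the Configuration Fields table.

```python
import json

# Allowed values for `reasoning_mode`, as listed in the Configuration Fields table.
ALLOWED_REASONING_MODES = {"off", "hybrid", "thinking"}


def build_grayswan_config(api_key, violation_threshold=0.5, reasoning_mode="off", rules=None):
    """Hypothetical helper: assemble a GraySwan config dict and check basic constraints."""
    if not api_key:
        raise ValueError("api_key is required")
    if not 0 <= violation_threshold <= 1:
        raise ValueError("violation_threshold must be between 0 and 1")
    if reasoning_mode not in ALLOWED_REASONING_MODES:
        raise ValueError(f"reasoning_mode must be one of {sorted(ALLOWED_REASONING_MODES)}")

    config = {
        "api_key": api_key,
        "violation_threshold": violation_threshold,
        "reasoning_mode": reasoning_mode,
    }
    if rules:
        # Rule name -> natural language description, as in the JSON example above.
        config["rules"] = dict(rules)
    return config


config = build_grayswan_config(
    "your-grayswan-api-key",
    violation_threshold=0.3,  # lower than the 0.5 default, so stricter
    reasoning_mode="hybrid",
    rules={"no_pii": "Do not allow personally identifiable information"},
)
print(json.dumps(config, indent=2))
```

Optional fields that are left unset (`policy_id`, `policy_ids`) are simply omitted from the resulting dictionary.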
## Detection Features
|
||||
|
||||
- Real-time violation scoring
|
||||
- Multi-rule evaluation
|
||||
- IPI attack detection
|
||||
- Content mutation monitoring
|
||||
- Detailed violation descriptions with rule attribution
|
||||
|
||||
## Provider Capabilities Comparison
|
||||
|
||||
| Capability | AWS Bedrock | Azure Content Safety | GraySwan | Patronus AI |
|
||||
|------------|-------------|---------------------|----------|-------------|
|
||||
| PII Detection | Yes | No | No | Yes |
|
||||
| Content Filtering | Yes | Yes | Yes | Yes |
|
||||
| Prompt Injection | Yes | Yes | Yes | Yes |
|
||||
| Hallucination Detection | No | No | No | Yes |
|
||||
| Toxicity Screening | Yes | Yes | Yes | Yes |
|
||||
| Custom Policies | Yes | Yes | Yes | Yes |
|
||||
| Custom Natural Language Rules | No | No | Yes | No |
|
||||
| Image Support | Yes | No | No | No |
|
||||
| IPI Detection | No | Yes | Yes | No |
|
||||
| Mutation Detection | No | No | Yes | No |
|
||||
|
||||
For information on configuring guardrail rules and profiles, see [Guardrails](/enterprise/guardrails).
|
||||
40
docs/integrations/guardrails/patronus-ai.mdx
Normal file
@@ -0,0 +1,40 @@
|
||||
---
|
||||
title: "Patronus AI"
|
||||
description: "Integrate Patronus AI with Bifrost for LLM security and safety including hallucination detection, PII identification, toxicity screening, and custom evaluators."
|
||||
icon: "brain"
|
||||
---
|
||||
|
||||
Bifrost integrates with **Patronus AI** to provide specialized LLM security and safety with advanced evaluation capabilities. This page covers the configuration and capabilities of the Patronus AI guardrail provider.
|
||||
|
||||
## Capabilities
|
||||
|
||||
- **Hallucination Detection**: Identify factually incorrect responses
|
||||
- **PII Detection**: Comprehensive personal data identification
|
||||
- **Toxicity Screening**: Multi-language toxic content detection
|
||||
- **Prompt Injection Defense**: Advanced attack pattern recognition
|
||||
- **Custom Evaluators**: Build organization-specific safety checks
|
||||
- **Real-Time Monitoring**: Continuous safety validation
|
||||
|
||||
## Advanced Features
|
||||
|
||||
- Context-aware evaluation
|
||||
- Multi-turn conversation analysis
|
||||
- Custom policy templates
|
||||
- Integration with existing safety workflows
|
||||
|
||||
## Provider Capabilities Comparison
|
||||
|
||||
| Capability | AWS Bedrock | Azure Content Safety | GraySwan | Patronus AI |
|
||||
|------------|-------------|---------------------|----------|-------------|
|
||||
| PII Detection | Yes | No | No | Yes |
|
||||
| Content Filtering | Yes | Yes | Yes | Yes |
|
||||
| Prompt Injection | Yes | Yes | Yes | Yes |
|
||||
| Hallucination Detection | No | No | No | Yes |
|
||||
| Toxicity Screening | Yes | Yes | Yes | Yes |
|
||||
| Custom Policies | Yes | Yes | Yes | Yes |
|
||||
| Custom Natural Language Rules | No | No | Yes | No |
|
||||
| Image Support | Yes | No | No | No |
|
||||
| IPI Detection | No | Yes | Yes | No |
|
||||
| Mutation Detection | No | No | Yes | No |
|
||||
|
||||
For information on configuring guardrail rules and profiles, see [Guardrails](/enterprise/guardrails).
|
||||
724
docs/integrations/langchain-sdk.mdx
Normal file
@@ -0,0 +1,724 @@
|
||||
---
|
||||
title: "Langchain SDK"
|
||||
description: "Use Bifrost as a drop-in proxy for Langchain applications with zero code changes."
|
||||
icon: "crow"
|
||||
---
|
||||
|
||||
Since Langchain already provides multi-provider abstraction and chaining capabilities, Bifrost adds enterprise features such as governance, semantic caching, MCP tools, and observability on top of your existing setup.
|
||||
|
||||
**Endpoint:** `/langchain`
|
||||
|
||||
<Warning>
|
||||
**Provider Compatibility:** This integration only works for AI providers that both Langchain and Bifrost support. If you're using a provider specific to Langchain that Bifrost doesn't support (or vice versa), those requests will fail.
|
||||
</Warning>
|
||||
---
|
||||
|
||||
## Setup
|
||||
|
||||
<Tabs group="langchain-sdk">
|
||||
<Tab title="Python">
|
||||
|
||||
```python {7}
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage

# Configure client to use Bifrost
llm = ChatOpenAI(
    model="gpt-4o-mini",
    openai_api_base="http://localhost:8080/langchain",  # Point to Bifrost
    openai_api_key="dummy-key"  # Keys managed by Bifrost
)

response = llm.invoke([HumanMessage(content="Hello!")])
print(response.content)
```
|
||||
|
||||
</Tab>
|
||||
<Tab title="JavaScript">
|
||||
|
||||
```javascript {7}
import { ChatOpenAI } from "@langchain/openai";

// Configure client to use Bifrost
const llm = new ChatOpenAI({
  model: "gpt-4o-mini",
  configuration: {
    baseURL: "http://localhost:8080/langchain", // Point to Bifrost
  },
  openAIApiKey: "dummy-key" // Keys managed by Bifrost
});

const response = await llm.invoke("Hello!");
console.log(response.content);
```
|
||||
|
||||
</Tab>
|
||||
|
||||
</Tabs>
|
||||
|
||||
---
|
||||
|
||||
## Provider/Model Usage Examples
|
||||
|
||||
Your existing Langchain provider switching works unchanged through Bifrost:
|
||||
|
||||
<Tabs group="langchain-sdk">
|
||||
<Tab title="Python">
|
||||
|
||||
```python
|
||||
from langchain_openai import ChatOpenAI
|
||||
from langchain_anthropic import ChatAnthropic
|
||||
from langchain_google_genai import ChatGoogleGenerativeAI
|
||||
from langchain_core.messages import HumanMessage
|
||||
|
||||
base_url = "http://localhost:8080/langchain"
|
||||
|
||||
# OpenAI models via Langchain
|
||||
openai_llm = ChatOpenAI(
|
||||
model="gpt-4o-mini",
|
||||
openai_api_base=base_url
|
||||
)
|
||||
|
||||
# Anthropic models via Langchain
|
||||
anthropic_llm = ChatAnthropic(
|
||||
model="claude-3-sonnet-20240229",
|
||||
anthropic_api_url=base_url
|
||||
)
|
||||
|
||||
# Google models via Langchain
|
||||
google_llm = ChatGoogleGenerativeAI(
|
||||
model="gemini-1.5-flash",
|
||||
google_api_base=base_url
|
||||
)
|
||||
|
||||
# All work the same way
|
||||
openai_response = openai_llm.invoke([HumanMessage(content="Hello GPT!")])
|
||||
anthropic_response = anthropic_llm.invoke([HumanMessage(content="Hello Claude!")])
|
||||
google_response = google_llm.invoke([HumanMessage(content="Hello Gemini!")])
|
||||
```
|
||||
|
||||
</Tab>
|
||||
<Tab title="JavaScript">
|
||||
|
||||
```javascript
|
||||
import { ChatOpenAI } from "@langchain/openai";
|
||||
import { ChatAnthropic } from "@langchain/anthropic";
|
||||
import { ChatGoogleGenerativeAI } from "@langchain/google-genai";
|
||||
|
||||
const baseURL = "http://localhost:8080/langchain";
|
||||
|
||||
// OpenAI models via Langchain
|
||||
const openaiLlm = new ChatOpenAI({
|
||||
model: "gpt-4o-mini",
|
||||
configuration: { baseURL }
|
||||
});
|
||||
|
||||
// Anthropic models via Langchain
|
||||
const anthropicLlm = new ChatAnthropic({
|
||||
model: "claude-3-sonnet-20240229",
|
||||
clientOptions: { baseURL }
|
||||
});
|
||||
|
||||
// Google models via Langchain
|
||||
const googleLlm = new ChatGoogleGenerativeAI({
|
||||
model: "gemini-1.5-flash",
|
||||
baseURL
|
||||
});
|
||||
|
||||
// All work the same way
|
||||
const openaiResponse = await openaiLlm.invoke("Hello GPT!");
|
||||
const anthropicResponse = await anthropicLlm.invoke("Hello Claude!");
|
||||
const googleResponse = await googleLlm.invoke("Hello Gemini!");
|
||||
```
|
||||
|
||||
</Tab>
|
||||
</Tabs>
|
||||
|
||||
---
|
||||
|
||||
## Adding Custom Headers
|
||||
|
||||
Add Bifrost-specific headers for governance and tracking. Different LangChain provider classes support different methods for adding custom headers:
|
||||
|
||||
<Tabs group="langchain-sdk">
|
||||
<Tab title="Python">
|
||||
|
||||
### ChatOpenAI
|
||||
|
||||
Use the `default_headers` parameter for OpenAI models:
|
||||
|
||||
```python
|
||||
from langchain_openai import ChatOpenAI
|
||||
from langchain_core.messages import HumanMessage
|
||||
|
||||
llm = ChatOpenAI(
|
||||
model="gpt-4o-mini",
|
||||
openai_api_base="http://localhost:8080/langchain",
|
||||
default_headers={
|
||||
"x-bf-vk": "your-virtual-key",
|
||||
}
|
||||
)
|
||||
|
||||
response = llm.invoke([HumanMessage(content="Hello!")])
|
||||
print(response.content)
|
||||
```
|
||||
|
||||
### ChatAnthropic
|
||||
|
||||
Use the `default_headers` parameter for Anthropic models:
|
||||
|
||||
```python
|
||||
from langchain_anthropic import ChatAnthropic
|
||||
from langchain_core.messages import HumanMessage
|
||||
|
||||
llm = ChatAnthropic(
|
||||
model="claude-3-sonnet-20240229",
|
||||
anthropic_api_url="http://localhost:8080/langchain",
|
||||
default_headers={
|
||||
"x-bf-vk": "your-virtual-key", # Virtual key for governance
|
||||
}
|
||||
)
|
||||
|
||||
response = llm.invoke([HumanMessage(content="Hello!")])
|
||||
print(response.content)
|
||||
```
|
||||
|
||||
### ChatGoogleGenerativeAI
|
||||
|
||||
Use the `additional_headers` parameter for Google/Gemini models:
|
||||
|
||||
```python
|
||||
from langchain_google_genai import ChatGoogleGenerativeAI
|
||||
from langchain_core.messages import HumanMessage
|
||||
|
||||
llm = ChatGoogleGenerativeAI(
|
||||
model="gemini-2.5-flash",
|
||||
google_api_base="http://localhost:8080/langchain",
|
||||
additional_headers={
|
||||
"x-bf-vk": "your-virtual-key", # Virtual key for governance
|
||||
}
|
||||
)
|
||||
|
||||
response = llm.invoke([HumanMessage(content="Hello!")])
|
||||
print(response.content)
|
||||
```
|
||||
### ChatBedrockConverse
|
||||
|
||||
For Bedrock models, there are two approaches:
|
||||
|
||||
**Method 1: Using the client's event system (after initialization)**
|
||||
|
||||
```python
from langchain_aws import ChatBedrockConverse
from langchain_core.messages import HumanMessage

llm = ChatBedrockConverse(
    model="us.anthropic.claude-haiku-4-5-20251001-v1:0",
    region_name="us-west-2",
    endpoint_url="http://localhost:8080/langchain",
    aws_access_key_id="dummy-access-key",
    aws_secret_access_key="dummy-secret-key",
    max_tokens=2000
)

def add_bifrost_headers(request, **kwargs):
    """Add custom headers to Bedrock requests"""
    request.headers.add_header("x-bf-vk", "your-virtual-key")

# Register header injection for all Bedrock operations
llm.client.meta.events.register_first(
    "before-sign.bedrock-runtime.*",
    add_bifrost_headers
)

response = llm.invoke([HumanMessage(content="Hello!")])
print(response.content)
```
|
||||
|
||||
**Method 2: Pre-configuring a boto3 client**
|
||||
|
||||
```python
from langchain_aws import ChatBedrockConverse
from langchain_core.messages import HumanMessage
import boto3

# Create and configure boto3 client
bedrock_client = boto3.client(
    service_name="bedrock-runtime",
    region_name="us-west-2",
    endpoint_url="http://localhost:8080/langchain",
    aws_access_key_id="dummy-access-key",
    aws_secret_access_key="dummy-secret-key"
)

def add_bifrost_headers(request, **kwargs):
    """Add custom headers to Bedrock requests"""
    request.headers.add_header("x-bf-vk", "your-virtual-key")

# Register header injection before creating LLM
bedrock_client.meta.events.register_first(
    "before-sign.bedrock-runtime.*",
    add_bifrost_headers
)

# Pass the configured client to ChatBedrockConverse
llm = ChatBedrockConverse(
    model="us.anthropic.claude-haiku-4-5-20251001-v1:0",
    client=bedrock_client,
    max_tokens=2000
)

response = llm.invoke([HumanMessage(content="Hello!")])
print(response.content)
```
|
||||
|
||||
</Tab>
|
||||
<Tab title="JavaScript">
|
||||
|
||||
### ChatOpenAI
|
||||
|
||||
Use `defaultHeaders` inside `configuration` for OpenAI models:
|
||||
|
||||
```javascript
|
||||
import { ChatOpenAI } from "@langchain/openai";
|
||||
|
||||
const llm = new ChatOpenAI({
|
||||
model: "gpt-4o-mini",
|
||||
configuration: {
|
||||
baseURL: "http://localhost:8080/langchain",
|
||||
defaultHeaders: {
|
||||
"x-bf-vk": "your-virtual-key", // Virtual key for governance
|
||||
}
|
||||
}
|
||||
});
|
||||
|
||||
const response = await llm.invoke("Hello!");
|
||||
console.log(response.content);
|
||||
```
|
||||
|
||||
### ChatAnthropic
|
||||
|
||||
Use `defaultHeaders` inside `clientOptions` for Anthropic models:
|
||||
|
||||
```javascript
|
||||
import { ChatAnthropic } from "@langchain/anthropic";
|
||||
|
||||
const llm = new ChatAnthropic({
|
||||
model: "claude-3-sonnet-20240229",
|
||||
clientOptions: {
|
||||
baseURL: "http://localhost:8080/langchain",
|
||||
defaultHeaders: {
|
||||
"x-bf-vk": "your-virtual-key", // Virtual key for governance
|
||||
}
|
||||
}
|
||||
});
|
||||
|
||||
const response = await llm.invoke("Hello!");
|
||||
console.log(response.content);
|
||||
```
|
||||
|
||||
### ChatGoogleGenerativeAI
|
||||
|
||||
Use `additionalHeaders` for Google/Gemini models:
|
||||
|
||||
```javascript
|
||||
import { ChatGoogleGenerativeAI } from "@langchain/google-genai";
|
||||
|
||||
const llm = new ChatGoogleGenerativeAI({
|
||||
model: "gemini-2.5-flash",
|
||||
baseURL: "http://localhost:8080/langchain",
|
||||
additionalHeaders: {
|
||||
"x-bf-vk": "your-virtual-key", // Virtual key for governance
|
||||
}
|
||||
});
|
||||
|
||||
const response = await llm.invoke("Hello!");
|
||||
console.log(response.content);
|
||||
```
|
||||
|
||||
</Tab>
|
||||
</Tabs>
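Because the header parameter name differs between the Python provider classes, it can help to keep the mapping in one place. The following is a minimal sketch using only the standard library. `HEADER_PARAM` and `bifrost_header_kwargs` are hypothetical names introduced for illustration, and the mapping covers only the three classes shown in the Python tab (`ChatBedrockConverse` is excluded because it relies on the boto3 event hook instead).

```python
# Header keyword argument per LangChain Python class, as described in the Python tab.
# This dictionary and the helper below are illustrative, not part of LangChain or Bifrost.
HEADER_PARAM = {
    "ChatOpenAI": "default_headers",
    "ChatAnthropic": "default_headers",
    "ChatGoogleGenerativeAI": "additional_headers",
}


def bifrost_header_kwargs(class_name: str, virtual_key: str) -> dict:
    """Build the keyword argument that carries the x-bf-vk header for a given class."""
    return {HEADER_PARAM[class_name]: {"x-bf-vk": virtual_key}}


print(bifrost_header_kwargs("ChatOpenAI", "your-virtual-key"))
print(bifrost_header_kwargs("ChatGoogleGenerativeAI", "your-virtual-key"))
```

The returned dictionary is meant to be unpacked into the constructor, for example `ChatOpenAI(model="gpt-4o-mini", openai_api_base="http://localhost:8080/langchain", **bifrost_header_kwargs("ChatOpenAI", "your-virtual-key"))`.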
|
||||
|
||||
---
|
||||
|
||||
## Using Direct Keys
|
||||
|
||||
Pass API keys directly to bypass Bifrost's key management. You can pass any provider's API key since Bifrost only looks for `Authorization` or `x-api-key` headers. This requires the **Allow Direct API keys** option to be enabled in Bifrost configuration.
|
||||
|
||||
> **Learn more:** See [Key Management](../features/keys-management#direct-key-bypass) for enabling direct API key usage.
|
||||
|
||||
<Tabs group="langchain-sdk">
|
||||
<Tab title="Python">
|
||||
|
||||
```python
|
||||
from langchain_openai import ChatOpenAI
|
||||
from langchain_anthropic import ChatAnthropic
|
||||
from langchain_core.messages import HumanMessage
|
||||
|
||||
# Using OpenAI key directly
|
||||
openai_llm = ChatOpenAI(
|
||||
model="gpt-4o-mini",
|
||||
openai_api_base="http://localhost:8080/langchain",
|
||||
default_headers={
|
||||
"Authorization": "Bearer sk-your-openai-key"
|
||||
}
|
||||
)
|
||||
|
||||
# Using Anthropic key for Claude models
|
||||
anthropic_llm = ChatAnthropic(
|
||||
model="claude-3-sonnet-20240229",
|
||||
anthropic_api_url="http://localhost:8080/langchain",
|
||||
default_headers={
|
||||
"x-api-key": "sk-ant-your-anthropic-key"
|
||||
}
|
||||
)
|
||||
|
||||
# Using Azure with direct Azure key
|
||||
from langchain_openai import AzureChatOpenAI
|
||||
|
||||
azure_llm = AzureChatOpenAI(
|
||||
deployment_name="gpt-4o-aug",
|
||||
api_key="your-azure-api-key",
|
||||
azure_endpoint="http://localhost:8080/langchain",
|
||||
api_version="2024-05-01-preview",
|
||||
max_tokens=100,
|
||||
default_headers={
|
||||
"x-bf-azure-endpoint": "https://your-resource.openai.azure.com",
|
||||
}
|
||||
)
|
||||
|
||||
openai_response = openai_llm.invoke([HumanMessage(content="Hello GPT!")])
|
||||
anthropic_response = anthropic_llm.invoke([HumanMessage(content="Hello Claude!")])
|
||||
azure_response = azure_llm.invoke([HumanMessage(content="Hello from Azure!")])
|
||||
```
|
||||
|
||||
</Tab>
|
||||
<Tab title="JavaScript">
|
||||
|
||||
```javascript
|
||||
import { ChatOpenAI } from "@langchain/openai";
|
||||
import { ChatAnthropic } from "@langchain/anthropic";
|
||||
|
||||
// Using OpenAI key directly
|
||||
const openaiLlm = new ChatOpenAI({
|
||||
model: "gpt-4o-mini",
|
||||
configuration: {
|
||||
baseURL: "http://localhost:8080/langchain",
|
||||
defaultHeaders: {
|
||||
"Authorization": "Bearer sk-your-openai-key"
|
||||
}
|
||||
}
|
||||
});
|
||||
|
||||
// Using Anthropic key for Claude models
|
||||
const anthropicLlm = new ChatAnthropic({
|
||||
model: "claude-3-sonnet-20240229",
|
||||
clientOptions: {
|
||||
baseURL: "http://localhost:8080/langchain",
|
||||
defaultHeaders: {
|
||||
"x-api-key": "sk-ant-your-anthropic-key"
|
||||
}
|
||||
}
|
||||
});
|
||||
|
||||
// Using Azure with direct Azure key
|
||||
import { AzureChatOpenAI } from "@langchain/openai";
|
||||
|
||||
const azureLlm = new AzureChatOpenAI({
|
||||
deploymentName: "gpt-4o-aug",
|
||||
apiKey: "your-azure-api-key",
|
||||
azureOpenAIEndpoint: "http://localhost:8080/langchain",
|
||||
apiVersion: "2024-05-01-preview",
|
||||
maxTokens: 100,
|
||||
configuration: {
|
||||
defaultHeaders: {
|
||||
"x-bf-azure-endpoint": "https://your-resource.openai.azure.com",
|
||||
}
|
||||
}
|
||||
});
|
||||
|
||||
const openaiResponse = await openaiLlm.invoke("Hello GPT!");
|
||||
const anthropicResponse = await anthropicLlm.invoke("Hello Claude!");
|
||||
const azureResponse = await azureLlm.invoke("Hello from Azure!");
|
||||
```
|
||||
|
||||
</Tab>
|
||||
</Tabs>
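The two header styles used in the examples can be summarized in a small helper. This is a sketch under stated assumptions: `direct_key_headers` and its `style` argument are hypothetical, and the only behavior taken from this page is that Bifrost reads a direct key from either the `Authorization` header or the `x-api-key` header.

```python
def direct_key_headers(api_key: str, style: str = "bearer") -> dict:
    """Hypothetical helper: build the header that carries a direct provider key.

    style="bearer"    -> Authorization: Bearer <key>  (used in the OpenAI example)
    style="x-api-key" -> x-api-key: <key>             (used in the Anthropic example)
    """
    if style == "bearer":
        return {"Authorization": f"Bearer {api_key}"}
    if style == "x-api-key":
        return {"x-api-key": api_key}
    raise ValueError("style must be 'bearer' or 'x-api-key'")


print(direct_key_headers("sk-your-openai-key"))
print(direct_key_headers("sk-ant-your-anthropic-key", style="x-api-key"))
```

Either dictionary can be passed as `default_headers` (Python) in the same way as the examples in the tabs.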
|
||||
|
||||
---
|
||||
|
||||
## Reasoning/Thinking Models
|
||||
|
||||
Control extended reasoning capabilities for models that support thinking/reasoning modes.
|
||||
|
||||
### Azure OpenAI Models
|
||||
|
||||
For Azure OpenAI reasoning models, use `ChatOpenAI` with the `reasoning` parameter and Azure-specific headers:
|
||||
|
||||
<Tabs group="langchain-sdk">
|
||||
<Tab title="Python">
|
||||
|
||||
```python
|
||||
from langchain_openai import ChatOpenAI
|
||||
from langchain_core.messages import HumanMessage
|
||||
|
||||
# Azure OpenAI with reasoning control
|
||||
llm = ChatOpenAI(
|
||||
model="azure/gpt-5.1", # Azure deployment name
|
||||
base_url="http://localhost:8080/langchain",
|
||||
api_key="dummy-key",
|
||||
reasoning={
|
||||
"effort": "high", # "minimal" | "low" | "medium" | "high"
|
||||
"summary": "detailed" # "auto" | "concise" | "detailed"
|
||||
},
|
||||
default_headers={
|
||||
"authorization": "Bearer your-azure-api-key",
|
||||
"x-bf-azure-endpoint": "https://your-resource.openai.azure.com"
|
||||
}
|
||||
)
|
||||
|
||||
response = llm.invoke([HumanMessage(content="Solve this complex problem...")])
|
||||
```
|
||||
|
||||
</Tab>
|
||||
<Tab title="JavaScript">
|
||||
|
||||
```javascript
|
||||
import { ChatOpenAI } from "@langchain/openai";
|
||||
|
||||
// Azure OpenAI with reasoning control
|
||||
const llm = new ChatOpenAI({
|
||||
model: "azure/gpt-5.1", // Azure deployment name
|
||||
configuration: {
|
||||
baseURL: "http://localhost:8080/langchain",
|
||||
defaultHeaders: {
|
||||
"authorization": "Bearer your-azure-api-key",
|
||||
"x-bf-azure-endpoint": "https://your-resource.openai.azure.com"
|
||||
}
|
||||
},
|
||||
openAIApiKey: "dummy-key",
|
||||
reasoning: {
|
||||
effort: "high",
|
||||
summary: "detailed"
|
||||
}
|
||||
});
|
||||
|
||||
const response = await llm.invoke("Solve this complex problem...");
|
||||
```
|
||||
|
||||
</Tab>
|
||||
</Tabs>
|
||||
|
||||
### OpenAI Models
|
||||
|
||||
For OpenAI reasoning models, use `ChatOpenAI` with the `reasoning` parameter:
|
||||
|
||||
<Tabs group="langchain-sdk">
|
||||
<Tab title="Python">
|
||||
|
||||
```python
|
||||
from langchain_openai import ChatOpenAI
|
||||
from langchain_core.messages import HumanMessage
|
||||
|
||||
# OpenAI with reasoning control
|
||||
llm = ChatOpenAI(
|
||||
model="gpt-5",
|
||||
base_url="http://localhost:8080/langchain",
|
||||
api_key="dummy-key",
|
||||
max_tokens=2000,
|
||||
reasoning={
|
||||
"effort": "high",
|
||||
"summary": "detailed"
|
||||
}
|
||||
)
|
||||
|
||||
response = llm.invoke([HumanMessage(content="Solve this complex problem...")])
|
||||
```
|
||||
|
||||
</Tab>
|
||||
<Tab title="JavaScript">
|
||||
|
||||
```javascript
|
||||
import { ChatOpenAI } from "@langchain/openai";
|
||||
|
||||
const llm = new ChatOpenAI({
|
||||
model: "gpt-5",
|
||||
configuration: {
|
||||
baseURL: "http://localhost:8080/langchain"
|
||||
},
|
||||
openAIApiKey: "dummy-key",
|
||||
reasoning: {
|
||||
effort: "high",
|
||||
summary: "detailed"
|
||||
}
|
||||
});
|
||||
|
||||
const response = await llm.invoke("Solve this complex problem...");
|
||||
```
|
||||
|
||||
</Tab>
|
||||
</Tabs>
|
||||
|
||||
### Bedrock Models (Anthropic & Nova)
|
||||
|
||||
Both Anthropic Claude and Amazon Nova models support reasoning/thinking capabilities via Bedrock. Use `ChatBedrockConverse` with model-specific configuration formats.
|
||||
|
||||
<Tabs group="langchain-sdk">
|
||||
<Tab title="Python">
|
||||
|
||||
#### Anthropic Claude Models
|
||||
|
||||
```python
|
||||
from langchain_aws import ChatBedrockConverse
|
||||
from langchain_core.messages import HumanMessage
|
||||
|
||||
# Bedrock Claude with reasoning control
|
||||
llm = ChatBedrockConverse(
|
||||
model="us.anthropic.claude-opus-4-5-20251101-v1:0",
|
||||
region_name="dummy-region",
|
||||
endpoint_url="http://localhost:8080/langchain",
|
||||
aws_access_key_id="dummy-access-key",
|
||||
aws_secret_access_key="dummy-secret-key",
|
||||
max_tokens=2000,
|
||||
additional_model_request_fields={ # Anthropic format
|
||||
"reasoning_config": {
|
||||
"type": "enabled",
|
||||
"budget_tokens": 1500, # Control thinking token budget
|
||||
}
|
||||
}
|
||||
)
|
||||
|
||||
response = llm.invoke([HumanMessage(content="Reason through this problem...")])
|
||||
```
|
||||
|
||||
#### Amazon Nova Models
|
||||
|
||||
```python
|
||||
from langchain_aws import ChatBedrockConverse
|
||||
from langchain_core.messages import HumanMessage
|
||||
|
||||
# Bedrock Nova with reasoning control
|
||||
llm = ChatBedrockConverse(
|
||||
model="global.amazon.nova-2-lite-v1:0",
|
||||
region_name="dummy-region",
|
||||
endpoint_url="http://localhost:8080/langchain",
|
||||
aws_access_key_id="dummy-access-key",
|
||||
aws_secret_access_key="dummy-secret-key",
|
||||
max_tokens=2000,
|
||||
additional_model_request_fields={ # Nova format
|
||||
"reasoningConfig": {
|
||||
"type": "enabled",
|
||||
"maxReasoningEffort": "high", # "low" | "medium" | "high"
|
||||
}
|
||||
}
|
||||
)
|
||||
|
||||
response = llm.invoke([HumanMessage(content="Reason through this problem...")])
|
||||
```
|
||||
|
||||
</Tab>
|
||||
</Tabs>
|
||||
|
||||
<Note>
|
||||
**Model-Specific Configuration:**
|
||||
- **Anthropic Claude models** use `reasoning_config` (snake_case) with `budget_tokens` to control the token budget for reasoning
|
||||
- **Amazon Nova models** use `reasoningConfig` (camelCase) with `maxReasoningEffort` to control reasoning intensity ("low", "medium", "high")
|
||||
</Note>
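The difference between the two formats can be captured in one function that returns the right `additional_model_request_fields` value. This is a minimal sketch, not a Bifrost or LangChain API: `bedrock_reasoning_fields` is a hypothetical helper, and detecting the model family from a substring of the model ID is an assumption made for illustration.

```python
def bedrock_reasoning_fields(model_id: str, *, budget_tokens: int = 1500, effort: str = "high") -> dict:
    """Hypothetical helper: pick the reasoning config format from the model ID."""
    if "anthropic" in model_id:
        # Anthropic format: snake_case key with a thinking token budget.
        return {"reasoning_config": {"type": "enabled", "budget_tokens": budget_tokens}}
    if "amazon.nova" in model_id:
        # Nova format: camelCase key with an effort level.
        if effort not in {"low", "medium", "high"}:
            raise ValueError("effort must be 'low', 'medium', or 'high'")
        return {"reasoningConfig": {"type": "enabled", "maxReasoningEffort": effort}}
    raise ValueError(f"no known reasoning format for model: {model_id}")


print(bedrock_reasoning_fields("us.anthropic.claude-opus-4-5-20251101-v1:0"))
print(bedrock_reasoning_fields("global.amazon.nova-2-lite-v1:0", effort="medium"))
```

The result would be passed as `additional_model_request_fields=...` when constructing `ChatBedrockConverse`, as in the examples above.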
|
||||
|
||||
### Google/Vertex AI Models
|
||||
|
||||
For Google Gemini 2.5 models (Pro, Flash) and Gemini 3, use `ChatGoogleGenerativeAI` with the `thinking_budget` parameter:
|
||||
|
||||
<Tabs group="langchain-sdk">
|
||||
<Tab title="Python">
|
||||
|
||||
```python
|
||||
from langchain_google_genai import ChatGoogleGenerativeAI
|
||||
from langchain_core.messages import HumanMessage
|
||||
|
||||
# Gemini with thinking budget control
|
||||
llm = ChatGoogleGenerativeAI(
|
||||
model="gemini/gemini-2.5-flash", # or "vertex/gemini-2.5-flash"
|
||||
base_url="http://localhost:8080/langchain",
|
||||
api_key="dummy-key",
|
||||
max_tokens=4000,
|
||||
thinking_budget=1024, # 0=disable, -1=dynamic, >0=constrained token budget
|
||||
include_thoughts=True, # Include reasoning in response
|
||||
)
|
||||
|
||||
response = llm.invoke([HumanMessage(content="Reason through this problem...")])
|
||||
```
|
||||
</Tab>
|
||||
</Tabs>
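The three `thinking_budget` cases from the code comment above can be spelled out as follows. This is only an illustrative sketch: `describe_thinking_budget` is a hypothetical function, and treating values below `-1` as invalid is an assumption rather than documented behavior.

```python
def describe_thinking_budget(thinking_budget: int) -> str:
    """Hypothetical helper: explain what a thinking_budget value means."""
    if thinking_budget == 0:
        return "disabled"
    if thinking_budget == -1:
        return "dynamic"
    if thinking_budget > 0:
        return f"constrained to {thinking_budget} tokens"
    # Assumption: anything below -1 is not a meaningful setting.
    raise ValueError("thinking_budget must be -1, 0, or a positive integer")


for value in (0, -1, 1024):
    print(value, "->", describe_thinking_budget(value))
```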
|
||||
|
||||
<Warning>
|
||||
**Experimental Module:** `ChatGoogleGenerativeAI` is a recently released module that deprecates `ChatVertexAI`. It may have some issues or breaking changes. If you encounter problems, you can use `ChatAnthropic` with `model="gemini/..."` or `model="vertex/..."` as an alternative, which provides stable access to Gemini and Vertex AI models through Bifrost.
|
||||
</Warning>
|
||||
|
||||
---
|
||||
|
||||
## Embeddings
|
||||
|
||||
LangChain's `OpenAIEmbeddings` class can be used to generate embeddings through Bifrost:
|
||||
|
||||
```python
|
||||
from langchain_openai import OpenAIEmbeddings
|
||||
|
||||
# Create embeddings instance
|
||||
embeddings = OpenAIEmbeddings(
|
||||
model="text-embedding-3-small",
|
||||
base_url="http://localhost:8080/langchain",
|
||||
api_key="dummy-key"
|
||||
)
|
||||
|
||||
# Embed a single query
|
||||
query_embedding = embeddings.embed_query("What is machine learning?")
|
||||
|
||||
# Embed multiple documents
|
||||
doc_embeddings = embeddings.embed_documents([
|
||||
"Machine learning is a subset of AI",
|
||||
"Deep learning uses neural networks",
|
||||
"NLP helps computers understand text"
|
||||
])
|
||||
```
|
||||
|
||||
<Warning>
|
||||
**Provider Compatibility Limitation:** LangChain's `OpenAIEmbeddings` class converts text into arrays of integer token IDs before sending it to the API. While OpenAI's API accepts both text strings and integer arrays as input, other providers such as Cohere, Bedrock, and Gemini only accept text strings.
|
||||
|
||||
**This means `OpenAIEmbeddings` only works reliably with OpenAI embedding models.** Using it with other providers (e.g., `model="cohere/embed-v4.0"`) will fail because those providers cannot process int array inputs.
|
||||
</Warning>
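To make the difference concrete, the sketch below builds the two request-body shapes side by side: one with plain strings and one with integer arrays. It is illustrative only. `toy_tokenize` is a stand-in that maps characters to code points and is not the tokenizer LangChain or OpenAI actually uses, and `embedding_payload` is a hypothetical helper.

```python
import json


def toy_tokenize(text: str) -> list:
    """Stand-in tokenizer: NOT a real OpenAI tokenizer, it only produces integers."""
    return [ord(ch) for ch in text]


def embedding_payload(model: str, texts: list, as_token_ids: bool) -> dict:
    """Hypothetical helper: build an embeddings request body in either input style."""
    inputs = [toy_tokenize(t) for t in texts] if as_token_ids else list(texts)
    return {"model": model, "input": inputs}


# Plain strings: the shape that text-only providers can accept.
string_payload = embedding_payload("text-embedding-3-small", ["hi"], as_token_ids=False)
# Integer arrays: the shape that only works with providers accepting token IDs.
token_payload = embedding_payload("text-embedding-3-small", ["hi"], as_token_ids=True)

print(json.dumps(string_payload))
print(json.dumps(token_payload))
```

A provider that expects `input` to be a list of strings has no way to interpret the second payload, which is why the request fails.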
|
||||
|
||||
### Cross-Provider Embeddings
|
||||
|
||||
For embedding models from other providers (Cohere, Bedrock, Gemini, etc.), you can use `GoogleGenerativeAIEmbeddings` from the `langchain_google_genai` package. This module sends text strings directly and works across multiple providers:
|
||||
|
||||
```python
|
||||
from langchain_google_genai import GoogleGenerativeAIEmbeddings
|
||||
|
||||
# Works with any provider's embedding models
|
||||
embeddings = GoogleGenerativeAIEmbeddings(
|
||||
model="cohere/cohere-embed-v4.0", # or bedrock/..., gemini/..., etc.
|
||||
base_url="http://localhost:8080/langchain",
|
||||
api_key="dummy-key"
|
||||
)
|
||||
|
||||
query_embedding = embeddings.embed_query("What is machine learning?")
|
||||
doc_embeddings = embeddings.embed_documents([
|
||||
"Machine learning is a subset of AI",
|
||||
"Deep learning uses neural networks"
|
||||
])
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Supported Features
|
||||
|
||||
The Langchain integration supports all features that are available in both the Langchain SDK and Bifrost core functionality. Your existing Langchain chains and workflows work seamlessly with Bifrost's enterprise features. 😄
|
||||
|
||||
---
|
||||
|
||||
## Next Steps
|
||||
|
||||
- **[Governance Features](../features/governance)** - Virtual keys and team management
|
||||
- **[Semantic Caching](../features/semantic-caching)** - Intelligent response caching
|
||||
- **[Configuration](../quickstart/README)** - Provider setup and API key management
|
||||
180
docs/integrations/litellm-sdk.mdx
Normal file
@@ -0,0 +1,180 @@
|
||||
---
|
||||
title: "LiteLLM SDK"
|
||||
description: "Use Bifrost as a drop-in proxy for LiteLLM applications with zero code changes."
|
||||
icon: "train"
|
||||
---
|
||||
|
||||
Since LiteLLM already provides multi-provider abstraction, Bifrost adds enterprise features such as governance, semantic caching, MCP tools, and observability on top of your existing setup.
|
||||
|
||||
**Endpoint:** `/litellm`
|
||||
|
||||
<Warning>
|
||||
**Provider Compatibility:** This integration only works for AI providers that both LiteLLM and Bifrost support. If you're using a provider specific to LiteLLM that Bifrost doesn't support (or vice versa), those requests will fail.
|
||||
</Warning>
|
||||
---
|
||||
|
||||
## Setup
|
||||
|
||||
<Tabs group="litellm-sdk">
|
||||
<Tab title="Python">
|
||||
|
||||
```python {7}
|
||||
from litellm import completion
|
||||
|
||||
# Configure client to use Bifrost
|
||||
response = completion(
|
||||
model="gpt-4o-mini",
|
||||
messages=[{"role": "user", "content": "Hello!"}],
|
||||
base_url="http://localhost:8080/litellm" # Point to Bifrost
|
||||
)
|
||||
|
||||
print(response.choices[0].message.content)
|
||||
```
|
||||
|
||||
</Tab>
|
||||
</Tabs>
|
||||
|
||||
---
|
||||
|
||||
## Provider/Model Usage Examples
|
||||
|
||||
Your existing LiteLLM provider switching works unchanged through Bifrost:
|
||||
|
||||
<Tabs group="litellm-sdk">
|
||||
<Tab title="Python">
|
||||
|
||||
```python {4}
|
||||
from litellm import completion
|
||||
|
||||
# All your existing LiteLLM patterns work the same
|
||||
base_url = "http://localhost:8080/litellm"
|
||||
|
||||
# OpenAI models
|
||||
openai_response = completion(
|
||||
model="gpt-4o-mini",
|
||||
messages=[{"role": "user", "content": "Hello GPT!"}],
|
||||
base_url=base_url
|
||||
)
|
||||
|
||||
# Anthropic models
|
||||
anthropic_response = completion(
|
||||
model="claude-3-sonnet-20240229",
|
||||
messages=[{"role": "user", "content": "Hello Claude!"}],
|
||||
base_url=base_url
|
||||
)
|
||||
|
||||
# Google models
|
||||
google_response = completion(
|
||||
model="gemini/gemini-1.5-flash",
|
||||
messages=[{"role": "user", "content": "Hello Gemini!"}],
|
||||
base_url=base_url
|
||||
)
|
||||
|
||||
# Azure models
|
||||
azure_response = completion(
|
||||
model="azure/gpt-4o",
|
||||
messages=[{"role": "user", "content": "Hello Azure!"}],
|
||||
base_url=base_url
|
||||
)
|
||||
```
|
||||
|
||||
</Tab>
|
||||
</Tabs>
|
||||
|
||||
---
|
||||
|
||||
## Adding Custom Headers
|
||||
|
||||
Add Bifrost-specific headers for governance and tracking:
|
||||
|
||||
<Tabs group="litellm-sdk">
|
||||
<Tab title="Python">
|
||||
|
||||
```python
|
||||
from litellm import completion
|
||||
|
||||
# Add custom headers for Bifrost features
|
||||
response = completion(
|
||||
model="gpt-4o-mini",
|
||||
messages=[{"role": "user", "content": "Hello!"}],
|
||||
base_url="http://localhost:8080/litellm",
|
||||
extra_headers={
|
||||
"x-bf-vk": "your-virtual-key", # Virtual key for governance
|
||||
}
|
||||
)
|
||||
|
||||
print(response.choices[0].message.content)
|
||||
```
|
||||
|
||||
</Tab>
|
||||
</Tabs>
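
If every call needs the same Bifrost settings, the keyword arguments can be assembled once and reused. The following is a minimal sketch, assuming the `base_url` and `x-bf-vk` header shown above; `bifrost_kwargs` is a hypothetical helper for illustration, not part of LiteLLM or Bifrost:

```python
def bifrost_kwargs(base_url="http://localhost:8080/litellm", virtual_key=None):
    """Build keyword arguments that point a LiteLLM call at Bifrost.

    Hypothetical helper: it only assembles a dict and makes no network calls.
    """
    kwargs = {"base_url": base_url}
    if virtual_key:
        # x-bf-vk carries the virtual key used for governance
        kwargs["extra_headers"] = {"x-bf-vk": virtual_key}
    return kwargs


kwargs = bifrost_kwargs(virtual_key="your-virtual-key")
# Intended use (requires litellm and a running Bifrost instance):
# response = completion(model="gpt-4o-mini", messages=[...], **kwargs)
```

Keeping the settings in one place means a changed endpoint or virtual key is updated once rather than in every `completion()` call.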
|
||||
|
||||
---
|
||||
|
||||
## Using Direct Keys
|
||||
|
||||
Pass API keys directly to bypass Bifrost's key management. You can pass any provider's API key since Bifrost only looks for `Authorization` or `x-api-key` headers. This requires the **Allow Direct API keys** option to be enabled in Bifrost configuration.
|
||||
|
||||
> **Learn more:** See [Key Management](../features/keys-management#direct-key-bypass) for enabling direct API key usage.
|
||||
|
||||
<Tabs group="litellm-sdk">
|
||||
<Tab title="Python">
|
||||
|
||||
```python
|
||||
from litellm import completion
|
||||
|
||||
# Using OpenAI key directly
|
||||
openai_response = completion(
|
||||
model="gpt-4o-mini",
|
||||
messages=[{"role": "user", "content": "Hello GPT!"}],
|
||||
base_url="http://localhost:8080/litellm",
|
||||
extra_headers={
|
||||
"Authorization": "Bearer sk-your-openai-key"
|
||||
}
|
||||
)
|
||||
|
||||
# Using Anthropic key for Claude models
|
||||
anthropic_response = completion(
|
||||
model="claude-3-sonnet-20240229",
|
||||
messages=[{"role": "user", "content": "Hello Claude!"}],
|
||||
base_url="http://localhost:8080/litellm",
|
||||
extra_headers={
|
||||
"x-api-key": "sk-ant-your-anthropic-key"
|
||||
}
|
||||
)
|
||||
|
||||
# Using Azure with a direct Azure key
import os

# Read the deployment name once so the model and deployment_id stay consistent
deployment = os.getenv("AZURE_OPENAI_DEPLOYMENT", "my-azure-deployment")
model = f"azure/{deployment}"

azure_response = completion(
    model=model,
    messages=[{"role": "user", "content": "Hello from LiteLLM (Azure demo)!"}],
    base_url="http://localhost:8080/litellm",
    api_key=os.getenv("AZURE_API_KEY", "your-azure-api-key"),
    deployment_id=deployment,
    max_tokens=100,
    extra_headers={
        "x-bf-azure-endpoint": "https://your-resource.openai.azure.com",
    }
)
|
||||
```
|
||||
|
||||
</Tab>
|
||||
</Tabs>
|
||||
|
||||
---
|
||||
|
||||
## Supported Features
|
||||
|
||||
The LiteLLM integration supports all features that are available in both the LiteLLM SDK and Bifrost core functionality. Your existing LiteLLM code works seamlessly with Bifrost's enterprise features. 😄
|
||||
|
||||
---
|
||||
|
||||
## Next Steps
|
||||
|
||||
- **[Governance Features](../features/governance)** - Virtual keys and team management
- **[Semantic Caching](../features/semantic-caching)** - Intelligent response caching
- **[Configuration](../quickstart/README)** - Provider setup and API key management
|
||||
669
docs/integrations/openai-sdk/files-and-batch.mdx
Normal file
@@ -0,0 +1,669 @@
|
||||
---
title: "Files and Batch API"
description: "Upload files and create batch jobs for asynchronous processing using the OpenAI SDK through Bifrost across multiple providers."
tag: "Beta"
icon: "folder-open"
---
|
||||
|
||||
## Overview
|
||||
|
||||
Bifrost supports the OpenAI Files API and Batch API with **cross-provider routing**. This means you can use the familiar OpenAI SDK to manage files and batch jobs across multiple providers including OpenAI, Anthropic, Bedrock, and Gemini.
|
||||
|
||||
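The provider is specified per request: use `extra_body` for requests that send a body (such as file upload, batch creation, and batch cancellation) and `extra_query` for requests that do not (such as list, retrieve, delete, and download).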
|
||||
|
||||
---
|
||||
|
||||
## Client Setup
|
||||
|
||||
The base client setup is the same for all providers. The provider is specified per-request:
|
||||
|
||||
```python
|
||||
from openai import OpenAI
|
||||
|
||||
client = OpenAI(
|
||||
base_url="http://localhost:8080/openai",
|
||||
api_key="your-api-key" # Your actual API key
|
||||
)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Files API
|
||||
|
||||
### Upload a File
|
||||
|
||||
<Note>
|
||||
**Bedrock** requires S3 storage configuration. OpenAI and Gemini use their native file storage. Anthropic uses inline requests (no file upload).
|
||||
</Note>
|
||||
|
||||
<Tabs group="provider">
|
||||
<Tab title="OpenAI Provider">
|
||||
|
||||
```python
|
||||
from openai import OpenAI
|
||||
|
||||
client = OpenAI(
|
||||
base_url="http://localhost:8080/openai",
|
||||
api_key="your-openai-api-key"
|
||||
)
|
||||
|
||||
# Create JSONL content for OpenAI batch format
|
||||
jsonl_content = '''{"custom_id": "request-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 100}}
|
||||
{"custom_id": "request-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "How are you?"}], "max_tokens": 100}}'''
|
||||
|
||||
# Upload file (uses OpenAI's native file storage)
|
||||
response = client.files.create(
|
||||
file=("batch_input.jsonl", jsonl_content.encode(), "application/jsonl"),
|
||||
purpose="batch",
|
||||
extra_body={"provider": "openai"},
|
||||
)
|
||||
|
||||
print(f"Uploaded file ID: {response.id}")
|
||||
```
|
||||
|
||||
</Tab>
|
||||
<Tab title="Bedrock Provider">
|
||||
|
||||
For Bedrock, you need to provide S3 storage configuration:
|
||||
|
||||
```python
|
||||
from openai import OpenAI
|
||||
|
||||
client = OpenAI(
|
||||
base_url="http://localhost:8080/openai",
|
||||
api_key="your-api-key"
|
||||
)
|
||||
|
||||
# Create JSONL content using OpenAI-style format (Bifrost converts to Bedrock format internally)
|
||||
jsonl_content = '''{"custom_id": "request-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "anthropic.claude-3-sonnet-20240229-v1:0", "messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 100}}
|
||||
{"custom_id": "request-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "anthropic.claude-3-sonnet-20240229-v1:0", "messages": [{"role": "user", "content": "How are you?"}], "max_tokens": 100}}'''
|
||||
|
||||
# Upload file with S3 storage configuration
|
||||
response = client.files.create(
|
||||
file=("batch_input.jsonl", jsonl_content.encode(), "application/jsonl"),
|
||||
purpose="batch",
|
||||
extra_body={
|
||||
"provider": "bedrock",
|
||||
"storage_config": {
|
||||
"s3": {
|
||||
"bucket": "your-s3-bucket",
|
||||
"region": "us-west-2",
|
||||
"prefix": "bifrost-batch-output",
|
||||
},
|
||||
},
|
||||
},
|
||||
)
|
||||
|
||||
print(f"Uploaded file ID: {response.id}")
|
||||
```
|
||||
|
||||
</Tab>
|
||||
<Tab title="Anthropic Provider">
|
||||
|
||||
Anthropic uses inline requests for batching (no file upload needed). See the Batch API section below.
|
||||
|
||||
</Tab>
|
||||
<Tab title="Gemini Provider">
|
||||
|
||||
```python
|
||||
from openai import OpenAI
|
||||
|
||||
client = OpenAI(
|
||||
base_url="http://localhost:8080/openai",
|
||||
api_key="your-api-key"
|
||||
)
|
||||
|
||||
# Create JSONL content using OpenAI-style format (Bifrost converts to Gemini format internally)
|
||||
jsonl_content = '''{"custom_id": "request-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gemini-1.5-flash", "messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 100}}
|
||||
{"custom_id": "request-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gemini-1.5-flash", "messages": [{"role": "user", "content": "How are you?"}], "max_tokens": 100}}'''
|
||||
|
||||
# Upload file (uses Gemini's native file storage)
|
||||
response = client.files.create(
|
||||
file=("batch_input.jsonl", jsonl_content.encode(), "application/jsonl"),
|
||||
purpose="batch",
|
||||
extra_body={"provider": "gemini"},
|
||||
)
|
||||
|
||||
print(f"Uploaded file ID: {response.id}")
|
||||
```
|
||||
|
||||
</Tab>
|
||||
</Tabs>
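
Hand-written JSONL strings become error-prone once prompts contain quotes or newlines. The following is a minimal sketch, assuming the OpenAI-style line format used in the tabs above; `build_batch_jsonl` is a hypothetical helper, not part of the OpenAI SDK or Bifrost:

```python
import json


def build_batch_jsonl(prompts, model, max_tokens=100, id_prefix="request"):
    """Serialize prompts into OpenAI-style batch JSONL, one request per line.

    Hypothetical helper that mirrors the line format of the examples above.
    """
    lines = []
    for index, prompt in enumerate(prompts, start=1):
        request = {
            "custom_id": f"{id_prefix}-{index}",
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": model,
                "messages": [{"role": "user", "content": prompt}],
                "max_tokens": max_tokens,
            },
        }
        # json.dumps escapes quotes and newlines, so each request stays on one line
        lines.append(json.dumps(request))
    return "\n".join(lines)


jsonl_content = build_batch_jsonl(["Hello!", 'Say "hi"\nplease'], model="gpt-4o-mini")
```

The result can be passed as `jsonl_content.encode()` in the `client.files.create(...)` calls shown above.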
|
||||
|
||||
### List Files
|
||||
|
||||
```python
# List files for OpenAI or Gemini (no S3 config needed)
response = client.files.list(
    extra_query={"provider": "openai"}  # or "gemini"
)

for file in response.data:
    print(f"File ID: {file.id}, Name: {file.filename}")

# For Bedrock (requires S3 config)
response = client.files.list(
    extra_query={
        "provider": "bedrock",
        "storage_config": {
            "s3": {
                "bucket": "your-s3-bucket",
                "region": "us-west-2",
                "prefix": "bifrost-batch-output",
            },
        },
    }
)
```
|
||||
|
||||
### Retrieve File Metadata
|
||||
|
||||
```python
|
||||
# Retrieve file metadata (specify provider)
|
||||
file_id = "file-abc123"
|
||||
response = client.files.retrieve(
|
||||
file_id,
|
||||
extra_query={"provider": "bedrock"} # or "openai", "gemini"
|
||||
)
|
||||
|
||||
print(f"File ID: {response.id}")
|
||||
print(f"Filename: {response.filename}")
|
||||
print(f"Purpose: {response.purpose}")
|
||||
print(f"Bytes: {response.bytes}")
|
||||
```
|
||||
|
||||
### Delete a File
|
||||
|
||||
```python
|
||||
# Delete file (specify provider)
|
||||
file_id = "file-abc123"
|
||||
response = client.files.delete(
|
||||
file_id,
|
||||
extra_query={"provider": "bedrock"} # or "openai", "gemini"
|
||||
)
|
||||
|
||||
print(f"Deleted: {response.deleted}")
|
||||
```
|
||||
|
||||
### Download File Content
|
||||
|
||||
```python
# Download file content (specify provider)
file_id = "file-abc123"
response = client.files.content(
    file_id,
    extra_query={"provider": "bedrock"}  # or "openai", "gemini"
)

# Handle different response types
if hasattr(response, "read"):
    content = response.read()
elif hasattr(response, "content"):
    content = response.content
else:
    content = response

# Decode bytes to string if needed
if isinstance(content, bytes):
    content = content.decode("utf-8")

print(f"File content:\n{content}")
```
|
||||
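
Downloaded batch output is usually easier to work with once it is indexed by `custom_id`. The following is a sketch under one assumption: each non-empty line of the file is a JSON object with a `custom_id` field, as in OpenAI's batch output format. Whether every provider's output arrives in that shape through Bifrost is not confirmed here, and `index_batch_results` is a hypothetical helper:

```python
import json


def index_batch_results(content):
    """Map custom_id -> parsed record for each line of batch output JSONL.

    Assumes one JSON object per non-empty line, each with a `custom_id` key.
    """
    if isinstance(content, bytes):
        content = content.decode("utf-8")
    results = {}
    for line in content.split("\n"):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines and a trailing newline
        record = json.loads(line)
        results[record["custom_id"]] = record
    return results


# Illustrative sample only; real content comes from client.files.content(...)
sample = (
    b'{"custom_id": "request-1", "response": {"status_code": 200}}\n'
    b'{"custom_id": "request-2", "response": {"status_code": 200}}\n'
)
results = index_batch_results(sample)
```

Indexing by `custom_id` matters because batch results are not guaranteed to come back in the order the requests were submitted.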
|
||||
---
|
||||
|
||||
## Batch API
|
||||
|
||||
### Create a Batch
|
||||
|
||||
<Tabs group="provider">
|
||||
<Tab title="OpenAI Provider">
|
||||
|
||||
For native OpenAI batching:
|
||||
|
||||
```python
|
||||
from openai import OpenAI
|
||||
|
||||
client = OpenAI(
|
||||
base_url="http://localhost:8080/openai",
|
||||
api_key="your-openai-api-key"
|
||||
)
|
||||
|
||||
# First upload a file (see Files API section)
|
||||
# Then create batch using the file ID
|
||||
|
||||
batch = client.batches.create(
|
||||
input_file_id="file-abc123",
|
||||
endpoint="/v1/chat/completions",
|
||||
completion_window="24h",
|
||||
extra_body={"provider": "openai"},
|
||||
)
|
||||
|
||||
print(f"Batch ID: {batch.id}")
|
||||
print(f"Status: {batch.status}")
|
||||
```
|
||||
|
||||
</Tab>
|
||||
<Tab title="Bedrock Provider">
|
||||
|
||||
For Bedrock, you need to provide an output S3 URI:
|
||||
|
||||
```python
|
||||
from openai import OpenAI
|
||||
|
||||
client = OpenAI(
|
||||
base_url="http://localhost:8080/openai",
|
||||
api_key="your-api-key"
|
||||
)
|
||||
|
||||
# First upload a file with S3 config (see Files API section)
|
||||
# Then create batch using the file ID
|
||||
|
||||
batch = client.batches.create(
|
||||
input_file_id="file-abc123",
|
||||
endpoint="/v1/chat/completions",
|
||||
completion_window="24h",
|
||||
extra_body={
|
||||
"provider": "bedrock",
|
||||
"model": "anthropic.claude-3-sonnet-20240229-v1:0",
|
||||
"output_s3_uri": "s3://your-bucket/batch-output",
|
||||
},
|
||||
)
|
||||
|
||||
print(f"Batch ID: {batch.id}")
|
||||
print(f"Status: {batch.status}")
|
||||
```
|
||||
|
||||
</Tab>
|
||||
<Tab title="Anthropic Provider">
|
||||
|
||||
Anthropic supports inline requests (no file upload required):
|
||||
|
||||
```python
|
||||
from openai import OpenAI
|
||||
|
||||
client = OpenAI(
|
||||
base_url="http://localhost:8080/openai",
|
||||
api_key="your-anthropic-api-key"
|
||||
)
|
||||
|
||||
# Create inline requests for Anthropic
|
||||
requests = [
|
||||
{
|
||||
"custom_id": "request-1",
|
||||
"params": {
|
||||
"model": "claude-3-sonnet-20240229",
|
||||
"max_tokens": 100,
|
||||
"messages": [{"role": "user", "content": "Hello!"}]
|
||||
}
|
||||
},
|
||||
{
|
||||
"custom_id": "request-2",
|
||||
"params": {
|
||||
"model": "claude-3-sonnet-20240229",
|
||||
"max_tokens": 100,
|
||||
"messages": [{"role": "user", "content": "How are you?"}]
|
||||
}
|
||||
}
|
||||
]
|
||||
|
||||
# Create batch with inline requests (no file ID needed)
|
||||
batch = client.batches.create(
|
||||
input_file_id="", # Empty for inline requests
|
||||
endpoint="/v1/chat/completions",
|
||||
completion_window="24h",
|
||||
extra_body={
|
||||
"provider": "anthropic",
|
||||
"requests": requests,
|
||||
},
|
||||
)
|
||||
|
||||
print(f"Batch ID: {batch.id}")
|
||||
print(f"Status: {batch.status}")
|
||||
```
|
||||
|
||||
</Tab>
|
||||
<Tab title="Gemini Provider">
|
||||
|
||||
```python
|
||||
from openai import OpenAI
|
||||
|
||||
client = OpenAI(
|
||||
base_url="http://localhost:8080/openai",
|
||||
api_key="your-api-key"
|
||||
)
|
||||
|
||||
# First upload a file with Gemini format (see Files API section)
|
||||
# Then create batch using the file ID
|
||||
|
||||
batch = client.batches.create(
|
||||
input_file_id="file-abc123",
|
||||
endpoint="/v1/chat/completions",
|
||||
completion_window="24h",
|
||||
extra_body={
|
||||
"provider": "gemini",
|
||||
"model": "gemini-1.5-flash",
|
||||
},
|
||||
)
|
||||
|
||||
print(f"Batch ID: {batch.id}")
|
||||
print(f"Status: {batch.status}")
|
||||
```
|
||||
|
||||
</Tab>
|
||||
</Tabs>
|
||||
|
||||
### List Batches
|
||||
|
||||
```python
# List batches (specify provider)
response = client.batches.list(
    limit=10,
    extra_query={
        "provider": "bedrock",  # or "openai", "anthropic", "gemini"
        "model": "anthropic.claude-3-sonnet-20240229-v1:0",  # Required for bedrock
    }
)

for batch in response.data:
    print(f"Batch ID: {batch.id}, Status: {batch.status}")
```
|
||||
|
||||
### Retrieve Batch Status
|
||||
|
||||
```python
# Retrieve batch status (specify provider)
batch_id = "batch-abc123"
batch = client.batches.retrieve(
    batch_id,
    extra_query={"provider": "bedrock"}  # or "openai", "anthropic", "gemini"
)

print(f"Batch ID: {batch.id}")
print(f"Status: {batch.status}")

if batch.request_counts:
    print(f"Total: {batch.request_counts.total}")
    print(f"Completed: {batch.request_counts.completed}")
    print(f"Failed: {batch.request_counts.failed}")
```
|
||||
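
Batch jobs can take a while, so a single status check is rarely enough. The following is a minimal polling sketch with a timeout, assuming the terminal statuses that appear in this page's workflows (`completed`, `failed`, `expired`, `cancelled`, and `ended` for Anthropic). `wait_for_batch` is a hypothetical helper, and the demo uses a stubbed `retrieve` so it runs without a network connection:

```python
import time
from types import SimpleNamespace

# Statuses after which polling should stop (taken from the workflows on this page)
TERMINAL_STATUSES = {"completed", "failed", "expired", "cancelled", "ended"}


def wait_for_batch(retrieve, batch_id, provider, interval=5.0, timeout=600.0):
    """Poll `retrieve` until the batch reaches a terminal status or times out.

    `retrieve` is any callable shaped like client.batches.retrieve.
    """
    deadline = time.monotonic() + timeout
    while True:
        batch = retrieve(batch_id, extra_query={"provider": provider})
        if batch.status in TERMINAL_STATUSES:
            return batch
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"batch {batch_id} still {batch.status!r} after {timeout}s"
            )
        time.sleep(interval)


# Stubbed demo: a fake retrieve that walks through three statuses
_statuses = iter(["validating", "in_progress", "completed"])


def fake_retrieve(batch_id, extra_query):
    return SimpleNamespace(id=batch_id, status=next(_statuses))


final = wait_for_batch(fake_retrieve, "batch-abc123", "openai", interval=0)
print(final.status)
```

With a real client the call would look like `wait_for_batch(client.batches.retrieve, batch.id, "openai")`. Returning the final batch object leaves it to the caller to decide how to treat `failed` or `expired`.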
|
||||
### Cancel a Batch
|
||||
|
||||
```python
|
||||
# Cancel batch (specify provider)
|
||||
batch_id = "batch-abc123"
|
||||
batch = client.batches.cancel(
|
||||
batch_id,
|
||||
extra_body={"provider": "bedrock"} # or "openai", "anthropic", "gemini"
|
||||
)
|
||||
|
||||
print(f"Batch ID: {batch.id}")
|
||||
print(f"Status: {batch.status}") # "cancelling" or "cancelled"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## End-to-End Workflows
|
||||
|
||||
### OpenAI Batch Workflow
|
||||
|
||||
```python
import time
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/openai",
    api_key="your-openai-api-key"
)

# Configuration
provider = "openai"

# Step 1: Create OpenAI JSONL content
jsonl_content = '''{"custom_id": "req-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "What is 2+2?"}], "max_tokens": 100}}
{"custom_id": "req-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "What is the capital of France?"}], "max_tokens": 100}}'''

# Step 2: Upload file (uses OpenAI's native file storage)
print("Step 2: Uploading batch input file...")
uploaded_file = client.files.create(
    file=("batch_e2e.jsonl", jsonl_content.encode(), "application/jsonl"),
    purpose="batch",
    extra_body={"provider": provider},
)
print(f" Uploaded file: {uploaded_file.id}")

# Step 3: Create batch
print("Step 3: Creating batch job...")
batch = client.batches.create(
    input_file_id=uploaded_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
    extra_body={"provider": provider},
)
print(f" Created batch: {batch.id}, status: {batch.status}")

# Step 4: Poll for completion
print("Step 4: Polling batch status...")
for i in range(10):
    batch = client.batches.retrieve(batch.id, extra_query={"provider": provider})
    print(f" Poll {i+1}: status = {batch.status}")

    if batch.status in ["completed", "failed", "expired", "cancelled"]:
        break

    if batch.request_counts:
        print(f" Completed: {batch.request_counts.completed}/{batch.request_counts.total}")

    time.sleep(5)

# Report the final state instead of assuming success
if batch.status == "completed":
    print(f"\nSuccess! Batch {batch.id} completed.")
else:
    print(f"\nBatch {batch.id} stopped polling with status: {batch.status}")
```
|
||||
|
||||
### Bedrock Batch Workflow
|
||||
|
||||
```python
import time
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/openai",
    api_key="your-api-key"
)

# Configuration
provider = "bedrock"
s3_bucket = "your-s3-bucket"
s3_region = "us-west-2"
model = "anthropic.claude-3-sonnet-20240229-v1:0"

# Step 1: Create JSONL content using OpenAI-style format (Bifrost converts to Bedrock format internally)
jsonl_content = '''{"custom_id": "req-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "anthropic.claude-3-sonnet-20240229-v1:0", "messages": [{"role": "user", "content": "What is 2+2?"}], "max_tokens": 100}}
{"custom_id": "req-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "anthropic.claude-3-sonnet-20240229-v1:0", "messages": [{"role": "user", "content": "What is the capital of France?"}], "max_tokens": 100}}'''

# Step 2: Upload file
print("Step 2: Uploading batch input file...")
uploaded_file = client.files.create(
    file=("batch_e2e.jsonl", jsonl_content.encode(), "application/jsonl"),
    purpose="batch",
    extra_body={
        "provider": provider,
        "storage_config": {
            "s3": {"bucket": s3_bucket, "region": s3_region, "prefix": "batch-input"},
        },
    },
)
print(f" Uploaded file: {uploaded_file.id}")

# Step 3: Create batch
print("Step 3: Creating batch job...")
batch = client.batches.create(
    input_file_id=uploaded_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
    extra_body={
        "provider": provider,
        "model": model,
        "output_s3_uri": f"s3://{s3_bucket}/batch-output",
    },
)
print(f" Created batch: {batch.id}, status: {batch.status}")

# Step 4: Poll for completion
print("Step 4: Polling batch status...")
for i in range(10):
    batch = client.batches.retrieve(batch.id, extra_query={"provider": provider})
    print(f" Poll {i+1}: status = {batch.status}")

    if batch.status in ["completed", "failed", "expired", "cancelled"]:
        break

    if batch.request_counts:
        print(f" Completed: {batch.request_counts.completed}/{batch.request_counts.total}")

    time.sleep(5)

# Report the final state instead of assuming success
if batch.status == "completed":
    print(f"\nSuccess! Batch {batch.id} completed.")
else:
    print(f"\nBatch {batch.id} stopped polling with status: {batch.status}")
```
|
||||
|
||||
### Anthropic Inline Batch Workflow
|
||||
|
||||
```python
import time
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/openai",
    api_key="your-anthropic-api-key"
)

provider = "anthropic"

# Step 1: Create inline requests
print("Step 1: Creating inline requests...")
requests = [
    {
        "custom_id": "math-question",
        "params": {
            "model": "claude-3-sonnet-20240229",
            "max_tokens": 100,
            "messages": [{"role": "user", "content": "What is 15 * 7?"}]
        }
    },
    {
        "custom_id": "geography-question",
        "params": {
            "model": "claude-3-sonnet-20240229",
            "max_tokens": 100,
            "messages": [{"role": "user", "content": "What is the largest ocean?"}]
        }
    }
]
print(f" Created {len(requests)} inline requests")

# Step 2: Create batch
print("Step 2: Creating batch job...")
batch = client.batches.create(
    input_file_id="",  # Empty for inline requests
    endpoint="/v1/chat/completions",
    completion_window="24h",
    extra_body={"provider": provider, "requests": requests},
)
print(f" Created batch: {batch.id}, status: {batch.status}")

# Step 3: Poll for completion
print("Step 3: Polling batch status...")
for i in range(10):
    batch = client.batches.retrieve(batch.id, extra_query={"provider": provider})
    print(f" Poll {i+1}: status = {batch.status}")

    if batch.status in ["completed", "failed", "expired", "cancelled", "ended"]:
        break

    time.sleep(5)

# Report the final state instead of assuming success
if batch.status in ["completed", "ended"]:
    print(f"\nSuccess! Batch {batch.id} finished with status: {batch.status}")
else:
    print(f"\nBatch {batch.id} stopped polling with status: {batch.status}")
```
|
||||
|
||||
### Gemini Batch Workflow
|
||||
|
||||
```python
import time
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/openai",
    api_key="your-api-key"
)

# Configuration
provider = "gemini"
model = "gemini-1.5-flash"

# Step 1: Create JSONL content using OpenAI-style format (Bifrost converts to Gemini format internally)
jsonl_content = '''{"custom_id": "req-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gemini-1.5-flash", "messages": [{"role": "user", "content": "What is 2+2?"}], "max_tokens": 100}}
{"custom_id": "req-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gemini-1.5-flash", "messages": [{"role": "user", "content": "What is the capital of France?"}], "max_tokens": 100}}'''

# Step 2: Upload file (uses Gemini's native file storage)
print("Step 2: Uploading batch input file...")
uploaded_file = client.files.create(
    file=("batch_e2e.jsonl", jsonl_content.encode(), "application/jsonl"),
    purpose="batch",
    extra_body={"provider": provider},
)
print(f" Uploaded file: {uploaded_file.id}")

# Step 3: Create batch
print("Step 3: Creating batch job...")
batch = client.batches.create(
    input_file_id=uploaded_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
    extra_body={
        "provider": provider,
        "model": model,
    },
)
print(f" Created batch: {batch.id}, status: {batch.status}")

# Step 4: Poll for completion
print("Step 4: Polling batch status...")
for i in range(10):
    batch = client.batches.retrieve(batch.id, extra_query={"provider": provider})
    print(f" Poll {i+1}: status = {batch.status}")

    if batch.status in ["completed", "failed", "expired", "cancelled"]:
        break

    if batch.request_counts:
        print(f" Completed: {batch.request_counts.completed}/{batch.request_counts.total}")

    time.sleep(5)

# Report the final state instead of assuming success
if batch.status == "completed":
    print(f"\nSuccess! Batch {batch.id} completed.")
else:
    print(f"\nBatch {batch.id} stopped polling with status: {batch.status}")
```
|
||||
|
||||
---
|
||||
|
||||
## Provider-Specific Notes
|
||||
|
||||
| Provider | File Upload | Batch Creation | Extra Configuration |
|----------|-------------|----------------|---------------------|
| **OpenAI** | ✅ Native storage | ✅ File-based | None |
| **Bedrock** | ✅ S3-based | ✅ File-based | `storage_config`, `output_s3_uri` |
| **Anthropic** | ❌ Not supported | ✅ Inline requests | `requests` array in `extra_body` |
| **Gemini** | ✅ Native storage | ✅ File-based | `model` in `extra_body` |
|
||||
|
||||
<Note>
- **OpenAI** and **Gemini** use their native file storage - no S3 configuration needed
- **Bedrock** requires S3 storage configuration (`storage_config`, `output_s3_uri`)
- **Anthropic** does not support file-based batch operations - use inline requests instead
</Note>
|
||||
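
The per-provider requirements for batch creation can be captured in one place so that a missing field fails early, before a request is sent. The following is a sketch that follows the batch-creation examples on this page; `batch_extra_body` is a hypothetical helper, and the validation rules are an interpretation of those examples rather than Bifrost's own checks:

```python
def batch_extra_body(provider, model=None, output_s3_uri=None, requests=None):
    """Build the `extra_body` dict for client.batches.create per provider.

    Hypothetical helper; required fields mirror the examples on this page.
    """
    body = {"provider": provider}
    if provider == "bedrock":
        if not (model and output_s3_uri):
            raise ValueError("bedrock needs both model and output_s3_uri")
        body.update(model=model, output_s3_uri=output_s3_uri)
    elif provider == "anthropic":
        if not requests:
            raise ValueError("anthropic needs a non-empty inline requests list")
        body["requests"] = requests
    elif provider == "gemini":
        if not model:
            raise ValueError("gemini needs model")
        body["model"] = model
    elif provider != "openai":
        raise ValueError(f"unsupported provider: {provider}")
    return body


bedrock_body = batch_extra_body(
    "bedrock",
    model="anthropic.claude-3-sonnet-20240229-v1:0",
    output_s3_uri="s3://your-bucket/batch-output",
)
```

The returned dict is meant to be passed as `extra_body=` to `client.batches.create(...)`.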
|
||||
---
|
||||
|
||||
## Next Steps
|
||||
|
||||
- **[Overview](./overview)** - OpenAI SDK integration basics
- **[Configuration](../../quickstart/gateway/provider-configuration)** - Bifrost setup and configuration
- **[Core Features](../../features/)** - Governance, semantic caching, and more
|
||||
563
docs/integrations/openai-sdk/overview.mdx
Normal file
@@ -0,0 +1,563 @@
|
||||
---
title: "Overview"
description: "Use Bifrost as a drop-in replacement for OpenAI API with full compatibility and enhanced features."
icon: "book"
---
|
||||
|
||||
## Overview
|
||||
|
||||
Bifrost provides complete OpenAI API compatibility through protocol adaptation. The integration handles request transformation, response normalization, and error mapping between OpenAI's API specification and Bifrost's internal processing pipeline.
|
||||
|
||||
This integration enables you to utilize Bifrost's features like governance, load balancing, semantic caching, multi-provider support, and more, all while preserving your existing OpenAI SDK-based architecture.
|
||||
|
||||
**Endpoint:** `/openai`
|
||||
|
||||
---
|
||||
|
||||
## Setup
|
||||
|
||||
<Tabs group="openai-sdk">
|
||||
<Tab title="Python">
|
||||
|
||||
```python {5}
|
||||
import openai
|
||||
|
||||
# Configure client to use Bifrost
|
||||
client = openai.OpenAI(
|
||||
base_url="http://localhost:8080/openai",
|
||||
api_key="dummy-key" # Keys handled by Bifrost
|
||||
)
|
||||
|
||||
# Make requests as usual
|
||||
response = client.chat.completions.create(
|
||||
model="gpt-4o-mini",
|
||||
messages=[{"role": "user", "content": "Hello!"}]
|
||||
)
|
||||
|
||||
print(response.choices[0].message.content)
|
||||
```
|
||||
|
||||
</Tab>
|
||||
<Tab title="JavaScript">
|
||||
|
||||
```javascript {5}
|
||||
import OpenAI from "openai";
|
||||
|
||||
// Configure client to use Bifrost
|
||||
const openai = new OpenAI({
|
||||
baseURL: "http://localhost:8080/openai",
|
||||
apiKey: "dummy-key", // Keys handled by Bifrost
|
||||
});
|
||||
|
||||
// Make requests as usual
|
||||
const response = await openai.chat.completions.create({
|
||||
model: "gpt-4o-mini",
|
||||
messages: [{ role: "user", content: "Hello!" }],
|
||||
});
|
||||
|
||||
console.log(response.choices[0].message.content);
|
||||
```
|
||||
|
||||
</Tab>
|
||||
</Tabs>
|
||||
|
||||
---
|
||||
|
||||
## Provider/Model Usage Examples
|
||||
|
||||
Use multiple providers through the same OpenAI SDK format by prefixing model names with the provider:
|
||||
|
||||
<Tabs group="openai-sdk">
|
||||
<Tab title="Python">
|
||||
|
||||
```python
|
||||
import openai
|
||||
|
||||
client = openai.OpenAI(
|
||||
base_url="http://localhost:8080/openai",
|
||||
api_key="dummy-key"
|
||||
)
|
||||
|
||||
# OpenAI models (default)
|
||||
openai_response = client.chat.completions.create(
|
||||
model="gpt-4o-mini",
|
||||
messages=[{"role": "user", "content": "Hello from OpenAI!"}]
|
||||
)
|
||||
|
||||
# Anthropic models via OpenAI SDK format
|
||||
anthropic_response = client.chat.completions.create(
|
||||
model="anthropic/claude-3-sonnet-20240229",
|
||||
messages=[{"role": "user", "content": "Hello from Claude!"}]
|
||||
)
|
||||
|
||||
# Google Vertex models via OpenAI SDK format
|
||||
vertex_response = client.chat.completions.create(
|
||||
model="vertex/gemini-pro",
|
||||
messages=[{"role": "user", "content": "Hello from Gemini!"}]
|
||||
)
|
||||
|
||||
# Azure models
|
||||
azure_response = client.chat.completions.create(
|
||||
model="azure/gpt-4o",
|
||||
messages=[{"role": "user", "content": "Hello from Azure!"}]
|
||||
)
|
||||
|
||||
# Local Ollama models
|
||||
ollama_response = client.chat.completions.create(
|
||||
model="ollama/llama3.1:8b",
|
||||
messages=[{"role": "user", "content": "Hello from Ollama!"}]
|
||||
)
|
||||
```
|
||||
|
||||
</Tab>
|
||||
<Tab title="JavaScript">
|
||||
|
||||
```javascript
|
||||
import OpenAI from "openai";
|
||||
|
||||
const openai = new OpenAI({
|
||||
baseURL: "http://localhost:8080/openai",
|
||||
apiKey: "dummy-key",
|
||||
});
|
||||
|
||||
// OpenAI models (default)
|
||||
const openaiResponse = await openai.chat.completions.create({
|
||||
model: "gpt-4o-mini",
|
||||
messages: [{ role: "user", content: "Hello from OpenAI!" }],
|
||||
});
|
||||
|
||||
// Anthropic models via OpenAI SDK format
|
||||
const anthropicResponse = await openai.chat.completions.create({
|
||||
model: "anthropic/claude-3-sonnet-20240229",
|
||||
messages: [{ role: "user", content: "Hello from Claude!" }],
|
||||
});
|
||||
|
||||
// Google Vertex models via OpenAI SDK format
|
||||
const vertexResponse = await openai.chat.completions.create({
|
||||
model: "vertex/gemini-pro",
|
||||
messages: [{ role: "user", content: "Hello from Gemini!" }],
|
||||
});
|
||||
|
||||
// Azure models
|
||||
const azureResponse = await openai.chat.completions.create({
|
||||
model: "azure/gpt-4o",
|
||||
messages: [{ role: "user", content: "Hello from Azure!" }],
|
||||
});
|
||||
|
||||
// Local Ollama models
|
||||
const ollamaResponse = await openai.chat.completions.create({
|
||||
model: "ollama/llama3.1:8b",
|
||||
messages: [{ role: "user", content: "Hello from Ollama!" }],
|
||||
});
|
||||
```
|
||||
|
||||
</Tab>
|
||||
</Tabs>
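
The `provider/model` naming convention can be illustrated with a small parser. This is only a sketch of the convention as the examples above use it: it assumes the provider is everything before the first slash and that unprefixed names default to `openai` on this endpoint. `split_model` is a hypothetical helper and does not describe Bifrost's internal parsing:

```python
def split_model(name, default_provider="openai"):
    """Split a "provider/model" string on the first slash.

    Names without a slash are attributed to `default_provider`.
    """
    provider, sep, model = name.partition("/")
    if not sep:
        return default_provider, name
    return provider, model


pairs = [
    split_model("gpt-4o-mini"),
    split_model("anthropic/claude-3-sonnet-20240229"),
    split_model("ollama/llama3.1:8b"),
]
```

Splitting on the first slash only keeps model names that contain other punctuation, such as the colon in `llama3.1:8b`, intact.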
|
||||
|
||||
---
|
||||
|
||||
## Adding Custom Headers
|
||||
|
||||
Pass custom headers required by Bifrost plugins (like governance, telemetry, etc.):
|
||||
|
||||
<Tabs group="openai-sdk">
|
||||
<Tab title="Python">
|
||||
|
||||
```python
|
||||
import openai
|
||||
|
||||
client = openai.OpenAI(
|
||||
base_url="http://localhost:8080/openai",
|
||||
api_key="dummy-key",
|
||||
default_headers={
|
||||
"x-bf-vk": "vk_12345", # Virtual key for governance
|
||||
}
|
||||
)
|
||||
|
||||
response = client.chat.completions.create(
|
||||
model="gpt-4o-mini",
|
||||
messages=[{"role": "user", "content": "Hello with custom headers!"}]
|
||||
)
|
||||
```
|
||||
|
||||
</Tab>
|
||||
<Tab title="JavaScript">
|
||||
|
||||
```javascript
|
||||
import OpenAI from "openai";
|
||||
|
||||
const openai = new OpenAI({
|
||||
baseURL: "http://localhost:8080/openai",
|
||||
apiKey: "dummy-key",
|
||||
defaultHeaders: {
|
||||
"x-bf-vk": "vk_12345", // Virtual key for governance
|
||||
},
|
||||
});
|
||||
|
||||
const response = await openai.chat.completions.create({
|
||||
model: "gpt-4o-mini",
|
||||
messages: [{ role: "user", content: "Hello with custom headers!" }],
|
||||
});
|
||||
```
|
||||
|
||||
</Tab>
|
||||
</Tabs>
|
||||
|
||||
---
|
||||
|
||||
## Using Direct Keys
|
||||
|
||||
Pass API keys directly in requests to bypass Bifrost's load balancing. You can pass any provider's API key (OpenAI, Anthropic, Mistral, etc.), because Bifrost reads the key from the `Authorization` or `x-api-key` header; the Gemini example below sends its key in `x-goog-api-key`. This requires the **Allow Direct API keys** option to be enabled in the Bifrost configuration.
|
||||
|
||||
> **Learn more:** See [Key Management](../../features/keys-management#direct-key-bypass) for enabling direct API key usage.
|
||||
|
||||
<Tabs group="openai-sdk">
|
||||
<Tab title="Python">
|
||||
|
||||
```python
|
||||
import openai
|
||||
|
||||
# Using OpenAI's API key directly
|
||||
client_with_direct_key = openai.OpenAI(
|
||||
base_url="http://localhost:8080/openai",
|
||||
api_key="sk-your-openai-key" # OpenAI's API key works
|
||||
)
|
||||
|
||||
openai_response = client_with_direct_key.chat.completions.create(
|
||||
model="openai/gpt-4o-mini",
|
||||
messages=[{"role": "user", "content": "Hello from GPT!"}]
|
||||
)
|
||||
|
||||
# Or pass different provider keys per request
|
||||
client = openai.OpenAI(
|
||||
base_url="http://localhost:8080/openai",
|
||||
api_key="dummy-key"
|
||||
)
|
||||
|
||||
# Use OpenAI key for GPT models
|
||||
openai_response = client.chat.completions.create(
|
||||
model="gpt-4o-mini",
|
||||
messages=[{"role": "user", "content": "Hello GPT!"}],
|
||||
extra_headers={
|
||||
"Authorization": "Bearer sk-your-openai-key"
|
||||
}
|
||||
)
|
||||
|
||||
# Use Anthropic key for Claude models
|
||||
anthropic_response = client.chat.completions.create(
|
||||
model="anthropic/claude-3-sonnet-20240229",
|
||||
messages=[{"role": "user", "content": "Hello Claude!"}],
|
||||
extra_headers={
|
||||
"x-api-key": "sk-ant-your-anthropic-key"
|
||||
}
|
||||
)
|
||||
|
||||
# Use Gemini key for Gemini models
|
||||
gemini_response = client.chat.completions.create(
|
||||
model="gemini/gemini-2.5-flash",
|
||||
messages=[{"role": "user", "content": "Hello Gemini!"}],
|
||||
extra_headers={
|
||||
"x-goog-api-key": "sk-gemini-your-gemini-key"
|
||||
}
|
||||
)
|
||||
```
|
||||
|
||||
</Tab>
|
||||
<Tab title="JavaScript">
|
||||
|
||||
```javascript
|
||||
import OpenAI from "openai";
|
||||
|
||||
// Using OpenAI's API key directly
|
||||
const openaiWithDirectKey = new OpenAI({
|
||||
baseURL: "http://localhost:8080/openai",
|
||||
apiKey: "sk-your-openai-key", // OpenAI's API key works
|
||||
});
|
||||
|
||||
const openaiResponse = await openaiWithDirectKey.chat.completions.create({
|
||||
model: "openai/gpt-4o-mini",
|
||||
messages: [{ role: "user", content: "Hello from GPT!" }],
|
||||
});
|
||||
|
||||
// Or pass different provider keys per request
|
||||
const openai = new OpenAI({
|
||||
baseURL: "http://localhost:8080/openai",
|
||||
apiKey: "dummy-key",
|
||||
});
|
||||
|
||||
// Use OpenAI key for GPT models
|
||||
const gptResponseWithHeader = await openai.chat.completions.create({
|
||||
model: "gpt-4o-mini",
|
||||
messages: [{ role: "user", content: "Hello GPT!" }],
}, { // Per-request headers go in the second (options) argument
|
||||
headers: {
|
||||
"Authorization": "Bearer sk-your-openai-key",
|
||||
},
|
||||
});
|
||||
|
||||
// Use Anthropic key for Claude models
|
||||
const anthropicResponseWithHeader = await openai.chat.completions.create({
|
||||
model: "anthropic/claude-3-sonnet-20240229",
|
||||
messages: [{ role: "user", content: "Hello Claude!" }],
}, { // Per-request headers go in the second (options) argument
|
||||
headers: {
|
||||
"x-api-key": "sk-ant-your-anthropic-key",
|
||||
},
|
||||
});
|
||||
|
||||
// Use Gemini key for Gemini models
|
||||
const geminiResponseWithHeader = await openai.chat.completions.create({
|
||||
model: "gemini/gemini-2.5-flash",
|
||||
messages: [{ role: "user", content: "Hello Gemini!" }],
}, { // Per-request headers go in the second (options) argument
|
||||
headers: {
|
||||
"x-goog-api-key": "sk-gemini-your-gemini-key",
|
||||
},
|
||||
});
|
||||
```
|
||||
|
||||
</Tab>
|
||||
</Tabs>
|
||||
|
||||
For Azure, you can use the `AzureOpenAI` client and point it at the Bifrost integration endpoint. The `x-bf-azure-endpoint` header is required and specifies your Azure resource endpoint.
|
||||
|
||||
<Tabs group="openai-sdk">
|
||||
<Tab title="Python">
|
||||
|
||||
```python
|
||||
from openai import AzureOpenAI
|
||||
|
||||
azure_client = AzureOpenAI(
|
||||
api_key="your-azure-api-key",
|
||||
api_version="2024-02-01",
|
||||
azure_endpoint="http://localhost:8080/openai", # Point to Bifrost
|
||||
default_headers={
|
||||
"x-bf-azure-endpoint": "https://your-resource.openai.azure.com"
|
||||
}
|
||||
)
|
||||
|
||||
azure_response = azure_client.chat.completions.create(
|
||||
model="gpt-4-deployment", # Your deployment name
|
||||
messages=[{"role": "user", "content": "Hello from Azure!"}]
|
||||
)
|
||||
|
||||
print(azure_response.choices[0].message.content)
|
||||
```
|
||||
|
||||
</Tab>
|
||||
<Tab title="JavaScript">
|
||||
|
||||
```javascript
|
||||
import { AzureOpenAI } from "openai";
|
||||
|
||||
const azureClient = new AzureOpenAI({
|
||||
apiKey: "your-azure-api-key",
|
||||
apiVersion: "2024-02-01",
|
||||
baseURL: "http://localhost:8080/openai", // Point to Bifrost
|
||||
defaultHeaders: {
|
||||
"x-bf-azure-endpoint": "https://your-resource.openai.azure.com"
|
||||
}
|
||||
});
|
||||
|
||||
const azureResponse = await azureClient.chat.completions.create({
|
||||
model: "gpt-4-deployment", // Your deployment name
|
||||
messages: [{ role: "user", content: "Hello from Azure!" }],
|
||||
});
|
||||
|
||||
console.log(azureResponse.choices[0].message.content);
|
||||
```
|
||||
|
||||
</Tab>
|
||||
</Tabs>
|
||||
|
||||
---
|
||||
|
||||
## Async Inference
|
||||
|
||||
Submit inference requests asynchronously and poll for results later using the `x-bf-async` header. This is useful for long-running requests where you don't want to hold a connection open. See [Async Inference](../../features/async-inference) for full details.
|
||||
|
||||
<Note>
|
||||
Async inference requires a [Logs Store](../../features/observability/default) to be configured and is not compatible with streaming.
|
||||
</Note>
|
||||
|
||||
### Chat Completions
|
||||
|
||||
<Tabs group="openai-sdk">
|
||||
<Tab title="Python">
|
||||
|
||||
```python
|
||||
import openai
|
||||
import time
|
||||
|
||||
client = openai.OpenAI(
|
||||
base_url="http://localhost:8080/openai",
|
||||
api_key="dummy-key"
|
||||
)
|
||||
|
||||
# Submit async request
|
||||
initial = client.chat.completions.create(
|
||||
model="openai/gpt-4o-mini",
|
||||
messages=[{"role": "user", "content": "Tell me a short story."}],
|
||||
extra_headers={"x-bf-async": "true"}
|
||||
)
|
||||
|
||||
# If choices are present, the request completed synchronously
|
||||
if initial.choices:
|
||||
print(initial.choices[0].message.content)
|
||||
else:
|
||||
# Poll until completed
|
||||
while True:
|
||||
time.sleep(2)
|
||||
poll = client.chat.completions.create(
|
||||
model="openai/gpt-4o-mini",
|
||||
messages=[{"role": "user", "content": "Tell me a short story."}],
|
||||
extra_headers={"x-bf-async-id": initial.id}
|
||||
)
|
||||
if poll.choices:
|
||||
print(poll.choices[0].message.content)
|
||||
break
|
||||
```
|
||||
|
||||
</Tab>
|
||||
<Tab title="JavaScript">
|
||||
|
||||
```javascript
|
||||
import OpenAI from "openai";
|
||||
|
||||
const openai = new OpenAI({
|
||||
baseURL: "http://localhost:8080/openai",
|
||||
apiKey: "dummy-key",
|
||||
});
|
||||
|
||||
// Submit async request
|
||||
const initial = await openai.chat.completions.create(
|
||||
{
|
||||
model: "openai/gpt-4o-mini",
|
||||
messages: [{ role: "user", content: "Tell me a short story." }],
|
||||
},
|
||||
{ headers: { "x-bf-async": "true" } }
|
||||
);
|
||||
|
||||
// If choices are present, the request completed synchronously
|
||||
if (initial.choices?.length > 0) {
|
||||
console.log(initial.choices[0].message.content);
|
||||
} else {
|
||||
// Poll until completed
|
||||
while (true) {
|
||||
await new Promise((r) => setTimeout(r, 2000));
|
||||
const poll = await openai.chat.completions.create(
|
||||
{
|
||||
model: "openai/gpt-4o-mini",
|
||||
messages: [{ role: "user", content: "Tell me a short story." }],
|
||||
},
|
||||
{ headers: { "x-bf-async-id": initial.id } }
|
||||
);
|
||||
if (poll.choices?.length > 0) {
|
||||
console.log(poll.choices[0].message.content);
|
||||
break;
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
</Tab>
|
||||
</Tabs>
|
||||
|
||||
### Responses API
|
||||
|
||||
<Tabs group="openai-sdk">
|
||||
<Tab title="Python">
|
||||
|
||||
```python
|
||||
import openai
|
||||
import time
|
||||
|
||||
client = openai.OpenAI(
|
||||
base_url="http://localhost:8080/openai",
|
||||
api_key="dummy-key"
|
||||
)
|
||||
|
||||
# Submit async request
|
||||
initial = client.responses.create(
|
||||
model="openai/gpt-4o-mini",
|
||||
input="Tell me a short story.",
|
||||
extra_headers={"x-bf-async": "true"}
|
||||
)
|
||||
|
||||
# If status is "completed", the request completed synchronously
|
||||
if initial.status == "completed":
|
||||
print(initial.output_text)
|
||||
else:
|
||||
# Poll until completed
|
||||
while True:
|
||||
time.sleep(2)
|
||||
poll = client.responses.create(
|
||||
model="openai/gpt-4o-mini",
|
||||
input="Tell me a short story.",
|
||||
extra_headers={"x-bf-async-id": initial.id}
|
||||
)
|
||||
if poll.status == "completed":
|
||||
print(poll.output_text)
|
||||
break
|
||||
```
|
||||
|
||||
</Tab>
|
||||
<Tab title="JavaScript">
|
||||
|
||||
```javascript
|
||||
import OpenAI from "openai";
|
||||
|
||||
const openai = new OpenAI({
|
||||
baseURL: "http://localhost:8080/openai",
|
||||
apiKey: "dummy-key",
|
||||
});
|
||||
|
||||
// Submit async request
|
||||
const initial = await openai.responses.create(
|
||||
{ model: "openai/gpt-4o-mini", input: "Tell me a short story." },
|
||||
{ headers: { "x-bf-async": "true" } }
|
||||
);
|
||||
|
||||
// If status is "completed", the request completed synchronously
|
||||
if (initial.status === "completed") {
|
||||
console.log(initial.output_text);
|
||||
} else {
|
||||
// Poll until completed
|
||||
while (true) {
|
||||
await new Promise((r) => setTimeout(r, 2000));
|
||||
const poll = await openai.responses.create(
|
||||
{ model: "openai/gpt-4o-mini", input: "Tell me a short story." },
|
||||
{ headers: { "x-bf-async-id": initial.id } }
|
||||
);
|
||||
if (poll.status === "completed") {
|
||||
console.log(poll.output_text);
|
||||
break;
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
</Tab>
|
||||
</Tabs>
|
||||
|
||||
### Async Headers
|
||||
|
||||
| Header | Description |
|
||||
|---|---|
|
||||
| `x-bf-async: true` | Submit the request as an async job. Returns immediately with a job ID. |
|
||||
| `x-bf-async-id: <job-id>` | Poll for results of a previously submitted async job. |
|
||||
| `x-bf-async-job-result-ttl: <seconds>` | Override the default result TTL (default: 3600s). |
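
The polling loops in the examples above run until the job completes, so a job that never finishes would block forever. The following is a minimal sketch of a bounded variant. `poll_until` and `wait_for_chat_job` are hypothetical helper names (not part of Bifrost or the OpenAI SDK), and the 2-second interval and 60-second timeout are assumed defaults.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def poll_until(
    fetch: Callable[[], T],
    is_done: Callable[[T], bool],
    interval: float = 2.0,
    timeout: float = 60.0,
) -> T:
    """Call fetch() every `interval` seconds until is_done(result) is true.

    Raises TimeoutError once another wait would exceed `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = fetch()
        if is_done(result):
            return result
        if time.monotonic() + interval > deadline:
            raise TimeoutError(f"job not finished after {timeout} seconds")
        time.sleep(interval)


def wait_for_chat_job(client, job_id, model, messages, timeout=60.0):
    """Poll a chat completion job that was submitted with `x-bf-async: true`.

    `client` is assumed to be an `openai.OpenAI` instance pointed at Bifrost.
    Completion is detected the same way as in the examples above: the
    response has a non-empty `choices` list.
    """
    return poll_until(
        fetch=lambda: client.chat.completions.create(
            model=model,
            messages=messages,
            extra_headers={"x-bf-async-id": job_id},
        ),
        is_done=lambda response: bool(response.choices),
        timeout=timeout,
    )
```

When submitting the job, the result TTL can be overridden in the same `extra_headers` dict, for example `{"x-bf-async": "true", "x-bf-async-job-result-ttl": "600"}`; the 600-second value is only an illustration.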
|
||||
|
||||
---
|
||||
|
||||
## Supported Features
|
||||
|
||||
The OpenAI integration supports any feature that both the OpenAI SDK and Bifrost core support.
|
||||
|
||||
---
|
||||
|
||||
## Next Steps
|
||||
|
||||
- **[Files and Batch API](./files-and-batch)** - File uploads and batch processing
|
||||
- **[Anthropic SDK](../anthropic-sdk/overview)** - Claude integration patterns
|
||||
- **[Google GenAI SDK](../genai-sdk)** - Gemini integration patterns
|
||||
- **[Configuration](../../quickstart/README)** - Bifrost setup and configuration
|
||||
- **[Core Features](../../features/)** - Advanced Bifrost capabilities
|
||||
|
||||
298
docs/integrations/passthrough.mdx
Normal file
@@ -0,0 +1,298 @@
|
||||
---
|
||||
title: "Passthrough"
|
||||
description: "Forward provider-native requests through Bifrost with full core pipeline processing, including logs and observability."
|
||||
icon: "route"
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
Passthrough integrations let you call provider-native API paths and payloads through Bifrost without route-level request/response conversion.
|
||||
|
||||
When you use passthrough endpoints, the request still flows through Bifrost core logic. You keep Bifrost features such as logging and observability while sending provider-native paths and bodies.
|
||||
|
||||
---
|
||||
|
||||
## Endpoints
|
||||
|
||||
- `/openai_passthrough`
|
||||
Default provider: `openai`
|
||||
- `/anthropic_passthrough`
|
||||
Default provider: `anthropic`
|
||||
- `/azure_passthrough`
|
||||
Default provider: `azure`
|
||||
- `/genai_passthrough`
|
||||
Default provider: `gemini` (with automatic Vertex detection for clients configured to use Vertex)
|
||||
|
||||
---
|
||||
|
||||
## How It Works
|
||||
|
||||
1. Send your request to a passthrough endpoint (OpenAI, Anthropic, Azure, or GenAI passthrough).
|
||||
2. The integration strips the passthrough prefix and forwards the remaining provider-native path/body.
|
||||
3. Bifrost handles provider execution through core inference and plugin pipelines.
|
||||
4. Response status, headers, and body are returned as passthrough output (for both stream and non-stream requests).
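
Step 2 above can be pictured with a small sketch. This is an illustration only, not Bifrost's implementation: `PASSTHROUGH_PREFIXES` and `split_passthrough_path` are hypothetical names, and the mapping simply restates the endpoints and default providers listed under "Endpoints".

```python
# Passthrough prefix -> default provider, as listed in the Endpoints section.
PASSTHROUGH_PREFIXES = {
    "/openai_passthrough": "openai",
    "/anthropic_passthrough": "anthropic",
    "/azure_passthrough": "azure",
    "/genai_passthrough": "gemini",
}


def split_passthrough_path(path: str) -> tuple[str, str]:
    """Return (default_provider, provider_native_path) for a request path.

    The passthrough prefix is stripped and the remainder is what would be
    forwarded to the provider unchanged.
    """
    for prefix, provider in PASSTHROUGH_PREFIXES.items():
        if path == prefix or path.startswith(prefix + "/"):
            return provider, path[len(prefix):] or "/"
    raise ValueError(f"not a passthrough path: {path}")
```

For example, `/openai_passthrough/v1/chat/completions` maps to provider `openai` with the native path `/v1/chat/completions`.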
|
||||
|
||||
---
|
||||
|
||||
## Provider Selection Rules
|
||||
|
||||
### OpenAI Passthrough
|
||||
|
||||
- Uses `openai` as the default provider.
|
||||
|
||||
### Anthropic Passthrough
|
||||
|
||||
- Uses `anthropic` as the default provider.
|
||||
|
||||
### Azure Passthrough
|
||||
|
||||
- Uses `azure` as the default provider.
|
||||
- Requires an Azure key with `endpoint` configured. `api-version` is injected automatically:
|
||||
- **Key config `api_version`** takes priority (consistent with how auth is handled).
|
||||
- Falls back to any `api-version` the client supplied in the query string.
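
The `api-version` priority described above can be sketched as follows. This is a hedged illustration of the rule, not Bifrost's code; `resolve_api_version` is a hypothetical helper name.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def resolve_api_version(key_config_version: Optional[str], request_url: str) -> Optional[str]:
    """Pick the api-version to send to Azure.

    The key config value wins when it is set; otherwise the value from the
    client's query string is used, if any.
    """
    if key_config_version:
        return key_config_version
    query = parse_qs(urlsplit(request_url).query)
    values = query.get("api-version")
    return values[0] if values else None
```

With a key config value of `2024-10-21`, a client request carrying `?api-version=2024-02-01` would still be sent with `2024-10-21`.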
|
||||
|
||||
### GenAI Passthrough
|
||||
|
||||
- Uses `gemini` by default.
|
||||
- Automatically switches to `vertex` when Vertex patterns are detected, such as:
|
||||
- URL path containing `/projects/{PROJECT_ID}/locations/{LOCATION}/`
|
||||
- Request body `model` containing a Vertex resource path
|
||||
- OAuth token pattern typically used for Vertex (`Bearer ya29...`)
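
The three Vertex signals listed above can be combined into one check. The sketch below is an assumption about how such detection could look, not Bifrost's actual logic; `detect_genai_provider` and the regular expression are hypothetical.

```python
import re

# Matches "projects/{PROJECT_ID}/locations/{LOCATION}/" anywhere in a string.
VERTEX_RESOURCE = re.compile(r"projects/[^/]+/locations/[^/]+/")


def detect_genai_provider(path: str, body_model: str = "", authorization: str = "") -> str:
    """Return "vertex" when any Vertex pattern is present, else "gemini"."""
    if VERTEX_RESOURCE.search(path):
        return "vertex"  # URL path contains a Vertex resource path
    if VERTEX_RESOURCE.search(body_model):
        return "vertex"  # request body `model` is a Vertex resource path
    if authorization.startswith("Bearer ya29"):
        return "vertex"  # OAuth token pattern typically used for Vertex
    return "gemini"
```

A plain `/v1beta/models/gemini-2.5-flash:generateContent` request with an `x-goog-api-key` header matches none of the three checks and stays on `gemini`.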
|
||||
|
||||
---
|
||||
|
||||
## Usage Examples
|
||||
|
||||
### OpenAI Passthrough
|
||||
|
||||
<Tabs group="openai-passthrough">
|
||||
<Tab title="Python SDK">
|
||||
|
||||
```python
|
||||
import openai
|
||||
|
||||
client = openai.OpenAI(
|
||||
base_url="http://localhost:8080/openai_passthrough/v1",
|
||||
api_key="dummy-key"
|
||||
)
|
||||
|
||||
response = client.chat.completions.create(
|
||||
model="gpt-4o-mini",
|
||||
messages=[{"role": "user", "content": "hello from passthrough"}]
|
||||
)
|
||||
|
||||
print(response.choices[0].message.content)
|
||||
```
|
||||
|
||||
</Tab>
|
||||
<Tab title="cURL">
|
||||
|
||||
```bash
|
||||
curl -X POST "http://localhost:8080/openai_passthrough/v1/chat/completions" \
|
||||
-H "content-type: application/json" \
|
||||
-H "authorization: Bearer sk-your-openai-key" \
|
||||
-d '{
|
||||
"model": "gpt-4o-mini",
|
||||
"messages": [{"role":"user","content":"hello from passthrough"}]
|
||||
}'
|
||||
```
|
||||
|
||||
</Tab>
|
||||
</Tabs>
|
||||
|
||||
### Anthropic Passthrough
|
||||
|
||||
<Tabs group="anthropic-passthrough">
|
||||
<Tab title="Python SDK">
|
||||
|
||||
```python
|
||||
import anthropic
|
||||
|
||||
client = anthropic.Anthropic(
|
||||
base_url="http://localhost:8080/anthropic_passthrough",
|
||||
api_key="dummy-key"
|
||||
)
|
||||
|
||||
response = client.messages.create(
|
||||
model="claude-sonnet-4-20250514",
|
||||
max_tokens=1024,
|
||||
messages=[{"role": "user", "content": "hello from passthrough"}]
|
||||
)
|
||||
|
||||
print(response.content[0].text)
|
||||
```
|
||||
|
||||
</Tab>
|
||||
<Tab title="cURL">
|
||||
|
||||
```bash
|
||||
curl -X POST "http://localhost:8080/anthropic_passthrough/v1/messages" \
|
||||
-H "content-type: application/json" \
|
||||
-H "x-api-key: your-anthropic-key" \
|
||||
-H "anthropic-version: 2023-06-01" \
|
||||
-d '{
|
||||
"model": "claude-sonnet-4-20250514",
|
||||
"max_tokens": 1024,
|
||||
"messages": [{"role":"user","content":"hello from passthrough"}]
|
||||
}'
|
||||
```
|
||||
|
||||
</Tab>
|
||||
</Tabs>
|
||||
|
||||
### Azure Passthrough
|
||||
|
||||
<Tabs group="azure-passthrough">
|
||||
<Tab title="Azure OpenAI SDK">
|
||||
|
||||
```python
|
||||
from openai import AzureOpenAI
|
||||
|
||||
client = AzureOpenAI(
|
||||
azure_endpoint="http://localhost:8080/azure_passthrough",
|
||||
api_key="dummy-key",
|
||||
api_version="2024-10-21", # overridden by key config api_version if set
|
||||
)
|
||||
|
||||
response = client.chat.completions.create(
|
||||
model="gpt-4o", # your Azure deployment name
|
||||
messages=[{"role": "user", "content": "hello from azure passthrough"}]
|
||||
)
|
||||
|
||||
print(response.choices[0].message.content)
|
||||
```
|
||||
|
||||
</Tab>
|
||||
<Tab title="OpenAI SDK">
|
||||
|
||||
```python
|
||||
from openai import OpenAI
|
||||
|
||||
client = OpenAI(
|
||||
base_url="http://localhost:8080/azure_passthrough/openai/v1/",
|
||||
api_key="dummy-key",
|
||||
)
|
||||
|
||||
response = client.responses.create(
|
||||
model="gpt-4.1", # your Azure deployment name
|
||||
input="hello from azure passthrough",
|
||||
)
|
||||
|
||||
print(response.output_text)
|
||||
```
|
||||
|
||||
</Tab>
|
||||
<Tab title="Anthropic SDK (Anthropic on Azure)">
|
||||
|
||||
```python
|
||||
import anthropic
|
||||
|
||||
client = anthropic.Anthropic(
|
||||
base_url="http://localhost:8080/azure_passthrough",
|
||||
api_key="dummy-key",
|
||||
)
|
||||
|
||||
response = client.messages.create(
|
||||
model="claude-sonnet-4-20250514",
|
||||
max_tokens=1024,
|
||||
messages=[{"role": "user", "content": "hello from azure passthrough"}]
|
||||
)
|
||||
|
||||
print(response.content[0].text)
|
||||
```
|
||||
|
||||
</Tab>
|
||||
<Tab title="cURL">
|
||||
|
||||
```bash
|
||||
curl -X POST "http://localhost:8080/azure_passthrough/openai/deployments/gpt-4o/chat/completions" \
|
||||
-H "content-type: application/json" \
|
||||
-d '{
|
||||
"messages": [{"role": "user", "content": "hello from azure passthrough"}]
|
||||
}'
|
||||
```
|
||||
|
||||
</Tab>
|
||||
</Tabs>
|
||||
|
||||
### GenAI Passthrough (Gemini)
|
||||
|
||||
<Tabs group="genai-passthrough">
|
||||
<Tab title="Python SDK">
|
||||
|
||||
```python
|
||||
from google import genai
|
||||
from google.genai.types import HttpOptions
|
||||
|
||||
client = genai.Client(
|
||||
api_key="dummy-key",
|
||||
http_options=HttpOptions(base_url="http://localhost:8080/genai_passthrough")
|
||||
)
|
||||
|
||||
response = client.models.generate_content(
|
||||
model="gemini-2.5-flash",
|
||||
contents="hello from passthrough"
|
||||
)
|
||||
|
||||
print(response.text)
|
||||
```
|
||||
|
||||
</Tab>
|
||||
<Tab title="cURL">
|
||||
|
||||
```bash
|
||||
curl -X POST "http://localhost:8080/genai_passthrough/v1beta/models/gemini-2.5-flash:generateContent" \
|
||||
-H "content-type: application/json" \
|
||||
-H "x-goog-api-key: your-gemini-key" \
|
||||
-d '{
|
||||
"contents":[{"parts":[{"text":"hello from passthrough"}]}]
|
||||
}'
|
||||
```
|
||||
|
||||
</Tab>
|
||||
</Tabs>
|
||||
|
||||
### GenAI Passthrough (Vertex-style request)
|
||||
|
||||
<Tabs group="vertex-passthrough">
|
||||
<Tab title="Python SDK">
|
||||
|
||||
```python
|
||||
from google import genai
|
||||
from google.genai.types import HttpOptions
|
||||
|
||||
client = genai.Client(
|
||||
vertexai=True,
|
||||
api_key="dummy-key",
|
||||
http_options=HttpOptions(base_url="http://localhost:8080/genai_passthrough")
|
||||
)
|
||||
|
||||
response = client.models.generate_content(
|
||||
model="gemini-2.5-flash",
|
||||
contents="hello from vertex passthrough"
|
||||
)
|
||||
|
||||
print(response.text)
|
||||
```
|
||||
|
||||
</Tab>
|
||||
<Tab title="cURL">
|
||||
|
||||
```bash
|
||||
curl -X POST "http://localhost:8080/genai_passthrough/v1/projects/my-project/locations/us-central1/publishers/google/models/gemini-2.5-flash:generateContent" \
|
||||
-H "content-type: application/json" \
|
||||
-H "authorization: Bearer ya29.your-vertex-token" \
|
||||
-d '{
|
||||
"contents":[{"parts":[{"text":"hello from vertex passthrough"}]}]
|
||||
}'
|
||||
```
|
||||
|
||||
</Tab>
|
||||
</Tabs>
|
||||
|
||||
---
|
||||
|
||||
## Notes
|
||||
|
||||
- Use passthrough when you need a provider endpoint that is not directly supported by Bifrost integration routes yet.
|
||||
- For Azure passthrough, auth headers (`api-key`, `x-api-key`, OAuth token) are always sourced from the Bifrost key config and never forwarded from the client request.
|
||||
409
docs/integrations/pydanticai-sdk.mdx
Normal file
@@ -0,0 +1,409 @@
|
||||
---
|
||||
title: "Pydantic AI SDK"
|
||||
description: "Use Bifrost as a drop-in proxy for Pydantic AI agents with zero code changes."
|
||||
icon: "triangle"
|
||||
---
|
||||
|
||||
Pydantic AI is a Python agent framework that brings FastAPI-like ergonomics to GenAI development. Because Pydantic AI uses standard provider SDKs under the hood, Bifrost can add enterprise features such as governance, semantic caching, MCP tools, and observability on top of your existing agent setup.
|
||||
|
||||
**Endpoint:** `/pydanticai`
|
||||
|
||||
<Warning>
|
||||
**Provider Compatibility:** This integration only works for AI providers that both Pydantic AI and Bifrost support. Currently supported: OpenAI, Anthropic, and Google Gemini.
|
||||
</Warning>
|
||||
---
|
||||
|
||||
## Setup
|
||||
|
||||
<Tabs group="pydanticai-sdk">
|
||||
<Tab title="Python">
|
||||
|
||||
```python {7-8}
|
||||
from pydantic_ai import Agent
|
||||
from pydantic_ai.models.openai import OpenAIChatModel
|
||||
from pydantic_ai.providers.openai import OpenAIProvider
|
||||
|
||||
# Configure provider to use Bifrost
|
||||
provider = OpenAIProvider(
|
||||
base_url="http://localhost:8080/pydanticai/v1", # Point to Bifrost
|
||||
api_key="dummy-key"  # Keys are managed by Bifrost; a virtual key also works here
|
||||
)
|
||||
model = OpenAIChatModel("gpt-4o-mini", provider=provider)
|
||||
|
||||
# Create agent with Bifrost-routed model
|
||||
agent = Agent(model, instructions="Be concise and helpful.")
|
||||
|
||||
result = agent.run_sync("Hello! How are you?")
|
||||
print(result.output)
|
||||
```
|
||||
|
||||
</Tab>
|
||||
</Tabs>
|
||||
|
||||
---
|
||||
|
||||
## Provider/Model Usage Examples
|
||||
|
||||
Your existing Pydantic AI provider switching works unchanged through Bifrost:
|
||||
|
||||
<Tabs group="pydanticai-sdk">
|
||||
<Tab title="Python">
|
||||
|
||||
```python {7,10,14}
|
||||
from pydantic_ai import Agent
|
||||
from pydantic_ai.models.openai import OpenAIChatModel
|
||||
from pydantic_ai.models.anthropic import AnthropicModel
|
||||
from pydantic_ai.models.google import GoogleModel
|
||||
from pydantic_ai.providers.openai import OpenAIProvider
|
||||
from pydantic_ai.providers.anthropic import AnthropicProvider
|
||||
from pydantic_ai.providers.google import GoogleProvider
|
||||
|
||||
base_url = "http://localhost:8080/pydanticai"
|
||||
|
||||
# OpenAI models via Pydantic AI
|
||||
openai_provider = OpenAIProvider(base_url=f"{base_url}/v1")
|
||||
openai_model = OpenAIChatModel("gpt-4o-mini", provider=openai_provider)
|
||||
openai_agent = Agent(openai_model)
|
||||
|
||||
# Anthropic models via Pydantic AI
|
||||
# Note: Anthropic SDK adds /v1 internally, so we don't append it here
|
||||
anthropic_provider = AnthropicProvider(base_url=base_url)
|
||||
anthropic_model = AnthropicModel("claude-3-haiku-20240307", provider=anthropic_provider)
|
||||
anthropic_agent = Agent(anthropic_model)
|
||||
|
||||
# Google Gemini models via Pydantic AI
|
||||
google_provider = GoogleProvider(base_url=base_url, api_key="dummy-key")
|
||||
google_model = GoogleModel("gemini-2.0-flash", provider=google_provider)
|
||||
google_agent = Agent(google_model)
|
||||
|
||||
# All work the same way
|
||||
openai_result = openai_agent.run_sync("Hello GPT!")
|
||||
anthropic_result = anthropic_agent.run_sync("Hello Claude!")
|
||||
gemini_result = google_agent.run_sync("Hello Gemini!")
|
||||
|
||||
print(openai_result.output)
|
||||
print(anthropic_result.output)
|
||||
print(gemini_result.output)
|
||||
```
|
||||
|
||||
</Tab>
|
||||
</Tabs>
|
||||
|
||||
---
|
||||
|
||||
## Tool Calling
|
||||
|
||||
Pydantic AI's powerful tool system works seamlessly through Bifrost:
|
||||
|
||||
<Tabs group="pydanticai-sdk">
|
||||
<Tab title="Python">
|
||||
|
||||
```python {7}
|
||||
from pydantic_ai import Agent
|
||||
from pydantic_ai.models.openai import OpenAIChatModel
|
||||
from pydantic_ai.providers.openai import OpenAIProvider
|
||||
from dataclasses import dataclass
|
||||
|
||||
# Configure Bifrost
|
||||
provider = OpenAIProvider(base_url="http://localhost:8080/pydanticai/v1")
|
||||
model = OpenAIChatModel("gpt-4o-mini", provider=provider)
|
||||
|
||||
# Define tools as functions
|
||||
def get_weather(location: str) -> str:
|
||||
"""Get the current weather for a location."""
|
||||
return f"The weather in {location} is 72°F and sunny."
|
||||
|
||||
def calculate(expression: str) -> str:
|
||||
"""Perform a mathematical calculation."""
|
||||
result = eval(expression) # Use safe evaluation in production
|
||||
return f"The result is {result}"
|
||||
|
||||
# Create agent with tools
|
||||
agent = Agent(
|
||||
model,
|
||||
tools=[get_weather, calculate],
|
||||
instructions="You can check weather and do calculations."
|
||||
)
|
||||
|
||||
result = agent.run_sync("What's the weather in Boston?")
|
||||
print(result.output)
|
||||
```
|
||||
|
||||
</Tab>
|
||||
</Tabs>
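
The `calculate` tool above calls `eval` on text chosen by the model, which the inline comment already flags as unsafe for production. One possible replacement, sketched here under the assumption that only basic arithmetic is needed, walks the parsed expression with Python's `ast` module and rejects everything else. `safe_eval` is a hypothetical helper, not part of Pydantic AI or Bifrost.

```python
import ast
import operator

# Only these operators are allowed; anything else raises ValueError.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def safe_eval(expression: str):
    """Evaluate +, -, *, / arithmetic on numeric literals without eval()."""

    def _eval(node):
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)
        ):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        raise ValueError("unsupported expression")

    return _eval(ast.parse(expression, mode="eval").body)


def calculate(expression: str) -> str:
    """Perform a mathematical calculation."""
    return f"The result is {safe_eval(expression)}"
```

This `calculate` can be passed to `Agent(..., tools=[...])` in place of the `eval`-based version; function calls, attribute access, and names are rejected with `ValueError`.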
|
||||
|
||||
---
|
||||
|
||||
## Tools with Dependency Injection
|
||||
|
||||
Use `RunContext` to pass dependencies to your tools:
|
||||
|
||||
<Tabs group="pydanticai-sdk">
|
||||
<Tab title="Python">
|
||||
|
||||
```python {12}
|
||||
from pydantic_ai import Agent, RunContext, Tool
|
||||
from pydantic_ai.models.openai import OpenAIChatModel
|
||||
from pydantic_ai.providers.openai import OpenAIProvider
|
||||
from dataclasses import dataclass
|
||||
|
||||
@dataclass
|
||||
class UserContext:
|
||||
user_id: int
|
||||
user_name: str
|
||||
|
||||
# Configure Bifrost
|
||||
provider = OpenAIProvider(base_url="http://localhost:8080/pydanticai/v1")
|
||||
model = OpenAIChatModel("gpt-4o-mini", provider=provider)
|
||||
|
||||
def get_user_info(ctx: RunContext[UserContext]) -> str:
|
||||
"""Get information about the current user."""
|
||||
return f"User: {ctx.deps.user_name} (ID: {ctx.deps.user_id})"
|
||||
|
||||
agent = Agent(
|
||||
model,
|
||||
deps_type=UserContext,
|
||||
tools=[Tool(get_user_info, takes_ctx=True)],
|
||||
instructions="You can look up user information."
|
||||
)
|
||||
|
||||
# Pass dependencies at runtime
|
||||
deps = UserContext(user_id=123, user_name="Alice")
|
||||
result = agent.run_sync("What is my user information?", deps=deps)
|
||||
print(result.output)
|
||||
```
|
||||
|
||||
</Tab>
|
||||
</Tabs>
|
||||
|
||||
---
|
||||
|
||||
## Structured Output
|
||||
|
||||
Define response types using Pydantic models:
|
||||
|
||||
<Tabs group="pydanticai-sdk">
|
||||
<Tab title="Python">
|
||||
|
||||
```python {13}
|
||||
from pydantic import BaseModel, Field
|
||||
from pydantic_ai import Agent
|
||||
from pydantic_ai.models.openai import OpenAIChatModel
|
||||
from pydantic_ai.providers.openai import OpenAIProvider
|
||||
|
||||
# Define structured output type
|
||||
class CityInfo(BaseModel):
|
||||
city: str = Field(description="Name of the city")
|
||||
country: str = Field(description="Country where the city is located")
|
||||
population: int = Field(description="Approximate population")
|
||||
|
||||
# Configure Bifrost
|
||||
provider = OpenAIProvider(base_url="http://localhost:8080/pydanticai/v1")
|
||||
model = OpenAIChatModel("gpt-4o-mini", provider=provider)
|
||||
|
||||
# Agent with typed output
|
||||
agent = Agent(
|
||||
model,
|
||||
output_type=CityInfo,
|
||||
instructions="Extract city information from user queries."
|
||||
)
|
||||
|
||||
result = agent.run_sync("Tell me about Tokyo, Japan")
|
||||
|
||||
# result.output is typed as CityInfo
|
||||
print(f"City: {result.output.city}")
|
||||
print(f"Country: {result.output.country}")
|
||||
print(f"Population: {result.output.population}")
|
||||
```
|
||||
|
||||
</Tab>
|
||||
</Tabs>
|
||||
|
||||
---
|
||||
|
||||
## Streaming Responses
|
||||
|
||||
Stream responses in real time for a more responsive user experience:
|
||||
|
||||
<Tabs group="pydanticai-sdk">
|
||||
<Tab title="Python">
|
||||
|
||||
```python {7}
|
||||
import asyncio
|
||||
from pydantic_ai import Agent
|
||||
from pydantic_ai.models.openai import OpenAIChatModel
|
||||
from pydantic_ai.providers.openai import OpenAIProvider
|
||||
|
||||
# Configure Bifrost
|
||||
provider = OpenAIProvider(base_url="http://localhost:8080/pydanticai/v1")
|
||||
model = OpenAIChatModel("gpt-4o-mini", provider=provider)
|
||||
|
||||
agent = Agent(model, instructions="Tell engaging stories.")
|
||||
|
||||
async def stream_story():
|
||||
async with agent.run_stream("Tell me a short story about a robot.") as response:
|
||||
async for chunk in response.stream_text():
|
||||
print(chunk, end="", flush=True)
|
||||
print() # Newline at end
|
||||
|
||||
asyncio.run(stream_story())
|
||||
```
|
||||
|
||||
</Tab>
|
||||
</Tabs>
|
||||
|
||||
---
|
||||
|
||||
## Adding Custom Headers
|
||||
|
||||
Add Bifrost-specific headers for governance and tracking:
|
||||
|
||||
<Tabs group="pydanticai-sdk">
|
||||
<Tab title="Python">
|
||||
|
||||
```python {15}
|
||||
from httpx import AsyncClient
|
||||
from pydantic_ai import Agent
|
||||
from pydantic_ai.models.openai import OpenAIChatModel
|
||||
from pydantic_ai.providers.openai import OpenAIProvider
|
||||
|
||||
# Create HTTP client with custom headers
|
||||
http_client = AsyncClient(
|
||||
headers={
|
||||
"x-bf-vk": "your-virtual-key", # Virtual key for governance
|
||||
}
|
||||
)
|
||||
|
||||
# Configure provider with custom client
|
||||
provider = OpenAIProvider(
|
||||
base_url="http://localhost:8080/pydanticai/v1",
|
||||
http_client=http_client
|
||||
)
|
||||
model = OpenAIChatModel("gpt-4o-mini", provider=provider)
|
||||
|
||||
agent = Agent(model)
|
||||
result = agent.run_sync("Hello!")
|
||||
print(result.output)
|
||||
```
|
||||
|
||||
</Tab>
|
||||
</Tabs>
|
||||
|
||||
---
|
||||
|
||||
## Using Direct Keys
|
||||
|
||||
Pass API keys directly to bypass Bifrost's key management. This requires the **Allow Direct API keys** option to be enabled in the Bifrost configuration.
|
||||
|
||||
> **Learn more:** See [Key Management](../features/keys-management#direct-key-bypass) for enabling direct API key usage.
|
||||
|
||||
<Tabs group="pydanticai-sdk">
|
||||
<Tab title="Python">
|
||||
|
||||
```python {8,15,26}
|
||||
from httpx import AsyncClient
|
||||
from pydantic_ai import Agent
|
||||
from pydantic_ai.models.openai import OpenAIChatModel
|
||||
from pydantic_ai.models.anthropic import AnthropicModel
|
||||
from pydantic_ai.providers.openai import OpenAIProvider
|
||||
from pydantic_ai.providers.anthropic import AnthropicProvider
|
||||
|
||||
base_url = "http://localhost:8080/pydanticai"
|
||||
|
||||
# Using OpenAI key directly
|
||||
openai_client = AsyncClient(
|
||||
headers={"Authorization": "Bearer sk-your-openai-key"}
|
||||
)
|
||||
openai_provider = OpenAIProvider(
|
||||
base_url=f"{base_url}/v1",
|
||||
http_client=openai_client
|
||||
)
|
||||
openai_model = OpenAIChatModel("gpt-4o-mini", provider=openai_provider)
|
||||
openai_agent = Agent(openai_model)
|
||||
|
||||
# Using Anthropic key directly
|
||||
# Note: Anthropic SDK adds /v1 internally, so we don't append it here
|
||||
anthropic_client = AsyncClient(
|
||||
headers={"x-api-key": "sk-ant-your-anthropic-key"}
|
||||
)
|
||||
anthropic_provider = AnthropicProvider(
|
||||
base_url=base_url,
|
||||
http_client=anthropic_client
|
||||
)
|
||||
anthropic_model = AnthropicModel("claude-3-haiku-20240307", provider=anthropic_provider)
|
||||
anthropic_agent = Agent(anthropic_model)
|
||||
|
||||
# Both work through Bifrost with your own keys
|
||||
openai_result = openai_agent.run_sync("Hello GPT!")
|
||||
anthropic_result = anthropic_agent.run_sync("Hello Claude!")
|
||||
```
|
||||
|
||||
</Tab>
|
||||
</Tabs>
|
||||
|
||||
---
|
||||
|
||||
## Multi-turn Conversations
|
||||
|
||||
Maintain conversation history across multiple turns:
|
||||
|
||||
<Tabs group="pydanticai-sdk">
|
||||
<Tab title="Python">
|
||||
|
||||
```python {6}
|
||||
from pydantic_ai import Agent
|
||||
from pydantic_ai.models.openai import OpenAIChatModel
|
||||
from pydantic_ai.providers.openai import OpenAIProvider
|
||||
|
||||
# Configure Bifrost
|
||||
provider = OpenAIProvider(base_url="http://localhost:8080/pydanticai/v1")
|
||||
model = OpenAIChatModel("gpt-4o-mini", provider=provider)
|
||||
|
||||
agent = Agent(model, instructions="Remember context from previous messages.")
|
||||
|
||||
# First turn
|
||||
result1 = agent.run_sync("My name is Alice and I live in Paris.")
|
||||
|
||||
# Second turn - pass message history to maintain context
|
||||
result2 = agent.run_sync(
|
||||
"What is my name and where do I live?",
|
||||
message_history=result1.all_messages()
|
||||
)
|
||||
|
||||
print(result2.output) # Should mention Alice and Paris
|
||||
```
|
||||
|
||||
</Tab>
|
||||
</Tabs>
|
||||
|
||||
---
|
||||
|
||||
## Supported Features
|
||||
|
||||
The Pydantic AI integration supports all features available in both the Pydantic AI SDK and Bifrost core functionality:
|
||||
|
||||
| Feature | Supported |
|---------|-----------|
| Chat Completions | ✅ |
| Tool/Function Calling | ✅ |
| Structured Output | ✅ |
| Streaming | ✅ |
| Multi-turn Conversations | ✅ |
| Dependency Injection | ✅ |
| OpenAI Models | ✅ |
| Anthropic Models | ✅ |
| Google Gemini Models | ✅ |
| Embeddings | ✅ |
| Speech/TTS | ✅ |
| Transcription | ✅ |
|
||||
|
||||
Your existing Pydantic AI agents work seamlessly with Bifrost's enterprise features. 😄
|
||||
|
||||
---
|
||||
|
||||
## Next Steps
|
||||
|
||||
- **[Governance Features](../features/governance)** - Virtual keys and team management
|
||||
- **[Semantic Caching](../features/semantic-caching)** - Intelligent response caching
|
||||
- **[Configuration](../quickstart/README)** - Provider setup and API key management
|
||||
|
||||
97
docs/integrations/vector-databases/pinecone.mdx
Normal file
@@ -0,0 +1,97 @@
|
||||
---
|
||||
title: "Pinecone"
|
||||
description: "Pinecone vector database integration for semantic caching in Bifrost."
|
||||
icon: "database"
|
||||
---
|
||||
|
||||
## Pinecone
|
||||
|
||||
[Pinecone](https://www.pinecone.io/) is a managed vector database service designed for machine learning applications, offering both serverless and pod-based deployment options.
|
||||
|
||||
### Key Features
|
||||
|
||||
- **Managed Service**: Fully managed with no infrastructure to maintain
|
||||
- **Serverless Option**: Pay-per-use pricing with automatic scaling
|
||||
- **High Performance**: Optimized for low-latency vector search
|
||||
- **Metadata Filtering**: Advanced filtering on vector metadata
|
||||
- **Namespaces**: Organize vectors into separate namespaces within an index
|
||||
|
||||
### Setup & Installation
|
||||
|
||||
**Pinecone Cloud:**
|
||||
- Sign up at [pinecone.io](https://www.pinecone.io/)
|
||||
- Create a new index with the desired dimensions
|
||||
- Get your API key and index host URL from the console
|
||||
|
||||
**Local Development (Pinecone Local):**
|
||||
```bash
|
||||
docker run -d \
|
||||
--name pinecone-local \
|
||||
-p 5081:5081 \
|
||||
ghcr.io/pinecone-io/pinecone-index:latest
|
||||
```
|
||||
|
||||
### Configuration Options
|
||||
|
||||
<Tabs group="pinecone-config">
|
||||
|
||||
<Tab title="Go SDK">
|
||||
|
||||
```go
|
||||
vectorConfig := &vectorstore.Config{
|
||||
Enabled: true,
|
||||
Type: vectorstore.VectorStoreTypePinecone,
|
||||
Config: vectorstore.PineconeConfig{
|
||||
APIKey: "your-pinecone-api-key",
|
||||
IndexHost: "your-index-host.svc.environment.pinecone.io",
|
||||
},
|
||||
}
|
||||
|
||||
store, err := vectorstore.NewVectorStore(context.Background(), vectorConfig, logger)
|
||||
```
|
||||
|
||||
</Tab>
|
||||
|
||||
<Tab title="config.json">
|
||||
|
||||
**Cloud Setup:**
|
||||
```json
|
||||
{
|
||||
"vector_store": {
|
||||
"enabled": true,
|
||||
"type": "pinecone",
|
||||
"config": {
|
||||
"api_key": "your-pinecone-api-key",
|
||||
"index_host": "your-index-host.svc.environment.pinecone.io"
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Local Development:**
|
||||
```json
|
||||
{
|
||||
"vector_store": {
|
||||
"enabled": true,
|
||||
"type": "pinecone",
|
||||
"config": {
|
||||
"api_key": "pclocal",
|
||||
"index_host": "localhost:5081"
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
</Tab>
|
||||
|
||||
</Tabs>
|
||||
|
||||
<Note>
|
||||
For local development with Pinecone Local, any API key value works (e.g., "pclocal"). The index host should point to localhost:5081 by default.
|
||||
</Note>
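
The cloud and local `config.json` variants above differ only in `api_key` and `index_host`. A minimal Python sketch that generates either fragment follows; the helper name `pinecone_vector_store` and its `local` flag are illustrative assumptions and not part of Bifrost.

```python
import json


def pinecone_vector_store(api_key: str = "", index_host: str = "", local: bool = False) -> dict:
    """Build the `vector_store` fragment shown above (hypothetical helper)."""
    if local:
        # Pinecone Local accepts any API key; the docs use "pclocal" and port 5081.
        api_key = api_key or "pclocal"
        index_host = index_host or "localhost:5081"
    if not api_key or not index_host:
        raise ValueError("api_key and index_host are required for Pinecone Cloud")
    return {
        "vector_store": {
            "enabled": True,
            "type": "pinecone",
            "config": {"api_key": api_key, "index_host": index_host},
        }
    }


if __name__ == "__main__":
    # Print the local-development fragment as JSON
    print(json.dumps(pinecone_vector_store(local=True), indent=2))
```

Generating the fragment from one function keeps the cloud and local variants from drifting apart when other fields are added later.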
|
||||
|
||||
<Warning>
|
||||
Pinecone requires all IDs to be unique strings. Namespaces are created automatically when you first upsert vectors.
|
||||
</Warning>
|
||||
|
||||
For the VectorStore interface API and usage examples, see [Vector Store Architecture](/architecture/framework/vector-store). For semantic caching setup, see [Semantic Caching](/features/semantic-caching).
|
||||
94
docs/integrations/vector-databases/qdrant.mdx
Normal file
@@ -0,0 +1,94 @@
|
||||
---
|
||||
title: "Qdrant"
|
||||
description: "Qdrant vector database integration for semantic caching in Bifrost."
|
||||
icon: "database"
|
||||
---
|
||||
|
||||
## Qdrant
|
||||
|
||||
[Qdrant](https://qdrant.tech/) is a high-performance vector search engine built in Rust.
|
||||
|
||||
### Setup & Installation
|
||||
|
||||
**Local Qdrant:**
|
||||
```bash
|
||||
# Using Docker
|
||||
docker run -d \
|
||||
--name qdrant \
|
||||
-p 6333:6333 \
|
||||
-p 6334:6334 \
|
||||
-v $(pwd)/qdrant_storage:/qdrant/storage \
|
||||
qdrant/qdrant:latest
|
||||
```
|
||||
|
||||
**Qdrant Cloud:**
|
||||
Sign up at [cloud.qdrant.io](https://cloud.qdrant.io)
|
||||
|
||||
### Configuration Options
|
||||
|
||||
<Tabs group="qdrant-config">
|
||||
|
||||
<Tab title="Go SDK">
|
||||
|
||||
```go
|
||||
vectorConfig := &vectorstore.Config{
|
||||
Enabled: true,
|
||||
Type: vectorstore.VectorStoreTypeQdrant,
|
||||
Config: vectorstore.QdrantConfig{
|
||||
Host: "localhost",
|
||||
Port: 6334,
|
||||
APIKey: "",
|
||||
UseTLS: false,
|
||||
},
|
||||
}
|
||||
|
||||
store, err := vectorstore.NewVectorStore(context.Background(), vectorConfig, logger)
|
||||
```
|
||||
|
||||
</Tab>
|
||||
|
||||
<Tab title="config.json">
|
||||
|
||||
**Local Setup:**
|
||||
```json
|
||||
{
|
||||
"vector_store": {
|
||||
"enabled": true,
|
||||
"type": "qdrant",
|
||||
"config": {
|
||||
"host": "localhost",
|
||||
"port": 6334
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Cloud Setup:**
|
||||
```json
|
||||
{
|
||||
"vector_store": {
|
||||
"enabled": true,
|
||||
"type": "qdrant",
|
||||
"config": {
|
||||
"host": "your-qdrant-cluster.cloud.qdrant.io",
|
||||
"port": 6334,
|
||||
"api_key": "your-qdrant-api-key",
|
||||
"use_tls": true
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
</Tab>
|
||||
|
||||
</Tabs>
|
||||
|
||||
<Note>
|
||||
Qdrant uses port 6334 for gRPC and port 6333 for REST. Bifrost uses the gRPC port.
|
||||
</Note>
|
||||
|
||||
<Warning>
|
||||
Qdrant requires all IDs to be valid UUIDs. Use `uuid.New().String()` to generate IDs.
|
||||
</Warning>
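
The warning above gives the Go call for generating IDs. For callers that prepare IDs in Python, a rough equivalent might look like the sketch below. The function names are illustrative assumptions, and the deterministic UUIDv5 variant is an optional addition for cases where the same key should always map to the same ID.

```python
import uuid


def new_point_id() -> str:
    """Random UUID string, the Python counterpart of Go's uuid.New().String()."""
    return str(uuid.uuid4())


def stable_point_id(key: str) -> str:
    """Deterministic UUID (version 5) derived from an arbitrary string key."""
    return str(uuid.uuid5(uuid.NAMESPACE_URL, key))


def is_valid_uuid(value: str) -> bool:
    """Return True if `value` parses as a UUID, False otherwise."""
    try:
        uuid.UUID(value)
    except ValueError:
        return False
    return True
```

Checking IDs with `is_valid_uuid` before writing to the store surfaces malformed IDs early instead of as a Qdrant error.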
|
||||
|
||||
For the VectorStore interface API and usage examples, see [Vector Store Architecture](/architecture/framework/vector-store). For semantic caching setup, see [Semantic Caching](/features/semantic-caching).
|
||||
241
docs/integrations/vector-databases/redis.mdx
Normal file
@@ -0,0 +1,241 @@
|
||||
---
|
||||
title: "Redis / Valkey"
|
||||
description: "Redis and Valkey vector store integration for semantic caching in Bifrost."
|
||||
icon: "database"
|
||||
---
|
||||
|
||||
## Redis
|
||||
|
||||
Redis provides high-performance in-memory vector storage using RediSearch-compatible APIs, ideal for applications requiring sub-millisecond response times and fast semantic search capabilities. Valkey deployments that expose compatible `FT.*` commands are supported through the same configuration.
|
||||
|
||||
### Key Features
|
||||
|
||||
- **High Performance**: Sub-millisecond cache retrieval with Redis's in-memory storage
|
||||
- **Cost Effective**: Open-source solution with no licensing costs
|
||||
- **HNSW Algorithm**: Fast vector similarity search with excellent recall rates
|
||||
- **Connection Pooling**: Advanced connection management for high-throughput applications
|
||||
- **TTL Support**: Automatic expiration of cached entries
|
||||
- **Streaming Support**: Full streaming response caching with proper chunk ordering
|
||||
- **Flexible Filtering**: Advanced metadata filtering with exact string matching
|
||||
|
||||
### Setup & Installation
|
||||
|
||||
**Redis Cloud:**
|
||||
- Sign up at [cloud.redis.io](https://cloud.redis.io)
|
||||
- Create a new database with RediSearch module enabled
|
||||
- Get your connection details
|
||||
|
||||
**Local Redis with RediSearch:**
|
||||
```bash
|
||||
# Using Docker with Redis Stack (includes RediSearch)
|
||||
docker run -d --name redis-stack -p 6379:6379 redis/redis-stack:latest
|
||||
```
|
||||
|
||||
**Local Valkey Bundle:**
|
||||
```bash
|
||||
# Example Valkey bundle with search/vector support
|
||||
docker run -d --name valkey-bundle -p 6379:6379 valkey/valkey-bundle:9.0.0
|
||||
```
|
||||
|
||||
### Configuration Options
|
||||
|
||||
<Tabs group="redis-config">
|
||||
|
||||
<Tab title="Go SDK">
|
||||
|
||||
```go
|
||||
// Configure Redis-compatible vector store (Redis or Valkey endpoint)
|
||||
vectorConfig := &vectorstore.Config{
|
||||
Enabled: true,
|
||||
Type: vectorstore.VectorStoreTypeRedis, // Keep type as "redis" for Valkey too
|
||||
Config: vectorstore.RedisConfig{
|
||||
Addr: "localhost:6379", // Redis/Valkey server address - REQUIRED
|
||||
Username: "", // Optional: Redis username
|
||||
Password: "", // Optional: Redis password
|
||||
DB: 0, // Optional: Redis database number (default: 0)
|
||||
|
||||
// Optional: TLS and cluster settings
|
||||
UseTLS: false, // Enable TLS for encrypted connections
|
||||
InsecureSkipVerify: false, // Skip TLS cert verification
|
||||
ClusterMode: false, // Use Redis Cluster client for cluster endpoints
|
||||
|
||||
// Optional: Connection pool settings
|
||||
PoolSize: 10, // Maximum socket connections
|
||||
MaxActiveConns: 10, // Maximum active connections
|
||||
MinIdleConns: 5, // Minimum idle connections
|
||||
MaxIdleConns: 10, // Maximum idle connections
|
||||
|
||||
// Optional: Timeout settings
|
||||
DialTimeout: 5 * time.Second, // Connection timeout
|
||||
ReadTimeout: 3 * time.Second, // Read timeout
|
||||
WriteTimeout: 3 * time.Second, // Write timeout
|
||||
ContextTimeout: 10 * time.Second, // Operation timeout
|
||||
},
|
||||
}
|
||||
|
||||
// Create vector store
|
||||
store, err := vectorstore.NewVectorStore(context.Background(), vectorConfig, logger)
|
||||
if err != nil {
|
||||
log.Fatal("Failed to create vector store:", err)
|
||||
}
|
||||
```
|
||||
|
||||
</Tab>
|
||||
|
||||
<Tab title="config.json">
|
||||
|
||||
```json
|
||||
{
|
||||
"vector_store": {
|
||||
"enabled": true,
|
||||
"type": "redis",
|
||||
"config": {
|
||||
"addr": "localhost:6379",
|
||||
"username": "",
|
||||
"password": "",
|
||||
"db": 0,
|
||||
"use_tls": false,
|
||||
"insecure_skip_verify": false,
|
||||
"ca_cert_pem": "",
|
||||
"cluster_mode": false,
|
||||
"pool_size": 10,
|
||||
"max_active_conns": 10,
|
||||
"min_idle_conns": 5,
|
||||
"max_idle_conns": 10,
|
||||
"dial_timeout": "5s",
|
||||
"read_timeout": "3s",
|
||||
"write_timeout": "3s",
|
||||
"context_timeout": "10s"
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**For Redis Cloud or Valkey service endpoints:**
|
||||
```json
|
||||
{
|
||||
"vector_store": {
|
||||
"enabled": true,
|
||||
"type": "redis",
|
||||
"config": {
|
||||
"addr": "your-redis-host:port",
|
||||
"username": "your-username",
|
||||
"password": "your-password",
|
||||
"db": 0,
|
||||
"use_tls": true,
|
||||
"ca_cert_pem": "-----BEGIN CERTIFICATE-----\n...\n-----END CERTIFICATE-----",
|
||||
"cluster_mode": false,
|
||||
"context_timeout": "10s"
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**For managed Redis Cluster endpoints:**
|
||||
```json
|
||||
{
|
||||
"vector_store": {
|
||||
"enabled": true,
|
||||
"type": "redis",
|
||||
"config": {
|
||||
"addr": "your-cluster-endpoint:6379",
|
||||
"username": "your-username",
|
||||
"password": "your-password",
|
||||
"db": 0,
|
||||
"use_tls": true,
|
||||
"ca_cert_pem": "-----BEGIN CERTIFICATE-----\n...\n-----END CERTIFICATE-----",
|
||||
"cluster_mode": true,
|
||||
"context_timeout": "10s"
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
</Tab>
|
||||
|
||||
</Tabs>
|
||||
|
||||
### Redis-Specific Features
|
||||
|
||||
**Vector Search Algorithm:**
|
||||
Redis uses the **HNSW (Hierarchical Navigable Small World)** algorithm for vector similarity search, which provides:
|
||||
|
||||
- **Fast Search**: Approximately O(log N) search complexity
|
||||
- **High Accuracy**: Excellent recall rates for similarity search
|
||||
- **Memory Efficient**: Optimized for in-memory operations
|
||||
- **Cosine Similarity**: Uses cosine distance metric for semantic similarity
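
To make the distance metric concrete, here is a small pure-Python sketch of cosine similarity. It illustrates the metric only; it is not how Redis or HNSW compute it internally, and the function name is an assumption made for this example.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)
```

Cosine distance is `1 - cosine_similarity(a, b)`, so vectors pointing in the same direction have distance 0 regardless of their magnitude.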
|
||||
|
||||
**Connection Pool Management:**
|
||||
Redis provides extensive connection pool configuration:
|
||||
|
||||
```go
|
||||
config := vectorstore.RedisConfig{
|
||||
Addr: "localhost:6379",
|
||||
UseTLS: true, // Enable TLS
|
||||
ClusterMode: true, // Enable cluster mode
|
||||
PoolSize: 20, // Max socket connections
|
||||
MaxActiveConns: 20, // Max active connections
|
||||
MinIdleConns: 5, // Min idle connections
|
||||
MaxIdleConns: 10, // Max idle connections
|
||||
ConnMaxLifetime: 30 * time.Minute, // Connection lifetime
|
||||
ConnMaxIdleTime: 5 * time.Minute, // Idle connection timeout
|
||||
DialTimeout: 5 * time.Second, // Connection timeout
|
||||
ReadTimeout: 3 * time.Second, // Read timeout
|
||||
WriteTimeout: 3 * time.Second, // Write timeout
|
||||
ContextTimeout: 10 * time.Second, // Operation timeout
|
||||
}
|
||||
```
|
||||
|
||||
### Performance Optimization
|
||||
|
||||
**Connection Pool Tuning:**
|
||||
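For high-throughput applications, tune the connection pool settings. The `//` comments below are annotations only; JSON does not allow comments, so remove them before using the fragment in `config.json`: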
|
||||
|
||||
```json
|
||||
{
|
||||
"vector_store": {
|
||||
"config": {
|
||||
"pool_size": 50, // Increase for high concurrency
|
||||
"max_active_conns": 50, // Match pool_size
|
||||
"min_idle_conns": 10, // Keep connections warm
|
||||
"max_idle_conns": 20, // Allow some idle connections
|
||||
"conn_max_lifetime": "1h", // Refresh connections periodically
|
||||
"conn_max_idle_time": "10m" // Close idle connections
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Memory Optimization:**
|
||||
- **TTL**: Use appropriate TTL values to prevent memory bloat
|
||||
- **Namespace Cleanup**: Regularly clean up unused namespaces
|
||||
|
||||
**Batch Operations:**
|
||||
Redis supports efficient batch operations:
|
||||
|
||||
```go
|
||||
// Batch retrieval
|
||||
results, err := store.GetChunks(ctx, namespace, []string{"id1", "id2", "id3"})
|
||||
|
||||
// Batch deletion
|
||||
deleteResults, err := store.DeleteAll(ctx, namespace, queries)
|
||||
```
|
||||
|
||||
### Production Considerations
|
||||
|
||||
<Info>
|
||||
**TLS and Cluster Mode**: Set `use_tls: true` to enable TLS encryption for the Redis connection, and `insecure_skip_verify: true` if using self-signed certificates. Set `cluster_mode: true` when connecting to a Redis Cluster endpoint. When cluster mode is enabled, the `db` field must be `0` (Redis Cluster does not support database selection).
|
||||
</Info>
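
The cluster-mode constraint above can be checked before a configuration is deployed. The following is a minimal sketch of such a pre-flight check; `validate_redis_config` is a hypothetical helper, and it covers only the two rules stated in this page (`addr` is required, and `db` must be `0` in cluster mode).

```python
from typing import List


def validate_redis_config(config: dict) -> List[str]:
    """Return a list of problems found in a Redis `config` object (empty if none)."""
    problems = []
    if not config.get("addr"):
        problems.append("addr is required")
    # Redis Cluster does not support database selection
    if config.get("cluster_mode", False) and config.get("db", 0) != 0:
        problems.append("db must be 0 when cluster_mode is true")
    return problems
```

Returning a list rather than raising on the first problem lets a deployment script report every issue in one pass.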
|
||||
|
||||
<Info>
|
||||
**Search Module Required**: Redis/Valkey integration requires a search module/API that supports `FT.*` commands (index creation and vector search). If `FT.INFO` or `FT.SEARCH` is unavailable, semantic caching will not work.
|
||||
</Info>
|
||||
|
||||
<Warning>
|
||||
**Production Considerations**:
|
||||
- Use Redis AUTH for production deployments
|
||||
- Configure appropriate connection timeouts
|
||||
- Monitor memory usage and set appropriate TTL values
|
||||
</Warning>
|
||||
|
||||
For the VectorStore interface API and usage examples, see [Vector Store Architecture](/architecture/framework/vector-store). For semantic caching setup, see [Semantic Caching](/features/semantic-caching).
|
||||
146
docs/integrations/vector-databases/weaviate.mdx
Normal file
@@ -0,0 +1,146 @@
|
||||
---
|
||||
title: "Weaviate"
|
||||
description: "Weaviate vector database integration for semantic caching in Bifrost."
|
||||
icon: "database"
|
||||
---
|
||||
|
||||
## Weaviate
|
||||
|
||||
Weaviate is a production-ready vector database that provides advanced querying capabilities, gRPC support for high performance, and flexible schema management.
|
||||
|
||||
### Key Features
|
||||
|
||||
- **gRPC Support**: Enhanced performance with gRPC connections
|
||||
- **Advanced Filtering**: Complex query operations with multiple conditions
|
||||
- **Schema Management**: Flexible schema definition for different data types
|
||||
- **Cloud & Self-Hosted**: Support for both Weaviate Cloud and self-hosted deployments
|
||||
- **Scalable Storage**: Handle millions of vectors with efficient indexing
|
||||
|
||||
### Setup & Installation
|
||||
|
||||
**Weaviate Cloud:**
|
||||
- Sign up at [cloud.weaviate.io](https://cloud.weaviate.io)
|
||||
- Create a new cluster
|
||||
- Get your API key and cluster URL
|
||||
|
||||
**Local Weaviate:**
|
||||
```bash
|
||||
# Using Docker
|
||||
docker run -d \
|
||||
--name weaviate \
|
||||
  -p 8080:8080 \
  -p 50051:50051 \
|
||||
-e QUERY_DEFAULTS_LIMIT=25 \
|
||||
-e AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED='true' \
|
||||
-e PERSISTENCE_DATA_PATH='/var/lib/weaviate' \
|
||||
semitechnologies/weaviate:latest
|
||||
```
|
||||
|
||||
### Configuration Options
|
||||
|
||||
<Tabs group="weaviate-config">
|
||||
|
||||
<Tab title="Go SDK">
|
||||
|
||||
```go
|
||||
// Configure Weaviate vector store
|
||||
vectorConfig := &vectorstore.Config{
|
||||
Enabled: true,
|
||||
Type: vectorstore.VectorStoreTypeWeaviate,
|
||||
Config: vectorstore.WeaviateConfig{
|
||||
Scheme: "http", // "http" for local, "https" for cloud
|
||||
Host: "localhost:8080", // Your Weaviate host
|
||||
APIKey: "your-weaviate-api-key", // Required for Weaviate Cloud; optional for local/self-hosted
|
||||
|
||||
// Enable gRPC for improved performance (optional)
|
||||
GrpcConfig: &vectorstore.WeaviateGrpcConfig{
|
||||
Host: "localhost:50051", // gRPC port
|
||||
Secured: false, // true for TLS
|
||||
},
|
||||
},
|
||||
}
|
||||
|
||||
// Create vector store
|
||||
store, err := vectorstore.NewVectorStore(context.Background(), vectorConfig, logger)
|
||||
if err != nil {
|
||||
log.Fatal("Failed to create vector store:", err)
|
||||
}
|
||||
```
|
||||
|
||||
</Tab>
|
||||
|
||||
<Tab title="config.json">
|
||||
|
||||
**Local Setup:**
|
||||
```json
|
||||
{
|
||||
"vector_store": {
|
||||
"enabled": true,
|
||||
"type": "weaviate",
|
||||
"config": {
|
||||
"scheme": "http",
|
||||
"host": "localhost:8080"
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Cloud Setup with gRPC:**
|
||||
```json
|
||||
{
|
||||
"vector_store": {
|
||||
"enabled": true,
|
||||
"type": "weaviate",
|
||||
"config": {
|
||||
"scheme": "https",
|
||||
"host": "your-weaviate-host",
|
||||
"api_key": "your-weaviate-api-key",
|
||||
"grpc_config": {
|
||||
"host": "your-weaviate-grpc-host",
|
||||
"secured": true
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
</Tab>
|
||||
|
||||
</Tabs>
|
||||
|
||||
<Note>
|
||||
The gRPC host should include the port. If no port is specified, port 80 is used for unsecured connections and port 443 for secured connections.
|
||||
</Note>
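
The default-port rule in the note can be written out as a few lines of Python. This is a sketch of the rule as described, not Bifrost's actual implementation; `effective_grpc_address` is a hypothetical name.

```python
def effective_grpc_address(host: str, secured: bool) -> str:
    """Append the default port described in the note when `host` has none."""
    # Naive check: assumes a hostname or IPv4 address, not a bracketed IPv6 literal.
    if ":" in host:
        return host
    return f"{host}:{443 if secured else 80}"
```

For example, `effective_grpc_address("your-weaviate-grpc-host", True)` resolves to port 443, while a host that already carries a port such as `localhost:50051` is returned unchanged.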
|
||||
|
||||
### Advanced Features
|
||||
|
||||
**gRPC Performance Optimization:**
|
||||
Enable gRPC for better performance in production:
|
||||
|
||||
```go
|
||||
vectorConfig := &vectorstore.Config{
|
||||
Type: vectorstore.VectorStoreTypeWeaviate,
|
||||
Config: vectorstore.WeaviateConfig{
|
||||
Scheme: "https",
|
||||
Host: "your-weaviate-host",
|
||||
APIKey: "your-api-key",
|
||||
|
||||
// Enable gRPC for better performance
|
||||
GrpcConfig: &vectorstore.WeaviateGrpcConfig{
|
||||
Host: "your-weaviate-grpc-host:443",
|
||||
Secured: true,
|
||||
},
|
||||
},
|
||||
}
|
||||
```
|
||||
|
||||
### Production Considerations
|
||||
|
||||
<Info>
|
||||
**Performance**: For production environments, consider using gRPC configuration for better performance and enable appropriate authentication mechanisms for your Weaviate deployment.
|
||||
</Info>
|
||||
|
||||
<Warning>
|
||||
**Authentication**: Always use API keys for Weaviate Cloud deployments and configure proper authentication for self-hosted instances in production.
|
||||
</Warning>
|
||||
|
||||
For the VectorStore interface API and usage examples, see [Vector Store Architecture](/architecture/framework/vector-store). For semantic caching setup, see [Semantic Caching](/features/semantic-caching).
|
||||
332
docs/integrations/what-is-an-integration.mdx
Normal file
@@ -0,0 +1,332 @@
|
||||
---
|
||||
title: "What is an integration?"
|
||||
description: "Protocol adapters that translate between Bifrost's unified API and provider-specific API formats like OpenAI, Anthropic, and Google GenAI."
|
||||
icon: "box"
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
An integration is a protocol adapter that translates between Bifrost's unified API and provider-specific API formats. Each integration handles request transformation, response normalization, and error mapping between the external API contract and Bifrost's internal processing pipeline.
|
||||
|
||||
Integrations enable you to utilize Bifrost's features like governance, MCP tools, load balancing, semantic caching, multi-provider support, and more, all while preserving your existing SDK-based architecture. Bifrost handles all the overhead of structure conversion, requiring only a single URL change to switch from direct provider APIs to Bifrost's gateway.
|
||||
|
||||
Bifrost converts the request/response format of the provider API to the Bifrost API format based on the integration used, so you don't have to.
|
||||
|
||||
---
|
||||
|
||||
## Quick Migration
|
||||
|
||||
### **Before (Direct Provider)**
|
||||
|
||||
```python
|
||||
import openai
|
||||
|
||||
client = openai.OpenAI(
|
||||
api_key="your-openai-key"
|
||||
)
|
||||
```
|
||||
|
||||
### **After (Bifrost)**
|
||||
|
||||
```python {4}
|
||||
import openai
|
||||
|
||||
client = openai.OpenAI(
|
||||
base_url="http://localhost:8080/openai", # Point to Bifrost
|
||||
api_key="dummy-key" # Keys are handled in Bifrost now
|
||||
)
|
||||
```
|
||||
|
||||
**That's it!** Your application now benefits from Bifrost's features with no other changes.
|
||||
|
||||
---
|
||||
|
||||
## Supported Integrations
|
||||
|
||||
1. [OpenAI](./openai-sdk)
|
||||
2. [Anthropic](./anthropic-sdk)
|
||||
3. [Google GenAI](./genai-sdk)
|
||||
4. [LiteLLM](./litellm-sdk)
|
||||
5. [LangChain](./langchain-sdk)
|
||||
6. [AWS Bedrock](./bedrock-sdk)
|
||||
|
||||
---
|
||||
|
||||
## Provider-Prefixed Models
|
||||
|
||||
Use multiple providers seamlessly by prefixing model names with the provider:
|
||||
|
||||
<Tabs>
|
||||
<Tab title="OpenAI">
|
||||
```python
|
||||
import openai
|
||||
|
||||
# Single client, multiple providers
|
||||
client = openai.OpenAI(
|
||||
base_url="http://localhost:8080/openai",
|
||||
api_key="dummy" # API keys configured in Bifrost
|
||||
)
|
||||
|
||||
# OpenAI models
|
||||
response1 = client.chat.completions.create(
|
||||
model="gpt-4o-mini", # (default OpenAI since it's OpenAI's SDK)
|
||||
messages=[{"role": "user", "content": "Hello!"}]
|
||||
)
|
||||
```
|
||||
</Tab>
|
||||
<Tab title="Anthropic">
|
||||
```python
|
||||
import openai
|
||||
|
||||
# Anthropic models using OpenAI SDK format (reuses `client` from the OpenAI tab)
|
||||
response2 = client.chat.completions.create(
|
||||
model="anthropic/claude-3-sonnet-20240229",
|
||||
messages=[{"role": "user", "content": "Hello!"}]
|
||||
)
|
||||
```
|
||||
</Tab>
|
||||
<Tab title="Azure">
|
||||
```python
|
||||
import openai
|
||||
|
||||
# Azure models (reuses `client` from the OpenAI tab)
|
||||
response4 = client.chat.completions.create(
|
||||
model="azure/gpt-4o",
|
||||
messages=[{"role": "user", "content": "Hello!"}]
|
||||
)
|
||||
```
|
||||
</Tab>
|
||||
<Tab title="Vertex">
|
||||
```python
|
||||
import openai
|
||||
|
||||
# Google Vertex models (reuses `client` from the OpenAI tab)
|
||||
response3 = client.chat.completions.create(
|
||||
model="vertex/gemini-pro",
|
||||
messages=[{"role": "user", "content": "Hello!"}]
|
||||
)
|
||||
```
|
||||
</Tab>
|
||||
<Tab title="Ollama">
|
||||
```python
|
||||
import openai
|
||||
|
||||
# Local Ollama models (reuses `client` from the OpenAI tab)
|
||||
response5 = client.chat.completions.create(
|
||||
model="ollama/llama3.1:8b",
|
||||
messages=[{"role": "user", "content": "Hello!"}]
|
||||
)
|
||||
```
|
||||
</Tab>
|
||||
</Tabs>
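
The `provider/model` naming convention used in the tabs above can be illustrated with a short parsing sketch. This shows the convention only; how Bifrost parses model names internally is not documented here, and `split_model` and its `default_provider` parameter are assumptions made for the example.

```python
from typing import Tuple


def split_model(model: str, default_provider: str = "openai") -> Tuple[str, str]:
    """Split "provider/model" into (provider, model); unprefixed names use the default."""
    provider, sep, name = model.partition("/")
    if not sep:
        # No prefix: the integration's own provider applies (OpenAI for the OpenAI SDK)
        return default_provider, model
    return provider, name
```

Only the first `/` is treated as the separator, so a model name that itself contains a slash stays intact after the provider prefix is removed.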
|
||||
|
||||
---
|
||||
|
||||
## Direct API Usage
|
||||
|
||||
For custom HTTP clients or when you have existing provider-specific setup and want to use Bifrost gateway without restructuring your codebase:
|
||||
|
||||
```python {5,18,31}
|
||||
import requests
|
||||
|
||||
# Fully OpenAI compatible endpoint
|
||||
response = requests.post(
|
||||
"http://localhost:8080/openai/v1/chat/completions",
|
||||
headers={
|
||||
        "Authorization": "Bearer sk-your-openai-key",
|
||||
"Content-Type": "application/json"
|
||||
},
|
||||
json={
|
||||
"model": "gpt-4o-mini",
|
||||
"messages": [{"role": "user", "content": "Hello!"}]
|
||||
}
|
||||
)
|
||||
|
||||
# Fully Anthropic compatible endpoint
|
||||
response = requests.post(
|
||||
"http://localhost:8080/anthropic/v1/messages",
|
||||
headers={
|
||||
"Content-Type": "application/json",
|
||||
},
|
||||
json={
|
||||
"model": "claude-3-sonnet-20240229",
|
||||
"max_tokens": 1000,
|
||||
"messages": [{"role": "user", "content": "Hello!"}]
|
||||
}
|
||||
)
|
||||
|
||||
# Fully Google GenAI compatible endpoint
|
||||
response = requests.post(
|
||||
"http://localhost:8080/genai/v1beta/models/gemini-1.5-flash/generateContent",
|
||||
headers={
|
||||
"Content-Type": "application/json",
|
||||
},
|
||||
json={
|
||||
"contents": [
|
||||
{"parts": [{"text": "Hello!"}]}
|
||||
],
|
||||
"generation_config": {
|
||||
"max_output_tokens": 1000,
|
||||
"temperature": 1
|
||||
}
|
||||
}
|
||||
)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Listing Models
|
||||
|
||||
All integrations support listing available models through their respective list models endpoints (e.g., `/openai/v1/models`, `/anthropic/v1/models`). By default, list models requests return models from **all configured providers** in Bifrost.
|
||||
|
||||
### Filtering by Provider
|
||||
|
||||
You can control which provider's models to list using the `x-bf-list-models-provider` header:
|
||||
|
||||
<Tabs>
|
||||
<Tab title="Python">
|
||||
|
||||
```python
|
||||
import openai
|
||||
|
||||
client = openai.OpenAI(
|
||||
base_url="http://localhost:8080/openai",
|
||||
api_key="dummy-key"
|
||||
)
|
||||
|
||||
# List models from all providers (default behavior)
|
||||
all_models = client.models.list()
|
||||
|
||||
# List models from a specific provider only
|
||||
openai_models = client.models.list(
|
||||
extra_headers={
|
||||
"x-bf-list-models-provider": "openai"
|
||||
}
|
||||
)
|
||||
|
||||
anthropic_models = client.models.list(
|
||||
extra_headers={
|
||||
"x-bf-list-models-provider": "anthropic"
|
||||
}
|
||||
)
|
||||
```
|
||||
|
||||
</Tab>
|
||||
<Tab title="JavaScript">
|
||||
|
||||
```javascript
|
||||
import OpenAI from "openai";
|
||||
|
||||
const openai = new OpenAI({
|
||||
baseURL: "http://localhost:8080/openai",
|
||||
apiKey: "dummy-key",
|
||||
});
|
||||
|
||||
// List models from all providers (default behavior)
|
||||
const allModels = await openai.models.list();
|
||||
|
||||
// List models from a specific provider only
|
||||
const openaiModels = await openai.models.list({
|
||||
headers: {
|
||||
"x-bf-list-models-provider": "openai",
|
||||
},
|
||||
});
|
||||
|
||||
const anthropicModels = await openai.models.list({
|
||||
headers: {
|
||||
"x-bf-list-models-provider": "anthropic",
|
||||
},
|
||||
});
|
||||
```
|
||||
|
||||
</Tab>
|
||||
<Tab title="cURL">
|
||||
|
||||
```bash
|
||||
# List models from all providers (default)
|
||||
curl http://localhost:8080/openai/v1/models
|
||||
|
||||
# List models from specific provider
|
||||
curl http://localhost:8080/openai/v1/models \
|
||||
-H "x-bf-list-models-provider: openai"
|
||||
|
||||
# Explicitly request all providers
|
||||
curl http://localhost:8080/openai/v1/models \
|
||||
-H "x-bf-list-models-provider: all"
|
||||
```
|
||||
|
||||
</Tab>
|
||||
</Tabs>
|
||||
|
||||
### Header Behavior
|
||||
|
||||
| Header Value | Behavior |
|--------------|----------|
| Not set (default) | Lists models from **all configured providers** |
| `all` | Lists models from **all configured providers** |
| `openai` | Lists models from **OpenAI provider only** |
| `anthropic` | Lists models from **Anthropic provider only** |
| `vertex` | Lists models from **Vertex AI provider only** |
| Any valid provider | Lists models from that specific provider |
|
||||
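
The header rules in the table can be wrapped in a small helper so that calling code does not hard-code the header name. This is a minimal sketch; `list_models_headers` is a hypothetical name, and it omits the header for `all` because the table documents that as equivalent to not setting it.

```python
from typing import Dict, Optional


def list_models_headers(provider: Optional[str] = None) -> Dict[str, str]:
    """Build the extra headers for a list-models request."""
    # No provider (or "all") means models from every configured provider
    if provider is None or provider == "all":
        return {}
    return {"x-bf-list-models-provider": provider}
```

With the OpenAI Python SDK shown earlier, the result can be passed as `client.models.list(extra_headers=list_models_headers("anthropic"))`.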
|
||||
### Response Fields
|
||||
|
||||
When listing models from all providers, some provider-specific fields may be empty or contain default values if the information is not available from all providers. This is normal behavior as different providers expose different model metadata.
|
||||
|
||||
---
|
||||
|
||||
|
||||
## Migration Strategies
|
||||
|
||||
### **Gradual Migration**
|
||||
|
||||
1. **Start with development** - Test Bifrost in dev environment
|
||||
2. **Canary deployment** - Route 5% of traffic through Bifrost
|
||||
3. **Feature-by-feature** - Migrate specific endpoints gradually
|
||||
4. **Full migration** - Switch all traffic to Bifrost
|
||||
|
||||
### **Blue-Green Migration**
|
||||
|
||||
```python
|
||||
import os
import random

# Route traffic based on feature flag
def get_base_url(provider: str) -> str:
    if os.getenv("USE_BIFROST", "false") == "true":
        return f"http://bifrost:8080/{provider}"
    else:
        return f"https://api.{provider}.com"

# Gradual rollout
def should_use_bifrost() -> bool:
    rollout_percentage = int(os.getenv("BIFROST_ROLLOUT", "0"))
    return random.randint(1, 100) <= rollout_percentage
|
||||
```
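
A random draw per request means the same user may alternate between Bifrost and the direct provider. If sticky behavior is preferred, one option is to hash a stable identifier into a bucket instead. The sketch below is a variation on the rollout idea and not part of Bifrost; the function names and the choice of a user ID as the key are assumptions.

```python
import hashlib


def rollout_bucket(user_id: str) -> int:
    """Map a stable identifier to a bucket in the range 0-99."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def should_use_bifrost_for(user_id: str, rollout_percentage: int) -> bool:
    """Return True if this user falls inside the current rollout percentage."""
    return rollout_bucket(user_id) < rollout_percentage
```

Because the bucket depends only on the identifier, raising the percentage adds users to the rollout without removing any who were already routed through Bifrost.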
|
||||
|
||||
### **Feature Flag Integration**
|
||||
|
||||
```python
|
||||
# Using feature flags for safe migration
import os

import openai
from feature_flags import get_flag  # placeholder for your own feature-flag client

def create_client():
    if get_flag("use_bifrost_openai"):
        base_url = "http://bifrost:8080/openai"
    else:
        # The OpenAI SDK expects the /v1 suffix when base_url is set explicitly
        base_url = "https://api.openai.com/v1"

    return openai.OpenAI(
        base_url=base_url,
        api_key=os.getenv("OPENAI_API_KEY")
    )
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Next Steps
|
||||
|
||||
- **[HTTP Transport Overview](../quickstart/gateway/setting-up)** - Main HTTP transport guide
|
||||
- **[Endpoints](../openapi/openapi.json)** - Complete API reference
|
||||
- **[Configuration](../quickstart/gateway/provider-configuration)** - Provider setup and config
|
||||