first commit
669
docs/integrations/openai-sdk/files-and-batch.mdx
Normal file
@@ -0,0 +1,669 @@
---
title: "Files and Batch API"
description: "Upload files and create batch jobs for asynchronous processing using the OpenAI SDK through Bifrost across multiple providers."
tag: "Beta"
icon: "folder-open"
---

## Overview

Bifrost supports the OpenAI Files API and Batch API with **cross-provider routing**. This means you can use the familiar OpenAI SDK to manage files and batch jobs across multiple providers, including OpenAI, Anthropic, Bedrock, and Gemini.

The provider is specified per request through the SDK's `extra_body` parameter (for POST requests) or `extra_query` parameter (for GET and DELETE requests).
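
As a rough illustration of that rule, the sketch below picks the keyword argument from the HTTP method. `provider_kwargs` is a hypothetical helper, not part of Bifrost or the OpenAI SDK, and treating DELETE like GET is an assumption based on the `files.delete` example on this page, which passes `extra_query`.

```python
def provider_kwargs(provider: str, http_method: str) -> dict:
    """Return the SDK keyword argument that carries the provider name.

    Hypothetical helper: POST requests carry the provider in the JSON body
    (extra_body); GET and DELETE requests carry it in the query string
    (extra_query).
    """
    method = http_method.upper()
    if method == "POST":
        return {"extra_body": {"provider": provider}}
    if method in ("GET", "DELETE"):
        return {"extra_query": {"provider": provider}}
    raise ValueError(f"unsupported HTTP method: {http_method}")
```

The result can be unpacked into an SDK call, for example `client.files.list(**provider_kwargs("openai", "GET"))`. Calls that need more fields (such as `storage_config` for Bedrock) would still build their dictionaries by hand.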

---

## Client Setup

The base client setup is the same for all providers. The provider is specified per-request:

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/openai",
    api_key="your-api-key"  # Your actual API key
)
```

---

## Files API

### Upload a File

<Note>
**Bedrock** requires S3 storage configuration. OpenAI and Gemini use their native file storage. Anthropic uses inline requests (no file upload).
</Note>

<Tabs group="provider">
<Tab title="OpenAI Provider">

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/openai",
    api_key="your-openai-api-key"
)

# Create JSONL content for OpenAI batch format
jsonl_content = '''{"custom_id": "request-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 100}}
{"custom_id": "request-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "How are you?"}], "max_tokens": 100}}'''

# Upload file (uses OpenAI's native file storage)
response = client.files.create(
    file=("batch_input.jsonl", jsonl_content.encode(), "application/jsonl"),
    purpose="batch",
    extra_body={"provider": "openai"},
)

print(f"Uploaded file ID: {response.id}")
```

</Tab>
<Tab title="Bedrock Provider">

For Bedrock, you need to provide S3 storage configuration:

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/openai",
    api_key="your-api-key"
)

# Create JSONL content using OpenAI-style format (Bifrost converts to Bedrock format internally)
jsonl_content = '''{"custom_id": "request-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "anthropic.claude-3-sonnet-20240229-v1:0", "messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 100}}
{"custom_id": "request-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "anthropic.claude-3-sonnet-20240229-v1:0", "messages": [{"role": "user", "content": "How are you?"}], "max_tokens": 100}}'''

# Upload file with S3 storage configuration
response = client.files.create(
    file=("batch_input.jsonl", jsonl_content.encode(), "application/jsonl"),
    purpose="batch",
    extra_body={
        "provider": "bedrock",
        "storage_config": {
            "s3": {
                "bucket": "your-s3-bucket",
                "region": "us-west-2",
                "prefix": "bifrost-batch-output",
            },
        },
    },
)

print(f"Uploaded file ID: {response.id}")
```

</Tab>
<Tab title="Anthropic Provider">

Anthropic uses inline requests for batching (no file upload needed). See the Batch API section below.

</Tab>
<Tab title="Gemini Provider">

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/openai",
    api_key="your-api-key"
)

# Create JSONL content using OpenAI-style format (Bifrost converts to Gemini format internally)
jsonl_content = '''{"custom_id": "request-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gemini-1.5-flash", "messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 100}}
{"custom_id": "request-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gemini-1.5-flash", "messages": [{"role": "user", "content": "How are you?"}], "max_tokens": 100}}'''

# Upload file (uses Gemini's native file storage)
response = client.files.create(
    file=("batch_input.jsonl", jsonl_content.encode(), "application/jsonl"),
    purpose="batch",
    extra_body={"provider": "gemini"},
)

print(f"Uploaded file ID: {response.id}")
```

</Tab>
</Tabs>
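
The upload examples write their JSONL input by hand, which gets error-prone once prompts contain quotes or newlines. A minimal sketch of generating the same OpenAI-style lines with `json.dumps` follows. `build_batch_jsonl` is a hypothetical helper, and the `request-N` naming simply mirrors the examples above.

```python
import json


def build_batch_jsonl(model, prompts, max_tokens=100, url="/v1/chat/completions"):
    """Build OpenAI-style batch input: one JSON request object per line."""
    lines = []
    for index, prompt in enumerate(prompts, start=1):
        request = {
            "custom_id": f"request-{index}",
            "method": "POST",
            "url": url,
            "body": {
                "model": model,
                "messages": [{"role": "user", "content": prompt}],
                "max_tokens": max_tokens,
            },
        }
        # json.dumps escapes quotes and newlines, so each request stays on one line
        lines.append(json.dumps(request))
    return "\n".join(lines)
```

The returned string can be passed to `client.files.create` in place of the hand-written `jsonl_content`, for example `build_batch_jsonl("gpt-4o-mini", ["Hello!", "How are you?"]).encode()`.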

### List Files

```python
# List files for OpenAI or Gemini (no S3 config needed)
response = client.files.list(
    extra_query={"provider": "openai"}  # or "gemini"
)

for file in response.data:
    print(f"File ID: {file.id}, Name: {file.filename}")

# For Bedrock (requires S3 config)
response = client.files.list(
    extra_query={
        "provider": "bedrock",
        "storage_config": {
            "s3": {
                "bucket": "your-s3-bucket",
                "region": "us-west-2",
                "prefix": "bifrost-batch-output",
            },
        },
    }
)
```

### Retrieve File Metadata

```python
# Retrieve file metadata (specify provider)
file_id = "file-abc123"
response = client.files.retrieve(
    file_id,
    extra_query={"provider": "bedrock"}  # or "openai", "gemini"
)

print(f"File ID: {response.id}")
print(f"Filename: {response.filename}")
print(f"Purpose: {response.purpose}")
print(f"Bytes: {response.bytes}")
```

### Delete a File

```python
# Delete file (specify provider)
file_id = "file-abc123"
response = client.files.delete(
    file_id,
    extra_query={"provider": "bedrock"}  # or "openai", "gemini"
)

print(f"Deleted: {response.deleted}")
```

### Download File Content

```python
# Download file content (specify provider)
file_id = "file-abc123"
response = client.files.content(
    file_id,
    extra_query={"provider": "bedrock"}  # or "openai", "gemini"
)

# Handle different response types
if hasattr(response, "read"):
    content = response.read()
elif hasattr(response, "content"):
    content = response.content
else:
    content = response

# Decode bytes to string if needed
if isinstance(content, bytes):
    content = content.decode("utf-8")

print(f"File content:\n{content}")
```
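
Once the content is decoded, a batch file is usually easier to work with as a lookup table. The sketch below assumes the downloaded text is JSONL in which every record carries a `custom_id`, as the input files on this page do. Whether each provider's output follows that shape through Bifrost is an assumption, and `index_by_custom_id` is a hypothetical helper.

```python
import json


def index_by_custom_id(jsonl_text):
    """Parse JSONL text into a dict keyed by each record's custom_id."""
    records = {}
    for line_number, line in enumerate(jsonl_text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines and a trailing newline
        record = json.loads(line)
        custom_id = record.get("custom_id")
        if custom_id is None:
            raise ValueError(f"line {line_number} has no custom_id")
        records[custom_id] = record
    return records
```

With the `content` string from the download example, `index_by_custom_id(content)["request-1"]` would return the record for the first request.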

---

## Batch API

### Create a Batch

<Tabs group="provider">
<Tab title="OpenAI Provider">

For native OpenAI batching:

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/openai",
    api_key="your-openai-api-key"
)

# First upload a file (see Files API section)
# Then create batch using the file ID

batch = client.batches.create(
    input_file_id="file-abc123",
    endpoint="/v1/chat/completions",
    completion_window="24h",
    extra_body={"provider": "openai"},
)

print(f"Batch ID: {batch.id}")
print(f"Status: {batch.status}")
```

</Tab>
<Tab title="Bedrock Provider">

For Bedrock, you need to provide an output S3 URI:

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/openai",
    api_key="your-api-key"
)

# First upload a file with S3 config (see Files API section)
# Then create batch using the file ID

batch = client.batches.create(
    input_file_id="file-abc123",
    endpoint="/v1/chat/completions",
    completion_window="24h",
    extra_body={
        "provider": "bedrock",
        "model": "anthropic.claude-3-sonnet-20240229-v1:0",
        "output_s3_uri": "s3://your-bucket/batch-output",
    },
)

print(f"Batch ID: {batch.id}")
print(f"Status: {batch.status}")
```

</Tab>
<Tab title="Anthropic Provider">

Anthropic supports inline requests (no file upload required):

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/openai",
    api_key="your-anthropic-api-key"
)

# Create inline requests for Anthropic
requests = [
    {
        "custom_id": "request-1",
        "params": {
            "model": "claude-3-sonnet-20240229",
            "max_tokens": 100,
            "messages": [{"role": "user", "content": "Hello!"}]
        }
    },
    {
        "custom_id": "request-2",
        "params": {
            "model": "claude-3-sonnet-20240229",
            "max_tokens": 100,
            "messages": [{"role": "user", "content": "How are you?"}]
        }
    }
]

# Create batch with inline requests (no file ID needed)
batch = client.batches.create(
    input_file_id="",  # Empty for inline requests
    endpoint="/v1/chat/completions",
    completion_window="24h",
    extra_body={
        "provider": "anthropic",
        "requests": requests,
    },
)

print(f"Batch ID: {batch.id}")
print(f"Status: {batch.status}")
```

</Tab>
<Tab title="Gemini Provider">

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/openai",
    api_key="your-api-key"
)

# First upload a file with Gemini format (see Files API section)
# Then create batch using the file ID

batch = client.batches.create(
    input_file_id="file-abc123",
    endpoint="/v1/chat/completions",
    completion_window="24h",
    extra_body={
        "provider": "gemini",
        "model": "gemini-1.5-flash",
    },
)

print(f"Batch ID: {batch.id}")
print(f"Status: {batch.status}")
```

</Tab>
</Tabs>

### List Batches

```python
# List batches (specify provider)
response = client.batches.list(
    limit=10,
    extra_query={
        "provider": "bedrock",  # or "openai", "anthropic", "gemini"
        "model": "anthropic.claude-3-sonnet-20240229-v1:0",  # Required for bedrock
    }
)

for batch in response.data:
    print(f"Batch ID: {batch.id}, Status: {batch.status}")
```

### Retrieve Batch Status

```python
# Retrieve batch status (specify provider)
batch_id = "batch-abc123"
batch = client.batches.retrieve(
    batch_id,
    extra_query={"provider": "bedrock"}  # or "openai", "anthropic", "gemini"
)

print(f"Batch ID: {batch.id}")
print(f"Status: {batch.status}")

if batch.request_counts:
    print(f"Total: {batch.request_counts.total}")
    print(f"Completed: {batch.request_counts.completed}")
    print(f"Failed: {batch.request_counts.failed}")
```

### Cancel a Batch

```python
# Cancel batch (specify provider)
batch_id = "batch-abc123"
batch = client.batches.cancel(
    batch_id,
    extra_body={"provider": "bedrock"}  # or "openai", "anthropic", "gemini"
)

print(f"Batch ID: {batch.id}")
print(f"Status: {batch.status}")  # "cancelling" or "cancelled"
```

---

## End-to-End Workflows

### OpenAI Batch Workflow

```python
import time
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/openai",
    api_key="your-openai-api-key"
)

# Configuration
provider = "openai"

# Prepare the batch input as OpenAI-format JSONL
jsonl_content = '''{"custom_id": "req-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "What is 2+2?"}], "max_tokens": 100}}
{"custom_id": "req-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "What is the capital of France?"}], "max_tokens": 100}}'''

# Step 1: Upload file (uses OpenAI's native file storage)
print("Step 1: Uploading batch input file...")
uploaded_file = client.files.create(
    file=("batch_e2e.jsonl", jsonl_content.encode(), "application/jsonl"),
    purpose="batch",
    extra_body={"provider": provider},
)
print(f" Uploaded file: {uploaded_file.id}")

# Step 2: Create batch
print("Step 2: Creating batch job...")
batch = client.batches.create(
    input_file_id=uploaded_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
    extra_body={"provider": provider},
)
print(f" Created batch: {batch.id}, status: {batch.status}")

# Step 3: Poll until a terminal status (at most 10 polls, 5 seconds apart)
print("Step 3: Polling batch status...")
for i in range(10):
    batch = client.batches.retrieve(batch.id, extra_query={"provider": provider})
    print(f" Poll {i+1}: status = {batch.status}")

    if batch.status in ["completed", "failed", "expired", "cancelled"]:
        break

    if batch.request_counts:
        print(f" Completed: {batch.request_counts.completed}/{batch.request_counts.total}")

    time.sleep(5)

# Report the real outcome: the loop also ends on failure or after the last poll
print(f"\nBatch {batch.id} finished polling with status: {batch.status}")
```

### Bedrock Batch Workflow

```python
import time
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/openai",
    api_key="your-api-key"
)

# Configuration
provider = "bedrock"
s3_bucket = "your-s3-bucket"
s3_region = "us-west-2"
model = "anthropic.claude-3-sonnet-20240229-v1:0"

# Prepare JSONL content using OpenAI-style format (Bifrost converts to Bedrock format internally)
jsonl_content = '''{"custom_id": "req-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "anthropic.claude-3-sonnet-20240229-v1:0", "messages": [{"role": "user", "content": "What is 2+2?"}], "max_tokens": 100}}
{"custom_id": "req-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "anthropic.claude-3-sonnet-20240229-v1:0", "messages": [{"role": "user", "content": "What is the capital of France?"}], "max_tokens": 100}}'''

# Step 1: Upload file
print("Step 1: Uploading batch input file...")
uploaded_file = client.files.create(
    file=("batch_e2e.jsonl", jsonl_content.encode(), "application/jsonl"),
    purpose="batch",
    extra_body={
        "provider": provider,
        "storage_config": {
            "s3": {"bucket": s3_bucket, "region": s3_region, "prefix": "batch-input"},
        },
    },
)
print(f" Uploaded file: {uploaded_file.id}")

# Step 2: Create batch
print("Step 2: Creating batch job...")
batch = client.batches.create(
    input_file_id=uploaded_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
    extra_body={
        "provider": provider,
        "model": model,
        "output_s3_uri": f"s3://{s3_bucket}/batch-output",
    },
)
print(f" Created batch: {batch.id}, status: {batch.status}")

# Step 3: Poll until a terminal status (at most 10 polls, 5 seconds apart)
print("Step 3: Polling batch status...")
for i in range(10):
    batch = client.batches.retrieve(batch.id, extra_query={"provider": provider})
    print(f" Poll {i+1}: status = {batch.status}")

    if batch.status in ["completed", "failed", "expired", "cancelled"]:
        break

    if batch.request_counts:
        print(f" Completed: {batch.request_counts.completed}/{batch.request_counts.total}")

    time.sleep(5)

# Report the real outcome: the loop also ends on failure or after the last poll
print(f"\nBatch {batch.id} finished polling with status: {batch.status}")
```

### Anthropic Inline Batch Workflow

```python
import time
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/openai",
    api_key="your-anthropic-api-key"
)

provider = "anthropic"

# Step 1: Create inline requests
print("Step 1: Creating inline requests...")
requests = [
    {
        "custom_id": "math-question",
        "params": {
            "model": "claude-3-sonnet-20240229",
            "max_tokens": 100,
            "messages": [{"role": "user", "content": "What is 15 * 7?"}]
        }
    },
    {
        "custom_id": "geography-question",
        "params": {
            "model": "claude-3-sonnet-20240229",
            "max_tokens": 100,
            "messages": [{"role": "user", "content": "What is the largest ocean?"}]
        }
    }
]
print(f" Created {len(requests)} inline requests")

# Step 2: Create batch
print("Step 2: Creating batch job...")
batch = client.batches.create(
    input_file_id="",
    endpoint="/v1/chat/completions",
    completion_window="24h",
    extra_body={"provider": provider, "requests": requests},
)
print(f" Created batch: {batch.id}, status: {batch.status}")

# Step 3: Poll until a terminal status (at most 10 polls, 5 seconds apart)
print("Step 3: Polling batch status...")
for i in range(10):
    batch = client.batches.retrieve(batch.id, extra_query={"provider": provider})
    print(f" Poll {i+1}: status = {batch.status}")

    if batch.status in ["completed", "failed", "expired", "cancelled", "ended"]:
        break

    time.sleep(5)

# Report the real outcome: the loop also ends on failure or after the last poll
print(f"\nBatch {batch.id} finished polling with status: {batch.status}")
```

### Gemini Batch Workflow

```python
import time
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/openai",
    api_key="your-api-key"
)

# Configuration
provider = "gemini"
model = "gemini-1.5-flash"

# Prepare JSONL content using OpenAI-style format (Bifrost converts to Gemini format internally)
jsonl_content = '''{"custom_id": "req-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gemini-1.5-flash", "messages": [{"role": "user", "content": "What is 2+2?"}], "max_tokens": 100}}
{"custom_id": "req-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gemini-1.5-flash", "messages": [{"role": "user", "content": "What is the capital of France?"}], "max_tokens": 100}}'''

# Step 1: Upload file (uses Gemini's native file storage)
print("Step 1: Uploading batch input file...")
uploaded_file = client.files.create(
    file=("batch_e2e.jsonl", jsonl_content.encode(), "application/jsonl"),
    purpose="batch",
    extra_body={"provider": provider},
)
print(f" Uploaded file: {uploaded_file.id}")

# Step 2: Create batch
print("Step 2: Creating batch job...")
batch = client.batches.create(
    input_file_id=uploaded_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
    extra_body={
        "provider": provider,
        "model": model,
    },
)
print(f" Created batch: {batch.id}, status: {batch.status}")

# Step 3: Poll until a terminal status (at most 10 polls, 5 seconds apart)
print("Step 3: Polling batch status...")
for i in range(10):
    batch = client.batches.retrieve(batch.id, extra_query={"provider": provider})
    print(f" Poll {i+1}: status = {batch.status}")

    if batch.status in ["completed", "failed", "expired", "cancelled"]:
        break

    if batch.request_counts:
        print(f" Completed: {batch.request_counts.completed}/{batch.request_counts.total}")

    time.sleep(5)

# Report the real outcome: the loop also ends on failure or after the last poll
print(f"\nBatch {batch.id} finished polling with status: {batch.status}")
```
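
All four workflows repeat the same polling loop. One way to factor it out is sketched below, under the assumption that the terminal statuses are exactly the ones checked in the loops above (including Anthropic's `ended`). `wait_for_batch` and its parameters are hypothetical names, and `fetch_status` is any zero-argument callable that returns the current status string.

```python
import time

# Union of the terminal statuses checked in the workflows on this page
TERMINAL_STATUSES = {"completed", "failed", "expired", "cancelled", "ended"}


def wait_for_batch(fetch_status, max_polls=10, interval=5.0, sleep=time.sleep):
    """Poll until a terminal status or until max_polls is exhausted.

    Returns (last_status, finished), where finished is False if polling
    gave up before the batch reached a terminal status.
    """
    status = None
    for attempt in range(1, max_polls + 1):
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status, True
        if attempt < max_polls:
            sleep(interval)  # skip the pointless wait after the final poll
    return status, False
```

A caller might wrap the SDK call in a lambda, for example `wait_for_batch(lambda: client.batches.retrieve(batch.id, extra_query={"provider": provider}).status)`, and treat `finished == False` as a timeout and not as a success.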

---

## Provider-Specific Notes

| Provider | File Upload | Batch Creation | Extra Configuration |
|----------|-------------|----------------|---------------------|
| **OpenAI** | ✅ Native storage | ✅ File-based | None |
| **Bedrock** | ✅ S3-based | ✅ File-based | `storage_config`, `output_s3_uri` |
| **Anthropic** | ❌ Not supported | ✅ Inline requests | `requests` array in `extra_body` |
| **Gemini** | ✅ Native storage | ✅ File-based | `model` in `extra_body` |

<Note>
- **OpenAI** and **Gemini** use their native file storage - no S3 configuration needed
- **Bedrock** requires S3 storage configuration (`storage_config`, `output_s3_uri`)
- **Anthropic** does not support file-based batch operations - use inline requests instead
</Note>
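
The per-provider extras for batch creation can be captured in one small function. This is a sketch, not Bifrost's own validation: `batch_extra_body` is a hypothetical helper, and treating each field as mandatory is an assumption drawn from the examples on this page (which also pass `model` for Bedrock).

```python
def batch_extra_body(provider, model=None, output_s3_uri=None, requests=None):
    """Build the extra_body dict for client.batches.create for one provider."""
    body = {"provider": provider}
    if provider == "openai":
        return body  # no extra configuration
    if provider == "bedrock":
        if not model or not output_s3_uri:
            raise ValueError("bedrock needs model and output_s3_uri")
        body["model"] = model
        body["output_s3_uri"] = output_s3_uri
    elif provider == "gemini":
        if not model:
            raise ValueError("gemini needs model")
        body["model"] = model
    elif provider == "anthropic":
        if not requests:
            raise ValueError("anthropic needs a non-empty requests list")
        body["requests"] = list(requests)
    else:
        raise ValueError(f"unknown provider: {provider}")
    return body
```

Failing early on a missing field keeps the error next to the code that built the request, without waiting for a rejected API call.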

---

## Next Steps

- **[Overview](./overview)** - OpenAI SDK integration basics
- **[Configuration](../../quickstart/gateway/provider-configuration)** - Bifrost setup and configuration
- **[Core Features](../../features/)** - Governance, semantic caching, and more
563
docs/integrations/openai-sdk/overview.mdx
Normal file
@@ -0,0 +1,563 @@
---
title: "Overview"
description: "Use Bifrost as a drop-in replacement for OpenAI API with full compatibility and enhanced features."
icon: "book"
---

## Overview

Bifrost provides complete OpenAI API compatibility through protocol adaptation. The integration handles request transformation, response normalization, and error mapping between OpenAI's API specification and Bifrost's internal processing pipeline.

This integration enables you to utilize Bifrost's features like governance, load balancing, semantic caching, multi-provider support, and more, all while preserving your existing OpenAI SDK-based architecture.

**Endpoint:** `/openai`

---

## Setup

<Tabs group="openai-sdk">
<Tab title="Python">

```python {5}
import openai

# Configure client to use Bifrost
client = openai.OpenAI(
    base_url="http://localhost:8080/openai",
    api_key="dummy-key"  # Keys handled by Bifrost
)

# Make requests as usual
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello!"}]
)

print(response.choices[0].message.content)
```

</Tab>
<Tab title="JavaScript">

```javascript {5}
import OpenAI from "openai";

// Configure client to use Bifrost
const openai = new OpenAI({
  baseURL: "http://localhost:8080/openai",
  apiKey: "dummy-key", // Keys handled by Bifrost
});

// Make requests as usual
const response = await openai.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "Hello!" }],
});

console.log(response.choices[0].message.content);
```

</Tab>
</Tabs>

---

## Provider/Model Usage Examples

Use multiple providers through the same OpenAI SDK format by prefixing model names with the provider:

<Tabs group="openai-sdk">
<Tab title="Python">

```python
import openai

client = openai.OpenAI(
    base_url="http://localhost:8080/openai",
    api_key="dummy-key"
)

# OpenAI models (default)
openai_response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello from OpenAI!"}]
)

# Anthropic models via OpenAI SDK format
anthropic_response = client.chat.completions.create(
    model="anthropic/claude-3-sonnet-20240229",
    messages=[{"role": "user", "content": "Hello from Claude!"}]
)

# Google Vertex models via OpenAI SDK format
vertex_response = client.chat.completions.create(
    model="vertex/gemini-pro",
    messages=[{"role": "user", "content": "Hello from Gemini!"}]
)

# Azure models
azure_response = client.chat.completions.create(
    model="azure/gpt-4o",
    messages=[{"role": "user", "content": "Hello from Azure!"}]
)

# Local Ollama models
ollama_response = client.chat.completions.create(
    model="ollama/llama3.1:8b",
    messages=[{"role": "user", "content": "Hello from Ollama!"}]
)
```

</Tab>
<Tab title="JavaScript">

```javascript
import OpenAI from "openai";

const openai = new OpenAI({
  baseURL: "http://localhost:8080/openai",
  apiKey: "dummy-key",
});

// OpenAI models (default)
const openaiResponse = await openai.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "Hello from OpenAI!" }],
});

// Anthropic models via OpenAI SDK format
const anthropicResponse = await openai.chat.completions.create({
  model: "anthropic/claude-3-sonnet-20240229",
  messages: [{ role: "user", content: "Hello from Claude!" }],
});

// Google Vertex models via OpenAI SDK format
const vertexResponse = await openai.chat.completions.create({
  model: "vertex/gemini-pro",
  messages: [{ role: "user", content: "Hello from Gemini!" }],
});

// Azure models
const azureResponse = await openai.chat.completions.create({
  model: "azure/gpt-4o",
  messages: [{ role: "user", content: "Hello from Azure!" }],
});

// Local Ollama models
const ollamaResponse = await openai.chat.completions.create({
  model: "ollama/llama3.1:8b",
  messages: [{ role: "user", content: "Hello from Ollama!" }],
});
```

</Tab>
</Tabs>
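
To make the naming convention concrete, the sketch below splits a model string the way the examples above are written: everything before the first `/` is read as the provider, and an unprefixed name falls back to OpenAI. This illustrates the convention only. `split_model` is a hypothetical helper, and Bifrost's actual parsing may differ, for instance for model names that themselves contain a slash.

```python
def split_model(model, default_provider="openai"):
    """Split "provider/model" on the first slash; unprefixed names use the default."""
    provider, separator, name = model.partition("/")
    if not separator:
        return default_provider, model
    return provider, name
```

For example, `split_model("ollama/llama3.1:8b")` yields the pair `("ollama", "llama3.1:8b")`, while `split_model("gpt-4o-mini")` yields `("openai", "gpt-4o-mini")`.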

---

## Adding Custom Headers

Pass custom headers required by Bifrost plugins (like governance, telemetry, etc.):

<Tabs group="openai-sdk">
<Tab title="Python">

```python
import openai

client = openai.OpenAI(
    base_url="http://localhost:8080/openai",
    api_key="dummy-key",
    default_headers={
        "x-bf-vk": "vk_12345",  # Virtual key for governance
    }
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello with custom headers!"}]
)
```

</Tab>
<Tab title="JavaScript">

```javascript
import OpenAI from "openai";

const openai = new OpenAI({
  baseURL: "http://localhost:8080/openai",
  apiKey: "dummy-key",
  defaultHeaders: {
    "x-bf-vk": "vk_12345", // Virtual key for governance
  },
});

const response = await openai.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "Hello with custom headers!" }],
});
```

</Tab>
</Tabs>

---

## Using Direct Keys

Pass API keys directly in requests to bypass Bifrost's load balancing. You can pass any provider's API key (OpenAI, Anthropic, Mistral, etc.); Bifrost reads it from the `Authorization` or `x-api-key` header (the Gemini example below uses `x-goog-api-key`). This requires the **Allow Direct API keys** option to be enabled in the Bifrost configuration.

> **Learn more:** See [Key Management](../../features/keys-management#direct-key-bypass) for enabling direct API key usage.

<Tabs group="openai-sdk">
<Tab title="Python">

```python
import openai

# Using OpenAI's API key directly
client_with_direct_key = openai.OpenAI(
    base_url="http://localhost:8080/openai",
    api_key="sk-your-openai-key"  # OpenAI's API key works
)

openai_response = client_with_direct_key.chat.completions.create(
    model="openai/gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello from GPT!"}]
)

# Or pass different provider keys per request
client = openai.OpenAI(
    base_url="http://localhost:8080/openai",
    api_key="dummy-key"
)

# Use OpenAI key for GPT models
openai_response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello GPT!"}],
    extra_headers={
        "Authorization": "Bearer sk-your-openai-key"
    }
)

# Use Anthropic key for Claude models
anthropic_response = client.chat.completions.create(
    model="anthropic/claude-3-sonnet-20240229",
    messages=[{"role": "user", "content": "Hello Claude!"}],
    extra_headers={
        "x-api-key": "sk-ant-your-anthropic-key"
    }
)

# Use Gemini key for Gemini models
gemini_response = client.chat.completions.create(
    model="gemini/gemini-2.5-flash",
    messages=[{"role": "user", "content": "Hello Gemini!"}],
    extra_headers={
        "x-goog-api-key": "sk-gemini-your-gemini-key"
    }
)
```

</Tab>
|
||||
<Tab title="JavaScript">

```javascript
import OpenAI from "openai";

// Using OpenAI's API key directly
const openaiWithDirectKey = new OpenAI({
  baseURL: "http://localhost:8080/openai",
  apiKey: "sk-your-openai-key", // OpenAI's API key works
});

const openaiResponse = await openaiWithDirectKey.chat.completions.create({
  model: "openai/gpt-4o-mini",
  messages: [{ role: "user", content: "Hello from GPT!" }],
});

// Or pass different provider keys per request
const openai = new OpenAI({
  baseURL: "http://localhost:8080/openai",
  apiKey: "dummy-key",
});

// Use OpenAI key for GPT models.
// Per-request headers go in the second (request options) argument, not the body.
const openaiResponseWithHeader = await openai.chat.completions.create(
  {
    model: "gpt-4o-mini",
    messages: [{ role: "user", content: "Hello GPT!" }],
  },
  { headers: { Authorization: "Bearer sk-your-openai-key" } }
);

// Use Anthropic key for Claude models
const anthropicResponseWithHeader = await openai.chat.completions.create(
  {
    model: "anthropic/claude-3-sonnet-20240229",
    messages: [{ role: "user", content: "Hello Claude!" }],
  },
  { headers: { "x-api-key": "sk-ant-your-anthropic-key" } }
);

// Use Gemini key for Gemini models
const geminiResponseWithHeader = await openai.chat.completions.create(
  {
    model: "gemini/gemini-2.5-flash",
    messages: [{ role: "user", content: "Hello Gemini!" }],
  },
  { headers: { "x-goog-api-key": "sk-gemini-your-gemini-key" } }
);
```

</Tab>
</Tabs>
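
The per-request examples above use a different header for each provider's key. As a minimal sketch, that mapping can be kept in one place. `PROVIDER_KEY_HEADERS` and `key_headers` are hypothetical names introduced here for illustration and are not part of Bifrost or the OpenAI SDK; treating a model name without a `provider/` prefix as OpenAI is an assumption based on the `gpt-4o-mini` example.

```python
from typing import Dict

# Header names are taken from the examples above; the helper itself is hypothetical.
PROVIDER_KEY_HEADERS = {
    "openai": ("Authorization", "Bearer {key}"),
    "anthropic": ("x-api-key", "{key}"),
    "gemini": ("x-goog-api-key", "{key}"),
}


def key_headers(model: str, keys: Dict[str, str]) -> Dict[str, str]:
    """Return the auth header for the provider named in a "provider/model" string."""
    # Assumption: a model name without a "provider/" prefix is routed to OpenAI.
    provider = model.split("/", 1)[0] if "/" in model else "openai"
    header, template = PROVIDER_KEY_HEADERS[provider]
    return {header: template.format(key=keys[provider])}


keys = {
    "openai": "sk-your-openai-key",
    "anthropic": "sk-ant-your-anthropic-key",
}
print(key_headers("anthropic/claude-3-sonnet-20240229", keys))
print(key_headers("gpt-4o-mini", keys))
```

The returned dictionary can be passed as `extra_headers` to `client.chat.completions.create(...)`.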

For Azure, you can use the `AzureOpenAI` client and point it at the Bifrost integration endpoint. The `x-bf-azure-endpoint` header is required and specifies your Azure resource endpoint.

<Tabs group="openai-sdk">
<Tab title="Python">

```python
from openai import AzureOpenAI

azure_client = AzureOpenAI(
    api_key="your-azure-api-key",
    api_version="2024-02-01",
    azure_endpoint="http://localhost:8080/openai",  # Point to Bifrost
    default_headers={
        "x-bf-azure-endpoint": "https://your-resource.openai.azure.com"
    }
)

azure_response = azure_client.chat.completions.create(
    model="gpt-4-deployment",  # Your deployment name
    messages=[{"role": "user", "content": "Hello from Azure!"}]
)

print(azure_response.choices[0].message.content)
```

</Tab>
<Tab title="JavaScript">

```javascript
import { AzureOpenAI } from "openai";

const azureClient = new AzureOpenAI({
  apiKey: "your-azure-api-key",
  apiVersion: "2024-02-01",
  baseURL: "http://localhost:8080/openai", // Point to Bifrost
  defaultHeaders: {
    "x-bf-azure-endpoint": "https://your-resource.openai.azure.com"
  }
});

const azureResponse = await azureClient.chat.completions.create({
  model: "gpt-4-deployment", // Your deployment name
  messages: [{ role: "user", content: "Hello from Azure!" }],
});

console.log(azureResponse.choices[0].message.content);
```

</Tab>
</Tabs>

---

## Async Inference

Submit inference requests asynchronously and poll for results later using the `x-bf-async` header. This is useful for long-running requests where you don't want to hold a connection open. See [Async Inference](../../features/async-inference) for full details.

<Note>
Async inference requires a [Logs Store](../../features/observability/default) to be configured and is not compatible with streaming.
</Note>

### Chat Completions

<Tabs group="openai-sdk">
<Tab title="Python">

```python
import openai
import time

client = openai.OpenAI(
    base_url="http://localhost:8080/openai",
    api_key="dummy-key"
)

# Submit async request
initial = client.chat.completions.create(
    model="openai/gpt-4o-mini",
    messages=[{"role": "user", "content": "Tell me a short story."}],
    extra_headers={"x-bf-async": "true"}
)

# If choices are present, the request completed synchronously
if initial.choices:
    print(initial.choices[0].message.content)
else:
    # Poll until completed
    while True:
        time.sleep(2)
        poll = client.chat.completions.create(
            model="openai/gpt-4o-mini",
            messages=[{"role": "user", "content": "Tell me a short story."}],
            extra_headers={"x-bf-async-id": initial.id}
        )
        if poll.choices:
            print(poll.choices[0].message.content)
            break
```

</Tab>
<Tab title="JavaScript">

```javascript
import OpenAI from "openai";

const openai = new OpenAI({
  baseURL: "http://localhost:8080/openai",
  apiKey: "dummy-key",
});

// Submit async request
const initial = await openai.chat.completions.create(
  {
    model: "openai/gpt-4o-mini",
    messages: [{ role: "user", content: "Tell me a short story." }],
  },
  { headers: { "x-bf-async": "true" } }
);

// If choices are present, the request completed synchronously
if (initial.choices?.length > 0) {
  console.log(initial.choices[0].message.content);
} else {
  // Poll until completed
  while (true) {
    await new Promise((r) => setTimeout(r, 2000));
    const poll = await openai.chat.completions.create(
      {
        model: "openai/gpt-4o-mini",
        messages: [{ role: "user", content: "Tell me a short story." }],
      },
      { headers: { "x-bf-async-id": initial.id } }
    );
    if (poll.choices?.length > 0) {
      console.log(poll.choices[0].message.content);
      break;
    }
  }
}
```

</Tab>
</Tabs>

### Responses API

<Tabs group="openai-sdk">
<Tab title="Python">

```python
import openai
import time

client = openai.OpenAI(
    base_url="http://localhost:8080/openai",
    api_key="dummy-key"
)

# Submit async request
initial = client.responses.create(
    model="openai/gpt-4o-mini",
    input="Tell me a short story.",
    extra_headers={"x-bf-async": "true"}
)

# If status is "completed", the request completed synchronously
if initial.status == "completed":
    print(initial.output_text)
else:
    # Poll until completed
    while True:
        time.sleep(2)
        poll = client.responses.create(
            model="openai/gpt-4o-mini",
            input="Tell me a short story.",
            extra_headers={"x-bf-async-id": initial.id}
        )
        if poll.status == "completed":
            print(poll.output_text)
            break
```

</Tab>
<Tab title="JavaScript">

```javascript
import OpenAI from "openai";

const openai = new OpenAI({
  baseURL: "http://localhost:8080/openai",
  apiKey: "dummy-key",
});

// Submit async request
const initial = await openai.responses.create(
  { model: "openai/gpt-4o-mini", input: "Tell me a short story." },
  { headers: { "x-bf-async": "true" } }
);

// If status is "completed", the request completed synchronously
if (initial.status === "completed") {
  console.log(initial.output_text);
} else {
  // Poll until completed
  while (true) {
    await new Promise((r) => setTimeout(r, 2000));
    const poll = await openai.responses.create(
      { model: "openai/gpt-4o-mini", input: "Tell me a short story." },
      { headers: { "x-bf-async-id": initial.id } }
    );
    if (poll.status === "completed") {
      console.log(poll.output_text);
      break;
    }
  }
}
```

</Tab>
</Tabs>
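
The polling loops above run until the job completes, with no upper bound on how long they wait. A minimal sketch of a bounded variant follows. `poll_until` is a hypothetical helper written for this page, not a Bifrost or OpenAI SDK function, and the default interval and timeout are arbitrary assumptions.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def poll_until(
    fetch: Callable[[], T],
    is_done: Callable[[T], bool],
    interval_s: float = 2.0,
    timeout_s: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch() until is_done(result) is true or timeout_s has elapsed."""
    deadline = time.monotonic() + timeout_s
    while True:
        result = fetch()
        if is_done(result):
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"async job not completed after {timeout_s}s")
        sleep(interval_s)


# Usage with the Responses API example above (not executed here):
# final = poll_until(
#     fetch=lambda: client.responses.create(
#         model="openai/gpt-4o-mini",
#         input="Tell me a short story.",
#         extra_headers={"x-bf-async-id": initial.id},
#     ),
#     is_done=lambda r: r.status == "completed",
# )
# print(final.output_text)
```

For Chat Completions, the same helper would take `is_done=lambda r: bool(r.choices)`, mirroring the check used in the loop above.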

### Async Headers

| Header | Description |
|---|---|
| `x-bf-async: true` | Submit the request as an async job. Returns immediately with a job ID. |
| `x-bf-async-id: <job-id>` | Poll for results of a previously submitted async job. |
| `x-bf-async-job-result-ttl: <seconds>` | Override the default result TTL (default: 3600s). |
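
As a rough sketch, the headers in the table can be assembled with a small helper. `async_headers` is a hypothetical function introduced for illustration only, and sending the TTL override together with the submission (rather than with a poll) is an assumption; see the Async Inference page for the authoritative behavior.

```python
from typing import Dict, Optional


def async_headers(
    job_id: Optional[str] = None,
    result_ttl_s: Optional[int] = None,
) -> Dict[str, str]:
    """Build the x-bf-async* headers listed in the table above."""
    if job_id is not None:
        # Polling a previously submitted job.
        return {"x-bf-async-id": job_id}
    # Submitting a new async job.
    headers = {"x-bf-async": "true"}
    if result_ttl_s is not None:
        # Assumption: the TTL override is sent with the submission request.
        headers["x-bf-async-job-result-ttl"] = str(result_ttl_s)
    return headers


# Submit with a 10-minute result TTL, then poll with the returned job ID:
# initial = client.responses.create(..., extra_headers=async_headers(result_ttl_s=600))
# poll = client.responses.create(..., extra_headers=async_headers(job_id=initial.id))
```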

---

## Supported Features

The OpenAI integration supports every feature that is available in both the OpenAI SDK and Bifrost core. If both support a feature, it works through this integration.

---

## Next Steps

- **[Files and Batch API](./files-and-batch)** - File uploads and batch processing
- **[Anthropic SDK](../anthropic-sdk/overview)** - Claude integration patterns
- **[Google GenAI SDK](../genai-sdk)** - Gemini integration patterns
- **[Configuration](../../quickstart/README)** - Bifrost setup and configuration
- **[Core Features](../../features/)** - Advanced Bifrost capabilities