Beyhan Oğur 880f412e2c first commit
2026-04-26 21:52:23 +03:00


---
title: "Replicate"
description: "Replicate API conversion guide - prediction-based architecture, model-specific parameters, and async/sync modes"
icon: "R"
---
## Overview
Replicate is architecturally different from other providers in Bifrost. It uses a **prediction-based API** where every request creates a "prediction" that runs asynchronously. Each model on Replicate defines its own input schema, making it highly flexible but requiring model-specific parameter knowledge.
### Key Architectural Differences
1. **Prediction-Based System**: All operations create predictions via `/v1/predictions` or deployment endpoints
2. **Model-Specific Inputs**: Each model has its own parameter schema (use `extra_params` for model-specific fields)
3. **Async/Sync Modes**: Predictions can run synchronously (with `Prefer: wait` header) or asynchronously (with polling)
4. **Flexible Output**: Output can be strings, arrays, URLs, or data URIs depending on the model
### Supported Operations
| Operation | Non-Streaming | Streaming | Endpoint |
|-----------|---------------|-----------|----------|
| Chat Completions | ✅ | ✅ | `/v1/predictions` |
| Responses API | ✅ | ✅ | `/v1/predictions` |
| Text Completions | ✅ | ✅ | `/v1/predictions` |
| Image Generation | ✅ | ✅ | `/v1/predictions` |
| Image Edit | ✅ | ✅ | `/v1/predictions` |
| Video Generation | ✅ | - | `/v1/predictions` |
| Image Variation | ❌ | ❌ | - |
| Files | ✅ | - | `/v1/files` |
| List Models | ✅ | - | `/v1/deployments` |
| Embeddings | ❌ | ❌ | - |
| Speech (TTS) | ❌ | ❌ | - |
| Transcriptions (STT) | ❌ | ❌ | - |
| Batch | ❌ | ❌ | - |
<Note>
**List Models** returns account-specific deployments only, not all public models on Replicate.
</Note>
---
# Model Identification
Replicate models can be specified in three ways:
## 1. Version ID
```bash
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "replicate/5c7d5dc6dd8bf75c1acaa8565735e7986bc5b66206b55cca93cb72c9bf15ccaa",
"messages": [{"role": "user", "content": "Hello"}]
}'
```
## 2. Model Name
Format: `owner/model-name`
```bash
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "replicate/meta/llama-2-7b-chat",
"messages": [{"role": "user", "content": "Hello"}]
}'
```
## 3. Deployment
Configure deployed models in the Replicate key configuration. Deployments map custom model identifiers to actual deployment paths.
**Configuration Example:**
```json
{
"provider": "replicate",
"value": "your-api-key",
"aliases": {
"my-model": "owner/my-deployment-name"
}
}
```
**Usage:**
```bash
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "replicate/my-model",
"messages": [{"role": "user", "content": "Hello"}]
}'
```
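The three identifier forms can be told apart mechanically. The sketch below is only an illustration: `classifyModel` and its return labels are hypothetical names, not Bifrost internals. It assumes the `replicate/` prefix has already been stripped, that a version ID is a 64-character lowercase hex string (as in the example above), and that aliases are checked before anything else.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// versionIDPattern assumes version IDs are 64 lowercase hex characters.
var versionIDPattern = regexp.MustCompile(`^[0-9a-f]{64}$`)

// classifyModel is a hypothetical helper. It reports which of the three
// identifier forms a model string uses and the target it resolves to.
func classifyModel(model string, aliases map[string]string) (kind, target string) {
	if deployment, ok := aliases[model]; ok {
		return "deployment", deployment
	}
	if versionIDPattern.MatchString(model) {
		return "version", model
	}
	if strings.Count(model, "/") == 1 {
		return "model", model
	}
	return "unknown", model
}

func main() {
	aliases := map[string]string{"my-model": "owner/my-deployment-name"}
	for _, m := range []string{
		"5c7d5dc6dd8bf75c1acaa8565735e7986bc5b66206b55cca93cb72c9bf15ccaa",
		"meta/llama-2-7b-chat",
		"my-model",
	} {
		kind, target := classifyModel(m, aliases)
		fmt.Printf("%s -> %s (%s)\n", m, kind, target)
	}
}
```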
---
# Prediction Modes
## Sync Mode
Bifrost uses sync mode when the incoming request carries a `Prefer: wait` header. The request then blocks until the prediction completes or the wait times out (default 60 seconds).
**How it works:**
1. Creates prediction with `Prefer: wait=60` header
2. Replicate holds connection open for up to 60 seconds
3. If prediction completes within timeout, returns result immediately
4. If timeout expires, falls back to polling mode
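For orientation, here is a minimal sketch of what step 1 looks like as a raw HTTP request. It only builds the request and does not send it. The helper name `newSyncPredictionRequest` is hypothetical, and the base URL and bearer-token header reflect Replicate's public HTTP API rather than anything stated in this guide, so treat them as assumptions.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// newSyncPredictionRequest builds (but does not send) a prediction request
// that asks Replicate to hold the connection open for up to waitSeconds.
func newSyncPredictionRequest(apiKey, version string, input map[string]any, waitSeconds int) (*http.Request, error) {
	body, err := json.Marshal(map[string]any{"version": version, "input": input})
	if err != nil {
		return nil, err
	}
	// Assumed endpoint: Replicate's public predictions URL.
	req, err := http.NewRequest(http.MethodPost, "https://api.replicate.com/v1/predictions", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+apiKey)
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Prefer", fmt.Sprintf("wait=%d", waitSeconds))
	return req, nil
}

func main() {
	req, err := newSyncPredictionRequest("your-api-key", "version-id", map[string]any{"prompt": "Hello"}, 60)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL, req.Header.Get("Prefer"))
}
```

If the response comes back with a non-terminal status, the caller would continue with polling as described in the next section.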
## Async Mode (Polling)
Async mode is the default for Replicate predictions. Bifrost automatically polls the prediction URL every 2 seconds until the prediction reaches a terminal status.
**Status Flow**: `starting` → `processing` → `succeeded`/`failed`/`canceled`
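The polling loop can be sketched as follows. This is an illustration under stated assumptions, not Bifrost's implementation: `pollUntilDone` and `fetchStatus` are hypothetical names, `fetchStatus` stands in for a GET on the prediction URL, and the attempt limit is added only so the sketch always terminates. In practice the interval would be 2 seconds.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// isTerminal reports whether a status ends the status flow.
func isTerminal(status string) bool {
	switch status {
	case "succeeded", "failed", "canceled":
		return true
	}
	return false
}

// pollUntilDone calls fetchStatus at the given interval until a terminal
// status is returned or maxAttempts is exhausted.
func pollUntilDone(fetchStatus func() (string, error), interval time.Duration, maxAttempts int) (string, error) {
	for attempt := 0; attempt < maxAttempts; attempt++ {
		status, err := fetchStatus()
		if err != nil {
			return "", err
		}
		if isTerminal(status) {
			return status, nil
		}
		time.Sleep(interval)
	}
	return "", errors.New("prediction did not finish within the attempt limit")
}

func main() {
	// Simulated status sequence instead of real HTTP calls.
	statuses := []string{"starting", "processing", "succeeded"}
	i := 0
	fetch := func() (string, error) {
		s := statuses[i]
		i++
		return s, nil
	}
	final, err := pollUntilDone(fetch, 10*time.Millisecond, 10)
	fmt.Println(final, err)
}
```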
---
# 1. Chat Completions
### Message Conversion
**System Messages**: Extracted from messages array and concatenated into `system_prompt` field.
**User/Assistant Messages**: Preserved as conversation context. Text content from content blocks is concatenated with newlines.
**Image Content**: Non-base64 image URLs from message content blocks are extracted and passed as `image_input` array.
```json
// Input
{
"messages": [
{"role": "system", "content": "You are helpful"},
{"role": "user", "content": "Hello"}
]
}
// Converted to Replicate format
{
"input": {
"system_prompt": "You are helpful",
"prompt": "Hello",
"messages": [...] // Original messages array also included
}
}
```
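A simplified sketch of this conversion, handling plain-string content only. The `message` type and `toReplicateInput` are hypothetical names, and joining multiple messages with a newline is an assumption based on the description above. Image extraction and the pass-through `messages` array are omitted.

```go
package main

import (
	"fmt"
	"strings"
)

// message is a simplified chat message with plain-text content.
type message struct {
	Role    string
	Content string
}

// toReplicateInput routes system messages to system_prompt and joins all
// other messages into prompt. The newline separator is an assumption.
func toReplicateInput(messages []message) map[string]any {
	var system, conversation []string
	for _, m := range messages {
		if m.Role == "system" {
			system = append(system, m.Content)
		} else {
			conversation = append(conversation, m.Content)
		}
	}
	input := map[string]any{"prompt": strings.Join(conversation, "\n")}
	if len(system) > 0 {
		input["system_prompt"] = strings.Join(system, "\n")
	}
	return input
}

func main() {
	input := toReplicateInput([]message{
		{Role: "system", Content: "You are helpful"},
		{Role: "user", Content: "Hello"},
	})
	fmt.Println(input["system_prompt"], "|", input["prompt"])
}
```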
### System Prompt Filtering
**Important**: Not all Replicate models support the `system_prompt` field. For unsupported models, the system prompt is automatically prepended to the conversation prompt.
**Models without system_prompt support:**
- `meta/meta-llama-3-8b`
- `meta/llama-2-70b`
- `openai/gpt-oss-20b`
- `openai/o1-mini`
- `xai/grok-4`
- All `deepseek-ai/deepseek*` models (e.g., `deepseek-r1`, `deepseek-v3`)
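The fallback can be pictured like this. The model list comes from this section; the helper names and the blank-line separator between the system text and the prompt are assumptions made for the sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// noSystemPrompt holds the models listed above as lacking system_prompt.
var noSystemPrompt = map[string]bool{
	"meta/meta-llama-3-8b": true,
	"meta/llama-2-70b":     true,
	"openai/gpt-oss-20b":   true,
	"openai/o1-mini":       true,
	"xai/grok-4":           true,
}

// supportsSystemPrompt reports whether a model accepts system_prompt.
func supportsSystemPrompt(model string) bool {
	if strings.HasPrefix(model, "deepseek-ai/deepseek") {
		return false
	}
	return !noSystemPrompt[model]
}

// applySystemPrompt sets system_prompt when supported; otherwise it prepends
// the system text to prompt. The "\n\n" separator is an assumption.
func applySystemPrompt(model, system, prompt string) map[string]string {
	if system == "" {
		return map[string]string{"prompt": prompt}
	}
	if supportsSystemPrompt(model) {
		return map[string]string{"system_prompt": system, "prompt": prompt}
	}
	return map[string]string{"prompt": system + "\n\n" + prompt}
}

func main() {
	fmt.Println(applySystemPrompt("meta/llama-2-7b-chat", "You are helpful", "Hello"))
	fmt.Println(applySystemPrompt("deepseek-ai/deepseek-r1", "You are helpful", "Hello"))
}
```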
### Model-Specific Parameters
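Use `extra_params` to pass model-specific parameters. In the Gateway, send them as top-level fields in the request body; in the Go SDK, set `ExtraParams`. Either way, they are **flattened into the input object**: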
<Tabs>
<Tab title="Gateway">
```bash
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "replicate/meta/llama-2-7b-chat",
"messages": [{"role": "user", "content": "Hello"}],
"temperature": 0.7,
"top_k": 50,
"repetition_penalty": 1.1,
"min_new_tokens": 10
}'
```
</Tab>
<Tab title="Go SDK">
```go
resp, err := client.ChatCompletionRequest(schemas.NewBifrostContext(ctx, schemas.NoDeadline), &schemas.BifrostChatRequest{
Provider: schemas.Replicate,
Model: "meta/llama-2-7b-chat",
Input: messages,
Params: &schemas.ChatParameters{
Temperature: schemas.Ptr(0.7),
ExtraParams: map[string]interface{}{
"top_k": 50,
"repetition_penalty": 1.1,
"min_new_tokens": 10,
},
},
})
```
</Tab>
</Tabs>
<Warning>
**Model Schema Discovery**: Each Replicate model has unique parameters. Check the model's documentation on replicate.com or use the OpenAPI schema from the model version to discover available parameters.
</Warning>
## Response Conversion
### Field Mapping
- **Output**:
- String → `choices[0].message.content`
- Array of strings → joined and mapped to `choices[0].message.content`
- Object with `text` field → `text` value mapped to `choices[0].message.content`
- **Status**: `succeeded` → `finish_reason: "stop"`, `failed` → `finish_reason: "error"`
- **Metrics**: `input_token_count` → `prompt_tokens`, `output_token_count` → `completion_tokens`
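The output and status rules above could be expressed roughly as below. The function names are hypothetical, the input is assumed to be the prediction `output` after JSON decoding into Go's generic types, and joining array elements without a separator is an assumption (the guide only says "joined").

```go
package main

import (
	"fmt"
	"strings"
)

// outputToContent normalizes a decoded prediction output into one string.
// The second return value is false when the shape is not recognized.
func outputToContent(output any) (string, bool) {
	switch v := output.(type) {
	case string:
		return v, true
	case []any:
		parts := make([]string, 0, len(v))
		for _, item := range v {
			s, ok := item.(string)
			if !ok {
				return "", false
			}
			parts = append(parts, s)
		}
		// Assumption: chunks are joined with no separator.
		return strings.Join(parts, ""), true
	case map[string]any:
		text, ok := v["text"].(string)
		return text, ok
	}
	return "", false
}

// finishReason maps a terminal prediction status to finish_reason.
// Statuses not covered by the mapping above yield an empty string.
func finishReason(status string) string {
	switch status {
	case "succeeded":
		return "stop"
	case "failed":
		return "error"
	}
	return ""
}

func main() {
	content, _ := outputToContent([]any{"Hello! ", "How can I help you?"})
	fmt.Println(content, "/", finishReason("succeeded"))
}
```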
### Example Response
```json
{
"id": "abc123",
"model": "meta/llama-2-7b-chat",
"object": "chat.completion",
"created": 1234567890,
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! How can I help you?"
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 10,
"completion_tokens": 8,
"total_tokens": 18
}
}
```
## Streaming
Replicate streaming uses Server-Sent Events (SSE) with the following event types:
| Event Type | Description | Data Format |
|------------|-------------|-------------|
| `output` | Content chunk | Plain text string |
| `done` | Completion | JSON: `{"reason": ""}` (empty = success) |
| `error` | Error occurred | JSON: `{"detail": "error message"}` |
**Streaming Flow:**
1. Bifrost sets `stream: true` in prediction input
2. Replicate returns `urls.stream` in initial response
3. Bifrost connects to stream URL and processes SSE events
4. `output` events → content deltas
5. `done` event → final chunk with `finish_reason`
**Done Event Reasons:**
- Empty or no reason = success (`finish_reason: "stop"`)
- `"canceled"` = prediction was canceled
- `"error"` = prediction failed
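To make the event handling concrete, here is a small self-contained SSE reader that applies the rules above to a canned stream. All names (`sseEvent`, `readEvents`, `doneStatus`, `collect`) are hypothetical, only the `event` and `data` fields are parsed, and returning the raw reason for non-success `done` events is an assumption since the guide does not give the exact `finish_reason` values for those cases.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

type sseEvent struct {
	Type string
	Data string
}

// readEvents parses a Server-Sent Events stream. bufio.Scanner's default
// line limit (64 KiB) is fine for text chunks but not for large payloads.
func readEvents(r io.Reader) ([]sseEvent, error) {
	var events []sseEvent
	var current sseEvent
	var data []string
	hasFields := false
	scanner := bufio.NewScanner(r)
	for scanner.Scan() {
		line := scanner.Text()
		switch {
		case line == "": // a blank line dispatches the pending event
			if hasFields {
				current.Data = strings.Join(data, "\n")
				events = append(events, current)
			}
			current, data, hasFields = sseEvent{}, nil, false
		case strings.HasPrefix(line, "event:"):
			current.Type = strings.TrimSpace(strings.TrimPrefix(line, "event:"))
			hasFields = true
		case strings.HasPrefix(line, "data:"):
			data = append(data, strings.TrimPrefix(strings.TrimPrefix(line, "data:"), " "))
			hasFields = true
		}
	}
	return events, scanner.Err()
}

// doneStatus interprets a "done" payload: an empty or missing reason means
// success ("stop"); otherwise the raw reason is returned (an assumption).
func doneStatus(data string) string {
	var payload struct {
		Reason string `json:"reason"`
	}
	_ = json.Unmarshal([]byte(data), &payload) // unparsable data counts as no reason
	if payload.Reason == "" {
		return "stop"
	}
	return payload.Reason
}

// collect concatenates output chunks and returns the final finish reason.
func collect(stream string) (content, finish string, err error) {
	events, err := readEvents(strings.NewReader(stream))
	if err != nil {
		return "", "", err
	}
	var b strings.Builder
	for _, ev := range events {
		switch ev.Type {
		case "output":
			b.WriteString(ev.Data)
		case "done":
			finish = doneStatus(ev.Data)
		case "error":
			return b.String(), "error", fmt.Errorf("stream error: %s", ev.Data)
		}
	}
	return b.String(), finish, nil
}

func main() {
	stream := "event: output\ndata: Hel\n\nevent: output\ndata: lo\n\nevent: done\ndata: {\"reason\": \"\"}\n\n"
	fmt.Println(collect(stream))
}
```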
---
# 2. Responses API
The Responses API is converted internally to Chat Completions or native Replicate format depending on the model:
```go
// Responses request → Replicate prediction conversion
ResponsesRequest → ReplicatePredictionRequest → ReplicatePredictionResponse → BifrostResponsesResponse
```
**Conversion Logic:**
1. **For OpenAI models with `gpt-5-structured`**: Uses native Responses format with `input_item_list`, `tools`, and `json_schema` support
2. **For all other models**: Converted to Chat Completions format using message conversion logic
Same parameter mapping and system prompt handling as [Chat Completions](#1-chat-completions).
## Response Format
Responses follow standard Responses API format with status mapping:
| Replicate Status | Responses Status |
|------------------|------------------|
| `succeeded` | `completed` |
| `failed` | `failed` |
| `canceled` | `cancelled` |
| `processing` | `in_progress` |
| `starting` | `queued` |
---
# 3. Text Completions (Legacy)
### Conversion
- **Prompt array**: Joined with newlines into single `prompt` field
- **top_k**: Pass via `extra_params` (model-specific)
### Example
```bash
curl -X POST http://localhost:8080/v1/completions \
-H "Content-Type: application/json" \
-d '{
"model": "replicate/meta/llama-2-7b",
"prompt": "Once upon a time",
"max_tokens": 100,
"temperature": 0.8,
"top_k": 40
}'
```
## Response
Same conversion as chat completions: output string/array → `choices[0].text`, with usage metrics from prediction metrics.
---
# 4. Image Generation
### Parameter Mapping
```json
{
"prompt": "prompt",
"n": "number_of_images",
"aspect_ratio": "aspect_ratio",
"resolution": "resolution",
"output_format": "output_format",
"quality": "quality",
"background": "background",
"seed": "seed",
"negative_prompt": "negative_prompt",
"num_inference_steps": "num_inference_steps",
"input_images": "input_images"
}
```
### Input Image Field Mapping
**Important**: Different Replicate models expect input images in different fields. Bifrost automatically maps `input_images` to the correct field based on the model.
**Field Mapping by Model:**
| Field | Models |
|-------|--------|
| `image_prompt` | `black-forest-labs/flux-1.1-pro`<br/>`black-forest-labs/flux-1.1-pro-ultra`<br/>`black-forest-labs/flux-pro`<br/>`black-forest-labs/flux-1.1-pro-ultra-finetuned` |
| `input_image` | `black-forest-labs/flux-kontext-pro`<br/>`black-forest-labs/flux-kontext-max`<br/>`black-forest-labs/flux-kontext-dev` |
| `image` | `black-forest-labs/flux-dev`<br/>`black-forest-labs/flux-fill-pro`<br/>`black-forest-labs/flux-dev-lora`<br/>`black-forest-labs/flux-krea-dev` |
| `input_images` | All other models (default) |
<Note>
For models that expect a single image field (`image_prompt`, `input_image`, `image`), only the first image from the `input_images` array is used.
</Note>
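The table and note above amount to a lookup with a default. A minimal sketch, using the model list from this section; `placeInputImages` is a hypothetical helper name and images are represented as plain strings (URLs or data URIs) for simplicity.

```go
package main

import "fmt"

// inputImageField maps each listed model to its single-image input field.
var inputImageField = map[string]string{
	"black-forest-labs/flux-1.1-pro":                 "image_prompt",
	"black-forest-labs/flux-1.1-pro-ultra":           "image_prompt",
	"black-forest-labs/flux-pro":                     "image_prompt",
	"black-forest-labs/flux-1.1-pro-ultra-finetuned": "image_prompt",
	"black-forest-labs/flux-kontext-pro":             "input_image",
	"black-forest-labs/flux-kontext-max":             "input_image",
	"black-forest-labs/flux-kontext-dev":             "input_image",
	"black-forest-labs/flux-dev":                     "image",
	"black-forest-labs/flux-fill-pro":                "image",
	"black-forest-labs/flux-dev-lora":                "image",
	"black-forest-labs/flux-krea-dev":                "image",
}

// placeInputImages writes images into the field the model expects.
// Single-image fields receive only the first image; all other models
// receive the full array under input_images.
func placeInputImages(model string, images []string, input map[string]any) {
	if len(images) == 0 {
		return
	}
	field, ok := inputImageField[model]
	if !ok {
		input["input_images"] = images
		return
	}
	input[field] = images[0]
}

func main() {
	images := []string{"https://example.com/a.png", "https://example.com/b.png"}

	kontext := map[string]any{}
	placeInputImages("black-forest-labs/flux-kontext-pro", images, kontext)
	fmt.Println(kontext)

	other := map[string]any{}
	placeInputImages("some-owner/some-model", images, other)
	fmt.Println(other)
}
```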
### Example
<Tabs>
<Tab title="Gateway">
```bash
curl -X POST http://localhost:8080/v1/images/generations \
-H "Content-Type: application/json" \
-d '{
"model": "replicate/black-forest-labs/flux-schnell",
"prompt": "A serene mountain landscape at sunset",
"aspect_ratio": "16:9",
"output_format": "webp",
"num_inference_steps": 4,
"seed": 42
}'
```
</Tab>
<Tab title="Go SDK">
```go
resp, err := client.ImageGenerationRequest(schemas.NewBifrostContext(ctx, schemas.NoDeadline), &schemas.BifrostImageGenerationRequest{
Provider: schemas.Replicate,
Model: "black-forest-labs/flux-schnell",
Input: &schemas.ImageGenerationInput{
Prompt: "A serene mountain landscape at sunset",
},
Params: &schemas.ImageGenerationParameters{
AspectRatio: schemas.Ptr("16:9"),
OutputFormat: schemas.Ptr("webp"),
NumInferenceSteps: schemas.Ptr(4),
Seed: schemas.Ptr(42),
},
})
```
</Tab>
</Tabs>
## Response Conversion
Replicate output can be:
- **Single URL**: String → `data[0].url`
- **Multiple URLs**: Array → `data[i].url` for each image
- **Data URIs**: Base64-encoded images in data URI format
```json
{
"id": "xyz789",
"created": 1234567890,
"model": "black-forest-labs/flux-schnell",
"data": [
{
"url": "https://replicate.delivery/pbxt/...",
"index": 0
}
],
"usage": {
"input_tokens": 15,
"output_tokens": 0,
"total_tokens": 15
}
}
```
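The three output shapes can be normalized in one pass. In this sketch, `imageData` and `toImageData` are hypothetical names, and placing the base64 payload of a data URI into a `B64JSON` field (rather than keeping the whole URI as a URL) is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// imageData is a simplified stand-in for one entry of the response data array.
type imageData struct {
	Index   int
	URL     string
	B64JSON string
}

// toImageData converts a decoded prediction output (a string or an array of
// strings) into response entries. Non-string array items are skipped.
func toImageData(output any) []imageData {
	var items []string
	switch v := output.(type) {
	case string:
		items = []string{v}
	case []any:
		for _, item := range v {
			if s, ok := item.(string); ok {
				items = append(items, s)
			}
		}
	}
	data := make([]imageData, 0, len(items))
	for i, item := range items {
		entry := imageData{Index: i}
		if strings.HasPrefix(item, "data:") {
			// Keep only the payload after the comma of "data:<mime>;base64,<payload>".
			if _, payload, found := strings.Cut(item, ","); found {
				entry.B64JSON = payload
			}
		} else {
			entry.URL = item
		}
		data = append(data, entry)
	}
	return data
}

func main() {
	fmt.Printf("%+v\n", toImageData("https://replicate.delivery/example/out.webp"))
	fmt.Printf("%+v\n", toImageData([]any{"data:image/webp;base64,QUJD"}))
}
```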
## Streaming
Image generation streaming provides progressive image updates as data URIs:
**SSE Events:**
- `output`: Data URI of the image generated so far (partial image)
- `done`: Final completion with reason
- `error`: Error details
**Flow:**
1. Each `output` event contains a complete data URI (e.g., `data:image/webp;base64,...`)
2. Progressive refinement shows generation progress
3. `done` event signals completion with final image
4. Each chunk includes `Index`, `ChunkIndex`, and `B64JSON` fields
---
# 5. Image Edit
Image edit runs as a prediction, like image generation. You send one or more input images plus a prompt, and the model returns the edited image(s). The same **input image field mapping** as Image Generation applies (see [Field Mapping by Model](#field-mapping-by-model) below).
**Endpoint**: `/v1/images/edits` (Bifrost) → Replicate `/v1/predictions` or deployment predictions.
### Parameter Mapping
| Bifrost / Request | Replicate input |
|-------------------|-----------------|
| `input.images` | Mapped to `image_prompt`, `input_image`, `image`, or `input_images` by model |
| `input.prompt` | `prompt` |
| `params.n` | `number_of_images` |
| `params.output_format` | `output_format` |
| `params.quality` | `quality` |
| `params.background` | `background` |
| `params.seed` | `seed` |
| `params.negative_prompt` | `negative_prompt` |
| `params.num_inference_steps` | `num_inference_steps` |
| `params.extra_params` | Merged into prediction input |
### Field Mapping by Model
Input images are mapped to the same fields as in [Image Generation](#input-image-field-mapping):
| Field | Models |
|-------|--------|
| `image_prompt` | `black-forest-labs/flux-1.1-pro`, `black-forest-labs/flux-1.1-pro-ultra`, `black-forest-labs/flux-pro`, `black-forest-labs/flux-1.1-pro-ultra-finetuned` |
| `input_image` | `black-forest-labs/flux-kontext-pro`, `black-forest-labs/flux-kontext-max`, `black-forest-labs/flux-kontext-dev` |
| `image` | `black-forest-labs/flux-dev`, `black-forest-labs/flux-fill-pro`, `black-forest-labs/flux-dev-lora`, `black-forest-labs/flux-krea-dev` |
| `input_images` | All other models (default) |
<Note>
For single-image fields (`image_prompt`, `input_image`, `image`), only the first image from `input.images` is used.
</Note>
### Example
<Tabs>
<Tab title="Gateway">
```bash
curl -X POST 'http://localhost:8080/v1/images/edits' \
--form 'model="replicate/black-forest-labs/flux-fill-pro"' \
--form 'image[]=@"image.png"' \
--form 'prompt="Replace the sky with a starry night"' \
--form 'mask=@"mask.png"'
```
</Tab>
<Tab title="Go SDK">
```go
resp, err := client.ImageEditRequest(schemas.NewBifrostContext(ctx, schemas.NoDeadline), &schemas.BifrostImageEditRequest{
Provider: schemas.Replicate,
Model: "black-forest-labs/flux-fill-pro",
Input: &schemas.ImageEditInput{
Prompt: "Replace the sky with a starry night",
Images: []schemas.ImageInput{
{ Image: imageBytes },
},
},
})
```
</Tab>
</Tabs>
### Response
Same as Image Generation: single URL → `data[0].url`, array of URLs → `data[i].url`, or data URIs. Response shape is `BifrostImageGenerationResponse` with `data[].url` or `data[].b64_json`.
### Streaming
Image edit streaming is supported. Events use the same prediction log stream as image generation:
- **Partial chunks**: `type: "image_edit.partial_image"` with `b64_json` (or data URI) until completion.
- **Completed**: `type: "image_edit.completed"` with final image and usage.
Use `Prefer: wait` for sync behavior or rely on polling (async) like other Replicate predictions.
---
# 6. Files API
Replicate's Files API supports uploading, listing, and managing files for use in predictions.
## Upload
**Request**: Multipart form-data
| Field | Type | Required | Notes |
|-------|------|----------|-------|
| `file` | binary | ✅ | File content |
| `filename` | string | ❌ | Custom filename |
| `content_type` | string | ❌ | MIME type (auto-detected from extension) |
**Example:**
```bash
curl -X POST http://localhost:8080/v1/files \
-H "Authorization: Bearer $API_KEY" \
-F "file=@document.pdf" \
-F "filename=my-document.pdf"
```
**Response:**
```json
{
"id": "file_abc123",
"object": "file",
"bytes": 12345,
"created_at": 1234567890,
"filename": "my-document.pdf",
"purpose": "batch",
"status": "processed"
}
```
## List Files
**Query Parameters:**
| Parameter | Type | Notes |
|-----------|------|-------|
| `limit` | int | Results per page |
| `after` | string | Pagination cursor |
**Example:**
```bash
curl -X GET "http://localhost:8080/v1/files?limit=20" \
-H "Authorization: Bearer $API_KEY"
```
**Pagination**: Replicate uses cursor-based pagination and returns a `next` URL in the response. Bifrost serializes this URL into the `after` cursor.
## Retrieve / Delete
**Operations:**
- GET `/v1/files/{file_id}` - Retrieve file metadata
- DELETE `/v1/files/{file_id}` - Delete file
## File Content Download
<Warning>
Replicate requires signed download URLs with `owner`, `expiry`, and `signature` parameters.
</Warning>
**Required Parameters in ExtraParams:**
| Parameter | Type | Description |
|-----------|------|-------------|
| `owner` | string | File owner username |
| `expiry` | int64 | Unix timestamp for expiration |
| `signature` | string | Base64-encoded HMAC-SHA256 signature |
**Signature Format**: HMAC-SHA256 of `"{owner} {file_id} {expiry}"`, keyed with the Files API signing secret and then base64-encoded
**Example:**
```bash
curl -X POST http://localhost:8080/v1/files/file_abc123/content \
-H "Content-Type: application/json" \
-d '{
"owner": "my-username",
"expiry": 1735689600,
"signature": "base64-encoded-signature"
}'
```
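The `signature` value can be computed with Go's standard library. This is a minimal sketch of the format described above: `signFileDownload` is a hypothetical helper name, the secret shown is a placeholder, and the use of standard (not URL-safe) base64 is an assumption.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/base64"
	"fmt"
)

// signFileDownload computes the base64-encoded HMAC-SHA256 signature over
// "{owner} {file_id} {expiry}" using the Files API signing secret.
func signFileDownload(secret, owner, fileID string, expiry int64) string {
	mac := hmac.New(sha256.New, []byte(secret))
	fmt.Fprintf(mac, "%s %s %d", owner, fileID, expiry)
	// Assumption: standard base64 alphabet with padding.
	return base64.StdEncoding.EncodeToString(mac.Sum(nil))
}

func main() {
	sig := signFileDownload("your-signing-secret", "my-username", "file_abc123", 1735689600)
	fmt.Println(sig)
}
```

The resulting string goes into the `signature` field of the request body, alongside the same `owner` and `expiry` values that were signed.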
---
# 7. List Models
**Endpoint**: `/v1/models`
<Warning>
List Models returns **account-specific deployments only**, not all public models on Replicate.
</Warning>
Deployments are private or organization models with dedicated infrastructure. The response includes:
```json
{
"data": [
{
"id": "replicate/my-org/my-deployment",
"name": "my-deployment",
"owner": "my-org"
}
],
"has_more": false
}
```
**Usage:**
1. List your deployments via this endpoint
2. Use deployment name as model identifier: `replicate/my-org/my-deployment`
3. Predictions route to deployment-specific endpoint: `/v1/deployments/my-org/my-deployment/predictions`
---
# Extra Parameters
## Model-Specific Parameters
The most important feature for Replicate integration is **extra_params**. Parameters not in Bifrost's standard schema are flattened directly into the prediction `input` object.
### How It Works
```json
// Request with extra params
{
"model": "replicate/stability-ai/sdxl",
"prompt": "A photo of an astronaut",
"temperature": 0.7, // Standard param
"guidance_scale": 7.5, // Model-specific (extra param)
"num_inference_steps": 50, // Model-specific (extra param)
"scheduler": "DPMSolverMultistep" // Model-specific (extra param)
}
// Converted to Replicate prediction input
{
"version": "...",
"input": {
"prompt": "A photo of an astronaut",
"temperature": 0.7,
"guidance_scale": 7.5, // Flattened from extra_params
"num_inference_steps": 50, // Flattened from extra_params
"scheduler": "DPMSolverMultistep" // Flattened from extra_params
}
}
```
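Flattening is essentially a map merge. The sketch below uses a hypothetical `buildInput` helper; letting extra parameters win when a key appears in both maps is an assumption, since the guide does not state the precedence.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// buildInput merges standard parameters and extra params into one flat map.
// On a key clash the extra param overrides the standard one (an assumption).
func buildInput(standard, extra map[string]any) map[string]any {
	input := make(map[string]any, len(standard)+len(extra))
	for k, v := range standard {
		input[k] = v
	}
	for k, v := range extra {
		input[k] = v
	}
	return input
}

func main() {
	input := buildInput(
		map[string]any{"prompt": "A photo of an astronaut", "temperature": 0.7},
		map[string]any{
			"guidance_scale":      7.5,
			"num_inference_steps": 50,
			"scheduler":           "DPMSolverMultistep",
		},
	)
	out, err := json.MarshalIndent(map[string]any{"input": input}, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```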
### Discovering Model Parameters
Each Replicate model has unique parameters. To find available parameters:
1. **Model Page**: Visit the model on [replicate.com](https://replicate.com)
2. **OpenAPI Schema**: Available at `/v1/models/{owner}/{name}/versions/{version_id}` (includes `openapi_schema`)
3. **Cog Definition**: Check the model's source code (if public)
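Once you have the version payload from option 2, the parameter names can be pulled out of it. This sketch assumes the input schema sits at `openapi_schema.components.schemas.Input.properties`, which is how Replicate version payloads are commonly structured but is not stated in this guide; `inputParams` is a hypothetical helper and the sample JSON is made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// inputParams extracts the sorted input property names from a model version
// payload. The JSON path used here is an assumption about the payload shape.
func inputParams(versionJSON []byte) ([]string, error) {
	var payload struct {
		OpenAPISchema struct {
			Components struct {
				Schemas struct {
					Input struct {
						Properties map[string]json.RawMessage `json:"properties"`
					} `json:"Input"`
				} `json:"schemas"`
			} `json:"components"`
		} `json:"openapi_schema"`
	}
	if err := json.Unmarshal(versionJSON, &payload); err != nil {
		return nil, err
	}
	props := payload.OpenAPISchema.Components.Schemas.Input.Properties
	names := make([]string, 0, len(props))
	for name := range props {
		names = append(names, name)
	}
	sort.Strings(names)
	return names, nil
}

func main() {
	// Illustrative payload, trimmed to the fields this sketch reads.
	sample := []byte(`{
	  "openapi_schema": {"components": {"schemas": {"Input": {"properties": {
	    "prompt": {"type": "string"},
	    "top_k": {"type": "integer"},
	    "repetition_penalty": {"type": "number"}
	  }}}}}
	}`)
	names, err := inputParams(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(names)
}
```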
---
## Caveats
<Accordion title="System Prompt Field Support">
**Severity**: Medium
**Behavior**: Not all models support the `system_prompt` field. For unsupported models, the system prompt is prepended to the conversation prompt.
**Impact**: Prompt structure differs between models
**Models Affected**: `meta/meta-llama-3-8b`, `meta/llama-2-70b`, `openai/gpt-oss-20b`, `openai/o1-mini`, `xai/grok-4`, and all `deepseek-ai/deepseek*` models
**Code**: `chat.go:300-318`
</Accordion>
<Accordion title="Input Image Field Mapping">
**Severity**: Medium
**Behavior**: Different models expect input images in different fields (`image_prompt`, `input_image`, `image`, `input_images`)
**Impact**: Bifrost automatically maps images to the correct field based on the model
**Models Affected**: Flux family models (see Input Image Field Mapping table)
**Code**: `images.go:192-209`
</Accordion>
<Accordion title="Image Content in Chat">
**Severity**: Low
**Behavior**: Only non-base64 image URLs from message content blocks are extracted to `image_input`
**Impact**: Base64-encoded images in messages are ignored
**Code**: `chat.go:58-63`
</Accordion>
<Accordion title="Model-Specific Parameters">
**Severity**: Medium
**Behavior**: Each model has unique input schema; standard parameters may not work for all models
**Impact**: Requires checking model documentation for available parameters
**Mitigation**: Use `extra_params` for model-specific fields
</Accordion>
---
## Video Generation
### Generate (`POST /v1/videos`)
**Request Parameters**
| Parameter | Type | Required | Notes |
|-----------|------|----------|-------|
| `model` | string | ✅ | Replicate model (owner/model or version ID) |
| `prompt` | string | ✅ | Text description of the video |
| `input_reference` | string | ❌ | Reference image (base64 data URL or URL) → mapped to `image` field; OpenAI-hosted models use `input_reference` |
| `seconds` | string | ❌ | Duration → `duration` |
| `seed` | int | ❌ | Seed for reproducibility |
| `negative_prompt` | string | ❌ | What to avoid |
**Extra Params**: Pass model-specific fields directly in the JSON body (unrecognized fields become `extra_params` and are flattened into the prediction input). `webhook` and `webhook_events_filter` are extracted automatically.
**Response**: [`BifrostVideoGenerationResponse`](https://github.com/maximhq/bifrost/blob/main/core/schemas/videos.go) — `id`, `status`, `model`, `videos[]`
**Job Statuses**: `queued` (starting) → `in_progress` (processing) → `completed` / `failed`
### Retrieve / Download
| Operation | Endpoint | Notes |
|-----------|----------|-------|
| Get status | `GET /v1/videos/{id}` | Maps to `/v1/predictions/{id}` |
| Download | `GET /v1/videos/{id}/content` | Downloads from the prediction output URL |
<Note>
Video Delete, List, and Remix are not supported by Replicate.
</Note>
---
## Reference Links
- [Replicate API Documentation](https://replicate.com/docs/topics/predictions/create-a-prediction)
- [Replicate Models](https://replicate.com/explore)
- [Bifrost Replicate Provider Source](https://github.com/maximhq/bifrost/tree/main/core/providers/replicate)