---
title: "Replicate"
description: "Replicate API conversion guide - prediction-based architecture, model-specific parameters, and async/sync modes"
icon: "R"
---

## Overview

Replicate is architecturally different from other providers in Bifrost. It uses a **prediction-based API** where every request creates a "prediction" that runs asynchronously. Each model on Replicate defines its own input schema, making it highly flexible but requiring model-specific parameter knowledge.

### Key Architectural Differences

1. **Prediction-Based System**: All operations create predictions via `/v1/predictions` or deployment endpoints
2. **Model-Specific Inputs**: Each model has its own parameter schema (use `extra_params` for model-specific fields)
3. **Async/Sync Modes**: Predictions can run synchronously (with `Prefer: wait` header) or asynchronously (with polling)
4. **Flexible Output**: Output can be strings, arrays, URLs, or data URIs depending on the model

### Supported Operations

| Operation | Non-Streaming | Streaming | Endpoint |
|-----------|---------------|-----------|----------|
| Chat Completions | ✅ | ✅ | `/v1/predictions` |
| Responses API | ✅ | ✅ | `/v1/predictions` |
| Text Completions | ✅ | ✅ | `/v1/predictions` |
| Image Generation | ✅ | ✅ | `/v1/predictions` |
| Image Edit | ✅ | ✅ | `/v1/predictions` |
| Video Generation | ✅ | - | `/v1/predictions` |
| Image Variation | ❌ | ❌ | - |
| Files | ✅ | - | `/v1/files` |
| List Models | ✅ | - | `/v1/deployments` |
| Embeddings | ❌ | ❌ | - |
| Speech (TTS) | ❌ | ❌ | - |
| Transcriptions (STT) | ❌ | ❌ | - |
| Batch | ❌ | ❌ | - |

<Note>
**List Models** returns account-specific deployments only, not all public models on Replicate.
</Note>

---

# Model Identification

Replicate models can be specified in three ways:

## 1. Version ID

```bash
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "replicate/5c7d5dc6dd8bf75c1acaa8565735e7986bc5b66206b55cca93cb72c9bf15ccaa",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
```

## 2. Model Name

Format: `owner/model-name`

```bash
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "replicate/meta/llama-2-7b-chat",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
```

## 3. Deployment

Configure deployments under `aliases` in the Replicate key configuration. Each alias maps a custom model identifier to a deployment path (`owner/deployment-name`).

**Configuration Example:**

```json
{
  "provider": "replicate",
  "value": "your-api-key",
  "aliases": {
    "my-model": "owner/my-deployment-name"
  }
}
```

**Usage:**

```bash
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "replicate/my-model",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
```
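As a rough illustration of how the three identifier forms differ, the sketch below classifies a model string. `classifyModel` and `isHexVersion` are hypothetical helpers, not Bifrost functions, and the 64-character hex check is an assumption based on the version ID shown above; Bifrost's actual resolution logic may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// classifyModel is a hypothetical helper (not part of Bifrost). It guesses which
// of the three identifier forms a model string uses: alias, version ID, or owner/name.
func classifyModel(model string, aliases map[string]string) string {
	model = strings.TrimPrefix(model, "replicate/")
	if _, ok := aliases[model]; ok {
		return "deployment"
	}
	if isHexVersion(model) {
		return "version"
	}
	if strings.Count(model, "/") == 1 {
		return "model"
	}
	return "unknown"
}

// isHexVersion reports whether s looks like a 64-character lowercase hex string.
// Treating that shape as a version ID is an assumption of this sketch.
func isHexVersion(s string) bool {
	if len(s) != 64 {
		return false
	}
	for _, c := range s {
		if !(c >= '0' && c <= '9') && !(c >= 'a' && c <= 'f') {
			return false
		}
	}
	return true
}

func main() {
	aliases := map[string]string{"my-model": "owner/my-deployment-name"}
	for _, m := range []string{
		"replicate/5c7d5dc6dd8bf75c1acaa8565735e7986bc5b66206b55cca93cb72c9bf15ccaa",
		"replicate/meta/llama-2-7b-chat",
		"replicate/my-model",
	} {
		fmt.Println(m, "->", classifyModel(m, aliases))
	}
}
```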

---

# Prediction Modes

## Sync Mode

Bifrost uses sync mode when the incoming request carries a `Prefer: wait` header. The request then blocks until the prediction completes or the wait times out (60 seconds by default).

**How it works:**
1. Bifrost creates the prediction with a `Prefer: wait=60` header
2. Replicate holds the connection open for up to 60 seconds
3. If the prediction completes within the timeout, the result is returned immediately
4. If the timeout expires, Bifrost falls back to polling mode

## Async Mode (Polling)

Async mode is the default for Replicate predictions. Bifrost automatically polls the prediction URL every 2 seconds until the prediction reaches a terminal status.

**Status Flow**: `starting` → `processing` → `succeeded`/`failed`/`canceled`
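The polling loop and status flow above can be sketched as follows. This is a minimal illustration, not Bifrost's implementation: `Prediction`, `pollUntilDone`, and `runScripted` are hypothetical names, and the fetcher is a scripted fake instead of an HTTP call.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// Prediction holds only the fields this sketch needs (hypothetical type).
type Prediction struct {
	Status string
	Output any
}

// isTerminal reports whether a status ends the flow
// starting -> processing -> succeeded/failed/canceled.
func isTerminal(status string) bool {
	switch status {
	case "succeeded", "failed", "canceled":
		return true
	}
	return false
}

// pollUntilDone calls fetch once per interval until the prediction reaches
// a terminal status, fetch fails, or ctx is done.
func pollUntilDone(ctx context.Context, fetch func(context.Context) (Prediction, error), interval time.Duration) (Prediction, error) {
	for {
		p, err := fetch(ctx)
		if err != nil {
			return Prediction{}, err
		}
		if isTerminal(p.Status) {
			return p, nil
		}
		select {
		case <-ctx.Done():
			return p, ctx.Err()
		case <-time.After(interval):
		}
	}
}

// runScripted is a demo helper: it polls a fake fetcher that replays the
// given statuses and returns the final prediction and the number of fetches.
func runScripted(statuses []string) (Prediction, int, error) {
	calls := 0
	fetch := func(context.Context) (Prediction, error) {
		if calls >= len(statuses) {
			return Prediction{}, errors.New("script exhausted")
		}
		p := Prediction{Status: statuses[calls]}
		calls++
		return p, nil
	}
	p, err := pollUntilDone(context.Background(), fetch, time.Millisecond)
	return p, calls, err
}

func main() {
	p, calls, err := runScripted([]string{"starting", "processing", "succeeded"})
	fmt.Println(p.Status, calls, err)
}
```

A real fetcher would issue a GET against the prediction URL and pass `2*time.Second` as the interval to match the cadence described above.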

---

# 1. Chat Completions

### Message Conversion

**System Messages**: Extracted from messages array and concatenated into `system_prompt` field.

**User/Assistant Messages**: Preserved as conversation context. Text content from content blocks is concatenated with newlines.

**Image Content**: Non-base64 image URLs from message content blocks are extracted and passed as `image_input` array.

```json
// Input
{
  "messages": [
    {"role": "system", "content": "You are helpful"},
    {"role": "user", "content": "Hello"}
  ]
}

// Converted to Replicate format
{
  "input": {
    "system_prompt": "You are helpful",
    "prompt": "Hello",
    "messages": [...] // Original messages array also included
  }
}
```

### System Prompt Filtering

**Important**: Not all Replicate models support the `system_prompt` field. For unsupported models, the system prompt is automatically prepended to the conversation prompt.

**Models without system_prompt support:**
- `meta/meta-llama-3-8b`
- `meta/llama-2-70b`
- `openai/gpt-oss-20b`
- `openai/o1-mini`
- `xai/grok-4`
- All `deepseek-ai/deepseek*` models (e.g., `deepseek-r1`, `deepseek-v3`)
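The filtering rule above could look roughly like the sketch below. `supportsSystemPrompt` and `buildInput` are hypothetical names, the model list is copied from this page, and the blank-line separator used when prepending is an assumption; the actual logic lives in the provider source.

```go
package main

import (
	"fmt"
	"strings"
)

// Models listed on this page as lacking system_prompt support.
var noSystemPrompt = map[string]bool{
	"meta/meta-llama-3-8b": true,
	"meta/llama-2-70b":     true,
	"openai/gpt-oss-20b":   true,
	"openai/o1-mini":       true,
	"xai/grok-4":           true,
}

// supportsSystemPrompt reports whether a model accepts a system_prompt field.
func supportsSystemPrompt(model string) bool {
	if strings.HasPrefix(model, "deepseek-ai/deepseek") {
		return false
	}
	return !noSystemPrompt[model]
}

// buildInput places the system prompt in system_prompt when supported and
// otherwise prepends it to the prompt. The "\n\n" separator is an assumption.
func buildInput(model, system, prompt string) map[string]any {
	input := map[string]any{"prompt": prompt}
	if system == "" {
		return input
	}
	if supportsSystemPrompt(model) {
		input["system_prompt"] = system
	} else {
		input["prompt"] = system + "\n\n" + prompt
	}
	return input
}

func main() {
	fmt.Println(buildInput("meta/llama-2-7b-chat", "You are helpful", "Hello"))
	fmt.Println(buildInput("deepseek-ai/deepseek-r1", "You are helpful", "Hello"))
}
```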

### Model-Specific Parameters

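Use `extra_params` to pass model-specific parameters. These are **flattened into the prediction `input` object**. On the gateway, send them as top-level fields in the request body; in the Go SDK, set `ExtraParams`: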

<Tabs>
<Tab title="Gateway">

```bash
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "replicate/meta/llama-2-7b-chat",
    "messages": [{"role": "user", "content": "Hello"}],
    "temperature": 0.7,
    "top_k": 50,
    "repetition_penalty": 1.1,
    "min_new_tokens": 10
  }'
```

</Tab>
<Tab title="Go SDK">

```go
resp, err := client.ChatCompletionRequest(schemas.NewBifrostContext(ctx, schemas.NoDeadline), &schemas.BifrostChatRequest{
    Provider: schemas.Replicate,
    Model:    "meta/llama-2-7b-chat",
    Input:    messages,
    Params: &schemas.ChatParameters{
        Temperature: schemas.Ptr(0.7),
        ExtraParams: map[string]interface{}{
            "top_k": 50,
            "repetition_penalty": 1.1,
            "min_new_tokens": 10,
        },
    },
})
```

</Tab>
</Tabs>

<Warning>
**Model Schema Discovery**: Each Replicate model has unique parameters. Check the model's documentation on replicate.com or use the OpenAPI schema from the model version to discover available parameters.
</Warning>

## Response Conversion

### Field Mapping

- **Output**:
  - String → `choices[0].message.content`
  - Array of strings → joined and mapped to `choices[0].message.content`
  - Object with `text` field → `text` value mapped to `choices[0].message.content`
- **Status**: `succeeded` → `finish_reason: "stop"`, `failed` → `finish_reason: "error"`
- **Metrics**: `input_token_count` → `prompt_tokens`, `output_token_count` → `completion_tokens`

### Example Response

```json
{
  "id": "abc123",
  "model": "meta/llama-2-7b-chat",
  "object": "chat.completion",
  "created": 1234567890,
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 8,
    "total_tokens": 18
  }
}
```

## Streaming

Replicate streaming uses Server-Sent Events (SSE) with the following event types:

| Event Type | Description | Data Format |
|------------|-------------|-------------|
| `output` | Content chunk | Plain text string |
| `done` | Completion | JSON: `{"reason": ""}` (empty = success) |
| `error` | Error occurred | JSON: `{"detail": "error message"}` |

**Streaming Flow:**
1. Bifrost sets `stream: true` in prediction input
2. Replicate returns `urls.stream` in initial response
3. Bifrost connects to stream URL and processes SSE events
4. `output` events → content deltas
5. `done` event → final chunk with `finish_reason`

**Done Event Reasons:**
- Empty or no reason = success (`finish_reason: "stop"`)
- `"canceled"` = prediction was canceled
- `"error"` = prediction failed
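To make the event handling concrete, here is a minimal SSE consumer for the three event types above. It is a sketch under stated assumptions: `streamResult`, `consumeStream`, and `consumeString` are hypothetical names, and the parser handles only `event:` and `data:` lines.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// streamResult accumulates what the stream produced (hypothetical type).
type streamResult struct {
	Content    string // concatenated output chunks
	Done       bool   // true once a done event was seen
	DoneReason string // empty means success
	ErrDetail  string // detail from an error event
}

// consumeStream reads SSE events from r until EOF.
// Lines longer than bufio.Scanner's default 64 KiB limit are not handled here.
func consumeStream(r io.Reader) (streamResult, error) {
	var res streamResult
	var event string
	var data []string

	// dispatch handles one complete event and resets the buffers.
	dispatch := func() error {
		defer func() { event, data = "", nil }()
		payload := strings.Join(data, "\n")
		switch event {
		case "output":
			res.Content += payload
		case "done":
			var d struct {
				Reason string `json:"reason"`
			}
			if payload != "" {
				if err := json.Unmarshal([]byte(payload), &d); err != nil {
					return err
				}
			}
			res.Done, res.DoneReason = true, d.Reason
		case "error":
			var e struct {
				Detail string `json:"detail"`
			}
			if err := json.Unmarshal([]byte(payload), &e); err != nil {
				return err
			}
			res.ErrDetail = e.Detail
		}
		return nil
	}

	sc := bufio.NewScanner(r)
	for sc.Scan() {
		line := sc.Text()
		switch {
		case line == "": // a blank line terminates an event
			if err := dispatch(); err != nil {
				return res, err
			}
		case strings.HasPrefix(line, "event:"):
			event = strings.TrimSpace(strings.TrimPrefix(line, "event:"))
		case strings.HasPrefix(line, "data:"):
			// Strip only one leading space so chunk whitespace survives.
			data = append(data, strings.TrimPrefix(strings.TrimPrefix(line, "data:"), " "))
		}
	}
	if err := sc.Err(); err != nil {
		return res, err
	}
	err := dispatch() // flush a trailing event without a final blank line
	return res, err
}

// consumeString is a convenience wrapper for in-memory streams.
func consumeString(s string) (streamResult, error) {
	return consumeStream(strings.NewReader(s))
}

func main() {
	res, err := consumeString("event: output\ndata: Hello\n\nevent: output\ndata:  world\n\nevent: done\ndata: {\"reason\": \"\"}\n\n")
	fmt.Printf("%q done=%v reason=%q err=%v\n", res.Content, res.Done, res.DoneReason, err)
}
```

In a gateway, each `output` payload would be forwarded as a content delta as it arrives instead of being accumulated into one string.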

---

# 2. Responses API

The Responses API is converted internally to Chat Completions or native Replicate format depending on the model:

```text
// Responses request → Replicate prediction conversion
ResponsesRequest → ReplicatePredictionRequest → ReplicatePredictionResponse → BifrostResponsesResponse
```

**Conversion Logic:**

1. **For OpenAI models with `gpt-5-structured`**: Uses native Responses format with `input_item_list`, `tools`, and `json_schema` support
2. **For all other models**: Converted to Chat Completions format using message conversion logic

Parameter mapping and system prompt handling are the same as in [Chat Completions](#1-chat-completions).

## Response Format

Responses follow standard Responses API format with status mapping:

| Replicate Status | Responses Status |
|------------------|------------------|
| `succeeded` | `completed` |
| `failed` | `failed` |
| `canceled` | `cancelled` |
| `processing` | `in_progress` |
| `starting` | `queued` |

---

# 3. Text Completions (Legacy)

### Conversion

- **Prompt array**: Joined with newlines into single `prompt` field
- **top_k**: Pass via `extra_params` (model-specific)

### Example

```bash
curl -X POST http://localhost:8080/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "replicate/meta/llama-2-7b",
    "prompt": "Once upon a time",
    "max_tokens": 100,
    "temperature": 0.8,
    "top_k": 40
  }'
```

## Response

The conversion mirrors chat completions: an output string or array maps to `choices[0].text`, and usage is taken from the prediction metrics.

---

# 4. Image Generation

### Parameter Mapping

```json
{
  "prompt": "prompt",
  "n": "number_of_images",
  "aspect_ratio": "aspect_ratio",
  "resolution": "resolution",
  "output_format": "output_format",
  "quality": "quality",
  "background": "background",
  "seed": "seed",
  "negative_prompt": "negative_prompt",
  "num_inference_steps": "num_inference_steps",
  "input_images": "input_images"
}
```

### Input Image Field Mapping

**Important**: Different Replicate models expect input images in different fields. Bifrost automatically maps `input_images` to the correct field based on the model.

**Field Mapping by Model:**

| Field | Models |
|-------|--------|
| `image_prompt` | `black-forest-labs/flux-1.1-pro`<br/>`black-forest-labs/flux-1.1-pro-ultra`<br/>`black-forest-labs/flux-pro`<br/>`black-forest-labs/flux-1.1-pro-ultra-finetuned` |
| `input_image` | `black-forest-labs/flux-kontext-pro`<br/>`black-forest-labs/flux-kontext-max`<br/>`black-forest-labs/flux-kontext-dev` |
| `image` | `black-forest-labs/flux-dev`<br/>`black-forest-labs/flux-fill-pro`<br/>`black-forest-labs/flux-dev-lora`<br/>`black-forest-labs/flux-krea-dev` |
| `input_images` | All other models (default) |

<Note>
For models that expect a single image field (`image_prompt`, `input_image`, `image`), only the first image from the `input_images` array is used.
</Note>
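The table and note above amount to a lookup with a default. The sketch below mirrors that behavior; `imageFieldFor` and `setInputImages` are hypothetical names, and exact-match lookup on the model name is an assumption.

```go
package main

import "fmt"

// imageFieldByModel mirrors the field mapping table on this page.
var imageFieldByModel = map[string]string{
	"black-forest-labs/flux-1.1-pro":                 "image_prompt",
	"black-forest-labs/flux-1.1-pro-ultra":           "image_prompt",
	"black-forest-labs/flux-pro":                     "image_prompt",
	"black-forest-labs/flux-1.1-pro-ultra-finetuned": "image_prompt",
	"black-forest-labs/flux-kontext-pro":             "input_image",
	"black-forest-labs/flux-kontext-max":             "input_image",
	"black-forest-labs/flux-kontext-dev":             "input_image",
	"black-forest-labs/flux-dev":                     "image",
	"black-forest-labs/flux-fill-pro":                "image",
	"black-forest-labs/flux-dev-lora":                "image",
	"black-forest-labs/flux-krea-dev":                "image",
}

// imageFieldFor returns the input field a model expects, defaulting to input_images.
func imageFieldFor(model string) string {
	if field, ok := imageFieldByModel[model]; ok {
		return field
	}
	return "input_images"
}

// setInputImages writes images into the prediction input under the right field.
// Single-image fields receive only the first image.
func setInputImages(input map[string]any, model string, images []string) {
	if len(images) == 0 {
		return
	}
	field := imageFieldFor(model)
	if field == "input_images" {
		input[field] = images
		return
	}
	input[field] = images[0]
}

func main() {
	images := []string{"https://example.com/a.png", "https://example.com/b.png"}
	for _, model := range []string{"black-forest-labs/flux-kontext-pro", "some-owner/some-model"} {
		input := map[string]any{"prompt": "A serene mountain landscape"}
		setInputImages(input, model, images)
		fmt.Println(model, input)
	}
}
```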

### Example

<Tabs>
<Tab title="Gateway">

```bash
curl -X POST http://localhost:8080/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{
    "model": "replicate/black-forest-labs/flux-schnell",
    "prompt": "A serene mountain landscape at sunset",
    "aspect_ratio": "16:9",
    "output_format": "webp",
    "num_inference_steps": 4,
    "seed": 42
  }'
```

</Tab>
<Tab title="Go SDK">

```go
resp, err := client.ImageGenerationRequest(schemas.NewBifrostContext(ctx, schemas.NoDeadline), &schemas.BifrostImageGenerationRequest{
    Provider: schemas.Replicate,
    Model:    "black-forest-labs/flux-schnell",
    Input: &schemas.ImageGenerationInput{
        Prompt: "A serene mountain landscape at sunset",
    },
    Params: &schemas.ImageGenerationParameters{
        AspectRatio: schemas.Ptr("16:9"),
        OutputFormat: schemas.Ptr("webp"),
        NumInferenceSteps: schemas.Ptr(4),
        Seed: schemas.Ptr(42),
    },
})
```

</Tab>
</Tabs>

## Response Conversion

Replicate output can be:
- **Single URL**: String → `data[0].url`
- **Multiple URLs**: Array → `data[i].url` for each image
- **Data URIs**: Base64-encoded images in data URI format

```json
{
  "id": "xyz789",
  "created": 1234567890,
  "model": "black-forest-labs/flux-schnell",
  "data": [
    {
      "url": "https://replicate.delivery/pbxt/...",
      "index": 0
    }
  ],
  "usage": {
    "input_tokens": 15,
    "output_tokens": 0,
    "total_tokens": 15
  }
}
```
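A sketch of how the three output shapes could be normalized into `data[]` entries. `ImageData` and `outputToImageData` are hypothetical names, and placing the Base64 payload of a data URI into `b64_json` is an assumption of this example.

```go
package main

import (
	"fmt"
	"strings"
)

// ImageData is a hypothetical stand-in for one entry of the response data array.
type ImageData struct {
	URL     string `json:"url,omitempty"`
	B64JSON string `json:"b64_json,omitempty"`
	Index   int    `json:"index"`
}

// outputToImageData converts a JSON-decoded Replicate output (a single string
// or an array of strings) into image entries.
func outputToImageData(output any) []ImageData {
	var items []string
	switch v := output.(type) {
	case string:
		items = []string{v}
	case []any:
		for _, it := range v {
			if s, ok := it.(string); ok {
				items = append(items, s)
			}
		}
	}

	data := make([]ImageData, 0, len(items))
	for i, s := range items {
		d := ImageData{Index: i}
		// Data URIs look like data:image/webp;base64,<payload>.
		if _, payload, ok := strings.Cut(s, ","); ok && strings.HasPrefix(s, "data:") {
			d.B64JSON = payload
		} else {
			d.URL = s
		}
		data = append(data, d)
	}
	return data
}

func main() {
	fmt.Printf("%+v\n", outputToImageData("https://replicate.delivery/example/out-0.webp"))
	fmt.Printf("%+v\n", outputToImageData([]any{"https://replicate.delivery/example/out-0.webp", "data:image/webp;base64,AAAA"}))
}
```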

## Streaming

Image generation streaming provides progressive image updates as data URIs:

**SSE Events:**
- `output`: Data URI of an intermediate (partially refined) image
- `done`: Final completion with reason
- `error`: Error details

**Flow:**
1. Each `output` event contains a complete data URI (e.g., `data:image/webp;base64,...`)
2. Progressive refinement shows generation progress
3. `done` event signals completion with final image
4. Each chunk includes `Index`, `ChunkIndex`, and `B64JSON` fields

---

# 5. Image Edit

Image edit runs as a prediction, like image generation. You send one or more input images plus a prompt, and the model returns the edited image(s). The same **input image field mapping** as Image Generation applies (see [Field Mapping by Model](#field-mapping-by-model) below).

**Endpoint**: `/v1/images/edits` (Bifrost) → Replicate `/v1/predictions` or deployment predictions.

### Parameter Mapping

| Bifrost / Request | Replicate input |
|-------------------|-----------------|
| `input.images` | Mapped to `image_prompt`, `input_image`, `image`, or `input_images` by model |
| `input.prompt` | `prompt` |
| `params.n` | `number_of_images` |
| `params.output_format` | `output_format` |
| `params.quality` | `quality` |
| `params.background` | `background` |
| `params.seed` | `seed` |
| `params.negative_prompt` | `negative_prompt` |
| `params.num_inference_steps` | `num_inference_steps` |
| `params.extra_params` | Merged into prediction input |

### Field Mapping by Model

Input images are mapped to the same fields as in [Image Generation](#input-image-field-mapping):

| Field | Models |
|-------|--------|
| `image_prompt` | `black-forest-labs/flux-1.1-pro`, `black-forest-labs/flux-1.1-pro-ultra`, `black-forest-labs/flux-pro`, `black-forest-labs/flux-1.1-pro-ultra-finetuned` |
| `input_image` | `black-forest-labs/flux-kontext-pro`, `black-forest-labs/flux-kontext-max`, `black-forest-labs/flux-kontext-dev` |
| `image` | `black-forest-labs/flux-dev`, `black-forest-labs/flux-fill-pro`, `black-forest-labs/flux-dev-lora`, `black-forest-labs/flux-krea-dev` |
| `input_images` | All other models (default) |

<Note>
For single-image fields (`image_prompt`, `input_image`, `image`), only the first image from `input.images` is used.
</Note>

### Example

<Tabs>
<Tab title="Gateway">

```bash
curl -X POST 'http://localhost:8080/v1/images/edits' \
--form 'model="replicate/black-forest-labs/flux-fill-pro"' \
--form 'image[]=@"image.png"' \
--form 'prompt="Replace the sky with a starry night"' \
--form 'mask=@"mask.png"'
```

</Tab>
<Tab title="Go SDK">

```go
resp, err := client.ImageEditRequest(schemas.NewBifrostContext(ctx, schemas.NoDeadline), &schemas.BifrostImageEditRequest{
    Provider: schemas.Replicate,
    Model:    "black-forest-labs/flux-fill-pro",
    Input: &schemas.ImageEditInput{
        Prompt: "Replace the sky with a starry night",
        Images: []schemas.ImageInput{
            { Image: imageBytes },
        },
    },
})
```

</Tab>
</Tabs>

### Response

Same as Image Generation: single URL → `data[0].url`, array of URLs → `data[i].url`, or data URIs. Response shape is `BifrostImageGenerationResponse` with `data[].url` or `data[].b64_json`.

### Streaming

Image edit streaming is supported. Events use the same prediction log stream as image generation:

- **Partial chunks**: `type: "image_edit.partial_image"` with `b64_json` (or data URI) until completion.
- **Completed**: `type: "image_edit.completed"` with final image and usage.

Use `Prefer: wait` for sync behavior or rely on polling (async) like other Replicate predictions.

---

# 6. Files API

Replicate's Files API supports uploading, listing, and managing files for use in predictions.

## Upload

**Request**: Multipart form-data

| Field | Type | Required | Notes |
|-------|------|----------|-------|
| `file` | binary | ✅ | File content |
| `filename` | string | ❌ | Custom filename |
| `content_type` | string | ❌ | MIME type (auto-detected from extension) |

**Example:**

```bash
curl -X POST http://localhost:8080/v1/files \
  -H "Authorization: Bearer $API_KEY" \
  -F "file=@document.pdf" \
  -F "filename=my-document.pdf"
```

**Response:**

```json
{
  "id": "file_abc123",
  "object": "file",
  "bytes": 12345,
  "created_at": 1234567890,
  "filename": "my-document.pdf",
  "purpose": "batch",
  "status": "processed"
}
```

## List Files

**Query Parameters:**

| Parameter | Type | Notes |
|-----------|------|-------|
| `limit` | int | Results per page |
| `after` | string | Pagination cursor |

**Example:**

```bash
curl -X GET "http://localhost:8080/v1/files?limit=20" \
  -H "Authorization: Bearer $API_KEY"
```

**Pagination**: Replicate uses cursor-based pagination and returns a `next` URL in the response. Bifrost serializes this URL into the `after` cursor.

## Retrieve / Delete

**Operations:**
- GET `/v1/files/{file_id}` - Retrieve file metadata
- DELETE `/v1/files/{file_id}` - Delete file

## File Content Download

<Warning>
Replicate requires signed download URLs with `owner`, `expiry`, and `signature` parameters.
</Warning>

**Required Parameters in ExtraParams:**

| Parameter | Type | Description |
|-----------|------|-------------|
| `owner` | string | File owner username |
| `expiry` | int64 | Unix timestamp for expiration |
| `signature` | string | Base64-encoded HMAC-SHA256 signature |

**Signature Format**: HMAC-SHA256 of `"{owner} {file_id} {expiry}"`, keyed with the Files API signing secret and then Base64-encoded

**Example:**

```bash
curl -X POST http://localhost:8080/v1/files/file_abc123/content \
  -H "Content-Type: application/json" \
  -d '{
    "owner": "my-username",
    "expiry": 1735689600,
    "signature": "base64-encoded-signature"
  }'
```
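The signature format described above can be computed with the Go standard library. This is a sketch: `signFileDownload` is a hypothetical helper, the secret is a placeholder, and standard (padded) Base64 encoding is an assumption.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/base64"
	"fmt"
	"time"
)

// signFileDownload returns the Base64-encoded HMAC-SHA256 of
// "{owner} {file_id} {expiry}" keyed with the Files API signing secret.
func signFileDownload(secret, owner, fileID string, expiry int64) string {
	mac := hmac.New(sha256.New, []byte(secret))
	fmt.Fprintf(mac, "%s %s %d", owner, fileID, expiry)
	return base64.StdEncoding.EncodeToString(mac.Sum(nil))
}

func main() {
	// Placeholder values for illustration only.
	expiry := time.Now().Add(time.Hour).Unix()
	sig := signFileDownload("signing-secret-placeholder", "my-username", "file_abc123", expiry)
	fmt.Printf("owner=my-username expiry=%d signature=%s\n", expiry, sig)
}
```

The resulting `owner`, `expiry`, and `signature` values are what the download request expects in its extra parameters.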

---

# 7. List Models

**Endpoint**: `/v1/models`

<Warning>
List Models returns **account-specific deployments only**, not all public models on Replicate.
</Warning>

Deployments are private or organization models with dedicated infrastructure. The response includes:

```json
{
  "data": [
    {
      "id": "replicate/my-org/my-deployment",
      "name": "my-deployment",
      "owner": "my-org"
    }
  ],
  "has_more": false
}
```

**Usage:**
1. List your deployments via this endpoint
2. Use deployment name as model identifier: `replicate/my-org/my-deployment`
3. Predictions route to deployment-specific endpoint: `/v1/deployments/my-org/my-deployment/predictions`

---

# Extra Parameters

## Model-Specific Parameters

The most important feature for Replicate integration is **extra_params**. Parameters not in Bifrost's standard schema are flattened directly into the prediction `input` object.

### How It Works

```json
// Request with extra params
{
  "model": "replicate/stability-ai/sdxl",
  "prompt": "A photo of an astronaut",
  "temperature": 0.7,          // Standard param
  "guidance_scale": 7.5,       // Model-specific (extra param)
  "num_inference_steps": 50,   // Model-specific (extra param)
  "scheduler": "DPMSolverMultistep"  // Model-specific (extra param)
}

// Converted to Replicate prediction input
{
  "version": "...",
  "input": {
    "prompt": "A photo of an astronaut",
    "temperature": 0.7,
    "guidance_scale": 7.5,       // Flattened from extra_params
    "num_inference_steps": 50,   // Flattened from extra_params
    "scheduler": "DPMSolverMultistep"  // Flattened from extra_params
  }
}
```
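The flattening step shown above is essentially a map merge. In the sketch below, `buildPredictionInput` is a hypothetical helper, and letting standard parameters win when a key appears in both maps is an assumption of this example, not documented behavior.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// buildPredictionInput merges standard parameters and extra params into one
// flat input object. On a key collision the standard value is kept (assumption).
func buildPredictionInput(standard, extra map[string]any) map[string]any {
	input := make(map[string]any, len(standard)+len(extra))
	for k, v := range extra {
		input[k] = v
	}
	for k, v := range standard {
		input[k] = v
	}
	return input
}

func main() {
	standard := map[string]any{
		"prompt":      "A photo of an astronaut",
		"temperature": 0.7,
	}
	extra := map[string]any{
		"guidance_scale":      7.5,
		"num_inference_steps": 50,
		"scheduler":           "DPMSolverMultistep",
	}
	body, err := json.MarshalIndent(map[string]any{"input": buildPredictionInput(standard, extra)}, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```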

### Discovering Model Parameters

Each Replicate model has unique parameters. To find available parameters:

1. **Model Page**: Visit the model on [replicate.com](https://replicate.com)
2. **OpenAPI Schema**: Available at `/v1/models/{owner}/{name}/versions/{version_id}` (includes `openapi_schema`)
3. **Cog Definition**: Check the model's source code (if public)

---

## Caveats

<Accordion title="System Prompt Field Support">
**Severity**: Medium
**Behavior**: Not all models support `system_prompt` field. For unsupported models, system prompt is prepended to conversation prompt.
**Impact**: Prompt structure differs between models
**Models Affected**: `meta/meta-llama-3-8b`, `meta/llama-2-70b`, `openai/gpt-oss-20b`, `openai/o1-mini`, `xai/grok-4`, and all `deepseek-ai/deepseek*` models
**Code**: `chat.go:300-318`
</Accordion>

<Accordion title="Input Image Field Mapping">
**Severity**: Medium
**Behavior**: Different models expect input images in different fields (`image_prompt`, `input_image`, `image`, `input_images`)
**Impact**: Bifrost automatically maps to correct field based on model
**Models Affected**: Flux family models (see Input Image Field Mapping table)
**Code**: `images.go:192-209`
</Accordion>

<Accordion title="Image Content in Chat">
**Severity**: Low
**Behavior**: Only non-base64 image URLs from message content blocks are extracted to `image_input`
**Impact**: Base64-encoded images in messages are ignored
**Code**: `chat.go:58-63`
</Accordion>

<Accordion title="Model-Specific Parameters">
**Severity**: Medium
**Behavior**: Each model has unique input schema; standard parameters may not work for all models
**Impact**: Requires checking model documentation for available parameters
**Mitigation**: Use `extra_params` for model-specific fields
</Accordion>


---

## Video Generation

### Generate (`POST /v1/videos`)

**Request Parameters**

| Parameter | Type | Required | Notes |
|-----------|------|----------|-------|
| `model` | string | ✅ | Replicate model (owner/model or version ID) |
| `prompt` | string | ✅ | Text description of the video |
| `input_reference` | string | ❌ | Reference image (base64 data URL or URL) → mapped to `image` field; OpenAI-hosted models use `input_reference` |
| `seconds` | string | ❌ | Duration → `duration` |
| `seed` | int | ❌ | Seed for reproducibility |
| `negative_prompt` | string | ❌ | What to avoid |

**Extra Params**: Pass model-specific fields directly in the JSON body (unrecognized fields become `extra_params` and are flattened into the prediction input). `webhook` and `webhook_events_filter` are extracted automatically.


**Response**: [`BifrostVideoGenerationResponse`](https://github.com/maximhq/bifrost/blob/main/core/schemas/videos.go) — `id`, `status`, `model`, `videos[]`

**Job Statuses**: `queued` (starting) → `in_progress` (processing) → `completed` / `failed`

### Retrieve / Download

| Operation | Endpoint | Notes |
|-----------|----------|-------|
| Get status | `GET /v1/videos/{id}` | Maps to `/v1/predictions/{id}` |
| Download | `GET /v1/videos/{id}/content` | Downloads from the prediction output URL |

<Note>
Video Delete, List, and Remix are not supported by Replicate.
</Note>

---

## Reference Links

- [Replicate API Documentation](https://replicate.com/docs/topics/predictions/create-a-prediction)
- [Replicate Models](https://replicate.com/explore)
- [Bifrost Replicate Provider Source](https://github.com/maximhq/bifrost/tree/main/core/providers/replicate)