---
title: "Anthropic"
description: "Anthropic API conversion guide - structural differences, message handling, thinking/reasoning, and tool conversion"
icon: "asterisk"
---

## Overview

Anthropic has significant structural differences from OpenAI's format. Bifrost performs extensive conversion including:
- **System message extraction** - Removed from messages array, placed in separate `system` field
- **Tool message grouping** - Consecutive tool messages merged into single user message
- **Thinking block transformation** - `reasoning` parameters mapped to Anthropic's `thinking` structure
- **Parameter renaming** - e.g., `max_completion_tokens` → `max_tokens`, `stop` → `stop_sequences`
- **Content format conversion** - Images, files, and other content types adapted to Anthropic's schema

### Supported Operations

| Operation | Non-Streaming | Streaming | Endpoint |
|-----------|---------------|-----------|----------|
| Chat Completions | ✅ | ✅ | `/v1/messages` |
| Responses API | ✅ | ✅ | `/v1/messages` |
| Text Completions | ✅ | ❌ | `/v1/complete` |
| Embeddings | ❌ | ❌ | - |
| Speech (TTS) | ❌ | ❌ | - |
| Transcriptions (STT) | ❌ | ❌ | - |
| Image Generation | ❌ | ❌ | - |
| Files | ✅ | - | `/v1/files` |
| Batch | ✅ | - | `/v1/messages/batches` |
| List Models | ✅ | - | `/v1/models` |

<Note>
**Unsupported Operations** (❌): Embeddings, Speech, Transcriptions, and Image Generation are not supported by the upstream Anthropic API. These return `UnsupportedOperationError`.
</Note>

## Beta Headers

Bifrost automatically manages Anthropic beta headers: it detects the headers a request needs from the features it uses and injects them. Headers are validated per provider so that unsupported headers do not reach the upstream API.

| Beta Header | Anthropic | Azure | Vertex | Bedrock | Auto-Injected |
|---|---|---|---|---|---|
| `computer-use-2025-01-24` / `computer-use-2025-11-24` | ✅ | ✅ | ✅ | ✅ | ✅ (tool type detection) |
| `structured-outputs-2025-11-13` | ✅ | ✅ | ❌ | ✅ | ✅ (strict/output_format) |
| `advanced-tool-use-2025-11-20` | ✅ | ✅ | ❌ | ❌ | ✅ (defer_loading/input_examples/allowed_callers) |
| `mcp-client-2025-11-20` | ✅ | ✅ | ❌ | ❌ | ✅ (mcp_servers detection) |
| `prompt-caching-scope-2026-01-05` | ✅ | ✅ | ❌ | ❌ | ✅ (cache_control.scope) |
| `compact-2026-01-12` | ✅ | ✅ | ✅ | ✅ | ✅ (compaction edit) |
| `context-management-2025-06-27` | ✅ | ✅ | ✅ | ✅ | ✅ (clear edits) |
| `files-api-2025-04-14` | ✅ | ✅ | ❌ | ❌ | ✅ (files endpoint) |
| `interleaved-thinking-2025-05-14` | ✅ | ✅ | ✅ | ✅ | ✅ (thinking enabled/adaptive) |
| `skills-2025-10-02` | ✅ | ✅ | ❌ | ❌ | Passthrough |
| `context-1m-2025-08-07` | ✅ | ✅ | ✅ | ✅ | Passthrough |
| `fast-mode-2026-02-01` | ✅ | ❌ | ❌ | ❌ | ✅ (speed=fast) |
| `redact-thinking-2026-02-12` | ✅ | ✅ | ❌ | ❌ | Passthrough |

<Note>
**Passthrough headers** are not auto-injected but are validated and forwarded when set manually via the `anthropic-beta` request header. Unknown headers are forwarded to Anthropic only; for other providers (Vertex, Bedrock, Azure), unknown headers are silently dropped by default to prevent upstream errors.

**Beta header overrides**: You can override the default support per provider via the Beta Headers tab in provider configuration, or by setting `beta_header_overrides` in the provider's `network_config`. See [Beta Header Overrides](/quickstart/gateway/provider-configuration#beta-header-overrides) for details.
</Note>
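
The per-provider validation described above can be sketched roughly as follows. This is a minimal illustration, not Bifrost's implementation: the `supported` map, the `filterBetaHeaders` function, and the lowercase provider strings are hypothetical, only three rows of the table are included, and the sketch assumes known-but-unsupported headers are dropped rather than rejected.

```go
package main

import (
    "fmt"
    "strings"
)

// supported is a hypothetical lookup built from three rows of the table above:
// beta header -> providers that accept it.
var supported = map[string]map[string]bool{
    "files-api-2025-04-14":  {"anthropic": true, "azure": true},
    "context-1m-2025-08-07": {"anthropic": true, "azure": true, "vertex": true, "bedrock": true},
    "fast-mode-2026-02-01":  {"anthropic": true},
}

// filterBetaHeaders takes a comma-separated anthropic-beta value and returns
// the entries that would be forwarded to the given provider.
func filterBetaHeaders(provider, headerValue string) []string {
    var kept []string
    for _, raw := range strings.Split(headerValue, ",") {
        h := strings.TrimSpace(raw)
        if h == "" {
            continue
        }
        providers, known := supported[h]
        switch {
        case known && providers[provider]:
            kept = append(kept, h)
        case !known && provider == "anthropic":
            // Unknown headers are forwarded to Anthropic only.
            kept = append(kept, h)
        }
    }
    return kept
}

func main() {
    in := "files-api-2025-04-14, context-1m-2025-08-07, some-future-beta"
    fmt.Println(filterBetaHeaders("anthropic", in))
    fmt.Println(filterBetaHeaders("vertex", in))
}
```

In this sketch Vertex keeps only `context-1m-2025-08-07`, while Anthropic also receives the unknown `some-future-beta` value.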

---

# 1. Chat Completions

## Request Parameters

### Parameter Mapping

| Parameter | Transformation |
|-----------|----------------|
| `max_completion_tokens` | Renamed to `max_tokens` |
| `temperature`, `top_p` | Direct pass-through |
| `stop` | Renamed to `stop_sequences` |
| `response_format` | Converted to `output_format` |
| `tools` | Schema restructured (see [Tool Conversion](#tool-conversion)) |
| `tool_choice` | Type mapped (see [Tool Conversion](#tool-conversion)) |
| `reasoning` | Mapped to `thinking` (see [Reasoning / Thinking](#reasoning--thinking)) |
| `user` | Wrapped in `metadata.user_id` |
| `top_k` | Via `extra_params` (Anthropic-specific) |

### Dropped Parameters

The following parameters are silently ignored: `frequency_penalty`, `presence_penalty`, `logit_bias`, `logprobs`, `top_logprobs`, `seed`, `parallel_tool_calls`, `service_tier`
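
As a rough illustration of the renames and drops listed above, the sketch below applies them to a flat parameter map. The `convertParams` helper and the map-based representation are hypothetical; nested conversions such as `tools`, `reasoning`, and `response_format` are deliberately left out.

```go
package main

import "fmt"

// renamed holds the simple renames from the mapping table.
var renamed = map[string]string{
    "max_completion_tokens": "max_tokens",
    "stop":                  "stop_sequences",
}

// dropped holds the parameters that are silently ignored.
var dropped = map[string]bool{
    "frequency_penalty": true, "presence_penalty": true, "logit_bias": true,
    "logprobs": true, "top_logprobs": true, "seed": true,
    "parallel_tool_calls": true, "service_tier": true,
}

// convertParams is a hypothetical helper that covers only flat parameters.
func convertParams(in map[string]interface{}) map[string]interface{} {
    out := make(map[string]interface{}, len(in))
    for k, v := range in {
        if dropped[k] {
            continue
        }
        if k == "user" {
            // `user` is wrapped in metadata.user_id.
            out["metadata"] = map[string]interface{}{"user_id": v}
            continue
        }
        if to, ok := renamed[k]; ok {
            k = to
        }
        out[k] = v // temperature, top_p, top_k and others pass through
    }
    return out
}

func main() {
    out := convertParams(map[string]interface{}{
        "max_completion_tokens": 256,
        "stop":                  []string{"END"},
        "seed":                  42,
        "user":                  "u-123",
        "temperature":           0.2,
    })
    fmt.Println(out)
}
```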

### Extra Parameters

Use `extra_params` (SDK) or pass the fields directly in the request body (Gateway) for Anthropic-specific parameters:

<Tabs>
<Tab title="Gateway">

```bash
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-3-5-sonnet",
    "messages": [{"role": "user", "content": "Hello"}],
    "top_k": 40
  }'
```

</Tab>
<Tab title="Go SDK">

```go
resp, err := client.ChatCompletionRequest(schemas.NewBifrostContext(ctx, schemas.NoDeadline), &schemas.BifrostChatRequest{
    Provider: schemas.Anthropic,
    Model:    "claude-3-5-sonnet",
    Input:    messages,
    Params: &schemas.ChatParameters{
        ExtraParams: map[string]interface{}{
            "top_k": 40,
        },
    },
})
```

</Tab>
</Tabs>

Anthropic also accepts a top-level `"cache_control": {"type": "ephemeral"}` object on `/anthropic/v1/messages` requests to enable automatic prompt caching. Bifrost forwards that directive unchanged.

### Cache Control

Cache directives can be added to system messages, user messages, and tool definitions to enable prompt caching:

<Tabs>
<Tab title="Gateway">

```bash
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-3-5-sonnet",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "This is cached context",
            "cache_control": {"type": "ephemeral"}
          }
        ]
      }
    ],
    "system": [
      {
        "type": "text",
        "text": "You are a helpful assistant",
        "cache_control": {"type": "ephemeral"}
      }
    ]
  }'
```

</Tab>
<Tab title="Go SDK">

```go
resp, err := client.ChatCompletionRequest(schemas.NewBifrostContext(ctx, schemas.NoDeadline), &schemas.BifrostChatRequest{
    Provider: schemas.Anthropic,
    Model:    "claude-3-5-sonnet",
    Input: []schemas.ChatMessage{
        {
            Role: schemas.ChatMessageRoleUser,
            Content: &schemas.ChatMessageContent{
                ContentBlocks: []schemas.ChatContentBlock{
                    {
                        Text: schemas.Ptr("This is cached context"),
                        CacheControl: &schemas.CacheControl{
                            Type: schemas.Ptr("ephemeral"),
                        },
                    },
                },
            },
        },
    },
    SystemMessages: []schemas.ChatMessage{
        {
            Role: schemas.ChatMessageRoleSystem,
            Content: &schemas.ChatMessageContent{
                ContentBlocks: []schemas.ChatContentBlock{
                    {
                        Text: schemas.Ptr("You are a helpful assistant"),
                        CacheControl: &schemas.CacheControl{
                            Type: schemas.Ptr("ephemeral"),
                        },
                    },
                },
            },
        },
    },
})
```

</Tab>
</Tabs>

## Reasoning / Thinking

**Documentation**: See [Bifrost Reasoning Reference](/providers/reasoning)

### Parameter Mapping

- `reasoning.effort` → `thinking.type` (always mapped to `"enabled"`)
- `reasoning.max_tokens` → `thinking.budget_tokens` (token budget for thinking)

### Critical Constraints

- **Minimum budget**: At least 1024 tokens are required; requests with a lower budget **fail with an error**
- **Dynamic budget**: A value of `-1` is converted to `1024` automatically

### Example

```json
// Request
{"reasoning": {"effort": "high", "max_tokens": 2048}}

// Anthropic conversion
{"thinking": {"type": "enabled", "budget_tokens": 2048}}
```
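
The constraints above can be expressed as a small validation step. This is a sketch under stated assumptions: the `thinking` struct, `toThinking` function, and error value are hypothetical names, and the real check may differ in detail.

```go
package main

import (
    "errors"
    "fmt"
)

const minThinkingBudget = 1024

// thinking mirrors the shape shown in the example above (hypothetical type).
type thinking struct {
    Type         string
    BudgetTokens int
}

var errBudgetTooSmall = errors.New("reasoning.max_tokens must be >= 1024")

// toThinking applies the two documented rules:
// -1 (dynamic budget) becomes 1024, and anything else below 1024 is rejected.
func toThinking(maxTokens int) (thinking, error) {
    if maxTokens == -1 {
        maxTokens = minThinkingBudget
    }
    if maxTokens < minThinkingBudget {
        return thinking{}, errBudgetTooSmall
    }
    return thinking{Type: "enabled", BudgetTokens: maxTokens}, nil
}

func main() {
    for _, n := range []int{2048, -1, 512} {
        th, err := toThinking(n)
        fmt.Println(n, th, err)
    }
}
```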

## Message Conversion

### Critical Caveats

- **System message extraction**: System messages are **removed from the messages array** and placed in a separate `system` field. Multiple system messages become separate text blocks in the `system` array.
- **Tool message grouping**: Consecutive tool messages are **merged into a single user message** with `tool_result` content blocks.
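
Both caveats above can be sketched in a single pass over the messages. The types and the `convert` function below are simplified, hypothetical stand-ins (text-only content, no tool calls on assistant messages), meant only to show how the message count and structure change.

```go
package main

import "fmt"

// message is a simplified OpenAI-style chat message (hypothetical type).
type message struct {
    Role       string
    Content    string
    ToolCallID string
}

// block is a simplified Anthropic content block (hypothetical type).
type block struct {
    Type      string // "text" or "tool_result"
    Text      string
    ToolUseID string
}

type anthropicMessage struct {
    Role    string
    Content []block
}

// convert extracts system messages and merges runs of tool messages.
func convert(msgs []message) (system []block, out []anthropicMessage) {
    prevWasTool := false
    for _, m := range msgs {
        switch m.Role {
        case "system":
            // Each system message becomes its own text block.
            system = append(system, block{Type: "text", Text: m.Content})
        case "tool":
            result := block{Type: "tool_result", Text: m.Content, ToolUseID: m.ToolCallID}
            if prevWasTool {
                // Consecutive tool message: append to the previous user message.
                last := &out[len(out)-1]
                last.Content = append(last.Content, result)
            } else {
                out = append(out, anthropicMessage{Role: "user", Content: []block{result}})
            }
        default:
            out = append(out, anthropicMessage{Role: m.Role, Content: []block{{Type: "text", Text: m.Content}}})
        }
        prevWasTool = m.Role == "tool"
    }
    return system, out
}

func main() {
    system, out := convert([]message{
        {Role: "system", Content: "Be brief"},
        {Role: "user", Content: "Weather and time in Paris?"},
        {Role: "assistant", Content: "Calling tools"},
        {Role: "tool", Content: "18C", ToolCallID: "call_1"},
        {Role: "tool", Content: "14:05", ToolCallID: "call_2"},
    })
    fmt.Println(len(system), len(out), len(out[2].Content))
}
```

Five input messages become one system block plus three messages, the last of which carries both tool results.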

### Image Conversion

- **URL images**: `{"type": "image_url", "image_url": {"url": ...}}` → `{"type": "image", "source": {"type": "url", ...}}`
- **Base64 images**: Data URL → `{"type": "image", "source": {"type": "base64", "media_type": "image/png", ...}}`
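
A minimal sketch of the distinction between the two image forms, assuming the decision is made by inspecting the `data:` prefix of the URL. The `imageSource` type and `toImageSource` function are hypothetical.

```go
package main

import (
    "fmt"
    "strings"
)

// imageSource is a simplified version of Anthropic's image `source` object.
type imageSource struct {
    Type      string // "url" or "base64"
    URL       string
    MediaType string
    Data      string
}

// toImageSource treats data URLs of the form data:<media type>;base64,<data>
// as base64 sources and everything else as a plain URL source.
func toImageSource(imageURL string) imageSource {
    if strings.HasPrefix(imageURL, "data:") {
        parts := strings.SplitN(strings.TrimPrefix(imageURL, "data:"), ",", 2)
        if len(parts) == 2 && strings.HasSuffix(parts[0], ";base64") {
            return imageSource{
                Type:      "base64",
                MediaType: strings.TrimSuffix(parts[0], ";base64"),
                Data:      parts[1],
            }
        }
    }
    return imageSource{Type: "url", URL: imageURL}
}

func main() {
    fmt.Printf("%+v\n", toImageSource("https://example.com/cat.png"))
    fmt.Printf("%+v\n", toImageSource("data:image/png;base64,iVBORw0KGgo="))
}
```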

### Cache Control Locations

Cache directives are supported on system content blocks, user message content blocks, and tool definitions (see the [Cache Control](#cache-control) examples above).

## Tool Conversion

Tool definitions are restructured: `function.name` → `name`, `function.parameters` → `input_schema`, `function.strict` is dropped.

Tool choice mapping: `"auto"` → `auto` | `"none"` → `none` | `"required"` → `any` | Specific tool → `{"type": "tool", "name": "X"}`
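
The two mappings above might look like this in code. The struct and function names are hypothetical, and the tool choice is represented as a plain map with a `type` key for brevity.

```go
package main

import "fmt"

// openAITool is a simplified view of an OpenAI-style function tool.
type openAITool struct {
    Name       string                 // function.name
    Parameters map[string]interface{} // function.parameters
    Strict     bool                   // function.strict (dropped)
}

// anthropicTool is a simplified view of the converted tool.
type anthropicTool struct {
    Name        string
    InputSchema map[string]interface{}
}

func convertTool(in openAITool) anthropicTool {
    // Strict has no counterpart in this sketch, so it is not copied.
    return anthropicTool{Name: in.Name, InputSchema: in.Parameters}
}

// convertToolChoice maps a mode ("auto", "none", "required") or a specific
// tool name. Unknown modes return nil in this sketch.
func convertToolChoice(mode, toolName string) map[string]string {
    if toolName != "" {
        return map[string]string{"type": "tool", "name": toolName}
    }
    switch mode {
    case "auto":
        return map[string]string{"type": "auto"}
    case "none":
        return map[string]string{"type": "none"}
    case "required":
        return map[string]string{"type": "any"}
    }
    return nil
}

func main() {
    tool := convertTool(openAITool{
        Name:       "get_weather",
        Parameters: map[string]interface{}{"type": "object"},
        Strict:     true,
    })
    fmt.Println(tool.Name, tool.InputSchema)
    fmt.Println(convertToolChoice("required", ""))
    fmt.Println(convertToolChoice("", "get_weather"))
}
```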

## Response Conversion

### Field Mapping

- `stop_reason` → `finish_reason`: `end_turn`/`stop_sequence` → `stop`, `max_tokens` → `length`, `tool_use` → `tool_calls`
- `input_tokens + cache_read_input_tokens + cache_creation_input_tokens` → `prompt_tokens` (all cache counts rolled into the total)
- Cache token breakdown surfaced in `prompt_tokens_details`:
  - `cache_read_input_tokens` → `prompt_tokens_details.cached_read_tokens`
  - `cache_creation_input_tokens` → `prompt_tokens_details.cached_write_tokens`
- `output_tokens` → `completion_tokens`
- `thinking` blocks → `reasoning_details` with index, type, text, and signature fields
- Tool call arguments converted from JSON object → JSON string
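
The field mapping above can be sketched as three small helpers. All names are hypothetical, and passing unknown stop reasons through unchanged is an assumption of this sketch rather than documented behavior.

```go
package main

import (
    "encoding/json"
    "fmt"
)

// anthropicUsage holds the upstream usage counters (simplified).
type anthropicUsage struct {
    InputTokens              int
    CacheReadInputTokens     int
    CacheCreationInputTokens int
    OutputTokens             int
}

// usage holds the converted counters (simplified).
type usage struct {
    PromptTokens      int
    CompletionTokens  int
    CachedReadTokens  int // prompt_tokens_details.cached_read_tokens
    CachedWriteTokens int // prompt_tokens_details.cached_write_tokens
}

func finishReason(stopReason string) string {
    switch stopReason {
    case "end_turn", "stop_sequence":
        return "stop"
    case "max_tokens":
        return "length"
    case "tool_use":
        return "tool_calls"
    }
    return stopReason // assumption: unknown values pass through
}

func convertUsage(u anthropicUsage) usage {
    return usage{
        // All cache counts are rolled into the prompt total.
        PromptTokens:      u.InputTokens + u.CacheReadInputTokens + u.CacheCreationInputTokens,
        CompletionTokens:  u.OutputTokens,
        CachedReadTokens:  u.CacheReadInputTokens,
        CachedWriteTokens: u.CacheCreationInputTokens,
    }
}

// argumentsString serializes a tool_use input object to a JSON string.
func argumentsString(input map[string]interface{}) (string, error) {
    b, err := json.Marshal(input)
    return string(b), err
}

func main() {
    fmt.Println(finishReason("end_turn"))
    fmt.Printf("%+v\n", convertUsage(anthropicUsage{InputTokens: 10, CacheReadInputTokens: 100, CacheCreationInputTokens: 20, OutputTokens: 5}))
    args, _ := argumentsString(map[string]interface{}{"city": "Paris"})
    fmt.Println(args)
}
```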

## Streaming

Event sequence: `message_start` → `content_block_start` → `content_block_delta` → `content_block_stop` → `message_delta` → `message_stop`

Delta types: `text_delta` → content | `input_json_delta` → tool arguments | `thinking_delta` → reasoning text | `signature_delta` → reasoning signature
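
One way to picture the delta handling above is an accumulator that routes each delta type to its destination. The `delta` and `accumulator` types are hypothetical and ignore block indexes, which a real stream would need for multiple content blocks.

```go
package main

import "fmt"

// delta is a simplified content_block_delta payload (hypothetical type).
type delta struct {
    Type  string
    Value string
}

// accumulator collects streamed fragments by destination.
type accumulator struct {
    Content       string
    ToolArguments string
    Reasoning     string
    Signature     string
}

// apply routes one delta according to the mapping above.
func (a *accumulator) apply(d delta) {
    switch d.Type {
    case "text_delta":
        a.Content += d.Value
    case "input_json_delta":
        a.ToolArguments += d.Value
    case "thinking_delta":
        a.Reasoning += d.Value
    case "signature_delta":
        a.Signature += d.Value
    }
}

func main() {
    var acc accumulator
    for _, d := range []delta{
        {Type: "thinking_delta", Value: "Let me check. "},
        {Type: "signature_delta", Value: "sig123"},
        {Type: "text_delta", Value: "Hello"},
        {Type: "text_delta", Value: ", world"},
        {Type: "input_json_delta", Value: `{"city":`},
        {Type: "input_json_delta", Value: `"Paris"}`},
    } {
        acc.apply(d)
    }
    fmt.Printf("%+v\n", acc)
}
```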

---

## Caveats

<Accordion title="System Message Extraction">
**Severity**: High
**Behavior**: System messages removed from array, placed in separate `system` field
**Impact**: Message array structure differs from input
**Code**: `chat.go:145-167`
</Accordion>

<Accordion title="Tool Message Grouping">
**Severity**: High
**Behavior**: Consecutive tool messages merged into single user message
**Impact**: Message count and structure changes
**Code**: `chat.go:169-216`
</Accordion>

<Accordion title="Minimum Reasoning Budget">
**Severity**: High
**Behavior**: `reasoning.max_tokens` must be >= 1024
**Impact**: Requests with lower values **fail with error**
**Code**: `chat.go:113-115`
</Accordion>

<Accordion title="Dynamic Budget Conversion">
**Severity**: Medium
**Behavior**: `reasoning.max_tokens = -1` converted to `1024`
**Impact**: Dynamic budgeting not supported
**Code**: `chat.go:107-111`
</Accordion>

<Accordion title="Strict Tool Mode Dropped">
**Severity**: Medium
**Behavior**: `strict: true` in tool definitions silently dropped
**Impact**: No schema validation enforcement
**Code**: `chat.go:43-72`
</Accordion>

<Accordion title="Arguments Serialization">
**Severity**: Low
**Behavior**: Tool call `input` (object) serialized to `arguments` (JSON string)
**Code**: `chat.go:341-350`
</Accordion>

---

# 2. Responses API

The Responses API uses the same underlying `/v1/messages` endpoint but converts between OpenAI's Responses format and Anthropic's Messages format.

## Request Parameters

### Parameter Mapping

| Parameter | Transformation |
|-----------|----------------|
| `max_output_tokens` | Renamed to `max_tokens` |
| `temperature`, `top_p` | Direct pass-through |
| `instructions` | Becomes system message |
| `tools` | Schema restructured (see [Chat Completions](#1-chat-completions)) |
| `tool_choice` | Type mapped (see [Chat Completions](#1-chat-completions)) |
| `reasoning` | Mapped to `thinking` (see [Reasoning / Thinking](#reasoning--thinking)) |
| `user` | Wrapped in `metadata.user_id` |
| `text` | Converted to `output_format` |
| `include` | Via `extra_params` (Anthropic-specific) |
| `stop` | Via `extra_params`, renamed to `stop_sequences` |
| `top_k` | Via `extra_params` (Anthropic-specific) |
| `truncation` | Auto-set to `"auto"` for computer tools |

### Extra Parameters

Use `extra_params` (SDK) or pass the fields directly in the request body (Gateway):

<Tabs>
<Tab title="Gateway">

```bash
curl -X POST http://localhost:8080/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-3-5-sonnet",
    "input": "Hello, how are you?",
    "top_k": 40
  }'
```

</Tab>
<Tab title="Go SDK">

```go
resp, err := client.ResponsesRequest(schemas.NewBifrostContext(ctx, schemas.NoDeadline), &schemas.BifrostResponsesRequest{
    Provider: schemas.Anthropic,
    Model:    "claude-3-5-sonnet",
    Input:    messages,
    Params: &schemas.ResponsesParameters{
        ExtraParams: map[string]interface{}{
            "top_k": 40,
        },
    },
})
```

</Tab>
</Tabs>

### Cache Control

Cache directives can be added to instructions (system) and input messages to enable prompt caching:

<Tabs>
<Tab title="Gateway">

```bash
curl -X POST http://localhost:8080/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-3-5-sonnet",
    "instructions": "You are a helpful assistant. This instruction is cached.",
    "instructions_cache_control": {"type": "ephemeral"},
    "input": [
      {
        "type": "text",
        "text": "Answer this question",
        "cache_control": {"type": "ephemeral"}
      }
    ]
  }'
```

</Tab>
<Tab title="Go SDK">

```go
resp, err := client.ResponsesRequest(schemas.NewBifrostContext(ctx, schemas.NoDeadline), &schemas.BifrostResponsesRequest{
    Provider: schemas.Anthropic,
    Model:    "claude-3-5-sonnet",
    Input: []schemas.ChatMessage{
        {
            Role: schemas.ChatMessageRoleUser,
            Content: &schemas.ChatMessageContent{
                ContentBlocks: []schemas.ChatContentBlock{
                    {
                        Text: schemas.Ptr("Answer this question"),
                        CacheControl: &schemas.CacheControl{
                            Type: schemas.Ptr("ephemeral"),
                        },
                    },
                },
            },
        },
    },
    Params: &schemas.ResponsesParameters{
        Instructions: schemas.Ptr("You are a helpful assistant. This instruction is cached."),
        InstructionsCacheControl: &schemas.CacheControl{
            Type: schemas.Ptr("ephemeral"),
        },
    },
})
```

</Tab>
</Tabs>

## Input & Instructions

- **Input**: A string is wrapped as a single user message; an array is converted to messages
- **Instructions**: Becomes system message (same extraction as [Chat Completions](#1-chat-completions))

## Tool Support

Supported types: `function`, `computer_use_preview`, `web_search`, `mcp`

Tool conversions are the same as for [Chat Completions](#1-chat-completions), with two additions: MCP tools are mapped to `mcp_servers` (`server_label` → `name`, `server_url` → `url`), and `truncation: "auto"` is set automatically when computer tools are present.

Cache control is supported on instructions and input blocks (see the [Cache Control](#cache-control) examples).
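
The MCP and computer tool handling described in this section can be sketched as a pass over the tool list. The `tool` and `mcpServer` types and the `extractExtras` function are hypothetical simplifications.

```go
package main

import "fmt"

// tool is a simplified Responses API tool entry (hypothetical type).
type tool struct {
    Type        string // "function", "computer_use_preview", "web_search", "mcp"
    ServerLabel string // MCP only
    ServerURL   string // MCP only
}

// mcpServer is a simplified entry of Anthropic's mcp_servers array.
type mcpServer struct {
    Name string
    URL  string
}

// extractExtras collects MCP servers and decides whether truncation
// should be set to "auto" because a computer tool is present.
func extractExtras(tools []tool) (servers []mcpServer, truncation string) {
    for _, tl := range tools {
        switch tl.Type {
        case "mcp":
            servers = append(servers, mcpServer{Name: tl.ServerLabel, URL: tl.ServerURL})
        case "computer_use_preview":
            truncation = "auto"
        }
    }
    return servers, truncation
}

func main() {
    servers, truncation := extractExtras([]tool{
        {Type: "mcp", ServerLabel: "docs", ServerURL: "https://mcp.example.com"},
        {Type: "computer_use_preview"},
    })
    fmt.Println(servers, truncation)
}
```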

## Response Conversion

- `stop_reason` → `status`: `end_turn`/`stop_sequence` → `completed`, `max_tokens` → `incomplete`
- Top-level `input_tokens` and `output_tokens` are rollups that include cache-related usage; they map as `input_tokens` → `input_tokens` | `output_tokens` → `output_tokens`.
- Cache-specific counts are exposed in details: `cache_read_input_tokens` → `input_tokens_details.cached_read_tokens` | `cache_creation_input_tokens` → `input_tokens_details.cached_write_tokens`
- Output items: `text` → `message` | `tool_use` → `function_call` | `thinking` → `reasoning`
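
A compact sketch of the status and output item mappings above, using lookup tables. The names are hypothetical, and values not listed in this section are reported as unmapped rather than guessed.

```go
package main

import "fmt"

// statusByStopReason covers only the stop reasons listed above.
var statusByStopReason = map[string]string{
    "end_turn":      "completed",
    "stop_sequence": "completed",
    "max_tokens":    "incomplete",
}

// itemTypeByBlock maps Anthropic content block types to Responses output item types.
var itemTypeByBlock = map[string]string{
    "text":     "message",
    "tool_use": "function_call",
    "thinking": "reasoning",
}

// responseStatus returns the mapped status and whether a mapping exists.
func responseStatus(stopReason string) (string, bool) {
    s, ok := statusByStopReason[stopReason]
    return s, ok
}

// outputItemType returns the mapped item type and whether a mapping exists.
func outputItemType(blockType string) (string, bool) {
    it, ok := itemTypeByBlock[blockType]
    return it, ok
}

func main() {
    fmt.Println(responseStatus("max_tokens"))
    fmt.Println(outputItemType("tool_use"))
}
```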

## Streaming

Event sequence: `message_start` → `content_block_start` → `content_block_delta` → `content_block_stop` → `message_delta` → `message_stop`

Special handling:
- Computer tool arguments are accumulated across chunks and emitted on `content_block_stop`
- Synthetic `content_part.added` events are emitted for text and reasoning
- MCP calls use `mcp_call_arguments_delta`
- Item IDs are generated as `msg_{messageID}_item_{outputIndex}`
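
Two of the special cases above lend themselves to a short sketch: the item ID format and the buffering of computer tool arguments until the block stops. The `itemID` function and `computerCall` type are hypothetical.

```go
package main

import "fmt"

// itemID follows the msg_{messageID}_item_{outputIndex} pattern.
func itemID(messageID string, outputIndex int) string {
    return fmt.Sprintf("msg_%s_item_%d", messageID, outputIndex)
}

// computerCall buffers partial JSON until content_block_stop (hypothetical type).
type computerCall struct {
    buf string
}

// onDelta stores a partial JSON fragment without emitting anything.
func (c *computerCall) onDelta(partialJSON string) {
    c.buf += partialJSON
}

// onStop returns the complete arguments and resets the buffer.
func (c *computerCall) onStop() string {
    out := c.buf
    c.buf = ""
    return out
}

func main() {
    fmt.Println(itemID("abc123", 0))

    var call computerCall
    call.onDelta(`{"action":"click",`)
    call.onDelta(`"x":10,"y":20}`)
    fmt.Println(call.onStop())
}
```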

---

# 3. Text Completions (Legacy)

<Warning>
This legacy API uses the `/v1/complete` endpoint. Streaming is not supported.
</Warning>

**Request**: `prompt` auto-wrapped with `\n\nHuman: {prompt}\n\nAssistant:` | `max_tokens` → `max_tokens_to_sample` | `temperature`, `top_p` direct pass-through | `top_k`, `stop` via `extra_params` (→ `stop_sequences`)

**Response**: `completion` → `choices[0].text` | `stop_reason` → `finish_reason`
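
The request side of this conversion can be sketched as follows. The `completeRequest` type and `buildCompleteRequest` function are hypothetical; only the prompt wrapping and the `max_tokens_to_sample` rename come from the description above.

```go
package main

import "fmt"

// completeRequest is a simplified /v1/complete request body (hypothetical type).
type completeRequest struct {
    Prompt            string
    MaxTokensToSample int
}

// wrapPrompt applies the Human/Assistant framing described above.
func wrapPrompt(prompt string) string {
    return "\n\nHuman: " + prompt + "\n\nAssistant:"
}

// buildCompleteRequest wraps the prompt and renames max_tokens.
func buildCompleteRequest(prompt string, maxTokens int) completeRequest {
    return completeRequest{
        Prompt:            wrapPrompt(prompt),
        MaxTokensToSample: maxTokens,
    }
}

func main() {
    req := buildCompleteRequest("Say hello", 64)
    fmt.Printf("%q %d\n", req.Prompt, req.MaxTokensToSample)
}
```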

---

# 4. Batch API

**Request formats**: `requests` array (CustomID + Params) or `input_file_id`

**Pagination**: Cursor-based with `after_id`, `before_id`, `limit`

**Endpoints**:
- POST `/v1/messages/batches` - Create
- GET `/v1/messages/batches` - List
- GET `/v1/messages/batches/{batch_id}` - Retrieve
- POST `/v1/messages/batches/{batch_id}/cancel` - Cancel

**Response**: JSONL format with `{custom_id, result: {type, message}}`

**Status mapping**: `in_progress` → `InProgress`, `canceling` → `Cancelling`, `ended` → `Ended`

**Note**: RFC3339Nano timestamps are converted to Unix timestamps. Multi-key retry is supported.
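
The timestamp conversion and status mapping above can be sketched with the standard `time` package. The `toUnix` helper and the `batchStatus` map are hypothetical names; the status values on the right are shown as plain strings.

```go
package main

import (
    "fmt"
    "time"
)

// batchStatus mirrors the status mapping listed above.
var batchStatus = map[string]string{
    "in_progress": "InProgress",
    "canceling":   "Cancelling",
    "ended":       "Ended",
}

// toUnix parses an RFC3339Nano timestamp and returns Unix seconds.
func toUnix(ts string) (int64, error) {
    parsed, err := time.Parse(time.RFC3339Nano, ts)
    if err != nil {
        return 0, err
    }
    return parsed.Unix(), nil
}

func main() {
    secs, err := toUnix("2024-01-01T00:00:00.123456789Z")
    fmt.Println(secs, err)
    fmt.Println(batchStatus["canceling"])
}
```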

---

# 5. Files API

<Note>
Requires beta header: `anthropic-beta: files-api-2025-04-14`
</Note>

**Upload**: Multipart/form-data with `file` (required) and `filename` (optional)

**Field mapping**: `id` | `filename` | `size_bytes` → `bytes` | `created_at` (Unix) | `mime_type` → `content_type`

**Endpoints**: POST `/v1/files`, GET `/v1/files` (cursor pagination), GET `/v1/files/{file_id}`, DELETE `/v1/files/{file_id}`, GET `/v1/files/{file_id}/content`

**Note**: The file purpose is always `"batch"` and the status is always `"processed"`.
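
A rough sketch of the field mapping above, decoding an upstream file object and producing the converted shape. The struct names and `convertFile` function are hypothetical, and the sketch assumes the upstream `created_at` arrives as an RFC3339 string.

```go
package main

import (
    "encoding/json"
    "fmt"
    "time"
)

// anthropicFile is a simplified upstream file object.
// Assumption: created_at is an RFC3339 string upstream.
type anthropicFile struct {
    ID        string `json:"id"`
    Filename  string `json:"filename"`
    SizeBytes int64  `json:"size_bytes"`
    MimeType  string `json:"mime_type"`
    CreatedAt string `json:"created_at"`
}

// file is the converted shape (hypothetical type).
type file struct {
    ID          string `json:"id"`
    Filename    string `json:"filename"`
    Bytes       int64  `json:"bytes"`
    ContentType string `json:"content_type"`
    CreatedAt   int64  `json:"created_at"`
    Purpose     string `json:"purpose"`
    Status      string `json:"status"`
}

func convertFile(raw []byte) (file, error) {
    var in anthropicFile
    if err := json.Unmarshal(raw, &in); err != nil {
        return file{}, err
    }
    created, err := time.Parse(time.RFC3339Nano, in.CreatedAt)
    if err != nil {
        return file{}, err
    }
    return file{
        ID:          in.ID,
        Filename:    in.Filename,
        Bytes:       in.SizeBytes,
        ContentType: in.MimeType,
        CreatedAt:   created.Unix(),
        Purpose:     "batch",     // always "batch"
        Status:      "processed", // always "processed"
    }, nil
}

func main() {
    f, err := convertFile([]byte(`{"id":"file_1","filename":"a.txt","size_bytes":12,"mime_type":"text/plain","created_at":"2024-01-01T00:00:00Z"}`))
    fmt.Printf("%+v %v\n", f, err)
}
```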

---

# 6. List Models

**Request**: GET `/v1/models?limit={defaultPageSize}` (no body)

**Field mapping**: `id` (prefixed `anthropic/`) | `display_name` → `name` | `created_at` (Unix timestamp)

**Pagination**: Token-based with `NextPageToken`, `FirstID`, `LastID`

**Multi-key support**: Results aggregated from all keys, filtered by `allowed_models` if configured
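
The aggregation and filtering step might be sketched as below. The `aggregateModels` function is hypothetical, the model IDs are placeholders, and two details are assumptions of this sketch: duplicates across keys are removed, and `allowed_models` is matched against the unprefixed ID.

```go
package main

import "fmt"

// aggregateModels merges model IDs returned for each key, applies the
// optional allow-list, and prefixes each ID with "anthropic/".
func aggregateModels(perKey [][]string, allowed []string) []string {
    allow := make(map[string]bool, len(allowed))
    for _, id := range allowed {
        allow[id] = true
    }
    seen := make(map[string]bool)
    var out []string
    for _, ids := range perKey {
        for _, id := range ids {
            if len(allow) > 0 && !allow[id] {
                continue // filtered out by allowed_models
            }
            if seen[id] {
                continue // assumption: duplicates across keys are removed
            }
            seen[id] = true
            out = append(out, "anthropic/"+id)
        }
    }
    return out
}

func main() {
    perKey := [][]string{
        {"model-a", "model-b"},
        {"model-b", "model-c"},
    }
    fmt.Println(aggregateModels(perKey, nil))
    fmt.Println(aggregateModels(perKey, []string{"model-c"}))
}
```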