---
title: "Cohere"
description: "Cohere API conversion guide - parameter mapping, message handling, reasoning/thinking, and tool conversion"
icon: "c"
---

## Overview

Cohere's API is structured differently from OpenAI's. Bifrost converts between the two formats, including:
- **Parameter renaming** - e.g., `max_completion_tokens` → `max_tokens`, `top_p` → `p`, `stop` → `stop_sequences`
- **Message content conversion** - Both string and content-block formats are handled
- **Tool conversion** - Tool definitions and tool choice are mapped to Cohere format
- **Thinking/Reasoning transformation** - `reasoning` parameters are mapped to Cohere's `thinking` structure
- **Response format conversion** - JSON schema handling is adapted to Cohere's format

### Supported Operations

| Operation | Non-Streaming | Streaming | Endpoint |
|-----------|---------------|-----------|----------|
| Chat Completions | ✅ | ✅ | `/v2/chat` |
| Responses API | ✅ | ✅ | `/v2/chat` |
| Embeddings | ✅ | - | `/v2/embed` |
| List Models | ✅ | - | `/v1/models` |
| Text Completions | ❌ | ❌ | - |
| Image Generation | ❌ | ❌ | - |
| Speech (TTS) | ❌ | ❌ | - |
| Transcriptions (STT) | ❌ | ❌ | - |
| Files | ❌ | ❌ | - |
| Batch | ❌ | ❌ | - |

<Note>
**Unsupported Operations** (❌): Text Completions, Image Generation, Speech, Transcriptions, Files, and Batch are not supported by the upstream Cohere API. These return `UnsupportedOperationError`.
</Note>

---

# 1. Chat Completions

## Request Parameters

### Parameter Mapping

| Parameter | Transformation |
|-----------|----------------|
| `max_completion_tokens` | Renamed to `max_tokens` |
| `temperature` | Direct pass-through |
| `top_p` | Renamed to `p` |
| `stop` | Renamed to `stop_sequences` |
| `frequency_penalty`, `presence_penalty` | Direct pass-through |
| `response_format` | Converted to structured format (see [Response Format](#response-format)) |
| `tools` | Schema structure adapted (see [Tool Conversion](#tool-conversion)) |
| `tool_choice` | Type mapped (see [Tool Conversion](#tool-conversion)) |
| `reasoning` | Mapped to `thinking` (see [Reasoning / Thinking](#reasoning--thinking)) |
| `user` | Via `extra_params` (not directly supported in Cohere v2 API) |
| `top_k` | Via `extra_params` (Cohere-specific) |

### Dropped Parameters

The following parameters are silently ignored: `logit_bias`, `logprobs`, `top_logprobs`, `seed`, `parallel_tool_calls`, `service_tier`.

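The renames and drops listed above can be pictured as a simple key rewrite. The following is a minimal sketch, not Bifrost's actual implementation: `convertParams`, `renamed`, and `dropped` are hypothetical names, and only flat parameters are covered (structured fields such as `tools`, `response_format`, and `reasoning` need their own conversions).

```go
package main

import "fmt"

// renamed lists the flat parameters that only change name.
var renamed = map[string]string{
	"max_completion_tokens": "max_tokens",
	"top_p":                 "p",
	"stop":                  "stop_sequences",
}

// dropped lists the parameters that are silently ignored.
var dropped = map[string]bool{
	"logit_bias": true, "logprobs": true, "top_logprobs": true,
	"seed": true, "parallel_tool_calls": true, "service_tier": true,
}

// convertParams is a hypothetical helper: it drops ignored keys,
// renames mapped keys, and passes everything else through unchanged.
func convertParams(in map[string]interface{}) map[string]interface{} {
	out := make(map[string]interface{}, len(in))
	for key, value := range in {
		if dropped[key] {
			continue
		}
		if newKey, ok := renamed[key]; ok {
			key = newKey
		}
		out[key] = value
	}
	return out
}

func main() {
	fmt.Println(convertParams(map[string]interface{}{
		"max_completion_tokens": 256,
		"top_p":                 0.9,
		"temperature":           0.3,
		"seed":                  7,
	}))
}
```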
### Extra Parameters

Use `extra_params` (SDK) or pass directly in request body (Gateway) for Cohere-specific fields:

<Tabs>
<Tab title="Gateway">

```bash
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "cohere/command-r-plus",
    "messages": [{"role": "user", "content": "Hello"}],
    "top_k": 40,
    "safety_mode": "STRICT",
    "log_probs": true,
    "strict_tool_choice": false
  }'
```

</Tab>
<Tab title="Go SDK">

```go
resp, err := client.ChatCompletionRequest(schemas.NewBifrostContext(ctx, schemas.NoDeadline), &schemas.BifrostChatRequest{
	Provider: schemas.Cohere,
	Model:    "cohere/command-r-plus",
	Input:    messages,
	Params: &schemas.ChatParameters{
		ExtraParams: map[string]interface{}{
			"top_k":              40,
			"safety_mode":        "STRICT",
			"log_probs":          true,
			"strict_tool_choice": false,
		},
	},
})
```

</Tab>
</Tabs>

## Reasoning / Thinking

**Documentation**: See [Bifrost Reasoning Reference](/providers/reasoning)

### Parameter Mapping

- `reasoning.effort` → `thinking.type` (mapped to `"enabled"` or `"disabled"`)
- `reasoning.max_tokens` → `thinking.token_budget` (token budget for thinking)

### Critical Constraints

- **Minimum budget**: At least 1 token is required; a request with a budget of 0 is converted to `"disabled"`
- **Dynamic budget**: `-1` is converted to `1` automatically

### Example

```json
// Request
{"reasoning": {"effort": "high", "max_tokens": 2048}}

// Cohere conversion
{"thinking": {"type": "enabled", "token_budget": 2048}}
```

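The budget rules above can be sketched in Go as follows. This is an illustration under stated assumptions, not the code in `chat.go`: the `Reasoning` and `Thinking` types and `toThinking` are hypothetical, and treating an effort of `"none"` as the case that disables thinking is an assumption the source does not spell out.

```go
package main

import "fmt"

// Reasoning mirrors the OpenAI-style `reasoning` object (hypothetical type).
type Reasoning struct {
	Effort    string
	MaxTokens *int
}

// Thinking mirrors Cohere's `thinking` object (hypothetical type).
type Thinking struct {
	Type        string
	TokenBudget *int
}

// toThinking applies the documented constraints:
// a budget of 0 disables thinking, and -1 (dynamic) becomes 1.
func toThinking(r Reasoning) Thinking {
	// Assumption: effort "none" disables thinking; any other effort enables it.
	if r.Effort == "none" {
		return Thinking{Type: "disabled"}
	}
	if r.MaxTokens == nil {
		return Thinking{Type: "enabled"}
	}
	budget := *r.MaxTokens
	switch budget {
	case 0:
		return Thinking{Type: "disabled"}
	case -1:
		budget = 1
	}
	return Thinking{Type: "enabled", TokenBudget: &budget}
}

func main() {
	n := 2048
	t := toThinking(Reasoning{Effort: "high", MaxTokens: &n})
	fmt.Println(t.Type, *t.TokenBudget)
}
```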
## Message Conversion

### Content Handling

- **String content**: Messages can have simple string content
- **Content blocks**: Messages can have arrays of content blocks (text, images, thinking)
- **Image conversion**: `image_url` blocks with a URL are supported
- **Tool calls**: Converted from assistant message tool calls to Cohere format
- **Tool messages**: Tool call results are passed with `tool_call_id`

## Tool Conversion

Tool definitions are adapted to Cohere format with the following mappings:
- Function `name` → `name` (unchanged)
- Function `parameters` → `parameters` (flexible JSON format)
- Strict mode (`strict: true`) is silently dropped (not supported)

Tool choice mapping:
- `"none"` → `"NONE"`
- `"auto"` → `"AUTO"`
- `"required"` → `"REQUIRED"`
- Specific tool selection → `"REQUIRED"` (Cohere uses function-level selection)

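A minimal sketch of these tool mappings is shown below. The types and function names are hypothetical, and the sketch assumes that `auto` pairs with `AUTO` and `required` with `REQUIRED`; it is not Bifrost's actual conversion code.

```go
package main

import "fmt"

// OpenAIFunction is a hypothetical, trimmed-down function tool definition.
type OpenAIFunction struct {
	Name       string
	Parameters map[string]interface{}
	Strict     *bool
}

// CohereFunction is the hypothetical target shape; it has no strict flag.
type CohereFunction struct {
	Name       string
	Parameters map[string]interface{}
}

// ToolChoice models either a mode ("none", "auto", "required")
// or a specifically requested function.
type ToolChoice struct {
	Mode     string
	Function string
}

// toCohereFunction copies name and parameters; Strict is silently dropped.
func toCohereFunction(f OpenAIFunction) CohereFunction {
	return CohereFunction{Name: f.Name, Parameters: f.Parameters}
}

// toCohereToolChoice maps the tool choice; a specific function forces REQUIRED.
func toCohereToolChoice(tc ToolChoice) string {
	if tc.Function != "" {
		return "REQUIRED"
	}
	switch tc.Mode {
	case "none":
		return "NONE"
	case "required":
		return "REQUIRED"
	default:
		// Assumption: "auto" (and an unset mode) maps to AUTO.
		return "AUTO"
	}
}

func main() {
	fmt.Println(toCohereToolChoice(ToolChoice{Function: "get_weather"}))
}
```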
## Response Format

Supported formats:
- `text` - Plain text response
- `json_object` - Structured JSON response
- `json_schema` - JSON with schema validation (converted to `json_object`)

The schema is passed through the `response_format.json_schema` field.

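As a rough illustration of the `json_schema` → `json_object` conversion, the sketch below rewrites the type and carries the schema along in `json_schema`. The `ResponseFormat` type and `toCohereResponseFormat` are hypothetical, and the schema is treated as an opaque map; how Bifrost unwraps OpenAI's schema wrapper is not covered here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ResponseFormat is a hypothetical shape used for both input and output.
type ResponseFormat struct {
	Type       string                 `json:"type"`
	JSONSchema map[string]interface{} `json:"json_schema,omitempty"`
}

// toCohereResponseFormat converts json_schema to json_object and keeps
// the schema in the json_schema field; other types pass through.
func toCohereResponseFormat(in ResponseFormat) ResponseFormat {
	out := in
	if in.Type == "json_schema" {
		out.Type = "json_object"
	}
	return out
}

func main() {
	out := toCohereResponseFormat(ResponseFormat{
		Type:       "json_schema",
		JSONSchema: map[string]interface{}{"type": "object"},
	})
	encoded, err := json.Marshal(out)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(encoded))
}
```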
## Response Conversion

### Field Mapping

- `finish_reason`: `COMPLETE` / `STOP_SEQUENCE` → `stop`, `MAX_TOKENS` → `length`, `TOOL_CALL` → `tool_calls`
- `input_tokens` → `prompt_tokens`; `output_tokens` → `completion_tokens`
- `cached_tokens` → `prompt_tokens_details.cached_tokens` (if present)
- Tool call arguments are passed through unchanged (Cohere already uses string format)

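The field mapping above can be sketched as two small functions. The type and function names are hypothetical, and the fallback for unlisted finish reasons (returning the value unchanged) is an assumption rather than documented behavior.

```go
package main

import "fmt"

// CohereUsage is a hypothetical view of Cohere's token counts.
type CohereUsage struct {
	InputTokens  int
	OutputTokens int
	CachedTokens *int // nil when Cohere reports no cached tokens
}

// PromptTokensDetails and Usage mirror the OpenAI-style usage fields named above.
type PromptTokensDetails struct {
	CachedTokens int
}

type Usage struct {
	PromptTokens        int
	CompletionTokens    int
	PromptTokensDetails *PromptTokensDetails
}

// mapFinishReason converts Cohere finish reasons to OpenAI-style values.
func mapFinishReason(reason string) string {
	switch reason {
	case "COMPLETE", "STOP_SEQUENCE":
		return "stop"
	case "MAX_TOKENS":
		return "length"
	case "TOOL_CALL":
		return "tool_calls"
	default:
		// Assumption: unlisted reasons are returned unchanged.
		return reason
	}
}

// mapUsage renames the token counters and attaches cached tokens if present.
func mapUsage(u CohereUsage) Usage {
	out := Usage{PromptTokens: u.InputTokens, CompletionTokens: u.OutputTokens}
	if u.CachedTokens != nil {
		out.PromptTokensDetails = &PromptTokensDetails{CachedTokens: *u.CachedTokens}
	}
	return out
}

func main() {
	cached := 12
	u := mapUsage(CohereUsage{InputTokens: 40, OutputTokens: 8, CachedTokens: &cached})
	fmt.Println(mapFinishReason("MAX_TOKENS"), u.PromptTokens, u.CompletionTokens, u.PromptTokensDetails.CachedTokens)
}
```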
## Streaming

Event sequence: `message-start` → `content-start` → `content-delta` → `content-end` → `message-end`

Delta types:
- `content-delta` with text → message content
- `content-delta` with thinking → reasoning text
- `tool-call-start/delta/end` → tool call events
- `tool-plan-delta` → tool planning output

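One way to picture the delta routing is a small classifier over the event type. This is only a sketch: `StreamEvent`, its `IsThinking` flag, and the returned labels are hypothetical and do not reflect Bifrost's internal types.

```go
package main

import "fmt"

// StreamEvent is a simplified, hypothetical view of a Cohere stream event.
type StreamEvent struct {
	Type       string // e.g. "content-delta", "tool-call-start"
	IsThinking bool   // true when a content delta carries thinking rather than text
}

// classify returns a hypothetical label for where the event's payload goes.
func classify(ev StreamEvent) string {
	switch ev.Type {
	case "content-delta":
		if ev.IsThinking {
			return "reasoning"
		}
		return "content"
	case "tool-call-start", "tool-call-delta", "tool-call-end":
		return "tool_call"
	case "tool-plan-delta":
		return "tool_plan"
	default:
		// message-start, content-start, content-end, and message-end
		// are treated as lifecycle markers in this sketch.
		return "lifecycle"
	}
}

func main() {
	for _, t := range []string{"message-start", "content-delta", "tool-plan-delta", "message-end"} {
		fmt.Println(t, "->", classify(StreamEvent{Type: t}))
	}
}
```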
---

## Caveats

<Accordion title="Minimum Thinking Budget">
**Severity**: Low
**Behavior**: `reasoning.max_tokens` must be >= 1
**Impact**: Very low; the conversion happens automatically
**Code**: `chat.go:104-130`
</Accordion>

<Accordion title="Top P Renamed">
**Severity**: Low
**Behavior**: The `top_p` parameter is renamed to `p`
**Impact**: The parameter name changes internally
**Code**: `chat.go:99`
</Accordion>

<Accordion title="Strict Tool Mode Dropped">
**Severity**: Low
**Behavior**: `strict: true` in tool definitions is silently dropped
**Impact**: No schema validation enforcement
**Code**: `chat.go:168-185`
</Accordion>

<Accordion title="Tool Arguments Format">
**Severity**: Low
**Behavior**: Tool arguments are already strings; no JSON serialization is needed
**Impact**: Minimal - Cohere v2 API expects string format
**Code**: `chat.go:70-78`
</Accordion>

---

# 2. Responses API

The Responses API uses the same underlying `/v2/chat` endpoint but converts between OpenAI's Responses format and Cohere's format.

## Request Parameters

### Parameter Mapping

| Parameter | Transformation |
|-----------|----------------|
| `max_output_tokens` | Renamed to `max_tokens` |
| `temperature` | Direct pass-through |
| `top_p` | Renamed to `p` |
| `instructions` | Becomes system message |
| `text.format` | Converted to `response_format` |
| `tools` | Schema restructured (see [Chat Completions](#1-chat-completions)) |
| `tool_choice` | Type mapped (see [Chat Completions](#1-chat-completions)) |
| `reasoning` | Mapped to `thinking` (see [Reasoning / Thinking](#reasoning--thinking)) |
| `stop` | Via `extra_params`, renamed to `stop_sequences` |
| `top_k` | Via `extra_params` (Cohere-specific) |
| `frequency_penalty`, `presence_penalty` | Via `extra_params` |

### Extra Parameters

Use `extra_params` (SDK) or pass directly in request body (Gateway):

<Tabs>
<Tab title="Gateway">

```bash
curl -X POST http://localhost:8080/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "cohere/command-r-plus",
    "input": "Hello, how are you?",
    "top_k": 40,
    "stop": [".", "!"]
  }'
```

</Tab>
<Tab title="Go SDK">

```go
resp, err := client.ResponsesRequest(schemas.NewBifrostContext(ctx, schemas.NoDeadline), &schemas.BifrostResponsesRequest{
	Provider: schemas.Cohere,
	Model:    "cohere/command-r-plus",
	Input:    messages,
	Params: &schemas.ResponsesParameters{
		ExtraParams: map[string]interface{}{
			"top_k": 40,
			"stop":  []string{".", "!"},
		},
	},
})
```

</Tab>
</Tabs>

## Input & Instructions

- **Input**: A string is converted to a single user message; an array is converted to a list of messages
- **Instructions**: Becomes a system message prepended to the messages

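The two rules above amount to building a message list with an optional leading system message. A minimal sketch, using a hypothetical `Message` type and helper names rather than Bifrost's own schemas:

```go
package main

import "fmt"

// Message is a hypothetical, simplified chat message.
type Message struct {
	Role    string
	Content string
}

// fromString wraps a plain string input as a single user message.
func fromString(input string) []Message {
	return []Message{{Role: "user", Content: input}}
}

// buildMessages prepends the instructions (if any) as a system message.
func buildMessages(instructions string, input []Message) []Message {
	msgs := make([]Message, 0, len(input)+1)
	if instructions != "" {
		msgs = append(msgs, Message{Role: "system", Content: instructions})
	}
	return append(msgs, input...)
}

func main() {
	msgs := buildMessages("Answer briefly.", fromString("Hello, how are you?"))
	for _, m := range msgs {
		fmt.Printf("%s: %s\n", m.Role, m.Content)
	}
}
```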
## Tool Support

Supported types: `function`

Tool conversions are the same as in [Chat Completions](#1-chat-completions).

## Response Conversion

- `text` → `message`; `tool_use` → `function_call`
- `input_tokens` / `output_tokens` are preserved
- Token details include cached tokens when available

## Streaming

Event sequence: `message-start` → `content-start` → `content-delta` → `content-end` → `message-end`

Special handling:
- Tool call arguments accumulated across chunks
- Synthetic `output_item.added` events emitted for text/reasoning
- Stable item IDs generated as `msg_{messageID}_item_{outputIndex}`

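The item ID format and the argument accumulation can be sketched as below. `itemID` follows the documented pattern literally; `ToolCallAccumulator` and its methods are hypothetical names, and keying fragments by output index is an assumption made for this illustration.

```go
package main

import "fmt"

// itemID builds a stable item ID in the form msg_{messageID}_item_{outputIndex}.
func itemID(messageID string, outputIndex int) string {
	return fmt.Sprintf("msg_%s_item_%d", messageID, outputIndex)
}

// ToolCallAccumulator collects argument fragments per output index
// (hypothetical helper; keying by output index is an assumption).
type ToolCallAccumulator struct {
	args map[int]string
}

func NewToolCallAccumulator() *ToolCallAccumulator {
	return &ToolCallAccumulator{args: make(map[int]string)}
}

// Add appends one streamed fragment to the arguments for the given index.
func (a *ToolCallAccumulator) Add(outputIndex int, fragment string) {
	a.args[outputIndex] += fragment
}

// Arguments returns everything accumulated so far for the given index.
func (a *ToolCallAccumulator) Arguments(outputIndex int) string {
	return a.args[outputIndex]
}

func main() {
	acc := NewToolCallAccumulator()
	acc.Add(1, `{"city":`)
	acc.Add(1, `"Oslo"}`)
	fmt.Println(itemID("abc123", 1), acc.Arguments(1))
}
```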
---

# 3. Embeddings

## Request Parameters

### Parameter Mapping

| Parameter | Transformation |
|-----------|----------------|
| `input` (text or array) | Converted to `texts` array |
| `dimensions` | Renamed to `output_dimension` |
| `input_type` | Via `extra_params` (required, defaults to `"search_document"`) |
| `embedding_types` | Via `extra_params` (array of embedding types) |
| `truncate` | Via `extra_params` (how to handle long inputs) |
| `max_tokens` | Via `extra_params` (max tokens to embed per input) |

### Extra Parameters

Use `extra_params` for Cohere-specific embedding options:

<Tabs>
<Tab title="Gateway">

```bash
curl -X POST http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "cohere/embed-english-v3.0",
    "input": ["text to embed"],
    "input_type": "search_query",
    "embedding_types": ["float"],
    "truncate": "START"
  }'
```

</Tab>
<Tab title="Go SDK">

```go
resp, err := client.EmbeddingRequest(schemas.NewBifrostContext(ctx, schemas.NoDeadline), &schemas.BifrostEmbeddingRequest{
	Provider: schemas.Cohere,
	Model:    "cohere/embed-english-v3.0",
	Input: &schemas.EmbeddingInput{
		Texts: []string{"text to embed"},
	},
	Params: &schemas.EmbeddingParameters{
		Dimensions: schemas.Ptr(1024),
		ExtraParams: map[string]interface{}{
			"input_type":      "search_query",
			"embedding_types": []string{"float"},
			"truncate":        "START",
		},
	},
})
```

</Tab>
</Tabs>

### Critical Notes

- **Input Type Required**: Cohere v3+ models require the `input_type` parameter (defaults to `"search_document"`)
- **Embedding Types**: Specify which embedding types to return (e.g., `"float"`, `"int8"`)

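The `input_type` default and the `dimensions` rename can be sketched as a small request builder. `CohereEmbedRequest` and `buildEmbedRequest` are hypothetical names for illustration; only the fields named in the mapping table are modeled.

```go
package main

import "fmt"

// CohereEmbedRequest is a hypothetical, trimmed-down /v2/embed request body.
type CohereEmbedRequest struct {
	Model           string   `json:"model"`
	Texts           []string `json:"texts"`
	InputType       string   `json:"input_type"`
	OutputDimension *int     `json:"output_dimension,omitempty"`
}

// buildEmbedRequest applies the documented default for input_type and
// renames dimensions to output_dimension.
func buildEmbedRequest(model string, texts []string, dimensions *int, extra map[string]interface{}) CohereEmbedRequest {
	req := CohereEmbedRequest{
		Model:           model,
		Texts:           texts,
		InputType:       "search_document", // default when the caller sets nothing
		OutputDimension: dimensions,
	}
	if v, ok := extra["input_type"].(string); ok && v != "" {
		req.InputType = v
	}
	return req
}

func main() {
	req := buildEmbedRequest("embed-english-v3.0", []string{"text to embed"}, nil, nil)
	fmt.Printf("%+v\n", req)
}
```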
## Response Conversion

- `embeddings.float` → `data[].embedding`
- `meta.tokens` → usage information
- Multiple embedding types handled

---

# 4. List Models

**Request**: GET `/v1/models?page_size={defaultPageSize}`

**Field mapping**: Model data converted to standard format

**Pagination**: Cursor-based with `next_page_token`

**Note**: `endpoint` and `default_only` filters are available via `extra_params`
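
Cursor-based pagination with `next_page_token` typically follows the loop sketched below. `ModelsPage`, `FetchPage`, and `listAllModels` are hypothetical names; the fetch function stands in for the HTTP call, and the sketch assumes an empty token marks the last page.

```go
package main

import "fmt"

// ModelsPage is a hypothetical, simplified page of the models listing.
type ModelsPage struct {
	Models        []string
	NextPageToken string
}

// FetchPage stands in for the HTTP call that requests one page.
type FetchPage func(pageToken string) (ModelsPage, error)

// listAllModels follows next_page_token until no further page is reported.
func listAllModels(fetch FetchPage) ([]string, error) {
	var all []string
	token := ""
	for {
		page, err := fetch(token)
		if err != nil {
			return nil, err
		}
		all = append(all, page.Models...)
		// Assumption: an empty token means this was the last page.
		if page.NextPageToken == "" {
			return all, nil
		}
		token = page.NextPageToken
	}
}

func main() {
	// Fake two-page listing used only to exercise the loop.
	pages := map[string]ModelsPage{
		"":   {Models: []string{"model-a", "model-b"}, NextPageToken: "t1"},
		"t1": {Models: []string{"model-c"}},
	}
	models, err := listAllModels(func(token string) (ModelsPage, error) {
		return pages[token], nil
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(models)
}
```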