---
title: "Cohere"
description: "Cohere API conversion guide - parameter mapping, message handling, reasoning/thinking, and tool conversion"
icon: "c"
---
## Overview
Cohere's API is structured differently from OpenAI's. Bifrost performs the following conversions:
- **Parameter renaming** - e.g., `max_completion_tokens` → `max_tokens`, `top_p` → `p`, `stop` → `stop_sequences`
- **Message content conversion** - String and content block formats handled
- **Tool conversion** - Tool definitions and tool choice mapped to Cohere format
- **Thinking/Reasoning transformation** - `reasoning` parameters mapped to Cohere's `thinking` structure
- **Response format conversion** - JSON schema handling adapted to Cohere's format
### Supported Operations
| Operation | Non-Streaming | Streaming | Endpoint |
|-----------|---------------|-----------|----------|
| Chat Completions | ✅ | ✅ | `/v2/chat` |
| Responses API | ✅ | ✅ | `/v2/chat` |
| Embeddings | ✅ | - | `/v2/embed` |
| List Models | ✅ | - | `/v1/models` |
| Text Completions | ❌ | ❌ | - |
| Image Generation | ❌ | ❌ | - |
| Speech (TTS) | ❌ | ❌ | - |
| Transcriptions (STT) | ❌ | ❌ | - |
| Files | ❌ | ❌ | - |
| Batch | ❌ | ❌ | - |
**Unsupported Operations** (❌): Text Completions, Image Generation, Speech, Transcriptions, Files, and Batch are not supported by the upstream Cohere API. These return `UnsupportedOperationError`.
---
# 1. Chat Completions
## Request Parameters
### Parameter Mapping
| Parameter | Transformation |
|-----------|----------------|
| `max_completion_tokens` | Renamed to `max_tokens` |
| `temperature` | Direct pass-through |
| `top_p` | Renamed to `p` |
| `stop` | Renamed to `stop_sequences` |
| `frequency_penalty`, `presence_penalty` | Direct pass-through |
| `response_format` | Converted to structured format (see [Response Format](#response-format)) |
| `tools` | Schema structure adapted (see [Tool Conversion](#tool-conversion)) |
| `tool_choice` | Type mapped (see [Tool Conversion](#tool-conversion)) |
| `reasoning` | Mapped to `thinking` (see [Reasoning / Thinking](#reasoning--thinking)) |
| `user` | Via `extra_params` (not directly supported in Cohere v2 API) |
| `top_k` | Via `extra_params` (Cohere-specific) |
### Dropped Parameters
The following parameters are silently ignored: `logit_bias`, `logprobs`, `top_logprobs`, `seed`, `parallel_tool_calls`, `service_tier`
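
The renaming and dropping rules above can be sketched as a small lookup over flat parameters. This is only an illustration of the documented behavior, not Bifrost's actual code: `toCohereParams`, `renamed`, and `dropped` are hypothetical names, and structured fields such as `tools`, `response_format`, and `reasoning` need their own conversions and are not handled here.

```go
package main

import "fmt"

// renamed holds the OpenAI-style names that this page says are renamed for Cohere.
var renamed = map[string]string{
	"max_completion_tokens": "max_tokens",
	"top_p":                 "p",
	"stop":                  "stop_sequences",
}

// dropped holds the parameters that this page says are silently ignored.
var dropped = map[string]bool{
	"logit_bias":          true,
	"logprobs":            true,
	"top_logprobs":        true,
	"seed":                true,
	"parallel_tool_calls": true,
	"service_tier":        true,
}

// toCohereParams is a hypothetical helper (not a Bifrost API). It drops the
// ignored parameters, renames the mapped ones, and passes everything else
// through unchanged.
func toCohereParams(in map[string]interface{}) map[string]interface{} {
	out := make(map[string]interface{}, len(in))
	for key, value := range in {
		if dropped[key] {
			continue
		}
		if newKey, ok := renamed[key]; ok {
			key = newKey
		}
		out[key] = value
	}
	return out
}

func main() {
	out := toCohereParams(map[string]interface{}{
		"max_completion_tokens": 256,
		"top_p":                 0.9,
		"temperature":           0.3,
		"seed":                  42,
	})
	fmt.Println(out)
}
```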
### Extra Parameters
Use `extra_params` (SDK) or pass the fields directly in the request body (Gateway) for Cohere-specific options:
```bash
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "cohere/command-r-plus",
"messages": [{"role": "user", "content": "Hello"}],
"top_k": 40,
"safety_mode": "STRICT",
"log_probs": true,
"strict_tool_choice": false
}'
```
```go
resp, err := client.ChatCompletionRequest(schemas.NewBifrostContext(ctx, schemas.NoDeadline), &schemas.BifrostChatRequest{
Provider: schemas.Cohere,
Model: "cohere/command-r-plus",
Input: messages,
Params: &schemas.ChatParameters{
ExtraParams: map[string]interface{}{
"top_k": 40,
"safety_mode": "STRICT",
"log_probs": true,
"strict_tool_choice": false,
},
},
})
```
## Reasoning / Thinking
**Documentation**: See [Bifrost Reasoning Reference](/providers/reasoning)
### Parameter Mapping
- `reasoning.effort` → `thinking.type` (mapped to `"enabled"` or `"disabled"`)
- `reasoning.max_tokens` → `thinking.token_budget` (token budget for thinking)
### Critical Constraints
- **Minimum budget**: The token budget must be at least 1; a budget of `0` is converted to `thinking.type: "disabled"`
- **Dynamic budget**: `-1` is automatically converted to `1`
### Example
```json
// Request
{"reasoning": {"effort": "high", "max_tokens": 2048}}
// Cohere conversion
{"thinking": {"type": "enabled", "token_budget": 2048}}
```
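
The budget rules can be sketched as follows. This is a minimal illustration under stated assumptions: `thinking` and `toThinking` are hypothetical names, only `reasoning.max_tokens` is considered, and how each `reasoning.effort` value maps to `"enabled"` or `"disabled"` is not covered by this sketch.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// thinking mirrors the Cohere-side structure shown in the example above.
type thinking struct {
	Type        string `json:"type"`
	TokenBudget int    `json:"token_budget,omitempty"`
}

// toThinking is a hypothetical helper that applies the two documented rules:
// a budget of 0 disables thinking, and the dynamic budget -1 becomes 1.
// Any other value is passed through as the token budget (an assumption;
// this page does not describe other negative values).
func toThinking(maxTokens int) thinking {
	switch maxTokens {
	case 0:
		return thinking{Type: "disabled"}
	case -1:
		return thinking{Type: "enabled", TokenBudget: 1}
	default:
		return thinking{Type: "enabled", TokenBudget: maxTokens}
	}
}

func main() {
	for _, budget := range []int{2048, 0, -1} {
		encoded, err := json.Marshal(toThinking(budget))
		if err != nil {
			panic(err)
		}
		fmt.Printf("%d -> %s\n", budget, encoded)
	}
}
```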
## Message Conversion
### Content Handling
- **String content**: Messages can have simple string content
- **Content blocks**: Messages can have arrays of content blocks (text, images, thinking)
- **Image conversion**: `image_url` blocks with URL are supported
- **Tool calls**: Tool calls on assistant messages are converted to Cohere format
- **Tool messages**: Tool call results are passed with `tool_call_id`
## Tool Conversion
Tool definitions are adapted to Cohere format with the following mappings:
- Function `name` → `name` (unchanged)
- Function `parameters` → `parameters` (flexible JSON format)
- Strict mode (`strict: true`) is silently dropped (not supported)
Tool choice mapping:
- `"none"` → `"NONE"`
- `"auto"` → `"AUTO"`
- `"required"` → `"REQUIRED"`
- Specific tool selection → `"REQUIRED"` (Cohere uses function-level selection)
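
As a rough sketch, the tool choice mapping could look like the function below. It reads the `auto`/`required` entry as a one-to-one mapping, and `mapToolChoice` with its two-argument shape is a hypothetical helper rather than Bifrost's API. Falling back to `"AUTO"` for unrecognized values is also an assumption.

```go
package main

import "fmt"

// mapToolChoice is a hypothetical helper. choice is the OpenAI-style string
// ("none", "auto", "required"); forcedTool is non-empty when the caller
// selected one specific tool.
func mapToolChoice(choice, forcedTool string) string {
	if forcedTool != "" {
		// Specific tool selection is mapped to REQUIRED per the list above.
		return "REQUIRED"
	}
	switch choice {
	case "none":
		return "NONE"
	case "required":
		return "REQUIRED"
	default:
		// "auto" and, by assumption, any unrecognized value.
		return "AUTO"
	}
}

func main() {
	fmt.Println(mapToolChoice("none", ""))
	fmt.Println(mapToolChoice("auto", ""))
	fmt.Println(mapToolChoice("required", ""))
	fmt.Println(mapToolChoice("", "get_weather"))
}
```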
## Response Format
Supported formats:
- `text` - Plain text response
- `json_object` - Structured JSON response
- `json_schema` - JSON with schema validation (converted to `json_object`)
Schema is passed through `response_format.json_schema` field.
## Response Conversion
### Field Mapping
- `finish_reason`: `COMPLETE` / `STOP_SEQUENCE` → `stop`, `MAX_TOKENS` → `length`, `TOOL_CALL` → `tool_calls`
- `input_tokens` → `prompt_tokens` | `output_tokens` → `completion_tokens`
- `cached_tokens` → `prompt_tokens_details.cached_tokens` (if present)
- Tool call arguments are passed through unchanged: Cohere already returns them as a string, so no conversion is needed
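
The finish reason and usage mappings above amount to a lookup plus a field copy. The sketch below uses hypothetical types (`cohereUsage`, `usage`) and helper names; returning unknown finish reasons unchanged is an assumption, since this page only lists four values.

```go
package main

import "fmt"

// cohereUsage and usage are hypothetical, simplified shapes for illustration.
type cohereUsage struct {
	InputTokens  int
	OutputTokens int
	CachedTokens *int // nil when Cohere does not report cached tokens
}

type usage struct {
	PromptTokens     int
	CompletionTokens int
	CachedTokens     *int // stands in for prompt_tokens_details.cached_tokens
}

// mapFinishReason applies the documented mapping. Values not listed on this
// page are returned unchanged (an assumption).
func mapFinishReason(reason string) string {
	switch reason {
	case "COMPLETE", "STOP_SEQUENCE":
		return "stop"
	case "MAX_TOKENS":
		return "length"
	case "TOOL_CALL":
		return "tool_calls"
	default:
		return reason
	}
}

// mapUsage copies token counts into OpenAI-style field names.
func mapUsage(u cohereUsage) usage {
	return usage{
		PromptTokens:     u.InputTokens,
		CompletionTokens: u.OutputTokens,
		CachedTokens:     u.CachedTokens,
	}
}

func main() {
	fmt.Println(mapFinishReason("MAX_TOKENS"))
	fmt.Printf("%+v\n", mapUsage(cohereUsage{InputTokens: 12, OutputTokens: 34}))
}
```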
## Streaming
Event sequence: `message-start` → `content-start` → `content-delta` → `content-end` → `message-end`
Delta types:
- `content-delta` with text → message content
- `content-delta` with thinking → reasoning text
- `tool-call-start/delta/end` → tool call events
- `tool-plan-delta` → tool planning output
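
To illustrate how the delta types separate, the sketch below folds a stream of events into content and reasoning text. The flat `event` struct is a simplification made up for this example; real Cohere stream payloads are nested, and tool call events are ignored here.

```go
package main

import (
	"fmt"
	"strings"
)

// event is a hypothetical, flattened view of one stream event.
type event struct {
	Type     string // e.g. "message-start", "content-delta", "message-end"
	Text     string // set on content-delta events that carry text
	Thinking string // set on content-delta events that carry thinking
}

// collect concatenates text deltas into content and thinking deltas into
// reasoning, skipping every event that is not a content-delta.
func collect(events []event) (content, reasoning string) {
	var contentBuf, reasoningBuf strings.Builder
	for _, ev := range events {
		if ev.Type != "content-delta" {
			continue
		}
		contentBuf.WriteString(ev.Text)
		reasoningBuf.WriteString(ev.Thinking)
	}
	return contentBuf.String(), reasoningBuf.String()
}

func main() {
	content, reasoning := collect([]event{
		{Type: "message-start"},
		{Type: "content-start"},
		{Type: "content-delta", Thinking: "Check the greeting. "},
		{Type: "content-delta", Text: "Hello"},
		{Type: "content-delta", Text: ", world"},
		{Type: "content-end"},
		{Type: "message-end"},
	})
	fmt.Printf("content=%q reasoning=%q\n", content, reasoning)
}
```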
---
## Caveats
| Severity | Behavior | Impact | Code |
|----------|----------|--------|------|
| Low | `reasoning.max_tokens` must be >= 1 | Very low; conversion happens automatically | `chat.go:104-130` |
| Low | `top_p` parameter renamed to `p` | Parameter name changes internally | `chat.go:99` |
| Low | `strict: true` in tool definitions is silently dropped | No schema validation enforcement | `chat.go:168-185` |
| Low | Tool arguments are already strings, so no JSON serialization is needed | Minimal; the Cohere v2 API expects string format | `chat.go:70-78` |
---
# 2. Responses API
The Responses API uses the same underlying `/v2/chat` endpoint but converts between OpenAI's Responses format and Cohere's format.
## Request Parameters
### Parameter Mapping
| Parameter | Transformation |
|-----------|----------------|
| `max_output_tokens` | Renamed to `max_tokens` |
| `temperature` | Direct pass-through |
| `top_p` | Renamed to `p` |
| `instructions` | Becomes system message |
| `text.format` | Converted to `response_format` |
| `tools` | Schema restructured (see [Chat Completions](#1-chat-completions)) |
| `tool_choice` | Type mapped (see [Chat Completions](#1-chat-completions)) |
| `reasoning` | Mapped to `thinking` (see [Reasoning / Thinking](#reasoning--thinking)) |
| `stop` | Via `extra_params`, renamed to `stop_sequences` |
| `top_k` | Via `extra_params` (Cohere-specific) |
| `frequency_penalty`, `presence_penalty` | Via `extra_params` |
### Extra Parameters
Use `extra_params` (SDK) or pass directly in request body (Gateway):
```bash
curl -X POST http://localhost:8080/v1/responses \
-H "Content-Type: application/json" \
-d '{
"model": "cohere/command-r-plus",
"input": "Hello, how are you?",
"top_k": 40,
"stop": [".", "!"]
}'
```
```go
resp, err := client.ResponsesRequest(schemas.NewBifrostContext(ctx, schemas.NoDeadline), &schemas.BifrostResponsesRequest{
Provider: schemas.Cohere,
Model: "cohere/command-r-plus",
Input: messages,
Params: &schemas.ResponsesParameters{
ExtraParams: map[string]interface{}{
"top_k": 40,
"stop": []string{".", "!"},
},
},
})
```
## Input & Instructions
- **Input**: A string is converted to a single user message; an array is converted to a list of messages
- **Instructions**: Becomes system message (prepended to messages)
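
A minimal sketch of these two rules, assuming a simplified `message` type and a hypothetical `buildMessages` helper (neither is part of Bifrost's API):

```go
package main

import "fmt"

// message is a simplified, hypothetical chat message.
type message struct {
	Role    string
	Content string
}

// buildMessages prepends instructions as a system message, then appends the
// input: a string becomes one user message, a slice is appended as-is.
func buildMessages(instructions string, input interface{}) ([]message, error) {
	var msgs []message
	if instructions != "" {
		msgs = append(msgs, message{Role: "system", Content: instructions})
	}
	switch v := input.(type) {
	case string:
		msgs = append(msgs, message{Role: "user", Content: v})
	case []message:
		msgs = append(msgs, v...)
	default:
		return nil, fmt.Errorf("unsupported input type %T", input)
	}
	return msgs, nil
}

func main() {
	msgs, err := buildMessages("Answer briefly.", "Hello, how are you?")
	if err != nil {
		panic(err)
	}
	for _, m := range msgs {
		fmt.Printf("%s: %s\n", m.Role, m.Content)
	}
}
```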
## Tool Support
Supported types: `function`
Tool conversions same as [Chat Completions](#1-chat-completions).
## Response Conversion
- `text` → `message` | `tool_use` → `function_call`
- `input_tokens` / `output_tokens` preserved
- Token details with cached tokens support
## Streaming
Event sequence: `message-start` → `content-start` → `content-delta` → `content-end` → `message-end`
Special handling:
- Tool call arguments accumulated across chunks
- Synthetic `output_item.added` events emitted for text/reasoning
- Stable item IDs generated as `msg_{messageID}_item_{outputIndex}`
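
The ID format and the argument accumulation can be sketched as below. `itemID` follows the `msg_{messageID}_item_{outputIndex}` pattern stated above; `argAccumulator` is a hypothetical structure, and keying fragments by output index is an assumption made for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// itemID builds a stable item ID in the documented format.
func itemID(messageID string, outputIndex int) string {
	return fmt.Sprintf("msg_%s_item_%d", messageID, outputIndex)
}

// argAccumulator is a hypothetical helper that joins tool call argument
// fragments arriving across stream chunks, keyed by output index.
type argAccumulator struct {
	parts map[int]*strings.Builder
}

// add appends one argument fragment for the given output index.
func (a *argAccumulator) add(outputIndex int, chunk string) {
	if a.parts == nil {
		a.parts = make(map[int]*strings.Builder)
	}
	buf, ok := a.parts[outputIndex]
	if !ok {
		buf = &strings.Builder{}
		a.parts[outputIndex] = buf
	}
	buf.WriteString(chunk)
}

// arguments returns everything accumulated so far for the output index.
func (a *argAccumulator) arguments(outputIndex int) string {
	if buf, ok := a.parts[outputIndex]; ok {
		return buf.String()
	}
	return ""
}

func main() {
	var acc argAccumulator
	acc.add(1, `{"city":`)
	acc.add(1, `"Paris"}`)
	fmt.Println(itemID("abc123", 1), acc.arguments(1))
}
```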
---
# 3. Embeddings
## Request Parameters
### Parameter Mapping
| Parameter | Transformation |
|-----------|----------------|
| `input` (text or array) | Converted to `texts` array |
| `dimensions` | Renamed to `output_dimension` |
| `input_type` | Via `extra_params` (required, defaults to `"search_document"`) |
| `embedding_types` | Via `extra_params` (array of embedding types) |
| `truncate` | Via `extra_params` (how to handle long inputs) |
| `max_tokens` | Via `extra_params` (max tokens to embed per input) |
### Extra Parameters
Use `extra_params` for Cohere-specific embedding options:
```bash
curl -X POST http://localhost:8080/v1/embeddings \
-H "Content-Type: application/json" \
-d '{
"model": "cohere/embed-english-v3.0",
"input": ["text to embed"],
"input_type": "search_query",
"embedding_types": ["float"],
"truncate": "START"
}'
```
```go
resp, err := client.EmbeddingRequest(schemas.NewBifrostContext(ctx, schemas.NoDeadline), &schemas.BifrostEmbeddingRequest{
Provider: schemas.Cohere,
Model: "cohere/embed-english-v3.0",
Input: &schemas.EmbeddingInput{
Texts: []string{"text to embed"},
},
Params: &schemas.EmbeddingParameters{
Dimensions: schemas.Ptr(1024),
ExtraParams: map[string]interface{}{
"input_type": "search_query",
"embedding_types": []string{"float"},
"truncate": "START",
},
},
})
```
### Critical Notes
- **Input Type Required**: Cohere v3+ models require `input_type` parameter (defaults to `"search_document"`)
- **Embedding Types**: Specify which embedding types to return (e.g., `"float"`, `"int8"`)
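
The input normalization and the `input_type` default can be sketched like this. `embedRequest` and `newEmbedRequest` are hypothetical names; only the field names `texts`, `input_type`, and `output_dimension` come from the mapping table above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// embedRequest is a hypothetical, trimmed-down Cohere embed request body.
type embedRequest struct {
	Texts           []string `json:"texts"`
	InputType       string   `json:"input_type"`
	OutputDimension *int     `json:"output_dimension,omitempty"`
}

// newEmbedRequest wraps a single string into a one-element texts array,
// defaults input_type to "search_document", and carries dimensions over as
// output_dimension when it is set.
func newEmbedRequest(input interface{}, inputType string, dimensions *int) (embedRequest, error) {
	var texts []string
	switch v := input.(type) {
	case string:
		texts = []string{v}
	case []string:
		texts = v
	default:
		return embedRequest{}, fmt.Errorf("unsupported input type %T", input)
	}
	if inputType == "" {
		inputType = "search_document"
	}
	return embedRequest{Texts: texts, InputType: inputType, OutputDimension: dimensions}, nil
}

func main() {
	req, err := newEmbedRequest("text to embed", "", nil)
	if err != nil {
		panic(err)
	}
	encoded, err := json.Marshal(req)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(encoded))
}
```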
## Response Conversion
- `embeddings.float` → `data[].embedding`
- `meta.tokens` → usage information
- Multiple embedding types handled
---
# 4. List Models
- **Request**: `GET /v1/models?page_size={defaultPageSize}`
- **Field mapping**: Model data converted to standard format
- **Pagination**: Cursor-based with `next_page_token`
- **Note**: `endpoint` and `default_only` filters available via `extra_params`
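
Cursor-based pagination with `next_page_token` generally follows the loop sketched below. The `page` type, the `fetchFunc` signature, and the in-memory fake in `main` are assumptions made for this illustration; a real caller would issue the HTTP request inside the fetch function.

```go
package main

import "fmt"

// page is a hypothetical, simplified list-models response.
type page struct {
	Models        []string
	NextPageToken string // empty when there are no more pages
}

// fetchFunc retrieves one page; an empty token requests the first page.
type fetchFunc func(pageToken string) (page, error)

// listAll follows next_page_token until the server stops returning one.
func listAll(fetch fetchFunc) ([]string, error) {
	var all []string
	token := ""
	for {
		p, err := fetch(token)
		if err != nil {
			return nil, err
		}
		all = append(all, p.Models...)
		if p.NextPageToken == "" {
			return all, nil
		}
		token = p.NextPageToken
	}
}

func main() {
	// In-memory stand-in for the upstream endpoint, keyed by page token.
	pages := map[string]page{
		"":   {Models: []string{"model-a", "model-b"}, NextPageToken: "t1"},
		"t1": {Models: []string{"model-c"}},
	}
	models, err := listAll(func(token string) (page, error) {
		return pages[token], nil
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(models)
}
```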