---
title: "Cohere"
description: "Cohere API conversion guide - parameter mapping, message handling, reasoning/thinking, and tool conversion"
icon: "c"
---

## Overview

Cohere's API structure differs from OpenAI's format. Bifrost performs conversions including:

- **Parameter renaming** - e.g., `max_completion_tokens` → `max_tokens`, `top_p` → `p`, `stop` → `stop_sequences`
- **Message content conversion** - String and content block formats handled
- **Tool conversion** - Tool definitions and tool choice mapped to Cohere format
- **Thinking/Reasoning transformation** - `reasoning` parameters mapped to Cohere's `thinking` structure
- **Response format conversion** - JSON schema handling adapted to Cohere's format

### Supported Operations

| Operation | Non-Streaming | Streaming | Endpoint |
|-----------|---------------|-----------|----------|
| Chat Completions | ✅ | ✅ | `/v2/chat` |
| Responses API | ✅ | ✅ | `/v2/chat` |
| Embeddings | ✅ | - | `/v2/embed` |
| List Models | ✅ | - | `/v1/models` |
| Text Completions | ❌ | ❌ | - |
| Image Generation | ❌ | ❌ | - |
| Speech (TTS) | ❌ | ❌ | - |
| Transcriptions (STT) | ❌ | ❌ | - |
| Files | ❌ | ❌ | - |
| Batch | ❌ | ❌ | - |

**Unsupported Operations** (❌): Text Completions, Image Generation, Speech, Transcriptions, Files, and Batch are not supported by the upstream Cohere API. These return `UnsupportedOperationError`.

---

# 1. Chat Completions

## Request Parameters

### Parameter Mapping

| Parameter | Transformation |
|-----------|----------------|
| `max_completion_tokens` | Renamed to `max_tokens` |
| `temperature`, `top_p` | `temperature` is passed through directly; `top_p` is renamed to `p` |
| `stop` | Renamed to `stop_sequences` |
| `frequency_penalty`, `presence_penalty` | Direct pass-through |
| `response_format` | Converted to structured format (see [Response Format](#response-format)) |
| `tools` | Schema structure adapted (see [Tool Conversion](#tool-conversion)) |
| `tool_choice` | Type mapped (see [Tool Conversion](#tool-conversion)) |
| `reasoning` | Mapped to `thinking` (see [Reasoning / Thinking](#reasoning--thinking)) |
| `user` | Via `extra_params` (not directly supported in Cohere v2 API) |
| `top_k` | Via `extra_params` (Cohere-specific) |

### Dropped Parameters

The following parameters are silently ignored: `logit_bias`, `logprobs`, `top_logprobs`, `seed`, `parallel_tool_calls`, `service_tier`

### Extra Parameters

For Cohere-specific fields, use `extra_params` (SDK) or pass them directly in the request body (Gateway):

```bash
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "cohere/command-r-plus",
    "messages": [{"role": "user", "content": "Hello"}],
    "top_k": 40,
    "safety_mode": "STRICT",
    "log_probs": true,
    "strict_tool_choice": false
  }'
```

```go
resp, err := client.ChatCompletionRequest(schemas.NewBifrostContext(ctx, schemas.NoDeadline), &schemas.BifrostChatRequest{
    Provider: schemas.Cohere,
    Model:    "cohere/command-r-plus",
    Input:    messages,
    Params: &schemas.ChatParameters{
        ExtraParams: map[string]interface{}{
            "top_k":              40,
            "safety_mode":        "STRICT",
            "log_probs":          true,
            "strict_tool_choice": false,
        },
    },
})
```

## Reasoning / Thinking

**Documentation**: See [Bifrost Reasoning Reference](/providers/reasoning)

### Parameter Mapping

- `reasoning.effort` → `thinking.type` (mapped to `"enabled"` or `"disabled"`)
- `reasoning.max_tokens` → `thinking.token_budget` (token budget for thinking)

### Critical Constraints

- **Minimum budget**: At least 1 token is required; a request with a budget of 0 tokens is converted to disabled thinking
- **Dynamic budget**: `-1` is converted to `1` automatically

### Example

```json
// Request
{"reasoning": {"effort": "high", "max_tokens": 2048}}

// Cohere conversion
{"thinking": {"type": "enabled", "token_budget": 2048}}
```

## Message Conversion

### Content Handling

- **String content**: Messages can have simple string content
- **Content blocks**: Messages can have arrays of content blocks (text, images, thinking)
- **Image conversion**: `image_url` blocks with a URL are supported
- **Tool calls**: Assistant message tool calls are converted to Cohere format
- **Tool messages**: Tool call results are passed with `tool_call_id`

## Tool Conversion

Tool definitions are adapted to Cohere format with the following mappings:

- Function `name` → `name` (unchanged)
- Function `parameters` → `parameters` (flexible JSON format)
- Strict mode (`strict: true`) is silently dropped (not supported)

Tool choice mapping:

- `"none"` → `"NONE"`
- `"auto"` → `"AUTO"`
- `"required"` → `"REQUIRED"`
- Specific tool selection → `"REQUIRED"` (Cohere uses function-level selection)

## Response Format

Supported formats:

- `text` - Plain text response
- `json_object` - Structured JSON response
- `json_schema` - JSON with schema validation (converted to `json_object`)

The schema is passed through the `response_format.json_schema` field.
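
The `json_schema` handling described in the Response Format section can be sketched roughly as follows. This is a minimal illustration, not Bifrost's actual implementation: the `OpenAIResponseFormat`, `OpenAIJSONSchema`, and `CohereResponseFormat` types and the `toCohereResponseFormat` function are hypothetical names, and both forwarding only the inner `schema` object and falling back to `text` for unknown types are assumptions made for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// OpenAIJSONSchema is a simplified, hypothetical view of the OpenAI-style
// json_schema payload (name plus the schema itself).
type OpenAIJSONSchema struct {
	Name   string                 `json:"name"`
	Schema map[string]interface{} `json:"schema"`
}

// OpenAIResponseFormat is a simplified, hypothetical view of the incoming
// response_format field.
type OpenAIResponseFormat struct {
	Type       string            `json:"type"`
	JSONSchema *OpenAIJSONSchema `json:"json_schema,omitempty"`
}

// CohereResponseFormat is the assumed target shape: a type plus an
// optional schema carried in the json_schema field.
type CohereResponseFormat struct {
	Type       string                 `json:"type"`
	JSONSchema map[string]interface{} `json:"json_schema,omitempty"`
}

// toCohereResponseFormat maps json_schema to json_object and carries the
// schema along. Unknown types fall back to text (an assumption).
func toCohereResponseFormat(in OpenAIResponseFormat) CohereResponseFormat {
	switch in.Type {
	case "json_schema":
		out := CohereResponseFormat{Type: "json_object"}
		if in.JSONSchema != nil {
			out.JSONSchema = in.JSONSchema.Schema
		}
		return out
	case "json_object":
		return CohereResponseFormat{Type: "json_object"}
	default:
		return CohereResponseFormat{Type: "text"}
	}
}

func main() {
	in := OpenAIResponseFormat{
		Type: "json_schema",
		JSONSchema: &OpenAIJSONSchema{
			Name:   "answer",
			Schema: map[string]interface{}{"type": "object"},
		},
	}
	out, err := json.Marshal(toCohereResponseFormat(in))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Keeping the conversion in one small function makes the `json_schema` → `json_object` rule easy to test in isolation from the rest of the request mapping.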
## Response Conversion

### Field Mapping

- `finish_reason`: `COMPLETE` / `STOP_SEQUENCE` → `stop`, `MAX_TOKENS` → `length`, `TOOL_CALL` → `tool_calls`
- `input_tokens` → `prompt_tokens` | `output_tokens` → `completion_tokens`
- `cached_tokens` → `prompt_tokens_details.cached_tokens` (if present)
- Tool call arguments are passed through unchanged (Cohere already uses string format, so no conversion is needed)

## Streaming

Event sequence: `message-start` → `content-start` → `content-delta` → `content-end` → `message-end`

Delta types:

- `content-delta` with text → message content
- `content-delta` with thinking → reasoning text
- `tool-call-start/delta/end` → tool call events
- `tool-plan-delta` → tool planning output

---

## Caveats

- **Severity**: Low
- **Behavior**: `reasoning.max_tokens` must be >= 1
- **Impact**: Very low impact, conversion happens automatically
- **Code**: `chat.go:104-130`

- **Severity**: Low
- **Behavior**: `top_p` parameter renamed to `p`
- **Impact**: Parameter name changes internally
- **Code**: `chat.go:99`

- **Severity**: Low
- **Behavior**: `strict: true` in tool definitions silently dropped
- **Impact**: No schema validation enforcement
- **Code**: `chat.go:168-185`

- **Severity**: Low
- **Behavior**: Tool arguments are already strings, no JSON serialization needed
- **Impact**: Minimal - Cohere v2 API expects string format
- **Code**: `chat.go:70-78`

---

# 2. Responses API

The Responses API uses the same underlying `/v2/chat` endpoint but converts between OpenAI's Responses format and Cohere's format.
## Request Parameters

### Parameter Mapping

| Parameter | Transformation |
|-----------|----------------|
| `max_output_tokens` | Renamed to `max_tokens` |
| `temperature`, `top_p` | `temperature` is passed through directly; `top_p` is renamed to `p` |
| `instructions` | Becomes system message |
| `text.format` | Converted to `response_format` |
| `tools` | Schema restructured (see [Chat Completions](#1-chat-completions)) |
| `tool_choice` | Type mapped (see [Chat Completions](#1-chat-completions)) |
| `reasoning` | Mapped to `thinking` (see [Reasoning / Thinking](#reasoning--thinking)) |
| `stop` | Via `extra_params`, renamed to `stop_sequences` |
| `top_k` | Via `extra_params` (Cohere-specific) |
| `frequency_penalty`, `presence_penalty` | Via `extra_params` |

### Extra Parameters

Use `extra_params` (SDK) or pass the fields directly in the request body (Gateway):

```bash
curl -X POST http://localhost:8080/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "cohere/command-r-plus",
    "input": "Hello, how are you?",
    "top_k": 40,
    "stop": [".", "!"]
  }'
```

```go
resp, err := client.ResponsesRequest(schemas.NewBifrostContext(ctx, schemas.NoDeadline), &schemas.BifrostResponsesRequest{
    Provider: schemas.Cohere,
    Model:    "cohere/command-r-plus",
    Input:    messages,
    Params: &schemas.ResponsesParameters{
        ExtraParams: map[string]interface{}{
            "top_k": 40,
            "stop":  []string{".", "!"},
        },
    },
})
```

## Input & Instructions

- **Input**: A string is converted to a user message; an array is converted to messages
- **Instructions**: Becomes a system message (prepended to messages)

## Tool Support

Supported types: `function`

Tool conversions are the same as for [Chat Completions](#1-chat-completions).
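
The input and instructions handling described under Input & Instructions can be sketched as below. This is an illustrative sketch under stated assumptions, not Bifrost's code: the `Message` type and `buildMessages` function are hypothetical, messages are reduced to a role and a string, and rejecting any other input type with an error is an assumption made for the example.

```go
package main

import "fmt"

// Message is a hypothetical, simplified chat message (role plus text).
type Message struct {
	Role    string
	Content string
}

// buildMessages sketches the conversion described above:
//   - non-empty instructions become a system message placed first
//   - a string input becomes a single user message
//   - an array input is appended as-is
func buildMessages(instructions string, input interface{}) ([]Message, error) {
	var msgs []Message
	if instructions != "" {
		msgs = append(msgs, Message{Role: "system", Content: instructions})
	}
	switch v := input.(type) {
	case string:
		msgs = append(msgs, Message{Role: "user", Content: v})
	case []Message:
		msgs = append(msgs, v...)
	default:
		// Assumption: any other input shape is rejected.
		return nil, fmt.Errorf("unsupported input type %T", input)
	}
	return msgs, nil
}

func main() {
	msgs, err := buildMessages("Be concise.", "Hello, how are you?")
	if err != nil {
		panic(err)
	}
	for _, m := range msgs {
		fmt.Printf("%s: %s\n", m.Role, m.Content)
	}
}
```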
## Response Conversion

- `text` → `message` | `tool_use` → `function_call`
- `input_tokens` / `output_tokens` preserved
- Token details with cached tokens support

## Streaming

Event sequence: `message-start` → `content-start` → `content-delta` → `content-end` → `message-end`

Special handling:

- Tool call arguments accumulated across chunks
- Synthetic `output_item.added` events emitted for text/reasoning
- Stable item IDs generated as `msg_{messageID}_item_{outputIndex}`

---

# 3. Embeddings

## Request Parameters

### Parameter Mapping

| Parameter | Transformation |
|-----------|----------------|
| `input` (text or array) | Converted to `texts` array |
| `dimensions` | Renamed to `output_dimension` |
| `input_type` | Via `extra_params` (required, defaults to `"search_document"`) |
| `embedding_types` | Via `extra_params` (array of embedding types) |
| `truncate` | Via `extra_params` (how to handle long inputs) |
| `max_tokens` | Via `extra_params` (max tokens to embed per input) |

### Extra Parameters

Use `extra_params` for Cohere-specific embedding options:

```bash
curl -X POST http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "cohere/embed-english-v3.0",
    "input": ["text to embed"],
    "input_type": "search_query",
    "embedding_types": ["float"],
    "truncate": "START"
  }'
```

```go
resp, err := client.EmbeddingRequest(schemas.NewBifrostContext(ctx, schemas.NoDeadline), &schemas.BifrostEmbeddingRequest{
    Provider: schemas.Cohere,
    Model:    "cohere/embed-english-v3.0",
    Input: &schemas.EmbeddingInput{
        Texts: []string{"text to embed"},
    },
    Params: &schemas.EmbeddingParameters{
        Dimensions: schemas.Ptr(1024),
        ExtraParams: map[string]interface{}{
            "input_type":      "search_query",
            "embedding_types": []string{"float"},
            "truncate":        "START",
        },
    },
})
```

### Critical Notes

- **Input Type Required**: Cohere v3+ models require `input_type` parameter (defaults to `"search_document"`)
- **Embedding Types**: Specify which embedding types to return (e.g., `"float"`, `"int8"`)

## Response Conversion

- `embeddings.float` → `data[].embedding`
- `meta.tokens` → usage information
- Multiple embedding types handled

---

# 4. List Models

- **Request**: GET `/v1/models?page_size={defaultPageSize}`
- **Field mapping**: Model data converted to standard format
- **Pagination**: Cursor-based with `next_page_token`
- **Note**: `endpoint` and `default_only` filters available via `extra_params`
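
Cursor-based pagination with `next_page_token` generally follows the loop sketched below. This is a generic illustration rather than Bifrost's implementation: the `Page` type, the `fetchFunc` signature, `listAllModels`, and the in-memory `fakeFetch` are all hypothetical, the model names are placeholders, and treating an empty token as the end of the list is an assumption.

```go
package main

import "fmt"

// Page is a hypothetical, simplified page of list-models results.
type Page struct {
	Models        []string
	NextPageToken string // assumed empty when there are no more pages
}

// fetchFunc stands in for the HTTP call that retrieves one page.
type fetchFunc func(pageToken string) (Page, error)

// listAllModels follows next_page_token until it comes back empty,
// collecting the models from every page.
func listAllModels(fetch fetchFunc) ([]string, error) {
	var all []string
	token := ""
	for {
		page, err := fetch(token)
		if err != nil {
			return nil, err
		}
		all = append(all, page.Models...)
		if page.NextPageToken == "" {
			return all, nil
		}
		token = page.NextPageToken
	}
}

// fakeFetch serves two in-memory pages so the loop can run offline.
func fakeFetch(token string) (Page, error) {
	switch token {
	case "":
		return Page{Models: []string{"model-a", "model-b"}, NextPageToken: "t1"}, nil
	case "t1":
		return Page{Models: []string{"model-c"}}, nil
	default:
		return Page{}, fmt.Errorf("unknown page token %q", token)
	}
}

func main() {
	models, err := listAllModels(fakeFetch)
	if err != nil {
		panic(err)
	}
	fmt.Println(models)
}
```

Passing the fetcher in as a function keeps the loop independent of the transport, so it can be exercised without network access.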