---
title: "Reasoning"
description: "Cross-provider reference for reasoning and thinking capabilities in AI models"
icon: "brain"
---

## Overview

Reasoning (called "thinking" by some providers) lets a model work through a problem step by step before it gives its final answer. Several providers support it, each with its own request parameters and response format.

<Info>
Bifrost normalizes all provider-specific reasoning formats to a consistent OpenAI-compatible structure using `reasoning` in requests and `reasoning_details` in responses.
</Info>

---

## Provider Support Matrix

| Provider | Request Field | Response Field | Min Budget | Effort Levels | Streaming |
|----------|--------------|----------------|------------|---------------|-----------|
| OpenAI | `reasoning` | `reasoning_details` | None | `minimal`, `low`, `medium`, `high` | ✅ |
| Anthropic | `thinking` | Content blocks | **1024 tokens** | `enabled` only | ✅ |
| Bedrock (Anthropic) | `thinking` | Content blocks | **1024 tokens** | `enabled` only | ✅ |
| Gemini 2.5 | `thinking_config` | `thought` parts | 1024 | Budget-only | ✅ |
| Gemini 3.0+ | `thinking_config` | `thought` parts | 1024 | `minimal`, `low`, `medium`, `high` + Budget | ✅ |

---

## Request Configuration

### Chat Completions API

<Tabs>
<Tab title="JSON">

```json
{
  "model": "provider/model-name",
  "messages": [...],
  "reasoning": {
    "effort": "high",
    "max_tokens": 4096
  }
}
```

</Tab>
<Tab title="Go SDK">

```go
package main

import "github.com/maximhq/bifrost/core/schemas"

func main() {
	chatReq := &schemas.BifrostChatRequest{
		Provider: schemas.OpenAI,
		Model:    "o4-mini", // a reasoning-capable model
		Input: []schemas.ChatMessage{
			{
				Role: schemas.ChatMessageRoleUser,
				Content: &schemas.ChatMessageContent{
					ContentStr: schemas.Ptr("Explain quantum computing"),
				},
			},
		},
		Params: &schemas.ChatParameters{
			MaxCompletionTokens: schemas.Ptr(4096),
			Reasoning: &schemas.ChatReasoning{
				Effort:    schemas.Ptr("high"),
				MaxTokens: schemas.Ptr(4096),
			},
		},
	}
	_ = chatReq // pass to the Bifrost client's chat completion call
}
```

</Tab>
</Tabs>

### Responses API

<Tabs>
<Tab title="JSON">

```json
{
  "model": "provider/model-name",
  "input": [...],
  "reasoning": {
    "effort": "high",
    "max_tokens": 4096,
    "summary": "detailed"
  }
}
```

</Tab>
<Tab title="Go SDK">

```go
package main

import "github.com/maximhq/bifrost/core/schemas"

func main() {
	responsesReq := &schemas.BifrostResponsesRequest{
		Provider: schemas.Anthropic,
		Model:    "claude-3-7-sonnet-20250219", // a model with extended thinking
		Input: []schemas.ResponsesMessage{
			{
				Role: schemas.Ptr(schemas.ResponsesInputMessageRoleUser),
				Content: &schemas.ResponsesMessageContent{
					ContentStr: schemas.Ptr("Explain quantum computing"),
				},
			},
		},
		Params: &schemas.ResponsesParameters{
			// Anthropic requires the output limit to exceed the reasoning budget.
			MaxOutputTokens: schemas.Ptr(8192),
			Reasoning: &schemas.ResponsesParametersReasoning{
				Effort:    schemas.Ptr("high"),
				MaxTokens: schemas.Ptr(4096),
				Summary:   schemas.Ptr("detailed"),
			},
		},
	}
	_ = responsesReq // pass to the Bifrost client's Responses API call
}
```

</Tab>
</Tabs>

<Note>
Responses API supports both `effort` + `max_tokens` (like Chat Completions) and adds the optional `summary` parameter for output summarization.
</Note>

### Parameter Reference

#### Chat Completions API Parameters

| Parameter | Type | Description |
|-----------|------|-------------|
| `effort` | `string` | Reasoning intensity level |
| `max_tokens` | `int` | Maximum tokens for reasoning (budget) |

#### Responses API Parameters

| Parameter | Type | Description |
|-----------|------|-------------|
| `effort` | `string` | Reasoning intensity level |
| `max_tokens` | `int` | Maximum tokens for reasoning (budget) |
| `summary` | `string` | Summary level: `auto`, `concise`, or `detailed` |


---

## Provider-Specific Conversions

### OpenAI

OpenAI supports effort-based reasoning only, so Bifrost resolves the effort in this order:

1. If `reasoning.effort` is provided, use it directly.
2. Otherwise, if `reasoning.max_tokens` is provided, estimate the effort from it.

In both cases, `max_tokens` is cleared before the request is sent to OpenAI.

**Conversion Examples**:

<Tabs>
<Tab title="Effort (JSON)">

```json
// Bifrost Request (with effort)
{
  "reasoning": {
    "effort": "high"
  }
}

// OpenAI Request Sent
{
  "reasoning": {
    "effort": "high"
  }
}
```

</Tab>
<Tab title="Effort (Go)">

```go
// Bifrost request with effort (native field)
chatReq := &schemas.BifrostChatRequest{
	Provider: schemas.OpenAI,
	Model:    "o4-mini",
	Input:    messages,
	Params: &schemas.ChatParameters{
		MaxCompletionTokens: schemas.Ptr(4096),
		Reasoning: &schemas.ChatReasoning{
			Effort: schemas.Ptr("high"),
		},
	},
}

// OpenAI receives effort directly, max_tokens is cleared
```

</Tab>
<Tab title="Max Tokens (JSON)">

```json
// Bifrost Request (with max_tokens only)
{
  "max_completion_tokens": 4096,
  "reasoning": {
    "max_tokens": 3000
  }
}

// Estimation: ratio = 3000/4096 ≈ 0.73 → "high"
// OpenAI Request Sent
{
  "reasoning": {
    "effort": "high"
  }
}
```

</Tab>
<Tab title="Max Tokens (Go)">

```go
// Bifrost request with max_tokens only
chatReq := &schemas.BifrostChatRequest{
	Provider: schemas.OpenAI,
	Model:    "o4-mini",
	Input:    messages,
	Params: &schemas.ChatParameters{
		MaxCompletionTokens: schemas.Ptr(4096),
		Reasoning: &schemas.ChatReasoning{
			MaxTokens: schemas.Ptr(3000),
		},
	},
}

// Bifrost estimates effort from max_tokens
// ratio = 3000/4096 ≈ 0.73 → effort = "high"
// OpenAI receives effort, max_tokens cleared
```

</Tab>
</Tabs>

**Supported Effort Levels**: `minimal`, `low`, `medium`, `high`

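<Note>
An explicit `minimal` effort is sent to OpenAI unchanged, but effort estimated from `max_tokens` is always `low`, `medium`, or `high`. Providers without a `minimal` level, such as Bedrock Nova and Gemini Pro models, receive `low` instead.
</Note>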

---

### Anthropic

Anthropic uses a `thinking` parameter with a different structure: a `type` set to `"enabled"` and a `budget_tokens` limit.

<Tabs>
<Tab title="Request Conversion (JSON)">

```json
// Bifrost Request
{
  "reasoning": {
    "effort": "high",
    "max_tokens": 4096
  }
}

// Anthropic Request
{
  "thinking": {
    "type": "enabled",
    "budget_tokens": 4096
  }
}
```

</Tab>
<Tab title="Request Conversion (Go)">

```go
// Using Bifrost Go SDK
chatReq := &schemas.BifrostChatRequest{
	Provider: schemas.Anthropic,
	Model:    "claude-3-7-sonnet-20250219",
	Input:    messages,
	Params: &schemas.ChatParameters{
		// Anthropic requires max_tokens to exceed budget_tokens.
		MaxCompletionTokens: schemas.Ptr(8192),
		Reasoning: &schemas.ChatReasoning{
			MaxTokens: schemas.Ptr(4096), // Anthropic native field
		},
	},
}

// Bifrost converts to Anthropic format:
// {
//   "thinking": {
//     "type": "enabled",
//     "budget_tokens": 4096
//   }
// }
```

</Tab>
<Tab title="Response Conversion (JSON)">

```json
// Anthropic Response (content blocks)
{
  "content": [
    {
      "type": "thinking",
      "thinking": "Let me analyze this step by step...",
      "signature": "EqoBCkgIAR..."
    },
    {
      "type": "text",
      "text": "The answer is 42."
    }
  ]
}

// Bifrost Response
{
  "choices": [{
    "message": {
      "content": "The answer is 42.",
      "reasoning": "Let me analyze this step by step...",
      "reasoning_details": [{
        "index": 0,
        "type": "text",
        "text": "Let me analyze this step by step...",
        "signature": "EqoBCkgIAR..."
      }]
    }
  }]
}
```

</Tab>
<Tab title="Response Conversion (Go)">

```go
// After calling Bifrost Chat Completions with reasoning
resp, err := client.ChatCompletionRequest(schemas.NewBifrostContext(ctx, schemas.NoDeadline), chatReq)
if err != nil {
  log.Fatal(err)
}

// Extract reasoning from response
choice := resp.Choices[0]
message := choice.Message

// Access combined reasoning text
reasoningText := message.Reasoning

// Access detailed reasoning blocks
for i, details := range message.ReasoningDetails {
  fmt.Printf("Block %d: %s\n", i, details.Text)
  if details.Signature != "" {
    fmt.Printf("  Signature: %s\n", details.Signature)
  }
}
```

</Tab>
</Tabs>
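The response mapping shown in the tabs above comes down to splitting content blocks by type. The sketch below uses hypothetical `anthropicBlock` and `bifrostReasoningDetail` types rather than Bifrost's own; concatenating several thinking blocks without a separator and labelling each detail with `type: "text"` (as in the JSON example) are assumptions of the sketch.

```go
package main

import "strings"

// anthropicBlock is a hypothetical subset of an Anthropic content block.
type anthropicBlock struct {
	Type      string // "thinking" or "text"
	Thinking  string
	Signature string
	Text      string
}

// bifrostReasoningDetail mirrors one entry of Bifrost's reasoning_details.
type bifrostReasoningDetail struct {
	Index     int
	Type      string
	Text      string
	Signature string
}

// convertBlocks splits Anthropic content blocks into the final answer,
// the combined reasoning string, and one reasoning detail per thinking block.
func convertBlocks(blocks []anthropicBlock) (content, reasoning string, details []bifrostReasoningDetail) {
	var answer, thoughts strings.Builder
	for _, b := range blocks {
		switch b.Type {
		case "thinking":
			thoughts.WriteString(b.Thinking)
			details = append(details, bifrostReasoningDetail{
				Index:     len(details), // position within the reasoning sequence
				Type:      "text",
				Text:      b.Thinking,
				Signature: b.Signature, // kept for later verification
			})
		case "text":
			answer.WriteString(b.Text)
		}
	}
	return answer.String(), thoughts.String(), details
}
```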

**Conversion Rules**:

| Bifrost | Anthropic | Notes |
|---------|-----------|-------|
| `reasoning.effort` | `thinking.type` | Mapped to `"enabled"`; if `max_tokens` is absent, `budget_tokens` is also estimated from the effort |
| `reasoning.max_tokens` | `thinking.budget_tokens` | Token budget for reasoning |

<Warning>
**Critical Constraint**: Anthropic requires `reasoning.max_tokens >= 1024`. Requests with lower values will **fail with an error**.
</Warning>

**Dynamic Budget Handling**:

| Input Value | Converted To |
|-------------|--------------|
| `-1` (dynamic) | `1024` (minimum default) |
| `< 1024` | **Error** |
| `>= 1024` | Pass-through |
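The table above can be read as a small validation step. The sketch below is a hypothetical helper, `normalizeAnthropicBudget`, written only to illustrate these three rules; it is not Bifrost's code (see the code reference for that) and assumes the 1024-token minimum is a fixed constant.

```go
package main

import "fmt"

// anthropicMinBudgetTokens mirrors the documented 1024-token minimum.
const anthropicMinBudgetTokens = 1024

// normalizeAnthropicBudget applies the dynamic-budget rules: -1 becomes the
// minimum, values below the minimum are rejected, anything else passes through.
func normalizeAnthropicBudget(maxTokens int) (int, error) {
	switch {
	case maxTokens == -1:
		// Anthropic has no dynamic budget, so fall back to the minimum.
		return anthropicMinBudgetTokens, nil
	case maxTokens < anthropicMinBudgetTokens:
		return 0, fmt.Errorf("reasoning.max_tokens must be >= %d, got %d",
			anthropicMinBudgetTokens, maxTokens)
	default:
		return maxTokens, nil
	}
}
```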

**Code Reference**: `core/providers/anthropic/chat.go:104-134`

---

### Bedrock (Anthropic Models)

For Claude models, Bedrock takes the same `type` and `budget_tokens` pair as Anthropic, nested under `additionalModelRequestFields.reasoning_config`.

<Tabs>
<Tab title="Request (JSON)">

```json
// Bifrost Request
{
  "reasoning": {
    "effort": "high",
    "max_tokens": 4096
  }
}

// Bedrock Request (for Anthropic/Claude models)
{
  "additionalModelRequestFields": {
    "reasoning_config": {
      "type": "enabled",
      "budget_tokens": 4096
    }
  }
}
```

</Tab>
<Tab title="Request (Go)">

```go
// Using Bifrost Go SDK with Bedrock provider
chatReq := &schemas.BifrostChatRequest{
	Provider: schemas.Bedrock,
	Model:    "us.anthropic.claude-3-7-sonnet-20250219-v1:0",
	Input:    messages,
	Params: &schemas.ChatParameters{
		// Claude on Bedrock also needs max tokens above the reasoning budget.
		MaxCompletionTokens: schemas.Ptr(8192),
		Reasoning: &schemas.ChatReasoning{
			MaxTokens: schemas.Ptr(4096), // Bedrock Anthropic native field
		},
	},
}

// Bifrost converts to Bedrock format with reasoning_config
```

</Tab>
</Tabs>

<Note>
The same 1024 minimum token budget constraint applies to Bedrock Anthropic models. Attempts to set `max_tokens` below 1024 will result in an error.
</Note>

**Code Reference**: `core/providers/bedrock/utils.go:34-47`

---

### Bedrock (Nova Models)

Bedrock Nova models use an effort-based approach similar to OpenAI.

<Tabs>
<Tab title="Request Conversion (JSON)">

```json
// Bifrost Request
{
  "reasoning": {
    "effort": "high",
    "max_tokens": 4096
  }
}

// Bedrock Request (for Nova models)
{
  "additionalModelRequestFields": {
    "reasoningConfig": {
      "type": "enabled",
      "maxReasoningEffort": "high"
    }
  }
}
```

</Tab>
<Tab title="Request Conversion (Go)">

```go
// Using Bifrost Go SDK with Bedrock Nova
chatReq := &schemas.BifrostChatRequest{
  Provider: schemas.Bedrock,
  Model:    "us.amazon.nova-pro-v1:0",
  Input:    messages,
  Params: &schemas.ChatParameters{
    MaxCompletionTokens: schemas.Ptr(4096),
    Reasoning: &schemas.ChatReasoning{
      Effort: schemas.Ptr("high"), // Nova native field
    },
  },
}

// Bifrost converts to Bedrock Nova format:
// reasoningConfig: {
//   type: "enabled",
//   maxReasoningEffort: "high"
// }
```

</Tab>
<Tab title="Effort Levels">

| Bifrost Effort | Nova Effort | Configuration |
|---|---|---|
| `minimal`, `low` | `"low"` | Normal parameters allowed |
| `medium` | `"medium"` | Normal parameters allowed |
| `high` | `"high"` | Clears `maxTokens`, `temperature`, `topP` |

</Tab>
</Tabs>

**Key Differences from Anthropic**:

- No minimum token budget constraint
- Uses effort levels instead of token budgets
- High effort mode automatically clears conflicting parameters
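A minimal sketch of the Nova effort mapping from the Effort Levels tab, assuming a hypothetical `novaParams` struct for the parameters that high effort clears. How Bifrost treats an unrecognized effort is not documented here; this sketch simply returns an empty string.

```go
package main

// novaParams is a hypothetical stand-in for the inference parameters
// that must be cleared in high-effort mode.
type novaParams struct {
	MaxTokens   *int
	Temperature *float64
	TopP        *float64
}

// novaEffort maps a Bifrost effort level onto Nova's maxReasoningEffort,
// clearing conflicting parameters when the effort is "high".
func novaEffort(effort string, p *novaParams) string {
	switch effort {
	case "minimal", "low":
		return "low" // Nova has no "minimal" level
	case "medium":
		return "medium"
	case "high":
		if p != nil {
			// High effort does not allow these to be set.
			p.MaxTokens, p.Temperature, p.TopP = nil, nil, nil
		}
		return "high"
	default:
		return "" // unknown effort: assumption of this sketch
	}
}
```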

**Code Reference**: `core/providers/bedrock/utils.go:48-89`

---

### Gemini

Gemini configures reasoning through `thinking_config`. Depending on the model version, it accepts a token budget only (Gemini 2.5) or either a token budget or an effort level (Gemini 3.0+).

#### Model Version Support

| Gemini Version | `thinkingBudget` | `thinkingLevel` | Notes |
|----------------|------------------|-----------------|-------|
| **2.5** | ✅ | ❌ | Budget-only models |
| **3.0+** | ✅ | ✅ | Support both budget and level |

<Warning>
**Important**: Only ONE parameter (`thinkingBudget` or `thinkingLevel`) should be sent to Gemini at a time. When both `reasoning.max_tokens` and `reasoning.effort` are provided in a Bifrost request, `max_tokens` takes priority and is converted to `thinkingBudget`.
</Warning>

#### Priority Rules

Bifrost resolves the Gemini thinking configuration in this order:

```
1. If max_tokens is provided → USE thinkingBudget (ignores effort)
2. Else if effort is provided:
   - Gemini 3.0+ → USE thinkingLevel (native field)
   - Gemini 2.5 → CONVERT effort to thinkingBudget
3. Else → disable reasoning
```

<Tabs>
<Tab title="Budget Priority (JSON)">

```json
// Bifrost Request - Both fields provided
{
  "model": "gemini-3.0-flash",
  "reasoning": {
    "effort": "high",        // Ignored
    "max_tokens": 4096      // Takes priority
  }
}

// Gemini 3.0+ Request - Only budget sent
{
  "generation_config": {
    "thinking_config": {
      "include_thoughts": true,
      "thinking_budget": 4096
    }
  }
}
```

</Tab>
<Tab title="Effort to Level (Gemini 3.0+)">

```json
// Bifrost Request - Effort only
{
  "model": "gemini-3.0-flash",
  "reasoning": {
    "effort": "high"
  }
}

// Gemini 3.0+ Request - Converted to level
{
  "generation_config": {
    "thinking_config": {
      "include_thoughts": true,
      "thinking_level": "high"
    }
  }
}
```

</Tab>
<Tab title="Effort to Budget (Gemini 2.5)">

```json
// Bifrost Request - Effort only
{
  "model": "gemini-2.5-flash",
  "max_completion_tokens": 4096,
  "reasoning": {
    "effort": "high"
  }
}

// Gemini 2.5 Request - Converted to budget
// Calculation: 1024 + (0.80 × (4096 - 1024)) = 3482
{
  "generation_config": {
    "thinking_config": {
      "include_thoughts": true,
      "thinking_budget": 3482
    }
  }
}
```

</Tab>
</Tabs>

#### Model-Specific Level Conversions

Gemini Pro models have stricter constraints on thinking levels:

| Bifrost Effort | Non-Pro Models | Pro Models | Notes |
|----------------|----------------|------------|-------|
| `"none"` | Empty string | Empty string | Disables thinking |
| `"minimal"` | `"minimal"` | `"low"` | Pro doesn't support minimal |
| `"low"` | `"low"` | `"low"` | Supported on all |
| `"medium"` | `"medium"` | `"high"` | Pro doesn't support medium |
| `"high"` | `"high"` | `"high"` | Supported on all |

**Example**:
```
// For "gemini-3.0-flash-thinking-exp" (non-Pro)
effort: "medium" → thinkingLevel: "medium"

// For "gemini-3.0-pro" (Pro model)
effort: "medium" → thinkingLevel: "high"  // Converted up
```
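The level table can be expressed as a small switch. This is a sketch with a hypothetical `geminiThinkingLevel` function; detecting Pro models by a `pro` substring in the model name is an assumption made for illustration, not necessarily how Bifrost identifies them.

```go
package main

import "strings"

// geminiThinkingLevel maps a Bifrost effort to a Gemini thinking level.
// Pro models lack "minimal" and "medium", so those move to "low" and "high".
func geminiThinkingLevel(model, effort string) string {
	isPro := strings.Contains(strings.ToLower(model), "pro") // heuristic
	switch effort {
	case "none":
		return "" // empty level: thinking disabled
	case "minimal":
		if isPro {
			return "low"
		}
		return "minimal"
	case "medium":
		if isPro {
			return "high" // converted up
		}
		return "medium"
	default:
		return effort // "low" and "high" pass through unchanged
	}
}
```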

#### Special Values

| Value | Field | Behavior | Use Case |
|-------|-------|----------|----------|
| `0` | `max_tokens` | `thinking_budget: 0`, `include_thoughts: false` | Explicitly disable reasoning |
| `-1` | `max_tokens` | `thinking_budget: -1` | **Dynamic budget** (Gemini decides) |
| `"none"` | `effort` | `thinking_budget: 0`, `include_thoughts: false` | Disable reasoning |
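Putting the priority rules and the special values together, Gemini's thinking configuration can be resolved roughly as follows. `thinkingConfig` and `resolveGeminiThinking` are hypothetical names, the Pro-model level adjustment is left out, and rounding the estimated budget to the nearest token is an assumption that reproduces the 3482 figure in the Gemini 2.5 example.

```go
package main

import "math"

// thinkingConfig is a hypothetical mirror of Gemini's thinking_config;
// exactly one of ThinkingBudget or ThinkingLevel is set.
type thinkingConfig struct {
	IncludeThoughts bool
	ThinkingBudget  *int
	ThinkingLevel   string
}

// geminiMinBudget is the documented minimum for effort → budget estimates.
const geminiMinBudget = 1024

// geminiEffortRatios repeats the estimator ratios for each effort level.
var geminiEffortRatios = map[string]float64{
	"minimal": 0.025, "low": 0.15, "medium": 0.425, "high": 0.80,
}

// disabledThinking is the config for max_tokens: 0 or effort: "none".
func disabledThinking() *thinkingConfig {
	zero := 0
	return &thinkingConfig{IncludeThoughts: false, ThinkingBudget: &zero}
}

// resolveGeminiThinking applies the priority rules; supportsLevel is true for Gemini 3.0+.
func resolveGeminiThinking(maxTokens *int, effort *string, supportsLevel bool, maxCompletionTokens int) *thinkingConfig {
	// Rule 1: max_tokens wins; 0 disables and -1 passes through as dynamic.
	if maxTokens != nil {
		if *maxTokens == 0 {
			return disabledThinking()
		}
		budget := *maxTokens
		return &thinkingConfig{IncludeThoughts: true, ThinkingBudget: &budget}
	}
	// Rule 3, and effort "none": disable reasoning.
	if effort == nil || *effort == "none" {
		return disabledThinking()
	}
	// Rule 2 on Gemini 3.0+: send the effort as thinking_level.
	if supportsLevel {
		return &thinkingConfig{IncludeThoughts: true, ThinkingLevel: *effort}
	}
	// Rule 2 on Gemini 2.5: estimate a budget from the effort.
	ratio, ok := geminiEffortRatios[*effort]
	if !ok {
		return disabledThinking() // unknown effort: assumption of this sketch
	}
	budget := geminiMinBudget + int(math.Round(ratio*float64(maxCompletionTokens-geminiMinBudget)))
	if budget < geminiMinBudget {
		budget = geminiMinBudget
	}
	if budget > maxCompletionTokens {
		budget = maxCompletionTokens
	}
	return &thinkingConfig{IncludeThoughts: true, ThinkingBudget: &budget}
}
```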

<Tabs>
<Tab title="Dynamic Budget (JSON)">

```json
// Bifrost Request - Dynamic budget
{
  "reasoning": {
    "max_tokens": -1
  }
}

// Gemini Request - Sent as-is
{
  "generation_config": {
    "thinking_config": {
      "include_thoughts": true,
      "thinking_budget": -1
    }
  }
}
```

</Tab>
<Tab title="Disable Reasoning (JSON)">

```json
// Bifrost Request - Method 1
{
  "reasoning": {
    "max_tokens": 0
  }
}

// Bifrost Request - Method 2
{
  "reasoning": {
    "effort": "none"
  }
}

// Gemini Request - Both become
{
  "generation_config": {
    "thinking_config": {
      "include_thoughts": false,
      "thinking_budget": 0
    }
  }
}
```

</Tab>
<Tab title="Go SDK Examples">

```go
// Using Bifrost Go SDK with Gemini
// Example 1: Dynamic budget
dynamicReq := &schemas.BifrostChatRequest{
	Provider: schemas.Gemini,
	Model:    "gemini-2.5-flash",
	Input:    messages,
	Params: &schemas.ChatParameters{
		MaxCompletionTokens: schemas.Ptr(4096),
		Reasoning: &schemas.ChatReasoning{
			MaxTokens: schemas.Ptr(-1), // Let Gemini decide
		},
	},
}

// Example 2: Effort-based for Gemini 3.0+
effortReq := &schemas.BifrostChatRequest{
	Provider: schemas.Gemini,
	Model:    "gemini-3.0-flash",
	Input:    messages,
	Params: &schemas.ChatParameters{
		MaxCompletionTokens: schemas.Ptr(4096),
		Reasoning: &schemas.ChatReasoning{
			Effort: schemas.Ptr("high"), // Converts to thinkingLevel
		},
	},
}

// Example 3: Budget-based (all versions)
budgetReq := &schemas.BifrostChatRequest{
	Provider: schemas.Gemini,
	Model:    "gemini-2.5-flash",
	Input:    messages,
	Params: &schemas.ChatParameters{
		MaxCompletionTokens: schemas.Ptr(4096),
		Reasoning: &schemas.ChatReasoning{
			MaxTokens: schemas.Ptr(3000), // Direct budget
		},
	},
}
```

</Tab>
</Tabs>

#### Response Conversion

<Tabs>
<Tab title="Response (JSON)">

```json
// Gemini Response
{
  "candidates": [{
    "content": {
      "parts": [
        {
          "thought": true,
          "text": "Analyzing the problem..."
        },
        {
          "text": "The answer is 42."
        }
      ]
    }
  }]
}

// Bifrost Response
{
  "choices": [{
    "message": {
      "content": "The answer is 42.",
      "reasoning": "Analyzing the problem...",
      "reasoning_details": [{
        "index": 0,
        "type": "text",
        "text": "Analyzing the problem..."
      }]
    }
  }]
}
```

</Tab>
<Tab title="Response (Go)">

```go
// After calling Bifrost Chat Completions with Gemini
resp, err := client.ChatCompletionRequest(schemas.NewBifrostContext(ctx, schemas.NoDeadline), chatReq)
if err != nil {
  log.Fatal(err)
}

// Extract reasoning from response
choice := resp.Choices[0]
message := choice.Message

// Access combined reasoning text
fmt.Printf("Reasoning: %s\n", message.Reasoning)

// Access detailed reasoning blocks
for i, details := range message.ReasoningDetails {
  if details.Type == "text" {
    fmt.Printf("Thinking block %d:\n%s\n", i, details.Text)
  }
}

// Access final answer
fmt.Printf("Answer:\n%s\n", message.Content)
```

</Tab>
</Tabs>

#### Conversion Summary

**Bifrost → Gemini (Request)**:

| Input | Gemini 2.5 | Gemini 3.0+ | Note |
|-------|------------|-------------|------|
| `max_tokens: 4096` | `thinking_budget: 4096` | `thinking_budget: 4096` | Direct pass-through |
| `max_tokens: -1` | `thinking_budget: -1` | `thinking_budget: -1` | Dynamic budget |
| `max_tokens: 0` | `thinking_budget: 0` | `thinking_budget: 0` | Disabled |
| `effort: "high"` only | `thinking_budget: 3482`* | `thinking_level: "high"` | Estimated or native |
| `effort: "medium"` only | `thinking_budget: 2330`* | `thinking_level: "medium"` or `"high"`** | Estimated or native |
| Both `effort` + `max_tokens` | Uses `max_tokens` | Uses `max_tokens` | Priority rule |

\* Assumes `max_completion_tokens: 4096`, as in the Gemini 2.5 example above; with Gemini's default of 8192 the estimated budgets are larger
\*\* Pro models convert `"medium"` to `"high"`

**Gemini → Bifrost (Response)**:

| Gemini Field | Bifrost Field | Conversion |
|--------------|---------------|------------|
| `thinking_budget` | `reasoning.max_tokens` | Direct mapping |
| `thinking_level` | `reasoning.effort` | Level → effort mapping |
| `thought: true` parts | `reasoning_details[]` | Array of reasoning blocks |

**Code References**:
- `core/providers/gemini/utils.go` (Chat Completions)
- `core/providers/gemini/responses.go` (Responses API)
- `core/providers/gemini/types.go` (Constants)

---

## Two Reasoning Methods: Effort vs. Max Tokens

Bifrost supports two distinct ways of specifying how much reasoning a provider should do:

### Reasoning Model Types

| Method | Providers | Request Field | Native Format |
|--------|-----------|---------------|---------------|
| **Effort-Based** | OpenAI, AWS Bedrock Nova | `reasoning.effort` | `reasoning_effort` (Chat) / `effort` (Responses) |
| **Max-Tokens-Based** | Anthropic, Cohere, Gemini | `reasoning.max_tokens` | `thinking.budget_tokens` (Anthropic) / `thinking_budget` (Gemini) |

**Important**: Both effort and max_tokens can be specified in a single request. Bifrost uses a **priority hierarchy** to determine which field is used.

### Priority Logic: Native vs. Estimated

When both `effort` and `max_tokens` are present in a request, Bifrost prioritizes the **native compatible field** for the target provider:

#### **For Max-Tokens-Based Providers** (Anthropic, Cohere, Gemini)

```
1. If reasoning.max_tokens is provided → USE IT (native field)
2. Else if reasoning.effort is provided → ESTIMATE max_tokens from effort
3. Else → disable reasoning
```

**Example** (Cohere):
```json
// Request with both fields
{
  "reasoning": {
    "effort": "high",
    "max_tokens": 2000
  }
}
```

**Result**: Uses `max_tokens: 2000` directly, ignores `effort`

#### **For Effort-Based Providers** (OpenAI, AWS Bedrock Nova)

```
1. If reasoning.effort is provided → USE IT (native field)
2. Else if reasoning.max_tokens is provided → ESTIMATE effort from max_tokens
3. Else → disable reasoning
```

**Example** (OpenAI Chat Completions):
```json
// Request with both fields
{
  "reasoning": {
    "effort": "high",
    "max_tokens": 2000
  }
}
```

**Result**: Uses `effort: "high"` directly, strips `max_tokens` from JSON
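Both hierarchies share one shape: prefer the native field, otherwise estimate it from the other field, otherwise turn reasoning off. The sketch below captures only that decision; `planReasoning` and its types are hypothetical, and the estimation itself is left to the estimator functions.

```go
package main

// reasoningStyle distinguishes the two provider families.
type reasoningStyle int

const (
	budgetBased reasoningStyle = iota // Anthropic, Cohere, Gemini
	effortBased                       // OpenAI, Bedrock Nova
)

// reasoningPlan records which field is sent and whether it is estimated.
type reasoningPlan struct {
	Field     string // "max_tokens", "effort", or "" when reasoning is off
	Estimated bool   // true when the value comes from an estimator
}

// planReasoning applies the priority hierarchy for a provider family.
// Gemini 3.0+ effort-only requests are an exception (sent as thinking_level).
func planReasoning(style reasoningStyle, hasEffort, hasMaxTokens bool) reasoningPlan {
	nativeField := "max_tokens"
	hasNative, hasOther := hasMaxTokens, hasEffort
	if style == effortBased {
		nativeField = "effort"
		hasNative, hasOther = hasEffort, hasMaxTokens
	}
	switch {
	case hasNative:
		return reasoningPlan{Field: nativeField} // native field used as-is
	case hasOther:
		return reasoningPlan{Field: nativeField, Estimated: true}
	default:
		return reasoningPlan{} // neither field: reasoning disabled
	}
}
```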

<Accordion title="Why Priority Matters">

**Reason 1: Accuracy** - Native fields provide direct control without estimation loss

**Reason 2: Consistency** - Using native fields ensures the exact user intent is preserved

**Reason 3: Performance** - Avoids unnecessary conversions when native field is already provided

</Accordion>

---

## Estimator Functions

Bifrost provides two estimator functions to convert between the two methods. They are used only when a request omits the target provider's native field.

### Function 1: Effort → Max Tokens

**Function**: `GetBudgetTokensFromReasoningEffort()`

**File**: `core/providers/utils/utils.go:1350-1387`

**Signature**:
```go
func GetBudgetTokensFromReasoningEffort(
    effort string,           // "minimal", "low", "medium", "high"
    minBudgetTokens int,     // Provider-specific minimum (e.g., 1024 for Anthropic)
    maxTokens int,           // Total completion tokens available
) (int, error)
```

**Algorithm**:

```
1. Define ratio for effort level:
   - "minimal"  → 2.5%  (0.025)
   - "low"      → 15%   (0.15)
   - "medium"   → 42.5% (0.425)
   - "high"     → 80%   (0.80)

2. Calculate budget:
   budget = minBudgetTokens + (ratio × (maxTokens - minBudgetTokens))

3. Clamp to valid range:
   if budget < minBudgetTokens → budget = minBudgetTokens
   if budget > maxTokens → budget = maxTokens
```
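The algorithm translates directly into Go. This `budgetFromEffort` is a hypothetical re-implementation for illustration, not the actual `GetBudgetTokensFromReasoningEffort`. It rounds to the nearest token, which reproduces the 1485, 2330 and 3482 figures in the table that follows; whether the real function rounds or truncates is an assumption here, and truncation would give one token less for some inputs. Rejecting unknown effort strings is also this sketch's own choice.

```go
package main

import (
	"fmt"
	"math"
)

// effortRatio holds the documented share of the budget range per effort level.
var effortRatio = map[string]float64{
	"minimal": 0.025,
	"low":     0.15,
	"medium":  0.425,
	"high":    0.80,
}

// budgetFromEffort looks up the ratio, interpolates between
// minBudgetTokens and maxTokens, then clamps to that range.
func budgetFromEffort(effort string, minBudgetTokens, maxTokens int) (int, error) {
	if minBudgetTokens > maxTokens {
		return 0, fmt.Errorf("max_tokens (%d) must be >= minBudgetTokens (%d)", maxTokens, minBudgetTokens)
	}
	ratio, ok := effortRatio[effort]
	if !ok {
		return 0, fmt.Errorf("unknown reasoning effort %q", effort)
	}
	budget := minBudgetTokens + int(math.Round(ratio*float64(maxTokens-minBudgetTokens)))
	if budget < minBudgetTokens {
		budget = minBudgetTokens
	}
	if budget > maxTokens {
		budget = maxTokens
	}
	return budget, nil
}
```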

**Conversion Examples** (with `minBudgetTokens=1024`, `maxTokens=4096`):

| Effort | Ratio | Calculation | Result |
|--------|-------|-------------|--------|
| `minimal` | 2.5% | 1024 + 0.025 × 3072 | 1101 |
| `low` | 15% | 1024 + 0.15 × 3072 | 1485 |
| `medium` | 42.5% | 1024 + 0.425 × 3072 | 2330 |
| `high` | 80% | 1024 + 0.80 × 3072 | 3482 |

<Note>
Because the formula starts from `minBudgetTokens`, any non-negative ratio yields a budget at or above the minimum (1024 for Anthropic), so the lower clamp is only a safeguard. The upper clamp keeps the budget within `maxTokens`.
</Note>

**Error Handling**:
```go
if minBudgetTokens > maxTokens {
    return 0, fmt.Errorf("max_tokens must be > minBudgetTokens")
}
```

**Code Example**:
```go
// Cohere: Convert effort to token budget
budgetTokens, err := providerUtils.GetBudgetTokensFromReasoningEffort(
    "high",                    // effort
    1,                         // Cohere min
    4096,                      // max completion tokens
)
// Returns: 3277 tokens
```

### Function 2: Max Tokens → Effort

**Function**: `GetReasoningEffortFromBudgetTokens()`

**File**: `core/providers/utils/utils.go:1308-1345`

**Signature**:
```go
func GetReasoningEffortFromBudgetTokens(
    budgetTokens int,        // Reasoning token budget
    minBudgetTokens int,     // Provider-specific minimum
    maxTokens int,           // Total completion tokens available
) string                     // Returns: "none", "low", "medium", or "high"
```

**Algorithm**:

```
1. Normalize budget to valid range:
   if budget < min → budget = min
   if budget > max → budget = max

2. Calculate ratio:
   ratio = (budgetTokens - minBudgetTokens) / (maxTokens - minBudgetTokens)

3. Map ratio to effort level:
   if ratio ≤ 0.25  → "low"
   if ratio ≤ 0.60  → "medium"
   if ratio > 0.60  → "high"
```

**Conversion Examples** (with `minBudgetTokens=1024`, `maxTokens=4096`):

| Budget Tokens | Ratio | Effort |
|---|---|---|
| 1024 | 0% | `low` |
| 1101 | 2.5% | `low` |
| 1500 | 15.5% | `low` |
| 1900 | 28.5% | `medium` |
| 2500 | 48.0% | `medium` |
| 3000 | 64.3% | `high` |
| 3400 | 77.3% | `high` |

**Defensive Defaults**:
```go
if budgetTokens <= 0 {
    return "none"
}
if maxTokens <= 0 {
    return "medium"  // Safe default
}
if maxTokens <= minBudgetTokens {
    return "high"    // Can't calculate ratio
}
```
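The reverse estimator, including the defensive defaults, can be sketched the same way. `effortFromBudget` is a hypothetical stand-in for `GetReasoningEffortFromBudgetTokens`; running the checks in the order listed above is an assumption about the real function.

```go
package main

// effortFromBudget applies the defensive defaults, clamps the budget,
// computes its position within the range, then buckets the ratio.
func effortFromBudget(budgetTokens, minBudgetTokens, maxTokens int) string {
	if budgetTokens <= 0 {
		return "none"
	}
	if maxTokens <= 0 {
		return "medium" // no range to compare against
	}
	if maxTokens <= minBudgetTokens {
		return "high" // empty range, ratio undefined
	}
	// Step 1: clamp the budget into [minBudgetTokens, maxTokens].
	if budgetTokens < minBudgetTokens {
		budgetTokens = minBudgetTokens
	}
	if budgetTokens > maxTokens {
		budgetTokens = maxTokens
	}
	// Step 2: position of the budget within the range, 0.0 to 1.0.
	ratio := float64(budgetTokens-minBudgetTokens) / float64(maxTokens-minBudgetTokens)
	// Step 3: bucket the ratio.
	switch {
	case ratio <= 0.25:
		return "low"
	case ratio <= 0.60:
		return "medium"
	default:
		return "high"
	}
}
```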

**Code Example**:
```go
// Convert Anthropic budget back to effort for display
effort := providerUtils.GetReasoningEffortFromBudgetTokens(
    3000,   // budget tokens from Anthropic response
    1024,   // Anthropic minimum
    4096,   // max tokens
)
// Returns: "high"
```

---

## Provider-Specific Constants

Different providers have different constraints on reasoning budget:

### Min Budget Constants

| Provider | File | MinBudgetTokens | Reason |
|----------|------|---|---|
| Anthropic | `core/providers/anthropic/types.go` | **1024** | Anthropic API requirement |
| Bedrock Anthropic | `core/providers/bedrock/types.go` | **1024** | Same as Anthropic |
| Bedrock Nova | `core/providers/bedrock/types.go` | 1 | More flexible |
| Cohere | `core/providers/cohere/types.go` | 1 | Flexible |
| Gemini | `core/providers/gemini/types.go` | 1024 | Default minimum for conversions |

### Default Completion Tokens (for ratio calculation)

When `max_completion_tokens` is not provided, these defaults are used for ratio calculations:

| Provider | Default | File |
|----------|---------|------|
| OpenAI, Anthropic, Cohere, Bedrock | 4096 | `core/providers/*/types.go` |
| Gemini | 8192 | `core/providers/gemini/types.go` |

---

## Effort-to-Token Conversion Examples

### Example 1: Estimate tokens from effort (Anthropic)

<Tabs>
<Tab title="JSON">

**Input**:
```json
{
  "model": "anthropic/claude-3-5-sonnet",
  "max_completion_tokens": 2000,
  "reasoning": {
    "effort": "high"
  }
}
```

**Conversion Process**:
1. `effort = "high"` → `ratio = 0.80`
2. `minBudgetTokens = 1024` (Anthropic)
3. `maxCompletionTokens = 2000`
4. `budget = 1024 + (0.80 × (2000 - 1024))`
5. `budget = 1024 + (0.80 × 976)`
6. `budget = 1024 + 780`
7. **Result: 1804 tokens**

**Anthropic Request Generated**:
```json
{
  "thinking": {
    "type": "enabled",
    "budget_tokens": 1804
  }
}
```

</Tab>
<Tab title="Go SDK">

```go
import (
  "github.com/maximhq/bifrost/core/providers/utils"
  "github.com/maximhq/bifrost/core/schemas"
)

// Using Bifrost Go SDK
chatReq := &schemas.BifrostChatRequest{
  Provider: schemas.Anthropic,
  Model:    "claude-3-7-sonnet-20250219",
  Input:    messages,
  Params: &schemas.ChatParameters{
    MaxCompletionTokens: schemas.Ptr(2000),
    Reasoning: &schemas.ChatReasoning{
      Effort: schemas.Ptr("high"), // Effort provided, max_tokens not set
    },
  },
}

// Bifrost automatically converts effort to budget tokens:
// 1. Get ratio for "high": 0.80
// 2. Calculate: 1024 + (0.80 × (2000 - 1024)) = 1804
// 3. Send to Anthropic with budget_tokens: 1804

// Alternatively, manually call the estimator function:
budgetTokens, _ := utils.GetBudgetTokensFromReasoningEffort(
  "high",     // effort
  1024,       // Anthropic minimum
  2000,       // max completion tokens
)
// Returns: 1804
```

</Tab>
</Tabs>

### Example 2: Estimate effort from tokens (Bedrock Nova)

<Tabs>
<Tab title="JSON">

**Input**:
```json
{
  "model": "bedrock/us.amazon.nova-pro-v1:0",
  "max_completion_tokens": 4096,
  "reasoning": {
    "max_tokens": 2000
  }
}
```

**Conversion Process**:
1. `budgetTokens = 2000`
2. `minBudgetTokens = 1` (Nova)
3. `maxCompletionTokens = 4096`
4. `ratio = (2000 - 1) / (4096 - 1)`
5. `ratio = 1999 / 4095`
6. `ratio = 0.488` (48.8%)
7. Since `0.25 < 0.488 ≤ 0.60` → **Result: "medium"**

**Bedrock Nova Request Generated**:
```json
{
  "reasoningConfig": {
    "type": "enabled",
    "maxReasoningEffort": "medium"
  }
}
```

</Tab>
<Tab title="Go SDK">

```go
import (
  "github.com/maximhq/bifrost/core/providers/utils"
  "github.com/maximhq/bifrost/core/schemas"
)

// Using Bifrost Go SDK with max_tokens (not effort)
chatReq := &schemas.BifrostChatRequest{
  Provider: schemas.Bedrock,
  Model:    "us.amazon.nova-pro-v1:0",
  Input:    messages,
  Params: &schemas.ChatParameters{
    MaxCompletionTokens: schemas.Ptr(4096),
    Reasoning: &schemas.ChatReasoning{
      MaxTokens: schemas.Ptr(2000), // Max tokens provided, effort not set
    },
  },
}

// Bifrost automatically estimates effort from max_tokens:
// 1. Calculate ratio: (2000 - 1) / (4096 - 1) = 0.488
// 2. Since 0.25 < 0.488 ≤ 0.60 → "medium"
// 3. Send to Bedrock Nova with effort: "medium"

// Alternatively, manually call the estimator function:
effort := utils.GetReasoningEffortFromBudgetTokens(
  2000,  // budget tokens
  1,     // Nova minimum
  4096,  // max completion tokens
)
// Returns: "medium"
```

</Tab>
</Tabs>

### Example 3: Both fields provided (priority used)

<Tabs>
<Tab title="JSON">

**Input**:
```json
{
  "model": "anthropic/claude-3-5-sonnet",
  "max_completion_tokens": 4096,
  "reasoning": {
    "effort": "medium",
    "max_tokens": 2500
  }
}
```

**Logic for Max-Tokens-Based Provider**:
1. Check: Is `max_tokens` provided? → **YES**
2. Use `max_tokens` directly (ignore `effort`)
3. Validate: `2500 >= 1024`? → **YES**

**Anthropic Request Generated**:
```json
{
  "thinking": {
    "type": "enabled",
    "budget_tokens": 2500
  }
}
```

**Note**: The `effort: "medium"` is completely ignored because `max_tokens` takes priority.

</Tab>
<Tab title="Go SDK">

```go
import "github.com/maximhq/bifrost/core/schemas"

// Using Bifrost Go SDK with BOTH effort and max_tokens
chatReq := &schemas.BifrostChatRequest{
  Provider: schemas.Anthropic,
  Model:    "claude-3-7-sonnet-20250219",
  Input:    messages,
  Params: &schemas.ChatParameters{
    MaxCompletionTokens: schemas.Ptr(4096),
    Reasoning: &schemas.ChatReasoning{
      Effort:    schemas.Ptr("medium"),   // Provided but ignored
      MaxTokens: schemas.Ptr(2500),       // This takes priority
    },
  },
}

// Bifrost Priority Logic:
// 1. For max-tokens-based providers (Anthropic):
//    → Check if max_tokens is provided? YES
//    → Use it directly: 2500
//    → Ignore effort: "medium"
//    → Validate: 2500 >= 1024? YES ✓
// 2. Send to Anthropic with budget_tokens: 2500

// Result: effort is completely ignored, max_tokens is used
```

</Tab>
</Tabs>

---

## Response Format

### Bifrost Standard Response

Bifrost returns reasoning from every provider in a normalized `reasoning_details` array:

```json
{
  "choices": [{
    "message": {
      "role": "assistant",
      "content": "Final response text",
      "reasoning_details": [
        {
          "index": 0,
          "type": "text",
          "text": "Step-by-step reasoning content...",
          "signature": "optional_signature_for_verification"
        }
      ]
    }
  }]
}
```

### Reasoning Details Fields

| Field | Type | Description | Present In |
|-------|------|-------------|------------|
| `index` | `int` | Position in reasoning sequence | All |
| `type` | `string` | Content type (`text`, `encrypted`, `summary`) | All |
| `text` | `string` | Reasoning content | Chat Completions |
| `summary` | `string` | Reasoning summary | Responses API |
| `signature` | `string` | Cryptographic signature for verification | Anthropic, Bedrock |

### Type Mappings

| Reasoning Type | When Used | Source |
|---|---|---|
| `reasoning.text` | Direct thinking/reasoning content | Anthropic, Gemini, Bedrock |
| `reasoning.encrypted` | Signature-verified reasoning | Anthropic, Bedrock Nova |
| `reasoning.summary` | Summarized reasoning (Responses API) | All providers |

<Note>
**OpenAI Implementation**: OpenAI (both Chat Completions and Responses API) is effort-based, following the standard priority logic: if `effort` is provided, it's used directly; if only `max_tokens` is provided, effort is estimated from it. The `max_tokens` field is then cleared before JSON serialization via `MarshalJSON` (`core/providers/openai/types.go:383-453`), since OpenAI's APIs don't accept it.
</Note>

---

## Streaming

### Stream Event Types

| Provider | Reasoning Event | Signature Event |
|----------|-----------------|-----------------|
| OpenAI | `reasoning` (top-level) | N/A |
| Anthropic | `thinking_delta` | `signature_delta` |
| Bedrock | `thinking_delta` | `signature_delta` |
| Gemini | `thought` (in content) | `thought_signature` |

### Anthropic Streaming Example

```
event: content_block_start
data: {"type": "content_block_start", "index": 0, "content_block": {"type": "thinking"}}

event: content_block_delta
data: {"type": "content_block_delta", "index": 0, "delta": {"type": "thinking_delta", "thinking": "Let me"}}

event: content_block_delta
data: {"type": "content_block_delta", "index": 0, "delta": {"type": "thinking_delta", "thinking": " analyze..."}}

event: content_block_delta
data: {"type": "content_block_delta", "index": 0, "delta": {"type": "signature_delta", "signature": "EqoB..."}}

event: content_block_stop
data: {"type": "content_block_stop", "index": 0}
```
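The normalization from these raw events to the chunks in the Bifrost Stream Response example can be sketched in Go. This is an illustrative assumption, not Bifrost's converter: the `anthropicEvent` and `reasoningDetail` types are hypothetical, and reusing Anthropic's content-block `index` (0 when absent) as the `reasoning_details` index is a simplification.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// anthropicEvent holds the subset of an Anthropic SSE data payload used here.
type anthropicEvent struct {
	Type  string `json:"type"`
	Index int    `json:"index"`
	Delta struct {
		Type      string `json:"type"`
		Thinking  string `json:"thinking"`
		Signature string `json:"signature"`
	} `json:"delta"`
}

// reasoningDetail mirrors the shape of the Bifrost stream chunks.
type reasoningDetail struct {
	Index     int    `json:"index"`
	Type      string `json:"type,omitempty"`
	Text      string `json:"text,omitempty"`
	Signature string `json:"signature,omitempty"`
}

// toReasoningDetail converts one SSE data payload. The bool is false for
// events that carry no reasoning (block start/stop, text deltas, and so on).
func toReasoningDetail(data []byte) (reasoningDetail, bool, error) {
	var ev anthropicEvent
	if err := json.Unmarshal(data, &ev); err != nil {
		return reasoningDetail{}, false, err
	}
	if ev.Type != "content_block_delta" {
		return reasoningDetail{}, false, nil
	}
	switch ev.Delta.Type {
	case "thinking_delta":
		return reasoningDetail{Index: ev.Index, Type: "text", Text: ev.Delta.Thinking}, true, nil
	case "signature_delta":
		// Signature chunks carry only the index and the signature.
		return reasoningDetail{Index: ev.Index, Signature: ev.Delta.Signature}, true, nil
	}
	return reasoningDetail{}, false, nil
}

func main() {
	events := []string{
		`{"type": "content_block_start", "index": 0, "content_block": {"type": "thinking"}}`,
		`{"type": "content_block_delta", "index": 0, "delta": {"type": "thinking_delta", "thinking": "Let me"}}`,
		`{"type": "content_block_delta", "index": 0, "delta": {"type": "signature_delta", "signature": "EqoB..."}}`,
		`{"type": "content_block_stop", "index": 0}`,
	}
	for _, e := range events {
		detail, ok, err := toReasoningDetail([]byte(e))
		if err != nil {
			panic(err)
		}
		if !ok {
			continue
		}
		out, err := json.Marshal(detail)
		if err != nil {
			panic(err)
		}
		fmt.Println(string(out))
	}
}
```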

### Bifrost Stream Response

```json
// Thinking delta
{
  "choices": [{
    "delta": {
      "reasoning_details": [{
        "index": 0,
        "type": "text",
        "text": "Let me analyze..."
      }]
    }
  }]
}

// Signature delta
{
  "choices": [{
    "delta": {
      "reasoning_details": [{
        "index": 0,
        "signature": "EqoB..."
      }]
    }
  }]
}
```
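Clients usually need to stitch these deltas back into complete reasoning. A minimal Go sketch, assuming a single choice per stream and using hypothetical type and function names: text fragments are appended per `index` in arrival order, and a signature delta is attached to the entry with the same `index`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// streamChunk holds only the parts of a Bifrost stream chunk this sketch reads.
type streamChunk struct {
	Choices []struct {
		Delta struct {
			ReasoningDetails []struct {
				Index     int    `json:"index"`
				Text      string `json:"text"`
				Signature string `json:"signature"`
			} `json:"reasoning_details"`
		} `json:"delta"`
	} `json:"choices"`
}

// reasoningBlock is the reassembled reasoning for one index.
type reasoningBlock struct {
	Text      string
	Signature string
}

// accumulateReasoning merges reasoning_details deltas from the first choice
// of each chunk and returns the blocks ordered by index.
func accumulateReasoning(chunks [][]byte) ([]reasoningBlock, error) {
	byIndex := map[int]*reasoningBlock{}
	for _, raw := range chunks {
		var c streamChunk
		if err := json.Unmarshal(raw, &c); err != nil {
			return nil, err
		}
		if len(c.Choices) == 0 {
			continue
		}
		for _, d := range c.Choices[0].Delta.ReasoningDetails {
			b, ok := byIndex[d.Index]
			if !ok {
				b = &reasoningBlock{}
				byIndex[d.Index] = b
			}
			b.Text += d.Text
			if d.Signature != "" {
				b.Signature = d.Signature // arrives in its own delta
			}
		}
	}
	indices := make([]int, 0, len(byIndex))
	for i := range byIndex {
		indices = append(indices, i)
	}
	sort.Ints(indices)
	blocks := make([]reasoningBlock, 0, len(indices))
	for _, i := range indices {
		blocks = append(blocks, *byIndex[i])
	}
	return blocks, nil
}

func main() {
	chunks := [][]byte{
		[]byte(`{"choices":[{"delta":{"reasoning_details":[{"index":0,"type":"text","text":"Let me"}]}}]}`),
		[]byte(`{"choices":[{"delta":{"reasoning_details":[{"index":0,"type":"text","text":" analyze..."}]}}]}`),
		[]byte(`{"choices":[{"delta":{"reasoning_details":[{"index":0,"signature":"EqoB..."}]}}]}`),
	}
	blocks, err := accumulateReasoning(chunks)
	if err != nil {
		panic(err)
	}
	for _, b := range blocks {
		fmt.Printf("%q signed=%t\n", b.Text, b.Signature != "")
	}
}
```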

---

## Caveats Summary

<Accordion title="Minimum Budget (Anthropic/Bedrock)">
**Severity**: High
**Behavior**: `reasoning.max_tokens` must be >= 1024
**Impact**: Requests with lower values fail with an error
**Workaround**: Always set `reasoning.max_tokens` >= 1024 for Anthropic/Bedrock
</Accordion>

<Accordion title="Dynamic Budget Not Supported">
**Severity**: Medium
**Behavior**: `reasoning.max_tokens = -1` converted to `1024`
**Impact**: Dynamic budgeting not available on Anthropic/Bedrock
**Workaround**: Set explicit token budget
</Accordion>

<Accordion title="Effort Level Normalization">
**Severity**: Low
**Behavior**: OpenAI's `minimal` is converted to `low` when routing to providers that have no `minimal` level
**Impact**: Slightly different reasoning behavior
</Accordion>

<Accordion title="Signature Field Provider-Specific">
**Severity**: Low
**Behavior**: `signature` field only present in Anthropic/Bedrock responses
**Impact**: Signature-based verification only available for these providers
</Accordion>

<Accordion title="Thinking Type Always Enabled">
**Severity**: Low
**Behavior**: Anthropic's `thinking.type` is always set to `"enabled"`, regardless of effort
**Impact**: Thinking cannot be disabled once the `reasoning` parameter is present
</Accordion>

<Accordion title="Gemini: Only One Parameter Sent">
**Severity**: Medium
**Behavior**: When both `effort` and `max_tokens` are provided, only `thinkingBudget` is sent to Gemini (effort is dropped)
**Impact**: Effort value is completely ignored when max_tokens is present
**Workaround**: Provide only the parameter you want to use
</Accordion>

<Accordion title="Gemini: Model Version Differences">
**Severity**: Medium
**Behavior**: Gemini 2.5 only supports `thinkingBudget`, while 3.0+ supports both `thinkingBudget` and `thinkingLevel`
**Impact**: Effort-only requests on 2.5 are converted to budget; on 3.0+ they use native levels
**Note**: Bifrost automatically detects the model version and applies the appropriate conversion
</Accordion>
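The one-parameter rule and the version split above can be expressed as a single decision. In this Go sketch, `buildGeminiThinking` and its `isGemini3` flag are hypothetical. The Gemini 2.5 effort-to-budget conversion is passed in as a function because Bifrost's actual mapping is not documented in this section.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// geminiThinkingConfig mirrors the thinkingConfig fields named above;
// at most one of them is set.
type geminiThinkingConfig struct {
	ThinkingBudget *int    `json:"thinkingBudget,omitempty"`
	ThinkingLevel  *string `json:"thinkingLevel,omitempty"`
}

// buildGeminiThinking picks exactly one Gemini parameter:
//   - max_tokens present        -> thinkingBudget (effort is dropped)
//   - effort only, Gemini 3.0+  -> thinkingLevel
//   - effort only, Gemini 2.5   -> thinkingBudget via effortToBudget
//
// effortToBudget is a caller-supplied, hypothetical conversion.
func buildGeminiThinking(isGemini3 bool, effort *string, maxTokens *int, effortToBudget func(string) int) geminiThinkingConfig {
	switch {
	case maxTokens != nil:
		return geminiThinkingConfig{ThinkingBudget: maxTokens}
	case effort != nil && isGemini3:
		return geminiThinkingConfig{ThinkingLevel: effort}
	case effort != nil:
		budget := effortToBudget(*effort)
		return geminiThinkingConfig{ThinkingBudget: &budget}
	}
	return geminiThinkingConfig{}
}

func main() {
	effort, maxTokens := "high", 2048
	// Hypothetical conversion used only for this demo.
	demoConversion := func(string) int { return 4096 }

	configs := []geminiThinkingConfig{
		buildGeminiThinking(true, &effort, &maxTokens, demoConversion), // both set
		buildGeminiThinking(true, &effort, nil, demoConversion),        // 3.0+, effort only
		buildGeminiThinking(false, &effort, nil, demoConversion),       // 2.5, effort only
	}
	for _, cfg := range configs {
		out, err := json.Marshal(cfg)
		if err != nil {
			panic(err)
		}
		fmt.Println(string(out))
	}
}
```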

<Accordion title="Gemini Pro: Limited Level Support">
**Severity**: Low
**Behavior**: Pro models only support "low" and "high" thinking levels
**Impact**: `"minimal"` → `"low"`, `"medium"` → `"high"` for Pro models
**Note**: Non-Pro models support all four levels: minimal, low, medium, high
</Accordion>
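A minimal sketch of the Pro-model level mapping, assuming the caller already knows whether the target model is a Pro variant (`geminiLevel` and `isPro` are hypothetical names):

```go
package main

import "fmt"

// geminiLevel maps a Bifrost effort to a Gemini 3.0+ thinkingLevel.
// Pro models accept only "low" and "high", so "minimal" becomes "low"
// and "medium" becomes "high"; non-Pro models keep all four levels.
func geminiLevel(effort string, isPro bool) string {
	if !isPro {
		return effort
	}
	switch effort {
	case "minimal":
		return "low"
	case "medium":
		return "high"
	}
	return effort
}

func main() {
	for _, e := range []string{"minimal", "low", "medium", "high"} {
		fmt.Printf("%s -> %s\n", e, geminiLevel(e, true))
	}
}
```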

---

## Complete Provider Comparison

### Reasoning Model

| Provider | Model Type | Budget Type | Min Budget | Signature Support |
|----------|-----------|-------------|------------|------------------|
| OpenAI | Effort-based | Effort-based | None | ❌ |
| Anthropic | Thinking blocks | Token budget | **1024** | ✅ |
| Bedrock (Anthropic) | Reasoning config | Token budget | **1024** | ✅ |
| Bedrock (Nova) | Reasoning config | Effort-based | None | ❌ |
| Gemini 2.5+ | Thinking config | Token budget | 1024 | ✅ |
| Gemini 3.0+ | Thinking config | Dual (budget + level) | 1024 | ✅ |

### Parameter Support

| Provider | `effort` | `max_tokens` | `summary` | Streaming |
|----------|----------|------------|----------|-----------|
| OpenAI | ✅ (4 levels) | ✅ | ❌ | ✅ |
| Anthropic | ❌ (binary) | ✅ | ✅ | ✅ |
| Bedrock (Anthropic) | ❌ (binary) | ✅ | ✅ | ✅ |
| Bedrock (Nova) | ✅ (3 levels) | ⚠️ (ignored) | ❌ | ✅ |
| Gemini 2.5+ | ⚠️ (converts to budget) | ✅ | ❌ | ✅ |
| Gemini 3.0+ | ✅ (4 levels) | ✅ | ❌ | ✅ |

---

## Troubleshooting

### Anthropic: "reasoning.max_tokens must be >= 1024"

**Cause**: Attempting to use reasoning with `reasoning.max_tokens < 1024`

**Solution**: Ensure `reasoning.max_tokens >= 1024` for Anthropic and Bedrock (Anthropic) models

```json
// ❌ Invalid
{"reasoning": {"effort": "high", "max_tokens": 500}}

// ✅ Valid
{"reasoning": {"effort": "high", "max_tokens": 1024}}
```

### OpenAI: Model doesn't support reasoning

**Cause**: Using an older model that doesn't support reasoning (e.g., `gpt-4-turbo`)

**Solution**: Use a reasoning model, such as the o-series (`o1`, `o3`, `o3-mini`, `o4-mini`) or the GPT-5 family. `gpt-4o` and `gpt-4o-mini` are not reasoning models and do not accept reasoning effort.

### Bedrock Nova: `max_tokens` parameter being ignored

**Expected Behavior**: Bedrock Nova uses effort-based reasoning only

**Solution**: Provide `effort` parameter instead of `max_tokens` for Nova models

```json
// ✅ Correct for Nova
{"reasoning": {"effort": "high"}}
```

---