---
title: "Google Gemini"
description: "Google Gemini API conversion guide - request/response transformation, message conversion, tool handling, and streaming behavior"
icon: "diamond"
---

## Overview

Google Gemini's API is structured differently from OpenAI's. Bifrost performs extensive conversion, including:

- **Role remapping** - "assistant" → "model", system messages integrated into main flow
- **Message grouping** - Consecutive tool responses merged into a single user message
- **Parameter renaming** - e.g., `max_completion_tokens` → `maxOutputTokens`, `stop` → `stopSequences`
- **Function call handling** - Tool call ID preservation and thought signature support
- **Content modality** - Support for text, images, video, code execution, and thought content
- **Thinking/Reasoning** - Thinking configuration mapped to Bifrost reasoning structure

### Supported Operations

| Operation | Non-Streaming | Streaming | Endpoint |
|-----------|---------------|-----------|----------|
| Chat Completions | ✅ | ✅ | `/v1beta/models/{model}:generateContent` |
| Responses API | ✅ | ✅ | `/v1beta/models/{model}:generateContent` |
| Speech (TTS) | ✅ | ✅ | `/v1beta/models/{model}:generateContent` |
| Transcriptions (STT) | ✅ | ✅ | `/v1beta/models/{model}:generateContent` |
| Image Generation | ✅ | - | `/v1beta/models/{model}:generateContent` or `/v1beta/models/{model}:predict` (Imagen) |
| Image Edit | ✅ | - | `/v1beta/models/{model}:generateContent` or `/v1beta/models/{model}:predict` (Imagen) |
| Video Generation | ✅ | - | `/v1beta/models/{model}:predictLongRunning` |
| Image Variation | ❌ | - | Not supported |
| Embeddings | ✅ | - | `/v1beta/models/{model}:embedContent` |
| Files | ✅ | - | `/upload/storage/v1beta/files` |
| Batch | ✅ | - | `/v1beta/batchJobs` |
| List Models | ✅ | - | `/v1beta/models` |

---

## Authentication

Gemini supports API key authentication in addition to OAuth2 Bearer token authentication. The implementation conditionally uses the appropriate method based on the endpoint type.

### API Key Authentication

API key authentication is supported via two methods:

1. **Header Method** (standard Gemini endpoints):
   - Format: `x-goog-api-key: YOUR_API_KEY` header
   - Used for: Standard Gemini endpoints (e.g., `/v1beta/models/{model}:generateContent`)

2. **Query Parameter Method** (Imagen and custom endpoints):
   - Format: `?key=YOUR_API_KEY` appended to request URLs
   - Used for: Imagen models and custom endpoints
   - Example: `https://generativelanguage.googleapis.com/v1beta/models/imagen-4.0-generate-001:predict?key=YOUR_API_KEY`

Bifrost automatically selects the appropriate authentication method based on the endpoint type.

---

# 1. Chat Completions

## Request Parameters

### Parameter Mapping

| Parameter | Transformation |
|-----------|----------------|
| `max_completion_tokens` | Renamed to `maxOutputTokens` |
| `temperature`, `top_p` | Direct pass-through |
| `stop` | Renamed to `stopSequences` |
| `response_format` | Converted to `responseMimeType` and `responseJsonSchema` |
| `tools` | Schema restructured (see [Tool Conversion](#tool-conversion)) |
| `tool_choice` | Mapped to `functionCallingConfig` (see [Tool Conversion](#tool-conversion)) |
| `reasoning` | Mapped to `thinkingConfig` (see [Reasoning / Thinking](#reasoning--thinking)) |
| `top_k` | Via `extra_params` (Gemini-specific) |
| `presence_penalty`, `frequency_penalty` | Via `extra_params` |
| `seed` | Via `extra_params` |

### Dropped Parameters

The following parameters are silently ignored: `logit_bias`, `logprobs`, `top_logprobs`, `parallel_tool_calls`, `service_tier`.

### Extra Parameters

Use `extra_params` (SDK) or pass directly in request body (Gateway) for Gemini-specific fields:

<Tabs>
<Tab title="Gateway">

```bash
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini/gemini-2.0-flash",
    "messages": [{"role": "user", "content": "Hello"}],
    "top_k": 40,
    "stop_sequences": ["###"]
  }'
```

</Tab>
<Tab title="Go SDK">

```go
resp, err := client.ChatCompletionRequest(schemas.NewBifrostContext(ctx, schemas.NoDeadline), &schemas.BifrostChatRequest{
	Provider: schemas.Gemini,
	Model:    "gemini-2.0-flash",
	Input:    messages,
	Params: &schemas.ChatParameters{
		ExtraParams: map[string]interface{}{
			"top_k":          40,
			"stop_sequences": []string{"###"},
		},
	},
})
```

</Tab>
</Tabs>

## Reasoning / Thinking

**Documentation**: See [Bifrost Reasoning Reference](/providers/reasoning)

### Parameter Mapping

- `reasoning.effort` → `thinkingConfig.thinkingLevel` ("low" → `LOW`, "high" → `HIGH`)
- `reasoning.max_tokens` → `thinkingConfig.thinkingBudget` (token budget for thinking)
- `reasoning` parameter triggers `thinkingConfig.includeThoughts = true`

### Supported Thinking Levels

- `"low"` / `"minimal"` → `LOW`
- `"medium"` / `"high"` → `HIGH`
- `null` or unspecified → Based on `max_tokens`: -1 (dynamic), 0 (disabled), or specific budget

### Example

```json
// Request
{"reasoning": {"effort": "high", "max_tokens": 10000}}

// Gemini conversion
{"thinkingConfig": {"includeThoughts": true, "thinkingLevel": "HIGH", "thinkingBudget": 10000}}
```

## Message Conversion

### Critical Caveats

- **Role remapping**: "assistant" → "model", "system" → part of user/model content flow
- **Consecutive tool responses**: Tool response messages are merged into a single user message with function response parts
- **Content flattening**: Multi-part content in a single message is preserved as a parts array

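
A simplified sketch of the role remapping and tool-response grouping described above. The `message` and `geminiContent` types are hypothetical and carry plain strings instead of real function response parts; system messages are skipped here because their handling is outside the scope of this sketch.

```go
package main

import "fmt"

// message is a hypothetical OpenAI-style chat message.
type message struct {
	Role    string // "system", "user", "assistant", or "tool"
	Content string
}

// geminiContent is a simplified stand-in for one entry of Gemini's contents array.
type geminiContent struct {
	Role  string
	Parts []string
}

// convertMessages remaps "assistant" to "model" and merges runs of
// consecutive tool messages into a single user entry.
func convertMessages(msgs []message) []geminiContent {
	var out []geminiContent
	prevWasTool := false
	for _, m := range msgs {
		switch m.Role {
		case "tool":
			if prevWasTool {
				// Extend the user entry created for the previous tool message.
				last := &out[len(out)-1]
				last.Parts = append(last.Parts, m.Content)
			} else {
				out = append(out, geminiContent{Role: "user", Parts: []string{m.Content}})
			}
			prevWasTool = true
		case "assistant":
			out = append(out, geminiContent{Role: "model", Parts: []string{m.Content}})
			prevWasTool = false
		case "system":
			// Not handled in this sketch.
			prevWasTool = false
		default:
			out = append(out, geminiContent{Role: m.Role, Parts: []string{m.Content}})
			prevWasTool = false
		}
	}
	return out
}

func main() {
	converted := convertMessages([]message{
		{Role: "user", Content: "Weather in Paris and Rome?"},
		{Role: "assistant", Content: "(two tool calls)"},
		{Role: "tool", Content: "Paris: 18C"},
		{Role: "tool", Content: "Rome: 24C"},
	})
	for _, c := range converted {
		fmt.Printf("%+v\n", c)
	}
}
```

Four input messages become three entries here, which is the change in message count that the grouping caveat refers to.
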
### Image Conversion

- **URL images**: `{type: "image_url", image_url: {url: "..."}}` → `{type: "image", source: {type: "url", url: "..."}}`
- **Base64 images**: Data URL → `{type: "image", source: {type: "base64", media_type: "image/png", ...}}`
- **Video content**: Preserved with metadata (fps, start/end offset)

## Tool Conversion

Tool definitions are restructured with these mappings:
- `function.name` → `functionDeclarations.name` (preserved)
- `function.parameters` → `functionDeclarations.parameters` (Schema format)
- `function.description` → `functionDeclarations.description`
- `function.strict` → Dropped (not supported by Gemini)

### Tool Choice Mapping

| OpenAI | Gemini |
|--------|--------|
| `"auto"` | `AUTO` (default) |
| `"none"` | `NONE` |
| `"required"` | `ANY` |
| Specific tool | `ANY` with `allowedFunctionNames` |

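
The tool choice table above can be read as a small lookup. The sketch below assumes a hypothetical `toFunctionCallingConfig` helper that receives the OpenAI-style choice string plus an optional function name; the JSON field names `mode` and `allowedFunctionNames` follow Gemini's `functionCallingConfig`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// functionCallingConfig mirrors the Gemini-side structure named in the table.
type functionCallingConfig struct {
	Mode                 string   `json:"mode"`
	AllowedFunctionNames []string `json:"allowedFunctionNames,omitempty"`
}

// toFunctionCallingConfig maps an OpenAI-style tool_choice. A non-empty
// functionName stands for the "specific tool" row of the table.
func toFunctionCallingConfig(choice, functionName string) functionCallingConfig {
	if functionName != "" {
		return functionCallingConfig{Mode: "ANY", AllowedFunctionNames: []string{functionName}}
	}
	switch choice {
	case "none":
		return functionCallingConfig{Mode: "NONE"}
	case "required":
		return functionCallingConfig{Mode: "ANY"}
	default:
		// "auto" and unspecified both map to the default mode.
		return functionCallingConfig{Mode: "AUTO"}
	}
}

func main() {
	out, _ := json.Marshal(toFunctionCallingConfig("", "get_weather"))
	fmt.Println(string(out))
}
```
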
## Response Conversion

### Field Mapping

- `finishReason` → `finish_reason`:
  - `STOP` → `stop`
  - `MAX_TOKENS` → `length`
  - `SAFETY`, `RECITATION`, `LANGUAGE`, `BLOCKLIST`, `PROHIBITED_CONTENT`, `SPII`, `IMAGE_SAFETY` → `content_filter`
  - `MALFORMED_FUNCTION_CALL`, `UNEXPECTED_TOOL_CALL` → `tool_calls`

- `candidates[0].content.parts[0].text` → `choices[0].message.content` (if single text block)
- `candidates[0].content.parts[].functionCall` → `choices[0].message.tool_calls`
- `promptTokenCount` → `usage.prompt_tokens`
- `candidatesTokenCount` → `usage.completion_tokens`
- `totalTokenCount` → `usage.total_tokens`
- `cachedContentTokenCount` → `usage.prompt_tokens_details.cached_tokens`
- `thoughtsTokenCount` → `usage.completion_tokens_details.reasoning_tokens`
- Thought content (from `text` parts with `thought: true`) → `reasoning` field in stream deltas
- Function call `args` (map) → JSON string `arguments`

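
Two of the mappings above, the finish reason translation and the `args` to `arguments` serialization, are sketched below. `mapFinishReason` and `argsToJSON` are hypothetical helpers; the fallback for unlisted finish reasons is not specified in this guide, so the sketch returns an empty string for them.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// mapFinishReason translates a Gemini finishReason into an OpenAI-style finish_reason.
func mapFinishReason(reason string) string {
	switch reason {
	case "STOP":
		return "stop"
	case "MAX_TOKENS":
		return "length"
	case "SAFETY", "RECITATION", "LANGUAGE", "BLOCKLIST", "PROHIBITED_CONTENT", "SPII", "IMAGE_SAFETY":
		return "content_filter"
	case "MALFORMED_FUNCTION_CALL", "UNEXPECTED_TOOL_CALL":
		return "tool_calls"
	default:
		// Assumption: behavior for other values is not documented here.
		return ""
	}
}

// argsToJSON serializes a function call's args map into the JSON string
// carried by the OpenAI-style arguments field.
func argsToJSON(args map[string]interface{}) (string, error) {
	b, err := json.Marshal(args)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	fmt.Println(mapFinishReason("MAX_TOKENS"))
	arguments, _ := argsToJSON(map[string]interface{}{"city": "Paris"})
	fmt.Println(arguments)
}
```

Callers that receive `arguments` need to parse it back with `json.Unmarshal` before reading individual fields.
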
## Streaming

Event structure:
- Streaming responses contain deltas in `delta.content` (text), `delta.reasoning` (thoughts), `delta.toolCalls` (function calls)
- Function responses appear as text content in the delta
- `finish_reason` only set on final chunk
- Usage metadata only included in final chunk

---

# 2. Responses API

The Responses API uses the same underlying `/generateContent` endpoint but converts between OpenAI's Responses format and Gemini's `contents` format.

## Request Parameters

### Parameter Mapping

| Parameter | Transformation |
|-----------|----------------|
| `max_output_tokens` | Renamed to `maxOutputTokens` |
| `temperature`, `top_p` | Direct pass-through |
| `instructions` | Converted to system instruction text |
| `input` (string or array) | Converted to messages |
| `tools` | Schema restructured (see [Chat Completions](#1-chat-completions)) |
| `tool_choice` | Type mapped (see [Chat Completions](#1-chat-completions)) |
| `reasoning` | Mapped to `thinkingConfig` (see [Reasoning / Thinking](#reasoning--thinking)) |
| `text` | Maps to `responseMimeType` and `responseJsonSchema` |
| `stop` | Via `extra_params`, renamed to `stopSequences` |
| `top_k` | Via `extra_params` |

### Extra Parameters

Use `extra_params` (SDK) or pass directly in request body (Gateway):

<Tabs>
<Tab title="Gateway">

```bash
curl -X POST http://localhost:8080/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini/gemini-2.0-flash",
    "input": "Hello, how are you?",
    "instructions": "You are a helpful assistant.",
    "top_k": 40
  }'
```

</Tab>
<Tab title="Go SDK">

```go
resp, err := client.ResponsesRequest(schemas.NewBifrostContext(ctx, schemas.NoDeadline), &schemas.BifrostResponsesRequest{
	Provider: schemas.Gemini,
	Model:    "gemini-2.0-flash",
	Input:    messages,
	Params: &schemas.ResponsesParameters{
		Instructions: schemas.Ptr("You are a helpful assistant."),
		ExtraParams: map[string]interface{}{
			"top_k": 40,
		},
	},
})
```

</Tab>
</Tabs>

## Input & Instructions

- **Input**: String wrapped as user message or array converted to messages
- **Instructions**: Becomes system instruction (single text block)

## Tool Support

Supported types: `function`, `computer_use_preview`, `web_search`, `mcp`

Tool conversions are the same as [Chat Completions](#1-chat-completions), with:
- Computer tools auto-configured (if specified in Bifrost request)
- Function-based tools always enabled

## Response Conversion

- `finishReason` → `status`: `STOP`, `MAX_TOKENS`, and other values → `completed`; `SAFETY` → `incomplete`
- Output items conversion:
  - Text parts → `message` field
  - Function calls → `function_call` field
  - Thought content → `reasoning` field
- Usage fields preserved with cache tokens mapped to `*_tokens_details.cached_tokens`

## Streaming

Event structure: Similar to Chat Completions streaming
- `content_part.added` emitted for text and reasoning parts
- Item IDs generated as `msg_{responseID}_item_{outputIndex}`

---

# 3. Speech (Text-to-Speech)

Speech synthesis uses the underlying chat generation endpoint with audio response modality.

## Request Parameters

| Parameter | Transformation |
|-----------|----------------|
| `input` | Text to synthesize → `contents[0].parts[0].text` |
| `voice` | Voice name → `generationConfig.speechConfig.voiceConfig.prebuiltVoiceConfig.voiceName` |
| `response_format` | Only "wav" supported (default); auto-converted from PCM |

### Voice Configuration

**Single Voice**:
```json
{
  "generationConfig": {
    "responseModalities": ["AUDIO"],
    "speechConfig": {
      "voiceConfig": {
        "prebuiltVoiceConfig": {
          "voiceName": "Chant-Female"
        }
      }
    }
  }
}
```

**Multi-Speaker**:
```json
{
  "generationConfig": {
    "responseModalities": ["AUDIO"],
    "speechConfig": {
      "multiSpeakerVoiceConfig": {
        "speakerVoiceConfigs": [
          {
            "speaker": "Character A",
            "voiceConfig": {
              "prebuiltVoiceConfig": {
                "voiceName": "Chant-Female"
              }
            }
          }
        ]
      }
    }
  }
}
```

## Response Conversion

- Audio data extracted from `candidates[0].content.parts[].inlineData`
- **Format conversion**: Gemini returns PCM audio (s16le, 24kHz, mono)
- **Auto-conversion**: PCM → WAV when `response_format: "wav"` (default)
- Raw audio returned if `response_format` is omitted or empty string

### Supported Voices

Common Gemini voices include:
- `Chant-Female` - Female voice
- `Chant-Male` - Male voice
- Additional voices depend on model capabilities

Check the model documentation for the complete list of supported voices.

---

# 4. Transcriptions (Speech-to-Text)

Transcriptions are implemented as chat completions with audio content and text prompts.

## Request Parameters

| Parameter | Transformation |
|-----------|----------------|
| `file` | Audio bytes → `contents[].parts[].inlineData` |
| `prompt` | Instructions → `contents[0].parts[0].text` (defaults to "Generate a transcript of the speech.") |
| `language` | Via `extra_params` (if supported by model) |

### Audio Input Handling

Audio is sent as inline data with auto-detected MIME type:
```json
{
  "contents": [
    {
      "parts": [
        {
          "text": "<prompt text>"
        },
        {
          "inlineData": {
            "mimeType": "audio/wav",
            "data": "<base64-encoded-audio>"
          }
        }
      ]
    }
  ]
}
```

### Extra Parameters

Safety settings and caching can be configured:

<Tabs>
<Tab title="Gateway">

```bash
curl -X POST http://localhost:8080/v1/audio/transcriptions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini/gemini-2.0-flash",
    "file": "<binary-audio-data>",
    "prompt": "Transcribe this audio in the original language."
  }'
```

</Tab>
<Tab title="Go SDK">

```go
resp, err := client.TranscriptionRequest(schemas.NewBifrostContext(ctx, schemas.NoDeadline), &schemas.BifrostTranscriptionRequest{
	Provider: schemas.Gemini,
	Model:    "gemini-2.0-flash",
	Input: &schemas.TranscriptionInput{
		File: audioBytes,
	},
	Params: &schemas.TranscriptionParameters{
		Prompt: schemas.Ptr("Transcribe this audio."),
		ExtraParams: map[string]interface{}{
			// safetySettings is a placeholder for your Gemini safety settings value.
			"safety_settings": safetySettings,
		},
	},
})
```

</Tab>
</Tabs>

## Response Conversion

- Transcribed text extracted from `candidates[0].content.parts[].text`
- `task` set to `"transcribe"`
- Usage metadata mapped:
  - `promptTokenCount` → `input_tokens`
  - `candidatesTokenCount` → `output_tokens`
  - `totalTokenCount` → `total_tokens`

---

# 5. Embeddings

<Note>
Supports both single text and batch text embeddings via batch requests.
</Note>

**Request Parameters**:
- `input` → `requests[0].content.parts[0].text` (in single-text mode, array inputs are joined with a space)
- `dimensions` → `outputDimensionality`
- Extra task type and title via `extra_params`

**Response Mapping**:
- `embeddings[].values` → Bifrost embedding array
- `metadata.billableCharacterCount` → Usage prompt tokens (fallback)
- Token counts extracted from usage metadata

---

# 6. Batch API

**Request formats**: Inline requests array or file-based input

**Pagination**: Token-based with `pageToken`

**Endpoints**:
- POST `/v1beta/batchJobs` - Create
- GET `/v1beta/batchJobs?pageSize={limit}&pageToken={token}` - List
- GET `/v1beta/batchJobs/{batch_id}` - Retrieve
- POST `/v1beta/batchJobs/{batch_id}:cancel` - Cancel

**Response Structure**:
- Status mapping: `BATCH_STATE_PENDING`/`BATCH_STATE_RUNNING` → `in_progress`, `BATCH_STATE_SUCCEEDED` → `completed`, `BATCH_STATE_FAILED` → `failed`, `BATCH_STATE_CANCELLING` → `cancelling`, `BATCH_STATE_CANCELLED` → `cancelled`, `BATCH_STATE_EXPIRED` → `expired`
- Inline responses: Array in `dest.inlinedResponses`
- File-based responses: JSONL file in `dest.fileName`

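
The status mapping above, together with the RFC3339-to-Unix timestamp conversion this section describes, could look roughly like the following. `batchStatus` and `toUnix` are hypothetical names used only for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// batchStatus maps Gemini batch states to the statuses listed above.
var batchStatus = map[string]string{
	"BATCH_STATE_PENDING":    "in_progress",
	"BATCH_STATE_RUNNING":    "in_progress",
	"BATCH_STATE_SUCCEEDED":  "completed",
	"BATCH_STATE_FAILED":     "failed",
	"BATCH_STATE_CANCELLING": "cancelling",
	"BATCH_STATE_CANCELLED":  "cancelled",
	"BATCH_STATE_EXPIRED":    "expired",
}

// toUnix parses an RFC3339 timestamp and returns Unix seconds.
func toUnix(rfc3339 string) (int64, error) {
	t, err := time.Parse(time.RFC3339, rfc3339)
	if err != nil {
		return 0, err
	}
	return t.Unix(), nil
}

func main() {
	fmt.Println(batchStatus["BATCH_STATE_RUNNING"])
	created, err := toUnix("2025-01-01T00:00:00Z")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(created)
}
```
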
**Note**: RFC3339 timestamps are converted to Unix timestamps.

---

# 7. Files API

<Note>
Supports file upload for batch processing and multimodal requests.
</Note>

**Upload**: Multipart/form-data with `file` (binary) and `filename` (optional)

**Field mapping**:
- `name` → `id`
- `displayName` → `filename`
- `sizeBytes` → `size_bytes`
- `mimeType` → `content_type`
- `createTime` (RFC3339) → Converted to Unix timestamp

**Endpoints**:
- POST `/upload/storage/v1beta/files` - Upload
- GET `/v1beta/files?limit={limit}&pageToken={token}` - List (cursor pagination)
- GET `/v1beta/files/{file_id}` - Retrieve
- DELETE `/v1beta/files/{file_id}` - Delete
- GET `/v1beta/files/{file_id}/content` - Download

---

# 8. Image Generation

Gemini supports two image generation formats depending on the model:

1. **Standard Gemini Format**: Uses the `/v1beta/models/{model}:generateContent` endpoint
2. **Imagen Format**: Uses the `/v1beta/models/{model}:predict` endpoint for Imagen models (detected automatically)

### Parameter Mapping

| Parameter | Transformation |
|-----------|----------------|
| `prompt` | Text description of the image to generate |
| `n` | Number of images (mapped to `sampleCount` for Imagen, `candidateCount` for Gemini) |
| `size` | Image size in WxH format (e.g., `"1024x1024"`). Converted to Imagen's `imageSize` + `aspectRatio` format |
| `output_format` | Output format: `"png"`, `"jpeg"`, `"webp"`. Converted to MIME type for Imagen |
| `seed` | Seed for reproducible generation (passed directly) |
| `negative_prompt` | Negative prompt (passed directly) |

### Extra Parameters

Use `extra_params` (SDK) or pass directly in request body (Gateway) for Gemini-specific fields:

| Parameter | Type | Notes |
|-----------|------|-------|
| `personGeneration` | string | Person generation setting (Imagen only) |
| `language` | string | Language code (Imagen only) |
| `enhancePrompt` | bool | Prompt enhancement flag (Imagen only) |
| `safetySettings` / `safety_settings` | string/array | Safety settings configuration |
| `cachedContent` / `cached_content` | string | Cached content ID |
| `labels` | object | Custom labels map |

<Tabs>
<Tab title="Gateway">

```bash
curl -X POST http://localhost:8080/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini/imagen-4.0-generate-001",
    "prompt": "A sunset over the mountains",
    "size": "1024x1024",
    "n": 2,
    "output_format": "png"
  }'
```

</Tab>
<Tab title="Go SDK">

```go
resp, err := client.ImageGenerationRequest(schemas.NewBifrostContext(ctx, schemas.NoDeadline), &schemas.BifrostImageGenerationRequest{
	Provider: schemas.Gemini,
	Model:    "imagen-4.0-generate-001",
	Input: &schemas.ImageGenerationInput{
		Prompt: "A sunset over the mountains",
	},
	Params: &schemas.ImageGenerationParameters{
		Size:         schemas.Ptr("1024x1024"),
		N:            schemas.Ptr(2),
		OutputFormat: schemas.Ptr("png"),
	},
})
```

</Tab>
</Tabs>

## Request Conversion

### Standard Gemini Format

- **Model mapping**: `bifrostReq.Model` → `req.Model`, with `bifrostReq.Input.Prompt` → `req.Contents[0].Parts[0].Text`
- **Response modality**: Set internally by Bifrost to `generationConfig.responseModalities = ["IMAGE"]` to indicate image generation
- **Image count**: Specify number of images via `n` → `generationConfig.candidateCount`
- **Extra parameters**: Include `safetySettings`, `cachedContent`, and `labels` mapped directly

### Imagen Format

- **Prompt**: `bifrostReq.Prompt` → `req.Instances[0].Prompt`
- **Number of Images**: `n` → `req.Parameters.SampleCount`
- **Size Conversion**: `size` (WxH format) converted to:
  - `imageSize`: `"1k"` (if dimensions ≤ 1024), `"2k"` (if dimensions ≤ 2048). Sizes larger than `"2k"` are not supported by Imagen models.
  - `aspectRatio`: `"1:1"`, `"3:4"`, `"4:3"`, `"9:16"`, or `"16:9"` (based on width/height ratio)
- **Output Format**: `output_format` (`"png"`, `"jpeg"`) → `parameters.outputOptions.mimeType` (`"image/png"`, `"image/jpeg"`)
- **Seed & Negative Prompt**: Passed directly to `seed` and `parameters.negativePrompt`
- **Extra Parameters**: `personGeneration`, `language`, `enhancePrompt`, `safetySettings` mapped to parameters

## Response Conversion

### Standard Gemini Format

- **Image Data**: Extracts `InlineData` from `candidates[0].content.parts[]` with MIME type `image/*`
- **Output Format**: Converts MIME type (`image/png`, `image/jpeg`, `image/webp`) → file extension (`png`, `jpeg`, `webp`)
- **Usage**: Extracts token usage from `usageMetadata`
- **Multiple Images**: Each image part becomes an `ImageData` entry in the response array

### Imagen Format

- **Image Data**: Each `prediction` in `response.predictions[]` → `ImageData` with `b64_json` from `bytesBase64Encoded`
- **Output Format**: Converts `prediction.mimeType` → file extension for `outputFormat` field (Imagen doesn't support webp)
- **Index**: Each prediction gets an `index` (0, 1, 2, ...) in the response array

## Size Conversion

For the Imagen format, `size` (WxH) is converted to an `imageSize` and an `aspectRatio`:

**Supported Image Sizes**: `"1k"` (≤1024), `"2k"` (≤2048)

**Supported Aspect Ratios**: `"1:1"`, `"3:4"`, `"4:3"`, `"9:16"`, `"16:9"`

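
One way to perform this conversion is sketched below. It rests on two assumptions that this guide does not confirm: the `imageSize` bucket is chosen from the longer dimension, and the aspect ratio is the supported ratio numerically closest to width/height. `convertSize` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// convertSize turns a "WxH" string into an Imagen imageSize and aspectRatio.
func convertSize(size string) (string, string, error) {
	w, h, ok := strings.Cut(strings.ToLower(size), "x")
	if !ok {
		return "", "", fmt.Errorf("size %q is not in WxH format", size)
	}
	width, errW := strconv.Atoi(w)
	height, errH := strconv.Atoi(h)
	if errW != nil || errH != nil || width <= 0 || height <= 0 {
		return "", "", fmt.Errorf("size %q has invalid dimensions", size)
	}

	// Assumption: bucket by the longer side.
	longest := width
	if height > longest {
		longest = height
	}
	var imageSize string
	switch {
	case longest <= 1024:
		imageSize = "1k"
	case longest <= 2048:
		imageSize = "2k"
	default:
		return "", "", fmt.Errorf("size %q exceeds the 2k limit", size)
	}

	// Assumption: pick the supported ratio closest to width/height.
	ratios := []struct {
		name  string
		value float64
	}{
		{"1:1", 1.0}, {"3:4", 3.0 / 4.0}, {"4:3", 4.0 / 3.0}, {"9:16", 9.0 / 16.0}, {"16:9", 16.0 / 9.0},
	}
	target := float64(width) / float64(height)
	best := ratios[0]
	for _, r := range ratios[1:] {
		if math.Abs(r.value-target) < math.Abs(best.value-target) {
			best = r
		}
	}
	return imageSize, best.name, nil
}

func main() {
	imageSize, aspectRatio, err := convertSize("1024x1024")
	fmt.Println(imageSize, aspectRatio, err)
}
```
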
## Endpoint Selection

The provider automatically selects the endpoint based on model name:
- **Imagen models** (detected via `schemas.IsImagenModel()`): Uses `/v1beta/models/{model}:predict` endpoint
- **Other models**: Uses `/v1beta/models/{model}:generateContent` endpoint with image response modality

## Streaming

Image generation streaming is not supported by Gemini.

---

# 9. Image Edit

<Warning>
Requests use **multipart/form-data**, not JSON.
</Warning>

Gemini supports image editing through two different APIs depending on the model:

1. **Standard Gemini Format**: Uses the `/v1beta/models/{model}:generateContent` endpoint (for Gemini models)
2. **Imagen Format**: Uses the `/v1beta/models/{model}:predict` endpoint (for Imagen models, detected automatically)

**Request Parameters**

| Parameter | Type | Required | Notes |
|-----------|------|----------|-------|
| `model` | string | ✅ | Model identifier (Gemini or Imagen model) |
| `prompt` | string | ✅ | Text description of the edit |
| `image[]` | binary | ✅ | Image file(s) to edit (supports multiple images) |
| `mask` | binary | ❌ | Mask image file |
| `type` | string | ❌ | Edit type: `"inpainting"`, `"outpainting"`, `"inpaint_removal"`, `"bgswap"` (Imagen only) |
| `n` | int | ❌ | Number of images to generate (1-10) |
| `output_format` | string | ❌ | Output format: `"png"`, `"webp"`, `"jpeg"` |
| `output_compression` | int | ❌ | Compression level (0-100%) |
| `seed` | int | ❌ | Seed for reproducibility (via `ExtraParams["seed"]`) |
| `negative_prompt` | string | ❌ | Negative prompt (via `ExtraParams["negativePrompt"]`) |
| `guidanceScale` | int | ❌ | Guidance scale (via `ExtraParams["guidanceScale"]`, Imagen only) |
| `baseSteps` | int | ❌ | Base steps (via `ExtraParams["baseSteps"]`, Imagen only) |
| `maskMode` | string | ❌ | Mask mode (via `ExtraParams["maskMode"]`, Imagen only): `"MASK_MODE_USER_PROVIDED"`, `"MASK_MODE_BACKGROUND"`, `"MASK_MODE_FOREGROUND"`, `"MASK_MODE_SEMANTIC"` |
| `dilation` | float | ❌ | Mask dilation (via `ExtraParams["dilation"]`, Imagen only): Range [0, 1] |
| `maskClasses` | int[] | ❌ | Mask classes (via `ExtraParams["maskClasses"]`, Imagen only): For `MASK_MODE_SEMANTIC` |

---

**Request Conversion**

### Standard Gemini Format (Non-Imagen Models)

- **Model & Prompt**: `bifrostReq.Model` → `req.Model`, `bifrostReq.Input.Prompt` → `req.Contents[0].Parts[0].Text`
- **Images**: Each image in `bifrostReq.Input.Images` is converted to a `Part` with:
  - MIME type detection (`image/jpeg`, `image/webp`, `image/png`) with fallback to `image/png`
  - Base64 encoding: `image.Image` → `Part.InlineData.Data` (base64 string)
  - MIME type: `Part.InlineData.MIMEType`
- **Response Modality**: `GenerationConfig.ResponseModalities` is set to `[ModalityImage]` to indicate image generation
- **Extra Parameters**: Extracted from `ExtraParams`:
  - `safetySettings` / `safety_settings` → `SafetySettings`
  - `cachedContent` / `cached_content` → `CachedContent`
  - `labels` → `Labels` (map[string]string)

### Imagen Format (Imagen Models)

- **Reference Images**: Each image in `bifrostReq.Input.Images` is converted to `ReferenceImage` with:
  - `ReferenceType`: `"REFERENCE_TYPE_RAW"`
  - `ReferenceID`: Sequential IDs starting from 1
  - `ReferenceImage.BytesBase64Encoded`: Base64-encoded image data
- **Mask Configuration**: If `Params.Mask` is provided or `maskMode` is specified:
  - Default `maskMode`: `"MASK_MODE_USER_PROVIDED"` when mask data is present
  - `maskMode` can be overridden via `ExtraParams["maskMode"]`
  - `dilation` extracted from `ExtraParams["dilation"]` (validated to range [0, 1])
  - `maskClasses` extracted from `ExtraParams["maskClasses"]` (for `MASK_MODE_SEMANTIC`)
  - Mask image (if provided) is base64-encoded and added as `ReferenceType: "REFERENCE_TYPE_MASK"`
- **Edit Mode Mapping**: `Params.Type` is mapped to `EditMode`:
  - `"inpainting"` → `"EDIT_MODE_INPAINT_INSERTION"`
  - `"outpainting"` → `"EDIT_MODE_OUTPAINT"`
  - `"inpaint_removal"` → `"EDIT_MODE_INPAINT_REMOVAL"`
  - `"bgswap"` → `"EDIT_MODE_BGSWAP"`
  - If `Type` is not set, `editMode` can be specified directly via `ExtraParams["editMode"]`
- **Parameters**:
  - `n` → `Parameters.SampleCount`
  - `output_format` → `Parameters.OutputOptions.MimeType` (converted: `"png"` → `"image/png"`, etc.)
  - `output_compression` → `Parameters.OutputOptions.CompressionQuality`
  - `seed` (via `ExtraParams["seed"]`) → `Parameters.Seed`
  - `negativePrompt` (via `ExtraParams["negativePrompt"]`) → `Parameters.NegativePrompt`
  - `guidanceScale` (via `ExtraParams["guidanceScale"]`) → `Parameters.GuidanceScale`
  - `baseSteps` (via `ExtraParams["baseSteps"]`) → `Parameters.BaseSteps`
  - Additional Imagen-specific parameters: `addWatermark`, `includeRaiReason`, `includeSafetyAttributes`, `personGeneration`, `safetySetting`, `language`, `storageUri`

**Response Conversion**

- **Standard Gemini Format**: Uses the same response conversion as image generation (see Image Generation section)
- **Imagen Format**: Uses the same response conversion as Imagen image generation (see Image Generation section)

**Endpoint Selection**

The provider automatically selects the endpoint based on model name:
- **Imagen models** (detected via `schemas.IsImagenModel()`): Uses `/v1beta/models/{model}:predict` endpoint
- **Other models**: Uses `/v1beta/models/{model}:generateContent` endpoint with image response modality

**Streaming**

Image edit streaming is not supported by Gemini.

**Image Variation**

Image variation is not supported by Gemini.

---

# 10. List Models

**Request**: GET `/v1beta/models?pageSize={limit}&pageToken={token}` (no body)

**Field mapping**:
- `name` (remove "models/" prefix) → `id` (add "gemini/" prefix)
- `displayName` → `name`
- `description` → `description`
- `inputTokenLimit` → `max_input_tokens`
- `outputTokenLimit` → `max_output_tokens`
- Context length = `inputTokenLimit + outputTokenLimit`

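
The field mapping above, including the prefix rewrite on the model name, could be written along these lines. The `geminiModel` and `bifrostModel` structs are hypothetical and carry only the fields listed in this section.

```go
package main

import (
	"fmt"
	"strings"
)

// geminiModel holds the relevant fields of a Gemini model listing entry.
type geminiModel struct {
	Name             string
	DisplayName      string
	Description      string
	InputTokenLimit  int
	OutputTokenLimit int
}

// bifrostModel is a hypothetical target shape for the converted entry.
type bifrostModel struct {
	ID              string
	Name            string
	Description     string
	MaxInputTokens  int
	MaxOutputTokens int
	ContextLength   int
}

// toBifrostModel strips the "models/" prefix, adds the "gemini/" prefix,
// and derives the context length from the two token limits.
func toBifrostModel(m geminiModel) bifrostModel {
	return bifrostModel{
		ID:              "gemini/" + strings.TrimPrefix(m.Name, "models/"),
		Name:            m.DisplayName,
		Description:     m.Description,
		MaxInputTokens:  m.InputTokenLimit,
		MaxOutputTokens: m.OutputTokenLimit,
		ContextLength:   m.InputTokenLimit + m.OutputTokenLimit,
	}
}

func main() {
	// Token limits here are illustrative values, not real model limits.
	model := toBifrostModel(geminiModel{
		Name:             "models/gemini-2.0-flash",
		DisplayName:      "Gemini 2.0 Flash",
		InputTokenLimit:  1000,
		OutputTokenLimit: 200,
	})
	fmt.Printf("%+v\n", model)
}
```
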
**Pagination**: Token-based with `nextPageToken`

---

# 11. Video Generation

### Generate (`POST /v1/videos`)

Requests use **JSON body (`application/json`)**.

**Request Parameters**

| Parameter | Type | Required | Notes |
|-----------|------|----------|-------|
| `model` | string | ✅ | Veo model (e.g., `veo-3.1-generate-preview`) |
| `prompt` | string | ✅ | Text description of the video |
| `input_reference` | string | ❌ | Input image for image-to-video |
| `seconds` | string | ❌ | Duration → `durationSeconds` |
| `size` | string | ❌ | Resolution → aspect ratio (`1280x720` → `16:9`, `720x1280` → `9:16`) |
| `negative_prompt` | string | ❌ | What to avoid in the video |
| `seed` | int | ❌ | Seed for reproducibility |
| `audio` | bool | ❌ | Enable audio generation → `generateAudio` |
| `video_uri` | string | ❌ | GCS video URI for video extension |

**Extra Params** (any unrecognized JSON field is forwarded as `extra_params`)

| Key | Notes |
|-----|-------|
| `aspectRatio` | Override the aspect ratio directly (e.g., `"16:9"`, `"9:16"`). Takes precedence over `size` |
| `resolution` | Native Gemini resolution string |
| `sampleCount` | Number of samples to generate |
| `personGeneration` | Person generation policy |
| `numberOfVideos` | Number of videos to generate |
| `storageURI` | GCS bucket for output storage |
| `compressionQuality` | Output compression quality |
| `enhancePrompt` | Auto-enhance the prompt |
| `resizeMode` | How to handle size mismatches |
| `reference_images` | Style/asset reference image objects |
| `lastFrame` | Last frame image object for interpolation |

**Response**: [`BifrostVideoGenerationResponse`](https://github.com/maximhq/bifrost/blob/main/core/schemas/videos.go) with fields `id`, `status`, `videos[]`

If Gemini filters content for safety, `status` is `failed` and `content_filter` describes the reason.

**Job Statuses**: `in_progress` → `completed` / `failed`

### Retrieve / Download

| Operation | Endpoint | Notes |
|-----------|----------|-------|
| Get status | `GET /v1/videos/{id}` | Polls the long-running operation |
| Download | `GET /v1/videos/{id}/content` | Downloads from GCS URI or decodes base64 video |

Video Delete, List, and Remix are not supported.

---

## Content Type Support

Bifrost supports the following content modalities through Gemini:

| Content Type | Support | Notes |
|--------------|---------|-------|
| Text | ✅ | Full support |
| Images (URL/Base64) | ✅ | Converted to `{type: "image", source: {...}}` |
| Video | ✅ | With fps, start/end offset metadata |
| Audio | ⚠️ | Via file references only |
| PDF | ✅ | Via file references |
| Code Execution | ✅ | Auto-executed with results returned |
| Thinking/Reasoning | ✅ | Thought parts marked with `thought: true` |
| Function Calls | ✅ | With optional thought signatures |

---

## Caveats

<Accordion title="Tool Response Grouping">
**Severity**: High
**Behavior**: Consecutive tool response messages merged into single user message
**Impact**: Message count and structure changes
**Code**: `chat.go:627-678`
</Accordion>

<Accordion title="Thinking Content Handling">
**Severity**: Medium
**Behavior**: Thought content appears as `text` parts with `thought: true` flag
**Impact**: Requires checking `thought` flag to distinguish from regular text
**Code**: `chat.go:242-244, 302-304`
</Accordion>

<Accordion title="Function Call Arguments Serialization">
**Severity**: Low
**Behavior**: Tool call `args` (object) converted to `arguments` (JSON string)
**Impact**: Requires JSON parsing to access arguments
**Code**: `chat.go:101-106`
</Accordion>

<Accordion title="Thought Signature Base64 Encoding">
**Severity**: Low
**Behavior**: `thoughtSignature` base64 URL-safe encoded, auto-converted during unmarshal
**Impact**: Transparent to user; handled automatically
**Code**: `types.go:1048-1063`
</Accordion>

<Accordion title="Streaming Finish Reason Timing">
**Severity**: Medium
**Behavior**: `finish_reason` only present in final stream chunk with usage metadata
**Impact**: Cannot determine completion until end of stream
**Code**: `chat.go:206-208, 325-328`
</Accordion>

<Accordion title="Cached Content Token Reporting">
**Severity**: Low
**Behavior**: Cached tokens reported in `prompt_tokens_details.cached_tokens`, cannot distinguish cache creation vs read
**Impact**: Billing estimates may be approximate
**Code**: `utils.go:270-274`
</Accordion>

<Accordion title="System Instruction Integration">
**Severity**: Medium
**Behavior**: System instructions become `systemInstruction` field (separate from messages), not included in message array
**Impact**: Structure differs from OpenAI's system message approach
**Code**: `responses.go:34-46`
</Accordion>