---
title: "Vertex AI"
description: "Google Vertex AI API conversion guide - multi-model support, OAuth2 authentication, project/region configuration"
icon: "v"
---
## Overview
Vertex AI is Google's unified ML platform providing access to Google's Gemini models, Anthropic Claude models, and other third-party LLMs through a single API. Bifrost performs conversions including:
- **Multi-model support** - Unified interface for Gemini, Anthropic, and third-party models
- **OAuth2 authentication** - Service account credentials with automatic token refresh
- **Project and region management** - Automatic endpoint construction from GCP project/region
- **Model routing** - Automatic provider detection (Gemini vs Anthropic) based on model name
- **Request conversion** - Conversion to underlying provider format (Gemini or Anthropic)
- **Embeddings support** - Vector generation with task type and truncation options
- **Model discovery** - Paginated model listing with deployment information
### Supported Operations
| Operation | Non-Streaming | Streaming | Endpoint |
| -------------------- | ------------- | --------- | ----------------------------------------- |
| Chat Completions | ✅ | ✅ | `/generate` |
| Responses API | ✅ | ✅ | `/messages` |
| Embeddings | ✅ | - | `/embeddings` |
| Image Generation | ✅ | - | `/generateContent` or `/predict` (Imagen) |
| Image Edit | ✅ | - | `/generateContent` or `/predict` (Imagen) |
| Video Generation | ✅ | - | `/predictLongRunning` (Veo models only) |
| Image Variation | ❌ | - | Not supported |
| List Models | ✅ | - | `/models` |
| Text Completions | ❌ | ❌ | - |
| Speech (TTS) | ❌ | ❌ | - |
| Transcriptions (STT) | ❌ | ❌ | - |
| Files | ❌ | ❌ | - |
| Batch | ❌ | ❌ | - |
<Note>
**Unsupported Operations** (❌): Text Completions, Speech, Transcriptions, Files, and Batch are not supported by Vertex AI and return `UnsupportedOperationError`. Image Variation is also not supported.
**Vertex-specific**: Endpoints vary by model type. The Responses API is available for both Gemini and Anthropic models.
</Note>
---
## Setup & Configuration
Vertex AI requires Google Cloud project configuration and authentication credentials. Three authentication methods are supported.
<Note>
The `aliases` field (mapping model names to fine-tuned model IDs or endpoint
identifiers) requires **v1.5.0-prerelease2 or later**. On v1.4.x, use
`deployments` inside `vertex_key_config` instead — see the [v1.5.0 Migration
Guide](/migration-guides/v1.5.0#breaking-change-9-provider-deployments-removed-migrate-to-aliases)
for details.
</Note>
### 1. Service Account JSON (Recommended for Production)
Provide a credential JSON string in `auth_credentials`. The JSON must contain a `type` field. Supported types: `service_account` (most common), `impersonated_service_account`, `authorized_user`, `external_account`, `external_account_authorized_user`.
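As a rough illustration of the `type` requirement, the sketch below checks a credential string before it is pasted into `auth_credentials`. The helper name `credentialType` is hypothetical and is not part of Bifrost; only the list of type values comes from this guide.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// supportedCredentialTypes lists the credential `type` values named in this guide.
var supportedCredentialTypes = map[string]bool{
	"service_account":                  true,
	"impersonated_service_account":     true,
	"authorized_user":                  true,
	"external_account":                 true,
	"external_account_authorized_user": true,
}

// credentialType extracts and validates the `type` field of a credential JSON string.
// Hypothetical helper for illustration; it does not reflect Bifrost's internal checks.
func credentialType(credJSON string) (string, error) {
	var probe struct {
		Type string `json:"type"`
	}
	if err := json.Unmarshal([]byte(credJSON), &probe); err != nil {
		return "", fmt.Errorf("credentials are not valid JSON: %w", err)
	}
	if probe.Type == "" {
		return "", errors.New(`credentials JSON has no "type" field`)
	}
	if !supportedCredentialTypes[probe.Type] {
		return "", fmt.Errorf("unsupported credential type %q", probe.Type)
	}
	return probe.Type, nil
}

func main() {
	t, err := credentialType(`{"type": "service_account", "project_id": "my-gcp-project"}`)
	fmt.Println(t, err)
}
```

Running a check like this locally can catch a truncated or mis-pasted credential before the first request fails.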
<Tabs>
<Tab title="Web UI">
<Frame>
<img
src="/media/ui-vertex-service-account-auth-setup.png"
alt="Google Vertex AI Service Account (JSON) authentication setup in the Bifrost Web UI showing Project ID, Region, and Auth Credentials fields"
/>
</Frame>
1. Navigate to **"Model Providers"** → **"Configurations"** → **"Google Vertex"**
2. Click **"Add Key"** (or edit an existing key)
3. Under **Authentication Method**, select **"Service Account (JSON)"**
4. Set **Project ID**: Your Google Cloud project ID
5. Set **Project Number** (Required only for fine-tuned models): Your GCP project number; leave blank for standard models
6. Set **Region**: e.g., `us-central1`
7. Set **Auth Credentials**: Paste your service account JSON or reference an env var (e.g., `env.VERTEX_CREDENTIALS`)
8. Configure **Aliases**: Map model names to fine-tuned model IDs (if using fine-tuned models)
9. Save
</Tab>
<Tab title="API">
```bash
# Step 1: Create the provider
curl -X POST http://localhost:8080/api/providers \
  -H "Content-Type: application/json" \
  -d '{"provider": "vertex"}'
# Step 2: Create a key (Service Account JSON)
curl -X POST http://localhost:8080/api/providers/vertex/keys \
  -H "Content-Type: application/json" \
  -d '{
    "name": "vertex-sa-key",
    "value": "",
    "models": ["*"],
    "weight": 1.0,
    "vertex_key_config": {
      "project_id": "env.VERTEX_PROJECT_ID",
      "region": "us-central1",
      "auth_credentials": "env.VERTEX_CREDENTIALS"
    }
  }'
```
<Note>
**On v1.4.x**, two differences apply:
- Pass `keys` directly in the `POST /api/providers` body; there is no separate `/api/providers/{provider}/keys` endpoint.
- Use `deployments` inside `vertex_key_config` instead of the top-level `aliases` field for fine-tuned model mappings.
</Note>
</Tab>
<Tab title="config.json">
```json
{
  "providers": {
    "vertex": {
      "keys": [
        {
          "name": "vertex-sa-key",
          "value": "",
          "models": ["*"],
          "weight": 1.0,
          "vertex_key_config": {
            "project_id": "env.VERTEX_PROJECT_ID",
            "region": "us-central1",
            "auth_credentials": "env.VERTEX_CREDENTIALS"
          }
        }
      ]
    }
  }
}
```
<Note>
On **v1.4.x**, use `deployments` inside `vertex_key_config` instead of the
top-level `aliases` field for fine-tuned model mappings.
</Note>
</Tab>
<Tab title="Go SDK">
```go
func (a *MyAccount) GetKeysForProvider(ctx *context.Context, provider schemas.ModelProvider) ([]schemas.Key, error) {
	switch provider {
	case schemas.Vertex:
		return []schemas.Key{
			{
				Value:  schemas.EnvVar{}, // Leave empty when using service account credentials
				Models: []string{"*"},
				Weight: 1.0,
				VertexKeyConfig: &schemas.VertexKeyConfig{
					ProjectID:       *schemas.NewEnvVar("env.VERTEX_PROJECT_ID"),
					Region:          *schemas.NewEnvVar("us-central1"),
					AuthCredentials: *schemas.NewEnvVar("env.VERTEX_CREDENTIALS"), // full service account JSON
				},
			},
		}, nil
	}
	return nil, fmt.Errorf("provider %s not supported", provider)
}
```
</Tab>
</Tabs>
### 2. Application Default Credentials
Leave `auth_credentials` empty. Bifrost calls `google.FindDefaultCredentials()` — Google's ADC library — which resolves credentials in this order:
1. `GOOGLE_APPLICATION_CREDENTIALS` env var (path to a JSON credential file)
2. Application default credential file (`~/.config/gcloud/application_default_credentials.json`, written by `gcloud auth application-default login`)
3. GCE/GKE/Cloud Run/App Engine metadata server (attached service account or Workload Identity)
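The resolution order above can be sketched with the standard library alone. This is a simplified illustration, not the ADC library itself: `adcSource` is a hypothetical helper, the well-known file path is the Linux/macOS one quoted above, and the real library also validates the credential it finds.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// adcSource reports which step of the resolution order would supply credentials.
// getenv and fileExists are injected so the order can be exercised without
// touching the real environment. Hypothetical helper for illustration only.
func adcSource(getenv func(string) string, fileExists func(string) bool, home string) string {
	// Step 1: an explicit credential file path wins.
	if getenv("GOOGLE_APPLICATION_CREDENTIALS") != "" {
		return "env"
	}
	// Step 2: the file written by `gcloud auth application-default login`.
	wellKnown := filepath.Join(home, ".config", "gcloud", "application_default_credentials.json")
	if fileExists(wellKnown) {
		return "gcloud"
	}
	// Step 3: fall back to the metadata server on GCE/GKE/Cloud Run/App Engine.
	return "metadata"
}

func main() {
	home, _ := os.UserHomeDir()
	exists := func(p string) bool {
		_, err := os.Stat(p)
		return err == nil
	}
	fmt.Println("credentials would come from:", adcSource(os.Getenv, exists, home))
}
```

A check like this can help explain why a local machine and a deployed workload pick up different identities with the same key configuration.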
<Tabs>
<Tab title="Web UI">
<Frame>
<img
src="/media/ui-vertex-default-service-account-auth-setup.png"
alt="Google Vertex AI Application Default Credentials setup in the Bifrost Web UI showing Project ID and Region fields with no credential inputs"
/>
</Frame>
1. Navigate to **"Model Providers"** → **"Configurations"** → **"Google Vertex"**
2. Click **"Add Key"** (or edit an existing key)
3. Under **Authentication Method**, select **"Service Account (Attached)"**
4. Set **Project ID**: Your Google Cloud project ID
5. Set **Project Number** (Required only for fine-tuned models): Your GCP project number; leave blank for standard models
6. Set **Region**: e.g., `us-central1`
7. Configure **Aliases** if needed
8. Save
Ensure `GOOGLE_APPLICATION_CREDENTIALS` is set in your environment, or that Workload Identity / gcloud is configured.
</Tab>
<Tab title="API">
```bash
# Step 1: Create the provider
curl -X POST http://localhost:8080/api/providers \
  -H "Content-Type: application/json" \
  -d '{"provider": "vertex"}'
# Step 2: Create a key (Application Default Credentials)
curl -X POST http://localhost:8080/api/providers/vertex/keys \
  -H "Content-Type: application/json" \
  -d '{
    "name": "vertex-adc-key",
    "value": "",
    "models": ["*"],
    "weight": 1.0,
    "vertex_key_config": {
      "project_id": "env.VERTEX_PROJECT_ID",
      "region": "us-central1",
      "auth_credentials": ""
    }
  }'
```
<Note>
**On v1.4.x**, pass `keys` directly in the `POST /api/providers` body — there
is no separate `/api/providers/{provider}/keys` endpoint.
</Note>
</Tab>
<Tab title="config.json">
```json
{
  "providers": {
    "vertex": {
      "keys": [
        {
          "name": "vertex-adc-key",
          "value": "",
          "models": ["*"],
          "weight": 1.0,
          "vertex_key_config": {
            "project_id": "env.VERTEX_PROJECT_ID",
            "region": "us-central1",
            "auth_credentials": ""
          }
        }
      ]
    }
  }
}
```
</Tab>
<Tab title="Go SDK">
```go
func (a *MyAccount) GetKeysForProvider(ctx *context.Context, provider schemas.ModelProvider) ([]schemas.Key, error) {
	switch provider {
	case schemas.Vertex:
		return []schemas.Key{
			{
				Value:  schemas.EnvVar{},
				Models: []string{"*"},
				Weight: 1.0,
				VertexKeyConfig: &schemas.VertexKeyConfig{
					ProjectID: *schemas.NewEnvVar("env.VERTEX_PROJECT_ID"),
					Region:    *schemas.NewEnvVar("us-central1"),
					// Leave AuthCredentials empty to use Application Default Credentials
				},
			},
		}, nil
	}
	return nil, fmt.Errorf("provider %s not supported", provider)
}
```
</Tab>
</Tabs>
### 3. API Key (Gemini and Fine-Tuned Models Only)
Set `value` to your Vertex API key. API key authentication is supported only for Gemini models and fine-tuned Gemini models. For Anthropic models on Vertex, use Service Account or Application Default Credentials.
<Tabs>
<Tab title="Web UI">
<Frame>
<img
src="/media/ui-vertex-api-key-auth-setup.png"
alt="Google Vertex AI API Key authentication setup in the Bifrost Web UI showing API Key, Project ID, Region, and Project Number fields"
/>
</Frame>
1. Navigate to **"Model Providers"** → **"Configurations"** → **"Google Vertex"**
2. Click **"Add Key"** (or edit an existing key)
3. Under **Authentication Method**, select **"API Key"**
4. Set **API Key**: Your Vertex AI API key
5. Set **Project ID**: Your Google Cloud project ID
6. Set **Project Number** (Required only for fine-tuned models): Your GCP project number; leave blank for standard models
7. Set **Region**: e.g., `us-central1`
8. Configure **Aliases**: Map short names to fine-tuned model IDs (e.g., `my-model` → `123456789`)
9. Save
</Tab>
<Tab title="API">
```bash
# Step 1: Create the provider
curl -X POST http://localhost:8080/api/providers \
  -H "Content-Type: application/json" \
  -d '{"provider": "vertex"}'
# Step 2: Create a key (API Key: Gemini + fine-tuned models)
curl -X POST http://localhost:8080/api/providers/vertex/keys \
  -H "Content-Type: application/json" \
  -d '{
    "name": "vertex-api-key",
    "value": "env.VERTEX_API_KEY",
    "models": ["gemini-pro", "gemini-2.0-flash", "my-fine-tuned-model"],
    "weight": 1.0,
    "aliases": {
      "my-fine-tuned-model": "123456789"
    },
    "vertex_key_config": {
      "project_id": "env.VERTEX_PROJECT_ID",
      "project_number": "env.VERTEX_PROJECT_NUMBER",
      "region": "us-central1"
    }
  }'
```
<Note>
**On v1.4.x**, two differences apply:
- Pass `keys` directly in the `POST /api/providers` body — there is no separate `/api/providers/{provider}/keys` endpoint.
- Replace the top-level `aliases` with `"deployments"` inside `vertex_key_config`:
```json
"vertex_key_config": {
  "project_id": "env.VERTEX_PROJECT_ID",
  "region": "us-central1",
  "deployments": {
    "my-fine-tuned-model": "123456789"
  }
}
```
</Note>
</Tab>
<Tab title="config.json">
```json
{
  "providers": {
    "vertex": {
      "keys": [
        {
          "name": "vertex-api-key",
          "value": "env.VERTEX_API_KEY",
          "models": ["gemini-pro", "gemini-2.0-flash", "my-fine-tuned-model"],
          "weight": 1.0,
          "aliases": {
            "my-fine-tuned-model": "123456789"
          },
          "vertex_key_config": {
            "project_id": "env.VERTEX_PROJECT_ID",
            "project_number": "env.VERTEX_PROJECT_NUMBER",
            "region": "us-central1"
          }
        }
      ]
    }
  }
}
```
<Note>
On **v1.4.x**, use `deployments` inside `vertex_key_config` instead of the
top-level `aliases` field.
</Note>
</Tab>
<Tab title="Go SDK">
```go
func (a *MyAccount) GetKeysForProvider(ctx *context.Context, provider schemas.ModelProvider) ([]schemas.Key, error) {
	switch provider {
	case schemas.Vertex:
		return []schemas.Key{
			{
				Value:  *schemas.NewEnvVar("env.VERTEX_API_KEY"), // only when using Gemini or fine-tuned models
				Models: []string{"gemini-pro", "gemini-2.0-flash", "my-fine-tuned-model"},
				Weight: 1.0,
				Aliases: schemas.KeyAliases{
					"my-fine-tuned-model": "123456789",
				},
				VertexKeyConfig: &schemas.VertexKeyConfig{
					ProjectID:     *schemas.NewEnvVar("env.VERTEX_PROJECT_ID"),
					ProjectNumber: *schemas.NewEnvVar("env.VERTEX_PROJECT_NUMBER"), // required for fine-tuned models
					Region:        *schemas.NewEnvVar("us-central1"),
				},
			},
		}, nil
	}
	return nil, fmt.Errorf("provider %s not supported", provider)
}
```
</Tab>
</Tabs>
<Note>
Vertex AI support for fine-tuned models is currently in beta. Requests to
non-Gemini fine-tuned models may fail, so please test and report any issues.
</Note>
**`vertex_key_config` fields:**
| Field | Required | Description |
| ------------------ | -------- | ------------------------------------------------------ |
| `project_id` | Yes | Google Cloud project ID |
| `region` | Yes | GCP region (e.g., `us-central1`, `europe-west1`, `global`) |
| `auth_credentials` | No | Service account JSON string (leave empty for ADC) |
| `project_number` | No | GCP project number (required for fine-tuned models) |
**Key-level fields:**
| Field | Required | Description |
| --------- | -------- | ----------------------------------------------------------------------------------------- |
| `value` | No | Vertex API key (Gemini and fine-tuned models only; leave empty for Service Account / ADC) |
| `aliases` | No | Map model names to fine-tuned model IDs or endpoint identifiers (v1.5.0-prerelease2+) |
| `models` | Yes | Models this key can serve; use `["*"]` to allow all |
---
## Beta Headers
For Anthropic models on Vertex AI, Bifrost validates `anthropic-beta` headers and drops unsupported headers from the request.
**Supported**: `computer-use-*`, `compact-*`, `context-management-*`, `interleaved-thinking-*`, `context-1m-*`
**Not supported**: `structured-outputs-*`, `advanced-tool-use-*`, `mcp-client-*`, `prompt-caching-scope-*`, `files-api-*`, `skills-*`, `fast-mode-*`, `redact-thinking-*`
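The default filtering described above can be sketched as a prefix check over the comma-separated `anthropic-beta` value. This is a minimal illustration under two assumptions: the function name `filterBetaHeader` is hypothetical, and any value that does not match a supported prefix is dropped, including values this guide does not list. The dated header values in `main` are examples only.

```go
package main

import (
	"fmt"
	"strings"
)

// supportedBetaPrefixes mirrors the "Supported" list above, with the trailing * removed.
var supportedBetaPrefixes = []string{
	"computer-use-",
	"compact-",
	"context-management-",
	"interleaved-thinking-",
	"context-1m-",
}

// filterBetaHeader keeps only the beta flags whose name starts with a supported prefix.
// Hypothetical helper; Bifrost's own filtering and override handling may differ.
func filterBetaHeader(value string) string {
	var kept []string
	for _, raw := range strings.Split(value, ",") {
		name := strings.TrimSpace(raw)
		if name == "" {
			continue
		}
		for _, prefix := range supportedBetaPrefixes {
			if strings.HasPrefix(name, prefix) {
				kept = append(kept, name)
				break
			}
		}
	}
	return strings.Join(kept, ",")
}

func main() {
	in := "computer-use-2025-01-24, files-api-2025-04-14, context-1m-2025-08-07"
	fmt.Println(filterBetaHeader(in))
}
```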
You can override these defaults per provider via the **Beta Headers** tab in provider configuration or via [`beta_header_overrides`](/quickstart/gateway/provider-configuration#beta-header-overrides). See the full support matrix in the [Anthropic provider docs](/providers/supported-providers/anthropic#beta-headers).
<Frame>
<img
src="/media/vertex-ai-setting-anthropic-beta-headers.png"
alt="Vertex AI Beta Headers configuration tab showing supported and unsupported Anthropic beta features with override options"
/>
</Frame>
---
# 1. Chat Completions
## Request Parameters
### Core Parameter Mapping
| Parameter | Vertex Handling | Notes |
| ---------------- | ------------------------- | ---------------------------------------------------- |
| `model` | Maps to Vertex model ID | Region-specific endpoint constructed automatically |
| All other params | Model-specific conversion | Converted per underlying provider (Gemini/Anthropic) |
### Key Configuration
The key configuration for Vertex requires Google Cloud credentials:
```json
{
  "vertex_key_config": {
    "project_id": "my-gcp-project",
    "region": "us-central1",
    "auth_credentials": "{service-account-json}"
  }
}
```
**Configuration Details**:
- `project_id` - GCP project ID (required)
- `region` - GCP region for API endpoints (required)
- Examples: `us-central1`, `us-west1`, `europe-west1`, `global`
- `auth_credentials` - Service account JSON credentials (optional if using default credentials)
### Authentication Methods
1. **Service Account JSON** (recommended for production)
```json
{ "auth_credentials": "{full-service-account-json}" }
```
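2. **Application Default Credentials** (convenient for local development and GCP-hosted workloads)
- Leave `auth_credentials` empty
- Credentials are resolved from `GOOGLE_APPLICATION_CREDENTIALS`, then the gcloud application default credential file, then the metadata server
3. **API Key** (Gemini and fine-tuned Gemini models only)
- Set the key-level `value` field to your Vertex API key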
## Gemini Models
When using Google's Gemini models, Bifrost converts requests to Gemini's API format.
### Parameter Mapping for Gemini
All Gemini-compatible parameters are supported. Special handling includes:
- **System prompts**: Converted to Gemini's system message format
- **Tool usage**: Mapped to Gemini's function calling format
- **Streaming**: Uses Gemini's streaming protocol
Refer to [Gemini documentation](/providers/supported-providers/gemini) for conversion details.
## Anthropic Models (Claude)
When using Anthropic models through Vertex AI, Bifrost converts requests to Anthropic's message format.
### Parameter Mapping for Anthropic
All Anthropic-standard parameters are supported:
- **Reasoning/Thinking**: `reasoning` parameters converted to `thinking` structure
- **System messages**: Extracted and placed in separate `system` field
- **Tool message grouping**: Consecutive tool messages merged
- **API version**: Automatically set to `vertex-2023-10-16` for Anthropic models
Refer to [Anthropic documentation](/providers/supported-providers/anthropic) for conversion details.
### Special Notes for Vertex + Anthropic
- Responses API uses special `/v1/messages` endpoint
- `anthropic_version` automatically set to `vertex-2023-10-16`
- Minimum reasoning budget: 1024 tokens
- Model field removed from request (Vertex uses different identification)
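Two of the adjustments listed above, removing `model` and setting `anthropic_version`, can be sketched on a raw JSON body as follows. The helper name `prepareVertexAnthropicBody` is hypothetical, and the sketch deliberately leaves out the reasoning-budget minimum and the other conversions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// vertexAnthropicVersion is the version string this guide says is set for Claude on Vertex.
const vertexAnthropicVersion = "vertex-2023-10-16"

// prepareVertexAnthropicBody removes the model field and pins anthropic_version.
// Hypothetical helper for illustration; it is not Bifrost's converter.
func prepareVertexAnthropicBody(raw []byte) ([]byte, error) {
	var body map[string]interface{}
	if err := json.Unmarshal(raw, &body); err != nil {
		return nil, fmt.Errorf("invalid request body: %w", err)
	}
	if body == nil {
		// A JSON null decodes to a nil map; start from an empty object instead.
		body = map[string]interface{}{}
	}
	// Vertex identifies the model through the URL path, not the body.
	delete(body, "model")
	body["anthropic_version"] = vertexAnthropicVersion
	return json.Marshal(body)
}

func main() {
	out, err := prepareVertexAnthropicBody([]byte(`{"model":"claude-3-5-sonnet","max_tokens":1024,"messages":[]}`))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(out))
}
```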
## Region Selection
The region determines the API endpoint:
| Region | Endpoint | Purpose |
| ------------- | --------------------------------------- | ------------------------- |
| `us-central1` | `us-central1-aiplatform.googleapis.com` | US Central |
| `us-west1` | `us-west1-aiplatform.googleapis.com` | US West |
| `europe-west1` | `europe-west1-aiplatform.googleapis.com` | Europe West |
| `global` | `aiplatform.googleapis.com` | Global (no region prefix) |
Availability varies by region. Check [GCP documentation](https://cloud.google.com/vertex-ai/docs/general/locations) for model availability.
## Streaming
Streaming format depends on model type:
- **Gemini models**: Standard Gemini streaming with server-sent events
- **Anthropic models**: Anthropic message streaming format
---
# 2. Responses API
The Responses API is available for both Anthropic (Claude) and Gemini models on Vertex AI.
## Request Parameters
### Core Parameter Mapping
| Parameter | Vertex Handling | Notes |
| ------------------- | ---------------------------- | --------------------------------- |
| `instructions` | Becomes system message | Model-specific conversion |
| `input` | Converted to messages | String or array support |
| `max_output_tokens` | Model-specific field mapping | Gemini vs Anthropic conversion |
| All other params | Model-specific conversion | Converted per underlying provider |
### Gemini Models
For Gemini models, conversion follows Gemini's Responses API format.
### Anthropic Models (Claude)
For Anthropic models, conversion follows Anthropic's message format:
- `instructions` becomes system message
- `reasoning` mapped to `thinking` structure
### Configuration
<Tabs>
<Tab title="Gateway">
```bash
curl -X POST http://localhost:8080/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "vertex/claude-3-5-sonnet",
    "input": "What is AI?",
    "instructions": "You are a helpful assistant",
    "project_id": "my-gcp-project",
    "region": "us-central1"
  }' \
  -H "X-Goog-Authorization: Bearer {token}"
```
</Tab>
<Tab title="Go SDK">
```go
resp, err := client.ResponsesRequest(schemas.NewBifrostContext(ctx, schemas.NoDeadline), &schemas.BifrostResponsesRequest{
	Provider: schemas.Vertex,
	Model:    "claude-3-5-sonnet",
	Input:    messages,
	Params: &schemas.ResponsesParameters{
		Instructions: schemas.Ptr("You are a helpful assistant"),
	},
})
```
</Tab>
</Tabs>
### Special Handling
- Endpoint: `/v1/messages` (Anthropic format)
- `anthropic_version` set to `vertex-2023-10-16` automatically
- Model and region fields removed from request
- Raw request body passthrough supported
Refer to [Anthropic Responses API](/providers/supported-providers/anthropic#2-responses-api) for parameter details.
---
# 3. Embeddings
Embeddings are supported for Vertex AI models that offer embedding generation, such as `text-embedding-004`.
## Request Parameters
### Core Parameters
| Parameter | Vertex Mapping | Notes |
| ------------ | --------------------------------- | -------------------- |
| `input` | `instances[].content` | Text to embed |
| `dimensions` | `parameters.outputDimensionality` | Optional output size |
### Advanced Parameters
Use `extra_params` for embedding-specific options:
<Tabs>
<Tab title="Gateway">
```bash
curl -X POST http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-004",
    "input": ["text to embed"],
    "dimensions": 256,
    "task_type": "RETRIEVAL_DOCUMENT",
    "title": "Document title",
    "project_id": "my-gcp-project",
    "region": "us-central1",
    "autoTruncate": true
  }'
```
</Tab>
<Tab title="Go SDK">
```go
resp, err := client.EmbeddingRequest(schemas.NewBifrostContext(ctx, schemas.NoDeadline), &schemas.BifrostEmbeddingRequest{
	Provider: schemas.Vertex,
	Model:    "text-embedding-004",
	Input: &schemas.EmbeddingInput{
		Texts: []string{"text to embed"},
	},
	Params: &schemas.EmbeddingParameters{
		Dimensions: schemas.Ptr(256),
		ExtraParams: map[string]interface{}{
			"task_type":    "RETRIEVAL_DOCUMENT",
			"title":        "Document title",
			"autoTruncate": true,
		},
	},
})
```
</Tab>
</Tabs>
#### Embedding Parameters
| Parameter | Type | Description |
| -------------- | ------- | ------------------------------------------------------------------------------------------------------------------------- |
| `task_type` | string | Task type hint: `RETRIEVAL_QUERY`, `RETRIEVAL_DOCUMENT`, `SEMANTIC_SIMILARITY`, `CLASSIFICATION`, `CLUSTERING` (optional) |
| `title` | string | Optional title to help model produce better embeddings (used with task_type) |
| `autoTruncate` | boolean | Auto-truncate input to max tokens (defaults to true) |
### Task Type Effects
Different task types optimize embeddings for specific use cases:
- `RETRIEVAL_DOCUMENT` - Optimized for documents in retrieval systems
- `RETRIEVAL_QUERY` - Optimized for queries searching documents
- `SEMANTIC_SIMILARITY` - Optimized for semantic similarity tasks
- `CLASSIFICATION` - For classification tasks
- `CLUSTERING` - For clustering tasks
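To make the mapping concrete, the sketch below builds a body in the `instances` / `parameters` shape from the Core Parameters table. The placement of `task_type` and `title` on each instance and of `autoTruncate` under `parameters` is an assumption based on the public Vertex text-embeddings request format, not something this guide states; the type and function names are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// embeddingInstance is one entry of instances[]; task_type/title placement is assumed.
type embeddingInstance struct {
	Content  string `json:"content"`
	TaskType string `json:"task_type,omitempty"`
	Title    string `json:"title,omitempty"`
}

// embeddingParameters carries request-wide options; autoTruncate placement is assumed.
type embeddingParameters struct {
	OutputDimensionality *int  `json:"outputDimensionality,omitempty"`
	AutoTruncate         *bool `json:"autoTruncate,omitempty"`
}

type embeddingRequest struct {
	Instances  []embeddingInstance `json:"instances"`
	Parameters embeddingParameters `json:"parameters"`
}

// buildEmbeddingRequest maps input texts and options onto the request shape.
func buildEmbeddingRequest(texts []string, taskType, title string, dims *int, autoTruncate *bool) embeddingRequest {
	req := embeddingRequest{
		Instances:  make([]embeddingInstance, 0, len(texts)),
		Parameters: embeddingParameters{OutputDimensionality: dims, AutoTruncate: autoTruncate},
	}
	for _, t := range texts {
		req.Instances = append(req.Instances, embeddingInstance{Content: t, TaskType: taskType, Title: title})
	}
	return req
}

// embeddingRequestJSON serializes the request for inspection.
func embeddingRequestJSON(req embeddingRequest) (string, error) {
	b, err := json.Marshal(req)
	return string(b), err
}

func main() {
	dims, truncate := 256, true
	req := buildEmbeddingRequest([]string{"text to embed"}, "RETRIEVAL_DOCUMENT", "Document title", &dims, &truncate)
	body, err := embeddingRequestJSON(req)
	fmt.Println(body, err)
}
```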
## Response Conversion
Embeddings response includes vectors and truncation information:
```json
{
  "embeddings": [
    {
      "values": [0.1234, -0.5678, ...],
      "statistics": {
        "token_count": 15,
        "truncated": false
      }
    }
  ]
}
```
**Response Fields**:
- `values` - Embedding vector as floats
- `statistics.token_count` - Input token count
- `statistics.truncated` - Whether input was truncated due to length
---
# 4. Image Generation
Image Generation is supported for Gemini and Imagen models on Vertex AI. The provider automatically routes to the appropriate format based on the model type.
## Request Parameters
### Core Parameter Mapping
| Parameter | Vertex Handling | Notes |
| ---------------- | ------------------------------------- | ------------------------------------------------- |
| `model` | Mapped to deployment/model identifier | Model type detected automatically |
| `prompt` | Model-specific conversion | Converted per underlying provider (Gemini/Imagen) |
| All other params | Model-specific conversion | Converted per underlying provider |
### Model Type Detection
Vertex automatically detects the model type and uses the appropriate conversion:
1. **Gemini Models**: Uses Gemini format (same as [Gemini Image Generation](/providers/supported-providers/gemini#8-image-generation))
2. **Imagen Models**: Uses Imagen format (detected via `IsImagenModel()`)
### Configuration
<Tabs>
<Tab title="Gateway">
```bash
curl -X POST http://localhost:8080/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{
    "model": "vertex/imagen-4.0-generate-001",
    "prompt": "A sunset over the mountains",
    "size": "1024x1024",
    "n": 2,
    "project_id": "my-gcp-project",
    "region": "us-central1"
  }' \
  -H "X-Goog-Authorization: Bearer {token}"
```
</Tab>
<Tab title="Go SDK">
```go
resp, err := client.ImageGenerationRequest(schemas.NewBifrostContext(ctx, schemas.NoDeadline), &schemas.BifrostImageGenerationRequest{
	Provider: schemas.Vertex,
	Model:    "imagen-4.0-generate-001",
	Input: &schemas.ImageGenerationInput{
		Prompt: "A sunset over the mountains",
	},
	Params: &schemas.ImageGenerationParameters{
		Size: schemas.Ptr("1024x1024"),
		N:    schemas.Ptr(2),
	},
})
```
</Tab>
</Tabs>
## Request Conversion
Vertex converts requests based on model type:
- **Gemini Models**: Uses `gemini.ToGeminiImageGenerationRequest()` - same conversion as standard Gemini (see [Gemini Image Generation](/providers/supported-providers/gemini#8-image-generation))
- **Imagen Models**: Uses `gemini.ToImagenImageGenerationRequest()` - Imagen-specific format with size/aspect ratio conversion
All request bodies are converted to `map[string]interface{}` and the `region` field is removed before sending to Vertex API.
## Response Conversion
- **Gemini Models**: Responses converted using `GenerateContentResponse.ToBifrostImageGenerationResponse()` - same as standard Gemini
- **Imagen Models**: Responses converted using `GeminiImagenResponse.ToBifrostImageGenerationResponse()` - Imagen-specific format
## Endpoint Selection
The provider automatically selects the endpoint based on model type:
- **Fine-tuned models**: `/v1beta1/projects/{projectNumber}/locations/{region}/endpoints/{deployment}:generateContent`
- **Imagen models**: `/v1/projects/{projectID}/locations/{region}/publishers/google/models/{model}:predict`
- **Gemini models**: `/v1/projects/{projectID}/locations/{region}/publishers/google/models/{model}:generateContent`
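The three path templates above can be sketched as a single selection function. Two detection rules here are assumptions: fine-tuned deployments are recognized by digit-only identifiers (as described for custom models in List Models), and the `imagen` prefix check stands in for `IsImagenModel()`, whose actual logic is not shown in this guide. The function names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// isDigits reports whether s is non-empty and consists only of ASCII digits.
func isDigits(s string) bool {
	if s == "" {
		return false
	}
	for _, r := range s {
		if r < '0' || r > '9' {
			return false
		}
	}
	return true
}

// imageGenerationPath picks the URL path for an image generation request.
// Illustrative only; detection rules are assumptions described in the lead-in.
func imageGenerationPath(projectID, projectNumber, region, model string) string {
	switch {
	case isDigits(model):
		// Fine-tuned deployment: addressed by project number and endpoint ID.
		return fmt.Sprintf("/v1beta1/projects/%s/locations/%s/endpoints/%s:generateContent", projectNumber, region, model)
	case strings.HasPrefix(model, "imagen"):
		return fmt.Sprintf("/v1/projects/%s/locations/%s/publishers/google/models/%s:predict", projectID, region, model)
	default:
		return fmt.Sprintf("/v1/projects/%s/locations/%s/publishers/google/models/%s:generateContent", projectID, region, model)
	}
}

func main() {
	for _, m := range []string{"imagen-4.0-generate-001", "gemini-2.0-flash", "9876543210"} {
		fmt.Println(imageGenerationPath("my-gcp-project", "1234", "us-central1", m))
	}
}
```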
## Streaming
Image generation streaming is not supported by Vertex AI.
---
# 5. Image Edit
<Warning>Requests use **multipart/form-data**, not JSON.</Warning>
Image Edit is supported for Gemini and Imagen models on Vertex AI. The provider automatically routes to the appropriate format based on the model type.
**Request Parameters**
| Parameter | Type | Required | Notes |
| -------------------- | ------ | -------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `model` | string | ✅ | Model identifier (must be Gemini or Imagen model) |
| `prompt` | string | ✅ | Text description of the edit |
| `image[]` | binary | ✅ | Image file(s) to edit (supports multiple images) |
| `mask` | binary | ❌ | Mask image file |
| `type` | string | ❌ | Edit type: `"inpainting"`, `"outpainting"`, `"inpaint_removal"`, `"bgswap"` (Imagen only) |
| `n` | int | ❌ | Number of images to generate (1-10) |
| `output_format` | string | ❌ | Output format: `"png"`, `"webp"`, `"jpeg"` |
| `output_compression` | int | ❌ | Compression level (0-100%) |
| `seed` | int | ❌ | Seed for reproducibility (via `ExtraParams["seed"]`) |
| `negative_prompt` | string | ❌ | Negative prompt (via `ExtraParams["negativePrompt"]`) |
| `maskMode` | string | ❌ | Mask mode (via `ExtraParams["maskMode"]`, Imagen only): `"MASK_MODE_USER_PROVIDED"`, `"MASK_MODE_BACKGROUND"`, `"MASK_MODE_FOREGROUND"`, `"MASK_MODE_SEMANTIC"` |
| `dilation` | float | ❌ | Mask dilation (via `ExtraParams["dilation"]`, Imagen only): Range [0, 1] |
| `maskClasses` | int[] | ❌ | Mask classes (via `ExtraParams["maskClasses"]`, Imagen only): For `MASK_MODE_SEMANTIC` |
---
**Request Conversion**
Vertex uses the same conversion functions as Gemini:
1. **Gemini Models**: Uses `gemini.ToGeminiImageEditRequest()` - same conversion as standard Gemini (see [Gemini Image Edit](/providers/supported-providers/gemini#9-image-edit))
2. **Imagen Models**: Uses `gemini.ToImagenImageEditRequest()` - Imagen-specific format with edit mode mapping and mask configuration (see [Gemini Image Edit](/providers/supported-providers/gemini#9-image-edit))
**Model Validation**: Only Gemini and Imagen models are supported. Other models return `ConfigurationError`.
**Request Body Processing**:
- All request bodies are converted to `map[string]interface{}` for Vertex API compatibility
- The `region` field is removed before sending to Vertex API
- For Gemini models, unsupported fields are stripped via `stripVertexGeminiUnsupportedFields()` (removes `id` from function_call and function_response)
**Response Conversion**
- **Gemini Models**: Responses converted using `GenerateContentResponse.ToBifrostImageGenerationResponse()` - same as standard Gemini
- **Imagen Models**: Responses converted using `GeminiImagenResponse.ToBifrostImageGenerationResponse()` - Imagen-specific format
**Endpoint Selection**
The provider automatically selects the endpoint based on model type:
- **Gemini models**: `/v1/projects/{projectID}/locations/{region}/publishers/google/models/{model}:generateContent`
- **Imagen models**: `/v1/projects/{projectID}/locations/{region}/publishers/google/models/{model}:predict`
**Streaming**
Image edit streaming is not supported by Vertex AI.
**Image Variation**
Image variation is not supported by Vertex AI.
---
# 6. List Models
## Request Parameters
None required. Bifrost uses the `project_id` and `region` from the key configuration.
## Response Conversion
Lists models available in the specified project and region with metadata and deployment information:
```json
{
  "models": [
    {
      "name": "projects/{project}/locations/{region}/models/gemini-2.0-flash",
      "display_name": "Gemini 2.0 Flash",
      "description": "Fast multimodal model",
      "version_id": "1",
      "version_aliases": ["latest", "stable"],
      "capabilities": [...],
      "deployed_models": [...]
    }
  ],
  "next_page_token": "..."
}
```
## Custom vs Non-Custom Models
<Warning>
**Important**: Vertex AI's List Models API **only returns custom fine-tuned
models** that have been deployed to your project. It does NOT return standard
foundation models (Gemini, Claude, etc.).
</Warning>
To provide a complete model listing experience, Bifrost performs **multi-pass model discovery**:
### Three-Pass Model Discovery
1. **First Pass - Custom Models from API Response**
- Queries Vertex AI's List Models API
- Returns only custom fine-tuned models deployed to your project
- Custom models are identified by having deployment values that contain only digits
- Example: `"deployment": "1234567890"`
2. **Second Pass - Non-Custom Models from Aliases**
- Adds standard foundation models from your `aliases` configuration
- Non-custom models have alphanumeric deployment values (e.g., `gemini-pro`, `claude-3-5-sonnet`)
- Filters by the key-level `models` allowlist, if specified
- Example: `"deployment": "gemini-2.0-flash"`
3. **Third Pass - Allowed Models Not in Aliases**
- Adds models specified in `models` that weren't in the `aliases` map
- Ensures all explicitly allowed models appear in the list
- Uses the model name itself as the deployment value
- Skips digit-only model IDs (reserved for custom models)
### Model Filtering Logic
- **If `models` is empty and no aliases are configured**: No models are returned
- **If `models` is empty but aliases are configured**: Only aliased models are returned
- **If `models` is `["*"]`**: All models from all three passes are included (unrestricted)
- **If `models` is non-empty**: Only models/aliases whose request names appear in `models` are included
- **Duplicate Prevention**: Each model ID is tracked to prevent duplicates across passes
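The second and third passes, together with the filtering rules, can be sketched as below. This is a simplified reading of the description, not Bifrost's `models.go`: the first pass (the API call) is omitted, alias names are sorted only to make the output deterministic, and the function names are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// isCustomDeployment reports whether a deployment value contains only digits,
// which this guide uses to identify custom fine-tuned models.
func isCustomDeployment(v string) bool {
	if v == "" {
		return false
	}
	for _, r := range v {
		if r < '0' || r > '9' {
			return false
		}
	}
	return true
}

// configuredModels returns the model names contributed by passes two and three.
func configuredModels(aliases map[string]string, models []string) []string {
	allowed := make(map[string]bool, len(models))
	unrestricted := false
	for _, m := range models {
		if m == "*" {
			unrestricted = true
			continue
		}
		allowed[m] = true
	}
	seen := map[string]bool{} // duplicate prevention across passes
	var out []string

	// Second pass: non-custom aliases, filtered by the allowlist when one is set.
	names := make([]string, 0, len(aliases))
	for name := range aliases {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		if isCustomDeployment(aliases[name]) {
			continue // custom models belong to the first pass
		}
		if len(models) > 0 && !unrestricted && !allowed[name] {
			continue
		}
		if !seen[name] {
			seen[name] = true
			out = append(out, name)
		}
	}

	// Third pass: allowed models that were not in aliases; digit-only IDs are skipped.
	for _, m := range models {
		if m == "*" || isCustomDeployment(m) || seen[m] {
			continue
		}
		seen[m] = true
		out = append(out, m)
	}
	return out
}

func main() {
	aliases := map[string]string{
		"gemini-2.0-flash":  "gemini-2.0-flash",
		"claude-3-5-sonnet": "claude-3-5-sonnet-v2@20241022",
		"gemini-1.5-pro":    "gemini-1.5-pro",
	}
	fmt.Println(configuredModels(aliases, []string{"gemini-2.0-flash", "claude-3-5-sonnet"}))
}
```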
### Model Name Formatting
Non-custom models from aliases and allowed models are automatically formatted for display:
- `gemini-pro` → "Gemini Pro"
- `claude-3-5-sonnet` → "Claude 3 5 Sonnet"
- `gemini_2_flash` → "Gemini 2 Flash"
Formatting uses title case and converts hyphens/underscores to spaces.
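A minimal sketch of this formatting rule follows. It assumes "title case" means capitalizing the first character of each hyphen- or underscore-separated word and leaving the rest unchanged; `displayName` is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// displayName splits a model ID on hyphens and underscores and capitalizes each word.
func displayName(id string) string {
	words := strings.FieldsFunc(id, func(r rune) bool { return r == '-' || r == '_' })
	for i, w := range words {
		runes := []rune(w) // FieldsFunc never yields empty fields, so runes[0] is safe
		runes[0] = unicode.ToUpper(runes[0])
		words[i] = string(runes)
	}
	return strings.Join(words, " ")
}

func main() {
	for _, id := range []string{"gemini-pro", "claude-3-5-sonnet", "gemini_2_flash"} {
		fmt.Printf("%s -> %q\n", id, displayName(id))
	}
}
```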
### Example Configuration
<Tabs>
<Tab title="With Custom Models Only">
```json
{
  "aliases": {
    "my-gemini-ft": "1234567890",
    "my-claude-ft": "9876543210"
  },
  "vertex_key_config": {
    "project_id": "my-project",
    "region": "us-central1"
  }
}
```
This returns only your custom fine-tuned models from the API.
</Tab>
<Tab title="With Foundation Models">
```json
{
  "aliases": {
    "gemini-2.0-flash": "gemini-2.0-flash",
    "claude-3-5-sonnet": "claude-3-5-sonnet-v2@20241022"
  },
  "vertex_key_config": {
    "project_id": "my-project",
    "region": "us-central1"
  }
}
```
This returns both custom models AND foundation models from aliases.
</Tab>
<Tab title="With Allowed Models Filter">
```json
{
  "models": ["gemini-2.0-flash", "claude-3-5-sonnet"],
  "aliases": {
    "gemini-2.0-flash": "gemini-2.0-flash",
    "claude-3-5-sonnet": "claude-3-5-sonnet-v2@20241022",
    "gemini-1.5-pro": "gemini-1.5-pro"
  },
  "vertex_key_config": {
    "project_id": "my-project",
    "region": "us-central1"
  }
}
```
Only returns `gemini-2.0-flash` and `claude-3-5-sonnet`, excluding `gemini-1.5-pro`.
</Tab>
</Tabs>
### Pagination
Model listing is paginated automatically. If more than 100 models exist, `next_page_token` will be present. Bifrost handles pagination internally.
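Although Bifrost handles this internally, the token-following loop is easy to picture. The sketch below is a generic illustration: `modelPage`, `listAllModels`, and the injected `fetch` function are hypothetical stand-ins for one call to the list endpoint, not Bifrost types.

```go
package main

import "fmt"

// modelPage is one page of results plus the token for the next page (empty when done).
type modelPage struct {
	Models        []string
	NextPageToken string
}

// listAllModels calls fetch repeatedly, following next_page_token until it is empty.
func listAllModels(fetch func(pageToken string) (modelPage, error)) ([]string, error) {
	var all []string
	token := ""
	for {
		page, err := fetch(token)
		if err != nil {
			return nil, err
		}
		all = append(all, page.Models...)
		if page.NextPageToken == "" {
			return all, nil
		}
		token = page.NextPageToken
	}
}

func main() {
	// Fake two-page backend keyed by page token.
	pages := map[string]modelPage{
		"":      {Models: []string{"model-a", "model-b"}, NextPageToken: "page2"},
		"page2": {Models: []string{"model-c"}},
	}
	models, err := listAllModels(func(token string) (modelPage, error) {
		return pages[token], nil
	})
	fmt.Println(models, err)
}
```

A production loop would normally also bound the number of pages, since a backend that keeps returning a token would otherwise never terminate.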
---
## Caveats
<Accordion title="Project ID and Region Required">
- **Severity**: High
- **Behavior**: Both `project_id` and `region` are required for all operations
- **Impact**: Requests fail without a valid GCP project/region configuration
- **Code**: `vertex.go:127-138`
</Accordion>
<Accordion title="OAuth2 Token Management">
- **Severity**: Medium
- **Behavior**: Tokens are cached and automatically refreshed when expired
- **Impact**: The first request is slightly slower due to authentication; the token is cached for subsequent requests
- **Code**: `vertex.go:34-55`
</Accordion>
<Accordion title="Anthropic Model Detection">
- **Severity**: Medium
- **Behavior**: Automatic detection of Anthropic vs Gemini models
- **Impact**: Different conversion logic is applied transparently
- **Code**: `vertex.go` chat/responses endpoints
</Accordion>
<Accordion title="Model-Specific Responses API Handling">
- **Severity**: Low
- **Behavior**: The Responses API automatically routes to the Anthropic or Gemini implementation based on the model
- **Impact**: Different conversion logic is applied transparently per model
- **Code**: `vertex.go:836-1080`
</Accordion>
<Accordion title="Anthropic Version Lock">
- **Severity**: Low
- **Behavior**: `anthropic_version` is always set to `vertex-2023-10-16` for Claude
- **Impact**: The Anthropic version cannot be overridden for Claude on Vertex
- **Code**: `utils.go:33, 71`
</Accordion>
<Accordion title="Embeddings Precision Preservation">
- **Severity**: Low
- **Behavior**: Vertex returns float64 embeddings, and Bifrost preserves that precision in normalized embedding responses
- **Impact**: No precision loss in the `/v1/embeddings` response path
- **Code**: `embedding.go:84-91`
</Accordion>
<Accordion title="List Models API Returns Only Custom Models">
- **Severity**: High
- **Behavior**: Vertex AI's List Models API only returns custom fine-tuned models, NOT foundation models
- **Impact**: Bifrost performs three-pass discovery to include foundation models from aliases and the key-level `models` allowlist
- **Why**: This is a Vertex AI API limitation; foundation models must be explicitly configured
- **Code**: `models.go:76-217`
</Accordion>
---
## Configuration
**HTTP Settings**: OAuth2 authentication with automatic token refresh | Region-specific endpoints | Max Connections 5000 | Max Idle 60 seconds
**Scope**: `https://www.googleapis.com/auth/cloud-platform`
**Endpoint Format**: `https://{region}-aiplatform.googleapis.com/v1/projects/{project}/locations/{region}/{resource}`
**Note**: For `global` region, endpoint is `https://aiplatform.googleapis.com/v1/projects/{project}/locations/global/{resource}`
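The endpoint format and the `global` exception can be expressed in a few lines. This is a sketch of the rule as stated here; `vertexBaseURL` is a hypothetical name and the resource suffix is left to the caller.

```go
package main

import "fmt"

// vertexBaseURL builds the project/location prefix of a Vertex AI endpoint.
// The global region uses the unprefixed host; every other region is prefixed.
func vertexBaseURL(projectID, region string) string {
	host := region + "-aiplatform.googleapis.com"
	if region == "global" {
		host = "aiplatform.googleapis.com"
	}
	return fmt.Sprintf("https://%s/v1/projects/%s/locations/%s", host, projectID, region)
}

func main() {
	fmt.Println(vertexBaseURL("my-gcp-project", "us-central1"))
	fmt.Println(vertexBaseURL("my-gcp-project", "global"))
}
```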
## Video Generation
Vertex AI routes video generation through Gemini's Veo models using the `predictLongRunning` endpoint. All parameters are identical to [Gemini Video Generation](/providers/supported-providers/gemini#video-generation).
<Note>
Only Veo models are supported (e.g., `veo-2.0-generate-001`). Passing a
non-Veo model name returns a configuration error.
</Note>
**Supported Operations**
| Operation | Supported | Notes |
| --------- | --------- | ----------------------------- |
| Generate | ✅ | `POST /v1/videos` |
| Retrieve | ✅ | `GET /v1/videos/{id}` |
| Download | ✅ | `GET /v1/videos/{id}/content` |
| Delete | ❌ | Not supported |
| List | ❌ | Not supported |
| Remix | ❌ | Not supported |