---
title: "Vertex AI"
description: "Google Vertex AI API conversion guide - multi-model support, OAuth2 authentication, project/region configuration"
icon: "v"
---
|
|
|
|
## Overview
|
|
|
|
Vertex AI is Google's unified ML platform providing access to Google's Gemini models, Anthropic Claude models, and other third-party LLMs through a single API. Bifrost performs conversions including:
|
|
|
|
- **Multi-model support** - Unified interface for Gemini, Anthropic, and third-party models
- **OAuth2 authentication** - Service account credentials with automatic token refresh
- **Project and region management** - Automatic endpoint construction from GCP project/region
- **Model routing** - Automatic provider detection (Gemini vs Anthropic) based on model name
- **Request conversion** - Conversion to underlying provider format (Gemini or Anthropic)
- **Embeddings support** - Vector generation with task type and truncation options
- **Model discovery** - Paginated model listing with deployment information
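
As a rough illustration of name-based model routing, the sketch below classifies a model name as Anthropic or Gemini. The `detectFamily` helper and its `claude` substring check are assumptions made for this example; Bifrost's actual detection logic may differ and also has to account for other model families.

```go
package main

import (
	"fmt"
	"strings"
)

// detectFamily is a hypothetical helper (not part of Bifrost): it guesses
// the underlying request format from the model name alone.
func detectFamily(model string) string {
	name := strings.ToLower(model)
	// Drop the optional "vertex/" provider prefix used in gateway requests.
	name = strings.TrimPrefix(name, "vertex/")
	if strings.Contains(name, "claude") {
		return "anthropic"
	}
	// Everything else is treated as Gemini in this simplified sketch.
	return "gemini"
}

func main() {
	for _, m := range []string{"vertex/claude-3-5-sonnet", "gemini-2.0-flash"} {
		fmt.Printf("%s -> %s\n", m, detectFamily(m))
	}
}
```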
|
|
|
|
### Supported Operations
|
|
|
|
| Operation            | Non-Streaming | Streaming | Endpoint                                  |
| -------------------- | ------------- | --------- | ----------------------------------------- |
| Chat Completions     | ✅            | ✅        | `/generate`                               |
| Responses API        | ✅            | ✅        | `/messages`                               |
| Embeddings           | ✅            | -         | `/embeddings`                             |
| Image Generation     | ✅            | -         | `/generateContent` or `/predict` (Imagen) |
| Image Edit           | ✅            | -         | `/generateContent` or `/predict` (Imagen) |
| Video Generation     | ✅            | -         | `/predictLongRunning` (Veo models only)   |
| Image Variation      | ❌            | -         | Not supported                             |
| List Models          | ✅            | -         | `/models`                                 |
| Text Completions     | ❌            | ❌        | -                                         |
| Speech (TTS)         | ❌            | ❌        | -                                         |
| Transcriptions (STT) | ❌            | ❌        | -                                         |
| Files                | ❌            | ❌        | -                                         |
| Batch                | ❌            | ❌        | -                                         |
|
|
|
|
<Note>
|
|
**Unsupported Operations** (❌): Text Completions, Speech, Transcriptions, Files, and Batch are not supported by Vertex AI. These return `UnsupportedOperationError`.
|
|
|
|
**Vertex-specific**: Endpoints vary by model type. The Responses API is available for both Gemini and Anthropic models.
|
|
|
|
</Note>
|
|
|
|
---
|
|
|
|
## Setup & Configuration
|
|
|
|
Vertex AI requires Google Cloud project configuration and authentication credentials. Three authentication methods are supported.
|
|
|
|
<Note>
|
|
The `aliases` field (mapping model names to fine-tuned model IDs or endpoint identifiers) requires **v1.5.0-prerelease2 or later**. On v1.4.x, use `deployments` inside `vertex_key_config` instead; see the [v1.5.0 Migration Guide](/migration-guides/v1.5.0#breaking-change-9-provider-deployments-removed-migrate-to-aliases) for details.
|
|
</Note>
|
|
|
|
### 1. Service Account JSON (Recommended for Production)
|
|
|
|
Provide a credential JSON string in `auth_credentials`. The JSON must contain a `type` field. Supported types: `service_account` (most common), `impersonated_service_account`, `authorized_user`, `external_account`, `external_account_authorized_user`.
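
A pre-flight check for the `type` requirement might look like the following sketch. `checkCredentialType` is a hypothetical helper written for this guide, not a Bifrost function; it only confirms that the JSON parses and that `type` is one of the values listed above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// supportedTypes mirrors the credential types listed in this section.
var supportedTypes = map[string]bool{
	"service_account":                  true,
	"impersonated_service_account":     true,
	"authorized_user":                  true,
	"external_account":                 true,
	"external_account_authorized_user": true,
}

// checkCredentialType is a hypothetical helper: it returns the credential
// type, or an error if the JSON is invalid or the type is missing/unknown.
func checkCredentialType(raw string) (string, error) {
	var probe struct {
		Type string `json:"type"`
	}
	if err := json.Unmarshal([]byte(raw), &probe); err != nil {
		return "", fmt.Errorf("credentials are not valid JSON: %w", err)
	}
	if probe.Type == "" {
		return "", fmt.Errorf("credentials JSON has no \"type\" field")
	}
	if !supportedTypes[probe.Type] {
		return "", fmt.Errorf("unsupported credential type %q", probe.Type)
	}
	return probe.Type, nil
}

func main() {
	t, err := checkCredentialType(`{"type": "service_account", "project_id": "my-gcp-project"}`)
	fmt.Println(t, err)
}
```

Running a check like this before saving a key can surface a pasted-in partial JSON early, rather than at the first request.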
|
|
|
|
<Tabs>
|
|
|
|
<Tab title="Web UI">
|
|
|
|
<Frame>
|
|
<img
|
|
src="/media/ui-vertex-service-account-auth-setup.png"
|
|
alt="Google Vertex AI Service Account (JSON) authentication setup in the Bifrost Web UI showing Project ID, Region, and Auth Credentials fields"
|
|
/>
|
|
</Frame>
|
|
|
|
1. Navigate to **"Model Providers"** → **"Configurations"** → **"Google Vertex"**
2. Click **"Add Key"** (or edit an existing key)
3. Under **Authentication Method**, select **"Service Account (JSON)"**
4. Set **Project ID**: Your Google Cloud project ID
5. Set **Project Number** (Required only for fine-tuned models): Your GCP project number; leave blank for standard models
6. Set **Region**: e.g., `us-central1`
7. Set **Auth Credentials**: Paste your service account JSON or reference an env var (e.g., `env.VERTEX_CREDENTIALS`)
8. Configure **Aliases**: Map model names to fine-tuned model IDs (if using fine-tuned models)
9. Save
|
|
|
|
</Tab>
|
|
|
|
<Tab title="API">
|
|
|
|
```bash
# Step 1: Create the provider
curl -X POST http://localhost:8080/api/providers \
  -H "Content-Type: application/json" \
  -d '{"provider": "vertex"}'

# Step 2: Create a key (Service Account JSON)
curl -X POST http://localhost:8080/api/providers/vertex/keys \
  -H "Content-Type: application/json" \
  -d '{
    "name": "vertex-sa-key",
    "value": "",
    "models": ["*"],
    "weight": 1.0,
    "vertex_key_config": {
      "project_id": "env.VERTEX_PROJECT_ID",
      "region": "us-central1",
      "auth_credentials": "env.VERTEX_CREDENTIALS"
    }
  }'
```
|
|
|
|
<Note>
|
|
**On v1.4.x**, two differences apply:

- Pass `keys` directly in the `POST /api/providers` body; there is no separate `/api/providers/{provider}/keys` endpoint.
- Use `deployments` inside `vertex_key_config` instead of the top-level `aliases` field for fine-tuned model mappings.
|
|
</Note>
|
|
|
|
</Tab>
|
|
|
|
<Tab title="config.json">
|
|
|
|
```json
{
  "providers": {
    "vertex": {
      "keys": [
        {
          "name": "vertex-sa-key",
          "value": "",
          "models": ["*"],
          "weight": 1.0,
          "vertex_key_config": {
            "project_id": "env.VERTEX_PROJECT_ID",
            "region": "us-central1",
            "auth_credentials": "env.VERTEX_CREDENTIALS"
          }
        }
      ]
    }
  }
}
```
|
|
|
|
<Note>
|
|
On **v1.4.x**, use `deployments` inside `vertex_key_config` instead of the top-level `aliases` field for fine-tuned model mappings.
|
|
</Note>
|
|
|
|
</Tab>
|
|
|
|
<Tab title="Go SDK">
|
|
|
|
```go
func (a *MyAccount) GetKeysForProvider(ctx *context.Context, provider schemas.ModelProvider) ([]schemas.Key, error) {
	switch provider {
	case schemas.Vertex:
		return []schemas.Key{
			{
				Value:  schemas.EnvVar{}, // Leave empty when using service account credentials
				Models: []string{"*"},
				Weight: 1.0,
				VertexKeyConfig: &schemas.VertexKeyConfig{
					ProjectID:       *schemas.NewEnvVar("env.VERTEX_PROJECT_ID"),
					Region:          *schemas.NewEnvVar("us-central1"),
					AuthCredentials: *schemas.NewEnvVar("env.VERTEX_CREDENTIALS"), // full service account JSON
				},
			},
		}, nil
	}
	return nil, fmt.Errorf("provider %s not supported", provider)
}
```
|
|
|
|
</Tab>
|
|
|
|
</Tabs>
|
|
|
|
### 2. Application Default Credentials
|
|
|
|
Leave `auth_credentials` empty. Bifrost calls `google.FindDefaultCredentials()` — Google's ADC library — which resolves credentials in this order:
|
|
|
|
1. `GOOGLE_APPLICATION_CREDENTIALS` env var (path to a JSON credential file)
|
|
2. Application default credential file (`~/.config/gcloud/application_default_credentials.json`, written by `gcloud auth application-default login`)
|
|
3. GCE/GKE/Cloud Run/App Engine metadata server (attached service account or Workload Identity)
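
The lookup order above can be sketched with the standard library alone. This is only an illustration of the ordering: `adcSource` is a hypothetical helper, and the real `google.FindDefaultCredentials()` also parses and validates the credentials and handles platform-specific paths.

```go
package main

import (
	"fmt"
	"os"
	"path"
)

// adcSource is a hypothetical helper that reports which of the three ADC
// locations would be consulted first. getenv and exists are injected so the
// order can be exercised without touching the real environment.
func adcSource(getenv func(string) string, exists func(string) bool, home string) string {
	// 1. Explicit credential file via environment variable.
	if p := getenv("GOOGLE_APPLICATION_CREDENTIALS"); p != "" {
		return "env:" + p
	}
	// 2. File written by `gcloud auth application-default login`.
	wellKnown := path.Join(home, ".config", "gcloud", "application_default_credentials.json")
	if exists(wellKnown) {
		return "file:" + wellKnown
	}
	// 3. Fall back to the metadata server (GCE, GKE, Cloud Run, App Engine).
	return "metadata-server"
}

func main() {
	home, _ := os.UserHomeDir()
	exists := func(p string) bool {
		_, err := os.Stat(p)
		return err == nil
	}
	fmt.Println(adcSource(os.Getenv, exists, home))
}
```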
|
|
|
|
<Tabs>
|
|
|
|
<Tab title="Web UI">
|
|
|
|
<Frame>
|
|
<img
|
|
src="/media/ui-vertex-default-service-account-auth-setup.png"
|
|
alt="Google Vertex AI Application Default Credentials setup in the Bifrost Web UI showing Project ID and Region fields with no credential inputs"
|
|
/>
|
|
</Frame>
|
|
|
|
1. Navigate to **"Model Providers"** → **"Configurations"** → **"Google Vertex"**
2. Click **"Add Key"** (or edit an existing key)
3. Under **Authentication Method**, select **"Service Account (Attached)"**
4. Set **Project ID**: Your Google Cloud project ID
5. Set **Project Number** (Required only for fine-tuned models): Your GCP project number; leave blank for standard models
6. Set **Region**: e.g., `us-central1`
7. Configure **Aliases** if needed
8. Save
|
|
|
|
Ensure `GOOGLE_APPLICATION_CREDENTIALS` is set in your environment, or that Workload Identity / gcloud is configured.
|
|
|
|
</Tab>
|
|
|
|
<Tab title="API">
|
|
|
|
```bash
# Step 1: Create the provider
curl -X POST http://localhost:8080/api/providers \
  -H "Content-Type: application/json" \
  -d '{"provider": "vertex"}'

# Step 2: Create a key (Application Default Credentials)
curl -X POST http://localhost:8080/api/providers/vertex/keys \
  -H "Content-Type: application/json" \
  -d '{
    "name": "vertex-adc-key",
    "value": "",
    "models": ["*"],
    "weight": 1.0,
    "vertex_key_config": {
      "project_id": "env.VERTEX_PROJECT_ID",
      "region": "us-central1",
      "auth_credentials": ""
    }
  }'
```
|
|
|
|
<Note>
|
|
**On v1.4.x**, pass `keys` directly in the `POST /api/providers` body; there is no separate `/api/providers/{provider}/keys` endpoint.
|
|
</Note>
|
|
|
|
</Tab>
|
|
|
|
<Tab title="config.json">
|
|
|
|
```json
{
  "providers": {
    "vertex": {
      "keys": [
        {
          "name": "vertex-adc-key",
          "value": "",
          "models": ["*"],
          "weight": 1.0,
          "vertex_key_config": {
            "project_id": "env.VERTEX_PROJECT_ID",
            "region": "us-central1",
            "auth_credentials": ""
          }
        }
      ]
    }
  }
}
```
|
|
|
|
</Tab>
|
|
|
|
<Tab title="Go SDK">
|
|
|
|
```go
func (a *MyAccount) GetKeysForProvider(ctx *context.Context, provider schemas.ModelProvider) ([]schemas.Key, error) {
	switch provider {
	case schemas.Vertex:
		return []schemas.Key{
			{
				Value:  schemas.EnvVar{},
				Models: []string{"*"},
				Weight: 1.0,
				VertexKeyConfig: &schemas.VertexKeyConfig{
					ProjectID: *schemas.NewEnvVar("env.VERTEX_PROJECT_ID"),
					Region:    *schemas.NewEnvVar("us-central1"),
					// Leave AuthCredentials empty to use Application Default Credentials
				},
			},
		}, nil
	}
	return nil, fmt.Errorf("provider %s not supported", provider)
}
```
|
|
|
|
</Tab>
|
|
|
|
</Tabs>
|
|
|
|
### 3. API Key (Gemini and Fine-Tuned Models Only)
|
|
|
|
Set `value` to your Vertex API key. API key authentication is supported only for Gemini models and fine-tuned Gemini models. For Anthropic models on Vertex, use Service Account or Application Default Credentials.
|
|
|
|
<Tabs>
|
|
|
|
<Tab title="Web UI">
|
|
|
|
<Frame>
|
|
<img
|
|
src="/media/ui-vertex-api-key-auth-setup.png"
|
|
alt="Google Vertex AI API Key authentication setup in the Bifrost Web UI showing API Key, Project ID, Region, and Project Number fields"
|
|
/>
|
|
</Frame>
|
|
|
|
1. Navigate to **"Model Providers"** → **"Configurations"** → **"Google Vertex"**
2. Click **"Add Key"** (or edit an existing key)
3. Under **Authentication Method**, select **"API Key"**
4. Set **API Key**: Your Vertex AI API key
5. Set **Project ID**: Your Google Cloud project ID
6. Set **Project Number** (Required only for fine-tuned models): Your GCP project number; leave blank for standard models
7. Set **Region**: e.g., `us-central1`
8. Configure **Aliases**: Map short names to fine-tuned model IDs (e.g., `my-model` → `123456789`)
9. Save
|
|
|
|
</Tab>
|
|
|
|
<Tab title="API">
|
|
|
|
```bash
# Step 1: Create the provider
curl -X POST http://localhost:8080/api/providers \
  -H "Content-Type: application/json" \
  -d '{"provider": "vertex"}'

# Step 2: Create a key (API Key: Gemini and fine-tuned models)
curl -X POST http://localhost:8080/api/providers/vertex/keys \
  -H "Content-Type: application/json" \
  -d '{
    "name": "vertex-api-key",
    "value": "env.VERTEX_API_KEY",
    "models": ["gemini-pro", "gemini-2.0-flash", "my-fine-tuned-model"],
    "weight": 1.0,
    "aliases": {
      "my-fine-tuned-model": "123456789"
    },
    "vertex_key_config": {
      "project_id": "env.VERTEX_PROJECT_ID",
      "project_number": "env.VERTEX_PROJECT_NUMBER",
      "region": "us-central1"
    }
  }'
```
|
|
|
|
<Note>
|
|
**On v1.4.x**, two differences apply:

- Pass `keys` directly in the `POST /api/providers` body; there is no separate `/api/providers/{provider}/keys` endpoint.
- Replace the top-level `aliases` with `"deployments"` inside `vertex_key_config`:

  ```json
  "vertex_key_config": {
    "project_id": "env.VERTEX_PROJECT_ID",
    "region": "us-central1",
    "deployments": {
      "my-fine-tuned-model": "123456789"
    }
  }
  ```
|
|
</Note>
|
|
|
|
</Tab>
|
|
|
|
<Tab title="config.json">
|
|
|
|
```json
{
  "providers": {
    "vertex": {
      "keys": [
        {
          "name": "vertex-api-key",
          "value": "env.VERTEX_API_KEY",
          "models": ["gemini-pro", "gemini-2.0-flash", "my-fine-tuned-model"],
          "weight": 1.0,
          "aliases": {
            "my-fine-tuned-model": "123456789"
          },
          "vertex_key_config": {
            "project_id": "env.VERTEX_PROJECT_ID",
            "project_number": "env.VERTEX_PROJECT_NUMBER",
            "region": "us-central1"
          }
        }
      ]
    }
  }
}
```
|
|
|
|
<Note>
|
|
On **v1.4.x**, use `deployments` inside `vertex_key_config` instead of the top-level `aliases` field.
|
|
</Note>
|
|
|
|
</Tab>
|
|
|
|
<Tab title="Go SDK">
|
|
|
|
```go
func (a *MyAccount) GetKeysForProvider(ctx *context.Context, provider schemas.ModelProvider) ([]schemas.Key, error) {
	switch provider {
	case schemas.Vertex:
		return []schemas.Key{
			{
				Value:  *schemas.NewEnvVar("env.VERTEX_API_KEY"), // only when using Gemini or fine-tuned models
				Models: []string{"gemini-pro", "gemini-2.0-flash", "my-fine-tuned-model"},
				Weight: 1.0,
				Aliases: schemas.KeyAliases{
					"my-fine-tuned-model": "123456789",
				},
				VertexKeyConfig: &schemas.VertexKeyConfig{
					ProjectID:     *schemas.NewEnvVar("env.VERTEX_PROJECT_ID"),
					ProjectNumber: *schemas.NewEnvVar("env.VERTEX_PROJECT_NUMBER"), // required for fine-tuned models
					Region:        *schemas.NewEnvVar("us-central1"),
				},
			},
		}, nil
	}
	return nil, fmt.Errorf("provider %s not supported", provider)
}
```
|
|
|
|
</Tab>
|
|
|
|
</Tabs>
|
|
|
|
<Note>
|
|
Vertex AI support for fine-tuned models is currently in beta. Requests to non-Gemini fine-tuned models may fail, so please test and report any issues.
|
|
</Note>
|
|
|
|
**`vertex_key_config` fields:**
|
|
|
|
| Field              | Required | Description                                            |
| ------------------ | -------- | ------------------------------------------------------ |
| `project_id`       | Yes      | Google Cloud project ID                                |
| `region`           | Yes      | GCP region (e.g., `us-central1`, `eu-west1`, `global`) |
| `auth_credentials` | No       | Service account JSON string (leave empty for ADC)      |
| `project_number`   | No       | GCP project number (required for fine-tuned models)    |
|
|
|
|
**Key-level fields:**
|
|
|
|
| Field     | Required | Description                                                                               |
| --------- | -------- | ----------------------------------------------------------------------------------------- |
| `value`   | No       | Vertex API key (Gemini and fine-tuned models only; leave empty for Service Account / ADC) |
| `aliases` | No       | Map model names to fine-tuned model IDs or endpoint identifiers (v1.5.0-prerelease2+)     |
| `models`  | Yes      | Models this key can serve; use `["*"]` to allow all                                       |
|
|
|
|
---
|
|
|
|
## Beta Headers
|
|
|
|
For Anthropic models on Vertex AI, Bifrost validates `anthropic-beta` headers and drops unsupported headers from the request.
|
|
|
|
**Supported**: `computer-use-*`, `compact-*`, `context-management-*`, `interleaved-thinking-*`, `context-1m-*`
|
|
|
|
**Not supported**: `structured-outputs-*`, `advanced-tool-use-*`, `mcp-client-*`, `prompt-caching-scope-*`, `files-api-*`, `skills-*`, `fast-mode-*`, `redact-thinking-*`
|
|
|
|
You can override these defaults per provider via the **Beta Headers** tab in provider configuration or via [`beta_header_overrides`](/quickstart/gateway/provider-configuration#beta-header-overrides). See the full support matrix in the [Anthropic provider docs](/providers/supported-providers/anthropic#beta-headers).
|
|
|
|
<Frame>
|
|
<img
|
|
src="/media/vertex-ai-setting-anthropic-beta-headers.png"
|
|
alt="Vertex AI Beta Headers configuration tab showing supported and unsupported Anthropic beta features with override options"
|
|
/>
|
|
</Frame>
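
To make the default filtering concrete, here is a minimal sketch that drops unsupported values from a comma-separated `anthropic-beta` header. It assumes the `*` patterns above are plain prefix matches and ignores per-provider overrides; `filterBetaHeader` is a hypothetical helper, and the dated header values in `main` are examples only.

```go
package main

import (
	"fmt"
	"strings"
)

// supportedPrefixes lists the beta families documented as supported on Vertex.
var supportedPrefixes = []string{
	"computer-use-",
	"compact-",
	"context-management-",
	"interleaved-thinking-",
	"context-1m-",
}

// filterBetaHeader is a hypothetical helper: it keeps only the header values
// whose name starts with a supported prefix and drops everything else.
func filterBetaHeader(header string) string {
	var kept []string
	for _, v := range strings.Split(header, ",") {
		v = strings.TrimSpace(v)
		for _, p := range supportedPrefixes {
			if strings.HasPrefix(v, p) {
				kept = append(kept, v)
				break
			}
		}
	}
	return strings.Join(kept, ",")
}

func main() {
	// Example values only; the second one belongs to an unsupported family.
	fmt.Println(filterBetaHeader("computer-use-2025-01-24, files-api-2025-04-14"))
}
```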
|
|
|
|
---
|
|
|
|
# 1. Chat Completions
|
|
|
|
## Request Parameters
|
|
|
|
### Core Parameter Mapping
|
|
|
|
| Parameter        | Vertex Handling           | Notes                                                |
| ---------------- | ------------------------- | ---------------------------------------------------- |
| `model`          | Maps to Vertex model ID   | Region-specific endpoint constructed automatically   |
| All other params | Model-specific conversion | Converted per underlying provider (Gemini/Anthropic) |
|
|
|
|
### Key Configuration
|
|
|
|
The Vertex key configuration identifies the Google Cloud project and region, plus optional credentials:
|
|
|
|
```json
{
  "vertex_key_config": {
    "project_id": "my-gcp-project",
    "region": "us-central1",
    "auth_credentials": "{service-account-json}"
  }
}
```
|
|
|
|
**Configuration Details**:
|
|
|
|
- `project_id` - GCP project ID (required)
- `region` - GCP region for API endpoints (required)
  - Examples: `us-central1`, `us-west1`, `eu-west1`, `global`
- `auth_credentials` - Service account JSON credentials (optional if using default credentials)
|
|
|
|
### Authentication Methods
|
|
|
|
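1. **Service Account JSON** (recommended for production)

   ```json
   { "auth_credentials": "{full-service-account-json}" }
   ```

2. **Application Default Credentials** (local development, or workloads with an attached service account)
   - Leave `auth_credentials` empty
   - Credentials are resolved from the `GOOGLE_APPLICATION_CREDENTIALS` environment variable, the gcloud application default credential file, or the metadata server

3. **API Key** (Gemini and fine-tuned Gemini models only)
   - Set the key-level `value` to your Vertex API key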
|
|
|
|
## Gemini Models
|
|
|
|
When using Google's Gemini models, Bifrost converts requests to Gemini's API format.
|
|
|
|
### Parameter Mapping for Gemini
|
|
|
|
All Gemini-compatible parameters are supported. Special handling includes:
|
|
|
|
- **System prompts**: Converted to Gemini's system message format
|
|
- **Tool usage**: Mapped to Gemini's function calling format
|
|
- **Streaming**: Uses Gemini's streaming protocol
|
|
|
|
Refer to the [Gemini documentation](/providers/supported-providers/gemini) for conversion details.
|
|
|
|
## Anthropic Models (Claude)
|
|
|
|
When using Anthropic models through Vertex AI, Bifrost converts requests to Anthropic's message format.
|
|
|
|
### Parameter Mapping for Anthropic
|
|
|
|
All Anthropic-standard parameters are supported:
|
|
|
|
- **Reasoning/Thinking**: `reasoning` parameters converted to `thinking` structure
|
|
- **System messages**: Extracted and placed in separate `system` field
|
|
- **Tool message grouping**: Consecutive tool messages merged
|
|
- **API version**: Automatically set to `vertex-2023-10-16` for Anthropic models
|
|
|
|
Refer to the [Anthropic documentation](/providers/supported-providers/anthropic) for conversion details.
|
|
|
|
### Special Notes for Vertex + Anthropic
|
|
|
|
- Responses API uses special `/v1/messages` endpoint
|
|
- `anthropic_version` automatically set to `vertex-2023-10-16`
|
|
- Minimum reasoning budget: 1024 tokens
|
|
- Model field removed from request (Vertex uses different identification)
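
Two of the notes above (dropping `model` and setting `anthropic_version`) can be sketched as a body rewrite. `adaptForVertex` is a hypothetical helper for illustration, not Bifrost's implementation, and it deliberately leaves the reasoning budget untouched because this page does not say whether low budgets are raised or rejected.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// vertexAnthropicVersion is the API version value documented above.
const vertexAnthropicVersion = "vertex-2023-10-16"

// adaptForVertex is a hypothetical helper: it copies an Anthropic-style
// request body, omits the "model" field, and sets "anthropic_version".
// The input map is not modified.
func adaptForVertex(body map[string]interface{}) map[string]interface{} {
	out := make(map[string]interface{}, len(body)+1)
	for k, v := range body {
		if k == "model" {
			continue // on Vertex the model is identified outside the body
		}
		out[k] = v
	}
	out["anthropic_version"] = vertexAnthropicVersion
	return out
}

func main() {
	body := map[string]interface{}{"model": "claude-3-5-sonnet", "max_tokens": 256}
	out, _ := json.Marshal(adaptForVertex(body))
	fmt.Println(string(out))
}
```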
|
|
|
|
## Region Selection
|
|
|
|
The region determines the API endpoint:
|
|
|
|
| Region        | Endpoint                                | Purpose                   |
| ------------- | --------------------------------------- | ------------------------- |
| `us-central1` | `us-central1-aiplatform.googleapis.com` | US Central                |
| `us-west1`    | `us-west1-aiplatform.googleapis.com`    | US West                   |
| `eu-west1`    | `eu-west1-aiplatform.googleapis.com`    | Europe West               |
| `global`      | `aiplatform.googleapis.com`             | Global (no region prefix) |
|
|
|
|
Availability varies by region. Check [GCP documentation](https://cloud.google.com/vertex-ai/docs/general/locations) for model availability.
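
The region-to-host mapping in the table can be expressed in a few lines. `endpointHost` is a hypothetical name used for this sketch; it only reproduces the pattern shown in the table.

```go
package main

import "fmt"

// endpointHost is a hypothetical helper that reproduces the table above:
// regional hosts carry a "<region>-" prefix, while "global" has none.
func endpointHost(region string) string {
	if region == "global" {
		return "aiplatform.googleapis.com"
	}
	return region + "-aiplatform.googleapis.com"
}

func main() {
	for _, r := range []string{"us-central1", "eu-west1", "global"} {
		fmt.Printf("%s -> %s\n", r, endpointHost(r))
	}
}
```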
|
|
|
|
## Streaming
|
|
|
|
Streaming format depends on model type:
|
|
|
|
- **Gemini models**: Standard Gemini streaming with server-sent events
|
|
- **Anthropic models**: Anthropic message streaming format
|
|
|
|
---
|
|
|
|
# 2. Responses API
|
|
|
|
The Responses API is available for both Anthropic (Claude) and Gemini models on Vertex AI.
|
|
|
|
## Request Parameters
|
|
|
|
### Core Parameter Mapping
|
|
|
|
| Parameter           | Vertex Handling              | Notes                             |
| ------------------- | ---------------------------- | --------------------------------- |
| `instructions`      | Becomes system message       | Model-specific conversion         |
| `input`             | Converted to messages        | String or array support           |
| `max_output_tokens` | Model-specific field mapping | Gemini vs Anthropic conversion    |
| All other params    | Model-specific conversion    | Converted per underlying provider |
|
|
|
|
### Gemini Models
|
|
|
|
For Gemini models, conversion follows Gemini's Responses API format.
|
|
|
|
### Anthropic Models (Claude)
|
|
|
|
For Anthropic models, conversion follows Anthropic's message format:
|
|
|
|
- `instructions` becomes system message
|
|
- `reasoning` mapped to `thinking` structure
|
|
|
|
### Configuration
|
|
|
|
<Tabs>
|
|
<Tab title="Gateway">
|
|
|
|
```bash
curl -X POST http://localhost:8080/v1/responses \
  -H "Content-Type: application/json" \
  -H "X-Goog-Authorization: Bearer {token}" \
  -d '{
    "model": "vertex/claude-3-5-sonnet",
    "input": "What is AI?",
    "instructions": "You are a helpful assistant",
    "project_id": "my-gcp-project",
    "region": "us-central1"
  }'
```
|
|
|
|
</Tab>
|
|
<Tab title="Go SDK">
|
|
|
|
```go
resp, err := client.ResponsesRequest(schemas.NewBifrostContext(ctx, schemas.NoDeadline), &schemas.BifrostResponsesRequest{
	Provider: schemas.Vertex,
	Model:    "claude-3-5-sonnet",
	Input:    messages,
	Params: &schemas.ResponsesParameters{
		Instructions: schemas.Ptr("You are a helpful assistant"),
	},
})
```
|
|
|
|
</Tab>
|
|
</Tabs>
|
|
|
|
### Special Handling
|
|
|
|
- Endpoint: `/v1/messages` (Anthropic format)
|
|
- `anthropic_version` set to `vertex-2023-10-16` automatically
|
|
- Model and region fields removed from request
|
|
- Raw request body passthrough supported
|
|
|
|
Refer to [Anthropic Responses API](/providers/supported-providers/anthropic#2-responses-api) for parameter details.
|
|
|
|
---
|
|
|
|
# 3. Embeddings
|
|
|
|
Embeddings are supported for Gemini embedding models and for other Vertex models that expose embedding generation.
|
|
|
|
## Request Parameters
|
|
|
|
### Core Parameters
|
|
|
|
| Parameter    | Vertex Mapping                    | Notes                |
| ------------ | --------------------------------- | -------------------- |
| `input`      | `instances[].content`             | Text to embed        |
| `dimensions` | `parameters.outputDimensionality` | Optional output size |
|
|
|
|
### Advanced Parameters
|
|
|
|
Use `extra_params` for embedding-specific options:
|
|
|
|
<Tabs>
|
|
<Tab title="Gateway">
|
|
|
|
```bash
curl -X POST http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "vertex/text-embedding-004",
    "input": ["text to embed"],
    "dimensions": 256,
    "task_type": "RETRIEVAL_DOCUMENT",
    "title": "Document title",
    "project_id": "my-gcp-project",
    "region": "us-central1",
    "autoTruncate": true
  }'
```
|
|
|
|
</Tab>
|
|
<Tab title="Go SDK">
|
|
|
|
```go
resp, err := client.EmbeddingRequest(schemas.NewBifrostContext(ctx, schemas.NoDeadline), &schemas.BifrostEmbeddingRequest{
	Provider: schemas.Vertex,
	Model:    "text-embedding-004",
	Input: &schemas.EmbeddingInput{
		Texts: []string{"text to embed"},
	},
	Params: &schemas.EmbeddingParameters{
		Dimensions: schemas.Ptr(256),
		ExtraParams: map[string]interface{}{
			"task_type":    "RETRIEVAL_DOCUMENT",
			"title":        "Document title",
			"autoTruncate": true,
		},
	},
})
```
|
|
|
|
</Tab>
|
|
</Tabs>
|
|
|
|
#### Embedding Parameters
|
|
|
|
| Parameter      | Type    | Description                                                                                                               |
| -------------- | ------- | ------------------------------------------------------------------------------------------------------------------------- |
| `task_type`    | string  | Task type hint: `RETRIEVAL_QUERY`, `RETRIEVAL_DOCUMENT`, `SEMANTIC_SIMILARITY`, `CLASSIFICATION`, `CLUSTERING` (optional) |
| `title`        | string  | Optional title to help the model produce better embeddings (used with `task_type`)                                        |
| `autoTruncate` | boolean | Auto-truncate input to max tokens (defaults to true)                                                                      |
|
|
|
|
### Task Type Effects
|
|
|
|
Different task types optimize embeddings for specific use cases:
|
|
|
|
- `RETRIEVAL_DOCUMENT` - Optimized for documents in retrieval systems
|
|
- `RETRIEVAL_QUERY` - Optimized for queries searching documents
|
|
- `SEMANTIC_SIMILARITY` - Optimized for semantic similarity tasks
|
|
- `CLASSIFICATION` - For classification tasks
|
|
- `CLUSTERING` - For clustering tasks
|
|
|
|
## Response Conversion
|
|
|
|
Embeddings response includes vectors and truncation information:
|
|
|
|
```json
{
  "embeddings": [
    {
      "values": [0.1234, -0.5678, ...],
      "statistics": {
        "token_count": 15,
        "truncated": false
      }
    }
  ]
}
```
|
|
|
|
**Response Fields**:
|
|
|
|
- `values` - Embedding vector as floats
|
|
- `statistics.token_count` - Input token count
|
|
- `statistics.truncated` - Whether input was truncated due to length
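
A client may want to know which inputs were truncated. The sketch below decodes the response shape shown above and collects the indexes of truncated inputs; the type names and `truncatedInputs` are assumptions for illustration, not Bifrost SDK types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Illustrative types matching the JSON shape shown above.
type embeddingStatistics struct {
	TokenCount int  `json:"token_count"`
	Truncated  bool `json:"truncated"`
}

type embedding struct {
	Values     []float64           `json:"values"`
	Statistics embeddingStatistics `json:"statistics"`
}

type embeddingResponse struct {
	Embeddings []embedding `json:"embeddings"`
}

// truncatedInputs returns the indexes of embeddings whose input was truncated.
func truncatedInputs(raw []byte) ([]int, error) {
	var resp embeddingResponse
	if err := json.Unmarshal(raw, &resp); err != nil {
		return nil, err
	}
	var idx []int
	for i, e := range resp.Embeddings {
		if e.Statistics.Truncated {
			idx = append(idx, i)
		}
	}
	return idx, nil
}

func main() {
	raw := []byte(`{"embeddings":[{"values":[0.1234,-0.5678],"statistics":{"token_count":15,"truncated":false}}]}`)
	idx, err := truncatedInputs(raw)
	fmt.Println(idx, err)
}
```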
|
|
|
|
---
|
|
|
|
# 4. Image Generation
|
|
|
|
Image Generation is supported for Gemini and Imagen on Vertex AI. The provider automatically routes to the appropriate format based on the model type.
|
|
|
|
## Request Parameters
|
|
|
|
### Core Parameter Mapping
|
|
|
|
| Parameter        | Vertex Handling                       | Notes                                             |
| ---------------- | ------------------------------------- | ------------------------------------------------- |
| `model`          | Mapped to deployment/model identifier | Model type detected automatically                 |
| `prompt`         | Model-specific conversion             | Converted per underlying provider (Gemini/Imagen) |
| All other params | Model-specific conversion             | Converted per underlying provider                 |
|
|
|
|
### Model Type Detection
|
|
|
|
The Vertex provider automatically detects the model type and uses the appropriate conversion:
|
|
|
|
1. **Gemini Models**: Uses Gemini format (same as [Gemini Image Generation](/providers/supported-providers/gemini#8-image-generation))
|
|
2. **Imagen Models**: Uses Imagen format (detected via `IsImagenModel()`)
|
|
|
|
### Configuration
|
|
|
|
<Tabs>
|
|
<Tab title="Gateway">
|
|
|
|
```bash
curl -X POST http://localhost:8080/v1/images/generations \
  -H "Content-Type: application/json" \
  -H "X-Goog-Authorization: Bearer {token}" \
  -d '{
    "model": "vertex/imagen-4.0-generate-001",
    "prompt": "A sunset over the mountains",
    "size": "1024x1024",
    "n": 2,
    "project_id": "my-gcp-project",
    "region": "us-central1"
  }'
```
|
|
|
|
</Tab>
|
|
<Tab title="Go SDK">
|
|
|
|
```go
resp, err := client.ImageGenerationRequest(schemas.NewBifrostContext(ctx, schemas.NoDeadline), &schemas.BifrostImageGenerationRequest{
	Provider: schemas.Vertex,
	Model:    "imagen-4.0-generate-001",
	Input: &schemas.ImageGenerationInput{
		Prompt: "A sunset over the mountains",
	},
	Params: &schemas.ImageGenerationParameters{
		Size: schemas.Ptr("1024x1024"),
		N:    schemas.Ptr(2),
	},
})
```
|
|
|
|
</Tab>
|
|
</Tabs>
|
|
|
|
## Request Conversion
|
|
|
|
The Vertex provider converts requests based on model type:
|
|
|
|
- **Gemini Models**: Uses `gemini.ToGeminiImageGenerationRequest()` - same conversion as standard Gemini (see [Gemini Image Generation](/providers/supported-providers/gemini#8-image-generation))
|
|
- **Imagen Models**: Uses `gemini.ToImagenImageGenerationRequest()` - Imagen-specific format with size/aspect ratio conversion
|
|
|
|
All request bodies are converted to `map[string]interface{}` and the `region` field is removed before sending to Vertex API.
|
|
|
|
## Response Conversion
|
|
|
|
- **Gemini Models**: Responses converted using `GenerateContentResponse.ToBifrostImageGenerationResponse()` - same as standard Gemini
|
|
- **Imagen Models**: Responses converted using `GeminiImagenResponse.ToBifrostImageGenerationResponse()` - Imagen-specific format
|
|
|
|
## Endpoint Selection
|
|
|
|
The provider automatically selects the endpoint based on model type:
|
|
|
|
- **Fine-tuned models**: `/v1beta1/projects/{projectNumber}/locations/{region}/endpoints/{deployment}:generateContent`
|
|
- **Imagen models**: `/v1/projects/{projectID}/locations/{region}/publishers/google/models/{model}:predict`
|
|
- **Gemini models**: `/v1/projects/{projectID}/locations/{region}/publishers/google/models/{model}:generateContent`
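
The three endpoint templates above can be sketched as one path builder. The `modelKind` type and `imageEndpoint` function are hypothetical; how a model is classified (for example via `IsImagenModel()`) is left to the caller in this sketch.

```go
package main

import "fmt"

// modelKind is an illustrative classification; the caller decides which
// kind applies to a given model.
type modelKind int

const (
	kindGemini modelKind = iota
	kindImagen
	kindFineTuned
)

// imageEndpoint fills in the path templates listed above. For fine-tuned
// models, `model` carries the deployment identifier.
func imageEndpoint(kind modelKind, projectID, projectNumber, region, model string) string {
	switch kind {
	case kindFineTuned:
		return fmt.Sprintf("/v1beta1/projects/%s/locations/%s/endpoints/%s:generateContent",
			projectNumber, region, model)
	case kindImagen:
		return fmt.Sprintf("/v1/projects/%s/locations/%s/publishers/google/models/%s:predict",
			projectID, region, model)
	default:
		return fmt.Sprintf("/v1/projects/%s/locations/%s/publishers/google/models/%s:generateContent",
			projectID, region, model)
	}
}

func main() {
	fmt.Println(imageEndpoint(kindImagen, "my-gcp-project", "", "us-central1", "imagen-4.0-generate-001"))
}
```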
|
|
|
|
## Streaming
|
|
|
|
Image generation streaming is not supported by Vertex AI.
|
|
|
|
---
|
|
|
|
# 5. Image Edit
|
|
|
|
<Warning>Requests use **multipart/form-data**, not JSON.</Warning>
|
|
|
|
Image Edit is supported for Gemini and Imagen models on Vertex AI. The provider automatically routes to the appropriate format based on the model type.
|
|
|
|
**Request Parameters**
|
|
|
|
| Parameter            | Type   | Required | Notes                                                                                                                                                           |
| -------------------- | ------ | -------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `model`              | string | ✅       | Model identifier (must be Gemini or Imagen model)                                                                                                               |
| `prompt`             | string | ✅       | Text description of the edit                                                                                                                                    |
| `image[]`            | binary | ✅       | Image file(s) to edit (supports multiple images)                                                                                                                |
| `mask`               | binary | ❌       | Mask image file                                                                                                                                                 |
| `type`               | string | ❌       | Edit type: `"inpainting"`, `"outpainting"`, `"inpaint_removal"`, `"bgswap"` (Imagen only)                                                                       |
| `n`                  | int    | ❌       | Number of images to generate (1-10)                                                                                                                             |
| `output_format`      | string | ❌       | Output format: `"png"`, `"webp"`, `"jpeg"`                                                                                                                      |
| `output_compression` | int    | ❌       | Compression level (0-100%)                                                                                                                                      |
| `seed`               | int    | ❌       | Seed for reproducibility (via `ExtraParams["seed"]`)                                                                                                            |
| `negative_prompt`    | string | ❌       | Negative prompt (via `ExtraParams["negativePrompt"]`)                                                                                                           |
| `maskMode`           | string | ❌       | Mask mode (via `ExtraParams["maskMode"]`, Imagen only): `"MASK_MODE_USER_PROVIDED"`, `"MASK_MODE_BACKGROUND"`, `"MASK_MODE_FOREGROUND"`, `"MASK_MODE_SEMANTIC"` |
| `dilation`           | float  | ❌       | Mask dilation (via `ExtraParams["dilation"]`, Imagen only): Range [0, 1]                                                                                        |
| `maskClasses`        | int[]  | ❌       | Mask classes (via `ExtraParams["maskClasses"]`, Imagen only): For `MASK_MODE_SEMANTIC`                                                                          |
|
|
|
|
---
|
|
|
|
**Request Conversion**
|
|
|
|
Vertex uses the same conversion functions as Gemini:
|
|
|
|
1. **Gemini Models**: Uses `gemini.ToGeminiImageEditRequest()` - same conversion as standard Gemini (see [Gemini Image Edit](/providers/supported-providers/gemini#9-image-edit))
|
|
2. **Imagen Models**: Uses `gemini.ToImagenImageEditRequest()` - Imagen-specific format with edit mode mapping and mask configuration (see [Gemini Image Edit](/providers/supported-providers/gemini#9-image-edit))
|
|
|
|
**Model Validation**: Only Gemini and Imagen models are supported. Other models return `ConfigurationError`.
|
|
|
|
**Request Body Processing**:
|
|
|
|
- All request bodies are converted to `map[string]interface{}` for Vertex API compatibility
|
|
- The `region` field is removed before sending to Vertex API
|
|
- For Gemini models, unsupported fields are stripped via `stripVertexGeminiUnsupportedFields()` (removes `id` from function_call and function_response)
|
|
|
|
**Response Conversion**
|
|
|
|
- **Gemini Models**: Responses converted using `GenerateContentResponse.ToBifrostImageGenerationResponse()` - same as standard Gemini
|
|
- **Imagen Models**: Responses converted using `GeminiImagenResponse.ToBifrostImageGenerationResponse()` - Imagen-specific format
|
|
|
|
**Endpoint Selection**
|
|
|
|
The provider automatically selects the endpoint based on model type:
|
|
|
|
- **Gemini models**: `/v1/projects/{projectID}/locations/{region}/publishers/google/models/{model}:generateContent`
|
|
- **Imagen models**: `/v1/projects/{projectID}/locations/{region}/publishers/google/models/{model}:predict`

**Streaming**

Image edit streaming is not supported by Vertex AI.

**Image Variation**

Image variation is not supported by Vertex AI.

---

# 6. List Models

## Request Parameters

None required. Bifrost automatically uses `project_id` and `region` from the key config.

## Response Conversion

Lists models available in the specified project and region with metadata and deployment information:

```json
{
  "models": [
    {
      "name": "projects/{project}/locations/{region}/models/gemini-2.0-flash",
      "display_name": "Gemini 2.0 Flash",
      "description": "Fast multimodal model",
      "version_id": "1",
      "version_aliases": ["latest", "stable"],
      "capabilities": [...],
      "deployed_models": [...]
    }
  ],
  "next_page_token": "..."
}
```

## Custom vs Non-Custom Models

<Warning>
  **Important**: Vertex AI's List Models API **only returns custom fine-tuned
  models** that have been deployed to your project. It does NOT return standard
  foundation models (Gemini, Claude, etc.).
</Warning>

To provide a complete model listing experience, Bifrost performs **multi-pass model discovery**:

### Three-Pass Model Discovery

1. **First Pass - Custom Models from API Response**
   - Queries Vertex AI's List Models API
   - Returns only custom fine-tuned models deployed to your project
   - Custom models are identified by having deployment values that contain only digits
   - Example: `"deployment": "1234567890"`

2. **Second Pass - Non-Custom Models from Aliases**
   - Adds standard foundation models from your `aliases` configuration
   - Non-custom models have alphanumeric deployment values (e.g., `gemini-pro`, `claude-3-5-sonnet`)
   - Filters by the key-level `models` allowlist, if specified
   - Example: `"deployment": "gemini-2.0-flash"`

3. **Third Pass - Allowed Models Not in Aliases**
   - Adds models specified in `models` that weren't in the `aliases` map
   - Ensures all explicitly allowed models appear in the list
   - Uses the model name itself as the deployment value
   - Skips digit-only model IDs (reserved for custom models)

### Model Filtering Logic

- **If `models` is empty and no aliases are configured**: No models are returned
- **If `models` is empty but aliases are configured**: Only aliased models are returned
- **If `models` is `["*"]`**: All models from all three passes are included (unrestricted)
- **If `models` is non-empty**: Only models/aliases whose request names appear in `models` are included
- **Duplicate Prevention**: Each model ID is tracked to prevent duplicates across passes
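
The configuration-driven part of this logic (the second and third passes, plus the allowlist and duplicate tracking) can be sketched as below. This is one reading of the rules above, not the code in `models.go`: the `model` type and `configuredModels` function are hypothetical, the first pass (the actual API call) is omitted, and skipping the `"*"` entry in the third pass is an assumption.

```go
package main

import (
	"fmt"
	"sort"
)

// model pairs a request name with the deployment value it resolves to.
type model struct {
	Name       string
	Deployment string
}

// isDigitsOnly reports whether s is non-empty and made only of ASCII digits,
// the marker used above for custom model deployments.
func isDigitsOnly(s string) bool {
	if s == "" {
		return false
	}
	for _, r := range s {
		if r < '0' || r > '9' {
			return false
		}
	}
	return true
}

// configuredModels applies the second and third passes to the key config.
func configuredModels(aliases map[string]string, allowed []string) []model {
	unrestricted := len(allowed) == 1 && allowed[0] == "*"
	allow := make(map[string]bool, len(allowed))
	for _, name := range allowed {
		allow[name] = true
	}
	seen := map[string]bool{} // duplicate prevention across passes
	var out []model

	// Second pass: non-custom aliases, filtered by the allowlist when one is set.
	for name, deployment := range aliases {
		if isDigitsOnly(deployment) {
			continue // custom deployments belong to the first pass
		}
		if len(allowed) > 0 && !unrestricted && !allow[name] {
			continue
		}
		seen[name] = true
		out = append(out, model{Name: name, Deployment: deployment})
	}

	// Third pass: allowed models missing from aliases use their own name.
	for _, name := range allowed {
		if name == "*" || isDigitsOnly(name) || seen[name] {
			continue
		}
		seen[name] = true
		out = append(out, model{Name: name, Deployment: name})
	}

	// Map iteration order is random, so sort for a stable listing.
	sort.Slice(out, func(i, j int) bool { return out[i].Name < out[j].Name })
	return out
}

func main() {
	aliases := map[string]string{
		"gemini-2.0-flash":  "gemini-2.0-flash",
		"claude-3-5-sonnet": "claude-3-5-sonnet-v2@20241022",
		"gemini-1.5-pro":    "gemini-1.5-pro",
	}
	for _, m := range configuredModels(aliases, []string{"gemini-2.0-flash", "claude-3-5-sonnet"}) {
		fmt.Printf("%s -> %s\n", m.Name, m.Deployment)
	}
}
```

With this allowlist, `gemini-1.5-pro` is filtered out even though it has an alias, while an empty allowlist would keep all three aliases.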

### Model Name Formatting

Non-custom models from aliases and allowed models are automatically formatted for display:

- `gemini-pro` → "Gemini Pro"
- `claude-3-5-sonnet` → "Claude 3 5 Sonnet"
- `gemini_2_flash` → "Gemini 2 Flash"

Formatting uses title case and converts hyphens/underscores to spaces.
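
A minimal sketch of this formatting rule, assuming ASCII model IDs. The function name `displayName` is hypothetical, and the sketch only capitalizes the first letter of each word rather than reproducing Bifrost's exact title-casing.

```go
package main

import (
	"fmt"
	"strings"
)

// displayName splits a model ID on hyphens and underscores, capitalizes the
// first letter of each word, and joins the words with spaces.
func displayName(id string) string {
	words := strings.FieldsFunc(id, func(r rune) bool {
		return r == '-' || r == '_'
	})
	for i, w := range words {
		// FieldsFunc never yields empty strings, so w[:1] is safe for ASCII IDs.
		words[i] = strings.ToUpper(w[:1]) + w[1:]
	}
	return strings.Join(words, " ")
}

func main() {
	for _, id := range []string{"gemini-pro", "claude-3-5-sonnet", "gemini_2_flash"} {
		fmt.Printf("%s -> %q\n", id, displayName(id))
	}
}
```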

### Example Configuration

<Tabs>
<Tab title="With Custom Models Only">

```json
{
  "aliases": {
    "my-gemini-ft": "1234567890",
    "my-claude-ft": "9876543210"
  },
  "vertex_key_config": {
    "project_id": "my-project",
    "region": "us-central1"
  }
}
```

This returns only your custom fine-tuned models from the API.

</Tab>
<Tab title="With Foundation Models">

```json
{
  "aliases": {
    "gemini-2.0-flash": "gemini-2.0-flash",
    "claude-3-5-sonnet": "claude-3-5-sonnet-v2@20241022"
  },
  "vertex_key_config": {
    "project_id": "my-project",
    "region": "us-central1"
  }
}
```

This returns both custom models AND foundation models from aliases.

</Tab>
<Tab title="With Allowed Models Filter">

```json
{
  "models": ["gemini-2.0-flash", "claude-3-5-sonnet"],
  "aliases": {
    "gemini-2.0-flash": "gemini-2.0-flash",
    "claude-3-5-sonnet": "claude-3-5-sonnet-v2@20241022",
    "gemini-1.5-pro": "gemini-1.5-pro"
  },
  "vertex_key_config": {
    "project_id": "my-project",
    "region": "us-central1"
  }
}
```

Only returns `gemini-2.0-flash` and `claude-3-5-sonnet`, excluding `gemini-1.5-pro`.

</Tab>
</Tabs>

### Pagination

Model listing is paginated automatically. If more than 100 models exist, `next_page_token` will be present. Bifrost handles pagination internally.
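
The internal pagination can be pictured as a loop that follows `next_page_token` until it is empty. The sketch below is illustrative only: the `page` type, the `fetchPage` callback, and `listAll` are hypothetical, and the in-memory pages in `main` stand in for real List Models responses.

```go
package main

import "fmt"

// page mirrors the two response fields that matter for pagination.
type page struct {
	Models        []string
	NextPageToken string
}

// fetchPage is a hypothetical stand-in for a single List Models call.
type fetchPage func(pageToken string) (page, error)

// listAll keeps requesting pages until the response carries no next token.
func listAll(fetch fetchPage) ([]string, error) {
	var all []string
	token := ""
	for {
		p, err := fetch(token)
		if err != nil {
			return nil, err
		}
		all = append(all, p.Models...)
		if p.NextPageToken == "" {
			return all, nil
		}
		token = p.NextPageToken
	}
}

func main() {
	// Two fake pages keyed by the token that requests them.
	pages := map[string]page{
		"":   {Models: []string{"1111", "2222"}, NextPageToken: "p2"},
		"p2": {Models: []string{"3333"}},
	}
	models, err := listAll(func(token string) (page, error) {
		return pages[token], nil
	})
	fmt.Println(models, err) // prints "[1111 2222 3333] <nil>"
}
```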

---

## Caveats

<Accordion title="Project ID and Region Required">

- **Severity**: High
- **Behavior**: Both `project_id` and `region` are required for all operations
- **Impact**: Requests fail without a valid GCP project/region configuration
- **Code**: `vertex.go:127-138`

</Accordion>

<Accordion title="OAuth2 Token Management">

- **Severity**: Medium
- **Behavior**: Tokens are cached and automatically refreshed when they expire
- **Impact**: The first request is slightly slower because of authentication; subsequent requests reuse the cached token
- **Code**: `vertex.go:34-55`

</Accordion>

<Accordion title="Anthropic Model Detection">

- **Severity**: Medium
- **Behavior**: Automatic detection of Anthropic vs Gemini models
- **Impact**: Different conversion logic is applied transparently
- **Code**: `vertex.go` chat/responses endpoints

</Accordion>

<Accordion title="Model-Specific Responses API Handling">

- **Severity**: Low
- **Behavior**: The Responses API automatically routes to the Anthropic or Gemini implementation based on the model
- **Impact**: Different conversion logic is applied transparently per model
- **Code**: `vertex.go:836-1080`

</Accordion>

<Accordion title="Anthropic Version Lock">

- **Severity**: Low
- **Behavior**: `anthropic_version` is always set to `vertex-2023-10-16` for Claude
- **Impact**: The Anthropic version cannot be overridden for Claude on Vertex
- **Code**: `utils.go:33, 71`

</Accordion>

<Accordion title="Embeddings Precision Preservation">

- **Severity**: Low
- **Behavior**: Vertex returns float64 embeddings, and Bifrost preserves that precision in normalized embedding responses
- **Impact**: No precision loss in the `/v1/embeddings` response path
- **Code**: `embedding.go:84-91`

</Accordion>

<Accordion title="List Models API Returns Only Custom Models">

- **Severity**: High
- **Behavior**: Vertex AI's List Models API only returns custom fine-tuned models, NOT foundation models
- **Impact**: Bifrost performs three-pass discovery to include foundation models from aliases and the key-level `models` allowlist
- **Why**: This is a Vertex AI API limitation - foundation models must be explicitly configured
- **Code**: `models.go:76-217`

</Accordion>

---

## Configuration

**HTTP Settings**: OAuth2 authentication with automatic token refresh | Region-specific endpoints | Max Connections 5000 | Max Idle 60 seconds

**Scope**: `https://www.googleapis.com/auth/cloud-platform`

**Endpoint Format**: `https://{region}-aiplatform.googleapis.com/v1/projects/{project}/locations/{region}/{resource}`

**Note**: For `global` region, endpoint is `https://aiplatform.googleapis.com/v1/projects/{project}/locations/global/{resource}`
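
The two endpoint formats differ only in the host, which a small helper can capture. This is a sketch under the formats stated above; the function name `vertexURL` and the sample resource path are hypothetical.

```go
package main

import "fmt"

// vertexURL builds the endpoint for a resource. Regional endpoints use a
// region-prefixed host; the "global" region uses the bare host.
func vertexURL(project, region, resource string) string {
	host := region + "-aiplatform.googleapis.com"
	if region == "global" {
		host = "aiplatform.googleapis.com"
	}
	return fmt.Sprintf("https://%s/v1/projects/%s/locations/%s/%s", host, project, region, resource)
}

func main() {
	resource := "publishers/google/models/gemini-2.0-flash:generateContent"
	fmt.Println(vertexURL("my-project", "us-central1", resource))
	fmt.Println(vertexURL("my-project", "global", resource))
}
```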

## Video Generation

Vertex AI routes video generation through Gemini's Veo models using the `predictLongRunning` endpoint. All parameters are identical to [Gemini Video Generation](/providers/supported-providers/gemini#video-generation).

<Note>
  Only Veo models are supported (e.g., `veo-2.0-generate-001`). Passing a
  non-Veo model name returns a configuration error.
</Note>

**Supported Operations**

| Operation | Supported | Notes                         |
| --------- | --------- | ----------------------------- |
| Generate  | ✅        | `POST /v1/videos`             |
| Retrieve  | ✅        | `GET /v1/videos/{id}`         |
| Download  | ✅        | `GET /v1/videos/{id}/content` |
| Delete    | ❌        | Not supported                 |
| List      | ❌        | Not supported                 |
| Remix     | ❌        | Not supported                 |