571 lines
26 KiB
Plaintext
571 lines
26 KiB
Plaintext
---
|
|
title: "Hugging Face"
|
|
description: "Detailed guide on Hugging Face provider implementation specifics, including model aliases and unique request handling."
|
|
icon: "face-smiling-hands"
|
|
---
|
|
|
|
The Hugging Face provider in Bifrost (`core/providers/huggingface`) implements a complex integration that supports multiple inference providers (like `hf-inference`, `fal-ai`, `cerebras`, `sambanova`, etc.) through a unified interface.
|
|
|
|
## Overview
|
|
|
|
The Hugging Face provider implements custom logic for:
|
|
- **Multiple inference backends**: Routes requests to 19+ different inference providers
|
|
- **Dynamic model aliasing**: Transforms model IDs based on provider-specific mappings
|
|
- **Heterogeneous request formats**: Supports JSON, raw binary, and base64-encoded payloads
|
|
- **Provider-specific constraints**: Handles varying payload limits and format restrictions
|
|
|
|
## Supported Inference Providers
|
|
|
|
The Hugging Face provider supports routing to 20+ inference backends. Below is the current list of supported providers and their capabilities (as of December 2025):
|
|
|
|
| Provider | Chat | Embedding | Speech (TTS) | Transcription (ASR) | Image Generation | Image Generation (stream) | Image Edit | Image Edit (stream) |
|
|
|----------|------|-----------|--------------|---------------------|------------------|---------------------------|------------|---------------------|
|
|
| `hf-inference` | ✅ | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | ❌ |
|
|
| `cerebras` | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
|
|
| `cohere` | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
|
|
| `fal-ai` | ❌ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
|
|
| `featherless-ai` | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
|
|
| `fireworks` | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
|
|
| `groq` | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
|
|
| `hyperbolic` | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
|
|
| `nebius` | ✅ | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ |
|
|
| `novita` | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
|
|
| `nscale` | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
|
|
| `ovhcloud-ai-endpoints` | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
|
|
| `public-ai` | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
|
|
| `replicate` | ❌ | ❌ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ |
|
|
| `sambanova` | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
|
|
| `scaleway` | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
|
|
| `together` | ✅ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ |
|
|
| `z-ai` | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
|
|
|
|
<Note>Provider capabilities may change over time. For the most up-to-date information, refer to the [Hugging Face Inference Providers documentation](https://huggingface.co/docs/inference-providers/en/index#partners). Also checkmarks (✅) indicate capabilities supported by the inference provider itself.</Note>
|
|
|
|
<Info>All Chat-supported models automatically support Responses(`v1/responses`) as well via Bifrost's internal conversion logic.</Info>
|
|
|
|
## Model Aliases & Identification
|
|
|
|
Unlike standard providers where model IDs are direct strings (e.g., `gpt-4`), Hugging Face models in Bifrost are identified by a composite key to route requests to the correct inference backend.
|
|
|
|
**Format**: `huggingface/[inference_provider]/[model_id]`
|
|
|
|
- **inference_provider**: The backend service (e.g., `hf-inference`, `fal-ai`, `cerebras`).
|
|
- **model_id**: The actual model identifier on Hugging Face Hub (e.g., `meta-llama/Meta-Llama-3-8B-Instruct`).
|
|
|
|
**Example**: `huggingface/hf-inference/meta-llama/Meta-Llama-3-8B-Instruct`
|
|
|
|
This parsing logic is handled in `utils.go` and `models.go`, allowing Bifrost to dynamically route requests based on the model string.
|
|
|
|
## Request Handling Differences
|
|
|
|
The Hugging Face provider handles various tasks (Chat, Speech, Transcription) which often require different request structures depending on the underlying inference provider.
|
|
|
|
### Inference Provider Constraints
|
|
|
|
Different inference providers have specific limitations and requirements:
|
|
|
|
#### Payload Limit
|
|
HuggingFace API enforces a **2 MB request body limit** across all request types (Chat, Embedding, Speech, Transcription). This constraint applies to:
|
|
- JSON request payloads
|
|
- Raw audio bytes in transcription requests
|
|
- Any other request body data
|
|
|
|
**Impact**: Large audio files, extensive chat histories, or bulk embedding requests may need to be split or compressed before sending.
|
|
|
|
#### `fal-ai` Audio Format Restrictions
|
|
The `fal-ai` provider has strict audio format requirements:
|
|
- **Supported Format**: Only **MP3** (`audio/mpeg`) is accepted
|
|
- **Rejected Formats**: WAV (`audio/wav`) and other formats are explicitly rejected
|
|
- **Encoding**: Audio must be provided as a **base64-encoded Data URI** in the `audio_url` field
|
|
|
|
**Validation Logic** (from `core/providers/huggingface/transcription.go`):
|
|
```go
|
|
mimeType := getMimeTypeForAudioType(utils.DetectAudioMimeType(request.Input.File))
|
|
if mimeType == "audio/wav" {
|
|
return nil, fmt.Errorf("fal-ai provider does not support audio/wav format; please use a different format like mp3 or ogg")
|
|
}
|
|
encoded = fmt.Sprintf("data:%s;base64,%s", mimeType, encoded)
|
|
```
|
|
|
|
### Speech (Text-to-Speech)
|
|
|
|
For Text-to-Speech (TTS) requests, the implementation differs from a standard pipeline request:
|
|
|
|
- **No Pipeline Tag**: The `HuggingFaceSpeechRequest` struct does not include a `pipeline_tag` field in the JSON body, even though the model might be tagged as `text-to-speech` on the Hub.
|
|
- **Structure**:
|
|
```go
|
|
type HuggingFaceSpeechRequest struct {
|
|
Text string `json:"text"`
|
|
Provider string `json:"provider" validate:"required"`
|
|
Model string `json:"model" validate:"required"`
|
|
Parameters *HuggingFaceSpeechParameters `json:"parameters,omitempty"`
|
|
}
|
|
```
|
|
- **Implementation**: See `core/providers/huggingface/speech.go`.
|
|
|
|
### Transcription (Automatic Speech Recognition)
|
|
|
|
The Transcription implementation (`core/providers/huggingface/transcription.go`) exhibits a "pattern-breaking" behavior where the request format changes significantly based on the inference provider.
|
|
|
|
#### 1. `hf-inference` (Raw Bytes)
|
|
When using the standard `hf-inference` provider, the API expects the **raw audio bytes** directly in the request body, not a JSON object.
|
|
|
|
- **Content-Type**: Audio mime type (e.g., `audio/mpeg`).
|
|
- **Body**: Raw binary data from `request.Input.File`.
|
|
- **Payload Limit**: **Maximum 2 MB** for the raw audio bytes.
|
|
- **Logic**:
|
|
```go
|
|
// core/providers/huggingface/huggingface.go
|
|
if inferenceProvider == hfInference {
|
|
jsonData = request.Input.File // Raw bytes (max 2 MB)
|
|
isHFInferenceAudioRequest = true
|
|
}
|
|
```
|
|
- **URL Pattern**: `/hf-inference/models/{model_name}` (no `/pipeline/` suffix for ASR).
|
|
|
|
#### 2. `fal-ai` (JSON with Base64 Data URI)
|
|
When using `fal-ai` through HuggingFace provider, the API expects a **JSON body** containing the audio as a **base64-encoded Data URI**.
|
|
|
|
|
|
- **Content-Type**: `application/json`.
|
|
- **Body**: JSON object with `audio_url` field.
|
|
- **Audio Format Restriction**: **Only MP3** (`audio/mpeg`) is supported. WAV files are rejected.
|
|
- **Encoding**: Audio is base64-encoded and prefixed with a Data URI scheme.
|
|
- **Logic**:
|
|
```go
|
|
// core/providers/huggingface/transcription.go
|
|
encoded = base64.StdEncoding.EncodeToString(request.Input.File)
|
|
mimeType := getMimeTypeForAudioType(utils.DetectAudioMimeType(request.Input.File))
|
|
if mimeType == "audio/wav" {
|
|
return nil, fmt.Errorf("fal-ai provider does not support audio/wav format; please use a different format like mp3 or ogg")
|
|
}
|
|
encoded = fmt.Sprintf("data:%s;base64,%s", mimeType, encoded)
|
|
hfRequest = &HuggingFaceTranscriptionRequest{
|
|
AudioURL: encoded,
|
|
}
|
|
```
|
|
|
|
#### Dual Fields in `types.go`
|
|
To support these divergent requirements, the `HuggingFaceTranscriptionRequest` struct in `types.go` contains fields for both scenarios, which are used mutually exclusively:
|
|
|
|
```go
|
|
type HuggingFaceTranscriptionRequest struct {
|
|
Inputs []byte `json:"inputs,omitempty"` // For standard JSON providers (NOT hf-inference raw body)
|
|
AudioURL string `json:"audio_url,omitempty"` // For fal-ai (base64 Data URI, MP3 only)
|
|
Provider *string `json:"provider,omitempty"`
|
|
Model *string `json:"model,omitempty"`
|
|
Parameters *HuggingFaceTranscriptionRequestParameters `json:"parameters,omitempty"`
|
|
}
|
|
```
|
|
|
|
**Key Points**:
|
|
- `Inputs`: Used when JSON body is sent with raw bytes (most providers except `hf-inference` and `fal-ai`).
|
|
- `AudioURL`: Used exclusively for `fal-ai`, must be a base64-encoded Data URI with MP3 format.
|
|
- **Note**: For `hf-inference`, the entire request body is raw audio bytes—no JSON structure is used at all.
|
|
|
|
## Image Generation
|
|
|
|
The Hugging Face provider supports image generation through multiple inference providers, each with different request formats and capabilities.
|
|
|
|
### Supported Inference Providers
|
|
|
|
| Provider | Non-Streaming | Streaming | Notes |
|
|
|----------|--------------|-----------|-------|
|
|
| `hf-inference` | ✅ | ❌ | Simple prompt-only format, returns raw image bytes |
|
|
| `fal-ai` | ✅ | ✅ | Full parameter support, supports streaming via Server-Sent Events |
|
|
| `nebius` | ✅ | ❌ | Uses Nebius-specific format with width/height, LoRAs support |
|
|
| `together` | ✅ | ❌ | OpenAI-compatible format |
|
|
|
|
### Request Conversion
|
|
|
|
The provider automatically routes to the appropriate inference provider based on the model string format: `huggingface/{provider}/{model_id}`.
|
|
|
|
#### 1. `hf-inference`
|
|
|
|
The simplest format, only requires a prompt:
|
|
|
|
- **Request Structure**:
|
|
```go
|
|
type HuggingFaceHFInferenceImageGenerationRequest struct {
|
|
Inputs string `json:"inputs"` // The prompt text
|
|
}
|
|
```
|
|
- **Response**: Raw image bytes (PNG/JPEG), automatically base64-encoded in Bifrost response
|
|
- **Limitations**: No size, quality, or other parameter support
|
|
|
|
#### 2. `fal-ai`
|
|
|
|
The most feature-rich provider with extensive parameter support:
|
|
|
|
- **Request Structure**:
|
|
```go
|
|
type HuggingFaceFalAIImageGenerationRequest struct {
|
|
Prompt string `json:"prompt"`
|
|
NumImages *int `json:"num_images,omitempty"` // Maps from params.n
|
|
ResponseFormat *string `json:"response_format,omitempty"` // "url" or "b64_json"
|
|
ImageSize *HuggingFaceFalAISize `json:"image_size,omitempty"` // {width, height} from size
|
|
NegativePrompt *string `json:"negative_prompt,omitempty"`
|
|
GuidanceScale *float64 `json:"guidance_scale,omitempty"` // From extra_params
|
|
NumInferenceSteps *int `json:"num_inference_steps,omitempty"`
|
|
Seed *int `json:"seed,omitempty"`
|
|
OutputFormat *string `json:"output_format,omitempty"` // "png", "jpeg", "webp" (jpg→jpeg)
|
|
SyncMode *bool `json:"sync_mode,omitempty"` // Auto-set if response_format="b64_json"
|
|
EnableSafetyChecker *bool `json:"enable_safety_checker,omitempty"` // Auto-set if moderation="low"
|
|
Acceleration *string `json:"acceleration,omitempty"` // From extra_params
|
|
EnablePromptExpansion *bool `json:"enable_prompt_expansion,omitempty"` // From extra_params
|
|
}
|
|
```
|
|
- **Parameter Mappings**:
|
|
- `n` → `num_images`
|
|
- `size` (e.g., `"1024x1024"`) → `image_size: {width: 1024, height: 1024}`
|
|
- `output_format: "jpg"` → `output_format: "jpeg"` (normalized)
|
|
- `response_format: "b64_json"` → `sync_mode: true`
|
|
- `moderation: "low"` → `enable_safety_checker: false`
|
|
- **Response**: JSON with `images[]` array containing `url` and/or `b64_json` fields
|
|
- **Extra Parameters**: Supports `guidance_scale`, `acceleration`, `enable_prompt_expansion`, `enable_safety_checker` via `extra_params`
|
|
|
|
#### 3. `nebius`
|
|
|
|
Uses Nebius-specific format with support for LoRAs:
|
|
|
|
- **Request Structure**: Uses `NebiusImageGenerationRequest` (see Nebius provider docs)
|
|
- **Parameter Mappings**:
|
|
- `size` (e.g., `"1024x1024"`) → `width` and `height` integers
|
|
- `output_format` → `response_extension` (normalized: "jpeg" → "jpg")
|
|
- `seed`, `negative_prompt` → Passed directly
|
|
- `extra_params.num_inference_steps` → `num_inference_steps`
|
|
- `extra_params.guidance_scale` → `guidance_scale`
|
|
- `extra_params.loras` → `loras[]` array (supports both map and array formats)
|
|
- **Response**: Uses Nebius response format, converted to Bifrost format
|
|
|
|
#### 4. `together`
|
|
|
|
OpenAI-compatible format:
|
|
|
|
- **Request Structure**:
|
|
```go
|
|
type HuggingFaceTogetherImageGenerationRequest struct {
|
|
Prompt string `json:"prompt"`
|
|
Model string `json:"model"`
|
|
ResponseFormat *string `json:"response_format,omitempty"`
|
|
Size *string `json:"size,omitempty"` // Passed directly
|
|
N *int `json:"n,omitempty"`
|
|
Steps *int `json:"steps,omitempty"` // From num_inference_steps
|
|
}
|
|
```
|
|
- **Parameter Mappings**:
|
|
- `response_format: "b64_json"` → `response_format: "base64"`
|
|
- `num_inference_steps` → `steps`
|
|
- **Response**: OpenAI-compatible format with `data[]` array
|
|
|
|
### Response Conversion
|
|
|
|
Each provider's response is converted to Bifrost's unified `BifrostImageGenerationResponse` format:
|
|
|
|
- **hf-inference**: Raw bytes → base64-encoded in `b64_json`
|
|
- **fal-ai**: `images[]` array → `ImageData[]` with `url` and/or `b64_json`
|
|
- **nebius**: Uses Nebius converter → Bifrost format
|
|
- **together**: `data[]` array → `ImageData[]` with `b64_json` and/or `url`
|
|
|
|
### Image Generation Streaming
|
|
|
|
**Only `fal-ai` supports streaming** for HuggingFace image generation. Streaming uses Server-Sent Events (SSE) format.
|
|
|
|
#### Streaming Request Format
|
|
|
|
```go
|
|
type HuggingFaceFalAIImageStreamRequest struct {
|
|
Prompt string `json:"prompt"`
|
|
ResponseFormat *string `json:"response_format,omitempty"`
|
|
NumImages *int `json:"num_images,omitempty"`
|
|
ImageSize *HuggingFaceFalAISize `json:"image_size,omitempty"`
|
|
// ... same parameters as non-streaming
|
|
}
|
|
```
|
|
|
|
#### Streaming Response Format
|
|
|
|
- **Event Type**: Server-Sent Events with `data:` prefix
|
|
- **Chunk Format**: Each SSE event contains JSON with `images[]` array
|
|
- **Stream Processing**:
|
|
- Each image in `images[]` becomes a separate stream chunk
|
|
- Chunks have `type: "partial"` until stream completion
|
|
- Final chunk has `type: "completed"` with the last image data
|
|
- Images can be delivered as `url` (public URL) or `b64_json` (base64-encoded)
|
|
- **URL Pattern**: `/fal-ai/{model_id}/stream` (appended to base URL)
|
|
|
|
#### Streaming Behavior
|
|
|
|
- **Chunk Indexing**: Each chunk has an `Index` field (0, 1, 2, ...) and `ChunkIndex` for ordering
|
|
- **Completion**: Final chunk includes all image data from the last SSE event
|
|
- **Error Handling**: Errors in SSE format are parsed and sent as `BifrostError` chunks
|
|
|
|
|
|
### Example Usage
|
|
|
|
<Tabs>
|
|
<Tab title="Gateway - fal-ai">
|
|
|
|
```bash
|
|
curl -X POST http://localhost:8080/v1/images/generations \
|
|
-H "Content-Type: application/json" \
|
|
-d '{
|
|
"model": "huggingface/fal-ai/fal-ai/flux/dev",
|
|
"prompt": "A futuristic cityscape at sunset",
|
|
"size": "1024x1024",
|
|
"n": 2,
|
|
"output_format": "png",
|
|
"response_format": "url"
|
|
}'
|
|
```
|
|
|
|
</Tab>
|
|
<Tab title="Gateway - Streaming (fal-ai only)">
|
|
|
|
```bash
|
|
curl -X POST http://localhost:8080/v1/images/generations \
|
|
-H "Content-Type: application/json" \
|
|
-d '{
|
|
"model": "huggingface/fal-ai/fal-ai/flux/dev",
|
|
"prompt": "A futuristic cityscape at sunset",
|
|
"size": "1024x1024",
|
|
"stream": true
|
|
}'
|
|
```
|
|
|
|
</Tab>
|
|
<Tab title="Go SDK">
|
|
|
|
```go
|
|
resp, err := client.ImageGenerationRequest(schemas.NewBifrostContext(ctx, schemas.NoDeadline), &schemas.BifrostImageGenerationRequest{
|
|
Provider: schemas.HuggingFace,
|
|
Model: "huggingface/fal-ai/fal-ai/flux/dev",
|
|
Input: &schemas.ImageGenerationInput{
|
|
Prompt: "A futuristic cityscape at sunset",
|
|
},
|
|
Params: &schemas.ImageGenerationParameters{
|
|
Size: schemas.Ptr("1024x1024"),
|
|
N: schemas.Ptr(2),
|
|
OutputFormat: schemas.Ptr("png"),
|
|
ResponseFormat: schemas.Ptr("url"),
|
|
Seed: schemas.Ptr(42),
|
|
NegativePrompt: schemas.Ptr("blurry, low quality"),
|
|
NumInferenceSteps: schemas.Ptr(50),
|
|
ExtraParams: map[string]interface{}{
|
|
"guidance_scale": 7.5,
|
|
"acceleration": "t4",
|
|
"enable_prompt_expansion": true,
|
|
},
|
|
},
|
|
})
|
|
```
|
|
|
|
</Tab>
|
|
<Tab title="Go SDK - Streaming">
|
|
|
|
```go
|
|
streamChan, err := client.ImageGenerationStreamRequest(schemas.NewBifrostContext(ctx, schemas.NoDeadline), &schemas.BifrostImageGenerationRequest{
|
|
Provider: schemas.HuggingFace,
|
|
Model: "huggingface/fal-ai/fal-ai/flux/dev",
|
|
Input: &schemas.ImageGenerationInput{
|
|
Prompt: "A futuristic cityscape at sunset",
|
|
},
|
|
Params: &schemas.ImageGenerationParameters{
|
|
Size: schemas.Ptr("1024x1024"),
|
|
N: schemas.Ptr(2),
|
|
},
|
|
})
|
|
|
|
for stream := range streamChan {
|
|
if stream.BifrostImageGenerationStreamResponse != nil {
|
|
chunk := stream.BifrostImageGenerationStreamResponse
|
|
if chunk.URL != "" {
|
|
// Handle image URL
|
|
} else if chunk.B64JSON != "" {
|
|
// Handle base64 image data
|
|
}
|
|
}
|
|
}
|
|
```
|
|
|
|
</Tab>
|
|
</Tabs>
|
|
|
|
### Provider-Specific Notes
|
|
|
|
- **fal-ai**:
|
|
- When `response_format="b64_json"`, `sync_mode` is automatically set to `true`
|
|
- When `moderation="low"`, `enable_safety_checker` is set to `false`
|
|
- `output_format: "jpg"` is normalized to `"jpeg"`
|
|
- **nebius**:
|
|
- `response_extension: "jpeg"` is normalized to `"jpg"` (Nebius inconsistency)
|
|
- LoRAs can be provided as `{"url": scale}` map or `[{"url": "...", "scale": ...}]` array
|
|
- **hf-inference**:
|
|
- Minimal format, only prompt supported
|
|
- Returns raw image bytes (automatically base64-encoded)
|
|
- **together**:
|
|
- OpenAI-compatible format
|
|
- `response_format: "b64_json"` is converted to `"base64"`
|
|
|
|
## Image Edit
|
|
|
|
<Warning>
|
|
Requests use **multipart/form-data**, not JSON.
|
|
</Warning>
|
|
|
|
**Only `fal-ai` supports image editing** for HuggingFace. Image edit requests are routed to fal-ai inference provider.
|
|
|
|
**Request Parameters**
|
|
|
|
| Parameter | Type | Required | Notes |
|
|
|-----------|------|----------|-------|
|
|
| `model` | string | ✅ | Model identifier (must be `huggingface/fal-ai/{model_id}`) |
|
|
| `prompt` | string | ✅ | Text description of the edit |
|
|
| `image[]` | binary | ✅ | Image file(s) to edit (supports multiple images for some models) |
|
|
| `n` | int | ❌ | Number of images to generate (1-10) |
|
|
| `size` | string | ❌ | Image size: `"WxH"` format (e.g., `"1024x1024"`) |
|
|
| `output_format` | string | ❌ | Output format: `"png"`, `"webp"`, `"jpeg"` (note: `"jpg"` is normalized to `"jpeg"`) |
|
|
| `seed` | int | ❌ | Seed for reproducibility (via `ExtraParams["seed"]`) |
|
|
| `num_inference_steps` | int | ❌ | Number of inference steps (via `ExtraParams["num_inference_steps"]`) |
|
|
| `guidance_scale` | float | ❌ | Guidance scale (via `ExtraParams["guidance_scale"]`) |
|
|
| `acceleration` | string | ❌ | Acceleration mode (via `ExtraParams["acceleration"]`) |
|
|
| `enable_safety_checker` | bool | ❌ | Enable safety checker (via `ExtraParams["enable_safety_checker"]`) |
|
|
| `use_image_urls` | bool | ❌ | Override image field selection (via `ExtraParams["use_image_urls"]`) |
|
|
|
|
---
|
|
|
|
**Request Conversion**
|
|
|
|
- **Model Validation**: Only `fal-ai` inference provider supports image edit. Other providers return `UnsupportedOperationError`.
|
|
- **Image Conversion**: Each image in `bifrostReq.Input.Images` is converted to a base64 data URL:
|
|
- Format: `data:{mimeType};base64,{base64Data}`
|
|
- MIME type detection: `image/jpeg`, `image/webp`, `image/png` (via `http.DetectContentType`)
|
|
- **Image Field Selection**: The provider uses different image fields based on model capabilities:
|
|
- **Multi-image models** (e.g., `fal-ai/flux-2/edit`, `fal-ai/flux-2-pro/edit`): Uses `image_urls` array field
|
|
- **Single-image models** (e.g., `fal-ai/flux-pro/kontext`, `fal-ai/flux/dev/image-to-image`): Uses `image_url` string field
|
|
- **Override**: `ExtraParams["use_image_urls"]` can override the automatic selection
|
|
- **Fallback**: For unknown models, uses `image_url` if single image, `image_urls` if multiple images
|
|
- **Parameter Mapping**:
|
|
- `prompt` → `Prompt`
|
|
- `n` → `NumImages`
|
|
- `size` → `ImageSize` (converted from `"WxH"` string to `{Width, Height}` object)
|
|
- `output_format` → `OutputFormat` (`"jpg"` normalized to `"jpeg"`)
|
|
- `seed` (via `ExtraParams["seed"]`) → `Seed`
|
|
- `num_inference_steps` (via `ExtraParams["num_inference_steps"]`) → `NumInferenceSteps`
|
|
- `guidance_scale` (via `ExtraParams["guidance_scale"]`) → `GuidanceScale`
|
|
- `acceleration` (via `ExtraParams["acceleration"]`) → `Acceleration`
|
|
- `enable_safety_checker` (via `ExtraParams["enable_safety_checker"]`) → `EnableSafetyChecker`
|
|
|
|
**Response Conversion**
|
|
|
|
- **Non-streaming**: Uses the same response conversion as image generation (see Image Generation section)
|
|
- **Streaming**: fal-ai streaming responses use Server-Sent Events (SSE) format:
|
|
- **Event Type**: Server-Sent Events with `data:` prefix
|
|
- **Chunk Format**: Each SSE event contains JSON with `images[]` array (or `data.images[]` in API envelope format)
|
|
- **Stream Processing**:
|
|
- Each image in `images[]` becomes a separate stream chunk
|
|
- Chunks have `type: "image_edit.partial_image"` until stream completion
|
|
- Final chunk has `type: "image_edit.completed"` with the last image data
|
|
- Images can be delivered as `url` (public URL) or `b64_json` (base64-encoded)
|
|
- **Response Structure**: Handles both API envelope format (`Data.Images`) and legacy flattened format (`Images`)
|
|
- **URL Pattern**: `/fal-ai/{model_id}/stream` (appended to base URL)
|
|
|
|
**Endpoint**: `/fal-ai/{model_id}` (non-streaming), `/fal-ai/{model_id}/stream` (streaming)
|
|
|
|
**Image Variation**
|
|
|
|
Image variation is not supported by HuggingFace.
|
|
|
|
## Raw JSON Body Handling
|
|
|
|
While most providers strictly serialize a struct to JSON, the Hugging Face provider's `Transcription` method demonstrates a hybrid approach depending on the inference provider:
|
|
|
|
### Embedding Requests
|
|
|
|
For embedding requests, different providers expect different field names:
|
|
|
|
- **Standard providers** (most): Use `input` field
|
|
- **`hf-inference`**: Uses `inputs` field (plural)
|
|
|
|
**Request Structure**:
|
|
```go
|
|
type HuggingFaceEmbeddingRequest struct {
|
|
Input interface{} `json:"input,omitempty"` // Used by all providers except hf-inference
|
|
Inputs interface{} `json:"inputs,omitempty"` // Used by hf-inference
|
|
Provider *string `json:"provider,omitempty"` // Identifies the inference backend
|
|
Model *string `json:"model,omitempty"`
|
|
// ... other fields
|
|
}
|
|
```
|
|
|
|
The converter in `embedding.go` populates both fields to ensure compatibility across providers.
|
|
|
|
### Differences in Inference Provider Constraints
|
|
|
|
This multi-mode approach allows the provider to support diverse API contracts within a single implementation structure, accommodating:
|
|
1. **Legacy endpoints** that expect raw binary data
|
|
2. **Modern JSON APIs** with different schema expectations
|
|
3. **Third-party providers** (like `fal-ai`) with custom requirements
|
|
4. **Performance optimizations** (raw bytes avoid JSON overhead for `hf-inference`)
|
|
|
|
This flexibility allows the provider to support diverse API contracts within a single implementation structure.
|
|
|
|
## Model Discovery & Caching
|
|
|
|
The provider implements sophisticated model discovery using the Hugging Face Hub API:
|
|
|
|
### List Models Flow
|
|
1. **Parallel Queries**: Fetches models from multiple inference providers concurrently
|
|
2. **Filter by Pipeline Tag**: Uses `pipeline_tag` (e.g., `text-to-speech`, `feature-extraction`) to determine supported methods
|
|
3. **Aggregate Results**: Combines responses from all providers into a unified list
|
|
4. **Model ID Format**: Returns models as `huggingface/{provider}/{model_id}`
|
|
|
|
### Provider Model Mapping Cache
|
|
The provider maintains a cache (`modelProviderMappingCache`) to map Hugging Face model IDs to provider-specific model identifiers:
|
|
|
|
```go
|
|
// Example: "meta-llama/Meta-Llama-3-8B-Instruct" -> provider mappings
|
|
{
|
|
"cerebras": {
|
|
"ProviderTask": "chat-completion",
|
|
"ProviderModelID": "llama3-8b-8192"
|
|
},
|
|
"groq": {
|
|
"ProviderTask": "chat-completion",
|
|
"ProviderModelID": "llama3-8b-instant"
|
|
}
|
|
}
|
|
```
|
|
|
|
**Cache Invalidation**: On HTTP 404 errors, the cache is cleared and the mapping is re-fetched, then the request is retried with the updated model ID.
|
|
|
|
## Best Practices
|
|
|
|
When working with the Hugging Face provider:
|
|
|
|
1. **Check Payload Size**: Ensure request bodies are under 2 MB
|
|
2. **Audio Format**: Use MP3 for `fal-ai`, avoid WAV files
|
|
3. **Model Aliases**: Always specify provider in model string: `huggingface/{provider}/{model}`
|
|
4. **Error Handling**: Implement retries for 404 errors (cache invalidation scenarios)
|
|
5. **Provider Selection**: Use `auto` for automatic provider selection based on model capabilities
|
|
6. **Pipeline Tags**: Verify model's `pipeline_tag` matches your use case (chat, embedding, TTS, ASR)
|
|
|
|
## File Structure Reference
|
|
|
|
```
|
|
core/providers/huggingface/
|
|
├── huggingface.go # Main provider implementation, HTTP request handling
|
|
├── types.go # All provider-specific types (Request/Response DTOs)
|
|
├── utils.go # Helpers, constants, URL builders, model mapping
|
|
├── chat.go # Chat completion converters (Bifrost ↔ HF)
|
|
├── embedding.go # Embedding converters
|
|
├── speech.go # Text-to-speech converters
|
|
├── transcription.go # Speech-to-text converters
|
|
├── models.go # Model listing and capability detection
|
|
├── images.go # Image generation converters
|
|
├── errors.go # Error handling
|
|
└── huggingface_test.go # Comprehensive test suite
|
|
```
|
|
|
|
Each file follows strict separation of concerns as outlined in the [Adding a Provider](/contributing/adding-a-provider) guide.
|