bifrost/docs/providers/supported-providers/huggingface.mdx

---
title: "Hugging Face"
description: "Detailed guide on Hugging Face provider implementation specifics, including model aliases and unique request handling."
icon: "face-smiling-hands"
---

The Hugging Face provider in Bifrost (`core/providers/huggingface`) exposes multiple inference providers (such as `hf-inference`, `fal-ai`, `cerebras`, and `sambanova`) through a single unified interface.

## Overview

The Hugging Face provider implements custom logic for:
- **Multiple inference backends**: Routes requests to the 18 inference providers listed in the table below
- **Dynamic model aliasing**: Transforms model IDs based on provider-specific mappings
- **Heterogeneous request formats**: Supports JSON, raw binary, and base64-encoded payloads
- **Provider-specific constraints**: Handles varying payload limits and format restrictions

## Supported Inference Providers

The Hugging Face provider supports routing to 18 inference backends. The table below lists them with their capabilities (as of December 2025):

| Provider | Chat | Embedding | Speech (TTS) | Transcription (ASR) | Image Generation | Image Generation (stream) | Image Edit | Image Edit (stream) |
|----------|------|-----------|--------------|---------------------|------------------|---------------------------|------------|---------------------|
| `hf-inference` | ✅ | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | ❌ |
| `cerebras` | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| `cohere` | ✅ | ❌  | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| `fal-ai` | ❌ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| `featherless-ai` | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| `fireworks` | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| `groq` | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| `hyperbolic` | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| `nebius` | ✅ | ✅ | ❌  | ❌ | ✅ | ❌ | ❌ | ❌ |
| `novita` | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| `nscale` | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| `ovhcloud-ai-endpoints` | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| `public-ai` | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| `replicate` | ❌ | ❌ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ |
| `sambanova` | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| `scaleway` | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| `together` | ✅ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ |
| `z-ai` | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |

<Note>Provider capabilities may change over time. For the most up-to-date information, refer to the [Hugging Face Inference Providers documentation](https://huggingface.co/docs/inference-providers/en/index#partners). Checkmarks (✅) indicate capabilities supported by the inference provider itself.</Note>

<Info>All Chat-supported models automatically support Responses (`v1/responses`) as well, via Bifrost's internal conversion logic.</Info>

## Model Aliases & Identification

Unlike standard providers where model IDs are direct strings (e.g., `gpt-4`), Hugging Face models in Bifrost are identified by a composite key to route requests to the correct inference backend.

**Format**: `huggingface/[inference_provider]/[model_id]`

- **inference_provider**: The backend service (e.g., `hf-inference`, `fal-ai`, `cerebras`).
- **model_id**: The actual model identifier on Hugging Face Hub (e.g., `meta-llama/Meta-Llama-3-8B-Instruct`).

**Example**: `huggingface/hf-inference/meta-llama/Meta-Llama-3-8B-Instruct`

This parsing logic is handled in `utils.go` and `models.go`, allowing Bifrost to dynamically route requests based on the model string.
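
The composite key can be split as in the following minimal sketch. `splitModelKey` is a hypothetical helper written for illustration; it is not the function used in `utils.go` or `models.go`, and it assumes the `huggingface/` prefix is optional.

```go
package main

import (
	"fmt"
	"strings"
)

// splitModelKey is a hypothetical helper (not Bifrost's actual function).
// It splits "huggingface/<inference_provider>/<model_id>" into its parts.
// The model ID may itself contain slashes, so only the first separator
// after the optional "huggingface/" prefix is treated as the boundary.
func splitModelKey(key string) (provider, modelID string, err error) {
	rest := strings.TrimPrefix(key, "huggingface/")
	parts := strings.SplitN(rest, "/", 2)
	if len(parts) != 2 || parts[0] == "" || parts[1] == "" {
		return "", "", fmt.Errorf("invalid model key %q: want huggingface/<provider>/<model_id>", key)
	}
	return parts[0], parts[1], nil
}

func main() {
	provider, modelID, err := splitModelKey("huggingface/hf-inference/meta-llama/Meta-Llama-3-8B-Instruct")
	if err != nil {
		panic(err)
	}
	fmt.Println(provider) // hf-inference
	fmt.Println(modelID)  // meta-llama/Meta-Llama-3-8B-Instruct
}
```

Splitting on the first separator only is what lets model IDs such as `fal-ai/flux/dev` keep their own slashes.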

## Request Handling Differences

The Hugging Face provider handles various tasks (Chat, Speech, Transcription) which often require different request structures depending on the underlying inference provider.

### Inference Provider Constraints

Different inference providers have specific limitations and requirements:

#### Payload Limit
The Hugging Face API enforces a **2 MB request body limit** across all request types (Chat, Embedding, Speech, Transcription). This constraint applies to:
- JSON request payloads
- Raw audio bytes in transcription requests
- Any other request body data

**Impact**: Large audio files, extensive chat histories, or bulk embedding requests may need to be split or compressed before sending.
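
A client can check the size before sending. The sketch below is a hypothetical pre-flight check, not part of Bifrost; it assumes "2 MB" means 2 × 1024 × 1024 bytes, which the source does not specify.

```go
package main

import "fmt"

// maxBodyBytes assumes "2 MB" means 2 * 1024 * 1024 bytes. Whether the
// limit is counted in binary or decimal megabytes is an assumption.
const maxBodyBytes = 2 * 1024 * 1024

// checkBodySize is a hypothetical pre-flight check for a request body.
func checkBodySize(body []byte) error {
	if len(body) > maxBodyBytes {
		return fmt.Errorf("request body is %d bytes, exceeds the %d-byte limit", len(body), maxBodyBytes)
	}
	return nil
}

func main() {
	fmt.Println(checkBodySize(make([]byte, 1024)))                  // <nil>
	fmt.Println(checkBodySize(make([]byte, maxBodyBytes+1)) != nil) // true
}
```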

#### `fal-ai` Audio Format Restrictions
The `fal-ai` provider has strict audio format requirements:
- **Supported Format**: Only **MP3** (`audio/mpeg`) is accepted
- **Rejected Formats**: WAV (`audio/wav`) is explicitly rejected by the validation logic; other non-MP3 formats are not supported
- **Encoding**: Audio must be provided as a **base64-encoded Data URI** in the `audio_url` field

**Validation Logic** (from `core/providers/huggingface/transcription.go`):
```go
mimeType := getMimeTypeForAudioType(utils.DetectAudioMimeType(request.Input.File))
if mimeType == "audio/wav" {
    return nil, fmt.Errorf("fal-ai provider does not support audio/wav format; please use a different format like mp3 or ogg")
}
encoded = fmt.Sprintf("data:%s;base64,%s", mimeType, encoded)
```

### Speech (Text-to-Speech)

For Text-to-Speech (TTS) requests, the implementation differs from a standard pipeline request:

- **No Pipeline Tag**: The `HuggingFaceSpeechRequest` struct does not include a `pipeline_tag` field in the JSON body, even though the model might be tagged as `text-to-speech` on the Hub.
- **Structure**:
  ```go
  type HuggingFaceSpeechRequest struct {
      Text       string                       `json:"text"`
      Provider   string                       `json:"provider" validate:"required"`
      Model      string                       `json:"model" validate:"required"`
      Parameters *HuggingFaceSpeechParameters `json:"parameters,omitempty"`
  }
  ```
- **Implementation**: See `core/providers/huggingface/speech.go`.
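
To illustrate the resulting JSON body, here is a minimal sketch that mirrors the JSON tags of the struct above. `speechRequest` and `encodeSpeechRequest` are illustrative names, `Parameters` is simplified to a generic map, and the model ID is a placeholder.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// speechRequest mirrors the JSON tags of HuggingFaceSpeechRequest.
// Parameters is simplified to a map for this sketch (an assumption).
type speechRequest struct {
	Text       string         `json:"text"`
	Provider   string         `json:"provider"`
	Model      string         `json:"model"`
	Parameters map[string]any `json:"parameters,omitempty"`
}

// encodeSpeechRequest is a hypothetical helper that returns the JSON body.
func encodeSpeechRequest(req speechRequest) (string, error) {
	body, err := json.Marshal(req)
	return string(body), err
}

func main() {
	body, err := encodeSpeechRequest(speechRequest{
		Text:     "Hello from Bifrost",
		Provider: "fal-ai",
		Model:    "example-org/example-tts", // placeholder model ID
	})
	if err != nil {
		panic(err)
	}
	// The body carries text, provider, and model, and no pipeline_tag field.
	fmt.Println(body)
}
```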

### Transcription (Automatic Speech Recognition)

The Transcription implementation (`core/providers/huggingface/transcription.go`) breaks the usual pattern: the request format changes significantly depending on the inference provider.

#### 1. `hf-inference` (Raw Bytes)
When using the standard `hf-inference` provider, the API expects the **raw audio bytes** directly in the request body, not a JSON object.

- **Content-Type**: Audio mime type (e.g., `audio/mpeg`).
- **Body**: Raw binary data from `request.Input.File`.
- **Payload Limit**: **Maximum 2 MB** for the raw audio bytes.
- **Logic**:
  ```go
  // core/providers/huggingface/huggingface.go
  if inferenceProvider == hfInference {
      jsonData = request.Input.File // Raw bytes (max 2 MB)
      isHFInferenceAudioRequest = true
  }
  ```
- **URL Pattern**: `/hf-inference/models/{model_name}` (no `/pipeline/` suffix for ASR).

#### 2. `fal-ai` (JSON with Base64 Data URI)
When using `fal-ai` through the Hugging Face provider, the API expects a **JSON body** containing the audio as a **base64-encoded Data URI**.

- **Content-Type**: `application/json`.
- **Body**: JSON object with `audio_url` field.
- **Audio Format Restriction**: **Only MP3** (`audio/mpeg`) is supported. WAV files are rejected.
- **Encoding**: Audio is base64-encoded and prefixed with a Data URI scheme.
- **Logic**:
  ```go
  // core/providers/huggingface/transcription.go
  encoded = base64.StdEncoding.EncodeToString(request.Input.File)
  mimeType := getMimeTypeForAudioType(utils.DetectAudioMimeType(request.Input.File))
  if mimeType == "audio/wav" {
      return nil, fmt.Errorf("fal-ai provider does not support audio/wav format; please use a different format like mp3 or ogg")
  }
  encoded = fmt.Sprintf("data:%s;base64,%s", mimeType, encoded)
  hfRequest = &HuggingFaceTranscriptionRequest{
      AudioURL: encoded,
  }
  ```

#### Dual Fields in `types.go`
To support these divergent requirements, the `HuggingFaceTranscriptionRequest` struct in `types.go` contains fields for both scenarios. The fields are mutually exclusive:

```go
type HuggingFaceTranscriptionRequest struct {
    Inputs     []byte  `json:"inputs,omitempty"`    // For standard JSON providers (NOT hf-inference raw body)
    AudioURL   string  `json:"audio_url,omitempty"` // For fal-ai (base64 Data URI, MP3 only)
    Provider   *string `json:"provider,omitempty"`
    Model      *string `json:"model,omitempty"`
    Parameters *HuggingFaceTranscriptionRequestParameters `json:"parameters,omitempty"`
}
```

**Key Points**:
- `Inputs`: Used when the audio bytes are sent inside a JSON body (providers other than `hf-inference` and `fal-ai`).
- `AudioURL`: Used exclusively for `fal-ai`, must be a base64-encoded Data URI with MP3 format.
- **Note**: For `hf-inference`, the entire request body is raw audio bytes—no JSON structure is used at all.
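
The three request shapes described above can be summarized in one sketch. `buildTranscriptionBody` is a hypothetical helper, not Bifrost's implementation. It takes the MIME type as an argument, whereas the real provider detects it from the audio bytes.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"errors"
	"fmt"
)

// buildTranscriptionBody is a hypothetical helper that returns the request
// body and Content-Type for a given inference provider.
func buildTranscriptionBody(provider string, audio []byte, mimeType string) ([]byte, string, error) {
	switch provider {
	case "hf-inference":
		// Raw bytes go straight into the body; no JSON wrapper.
		return audio, mimeType, nil
	case "fal-ai":
		if mimeType == "audio/wav" {
			return nil, "", errors.New("fal-ai does not accept audio/wav")
		}
		dataURI := fmt.Sprintf("data:%s;base64,%s", mimeType, base64.StdEncoding.EncodeToString(audio))
		body, err := json.Marshal(map[string]string{"audio_url": dataURI})
		return body, "application/json", err
	default:
		// Other providers: JSON body with the bytes in "inputs".
		// encoding/json base64-encodes []byte values automatically.
		body, err := json.Marshal(map[string][]byte{"inputs": audio})
		return body, "application/json", err
	}
}

func main() {
	body, contentType, err := buildTranscriptionBody("fal-ai", []byte("abc"), "audio/mpeg")
	if err != nil {
		panic(err)
	}
	fmt.Println(contentType)  // application/json
	fmt.Println(string(body)) // {"audio_url":"data:audio/mpeg;base64,YWJj"}
}
```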

## Image Generation

The Hugging Face provider supports image generation through multiple inference providers, each with different request formats and capabilities.

### Supported Inference Providers

| Provider | Non-Streaming | Streaming | Notes |
|----------|--------------|-----------|-------|
| `hf-inference` | ✅ | ❌ | Simple prompt-only format, returns raw image bytes |
| `fal-ai` | ✅ | ✅ | Full parameter support, supports streaming via Server-Sent Events |
| `nebius` | ✅ | ❌ | Uses Nebius-specific format with width/height, LoRAs support |
| `together` | ✅ | ❌ | OpenAI-compatible format |

### Request Conversion

The provider automatically routes to the appropriate inference provider based on the model string format: `huggingface/{provider}/{model_id}`.

#### 1. `hf-inference`

The simplest format; it requires only a prompt:

- **Request Structure**:
  ```go
  type HuggingFaceHFInferenceImageGenerationRequest struct {
      Inputs string `json:"inputs"` // The prompt text
  }
  ```
- **Response**: Raw image bytes (PNG/JPEG), automatically base64-encoded in Bifrost response
- **Limitations**: No size, quality, or other parameter support

#### 2. `fal-ai`

The most feature-rich provider with extensive parameter support:

- **Request Structure**:
  ```go
  type HuggingFaceFalAIImageGenerationRequest struct {
      Prompt                string                `json:"prompt"`
      NumImages             *int                  `json:"num_images,omitempty"`        // Maps from params.n
      ResponseFormat        *string               `json:"response_format,omitempty"`   // "url" or "b64_json"
      ImageSize             *HuggingFaceFalAISize `json:"image_size,omitempty"`        // {width, height} from size
      NegativePrompt        *string               `json:"negative_prompt,omitempty"`
      GuidanceScale         *float64              `json:"guidance_scale,omitempty"`    // From extra_params
      NumInferenceSteps     *int                  `json:"num_inference_steps,omitempty"`
      Seed                  *int                  `json:"seed,omitempty"`
      OutputFormat          *string               `json:"output_format,omitempty"`    // "png", "jpeg", "webp" (jpg→jpeg)
      SyncMode              *bool                 `json:"sync_mode,omitempty"`        // Auto-set if response_format="b64_json"
      EnableSafetyChecker   *bool                 `json:"enable_safety_checker,omitempty"` // Auto-set if moderation="low"
      Acceleration          *string               `json:"acceleration,omitempty"`      // From extra_params
      EnablePromptExpansion *bool                 `json:"enable_prompt_expansion,omitempty"` // From extra_params
  }
  ```
- **Parameter Mappings**:
  - `n` → `num_images`
  - `size` (e.g., `"1024x1024"`) → `image_size: {width: 1024, height: 1024}`
  - `output_format: "jpg"` → `output_format: "jpeg"` (normalized)
  - `response_format: "b64_json"` → `sync_mode: true`
  - `moderation: "low"` → `enable_safety_checker: false`
- **Response**: JSON with `images[]` array containing `url` and/or `b64_json` fields
- **Extra Parameters**: Supports `guidance_scale`, `acceleration`, `enable_prompt_expansion`, `enable_safety_checker` via `extra_params`
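
Two of the mappings above (`size` and `output_format`) can be sketched as follows. `parseSize` and `normalizeOutputFormat` are hypothetical helpers for illustration; the converter in `images.go` may handle edge cases differently.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// imageSize stands in for the {width, height} object sent to fal-ai.
type imageSize struct {
	Width  int
	Height int
}

// parseSize converts a "WxH" string such as "1024x1024" into an imageSize.
func parseSize(size string) (imageSize, error) {
	w, h, ok := strings.Cut(strings.ToLower(size), "x")
	if !ok {
		return imageSize{}, fmt.Errorf("size %q is not in WxH format", size)
	}
	width, errW := strconv.Atoi(w)
	height, errH := strconv.Atoi(h)
	if errW != nil || errH != nil || width <= 0 || height <= 0 {
		return imageSize{}, fmt.Errorf("size %q has invalid dimensions", size)
	}
	return imageSize{Width: width, Height: height}, nil
}

// normalizeOutputFormat maps "jpg" to "jpeg" and lowercases everything else.
func normalizeOutputFormat(format string) string {
	if strings.EqualFold(format, "jpg") {
		return "jpeg"
	}
	return strings.ToLower(format)
}

func main() {
	size, err := parseSize("1024x1024")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v %s\n", size, normalizeOutputFormat("jpg")) // {Width:1024 Height:1024} jpeg
}
```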

#### 3. `nebius`

Uses Nebius-specific format with support for LoRAs:

- **Request Structure**: Uses `NebiusImageGenerationRequest` (see Nebius provider docs)
- **Parameter Mappings**:
  - `size` (e.g., `"1024x1024"`) → `width` and `height` integers
  - `output_format` → `response_extension` (normalized: "jpeg" → "jpg")
  - `seed`, `negative_prompt` → Passed directly
  - `extra_params.num_inference_steps` → `num_inference_steps`
  - `extra_params.guidance_scale` → `guidance_scale`
  - `extra_params.loras` → `loras[]` array (supports both map and array formats)
- **Response**: Uses Nebius response format, converted to Bifrost format
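
Accepting both LoRA formats amounts to a type switch over the decoded JSON value. The sketch below is an illustration only: `lora` and `normalizeLoras` are hypothetical names, and the map entries are sorted by URL purely to make the output deterministic.

```go
package main

import (
	"fmt"
	"sort"
)

// lora is a hypothetical normalized entry.
type lora struct {
	URL   string
	Scale float64
}

// normalizeLoras accepts the map form {"url": scale} or the array form
// [{"url": "...", "scale": ...}], as decoded from JSON into `any` values.
func normalizeLoras(v any) ([]lora, error) {
	switch t := v.(type) {
	case map[string]any:
		out := make([]lora, 0, len(t))
		for url, s := range t {
			scale, ok := s.(float64)
			if !ok {
				return nil, fmt.Errorf("lora %q: scale must be a number", url)
			}
			out = append(out, lora{URL: url, Scale: scale})
		}
		// Map iteration order is random in Go; sort for stable output.
		sort.Slice(out, func(i, j int) bool { return out[i].URL < out[j].URL })
		return out, nil
	case []any:
		out := make([]lora, 0, len(t))
		for i, item := range t {
			m, ok := item.(map[string]any)
			if !ok {
				return nil, fmt.Errorf("lora %d: expected an object", i)
			}
			url, okURL := m["url"].(string)
			scale, okScale := m["scale"].(float64)
			if !okURL || !okScale {
				return nil, fmt.Errorf("lora %d: need string url and numeric scale", i)
			}
			out = append(out, lora{URL: url, Scale: scale})
		}
		return out, nil
	default:
		return nil, fmt.Errorf("unsupported loras type %T", v)
	}
}

func main() {
	loras, err := normalizeLoras(map[string]any{"https://example.com/style": 0.8})
	fmt.Println(loras, err) // [{https://example.com/style 0.8}] <nil>
}
```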

#### 4. `together`

OpenAI-compatible format:

- **Request Structure**:
  ```go
  type HuggingFaceTogetherImageGenerationRequest struct {
      Prompt         string  `json:"prompt"`
      Model          string  `json:"model"`
      ResponseFormat *string `json:"response_format,omitempty"`
      Size           *string `json:"size,omitempty"`  // Passed directly
      N              *int    `json:"n,omitempty"`
      Steps          *int    `json:"steps,omitempty"`  // From num_inference_steps
  }
  ```
- **Parameter Mappings**:
  - `response_format: "b64_json"` → `response_format: "base64"`
  - `num_inference_steps` → `steps`
- **Response**: OpenAI-compatible format with `data[]` array

### Response Conversion

Each provider's response is converted to Bifrost's unified `BifrostImageGenerationResponse` format:

- **hf-inference**: Raw bytes → base64-encoded in `b64_json`
- **fal-ai**: `images[]` array → `ImageData[]` with `url` and/or `b64_json`
- **nebius**: Uses Nebius converter → Bifrost format
- **together**: `data[]` array → `ImageData[]` with `b64_json` and/or `url`

### Image Generation Streaming

**Only `fal-ai` supports streaming** for Hugging Face image generation. Streaming uses the Server-Sent Events (SSE) format.

#### Streaming Request Format

```go
type HuggingFaceFalAIImageStreamRequest struct {
    Prompt                string                `json:"prompt"`
    ResponseFormat        *string               `json:"response_format,omitempty"`
    NumImages             *int                  `json:"num_images,omitempty"`
    ImageSize             *HuggingFaceFalAISize `json:"image_size,omitempty"`
    // ... same parameters as non-streaming
}
```

#### Streaming Response Format

- **Event Type**: Server-Sent Events with `data:` prefix
- **Chunk Format**: Each SSE event contains JSON with `images[]` array
- **Stream Processing**:
  - Each image in `images[]` becomes a separate stream chunk
  - Chunks have `type: "partial"` until stream completion
  - Final chunk has `type: "completed"` with the last image data
  - Images can be delivered as `url` (public URL) or `b64_json` (base64-encoded)
- **URL Pattern**: `/fal-ai/{model_id}/stream` (appended to base URL)

#### Streaming Behavior

- **Chunk Indexing**: Each chunk has an `Index` field (0, 1, 2, ...) and `ChunkIndex` for ordering
- **Completion**: Final chunk includes all image data from the last SSE event
- **Error Handling**: Errors in SSE format are parsed and sent as `BifrostError` chunks
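
The stream processing described above can be approximated with a short sketch. This is a simplified illustration, not the provider's code: `parseImageStream`, `image`, and `chunk` are hypothetical names, SSE error events are not handled, and emitting a separate `completed` chunk after the partial chunks is an assumption about where the completion boundary falls.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

type image struct {
	URL     string `json:"url,omitempty"`
	B64JSON string `json:"b64_json,omitempty"`
}

type chunk struct {
	Type  string // "partial" or "completed"
	Index int
	Image image
}

// parseImageStream reads SSE "data:" lines, emits one "partial" chunk per
// image, and finishes with a "completed" chunk carrying the last image.
func parseImageStream(stream string) ([]chunk, error) {
	sc := bufio.NewScanner(strings.NewReader(stream))
	// Base64 images can exceed the scanner's default 64 KiB line limit.
	sc.Buffer(make([]byte, 0, 64*1024), 8*1024*1024)
	var chunks []chunk
	var last *image
	for sc.Scan() {
		line := sc.Text()
		if !strings.HasPrefix(line, "data:") {
			continue // skip blank separators and non-data fields
		}
		payload := strings.TrimSpace(strings.TrimPrefix(line, "data:"))
		var event struct {
			Images []image `json:"images"`
		}
		if err := json.Unmarshal([]byte(payload), &event); err != nil {
			return nil, fmt.Errorf("decode SSE event: %w", err)
		}
		for i := range event.Images {
			img := event.Images[i]
			chunks = append(chunks, chunk{Type: "partial", Index: len(chunks), Image: img})
			last = &img
		}
	}
	if err := sc.Err(); err != nil {
		return nil, err
	}
	if last != nil {
		chunks = append(chunks, chunk{Type: "completed", Index: len(chunks), Image: *last})
	}
	return chunks, nil
}

func main() {
	chunks, err := parseImageStream("data: {\"images\":[{\"url\":\"https://example.com/a.png\"}]}\n\n")
	if err != nil {
		panic(err)
	}
	for _, c := range chunks {
		fmt.Println(c.Index, c.Type, c.Image.URL)
	}
}
```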


### Example Usage

<Tabs>
<Tab title="Gateway - fal-ai">

```bash
curl -X POST http://localhost:8080/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{
    "model": "huggingface/fal-ai/fal-ai/flux/dev",
    "prompt": "A futuristic cityscape at sunset",
    "size": "1024x1024",
    "n": 2,
    "output_format": "png",
    "response_format": "url"
  }'
```

</Tab>
<Tab title="Gateway - Streaming (fal-ai only)">

```bash
curl -X POST http://localhost:8080/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{
    "model": "huggingface/fal-ai/fal-ai/flux/dev",
    "prompt": "A futuristic cityscape at sunset",
    "size": "1024x1024",
    "stream": true
  }'
```

</Tab>
<Tab title="Go SDK">

```go
resp, err := client.ImageGenerationRequest(schemas.NewBifrostContext(ctx, schemas.NoDeadline), &schemas.BifrostImageGenerationRequest{
    Provider: schemas.HuggingFace,
    Model:    "huggingface/fal-ai/fal-ai/flux/dev",
    Input: &schemas.ImageGenerationInput{
        Prompt: "A futuristic cityscape at sunset",
    },
    Params: &schemas.ImageGenerationParameters{
        Size:              schemas.Ptr("1024x1024"),
        N:                 schemas.Ptr(2),
        OutputFormat:      schemas.Ptr("png"),
        ResponseFormat:    schemas.Ptr("url"),
        Seed:              schemas.Ptr(42),
        NegativePrompt:    schemas.Ptr("blurry, low quality"),
        NumInferenceSteps: schemas.Ptr(50),
        ExtraParams: map[string]interface{}{
            "guidance_scale":          7.5,
            "acceleration":            "t4",
            "enable_prompt_expansion": true,
        },
    },
})
```

</Tab>
<Tab title="Go SDK - Streaming">

```go
streamChan, err := client.ImageGenerationStreamRequest(schemas.NewBifrostContext(ctx, schemas.NoDeadline), &schemas.BifrostImageGenerationRequest{
    Provider: schemas.HuggingFace,
    Model:    "huggingface/fal-ai/fal-ai/flux/dev",
    Input: &schemas.ImageGenerationInput{
        Prompt: "A futuristic cityscape at sunset",
    },
    Params: &schemas.ImageGenerationParameters{
        Size: schemas.Ptr("1024x1024"),
        N:    schemas.Ptr(2),
    },
})
if err != nil {
    // Handle the error before reading from streamChan
}

for stream := range streamChan {
    if stream.BifrostImageGenerationStreamResponse != nil {
        chunk := stream.BifrostImageGenerationStreamResponse
        if chunk.URL != "" {
            // Handle image URL
        } else if chunk.B64JSON != "" {
            // Handle base64 image data
        }
    }
}
```

</Tab>
</Tabs>

### Provider-Specific Notes

- **fal-ai**:
  - When `response_format="b64_json"`, `sync_mode` is automatically set to `true`
  - When `moderation="low"`, `enable_safety_checker` is set to `false`
  - `output_format: "jpg"` is normalized to `"jpeg"`
- **nebius**:
  - `response_extension: "jpeg"` is normalized to `"jpg"` (Nebius inconsistency)
  - LoRAs can be provided as `{"url": scale}` map or `[{"url": "...", "scale": ...}]` array
- **hf-inference**:
  - Minimal format, only prompt supported
  - Returns raw image bytes (automatically base64-encoded)
- **together**:
  - OpenAI-compatible format
  - `response_format: "b64_json"` is converted to `"base64"`

## Image Edit

<Warning>
Requests use **multipart/form-data**, not JSON.
</Warning>

**Only `fal-ai` supports image editing** for Hugging Face. Image edit requests are routed to the `fal-ai` inference provider.

**Request Parameters**

| Parameter | Type | Required | Notes |
|-----------|------|----------|-------|
| `model` | string | ✅ | Model identifier (must be `huggingface/fal-ai/{model_id}`) |
| `prompt` | string | ✅ | Text description of the edit |
| `image[]` | binary | ✅ | Image file(s) to edit (supports multiple images for some models) |
| `n` | int | ❌ | Number of images to generate (1-10) |
| `size` | string | ❌ | Image size: `"WxH"` format (e.g., `"1024x1024"`) |
| `output_format` | string | ❌ | Output format: `"png"`, `"webp"`, `"jpeg"` (note: `"jpg"` is normalized to `"jpeg"`) |
| `seed` | int | ❌ | Seed for reproducibility (via `ExtraParams["seed"]`) |
| `num_inference_steps` | int | ❌ | Number of inference steps (via `ExtraParams["num_inference_steps"]`) |
| `guidance_scale` | float | ❌ | Guidance scale (via `ExtraParams["guidance_scale"]`) |
| `acceleration` | string | ❌ | Acceleration mode (via `ExtraParams["acceleration"]`) |
| `enable_safety_checker` | bool | ❌ | Enable safety checker (via `ExtraParams["enable_safety_checker"]`) |
| `use_image_urls` | bool | ❌ | Override image field selection (via `ExtraParams["use_image_urls"]`) |

---

**Request Conversion**

- **Model Validation**: Only `fal-ai` inference provider supports image edit. Other providers return `UnsupportedOperationError`.
- **Image Conversion**: Each image in `bifrostReq.Input.Images` is converted to a base64 data URL:
  - Format: `data:{mimeType};base64,{base64Data}`
  - MIME type detection: `image/jpeg`, `image/webp`, `image/png` (via `http.DetectContentType`)
- **Image Field Selection**: The provider uses different image fields based on model capabilities:
  - **Multi-image models** (e.g., `fal-ai/flux-2/edit`, `fal-ai/flux-2-pro/edit`): Uses `image_urls` array field
  - **Single-image models** (e.g., `fal-ai/flux-pro/kontext`, `fal-ai/flux/dev/image-to-image`): Uses `image_url` string field
  - **Override**: `ExtraParams["use_image_urls"]` can override the automatic selection
  - **Fallback**: For unknown models, uses `image_url` if single image, `image_urls` if multiple images
- **Parameter Mapping**:
  - `prompt` → `Prompt`
  - `n` → `NumImages`
  - `size` → `ImageSize` (converted from `"WxH"` string to `{Width, Height}` object)
  - `output_format` → `OutputFormat` (`"jpg"` normalized to `"jpeg"`)
  - `seed` (via `ExtraParams["seed"]`) → `Seed`
  - `num_inference_steps` (via `ExtraParams["num_inference_steps"]`) → `NumInferenceSteps`
  - `guidance_scale` (via `ExtraParams["guidance_scale"]`) → `GuidanceScale`
  - `acceleration` (via `ExtraParams["acceleration"]`) → `Acceleration`
  - `enable_safety_checker` (via `ExtraParams["enable_safety_checker"]`) → `EnableSafetyChecker`
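
The image conversion and the fallback field selection can be sketched as follows. `toDataURL` and `imageFields` are hypothetical helpers. Only the fallback rule and the `use_image_urls` override are modeled; the per-model lookup for known multi-image and single-image models is omitted.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"net/http"
)

// toDataURL encodes image bytes as data:{mimeType};base64,{data}.
func toDataURL(img []byte) string {
	mimeType := http.DetectContentType(img)
	return fmt.Sprintf("data:%s;base64,%s", mimeType, base64.StdEncoding.EncodeToString(img))
}

// imageFields picks image_url for a single image and image_urls for
// several, unless useImageURLs overrides the choice.
func imageFields(images [][]byte, useImageURLs *bool) (map[string]any, error) {
	if len(images) == 0 {
		return nil, fmt.Errorf("at least one image is required")
	}
	urls := make([]string, len(images))
	for i, img := range images {
		urls[i] = toDataURL(img)
	}
	multi := len(urls) > 1
	if useImageURLs != nil {
		multi = *useImageURLs
	}
	if multi {
		return map[string]any{"image_urls": urls}, nil
	}
	// With an explicit override to false, only the first image is sent.
	return map[string]any{"image_url": urls[0]}, nil
}

func main() {
	png := []byte("\x89PNG\r\n\x1a\n") // PNG signature only, enough for MIME detection
	fields, err := imageFields([][]byte{png}, nil)
	if err != nil {
		panic(err)
	}
	fmt.Println(fields["image_url"]) // data:image/png;base64,iVBORw0KGgo=
}
```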

**Response Conversion**

- **Non-streaming**: Uses the same response conversion as image generation (see Image Generation section)
- **Streaming**: fal-ai streaming responses use Server-Sent Events (SSE) format:
  - **Event Type**: Server-Sent Events with `data:` prefix
  - **Chunk Format**: Each SSE event contains JSON with `images[]` array (or `data.images[]` in API envelope format)
  - **Stream Processing**:
    - Each image in `images[]` becomes a separate stream chunk
    - Chunks have `type: "image_edit.partial_image"` until stream completion
    - Final chunk has `type: "image_edit.completed"` with the last image data
    - Images can be delivered as `url` (public URL) or `b64_json` (base64-encoded)
  - **Response Structure**: Handles both API envelope format (`Data.Images`) and legacy flattened format (`Images`)
  - **URL Pattern**: `/fal-ai/{model_id}/stream` (appended to base URL)

**Endpoint**: `/fal-ai/{model_id}` (non-streaming), `/fal-ai/{model_id}/stream` (streaming)

**Image Variation**

Image variation is not supported by the Hugging Face provider.

## Raw JSON Body Handling

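Most providers strictly serialize a struct to JSON. The Hugging Face provider's `Transcription` method instead takes a hybrid approach: depending on the inference provider, it sends either raw audio bytes (`hf-inference`) or a JSON body (all other providers). Embedding requests vary in a similar way.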

### Embedding Requests

For embedding requests, different providers expect different field names:

- **Standard providers** (most): Use `input` field
- **`hf-inference`**: Uses `inputs` field (plural)

**Request Structure**:
```go
type HuggingFaceEmbeddingRequest struct {
    Input    interface{} `json:"input,omitempty"`    // Used by all providers except hf-inference
    Inputs   interface{} `json:"inputs,omitempty"`   // Used by hf-inference
    Provider *string     `json:"provider,omitempty"` // Identifies the inference backend
    Model    *string     `json:"model,omitempty"`
    // ... other fields
}
```

The converter in `embedding.go` populates both fields to ensure compatibility across providers.
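
As a rough illustration of populating both fields, the sketch below mirrors the JSON tags of the struct above. `embeddingRequest` and `encodeEmbedding` are illustrative names and the model ID is a placeholder; this is not the converter from `embedding.go`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// embeddingRequest mirrors the JSON tags of HuggingFaceEmbeddingRequest.
type embeddingRequest struct {
	Input    any     `json:"input,omitempty"`
	Inputs   any     `json:"inputs,omitempty"`
	Provider *string `json:"provider,omitempty"`
	Model    *string `json:"model,omitempty"`
}

// encodeEmbedding is a hypothetical helper that sets both input fields so
// the same body works for providers expecting either name.
func encodeEmbedding(texts []string, provider, model string) (string, error) {
	body, err := json.Marshal(embeddingRequest{
		Input:    texts,
		Inputs:   texts,
		Provider: &provider,
		Model:    &model,
	})
	return string(body), err
}

func main() {
	body, err := encodeEmbedding([]string{"hello"}, "hf-inference", "example-org/example-embedding")
	if err != nil {
		panic(err)
	}
	fmt.Println(body)
}
```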

### Differences in Inference Provider Constraints

This multi-mode approach allows the provider to support diverse API contracts within a single implementation structure, accommodating:
1. **Legacy endpoints** that expect raw binary data
2. **Modern JSON APIs** with different schema expectations
3. **Third-party providers** (like `fal-ai`) with custom requirements
4. **Performance optimizations** (raw bytes avoid JSON overhead for `hf-inference`)

## Model Discovery & Caching

The provider discovers models through the Hugging Face Hub API:

### List Models Flow
1. **Parallel Queries**: Fetches models from multiple inference providers concurrently
2. **Filter by Pipeline Tag**: Uses `pipeline_tag` (e.g., `text-to-speech`, `feature-extraction`) to determine supported methods
3. **Aggregate Results**: Combines responses from all providers into a unified list
4. **Model ID Format**: Returns models as `huggingface/{provider}/{model_id}`
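
Steps 1, 3, and 4 of this flow follow a common fan-out and aggregate pattern, sketched below. `listModels` and `fetchFunc` are hypothetical; `fetchFunc` stands in for the Hub API call, and the pipeline-tag filtering of step 2 is left out.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// fetchFunc stands in for a Hub API call that returns model IDs for one
// inference provider. It is a placeholder, not a real client.
type fetchFunc func(provider string) ([]string, error)

// listModels queries all providers concurrently and returns the combined
// list in "huggingface/{provider}/{model_id}" form, plus any errors.
func listModels(providers []string, fetch fetchFunc) ([]string, []error) {
	var (
		mu   sync.Mutex
		wg   sync.WaitGroup
		all  []string
		errs []error
	)
	for _, p := range providers {
		wg.Add(1)
		go func(provider string) {
			defer wg.Done()
			ids, err := fetch(provider)
			mu.Lock()
			defer mu.Unlock()
			if err != nil {
				errs = append(errs, fmt.Errorf("%s: %w", provider, err))
				return
			}
			for _, id := range ids {
				all = append(all, "huggingface/"+provider+"/"+id)
			}
		}(p)
	}
	wg.Wait()
	sort.Strings(all) // goroutines finish in any order; sort for stable output
	return all, errs
}

func main() {
	models, errs := listModels([]string{"groq", "cerebras"}, func(provider string) ([]string, error) {
		return []string{"example-org/example-model"}, nil
	})
	fmt.Println(models, len(errs))
}
```

Collecting errors per provider instead of failing fast lets one unavailable backend leave the rest of the list intact; whether Bifrost does the same is not stated in the source.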

### Provider Model Mapping Cache
The provider maintains a cache (`modelProviderMappingCache`) to map Hugging Face model IDs to provider-specific model identifiers:

```go
// Example: "meta-llama/Meta-Llama-3-8B-Instruct" -> provider mappings
{
    "cerebras": {
        "ProviderTask": "chat-completion",
        "ProviderModelID": "llama3-8b-8192"
    },
    "groq": {
        "ProviderTask": "chat-completion",
        "ProviderModelID": "llama3-8b-instant"
    }
}
```

**Cache Invalidation**: On HTTP 404 errors, the cache is cleared and the mapping is re-fetched, then the request is retried with the updated model ID.
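
The invalidate-and-retry behavior can be sketched as below. All names (`mappingCache`, `callWithRetry`, `errNotFound`) are hypothetical, and the sketch drops only the affected entry and retries once, which may differ from how `modelProviderMappingCache` is actually cleared.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errNotFound stands in for an HTTP 404 from the inference provider.
var errNotFound = errors.New("404 not found")

// mappingCache maps Hugging Face model IDs to provider model IDs.
type mappingCache struct {
	mu      sync.Mutex
	entries map[string]string
}

func newMappingCache() *mappingCache {
	return &mappingCache{entries: make(map[string]string)}
}

// get returns the cached mapping or fetches and stores a fresh one.
func (c *mappingCache) get(model string, fetch func(string) (string, error)) (string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if id, ok := c.entries[model]; ok {
		return id, nil
	}
	id, err := fetch(model)
	if err != nil {
		return "", err
	}
	c.entries[model] = id
	return id, nil
}

func (c *mappingCache) invalidate(model string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.entries, model)
}

// callWithRetry sends the request; on a 404 it drops the cached mapping,
// re-fetches it, and retries once with the updated provider model ID.
func callWithRetry(c *mappingCache, model string, fetch func(string) (string, error), send func(string) error) error {
	for attempt := 0; attempt < 2; attempt++ {
		id, err := c.get(model, fetch)
		if err != nil {
			return err
		}
		if err = send(id); !errors.Is(err, errNotFound) {
			return err // success (nil) or an unrelated error
		}
		c.invalidate(model)
	}
	return fmt.Errorf("model %q still not found after refreshing the mapping: %w", model, errNotFound)
}

func main() {
	cache := newMappingCache()
	cache.entries["example-org/example-model"] = "stale-id"
	err := callWithRetry(cache, "example-org/example-model",
		func(string) (string, error) { return "fresh-id", nil },
		func(id string) error {
			if id == "stale-id" {
				return errNotFound
			}
			return nil
		})
	fmt.Println(err, cache.entries["example-org/example-model"]) // <nil> fresh-id
}
```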

## Best Practices

When working with the Hugging Face provider:

1. **Check Payload Size**: Ensure request bodies are under 2 MB
2. **Audio Format**: Use MP3 for `fal-ai`, avoid WAV files
3. **Model Aliases**: Always specify provider in model string: `huggingface/{provider}/{model}`
4. **Error Handling**: Implement retries for 404 errors (cache invalidation scenarios)
5. **Provider Selection**: Use `auto` for automatic provider selection based on model capabilities
6. **Pipeline Tags**: Verify model's `pipeline_tag` matches your use case (chat, embedding, TTS, ASR)

## File Structure Reference

```
core/providers/huggingface/
├── huggingface.go       # Main provider implementation, HTTP request handling
├── types.go             # All provider-specific types (Request/Response DTOs)
├── utils.go             # Helpers, constants, URL builders, model mapping
├── chat.go              # Chat completion converters (Bifrost ↔ HF)
├── embedding.go         # Embedding converters
├── speech.go            # Text-to-speech converters
├── transcription.go     # Speech-to-text converters
├── models.go            # Model listing and capability detection
├── images.go            # Image generation converters
├── errors.go            # Error handling
└── huggingface_test.go  # Comprehensive test suite
```

Each file follows strict separation of concerns as outlined in the [Adding a Provider](/contributing/adding-a-provider) guide.