first commit

2026-04-26 21:52:23 +03:00
commit 880f412e2c
2662 changed files with 866266 additions and 0 deletions
--- a/docs/architecture/README.mdx
+++ b/docs/architecture/README.mdx
--- a/docs/architecture/core/concurrency.mdx
+++ b/docs/architecture/core/concurrency.mdx
@@ -0,0 +1,764 @@
+---
+title: "Concurrency"
+description: "Deep dive into Bifrost's advanced concurrency architecture - worker pools, goroutine management, channel-based communication, and resource isolation patterns."
+icon: "traffic-light"
+---
+
+## Concurrency Philosophy
+
+### **Core Principles**
+
+| Principle                       | Implementation                         | Benefit                                        |
+| ------------------------------- | -------------------------------------- | ---------------------------------------------- |
+| **Provider Isolation**          | Independent worker pools per provider  | Fault tolerance, no cascade failures           |
+| **Channel-Based Communication** | Go channels for all async operations   | Type-safe hand-off without shared-memory locks |
+| **Resource Pooling**            | Object pools with lifecycle management | Predictable memory usage, minimal GC           |
+| **Non-Blocking Operations**     | Async processing throughout pipeline   | Maximum concurrency, no blocking waits         |
+| **Backpressure Handling**       | Configurable buffers and flow control  | Graceful degradation under load                |
+
+### **Threading Architecture Overview**
+
+```mermaid
+graph TB
+    subgraph "Main Thread"
+        Main[Main Process<br/>HTTP Server]
+        Router[Request Router<br/>Goroutine]
+        PluginMgr[Plugin Manager<br/>Goroutine]
+    end
+
+    subgraph "Provider Worker Pools"
+        subgraph "OpenAI Pool"
+            OAI1[Worker 1<br/>Goroutine]
+            OAI2[Worker 2<br/>Goroutine]
+            OAIN[Worker N<br/>Goroutine]
+        end
+        subgraph "Anthropic Pool"
+            ANT1[Worker 1<br/>Goroutine]
+            ANT2[Worker 2<br/>Goroutine]
+            ANTN[Worker N<br/>Goroutine]
+        end
+        subgraph "Bedrock Pool"
+            BED1[Worker 1<br/>Goroutine]
+            BED2[Worker 2<br/>Goroutine]
+            BEDN[Worker N<br/>Goroutine]
+        end
+    end
+
+    subgraph "Memory Pools"
+        ChannelPool[Channel Pool<br/>sync.Pool]
+        MessagePool[Message Pool<br/>sync.Pool]
+        ResponsePool[Response Pool<br/>sync.Pool]
+    end
+
+    Main --> Router
+    Router --> PluginMgr
+    PluginMgr --> OAI1
+    PluginMgr --> ANT1
+    PluginMgr --> BED1
+
+    OAI1 --> ChannelPool
+    ANT1 --> MessagePool
+    BED1 --> ResponsePool
+```
+
+---
+
+## Worker Pool Architecture
+
+### **Provider-Isolated Worker Pools**
+
+```mermaid
+stateDiagram-v2
+    [*] --> PoolInit: Worker Pool Creation
+    PoolInit --> WorkerSpawn: Spawn Worker Goroutines
+    WorkerSpawn --> Listening: Workers Listen on Channels
+
+    Listening --> Processing: Job Received
+    Processing --> API_Call: Provider API Request
+    API_Call --> Response: Process Response
+    Response --> Listening: Job Complete
+
+    Listening --> Shutdown: Graceful Shutdown
+    Processing --> Shutdown: Complete Current Job
+    Shutdown --> [*]: Pool Destroyed
+```
+
+**Worker Pool Architecture:**
+
+Each provider gets a fixed set of pre-spawned worker goroutines and buffered queues. This costs some idle resources but gives low dispatch latency and strict isolation between providers:
+
+**Key Components:**
+
+- **Worker Pool Management** - Pre-spawned workers reduce startup latency
+- **Job Queue System** - Buffered channels provide smooth load balancing
+- **Resource Pools** - HTTP clients and API keys are pooled for efficiency
+- **Health Monitoring** - Circuit breakers detect and isolate failing providers
+- **Graceful Shutdown** - Workers complete current jobs before terminating
+
+**Startup Process:**
+
+1. **Worker Pre-spawning** - Workers are created during pool initialization
+2. **Channel Setup** - Job queues and worker channels are established
+3. **Resource Allocation** - HTTP clients and API keys are distributed
+4. **Health Checks** - Initial connectivity tests verify provider availability
+5. **Ready State** - Pool becomes available for request processing
+
+**Job Dispatch Logic:**
+
+- **Round-Robin Assignment** - Jobs are distributed evenly across available workers
+- **Load Balancing** - Worker availability determines job assignment
+- **Overflow Handling** - Excess jobs are queued or dropped based on configuration
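+
+The startup and dispatch steps above can be sketched with one buffered channel shared by pre-spawned goroutines. This is a minimal illustration under stated assumptions, not Bifrost's implementation. `WorkerPool`, `Job`, `NewWorkerPool`, `Submit`, and `Shutdown` are hypothetical names, and `Submit` uses a reject-on-full overflow policy. Every worker receives from the same channel, so each job goes to whichever worker is idle first. That spreads load evenly in steady state without an explicit round-robin counter.
+
+```go
+package main
+
+import (
+    "errors"
+    "fmt"
+    "sync"
+    "sync/atomic"
+)
+
+// Job is a hypothetical unit of work; in a provider pool it would wrap one API call.
+type Job func()
+
+// ErrQueueFull is returned when the buffered queue has no free slot.
+var ErrQueueFull = errors.New("queue full, job rejected")
+
+// WorkerPool owns a fixed set of workers that share one buffered job queue.
+type WorkerPool struct {
+    jobs chan Job
+    wg   sync.WaitGroup
+}
+
+// NewWorkerPool pre-spawns all workers before any job arrives.
+func NewWorkerPool(workers, queueSize int) *WorkerPool {
+    p := &WorkerPool{jobs: make(chan Job, queueSize)}
+    for i := 0; i < workers; i++ {
+        p.wg.Add(1)
+        go func() {
+            defer p.wg.Done()
+            // All workers receive from the same queue, so the next job goes
+            // to whichever worker is idle first.
+            for job := range p.jobs {
+                job()
+            }
+        }()
+    }
+    return p
+}
+
+// Submit enqueues a job without blocking and reports overflow as an error.
+func (p *WorkerPool) Submit(job Job) error {
+    select {
+    case p.jobs <- job:
+        return nil
+    default:
+        return ErrQueueFull
+    }
+}
+
+// Shutdown stops intake, lets workers finish every queued job, then returns.
+func (p *WorkerPool) Shutdown() {
+    close(p.jobs)
+    p.wg.Wait()
+}
+
+func main() {
+    var completed atomic.Int64
+    pool := NewWorkerPool(3, 10)
+    for i := 0; i < 5; i++ {
+        if err := pool.Submit(func() { completed.Add(1) }); err != nil {
+            fmt.Println(err)
+        }
+    }
+    pool.Shutdown()
+    fmt.Println("completed:", completed.Load()) // prints "completed: 5"
+}
+```
+
+In this sketch, calling `Submit` after `Shutdown` would panic (send on a closed channel). A production pool would guard intake with a state flag.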
+
+### **Worker Lifecycle Management**
+
+```mermaid
+sequenceDiagram
+    participant Pool
+    participant Worker
+    participant HTTPClient
+    participant Provider
+    participant Metrics
+
+    Pool->>Worker: Start()
+    Worker->>Worker: Initialize HTTP Client
+    Worker->>Pool: Ready Signal
+
+    loop Job Processing
+        Pool->>Worker: Job Assignment
+        Worker->>HTTPClient: Prepare Request
+        HTTPClient->>Provider: API Call
+        Provider-->>HTTPClient: Response
+        HTTPClient-->>Worker: Parsed Response
+        Worker->>Metrics: Record Performance
+        Worker->>Pool: Job Complete
+    end
+
+    Pool->>Worker: Shutdown Signal
+    Worker->>Worker: Complete Current Job
+    Worker-->>Pool: Shutdown Confirmed
+```
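+
+The handshake in the lifecycle diagram maps naturally onto closed channels: ready signal, job loop, shutdown signal, and confirmation. This is a hypothetical sketch, not Bifrost's worker type. It assumes one goroutine per `Worker` and signal channels that are closed rather than sent on, so any number of waiters observe them.
+
+```go
+package main
+
+import "fmt"
+
+// Worker is a hypothetical worker with explicit lifecycle signals.
+type Worker struct {
+    jobs  chan func()
+    quit  chan struct{} // closed by the pool to request shutdown
+    ready chan struct{} // closed by the worker once initialized
+    done  chan struct{} // closed by the worker when it has exited
+}
+
+func NewWorker(queueSize int) *Worker {
+    return &Worker{
+        jobs:  make(chan func(), queueSize),
+        quit:  make(chan struct{}),
+        ready: make(chan struct{}),
+        done:  make(chan struct{}),
+    }
+}
+
+// Start launches the worker goroutine and blocks until it signals ready.
+func (w *Worker) Start() {
+    go func() {
+        defer close(w.done) // shutdown confirmed
+        // Per-worker setup (such as building an HTTP client) would go here.
+        close(w.ready)
+        for {
+            select {
+            case job := <-w.jobs:
+                job() // quit is not checked mid-job, so the current job always finishes
+            case <-w.quit:
+                return
+            }
+        }
+    }()
+    <-w.ready
+}
+
+// Stop sends the shutdown signal and waits for the worker to confirm.
+func (w *Worker) Stop() {
+    close(w.quit)
+    <-w.done
+}
+
+func main() {
+    w := NewWorker(4)
+    w.Start()
+    finished := make(chan struct{})
+    w.jobs <- func() {
+        fmt.Println("job complete")
+        close(finished)
+    }
+    <-finished
+    w.Stop()
+    fmt.Println("shutdown confirmed")
+}
+```
+
+`select` picks randomly among ready cases. As a result, jobs still queued when `quit` closes may be skipped. Ranging over a closed job channel instead would drain them before exit.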
+
+---
+
+## Channel-Based Communication
+
+### **Channel Architecture**
+
+```mermaid
+graph TB
+    subgraph "Channel Types"
+        JobQueue[Job Queue<br/>Buffered Channel]
+        WorkerPool[Worker Pool<br/>Buffered Channel]
+        ResultChan[Result Channel<br/>Buffered Channel]
+        QuitChan[Quit Channel<br/>Unbuffered]
+    end
+
+    subgraph "Flow Control"
+        BackPressure[Backpressure<br/>Buffer Limits]
+        Timeout[Timeout<br/>Context Cancellation]
+        Graceful[Graceful Shutdown<br/>Channel Closing]
+    end
+
+    JobQueue --> BackPressure
+    WorkerPool --> Timeout
+    ResultChan --> Graceful
+```
+
+**Channel Configuration Principles:**
+
+Bifrost's channel system balances throughput and memory usage through careful buffer sizing:
+
+**Job Queuing Configuration:**
+
+- **Job Queue Buffer** - Sized based on expected burst traffic (100-1000 jobs)
+- **Worker Pool Size** - Matches provider concurrency limits (10-100 workers)
+- **Result Buffer** - Accommodates response processing delays (50-500 responses)
+
+**Flow Control Parameters:**
+
+- **Queue Wait Limits** - Maximum time jobs wait before timeout (1-10 seconds)
+- **Processing Timeouts** - Per-job execution limits (30-300 seconds)
+- **Shutdown Timeouts** - Graceful termination periods (5-30 seconds)
+
+**Backpressure Policies:**
+
+- **Drop Policy** - Discard excess jobs when queues are full
+- **Block Policy** - Wait for queue space with timeout
+- **Error Policy** - Immediately return error for full queues
+
+**Channel Type Selection:**
+
+- **Buffered Channels** - Used for async job processing and result handling
+- **Unbuffered Channels** - Used for synchronization signals (quit, done)
+- **Context Cancellation** - Used for timeout and cancellation propagation
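+
+Processing timeouts and cancellation can both be expressed with `context.WithTimeout`. The helper below is an assumed pattern, not Bifrost's code. `runWithTimeout` runs a job under a deadline and returns `context.DeadlineExceeded` if the job overruns. Its result channel has a buffer of one, so the job goroutine can always complete its send and exit, even after the caller has stopped waiting.
+
+```go
+package main
+
+import (
+    "context"
+    "errors"
+    "fmt"
+    "time"
+)
+
+// runWithTimeout runs job under a deadline; the job receives ctx so it can stop early.
+func runWithTimeout(parent context.Context, limit time.Duration, job func(context.Context) (string, error)) (string, error) {
+    ctx, cancel := context.WithTimeout(parent, limit)
+    defer cancel()
+
+    type outcome struct {
+        val string
+        err error
+    }
+    // Buffered so the job goroutine never blocks (and leaks) after a timeout.
+    resultCh := make(chan outcome, 1)
+    go func() {
+        v, err := job(ctx)
+        resultCh <- outcome{v, err}
+    }()
+
+    select {
+    case r := <-resultCh:
+        return r.val, r.err
+    case <-ctx.Done():
+        return "", ctx.Err()
+    }
+}
+
+func main() {
+    fast := func(ctx context.Context) (string, error) { return "ok", nil }
+    slow := func(ctx context.Context) (string, error) {
+        select {
+        case <-time.After(time.Second):
+            return "late", nil
+        case <-ctx.Done():
+            return "", ctx.Err()
+        }
+    }
+
+    v, err := runWithTimeout(context.Background(), 100*time.Millisecond, fast)
+    fmt.Println(v, err) // prints "ok <nil>"
+    _, err = runWithTimeout(context.Background(), 50*time.Millisecond, slow)
+    fmt.Println(errors.Is(err, context.DeadlineExceeded)) // prints "true"
+}
+```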
+
+### **Backpressure and Flow Control**
+
+```mermaid
+flowchart TD
+    Request[Incoming Request] --> QueueCheck{Queue Full?}
+    QueueCheck -->|No| Queue[Add to Queue]
+    QueueCheck -->|Yes| Policy{Backpressure Policy?}
+
+    Policy -->|Drop| Drop[Drop Request<br/>Return Error]
+    Policy -->|Block| Block[Block Until Space<br/>With Timeout]
+    Policy -->|Error| Error[Return Queue Full Error]
+
+    Queue --> Worker[Assign to Worker]
+    Block --> TimeoutCheck{Timeout?}
+    TimeoutCheck -->|Yes| Error
+    TimeoutCheck -->|No| Queue
+
+    Worker --> Processing[Process Request]
+    Processing --> Complete[Complete]
+
+    Drop --> Client[Client Response]
+    Error --> Client
+    Complete --> Client
+```
+
+**Backpressure Implementation Strategy:**
+
+The backpressure system protects Bifrost from being overwhelmed while maintaining service availability:
+
+**Non-Blocking Job Submission:**
+
+- **Immediate Queue Check** - Jobs are submitted without blocking on queue space
+- **Success Path** - Available queue space allows immediate job acceptance
+- **Overflow Detection** - Full queues trigger backpressure policies
+- **Metrics Collection** - All queue operations are tracked for monitoring
+
+**Backpressure Policy Execution:**
+
+- **Drop Policy** - Immediately rejects excess jobs with meaningful error messages
+- **Block Policy** - Waits for queue space with configurable timeout limits
+- **Error Policy** - Returns queue full errors for immediate client feedback
+- **Metrics Tracking** - Dropped, blocked, and successful submissions are measured
+
+**Timeout Management:**
+
+- **Context-Based Timeouts** - All blocking operations respect timeout boundaries
+- **Graceful Degradation** - Timeouts result in controlled error responses
+- **Resource Protection** - Prevents goroutine leaks from infinite waits
+
+```go
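+import (
+    "context"
+    "errors"
+)
+
+// SubmitJob enqueues without blocking; a full queue triggers the configured policy.
+func (pool *ProviderWorkerPool) SubmitJob(ctx context.Context, job *Job) error {
+    select {
+    case pool.jobQueue <- job:
+        pool.metrics.IncQueuedJobs()
+        return nil
+    default:
+        switch pool.config.QueuePolicy {
+        case "drop":
+            pool.metrics.IncDroppedJobs()
+            return errors.New("queue full, job dropped")
+        case "block":
+            // Wait for space, bounded by the caller's context deadline
+            select {
+            case pool.jobQueue <- job:
+                pool.metrics.IncQueuedJobs()
+                return nil
+            case <-ctx.Done():
+                pool.metrics.IncTimeoutJobs()
+                return errors.New("queue full, timeout waiting")
+            }
+        case "error":
+            pool.metrics.IncRejectedJobs()
+            return errors.New("queue full, job rejected")
+        default:
+            return errors.New("unknown queue policy")
+        }
+    }
+}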
+```
+
+---
+
+## Memory Pool Concurrency
+
+### **Thread-Safe Object Pools**
+
+```mermaid
+graph TD
+    subgraph "sync.Pool Lifecycle"
+        direction LR
+        GetObject[Get Object<br/>sync.Pool.Get]
+        PoolCheck{Is Pool Empty?}
+        NewObject[New Object<br/>Factory Function]
+        UseObject[Use Object<br/>Application Logic]
+        ResetObject[Reset Object<br/>Clear State]
+        ReturnObject[Return Object<br/>sync.Pool.Put]
+
+        GetObject --> PoolCheck
+        PoolCheck -- Yes --> NewObject
+        PoolCheck -- No --> UseObject
+        NewObject --> UseObject
+        UseObject --> ResetObject
+        ResetObject --> ReturnObject
+        ReturnObject --> GetObject
+    end
+
+    subgraph "GC Interaction"
+        direction TB
+        GCRun[GC Runs]
+        PoolCleanup[Pool Cleanup<br>Removes idle objects]
+        
+        GCRun --> PoolCleanup
+    end
+```
+
+**Thread-Safe Pool Architecture:**
+
+Bifrost's memory pool system ensures thread-safe object reuse across multiple goroutines:
+
+**Pool Structure Design:**
+
+- **Multiple Pool Types** - Separate pools for channels, messages, responses, and buffers
+- **Factory Functions** - Dynamic object creation when pools are empty
+- **Statistics Tracking** - Comprehensive metrics for pool performance monitoring
+- **Thread Safety** - Synchronized access using Go's sync.Pool and read-write mutexes
+
+**Object Lifecycle Management:**
+
+- **Pool Initialization** - Factory functions define object creation patterns
+- **Unique Identification** - Each pooled object gets a unique ID for tracking
+- **Timestamp Tracking** - Creation, acquisition, and return times are recorded
+- **Reusability Flags** - Objects can be marked as non-reusable for single-use scenarios
+
+**Acquisition Strategy:**
+
+- **Request Tracking** - All pool requests are counted for monitoring
+- **Hit/Miss Tracking** - Pool effectiveness is measured through hit ratios
+- **Fallback Creation** - New objects are created when pools are empty
+- **Performance Metrics** - Acquisition times and patterns are monitored
+
+**Return and Reset Process:**
+
+- **State Validation** - Only reusable objects are returned to pools
+- **Object Reset** - All object state is cleared before returning to pool
+- **Return Tracking** - Return operations are counted and timed
+- **Pool Replenishment** - Returned objects become available for reuse
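+
+The acquire, reset, and return cycle maps directly onto the standard library's `sync.Pool`. This is a minimal sketch that assumes a `*bytes.Buffer` as the pooled object; Bifrost's own pools hold channels, messages, and responses. `acquireBuffer` and `releaseBuffer` are illustrative names.
+
+```go
+package main
+
+import (
+    "bytes"
+    "fmt"
+    "sync"
+)
+
+// bufferPool hands out reusable buffers; New is the factory used when the pool is empty.
+var bufferPool = sync.Pool{
+    New: func() any { return new(bytes.Buffer) },
+}
+
+// acquireBuffer gets a pooled buffer, or a fresh one on a miss.
+func acquireBuffer() *bytes.Buffer {
+    return bufferPool.Get().(*bytes.Buffer)
+}
+
+// releaseBuffer clears all state (keeping capacity) before returning the buffer.
+func releaseBuffer(buf *bytes.Buffer) {
+    buf.Reset()
+    bufferPool.Put(buf)
+}
+
+func main() {
+    buf := acquireBuffer()
+    buf.WriteString(`{"model":"gpt-4o"}`)
+    fmt.Println(buf.Len()) // prints 18
+    releaseBuffer(buf)
+
+    next := acquireBuffer() // may or may not be the same buffer
+    fmt.Println(next.Len()) // prints 0: the reset guarantees a clean object either way
+    releaseBuffer(next)
+}
+```
+
+`sync.Pool` may discard cached objects during any garbage collection, as the GC Interaction subgraph shows. It therefore reduces allocations but does not cap how many objects exist.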
+
+### **Pool Performance Monitoring**
+
+Comprehensive metrics provide insights into pool efficiency and system health:
+
+**Usage Statistics Collection:**
+
+- **Request Counting** - Track total pool requests by object type
+- **Creation Tracking** - Monitor new object allocations when pools are empty
+- **Hit/Miss Ratios** - Measure pool effectiveness through reuse rates
+- **Return Monitoring** - Track successful object returns to pools
+
+**Performance Metrics Analysis:**
+
+- **Acquisition Times** - Measure how long it takes to get objects from pools
+- **Reset Performance** - Track time spent cleaning objects for reuse
+- **Hit Ratio Calculation** - Determine percentage of requests served from pools
+- **Memory Efficiency** - Calculate memory savings from object reuse
+
+**Key Performance Indicators:**
+
+- **Channel Pool Hit Ratio** - Typically 85-95% in steady state
+- **Message Pool Efficiency** - Usually 80-90% reuse rate
+- **Response Pool Utilization** - Often 70-85% hit ratio
+- **Total Memory Savings** - Measured reduction in garbage collection pressure
+
+**Monitoring Integration:**
+
+- **Thread-Safe Access** - All metrics collection is synchronized
+- **Real-Time Updates** - Statistics are updated with each pool operation
+- **Export Capability** - Metrics are available in JSON format for monitoring systems
+- **Alerting Support** - Low hit ratios can trigger performance alerts
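+
+The hit ratio is hits divided by total requests, where hits are requests minus misses. The sketch below uses lock-free counters with JSON export. `PoolStats`, its methods, and the JSON keys are hypothetical, not Bifrost's actual metrics schema.
+
+```go
+package main
+
+import (
+    "encoding/json"
+    "fmt"
+    "sync/atomic"
+)
+
+// PoolStats counts pool traffic with atomics so any goroutine can update it.
+type PoolStats struct {
+    requests atomic.Uint64 // every Get
+    misses   atomic.Uint64 // Gets that fell through to the factory function
+    returns  atomic.Uint64 // every Put
+}
+
+// RecordGet counts one acquisition; hit reports whether it was served from the pool.
+func (s *PoolStats) RecordGet(hit bool) {
+    s.requests.Add(1)
+    if !hit {
+        s.misses.Add(1)
+    }
+}
+
+// RecordPut counts one object returned to the pool.
+func (s *PoolStats) RecordPut() { s.returns.Add(1) }
+
+// HitRatio is the fraction of requests served from the pool instead of newly allocated.
+func (s *PoolStats) HitRatio() float64 {
+    // Load misses before requests: every miss is counted after its request,
+    // so this order keeps req >= miss even under concurrent updates.
+    miss := s.misses.Load()
+    req := s.requests.Load()
+    if req == 0 {
+        return 0
+    }
+    return float64(req-miss) / float64(req)
+}
+
+// Snapshot exports the counters as JSON for a monitoring system.
+func (s *PoolStats) Snapshot() ([]byte, error) {
+    return json.Marshal(map[string]any{
+        "requests":  s.requests.Load(),
+        "misses":    s.misses.Load(),
+        "returns":   s.returns.Load(),
+        "hit_ratio": s.HitRatio(),
+    })
+}
+
+func main() {
+    var stats PoolStats
+    for i := 0; i < 20; i++ {
+        stats.RecordGet(i >= 2) // the first 2 requests miss, the other 18 hit
+        stats.RecordPut()
+    }
+    fmt.Println(stats.HitRatio()) // prints 0.9
+    out, err := stats.Snapshot()
+    if err != nil {
+        panic(err)
+    }
+    fmt.Println(string(out))
+}
+```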
+
+---
+
+## Goroutine Management
+
+### **Goroutine Lifecycle Patterns**
+
+```mermaid
+stateDiagram-v2
+    [*] --> Created: go routine()
+    Created --> Running: Execute Function
+    Running --> Waiting: Channel/Mutex Block
+    Waiting --> Running: Unblocked
+    Running --> Syscall: Network I/O
+    Syscall --> Running: I/O Complete
+    Running --> GCAssist: GC Triggered
+    GCAssist --> Running: GC Complete
+    Running --> Terminated: Function Exit
+    Terminated --> [*]: Cleanup
+```
+
+**Goroutine Pool Management Strategy:**
+
+Bifrost's goroutine management ensures optimal resource usage while preventing goroutine leaks:
+
+**Pool Configuration Management:**
+
+- **Goroutine Limits** - Maximum concurrent goroutines prevent resource exhaustion
+- **Active Counting** - Atomic counters track currently running goroutines
+- **Idle Timeouts** - Unused goroutines are cleaned up after configured periods
+- **Resource Boundaries** - Hard limits prevent runaway goroutine creation
+
+**Lifecycle Orchestration:**
+
+- **Spawn Channels** - New goroutine creation is tracked through channels
+- **Completion Monitoring** - Finished goroutines signal completion for cleanup
+- **Shutdown Coordination** - Graceful shutdown ensures all goroutines complete properly
+- **Health Monitoring** - Continuous monitoring tracks goroutine health and performance
+
+**Worker Creation Process:**
+
+- **Limit Enforcement** - Creation fails when maximum goroutine count is reached
+- **Unique Identification** - Each goroutine gets a unique ID for tracking and debugging
+- **Lifecycle Tracking** - Start times and names enable performance analysis
+- **Atomic Operations** - Thread-safe counters prevent race conditions
+
+**Panic Recovery and Error Handling:**
+
+- **Panic Isolation** - Goroutine panics don't crash the entire system
+- **Error Logging** - Panic details are logged with goroutine context
+- **Metrics Updates** - Panic counts are tracked for monitoring and alerting
+- **Resource Cleanup** - Failed goroutines are properly cleaned up and counted
+
+**Health Monitoring System:**
+
+- **Periodic Health Checks** - Regular intervals check goroutine pool health
+- **Completion Tracking** - Finished goroutines are recorded for performance analysis
+- **Shutdown Handling** - Clean shutdown process ensures no goroutine leaks
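+
+A small supervisor can combine the limit enforcement, unique IDs, panic isolation, and shutdown coordination described above. This is an illustrative sketch only. `Supervisor`, `Go`, and `ErrLimitReached` are hypothetical names, and idle timeouts and periodic health checks are left out for brevity.
+
+```go
+package main
+
+import (
+    "errors"
+    "fmt"
+    "sync"
+    "sync/atomic"
+)
+
+// ErrLimitReached is returned when starting another goroutine would exceed the cap.
+var ErrLimitReached = errors.New("goroutine limit reached")
+
+// Supervisor caps concurrent goroutines and turns panics into counted events.
+type Supervisor struct {
+    limit  int64
+    active atomic.Int64
+    panics atomic.Int64
+    nextID atomic.Int64
+    wg     sync.WaitGroup
+}
+
+// Go starts fn in a tracked goroutine, or fails fast if the cap is reached.
+func (s *Supervisor) Go(name string, fn func()) error {
+    // Reserve a slot atomically; roll back if that pushed us over the limit.
+    if s.active.Add(1) > s.limit {
+        s.active.Add(-1)
+        return ErrLimitReached
+    }
+    id := s.nextID.Add(1)
+    s.wg.Add(1)
+    go func() {
+        defer s.wg.Done()      // runs last
+        defer s.active.Add(-1) // runs second
+        defer func() {         // runs first: contain a panic to this goroutine
+            if r := recover(); r != nil {
+                s.panics.Add(1)
+                fmt.Printf("goroutine %d (%s) panicked: %v\n", id, name, r)
+            }
+        }()
+        fn()
+    }()
+    return nil
+}
+
+// Shutdown blocks until every tracked goroutine has exited.
+func (s *Supervisor) Shutdown() { s.wg.Wait() }
+
+func main() {
+    s := &Supervisor{limit: 2}
+    release := make(chan struct{})
+    _ = s.Go("worker-a", func() { <-release })
+    _ = s.Go("worker-b", func() { <-release })
+    fmt.Println(s.Go("worker-c", func() {})) // prints "goroutine limit reached"
+    close(release)
+    s.Shutdown()
+
+    _ = s.Go("bad-job", func() { panic("boom") })
+    s.Shutdown()
+    fmt.Println("panics:", s.panics.Load()) // prints "panics: 1"
+}
+```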
+
+### **Resource Leak Prevention**
+
+```mermaid
+flowchart TD
+    GoroutineStart[Goroutine Start] --> ResourceCheck[Resource Allocation Check]
+    ResourceCheck --> Timeout[Set Timeout Context]
+    Timeout --> Work[Execute Work]
+
+    Work --> Complete{Work Complete?}
+    Complete -->|Yes| Cleanup[Cleanup Resources]
+    Complete -->|No| TimeoutCheck{Timeout?}
+
+    TimeoutCheck -->|Yes| ForceCleanup[Force Cleanup]
+    TimeoutCheck -->|No| Work
+
+    Cleanup --> Return[Return Resources to Pool]
+    ForceCleanup --> Return
+    Return --> End[Goroutine End]
+```
+
+**Resource Leak Prevention:**
+
+```go
+import (
+    "context"
+    "fmt"
+)
+
+func (worker *Worker) ExecuteWithCleanup(job *Job) {
+    // Bound resource acquisition and processing with one timeout
+    ctx, cancel := context.WithTimeout(
+        context.Background(),
+        worker.config.ProcessTimeout,
+    )
+    defer cancel()
+
+    // deliver sends a result but gives up at the deadline, so the worker
+    // never blocks forever when nobody is receiving anymore
+    deliver := func(result *Result) {
+        select {
+        case job.resultChan <- result:
+            // Success
+        case <-ctx.Done():
+            // Timeout: the caller stopped waiting, so drop the result
+            worker.metrics.IncTimeouts()
+        }
+    }
+
+    // Acquire resources with timeout
+    resources, err := worker.acquireResources(ctx)
+    if err != nil {
+        deliver(&Result{Error: err})
+        return
+    }
+
+    // Ensure cleanup happens; deferred funcs run LIFO, so this runs before cancel()
+    defer func() {
+        // Always return resources
+        worker.returnResources(resources)
+
+        // Handle panics by reporting them as an error result
+        if r := recover(); r != nil {
+            worker.metrics.IncPanics()
+            deliver(&Result{Error: fmt.Errorf("worker panic: %v", r)})
+        }
+    }()
+
+    // Execute job with context and return the result
+    deliver(worker.processJob(ctx, job, resources))
+}
+```
+
+---
+
+## Concurrency Optimization Strategies
+
+### **Load-Based Worker Scaling** (Planned)
+
+```mermaid
+graph TB
+    subgraph "Load Monitoring"
+        QueueDepth[Queue Depth<br/>Monitoring]
+        ResponseTime[Response Time<br/>Tracking]
+        WorkerUtil[Worker Utilization<br/>Metrics]
+    end
+
+    subgraph "Scaling Decisions"
+        ScaleUp{Scale Up?<br/>Load > 80%}
+        ScaleDown{Scale Down?<br/>Load < 30%}
+        Maintain[Maintain<br/>Current Size]
+    end
+
+    subgraph "Actions"
+        AddWorkers[Spawn Additional<br/>Workers]
+        RemoveWorkers[Graceful Worker<br/>Shutdown]
+        NoAction[No Action<br/>Continue Monitoring]
+    end
+
+    QueueDepth --> ScaleUp
+    ResponseTime --> ScaleUp
+    WorkerUtil --> ScaleDown
+
+    ScaleUp -->|Yes| AddWorkers
+    ScaleUp -->|No| ScaleDown
+    ScaleDown -->|Yes| RemoveWorkers
+    ScaleDown -->|No| Maintain
+
+    Maintain --> NoAction
+```
+
+**Adaptive Scaling Implementation:**
+
+```go
+type AdaptiveScaler struct {
+    pool           *ProviderWorkerPool
+    config         ScalingConfig
+    metrics        *ScalingMetrics
+    lastScaleTime  time.Time
+    scalingMutex   sync.Mutex
+}
+
+func (scaler *AdaptiveScaler) EvaluateScaling() {
+    scaler.scalingMutex.Lock()
+    defer scaler.scalingMutex.Unlock()
+
+    // Prevent frequent scaling
+    if time.Since(scaler.lastScaleTime) < scaler.config.MinScaleInterval {
+        return
+    }
+
+    current := scaler.getCurrentMetrics()
+
+    // Scale up conditions
+    if current.QueueUtilization > scaler.config.ScaleUpThreshold ||
+       current.AvgResponseTime > scaler.config.MaxResponseTime {
+
+        scaler.scaleUp(current)
+        return
+    }
+
+    // Scale down conditions
+    if current.QueueUtilization < scaler.config.ScaleDownThreshold &&
+       current.AvgResponseTime < scaler.config.TargetResponseTime {
+
+        scaler.scaleDown(current)
+        return
+    }
+}
+
+func (scaler *AdaptiveScaler) scaleUp(metrics *CurrentMetrics) {
+    currentWorkers := scaler.pool.GetWorkerCount()
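+    targetWorkers := int(float64(currentWorkers) * scaler.config.ScaleUpFactor)
+    // int() truncates, so a small pool might not grow (1 x 1.5 -> 1); add at least one worker
+    if targetWorkers <= currentWorkers {
+        targetWorkers = currentWorkers + 1
+    }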
+
+    // Respect maximum limits
+    if targetWorkers > scaler.config.MaxWorkers {
+        targetWorkers = scaler.config.MaxWorkers
+    }
+
+    additionalWorkers := targetWorkers - currentWorkers
+    if additionalWorkers > 0 {
+        scaler.pool.AddWorkers(additionalWorkers)
+        scaler.lastScaleTime = time.Now()
+        scaler.metrics.RecordScaleUp(additionalWorkers)
+    }
+}
+```
+
+### **Provider-Specific Optimization**
+
+```go
+type ProviderOptimization struct {
+    // Provider characteristics
+    ProviderName     string        `json:"provider_name"`
+    RateLimit        int           `json:"rate_limit"`        // Requests per second
+    AvgLatency       time.Duration `json:"avg_latency"`       // Average response time
+    ErrorRate        float64       `json:"error_rate"`        // Historical error rate
+
+    // Optimal configuration
+    OptimalWorkers   int           `json:"optimal_workers"`
+    OptimalBuffer    int           `json:"optimal_buffer"`
+    TimeoutConfig    time.Duration `json:"timeout_config"`
+    RetryStrategy    RetryConfig   `json:"retry_strategy"`
+}
+
+func CalculateOptimalConcurrency(provider ProviderOptimization) ConcurrencyConfig {
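+    // Little's law: requests in flight = arrival rate x latency.
+    // Multiply in float64; int(AvgLatency.Seconds()) would truncate sub-second latencies to 0.
+    optimalWorkers := int(float64(provider.RateLimit) * provider.AvgLatency.Seconds())
+    if optimalWorkers < 1 {
+        optimalWorkers = 1
+    }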
+
+    // Adjust for error rate (more workers for higher error rate)
+    errorAdjustment := 1.0 + provider.ErrorRate
+    optimalWorkers = int(float64(optimalWorkers) * errorAdjustment)
+
+    // Buffer should be 2-3x worker count for smooth operation
+    optimalBuffer := optimalWorkers * 3
+
+    return ConcurrencyConfig{
+        Concurrency: optimalWorkers,
+        BufferSize:  optimalBuffer,
+        Timeout:     provider.AvgLatency * 2, // 2x avg latency for timeout
+    }
+}
+```
+
+---
+
+## Concurrency Monitoring & Metrics
+
+### **Key Concurrency Metrics**
+
+```mermaid
+graph TB
+    subgraph "Worker Metrics"
+        ActiveWorkers[Active Workers<br/>Current Count]
+        IdleWorkers[Idle Workers<br/>Available Count]
+        BusyWorkers[Busy Workers<br/>Processing Count]
+    end
+
+    subgraph "Queue Metrics"
+        QueueDepth[Queue Depth<br/>Pending Jobs]
+        QueueThroughput[Queue Throughput<br/>Jobs/Second]
+        QueueWaitTime[Queue Wait Time<br/>Average Delay]
+    end
+
+    subgraph "Performance Metrics"
+        GoroutineCount[Goroutine Count<br/>Total Active]
+        MemoryUsage[Memory Usage<br/>Pool Utilization]
+        GCPressure[GC Pressure<br/>Collection Frequency]
+    end
+
+    subgraph "Health Metrics"
+        ErrorRate[Error Rate<br/>Failed Jobs %]
+        PanicCount[Panic Count<br/>Crashed Goroutines]
+        DeadlockDetection[Deadlock Detection<br/>Blocked Operations]
+    end
+```
+
+**Metrics Collection Strategy:**
+
+Comprehensive concurrency monitoring provides operational insights and performance optimization data:
+
+**Worker Pool Monitoring:**
+
+- **Total Worker Tracking** - Monitor configured vs actual worker counts
+- **Active Worker Monitoring** - Track workers currently processing requests
+- **Idle Worker Analysis** - Identify unused capacity and optimization opportunities
+- **Queue Depth Monitoring** - Track pending job backlog and processing delays
+
+**Performance Data Collection:**
+
+- **Throughput Metrics** - Measure jobs processed per second across all pools
+- **Wait Time Analysis** - Track how long jobs wait in queues before processing
+- **Memory Pool Performance** - Monitor hit/miss ratios for memory pool effectiveness
+- **Goroutine Count Tracking** - Ensure goroutine counts remain within healthy limits
+
+**Health and Reliability Metrics:**
+
+- **Panic Recovery Tracking** - Count and analyze worker panic occurrences
+- **Timeout Monitoring** - Track jobs that exceed processing time limits
+- **Circuit Breaker Events** - Monitor provider isolation events and recoveries
+- **Error Rate Analysis** - Track failure patterns for capacity planning
+
+**Real-Time Updates:**
+
+- **Live Metric Updates** - Worker metrics are updated continuously during operation
+- **Processing Event Recording** - Each job completion updates relevant metrics
+- **Performance Correlation** - Queue times and processing times are correlated for analysis
+- **Success/Failure Tracking** - All job outcomes are recorded for reliability analysis
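+
+Most of these signals are cheap to read at runtime. Queue depth and capacity come from `len` and `cap` on the buffered channel, and the goroutine count comes from `runtime.NumGoroutine`. Busy and idle workers come from an atomic gauge that each worker updates around a job. This is a hypothetical sketch: `PoolMetrics`, `poolGauges`, and the error-rate definition (failed / finished jobs) are assumptions.
+
+```go
+package main
+
+import (
+    "errors"
+    "fmt"
+    "runtime"
+    "sync/atomic"
+)
+
+// PoolMetrics is a point-in-time view of one provider pool.
+type PoolMetrics struct {
+    TotalWorkers int
+    BusyWorkers  int64
+    IdleWorkers  int64
+    QueueDepth   int     // jobs waiting in the buffered channel
+    QueueCap     int     // buffer size
+    ErrorRate    float64 // failed / finished
+    Goroutines   int     // process-wide, from the Go runtime
+}
+
+// poolGauges is updated by workers around each job.
+type poolGauges struct {
+    totalWorkers int
+    busy         atomic.Int64
+    completed    atomic.Int64
+    failed       atomic.Int64
+}
+
+// trackJob marks a worker busy while job runs and records the outcome.
+func (g *poolGauges) trackJob(job func() error) {
+    g.busy.Add(1)
+    defer g.busy.Add(-1)
+    if err := job(); err != nil {
+        g.failed.Add(1)
+        return
+    }
+    g.completed.Add(1)
+}
+
+// snapshot reads the gauges without locking, so values may be slightly stale under load.
+func (g *poolGauges) snapshot(queue chan func() error) PoolMetrics {
+    busy := g.busy.Load()
+    failed, done := g.failed.Load(), g.completed.Load()
+    rate := 0.0
+    if total := failed + done; total > 0 {
+        rate = float64(failed) / float64(total)
+    }
+    return PoolMetrics{
+        TotalWorkers: g.totalWorkers,
+        BusyWorkers:  busy,
+        IdleWorkers:  int64(g.totalWorkers) - busy,
+        QueueDepth:   len(queue),
+        QueueCap:     cap(queue),
+        ErrorRate:    rate,
+        Goroutines:   runtime.NumGoroutine(),
+    }
+}
+
+func main() {
+    g := &poolGauges{totalWorkers: 4}
+    queue := make(chan func() error, 8)
+    queue <- func() error { return nil } // one job still waiting
+    g.trackJob(func() error { return nil })
+    g.trackJob(func() error { return errors.New("provider 500") })
+    m := g.snapshot(queue)
+    fmt.Printf("busy=%d idle=%d depth=%d/%d errors=%.2f\n",
+        m.BusyWorkers, m.IdleWorkers, m.QueueDepth, m.QueueCap, m.ErrorRate)
+    // prints "busy=0 idle=4 depth=1/8 errors=0.50"
+}
+```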
+
+---
+
+## Deadlock Prevention & Detection
+
+### **Deadlock Prevention Strategies**
+
+```mermaid
+flowchart TD
+    Strategy1[Lock Ordering<br/>Consistent Acquisition]
+    Strategy2[Timeout-Based Locks<br/>Context Cancellation]
+    Strategy3[Channel Select<br/>Non-blocking Operations]
+    Strategy4[Resource Hierarchy<br/>Layered Locking]
+
+    Prevention[Deadlock Prevention<br/>Design Patterns]
+
+    Prevention --> Strategy1
+    Prevention --> Strategy2
+    Prevention --> Strategy3
+    Prevention --> Strategy4
+
+    Strategy1 --> Success[No Deadlocks<br/>Guaranteed Order]
+    Strategy2 --> Success
+    Strategy3 --> Success
+    Strategy4 --> Success
+```
+
+**Deadlock Prevention Implementation Strategy:**
+
+Bifrost employs multiple complementary strategies to prevent deadlocks in concurrent operations:
+
+**Lock Ordering Management:**
+
+- **Consistent Acquisition Order** - All locks are acquired in a predetermined order
+- **Global Lock Registry** - Centralized registry maintains lock ordering relationships
+- **Order Enforcement** - Lock acquisition automatically sorts by predetermined order
+- **Dependency Tracking** - Lock dependencies are mapped to prevent circular waits
+
+**Timeout-Based Protection:**
+
+- **Default Timeouts** - All lock acquisitions have reasonable timeout limits
+- **Context Cancellation** - Operations respect context cancellation for cleanup
+- **Maximum Timeout Limits** - Upper bounds prevent indefinite blocking
+- **Graceful Timeout Handling** - Timeout errors provide meaningful context
+
+**Multi-Lock Acquisition Process:**
+
+- **Ordered Sorting** - Multiple locks are sorted before acquisition attempts
+- **Progressive Acquisition** - Locks are acquired one by one in sorted order
+- **Failure Recovery** - Failed acquisitions trigger automatic cleanup of held locks
+- **Resource Tracking** - All acquired locks are tracked for proper release
+
+**Lock Acquisition Safety:**
+
+- **Non-Blocking Detection** - Channel-based lock attempts prevent indefinite blocking
+- **Timeout Enforcement** - All lock attempts respect configured timeout limits
+- **Error Propagation** - Lock failures are properly propagated with context
+- **Cleanup Guarantees** - Failed operations always clean up partially acquired resources
+
+**Deadlock Detection and Recovery:**
+
+- **Active Monitoring** - Continuous monitoring for potential deadlock conditions
+- **Automatic Recovery** - Detected deadlocks trigger automatic resolution procedures
+- **Resource Release** - Deadlock resolution involves strategic resource release
+- **Prevention Learning** - Deadlock patterns inform prevention strategy improvements
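+
+The sketch below combines lock ordering, timeout-based acquisition, and failure cleanup. It assumes each lock carries a global rank. It uses a one-slot channel instead of `sync.Mutex` so that acquisition can be abandoned on a context deadline. `RankedLock` and `AcquireAll` are hypothetical names.
+
+```go
+package main
+
+import (
+    "context"
+    "fmt"
+    "sort"
+    "time"
+)
+
+// RankedLock is a lock with a global rank; callers always acquire in ascending rank.
+// A one-slot channel replaces sync.Mutex so that acquisition can time out.
+type RankedLock struct {
+    Name string
+    Rank int
+    sem  chan struct{}
+}
+
+func NewRankedLock(name string, rank int) *RankedLock {
+    return &RankedLock{Name: name, Rank: rank, sem: make(chan struct{}, 1)}
+}
+
+// lock waits for the slot until ctx is done.
+func (l *RankedLock) lock(ctx context.Context) error {
+    select {
+    case l.sem <- struct{}{}:
+        return nil
+    case <-ctx.Done():
+        return fmt.Errorf("lock %q: %w", l.Name, ctx.Err())
+    }
+}
+
+func (l *RankedLock) unlock() { <-l.sem }
+
+// AcquireAll sorts by rank, acquires one lock at a time, and releases everything
+// already held if any acquisition fails. On success it returns a release function.
+func AcquireAll(ctx context.Context, locks ...*RankedLock) (func(), error) {
+    ordered := append([]*RankedLock(nil), locks...)
+    sort.Slice(ordered, func(i, j int) bool { return ordered[i].Rank < ordered[j].Rank })
+
+    held := make([]*RankedLock, 0, len(ordered))
+    releaseHeld := func() {
+        for i := len(held) - 1; i >= 0; i-- {
+            held[i].unlock()
+        }
+    }
+    for _, l := range ordered {
+        if err := l.lock(ctx); err != nil {
+            releaseHeld()
+            return nil, err
+        }
+        held = append(held, l)
+    }
+    return releaseHeld, nil
+}
+
+func main() {
+    config := NewRankedLock("config", 1)
+    keys := NewRankedLock("keys", 2)
+
+    // Callers may list locks in any order; both paths take config before keys.
+    release, err := AcquireAll(context.Background(), keys, config)
+    fmt.Println(err) // prints "<nil>"
+
+    // A competing attempt times out instead of waiting forever.
+    ctx, cancel := context.WithTimeout(context.Background(), 20*time.Millisecond)
+    defer cancel()
+    _, err = AcquireAll(ctx, config, keys)
+    fmt.Println(err) // prints: lock "config": context deadline exceeded
+
+    release()
+}
+```
+
+Each lock must appear at most once per call. Passing the same lock twice would make the caller wait on itself until the deadline.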
+
+---
+
+## Related Architecture Documentation
+
+- **[Request Flow](./request-flow)** - How concurrency fits in request processing
+- **[Benchmarks](../../benchmarking/getting-started)** - Concurrency performance characteristics
+- **[Plugin System](./plugins)** - Plugin concurrency considerations
+- **[MCP System](./mcp)** - MCP concurrency and worker integration
+
+## Usage Documentation
+
+- **[Provider Configuration](../../quickstart/gateway/provider-configuration)** - Configure concurrency settings per provider
+- **[Performance Analysis](../../benchmarking/getting-started)** - Memory pool configuration and optimization
+- **[Performance Monitoring](../../features/telemetry)** - Monitor concurrency metrics and health
+- **[Go SDK Usage](../../quickstart/go-sdk/setting-up)** - Use Bifrost concurrency in Go applications
+- **[Gateway Setup](../../quickstart/gateway/setting-up)** - Deploy Bifrost with optimal concurrency settings
+
+---
+
+**🎯 Next Step:** Understand how plugins integrate with the concurrency model in **[Plugin System](./plugins)**.
--- a/docs/architecture/core/mcp.mdx
+++ b/docs/architecture/core/mcp.mdx
@@ -0,0 +1,985 @@
+---
+title: "Model Context Protocol (MCP)"
+description: "Deep dive into Bifrost's Model Context Protocol (MCP) integration - how external tool discovery, execution, and integration work internally."
+icon: "toolbox"
+---
+
+## MCP Architecture Overview
+
+### **What is MCP in Bifrost?**
+
+The Model Context Protocol (MCP) system in Bifrost enables AI models to seamlessly discover and execute external tools, transforming static chat models into dynamic, action-capable agents. This architecture bridges the gap between AI reasoning and real-world tool execution.
+
+**Core MCP Principles:**
+
+- **Dynamic Discovery** - Tools are discovered at runtime, not hardcoded
+- **Client-Side Execution** - Bifrost controls all tool execution for security
+- **Multi-Protocol Support** - InProcess, STDIO, HTTP, and SSE connection types
+- **Request-Level Filtering** - Granular control over tool availability
+- **Async Execution** - Non-blocking tool invocation and response handling
+
+### **MCP System Components**
+
+```mermaid
+graph TB
+    subgraph "MCP Management Layer"
+        MCPMgr[MCP Manager<br/>Central Controller]
+        ClientRegistry[Client Registry<br/>Connection Management]
+        ToolDiscovery[Tool Discovery<br/>Runtime Registration]
+    end
+
+    subgraph "MCP Execution Layer"
+        ToolFilter[Tool Filter<br/>Access Control]
+        ToolExecutor[Tool Executor<br/>Invocation Engine]
+        ResultProcessor[Result Processor<br/>Response Handling]
+    end
+
+    subgraph "Connection Types"
+        STDIOConn[STDIO Connections<br/>Command-line Tools]
+        HTTPConn[HTTP Connections<br/>Web Services]
+        SSEConn[SSE Connections<br/>Real-time Streams]
+    end
+
+    subgraph "External MCP Servers"
+        FileSystem[Filesystem Tools<br/>File Operations]
+        WebSearch[Web Search<br/>Information Retrieval]
+        Database[Database Tools<br/>Data Access]
+        Custom[Custom Tools<br/>Business Logic]
+    end
+
+    MCPMgr --> ClientRegistry
+    ClientRegistry --> ToolDiscovery
+    ToolDiscovery --> ToolFilter
+    ToolFilter --> ToolExecutor
+    ToolExecutor --> ResultProcessor
+
+    ClientRegistry --> STDIOConn
+    ClientRegistry --> HTTPConn
+    ClientRegistry --> SSEConn
+
+    STDIOConn --> FileSystem
+    HTTPConn --> WebSearch
+    HTTPConn --> Database
+    STDIOConn --> Custom
+```
+
+---
+
+## MCP Connection Architecture
+
+### **Multi-Protocol Connection System**
+
+Bifrost supports four MCP connection types, each optimized for different tool deployment patterns:
+
+```mermaid
+graph TB
+    subgraph "InProcess Connections"
+        InProcess[In-Memory Tools<br/>Same Process]
+        InProcessEx[Examples:<br/>• Embedded tools<br/>• High-perf operations<br/>• Testing tools]
+    end
+
+    subgraph "STDIO Connections"
+        STDIO[Command Line Tools<br/>Local Execution]
+        STDIOEx[Examples:<br/>• Filesystem tools<br/>• Local scripts<br/>• CLI utilities]
+    end
+
+    subgraph "HTTP Connections"
+        HTTP[Web Service Tools<br/>Remote APIs]
+        HTTPEx[Examples:<br/>• Web search APIs<br/>• Database services<br/>• External integrations]
+    end
+
+    subgraph "SSE Connections"
+        SSE[Real-time Tools<br/>Streaming Data]
+        SSEEx[Examples:<br/>• Live data feeds<br/>• Real-time monitoring<br/>• Event streams]
+    end
+
+    subgraph "Connection Characteristics"
+        Latency[Latency:<br/>InProcess < STDIO < HTTP < SSE]
+        Security[Security:<br/>InProcess/Local > HTTP > SSE]
+        Scalability[Scalability:<br/>HTTP > SSE > STDIO > InProcess]
+        Complexity[Complexity:<br/>InProcess < STDIO < HTTP < SSE]
+    end
+
+    InProcess --> Latency
+    STDIO --> Latency
+    HTTP --> Security
+    SSE --> Scalability
+    HTTP --> Complexity
+```
+
+### **Connection Type Details**
+
+**InProcess Connections (In-Memory Tools):**
+
+- **Use Case:** Embedded tools, high-performance operations, testing
+- **Performance:** Lowest possible latency (~0.1ms) with no IPC overhead
+- **Security:** Highest security as tools run in the same process
+- **Limitations:** Go package only, cannot be configured via JSON
+
+**STDIO Connections (Local Tools):**
+
+- **Use Case:** Command-line tools, local scripts, filesystem operations
+- **Performance:** Low latency (~1-10ms) due to local execution
+- **Security:** High security with full local control
+- **Limitations:** Single-server deployment, resource sharing
+
+**HTTP Connections (Remote Services):**
+
+- **Use Case:** Web APIs, microservices, cloud functions
+- **Performance:** Network-dependent latency (~10-500ms)
+- **Security:** Configurable with authentication and encryption
+- **Advantages:** Scalable, multi-server deployment, service isolation
+
+**SSE Connections (Streaming Tools):**
+
+- **Use Case:** Real-time data feeds, live monitoring, event streams
+- **Performance:** Variable latency depending on stream frequency
+- **Security:** Similar to HTTP with streaming capabilities
+- **Benefits:** Real-time updates, persistent connections, event-driven
+
+> **MCP Configuration:** [MCP Setup Guide →](../../mcp/overview)
+
+---
+
+## Tool Discovery & Registration
+
+### **Dynamic Tool Discovery Process**
+
+The MCP system discovers tools at runtime rather than requiring static configuration, enabling flexible and adaptive tool availability:
+
+```mermaid
+sequenceDiagram
+    participant Bifrost
+    participant MCPManager
+    participant MCPServer
+    participant ToolRegistry
+    participant AIModel
+
+    Note over Bifrost: System Startup
+    Bifrost->>MCPManager: Initialize MCP System
+    MCPManager->>MCPServer: Establish Connection
+    MCPServer-->>MCPManager: Connection Ready
+
+    MCPManager->>MCPServer: List Available Tools
+    MCPServer-->>MCPManager: Tool Definitions
+    MCPManager->>ToolRegistry: Register Tools
+
+    Note over Bifrost: Runtime Request Processing
+    AIModel->>MCPManager: Request Available Tools
+    MCPManager->>ToolRegistry: Query Tools
+    ToolRegistry-->>MCPManager: Filtered Tool List
+    MCPManager-->>AIModel: Available Tools
+
+    AIModel->>MCPManager: Execute Tool Call
+    MCPManager->>MCPServer: Tool Invocation
+    MCPServer->>MCPServer: Execute Tool Logic
+    MCPServer-->>MCPManager: Tool Result
+    MCPManager-->>AIModel: Enhanced Response
+```
+
+### **Tool Registry Management**
+
+**Registration Process:**
+
+1. **Connection Establishment** - MCP client connects to configured servers
+2. **Capability Exchange** - Server announces available tools and schemas
+3. **Tool Validation** - Bifrost validates tool definitions and security
+4. **Registry Update** - Tools are registered in the internal tool registry
+5. **Availability Notification** - Tools become available for AI model use
+
+**Registry Features:**
+
+- **Dynamic Updates** - Tools can be added/removed during runtime
+- **Version Management** - Support for tool versioning and compatibility
+- **Access Control** - Request-level tool filtering and permissions
+- **Health Monitoring** - Continuous tool availability checking
+
+**Tool Metadata Structure:**
+
+- **Name & Description** - Human-readable tool identification
+- **Parameters Schema** - JSON schema for tool input validation
+- **Return Schema** - Expected response format definition
+- **Capabilities** - Tool feature flags and limitations
+- **Authentication** - Required credentials and permissions
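+
+The registration steps and metadata fields above can be sketched as a concurrency-safe map keyed by tool name. This is a minimal illustration under stated assumptions, not Bifrost's internal registry: `ToolMetadata`, `ToolRegistry`, and their methods are hypothetical, and validation is reduced to a name check plus a JSON well-formedness check on the parameter schema. The `sync.RWMutex` is what lets tools be added or removed at runtime while requests read the registry.
+
+```go
+package main
+
+import (
+    "encoding/json"
+    "fmt"
+    "sort"
+    "sync"
+)
+
+// ToolMetadata mirrors the metadata fields listed above (hypothetical type).
+type ToolMetadata struct {
+    Name         string
+    Description  string
+    ParamsSchema json.RawMessage // JSON schema for tool input
+    ReturnSchema json.RawMessage // expected response format
+}
+
+// ToolRegistry is a hypothetical registry that supports runtime updates.
+type ToolRegistry struct {
+    mu    sync.RWMutex
+    tools map[string]ToolMetadata
+}
+
+func NewToolRegistry() *ToolRegistry {
+    return &ToolRegistry{tools: make(map[string]ToolMetadata)}
+}
+
+// Register validates a definition, then adds it or replaces an existing one.
+func (r *ToolRegistry) Register(t ToolMetadata) error {
+    if t.Name == "" {
+        return fmt.Errorf("tool name is required")
+    }
+    if len(t.ParamsSchema) > 0 && !json.Valid(t.ParamsSchema) {
+        return fmt.Errorf("tool %q: parameters schema is not valid JSON", t.Name)
+    }
+    r.mu.Lock()
+    defer r.mu.Unlock()
+    r.tools[t.Name] = t
+    return nil
+}
+
+// Unregister removes a tool, e.g. when its MCP server disconnects.
+func (r *ToolRegistry) Unregister(name string) {
+    r.mu.Lock()
+    defer r.mu.Unlock()
+    delete(r.tools, name)
+}
+
+// Names returns the currently registered tool names, sorted.
+func (r *ToolRegistry) Names() []string {
+    r.mu.RLock()
+    defer r.mu.RUnlock()
+    names := make([]string, 0, len(r.tools))
+    for name := range r.tools {
+        names = append(names, name)
+    }
+    sort.Strings(names)
+    return names
+}
+
+func main() {
+    reg := NewToolRegistry()
+    _ = reg.Register(ToolMetadata{Name: "read_file", ParamsSchema: json.RawMessage(`{"type":"object"}`)})
+    _ = reg.Register(ToolMetadata{Name: "list_directory"})
+    reg.Unregister("read_file")
+    fmt.Println(reg.Names()) // [list_directory]
+}
+```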
+
+---
+
+## Tool Filtering & Access Control
+
+### **Multi-Level Filtering System**
+
+Bifrost provides granular control over tool availability through a sophisticated filtering system:
+
+```mermaid
+flowchart TD
+    Request[Incoming Request] --> GlobalFilter{Global MCP Filter}
+    GlobalFilter -->|Enabled| ClientFilter[MCP Client Filtering]
+    GlobalFilter -->|Disabled| NoMCP[No MCP Tools]
+
+    ClientFilter --> IncludeClients{Include Clients?}
+    IncludeClients -->|Yes| IncludeList[Include Specified<br/>MCP Clients]
+    IncludeClients -->|No| AllClients[All MCP Clients]
+
+    IncludeList --> ExcludeClients{Exclude Clients?}
+    AllClients --> ExcludeClients
+    ExcludeClients -->|Yes| RemoveClients[Remove Excluded<br/>MCP Clients]
+    ExcludeClients -->|No| ClientsFiltered[Filtered Clients]
+
+    RemoveClients --> ToolFilter[Tool-Level Filtering]
+    ClientsFiltered --> ToolFilter
+
+    ToolFilter --> IncludeTools{Include Tools?}
+    IncludeTools -->|Yes| IncludeSpecific[Include Specified<br/>Tools Only]
+    IncludeTools -->|No| AllTools[All Available Tools]
+
+    IncludeSpecific --> ExcludeTools{Exclude Tools?}
+    AllTools --> ExcludeTools
+    ExcludeTools -->|Yes| RemoveTools[Remove Excluded<br/>Tools]
+    ExcludeTools -->|No| FinalTools[Final Tool Set]
+
+    RemoveTools --> FinalTools
+    FinalTools --> AIModel[Available to AI Model]
+    NoMCP --> AIModel
+```
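+
+The flowchart applies client filters before tool filters: include lists narrow the set and exclude lists remove from it. A minimal sketch of that ordering, assuming an empty include list means "no restriction" and that tool names are unique across clients; `MCPFilter` and `filterTools` are hypothetical names, not Bifrost's API.
+
+```go
+package main
+
+import (
+    "fmt"
+    "sort"
+)
+
+// MCPFilter holds per-request filter settings (hypothetical type).
+// An empty include list means "no restriction" at that level.
+type MCPFilter struct {
+    Enabled        bool
+    IncludeClients []string
+    ExcludeClients []string
+    IncludeTools   []string
+    ExcludeTools   []string
+}
+
+func toSet(items []string) map[string]bool {
+    s := make(map[string]bool, len(items))
+    for _, it := range items {
+        s[it] = true
+    }
+    return s
+}
+
+// filterTools applies client filtering, then tool filtering, as in the flowchart.
+// available maps an MCP client name to the tool names it exposes.
+func filterTools(available map[string][]string, f MCPFilter) []string {
+    if !f.Enabled {
+        return nil // global MCP filter disabled: no MCP tools
+    }
+    inclC, exclC := toSet(f.IncludeClients), toSet(f.ExcludeClients)
+    inclT, exclT := toSet(f.IncludeTools), toSet(f.ExcludeTools)
+
+    var result []string
+    for client, tools := range available {
+        if (len(inclC) > 0 && !inclC[client]) || exclC[client] {
+            continue
+        }
+        for _, tool := range tools {
+            if (len(inclT) > 0 && !inclT[tool]) || exclT[tool] {
+                continue
+            }
+            result = append(result, tool)
+        }
+    }
+    sort.Strings(result) // map iteration order is random; sort for stable output
+    return result
+}
+
+func main() {
+    available := map[string][]string{
+        "filesystem": {"read_file", "write_file"},
+        "websearch":  {"search"},
+    }
+    f := MCPFilter{Enabled: true, ExcludeClients: []string{"websearch"}, ExcludeTools: []string{"write_file"}}
+    fmt.Println(filterTools(available, f)) // [read_file]
+}
+```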
+
+### **Filtering Configuration Levels**
+
+**Request-Level Filtering:**
+
+```bash
+# Include only specific MCP clients
+curl -X POST http://localhost:8080/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -H "x-bf-mcp-include-clients: filesystem,websearch" \
+  -d '{"model": "gpt-4o-mini", "messages": [...]}'
+
+# Include only specific tools
+curl -X POST http://localhost:8080/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -H "x-bf-mcp-include-tools: filesystem-read_file,websearch-search" \
+  -d '{"model": "gpt-4o-mini", "messages": [...]}'
+```
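+
+The same request can be built from Go with the standard `net/http` package. In this sketch only the `x-bf-mcp-include-*` header names and the endpoint path come from the examples above; `buildChatRequest` is a hypothetical helper, and the body is a placeholder you would replace with a real chat payload.
+
+```go
+package main
+
+import (
+    "bytes"
+    "fmt"
+    "net/http"
+    "strings"
+)
+
+// buildChatRequest creates a chat completion request that restricts MCP tools
+// via the x-bf-mcp-* headers. Empty lists leave the corresponding header unset.
+func buildChatRequest(baseURL string, body []byte, clients, tools []string) (*http.Request, error) {
+    req, err := http.NewRequest(http.MethodPost, baseURL+"/v1/chat/completions", bytes.NewReader(body))
+    if err != nil {
+        return nil, err
+    }
+    req.Header.Set("Content-Type", "application/json")
+    if len(clients) > 0 {
+        req.Header.Set("x-bf-mcp-include-clients", strings.Join(clients, ","))
+    }
+    if len(tools) > 0 {
+        req.Header.Set("x-bf-mcp-include-tools", strings.Join(tools, ","))
+    }
+    return req, nil
+}
+
+func main() {
+    body := []byte(`{"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "hi"}]}`)
+    req, err := buildChatRequest("http://localhost:8080", body, []string{"filesystem", "websearch"}, nil)
+    if err != nil {
+        panic(err)
+    }
+    fmt.Println(req.Method, req.URL.Path, req.Header.Get("x-bf-mcp-include-clients"))
+    // POST /v1/chat/completions filesystem,websearch
+}
+```
+
+Send it with `http.DefaultClient.Do(req)` against a running gateway. Go canonicalizes header keys (`X-Bf-Mcp-Include-Clients`), which is harmless because HTTP header names are case-insensitive.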
+
+**Configuration-Level Filtering:**
+
+- **Client Selection** - Choose which MCP servers to connect to
+- **Tool Blacklisting** - Permanently disable dangerous or unwanted tools
+- **Permission Mapping** - Map user roles to available tool sets
+- **Environment-Based** - Different tool sets for development vs production
+
+**Security Benefits:**
+
+- **Principle of Least Privilege** - Only necessary tools are exposed
+- **Dynamic Access Control** - Per-request tool availability
+- **Audit Trail** - Track which tools are used by which requests
+- **Risk Mitigation** - Prevent access to dangerous operations
+
+> **📖 Tool Filtering:** [MCP Tool Control →](../../mcp/filtering)
+
+---
+
+## Tool Execution Engine
+
+### **Async Tool Execution Architecture**
+
+The MCP execution engine handles tool invocation asynchronously to maintain system responsiveness and enable complex multi-tool workflows:
+
+```mermaid
+sequenceDiagram
+    participant AIModel
+    participant ExecutionEngine
+    participant ToolInvoker
+    participant MCPServer
+    participant ResultProcessor
+
+    AIModel->>ExecutionEngine: Tool Call Request
+    ExecutionEngine->>ExecutionEngine: Validate Tool Call
+    ExecutionEngine->>ToolInvoker: Queue Tool Execution
+
+    Note over ToolInvoker: Async Tool Execution
+    ToolInvoker->>MCPServer: Invoke Tool
+    MCPServer->>MCPServer: Execute Tool Logic
+    MCPServer-->>ToolInvoker: Raw Tool Result
+
+    ToolInvoker->>ResultProcessor: Process Result
+    ResultProcessor->>ResultProcessor: Format & Validate
+    ResultProcessor-->>ExecutionEngine: Processed Result
+
+    ExecutionEngine-->>AIModel: Tool Execution Complete
+
+    Note over AIModel: Multi-turn Conversation
+    AIModel->>ExecutionEngine: Continue with Tool Results
+    ExecutionEngine->>ExecutionEngine: Merge Results into Context
+    ExecutionEngine-->>AIModel: Enhanced Response
+```
+
+### **Execution Flow Characteristics**
+
+**Validation Phase:**
+
+- **Parameter Validation** - Ensure tool arguments match expected schema
+- **Permission Checking** - Verify tool access permissions for the request
+- **Rate Limiting** - Apply per-tool and per-user rate limits
+- **Security Scanning** - Check for potentially dangerous operations
+
+**Execution Phase:**
+
+- **Timeout Management** - Bounded execution time to prevent hanging
+- **Error Handling** - Graceful handling of tool failures and timeouts
+- **Result Streaming** - Support for tools that return streaming responses
+- **Resource Monitoring** - Track tool resource usage and performance
+
+**Response Phase:**
+
+- **Result Formatting** - Convert tool outputs to consistent format
+- **Error Enrichment** - Add context and suggestions for tool failures
+- **Multi-Result Aggregation** - Combine multiple tool outputs coherently
+- **Context Integration** - Merge tool results into conversation context
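+
+The validation and execution phases can be illustrated with a local stand-in for a tool. This sketch assumes a tool is a plain Go function and reduces its schema to a list of required argument names; `Tool` and `executeTool` are hypothetical, and real MCP tools are invoked over the client connection and validated against a full JSON schema.
+
+```go
+package main
+
+import (
+    "context"
+    "encoding/json"
+    "errors"
+    "fmt"
+    "time"
+)
+
+// Tool is a hypothetical local stand-in for an MCP tool.
+type Tool struct {
+    Name     string
+    Required []string // required argument names (simplified schema)
+    Run      func(ctx context.Context, args map[string]any) (string, error)
+}
+
+// executeTool validates arguments, then runs the tool under a bounded timeout.
+func executeTool(ctx context.Context, t Tool, rawArgs string, timeout time.Duration) (string, error) {
+    // Validation phase: arguments must be a JSON object with all required keys.
+    var args map[string]any
+    if err := json.Unmarshal([]byte(rawArgs), &args); err != nil {
+        return "", fmt.Errorf("%s: invalid arguments: %w", t.Name, err)
+    }
+    for _, key := range t.Required {
+        if _, ok := args[key]; !ok {
+            return "", fmt.Errorf("%s: missing required argument %q", t.Name, key)
+        }
+    }
+
+    // Execution phase: the context is cancelled once the timeout elapses.
+    ctx, cancel := context.WithTimeout(ctx, timeout)
+    defer cancel()
+    out, err := t.Run(ctx, args)
+    if errors.Is(err, context.DeadlineExceeded) {
+        // Response phase: enrich the error so the model can react to it.
+        return "", fmt.Errorf("%s: timed out after %s", t.Name, timeout)
+    }
+    return out, err
+}
+
+func main() {
+    slow := Tool{Name: "slow", Run: func(ctx context.Context, _ map[string]any) (string, error) {
+        select {
+        case <-time.After(time.Second):
+            return "done", nil
+        case <-ctx.Done():
+            return "", ctx.Err()
+        }
+    }}
+    _, err := executeTool(context.Background(), slow, `{}`, 50*time.Millisecond)
+    fmt.Println(err) // slow: timed out after 50ms
+}
+```
+
+The timeout only interrupts tools that watch `ctx`; in this sketch a tool that ignores cancellation simply runs to completion.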
+
+### **Multi-Turn Conversation Support**
+
+The MCP system enables sophisticated multi-turn conversations where AI models can:
+
+1. **Initial Tool Discovery** - Request available tools for a given context
+2. **Tool Execution** - Execute one or more tools based on user request
+3. **Result Analysis** - Analyze tool outputs and determine next steps
+4. **Follow-up Actions** - Execute additional tools based on previous results
+5. **Response Synthesis** - Combine tool results into coherent user response
+
+**Example Multi-Turn Flow:**
+
+```
+User: "Find recent news about AI and save interesting articles"
+AI: → Execute web_search("AI news recent")
+AI: → Analyze search results
+AI: → Execute save_article() for each interesting result
+AI: → Respond with summary of saved articles
+```
+
+### **Complete User-Controlled Tool Execution Flow**
+
+The following diagram shows the end-to-end user experience with MCP tool execution, highlighting the critical user control points and decision-making process:
+
+```mermaid
+flowchart TD
+    A["👤 User Message<br/>\"List files in current directory\""] --> B["🤖 Bifrost Core"]
+
+    B --> C["🔧 MCP Manager<br/>Auto-discovers and adds<br/>available tools to request"]
+
+    C --> D["🌐 LLM Provider<br/>(OpenAI, Anthropic, etc.)"]
+
+    D --> E{"🔍 Response contains<br/>tool_calls?"}
+
+    E -->|No| F["✅ Final Response<br/>Display to user"]
+
+    E -->|Yes| G["📝 Add assistant message<br/>with tool_calls to history"]
+
+    G --> H["🛡️ YOUR EXECUTION LOGIC<br/>(Security, Approval, Logging)"]
+
+    H --> I{"🤔 User Decision Point<br/>Execute this tool?"}
+
+    I -->|Deny| J["❌ Create denial result<br/>Add to conversation history"]
+
+    I -->|Approve| K["⚙️ client.ExecuteMCPTool()<br/>Bifrost executes via MCP"]
+
+    K --> L["📊 Tool Result<br/>Add to conversation history"]
+
+    J --> M["🔄 Continue conversation loop<br/>Send updated history back to LLM"]
+    L --> M
+
+    M --> D
+
+    style A fill:#e1f5fe
+    style F fill:#e8f5e8
+    style H fill:#fff3e0
+    style I fill:#fce4ec
+    style K fill:#f3e5f5
+```
+
+**Key Flow Characteristics:**
+
+**User Control Points:**
+
+- **Security Layer** - Your application controls all tool execution decisions
+- **Approval Gate** - Users can approve or deny each tool execution
+- **Transparency** - Full visibility into what tools will be executed and why
+- **Conversation Continuity** - Tool results seamlessly integrate into conversation flow
+
+**Security Benefits:**
+
+- **No Automatic Execution by Default** - Tools execute only after your application approves them, unless you opt specific tools into Agent Mode via `tools_to_auto_execute`
+- **Audit Trail** - Complete logging of all tool execution decisions
+- **Contextual Security** - Approval decisions can consider full conversation context
+- **Graceful Denials** - Denied tools result in informative responses, not errors
+
+**Implementation Patterns:**
+
+```go
+// Example tool execution control in your application.
+// UserContext, requiresApproval, promptUserForApproval, createDenialResponse,
+// handleToolError, and addToolResultToHistory are application-defined helpers.
+func handleToolExecution(
+    ctx context.Context,
+    client *bifrost.Bifrost,
+    toolCall schemas.ChatToolCall,
+    userContext UserContext,
+) error {
+    // YOUR SECURITY AND APPROVAL LOGIC HERE
+    if !userContext.HasPermission(toolCall.Function.Name) {
+        return createDenialResponse("Tool not permitted for user role")
+    }
+
+    if requiresApproval(toolCall) {
+        approved := promptUserForApproval(toolCall)
+        if !approved {
+            return createDenialResponse("User denied tool execution")
+        }
+    }
+
+    // Execute the tool via Bifrost; only reached after the checks above pass
+    result, err := client.ExecuteMCPTool(ctx, toolCall)
+    if err != nil {
+        return handleToolError(err)
+    }
+
+    return addToolResultToHistory(result)
+}
+```
+
+This flow lets AI models discover and request tools while every actual execution stays under your application's control, combining model-driven tool use with human oversight.
+
+---
+
+## Agent Mode Architecture
+
+Agent Mode transforms Bifrost into an autonomous agent runtime by automatically executing pre-approved tools. This section details the internal architecture of the agent execution loop.
+
+### **Agent Execution Loop**
+
+The agent mode operates as an iterative loop that continues until one of the termination conditions is met:
+
+```mermaid
+flowchart TD
+    subgraph "Agent Mode Entry"
+        A["📥 Incoming Chat Request"] --> B{"🔍 Check MCP Config<br/>Any tools_to_auto_execute?"}
+        B -->|No| C["📤 Standard Flow<br/>Return tool_calls for manual execution"]
+        B -->|Yes| D["🤖 Enter Agent Loop"]
+    end
+
+    subgraph "Agent Execution Loop"
+        D --> E["🌐 Send to LLM Provider<br/>With available tools"]
+        E --> F{"🔧 Response has<br/>tool_calls?"}
+        F -->|No| G["✅ Return Final Response<br/>No more tools needed"]
+        F -->|Yes| H["📋 Classify Tool Calls"]
+
+        H --> I{"🔐 Separate by<br/>auto-execute status"}
+        I --> J["⚡ Auto-Executable Tools"]
+        I --> K["🛡️ Non-Auto-Executable Tools"]
+
+        J --> L["🔄 Execute in Parallel<br/>Via ToolsManager"]
+        L --> M["📊 Collect Results"]
+
+        K --> N{"Any non-auto<br/>tools found?"}
+        N -->|Yes| O["🛑 Exit Loop Early<br/>Return mixed response"]
+        N -->|No| P{"⏱️ Max depth<br/>reached?"}
+
+        M --> P
+        P -->|Yes| Q["⚠️ Return Current State<br/>May have pending tools"]
+        P -->|No| R["📝 Add results to history"]
+        R --> E
+    end
+
+    subgraph "Response Handling"
+        O --> S["📦 Create Mixed Response<br/>• Content: executed results JSON<br/>• tool_calls: pending tools<br/>• finish_reason: stop"]
+        G --> T["📦 Standard Response<br/>Final answer from LLM"]
+        Q --> U["📦 Depth Limit Response<br/>Current state with any pending"]
+    end
+
+    style D fill:#e3f2fd
+    style L fill:#e8f5e9
+    style O fill:#fff3e0
+    style S fill:#fce4ec
+```
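+
+The loop can be sketched in Go against a scripted stand-in for the LLM. Everything here is hypothetical (`ToolCall`, `llmFunc`, `runAgent`, the string-based history), and the sketch only aims to make the three exits in the diagram concrete: no tool calls, a non-auto-executable tool, or the depth limit. Auto-executable calls from one turn run concurrently on goroutines, each writing to its own slice index so no lock is needed.
+
+```go
+package main
+
+import (
+    "fmt"
+    "sync"
+)
+
+// ToolCall is a simplified tool call returned by the model (hypothetical type).
+type ToolCall struct {
+    ID, Name string
+}
+
+// llmFunc stands in for a provider call: given the history so far, it returns
+// the model's text and any tool calls it requested.
+type llmFunc func(history []string) (text string, calls []ToolCall)
+
+// runAgent loops until the model stops calling tools, a non-auto-executable
+// tool appears, or maxDepth tool rounds have run. It returns the final text
+// and any calls still awaiting approval.
+func runAgent(llm llmFunc, autoExec map[string]bool, exec func(ToolCall) string, maxDepth int) (string, []ToolCall) {
+    var history []string
+    for depth := 0; ; depth++ {
+        text, calls := llm(history)
+        if len(calls) == 0 {
+            return text, nil // no more tools needed: final answer
+        }
+
+        // Classify calls by auto-execute status.
+        var auto, pending []ToolCall
+        for _, c := range calls {
+            if autoExec[c.Name] {
+                auto = append(auto, c)
+            } else {
+                pending = append(pending, c)
+            }
+        }
+
+        // Run auto-executable calls in parallel; each goroutine writes its own index.
+        results := make([]string, len(auto))
+        var wg sync.WaitGroup
+        for i, c := range auto {
+            wg.Add(1)
+            go func(i int, c ToolCall) {
+                defer wg.Done()
+                results[i] = exec(c)
+            }(i, c)
+        }
+        wg.Wait()
+
+        if len(pending) > 0 {
+            // Exit early: executed results go in the text, pending calls go to the caller.
+            return fmt.Sprint(results), pending
+        }
+        if depth+1 >= maxDepth {
+            return text, nil // depth limit reached: return the current state
+        }
+        history = append(history, results...)
+    }
+}
+
+func main() {
+    turn := 0
+    llm := func(history []string) (string, []ToolCall) {
+        turn++
+        switch turn {
+        case 1:
+            return "", []ToolCall{{ID: "1", Name: "list_dir"}}
+        case 2:
+            return "", []ToolCall{{ID: "2", Name: "read_file"}, {ID: "3", Name: "write_file"}}
+        default:
+            return "done", nil
+        }
+    }
+    auto := map[string]bool{"list_dir": true, "read_file": true}
+    exec := func(c ToolCall) string { return c.Name + ":ok" }
+
+    text, pending := runAgent(llm, auto, exec, 10)
+    fmt.Println(text, len(pending)) // [read_file:ok] 1
+}
+```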
+
+### **Tool Classification System**
+
+When the LLM returns tool calls, Bifrost classifies each tool based on the client configuration:
+
+```mermaid
+flowchart LR
+    subgraph "Tool Call Classification"
+        TC["🔧 Tool Call<br/>from LLM Response"] --> CHECK{"Tool in<br/>tools_to_execute?"}
+        CHECK -->|No| SKIP["❌ Skip<br/>Not allowed"]
+        CHECK -->|Yes| AUTO{"Tool in<br/>tools_to_auto_execute?"}
+        AUTO -->|Yes| EXEC["⚡ Auto-Execute<br/>Run immediately"]
+        AUTO -->|No| MANUAL["🛡️ Manual<br/>Return to caller"]
+    end
+
+    subgraph "Configuration Example"
+        CONFIG["MCPClientConfig"]
+        CONFIG --> TE["tools_to_execute: [*]<br/>All tools available"]
+        CONFIG --> TAE["tools_to_auto_execute:<br/>[read_file, list_dir]"]
+    end
+
+    style EXEC fill:#c8e6c9
+    style MANUAL fill:#fff9c4
+    style SKIP fill:#ffcdd2
+```
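+
+The two-step check translates into a small function. In this sketch `ClientConfig`, `Classification`, and `classify` are hypothetical; the `"*"` wildcard is honoured only for `tools_to_execute`, as in the configuration example above, while `tools_to_auto_execute` is matched by exact name.
+
+```go
+package main
+
+import "fmt"
+
+// Classification is the outcome for one tool call (hypothetical type).
+type Classification int
+
+const (
+    Skip        Classification = iota // not in tools_to_execute: not allowed
+    AutoExecute                       // in tools_to_auto_execute: run immediately
+    Manual                            // allowed, but returned to the caller for approval
+)
+
+func (c Classification) String() string {
+    return [...]string{"skip", "auto-execute", "manual"}[c]
+}
+
+// ClientConfig keeps only the two lists used for classification (hypothetical type).
+type ClientConfig struct {
+    ToolsToExecute     []string // "*" allows every tool
+    ToolsToAutoExecute []string // matched by exact name in this sketch
+}
+
+// listed reports whether name is in list; "*" matches anything when wildcard is true.
+func listed(list []string, name string, wildcard bool) bool {
+    for _, v := range list {
+        if v == name || (wildcard && v == "*") {
+            return true
+        }
+    }
+    return false
+}
+
+// classify follows the flowchart: allow-list check first, then the auto-execute list.
+func classify(cfg ClientConfig, tool string) Classification {
+    if !listed(cfg.ToolsToExecute, tool, true) {
+        return Skip
+    }
+    if listed(cfg.ToolsToAutoExecute, tool, false) {
+        return AutoExecute
+    }
+    return Manual
+}
+
+func main() {
+    cfg := ClientConfig{
+        ToolsToExecute:     []string{"*"},
+        ToolsToAutoExecute: []string{"read_file", "list_dir"},
+    }
+    for _, tool := range []string{"read_file", "write_file"} {
+        fmt.Println(tool, classify(cfg, tool))
+    }
+    // read_file auto-execute
+    // write_file manual
+}
+```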
+
+### **Mixed Tool Response Format**
+
+When a response contains both auto-executable and non-auto-executable tools, the agent creates a special response format:
+
+<AccordionGroup>
+  <Accordion title="Chat API Response Format" icon="message" defaultOpen>
+
+```json
+{
+  "id": "chatcmpl-abc123",
+  "choices": [{
+    "index": 0,
+    "finish_reason": "stop",
+    "message": {
+      "role": "assistant",
+      "content": "The Output from allowed tools calls is - {\"filesystem_read_file\":\"file contents here\",\"filesystem_list_directory\":\"[\\\"file1.txt\\\",\\\"file2.txt\\\"]\"}\n\nNow I shall call these tools next...",
+      "tool_calls": [
+        {
+          "id": "call_write_123",
+          "type": "function",
+          "function": {
+            "name": "filesystem_write_file",
+            "arguments": "{\"path\":\"output.txt\",\"content\":\"...\"}"
+          }
+        }
+      ]
+    }
+  }]
+}
+```
+
+<Note>
+The `content` field contains JSON-formatted results from auto-executed tools. The `tool_calls` array contains only non-auto-executable tools awaiting approval. Setting `finish_reason` to `"stop"` ensures the agent loop exits.
+</Note>
+
+  </Accordion>
+
+  <Accordion title="Responses API Format" icon="code">
+
+```json
+{
+  "id": "resp-abc123",
+  "output": [
+    {
+      "type": "message",
+      "role": "assistant",
+      "content": [{
+        "type": "text",
+        "text": "The Output from allowed tools calls is - {...}\n\nNow I shall call these tools next..."
+      }]
+    },
+    {
+      "type": "function_call",
+      "role": "assistant",
+      "call_id": "call_write_123",
+      "name": "filesystem_write_file",
+      "arguments": "{\"path\":\"output.txt\",\"content\":\"...\"}"
+    }
+  ]
+}
+```
+
+  </Accordion>
+</AccordionGroup>
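+
+Assembling the Chat API variant is mostly string and JSON work. This sketch reproduces only the leading `content` format from the example above (prefix plus a JSON object of executed results); the text Bifrost appends after the JSON is not reproduced, and `PendingCall`, `MixedResponse`, and `buildMixedResponse` are hypothetical simplifications of the real response schema.
+
+```go
+package main
+
+import (
+    "encoding/json"
+    "fmt"
+)
+
+// PendingCall is a tool call left for the caller to approve (hypothetical type).
+type PendingCall struct {
+    ID, Name, Arguments string
+}
+
+// MixedResponse keeps only the fields discussed above (hypothetical type).
+type MixedResponse struct {
+    Content      string
+    ToolCalls    []PendingCall
+    FinishReason string
+}
+
+// buildMixedResponse embeds the executed tools' results as JSON in the content,
+// carries the non-auto-executable calls in ToolCalls, and sets finish_reason to
+// "stop" so the agent loop exits.
+func buildMixedResponse(executed map[string]string, pending []PendingCall) (MixedResponse, error) {
+    encoded, err := json.Marshal(executed) // map keys are emitted in sorted order
+    if err != nil {
+        return MixedResponse{}, err
+    }
+    return MixedResponse{
+        Content:      "The Output from allowed tools calls is - " + string(encoded),
+        ToolCalls:    pending,
+        FinishReason: "stop",
+    }, nil
+}
+
+func main() {
+    resp, err := buildMixedResponse(
+        map[string]string{"filesystem_read_file": "file contents here"},
+        []PendingCall{{ID: "call_write_123", Name: "filesystem_write_file", Arguments: `{"path":"output.txt"}`}},
+    )
+    if err != nil {
+        panic(err)
+    }
+    fmt.Println(resp.FinishReason, len(resp.ToolCalls)) // stop 1
+    fmt.Println(resp.Content)
+}
+```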
+
+### **Agent Depth Control**
+
+The `max_agent_depth` setting prevents infinite loops and controls resource usage:
+
+```mermaid
+graph LR
+    subgraph "Depth Tracking"
+        D0["Depth 0<br/>Initial Request"] --> D1["Depth 1<br/>First tool execution"]
+        D1 --> D2["Depth 2<br/>Second iteration"]
+        D2 --> D3["Depth 3<br/>..."]
+        D3 --> DN["Depth N<br/>Max reached"]
+    end
+
+    DN --> EXIT["🛑 Force Exit<br/>Return current state"]
+
+    subgraph "Configuration"
+        CFG["MCPToolManagerConfig"]
+        CFG --> MAX["max_agent_depth: 10<br/>(default)"]
+        CFG --> TIMEOUT["tool_execution_timeout:<br/>30s per tool"]
+    end
+```
+
+<Warning>
+When max depth is reached, the response may contain pending tool calls that weren't executed. Your application should handle this gracefully.
+</Warning>
+
+---
+
+## Code Mode Architecture
+
+Code Mode lets AI models write and execute Starlark, a dialect of Python, to orchestrate multiple MCP tools in a single request. This adds a programmable layer on top of individual tool calls for complex multi-tool workflows.
+
+### **Code Mode System Overview**
+
+```mermaid
+graph TB
+    subgraph "Code Mode Components"
+        VM["🖥️ Starlark Interpreter<br/>Python-like Runtime"]
+        VFS["📁 Virtual File System<br/>Tool Definitions as .pyi"]
+        EXEC["⚙️ Code Executor<br/>Sandboxed Execution"]
+    end
+
+    subgraph "Meta Tools"
+        LIST["listToolFiles()<br/>Discover available servers"]
+        READ["readToolFile(fileName)<br/>Get tool signatures"]
+        DOCS["getToolDocs(server, tool)<br/>Get detailed docs"]
+        CODE["executeToolCode(code)<br/>Run Python code"]
+    end
+
+    subgraph "MCP Integration"
+        TOOLS["🔧 Connected MCP Tools"]
+        RESULTS["📊 Tool Results"]
+    end
+
+    LLM["🤖 LLM"] --> LIST
+    LIST --> VFS
+    VFS --> LLM
+    LLM --> READ
+    READ --> VFS
+    VFS --> LLM
+    LLM --> DOCS
+    DOCS --> VFS
+    VFS --> LLM
+    LLM --> CODE
+    CODE --> VM
+    VM --> EXEC
+    EXEC --> TOOLS
+    TOOLS --> RESULTS
+    RESULTS --> LLM
+
+    style VM fill:#e8eaf6
+    style VFS fill:#e3f2fd
+    style CODE fill:#e8f5e9
+```
+
+### **Virtual File System (VFS)**
+
+Code Mode generates Python stub files (`.pyi`) for all connected MCP tools, providing compact function signatures:
+
+<Tabs>
+  <Tab title="Server-Level Binding">
+
+When `code_mode_binding_level: "server"` (default), tools are grouped by MCP client:
+
+```
+servers/
+├── filesystem.pyi      → All filesystem tools
+├── web_search.pyi      → All web search tools
+└── database.pyi        → All database tools
+```
+
+**Generated Stub Example:**
+```python
+# servers/filesystem.pyi
+# Usage: filesystem.tool_name(param=value)
+# For detailed docs: use getToolDocs(server="filesystem", tool="tool_name")
+
+def read_file(path: str) -> dict:  # Read contents of a file
+def write_file(path: str, content: str) -> dict:  # Write content to a file
+def list_directory(path: str) -> dict:  # List directory contents
+```
+
+**Usage in Code:**
+```python
+files = filesystem.list_directory(path=".")
+content = filesystem.read_file(path=files["entries"][0])
+result = content
+```
+
+  </Tab>
+  <Tab title="Tool-Level Binding">
+
+When `code_mode_binding_level: "tool"`, each tool gets its own file:
+
+```
+servers/
+├── filesystem/
+│   ├── read_file.pyi
+│   ├── write_file.pyi
+│   └── list_directory.pyi
+├── web_search/
+│   └── search.pyi
+└── database/
+    └── query.pyi
+```
+
+**Generated Stub Example:**
+```python
+# servers/filesystem/read_file.pyi
+# Usage: filesystem.read_file(param=value)
+
+def read_file(path: str) -> dict:  # Read contents of a file
+```
+
+**Usage in Code:**
+```python
+content = filesystem.read_file(path="config.json")
+result = content
+```
+
+  </Tab>
+</Tabs>
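+
+Generating server-level stubs amounts to rendering each tool's signature as text. This sketch mirrors the comment header and one-line `def` format of the `filesystem.pyi` example above; `ToolDef`, `renderServerStub`, and the JSON-schema-to-Python type mapping are assumptions for illustration, not Bifrost's generator.
+
+```go
+package main
+
+import (
+    "fmt"
+    "strings"
+)
+
+// ToolDef is a simplified tool definition (hypothetical type). Params holds
+// {name, JSON-schema type} pairs in declaration order.
+type ToolDef struct {
+    Name        string
+    Description string
+    Params      [][2]string
+}
+
+// pyTypes maps JSON schema types to Python annotations (assumed mapping).
+var pyTypes = map[string]string{
+    "string": "str", "integer": "int", "number": "float",
+    "boolean": "bool", "array": "list", "object": "dict",
+}
+
+// renderServerStub renders one server-level .pyi file in the format shown above.
+func renderServerStub(server string, tools []ToolDef) string {
+    var b strings.Builder
+    fmt.Fprintf(&b, "# servers/%s.pyi\n", server)
+    fmt.Fprintf(&b, "# Usage: %s.tool_name(param=value)\n", server)
+    fmt.Fprintf(&b, "# For detailed docs: use getToolDocs(server=%q, tool=\"tool_name\")\n\n", server)
+    for _, t := range tools {
+        params := make([]string, len(t.Params))
+        for i, p := range t.Params {
+            typ, ok := pyTypes[p[1]]
+            if !ok {
+                typ = "object" // unknown schema type: fall back to a generic annotation
+            }
+            params[i] = p[0] + ": " + typ
+        }
+        fmt.Fprintf(&b, "def %s(%s) -> dict:  # %s\n", t.Name, strings.Join(params, ", "), t.Description)
+    }
+    return b.String()
+}
+
+func main() {
+    fmt.Print(renderServerStub("filesystem", []ToolDef{
+        {Name: "read_file", Description: "Read contents of a file", Params: [][2]string{{"path", "string"}}},
+        {Name: "write_file", Description: "Write content to a file", Params: [][2]string{{"path", "string"}, {"content", "string"}}},
+    }))
+}
+```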
+
+### **Code Execution Flow**
+
+```mermaid
+sequenceDiagram
+    participant LLM as 🤖 LLM
+    participant CM as 📝 Code Mode Handler
+    participant VM as 🖥️ Starlark Interpreter
+    participant TM as 🔧 Tools Manager
+    participant MCP as 🌐 MCP Servers
+
+    LLM->>CM: executeToolCode({ code: "..." })
+    CM->>VM: Initialize sandbox
+    CM->>VM: Inject tool bindings
+    CM->>VM: Execute Python code
+
+    loop For each tool call in code
+        VM->>TM: server.tool(param=value)
+        TM->>MCP: Execute tool
+        MCP-->>TM: Tool result
+        TM-->>VM: Return result
+    end
+
+    VM-->>CM: Execution result
+    CM-->>LLM: { result, logs }
+```
+
+### **Starlark Sandbox**
+
+The code execution environment is carefully sandboxed using Starlark, a Python-like language designed for configuration and embedded scripting:
+
+<AccordionGroup>
+  <Accordion title="Available Features" icon="check" defaultOpen>
+
+  - ✅ **Python-like syntax** - Familiar Python syntax and semantics
+  - ✅ **Synchronous calls** - No async/await needed, direct function calls
+  - ✅ **List comprehensions** - `[x for x in items if condition]`
+  - ✅ **print()** - Output captured and returned in logs
+  - ✅ **Dict/List operations** - Standard Python data structures
+  - ✅ **Tool bindings** - All connected MCP tools as globals
+  </Accordion>
+
+  <Accordion title="Restricted Features" icon="ban">
+
+  - ❌ **Imports** - No `import` statements (tools are pre-bound)
+  - ❌ **Classes** - Use dicts and functions instead
+  - ❌ **File I/O** - No direct filesystem access (use MCP tools)
+  - ❌ **Network** - No direct network access (use MCP tools)
+  - ❌ **Randomness/Time** - Deterministic execution only
+
+  </Accordion>
+</AccordionGroup>
+
+### **Code Mode Security Model**
+
+```mermaid
+graph TB
+    subgraph "Security Layers"
+        L1["🔒 Code Validation<br/>Syntax checking before execution"]
+        L2["🛡️ Sandboxed Runtime<br/>No external module access"]
+        L3["⏱️ Execution Timeout<br/>Bounded runtime"]
+        L4["🔐 Tool ACL<br/>Only allowed tools accessible"]
+    end
+
+    subgraph "Execution Boundaries"
+        B1["No filesystem access<br/>(except via MCP tools)"]
+        B2["No network access<br/>(except via MCP tools)"]
+        B3["No process spawning"]
+        B4["Memory isolation enforced"]
+    end
+
+    L1 --> L2 --> L3 --> L4
+    L4 --> B1
+    L4 --> B2
+    L4 --> B3
+    L4 --> B4
+```
+
+### **Code Mode Configuration**
+
+<Tabs>
+  <Tab title="Gateway (config.json)">
+
+```json
+{
+  "mcp": {
+    "client_configs": [
+      {
+        "name": "filesystem",
+        "is_code_mode_client": true,
+        "connection_type": "stdio",
+        "stdio_config": {
+          "command": "npx",
+          "args": ["-y", "@anthropic/mcp-filesystem"]
+        },
+        "tools_to_execute": ["*"]
+      }
+    ],
+    "tool_manager_config": {
+      "code_mode_binding_level": "server",
+      "tool_execution_timeout": "30s"
+    }
+  }
+}
+```
+
+  </Tab>
+  <Tab title="Go SDK">
+
+```go
+mcpConfig := &schemas.MCPConfig{
+    ClientConfigs: []schemas.MCPClientConfig{
+        {
+            Name:             "filesystem",
+            IsCodeModeClient: true,
+            ConnectionType:   schemas.MCPConnectionTypeSTDIO,
+            StdioConfig: &schemas.MCPStdioConfig{
+                Command: "npx",
+                Args:    []string{"-y", "@anthropic/mcp-filesystem"},
+            },
+            ToolsToExecute: []string{"*"},
+        },
+    },
+    ToolManagerConfig: &schemas.MCPToolManagerConfig{
+        CodeModeBindingLevel: schemas.CodeModeBindingLevelServer,
+        ToolExecutionTimeout: 30 * time.Second,
+    },
+}
+```
+
+  </Tab>
+</Tabs>
+
+### **Code Mode vs Agent Mode**
+
+| Aspect | Agent Mode | Code Mode |
+|--------|------------|-----------|
+| **Execution Model** | LLM chooses tool calls turn by turn | LLM writes code orchestrating multiple tools |
+| **Iterations** | One LLM round-trip per tool step | Fewer LLM round-trips; code handles orchestration |
+| **Complexity** | Simple tool chains | Complex workflows with conditionals/loops |
+| **Latency** | Higher (more LLM calls) | Lower (fewer LLM calls plus code execution) |
+| **Control** | Per-tool approval possible | Code runs as a single unit once started |
+| **Best For** | Interactive agents | Batch operations, complex data processing |
+
+---
+
+## MCP Integration Patterns
+
+### **Common Integration Scenarios**
+
+**1. Filesystem Operations**
+
+- **Tools:** `list_files`, `read_file`, `write_file`, `create_directory`
+- **Use Cases:** Code analysis, document processing, file management
+- **Security:** Sandboxed file access, path validation, permission checks
+- **Performance:** Local execution for fast file operations
+
+**2. Web Search & Information Retrieval**
+
+- **Tools:** `web_search`, `fetch_url`, `extract_content`, `summarize`
+- **Use Cases:** Research assistance, fact-checking, content gathering
+- **Integration:** External search APIs, content parsing services
+- **Caching:** Response caching for repeated queries
+
+**3. Database Operations**
+
+- **Tools:** `query_database`, `insert_record`, `update_record`, `schema_info`
+- **Use Cases:** Data analysis, report generation, database administration
+- **Security:** Read-only access by default, query validation, injection prevention
+- **Performance:** Connection pooling, query optimization
+
+**4. API Integrations**
+
+- **Tools:** Custom business logic tools, third-party service integration
+- **Use Cases:** CRM operations, payment processing, notification sending
+- **Authentication:** API key management, OAuth token handling
+- **Error Handling:** Retry logic, fallback mechanisms
+
+### **MCP Server Development Patterns**
+
+**Simple STDIO Server:**
+
+- **Language:** Any language that can exchange newline-delimited JSON-RPC messages over stdin/stdout
+- **Deployment:** Single executable, minimal dependencies
+- **Use Case:** Local tools, development utilities, simple scripts
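+
+MCP's STDIO transport exchanges JSON-RPC 2.0 messages, one per line, over the server's stdin and stdout. The sketch below shows only that message loop, with a single hypothetical `echo` tool answering `tools/list` and `tools/call`. It is not a complete MCP server: it omits the `initialize` handshake and capability negotiation that an MCP client performs first.
+
+```go
+package main
+
+import (
+    "bufio"
+    "encoding/json"
+    "io"
+    "os"
+)
+
+type rpcRequest struct {
+    JSONRPC string          `json:"jsonrpc"`
+    ID      json.RawMessage `json:"id,omitempty"` // absent for notifications
+    Method  string          `json:"method"`
+    Params  json.RawMessage `json:"params,omitempty"`
+}
+
+type rpcError struct {
+    Code    int    `json:"code"`
+    Message string `json:"message"`
+}
+
+type rpcResponse struct {
+    JSONRPC string          `json:"jsonrpc"`
+    ID      json.RawMessage `json:"id"`
+    Result  any             `json:"result,omitempty"`
+    Error   *rpcError       `json:"error,omitempty"`
+}
+
+// handle answers one request and returns nil for notifications, which get no reply.
+func handle(req rpcRequest) *rpcResponse {
+    if len(req.ID) == 0 {
+        return nil
+    }
+    resp := &rpcResponse{JSONRPC: "2.0", ID: req.ID}
+    switch req.Method {
+    case "tools/list":
+        resp.Result = map[string]any{"tools": []any{map[string]any{
+            "name":        "echo",
+            "description": "Echo the given text",
+            "inputSchema": map[string]any{
+                "type":       "object",
+                "properties": map[string]any{"text": map[string]string{"type": "string"}},
+            },
+        }}}
+    case "tools/call":
+        var p struct {
+            Name      string            `json:"name"`
+            Arguments map[string]string `json:"arguments"`
+        }
+        if err := json.Unmarshal(req.Params, &p); err != nil || p.Name != "echo" {
+            resp.Error = &rpcError{Code: -32602, Message: "invalid params"}
+            break
+        }
+        resp.Result = map[string]any{"content": []any{map[string]string{"type": "text", "text": p.Arguments["text"]}}}
+    default:
+        resp.Error = &rpcError{Code: -32601, Message: "method not found"}
+    }
+    return resp
+}
+
+// serve reads one JSON-RPC message per line (bufio.Scanner caps lines at 64 KiB
+// by default) and writes one response per line.
+func serve(r io.Reader, w io.Writer) error {
+    scanner := bufio.NewScanner(r)
+    enc := json.NewEncoder(w) // Encode terminates each message with a newline
+    for scanner.Scan() {
+        var req rpcRequest
+        if err := json.Unmarshal(scanner.Bytes(), &req); err != nil {
+            continue // a real server would reply with a parse error (-32700)
+        }
+        if resp := handle(req); resp != nil {
+            if err := enc.Encode(resp); err != nil {
+                return err
+            }
+        }
+    }
+    return scanner.Err()
+}
+
+func main() {
+    if err := serve(os.Stdin, os.Stdout); err != nil {
+        os.Exit(1)
+    }
+}
+```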
+
+**HTTP Service Server:**
+
+- **Architecture:** HTTP service exposing MCP protocol endpoints (JSON-RPC over HTTP)
+- **Scalability:** Horizontal scaling, load balancing
+- **Use Case:** Shared tools, enterprise integrations, cloud services
+
+**Hybrid Approach:**
+
+- **Local + Remote:** Combine STDIO tools for local operations with HTTP for remote services
+- **Failover:** Use local fallbacks when remote services are unavailable
+- **Optimization:** Route tool calls to most appropriate execution environment
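+
+The failover idea can be expressed as a wrapper around two executors. In this sketch `Executor` and `withFallback` are hypothetical, the "remote" and "local" executors are plain functions, and any error from the primary triggers the fallback; a real setup would likely distinguish transient network failures from genuine tool errors.
+
+```go
+package main
+
+import (
+    "errors"
+    "fmt"
+)
+
+// Executor runs a named tool and returns its output (hypothetical type).
+type Executor func(tool string, args map[string]string) (string, error)
+
+// withFallback tries the primary executor first and, if it fails, the fallback.
+// If both fail, both errors are returned together.
+func withFallback(primary, fallback Executor) Executor {
+    return func(tool string, args map[string]string) (string, error) {
+        out, err := primary(tool, args)
+        if err == nil {
+            return out, nil
+        }
+        out, ferr := fallback(tool, args)
+        if ferr != nil {
+            return "", errors.Join(err, ferr)
+        }
+        return out, nil
+    }
+}
+
+func main() {
+    remote := func(tool string, _ map[string]string) (string, error) {
+        return "", errors.New("remote search service unavailable")
+    }
+    local := func(tool string, _ map[string]string) (string, error) {
+        return "cached result for " + tool, nil
+    }
+    search := withFallback(remote, local)
+    out, err := search("web_search", map[string]string{"q": "bifrost"})
+    fmt.Println(out, err) // cached result for web_search <nil>
+}
+```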
+
+> **📖 MCP Development:** [Tool Development Guide →](../../mcp/overview)
+
+---
+
+## Security & Safety Considerations
+
+### **MCP Security Architecture**
+
+```mermaid
+graph TB
+    subgraph "Security Layers"
+        L1[Connection Security<br/>Authentication & Encryption]
+        L2[Tool Validation<br/>Schema & Permission Checks]
+        L3[Execution Security<br/>Sandboxing & Limits]
+        L4[Result Security<br/>Output Validation & Filtering]
+    end
+
+    subgraph "Threat Mitigation"
+        T1[Malicious Tools<br/>Code Injection Prevention]
+        T2[Resource Abuse<br/>Rate Limiting & Quotas]
+        T3[Data Exposure<br/>Output Sanitization]
+        T4[System Access<br/>Privilege Isolation]
+    end
+
+    L1 --> T1
+    L2 --> T2
+    L3 --> T4
+    L4 --> T3
+```
+
+**Security Measures:**
+
+**Connection Security:**
+
+- **Authentication** - API keys, certificates, or token-based auth for HTTP/SSE
+- **Encryption** - TLS for HTTP/SSE connections; STDIO traffic stays on local process pipes and never crosses the network
+- **Network Isolation** - Firewall rules and network segmentation
+
+**Execution Security:**
+
+- **Sandboxing** - Isolated execution environments for tools
+- **Resource Limits** - CPU, memory, and time constraints
+- **Permission Model** - Principle of least privilege for tool access
+
+**Operational Security:**
+
+- **Regular Updates** - Keep MCP servers and tools updated
+- **Monitoring** - Continuous security monitoring and alerting
+- **Incident Response** - Procedures for security incidents involving tools
+
+---
+
+## Related Architecture Documentation
+
+- **[Request Flow](./request-flow)** - MCP integration in request processing
+- **[Concurrency Model](./concurrency)** - MCP concurrency and worker integration
+- **[Plugin System](./plugins)** - Integration between MCP and plugin systems
+- **[Benchmarks](../../benchmarking/getting-started)** - MCP performance impact and optimization
+
+
+
--- a/docs/architecture/core/plugins.mdx
+++ b/docs/architecture/core/plugins.mdx
@@ -0,0 +1,552 @@
+---
+title: "Plugins"
+description: "Deep dive into Bifrost's extensible plugin architecture - how plugins work internally, lifecycle management, execution model, and integration patterns."
+icon: "puzzle-piece"
+---
+
+## Plugin Architecture Philosophy
+
+### **Core Design Principles**
+
+Bifrost's plugin system is built around five key principles that ensure extensibility without compromising performance or reliability:
+
+| Principle                     | Implementation                                   | Benefit                                          |
+| ----------------------------- | ------------------------------------------------ | ------------------------------------------------ |
+| **Plugin-First Design**    | Core logic designed around plugin hook points    | Maximum extensibility without core modifications |
+| **Zero-Copy Integration**  | Direct memory access to request/response objects | Minimal performance overhead                     |
+| **Lifecycle Management**   | Complete plugin lifecycle with automatic cleanup | Resource safety and leak prevention              |
+| **Interface-Based Safety** | Well-defined interfaces for type safety          | Compile-time validation and consistency          |
+| **Failure Isolation**      | Plugin errors don't crash the core system        | Fault tolerance and system stability             |
+
+### **Plugin System Overview**
+
+```mermaid
+graph TB
+    subgraph "Plugin Management Layer"
+        PluginMgr[Plugin Manager<br/>Central Controller]
+        Registry[Plugin Registry<br/>Discovery & Loading]
+        Lifecycle[Lifecycle Manager<br/>State Management]
+    end
+
+    subgraph "Plugin Execution Layer"
+        Pipeline[Plugin Pipeline<br/>Execution Orchestrator]
+        PreHooks[Pre-Processing Hooks<br/>Request Modification]
+        PostHooks[Post-Processing Hooks<br/>Response Enhancement]
+    end
+
+    subgraph "Plugin Categories"
+        Auth[Authentication<br/>& Authorization]
+        RateLimit[Rate Limiting<br/>& Throttling]
+        Transform[Data Transformation<br/>& Validation]
+        Monitor[Monitoring<br/>& Analytics]
+        Custom[Custom Business<br/>Logic]
+    end
+
+    PluginMgr --> Registry
+    Registry --> Lifecycle
+    Lifecycle --> Pipeline
+
+    Pipeline --> PreHooks
+    Pipeline --> PostHooks
+
+    PreHooks --> Auth
+    PreHooks --> RateLimit
+    PostHooks --> Transform
+    PostHooks --> Monitor
+    PostHooks --> Custom
+```
+
+---
+
+## Plugin Lifecycle Management
+
+### **Complete Lifecycle States**
+
+Every plugin goes through a well-defined lifecycle that ensures proper resource management and error handling:
+
+```mermaid
+stateDiagram-v2
+    [*] --> PluginInit: Plugin Creation
+    PluginInit --> Registered: Add to BifrostConfig
+    Registered --> PreHookCall: Request Received
+
+    PreHookCall --> ModifyRequest: Normal Flow
+    PreHookCall --> ShortCircuitResponse: Return Response
+    PreHookCall --> ShortCircuitError: Return Error
+
+    ModifyRequest --> ProviderCall: Send to Provider
+    ProviderCall --> PostHookCall: Receive Response
+
+    ShortCircuitResponse --> PostHookCall: Skip Provider
+    ShortCircuitError --> PostHookCall: Pipeline Symmetry
+
+    PostHookCall --> ModifyResponse: Process Result
+    PostHookCall --> RecoverError: Error Recovery
+    PostHookCall --> FallbackCheck: Check AllowFallbacks
+    PostHookCall --> ResponseReady: Pass Through
+
+    FallbackCheck --> TryFallback: AllowFallbacks=true/nil
+    FallbackCheck --> ResponseReady: AllowFallbacks=false
+    TryFallback --> PreHookCall: Next Provider
+
+    ModifyResponse --> ResponseReady: Modified
+    RecoverError --> ResponseReady: Recovered
+    ResponseReady --> [*]: Return to Client
+
+    Registered --> CleanupCall: Bifrost Shutdown
+    CleanupCall --> [*]: Plugin Destroyed
+```
+
+### **Lifecycle Phase Details**
+
+**Discovery Phase:**
+
+- **Purpose:** Find and catalog available plugins
+- **Sources:** Command line, environment variables, JSON configuration, directory scanning
+- **Validation:** Basic existence and format checks
+- **Output:** Plugin descriptors with metadata
+
+**Loading Phase:**
+
+- **Purpose:** Load plugin binaries into memory
+- **Security:** Digital signature verification and checksum validation
+- **Compatibility:** Interface implementation validation
+- **Resource:** Memory and capability assessment
+
+**Initialization Phase:**
+
+- **Purpose:** Configure plugin with runtime settings
+- **Timeout:** Bounded initialization time to prevent hanging
+- **Dependencies:** External service connectivity verification
+- **State:** Internal state setup and resource allocation
+
+**Runtime Phase:**
+
+- **Purpose:** Active request processing
+- **Monitoring:** Continuous health checking and performance tracking
+- **Recovery:** Automatic error recovery and degraded mode handling
+- **Metrics:** Real-time performance and health metrics collection
+
+> **Plugin Lifecycle:** [Plugin Management →](../../enterprise/custom-plugins)
+
+---
+
+## Plugin Execution Pipeline
+
+### **Request Processing Flow**
+
+The plugin pipeline ensures consistent, predictable execution while maintaining high performance:
+
+#### **Normal Execution Flow (No Short-Circuit)**
+
+```mermaid
+sequenceDiagram
+    participant Client
+    participant Bifrost
+    participant Plugin1
+    participant Plugin2
+    participant Provider
+
+    Client->>Bifrost: Request
+    Bifrost->>Plugin1: PreLLMHook(request)
+    Plugin1-->>Bifrost: modified request
+    Bifrost->>Plugin2: PreLLMHook(request)
+    Plugin2-->>Bifrost: modified request
+    Bifrost->>Provider: API Call
+    Provider-->>Bifrost: response
+    Bifrost->>Plugin2: PostLLMHook(response)
+    Plugin2-->>Bifrost: modified response
+    Bifrost->>Plugin1: PostLLMHook(response)
+    Plugin1-->>Bifrost: modified response
+    Bifrost-->>Client: Final Response
+```
+
+**Execution Order:**
+
+1. **PreHooks:** Execute in registration order (1 → 2 → N)
+2. **Provider Call:** If no short-circuit occurred
+3. **PostHooks:** Execute in reverse order (N → 2 → 1)
+
+#### **Short-Circuit Response Flow (Cache Hit)**
+
+```mermaid
+sequenceDiagram
+    participant Client
+    participant Bifrost
+    participant Cache
+    participant Auth
+    participant Provider
+
+    Client->>Bifrost: Request
+    Bifrost->>Auth: PreLLMHook(request)
+    Auth-->>Bifrost: modified request
+    Bifrost->>Cache: PreLLMHook(request)
+    Cache-->>Bifrost: LLMPluginShortCircuit{Response}
+    Note over Provider: Provider call skipped
+    Bifrost->>Cache: PostLLMHook(response)
+    Cache-->>Bifrost: modified response
+    Bifrost->>Auth: PostLLMHook(response)
+    Auth-->>Bifrost: modified response
+    Bifrost-->>Client: Cached Response
+```
+
+#### **Streaming Response Flow**
+
+For streaming responses, the plugin pipeline executes post-hooks for every delta/chunk received from the provider:
+
+```mermaid
+sequenceDiagram
+    participant Client
+    participant Bifrost
+    participant Plugin1
+    participant Plugin2
+    participant Provider
+
+    Client->>Bifrost: Stream Request
+    Bifrost->>Plugin1: PreLLMHook(request)
+    Plugin1-->>Bifrost: modified request
+    Bifrost->>Plugin2: PreLLMHook(request)
+    Plugin2-->>Bifrost: modified request
+    Bifrost->>Provider: Stream API Call
+
+    loop For Each Delta
+        Provider-->>Bifrost: stream delta
+        Bifrost->>Plugin2: PostLLMHook(delta)
+        Plugin2-->>Bifrost: modified delta
+        Bifrost->>Plugin1: PostLLMHook(delta)
+        Plugin1-->>Bifrost: modified delta
+        Bifrost-->>Client: Send Delta
+    end
+
+    Provider-->>Bifrost: final chunk (finish reason)
+    Bifrost->>Plugin2: PostLLMHook(final)
+    Plugin2-->>Bifrost: modified final
+    Bifrost->>Plugin1: PostLLMHook(final)
+    Plugin1-->>Bifrost: modified final
+    Bifrost-->>Client: Final Chunk
+```
+
+**Streaming Execution Characteristics:**
+
+1. **Delta Processing:**
+   - Each stream delta (chunk) goes through all post-hooks
+   - Plugins can modify/transform each delta before it reaches the client
+   - Deltas can contain: text content, tool calls, role changes, or usage info
+
+2. **Special Delta Types:**
+   - **Start Event:** Initial delta with role information
+   - **Content Delta:** Regular text or tool call content
+   - **Usage Update:** Token usage statistics (if enabled)
+   - **Final Chunk:** Contains finish reason and any final metadata
+
+3. **Plugin Considerations:**
+   - Plugins must handle streaming responses efficiently
+   - Each delta should be processed quickly to maintain stream responsiveness
+   - Plugins can track state across deltas using context
+   - Heavy processing should be done asynchronously
+
+4. **Error Handling:**
+   - If a post-hook returns an error, it's sent as an error stream chunk
+   - Stream is terminated after error chunks
+   - Plugins can recover from errors by providing valid responses
+
+5. **Performance Optimization:**
+   - Lightweight delta processing to minimize latency
+   - Object pooling for common data structures
+   - Non-blocking operations for logging and metrics
+   - Efficient memory management for stream processing
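+
+The per-delta behavior above can be sketched as a loop over the provider stream. This is a minimal illustration, not Bifrost's streaming API. It assumes a hypothetical `Delta` type and a hypothetical `PostHook` signature. Each delta passes through the post-hooks in reverse registration order. The first hook error is forwarded as a single error chunk, and then the stream ends.
+
+```go
+package main
+
+import "fmt"
+
+// Delta is a hypothetical stream chunk used only for this sketch.
+type Delta struct {
+    Content      string
+    FinishReason string
+    Err          error
+}
+
+// PostHook is a hypothetical per-delta post-processing hook.
+type PostHook func(d Delta) (Delta, error)
+
+// runStream passes every delta through the post-hooks in reverse
+// registration order and forwards the result. On the first hook error it
+// emits one error chunk and stops; a real implementation would also
+// cancel the upstream provider request at that point.
+func runStream(in <-chan Delta, hooks []PostHook) <-chan Delta {
+    out := make(chan Delta)
+    go func() {
+        defer close(out)
+        for d := range in {
+            var err error
+            failed := -1
+            for i := len(hooks) - 1; i >= 0; i-- {
+                if d, err = hooks[i](d); err != nil {
+                    failed = i
+                    break
+                }
+            }
+            if err != nil {
+                out <- Delta{Err: fmt.Errorf("post-hook %d: %w", failed, err)}
+                return // the stream terminates after an error chunk
+            }
+            out <- d
+        }
+    }()
+    return out
+}
+```
+
+Keeping each hook's work small matters here, because every hook runs once per delta on the path to the client.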
+
+> **Streaming Details:** [Streaming Guide →](../../quickstart/gateway/streaming)
+
+**Short-Circuit Rules:**
+
+- **Provider Skipped:** When plugin returns short-circuit response/error
+- **PostLLMHook Guarantee:** All executed PreHooks get corresponding PostLLMHook calls
+- **Reverse Order:** PostHooks execute in reverse order of PreHooks
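+
+The short-circuit rules can be sketched as a small pipeline. This is a simplified model that assumes hypothetical `Request`, `Response`, `ShortCircuit` and `Plugin` types, not Bifrost's real `PreLLMHook`/`PostLLMHook` signatures. The pipeline counts how many pre-hooks ran and skips the provider on a short-circuit. It then calls post-hooks, in reverse order, only for the plugins that executed.
+
+```go
+package main
+
+// Request and Response are simplified stand-ins for this sketch.
+type Request struct{ Prompt string }
+type Response struct{ Text string }
+
+// ShortCircuit lets a pre-hook answer directly, skipping the provider.
+type ShortCircuit struct{ Response *Response }
+
+// Plugin is a hypothetical two-hook interface modeled on the flows above.
+type Plugin interface {
+    Pre(req *Request) (*Request, *ShortCircuit)
+    Post(resp *Response) *Response
+}
+
+// HookFuncs adapts plain functions to the Plugin interface.
+type HookFuncs struct {
+    PreFn  func(*Request) (*Request, *ShortCircuit)
+    PostFn func(*Response) *Response
+}
+
+func (h HookFuncs) Pre(r *Request) (*Request, *ShortCircuit) { return h.PreFn(r) }
+func (h HookFuncs) Post(r *Response) *Response { return h.PostFn(r) }
+
+// Run executes pre-hooks in registration order and calls the provider unless
+// a plugin short-circuits. Post-hooks then run in reverse order, but only
+// for plugins whose pre-hook actually executed.
+func Run(plugins []Plugin, req *Request, provider func(*Request) *Response) *Response {
+    var resp *Response
+    executed := 0
+    shortCircuited := false
+    for _, p := range plugins {
+        var sc *ShortCircuit
+        req, sc = p.Pre(req)
+        executed++
+        if sc != nil {
+            resp, shortCircuited = sc.Response, true
+            break
+        }
+    }
+    if !shortCircuited {
+        resp = provider(req)
+    }
+    for i := executed - 1; i >= 0; i-- {
+        resp = plugins[i].Post(resp)
+    }
+    return resp
+}
+```
+
+With plugins `[auth, cache, x]` and a cache hit, the call order is `auth.Pre`, `cache.Pre`, `cache.Post`, `auth.Post`. Plugin `x` is never invoked.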
+
+#### **Short-Circuit Error Flow (Allow Fallbacks)**
+
+```mermaid
+sequenceDiagram
+    participant Client
+    participant Bifrost
+    participant Plugin1
+    participant Provider1
+    participant Provider2
+
+    Client->>Bifrost: Request (Provider1 + Fallback Provider2)
+    Bifrost->>Plugin1: PreLLMHook(request)
+    Plugin1-->>Bifrost: LLMPluginShortCircuit{Error, AllowFallbacks=true}
+    Note over Provider1: Provider1 call skipped
+    Bifrost->>Plugin1: PostLLMHook(error)
+    Plugin1-->>Bifrost: error unchanged
+
+    Note over Bifrost: Try fallback provider
+    Bifrost->>Plugin1: PreLLMHook(request for Provider2)
+    Plugin1-->>Bifrost: modified request
+    Bifrost->>Provider2: API Call
+    Provider2-->>Bifrost: response
+    Bifrost->>Plugin1: PostLLMHook(response)
+    Plugin1-->>Bifrost: modified response
+    Bifrost-->>Client: Final Response
+```
+
+#### **Error Recovery Flow**
+
+```mermaid
+sequenceDiagram
+    participant Client
+    participant Bifrost
+    participant Plugin1
+    participant Plugin2
+    participant Provider
+    participant RecoveryPlugin
+
+    Client->>Bifrost: Request
+    Bifrost->>Plugin1: PreLLMHook(request)
+    Plugin1-->>Bifrost: modified request
+    Bifrost->>Plugin2: PreLLMHook(request)
+    Plugin2-->>Bifrost: modified request
+    Bifrost->>RecoveryPlugin: PreLLMHook(request)
+    RecoveryPlugin-->>Bifrost: modified request
+    Bifrost->>Provider: API Call
+    Provider-->>Bifrost: error
+    Bifrost->>RecoveryPlugin: PostLLMHook(error)
+    RecoveryPlugin-->>Bifrost: recovered response
+    Bifrost->>Plugin2: PostLLMHook(response)
+    Plugin2-->>Bifrost: modified response
+    Bifrost->>Plugin1: PostLLMHook(response)
+    Plugin1-->>Bifrost: modified response
+    Bifrost-->>Client: Recovered Response
+```
+
+**Error Recovery Features:**
+
+- **Error Transformation:** Plugins can convert errors to successful responses
+- **Graceful Degradation:** Provide fallback responses for service failures
+- **Context Preservation:** Error context is maintained through recovery process
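+
+A recovery post-hook can be sketched as a function that receives both the response (possibly nil) and the error. It may return a response with the error cleared. This sketch assumes hypothetical `Response` and `ProviderError` types and a `(resp, err)` hook signature, and Bifrost's real hook signature may differ. The recovery policy shown is only an example: server-side failures (5xx) become a degraded response, and everything else passes through.
+
+```go
+package main
+
+// Response and ProviderError are hypothetical types for this sketch.
+type Response struct {
+    Text     string
+    Degraded bool
+}
+
+type ProviderError struct {
+    StatusCode int
+    Message    string
+}
+
+// recoveryPostHook converts server-side failures (5xx) into a degraded
+// but successful response, keeping the original message for context.
+// Successful responses and other errors pass through unchanged.
+func recoveryPostHook(resp *Response, err *ProviderError) (*Response, *ProviderError) {
+    if err == nil {
+        return resp, nil // normal response: nothing to recover
+    }
+    if err.StatusCode >= 500 {
+        return &Response{
+            Text:     "service temporarily unavailable: " + err.Message,
+            Degraded: true,
+        }, nil
+    }
+    return resp, err
+}
+```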
+
+### **Complex Plugin Decision Flow**
+
+Real-world plugin interactions involving authentication, rate limiting, and caching with different decision paths:
+
+```mermaid
+graph TD
+    A["Client Request"] --> B["Bifrost"]
+    B --> C["Auth Plugin PreLLMHook"]
+    C --> D{"Authenticated?"}
+    D -->|No| E["Return Auth Error<br/>AllowFallbacks=false"]
+    D -->|Yes| F["RateLimit Plugin PreLLMHook"]
+    F --> G{"Rate Limited?"}
+    G -->|Yes| H["Return Rate Error<br/>AllowFallbacks=nil"]
+    G -->|No| I["Cache Plugin PreLLMHook"]
+    I --> J{"Cache Hit?"}
+    J -->|Yes| K["Return Cached Response"]
+    J -->|No| L["Provider API Call"]
+    L --> M["Cache Plugin PostLLMHook"]
+    M --> N["Store in Cache"]
+    N --> O["RateLimit Plugin PostLLMHook"]
+    O --> P["Auth Plugin PostLLMHook"]
+    P --> Q["Final Response"]
+
+    E --> R["Skip Fallbacks"]
+    H --> S["Try Fallback Provider"]
+    K --> T["Skip Provider Call"]
+```
+
+### **Execution Characteristics**
+
+**Symmetric Execution Pattern:**
+
+- **Pre-processing:** Plugins execute in registration order (first to last)
+- **Post-processing:** Plugins execute in reverse registration order (last to first)
+- **Rationale:** Ensures proper cleanup and state management (last in, first out)
+
+**Runtime Safeguards:**
+
+- **Timeout Boundaries:** Each plugin has configurable execution timeouts
+- **Panic Recovery:** Plugin panics are caught and logged without crashing the system
+- **Resource Limits:** Memory and CPU limits prevent runaway plugins
+- **Circuit Breaking:** Repeated failures trigger plugin isolation
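+
+The timeout boundary and panic recovery items can be combined in one small wrapper around a hook call. This is a minimal sketch that uses only the standard library. `runHook` and `ErrHookTimeout` are hypothetical names. The sketch assumes the hook honors context cancellation, because a goroutine that ignores `ctx` keeps running after the timeout fires.
+
+```go
+package main
+
+import (
+    "context"
+    "errors"
+    "fmt"
+    "time"
+)
+
+// ErrHookTimeout is returned when a hook exceeds its time budget
+// (or the parent context is cancelled first).
+var ErrHookTimeout = errors.New("plugin hook timed out")
+
+// runHook executes fn with a timeout and converts a panic inside fn into
+// an ordinary error, so a misbehaving plugin cannot crash the process.
+func runHook(parent context.Context, timeout time.Duration, fn func(context.Context) error) error {
+    ctx, cancel := context.WithTimeout(parent, timeout)
+    defer cancel()
+
+    done := make(chan error, 1) // buffered: the goroutine never blocks on send
+    go func() {
+        defer func() {
+            if r := recover(); r != nil {
+                done <- fmt.Errorf("plugin panic recovered: %v", r)
+            }
+        }()
+        done <- fn(ctx)
+    }()
+
+    select {
+    case err := <-done:
+        return err
+    case <-ctx.Done():
+        return ErrHookTimeout
+    }
+}
+```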
+
+**Error Handling Strategies:**
+
+- **Continue:** Use original request/response if plugin fails
+- **Fail Fast:** Return error immediately if critical plugin fails
+- **Retry:** Attempt plugin execution with exponential backoff
+- **Fallback:** Use alternative plugin or default behavior
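+
+The Retry strategy can be sketched as a loop that doubles the delay after each failed attempt. This is an illustrative sketch, not Bifrost's configuration: `retryWithBackoff`, the attempt count and the base delay are all hypothetical. The loop also respects context cancellation, so it cannot outlive the request it serves.
+
+```go
+package main
+
+import (
+    "context"
+    "time"
+)
+
+// retryWithBackoff calls fn up to maxAttempts times, waiting base,
+// 2*base, 4*base, ... between failures. It returns nil on the first
+// success, ctx.Err() if the context is cancelled while waiting, or the
+// last error from fn otherwise.
+func retryWithBackoff(ctx context.Context, maxAttempts int, base time.Duration, fn func() error) error {
+    var err error
+    delay := base
+    for attempt := 1; attempt <= maxAttempts; attempt++ {
+        if err = fn(); err == nil {
+            return nil
+        }
+        if attempt == maxAttempts {
+            break
+        }
+        timer := time.NewTimer(delay)
+        select {
+        case <-ctx.Done():
+            timer.Stop()
+            return ctx.Err()
+        case <-timer.C:
+        }
+        delay *= 2
+    }
+    return err
+}
+```
+
+Production retry loops usually add random jitter to `delay`, so that many clients do not retry in lockstep.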
+
+> **Plugin Execution:** [Request Flow →](./request-flow#stage-3-plugin-pipeline-processing)
+
+---
+
+## Security & Validation
+
+### **Multi-Layer Security Model**
+
+Plugin security operates at multiple layers to ensure system integrity:
+
+```mermaid
+graph TB
+    subgraph "Security Validation Layers"
+        L1[Layer 1: Binary Validation<br/>Signature & Checksum]
+        L2[Layer 2: Interface Validation<br/>Type Safety & Compatibility]
+        L3[Layer 3: Runtime Validation<br/>Resource Limits & Timeouts]
+        L4[Layer 4: Execution Isolation<br/>Panic Recovery & Error Handling]
+    end
+
+    subgraph "Security Benefits"
+        Integrity[Code Integrity<br/>Verified Authenticity]
+        Safety[Type Safety<br/>Compile-time Checks]
+        Stability[System Stability<br/>Isolated Failures]
+        Performance[Performance Protection<br/>Resource Limits]
+    end
+
+    L1 --> Integrity
+    L2 --> Safety
+    L3 --> Performance
+    L4 --> Stability
+```
+
+### **Validation Process**
+
+**Binary Security:**
+
+- **Digital Signatures:** Cryptographic verification of plugin authenticity
+- **Checksum Validation:** File integrity verification
+- **Source Verification:** Trusted source requirements
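+
+Checksum validation needs only the standard library. This is a minimal sketch that assumes the expected SHA-256 digest arrives as a hex string from a trusted source. How Bifrost actually distributes or stores digests is not specified here, and `verifyChecksum` is a hypothetical helper.
+
+```go
+package main
+
+import (
+    "crypto/sha256"
+    "crypto/subtle"
+    "encoding/hex"
+    "fmt"
+    "io"
+    "os"
+)
+
+// verifyChecksum streams the file at path through SHA-256 and compares
+// the digest with expectedHex in constant time.
+func verifyChecksum(path, expectedHex string) error {
+    expected, err := hex.DecodeString(expectedHex)
+    if err != nil {
+        return fmt.Errorf("invalid expected checksum: %w", err)
+    }
+    f, err := os.Open(path)
+    if err != nil {
+        return err
+    }
+    defer f.Close()
+
+    h := sha256.New()
+    if _, err := io.Copy(h, f); err != nil {
+        return err
+    }
+    if subtle.ConstantTimeCompare(h.Sum(nil), expected) != 1 {
+        return fmt.Errorf("checksum mismatch for %s", path)
+    }
+    return nil
+}
+```
+
+A checksum only proves the file matches the expected digest. Authenticity still depends on the signature check and on obtaining that digest from a trusted source.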
+
+**Interface Security:**
+
+- **Type Safety:** Interface implementation verification
+- **Version Compatibility:** Plugin API version checking
+- **Memory Safety:** Safe memory access patterns
+
+**Runtime Security:**
+
+- **Resource Quotas:** Memory and CPU usage limits
+- **Execution Timeouts:** Bounded execution time
+- **Sandbox Execution:** Isolated execution environment
+
+**Operational Security:**
+
+- **Health Monitoring:** Continuous plugin health assessment
+- **Error Tracking:** Plugin error rate monitoring
+- **Automatic Recovery:** Failed plugin restart and recovery
+
+---
+
+## Plugin Performance & Monitoring
+
+### **Comprehensive Metrics System**
+
+Bifrost provides detailed metrics for plugin performance and health monitoring:
+
+```mermaid
+graph TB
+    subgraph "Execution Metrics"
+        ExecTime[Execution Time<br/>Latency per Plugin]
+        ExecCount[Execution Count<br/>Request Volume]
+        SuccessRate[Success Rate<br/>Error Percentage]
+        Throughput[Throughput<br/>Requests/Second]
+    end
+
+    subgraph "Resource Metrics"
+        MemoryUsage[Memory Usage<br/>Per Plugin Instance]
+        CPUUsage[CPU Utilization<br/>Processing Time]
+        IOMetrics[I/O Operations<br/>Network/Disk Activity]
+        PoolUtilization[Pool Utilization<br/>Resource Efficiency]
+    end
+
+    subgraph "Health Metrics"
+        ErrorRate[Error Rate<br/>Failed Executions]
+        PanicCount[Panic Recovery<br/>Crash Events]
+        TimeoutCount[Timeout Events<br/>Slow Executions]
+        RecoveryRate[Recovery Success<br/>Failure Handling]
+    end
+
+    subgraph "Business Metrics"
+        AddedLatency[Added Latency<br/>Plugin Overhead]
+        SystemImpact[System Impact<br/>Overall Performance]
+        FeatureUsage[Feature Usage<br/>Plugin Utilization]
+        CostImpact[Cost Impact<br/>Resource Consumption]
+    end
+```
+
+### **Performance Characteristics**
+
+**Plugin Execution Performance:**
+
+- **Typical Overhead:** 1-10μs per plugin for simple operations
+- **Authentication Plugins:** 1-5μs for key validation
+- **Rate Limiting Plugins:** ~500ns for quota checks
+- **Monitoring Plugins:** ~200ns for metric collection
+- **Transformation Plugins:** 2-10μs depending on complexity
+
+**Resource Usage Patterns:**
+
+- **Memory Efficiency:** Object pooling reduces allocations
+- **CPU Optimization:** Minimal processing overhead
+- **Network Impact:** Configurable external service calls
+- **Storage Overhead:** Minimal for stateless plugins
+
+---
+
+## Plugin Integration Patterns
+
+### **Common Integration Scenarios**
+
+**1. Authentication & Authorization**
+
+- **Pre-processing Hook:** Validate API keys or JWT tokens
+- **Configuration:** External identity provider integration
+- **Error Handling:** Return 401/403 responses for invalid credentials
+- **Performance:** Sub-5μs validation with caching
+
+**2. Rate Limiting & Quotas**
+
+- **Pre-processing Hook:** Check request quotas and limits
+- **Storage:** Redis or in-memory rate limit tracking
+- **Algorithms:** Token bucket, sliding window, fixed window
+- **Responses:** 429 Too Many Requests with retry headers
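+
+Of the algorithms listed, the token bucket is the simplest to sketch in-process. The sketch below assumes a single-node, in-memory limiter; a Redis-backed version would need atomic server-side updates. `TokenBucket` is a hypothetical type, and its clock is injectable so the behavior is deterministic in tests.
+
+```go
+package main
+
+import (
+    "sync"
+    "time"
+)
+
+// TokenBucket allows bursts up to capacity and refills at rate tokens/sec.
+type TokenBucket struct {
+    mu       sync.Mutex
+    capacity float64
+    tokens   float64
+    rate     float64
+    last     time.Time
+    now      func() time.Time // injectable clock
+}
+
+// NewTokenBucket starts with a full bucket.
+func NewTokenBucket(capacity, ratePerSec float64, now func() time.Time) *TokenBucket {
+    return &TokenBucket{capacity: capacity, tokens: capacity, rate: ratePerSec, last: now(), now: now}
+}
+
+// Allow refills tokens for the elapsed time and consumes one if available.
+// A false result would map to a 429 Too Many Requests response.
+func (b *TokenBucket) Allow() bool {
+    b.mu.Lock()
+    defer b.mu.Unlock()
+
+    t := b.now()
+    b.tokens += t.Sub(b.last).Seconds() * b.rate
+    if b.tokens > b.capacity {
+        b.tokens = b.capacity
+    }
+    b.last = t
+
+    if b.tokens >= 1 {
+        b.tokens--
+        return true
+    }
+    return false
+}
+```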
+
+**3. Request/Response Transformation**
+
+- **Dual Hooks:** Pre-processing for requests, post-processing for responses
+- **Use Cases:** Data format conversion, field mapping, content filtering
+- **Performance:** Streaming transformations for large payloads
+- **Compatibility:** Provider-specific format adaptations
+
+**4. Monitoring & Analytics**
+
+- **Post-processing Hook:** Collect metrics and logs after request completion
+- **Destinations:** Prometheus, DataDog, custom analytics systems
+- **Data:** Request/response metadata, performance metrics, error tracking
+- **Privacy:** Configurable data sanitization and filtering
+
+### **Plugin Communication Patterns**
+
+**Plugin-to-Plugin Communication:**
+
+- **Shared Context:** Plugins can store data in request context for downstream plugins
+- **Event System:** Plugins can emit events for other plugins to consume
+- **Data Passing:** Structured data exchange between related plugins
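+
+Shared-context passing can be illustrated with Go's standard `context` package. The key type, the key value and the two hook functions below are hypothetical. Bifrost's own `BifrostContext` may expose different accessors. Because the key's type is unexported, another package that uses the same string as its key cannot collide with this value.
+
+```go
+package main
+
+import "context"
+
+// ctxKey is unexported, so keys defined in other packages never collide.
+type ctxKey string
+
+const userIDKey ctxKey = "auth.user-id"
+
+// authPreHook (hypothetical) stores the authenticated user for later plugins.
+// A real plugin would validate apiKey before this point.
+func authPreHook(ctx context.Context, apiKey string) context.Context {
+    return context.WithValue(ctx, userIDKey, "user-for-"+apiKey)
+}
+
+// rateLimitPreHook (hypothetical) reads the value set upstream.
+func rateLimitPreHook(ctx context.Context) (string, bool) {
+    id, ok := ctx.Value(userIDKey).(string)
+    return id, ok
+}
+```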
+
+**Plugin-to-External Service Communication:**
+
+- **HTTP Clients:** Built-in HTTP client pools for external API calls
+- **Database Connections:** Connection pooling for database access
+- **Message Queues:** Integration with message queue systems
+- **Caching Systems:** Redis, Memcached integration for state storage
+
+> **Integration Examples:** [Plugin Development Guide →](../../enterprise/custom-plugins)
+
+---
+
+## Related Architecture Documentation
+
+- **[Request Flow](./request-flow)** - Plugin execution in request processing pipeline
+- **[Concurrency Model](./concurrency)** - Plugin concurrency and threading considerations
+- **[Benchmarks](../../benchmarking/getting-started)** - Plugin performance characteristics and optimization
+- **[MCP System](./mcp)** - Integration between plugins and MCP system
+
--- a/docs/architecture/core/providers.mdx
+++ b/docs/architecture/core/providers.mdx
--- a/docs/architecture/core/request-flow.mdx
+++ b/docs/architecture/core/request-flow.mdx
@@ -0,0 +1,527 @@
+---
+title: "Request Flow"
+description: "Deep dive into Bifrost's request processing pipeline - from transport layer ingestion through provider execution to response delivery."
+icon: "route"
+---
+
+## Stage 1: Transport Layer Processing
+
+### **HTTP Transport Flow**
+
+```mermaid
+sequenceDiagram
+    participant Client
+    participant HTTPTransport
+    participant Router
+    participant Validation
+
+    Client->>HTTPTransport: POST /v1/chat/completions
+    HTTPTransport->>HTTPTransport: Parse Headers
+    HTTPTransport->>HTTPTransport: Extract Body
+    HTTPTransport->>Validation: Validate JSON Schema
+    Validation->>Router: BifrostRequest
+    Router-->>HTTPTransport: Processing Started
+    HTTPTransport-->>Client: HTTP 200 (async processing)
+```
+
+**Key Processing Steps:**
+
+1. **Request Reception** - FastHTTP server receives request
+2. **Header Processing** - Extract authentication, content-type, custom headers
+3. **Body Parsing** - JSON unmarshaling with schema validation
+4. **Request Transformation** - Convert to internal `BifrostRequest` schema
+5. **Context Creation** - Build request context with metadata
+
+**Performance Characteristics:**
+
+- **Parsing Time:** ~2.1μs for typical requests
+- **Validation Overhead:** ~400ns for schema checks
+- **Memory Allocation:** Zero-copy where possible
+
+### **Go SDK Flow**
+
+```mermaid
+sequenceDiagram
+    participant Application
+    participant SDK
+    participant Core
+    participant Validation
+
+    Application->>SDK: bifrost.ChatCompletion(req)
+    SDK->>SDK: Type Validation
+    SDK->>Core: Direct Function Call
+    Core->>Validation: Schema Validation
+    Validation-->>Core: Validated Request
+    Core-->>SDK: Processing Result
+    SDK-->>Application: Typed Response
+```
+
+**Advantages:**
+
+- **Zero Serialization** - Direct Go struct passing
+- **Type Safety** - Compile-time validation
+- **Lower Latency** - No HTTP/JSON overhead
+- **Memory Efficiency** - No intermediate allocations
+
+---
+
+## Stage 2: Request Routing & Load Balancing
+
+### **Provider Selection Logic**
+
+```mermaid
+flowchart TD
+    Request[Incoming Request] --> ModelCheck{Model Available?}
+    ModelCheck -->|Yes| ProviderDirect[Use Specified Provider]
+    ModelCheck -->|No| ModelMapping[Model → Provider Mapping]
+
+    ProviderDirect --> KeyPool[API Key Pool]
+    ModelMapping --> KeyPool
+
+    KeyPool --> WeightedSelect[Weighted Random Selection]
+    WeightedSelect --> HealthCheck{Provider Healthy?}
+
+    HealthCheck -->|Yes| AssignWorker[Assign Worker]
+    HealthCheck -->|No| CircuitBreaker[Circuit Breaker]
+
+    CircuitBreaker --> FallbackCheck{Fallback Available?}
+    FallbackCheck -->|Yes| FallbackProvider[Try Fallback]
+    FallbackCheck -->|No| ErrorResponse[Return Error]
+
+    FallbackProvider --> KeyPool
+```
+
+**Key Selection Algorithm:**
+
+```go
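+import "math/rand"
+
+// APIKey is a placeholder for a provider credential.
+type APIKey struct {
+    Value string
+}
+
+// KeySelector performs weighted random key selection.
+// weights[i] is the weight of keys[i]; total must equal the sum of weights.
+type KeySelector struct {
+    keys    []APIKey
+    weights []float64
+    total   float64
+}
+
+// SelectKey returns keys[i] with probability weights[i]/total,
+// or nil if no keys are configured.
+func (ks *KeySelector) SelectKey() *APIKey {
+    if len(ks.keys) == 0 {
+        return nil
+    }
+    r := rand.Float64() * ks.total // r is in [0, total)
+    cumulative := 0.0
+
+    for i, weight := range ks.weights {
+        cumulative += weight
+        if r < cumulative {
+            return &ks.keys[i]
+        }
+    }
+    // Guard against floating-point rounding leaving r just above the last bound
+    return &ks.keys[len(ks.keys)-1]
+}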
+```
+
+**Performance Metrics:**
+
+- **Key Selection Time:** ~10ns (a linear scan over the key weights, effectively constant for small key pools)
+- **Health Check Overhead:** ~50ns (cached results)
+- **Fallback Decision:** ~25ns (configuration lookup)
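+
+The selection flowchart routes unhealthy providers to a circuit breaker, but this page shows no code for it. Below is one common design, sketched under explicit assumptions; all names are hypothetical. The breaker opens after a threshold of consecutive failures and rejects calls for a fixed cooldown. After the cooldown it lets traffic through again (half-open). A success closes it, and another failure re-opens it. For simplicity, the half-open state admits every caller, not just one trial request.
+
+```go
+package main
+
+import (
+    "sync"
+    "time"
+)
+
+// CircuitBreaker tracks consecutive provider failures.
+type CircuitBreaker struct {
+    mu        sync.Mutex
+    failures  int
+    threshold int
+    cooldown  time.Duration
+    openedAt  time.Time
+    now       func() time.Time // injectable clock
+}
+
+func NewCircuitBreaker(threshold int, cooldown time.Duration, now func() time.Time) *CircuitBreaker {
+    return &CircuitBreaker{threshold: threshold, cooldown: cooldown, now: now}
+}
+
+// Allow reports whether a request may be sent to the provider.
+func (cb *CircuitBreaker) Allow() bool {
+    cb.mu.Lock()
+    defer cb.mu.Unlock()
+    if cb.failures < cb.threshold {
+        return true // closed
+    }
+    return cb.now().Sub(cb.openedAt) >= cb.cooldown // half-open after cooldown
+}
+
+// Record updates the breaker with the outcome of a provider call.
+func (cb *CircuitBreaker) Record(success bool) {
+    cb.mu.Lock()
+    defer cb.mu.Unlock()
+    if success {
+        cb.failures = 0
+        return
+    }
+    cb.failures++
+    if cb.failures >= cb.threshold {
+        cb.openedAt = cb.now() // (re)open and restart the cooldown
+    }
+}
+```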
+
+---
+
+## Stage 3: Plugin Pipeline Processing
+
+### **Pre-Processing Hooks**
+
+```mermaid
+sequenceDiagram
+    participant Request
+    participant AuthPlugin
+    participant RateLimitPlugin
+    participant TransformPlugin
+    participant Core
+
+    Request->>AuthPlugin: ProcessRequest()
+    AuthPlugin->>AuthPlugin: Validate API Key
+    AuthPlugin->>RateLimitPlugin: Authorized Request
+
+    RateLimitPlugin->>RateLimitPlugin: Check Rate Limits
+    RateLimitPlugin->>TransformPlugin: Allowed Request
+
+    TransformPlugin->>TransformPlugin: Modify Request
+    TransformPlugin->>Core: Final Request
+```
+
+**Plugin Execution Model:**
+
+```go
+type PluginManager struct {
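+    plugins []Plugin // kept in registration order
+}
+
+// ExecutePreHooks runs each plugin's request hook in registration order,
+// passing each plugin the request returned by the previous one.
+// The first error stops the chain and is returned to the caller.
+func (pm *PluginManager) ExecutePreHooks(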
+    ctx BifrostContext,
+    req *BifrostRequest,
+) (*BifrostRequest, *BifrostError) {
+    for _, plugin := range pm.plugins {
+        modifiedReq, err := plugin.ProcessRequest(ctx, req)
+        if err != nil {
+            return nil, err
+        }
+        req = modifiedReq
+    }
+    return req, nil
+}
+```
+
+**Plugin Types & Performance:**
+
+| Plugin Type           | Processing Time | Memory Impact | Failure Mode           |
+| --------------------- | --------------- | ------------- | ---------------------- |
+| **Authentication**    | ~1-5μs          | Minimal       | Reject request         |
+| **Rate Limiting**     | ~500ns          | Cache-based   | Throttle/reject        |
+| **Request Transform** | ~2-10μs         | Copy-on-write | Continue with original |
+| **Monitoring**        | ~200ns          | Append-only   | Continue silently      |
+
+---
+
+## Stage 4: MCP Tool Discovery & Integration
+
+### **Tool Discovery Process**
+
+```mermaid
+flowchart TD
+    Request[Request with Model] --> MCPCheck{MCP Enabled?}
+    MCPCheck -->|No| SkipMCP[Skip MCP Processing]
+    MCPCheck -->|Yes| ClientLookup[MCP Client Lookup]
+
+    ClientLookup --> ToolFilter[Tool Filtering]
+    ToolFilter --> ToolInject[Inject Tools into Request]
+
+    ToolFilter --> IncludeCheck{Include Filter?}
+    ToolFilter --> ExcludeCheck{Exclude Filter?}
+
+    IncludeCheck -->|Yes| IncludeTools[Include Specified Tools]
+    IncludeCheck -->|No| AllTools[Include All Tools]
+
+    ExcludeCheck -->|Yes| RemoveTools[Remove Excluded Tools]
+    ExcludeCheck -->|No| KeepFiltered[Keep Filtered Tools]
+
+    IncludeTools --> ToolInject
+    AllTools --> ToolInject
+    RemoveTools --> ToolInject
+    KeepFiltered --> ToolInject
+
+    ToolInject --> EnhancedRequest[Request with Tools]
+    SkipMCP --> EnhancedRequest
+```
+
+**Tool Integration Algorithm:**
+
+```go
+func (mcpm *MCPManager) EnhanceRequest(
+    ctx BifrostContext,
+    req *BifrostChatRequest,
+) (*BifrostChatRequest, error) {
+    // Extract tool filtering from context
+    includeClients := ctx.GetStringSlice("mcp-include-clients")
+    includeTools := ctx.GetStringSlice("mcp-include-tools")
+
+    // Get available tools
+    availableTools := mcpm.getAvailableTools(includeClients)
+
+    // Filter tools
+    filteredTools := mcpm.filterTools(availableTools, includeTools)
+
+    // Inject into request
+    if req.Params == nil {
+        req.Params = &ChatParameters{}
+    }
+    req.Params.Tools = append(req.Params.Tools, filteredTools...)
+
+    return req, nil
+}
+```
+
+**MCP Performance Impact:**
+
+- **Tool Discovery:** ~100-500μs (cached after first request)
+- **Tool Filtering:** ~50-200ns per tool
+- **Request Enhancement:** ~1-5μs depending on tool count
+
+---
+
+## Stage 5: Memory Pool Management
+
+### **Object Pool Lifecycle**
+
+```mermaid
+stateDiagram-v2
+    [*] --> PoolInit: System Startup
+    PoolInit --> Available: Objects Pre-allocated
+
+    Available --> Acquired: Request Processing
+    Acquired --> InUse: Object Populated
+    InUse --> Processing: Worker Processing
+    Processing --> Completed: Processing Done
+    Completed --> Reset: Object Cleanup
+    Reset --> Available: Return to Pool
+
+    Available --> Expansion: Pool Exhaustion
+    Expansion --> Available: New Objects Created
+
+    Reset --> GC: Pool Full
+    GC --> [*]: Garbage Collection
+```
+
+**Memory Pool Implementation:**
+
+```go
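+import "sync"
+
+// MemoryPools groups sync.Pools for frequently allocated objects.
+// The pools have no New func, so Get returns nil when a pool is empty.
+type MemoryPools struct {
+    channelPool  sync.Pool
+    messagePool  sync.Pool
+    responsePool sync.Pool
+    bufferPool   sync.Pool
+}
+
+// GetChannel reuses a pooled channel when available, otherwise allocates one.
+func (mp *MemoryPools) GetChannel() *ProcessingChannel {
+    if ch := mp.channelPool.Get(); ch != nil {
+        return ch.(*ProcessingChannel)
+    }
+    return NewProcessingChannel()
+}
+
+// ReturnChannel clears the object before pooling it, so data from one
+// request never leaks into the next.
+func (mp *MemoryPools) ReturnChannel(ch *ProcessingChannel) {
+    ch.Reset()
+    mp.channelPool.Put(ch)
+}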
+```
+
+---
+
+## Stage 6: Worker Pool Processing
+
+### **Worker Assignment & Execution**
+
+```mermaid
+sequenceDiagram
+    participant Queue
+    participant WorkerPool
+    participant Worker
+    participant Provider
+    participant Circuit
+
+    Queue->>WorkerPool: Enqueue Request
+    WorkerPool->>Worker: Assign Available Worker
+    Worker->>Circuit: Check Circuit Breaker
+    Circuit->>Provider: Forward Request
+
+    Provider-->>Circuit: Response/Error
+    Circuit->>Circuit: Update Health Metrics
+    Circuit-->>Worker: Provider Response
+    Worker-->>WorkerPool: Release Worker
+    WorkerPool-->>Queue: Request Completed
+```
+
+**Worker Pool Architecture:**
+
+```go
+type ProviderWorkerPool struct {
+    workers    chan *Worker
+    queue      chan *ProcessingJob
+    config     WorkerPoolConfig
+    metrics    *PoolMetrics
+}
+
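+// ProcessRequest blocks until a worker is free (the receive from
+// pwp.workers acts as backpressure), then runs the job on a new goroutine.
+func (pwp *ProviderWorkerPool) ProcessRequest(job *ProcessingJob) {
+    // Acquire a worker; blocks while all workers are busy
+    worker := <-pwp.workers
+
+    go func() {
+        defer func() {
+            // Return worker to pool
+            pwp.workers <- worker
+        }()
+
+        // Process request; ResultChan should be buffered (or actively read)
+        // so this send cannot hold the worker indefinitely
+        result := worker.Execute(job)
+        job.ResultChan <- result
+    }()
+}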
+```
+
+---
+
+## Stage 7: Provider API Communication
+
+### **HTTP Request Execution**
+
+```mermaid
+sequenceDiagram
+    participant Worker
+    participant HTTPClient
+    participant Provider
+    participant CircuitBreaker
+    participant Metrics
+
+    Worker->>HTTPClient: PrepareRequest()
+    HTTPClient->>HTTPClient: Add Headers & Auth
+    HTTPClient->>CircuitBreaker: CheckHealth()
+    CircuitBreaker->>Provider: HTTP Request
+
+    Provider-->>CircuitBreaker: HTTP Response
+    CircuitBreaker->>Metrics: Record Metrics
+    CircuitBreaker-->>HTTPClient: Response/Error
+    HTTPClient-->>Worker: Parsed Response
+```
+
+**Request Preparation Pipeline:**
+
+```go
+func (w *ProviderWorker) ExecuteRequest(job *ProcessingJob) *ProviderResponse {
+    // Prepare HTTP request
+    httpReq := w.prepareHTTPRequest(job.Request)
+
+    // Add authentication
+    w.addAuthentication(httpReq, job.APIKey)
+
+    // Execute with timeout; WithContext returns a copy bound to ctx
+    ctx, cancel := context.WithTimeout(context.Background(), job.Timeout)
+    defer cancel()
+
+    httpResp, err := w.httpClient.Do(httpReq.WithContext(ctx))
+    if err != nil {
+        return w.handleError(err, job)
+    }
+    // Always close the body so the underlying connection can be reused
+    defer httpResp.Body.Close()
+
+    // Parse response (the body must be read before cancel runs)
+    return w.parseResponse(httpResp, job)
+}
+```
+
+---
+
+## Stage 8: Tool Execution & Response Processing
+
+### **MCP Tool Execution Flow**
+
+```mermaid
+sequenceDiagram
+    participant Provider
+    participant MCPProcessor
+    participant MCPServer
+    participant ToolExecutor
+    participant ResponseBuilder
+
+    Provider->>MCPProcessor: Response with Tool Calls
+    MCPProcessor->>MCPProcessor: Extract Tool Calls
+
+    loop For each tool call
+        MCPProcessor->>MCPServer: Execute Tool
+        MCPServer->>ToolExecutor: Tool Invocation
+        ToolExecutor-->>MCPServer: Tool Result
+        MCPServer-->>MCPProcessor: Tool Response
+    end
+
+    MCPProcessor->>ResponseBuilder: Combine Results
+    ResponseBuilder-->>Provider: Enhanced Response
+```
+
+**Tool Execution Pipeline:**
+
+```go
+func (mcp *MCPProcessor) ProcessToolCalls(
+    response *ProviderResponse,
+) (*ProviderResponse, error) {
+    toolCalls := mcp.extractToolCalls(response)
+    if len(toolCalls) == 0 {
+        return response, nil
+    }
+
+    // Execute tools concurrently; each goroutine writes to its own slot,
+    // so results keep the same order as the tool calls
+    toolResults := make([]ToolResult, len(toolCalls))
+    var wg sync.WaitGroup
+    for i, toolCall := range toolCalls {
+        wg.Add(1)
+        go func(i int, tc ToolCall) {
+            defer wg.Done()
+            toolResults[i] = mcp.executeTool(tc)
+        }(i, toolCall)
+    }
+
+    // Wait for all tools to finish
+    wg.Wait()
+
+    // Enhance response
+    return mcp.enhanceResponse(response, toolResults), nil
+}
+```
+
+---
+
+## Stage 9: Post-Processing & Response Formation
+
+### **Plugin Post-Processing**
+
+```mermaid
+sequenceDiagram
+    participant CoreResponse
+    participant LoggingPlugin
+    participant CachePlugin
+    participant MetricsPlugin
+    participant Transport
+
+    CoreResponse->>LoggingPlugin: ProcessResponse()
+    LoggingPlugin->>LoggingPlugin: Log Request/Response
+    LoggingPlugin->>CachePlugin: Response + Logs
+
+    CachePlugin->>CachePlugin: Cache Response
+    CachePlugin->>MetricsPlugin: Cached Response
+
+    MetricsPlugin->>MetricsPlugin: Record Metrics
+    MetricsPlugin->>Transport: Final Response
+```
+
+**Response Enhancement Pipeline:**
+
+```go
+func (pm *PluginManager) ExecutePostHooks(
+    ctx BifrostContext,
+    req *BifrostRequest,
+    resp *BifrostResponse,
+) (*BifrostResponse, error) {
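+    // Post-hooks run in reverse registration order (last plugin first),
+    // mirroring the pre-hook chain
+    for i := len(pm.plugins) - 1; i >= 0; i-- {
+        plugin := pm.plugins[i]
+        enhancedResp, err := plugin.ProcessResponse(ctx, req, resp)
+        if err != nil {
+            // Log error but continue with the last good response
+            pm.logger.Warn("Plugin post-processing error", "plugin", plugin.Name(), "error", err)
+            continue
+        }
+        resp = enhancedResp
+    }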
+    return resp, nil
+}
+```
+
+### **Response Serialization**
+
+```mermaid
+flowchart TD
+    Response[BifrostResponse] --> Format{Response Format}
+    Format -->|HTTP| JSONSerialize[JSON Serialization]
+    Format -->|SDK| DirectReturn[Direct Go Struct]
+
+    JSONSerialize --> Compress[Compression]
+    DirectReturn --> TypeCheck[Type Validation]
+
+    Compress --> Headers[Set Headers]
+    TypeCheck --> Return[Return Response]
+
+    Headers --> HTTPResponse[HTTP Response]
+    HTTPResponse --> Client[Client Response]
+    Return --> Client
+```
+
+---
+
+## Related Architecture Documentation
+
+- **[Concurrency Model](./concurrency)** - Worker pools and threading details
+- **[Plugin System](./plugins)** - Plugin execution and lifecycle
+- **[MCP System](./mcp)** - Tool discovery and execution internals
+- **[Benchmarks](../../benchmarking/getting-started)** - Detailed performance analysis
--- a/docs/architecture/framework/config-store.mdx
+++ b/docs/architecture/framework/config-store.mdx
@@ -0,0 +1,161 @@
+---
+title: "Config Store"
+description: "A persistent and flexible configuration management system for Bifrost, supporting multiple database backends."
+icon: "gear"
+---
+
+The ConfigStore is a critical component of the Bifrost framework, providing a centralized and persistent storage solution for all gateway configurations. It abstracts the underlying database, offering a unified API for managing everything from provider settings and virtual keys to governance policies and plugin configurations.
+
+## Core Features
+
+- **Unified Configuration API**: A single interface (`ConfigStore`) for all configuration CRUD (Create, Read, Update, Delete) operations.
+- **Multiple Backend Support**: Out-of-the-box support for SQLite and PostgreSQL, with an extensible architecture for adding new database backends.
+- **Comprehensive Data Management**: Manages a wide range of configuration data, including:
+    - Provider and key settings
+    - Virtual keys and governance rules (budgets, rate limits)
+    - Customer and team information for multi-tenancy
+    - Plugin configurations
+    - Vector store and log store settings
+    - Model pricing information
+- **Transactional Operations**: Ensures data consistency by supporting atomic transactions for complex configuration changes.
+- **Database Migrations**: Integrated migration system to manage schema evolution across different versions of Bifrost.
+- **Environment Variable Handling**: Securely manages sensitive data like API keys by storing references to environment variables instead of raw values.
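+
+To illustrate the reference idea, here is a minimal sketch that resolves a stored value at read time. It assumes a hypothetical `env.` prefix marks a reference (for example `env.OPENAI_API_KEY`). The actual syntax and resolution point used by the ConfigStore may differ.
+
+```go
+package main
+
+import (
+    "fmt"
+    "os"
+    "strings"
+)
+
+// envPrefix is a hypothetical marker for values that reference an environment variable.
+const envPrefix = "env."
+
+// resolveValue returns a stored value unchanged unless it is an environment
+// reference. In that case the referenced variable is looked up at call time,
+// so only the reference string (never the secret) needs to be persisted.
+func resolveValue(stored string) (string, error) {
+    if !strings.HasPrefix(stored, envPrefix) {
+        return stored, nil
+    }
+    name := strings.TrimPrefix(stored, envPrefix)
+    value, ok := os.LookupEnv(name)
+    if !ok {
+        return "", fmt.Errorf("environment variable %q is not set", name)
+    }
+    return value, nil
+}
+```
+
+Because only the reference string is stored, rotating a key means changing the environment variable rather than the database row.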
+
+## Architecture
+
+The ConfigStore is designed around the `ConfigStore` interface, which defines all the methods for interacting with the configuration data. The primary implementation is `RDBConfigStore`, which uses [GORM](https://gorm.io/) as an ORM to communicate with relational databases.
+
+### Supported Backends
+
+- **SQLite**: The default, file-based database, perfect for local development, testing, and single-node deployments. It requires no external services.
+- **PostgreSQL**: A robust, production-grade database suitable for large-scale, high-availability deployments.
+
+The backend is selected and configured in Bifrost's main configuration file.
+
+### Initialization
+
+The ConfigStore is initialized at startup based on the provided configuration.
+
+```go
+import (
+    "context"
+
+    "github.com/maximhq/bifrost/framework/configstore"
+    "github.com/maximhq/bifrost/core/schemas"
+)
+
+// Example: Initialize a SQLite-based ConfigStore
+config := &configstore.Config{
+    Enabled: true,
+    Type:    configstore.ConfigStoreTypeSQLite,
+    Config: &configstore.SQLiteConfig{
+        File: "/path/to/config.db",
+    },
+}
+
+var logger schemas.Logger // Assume logger is initialized
+store, err := configstore.NewConfigStore(context.Background(), config, logger)
+if err != nil {
+    // Handle error
+}
+```
+
+Here is an example for initializing a PostgreSQL-based `ConfigStore`:
+```go
+// Example: Initialize a PostgreSQL-based ConfigStore
+pgConfig := &configstore.Config{
+    Enabled: true,
+    Type:    configstore.ConfigStoreTypePostgres,
+    Config: &configstore.PostgresConfig{
+        Host:         "localhost",
+        Port:         "5432",
+        User:         "postgres",
+        Password:     "secret",
+        DBName:       "bifrost",
+        SSLMode:      "disable",
+        MaxIdleConns: 5,  // Optional: Maximum idle connections (default: 5)
+        MaxOpenConns: 50, // Optional: Maximum open connections (default: 50)
+    },
+}
+
+store, err = configstore.NewConfigStore(context.Background(), pgConfig, logger)
+if err != nil {
+    // Handle error
+}
+```
+
+<Note>
+PostgreSQL databases used by Bifrost stores must be UTF8 encoded. See [PostgreSQL UTF8 Requirement](../../quickstart/gateway/setting-up#postgresql-utf8-requirement).
+</Note>
+
+### Connection Pool Configuration
+
+For PostgreSQL backends, you can configure the database connection pool to optimize performance based on your workload:
+
+- **MaxIdleConns**: Maximum number of idle connections in the pool (default: 5)
+- **MaxOpenConns**: Maximum number of open connections to the database (default: 50)
+
+These parameters help manage database connection resources effectively. Increase them for high-traffic deployments or decrease them for resource-constrained environments.
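+
+Here is a rough sketch of how these two settings typically take effect. It assumes they end up on the `*sql.DB` underlying GORM, which GORM exposes via `DB.DB()`. The hypothetical helpers below fill in the documented defaults and apply them with the standard `database/sql` setters:
+
+```go
+package main
+
+import "database/sql"
+
+// Hypothetical defaults mirroring the documented values.
+const (
+    defaultMaxIdleConns = 5
+    defaultMaxOpenConns = 50
+)
+
+// poolLimits fills in the documented defaults when a value is left at zero.
+func poolLimits(maxIdle, maxOpen int) (int, int) {
+    if maxIdle <= 0 {
+        maxIdle = defaultMaxIdleConns
+    }
+    if maxOpen <= 0 {
+        maxOpen = defaultMaxOpenConns
+    }
+    // database/sql lowers the idle limit to the open limit when it is larger; mirror that rule.
+    if maxIdle > maxOpen {
+        maxIdle = maxOpen
+    }
+    return maxIdle, maxOpen
+}
+
+// applyPoolLimits configures a *sql.DB, for example the one returned by gorm's DB.DB().
+func applyPoolLimits(db *sql.DB, maxIdle, maxOpen int) {
+    idle, open := poolLimits(maxIdle, maxOpen)
+    db.SetMaxOpenConns(open)
+    db.SetMaxIdleConns(idle)
+}
+```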
+
+## Data Models
+
+The ConfigStore manages a variety of data models, which are defined as GORM tables in the `framework/configstore/tables` directory. Some of the key models include:
+
+- `TableVirtualKey`: Represents a virtual key with its associated governance rules, keys, and metadata.
+- `TableProvider` & `TableKey`: Store provider-specific configurations and the physical API keys.
+- `TableBudget` & `TableRateLimit`: Define spending limits and request rate limits for governance.
+- `TableCustomer` & `TableTeam`: Enable multi-tenant configurations.
+- `TableModelPricing`: Caches model pricing information for cost calculation.
+- `TablePlugin`: Stores configuration for loaded plugins.
+
+## Usage
+
+The `ConfigStore` interface provides a rich set of methods for managing Bifrost's configuration.
+
+### Managing Virtual Keys
+
+```go
+// Create a new virtual key
+newKey := &tables.TableVirtualKey{
+    ID: "vk-12345",
+    Name: "My Test Key",
+    // ... other fields
+}
+err := store.CreateVirtualKey(ctx, newKey)
+
+// Retrieve a virtual key
+virtualKey, err := store.GetVirtualKey(ctx, "vk-12345")
+```
+
+### Managing Providers
+
+```go
+// Get all provider configurations
+providers, err := store.GetProvidersConfig(ctx)
+
+// Update a specific provider
+providerConfig := providers[schemas.OpenAI]
+providerConfig.NetworkConfig.TimeoutSeconds = 120
+err = store.UpdateProvider(ctx, schemas.OpenAI, providerConfig, envKeys)
+```
+
+### Executing Transactions
+
+For operations that require multiple database writes, you can use a transaction to ensure atomicity.
+
+```go
+err := store.ExecuteTransaction(ctx, func(tx *gorm.DB) error {
+    // Perform multiple operations within this transaction
+    if err := store.CreateBudget(ctx, budget1, tx); err != nil {
+        return err // Rollback
+    }
+    if err := store.UpdateRateLimit(ctx, limit1, tx); err != nil {
+        return err // Rollback
+    }
+    return nil // Commit
+})
+```
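+
+The callback contract is the important part: returning `nil` commits, and returning an error rolls everything back. The sketch below reproduces that contract with a hypothetical in-memory store. `memStore` and `txn` are illustrative, not ConfigStore types. GORM's own `DB.Transaction` follows the same convention, and `ExecuteTransaction` presumably wraps it.
+
+```go
+package main
+
+import "errors"
+
+// memStore is a hypothetical stand-in for a database; it is not the ConfigStore API.
+type memStore struct {
+    budgets map[string]float64
+}
+
+// txn buffers writes until the transaction commits.
+type txn struct {
+    staged map[string]float64
+}
+
+func (t *txn) setBudget(id string, limit float64) error {
+    if limit < 0 {
+        return errors.New("budget limit must be non-negative")
+    }
+    t.staged[id] = limit
+    return nil
+}
+
+// executeTransaction mirrors the callback contract: returning nil commits every
+// staged write, returning an error discards all of them (rollback).
+func (s *memStore) executeTransaction(fn func(tx *txn) error) error {
+    tx := &txn{staged: map[string]float64{}}
+    if err := fn(tx); err != nil {
+        return err // rollback: staged writes are dropped
+    }
+    for id, limit := range tx.staged {
+        s.budgets[id] = limit
+    }
+    return nil
+}
+```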
+
+## Migrations
+
+The ConfigStore includes a migration system to handle database schema changes between Bifrost versions. Migrations are automatically applied at startup, ensuring the database schema is always up-to-date. This process is managed by the `migrator` package and is transparent to the user.
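+
+A migrator of this kind is usually an ordered list of uniquely identified steps plus a record of which steps have already run. The following is a hypothetical sketch of that pattern, not the `migrator` package's API:
+
+```go
+package main
+
+import "fmt"
+
+// migration is a hypothetical schema change identified by a unique, ordered ID.
+type migration struct {
+    ID      string
+    Migrate func() error
+}
+
+// runMigrations applies, in order, every migration whose ID is not yet recorded
+// in applied, and records each one only after it succeeds. Re-running is a no-op.
+func runMigrations(migrations []migration, applied map[string]bool) error {
+    for _, m := range migrations {
+        if applied[m.ID] {
+            continue
+        }
+        if err := m.Migrate(); err != nil {
+            return fmt.Errorf("migration %s failed: %w", m.ID, err)
+        }
+        applied[m.ID] = true
+    }
+    return nil
+}
+```
+
+Recording each ID only after its step succeeds is what makes startup safe to repeat. A failed migration is retried on the next start, and completed ones are skipped.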
+
+The ConfigStore provides the backbone for Bifrost's dynamic configuration capabilities. Its support for multiple backends and transactional operations makes it suitable for both small-scale and large-scale production environments.
--- a/docs/architecture/framework/log-store.mdx
+++ b/docs/architecture/framework/log-store.mdx
@@ -0,0 +1,176 @@
+---
+title: "Log Store"
+description: "A robust and queryable system for persisting API request and response logs, with support for multiple database backends."
+icon: "clipboard-list"
+---
+
+The LogStore is a core component of the Bifrost framework responsible for capturing, storing, and retrieving detailed logs of API requests and responses. It provides a persistent, queryable audit trail of all activity passing through the gateway, which is essential for debugging, monitoring, analytics, and compliance.
+
+## Core Features
+
+- **Persistent Logging**: Automatically saves detailed information about each API request, including input, output, status, latency, and cost.
+- **Multiple Backend Support**: Comes with built-in support for SQLite and PostgreSQL, allowing you to choose the best storage solution for your deployment needs.
+- **Rich Querying and Filtering**: A powerful search API allows you to filter and sort logs based on a wide range of criteria such as provider, model, status, latency, cost, and content.
+- **Performance Analytics**: The search functionality also provides aggregated statistics, including total requests, success rate, average latency, total tokens, and total cost for the queried data.
+- **Structured Data Model**: Logs are stored in a structured format, with complex objects like message history and tool calls serialized as JSON for efficient storage and retrieval.
+- **Automatic Data Management**: Includes GORM hooks to automatically handle JSON serialization/deserialization and to build a searchable content summary.
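+
+The aggregated statistics amount to a single pass over the matching rows. The sketch below uses a hypothetical `logRow` type. It assumes the success rate is a percentage computed over completed (`success` or `error`) requests. The real LogStore computes these figures in the database, and its exact definitions may differ.
+
+```go
+package main
+
+// logRow is a hypothetical, trimmed-down view of a log entry (not the real Log struct).
+type logRow struct {
+    Status      string   // "processing", "success", or "error"
+    Latency     *float64 // nil while the request is still processing
+    TotalTokens int
+    Cost        *float64 // may be nil, e.g. when no cost was recorded
+}
+
+// searchStats mirrors the kinds of aggregates listed above.
+type searchStats struct {
+    TotalRequests  int
+    SuccessRate    float64 // percentage of completed (success or error) requests
+    AverageLatency float64 // mean over rows that have a latency
+    TotalTokens    int
+    TotalCost      float64
+}
+
+// aggregate computes summary statistics over a set of matching log rows.
+func aggregate(rows []logRow) searchStats {
+    var s searchStats
+    var completed, succeeded, withLatency int
+    var latencySum float64
+    for _, r := range rows {
+        s.TotalRequests++
+        s.TotalTokens += r.TotalTokens
+        if r.Cost != nil {
+            s.TotalCost += *r.Cost
+        }
+        switch r.Status {
+        case "success":
+            completed++
+            succeeded++
+        case "error":
+            completed++
+        }
+        if r.Latency != nil {
+            latencySum += *r.Latency
+            withLatency++
+        }
+    }
+    if completed > 0 {
+        s.SuccessRate = float64(succeeded) / float64(completed) * 100
+    }
+    if withLatency > 0 {
+        s.AverageLatency = latencySum / float64(withLatency)
+    }
+    return s
+}
+```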
+
+## Architecture
+
+The LogStore is built around the `LogStore` interface, which defines the standard methods for interacting with the log database. The primary implementation, `RDBLogStore`, uses GORM to provide an abstraction over relational databases.
+
+### Supported Backends
+
+- **SQLite**: The default, file-based database, ideal for local development and smaller, single-node deployments.
+- **PostgreSQL**: A production-ready database for scalable and high-availability deployments.
+
+The backend is configured in Bifrost's main configuration file.
+
+### Initialization
+
+The LogStore is initialized at startup based on the provided configuration.
+
+```go
+import (
+    "context"
+
+    "github.com/maximhq/bifrost/framework/logstore"
+    "github.com/maximhq/bifrost/core/schemas"
+)
+
+// Example: Initialize a SQLite-based LogStore
+config := &logstore.Config{
+    Enabled: true,
+    Type:    logstore.LogStoreTypeSQLite,
+    Config: &logstore.SQLiteConfig{
+        File: "/path/to/logs.db",
+    },
+}
+
+var logger schemas.Logger // Assume logger is initialized
+store, err := logstore.NewLogStore(context.Background(), config, logger)
+if err != nil {
+    // Handle error
+}
+```
+
+Here is an example for initializing a PostgreSQL-based `LogStore`:
+```go
+// Example: Initialize a PostgreSQL-based LogStore
+pgConfig := &logstore.Config{
+    Enabled: true,
+    Type:    logstore.LogStoreTypePostgres,
+    Config: &logstore.PostgresConfig{
+        Host:         "localhost",
+        Port:         "5432",
+        User:         "postgres",
+        Password:     "secret",
+        DBName:       "bifrost_logs",
+        SSLMode:      "disable",
+        MaxIdleConns: 5,  // Optional: Maximum idle connections (default: 5)
+        MaxOpenConns: 50, // Optional: Maximum open connections (default: 50)
+    },
+}
+
+store, err = logstore.NewLogStore(context.Background(), pgConfig, logger)
+if err != nil {
+    // Handle error
+}
+```
+
+<Note>
+PostgreSQL databases used by Bifrost stores must be UTF8 encoded. See [PostgreSQL UTF8 Requirement](../../quickstart/gateway/setting-up#postgresql-utf8-requirement).
+</Note>
+
+### Connection Pool Configuration
+
+For PostgreSQL backends, you can configure the database connection pool to optimize performance based on your workload:
+
+- **MaxIdleConns**: Maximum number of idle connections in the pool (default: 5)
+- **MaxOpenConns**: Maximum number of open connections to the database (default: 50)
+
+These parameters help manage database connection resources effectively. Increase them for high-traffic deployments or decrease them for resource-constrained environments.
+
+## Data Model
+
+The core of the LogStore is the `Log` struct, which represents a single log entry in the `logs` table.
+
+```go
+// Log represents a complete log entry for a request/response cycle
+type Log struct {
+    ID                  string    `gorm:"primaryKey;type:varchar(255)"`
+    Timestamp           time.Time `gorm:"index;not null"`
+    Object              string    `gorm:"type:varchar(255);index;not null;column:object_type"`
+    Provider            string    `gorm:"type:varchar(255);index;not null"`
+    Model               string    `gorm:"type:varchar(255);index;not null"`
+    Latency             *float64
+    Cost                *float64  `gorm:"index"`
+    Status              string    `gorm:"type:varchar(50);index;not null"` // "processing", "success", or "error"
+    Stream              bool      `gorm:"default:false"`
+
+    // Denormalized token fields for easier querying
+    PromptTokens     int `gorm:"default:0"`
+    CompletionTokens int `gorm:"default:0"`
+    TotalTokens      int `gorm:"default:0"`
+
+    // JSON serialized fields
+    InputHistory        string `gorm:"type:text"`
+    OutputMessage       string `gorm:"type:text"`
+    TokenUsage          string `gorm:"type:text"`
+    ErrorDetails        string `gorm:"type:text"`
+    // ... and many more for different data types
+}
+```
+Complex data like message arrays and tool calls are serialized into JSON strings for storage and are automatically deserialized back into their struct forms when retrieved.
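+
+Here is a minimal sketch of that round trip. It uses a hypothetical `logEntry` with one JSON column (`InputHistory`) and its parsed counterpart. In GORM, methods like these would be called from `BeforeCreate`/`BeforeSave` and `AfterFind` hooks. The field and method names here are illustrative.
+
+```go
+package main
+
+import "encoding/json"
+
+// chatMessage is a hypothetical message shape used only for this sketch.
+type chatMessage struct {
+    Role    string `json:"role"`
+    Content string `json:"content"`
+}
+
+// logEntry pairs a persisted text column with its parsed, in-memory form.
+type logEntry struct {
+    InputHistory       string        `gorm:"type:text"` // stored column (JSON text)
+    InputHistoryParsed []chatMessage `gorm:"-"`         // not stored; rebuilt after reads
+}
+
+// serialize is what a BeforeCreate/BeforeSave hook would call.
+func (l *logEntry) serialize() error {
+    b, err := json.Marshal(l.InputHistoryParsed)
+    if err != nil {
+        return err
+    }
+    l.InputHistory = string(b)
+    return nil
+}
+
+// deserialize is what an AfterFind hook would call.
+func (l *logEntry) deserialize() error {
+    if l.InputHistory == "" {
+        return nil
+    }
+    return json.Unmarshal([]byte(l.InputHistory), &l.InputHistoryParsed)
+}
+```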
+
+## Usage
+
+### Creating Log Entries
+
+A log entry is created by populating a `Log` struct and passing it to the `Create` method. This is typically handled internally by Bifrost's logging plugins.
+
+```go
+logEntry := &logstore.Log{
+    ID:        "req-xyz123",
+    Timestamp: time.Now(),
+    Provider:  "openai",
+    Model:     "gpt-4",
+    Status:    "success",
+    // ... other fields
+}
+err := store.Create(ctx, logEntry)
+```
+
+### Searching and Filtering Logs
+
+The `SearchLogs` method provides a powerful way to query logs with fine-grained filters and pagination.
+
+```go
+// Define search criteria
+filters := logstore.SearchFilters{
+    Providers: []string{"openai", "anthropic"},
+    Status:    []string{"error"},
+    StartTime: &startTime, // time.Time pointer
+}
+
+pagination := logstore.PaginationOptions{
+    Limit:  50,
+    Offset: 0,
+    SortBy: "timestamp",
+    Order:  "desc",
+}
+
+// Execute the search
+results, err := store.SearchLogs(ctx, filters, pagination)
+if err != nil {
+    // Handle error
+}
+
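+// Process the results
+for _, entry := range results.Logs {
+    fmt.Printf("Found log: %s\n", entry.ID)
+}
+
+// Access aggregated stats; they cover only the logs matching the filters,
+// so TotalRequests here is the number of matching error logs
+fmt.Printf("Total errors: %d\n", results.Stats.TotalRequests)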
+```
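+
+Offset pagination can be driven in a loop to export every matching log. This sketch uses a hypothetical `fetchPage` function in place of `SearchLogs`. It stops at the first page shorter than the limit:
+
+```go
+package main
+
+// fetchPage is a hypothetical stand-in for one SearchLogs call with the given Limit and Offset.
+type fetchPage func(limit, offset int) ([]string, error)
+
+// collectAll walks offset-based pages until a short (or empty) page signals the end.
+func collectAll(fetch fetchPage, limit int) ([]string, error) {
+    if limit <= 0 {
+        limit = 50 // guard against an endless loop
+    }
+    var all []string
+    for offset := 0; ; offset += limit {
+        page, err := fetch(limit, offset)
+        if err != nil {
+            return nil, err
+        }
+        all = append(all, page...)
+        if len(page) < limit {
+            return all, nil
+        }
+    }
+}
+```
+
+With `SortBy: "timestamp"` and `Order: "desc"`, logs written while you page shift later offsets, so some entries may appear twice. If your filters support an upper time bound, setting it before the loop starts keeps the result set stable.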
+
+The LogStore is an indispensable tool for observability in Bifrost, providing the detailed audit trail needed to monitor, debug, and analyze AI application performance and behavior effectively.
--- a/docs/architecture/framework/model-catalog.mdx
+++ b/docs/architecture/framework/model-catalog.mdx
@@ -0,0 +1,412 @@
+---
+title: "Model Catalog"
+description: "A centralized system for managing model information, pricing, and capabilities across all supported AI providers."
+icon: "book-open"
+---
+
+The Model Catalog is a foundational component of Bifrost that provides a unified interface for managing AI models, including their pricing, capabilities, and availability. It serves as a centralized repository for all model-related information, enabling dynamic cost calculation, intelligent model routing, and efficient resource management.
+
+<Info>
+**Related Documentation**: The Model Catalog powers Bifrost's intelligent routing system. See [Provider Routing](/providers/provider-routing) for detailed examples of how governance and load balancing use the catalog to make routing decisions, including cross-provider scenarios and weighted routing via proxy providers.
+</Info>
+
+## Core Features
+
+### **1. Automatic Pricing Synchronization**
+The Model Catalog manages pricing data through a two-phase approach:
+
+**Startup Behavior:**
+- **With ConfigStore**: Downloads a pricing sheet from Maxim's datasheet, persists it to the config store, and then loads it into memory for fast lookups.
+- **Without ConfigStore**: Downloads the pricing sheet directly into memory on every startup.
+
+**Ongoing Synchronization:**
+- When ConfigStore is available, an automatic sync occurs every 24 hours to keep pricing data current.
+- All pricing data is cached in memory for O(1) lookup performance during cost calculations.
+
+This keeps cost calculations in line with current provider pricing (refreshed on each sync) while keeping lookups fast.
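+
+The sync-and-cache pattern can be sketched with a ticker, a done channel, and an `RWMutex`-guarded map. These are the same ingredients as the `syncTicker`, `done`, `wg`, and `mu` fields of the `ModelCatalog` struct. Everything in this sketch is a hypothetical illustration, not the catalog's implementation:
+
+```go
+package main
+
+import (
+    "sync"
+    "time"
+)
+
+// pricingCache is a hypothetical RWMutex-guarded map refreshed by a background ticker.
+type pricingCache struct {
+    mu    sync.RWMutex
+    data  map[string]float64
+    fetch func() (map[string]float64, error) // e.g. download and parse the pricing sheet
+    done  chan struct{}
+    wg    sync.WaitGroup
+}
+
+func newPricingCache(fetch func() (map[string]float64, error)) *pricingCache {
+    return &pricingCache{data: map[string]float64{}, fetch: fetch, done: make(chan struct{})}
+}
+
+// refresh downloads fresh pricing and swaps the whole map under the write lock.
+// On failure the previous data is kept.
+func (c *pricingCache) refresh() error {
+    fresh, err := c.fetch()
+    if err != nil {
+        return err
+    }
+    c.mu.Lock()
+    c.data = fresh
+    c.mu.Unlock()
+    return nil
+}
+
+// get is the O(1) hot-path lookup; readers only take the read lock.
+func (c *pricingCache) get(key string) (float64, bool) {
+    c.mu.RLock()
+    defer c.mu.RUnlock()
+    v, ok := c.data[key]
+    return v, ok
+}
+
+// start refreshes every interval in the background until stop is called.
+func (c *pricingCache) start(interval time.Duration) {
+    ticker := time.NewTicker(interval)
+    c.wg.Add(1)
+    go func() {
+        defer c.wg.Done()
+        defer ticker.Stop()
+        for {
+            select {
+            case <-ticker.C:
+                _ = c.refresh() // a real implementation would log the error
+            case <-c.done:
+                return
+            }
+        }
+    }()
+}
+
+// stop signals the background goroutine and waits for it to exit.
+func (c *pricingCache) stop() {
+    close(c.done)
+    c.wg.Wait()
+}
+```
+
+Swapping in a freshly built map under the write lock keeps the hot path cheap. Lookups take only the read lock, and a failed download leaves the previous pricing in place.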
+
+### **2. Multi-Modal Cost Calculation**
+The catalog supports diverse pricing models across different AI operation types:
+- **Text Operations**: Token-based pricing for chat completions, text completions, responses, and embeddings. Cache-read/cache-write pricing applies to chat/text/responses when providers surface prompt cache token details.
+- **Audio Processing**: Character-based, token-based, and duration-based pricing for speech synthesis and transcription, with audio token detail breakdown. Speech responses populate `usage.input_chars` so speech can be billed by input characters in addition to tokens/duration.
+- **Image Processing**: Per-image (`input_cost_per_image`/`output_cost_per_image`), per-pixel (`input_cost_per_pixel`/`output_cost_per_pixel`), or token-based pricing with text/image token breakdown.
+- **Video Processing**: Token-based or duration-based pricing. Input can use prompt tokens or `input_cost_per_video_per_second`; output can use completion tokens or fall back to `output_cost_per_video_per_second` / `output_cost_per_second`.
+- **Reranking**: Input/output token pricing with search query cost support.
+- **Prompt Caching**: Separate rates for cache-read tokens (`cached_read_tokens`) and cache-creation tokens (`cached_write_tokens`), both surfaced under `prompt_tokens_details` (see [Prompt Cache Cost Calculation](#prompt-cache-cost-calculation)).
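+
+For the text case, here is a minimal sketch of cache-aware token pricing. It assumes cache-read and cache-write tokens are counted inside the prompt tokens and billed at their own rates, falling back to the input rate when a cache rate is missing. Providers report cached tokens differently, so treat this as an illustration rather than Bifrost's formula.
+
+```go
+package main
+
+// textPricing is a hypothetical subset of a pricing entry (USD per token).
+type textPricing struct {
+    InputPerToken      float64
+    OutputPerToken     float64
+    CacheReadPerToken  *float64 // nil => billed at the input rate
+    CacheWritePerToken *float64 // nil => billed at the input rate
+}
+
+// textUsage is a hypothetical usage record.
+type textUsage struct {
+    PromptTokens      int
+    CompletionTokens  int
+    CachedReadTokens  int // assumed to be a subset of PromptTokens
+    CachedWriteTokens int // assumed to be a subset of PromptTokens
+}
+
+func rateOr(p *float64, fallback float64) float64 {
+    if p != nil {
+        return *p
+    }
+    return fallback
+}
+
+// textCost bills uncached prompt tokens, cached reads, cached writes, and output separately.
+func textCost(p textPricing, u textUsage) float64 {
+    uncached := u.PromptTokens - u.CachedReadTokens - u.CachedWriteTokens
+    if uncached < 0 {
+        uncached = 0
+    }
+    return float64(uncached)*p.InputPerToken +
+        float64(u.CachedReadTokens)*rateOr(p.CacheReadPerToken, p.InputPerToken) +
+        float64(u.CachedWriteTokens)*rateOr(p.CacheWritePerToken, p.InputPerToken) +
+        float64(u.CompletionTokens)*p.OutputPerToken
+}
+```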
+
+### **3. Model Information Management**
+The Model Catalog maintains a pool of available models for each provider, populated from both pricing data and provider list models APIs. This enables:
+- **Model Discovery**: Listing all available models for a given provider
+- **Provider Discovery**: Finding all providers that support a specific model with intelligent cross-provider resolution (OpenRouter, Vertex, Groq, Bedrock)
+- **Model Validation**: Checking if a model is allowed for a provider based on allowed models lists (supports provider-prefixed entries)
+
+### **4. Intelligent Cache Cost Handling**
+The catalog integrates with semantic caching to provide accurate cost calculations:
+- **Cache Hits**: Zero cost for direct cache hits, and embedding cost only for semantic matches.
+- **Cache Misses**: Combined cost of the base model usage plus the embedding generation cost for cache storage.
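+
+The three cases reduce to a small decision, sketched here with hypothetical names (`cacheOutcome`, `costWithCache`):
+
+```go
+package main
+
+// cacheOutcome is a hypothetical classification of a semantic-cache lookup.
+type cacheOutcome int
+
+const (
+    cacheMiss cacheOutcome = iota
+    directHit
+    semanticHit
+)
+
+// costWithCache applies the hit/miss rules listed above.
+func costWithCache(outcome cacheOutcome, baseCost, embeddingCost float64) float64 {
+    switch outcome {
+    case directHit:
+        return 0 // exact hit: no provider call and no embedding
+    case semanticHit:
+        return embeddingCost // only the embedding used for the similarity search
+    default:
+        return baseCost + embeddingCost // provider call plus the embedding stored for future hits
+    }
+}
+```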
+
+### **5. Tiered Pricing Support**
+The system automatically applies different pricing rates for high-token contexts, reflecting real provider pricing models. Two tiers are supported: above 128k tokens and above 200k tokens, with the higher tier taking precedence when both are configured.
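+
+Here is a sketch of the tier selection. It assumes the tier is chosen by the request's prompt token count. It also assumes an unconfigured tier (a nil rate, like the `*Above200kTokens` pointer fields in the pricing entry) falls back to the next lower tier:
+
+```go
+package main
+
+// tieredRate is a hypothetical rate table; nil tiers mean "not configured".
+type tieredRate struct {
+    Base      float64
+    Above128k *float64
+    Above200k *float64
+}
+
+// inputRate picks the per-token rate for a request with promptTokens input tokens.
+// The 200k tier wins when both thresholds are exceeded and both tiers are configured.
+func inputRate(r tieredRate, promptTokens int) float64 {
+    if promptTokens > 200_000 && r.Above200k != nil {
+        return *r.Above200k
+    }
+    if promptTokens > 128_000 && r.Above128k != nil {
+        return *r.Above128k
+    }
+    return r.Base
+}
+```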
+
+## Configuration
+
+The `ModelCatalog` can be configured during initialization by passing a `Config` struct.
+
+```go
+type Config struct {
+	PricingURL          *string        `json:"pricing_url,omitempty"`
+	PricingSyncInterval *time.Duration `json:"pricing_sync_interval,omitempty"`
+}
+```
+
+- **`PricingURL`**: Overrides the default URL (`https://getbifrost.ai/datasheet`) for downloading the pricing sheet.
+- **`PricingSyncInterval`**: Customizes the interval for periodic pricing data synchronization. The default is 24 hours.
+
+This configuration is passed during the initialization of the `ModelCatalog`:
+
+```go
+pricingURL := "https://my-custom-url.com/pricing.json"
+config := &modelcatalog.Config{
+    PricingURL: &pricingURL, // PricingURL is a *string
+}
+modelCatalog, err := modelcatalog.Init(context.Background(), config, configStore, logger)
+```
+
+## Architecture
+
+### ModelCatalog
+The `ModelCatalog` is the central component that handles all model and pricing operations:
+
+```go
+type ModelCatalog struct {
+    configStore configstore.ConfigStore
+    logger      schemas.Logger
+
+    pricingURL          string
+    pricingSyncInterval time.Duration
+
+    // In-memory cache for fast access
+    pricingData map[string]configstoreTables.TableModelPricing
+    mu          sync.RWMutex
+
+    modelPool map[schemas.ModelProvider][]string
+
+    // Background sync worker
+    syncTicker *time.Ticker
+    done       chan struct{}
+    wg         sync.WaitGroup
+    syncCtx    context.Context
+    syncCancel context.CancelFunc
+}
+```
+
+### Pricing Data Structure
+Each model's pricing information includes comprehensive cost metrics, supporting various modalities and tiered pricing:
+
+```go
+// PricingEntry represents a single model's pricing information.
+// The fields below are an excerpt — see framework/modelcatalog/main.go for the full definition.
+type PricingEntry struct {
+    BaseModel string `json:"base_model,omitempty"`
+    Provider  string `json:"provider"`
+    Mode      string `json:"mode"`
+
+    // Costs - Text
+    InputCostPerToken                 float64  `json:"input_cost_per_token"`
+    OutputCostPerToken                float64  `json:"output_cost_per_token"`
+    InputCostPerTokenBatches          *float64 `json:"input_cost_per_token_batches,omitempty"`
+    OutputCostPerTokenBatches         *float64 `json:"output_cost_per_token_batches,omitempty"`
+    InputCostPerTokenPriority         *float64 `json:"input_cost_per_token_priority,omitempty"`
+    OutputCostPerTokenPriority        *float64 `json:"output_cost_per_token_priority,omitempty"`
+    InputCostPerTokenAbove200kTokens  *float64 `json:"input_cost_per_token_above_200k_tokens,omitempty"`
+    OutputCostPerTokenAbove200kTokens *float64 `json:"output_cost_per_token_above_200k_tokens,omitempty"`
+
+    // Costs - Cache
+    CacheCreationInputTokenCost                        *float64 `json:"cache_creation_input_token_cost,omitempty"`
+    CacheReadInputTokenCost                            *float64 `json:"cache_read_input_token_cost,omitempty"`
+    CacheCreationInputTokenCostAbove200kTokens         *float64 `json:"cache_creation_input_token_cost_above_200k_tokens,omitempty"`
+    CacheReadInputTokenCostAbove200kTokens             *float64 `json:"cache_read_input_token_cost_above_200k_tokens,omitempty"`
+    CacheCreationInputTokenCostAbove1hr                *float64 `json:"cache_creation_input_token_cost_above_1hr,omitempty"`
+    CacheCreationInputTokenCostAbove1hrAbove200kTokens *float64 `json:"cache_creation_input_token_cost_above_1hr_above_200k_tokens,omitempty"`
+    CacheCreationInputAudioTokenCost                   *float64 `json:"cache_creation_input_audio_token_cost,omitempty"`
+    CacheReadInputTokenCostPriority                    *float64 `json:"cache_read_input_token_cost_priority,omitempty"`
+
+    // Costs - Image
+    InputCostPerImage                             *float64 `json:"input_cost_per_image,omitempty"`
+    InputCostPerPixel                             *float64 `json:"input_cost_per_pixel,omitempty"`
+    OutputCostPerImage                            *float64 `json:"output_cost_per_image,omitempty"`
+    OutputCostPerPixel                            *float64 `json:"output_cost_per_pixel,omitempty"`
+    OutputCostPerImagePremiumImage                *float64 `json:"output_cost_per_image_premium_image,omitempty"`
+    OutputCostPerImageAbove512x512Pixels          *float64 `json:"output_cost_per_image_above_512_and_512_pixels,omitempty"`
+    OutputCostPerImageAbove512x512PixelsPremium   *float64 `json:"output_cost_per_image_above_512_and_512_pixels_and_premium_image,omitempty"`
+    OutputCostPerImageAbove1024x1024Pixels        *float64 `json:"output_cost_per_image_above_1024_and_1024_pixels,omitempty"`
+    OutputCostPerImageAbove1024x1024PixelsPremium *float64 `json:"output_cost_per_image_above_1024_and_1024_pixels_and_premium_image,omitempty"`
+    OutputCostPerImageAbove2048x2048Pixels        *float64 `json:"output_cost_per_image_above_2048_and_2048_pixels,omitempty"`
+    OutputCostPerImageAbove4096x4096Pixels        *float64 `json:"output_cost_per_image_above_4096_and_4096_pixels,omitempty"`
+    OutputCostPerImageLowQuality                  *float64 `json:"output_cost_per_image_low_quality,omitempty"`
+    OutputCostPerImageMediumQuality               *float64 `json:"output_cost_per_image_medium_quality,omitempty"`
+    OutputCostPerImageHighQuality                 *float64 `json:"output_cost_per_image_high_quality,omitempty"`
+    OutputCostPerImageAutoQuality                 *float64 `json:"output_cost_per_image_auto_quality,omitempty"`
+
+    // Costs - Audio/Video
+    InputCostPerAudioToken      *float64 `json:"input_cost_per_audio_token,omitempty"`
+    InputCostPerAudioPerSecond  *float64 `json:"input_cost_per_audio_per_second,omitempty"`
+    InputCostPerSecond          *float64 `json:"input_cost_per_second,omitempty"`
+    InputCostPerVideoPerSecond  *float64 `json:"input_cost_per_video_per_second,omitempty"`
+    OutputCostPerAudioToken     *float64 `json:"output_cost_per_audio_token,omitempty"`
+    OutputCostPerVideoPerSecond *float64 `json:"output_cost_per_video_per_second,omitempty"`
+    OutputCostPerSecond         *float64 `json:"output_cost_per_second,omitempty"`
+
+    // Costs - Other
+    SearchContextCostPerQuery     *float64 `json:"search_context_cost_per_query,omitempty"`
+    CodeInterpreterCostPerSession *float64 `json:"code_interpreter_cost_per_session,omitempty"`
+}
+```
+
+## Usage in Plugins
+
+The Model Catalog is designed to be shared across all Bifrost plugins, providing consistent model information and validation logic for governance, load balancing, and other routing mechanisms.
+
+<Note>
+**Governance & Load Balancing**: Both plugins delegate model validation to the Model Catalog's `IsModelAllowedForProvider` method, ensuring consistent handling of cross-provider scenarios and provider-prefixed allowed models. See [Provider Routing](/providers/provider-routing) for configuration examples.
+</Note>
+
+### Initialization
+In Bifrost's gateway, the `ModelCatalog` is initialized once at the start and shared across all plugins:
+
+```go
+import "github.com/maximhq/bifrost/framework/modelcatalog"
+
+// Initialize model catalog with config store and logger
+modelCatalog, err := modelcatalog.Init(context.Background(), &modelcatalog.Config{}, configStore, logger)
+if err != nil {
+    return fmt.Errorf("failed to initialize model catalog: %w", err)
+}
+```
+
+### Basic Cost Calculation
+Calculate costs from a Bifrost response:
+
+```go
+// Calculate cost for a completed request
+cost := modelCatalog.CalculateCost(
+    result, // *schemas.BifrostResponse
+    nil,    // *PricingLookupScopes (nil = no scoped overrides)
+)
+
+logger.Info("Request cost: $%.6f", cost)
+```
+
+### Unified Cost Calculation
+`CalculateCost` is the single entry point for all cost calculations. It handles all request types, semantic cache billing, and tiered pricing automatically:
+
+```go
+// CalculateCost handles all cost scenarios including cache-aware pricing
+cost := modelCatalog.CalculateCost(result, nil) // *schemas.BifrostResponse, *PricingLookupScopes
+
+// Cache hits return 0 for direct hits, embedding cost for semantic matches
+// Cache misses return base model cost + embedding generation cost
+// Returns 0.0 if pricing data is not found (logs a debug message)
+```
+
+### Model Discovery
+The `ModelCatalog` provides several methods to query for model and provider information.
+
+#### Get Models for a Provider
+Retrieve a list of all models supported by a specific provider.
+```go
+openaiModels := modelCatalog.GetModelsForProvider(schemas.OpenAI)
+for _, model := range openaiModels {
+    logger.Info("Found OpenAI model: %s", model)
+}
+```
+
+**Thread-safe**: Uses read lock for concurrent access.
+
+#### Get Providers for a Model
+Find all providers that offer a specific model, including cross-provider resolution.
+
+```go
+gpt4Providers := modelCatalog.GetProvidersForModel("gpt-4o")
+for _, provider := range gpt4Providers {
+    logger.Info("gpt-4o is available from: %s", provider)
+}
+// Example result (depends on the loaded catalog): [openai, azure, groq]
+```
+
+**Cross-Provider Resolution**:
+
+This method implements intelligent cross-provider routing logic to discover all providers that can serve a model:
+
+1. **Direct Match**: Checks each provider's model list in `modelPool` for the exact model name
+2. **OpenRouter Format**: For models found in other providers, checks if `provider/model` exists in OpenRouter
+   - Example: `claude-3-5-sonnet` found in Anthropic → checks OpenRouter for `anthropic/claude-3-5-sonnet`
+3. **Vertex Format**: Similar check for Vertex with `provider/model` format
+4. **Groq OpenAI Compatibility**: For GPT models, checks if `openai/model` exists in Groq's catalog
+5. **Bedrock Claude Models**: For Claude models, flexible matching against Bedrock's full ARN format
+
+**Example**:
+```go
+providers := modelCatalog.GetProvidersForModel("claude-3-5-sonnet")
+// Example result: [anthropic, vertex, bedrock, openrouter]
+// even though the request used the bare name "claude-3-5-sonnet" with no provider prefix
+```
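+
+The resolution steps can be approximated with a plain map from provider to model IDs. This is a simplified, hypothetical sketch of steps 1 to 4. Bedrock ARN matching is omitted, and provider names are plain strings rather than `schemas.ModelProvider`.
+
+```go
+package main
+
+import (
+    "sort"
+    "strings"
+)
+
+// providersForModel is a simplified, hypothetical take on resolution steps 1-4.
+// pool maps a provider name to the model IDs listed for it.
+func providersForModel(pool map[string][]string, model string) []string {
+    has := func(provider, id string) bool {
+        for _, m := range pool[provider] {
+            if m == id {
+                return true
+            }
+        }
+        return false
+    }
+
+    // Step 1: providers that list the exact model name.
+    var direct []string
+    for provider := range pool {
+        if has(provider, model) {
+            direct = append(direct, provider)
+        }
+    }
+
+    found := make(map[string]bool)
+    for _, provider := range direct {
+        found[provider] = true
+        // Steps 2-3: proxy providers list the model as "<provider>/<model>".
+        for _, proxy := range []string{"openrouter", "vertex"} {
+            if has(proxy, provider+"/"+model) {
+                found[proxy] = true
+            }
+        }
+    }
+
+    // Step 4: Groq exposes some OpenAI models as "openai/<model>".
+    if strings.HasPrefix(model, "gpt-") && has("groq", "openai/"+model) {
+        found["groq"] = true
+    }
+
+    out := make([]string, 0, len(found))
+    for provider := range found {
+        out = append(out, provider)
+    }
+    sort.Strings(out) // map iteration order is random; sort for stable output
+    return out
+}
+```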
+
+<Note>
+This cross-provider logic powers Bifrost's intelligent routing capabilities. See [Provider Routing](/providers/provider-routing#the-model-catalog) for detailed examples of how this enables features like weighted routing via proxy providers.
+</Note>
+
+#### Check Model Allowance for Provider
+Validate if a model is allowed for a specific provider based on an allowed models list. This method is used internally by governance and load balancing plugins.
+
+```go
+// ["*"] wildcard - uses catalog to determine support
+isAllowed := modelCatalog.IsModelAllowedForProvider(
+    schemas.OpenRouter,
+    "gpt-4o",
+    schemas.WhiteList{"*"}, // wildcard = check catalog
+)
+// Returns: true (catalog knows OpenRouter supports openai/gpt-4o)
+
+// Explicit allowedModels with provider prefix
+isAllowed = modelCatalog.IsModelAllowedForProvider(
+    schemas.OpenRouter,
+    "gpt-4o",
+    schemas.WhiteList{"openai/gpt-4o", "anthropic/claude-3-5-sonnet"},
+)
+// Returns: true (strips "openai/" prefix and matches "gpt-4o")
+
+// Explicit allowedModels without prefix
+isAllowed = modelCatalog.IsModelAllowedForProvider(
+    schemas.OpenAI,
+    "gpt-4o",
+    schemas.WhiteList{"gpt-4o", "gpt-4o-mini"},
+)
+// Returns: true (direct match)
+```
+
+**Behavior**:
+- **`["*"]` wildcard**: Delegates to `GetProvidersForModel` (includes cross-provider logic); this is the "allow all via catalog" mode
+- **Non-empty explicit list**: Checks for both direct matches and provider-prefixed entries
+  - Direct: `"gpt-4o"` matches `"gpt-4o"`
+  - Prefixed: `"openai/gpt-4o"` matches a request for `"gpt-4o"` (prefix stripped)
+- **Empty slice (`[]string{}` / empty `schemas.WhiteList`)**: Returns `false` (deny-all), mirroring the config deny-by-default semantics
+
+<Note>
+In `config.json` and the governance API, `allowed_models: []` (empty array) means **deny all models** (deny-by-default, v1.5.0+). The Go helper `IsModelAllowedForProvider` behaves the same way: an empty `allowedModels` slice also returns `false`. Use `["*"]` to allow all models validated through the catalog.
+</Note>
+
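+The three behaviors can be condensed into one function. In this hypothetical sketch, `catalogSupports` stands in for the `GetProvidersForModel` check. The prefix rule drops everything up to the first `/`, which is an assumption about how prefixes are stripped.
+
+```go
+package main
+
+import "strings"
+
+// isModelAllowed is a hypothetical sketch of the documented allowance behavior.
+func isModelAllowed(model string, allowed []string, catalogSupports func(model string) bool) bool {
+    if len(allowed) == 0 {
+        return false // empty list: deny all
+    }
+    if len(allowed) == 1 && allowed[0] == "*" {
+        return catalogSupports(model) // wildcard: defer to the catalog
+    }
+    for _, entry := range allowed {
+        if entry == model {
+            return true // direct match
+        }
+        // Provider-prefixed entry such as "openai/gpt-4o": compare the part after the first "/".
+        if _, rest, ok := strings.Cut(entry, "/"); ok && rest == model {
+            return true
+        }
+    }
+    return false
+}
+```
+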
+**Use Cases**:
+- **Governance Routing**: Validate if a model request is allowed for a provider configuration
+- **Load Balancing**: Filter providers based on allowed models before performance scoring
+- **Virtual Key Validation**: Check if a model can be used with a specific virtual key's provider configs
+
+<Tip>
+This method is the central validation point for both governance and load balancing plugins, ensuring consistent model allowance logic across all routing mechanisms. It handles all edge cases including proxy providers (OpenRouter, Vertex) and provider-prefixed model entries.
+</Tip>
+
+#### Dynamically Add Models
+You can dynamically add models to the catalog's pool from a `v1/models` compatible response structure. This is useful for providers that expose a model list endpoint.
+```go
+// response is *schemas.BifrostListModelsResponse
+modelCatalog.AddModelDataToPool(response)
+```
+The Bifrost gateway does this automatically at initialization for every supported provider.
+
+**When to use**:
+- After fetching models from a provider's `/v1/models` endpoint
+- When a new provider is dynamically added at runtime
+- For testing with custom model lists
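+
+Merging a listing into the pool is essentially a de-duplicating append per provider. Here is a minimal sketch using a hypothetical `modelPool` type keyed by provider name; the real pool is keyed by `schemas.ModelProvider`.
+
+```go
+package main
+
+// modelPool is a hypothetical stand-in for the catalog's per-provider model pool.
+type modelPool map[string][]string
+
+// addModels appends listed model IDs for a provider, skipping empty and duplicate IDs
+// while preserving the order in which models were first seen.
+func (p modelPool) addModels(provider string, ids []string) {
+    seen := make(map[string]bool, len(p[provider]))
+    for _, id := range p[provider] {
+        seen[id] = true
+    }
+    for _, id := range ids {
+        if id == "" || seen[id] {
+            continue
+        }
+        seen[id] = true
+        p[provider] = append(p[provider], id)
+    }
+}
+```
+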
+
+### Reloading Configuration
+You can reload the pricing configuration at runtime if you need to change the pricing URL or sync interval.
+```go
+syncInterval := 12 * time.Hour
+newConfig := &modelcatalog.Config{
+    PricingSyncInterval: &syncInterval, // PricingSyncInterval is a *time.Duration
+}
+err := modelCatalog.UpdateSyncConfig(ctx, newConfig)
+```
+
+## Error Handling and Fallbacks
+
+The Model Catalog handles missing pricing data gracefully with intelligent fallbacks:
+
+```go
+// resolvePricing resolves the pricing entry for a model, trying deployment as fallback.
+func (mc *ModelCatalog) resolvePricing(provider, model, deployment string, requestType schemas.RequestType) *configstoreTables.TableModelPricing {
+	pricing, exists := mc.getPricing(model, provider, requestType)
+	if exists {
+		return pricing
+	}
+	// If pricing not found for model, try the deployment name
+	if deployment != "" {
+		pricing, exists = mc.getPricing(deployment, provider, requestType)
+		if exists {
+			return pricing
+		}
+	}
+	return nil
+}
+
+// getPricing returns pricing information for a model (thread-safe).
+// It implements a multi-step fallback chain:
+//   1. Direct lookup by model + provider + mode
+//   2. Gemini → Vertex provider fallback
+//   3. Vertex "provider/model" prefix stripping
+//   4. Bedrock "anthropic." prefix addition for Claude models
+//   5. Responses → Chat mode fallback (at each step)
+//   6. ImageEdit / ImageVariation → ImageGeneration mode fallback
+func (mc *ModelCatalog) getPricing(model, provider string, requestType schemas.RequestType) (*configstoreTables.TableModelPricing, bool) {
+	mc.mu.RLock()
+	defer mc.mu.RUnlock()
+
+	mode := normalizeRequestType(requestType)
+
+	pricing, ok := mc.pricingData[makeKey(model, provider, mode)]
+	if ok {
+		return &pricing, true
+	}
+
+	// Provider-specific fallbacks (Gemini→Vertex, Vertex prefix strip, Bedrock anthropic. prefix)
+	// Each fallback also tries Responses→Chat mode if applicable
+	// ...
+
+	// Final fallback: Responses → Chat mode for any provider
+	if requestType == schemas.ResponsesRequest || requestType == schemas.ResponsesStreamRequest {
+		pricing, ok = mc.pricingData[makeKey(model, provider, normalizeRequestType(schemas.ChatCompletionRequest))]
+		if ok {
+			return &pricing, true
+		}
+	}
+
+	return nil, false
+}
+
+// When pricing is not found, CalculateCost returns 0.0 and logs a debug message.
+// This ensures operations continue smoothly without billing failures.
+```
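+
+To make the fallback order concrete, here is a self-contained toy version of the resolution logic. It is a sketch, not Bifrost's code: the `catalog` type, the string modes (`"chat"`, `"responses"`), the `model|provider|mode` key format, and the per-token prices are all hypothetical. Only three behaviors are modeled: the deployment-name fallback, the Responses → Chat mode fallback, and a zero cost when nothing matches.
+
+```go
+package main
+
+import (
+	"fmt"
+	"sync"
+)
+
+// Pricing is a hypothetical, trimmed-down pricing row (illustrative per-token USD costs).
+type Pricing struct {
+	InputCostPerToken  float64
+	OutputCostPerToken float64
+}
+
+// catalog is a toy stand-in for ModelCatalog, keyed by "model|provider|mode".
+type catalog struct {
+	mu   sync.RWMutex
+	data map[string]Pricing
+}
+
+func key(model, provider, mode string) string {
+	return model + "|" + provider + "|" + mode
+}
+
+// lookup tries the exact mode first, then falls back from "responses" to "chat".
+func (c *catalog) lookup(model, provider, mode string) (Pricing, bool) {
+	c.mu.RLock()
+	defer c.mu.RUnlock()
+	if p, ok := c.data[key(model, provider, mode)]; ok {
+		return p, true
+	}
+	if mode == "responses" {
+		if p, ok := c.data[key(model, provider, "chat")]; ok {
+			return p, true
+		}
+	}
+	return Pricing{}, false
+}
+
+// resolve tries the model name first and the deployment name second.
+func (c *catalog) resolve(provider, model, deployment, mode string) (Pricing, bool) {
+	if p, ok := c.lookup(model, provider, mode); ok {
+		return p, true
+	}
+	if deployment != "" {
+		return c.lookup(deployment, provider, mode)
+	}
+	return Pricing{}, false
+}
+
+// cost returns 0 when no pricing is found instead of failing the request.
+func (c *catalog) cost(provider, model, deployment, mode string, inTokens, outTokens int) float64 {
+	p, ok := c.resolve(provider, model, deployment, mode)
+	if !ok {
+		return 0
+	}
+	return float64(inTokens)*p.InputCostPerToken + float64(outTokens)*p.OutputCostPerToken
+}
+
+func main() {
+	c := &catalog{data: map[string]Pricing{
+		key("gpt-4o", "openai", "chat"): {InputCostPerToken: 2e-6, OutputCostPerToken: 8e-6},
+		key("gpt-4o", "azure", "chat"):  {InputCostPerToken: 2e-6, OutputCostPerToken: 8e-6},
+	}}
+	// Responses request priced via the chat entry.
+	fmt.Println(c.cost("openai", "gpt-4o", "", "responses", 1000, 500))
+	// Unknown model alias priced via its deployment name.
+	fmt.Println(c.cost("azure", "my-alias", "gpt-4o", "chat", 1000, 500))
+	// No pricing at all: cost is 0.
+	fmt.Println(c.cost("openai", "unknown-model", "", "chat", 1000, 500))
+}
+```
+
+Returning zero rather than an error mirrors the behavior described in the comment above: a missing price never blocks the request, which is why callers should treat 0.0 as "unknown" rather than "free".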
+
+
+## Cleanup and Lifecycle Management
+
+Properly clean up resources when shutting down:
+
+```go
+// Cleanup model catalog resources
+defer func() {
+    if err := modelCatalog.Cleanup(); err != nil {
+        logger.Error("Failed to cleanup model catalog: %v", err)
+    }
+}()
+```
+
+## Thread Safety
+
+All `ModelCatalog` operations are thread-safe, so a single instance can be shared across multiple plugins and goroutines. The internal pricing data cache is guarded by a read-write mutex: lookups take the shared read lock and can run concurrently, while updates take the exclusive write lock.
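+
+A minimal sketch of this read-write locking pattern, assuming a plain string-keyed map. The names `priceCache`, `Get`, and `Replace` are hypothetical and are not part of the `ModelCatalog` API. Readers share `RLock`, while a sync swaps in a new map under the exclusive lock.
+
+```go
+package main
+
+import (
+	"fmt"
+	"sync"
+)
+
+// priceCache is a hypothetical RWMutex-guarded map: many concurrent readers
+// share the read lock; a periodic sync takes the write lock to swap the data.
+type priceCache struct {
+	mu   sync.RWMutex
+	data map[string]float64
+}
+
+// Get is safe to call from any number of goroutines at once.
+func (c *priceCache) Get(k string) (float64, bool) {
+	c.mu.RLock()
+	defer c.mu.RUnlock()
+	v, ok := c.data[k]
+	return v, ok
+}
+
+// Replace swaps in a freshly synced map in one critical section, so readers
+// see either the old or the new snapshot, never a half-updated one.
+func (c *priceCache) Replace(next map[string]float64) {
+	c.mu.Lock()
+	c.data = next
+	c.mu.Unlock()
+}
+
+func main() {
+	c := &priceCache{data: map[string]float64{"gpt-4o|openai|chat": 1.0}}
+	var wg sync.WaitGroup
+	for i := 0; i < 8; i++ {
+		wg.Add(1)
+		go func() {
+			defer wg.Done()
+			for j := 0; j < 1000; j++ {
+				c.Get("gpt-4o|openai|chat")
+			}
+		}()
+	}
+	c.Replace(map[string]float64{"gpt-4o|openai|chat": 2.0})
+	wg.Wait()
+	v, _ := c.Get("gpt-4o|openai|chat")
+	fmt.Println(v) // 2
+}
+```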
+
+## Best Practices
+
+1. **Shared Instance**: Use a single `ModelCatalog` instance across all plugins to avoid redundant data synchronization.
+2. **Error Handling**: Always handle the case where pricing returns 0.0 due to missing model data.
+3. **Logging**: Monitor pricing sync failures and missing model warnings in production.
+4. **Cache Awareness**: Use `CalculateCost`, which automatically handles cache hits/misses and embedding costs.
+5. **Resource Cleanup**: Always call `Cleanup()` during application shutdown to prevent resource leaks.
+
+The Model Catalog provides a robust, production-ready foundation for implementing billing, budgeting, and cost monitoring features in Bifrost plugins.
--- a/docs/architecture/framework/streaming.mdx
+++ b/docs/architecture/framework/streaming.mdx
@@ -0,0 +1,130 @@
+---
+title: "Streaming"
+description: "Framework utility for aggregating and processing real-time stream chunks from AI providers"
+icon: "water"
+---
+
+## Overview
+
+The **Streaming** package (`framework/streaming`) is a core utility within Bifrost designed to handle real-time data streams from AI providers. It provides a robust and efficient mechanism for plugins like [Logging](/features/observability/default), [OTel](/features/observability/otel), and [Maxim](/features/observability/maxim) to process, aggregate, and format streaming responses for chat completions, transcriptions, and other real-time AI interactions.
+
+```mermaid
+sequenceDiagram
+    participant Plugin
+    participant BC as Bifrost Core
+    participant Accumulator
+
+    BC->>Plugin: PreLLMHook(StreamingRequest)
+    activate Plugin
+    Plugin->>Accumulator: CreateStreamAccumulator(requestID)
+    activate Accumulator
+    Accumulator-->>Plugin: ack
+    deactivate Accumulator
+    Plugin-->>BC: return
+    deactivate Plugin
+
+    loop For each response chunk
+        BC->>Plugin: PostLLMHook(StreamChunk)
+        activate Plugin
+        Plugin->>Accumulator: ProcessStreamingResponse(StreamChunk)
+        activate Accumulator
+        alt Is NOT Final Chunk
+            Accumulator-->>Plugin: return {Type: Delta}
+        else Is Final Chunk
+            Accumulator->>Accumulator: buildCompleteResponse()
+            Accumulator-->>Plugin: return {Type: Final, CompleteData}
+        end
+        deactivate Accumulator
+        Plugin-->>BC: return
+        deactivate Plugin
+    end
+
+```
+
+Its primary purpose is to hide the complexity of handling chunked data, so that plugins can work with complete, well-structured responses without implementing their own aggregation logic.
+
+## How It Works
+
+The streaming package uses an `Accumulator` to manage the lifecycle of a streaming operation. This process is designed to be highly efficient, using `sync.Pool` to reuse objects and minimize memory allocations.
+
+1.  **Initialization**: When a plugin that needs to process streams (like `logging` or `otel`) is initialized, it creates a new `streaming.Accumulator`.
+
+2.  **Stream Start**: In the `PreLLMHook` phase of a request, if the request is identified as a streaming type, the plugin calls `accumulator.CreateStreamAccumulator(requestID, timestamp)` to prepare a dedicated buffer for the incoming chunks of that request.
+
+3.  **Chunk Processing**: In the `PostLLMHook` phase, as each chunk of the streaming response arrives, the plugin passes it to `accumulator.ProcessStreamingResponse()`.
+    *   For each `delta` chunk, the accumulator appends it to the buffer associated with the request ID.
+    *   The accumulator handles different types of streams, including chat, audio, and transcriptions, using specialized logic to correctly piece together the data. For example, it accumulates text deltas, tool call argument deltas, and other parts of the message.
+
+4.  **Finalization**: When the final chunk of the stream is received (indicated by a `finish_reason` or other provider-specific signal), `ProcessStreamingResponse` performs the final assembly.
+    *   It reconstructs the complete `ChatMessage` or other response object from all the stored chunks.
+    *   It calculates total token usage, cost, and latency.
+    *   It returns a `ProcessedStreamResponse` object with `StreamResponseTypeFinal` and the complete, structured `AccumulatedData`.
+
+5.  **Cleanup**: Once the final response is processed, the accumulator cleans up all buffered chunks for that request ID, returning them to the `sync.Pool` for reuse.
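+
+The five steps above can be sketched as a toy accumulator. This illustrates the lifecycle under simplifying assumptions and is not the `framework/streaming` implementation: `chunk`, `streamBuf`, `accumulator`, `Create`, and `Process` are hypothetical, chunks carry plain text only, and the end of the stream is signaled by a boolean instead of a `finish_reason`.
+
+```go
+package main
+
+import (
+	"fmt"
+	"strings"
+	"sync"
+	"time"
+)
+
+// chunk is a hypothetical, simplified stream chunk.
+type chunk struct {
+	Delta string
+	Final bool
+}
+
+// streamBuf holds the chunks buffered for one request.
+type streamBuf struct {
+	mu      sync.Mutex
+	started time.Time
+	chunks  []*chunk
+}
+
+// accumulator tracks per-request buffers and recycles chunk objects.
+type accumulator struct {
+	streams sync.Map // requestID -> *streamBuf
+	pool    sync.Pool
+}
+
+func newAccumulator() *accumulator {
+	return &accumulator{pool: sync.Pool{New: func() any { return new(chunk) }}}
+}
+
+// Create prepares a buffer for a request (the PreLLMHook step).
+func (a *accumulator) Create(id string, ts time.Time) {
+	a.streams.Store(id, &streamBuf{started: ts})
+}
+
+// Process buffers one chunk (the PostLLMHook step). On the final chunk it
+// assembles the full text, returns chunks to the pool, and forgets the request.
+func (a *accumulator) Process(id, delta string, final bool) (string, bool) {
+	v, ok := a.streams.Load(id)
+	if !ok {
+		return "", false
+	}
+	buf := v.(*streamBuf)
+	c := a.pool.Get().(*chunk)
+	c.Delta, c.Final = delta, final
+
+	buf.mu.Lock()
+	buf.chunks = append(buf.chunks, c)
+	if !final {
+		buf.mu.Unlock()
+		return "", false // delta: nothing to report yet
+	}
+	var sb strings.Builder
+	for _, ch := range buf.chunks {
+		sb.WriteString(ch.Delta)
+		*ch = chunk{} // reset before returning to the pool
+		a.pool.Put(ch)
+	}
+	buf.chunks = nil
+	buf.mu.Unlock()
+	a.streams.Delete(id)
+	return sb.String(), true
+}
+
+func main() {
+	a := newAccumulator()
+	a.Create("req-1", time.Now())
+	for _, d := range []string{"Hel", "lo, ", "world"} {
+		a.Process("req-1", d, false)
+	}
+	text, done := a.Process("req-1", "!", true)
+	fmt.Println(text, done) // Hello, world! true
+}
+```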
+
+## Key Components
+
+### `Accumulator`
+
+The central component of the package. It is a thread-safe manager that:
+-   Tracks stream chunks for multiple concurrent requests using a `sync.Map`.
+-   Uses `sync.Pool` to recycle `*StreamChunk` objects, reducing garbage collection overhead.
+-   Provides internal (unexported) methods for adding chunks of each stream type (`addChatStreamChunk`, `addAudioStreamChunk`, etc.).
+-   Includes a periodic cleanup worker to remove stale accumulators for incomplete or orphaned requests.
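+
+The stale-accumulator cleanup can be pictured as a ticker loop over the `sync.Map`. The sketch below assumes a per-request `lastSeen` timestamp and a fixed TTL; `registry`, `entry`, `cleanupStale`, and `startCleanupWorker` are hypothetical names, and the real worker's interval and staleness rule may differ.
+
+```go
+package main
+
+import (
+	"fmt"
+	"sync"
+	"time"
+)
+
+// entry is a hypothetical per-request accumulator record.
+type entry struct {
+	lastSeen time.Time
+}
+
+type registry struct {
+	streams sync.Map // requestID -> *entry
+}
+
+// cleanupStale removes entries not updated within ttl and reports how many
+// were dropped. Deleting inside Range is permitted for sync.Map.
+func (r *registry) cleanupStale(now time.Time, ttl time.Duration) int {
+	removed := 0
+	r.streams.Range(func(k, v any) bool {
+		if now.Sub(v.(*entry).lastSeen) > ttl {
+			r.streams.Delete(k)
+			removed++
+		}
+		return true // keep iterating
+	})
+	return removed
+}
+
+// startCleanupWorker runs cleanupStale on every tick until stop is closed.
+func (r *registry) startCleanupWorker(interval, ttl time.Duration, stop <-chan struct{}) {
+	go func() {
+		t := time.NewTicker(interval)
+		defer t.Stop()
+		for {
+			select {
+			case now := <-t.C:
+				r.cleanupStale(now, ttl)
+			case <-stop:
+				return
+			}
+		}
+	}()
+}
+
+func main() {
+	r := &registry{}
+	now := time.Now()
+	r.streams.Store("orphaned", &entry{lastSeen: now.Add(-10 * time.Minute)})
+	r.streams.Store("active", &entry{lastSeen: now})
+	fmt.Println(r.cleanupStale(now, 5*time.Minute)) // 1
+}
+```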
+
+### `ProcessStreamingResponse`
+
+This is the main entry point for plugins to process stream data. It inspects the response type and delegates to the appropriate handler:
+-   `processChatStreamingResponse`
+-   `processAudioStreamingResponse`
+-   `processTranscriptionStreamingResponse`
+-   `processResponsesStreamingResponse`
+
+It returns a `ProcessedStreamResponse`, which indicates whether the chunk is a `delta` or the `final` aggregated response.
+
+### Stream-Specific Builders
+
+The package includes internal logic to correctly build complete messages from chunks. For example, `buildCompleteMessageFromChatStreamChunks` iterates through the collected `ChatStreamChunk` objects, appending content deltas and assembling tool calls into a final, coherent `schemas.ChatMessage`.
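+
+As an illustration of what such a builder does, the sketch below merges content deltas and tool-call deltas into one message. It assumes the OpenAI-style streaming shape, where the first delta for a tool-call index carries the ID and function name and later deltas carry argument fragments. All types here are hypothetical stand-ins for the `schemas` types, not Bifrost's own.
+
+```go
+package main
+
+import (
+	"fmt"
+	"sort"
+	"strings"
+)
+
+// toolCallDelta is one streamed fragment of a tool call, identified by Index.
+type toolCallDelta struct {
+	Index     int
+	ID        string
+	Name      string
+	Arguments string
+}
+
+type streamChunk struct {
+	Content   string
+	ToolCalls []toolCallDelta
+}
+
+type toolCall struct {
+	ID, Name, Arguments string
+}
+
+type message struct {
+	Content   string
+	ToolCalls []toolCall
+}
+
+// buildMessage concatenates content deltas and merges tool-call deltas by index.
+func buildMessage(chunks []streamChunk) message {
+	var content strings.Builder
+	calls := map[int]*toolCall{}
+	for _, ch := range chunks {
+		content.WriteString(ch.Content)
+		for _, d := range ch.ToolCalls {
+			tc, ok := calls[d.Index]
+			if !ok {
+				tc = &toolCall{}
+				calls[d.Index] = tc
+			}
+			if d.ID != "" {
+				tc.ID = d.ID
+			}
+			if d.Name != "" {
+				tc.Name = d.Name
+			}
+			tc.Arguments += d.Arguments // argument JSON arrives in fragments
+		}
+	}
+	// Emit tool calls in index order for a deterministic result.
+	idx := make([]int, 0, len(calls))
+	for i := range calls {
+		idx = append(idx, i)
+	}
+	sort.Ints(idx)
+	msg := message{Content: content.String()}
+	for _, i := range idx {
+		msg.ToolCalls = append(msg.ToolCalls, *calls[i])
+	}
+	return msg
+}
+
+func main() {
+	msg := buildMessage([]streamChunk{
+		{Content: "Checking "},
+		{Content: "weather.", ToolCalls: []toolCallDelta{{Index: 0, ID: "call_1", Name: "get_weather"}}},
+		{ToolCalls: []toolCallDelta{{Index: 0, Arguments: `{"city":`}}},
+		{ToolCalls: []toolCallDelta{{Index: 0, Arguments: `"Paris"}`}}},
+	})
+	fmt.Println(msg.Content)                // Checking weather.
+	fmt.Println(msg.ToolCalls[0].Arguments) // {"city":"Paris"}
+}
+```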
+
+## Usage Example
+
+The following snippet from the `logging` plugin shows how the `streaming` package is used in practice within a plugin's `PostLLMHook`.
+
+```go
+// In plugins/logging/main.go
+
+func (p *LoggerPlugin) PostLLMHook(ctx *schemas.BifrostContext, result *schemas.BifrostResponse, bifrostErr *schemas.BifrostError) (*schemas.BifrostResponse, *schemas.BifrostError, error) {
+    // ... setup, get requestID ...
+
+    go func() {
+        // ...
+        if bifrost.IsStreamRequestType(requestType) {
+            p.logger.Debug("[logging] processing streaming response")
+
+            // 1. Pass the response chunk to the accumulator
+            streamResponse, err := p.accumulator.ProcessStreamingResponse(ctx, result, bifrostErr)
+            if err != nil {
+                p.logger.Error("failed to process streaming response: %v", err)
+            } else if streamResponse != nil && streamResponse.Type == streaming.StreamResponseTypeFinal {
+                // 2. This is the final, aggregated response: prepare the log data
+                logMsg.Operation = LogOperationStreamUpdate
+                logMsg.StreamResponse = streamResponse
+
+                // 3. Update the log entry with the complete data
+                processingErr := retryOnNotFound(p.ctx, func() error {
+                    return p.updateStreamingLogEntry(p.ctx, logMsg.RequestID, logMsg.SemanticCacheDebug, logMsg.StreamResponse, true)
+                })
+
+                // ... handle processingErr and callbacks ...
+            }
+        }
+        // ... handle non-streaming responses ...
+    }()
+
+    return result, bifrostErr, nil
+}
+```
+
+This demonstrates how a plugin can remain agnostic to the details of stream aggregation and simply react to the final, complete data returned by the `streaming` package. This greatly simplifies plugin development and ensures consistent data handling across the framework.
--- a/docs/architecture/framework/vector-store.mdx
+++ b/docs/architecture/framework/vector-store.mdx
@@ -0,0 +1,185 @@
+---
+title: "Vector Store"
+description: "Vector database implementations for semantic search, embeddings storage, and AI-powered features in Bifrost."
+icon: "diagram-project"
+---
+
+## Overview
+
+The VectorStore is a core component of Bifrost's framework package that provides a unified interface for vector database operations. It enables plugins to store embeddings, perform similarity searches, and build AI-powered features like semantic caching, content recommendations, and knowledge retrieval.
+
+**Key Capabilities:**
+- **Vector Similarity Search**: Find semantically similar content using embeddings
+- **Namespace Management**: Organize data into separate collections with custom schemas
+- **Flexible Filtering**: Query data with complex filters and pagination
+- **Multiple Backends**: Support for Weaviate, Redis/Valkey-compatible, Qdrant, and Pinecone vector stores
+- **High Performance**: Optimized for production workloads
+- **Scalable Storage**: Handle millions of vectors with efficient indexing
+
+## VectorStore Interface Usage
+
+### Creating Namespaces
+Create collections (namespaces) with custom schemas:
+
+```go
+// Define properties for your data
+properties := map[string]vectorstore.VectorStoreProperties{
+    "content": {
+        DataType:    vectorstore.VectorStorePropertyTypeString,
+        Description: "The main content text",
+    },
+    "category": {
+        DataType:    vectorstore.VectorStorePropertyTypeString,
+        Description: "Content category",
+    },
+    "tags": {
+        DataType:    vectorstore.VectorStorePropertyTypeStringArray,
+        Description: "Content tags",
+    },
+}
+
+// Create namespace
+err := store.CreateNamespace(ctx, "my_content", 1536, properties)
+if err != nil {
+    log.Fatal("Failed to create namespace:", err)
+}
+```
+
+### Storing Data with Embeddings
+Add data with vector embeddings for similarity search:
+
+```go
+// Your embedding data (typically from an embedding model). Its length must match
+// the namespace dimension (1536 above); only three values are shown for brevity.
+embedding := []float32{0.1, 0.2, 0.3}
+
+// Metadata associated with this vector
+metadata := map[string]interface{}{
+    "content":  "This is my content text",
+    "category": "documentation",
+    "tags":     []string{"guide", "tutorial"},
+}
+
+// Store in vector database
+err := store.Add(ctx, "my_content", "unique-id-123", embedding, metadata)
+if err != nil {
+    log.Fatal("Failed to add data:", err)
+}
+```
+
+### Similarity Search
+Find similar content using vector similarity:
+
+```go
+// Query embedding (from the user query); must have the same dimension as the stored vectors
+queryEmbedding := []float32{0.15, 0.25, 0.35} // truncated for brevity
+
+// Optional filters
+filters := []vectorstore.Query{
+    {
+        Field:    "category",
+        Operator: vectorstore.QueryOperatorEqual,
+        Value:    "documentation",
+    },
+}
+
+// Perform similarity search
+results, err := store.GetNearest(
+    ctx,
+    "my_content",        // namespace
+    queryEmbedding,      // query vector
+    filters,             // optional filters
+    []string{"content", "category"}, // fields to return
+    0.7,                 // similarity threshold (0-1)
+    10,                  // limit
+)
+if err != nil {
+    log.Fatal("Failed to search:", err)
+}
+
+for _, result := range results {
+    fmt.Printf("Score: %.3f, Content: %s\n", *result.Score, result.Properties["content"])
+}
+```
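+
+Conceptually, `GetNearest` ranks stored vectors by similarity to the query, applies the filters, drops results below the threshold, and caps the count at the limit. The brute-force sketch below shows those semantics with cosine similarity. It is an assumption for illustration only: `doc`, `hit`, and `nearest` are hypothetical, real backends use approximate-nearest-neighbor indexes, and each backend defines its score and threshold somewhat differently.
+
+```go
+package main
+
+import (
+	"fmt"
+	"math"
+	"sort"
+)
+
+type doc struct {
+	ID        string
+	Category  string
+	Embedding []float32
+}
+
+type hit struct {
+	ID    string
+	Score float64
+}
+
+// cosine returns the cosine similarity of two equal-length vectors.
+func cosine(a, b []float32) float64 {
+	var dot, na, nb float64
+	for i := range a {
+		dot += float64(a[i]) * float64(b[i])
+		na += float64(a[i]) * float64(a[i])
+		nb += float64(b[i]) * float64(b[i])
+	}
+	if na == 0 || nb == 0 {
+		return 0
+	}
+	return dot / (math.Sqrt(na) * math.Sqrt(nb))
+}
+
+// nearest filters by category, keeps hits with score >= threshold, and
+// returns at most limit results, best first.
+func nearest(docs []doc, query []float32, category string, threshold float64, limit int) []hit {
+	var hits []hit
+	for _, d := range docs {
+		if category != "" && d.Category != category {
+			continue
+		}
+		if len(d.Embedding) != len(query) {
+			continue // dimension mismatch: not comparable
+		}
+		if s := cosine(query, d.Embedding); s >= threshold {
+			hits = append(hits, hit{d.ID, s})
+		}
+	}
+	sort.Slice(hits, func(i, j int) bool { return hits[i].Score > hits[j].Score })
+	if len(hits) > limit {
+		hits = hits[:limit]
+	}
+	return hits
+}
+
+func main() {
+	docs := []doc{
+		{"a", "documentation", []float32{0.1, 0.2, 0.3}},
+		{"b", "documentation", []float32{-0.3, 0.1, -0.2}},
+		{"c", "blog", []float32{0.1, 0.2, 0.3}},
+	}
+	for _, h := range nearest(docs, []float32{0.15, 0.25, 0.35}, "documentation", 0.7, 10) {
+		fmt.Printf("Score: %.3f, ID: %s\n", h.Score, h.ID)
+	}
+}
+```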
+
+### Data Retrieval and Management
+Query and manage stored data:
+
+```go
+// Get specific item by ID
+item, err := store.GetChunk(ctx, "my_content", "unique-id-123")
+if err != nil {
+    log.Fatal("Failed to get item:", err)
+}
+
+// Get all items with filtering and pagination
+allResults, cursor, err := store.GetAll(
+    ctx,
+    "my_content",
+    []vectorstore.Query{
+        {Field: "category", Operator: vectorstore.QueryOperatorEqual, Value: "documentation"},
+    },
+    []string{"content", "tags"}, // select fields
+    nil,  // cursor for pagination
+    50,   // limit
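+)
+if err != nil {
+    log.Fatal("Failed to list items:", err)
+}
+// The returned cursor can be passed back to GetAll to fetch the next page.
+
+// Delete items
+err = store.Delete(ctx, "my_content", "unique-id-123")
+if err != nil {
+    log.Fatal("Failed to delete item:", err)
+}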
+```
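+
+Cursor-based paging through a large namespace usually follows the loop below. This sketch runs against a hypothetical in-memory `page` function whose cursor is a stringified offset, and it assumes a nil cursor signals the last page. Real backends return opaque, backend-specific cursors, so treat the encoding as an assumption.
+
+```go
+package main
+
+import (
+	"fmt"
+	"strconv"
+)
+
+// page is a hypothetical stand-in for GetAll: it returns up to limit IDs
+// starting at the offset encoded in cursor, plus the next cursor
+// (nil once the last page has been returned).
+func page(ids []string, cursor *string, limit int) ([]string, *string, error) {
+	start := 0
+	if cursor != nil {
+		n, err := strconv.Atoi(*cursor)
+		if err != nil || n < 0 || n > len(ids) {
+			return nil, nil, fmt.Errorf("invalid cursor %q", *cursor)
+		}
+		start = n
+	}
+	end := start + limit
+	if end >= len(ids) {
+		return ids[start:], nil, nil
+	}
+	next := strconv.Itoa(end)
+	return ids[start:end], &next, nil
+}
+
+// collectAll keeps requesting pages until the cursor comes back nil.
+func collectAll(ids []string, limit int) ([]string, error) {
+	var out []string
+	var cursor *string
+	for {
+		batch, next, err := page(ids, cursor, limit)
+		if err != nil {
+			return nil, err
+		}
+		out = append(out, batch...)
+		if next == nil {
+			return out, nil
+		}
+		cursor = next
+	}
+}
+
+func main() {
+	all, err := collectAll([]string{"a", "b", "c", "d", "e"}, 2)
+	fmt.Println(all, err) // [a b c d e] <nil>
+}
+```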
+
+## Supported Vector Stores
+
+<CardGroup cols={2}>
+  <Card title="Weaviate" icon="database" href="/integrations/vector-databases/weaviate">
+    Production-ready vector database with gRPC support.
+  </Card>
+  <Card title="Redis / Valkey" icon="database" href="/integrations/vector-databases/redis">
+    High-performance in-memory vector store.
+  </Card>
+  <Card title="Qdrant" icon="database" href="/integrations/vector-databases/qdrant">
+    Rust-based vector search engine with advanced filtering.
+  </Card>
+  <Card title="Pinecone" icon="database" href="/integrations/vector-databases/pinecone">
+    Managed vector database with serverless options.
+  </Card>
+</CardGroup>
+
+---
+
+## Use Cases
+
+### [Semantic Caching](../../features/semantic-caching)
+Build intelligent caching systems that understand query intent rather than just exact matches.
+
+**Applications:**
+- Customer support systems with FAQ matching
+- Code completion and documentation search
+- Content management with semantic deduplication
+
+### Knowledge Base & Search
+Create intelligent search systems that understand user queries contextually.
+
+**Applications:**
+- Document search and retrieval systems
+- Product recommendation engines
+- Research paper and knowledge discovery platforms
+
+### Content Classification
+Automatically categorize and tag content based on semantic similarity.
+
+**Applications:**
+- Email classification and routing
+- Content moderation and filtering
+- News article categorization and clustering
+
+### Recommendation Systems
+Build personalized recommendation engines using vector similarity.
+
+**Applications:**
+- Product recommendations based on user preferences
+- Content suggestions for media platforms
+- Similar document or article recommendations
+
+## Related Documentation
+
+| Topic | Documentation | Description |
+|-------|---------------|-------------|
+| **Framework Overview** | [What is Framework](./what-is-framework) | Understanding the framework package and VectorStore interface |
+| **Semantic Caching** | [Semantic Caching](../../features/semantic-caching) | Using VectorStore for AI response caching |
--- a/docs/architecture/framework/what-is-framework.mdx
+++ b/docs/architecture/framework/what-is-framework.mdx
@@ -0,0 +1,49 @@
+---
+title: "What is framework?"
+description: "Framework is Bifrost's shared storage and utilities SDK package that provides common database interfaces and logic for the plugin ecosystem."
+icon: "play"
+---
+
+Framework serves as the foundation layer that enables plugins to implement consistent data management patterns without reinventing storage solutions.
+
+## Installation
+
+```bash
+go get github.com/maximhq/bifrost/framework
+```
+
+## Purpose
+
+The framework package was designed to solve a fundamental challenge in plugin development: providing standardized, reliable storage and utility interfaces that plugins can depend on. Instead of each plugin implementing its own database logic, configuration management, or logging systems, framework offers battle-tested, shared implementations.
+
+## Core Components
+
+### ConfigStore
+A unified configuration persistence layer that provides consistent storage patterns for plugin settings, provider configurations, and system state. Plugins can leverage `ConfigStore` to manage their configuration data with built-in CRUD operations, transaction support, and schema management.
+
+### LogStore
+Standardized logging and audit trail capabilities that enable plugins to implement observability features. `LogStore` provides structured logging, search and filtering capabilities, pagination support, and automated data retention policies.
+
+### VectorStore
+Vector database operations designed for AI-powered plugins that need semantic capabilities. `VectorStore` handles embeddings management, similarity search operations, and namespace isolation, making it easy for plugins to add features like semantic caching, content search, and AI-powered recommendations.
+
+### Pricing Module
+Cost calculation and model pricing management tools that help plugins implement billing and usage tracking features. The pricing system supports multi-tier pricing models, real-time usage tracking, and dynamic pricing updates.
+
+## Benefits for Plugin Developers
+
+**Shared Logic**: Common patterns for configuration, logging, and data management are provided out-of-the-box, reducing development time and ensuring consistency across plugins.
+
+**Standardized Interfaces**: All framework components use consistent APIs, making it easier for developers to work across different plugins and maintain code quality.
+
+**Pluggable Architecture**: The interface-based design allows different storage backends to be used without changing plugin code, providing flexibility for different deployment scenarios.
+
+**Transaction Support**: Built-in transaction management and error handling ensure data integrity and provide reliable rollback capabilities.
+
+**Production Ready**: Framework components are battle-tested in production environments and include features like connection pooling, retry logic, and performance optimizations.
+
+## Integration with Bifrost
+
+Framework seamlessly integrates with the Bifrost ecosystem, providing the storage foundation that powers core features like provider management, request logging, semantic caching, and governance. When plugins use framework components, they automatically participate in Bifrost's unified data management strategy.
+
+The framework package enables plugin developers to focus on their core business logic while relying on robust, shared infrastructure for all storage and utility needs.
--- a/docs/architecture/plugins/governance.mdx
+++ b/docs/architecture/plugins/governance.mdx
--- a/docs/architecture/plugins/jsonparser.mdx
+++ b/docs/architecture/plugins/jsonparser.mdx
--- a/docs/architecture/plugins/logging.mdx
+++ b/docs/architecture/plugins/logging.mdx
--- a/docs/architecture/plugins/maxim.mdx
+++ b/docs/architecture/plugins/maxim.mdx
--- a/docs/architecture/plugins/mocker.mdx
+++ b/docs/architecture/plugins/mocker.mdx
--- a/docs/architecture/plugins/semantic-cache.mdx
+++ b/docs/architecture/plugins/semantic-cache.mdx
--- a/docs/architecture/plugins/telemetry.mdx
+++ b/docs/architecture/plugins/telemetry.mdx
--- a/docs/architecture/transports/in-memory-store.mdx
+++ b/docs/architecture/transports/in-memory-store.mdx