first commit

2026-04-26 21:52:23 +03:00
commit 880f412e2c
2662 changed files with 866266 additions and 0 deletions
--- a/docs/architecture/core/concurrency.mdx
+++ b/docs/architecture/core/concurrency.mdx
@@ -0,0 +1,764 @@
+---
+title: "Concurrency"
+description: "Deep dive into Bifrost's advanced concurrency architecture - worker pools, goroutine management, channel-based communication, and resource isolation patterns."
+icon: "traffic-light"
+---
+
+## Concurrency Philosophy
+
+### **Core Principles**
+
+| Principle                          | Implementation                         | Benefit                                |
+| ---------------------------------- | -------------------------------------- | -------------------------------------- |
+| **Provider Isolation**          | Independent worker pools per provider  | Fault tolerance, no cascade failures   |
+| **Channel-Based Communication** | Go channels for all async operations   | Type-safe message passing              |
+| **Resource Pooling**            | Object pools with lifecycle management | Predictable memory usage, minimal GC   |
+| **Non-Blocking Operations**     | Async processing throughout pipeline   | Maximum concurrency, no blocking waits |
+| **Backpressure Handling**       | Configurable buffers and flow control  | Graceful degradation under load        |
+
+### **Threading Architecture Overview**
+
+```mermaid
+graph TB
+    subgraph "Main Thread"
+        Main[Main Process<br/>HTTP Server]
+        Router[Request Router<br/>Goroutine]
+        PluginMgr[Plugin Manager<br/>Goroutine]
+    end
+
+    subgraph "Provider Worker Pools"
+        subgraph "OpenAI Pool"
+            OAI1[Worker 1<br/>Goroutine]
+            OAI2[Worker 2<br/>Goroutine]
+            OAIN[Worker N<br/>Goroutine]
+        end
+        subgraph "Anthropic Pool"
+            ANT1[Worker 1<br/>Goroutine]
+            ANT2[Worker 2<br/>Goroutine]
+            ANTN[Worker N<br/>Goroutine]
+        end
+        subgraph "Bedrock Pool"
+            BED1[Worker 1<br/>Goroutine]
+            BED2[Worker 2<br/>Goroutine]
+            BEDN[Worker N<br/>Goroutine]
+        end
+    end
+
+    subgraph "Memory Pools"
+        ChannelPool[Channel Pool<br/>sync.Pool]
+        MessagePool[Message Pool<br/>sync.Pool]
+        ResponsePool[Response Pool<br/>sync.Pool]
+    end
+
+    Main --> Router
+    Router --> PluginMgr
+    PluginMgr --> OAI1
+    PluginMgr --> ANT1
+    PluginMgr --> BED1
+
+    OAI1 --> ChannelPool
+    ANT1 --> MessagePool
+    BED1 --> ResponsePool
+```
+
+---
+
+## Worker Pool Architecture
+
+### **Provider-Isolated Worker Pools**
+
+```mermaid
+stateDiagram-v2
+    [*] --> PoolInit: Worker Pool Creation
+    PoolInit --> WorkerSpawn: Spawn Worker Goroutines
+    WorkerSpawn --> Listening: Workers Listen on Channels
+
+    Listening --> Processing: Job Received
+    Processing --> API_Call: Provider API Request
+    API_Call --> Response: Process Response
+    Response --> Listening: Job Complete
+
+    Listening --> Shutdown: Graceful Shutdown
+    Processing --> Shutdown: Complete Current Job
+    Shutdown --> [*]: Pool Destroyed
+```
+
+**Worker Pool Architecture:**
+
+The worker pool system balances resource efficiency against performance isolation:
+
+**Key Components:**
+
+- **Worker Pool Management** - Pre-spawned workers reduce startup latency
+- **Job Queue System** - Buffered channels provide smooth load balancing
+- **Resource Pools** - HTTP clients and API keys are pooled for efficiency
+- **Health Monitoring** - Circuit breakers detect and isolate failing providers
+- **Graceful Shutdown** - Workers complete current jobs before terminating
+
+**Startup Process:**
+
+1. **Worker Pre-spawning** - Workers are created during pool initialization
+2. **Channel Setup** - Job queues and worker channels are established
+3. **Resource Allocation** - HTTP clients and API keys are distributed
+4. **Health Checks** - Initial connectivity tests verify provider availability
+5. **Ready State** - Pool becomes available for request processing
+
+**Job Dispatch Logic:**
+
+- **Round-Robin Assignment** - Jobs are distributed evenly across available workers
+- **Load Balancing** - Worker availability determines job assignment
+- **Overflow Handling** - Excess jobs are queued or dropped based on configuration
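+
+The startup and dispatch steps above can be sketched in a few lines of Go. This is a minimal illustration under stated assumptions, not Bifrost's actual implementation: `Job`, `Pool`, `NewPool`, `Submit`, and `Shutdown` are hypothetical names. The sketch lets workers pull from one shared buffered channel, so whichever worker is idle takes the next job; it does not implement strict round-robin.
+
+```go
+package main
+
+import (
+    "errors"
+    "fmt"
+    "sync"
+)
+
+// Job is a hypothetical unit of work; the real job type is not shown in this document.
+type Job struct {
+    ID     int
+    Result chan string
+}
+
+// Pool is an illustrative provider-scoped worker pool.
+type Pool struct {
+    jobs chan *Job
+    wg   sync.WaitGroup
+}
+
+// NewPool pre-spawns the workers, which all read from one buffered job queue.
+func NewPool(provider string, workers, buffer int) *Pool {
+    p := &Pool{jobs: make(chan *Job, buffer)}
+    for i := 0; i < workers; i++ {
+        p.wg.Add(1)
+        go func(id int) {
+            defer p.wg.Done()
+            // Ranging over the channel drains queued jobs before exiting on close.
+            for job := range p.jobs {
+                job.Result <- fmt.Sprintf("%s worker %d handled job %d", provider, id, job.ID)
+            }
+        }(i)
+    }
+    return p
+}
+
+// Submit enqueues without blocking and reports overflow to the caller.
+func (p *Pool) Submit(job *Job) error {
+    select {
+    case p.jobs <- job:
+        return nil
+    default:
+        return errors.New("queue full")
+    }
+}
+
+// Shutdown stops intake and waits for queued and in-flight jobs to finish.
+// Submit must not be called after Shutdown, since the queue is closed.
+func (p *Pool) Shutdown() {
+    close(p.jobs)
+    p.wg.Wait()
+}
+
+func main() {
+    pool := NewPool("openai", 2, 4)
+    job := &Job{ID: 1, Result: make(chan string, 1)}
+    if err := pool.Submit(job); err != nil {
+        fmt.Println("submit failed:", err)
+        return
+    }
+    fmt.Println(<-job.Result)
+    pool.Shutdown()
+}
+```
+
+Giving each provider its own `Pool` value is what keeps a slow or failing provider from consuming another provider's workers or queue space.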
+
+### **Worker Lifecycle Management**
+
+```mermaid
+sequenceDiagram
+    participant Pool
+    participant Worker
+    participant HTTPClient
+    participant Provider
+    participant Metrics
+
+    Pool->>Worker: Start()
+    Worker->>Worker: Initialize HTTP Client
+    Worker->>Pool: Ready Signal
+
+    loop Job Processing
+        Pool->>Worker: Job Assignment
+        Worker->>HTTPClient: Prepare Request
+        HTTPClient->>Provider: API Call
+        Provider-->>HTTPClient: Response
+        HTTPClient-->>Worker: Parsed Response
+        Worker->>Metrics: Record Performance
+        Worker->>Pool: Job Complete
+    end
+
+    Pool->>Worker: Shutdown Signal
+    Worker->>Worker: Complete Current Job
+    Worker-->>Pool: Shutdown Confirmed
+```
+
+---
+
+## Channel-Based Communication
+
+### **Channel Architecture**
+
+```mermaid
+graph TB
+    subgraph "Channel Types"
+        JobQueue[Job Queue<br/>Buffered Channel]
+        WorkerPool[Worker Pool<br/>Buffered Channel]
+        ResultChan[Result Channel<br/>Buffered Channel]
+        QuitChan[Quit Channel<br/>Unbuffered]
+    end
+
+    subgraph "Flow Control"
+        BackPressure[Backpressure<br/>Buffer Limits]
+        Timeout[Timeout<br/>Context Cancellation]
+        Graceful[Graceful Shutdown<br/>Channel Closing]
+    end
+
+    JobQueue --> BackPressure
+    WorkerPool --> Timeout
+    ResultChan --> Graceful
+```
+
+**Channel Configuration Principles:**
+
+Bifrost's channel system balances throughput and memory usage through careful buffer sizing:
+
+**Job Queuing Configuration:**
+
+- **Job Queue Buffer** - Sized based on expected burst traffic (100-1000 jobs)
+- **Worker Pool Size** - Matches provider concurrency limits (10-100 workers)
+- **Result Buffer** - Accommodates response processing delays (50-500 responses)
+
+**Flow Control Parameters:**
+
+- **Queue Wait Limits** - Maximum time jobs wait before timeout (1-10 seconds)
+- **Processing Timeouts** - Per-job execution limits (30-300 seconds)
+- **Shutdown Timeouts** - Graceful termination periods (5-30 seconds)
+
+**Backpressure Policies:**
+
+- **Drop Policy** - Discard excess jobs when queues are full
+- **Block Policy** - Wait for queue space with timeout
+- **Error Policy** - Immediately return error for full queues
+
+**Channel Type Selection:**
+
+- **Buffered Channels** - Used for async job processing and result handling
+- **Unbuffered Channels** - Used for synchronization signals (quit, done)
+- **Context Cancellation** - Used for timeout and cancellation propagation
+
+### **Backpressure and Flow Control**
+
+```mermaid
+flowchart TD
+    Request[Incoming Request] --> QueueCheck{Queue Full?}
+    QueueCheck -->|No| Queue[Add to Queue]
+    QueueCheck -->|Yes| Policy{Backpressure Policy?}
+
+    Policy -->|Drop| Drop[Drop Request<br/>Return Error]
+    Policy -->|Block| Block[Block Until Space<br/>With Timeout]
+    Policy -->|Error| Error[Return Queue Full Error]
+
+    Queue --> Worker[Assign to Worker]
+    Block --> TimeoutCheck{Timeout?}
+    TimeoutCheck -->|Yes| Error
+    TimeoutCheck -->|No| Queue
+
+    Worker --> Processing[Process Request]
+    Processing --> Complete[Complete]
+
+    Drop --> Client[Client Response]
+    Error --> Client
+    Complete --> Client
+```
+
+**Backpressure Implementation Strategy:**
+
+The backpressure system protects Bifrost from being overwhelmed while maintaining service availability:
+
+**Non-Blocking Job Submission:**
+
+- **Immediate Queue Check** - Jobs are submitted without blocking on queue space
+- **Success Path** - Available queue space allows immediate job acceptance
+- **Overflow Detection** - Full queues trigger backpressure policies
+- **Metrics Collection** - All queue operations are tracked for monitoring
+
+**Backpressure Policy Execution:**
+
+- **Drop Policy** - Immediately rejects excess jobs with meaningful error messages
+- **Block Policy** - Waits for queue space with configurable timeout limits
+- **Error Policy** - Returns queue full errors for immediate client feedback
+- **Metrics Tracking** - Dropped, blocked, and successful submissions are measured
+
+**Timeout Management:**
+
+- **Context-Based Timeouts** - All blocking operations respect timeout boundaries
+- **Graceful Degradation** - Timeouts result in controlled error responses
+- **Resource Protection** - Prevents goroutine leaks from infinite waits
+
+```go
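+// SubmitJob is reconstructed around the surviving "block" and "error" branches:
+// the signature, fast path, "drop" branch, and config field are assumptions.
+func (pool *ProviderWorkerPool) SubmitJob(ctx context.Context, job *Job) error {
+    select {
+    case pool.jobQueue <- job: // fast path: queue has space
+        pool.metrics.IncQueuedJobs()
+        return nil
+    default: // queue full: apply the configured backpressure policy
+        switch pool.config.QueuePolicy {
+        case "drop":
+            pool.metrics.IncDroppedJobs()
+            return errors.New("queue full, job dropped")
+
+        case "block":
+            // Wait for space until the caller's context expires
+            select {
+            case pool.jobQueue <- job:
+                pool.metrics.IncQueuedJobs()
+                return nil
+            case <-ctx.Done():
+                pool.metrics.IncTimeoutJobs()
+                return errors.New("queue full, timeout waiting")
+            }
+
+        case "error":
+            pool.metrics.IncRejectedJobs()
+            return errors.New("queue full, job rejected")
+
+        default:
+            return errors.New("unknown queue policy")
+        }
+    }
+}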
+```
+
+---
+
+## Memory Pool Concurrency
+
+### **Thread-Safe Object Pools**
+
+```mermaid
+graph TD
+    subgraph "sync.Pool Lifecycle"
+        direction LR
+        GetObject[Get Object<br/>sync.Pool.Get]
+        PoolCheck{Is Pool Empty?}
+        NewObject[New Object<br/>Factory Function]
+        UseObject[Use Object<br/>Application Logic]
+        ResetObject[Reset Object<br/>Clear State]
+        ReturnObject[Return Object<br/>sync.Pool.Put]
+
+        GetObject --> PoolCheck
+        PoolCheck -- Yes --> NewObject
+        PoolCheck -- No --> UseObject
+        NewObject --> UseObject
+        UseObject --> ResetObject
+        ResetObject --> ReturnObject
+        ReturnObject --> GetObject
+    end
+
+    subgraph "GC Interaction"
+        direction TB
+        GCRun[GC Runs]
+        PoolCleanup[Pool Cleanup<br/>Removes idle objects]
+        
+        GCRun --> PoolCleanup
+    end
+```
+
+**Thread-Safe Pool Architecture:**
+
+Bifrost's memory pool system ensures thread-safe object reuse across multiple goroutines:
+
+**Pool Structure Design:**
+
+- **Multiple Pool Types** - Separate pools for channels, messages, responses, and buffers
+- **Factory Functions** - Dynamic object creation when pools are empty
+- **Statistics Tracking** - Comprehensive metrics for pool performance monitoring
+- **Thread Safety** - Synchronized access using Go's sync.Pool and read-write mutexes
+
+**Object Lifecycle Management:**
+
+- **Pool Initialization** - Factory functions define object creation patterns
+- **Unique Identification** - Each pooled object gets a unique ID for tracking
+- **Timestamp Tracking** - Creation, acquisition, and return times are recorded
+- **Reusability Flags** - Objects can be marked as non-reusable for single-use scenarios
+
+**Acquisition Strategy:**
+
+- **Request Tracking** - All pool requests are counted for monitoring
+- **Hit/Miss Tracking** - Pool effectiveness is measured through hit ratios
+- **Fallback Creation** - New objects are created when pools are empty
+- **Performance Metrics** - Acquisition times and patterns are monitored
+
+**Return and Reset Process:**
+
+- **State Validation** - Only reusable objects are returned to pools
+- **Object Reset** - All object state is cleared before returning to pool
+- **Return Tracking** - Return operations are counted and timed
+- **Pool Replenishment** - Returned objects become available for reuse
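+
+The acquire, reset, and return cycle described above can be sketched with `sync.Pool`. This is an illustration only: `BufferPool` and its methods are hypothetical names, and `bytes.Buffer` stands in for the pooled message and response types, which this document does not show. Counting factory calls is one simple way to observe misses.
+
+```go
+package main
+
+import (
+    "bytes"
+    "fmt"
+    "sync"
+    "sync/atomic"
+)
+
+// BufferPool wraps sync.Pool with request and creation counters (hypothetical type).
+type BufferPool struct {
+    pool     sync.Pool
+    requests atomic.Int64
+    created  atomic.Int64
+}
+
+// NewBufferPool installs a factory that runs only when the pool has nothing to reuse.
+func NewBufferPool() *BufferPool {
+    bp := &BufferPool{}
+    bp.pool.New = func() any {
+        bp.created.Add(1)
+        return new(bytes.Buffer)
+    }
+    return bp
+}
+
+// Get acquires a buffer, falling back to the factory on a miss.
+func (bp *BufferPool) Get() *bytes.Buffer {
+    bp.requests.Add(1)
+    return bp.pool.Get().(*bytes.Buffer)
+}
+
+// Put clears the buffer before returning it so no data leaks to the next user.
+func (bp *BufferPool) Put(b *bytes.Buffer) {
+    b.Reset()
+    bp.pool.Put(b)
+}
+
+// Requests and Created expose the counters for monitoring.
+func (bp *BufferPool) Requests() int64 { return bp.requests.Load() }
+func (bp *BufferPool) Created() int64  { return bp.created.Load() }
+
+func main() {
+    bp := NewBufferPool()
+    buf := bp.Get()
+    buf.WriteString("request payload")
+    bp.Put(buf)
+
+    again := bp.Get() // usually the same buffer, already reset
+    fmt.Println("length after reuse:", again.Len())
+    fmt.Println("requests:", bp.Requests(), "created:", bp.Created())
+}
+```
+
+Because `sync.Pool` may discard idle objects during garbage collection, reuse is likely but never guaranteed, which is why the sketch counts creations instead of assuming a hit.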
+
+### **Pool Performance Monitoring**
+
+Comprehensive metrics provide insights into pool efficiency and system health:
+
+**Usage Statistics Collection:**
+- **Request Counting** - Track total pool requests by object type
+- **Creation Tracking** - Monitor new object allocations when pools are empty
+- **Hit/Miss Ratios** - Measure pool effectiveness through reuse rates
+- **Return Monitoring** - Track successful object returns to pools
+
+**Performance Metrics Analysis:**
+- **Acquisition Times** - Measure how long it takes to get objects from pools
+- **Reset Performance** - Track time spent cleaning objects for reuse
+- **Hit Ratio Calculation** - Determine percentage of requests served from pools
+- **Memory Efficiency** - Calculate memory savings from object reuse
+
+**Key Performance Indicators:**
+- **Channel Pool Hit Ratio** - Typically 85-95% in steady state
+- **Message Pool Efficiency** - Usually 80-90% reuse rate
+- **Response Pool Utilization** - Often 70-85% hit ratio
+- **Total Memory Savings** - Measured reduction in garbage collection pressure
+
+**Monitoring Integration:**
+- **Thread-Safe Access** - All metrics collection is synchronized
+- **Real-Time Updates** - Statistics are updated with each pool operation
+- **Export Capability** - Metrics are available in JSON format for monitoring systems
+- **Alerting Support** - Low hit ratios can trigger performance alerts
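+
+As a rough sketch of the hit-ratio calculation, JSON export, and alerting described above: the `PoolSnapshot` type, its field names, and the 0.85 threshold are assumptions made for illustration, not Bifrost's actual metrics schema.
+
+```go
+package main
+
+import (
+    "encoding/json"
+    "fmt"
+)
+
+// PoolSnapshot is a hypothetical point-in-time view of one pool's counters.
+type PoolSnapshot struct {
+    Pool     string  `json:"pool"`
+    Requests int64   `json:"requests"`
+    Created  int64   `json:"created"`
+    HitRatio float64 `json:"hit_ratio"`
+}
+
+// NewSnapshot derives the hit ratio: the share of requests that needed no new allocation.
+func NewSnapshot(pool string, requests, created int64) PoolSnapshot {
+    s := PoolSnapshot{Pool: pool, Requests: requests, Created: created}
+    if requests > 0 {
+        s.HitRatio = float64(requests-created) / float64(requests)
+    }
+    return s
+}
+
+// NeedsAlert reports whether reuse has fallen below the given threshold.
+func (s PoolSnapshot) NeedsAlert(minHitRatio float64) bool {
+    return s.Requests > 0 && s.HitRatio < minHitRatio
+}
+
+func main() {
+    snap := NewSnapshot("channel", 1000, 120)
+    out, err := json.Marshal(snap)
+    if err != nil {
+        fmt.Println("marshal failed:", err)
+        return
+    }
+    fmt.Println(string(out))
+    fmt.Println("alert:", snap.NeedsAlert(0.85))
+}
+```
+
+With 1000 requests and 120 new allocations, the ratio is 880/1000 = 0.88, which sits inside the channel pool range quoted above and does not trip the assumed 0.85 threshold.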
+
+---
+
+## Goroutine Management
+
+### **Goroutine Lifecycle Patterns**
+
+```mermaid
+stateDiagram-v2
+    [*] --> Created: go routine()
+    Created --> Running: Execute Function
+    Running --> Waiting: Channel/Mutex Block
+    Waiting --> Running: Unblocked
+    Running --> Syscall: Network I/O
+    Syscall --> Running: I/O Complete
+    Running --> GCAssist: GC Triggered
+    GCAssist --> Running: GC Complete
+    Running --> Terminated: Function Exit
+    Terminated --> [*]: Cleanup
+```
+
+**Goroutine Pool Management Strategy:**
+
+Bifrost's goroutine management ensures optimal resource usage while preventing goroutine leaks:
+
+**Pool Configuration Management:**
+
+- **Goroutine Limits** - Maximum concurrent goroutines prevent resource exhaustion
+- **Active Counting** - Atomic counters track currently running goroutines
+- **Idle Timeouts** - Unused goroutines are cleaned up after configured periods
+- **Resource Boundaries** - Hard limits prevent runaway goroutine creation
+
+**Lifecycle Orchestration:**
+
+- **Spawn Channels** - New goroutine creation is tracked through channels
+- **Completion Monitoring** - Finished goroutines signal completion for cleanup
+- **Shutdown Coordination** - Graceful shutdown ensures all goroutines complete properly
+- **Health Monitoring** - Continuous monitoring tracks goroutine health and performance
+
+**Worker Creation Process:**
+
+- **Limit Enforcement** - Creation fails when maximum goroutine count is reached
+- **Unique Identification** - Each goroutine gets a unique ID for tracking and debugging
+- **Lifecycle Tracking** - Start times and names enable performance analysis
+- **Atomic Operations** - Thread-safe counters prevent race conditions
+
+**Panic Recovery and Error Handling:**
+
+- **Panic Isolation** - Goroutine panics don't crash the entire system
+- **Error Logging** - Panic details are logged with goroutine context
+- **Metrics Updates** - Panic counts are tracked for monitoring and alerting
+- **Resource Cleanup** - Failed goroutines are properly cleaned up and counted
+
+**Health Monitoring System:**
+
+- **Periodic Health Checks** - Regular intervals check goroutine pool health
+- **Completion Tracking** - Finished goroutines are recorded for performance analysis
+- **Shutdown Handling** - Clean shutdown process ensures no goroutine leaks
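+
+The limit enforcement, atomic counting, and panic isolation described above might look like the following. This is a sketch under stated assumptions: `Spawner`, `ErrLimit`, and the method names are hypothetical, and logging is reduced to a `fmt.Printf` call.
+
+```go
+package main
+
+import (
+    "errors"
+    "fmt"
+    "sync"
+    "sync/atomic"
+)
+
+// ErrLimit is returned when the goroutine budget is exhausted.
+var ErrLimit = errors.New("goroutine limit reached")
+
+// Spawner is a hypothetical bounded goroutine launcher.
+type Spawner struct {
+    max    int64
+    active atomic.Int64
+    panics atomic.Int64
+    wg     sync.WaitGroup
+}
+
+// Go starts fn in a new goroutine unless the limit is already reached.
+func (s *Spawner) Go(name string, fn func()) error {
+    // Reserve a slot atomically, then back out if that exceeded the limit.
+    if s.active.Add(1) > s.max {
+        s.active.Add(-1)
+        return ErrLimit
+    }
+    s.wg.Add(1)
+    go func() {
+        defer s.wg.Done()
+        defer s.active.Add(-1)
+        defer func() {
+            // Runs first (LIFO), so a panic in fn cannot crash the process.
+            if r := recover(); r != nil {
+                s.panics.Add(1)
+                fmt.Printf("goroutine %q recovered from panic: %v\n", name, r)
+            }
+        }()
+        fn()
+    }()
+    return nil
+}
+
+// Wait blocks until every started goroutine has finished.
+func (s *Spawner) Wait() { s.wg.Wait() }
+
+func main() {
+    s := &Spawner{max: 2}
+    _ = s.Go("ok", func() {})
+    _ = s.Go("boom", func() { panic("bad input") })
+    s.Wait()
+    fmt.Println("active:", s.active.Load(), "panics:", s.panics.Load())
+}
+```
+
+Reserving the slot with a single atomic add avoids a separate check-then-increment step, which would let two callers pass the limit check at the same time.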
+
+### **Resource Leak Prevention**
+
+```mermaid
+flowchart TD
+    GoroutineStart[Goroutine Start] --> ResourceCheck[Resource Allocation Check]
+    ResourceCheck --> Timeout[Set Timeout Context]
+    Timeout --> Work[Execute Work]
+
+    Work --> Complete{Work Complete?}
+    Complete -->|Yes| Cleanup[Cleanup Resources]
+    Complete -->|No| TimeoutCheck{Timeout?}
+
+    TimeoutCheck -->|Yes| ForceCleanup[Force Cleanup]
+    TimeoutCheck -->|No| Work
+
+    Cleanup --> Return[Return Resources to Pool]
+    ForceCleanup --> Return
+    Return --> End[Goroutine End]
+```
+
+**Resource Leak Prevention:**
+
+```go
+func (worker *Worker) ExecuteWithCleanup(job *Job) {
+    // Set timeout context
+    ctx, cancel := context.WithTimeout(
+        context.Background(),
+        worker.config.ProcessTimeout,
+    )
+    defer cancel()
+
+    // Acquire resources with timeout
+    resources, err := worker.acquireResources(ctx)
+    if err != nil {
+        job.resultChan <- &Result{Error: err}
+        return
+    }
+
+    // Ensure cleanup happens
+    defer func() {
+        // Always return resources
+        worker.returnResources(resources)
+
+        // Handle panics; never block on a receiver that has stopped waiting
+        if r := recover(); r != nil {
+            worker.metrics.IncPanics()
+            select {
+            case job.resultChan <- &Result{Error: fmt.Errorf("worker panic: %v", r)}:
+            case <-ctx.Done():
+            }
+        }
+    }()
+
+    // Execute job with context
+    result := worker.processJob(ctx, job, resources)
+
+    // Return result
+    select {
+    case job.resultChan <- result:
+        // Success
+    case <-ctx.Done():
+        // Timeout - the receiver is no longer waiting, so the result is dropped
+        worker.metrics.IncTimeouts()
+    }
+}
+```
+
+---
+
+## Concurrency Optimization Strategies
+
+### **Load-Based Worker Scaling** (Planned)
+
+```mermaid
+graph TB
+    subgraph "Load Monitoring"
+        QueueDepth[Queue Depth<br/>Monitoring]
+        ResponseTime[Response Time<br/>Tracking]
+        WorkerUtil[Worker Utilization<br/>Metrics]
+    end
+
+    subgraph "Scaling Decisions"
+        ScaleUp{Scale Up?<br/>Load > 80%}
+        ScaleDown{Scale Down?<br/>Load < 30%}
+        Maintain[Maintain<br/>Current Size]
+    end
+
+    subgraph "Actions"
+        AddWorkers[Spawn Additional<br/>Workers]
+        RemoveWorkers[Graceful Worker<br/>Shutdown]
+        NoAction[No Action<br/>Monitor Continue]
+    end
+
+    QueueDepth --> ScaleUp
+    ResponseTime --> ScaleUp
+    WorkerUtil --> ScaleDown
+
+    ScaleUp -->|Yes| AddWorkers
+    ScaleUp -->|No| ScaleDown
+    ScaleDown -->|Yes| RemoveWorkers
+    ScaleDown -->|No| Maintain
+
+    Maintain --> NoAction
+```
+
+**Adaptive Scaling Implementation:**
+
+```go
+type AdaptiveScaler struct {
+    pool           *ProviderWorkerPool
+    config         ScalingConfig
+    metrics        *ScalingMetrics
+    lastScaleTime  time.Time
+    scalingMutex   sync.Mutex
+}
+
+func (scaler *AdaptiveScaler) EvaluateScaling() {
+    scaler.scalingMutex.Lock()
+    defer scaler.scalingMutex.Unlock()
+
+    // Prevent frequent scaling
+    if time.Since(scaler.lastScaleTime) < scaler.config.MinScaleInterval {
+        return
+    }
+
+    current := scaler.getCurrentMetrics()
+
+    // Scale up conditions
+    if current.QueueUtilization > scaler.config.ScaleUpThreshold ||
+       current.AvgResponseTime > scaler.config.MaxResponseTime {
+
+        scaler.scaleUp(current)
+        return
+    }
+
+    // Scale down conditions
+    if current.QueueUtilization < scaler.config.ScaleDownThreshold &&
+       current.AvgResponseTime < scaler.config.TargetResponseTime {
+
+        scaler.scaleDown(current)
+        return
+    }
+}
+
+func (scaler *AdaptiveScaler) scaleUp(metrics *CurrentMetrics) {
+    currentWorkers := scaler.pool.GetWorkerCount()
+    targetWorkers := int(float64(currentWorkers) * scaler.config.ScaleUpFactor)
+
+    // Respect maximum limits
+    if targetWorkers > scaler.config.MaxWorkers {
+        targetWorkers = scaler.config.MaxWorkers
+    }
+
+    additionalWorkers := targetWorkers - currentWorkers
+    if additionalWorkers > 0 {
+        scaler.pool.AddWorkers(additionalWorkers)
+        scaler.lastScaleTime = time.Now()
+        scaler.metrics.RecordScaleUp(additionalWorkers)
+    }
+}
+```
+
+### **Provider-Specific Optimization**
+
+```go
+type ProviderOptimization struct {
+    // Provider characteristics
+    ProviderName     string        `json:"provider_name"`
+    RateLimit        int           `json:"rate_limit"`        // Requests per second
+    AvgLatency       time.Duration `json:"avg_latency"`       // Average response time
+    ErrorRate        float64       `json:"error_rate"`        // Historical error rate
+
+    // Optimal configuration
+    OptimalWorkers   int           `json:"optimal_workers"`
+    OptimalBuffer    int           `json:"optimal_buffer"`
+    TimeoutConfig    time.Duration `json:"timeout_config"`
+    RetryStrategy    RetryConfig   `json:"retry_strategy"`
+}
+
+func CalculateOptimalConcurrency(provider ProviderOptimization) ConcurrencyConfig {
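+    // Little's law: in-flight requests = rate (req/s) x latency (s).
+    // Multiply in floating point first so a sub-second latency does not
+    // truncate to zero workers (requires the "math" import).
+    inFlight := float64(provider.RateLimit) * provider.AvgLatency.Seconds()
+
+    // Adjust for error rate (more workers for higher error rate)
+    errorAdjustment := 1.0 + provider.ErrorRate
+    optimalWorkers := int(math.Ceil(inFlight * errorAdjustment))
+    if optimalWorkers < 1 {
+        optimalWorkers = 1
+    }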
+
+    // Buffer should be 2-3x worker count for smooth operation
+    optimalBuffer := optimalWorkers * 3
+
+    return ConcurrencyConfig{
+        Concurrency: optimalWorkers,
+        BufferSize:  optimalBuffer,
+        Timeout:     provider.AvgLatency * 2, // 2x avg latency for timeout
+    }
+}
+```
+
+---
+
+## Concurrency Monitoring & Metrics
+
+### **Key Concurrency Metrics**
+
+```mermaid
+graph TB
+    subgraph "Worker Metrics"
+        ActiveWorkers[Active Workers<br/>Current Count]
+        IdleWorkers[Idle Workers<br/>Available Count]
+        BusyWorkers[Busy Workers<br/>Processing Count]
+    end
+
+    subgraph "Queue Metrics"
+        QueueDepth[Queue Depth<br/>Pending Jobs]
+        QueueThroughput[Queue Throughput<br/>Jobs/Second]
+        QueueWaitTime[Queue Wait Time<br/>Average Delay]
+    end
+
+    subgraph "Performance Metrics"
+        GoroutineCount[Goroutine Count<br/>Total Active]
+        MemoryUsage[Memory Usage<br/>Pool Utilization]
+        GCPressure[GC Pressure<br/>Collection Frequency]
+    end
+
+    subgraph "Health Metrics"
+        ErrorRate[Error Rate<br/>Failed Jobs %]
+        PanicCount[Panic Count<br/>Crashed Goroutines]
+        DeadlockDetection[Deadlock Detection<br/>Blocked Operations]
+    end
+```
+
+**Metrics Collection Strategy:**
+
+Comprehensive concurrency monitoring provides operational insights and performance optimization data:
+
+**Worker Pool Monitoring:**
+
+- **Total Worker Tracking** - Monitor configured vs actual worker counts
+- **Active Worker Monitoring** - Track workers currently processing requests
+- **Idle Worker Analysis** - Identify unused capacity and optimization opportunities
+- **Queue Depth Monitoring** - Track pending job backlog and processing delays
+
+**Performance Data Collection:**
+
+- **Throughput Metrics** - Measure jobs processed per second across all pools
+- **Wait Time Analysis** - Track how long jobs wait in queues before processing
+- **Memory Pool Performance** - Monitor hit/miss ratios for memory pool effectiveness
+- **Goroutine Count Tracking** - Ensure goroutine counts remain within healthy limits
+
+**Health and Reliability Metrics:**
+
+- **Panic Recovery Tracking** - Count and analyze worker panic occurrences
+- **Timeout Monitoring** - Track jobs that exceed processing time limits
+- **Circuit Breaker Events** - Monitor provider isolation events and recoveries
+- **Error Rate Analysis** - Track failure patterns for capacity planning
+
+**Real-Time Updates:**
+
+- **Live Metric Updates** - Worker metrics are updated continuously during operation
+- **Processing Event Recording** - Each job completion updates relevant metrics
+- **Performance Correlation** - Queue times and processing times are correlated for analysis
+- **Success/Failure Tracking** - All job outcomes are recorded for reliability analysis
+
+---
+
+## Deadlock Prevention & Detection
+
+### **Deadlock Prevention Strategies**
+
+```mermaid
+flowchart TD
+    Strategy1[Lock Ordering<br/>Consistent Acquisition]
+    Strategy2[Timeout-Based Locks<br/>Context Cancellation]
+    Strategy3[Channel Select<br/>Non-blocking Operations]
+    Strategy4[Resource Hierarchy<br/>Layered Locking]
+
+    Prevention[Deadlock Prevention<br/>Design Patterns]
+
+    Prevention --> Strategy1
+    Prevention --> Strategy2
+    Prevention --> Strategy3
+    Prevention --> Strategy4
+
+    Strategy1 --> Success[No Deadlocks<br/>Guaranteed Order]
+    Strategy2 --> Success
+    Strategy3 --> Success
+    Strategy4 --> Success
+```
+
+**Deadlock Prevention Implementation Strategy:**
+
+Bifrost employs multiple complementary strategies to prevent deadlocks in concurrent operations:
+
+**Lock Ordering Management:**
+
+- **Consistent Acquisition Order** - All locks are acquired in a predetermined order
+- **Global Lock Registry** - Centralized registry maintains lock ordering relationships
+- **Order Enforcement** - Lock acquisition automatically sorts by predetermined order
+- **Dependency Tracking** - Lock dependencies are mapped to prevent circular waits
+
+**Timeout-Based Protection:**
+
+- **Default Timeouts** - All lock acquisitions have reasonable timeout limits
+- **Context Cancellation** - Operations respect context cancellation for cleanup
+- **Maximum Timeout Limits** - Upper bounds prevent indefinite blocking
+- **Graceful Timeout Handling** - Timeout errors provide meaningful context
+
+**Multi-Lock Acquisition Process:**
+
+- **Ordered Sorting** - Multiple locks are sorted before acquisition attempts
+- **Progressive Acquisition** - Locks are acquired one by one in sorted order
+- **Failure Recovery** - Failed acquisitions trigger automatic cleanup of held locks
+- **Resource Tracking** - All acquired locks are tracked for proper release
+
+**Lock Acquisition Safety:**
+
+- **Non-Blocking Detection** - Channel-based lock attempts prevent indefinite blocking
+- **Timeout Enforcement** - All lock attempts respect configured timeout limits
+- **Error Propagation** - Lock failures are properly propagated with context
+- **Cleanup Guarantees** - Failed operations always clean up partially acquired resources
+
+**Deadlock Detection and Recovery:**
+
+- **Active Monitoring** - Continuous monitoring for potential deadlock conditions
+- **Automatic Recovery** - Detected deadlocks trigger automatic resolution procedures
+- **Resource Release** - Deadlock resolution involves strategic resource release
+- **Prevention Learning** - Deadlock patterns inform prevention strategy improvements
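+
+Ordered multi-lock acquisition with timeouts and cleanup, as described above, can be sketched as follows. The names `OrderedLock` and `AcquireAll` and the integer `rank` are hypothetical, and a one-slot channel stands in for whatever lock primitive Bifrost uses, since `sync.Mutex` has no timed acquire.
+
+```go
+package main
+
+import (
+    "context"
+    "fmt"
+    "sort"
+    "time"
+)
+
+// OrderedLock is a channel-backed lock with a fixed position in the global order.
+type OrderedLock struct {
+    rank int
+    slot chan struct{}
+}
+
+func NewOrderedLock(rank int) *OrderedLock {
+    return &OrderedLock{rank: rank, slot: make(chan struct{}, 1)}
+}
+
+// Acquire takes the lock or gives up when ctx is cancelled or times out.
+func (l *OrderedLock) Acquire(ctx context.Context) error {
+    select {
+    case l.slot <- struct{}{}:
+        return nil
+    case <-ctx.Done():
+        return fmt.Errorf("lock %d: %w", l.rank, ctx.Err())
+    }
+}
+
+// Release frees the lock; call it only while holding the lock.
+func (l *OrderedLock) Release() { <-l.slot }
+
+// AcquireAll takes the locks in ascending rank order, whatever order the caller
+// passed them in, and releases everything already held if any acquisition fails.
+func AcquireAll(ctx context.Context, locks ...*OrderedLock) (func(), error) {
+    sorted := append([]*OrderedLock(nil), locks...)
+    sort.Slice(sorted, func(i, j int) bool { return sorted[i].rank < sorted[j].rank })
+
+    held := make([]*OrderedLock, 0, len(sorted))
+    release := func() {
+        for i := len(held) - 1; i >= 0; i-- {
+            held[i].Release()
+        }
+    }
+    for _, l := range sorted {
+        if err := l.Acquire(ctx); err != nil {
+            release()
+            return nil, err
+        }
+        held = append(held, l)
+    }
+    return release, nil
+}
+
+func main() {
+    a, b := NewOrderedLock(1), NewOrderedLock(2)
+    ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
+    defer cancel()
+
+    release, err := AcquireAll(ctx, b, a) // argument order does not matter
+    if err != nil {
+        fmt.Println("acquire failed:", err)
+        return
+    }
+    defer release()
+    fmt.Println("holding both locks")
+}
+```
+
+Because every caller takes locks in the same rank order, no two callers can each hold a lock the other is waiting for, and the timeout bounds the wait if a holder stalls.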
+
+---
+
+## Related Architecture Documentation
+
+- **[Request Flow](./request-flow)** - How concurrency fits in request processing
+- **[Benchmarks](../../benchmarking/getting-started)** - Concurrency performance characteristics
+- **[Plugin System](./plugins)** - Plugin concurrency considerations
+- **[MCP System](./mcp)** - MCP concurrency and worker integration
+
+## Usage Documentation
+
+- **[Provider Configuration](../../quickstart/gateway/provider-configuration)** - Configure concurrency settings per provider
+- **[Performance Analysis](../../benchmarking/getting-started)** - Memory pool configuration and optimization
+- **[Performance Monitoring](../../features/telemetry)** - Monitor concurrency metrics and health
+- **[Go SDK Usage](../../quickstart/go-sdk/setting-up)** - Use Bifrost concurrency in Go applications
+- **[Gateway Setup](../../quickstart/gateway/setting-up)** - Deploy Bifrost with optimal concurrency settings
+
+---
+
+**🎯 Next Step:** Understand how plugins integrate with the concurrency model in **[Plugin System](./plugins)**.
--- a/docs/architecture/core/mcp.mdx
+++ b/docs/architecture/core/mcp.mdx
@@ -0,0 +1,985 @@
+---
+title: "Model Context Protocol (MCP)"
+description: "Deep dive into Bifrost's Model Context Protocol (MCP) integration - how external tool discovery, execution, and integration work internally."
+icon: "toolbox"
+---
+
+## MCP Architecture Overview
+
+### **What is MCP in Bifrost?**
+
+The Model Context Protocol (MCP) system in Bifrost enables AI models to seamlessly discover and execute external tools, transforming static chat models into dynamic, action-capable agents. This architecture bridges the gap between AI reasoning and real-world tool execution.
+
+**Core MCP Principles:**
+
+- **Dynamic Discovery** - Tools are discovered at runtime, not hardcoded
+- **Client-Side Execution** - Bifrost controls all tool execution for security
+- **Multi-Protocol Support** - InProcess, STDIO, HTTP, and SSE connection types
+- **Request-Level Filtering** - Granular control over tool availability
+- **Async Execution** - Non-blocking tool invocation and response handling
+
+### **MCP System Components**
+
+```mermaid
+graph TB
+    subgraph "MCP Management Layer"
+        MCPMgr[MCP Manager<br/>Central Controller]
+        ClientRegistry[Client Registry<br/>Connection Management]
+        ToolDiscovery[Tool Discovery<br/>Runtime Registration]
+    end
+
+    subgraph "MCP Execution Layer"
+        ToolFilter[Tool Filter<br/>Access Control]
+        ToolExecutor[Tool Executor<br/>Invocation Engine]
+        ResultProcessor[Result Processor<br/>Response Handling]
+    end
+
+    subgraph "Connection Types"
+        STDIOConn[STDIO Connections<br/>Command-line Tools]
+        HTTPConn[HTTP Connections<br/>Web Services]
+        SSEConn[SSE Connections<br/>Real-time Streams]
+    end
+
+    subgraph "External MCP Servers"
+        FileSystem[Filesystem Tools<br/>File Operations]
+        WebSearch[Web Search<br/>Information Retrieval]
+        Database[Database Tools<br/>Data Access]
+        Custom[Custom Tools<br/>Business Logic]
+    end
+
+    MCPMgr --> ClientRegistry
+    ClientRegistry --> ToolDiscovery
+    ToolDiscovery --> ToolFilter
+    ToolFilter --> ToolExecutor
+    ToolExecutor --> ResultProcessor
+
+    ClientRegistry --> STDIOConn
+    ClientRegistry --> HTTPConn
+    ClientRegistry --> SSEConn
+
+    STDIOConn --> FileSystem
+    HTTPConn --> WebSearch
+    HTTPConn --> Database
+    STDIOConn --> Custom
+```
+
+---
+
+## MCP Connection Architecture
+
+### **Multi-Protocol Connection System**
+
+Bifrost supports four MCP connection types, each optimized for different tool deployment patterns:
+
+```mermaid
+graph TB
+    subgraph "InProcess Connections"
+        InProcess[In-Memory Tools<br/>Same Process]
+        InProcessEx[Examples:<br/>• Embedded tools<br/>• High-perf operations<br/>• Testing tools]
+    end
+
+    subgraph "STDIO Connections"
+        STDIO[Command Line Tools<br/>Local Execution]
+        STDIOEx[Examples:<br/>• Filesystem tools<br/>• Local scripts<br/>• CLI utilities]
+    end
+
+    subgraph "HTTP Connections"
+        HTTP[Web Service Tools<br/>Remote APIs]
+        HTTPEx[Examples:<br/>• Web search APIs<br/>• Database services<br/>• External integrations]
+    end
+
+    subgraph "SSE Connections"
+        SSE[Real-time Tools<br/>Streaming Data]
+        SSEEx[Examples:<br/>• Live data feeds<br/>• Real-time monitoring<br/>• Event streams]
+    end
+
+    subgraph "Connection Characteristics"
+        Latency[Latency:<br/>InProcess < STDIO < HTTP < SSE]
+        Security[Security:<br/>InProcess/Local > HTTP > SSE]
+        Scalability[Scalability:<br/>HTTP > SSE > STDIO > InProcess]
+        Complexity[Complexity:<br/>InProcess < STDIO < HTTP < SSE]
+    end
+
+    InProcess --> Latency
+    STDIO --> Latency
+    HTTP --> Security
+    SSE --> Scalability
+    HTTP --> Complexity
+```
+
+### **Connection Type Details**
+
+**InProcess Connections (In-Memory Tools):**
+
+- **Use Case:** Embedded tools, high-performance operations, testing
+- **Performance:** Lowest possible latency (~0.1ms) with no IPC overhead
+- **Security:** Highest security as tools run in the same process
+- **Limitations:** Go package only, cannot be configured via JSON
+
+**STDIO Connections (Local Tools):**
+
+- **Use Case:** Command-line tools, local scripts, filesystem operations
+- **Performance:** Low latency (~1-10ms) due to local execution
+- **Security:** High security with full local control
+- **Limitations:** Single-server deployment, resource sharing
+
+**HTTP Connections (Remote Services):**
+
+- **Use Case:** Web APIs, microservices, cloud functions
+- **Performance:** Network-dependent latency (~10-500ms)
+- **Security:** Configurable with authentication and encryption
+- **Advantages:** Scalable, multi-server deployment, service isolation
+
+**SSE Connections (Streaming Tools):**
+
+- **Use Case:** Real-time data feeds, live monitoring, event streams
+- **Performance:** Variable latency depending on stream frequency
+- **Security:** Similar to HTTP with streaming capabilities
+- **Benefits:** Real-time updates, persistent connections, event-driven
+
+> **MCP Configuration:** [MCP Setup Guide →](../../mcp/overview)
+
+---
+
+## Tool Discovery & Registration
+
+### **Dynamic Tool Discovery Process**
+
+The MCP system discovers tools at runtime rather than requiring static configuration, enabling flexible and adaptive tool availability:
+
+```mermaid
+sequenceDiagram
+    participant Bifrost
+    participant MCPManager
+    participant MCPServer
+    participant ToolRegistry
+    participant AIModel
+
+    Note over Bifrost: System Startup
+    Bifrost->>MCPManager: Initialize MCP System
+    MCPManager->>MCPServer: Establish Connection
+    MCPServer-->>MCPManager: Connection Ready
+
+    MCPManager->>MCPServer: List Available Tools
+    MCPServer-->>MCPManager: Tool Definitions
+    MCPManager->>ToolRegistry: Register Tools
+
+    Note over Bifrost: Runtime Request Processing
+    AIModel->>MCPManager: Request Available Tools
+    MCPManager->>ToolRegistry: Query Tools
+    ToolRegistry-->>MCPManager: Filtered Tool List
+    MCPManager-->>AIModel: Available Tools
+
+    AIModel->>MCPManager: Execute Tool Call
+    MCPManager->>MCPServer: Tool Invocation
+    MCPServer->>MCPServer: Execute Tool Logic
+    MCPServer-->>MCPManager: Tool Result
+    MCPManager-->>AIModel: Enhanced Response
+```
+
+### **Tool Registry Management**
+
+**Registration Process:**
+
+1. **Connection Establishment** - MCP client connects to configured servers
+2. **Capability Exchange** - Server announces available tools and schemas
+3. **Tool Validation** - Bifrost validates tool definitions and security
+4. **Registry Update** - Tools are registered in the internal tool registry
+5. **Availability Notification** - Tools become available for AI model use
+
+**Registry Features:**
+
+- **Dynamic Updates** - Tools can be added/removed during runtime
+- **Version Management** - Support for tool versioning and compatibility
+- **Access Control** - Request-level tool filtering and permissions
+- **Health Monitoring** - Continuous tool availability checking
+
+**Tool Metadata Structure:**
+
+- **Name & Description** - Human-readable tool identification
+- **Parameters Schema** - JSON schema for tool input validation
+- **Return Schema** - Expected response format definition
+- **Capabilities** - Tool feature flags and limitations
+- **Authentication** - Required credentials and permissions
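+
+The metadata fields and registry features above could be modeled roughly as follows. This is a minimal sketch, not Bifrost's actual registry: `ToolMetadata`, `ToolRegistry`, and all field and method names are hypothetical, chosen only to mirror the lists above.
+
+```go
+package main
+
+import (
+    "errors"
+    "fmt"
+    "sort"
+    "sync"
+)
+
+// ToolMetadata mirrors the metadata fields listed above.
+// Field names are illustrative, not Bifrost's actual schema.
+type ToolMetadata struct {
+    Name         string
+    Description  string
+    ParamsSchema map[string]any // JSON schema for tool input
+    ReturnSchema map[string]any // expected response format
+    Capabilities []string
+    RequiresAuth bool
+}
+
+// ToolRegistry is a hypothetical in-memory registry that allows tools
+// to be added and removed while requests are being served.
+type ToolRegistry struct {
+    mu    sync.RWMutex
+    tools map[string]ToolMetadata
+}
+
+func NewToolRegistry() *ToolRegistry {
+    return &ToolRegistry{tools: make(map[string]ToolMetadata)}
+}
+
+// Register validates a tool definition and stores it.
+func (r *ToolRegistry) Register(t ToolMetadata) error {
+    if t.Name == "" {
+        return errors.New("tool name is required")
+    }
+    if t.ParamsSchema == nil {
+        return fmt.Errorf("tool %q has no parameters schema", t.Name)
+    }
+    r.mu.Lock()
+    defer r.mu.Unlock()
+    r.tools[t.Name] = t
+    return nil
+}
+
+// Unregister removes a tool, for example when its server disconnects.
+func (r *ToolRegistry) Unregister(name string) {
+    r.mu.Lock()
+    defer r.mu.Unlock()
+    delete(r.tools, name)
+}
+
+// Names returns the registered tool names in sorted order.
+func (r *ToolRegistry) Names() []string {
+    r.mu.RLock()
+    defer r.mu.RUnlock()
+    names := make([]string, 0, len(r.tools))
+    for name := range r.tools {
+        names = append(names, name)
+    }
+    sort.Strings(names)
+    return names
+}
+
+func main() {
+    reg := NewToolRegistry()
+    schema := map[string]any{"type": "object"}
+    _ = reg.Register(ToolMetadata{Name: "read_file", ParamsSchema: schema})
+    _ = reg.Register(ToolMetadata{Name: "list_directory", ParamsSchema: schema})
+    reg.Unregister("read_file")
+    fmt.Println(reg.Names())
+}
+```
+
+A read-write mutex is used here because lookups happen on every request while registrations are comparatively rare.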
+
+---
+
+## Tool Filtering & Access Control
+
+### **Multi-Level Filtering System**
+
+Bifrost controls tool availability at three levels: a global MCP switch, per-client include/exclude lists, and per-tool include/exclude lists:
+
+```mermaid
+flowchart TD
+    Request[Incoming Request] --> GlobalFilter{Global MCP Filter}
+    GlobalFilter -->|Enabled| ClientFilter[MCP Client Filtering]
+    GlobalFilter -->|Disabled| NoMCP[No MCP Tools]
+
+    ClientFilter --> IncludeClients{Include Clients?}
+    IncludeClients -->|Yes| IncludeList[Include Specified<br/>MCP Clients]
+    IncludeClients -->|No| AllClients[All MCP Clients]
+
+    IncludeList --> ExcludeClients{Exclude Clients?}
+    AllClients --> ExcludeClients
+    ExcludeClients -->|Yes| RemoveClients[Remove Excluded<br/>MCP Clients]
+    ExcludeClients -->|No| ClientsFiltered[Filtered Clients]
+
+    RemoveClients --> ToolFilter[Tool-Level Filtering]
+    ClientsFiltered --> ToolFilter
+
+    ToolFilter --> IncludeTools{Include Tools?}
+    IncludeTools -->|Yes| IncludeSpecific[Include Specified<br/>Tools Only]
+    IncludeTools -->|No| AllTools[All Available Tools]
+
+    IncludeSpecific --> ExcludeTools{Exclude Tools?}
+    AllTools --> ExcludeTools
+    ExcludeTools -->|Yes| RemoveTools[Remove Excluded<br/>Tools]
+    ExcludeTools -->|No| FinalTools[Final Tool Set]
+
+    RemoveTools --> FinalTools
+    FinalTools --> AIModel[Available to AI Model]
+    NoMCP --> AIModel
+```
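+
+The stages in the flowchart can be read as a single pass over the candidate tools. The sketch below is illustrative only: the `Tool` and `Filter` types and the `Apply` function are hypothetical, and the `client-tool` identifier format is an assumption based on the header values shown in the request-level examples.
+
+```go
+package main
+
+import "fmt"
+
+// Tool identifies a tool by the MCP client that provides it.
+type Tool struct {
+    Client string
+    Name   string
+}
+
+// Filter holds the include and exclude lists for one request.
+// An empty include list means "include everything".
+type Filter struct {
+    IncludeClients, ExcludeClients []string
+    IncludeTools, ExcludeTools     []string
+}
+
+func contains(list []string, s string) bool {
+    for _, v := range list {
+        if v == s {
+            return true
+        }
+    }
+    return false
+}
+
+// Apply walks the stages of the flowchart: global switch, client
+// include/exclude, then tool include/exclude.
+func Apply(mcpEnabled bool, f Filter, tools []Tool) []Tool {
+    if !mcpEnabled {
+        return nil // global filter disabled: no MCP tools
+    }
+    var out []Tool
+    for _, t := range tools {
+        id := t.Client + "-" + t.Name // assumed "client-tool" naming
+        switch {
+        case len(f.IncludeClients) > 0 && !contains(f.IncludeClients, t.Client):
+            // client not in the include list
+        case contains(f.ExcludeClients, t.Client):
+            // client explicitly excluded
+        case len(f.IncludeTools) > 0 && !contains(f.IncludeTools, id):
+            // tool not in the include list
+        case contains(f.ExcludeTools, id):
+            // tool explicitly excluded
+        default:
+            out = append(out, t)
+        }
+    }
+    return out
+}
+
+func main() {
+    tools := []Tool{
+        {"filesystem", "read_file"},
+        {"filesystem", "write_file"},
+        {"websearch", "search"},
+    }
+    f := Filter{
+        IncludeClients: []string{"filesystem"},
+        ExcludeTools:   []string{"filesystem-write_file"},
+    }
+    fmt.Println(Apply(true, f, tools))
+}
+```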
+
+### **Filtering Configuration Levels**
+
+**Request-Level Filtering:**
+
+```bash
+# Include only specific MCP clients
+curl -X POST http://localhost:8080/v1/chat/completions \
+  -H "x-bf-mcp-include-clients: filesystem,websearch" \
+  -d '{"model": "gpt-4o-mini", "messages": [...]}'
+
+# Include only specific tools
+curl -X POST http://localhost:8080/v1/chat/completions \
+  -H "x-bf-mcp-include-tools: filesystem-read_file,websearch-search" \
+  -d '{"model": "gpt-4o-mini", "messages": [...]}'
+```
+
+**Configuration-Level Filtering:**
+
+- **Client Selection** - Choose which MCP servers to connect to
+- **Tool Blacklisting** - Permanently disable dangerous or unwanted tools
+- **Permission Mapping** - Map user roles to available tool sets
+- **Environment-Based** - Different tool sets for development vs production
+
+**Security Benefits:**
+
+- **Principle of Least Privilege** - Only necessary tools are exposed
+- **Dynamic Access Control** - Per-request tool availability
+- **Audit Trail** - Track which tools are used by which requests
+- **Risk Mitigation** - Prevent access to dangerous operations
+
+> **📖 Tool Filtering:** [MCP Tool Control →](../../mcp/filtering)
+
+---
+
+## Tool Execution Engine
+
+### **Async Tool Execution Architecture**
+
+The MCP execution engine handles tool invocation asynchronously to maintain system responsiveness and enable complex multi-tool workflows:
+
+```mermaid
+sequenceDiagram
+    participant AIModel
+    participant ExecutionEngine
+    participant ToolInvoker
+    participant MCPServer
+    participant ResultProcessor
+
+    AIModel->>ExecutionEngine: Tool Call Request
+    ExecutionEngine->>ExecutionEngine: Validate Tool Call
+    ExecutionEngine->>ToolInvoker: Queue Tool Execution
+
+    Note over ToolInvoker: Async Tool Execution
+    ToolInvoker->>MCPServer: Invoke Tool
+    MCPServer->>MCPServer: Execute Tool Logic
+    MCPServer-->>ToolInvoker: Raw Tool Result
+
+    ToolInvoker->>ResultProcessor: Process Result
+    ResultProcessor->>ResultProcessor: Format & Validate
+    ResultProcessor-->>ExecutionEngine: Processed Result
+
+    ExecutionEngine-->>AIModel: Tool Execution Complete
+
+    Note over AIModel: Multi-turn Conversation
+    AIModel->>ExecutionEngine: Continue with Tool Results
+    ExecutionEngine->>ExecutionEngine: Merge Results into Context
+    ExecutionEngine-->>AIModel: Enhanced Response
+```
+
+### **Execution Flow Characteristics**
+
+**Validation Phase:**
+
+- **Parameter Validation** - Ensure tool arguments match expected schema
+- **Permission Checking** - Verify tool access permissions for the request
+- **Rate Limiting** - Apply per-tool and per-user rate limits
+- **Security Scanning** - Check for potentially dangerous operations
+
+**Execution Phase:**
+
+- **Timeout Management** - Bounded execution time to prevent hanging
+- **Error Handling** - Graceful handling of tool failures and timeouts
+- **Result Streaming** - Support for tools that return streaming responses
+- **Resource Monitoring** - Track tool resource usage and performance
+
+**Response Phase:**
+
+- **Result Formatting** - Convert tool outputs to consistent format
+- **Error Enrichment** - Add context and suggestions for tool failures
+- **Multi-Result Aggregation** - Combine multiple tool outputs coherently
+- **Context Integration** - Merge tool results into conversation context
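+
+The timeout management described in the execution phase can be illustrated with standard `context` handling. This is a sketch under assumptions, not Bifrost's execution engine: `ToolFunc` and `ExecuteWithTimeout` are hypothetical names.
+
+```go
+package main
+
+import (
+    "context"
+    "errors"
+    "fmt"
+    "time"
+)
+
+// ToolFunc is a hypothetical signature for a tool invocation.
+type ToolFunc func(ctx context.Context, args map[string]any) (string, error)
+
+// ExecuteWithTimeout runs the tool in its own goroutine and returns
+// either its result or an error once the timeout expires.
+func ExecuteWithTimeout(parent context.Context, timeout time.Duration, tool ToolFunc, args map[string]any) (string, error) {
+    ctx, cancel := context.WithTimeout(parent, timeout)
+    defer cancel()
+
+    type outcome struct {
+        result string
+        err    error
+    }
+    done := make(chan outcome, 1) // buffered so the goroutine never blocks
+
+    go func() {
+        res, err := tool(ctx, args)
+        done <- outcome{res, err}
+    }()
+
+    select {
+    case o := <-done:
+        if o.err != nil {
+            return "", fmt.Errorf("tool failed: %w", o.err)
+        }
+        return o.result, nil
+    case <-ctx.Done():
+        return "", fmt.Errorf("tool timed out after %s: %w", timeout, ctx.Err())
+    }
+}
+
+func main() {
+    // A slow tool that respects cancellation.
+    slow := func(ctx context.Context, _ map[string]any) (string, error) {
+        select {
+        case <-time.After(time.Second):
+            return "done", nil
+        case <-ctx.Done():
+            return "", ctx.Err()
+        }
+    }
+    _, err := ExecuteWithTimeout(context.Background(), 50*time.Millisecond, slow, nil)
+    fmt.Println("timed out:", errors.Is(err, context.DeadlineExceeded))
+}
+```
+
+The result channel is buffered so that a tool finishing after the deadline can still send its result and exit instead of leaking a goroutine.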
+
+### **Multi-Turn Conversation Support**
+
+The MCP system supports multi-turn conversations in which an AI model works through the following stages:
+
+1. **Initial Tool Discovery** - Request available tools for a given context
+2. **Tool Execution** - Execute one or more tools based on user request
+3. **Result Analysis** - Analyze tool outputs and determine next steps
+4. **Follow-up Actions** - Execute additional tools based on previous results
+5. **Response Synthesis** - Combine tool results into coherent user response
+
+**Example Multi-Turn Flow:**
+
+```
+User: "Find recent news about AI and save interesting articles"
+AI: → Execute web_search("AI news recent")
+AI: → Analyze search results
+AI: → Execute save_article() for each interesting result
+AI: → Respond with summary of saved articles
+```
+
+### **Complete User-Controlled Tool Execution Flow**
+
+The following diagram shows the end-to-end user experience with MCP tool execution, highlighting the critical user control points and decision-making process:
+
+```mermaid
+flowchart TD
+    A["👤 User Message<br/>\"List files in current directory\""] --> B["🤖 Bifrost Core"]
+
+    B --> C["🔧 MCP Manager<br/>Auto-discovers and adds<br/>available tools to request"]
+
+    C --> D["🌐 LLM Provider<br/>(OpenAI, Anthropic, etc.)"]
+
+    D --> E{"🔍 Response contains<br/>tool_calls?"}
+
+    E -->|No| F["✅ Final Response<br/>Display to user"]
+
+    E -->|Yes| G["📝 Add assistant message<br/>with tool_calls to history"]
+
+    G --> H["🛡️ YOUR EXECUTION LOGIC<br/>(Security, Approval, Logging)"]
+
+    H --> I{"🤔 User Decision Point<br/>Execute this tool?"}
+
+    I -->|Deny| J["❌ Create denial result<br/>Add to conversation history"]
+
+    I -->|Approve| K["⚙️ client.ExecuteMCPTool()<br/>Bifrost executes via MCP"]
+
+    K --> L["📊 Tool Result<br/>Add to conversation history"]
+
+    J --> M["🔄 Continue conversation loop<br/>Send updated history back to LLM"]
+    L --> M
+
+    M --> D
+
+    style A fill:#e1f5fe
+    style F fill:#e8f5e8
+    style H fill:#fff3e0
+    style I fill:#fce4ec
+    style K fill:#f3e5f5
+```
+
+**Key Flow Characteristics:**
+
+**User Control Points:**
+
+- **Security Layer** - Your application controls all tool execution decisions
+- **Approval Gate** - Users can approve or deny each tool execution
+- **Transparency** - Full visibility into what tools will be executed and why
+- **Conversation Continuity** - Tool results seamlessly integrate into conversation flow
+
+**Security Benefits:**
+
+- **No Automatic Execution** - In this flow, tools never execute without explicit approval
+- **Audit Trail** - Complete logging of all tool execution decisions
+- **Contextual Security** - Approval decisions can consider full conversation context
+- **Graceful Denials** - Denied tools result in informative responses, not errors
+
+**Implementation Patterns:**
+
+```go
+// Example tool execution control in your application.
+// UserContext and the helper functions are placeholders for your own code;
+// client is your initialized Bifrost client.
+func handleToolExecution(ctx context.Context, toolCall schemas.ChatToolCall, userContext UserContext) error {
+    // YOUR SECURITY AND APPROVAL LOGIC HERE
+    if !userContext.HasPermission(toolCall.Function.Name) {
+        return createDenialResponse("Tool not permitted for user role")
+    }
+
+    // Ask the user only for tools that your policy marks as sensitive
+    if requiresApproval(toolCall) {
+        approved := promptUserForApproval(toolCall)
+        if !approved {
+            return createDenialResponse("User denied tool execution")
+        }
+    }
+
+    // Execute the tool via Bifrost
+    result, err := client.ExecuteMCPTool(ctx, toolCall)
+    if err != nil {
+        return handleToolError(err)
+    }
+
+    // Append the result so the next LLM call sees it
+    return addToolResultToHistory(result)
+}
+```
+
+This flow ensures that while AI models can discover and request tool usage, all actual execution remains under user control, balancing AI capability with human oversight.
+
+---
+
+## Agent Mode Architecture
+
+Agent Mode transforms Bifrost into an autonomous agent runtime by automatically executing pre-approved tools. This section details the internal architecture of the agent execution loop.
+
+### **Agent Execution Loop**
+
+The agent mode operates as an iterative loop that continues until one of the termination conditions is met:
+
+```mermaid
+flowchart TD
+    subgraph "Agent Mode Entry"
+        A["📥 Incoming Chat Request"] --> B{"🔍 Check MCP Config<br/>Any tools_to_auto_execute?"}
+        B -->|No| C["📤 Standard Flow<br/>Return tool_calls for manual execution"]
+        B -->|Yes| D["🤖 Enter Agent Loop"]
+    end
+
+    subgraph "Agent Execution Loop"
+        D --> E["🌐 Send to LLM Provider<br/>With available tools"]
+        E --> F{"🔧 Response has<br/>tool_calls?"}
+        F -->|No| G["✅ Return Final Response<br/>No more tools needed"]
+        F -->|Yes| H["📋 Classify Tool Calls"]
+
+        H --> I{"🔐 Separate by<br/>auto-execute status"}
+        I --> J["⚡ Auto-Executable Tools"]
+        I --> K["🛡️ Non-Auto-Executable Tools"]
+
+        J --> L["🔄 Execute in Parallel<br/>Via ToolsManager"]
+        L --> M["📊 Collect Results"]
+
+        K --> N{"Any non-auto<br/>tools found?"}
+        N -->|Yes| O["🛑 Exit Loop Early<br/>Return mixed response"]
+        N -->|No| P{"⏱️ Max depth<br/>reached?"}
+
+        M --> P
+        P -->|Yes| Q["⚠️ Return Current State<br/>May have pending tools"]
+        P -->|No| R["📝 Add results to history"]
+        R --> E
+    end
+
+    subgraph "Response Handling"
+        O --> S["📦 Create Mixed Response<br/>• Content: executed results JSON<br/>• tool_calls: pending tools<br/>• finish_reason: stop"]
+        G --> T["📦 Standard Response<br/>Final answer from LLM"]
+        Q --> U["📦 Depth Limit Response<br/>Current state with any pending"]
+    end
+
+    style D fill:#e3f2fd
+    style L fill:#e8f5e9
+    style O fill:#fff3e0
+    style S fill:#fce4ec
+```
+
+### **Tool Classification System**
+
+When the LLM returns tool calls, Bifrost classifies each tool based on the client configuration:
+
+```mermaid
+flowchart LR
+    subgraph "Tool Call Classification"
+        TC["🔧 Tool Call<br/>from LLM Response"] --> CHECK{"Tool in<br/>tools_to_execute?"}
+        CHECK -->|No| SKIP["❌ Skip<br/>Not allowed"]
+        CHECK -->|Yes| AUTO{"Tool in<br/>tools_to_auto_execute?"}
+        AUTO -->|Yes| EXEC["⚡ Auto-Execute<br/>Run immediately"]
+        AUTO -->|No| MANUAL["🛡️ Manual<br/>Return to caller"]
+    end
+
+    subgraph "Configuration Example"
+        CONFIG["MCPClientConfig"]
+        CONFIG --> TE["tools_to_execute: [*]<br/>All tools available"]
+        CONFIG --> TAE["tools_to_auto_execute:<br/>[read_file, list_dir]"]
+    end
+
+    style EXEC fill:#c8e6c9
+    style MANUAL fill:#fff9c4
+    style SKIP fill:#ffcdd2
+```
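+
+The classification rules in the diagram reduce to two list lookups. The following sketch uses hypothetical local types rather than Bifrost's own, and it assumes the `*` wildcard is honored in both lists (the diagram only shows it for `tools_to_execute`).
+
+```go
+package main
+
+import "fmt"
+
+// Decision is the outcome of classifying one tool call.
+type Decision string
+
+const (
+    Skip        Decision = "skip"         // not allowed at all
+    AutoExecute Decision = "auto_execute" // run immediately
+    Manual      Decision = "manual"       // return to caller for approval
+)
+
+// ClientConfig is a hypothetical stand-in holding the two lists
+// named in the diagram.
+type ClientConfig struct {
+    ToolsToExecute     []string
+    ToolsToAutoExecute []string
+}
+
+// listed reports whether tool appears in list, treating "*" as a wildcard.
+func listed(list []string, tool string) bool {
+    for _, v := range list {
+        if v == "*" || v == tool {
+            return true
+        }
+    }
+    return false
+}
+
+// Classify applies the two checks in the order shown in the diagram.
+func Classify(cfg ClientConfig, tool string) Decision {
+    if !listed(cfg.ToolsToExecute, tool) {
+        return Skip
+    }
+    if listed(cfg.ToolsToAutoExecute, tool) {
+        return AutoExecute
+    }
+    return Manual
+}
+
+func main() {
+    cfg := ClientConfig{
+        ToolsToExecute:     []string{"*"},
+        ToolsToAutoExecute: []string{"read_file", "list_dir"},
+    }
+    for _, tool := range []string{"read_file", "write_file"} {
+        fmt.Println(tool, "=>", Classify(cfg, tool))
+    }
+}
+```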
+
+### **Mixed Tool Response Format**
+
+When a response contains both auto-executable and non-auto-executable tools, the agent creates a special response format:
+
+<AccordionGroup>
+  <Accordion title="Chat API Response Format" icon="message" defaultOpen>
+
+```json
+{
+  "id": "chatcmpl-abc123",
+  "choices": [{
+    "index": 0,
+    "finish_reason": "stop",
+    "message": {
+      "role": "assistant",
+      "content": "The Output from allowed tools calls is - {\"filesystem_read_file\":\"file contents here\",\"filesystem_list_directory\":\"[\\\"file1.txt\\\",\\\"file2.txt\\\"]\"}\n\nNow I shall call these tools next...",
+      "tool_calls": [
+        {
+          "id": "call_write_123",
+          "type": "function",
+          "function": {
+            "name": "filesystem_write_file",
+            "arguments": "{\"path\":\"output.txt\",\"content\":\"...\"}"
+          }
+        }
+      ]
+    }
+  }]
+}
+```
+
+<Note>
+The `content` field contains JSON-formatted results from auto-executed tools. The `tool_calls` array contains only non-auto-executable tools awaiting approval. Setting `finish_reason` to `"stop"` ensures the agent loop exits.
+</Note>
+
+  </Accordion>
+
+  <Accordion title="Responses API Format" icon="code">
+
+```json
+{
+  "id": "resp-abc123",
+  "output": [
+    {
+      "type": "message",
+      "role": "assistant",
+      "content": [{
+        "type": "text",
+        "text": "The Output from allowed tools calls is - {...}\n\nNow I shall call these tools next..."
+      }]
+    },
+    {
+      "type": "function_call",
+      "role": "assistant",
+      "call_id": "call_write_123",
+      "name": "filesystem_write_file",
+      "arguments": "{\"path\":\"output.txt\",\"content\":\"...\"}"
+    }
+  ]
+}
+```
+
+  </Accordion>
+</AccordionGroup>
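+
+A caller receiving the Chat API form of this mixed response needs to pull out the tool calls that still await approval. The sketch below decodes only the fields shown in the example above; the struct and function names are hypothetical and are not taken from the Bifrost SDK.
+
+```go
+package main
+
+import (
+    "encoding/json"
+    "fmt"
+)
+
+// ToolCall matches the tool_calls entries in the example response.
+type ToolCall struct {
+    ID       string `json:"id"`
+    Type     string `json:"type"`
+    Function struct {
+        Name      string `json:"name"`
+        Arguments string `json:"arguments"`
+    } `json:"function"`
+}
+
+// ChatResponse declares only the fields this sketch needs.
+type ChatResponse struct {
+    Choices []struct {
+        FinishReason string `json:"finish_reason"`
+        Message      struct {
+            Content   string     `json:"content"`
+            ToolCalls []ToolCall `json:"tool_calls"`
+        } `json:"message"`
+    } `json:"choices"`
+}
+
+// PendingToolCalls returns the tool calls in the first choice,
+// which in a mixed response are the ones awaiting approval.
+func PendingToolCalls(raw []byte) ([]ToolCall, error) {
+    var resp ChatResponse
+    if err := json.Unmarshal(raw, &resp); err != nil {
+        return nil, fmt.Errorf("decode response: %w", err)
+    }
+    if len(resp.Choices) == 0 {
+        return nil, nil
+    }
+    return resp.Choices[0].Message.ToolCalls, nil
+}
+
+func main() {
+    raw := []byte(`{"choices":[{"finish_reason":"stop","message":{
+        "content":"...",
+        "tool_calls":[{"id":"call_write_123","type":"function",
+            "function":{"name":"filesystem_write_file","arguments":"{}"}}]}}]}`)
+    calls, err := PendingToolCalls(raw)
+    if err != nil {
+        fmt.Println("error:", err)
+        return
+    }
+    for _, c := range calls {
+        fmt.Println("awaiting approval:", c.Function.Name)
+    }
+}
+```
+
+Because `finish_reason` is `"stop"` even when tool calls are pending, the check has to look at `tool_calls` itself rather than at the finish reason.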
+
+### **Agent Depth Control**
+
+The `max_agent_depth` setting prevents infinite loops and controls resource usage:
+
+```mermaid
+graph LR
+    subgraph "Depth Tracking"
+        D0["Depth 0<br/>Initial Request"] --> D1["Depth 1<br/>First tool execution"]
+        D1 --> D2["Depth 2<br/>Second iteration"]
+        D2 --> D3["Depth 3<br/>..."]
+        D3 --> DN["Depth N<br/>Max reached"]
+    end
+
+    DN --> EXIT["🛑 Force Exit<br/>Return current state"]
+
+    subgraph "Configuration"
+        CFG["MCPToolManagerConfig"]
+        CFG --> MAX["max_agent_depth: 10<br/>(default)"]
+        CFG --> TIMEOUT["tool_execution_timeout:<br/>30s per tool"]
+    end
+```
+
+<Warning>
+When max depth is reached, the response may contain pending tool calls that weren't executed. Your application should handle this gracefully.
+</Warning>
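+
+The loop and its depth limit can be sketched as below. This is a simplified model, not Bifrost's implementation: all types are hypothetical, tools run sequentially rather than in parallel, and the exact point at which depth is counted is an assumption.
+
+```go
+package main
+
+import "fmt"
+
+// ToolCall and LLMResponse are reduced to what the loop needs.
+type ToolCall struct{ Name string }
+
+type LLMResponse struct {
+    Content   string
+    ToolCalls []ToolCall
+}
+
+// LLM and Executor are hypothetical hooks for the provider call
+// and for running one tool.
+type LLM func(history []string) LLMResponse
+type Executor func(call ToolCall) string
+
+// RunAgent loops until the model stops requesting tools, a tool that
+// is not auto-executable appears, or maxDepth iterations have run.
+// It returns the last response and the depth at which it stopped.
+func RunAgent(llm LLM, exec Executor, autoExec map[string]bool, maxDepth int) (LLMResponse, int) {
+    var history []string
+    for depth := 0; ; depth++ {
+        resp := llm(history)
+        if len(resp.ToolCalls) == 0 {
+            return resp, depth // final answer
+        }
+        if depth >= maxDepth {
+            return resp, depth // depth limit: tool calls left unexecuted
+        }
+        var pending []ToolCall
+        for _, call := range resp.ToolCalls {
+            if autoExec[call.Name] {
+                history = append(history, exec(call))
+            } else {
+                pending = append(pending, call)
+            }
+        }
+        if len(pending) > 0 {
+            resp.ToolCalls = pending // mixed response: exit early
+            return resp, depth
+        }
+    }
+}
+
+func main() {
+    // A model that asks for read_file once, then answers.
+    llm := func(history []string) LLMResponse {
+        if len(history) == 0 {
+            return LLMResponse{ToolCalls: []ToolCall{{Name: "read_file"}}}
+        }
+        return LLMResponse{Content: "summary of " + history[0]}
+    }
+    exec := func(call ToolCall) string { return "result of " + call.Name }
+    resp, depth := RunAgent(llm, exec, map[string]bool{"read_file": true}, 10)
+    fmt.Println(depth, resp.Content)
+}
+```
+
+A caller can detect the depth-limit case by checking whether the returned response still carries tool calls.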
+
+---
+
+## Code Mode Architecture
+
+Code Mode enables AI models to write and execute Starlark code (a Python-like language) that orchestrates multiple MCP tools in a single request. This provides a meta-layer for complex multi-tool workflows.
+
+### **Code Mode System Overview**
+
+```mermaid
+graph TB
+    subgraph "Code Mode Components"
+        VM["🖥️ Starlark Interpreter<br/>Python-like Runtime"]
+        VFS["📁 Virtual File System<br/>Tool Definitions as .pyi"]
+        EXEC["⚙️ Code Executor<br/>Sandboxed Execution"]
+    end
+
+    subgraph "Meta Tools"
+        LIST["listToolFiles()<br/>Discover available servers"]
+        READ["readToolFile(fileName)<br/>Get tool signatures"]
+        DOCS["getToolDocs(server, tool)<br/>Get detailed docs"]
+        CODE["executeToolCode(code)<br/>Run Python code"]
+    end
+
+    subgraph "MCP Integration"
+        TOOLS["🔧 Connected MCP Tools"]
+        RESULTS["📊 Tool Results"]
+    end
+
+    LLM["🤖 LLM"] --> LIST
+    LIST --> VFS
+    VFS --> LLM
+    LLM --> READ
+    READ --> VFS
+    VFS --> LLM
+    LLM --> DOCS
+    DOCS --> VFS
+    VFS --> LLM
+    LLM --> CODE
+    CODE --> VM
+    VM --> EXEC
+    EXEC --> TOOLS
+    TOOLS --> RESULTS
+    RESULTS --> LLM
+
+    style VM fill:#e8eaf6
+    style VFS fill:#e3f2fd
+    style CODE fill:#e8f5e9
+```
+
+### **Virtual File System (VFS)**
+
+Code Mode generates Python stub files (`.pyi`) for all connected MCP tools, providing compact function signatures:
+
+<Tabs>
+  <Tab title="Server-Level Binding">
+
+When `code_mode_binding_level: "server"` (default), tools are grouped by MCP client:
+
+```
+servers/
+├── filesystem.pyi      → All filesystem tools
+├── web_search.pyi      → All web search tools
+└── database.pyi        → All database tools
+```
+
+**Generated Stub Example:**
+```python
+# servers/filesystem.pyi
+# Usage: filesystem.tool_name(param=value)
+# For detailed docs: use getToolDocs(server="filesystem", tool="tool_name")
+
+def read_file(path: str) -> dict:  # Read contents of a file
+def write_file(path: str, content: str) -> dict:  # Write content to a file
+def list_directory(path: str) -> dict:  # List directory contents
+```
+
+**Usage in Code:**
+```python
+files = filesystem.list_directory(path=".")
+content = filesystem.read_file(path=files["entries"][0])
+result = content
+```
+
+  </Tab>
+  <Tab title="Tool-Level Binding">
+
+When `code_mode_binding_level: "tool"`, each tool gets its own file:
+
+```
+servers/
+├── filesystem/
+│   ├── read_file.pyi
+│   ├── write_file.pyi
+│   └── list_directory.pyi
+├── web_search/
+│   └── search.pyi
+└── database/
+    └── query.pyi
+```
+
+**Generated Stub Example:**
+```python
+# servers/filesystem/read_file.pyi
+# Usage: filesystem.read_file(param=value)
+
+def read_file(path: str) -> dict:  # Read contents of a file
+```
+
+**Usage in Code:**
+```python
+content = filesystem.read_file(path="config.json")
+result = content
+```
+
+  </Tab>
+</Tabs>
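+
+To make the stub format concrete, the sketch below renders tool definitions into the server-level layout shown above. It is not Bifrost's generator: `Param`, `ToolDef`, `StubLine`, and `ServerStub` are hypothetical, and every tool is assumed to return `dict` as in the examples.
+
+```go
+package main
+
+import (
+    "fmt"
+    "strings"
+)
+
+// Param and ToolDef are hypothetical, minimal tool descriptions.
+type Param struct{ Name, Type string }
+
+type ToolDef struct {
+    Name        string
+    Description string
+    Params      []Param
+}
+
+// StubLine renders one tool as a compact signature in the style
+// of the generated stub examples.
+func StubLine(t ToolDef) string {
+    parts := make([]string, len(t.Params))
+    for i, p := range t.Params {
+        parts[i] = p.Name + ": " + p.Type
+    }
+    return fmt.Sprintf("def %s(%s) -> dict:  # %s",
+        t.Name, strings.Join(parts, ", "), t.Description)
+}
+
+// ServerStub renders a server-level stub file for one MCP client.
+func ServerStub(server string, tools []ToolDef) string {
+    var b strings.Builder
+    fmt.Fprintf(&b, "# servers/%s.pyi\n", server)
+    fmt.Fprintf(&b, "# Usage: %s.tool_name(param=value)\n\n", server)
+    for _, t := range tools {
+        b.WriteString(StubLine(t) + "\n")
+    }
+    return b.String()
+}
+
+func main() {
+    tools := []ToolDef{
+        {"read_file", "Read contents of a file", []Param{{"path", "str"}}},
+        {"write_file", "Write content to a file", []Param{{"path", "str"}, {"content", "str"}}},
+    }
+    fmt.Print(ServerStub("filesystem", tools))
+}
+```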
+
+### **Code Execution Flow**
+
+```mermaid
+sequenceDiagram
+    participant LLM as 🤖 LLM
+    participant CM as 📝 Code Mode Handler
+    participant VM as 🖥️ Starlark Interpreter
+    participant TM as 🔧 Tools Manager
+    participant MCP as 🌐 MCP Servers
+
+    LLM->>CM: executeToolCode({ code: "..." })
+    CM->>VM: Initialize sandbox
+    CM->>VM: Inject tool bindings
+    CM->>VM: Execute Python code
+
+    loop For each tool call in code
+        VM->>TM: server.tool(param=value)
+        TM->>MCP: Execute tool
+        MCP-->>TM: Tool result
+        TM-->>VM: Return result
+    end
+
+    VM-->>CM: Execution result
+    CM-->>LLM: { result, logs }
+```
+
+### **Starlark Sandbox**
+
+The code execution environment is carefully sandboxed using Starlark, a Python-like language designed for configuration and embedded scripting:
+
+<AccordionGroup>
+  <Accordion title="Available Features" icon="check" defaultOpen>
+
+  - ✅ **Python-like syntax** - Familiar Python syntax and semantics
+  - ✅ **Synchronous calls** - No async/await needed, direct function calls
+  - ✅ **List comprehensions** - `[x for x in items if condition]`
+  - ✅ **print()** - Output captured and returned in logs
+  - ✅ **Dict/List operations** - Standard Python data structures
+  - ✅ **Tool bindings** - All connected MCP tools as globals
+  </Accordion>
+
+  <Accordion title="Restricted Features" icon="ban">
+
+  - ❌ **Imports** - No `import` statements (tools are pre-bound)
+  - ❌ **Classes** - Use dicts and functions instead
+  - ❌ **File I/O** - No direct filesystem access (use MCP tools)
+  - ❌ **Network** - No direct network access (use MCP tools)
+  - ❌ **Randomness/Time** - Deterministic execution only
+
+  </Accordion>
+</AccordionGroup>
+
+### **Code Mode Security Model**
+
+```mermaid
+graph TB
+    subgraph "Security Layers"
+        L1["🔒 Code Validation<br/>Syntax checking before execution"]
+        L2["🛡️ Sandboxed Runtime<br/>No external module access"]
+        L3["⏱️ Execution Timeout<br/>Bounded runtime"]
+        L4["🔐 Tool ACL<br/>Only allowed tools accessible"]
+    end
+
+    subgraph "Execution Boundaries"
+        B1["No filesystem access<br/>(except via MCP tools)"]
+        B2["No network access<br/>(except via MCP tools)"]
+        B3["No process spawning"]
+        B4["Memory isolation enforced"]
+    end
+
+    L1 --> L2 --> L3 --> L4
+    L4 --> B1
+    L4 --> B2
+    L4 --> B3
+    L4 --> B4
+```
+
+### **Code Mode Configuration**
+
+<Tabs>
+  <Tab title="Gateway (config.json)">
+
+```json
+{
+  "mcp": {
+    "client_configs": [
+      {
+        "name": "filesystem",
+        "is_code_mode_client": true,
+        "connection_type": "stdio",
+        "stdio_config": {
+          "command": "npx",
+          "args": ["-y", "@anthropic/mcp-filesystem"]
+        },
+        "tools_to_execute": ["*"]
+      }
+    ],
+    "tool_manager_config": {
+      "code_mode_binding_level": "server",
+      "tool_execution_timeout": "30s"
+    }
+  }
+}
+```
+
+  </Tab>
+  <Tab title="Go SDK">
+
+```go
+mcpConfig := &schemas.MCPConfig{
+    ClientConfigs: []schemas.MCPClientConfig{
+        {
+            Name:             "filesystem",
+            IsCodeModeClient: true,
+            ConnectionType:   schemas.MCPConnectionTypeSTDIO,
+            StdioConfig: &schemas.MCPStdioConfig{
+                Command: "npx",
+                Args:    []string{"-y", "@anthropic/mcp-filesystem"},
+            },
+            ToolsToExecute: []string{"*"},
+        },
+    },
+    ToolManagerConfig: &schemas.MCPToolManagerConfig{
+        CodeModeBindingLevel: schemas.CodeModeBindingLevelServer,
+        ToolExecutionTimeout: 30 * time.Second,
+    },
+}
+```
+
+  </Tab>
+</Tabs>
+
+### **Code Mode vs Agent Mode**
+
+| Aspect | Agent Mode | Code Mode |
+|--------|------------|-----------|
+| **Execution Model** | LLM decides one tool at a time | LLM writes code orchestrating multiple tools |
+| **Iterations** | Multiple LLM round-trips | Single LLM call, code handles orchestration |
+| **Complexity** | Simple tool chains | Complex workflows with conditionals/loops |
+| **Latency** | Higher (multiple LLM calls) | Lower (single LLM call + code execution) |
+| **Control** | Per-tool approval possible | Code runs atomically |
+| **Best For** | Interactive agents | Batch operations, complex data processing |
+
+---
+
+## MCP Integration Patterns
+
+### **Common Integration Scenarios**
+
+**1. Filesystem Operations**
+
+- **Tools:** `list_files`, `read_file`, `write_file`, `create_directory`
+- **Use Cases:** Code analysis, document processing, file management
+- **Security:** Sandboxed file access, path validation, permission checks
+- **Performance:** Local execution for fast file operations
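+
+Path validation for a filesystem tool is often done by resolving the requested path against a fixed root and rejecting anything that escapes it. The sketch below shows that general technique; `ResolveInRoot` is a hypothetical helper, not part of Bifrost or any MCP server, and it does not resolve symlinks.
+
+```go
+package main
+
+import (
+    "fmt"
+    "path/filepath"
+    "strings"
+)
+
+// ResolveInRoot joins a tool-supplied path onto root and rejects
+// any result that falls outside root.
+func ResolveInRoot(root, requested string) (string, error) {
+    full := filepath.Join(root, requested) // Join also cleans ".." segments
+    rel, err := filepath.Rel(root, full)
+    if err != nil {
+        return "", fmt.Errorf("resolve %q: %w", requested, err)
+    }
+    if rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
+        return "", fmt.Errorf("path %q escapes the sandbox root", requested)
+    }
+    return full, nil
+}
+
+func main() {
+    for _, p := range []string{"notes/todo.txt", "../etc/passwd"} {
+        full, err := ResolveInRoot("/srv/data", p)
+        if err != nil {
+            fmt.Println("rejected:", err)
+            continue
+        }
+        fmt.Println("allowed:", full)
+    }
+}
+```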
+
+**2. Web Search & Information Retrieval**
+
+- **Tools:** `web_search`, `fetch_url`, `extract_content`, `summarize`
+- **Use Cases:** Research assistance, fact-checking, content gathering
+- **Integration:** External search APIs, content parsing services
+- **Caching:** Response caching for repeated queries
+
+**3. Database Operations**
+
+- **Tools:** `query_database`, `insert_record`, `update_record`, `schema_info`
+- **Use Cases:** Data analysis, report generation, database administration
+- **Security:** Read-only access by default, query validation, injection prevention
+- **Performance:** Connection pooling, query optimization
+
+**4. API Integrations**
+
+- **Tools:** Custom business logic tools, third-party service integration
+- **Use Cases:** CRM operations, payment processing, notification sending
+- **Authentication:** API key management, OAuth token handling
+- **Error Handling:** Retry logic, fallback mechanisms
+
+### **MCP Server Development Patterns**
+
+**Simple STDIO Server:**
+
+- **Language:** Any language that can read/write JSON to stdin/stdout
+- **Deployment:** Single executable, minimal dependencies
+- **Use Case:** Local tools, development utilities, simple scripts
+
+**HTTP Service Server:**
+
+- **Architecture:** RESTful API with MCP protocol endpoints
+- **Scalability:** Horizontal scaling, load balancing
+- **Use Case:** Shared tools, enterprise integrations, cloud services
+
+**Hybrid Approach:**
+
+- **Local + Remote:** Combine STDIO tools for local operations with HTTP for remote services
+- **Failover:** Use local fallbacks when remote services are unavailable
+- **Optimization:** Route tool calls to most appropriate execution environment
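+
+The failover idea in the hybrid approach can be sketched as a wrapper that tries the remote tool first and falls back to a local one. `ToolFunc` and `CallWithFallback` are hypothetical names used for illustration only.
+
+```go
+package main
+
+import (
+    "context"
+    "errors"
+    "fmt"
+)
+
+// ToolFunc is a hypothetical signature for a tool invocation.
+type ToolFunc func(ctx context.Context, args map[string]any) (string, error)
+
+// CallWithFallback tries the remote tool and uses the local one only
+// if the remote call fails. The bool reports whether the fallback ran.
+func CallWithFallback(ctx context.Context, remote, local ToolFunc, args map[string]any) (string, bool, error) {
+    result, err := remote(ctx, args)
+    if err == nil {
+        return result, false, nil
+    }
+    if ctx.Err() != nil {
+        // The caller cancelled or the deadline passed; a fallback would not help.
+        return "", false, ctx.Err()
+    }
+    result, localErr := local(ctx, args)
+    if localErr != nil {
+        return "", true, fmt.Errorf("remote failed (%v); local failed: %w", err, localErr)
+    }
+    return result, true, nil
+}
+
+func main() {
+    remote := func(ctx context.Context, _ map[string]any) (string, error) {
+        return "", errors.New("remote unavailable")
+    }
+    local := func(ctx context.Context, _ map[string]any) (string, error) {
+        return "local result", nil
+    }
+    res, usedFallback, err := CallWithFallback(context.Background(), remote, local, nil)
+    fmt.Println(res, usedFallback, err)
+}
+```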
+
+> **📖 MCP Development:** [Tool Development Guide →](../../mcp/overview)
+
+---
+
+## Security & Safety Considerations
+
+### **MCP Security Architecture**
+
+```mermaid
+graph TB
+    subgraph "Security Layers"
+        L1[Connection Security<br/>Authentication & Encryption]
+        L2[Tool Validation<br/>Schema & Permission Checks]
+        L3[Execution Security<br/>Sandboxing & Limits]
+        L4[Result Security<br/>Output Validation & Filtering]
+    end
+
+    subgraph "Threat Mitigation"
+        T1[Malicious Tools<br/>Code Injection Prevention]
+        T2[Resource Abuse<br/>Rate Limiting & Quotas]
+        T3[Data Exposure<br/>Output Sanitization]
+        T4[System Access<br/>Privilege Isolation]
+    end
+
+    L1 --> T1
+    L2 --> T2
+    L3 --> T4
+    L4 --> T3
+```
+
+**Security Measures:**
+
+**Connection Security:**
+
+- **Authentication** - API keys, certificates, or token-based auth for HTTP/SSE
+- **Encryption** - TLS for HTTP connections, secure pipes for STDIO
+- **Network Isolation** - Firewall rules and network segmentation
+
+**Execution Security:**
+
+- **Sandboxing** - Isolated execution environments for tools
+- **Resource Limits** - CPU, memory, and time constraints
+- **Permission Model** - Principle of least privilege for tool access
+
+**Operational Security:**
+
+- **Regular Updates** - Keep MCP servers and tools updated
+- **Monitoring** - Continuous security monitoring and alerting
+- **Incident Response** - Procedures for security incidents involving tools
+
+---
+
+## Related Architecture Documentation
+
+- **[Request Flow](./request-flow)** - MCP integration in request processing
+- **[Concurrency Model](./concurrency)** - MCP concurrency and worker integration
+- **[Plugin System](./plugins)** - Integration between MCP and plugin systems
+- **[Benchmarks](../../benchmarking/getting-started)** - MCP performance impact and optimization
+
+
+
--- a/docs/architecture/core/plugins.mdx
+++ b/docs/architecture/core/plugins.mdx
@@ -0,0 +1,552 @@
+---
+title: "Plugins"
+description: "Deep dive into Bifrost's extensible plugin architecture - how plugins work internally, lifecycle management, execution model, and integration patterns."
+icon: "puzzle-piece"
+---
+
+## Plugin Architecture Philosophy
+
+### **Core Design Principles**
+
+Bifrost's plugin system is built around five key principles that ensure extensibility without compromising performance or reliability:
+
+| Principle                     | Implementation                                   | Benefit                                          |
+| ----------------------------- | ------------------------------------------------ | ------------------------------------------------ |
+| **Plugin-First Design**    | Core logic designed around plugin hook points    | Maximum extensibility without core modifications |
+| **Zero-Copy Integration**  | Direct memory access to request/response objects | Minimal performance overhead                     |
+| **Lifecycle Management**   | Complete plugin lifecycle with automatic cleanup | Resource safety and leak prevention              |
+| **Interface-Based Safety** | Well-defined interfaces for type safety          | Compile-time validation and consistency          |
+| **Failure Isolation**      | Plugin errors don't crash the core system        | Fault tolerance and system stability             |
+
+### **Plugin System Overview**
+
+```mermaid
+graph TB
+    subgraph "Plugin Management Layer"
+        PluginMgr[Plugin Manager<br/>Central Controller]
+        Registry[Plugin Registry<br/>Discovery & Loading]
+        Lifecycle[Lifecycle Manager<br/>State Management]
+    end
+
+    subgraph "Plugin Execution Layer"
+        Pipeline[Plugin Pipeline<br/>Execution Orchestrator]
+        PreHooks[Pre-Processing Hooks<br/>Request Modification]
+        PostHooks[Post-Processing Hooks<br/>Response Enhancement]
+    end
+
+    subgraph "Plugin Categories"
+        Auth[Authentication<br/>& Authorization]
+        RateLimit[Rate Limiting<br/>& Throttling]
+        Transform[Data Transformation<br/>& Validation]
+        Monitor[Monitoring<br/>& Analytics]
+        Custom[Custom Business<br/>Logic]
+    end
+
+    PluginMgr --> Registry
+    Registry --> Lifecycle
+    Lifecycle --> Pipeline
+
+    Pipeline --> PreHooks
+    Pipeline --> PostHooks
+
+    PreHooks --> Auth
+    PreHooks --> RateLimit
+    PostHooks --> Transform
+    PostHooks --> Monitor
+    PostHooks --> Custom
+```
+
+---
+
+## Plugin Lifecycle Management
+
+### **Complete Lifecycle States**
+
+Every plugin goes through a well-defined lifecycle that ensures proper resource management and error handling:
+
+```mermaid
+stateDiagram-v2
+    [*] --> PluginInit: Plugin Creation
+    PluginInit --> Registered: Add to BifrostConfig
+    Registered --> PreHookCall: Request Received
+
+    PreHookCall --> ModifyRequest: Normal Flow
+    PreHookCall --> ShortCircuitResponse: Return Response
+    PreHookCall --> ShortCircuitError: Return Error
+
+    ModifyRequest --> ProviderCall: Send to Provider
+    ProviderCall --> PostHookCall: Receive Response
+
+    ShortCircuitResponse --> PostHookCall: Skip Provider
+    ShortCircuitError --> PostHookCall: Pipeline Symmetry
+
+    PostHookCall --> ModifyResponse: Process Result
+    PostHookCall --> RecoverError: Error Recovery
+    PostHookCall --> FallbackCheck: Check AllowFallbacks
+    PostHookCall --> ResponseReady: Pass Through
+
+    FallbackCheck --> TryFallback: AllowFallbacks=true/nil
+    FallbackCheck --> ResponseReady: AllowFallbacks=false
+    TryFallback --> PreHookCall: Next Provider
+
+    ModifyResponse --> ResponseReady: Modified
+    RecoverError --> ResponseReady: Recovered
+    ResponseReady --> [*]: Return to Client
+
+    Registered --> CleanupCall: Bifrost Shutdown
+    CleanupCall --> [*]: Plugin Destroyed
+```
+
+### **Lifecycle Phase Details**
+
+**Discovery Phase:**
+
+- **Purpose:** Find and catalog available plugins
+- **Sources:** Command line, environment variables, JSON configuration, directory scanning
+- **Validation:** Basic existence and format checks
+- **Output:** Plugin descriptors with metadata
+
+**Loading Phase:**
+
+- **Purpose:** Load plugin binaries into memory
+- **Security:** Digital signature verification and checksum validation
+- **Compatibility:** Interface implementation validation
+- **Resource:** Memory and capability assessment
+
+**Initialization Phase:**
+
+- **Purpose:** Configure plugin with runtime settings
+- **Timeout:** Bounded initialization time to prevent hanging
+- **Dependencies:** External service connectivity verification
+- **State:** Internal state setup and resource allocation
+
+**Runtime Phase:**
+
+- **Purpose:** Active request processing
+- **Monitoring:** Continuous health checking and performance tracking
+- **Recovery:** Automatic error recovery and degraded mode handling
+- **Metrics:** Real-time performance and health metrics collection
+
+> **Plugin Lifecycle:** [Plugin Management →](../../enterprise/custom-plugins)
+
+---
+
+## Plugin Execution Pipeline
+
+### **Request Processing Flow**
+
+The plugin pipeline ensures consistent, predictable execution while maintaining high performance:
+
+#### **Normal Execution Flow (No Short-Circuit)**
+
+```mermaid
+sequenceDiagram
+    participant Client
+    participant Bifrost
+    participant Plugin1
+    participant Plugin2
+    participant Provider
+
+    Client->>Bifrost: Request
+    Bifrost->>Plugin1: PreLLMHook(request)
+    Plugin1-->>Bifrost: modified request
+    Bifrost->>Plugin2: PreLLMHook(request)
+    Plugin2-->>Bifrost: modified request
+    Bifrost->>Provider: API Call
+    Provider-->>Bifrost: response
+    Bifrost->>Plugin2: PostLLMHook(response)
+    Plugin2-->>Bifrost: modified response
+    Bifrost->>Plugin1: PostLLMHook(response)
+    Plugin1-->>Bifrost: modified response
+    Bifrost-->>Client: Final Response
+```
+
+**Execution Order:**
+
+1. **PreHooks:** Execute in registration order (1 → 2 → N)
+2. **Provider Call:** If no short-circuit occurred
+3. **PostHooks:** Execute in reverse order (N → 2 → 1)
+
+#### **Short-Circuit Response Flow (Cache Hit)**
+
+```mermaid
+sequenceDiagram
+    participant Client
+    participant Bifrost
+    participant Cache
+    participant Auth
+    participant Provider
+
+    Client->>Bifrost: Request
+    Bifrost->>Auth: PreLLMHook(request)
+    Auth-->>Bifrost: modified request
+    Bifrost->>Cache: PreLLMHook(request)
+    Cache-->>Bifrost: LLMPluginShortCircuit{Response}
+    Note over Provider: Provider call skipped
+    Bifrost->>Cache: PostLLMHook(response)
+    Cache-->>Bifrost: modified response
+    Bifrost->>Auth: PostLLMHook(response)
+    Auth-->>Bifrost: modified response
+    Bifrost-->>Client: Cached Response
+```
+
+#### **Streaming Response Flow**
+
+For streaming responses, the plugin pipeline executes post-hooks for every delta/chunk received from the provider:
+
+```mermaid
+sequenceDiagram
+    participant Client
+    participant Bifrost
+    participant Plugin1
+    participant Plugin2
+    participant Provider
+
+    Client->>Bifrost: Stream Request
+    Bifrost->>Plugin1: PreLLMHook(request)
+    Plugin1-->>Bifrost: modified request
+    Bifrost->>Plugin2: PreLLMHook(request)
+    Plugin2-->>Bifrost: modified request
+    Bifrost->>Provider: Stream API Call
+
+    loop For Each Delta
+        Provider-->>Bifrost: stream delta
+        Bifrost->>Plugin2: PostLLMHook(delta)
+        Plugin2-->>Bifrost: modified delta
+        Bifrost->>Plugin1: PostLLMHook(delta)
+        Plugin1-->>Bifrost: modified delta
+        Bifrost-->>Client: Send Delta
+    end
+
+    Provider-->>Bifrost: final chunk (finish reason)
+    Bifrost->>Plugin2: PostLLMHook(final)
+    Plugin2-->>Bifrost: modified final
+    Bifrost->>Plugin1: PostLLMHook(final)
+    Plugin1-->>Bifrost: modified final
+    Bifrost-->>Client: Final Chunk
+```
+
+**Streaming Execution Characteristics:**
+
+1. **Delta Processing:**
+   - Each stream delta (chunk) goes through all post-hooks
+   - Plugins can modify/transform each delta before it reaches the client
+   - Deltas can contain: text content, tool calls, role changes, or usage info
+
+2. **Special Delta Types:**
+   - **Start Event:** Initial delta with role information
+   - **Content Delta:** Regular text or tool call content
+   - **Usage Update:** Token usage statistics (if enabled)
+   - **Final Chunk:** Contains finish reason and any final metadata
+
+3. **Plugin Considerations:**
+   - Plugins must handle streaming responses efficiently
+   - Each delta should be processed quickly to maintain stream responsiveness
+   - Plugins can track state across deltas using context
+   - Heavy processing should be done asynchronously
+
+4. **Error Handling:**
+   - If a post-hook returns an error, it's sent as an error stream chunk
+   - Stream is terminated after error chunks
+   - Plugins can recover from errors by providing valid responses
+
+5. **Performance Optimization:**
+   - Lightweight delta processing to minimize latency
+   - Object pooling for common data structures
+   - Non-blocking operations for logging and metrics
+   - Efficient memory management for stream processing
+
+> **Streaming Details:** [Streaming Guide →](../../quickstart/gateway/streaming)
+
+**Short-Circuit Rules:**
+
+- **Provider Skipped:** When a plugin returns a short-circuit response or error, the provider call is not made
+- **PostLLMHook Guarantee:** All executed PreHooks get corresponding PostLLMHook calls
+- **Reverse Order:** PostHooks execute in reverse order of PreHooks
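+
+The rules above can be sketched in a few lines of Go. This is a minimal illustration under stated assumptions, not Bifrost's implementation: the `Plugin` interface, the `ShortCircuit` struct, and the `run` function are hypothetical names, and requests and responses are plain strings to keep the example short.
+
+```go
+package main
+
+import "fmt"
+
+// ShortCircuit is a hypothetical stand-in for the value a pre-hook returns
+// to skip the provider; the field name is illustrative only.
+type ShortCircuit struct {
+    Response string
+}
+
+// Plugin is a simplified, hypothetical hook interface.
+type Plugin interface {
+    PreHook(req string) (string, *ShortCircuit)
+    PostHook(resp string) string
+}
+
+// run executes pre-hooks in registration order. If one short-circuits, the
+// provider is skipped and only the plugins whose pre-hooks already ran get
+// their post-hooks called, in reverse order.
+func run(plugins []Plugin, req string, provider func(string) string) string {
+    executed := 0
+    var resp string
+    shortCircuited := false
+    for _, p := range plugins {
+        var sc *ShortCircuit
+        req, sc = p.PreHook(req)
+        executed++
+        if sc != nil {
+            resp, shortCircuited = sc.Response, true
+            break
+        }
+    }
+    if !shortCircuited {
+        resp = provider(req)
+    }
+    for i := executed - 1; i >= 0; i-- {
+        resp = plugins[i].PostHook(resp)
+    }
+    return resp
+}
+
+// tag is a toy plugin that appends its name in PostHook.
+type tag struct {
+    name string
+    hit  bool // when true, PreHook short-circuits with a canned response
+}
+
+func (t tag) PreHook(req string) (string, *ShortCircuit) {
+    if t.hit {
+        return req, &ShortCircuit{Response: "cached"}
+    }
+    return req, nil
+}
+
+func (t tag) PostHook(resp string) string { return resp + "<" + t.name }
+
+func main() {
+    plugins := []Plugin{tag{name: "auth"}, tag{name: "cache", hit: true}, tag{name: "log"}}
+    provider := func(string) string { return "live" }
+    // The "log" plugin is registered after the short-circuit, so neither of
+    // its hooks runs. Prints: cached<cache<auth
+    fmt.Println(run(plugins, "req", provider))
+}
+```
+
+Counting executed pre-hooks, rather than iterating over the full plugin list, is what keeps the pre/post pairing symmetric when a plugin in the middle of the chain short-circuits.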
+
+#### **Short-Circuit Error Flow (Allow Fallbacks)**
+
+```mermaid
+sequenceDiagram
+    participant Client
+    participant Bifrost
+    participant Plugin1
+    participant Provider1
+    participant Provider2
+
+    Client->>Bifrost: Request (Provider1 + Fallback Provider2)
+    Bifrost->>Plugin1: PreLLMHook(request)
+    Plugin1-->>Bifrost: LLMPluginShortCircuit{Error, AllowFallbacks=true}
+    Note over Provider1: Provider1 call skipped
+    Bifrost->>Plugin1: PostLLMHook(error)
+    Plugin1-->>Bifrost: error unchanged
+
+    Note over Bifrost: Try fallback provider
+    Bifrost->>Plugin1: PreLLMHook(request for Provider2)
+    Plugin1-->>Bifrost: modified request
+    Bifrost->>Provider2: API Call
+    Provider2-->>Bifrost: response
+    Bifrost->>Plugin1: PostLLMHook(response)
+    Plugin1-->>Bifrost: modified response
+    Bifrost-->>Client: Final Response
+```
+
+#### **Error Recovery Flow**
+
+```mermaid
+sequenceDiagram
+    participant Client
+    participant Bifrost
+    participant Plugin1
+    participant Plugin2
+    participant Provider
+    participant RecoveryPlugin
+
+    Client->>Bifrost: Request
+    Bifrost->>Plugin1: PreLLMHook(request)
+    Plugin1-->>Bifrost: modified request
+    Bifrost->>Plugin2: PreLLMHook(request)
+    Plugin2-->>Bifrost: modified request
+    Bifrost->>RecoveryPlugin: PreLLMHook(request)
+    RecoveryPlugin-->>Bifrost: modified request
+    Bifrost->>Provider: API Call
+    Provider-->>Bifrost: error
+    Bifrost->>RecoveryPlugin: PostLLMHook(error)
+    RecoveryPlugin-->>Bifrost: recovered response
+    Bifrost->>Plugin2: PostLLMHook(response)
+    Plugin2-->>Bifrost: modified response
+    Bifrost->>Plugin1: PostLLMHook(response)
+    Plugin1-->>Bifrost: modified response
+    Bifrost-->>Client: Recovered Response
+```
+
+**Error Recovery Features:**
+
+- **Error Transformation:** Plugins can convert errors to successful responses
+- **Graceful Degradation:** Provide fallback responses for service failures
+- **Context Preservation:** Error context is maintained through recovery process
+
+### **Complex Plugin Decision Flow**
+
+The following diagram shows how authentication, rate limiting, and caching plugins interact in a realistic setup, including the decision path each outcome takes:
+
+```mermaid
+graph TD
+    A["Client Request"] --> B["Bifrost"]
+    B --> C["Auth Plugin PreLLMHook"]
+    C --> D{"Authenticated?"}
+    D -->|No| E["Return Auth Error<br/>AllowFallbacks=false"]
+    D -->|Yes| F["RateLimit Plugin PreLLMHook"]
+    F --> G{"Rate Limited?"}
+    G -->|Yes| H["Return Rate Error<br/>AllowFallbacks=nil"]
+    G -->|No| I["Cache Plugin PreLLMHook"]
+    I --> J{"Cache Hit?"}
+    J -->|Yes| K["Return Cached Response"]
+    J -->|No| L["Provider API Call"]
+    L --> M["Cache Plugin PostLLMHook"]
+    M --> N["Store in Cache"]
+    N --> O["RateLimit Plugin PostLLMHook"]
+    O --> P["Auth Plugin PostLLMHook"]
+    P --> Q["Final Response"]
+
+    E --> R["Skip Fallbacks"]
+    H --> S["Try Fallback Provider"]
+    K --> T["Skip Provider Call"]
+```
+
+### **Execution Characteristics**
+
+**Symmetric Execution Pattern:**
+
+- **Pre-processing:** Plugins execute in registration order (first to last)
+- **Post-processing:** Plugins execute in reverse registration order (last to first)
+- **Rationale:** Ensures proper cleanup and state management (last in, first out)
+
+**Performance Optimizations:**
+
+- **Timeout Boundaries:** Each plugin has configurable execution timeouts
+- **Panic Recovery:** Plugin panics are caught and logged without crashing the system
+- **Resource Limits:** Memory and CPU limits prevent runaway plugins
+- **Circuit Breaking:** Repeated failures trigger plugin isolation
+
+**Error Handling Strategies:**
+
+- **Continue:** Use original request/response if plugin fails
+- **Fail Fast:** Return error immediately if critical plugin fails
+- **Retry:** Attempt plugin execution with exponential backoff
+- **Fallback:** Use alternative plugin or default behavior
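+
+As an illustration of panic recovery combined with the "Continue" strategy, the sketch below wraps a hook call so that a panic becomes an ordinary error and the original request is passed through on failure. The `Hook` type, `safeCall`, and `applyContinue` are hypothetical names introduced for this example; they are not part of Bifrost's API.
+
+```go
+package main
+
+import (
+    "errors"
+    "fmt"
+)
+
+// Hook is a hypothetical hook signature used only for this sketch.
+type Hook func(req string) (string, error)
+
+// safeCall runs hook and converts a panic into an ordinary error so that a
+// misbehaving plugin cannot crash the caller.
+func safeCall(hook Hook, req string) (out string, err error) {
+    defer func() {
+        if r := recover(); r != nil {
+            err = fmt.Errorf("plugin panicked: %v", r)
+        }
+    }()
+    return hook(req)
+}
+
+// applyContinue implements the "Continue" strategy: on any failure the
+// original request is passed through unchanged.
+func applyContinue(hook Hook, req string) string {
+    out, err := safeCall(hook, req)
+    if err != nil {
+        return req
+    }
+    return out
+}
+
+func main() {
+    ok := func(r string) (string, error) { return r + "+ok", nil }
+    bad := func(r string) (string, error) { return "", errors.New("boom") }
+    crash := func(r string) (string, error) { panic("nil map write") }
+
+    fmt.Println(applyContinue(ok, "req"))    // req+ok
+    fmt.Println(applyContinue(bad, "req"))   // req
+    fmt.Println(applyContinue(crash, "req")) // req
+}
+```
+
+A "Fail Fast" variant would return the error from `safeCall` to the caller instead of falling back to the original request.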
+
+> **Plugin Execution:** [Request Flow →](./request-flow#stage-3-plugin-pipeline-processing)
+
+---
+
+## Security & Validation
+
+### **Multi-Layer Security Model**
+
+Plugin security operates at multiple layers to ensure system integrity:
+
+```mermaid
+graph TB
+    subgraph "Security Validation Layers"
+        L1[Layer 1: Binary Validation<br/>Signature & Checksum]
+        L2[Layer 2: Interface Validation<br/>Type Safety & Compatibility]
+        L3[Layer 3: Runtime Validation<br/>Resource Limits & Timeouts]
+        L4[Layer 4: Execution Isolation<br/>Panic Recovery & Error Handling]
+    end
+
+    subgraph "Security Benefits"
+        Integrity[Code Integrity<br/>Verified Authenticity]
+        Safety[Type Safety<br/>Compile-time Checks]
+        Stability[System Stability<br/>Isolated Failures]
+        Performance[Performance Protection<br/>Resource Limits]
+    end
+
+    L1 --> Integrity
+    L2 --> Safety
+    L3 --> Performance
+    L4 --> Stability
+```
+
+### **Validation Process**
+
+**Binary Security:**
+
+- **Digital Signatures:** Cryptographic verification of plugin authenticity
+- **Checksum Validation:** File integrity verification
+- **Source Verification:** Trusted source requirements
+
+**Interface Security:**
+
+- **Type Safety:** Interface implementation verification
+- **Version Compatibility:** Plugin API version checking
+- **Memory Safety:** Safe memory access patterns
+
+**Runtime Security:**
+
+- **Resource Quotas:** Memory and CPU usage limits
+- **Execution Timeouts:** Bounded execution time
+- **Sandbox Execution:** Isolated execution environment
+
+**Operational Security:**
+
+- **Health Monitoring:** Continuous plugin health assessment
+- **Error Tracking:** Plugin error rate monitoring
+- **Automatic Recovery:** Failed plugin restart and recovery
+
+---
+
+## Plugin Performance & Monitoring
+
+### **Comprehensive Metrics System**
+
+Bifrost provides detailed metrics for plugin performance and health monitoring:
+
+```mermaid
+graph TB
+    subgraph "Execution Metrics"
+        ExecTime[Execution Time<br/>Latency per Plugin]
+        ExecCount[Execution Count<br/>Request Volume]
+        SuccessRate[Success Rate<br/>Error Percentage]
+        Throughput[Throughput<br/>Requests/Second]
+    end
+
+    subgraph "Resource Metrics"
+        MemoryUsage[Memory Usage<br/>Per Plugin Instance]
+        CPUUsage[CPU Utilization<br/>Processing Time]
+        IOMetrics[I/O Operations<br/>Network/Disk Activity]
+        PoolUtilization[Pool Utilization<br/>Resource Efficiency]
+    end
+
+    subgraph "Health Metrics"
+        ErrorRate[Error Rate<br/>Failed Executions]
+        PanicCount[Panic Recovery<br/>Crash Events]
+        TimeoutCount[Timeout Events<br/>Slow Executions]
+        RecoveryRate[Recovery Success<br/>Failure Handling]
+    end
+
+    subgraph "Business Metrics"
+        AddedLatency[Added Latency<br/>Plugin Overhead]
+        SystemImpact[System Impact<br/>Overall Performance]
+        FeatureUsage[Feature Usage<br/>Plugin Utilization]
+        CostImpact[Cost Impact<br/>Resource Consumption]
+    end
+```
+
+### **Performance Characteristics**
+
+**Plugin Execution Performance:**
+
+- **Typical Overhead:** 1-10μs per plugin for simple operations
+- **Authentication Plugins:** 1-5μs for key validation
+- **Rate Limiting Plugins:** 500ns for quota checks
+- **Monitoring Plugins:** 200ns for metric collection
+- **Transformation Plugins:** 2-10μs depending on complexity
+
+**Resource Usage Patterns:**
+
+- **Memory Efficiency:** Object pooling reduces allocations
+- **CPU Optimization:** Minimal processing overhead
+- **Network Impact:** Configurable external service calls
+- **Storage Overhead:** Minimal for stateless plugins
+
+---
+
+## Plugin Integration Patterns
+
+### **Common Integration Scenarios**
+
+**1. Authentication & Authorization**
+
+- **Pre-processing Hook:** Validate API keys or JWT tokens
+- **Configuration:** External identity provider integration
+- **Error Handling:** Return 401/403 responses for invalid credentials
+- **Performance:** Sub-5μs validation with caching
+
+**2. Rate Limiting & Quotas**
+
+- **Pre-processing Hook:** Check request quotas and limits
+- **Storage:** Redis or in-memory rate limit tracking
+- **Algorithms:** Token bucket, sliding window, fixed window
+- **Responses:** 429 Too Many Requests with retry headers
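+
+For readers unfamiliar with the token bucket algorithm listed above, here is a minimal in-memory sketch. It assumes a single process and a caller-supplied clock; the `TokenBucket` type and its methods are hypothetical and do not describe the rate limiter Bifrost ships.
+
+```go
+package main
+
+import (
+    "fmt"
+    "sync"
+    "time"
+)
+
+// TokenBucket is a minimal in-memory token bucket (illustrative only).
+type TokenBucket struct {
+    mu       sync.Mutex
+    capacity float64   // maximum burst size
+    rate     float64   // tokens added per second
+    tokens   float64   // tokens currently available
+    last     time.Time // time of the last refill
+}
+
+// NewTokenBucket returns a full bucket whose clock starts at now.
+func NewTokenBucket(capacity, rate float64, now time.Time) *TokenBucket {
+    return &TokenBucket{capacity: capacity, rate: rate, tokens: capacity, last: now}
+}
+
+// Allow refills the bucket for the time elapsed since the last call and
+// reports whether one token could be taken at time now.
+func (b *TokenBucket) Allow(now time.Time) bool {
+    b.mu.Lock()
+    defer b.mu.Unlock()
+
+    if elapsed := now.Sub(b.last).Seconds(); elapsed > 0 {
+        b.tokens += elapsed * b.rate
+        if b.tokens > b.capacity {
+            b.tokens = b.capacity
+        }
+        b.last = now
+    }
+    if b.tokens < 1 {
+        return false // a pre-hook would answer 429 Too Many Requests here
+    }
+    b.tokens--
+    return true
+}
+
+func main() {
+    start := time.Now()
+    bucket := NewTokenBucket(2, 1, start) // burst of 2, refill 1 token/second
+
+    // Two requests fit in the burst, the third is rejected.
+    fmt.Println(bucket.Allow(start), bucket.Allow(start), bucket.Allow(start))
+    // One second later a single token has been refilled.
+    fmt.Println(bucket.Allow(start.Add(time.Second)))
+}
+```
+
+Passing the current time in explicitly keeps the bucket deterministic under test; a shared store such as Redis would be needed to enforce limits across multiple gateway instances.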
+
+**3. Request/Response Transformation**
+
+- **Dual Hooks:** Pre-processing for requests, post-processing for responses
+- **Use Cases:** Data format conversion, field mapping, content filtering
+- **Performance:** Streaming transformations for large payloads
+- **Compatibility:** Provider-specific format adaptations
+
+**4. Monitoring & Analytics**
+
+- **Post-processing Hook:** Collect metrics and logs after request completion
+- **Destinations:** Prometheus, DataDog, custom analytics systems
+- **Data:** Request/response metadata, performance metrics, error tracking
+- **Privacy:** Configurable data sanitization and filtering
+
+### **Plugin Communication Patterns**
+
+**Plugin-to-Plugin Communication:**
+
+- **Shared Context:** Plugins can store data in request context for downstream plugins
+- **Event System:** Plugins can emit events for other plugins to consume
+- **Data Passing:** Structured data exchange between related plugins
+
+**Plugin-to-External Service Communication:**
+
+- **HTTP Clients:** Built-in HTTP client pools for external API calls
+- **Database Connections:** Connection pooling for database access
+- **Message Queues:** Integration with message queue systems
+- **Caching Systems:** Redis, Memcached integration for state storage
+
+> **📖 Integration Examples:** [Plugin Development Guide →](../../enterprise/custom-plugins)
+
+---
+
+## Related Architecture Documentation
+
+- **[Request Flow](./request-flow)** - Plugin execution in request processing pipeline
+- **[Concurrency Model](./concurrency)** - Plugin concurrency and threading considerations
+- **[Benchmarks](../../benchmarking/getting-started)** - Plugin performance characteristics and optimization
+- **[MCP System](./mcp)** - Integration between plugins and MCP system
+
--- a/docs/architecture/core/providers.mdx
+++ b/docs/architecture/core/providers.mdx
--- a/docs/architecture/core/request-flow.mdx
+++ b/docs/architecture/core/request-flow.mdx
@@ -0,0 +1,527 @@
+---
+title: "Request Flow"
+description: "Deep dive into Bifrost's request processing pipeline - from transport layer ingestion through provider execution to response delivery."
+icon: "route"
+---
+
+## Stage 1: Transport Layer Processing
+
+### **HTTP Transport Flow**
+
+```mermaid
+sequenceDiagram
+    participant Client
+    participant HTTPTransport
+    participant Router
+    participant Validation
+
+    Client->>HTTPTransport: POST /v1/chat/completions
+    HTTPTransport->>HTTPTransport: Parse Headers
+    HTTPTransport->>HTTPTransport: Extract Body
+    HTTPTransport->>Validation: Validate JSON Schema
+    Validation->>Router: BifrostRequest
+    Router-->>HTTPTransport: Processing Started
+    HTTPTransport-->>Client: HTTP 200 (async processing)
+```
+
+**Key Processing Steps:**
+
+1. **Request Reception** - FastHTTP server receives request
+2. **Header Processing** - Extract authentication, content-type, custom headers
+3. **Body Parsing** - JSON unmarshaling with schema validation
+4. **Request Transformation** - Convert to internal `BifrostRequest` schema
+5. **Context Creation** - Build request context with metadata
+
+**Performance Characteristics:**
+
+- **Parsing Time:** ~2.1μs for typical requests
+- **Validation Overhead:** ~400ns for schema checks
+- **Memory Allocation:** Zero-copy where possible
+
+### **Go SDK Flow**
+
+```mermaid
+sequenceDiagram
+    participant Application
+    participant SDK
+    participant Core
+    participant Validation
+
+    Application->>SDK: bifrost.ChatCompletion(req)
+    SDK->>SDK: Type Validation
+    SDK->>Core: Direct Function Call
+    Core->>Validation: Schema Validation
+    Validation-->>Core: Validated Request
+    Core-->>SDK: Processing Result
+    SDK-->>Application: Typed Response
+```
+
+**Advantages:**
+
+- **Zero Serialization** - Direct Go struct passing
+- **Type Safety** - Compile-time validation
+- **Lower Latency** - No HTTP/JSON overhead
+- **Memory Efficiency** - No intermediate allocations
+
+---
+
+## Stage 2: Request Routing & Load Balancing
+
+### **Provider Selection Logic**
+
+```mermaid
+flowchart TD
+    Request[Incoming Request] --> ModelCheck{Model Available?}
+    ModelCheck -->|Yes| ProviderDirect[Use Specified Provider]
+    ModelCheck -->|No| ModelMapping[Model → Provider Mapping]
+
+    ProviderDirect --> KeyPool[API Key Pool]
+    ModelMapping --> KeyPool
+
+    KeyPool --> WeightedSelect[Weighted Random Selection]
+    WeightedSelect --> HealthCheck{Provider Healthy?}
+
+    HealthCheck -->|Yes| AssignWorker[Assign Worker]
+    HealthCheck -->|No| CircuitBreaker[Circuit Breaker]
+
+    CircuitBreaker --> FallbackCheck{Fallback Available?}
+    FallbackCheck -->|Yes| FallbackProvider[Try Fallback]
+    FallbackCheck -->|No| ErrorResponse[Return Error]
+
+    FallbackProvider --> KeyPool
+```
+
+**Key Selection Algorithm:**
+
+```go
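+// Weighted random key selection (requires "math/rand").
+type KeySelector struct {
+    keys    []APIKey
+    weights []float64
+    total   float64 // sum of all weights
+}
+
+// SelectKey returns a key with probability proportional to its weight,
+// or nil when no keys are configured.
+func (ks *KeySelector) SelectKey() *APIKey {
+    if len(ks.keys) == 0 {
+        return nil
+    }
+
+    r := rand.Float64() * ks.total // r is in [0, total)
+    cumulative := 0.0
+
+    for i, weight := range ks.weights {
+        cumulative += weight
+        if r < cumulative { // strict comparison never picks a zero-weight key
+            return &ks.keys[i]
+        }
+    }
+    // Fallback for floating-point rounding at the upper boundary.
+    return &ks.keys[len(ks.keys)-1]
+}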
+```
+
+**Performance Metrics:**
+
+- **Key Selection Time:** ~10ns for small key pools (linear scan over the weights)
+- **Health Check Overhead:** ~50ns (cached results)
+- **Fallback Decision:** ~25ns (configuration lookup)
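+
+The `SelectKey` loop walks the weights one by one, which is fine for a handful of keys. For larger pools, a common alternative is to precompute cumulative weights once and pick with a binary search. The sketch below shows that variant; `WeightedPicker`, `NewWeightedPicker`, `pickAt`, and `Pick` are hypothetical names, and keys are plain strings for brevity.
+
+```go
+package main
+
+import (
+    "fmt"
+    "math/rand"
+    "sort"
+)
+
+// WeightedPicker precomputes cumulative weights so each selection is a
+// binary search (O(log n)) instead of a linear scan.
+type WeightedPicker struct {
+    keys       []string
+    cumulative []float64 // cumulative[i] = sum of weights[0..i]
+}
+
+// NewWeightedPicker validates the inputs and builds the cumulative table.
+func NewWeightedPicker(keys []string, weights []float64) (*WeightedPicker, error) {
+    if len(keys) == 0 || len(keys) != len(weights) {
+        return nil, fmt.Errorf("need equal, non-zero numbers of keys and weights")
+    }
+    cum := make([]float64, len(weights))
+    total := 0.0
+    for i, w := range weights {
+        if w < 0 {
+            return nil, fmt.Errorf("negative weight at index %d", i)
+        }
+        total += w
+        cum[i] = total
+    }
+    if total == 0 {
+        return nil, fmt.Errorf("total weight must be positive")
+    }
+    return &WeightedPicker{keys: keys, cumulative: cum}, nil
+}
+
+// pickAt returns the key whose cumulative range contains r, for r in [0, total).
+func (p *WeightedPicker) pickAt(r float64) string {
+    i := sort.Search(len(p.cumulative), func(i int) bool { return p.cumulative[i] > r })
+    if i == len(p.cumulative) { // guard against floating-point edge cases
+        i = len(p.cumulative) - 1
+    }
+    return p.keys[i]
+}
+
+// Pick selects a key at random, proportionally to its weight.
+func (p *WeightedPicker) Pick() string {
+    total := p.cumulative[len(p.cumulative)-1]
+    return p.pickAt(rand.Float64() * total)
+}
+
+func main() {
+    picker, err := NewWeightedPicker([]string{"key-a", "key-b"}, []float64{0.7, 0.3})
+    if err != nil {
+        panic(err)
+    }
+    counts := map[string]int{}
+    for i := 0; i < 10000; i++ {
+        counts[picker.Pick()]++
+    }
+    fmt.Println(counts) // roughly a 70/30 split; exact counts vary per run
+}
+```
+
+Splitting the deterministic `pickAt` from the random `Pick` makes the boundary behavior easy to test without seeding the random source.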
+
+---
+
+## Stage 3: Plugin Pipeline Processing
+
+### **Pre-Processing Hooks**
+
+```mermaid
+sequenceDiagram
+    participant Request
+    participant AuthPlugin
+    participant RateLimitPlugin
+    participant TransformPlugin
+    participant Core
+
+    Request->>AuthPlugin: ProcessRequest()
+    AuthPlugin->>AuthPlugin: Validate API Key
+    AuthPlugin->>RateLimitPlugin: Authorized Request
+
+    RateLimitPlugin->>RateLimitPlugin: Check Rate Limits
+    RateLimitPlugin->>TransformPlugin: Allowed Request
+
+    TransformPlugin->>TransformPlugin: Modify Request
+    TransformPlugin->>Core: Final Request
+```
+
+**Plugin Execution Model:**
+
+```go
+type PluginManager struct {
+    plugins []Plugin
+}
+
+func (pm *PluginManager) ExecutePreHooks(
+    ctx BifrostContext,
+    req *BifrostRequest,
+) (*BifrostRequest, *BifrostError) {
+    for _, plugin := range pm.plugins {
+        modifiedReq, err := plugin.ProcessRequest(ctx, req)
+        if err != nil {
+            return nil, err
+        }
+        req = modifiedReq
+    }
+    return req, nil
+}
+```
+
+**Plugin Types & Performance:**
+
+| Plugin Type           | Processing Time | Memory Impact | Failure Mode           |
+| --------------------- | --------------- | ------------- | ---------------------- |
+| **Authentication**    | ~1-5μs          | Minimal       | Reject request         |
+| **Rate Limiting**     | ~500ns          | Cache-based   | Throttle/reject        |
+| **Request Transform** | ~2-10μs         | Copy-on-write | Continue with original |
+| **Monitoring**        | ~200ns          | Append-only   | Continue silently      |
+
+---
+
+## Stage 4: MCP Tool Discovery & Integration
+
+### **Tool Discovery Process**
+
+```mermaid
+flowchart TD
+    Request[Request with Model] --> MCPCheck{MCP Enabled?}
+    MCPCheck -->|No| SkipMCP[Skip MCP Processing]
+    MCPCheck -->|Yes| ClientLookup[MCP Client Lookup]
+
+    ClientLookup --> ToolFilter[Tool Filtering]
+    ToolFilter --> ToolInject[Inject Tools into Request]
+
+    ToolFilter --> IncludeCheck{Include Filter?}
+    ToolFilter --> ExcludeCheck{Exclude Filter?}
+
+    IncludeCheck -->|Yes| IncludeTools[Include Specified Tools]
+    IncludeCheck -->|No| AllTools[Include All Tools]
+
+    ExcludeCheck -->|Yes| RemoveTools[Remove Excluded Tools]
+    ExcludeCheck -->|No| KeepFiltered[Keep Filtered Tools]
+
+    IncludeTools --> ToolInject
+    AllTools --> ToolInject
+    RemoveTools --> ToolInject
+    KeepFiltered --> ToolInject
+
+    ToolInject --> EnhancedRequest[Request with Tools]
+    SkipMCP --> EnhancedRequest
+```
+
+**Tool Integration Algorithm:**
+
+```go
+func (mcpm *MCPManager) EnhanceRequest(
+    ctx BifrostContext,
+    req *BifrostChatRequest,
+) (*BifrostChatRequest, error) {
+    // Extract tool filtering from context
+    includeClients := ctx.GetStringSlice("mcp-include-clients")
+    includeTools := ctx.GetStringSlice("mcp-include-tools")
+
+    // Get available tools
+    availableTools := mcpm.getAvailableTools(includeClients)
+
+    // Filter tools
+    filteredTools := mcpm.filterTools(availableTools, includeTools)
+
+    // Inject into request
+    if req.Params == nil {
+        req.Params = &ChatParameters{}
+    }
+    req.Params.Tools = append(req.Params.Tools, filteredTools...)
+
+    return req, nil
+}
+```
+
+**MCP Performance Impact:**
+
+- **Tool Discovery:** ~100-500μs (cached after first request)
+- **Tool Filtering:** ~50-200ns per tool
+- **Request Enhancement:** ~1-5μs depending on tool count
+
+---
+
+## Stage 5: Memory Pool Management
+
+### **Object Pool Lifecycle**
+
+```mermaid
+stateDiagram-v2
+    [*] --> PoolInit: System Startup
+    PoolInit --> Available: Objects Pre-allocated
+
+    Available --> Acquired: Request Processing
+    Acquired --> InUse: Object Populated
+    InUse --> Processing: Worker Processing
+    Processing --> Completed: Processing Done
+    Completed --> Reset: Object Cleanup
+    Reset --> Available: Return to Pool
+
+    Available --> Expansion: Pool Exhaustion
+    Expansion --> Available: New Objects Created
+
+    Reset --> GC: Pool Full
+    GC --> [*]: Garbage Collection
+```
+
+**Memory Pool Implementation:**
+
+```go
+type MemoryPools struct {
+    channelPool  sync.Pool
+    messagePool  sync.Pool
+    responsePool sync.Pool
+    bufferPool   sync.Pool
+}
+
+func (mp *MemoryPools) GetChannel() *ProcessingChannel {
+    if ch := mp.channelPool.Get(); ch != nil {
+        return ch.(*ProcessingChannel)
+    }
+    return NewProcessingChannel()
+}
+
+func (mp *MemoryPools) ReturnChannel(ch *ProcessingChannel) {
+    ch.Reset() // Clear previous data
+    mp.channelPool.Put(ch)
+}
+```
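+
+The same get/reset/put pattern can be tried in isolation with a standard-library type. The sketch below pools `*bytes.Buffer` values and sets the pool's `New` function, which is one way to avoid the nil check in `GetChannel`; `bufferPool` and `render` are hypothetical names used only here.
+
+```go
+package main
+
+import (
+    "bytes"
+    "fmt"
+    "sync"
+)
+
+// bufferPool hands out reusable *bytes.Buffer values. Because New is set,
+// Get never returns nil, so callers need no nil check.
+var bufferPool = sync.Pool{
+    New: func() any { return new(bytes.Buffer) },
+}
+
+// render borrows a buffer, uses it, and returns it to the pool after
+// resetting it so no data leaks into the next borrower.
+func render(name string) string {
+    buf := bufferPool.Get().(*bytes.Buffer)
+    defer func() {
+        buf.Reset()
+        bufferPool.Put(buf)
+    }()
+    buf.WriteString("hello, ")
+    buf.WriteString(name)
+    return buf.String() // String copies the bytes, so the later Reset is safe
+}
+
+func main() {
+    fmt.Println(render("bifrost")) // hello, bifrost
+    fmt.Println(render("pool"))    // hello, pool
+}
+```
+
+Note that `sync.Pool` may drop idle objects at any garbage collection, so it reduces allocations but does not guarantee a fixed pool size.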
+
+---
+
+## Stage 6: Worker Pool Processing
+
+### **Worker Assignment & Execution**
+
+```mermaid
+sequenceDiagram
+    participant Queue
+    participant WorkerPool
+    participant Worker
+    participant Provider
+    participant Circuit
+
+    Queue->>WorkerPool: Enqueue Request
+    WorkerPool->>Worker: Assign Available Worker
+    Worker->>Circuit: Check Circuit Breaker
+    Circuit->>Provider: Forward Request
+
+    Provider-->>Circuit: Response/Error
+    Circuit->>Circuit: Update Health Metrics
+    Circuit-->>Worker: Provider Response
+    Worker-->>WorkerPool: Release Worker
+    WorkerPool-->>Queue: Request Completed
+```
+
+**Worker Pool Architecture:**
+
+```go
+type ProviderWorkerPool struct {
+    workers    chan *Worker
+    queue      chan *ProcessingJob
+    config     WorkerPoolConfig
+    metrics    *PoolMetrics
+}
+
+func (pwp *ProviderWorkerPool) ProcessRequest(job *ProcessingJob) {
+    // Get worker from pool
+    worker := <-pwp.workers
+
+    go func() {
+        defer func() {
+            // Return worker to pool
+            pwp.workers <- worker
+        }()
+
+        // Process request
+        result := worker.Execute(job)
+        job.ResultChan <- result
+    }()
+}
+```
+
+---
+
+## Stage 7: Provider API Communication
+
+### **HTTP Request Execution**
+
+```mermaid
+sequenceDiagram
+    participant Worker
+    participant HTTPClient
+    participant Provider
+    participant CircuitBreaker
+    participant Metrics
+
+    Worker->>HTTPClient: PrepareRequest()
+    HTTPClient->>HTTPClient: Add Headers & Auth
+    HTTPClient->>CircuitBreaker: CheckHealth()
+    CircuitBreaker->>Provider: HTTP Request
+
+    Provider-->>CircuitBreaker: HTTP Response
+    CircuitBreaker->>Metrics: Record Metrics
+    CircuitBreaker-->>HTTPClient: Response/Error
+    HTTPClient-->>Worker: Parsed Response
+```
+
+**Request Preparation Pipeline:**
+
+```go
+func (w *ProviderWorker) ExecuteRequest(job *ProcessingJob) *ProviderResponse {
+    // Prepare HTTP request
+    httpReq := w.prepareHTTPRequest(job.Request)
+
+    // Add authentication
+    w.addAuthentication(httpReq, job.APIKey)
+
+    // Execute with timeout
+    ctx, cancel := context.WithTimeout(context.Background(), job.Timeout)
+    defer cancel()
+
+    httpResp, err := w.httpClient.Do(httpReq.WithContext(ctx))
+    if err != nil {
+        return w.handleError(err, job)
+    }
+    defer httpResp.Body.Close() // release the connection once parsing is done
+
+    // Parse response
+    return w.parseResponse(httpResp, job)
+}
+```
+
+---
+
+## Stage 8: Tool Execution & Response Processing
+
+### **MCP Tool Execution Flow**
+
+```mermaid
+sequenceDiagram
+    participant Provider
+    participant MCPProcessor
+    participant MCPServer
+    participant ToolExecutor
+    participant ResponseBuilder
+
+    Provider->>MCPProcessor: Response with Tool Calls
+    MCPProcessor->>MCPProcessor: Extract Tool Calls
+
+    loop For each tool call
+        MCPProcessor->>MCPServer: Execute Tool
+        MCPServer->>ToolExecutor: Tool Invocation
+        ToolExecutor-->>MCPServer: Tool Result
+        MCPServer-->>MCPProcessor: Tool Response
+    end
+
+    MCPProcessor->>ResponseBuilder: Combine Results
+    ResponseBuilder-->>Provider: Enhanced Response
+```
+
+**Tool Execution Pipeline:**
+
+```go
+func (mcp *MCPProcessor) ProcessToolCalls(
+    response *ProviderResponse,
+) (*ProviderResponse, error) {
+    toolCalls := mcp.extractToolCalls(response)
+    if len(toolCalls) == 0 {
+        return response, nil
+    }
+
+    // Execute tools concurrently
+    results := make(chan ToolResult, len(toolCalls))
+    for _, toolCall := range toolCalls {
+        go func(tc ToolCall) {
+            result := mcp.executeTool(tc)
+            results <- result
+        }(toolCall)
+    }
+
+    // Collect results
+    toolResults := make([]ToolResult, 0, len(toolCalls))
+    for i := 0; i < len(toolCalls); i++ {
+        toolResults = append(toolResults, <-results)
+    }
+
+    // Enhance response
+    return mcp.enhanceResponse(response, toolResults), nil
+}
+```
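+
+Collecting from a shared channel, as above, yields results in completion order, which can differ from the order of the tool calls. When the caller needs results aligned with the original calls, one option is to write each result into its own slice index. The following is a sketch under that assumption; `ToolCall`, `ToolResult`, and `executeAll` are simplified, hypothetical stand-ins for the pipeline's types.
+
+```go
+package main
+
+import (
+    "fmt"
+    "sync"
+    "time"
+)
+
+// ToolCall and ToolResult are simplified, hypothetical stand-ins.
+type ToolCall struct {
+    ID    string
+    Delay time.Duration // simulated execution time
+}
+
+type ToolResult struct {
+    ID     string
+    Output string
+}
+
+// executeAll runs every call in its own goroutine and writes each result
+// into the slot matching the call's index, so the output order always
+// matches the input order regardless of which tool finishes first.
+func executeAll(calls []ToolCall, exec func(ToolCall) ToolResult) []ToolResult {
+    results := make([]ToolResult, len(calls))
+    var wg sync.WaitGroup
+    for i, call := range calls {
+        wg.Add(1)
+        go func(i int, call ToolCall) {
+            defer wg.Done()
+            results[i] = exec(call) // each goroutine owns exactly one index
+        }(i, call)
+    }
+    wg.Wait()
+    return results
+}
+
+func main() {
+    calls := []ToolCall{
+        {ID: "slow", Delay: 20 * time.Millisecond},
+        {ID: "fast", Delay: 1 * time.Millisecond},
+    }
+    exec := func(c ToolCall) ToolResult {
+        time.Sleep(c.Delay)
+        return ToolResult{ID: c.ID, Output: "done:" + c.ID}
+    }
+    // "slow" is listed first even though "fast" finishes earlier.
+    for _, r := range executeAll(calls, exec) {
+        fmt.Println(r.ID, r.Output)
+    }
+}
+```
+
+Because each goroutine writes to a distinct element, no mutex is needed; `wg.Wait()` provides the synchronization before the slice is read.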
+
+---
+
+## Stage 9: Post-Processing & Response Formation
+
+### **Plugin Post-Processing**
+
+```mermaid
+sequenceDiagram
+    participant CoreResponse
+    participant LoggingPlugin
+    participant CachePlugin
+    participant MetricsPlugin
+    participant Transport
+
+    CoreResponse->>LoggingPlugin: ProcessResponse()
+    LoggingPlugin->>LoggingPlugin: Log Request/Response
+    LoggingPlugin->>CachePlugin: Response + Logs
+
+    CachePlugin->>CachePlugin: Cache Response
+    CachePlugin->>MetricsPlugin: Cached Response
+
+    MetricsPlugin->>MetricsPlugin: Record Metrics
+    MetricsPlugin->>Transport: Final Response
+```
+
+**Response Enhancement Pipeline:**
+
+```go
+func (pm *PluginManager) ExecutePostHooks(
+    ctx BifrostContext,
+    req *BifrostRequest,
+    resp *BifrostResponse,
+) (*BifrostResponse, error) {
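+    // Run post-hooks in reverse registration order, mirroring the pre-hooks
+    for i := len(pm.plugins) - 1; i >= 0; i-- {
+        plugin := pm.plugins[i]
+        enhancedResp, err := plugin.ProcessResponse(ctx, req, resp)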
+        if err != nil {
+            // Log error but continue processing
+            pm.logger.Warn("Plugin post-processing error", "plugin", plugin.Name(), "error", err)
+            continue
+        }
+        resp = enhancedResp
+    }
+    return resp, nil
+}
+```
+
+### **Response Serialization**
+
+```mermaid
+flowchart TD
+    Response[BifrostResponse] --> Format{Response Format}
+    Format -->|HTTP| JSONSerialize[JSON Serialization]
+    Format -->|SDK| DirectReturn[Direct Go Struct]
+
+    JSONSerialize --> Compress[Compression]
+    DirectReturn --> TypeCheck[Type Validation]
+
+    Compress --> Headers[Set Headers]
+    TypeCheck --> Return[Return Response]
+
+    Headers --> HTTPResponse[HTTP Response]
+    HTTPResponse --> Client[Client Response]
+    Return --> Client
+```
+
+---
+
+## Related Architecture Documentation
+
+- **[Concurrency Model](./concurrency)** - Worker pools and threading details
+- **[Plugin System](./plugins)** - Plugin execution and lifecycle
+- **[MCP System](./mcp)** - Tool discovery and execution internals
+- **[Benchmarks](../../benchmarking/getting-started)** - Detailed performance analysis