first commit
81
docs/benchmarking/getting-started.mdx
Normal file
@@ -0,0 +1,81 @@
---
title: "Getting Started"
description: "Introduction to Bifrost's performance capabilities and how to choose the right instance size for your workload."
icon: "rocket"
---

## Overview

Bifrost has been load-tested at a sustained **5,000 requests per second (RPS)** on two AWS EC2 instance types to measure how it behaves under production-like load. This page summarizes the results and helps you choose an instance size for your workload.

**Key Performance Highlights:**

- **Perfect Success Rate**: 100% of requests succeeded at 5,000 RPS on both instance types
- **Minimal Overhead**: 11 µs of added latency per request on t3.xlarge (59 µs on t3.medium)
- **Efficient Queue Management**: Queue wait times under 2 µs on t3.xlarge (1.67 µs measured)
- **Fast Key Selection**: Weighted API key selection in roughly 10 ns

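
To make "weighted API key selection" concrete, here is a minimal sketch of the general technique: precompute cumulative weights once, then resolve each pick with a binary search. This is not Bifrost's implementation; the class name `WeightedKeyPicker`, the key IDs, and the weights are all hypothetical, and a Python sketch will not approach the ~10 ns figure quoted above. The loop at the end tallies 10,000 picks so you can check that the split tracks the weights.

```python
import bisect
import itertools
import random


class WeightedKeyPicker:
    """Pick a key with probability proportional to its weight (illustrative only)."""

    def __init__(self, weighted_keys):
        # weighted_keys: list of (key_id, weight) pairs with positive weights.
        if not weighted_keys:
            raise ValueError("at least one key is required")
        if any(weight <= 0 for _, weight in weighted_keys):
            raise ValueError("weights must be positive")
        self._keys = [key for key, _ in weighted_keys]
        # Running totals are computed once so each pick is a binary search.
        self._cumulative = list(itertools.accumulate(w for _, w in weighted_keys))

    def pick(self, rng=random):
        # Draw a point in [0, total) and find the first bucket that contains it.
        point = rng.random() * self._cumulative[-1]
        index = bisect.bisect_right(self._cumulative, point)
        # Guard against floating-point rounding pushing the point up to the total.
        return self._keys[min(index, len(self._keys) - 1)]


# Hypothetical keys and weights: "key-a" should receive about 70% of picks.
picker = WeightedKeyPicker([("key-a", 0.7), ("key-b", 0.3)])
rng = random.Random(0)
counts = {"key-a": 0, "key-b": 0}
for _ in range(10_000):
    counts[picker.pick(rng)] += 1
print(counts)
```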
---

## Test Environment Summary

Bifrost was benchmarked on two AWS EC2 instance configurations:

### **t3.medium (2 vCPUs, 4 GB RAM)**

- **Buffer Size**: 15,000
- **Initial Pool Size**: 10,000
- **Use Case**: Cost-effective option for moderate workloads

### **t3.xlarge (4 vCPUs, 16 GB RAM)**

- **Buffer Size**: 20,000
- **Initial Pool Size**: 15,000
- **Use Case**: High-performance option for demanding workloads

---

## Performance Comparison at a Glance

| Metric | t3.medium | t3.xlarge | Change (t3.xlarge vs. t3.medium) |
|--------|-----------|-----------|----------------------------------|
| **Success Rate @ 5k RPS** | 100% | 100% | No change (no failed requests) |
| **Bifrost Overhead** | 59 µs | 11 µs | **-81%** |
| **Average Latency** | 2.12 s | 1.61 s | **-24%** |
| **Queue Wait Time** | 47.13 µs | 1.67 µs | **-96%** |
| **JSON Marshaling** | 63.47 µs | 26.80 µs | **-58%** |
| **Response Parsing** | 11.30 ms | 2.11 ms | **-81%** |
| **Peak Memory Usage** | 1,312.79 MB | 3,340.44 MB | +154% |

> **Note**: The t3.xlarge tests used significantly larger response payloads (~10 KB vs. ~1 KB) and still achieved better results on every timing metric.

<Note>
All benchmarks use mocked OpenAI calls. The mock latency and payload size for each run are given on the respective analysis pages.
</Note>

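
The percentage column in the comparison table is the signed relative change from the t3.medium value to the t3.xlarge value. The short sketch below recomputes it from the figures in the table; the helper name `relative_change` is ours, not part of Bifrost or its benchmark tooling.

```python
# (t3.medium, t3.xlarge) values copied from the comparison table above.
metrics = {
    "Bifrost Overhead (µs)": (59.0, 11.0),
    "Average Latency (s)": (2.12, 1.61),
    "Queue Wait Time (µs)": (47.13, 1.67),
    "JSON Marshaling (µs)": (63.47, 26.80),
    "Response Parsing (ms)": (11.30, 2.11),
    "Peak Memory Usage (MB)": (1312.79, 3340.44),
}


def relative_change(baseline, value):
    """Signed percentage change of value relative to baseline."""
    return (value - baseline) / baseline * 100.0


# Negative numbers mean t3.xlarge is lower (faster); positive means higher.
changes = {
    name: relative_change(medium, xlarge)
    for name, (medium, xlarge) in metrics.items()
}
for name, pct in changes.items():
    print(f"{name}: {pct:+.0f}%")
```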
---

## Configuration Flexibility

One of Bifrost's key strengths is its **configuration flexibility**. You can tune the speed ↔ memory trade-off to match your requirements:

| Configuration Parameter | Effect |
|------------------------|--------|
| `initial_pool_size` | Higher values improve speed at the cost of more memory |
| `buffer_size` & `concurrency` | Control queue depth and the maximum number of parallel workers (per provider) |
| `retry` & `timeout` | Tune how aggressively each provider is retried and timed out to meet your SLOs |

**Configuration Philosophy:**

- **Higher settings** (like the t3.xlarge profile) prioritize raw speed
- **Lower settings** (like the t3.medium profile) optimize for memory efficiency
- **Custom tuning** lets you find the sweet spot for your specific workload

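
One rough way to reason about `buffer_size` is to ask how long the queue can absorb traffic before it fills. The sketch below does that arithmetic for the two benchmark profiles at 5,000 RPS. It assumes `buffer_size` counts queued requests in a single queue, which this page does not state explicitly, and the `PROFILES` dictionary and `burst_headroom_seconds` helper are hypothetical names, not Bifrost's configuration schema.

```python
# Values taken from the two benchmark profiles described on this page.
# The dictionary layout is illustrative, not Bifrost's config format.
PROFILES = {
    "t3.medium": {"buffer_size": 15_000, "initial_pool_size": 10_000},
    "t3.xlarge": {"buffer_size": 20_000, "initial_pool_size": 15_000},
}


def burst_headroom_seconds(buffer_size, arrival_rps, drain_rps=0.0):
    """Seconds until a queue of buffer_size fills when arrivals outpace draining."""
    backlog_rate = arrival_rps - drain_rps
    if backlog_rate <= 0:
        # Workers keep up with arrivals, so the queue never fills.
        return float("inf")
    return buffer_size / backlog_rate


# Worst case: workers stall completely while 5,000 RPS keeps arriving.
for name, profile in PROFILES.items():
    headroom = burst_headroom_seconds(profile["buffer_size"], arrival_rps=5_000)
    print(f"{name}: {headroom:.1f} s of headroom if workers stall completely")
```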
---

## Next Steps

### **Detailed Performance Analysis**

- **[t3.medium Performance](./t3.medium)** - Deep dive into cost-effective performance
- **[t3.xlarge Performance](./t3.xl)** - High-performance configuration analysis

### **Run Your Own Tests**

- **[Run Your Own Benchmarks](./run-your-own-benchmarks)** - Step-by-step guide to benchmark Bifrost in your environment

Ready to dive deeper? Choose your instance type above or learn how to run your own performance tests.