---
title: "Getting Started"
description: "Introduction to Bifrost's performance capabilities and how to choose the right instance size for your workload."
icon: "rocket"
---

## Overview

Bifrost has been benchmarked under sustained high load to validate it for production deployments. The results on this page were measured at **5,000 requests per second (RPS)** on two AWS EC2 instance types.

**Key Performance Highlights:**
- **Perfect Success Rate**: 100% request success rate at 5,000 RPS on both instance types
- **Minimal Overhead**: 11 µs of added latency per request on t3.xlarge (59 µs on t3.medium)
- **Efficient Queue Management**: Queue wait times as low as 1.67 µs (t3.xlarge)
- **Fast Key Selection**: Near-instantaneous weighted API key selection (~10 ns)

---

## Test Environment Summary

Bifrost was benchmarked on two AWS EC2 instance configurations:

### **t3.medium (2 vCPUs, 4 GB RAM)**
- **Buffer Size**: 15,000
- **Initial Pool Size**: 10,000
- **Use Case**: Cost-effective option for moderate workloads

### **t3.xlarge (4 vCPUs, 16 GB RAM)**
- **Buffer Size**: 20,000
- **Initial Pool Size**: 15,000
- **Use Case**: High-performance option for demanding workloads

---

## Performance Comparison at a Glance

| Metric | t3.medium | t3.xlarge | Change (t3.medium → t3.xlarge) |
|--------|-----------|-----------|-------------|
| **Success Rate @ 5k RPS** | 100% | 100% | No change (0 failed requests) |
| **Bifrost Overhead** | 59 µs | 11 µs | **-81%** |
| **Average Latency** | 2.12 s | 1.61 s | **-24%** |
| **Queue Wait Time** | 47.13 µs | 1.67 µs | **-96%** |
| **JSON Marshaling** | 63.47 µs | 26.80 µs | **-58%** |
| **Response Parsing** | 11.30 ms | 2.11 ms | **-81%** |
| **Peak Memory Usage** | 1,312.79 MB | 3,340.44 MB | +154% |

> **Note**: The t3.xlarge tests used significantly larger response payloads (~10 KB vs ~1 KB), yet still achieved lower latency and overhead.

<Note>
All benchmarks run against mocked OpenAI calls; the mock latency and payload size are listed on the respective analysis pages.
</Note>
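The percentage column in the table above is a plain relative change between the two instance types. The sketch below recomputes it from the published values so you can apply the same calculation to your own runs. The function name and dictionary keys are illustrative, not part of Bifrost.

```python
def relative_change(baseline: float, candidate: float) -> float:
    """Percent change from baseline to candidate (negative means lower)."""
    return (candidate - baseline) / baseline * 100.0


# (t3.medium, t3.xlarge) pairs copied from the comparison table.
# Keys are illustrative labels with the unit as a suffix.
metrics = {
    "bifrost_overhead_us": (59.0, 11.0),
    "average_latency_s": (2.12, 1.61),
    "queue_wait_time_us": (47.13, 1.67),
    "json_marshaling_us": (63.47, 26.80),
    "response_parsing_ms": (11.30, 2.11),
    "peak_memory_mb": (1312.79, 3340.44),
}

changes = {
    name: relative_change(medium, xlarge)
    for name, (medium, xlarge) in metrics.items()
}

for name, pct in changes.items():
    print(f"{name}: {pct:+.1f}%")
```

Negative values mean t3.xlarge measured lower than t3.medium. For every metric except peak memory, lower is better.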

---

## Configuration Flexibility

One of Bifrost's key strengths is its **configuration flexibility**. You can tune the speed ↔ memory trade-off to match your requirements:

| Configuration Parameter | Effect |
|------------------------|--------|
| `initial_pool_size` | Higher values = faster performance, more memory usage |
| `buffer_size` & `concurrency` | Control queue depth and max parallel workers (per provider) |
| `retry` & `timeout` | Tune aggressiveness for each provider to meet your SLOs |

**Configuration Philosophy:**
- **Higher settings** (like the t3.xlarge profile) prioritize raw speed
- **Lower settings** (like the t3.medium profile) optimize for memory efficiency
- **Custom tuning** lets you find the sweet spot for your specific workload
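When picking a starting point for custom tuning, a back-of-envelope estimate of concurrent load can help. The sketch below applies Little's law (mean requests in flight = arrival rate × mean time in system) to the benchmark figures on this page. Comparing that estimate against the buffer size is an assumption made here for illustration, not a documented Bifrost sizing rule, and the function name is hypothetical.

```python
def estimated_in_flight(rps: float, avg_latency_s: float) -> float:
    """Little's law: mean requests in flight = arrival rate * mean latency."""
    return rps * avg_latency_s


RPS = 5_000  # benchmark load used on this page

# (average latency in seconds, buffer size) for each benchmarked profile,
# taken from the comparison table and the test environment summary.
profiles = {
    "t3.medium": (2.12, 15_000),
    "t3.xlarge": (1.61, 20_000),
}

for name, (latency_s, buffer_size) in profiles.items():
    in_flight = estimated_in_flight(RPS, latency_s)
    # Assumption: treat the buffer size as a rough capacity to compare against.
    headroom = buffer_size / in_flight
    print(f"{name}: ~{in_flight:,.0f} in flight, "
          f"buffer {buffer_size:,} ({headroom:.1f}x headroom)")
```

Under these assumptions the estimate is roughly 10,600 requests in flight on t3.medium and 8,050 on t3.xlarge, both below the buffer sizes used in the benchmarks. Treat this as a first guess to validate with your own measurements.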

---

## Next Steps

### **Detailed Performance Analysis**
- **[t3.medium Performance](./t3.medium)** - Deep dive into cost-effective performance
- **[t3.xlarge Performance](./t3.xl)** - High-performance configuration analysis

### **Run Your Own Tests**
- **[Run Your Own Benchmarks](./run-your-own-benchmarks)** - Step-by-step guide to benchmark Bifrost in your environment

Ready to dive deeper? Choose your instance type above or learn how to run your own performance tests.