bifrost/docs/benchmarking/getting-started.mdx
Beyhan Oğur 880f412e2c first commit
2026-04-26 21:52:23 +03:00


---
title: "Getting Started"
description: "Introduction to Bifrost's performance capabilities and how to choose the right instance size for your workload."
icon: "rocket"
---
## Overview
Bifrost has been benchmarked under sustained high load to validate it for production deployments. The results below were measured at **5,000 requests per second (RPS)** on two AWS EC2 instance types.

**Key Performance Highlights:**
- **Perfect Success Rate**: 100% request success rate at 5,000 RPS on both instance types
- **Minimal Overhead**: 11 µs of added latency per request on average on t3.xlarge (59 µs on t3.medium)
- **Efficient Queue Management**: Queue wait time of 1.67 µs on t3.xlarge (47.13 µs on t3.medium)
- **Fast Key Selection**: Near-instantaneous weighted API key selection (~10 ns)
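
For readers unfamiliar with weighted key selection, the idea can be sketched in a few lines. This is a generic illustration, not Bifrost's implementation: the function name `pick_weighted`, the key names, and the weights are all hypothetical, and the random draw is passed in explicitly so the behavior is easy to check. In real use, `r` would come from something like `random.random()`.

```python
from bisect import bisect_right
from itertools import accumulate


def pick_weighted(keys, weights, r):
    """Pick a key with probability proportional to its weight.

    r is a uniform draw in [0, 1). The key whose cumulative-weight
    interval contains r * total_weight is returned.
    """
    if not keys or len(keys) != len(weights):
        raise ValueError("keys and weights must be non-empty and equal in length")
    cumulative = list(accumulate(weights))  # e.g. [5, 8, 10]
    target = r * cumulative[-1]
    idx = bisect_right(cumulative, target)
    # Guard against r == 1.0 slipping past the last interval.
    return keys[min(idx, len(keys) - 1)]


# Hypothetical keys: key-a should receive 50% of traffic, key-b 30%, key-c 20%.
keys = ["key-a", "key-b", "key-c"]
weights = [5, 3, 2]

print(pick_weighted(keys, weights, 0.10))
print(pick_weighted(keys, weights, 0.60))
print(pick_weighted(keys, weights, 0.95))
```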
---
## Test Environment Summary
Bifrost was benchmarked on two primary AWS EC2 instance configurations:
### **t3.medium (2 vCPUs, 4GB RAM)**
- **Buffer Size**: 15,000
- **Initial Pool Size**: 10,000
- **Use Case**: Cost-effective option for moderate workloads
### **t3.xlarge (4 vCPUs, 16GB RAM)**
- **Buffer Size**: 20,000
- **Initial Pool Size**: 15,000
- **Use Case**: High-performance option for demanding workloads
---
## Performance Comparison at a Glance
| Metric | t3.medium | t3.xlarge | Change (t3.xlarge vs. t3.medium) |
|--------|-----------|-----------|-------------|
| **Success Rate @ 5k RPS** | 100% | 100% | No failed requests |
| **Bifrost Overhead** | 59 µs | 11 µs | **-81%** |
| **Average Latency** | 2.12 s | 1.61 s | **-24%** |
| **Queue Wait Time** | 47.13 µs | 1.67 µs | **-96%** |
| **JSON Marshaling** | 63.47 µs | 26.80 µs | **-58%** |
| **Response Parsing** | 11.30 ms | 2.11 ms | **-81%** |
| **Peak Memory Usage** | 1,312.79 MB | 3,340.44 MB | +154% |
> **Note**: The t3.xlarge tests used significantly larger response payloads (~10 KB vs. ~1 KB), yet still achieved lower overhead and latency on every timing metric.
<Note>
All benchmarks use mocked OpenAI calls. The mock latency and payload size for each run are listed on the respective analysis pages.
</Note>
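
The last column of the comparison table is the relative change from t3.medium to t3.xlarge. As a quick check, the percentages can be recomputed from the two measured columns. The helper name `percent_change` is only illustrative; the numbers are copied from the table above.

```python
def percent_change(baseline, new):
    """Relative change from baseline to new, in percent (negative = lower)."""
    return (new - baseline) / baseline * 100.0


# (t3.medium, t3.xlarge) values taken from the comparison table.
rows = {
    "Bifrost Overhead (µs)": (59.0, 11.0),
    "Average Latency (s)": (2.12, 1.61),
    "Queue Wait Time (µs)": (47.13, 1.67),
    "JSON Marshaling (µs)": (63.47, 26.80),
    "Response Parsing (ms)": (11.30, 2.11),
    "Peak Memory Usage (MB)": (1312.79, 3340.44),
}

for metric, (medium, xlarge) in rows.items():
    print(f"{metric}: {percent_change(medium, xlarge):+.1f}%")
```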
---
## Configuration Flexibility
One of Bifrost's key strengths is its **configuration flexibility**. You can fine-tune the speed ↔ memory trade-off based on your specific requirements:
| Configuration Parameter | Effect |
|------------------------|--------|
| `initial_pool_size` | Higher values = faster performance, more memory usage |
| `buffer_size` & `concurrency` | Control queue depth and the maximum number of parallel workers (per provider) |
| `retry` & `timeout` | Tune aggressiveness for each provider to meet your SLOs |
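
One rough way to sanity-check a `buffer_size` value is Little's law: the average number of requests in the system equals the arrival rate multiplied by the average time each request spends in it. The sketch below applies it to the 5,000 RPS load, the average latencies from the comparison table, and the buffer sizes from the test environment summary. It assumes the reported average latency is the full end-to-end time per request, and it counts requests held by workers as well as queued ones, so treat the result as an upper bound on queue depth rather than a sizing rule.

```python
def in_flight_requests(rps, avg_latency_s):
    """Little's law: average requests in the system = arrival rate * average latency."""
    return rps * avg_latency_s


RPS = 5_000

# Latency and buffer size per benchmarked profile, as reported in this document.
profiles = {
    "t3.medium": {"avg_latency_s": 2.12, "buffer_size": 15_000},
    "t3.xlarge": {"avg_latency_s": 1.61, "buffer_size": 20_000},
}

for name, p in profiles.items():
    n = in_flight_requests(RPS, p["avg_latency_s"])
    print(f"{name}: ~{n:,.0f} requests in flight vs. buffer_size {p['buffer_size']:,}")
```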

**Configuration Philosophy:**
- **Higher settings** (like t3.xlarge profile) prioritize raw speed
- **Lower settings** (like t3.medium profile) optimize for memory efficiency
- **Custom tuning** lets you find the sweet spot for your specific workload
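
As a starting point for custom tuning, the two benchmarked profiles can be treated as data and filtered by a memory budget. This is a hypothetical helper, not part of Bifrost: the `Profile` class and `choose_profile` function are made up for illustration, and only the `buffer_size`, `initial_pool_size`, and peak memory figures come from this page. Keep in mind that the t3.xlarge run also used larger response payloads, so its peak memory reflects more than the settings alone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    name: str
    buffer_size: int
    initial_pool_size: int
    peak_memory_mb: float  # peak observed at 5,000 RPS in the benchmark


# Values copied from the test environment summary and the comparison table.
PROFILES = [
    Profile("t3.medium", 15_000, 10_000, 1312.79),
    Profile("t3.xlarge", 20_000, 15_000, 3340.44),
]


def choose_profile(memory_budget_mb):
    """Return the largest benchmarked profile whose peak memory fits the budget.

    Returns None when even the smallest profile exceeds the budget.
    """
    fitting = [p for p in PROFILES if p.peak_memory_mb <= memory_budget_mb]
    return max(fitting, key=lambda p: p.initial_pool_size, default=None)


print(choose_profile(2048))
print(choose_profile(4096))
print(choose_profile(512))
```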
---
## Next Steps
### **Detailed Performance Analysis**
- **[t3.medium Performance](./t3.medium)** - Deep dive into cost-effective performance
- **[t3.xlarge Performance](./t3.xl)** - High-performance configuration analysis
### **Run Your Own Tests**
- **[Run Your Own Benchmarks](./run-your-own-benchmarks)** - Step-by-step guide to benchmark Bifrost in your environment

Ready to dive deeper? Choose your instance type above or learn how to run your own performance tests.