Quick Start
Distribute requests across multiple providers using weighted routing.
Configuration
| Parameter | Type | Required | Description |
|---|---|---|---|
| load_balancer | Object | Yes | Top-level load balancer configuration |
| load_balancer.type | string | Yes | Strategy type (`weight_based`) |
| load_balancer.models | Array | Yes | List of models with weights |
| models[].model | string | Yes | Model identifier |
| models[].weight | number | No | Relative weight (0.001 to 1.0, default 0.5) |
- Weights are normalized: `[0.4, 0.8]` → `[33%, 67%]`
- Higher weight = more traffic
- Minimum weight: `0.001`
- Default weight: `0.5`
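The normalization rule above can be sketched in a few lines of Python: each model's traffic share is its weight divided by the sum of all weights, so the absolute values matter less than the ratios.

```python
# Weight normalization: a model's share of traffic is its weight
# divided by the sum of all weights. Values mirror the example above.
weights = [0.4, 0.8]
total = sum(weights)
shares = [w / total for w in weights]

print([round(s * 100) for s in shares])  # [33, 67]
```

This is why `[0.4, 0.8]` and `[1, 2]` produce the same split: only relative weights determine the distribution.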
Common Patterns
Use Cases
| Scenario | Weight Strategy | Example |
|---|---|---|
| Cost optimization | Heavy on cheaper models | 80% GPT-3.5, 20% GPT-4 |
| Performance testing | Small traffic to new model | 95% current, 5% experimental |
| Provider redundancy | Split across providers | 60% OpenAI, 40% Anthropic |
| Capacity management | Distribute during peaks | Even split across models |
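Weight-based routing is, per request, a weighted random draw. The simulation below illustrates the "Cost optimization" row with an 80/20 split; the model names are illustrative, not a statement about orq.ai's internal implementation.

```python
import random

# Hypothetical model identifiers; weights follow the "Cost optimization"
# row above: 80% to the cheaper model, 20% to the stronger one.
models = ["gpt-3.5-turbo", "gpt-4"]
weights = [0.8, 0.2]

def pick_model():
    # Weight-based routing amounts to weighted random sampling per request.
    return random.choices(models, weights=weights, k=1)[0]

# Simulate 10,000 requests and tally which model each one hit.
counts = {m: 0 for m in models}
for _ in range(10_000):
    counts[pick_model()] += 1
# Over many requests the tally approaches an 80/20 split, but any short
# window can deviate (see "Probabilistic routing" under Limitations).
```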
See also: Organization-level load balancing
To apply load balancing across your organization without changing request code, use Routing Rules to configure Fallback, Weighted, and Round Robin strategies at the workspace level.
Code examples
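A request payload using the parameters from the Configuration table might look like the sketch below. Only the `load_balancer` object follows the documented parameters; the surrounding fields and how the payload is sent are assumptions, so consult the full API reference for the exact request shape.

```python
# Sketch of a weight-based load balancer configuration. The "messages"
# field and the model identifiers are placeholders; the load_balancer
# object follows the Configuration table above.
payload = {
    "messages": [{"role": "user", "content": "Hello"}],
    "load_balancer": {
        "type": "weight_based",  # required strategy type
        "models": [
            {"model": "openai/gpt-4o", "weight": 0.6},       # ~60% of traffic
            {"model": "anthropic/claude-sonnet", "weight": 0.4},  # ~40%
        ],
    },
}

# Each weight must fall within the documented 0.001-1.0 range.
assert all(
    0.001 <= m["weight"] <= 1.0
    for m in payload["load_balancer"]["models"]
)
```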
Monitoring
Track these metrics for optimal load balancing:
- Traffic distribution: actual vs expected percentages
- Cost per model: Monitor spending across providers
- Response times: Compare latency by model
- Error rates: Track failures by provider
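One way to check traffic distribution is to compare the share of requests each model actually served against its normalized weight. The field names below are illustrative stand-ins for whatever log or metrics source records which model handled each request.

```python
from collections import Counter

# Expected shares are the normalized weights; request_log is a stand-in
# for your own request records.
expected = {"model-a": 0.6, "model-b": 0.4}
request_log = ["model-a", "model-a", "model-b", "model-a", "model-b"]

counts = Counter(request_log)
total = len(request_log)

# Drift per model: actual share minus expected share.
drift = {m: counts[m] / total - share for m, share in expected.items()}
# model-a served 3/5 = 60% against an expected 60%, so drift is ~0.
# Large sustained drift suggests misconfigured weights or low volume.
```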
Troubleshooting
Uneven distribution
- Check if weights are normalized correctly
- Verify sufficient request volume (at least 100 requests for accuracy)
- Monitor over longer time periods
Unexpected costs
- Track actual vs expected cost distribution
- Monitor for expensive model overuse
- Set up cost alerts per provider
Latency issues
- Check latency differences between models
- Monitor for provider-specific slowdowns
- Adjust weights based on performance data
Limitations
- Probabilistic routing: Short-term traffic may not match exact weights
- Minimum volume needed: Requires sufficient requests for statistical accuracy
- Response variations: Different models may return varying output quality
- Cost complexity: Managing billing across multiple providers
- Provider dependencies: Requires API access to all models