


Retries

Automatically retry failed requests with exponential backoff.

Quick Start

import OpenAI from "openai";

// Point the OpenAI SDK at the orq.ai router (endpoint from the curl example below)
const openai = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v3/router",
});

const response = await openai.chat.completions.create({
  model: "openai/gpt-4o-mini",
  messages: [{ role: "user", content: "Analyze customer feedback" }],
  retry: {
    count: 3,
    on_codes: [429, 500, 502, 503, 504],
  },
});

Configuration

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| count | number | Yes | Max retry attempts (1-5) |
| on_codes | number[] | No | HTTP status codes that trigger retries (default: [429]) |
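
Since on_codes defaults to [429], a minimal configuration can set only count:

retry: {
  count: 2, // retries only on 429, the default trigger
}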

Error Codes

| Code | Meaning | Retry? | Common Cause |
| --- | --- | --- | --- |
| 429 | Rate limit exceeded | Yes | Too many requests |
| 500 | Internal server error | Yes | Provider issue |
| 501 | Not implemented | Yes | Feature unavailable |
| 502 | Bad gateway | Yes | Network/gateway issue |
| 503 | Service unavailable | Yes | Provider maintenance |
| 504 | Gateway timeout | Yes | Provider overload |
| 400 | Bad request | No | Invalid parameters |
| 401 | Unauthorized | No | Invalid API key |
| 403 | Forbidden | No | Access denied |

Retry Strategies

// Conservative (production)
retry: {
  count: 2,
  on_codes: [429, 503]  // Only rate limits and service unavailable
}

// Balanced (recommended)
retry: {
  count: 3,
  on_codes: [429, 500, 502, 503, 504]  // All transient errors
}

// Aggressive (development)
retry: {
  count: 5,
  on_codes: [429, 500, 502, 503, 504]  // Max retries
}

Backoff Algorithm

Exponential backoff with jitter

  • Attempt 1: 1s (±25%)
  • Attempt 2: 2s (±25%)
  • Attempt 3: 4s (±25%)
  • Attempt 4: 8s (±25%)
  • Attempt 5: 16s (±25%)
Maximum total delay: ~31 seconds for 5 retries
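
The delay doubles with each attempt, and ±25% random jitter spreads retries out so clients don't all hit the provider at once. A minimal sketch of this schedule (a description of the documented behavior, not the gateway's actual implementation):

// Delay before a given retry attempt (1-based): 1s, 2s, 4s, ... with ±25% jitter
function retryDelayMs(attempt) {
  const base = Math.pow(2, attempt - 1) * 1000;
  const jitter = (Math.random() - 0.5) * 0.5; // uniform in [-0.25, +0.25]
  return base * (1 + jitter);
}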

Code Examples

curl -X POST https://api.orq.ai/v3/router/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [
      {
        "role": "user",
        "content": "Analyze customer feedback and provide sentiment analysis"
      }
    ],
    "retry": {
      "count": 3,
      "on_codes": [429, 500, 502, 503, 504]
    }
  }'

Best Practices

Production recommendations

Follow these recommendations for a reliable production setup:
  • Use count: 2-3 for a balance of reliability and speed
  • Always include 429 (rate limits) in on_codes
  • Monitor retry rates to detect systemic issues
  • Implement a circuit breaker for persistent failures (see the sketch below)
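
A circuit breaker is client-side logic rather than a router parameter. A minimal sketch, with illustrative names and thresholds: after a run of consecutive failures it short-circuits requests for a cooldown period, then lets a single trial request through.

// Hypothetical client-side circuit breaker (not an orq.ai feature)
class CircuitBreaker {
  constructor(threshold = 5, cooldownMs = 30000) {
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
    this.failures = 0;
    this.openedAt = 0;
  }

  async run(requestFn) {
    if (this.failures >= this.threshold) {
      if (Date.now() - this.openedAt < this.cooldownMs) {
        throw new Error("Circuit open: skipping request");
      }
      // Half-open: allow one trial request; a failure re-opens the circuit
      this.failures = this.threshold - 1;
    }
    try {
      const result = await requestFn();
      this.failures = 0; // success closes the circuit
      return result;
    } catch (error) {
      this.failures++;
      if (this.failures >= this.threshold) this.openedAt = Date.now();
      throw error;
    }
  }
}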

Error handling

try {
  const response = await openai.chat.completions.create({...});
} catch (error) {
  if (error.status === 400) {
    // Don't retry client errors - fix the request
    console.error('Bad request:', error.message);
  } else if (error.status >= 500) {
    // Server errors might need manual intervention
    console.error('Server error:', error.message);
  }
}

Troubleshooting

High retry rates
  • Check if you’re hitting rate limits frequently
  • Verify API keys have sufficient quotas
  • Monitor provider status pages for outages
Slow response times
  • Reduce retry count for latency-sensitive apps
  • Use shorter timeout values with retries
  • Consider fallbacks for faster alternatives
Still getting errors
  • Check if error codes are in on_codes list
  • Verify retry count isn’t exhausted
  • Review provider-specific error documentation

Monitoring

Track these retry metrics:
const retryMetrics = {
  totalRequests: 0,
  retriedRequests: 0,
  retriesByAttempt: { 1: 0, 2: 0, 3: 0 }, // Retry attempt distribution
  retriesByCode: { 429: 0, 500: 0 }, // By error code
  avgRetryLatency: 0, // Added latency from retries
  finalFailures: 0, // Requests that failed after all retries
};
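
A hypothetical helper for feeding these counters; the retries, lastErrorCode, and retryLatencyMs arguments are assumed to come from your own request bookkeeping:

// Record one finished request into the metrics object above (hypothetical)
function recordRequest(retries, lastErrorCode, retryLatencyMs = 0) {
  retryMetrics.totalRequests++;
  if (retries > 0) {
    retryMetrics.retriedRequests++;
    retryMetrics.retriesByAttempt[retries] =
      (retryMetrics.retriesByAttempt[retries] ?? 0) + 1;
    // Running average of the latency added by retries
    retryMetrics.avgRetryLatency +=
      (retryLatencyMs - retryMetrics.avgRetryLatency) /
      retryMetrics.retriedRequests;
  }
  if (lastErrorCode != null) {
    retryMetrics.retriesByCode[lastErrorCode] =
      (retryMetrics.retriesByCode[lastErrorCode] ?? 0) + 1;
  }
}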

Limitations

  • Increased latency: Retries add delay (up to 31s for 5 attempts)
  • Cost implications: Failed requests may still incur charges
  • Rate limit consumption: Each retry counts against quotas
  • Limited retries: Maximum 5 attempts to prevent excessive delays
  • Non-retryable errors: 4xx client errors are not retried

Advanced Usage

Environment-specific configs:
const retryConfig = {
  development: { count: 1, on_codes: [429] }, // Fast feedback
  staging: { count: 2, on_codes: [429, 503] }, // Light retries
  production: { count: 3, on_codes: [429, 500, 502, 503, 504] }, // Full protection
};
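
You might then select a config at runtime; using NODE_ENV as the key here is an assumption about your setup:

// Pick the retry config for the current environment (hypothetical)
const retry = retryConfig[process.env.NODE_ENV ?? "development"];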
With other features:
{
  "retry": { "count": 3, "on_codes": [429, 503] },
  "timeout": { "call_timeout": 10000 },
  "fallbacks": [{ "model": "backup-model" }],
  "cache": { "type": "exact_match", "ttl": 300 }
}
Custom retry logic (client-side):
const customRetry = async (requestFn, maxAttempts = 3) => {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await requestFn();
    } catch (error) {
      // Retry rate limits (429) and server errors (5xx); give up on the
      // final attempt or on other client errors
      const retryable = error.status === 429 || error.status >= 500;
      if (attempt === maxAttempts || !retryable) {
        throw error;
      }
      // Exponential backoff: 1s, 2s, 4s, ... (matching the schedule above)
      await new Promise((resolve) =>
        setTimeout(resolve, Math.pow(2, attempt - 1) * 1000),
      );
    }
  }
};
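
For example, wrapping a router call in the helper:

// Usage: retry a chat completion with the client-side helper above
const result = await customRetry(() =>
  openai.chat.completions.create({
    model: "openai/gpt-4o-mini",
    messages: [{ role: "user", content: "Analyze customer feedback" }],
  }),
);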

Fallbacks

Automatically switch to a different model when the primary fails.

Quick Start

const response = await openai.chat.completions.create({
  model: "openai/gpt-4o-mini",
  messages: [{ role: "user", content: "Generate a product description" }],
  fallbacks: [{ model: "openai/gpt-4o" }, { model: "azure/gpt-4o" }],
});

Configuration

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| fallbacks | Array | Yes | List of fallback models in order of preference |
| model | string | Yes | Model identifier for each fallback |

Trigger Conditions

Fallbacks activate on these errors:
| Error Code | Description | Triggers Fallback |
| --- | --- | --- |
| 429 | Rate limit exceeded | Yes |
| 500 | Internal server error | Yes |
| 501 | Not implemented | Yes |
| 502 | Bad gateway | Yes |
| 503 | Service unavailable | Yes |
| 504 | Gateway timeout | Yes |
| 400 | Bad request | No |
| 401 | Unauthorized | No |
| 403 | Forbidden | No |

Best Practices

Use a maximum of 3 fallback models. Order them by preference or cost, and choose models with similar capabilities.
// Cost-optimized: cheap then expensive
fallbacks: [{ model: "openai/gpt-3.5-turbo" }, { model: "openai/gpt-4o" }];

// Reliability-optimized: different providers
fallbacks: [
  { model: "openai/gpt-4o" },
  { model: "anthropic/claude-sonnet-4-0" },
  { model: "azure/gpt-4o" },
];

Code Examples

curl -X POST https://api.orq.ai/v3/router/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [{ "role": "user", "content": "Generate a product description" }],
    "fallbacks": [
      { "model": "openai/gpt-4o" },
      { "model": "azure/gpt-4o" }
    ]
  }'

Limitations

  • Response consistency: Different models may return varying output styles
  • Parameter support: Not all providers support identical parameters
  • Cost implications: Failed requests may still incur charges from the primary provider
  • Latency impact: Sequential attempts add processing time
  • Provider dependencies: Requires API keys for all fallback providers