


Retries

Automatically retry failed requests with exponential backoff.

Quick Start

import OpenAI from "openai";

// Point the OpenAI SDK at the orq.ai router (endpoint from the curl example below)
const openai = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v3/router",
});

const response = await openai.chat.completions.create({
  model: "openai/gpt-4o-mini",
  messages: [{ role: "user", content: "Analyze customer feedback" }],
  retry: {
    count: 3,
    on_codes: [429, 500, 502, 503, 504],
  },
});

Configuration

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| count | number | Yes | Max retry attempts (1-5) |
| on_codes | number[] | No | HTTP status codes that trigger retries (default: [429]) |
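
Since on_codes defaults to [429], a minimal configuration can set only count:

retry: {
  count: 2, // retries only on 429, the default trigger
}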

Error Codes

| Code | Meaning | Retry? | Common Cause |
| --- | --- | --- | --- |
| 429 | Rate limit exceeded | Yes | Too many requests |
| 500 | Internal server error | Yes | Provider issue |
| 501 | Not implemented | Yes | Feature unavailable |
| 502 | Bad gateway | Yes | Network/gateway issue |
| 503 | Service unavailable | Yes | Provider maintenance |
| 504 | Gateway timeout | Yes | Provider overload |
| 400 | Bad request | No | Invalid parameters |
| 401 | Unauthorized | No | Invalid API key |
| 403 | Forbidden | No | Access denied |

Retry Strategies

// Conservative (production)
retry: {
  count: 2,
  on_codes: [429, 503]  // Only rate limits and service unavailable
}

// Balanced (recommended)
retry: {
  count: 3,
  on_codes: [429, 500, 502, 503, 504]  // All transient errors
}

// Aggressive (development)
retry: {
  count: 5,
  on_codes: [429, 500, 502, 503, 504]  // Max retries
}

Backoff Algorithm

Exponential backoff with jitter

  • Attempt 1: 1s (±25%)
  • Attempt 2: 2s (±25%)
  • Attempt 3: 4s (±25%)
  • Attempt 4: 8s (±25%)
  • Attempt 5: 16s (±25%)
Maximum total delay: ~31 seconds for 5 retries
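
The delay doubles with each attempt, and ±25% random jitter spreads retries out so clients don't all hit the provider at once. A minimal sketch of this schedule (a description of the documented behavior, not the gateway's actual implementation):

// Delay before a given retry attempt (1-based): 1s, 2s, 4s, ... with ±25% jitter
function retryDelayMs(attempt) {
  const base = Math.pow(2, attempt - 1) * 1000;
  const jitter = (Math.random() - 0.5) * 0.5; // uniform in [-0.25, +0.25]
  return base * (1 + jitter);
}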

Code Examples

curl -X POST https://api.orq.ai/v3/router/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [
      {
        "role": "user",
        "content": "Analyze customer feedback and provide sentiment analysis"
      }
    ],
    "retry": {
      "count": 3,
      "on_codes": [429, 500, 502, 503, 504]
    }
  }'

Best Practices

Production recommendations

Follow these recommendations for a reliable production setup:
  • Use count: 2-3 for a balance of reliability and speed
  • Always include 429 (rate limits) in on_codes
  • Monitor retry rates to detect systemic issues
  • Implement a circuit breaker for persistent failures (see the sketch below)
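
A circuit breaker is client-side logic rather than a router parameter. A minimal sketch, with illustrative names and thresholds: after a run of consecutive failures it short-circuits requests for a cooldown period, then lets a single trial request through.

// Hypothetical client-side circuit breaker (not an orq.ai feature)
class CircuitBreaker {
  constructor(threshold = 5, cooldownMs = 30000) {
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
    this.failures = 0;
    this.openedAt = 0;
  }

  async run(requestFn) {
    if (this.failures >= this.threshold) {
      if (Date.now() - this.openedAt < this.cooldownMs) {
        throw new Error("Circuit open: skipping request");
      }
      // Half-open: allow one trial request; a failure re-opens the circuit
      this.failures = this.threshold - 1;
    }
    try {
      const result = await requestFn();
      this.failures = 0; // success closes the circuit
      return result;
    } catch (error) {
      this.failures++;
      if (this.failures >= this.threshold) this.openedAt = Date.now();
      throw error;
    }
  }
}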

Error handling

try {
  const response = await openai.chat.completions.create({...});
} catch (error) {
  if (error.status === 400) {
    // Don't retry client errors - fix the request
    console.error('Bad request:', error.message);
  } else if (error.status >= 500) {
    // Server errors might need manual intervention
    console.error('Server error:', error.message);
  }
}

Troubleshooting

High retry rates
  • Check if you’re hitting rate limits frequently
  • Verify API keys have sufficient quotas
  • Monitor provider status pages for outages
Slow response times
  • Reduce retry count for latency-sensitive apps
  • Use shorter timeout values with retries
  • Consider fallbacks for faster alternatives
Still getting errors
  • Check if error codes are in on_codes list
  • Verify retry count isn’t exhausted
  • Review provider-specific error documentation

Monitoring

Track these retry metrics:
const retryMetrics = {
  totalRequests: 0,
  retriedRequests: 0,
  retriesByAttempt: { 1: 0, 2: 0, 3: 0 }, // Retry attempt distribution
  retriesByCode: { 429: 0, 500: 0 }, // By error code
  avgRetryLatency: 0, // Added latency from retries
  finalFailures: 0, // Requests that failed after all retries
};
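
A hypothetical helper for feeding these counters; the retries, lastErrorCode, and retryLatencyMs arguments are assumed to come from your own request bookkeeping:

// Record one finished request into the metrics object above (hypothetical)
function recordRequest(retries, lastErrorCode, retryLatencyMs = 0) {
  retryMetrics.totalRequests++;
  if (retries > 0) {
    retryMetrics.retriedRequests++;
    retryMetrics.retriesByAttempt[retries] =
      (retryMetrics.retriesByAttempt[retries] ?? 0) + 1;
    // Running average of the latency added by retries
    retryMetrics.avgRetryLatency +=
      (retryLatencyMs - retryMetrics.avgRetryLatency) /
      retryMetrics.retriedRequests;
  }
  if (lastErrorCode != null) {
    retryMetrics.retriesByCode[lastErrorCode] =
      (retryMetrics.retriesByCode[lastErrorCode] ?? 0) + 1;
  }
}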

Limitations

  • Increased latency: Retries add delay (up to 31s for 5 attempts)
  • Cost implications: Failed requests may still incur charges
  • Rate limit consumption: Each retry counts against quotas
  • Limited retries: Maximum 5 attempts to prevent excessive delays
  • Non-retryable errors: 4xx client errors are not retried

Advanced Usage

Environment-specific configs:
const retryConfig = {
  development: { count: 1, on_codes: [429] }, // Fast feedback
  staging: { count: 2, on_codes: [429, 503] }, // Light retries
  production: { count: 3, on_codes: [429, 500, 502, 503, 504] }, // Full protection
};
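
You might then select a config at runtime; using NODE_ENV as the key here is an assumption about your setup:

// Pick the retry config for the current environment (hypothetical)
const retry = retryConfig[process.env.NODE_ENV ?? "development"];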
With other features:
{
  "retry": { "count": 3, "on_codes": [429, 503] },
  "timeout": { "call_timeout": 10000 },
  "fallbacks": [{ "model": "backup-model" }],
  "cache": { "type": "exact_match", "ttl": 300 }
}
Custom retry logic (client-side):
const customRetry = async (requestFn, maxAttempts = 3) => {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await requestFn();
    } catch (error) {
      // Retry rate limits (429) and server errors (5xx); give up on the
      // final attempt or on other client errors
      const retryable = error.status === 429 || error.status >= 500;
      if (attempt === maxAttempts || !retryable) {
        throw error;
      }
      // Exponential backoff: 1s, 2s, 4s, ... (matching the schedule above)
      await new Promise((resolve) =>
        setTimeout(resolve, Math.pow(2, attempt - 1) * 1000),
      );
    }
  }
};
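
For example, wrapping a router call in the helper:

// Usage: retry a chat completion with the client-side helper above
const result = await customRetry(() =>
  openai.chat.completions.create({
    model: "openai/gpt-4o-mini",
    messages: [{ role: "user", content: "Analyze customer feedback" }],
  }),
);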

Fallbacks

Automatically switch to a different model when the primary fails.

Quick Start

const response = await openai.chat.completions.create({
  model: "openai/gpt-4o-mini",
  messages: [{ role: "user", content: "Generate a product description" }],
  fallbacks: [{ model: "openai/gpt-4o" }, { model: "azure/gpt-4o" }],
});

Configuration

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| fallbacks | Array | Yes | List of fallback models in order of preference |
| model | string | Yes | Model identifier for each fallback |

Trigger Conditions

Fallbacks activate on these errors:
| Error Code | Description | Triggers Fallback |
| --- | --- | --- |
| 429 | Rate limit exceeded | Yes |
| 500 | Internal server error | Yes |
| 501 | Not implemented | Yes |
| 502 | Bad gateway | Yes |
| 503 | Service unavailable | Yes |
| 504 | Gateway timeout | Yes |
| 400 | Bad request | No |
| 401 | Unauthorized | No |
| 403 | Forbidden | No |

Best Practices

Use a maximum of 3 fallback models. Order them by preference or cost, and choose models with similar capabilities.
// Cost-optimized: cheap then expensive
fallbacks: [{ model: "openai/gpt-3.5-turbo" }, { model: "openai/gpt-4o" }];

// Reliability-optimized: different providers
fallbacks: [
  { model: "openai/gpt-4o" },
  { model: "anthropic/claude-sonnet-4-0" },
  { model: "azure/gpt-4o" },
];

Code Examples

curl -X POST https://api.orq.ai/v3/router/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [{ "role": "user", "content": "Generate a product description" }],
    "fallbacks": [
      { "model": "openai/gpt-4o" },
      { "model": "azure/gpt-4o" }
    ]
  }'

Limitations

  • Response consistency: Different models may return varying output styles
  • Parameter support: Not all providers support identical parameters
  • Cost implications: Failed requests may still incur charges from the primary provider
  • Latency impact: Sequential attempts add processing time
  • Provider dependencies: Requires API keys for all fallback providers