
Multi-Provider LLM Orchestration: Architecture & Implementation

Relying on a single LLM provider is risky. Claude might be your best default, OpenAI stronger on code, Gemini cheaper at scale. But no single provider is always available, always cheap, or always best for every task.

Production systems need multi-provider orchestration: request routing, automatic failover, cost optimization, and rate limiting across providers. Here's how to build it.

Why Multi-Provider Architecture?

Three compelling reasons:

  1. Reliability: every provider has outages; redundancy keeps you serving.
  2. Cost: providers price tokens very differently, and smart routing exploits that.
  3. Fit: different models are best at different tasks, so routing by task improves quality.

"Single-provider LLM systems are single points of failure. Production requires redundancy."

The Architecture: Four Layers

┌─────────────────────────────────┐
│   User Request                  │
└────────────────┬────────────────┘
                 │
┌────────────────▼────────────────┐
│   Router                        │ ← Decides which provider
├─────────────────────────────────┤
│   - Analyze request features    │
│   - Check provider health       │
│   - Apply rate limits           │
└────────────────┬────────────────┘
                 │
┌────────────────▼────────────────┐
│   Provider Abstraction          │ ← Unified interface
├─────────────────────────────────┤
│   - Claude client               │
│   - OpenAI client               │
│   - Gemini client               │
└────────────────┬────────────────┘
                 │
┌────────────────▼────────────────┐
│   Provider APIs                 │
└─────────────────────────────────┘

The sections below walk through four functional layers: routing, failover, rate limiting, and cost optimization. All four sit between the user request and the raw provider APIs.
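
Everything above the raw APIs talks to providers through the unified interface in the Provider Abstraction box. Here is a minimal sketch of what that interface could look like; the ProviderAdapter class and the callAnthropicApi-style sender functions are illustrative stand-ins, not any particular SDK:

// Sketch of the unified provider interface. The sender functions
// (callAnthropicApi, etc.) are hypothetical wrappers around vendor SDKs.
class ProviderAdapter {
  constructor(name, sendFn) {
    this.name = name
    this.healthy = true   // flipped by periodic health checks
    this.sendFn = sendFn  // vendor-specific request function
  }

  async call(request) {
    const raw = await this.sendFn(request)
    // Normalize every vendor's response to one common shape
    return { text: raw.text, tokens: raw.tokens, provider: this.name }
  }
}

const providers = {
  claude: new ProviderAdapter("claude", callAnthropicApi),
  openai: new ProviderAdapter("openai", callOpenAiApi),
  gemini: new ProviderAdapter("gemini", callGeminiApi),
}

The rest of the snippets in this post assume a providers object shaped like this.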

Layer 1: The Router (Routing Logic)

The router decides which provider handles each request, based on provider health, task type, and input size:

function selectProvider(request) {
  // 1. Health: fall back immediately if the primary is degraded
  if (!providers.claude.healthy) {
    return providers.openai
  }

  // 2. Task affinity: route workloads to the model that handles them best
  if (request.task === "code_generation") {
    return providers.openai
  }

  // 3. Cost: send very large contexts to the cheapest capable model
  if (request.inputTokens > 100000) {
    return providers.gemini  // cheaper per input token
  }

  // 4. Default: the primary provider
  return providers.claude
}

Layer 2: Failover (Resilience)

When a provider fails, automatically retry with another:

async function callWithFailover(request) {
  const chain = [
    providers.claude,    // primary
    providers.openai,    // secondary
    providers.gemini,    // tertiary
  ]

  for (const provider of chain) {
    try {
      const result = await provider.call(request)
      recordSuccess(provider)
      return result
    } catch (error) {
      recordFailure(provider)
      if (!shouldRetry(error)) {
        throw error  // unrecoverable (e.g. malformed request)
      }
      continue  // recoverable (timeout, 429, 5xx): try the next provider
    }
  }

  throw new Error("All providers exhausted")
}
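
Failover handles hard failures; transient errors (timeouts, 429s) are often cheaper to absorb by retrying the same provider with exponential backoff before moving down the chain. A minimal sketch, reusing the shouldRetry helper above; the delay constants are illustrative:

async function callWithBackoff(provider, request, maxAttempts = 3) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await provider.call(request)
    } catch (error) {
      if (!shouldRetry(error) || attempt === maxAttempts - 1) {
        throw error  // unrecoverable, or out of attempts: let failover take over
      }
      // 1s, 2s, 4s, ... plus jitter to avoid synchronized retries
      const delayMs = 1000 * 2 ** attempt + Math.random() * 250
      await new Promise((resolve) => setTimeout(resolve, delayMs))
    }
  }
}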

Layer 3: Rate Limiting & Quotas

Each provider has limits. Track them:

class RateLimiter {
  constructor(limits) {
    // limits: per-provider { max, used, lastReset } records
    this.limits = limits
  }

  async acquire(provider, tokens) {
    const limit = this.limits[provider]

    // Reset the daily window first, so stale usage can't block new requests
    if (Date.now() - limit.lastReset > 24 * 60 * 60 * 1000) {
      limit.used = 0
      limit.lastReset = Date.now()
    }

    // Check remaining quota before committing the tokens
    if (limit.used + tokens > limit.max) {
      throw new Error(`Rate limit exceeded for ${provider}`)
    }

    limit.used += tokens
  }
}
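
Wiring the limiter in is then one line before each provider call. A sketch; the per-provider daily token budgets below are illustrative, not real plan numbers:

const limiter = new RateLimiter({
  claude: { max: 2_000_000, used: 0, lastReset: Date.now() },  // illustrative budgets
  openai: { max: 1_000_000, used: 0, lastReset: Date.now() },
  gemini: { max: 4_000_000, used: 0, lastReset: Date.now() },
})

async function callWithQuota(provider, request) {
  // Reserve quota before spending it; throws if the daily budget is gone
  await limiter.acquire(provider.name, request.inputTokens)
  return provider.call(request)
}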

Layer 4: Cost Optimization

Track spending and optimize:

const CACHE_TTL = 60 * 60  // cache responses for an hour; tune per use case

async function callOptimized(request) {
  // 1. Check cache first: identical requests cost nothing
  const cached = await cache.get(request.hash)
  if (cached) {
    return cached
  }

  // 2. Route intelligently
  const provider = selectProvider(request)
  const result = await provider.call(request)

  // 3. Cache result for future identical requests
  await cache.set(request.hash, result, CACHE_TTL)

  // 4. Log cost per provider for later analysis
  recordCost(provider, result.tokens)

  return result
}
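
The request.hash used above assumes a deterministic cache key: the same task, prompt, and parameters must always hash to the same value, or the cache never hits. One way to build it with Node's built-in crypto module; prompt and params are assumed request fields:

import { createHash } from "node:crypto"

function hashRequest(request) {
  // Canonicalize the fields that define the request's identity
  const canonical = JSON.stringify({
    task: request.task,
    prompt: request.prompt,
    params: request.params,
  })
  return createHash("sha256").update(canonical).digest("hex")
}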

Real Example: Incident Resolution at alt.bank

Our incident resolution agent needed exactly this kind of multi-provider setup.

Result: a 70% cost reduction while improving latency through smart routing.

Observability: What to Log

Log everything for optimization:

{
  "request_id": "uuid",
  "selected_provider": "claude",
  "failover_attempts": 0,
  "input_tokens": 2500,
  "output_tokens": 800,
  "cost_usd": 0.012,
  "latency_ms": 450,
  "cache_hit": false,
  "timestamp": "2026-05-09T10:30:00Z"
}

Analyze this data weekly. Find patterns like "Gemini always slower on code tasks" and adjust routing.
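
That analysis can start as a simple aggregation: group the week's log records by provider and task (assuming each record also carries the task field the router saw) and compare median latencies:

function medianLatencyByProviderTask(logs) {
  const groups = new Map()
  for (const log of logs) {
    const key = `${log.selected_provider}:${log.task}`
    if (!groups.has(key)) groups.set(key, [])
    groups.get(key).push(log.latency_ms)
  }

  const medians = {}
  for (const [key, values] of groups) {
    values.sort((a, b) => a - b)
    medians[key] = values[Math.floor(values.length / 2)]  // p50
  }
  return medians
}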

Key Takeaways

  1. Single provider = single point of failure. Use multi-provider for production.
  2. Route based on task type, input size, provider health, and cost.
  3. Implement automatic failover with exponential backoff.
  4. Cache responses to reduce redundant calls.
  5. Use prompt caching (Claude, GPT-4o) for expensive system prompts; a sketch follows this list.
  6. Log everything; optimize based on data.
  7. Monitor cost weekly—LLM bills compound fast without controls.
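
On takeaway 5: with Anthropic's API, prompt caching means marking the stable prefix of your prompt with a cache_control block, so repeated calls reuse it instead of re-processing it. A minimal sketch; the model name and the LONG_SYSTEM_PROMPT / userMessage variables are illustrative:

import Anthropic from "@anthropic-ai/sdk"

const anthropic = new Anthropic()  // reads ANTHROPIC_API_KEY from the environment

const response = await anthropic.messages.create({
  model: "claude-sonnet-4-5",  // illustrative model name
  max_tokens: 1024,
  system: [
    {
      type: "text",
      text: LONG_SYSTEM_PROMPT,             // expensive, stable prefix
      cache_control: { type: "ephemeral" }, // cache everything up to here
    },
  ],
  messages: [{ role: "user", content: userMessage }],
})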

Multi-provider orchestration is complex but necessary for production LLM systems. Start simple, add layers as you scale.

Building LLM infrastructure?

I specialize in production AI systems, multi-provider orchestration, and cost optimization.
