Virtual Models

Multi-Provider Routing with Automatic Failover

Virtual Models combine multiple AI providers behind a single model identifier, giving you automatic failover, intelligent load balancing, and provider redundancy without changing your code.

What Are Virtual Models?

A Virtual Model is an abstraction layer that routes requests to multiple underlying provider models based on configurable strategies. Instead of calling openai/gpt-4o directly, you call a virtual model such as zaguanai/gpt-oss-120b, which routes each request to the best available provider.

Traditional Approach

model: "openai/gpt-4o"

Single provider, single point of failure

Virtual Model

model: "zaguanai/gpt-oss-120b"

Routes to OpenAI, Novita, or DeepSeek automatically

Key Benefits

  • Automatic Failover: If one provider is down or rate-limited, requests automatically route to the next available provider.
  • Load Balancing: Distribute traffic across multiple providers to avoid hitting rate limits and maximize throughput.
  • Cost Optimization: Route to lower-cost providers while maintaining quality and performance standards.
  • Zero Code Changes: Switch between single provider and multi-provider routing by changing only the model identifier, as shown in the sketch below.
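
For example, switching an existing call from a single provider to a virtual model is a one-line change. The sketch below assumes an OpenAI-compatible client named openai that is already pointed at Zaguán (as in the usage example later on this page); everything except the model string stays the same.

// Before: single provider, single point of failure
const single = await openai.chat.completions.create({
  model: "openai/gpt-4o",
  messages: [{ role: "user", content: "Summarize this ticket." }]
});

// After: the same call, now with multi-provider routing and automatic failover
const routed = await openai.chat.completions.create({
  model: "zaguanai/gpt-oss-120b",
  messages: [{ role: "user", content: "Summarize this ticket." }]
});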

Routing Strategies

  • Health-Aware Round Robin: Distributes requests evenly across healthy providers, skipping any that fail health checks.
  • Weighted Random: Routes based on configured weights, allowing you to prefer certain providers while maintaining redundancy.
  • Priority Failover: Always tries the highest-priority provider first, falling back to lower-priority options only when needed.
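
To make these strategies concrete, here is a minimal, purely illustrative TypeScript sketch of how each one could pick a provider. Zaguán performs this selection server-side; the Provider shape and the function names below are assumptions used for explanation only, not part of the Zaguán API.

// Illustrative only: how the three routing strategies differ when choosing a provider.
interface Provider {
  name: string;
  healthy: boolean;  // result of the latest health check
  weight: number;    // relative traffic share (used by Weighted Random)
  priority: number;  // lower value = preferred (used by Priority Failover)
}

// Health-Aware Round Robin: rotate evenly through healthy providers, skipping unhealthy ones.
function roundRobin(providers: Provider[], counter: number): Provider {
  const healthy = providers.filter((p) => p.healthy);
  return healthy[counter % healthy.length];
}

// Weighted Random: pick a healthy provider with probability proportional to its weight.
function weightedRandom(providers: Provider[]): Provider {
  const healthy = providers.filter((p) => p.healthy);
  const total = healthy.reduce((sum, p) => sum + p.weight, 0);
  let r = Math.random() * total;
  for (const p of healthy) {
    if ((r -= p.weight) <= 0) return p;
  }
  return healthy[healthy.length - 1];
}

// Priority Failover: always take the highest-priority healthy provider.
function priorityFailover(providers: Provider[]): Provider {
  return [...providers]
    .filter((p) => p.healthy)
    .sort((a, b) => a.priority - b.priority)[0];
}

In practice you never write anything like this yourself: you pick a virtual model, and the strategy attached to it (listed for each model below) governs provider selection automatically.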

Available Virtual Models

Zaguán currently offers 29 Virtual Models, each optimized for different use cases and price points. Each entry below lists the model name, its identifier, a short description, and its routing strategy:

Claude Haiku 4.5 Latest

zaguanai/claude-haiku-4.5-latest

Fast and efficient Claude model for quick responses and cost-effective operations.

• Health-Aware Round Robin

Claude Sonnet 4.5 Latest

zaguanai/claude-sonnet-4.5-latest

Balanced Claude model offering strong performance for general-purpose tasks.

• Health-Aware Round Robin

DeepSeek R1 0528

zaguanai/deepseek-r1-0528

Advanced DeepSeek reasoning model with enhanced capabilities.

• Health-Aware Round Robin

DeepSeek v3

zaguanai/deepseek-v3

Latest DeepSeek v3 model with improved performance and accuracy.

• Health-Aware Round Robin

DeepSeek v3.1 Terminus

zaguanai/deepseek-v3.1-terminus

Specialized DeepSeek v3.1 variant optimized for complex reasoning tasks.

• Health-Aware Round Robin

GLM 4.5

zaguanai/glm-4.5

Advanced GLM model with strong reasoning and language understanding capabilities.

• Health-Aware Round Robin

GLM 4.6

zaguanai/glm-4.6

Latest GLM model with enhanced capabilities.

• Health-Aware Round Robin

Google Gemini Flash Latest

zaguanai/gemini-flash-latest

Fast and efficient Gemini model optimized for quick responses.

• Health-Aware Round Robin

Google Gemini Flash Lite Latest

zaguanai/gemini-flash-lite-latest

Lightweight Gemini Flash variant for efficient processing.

• Health-Aware Round Robin

Google Gemini Pro Latest

zaguanai/gemini-pro-latest

Professional-grade Gemini model with advanced capabilities.

• Health-Aware Round Robin

GPT OSS 120B

zaguanai/gpt-oss-120b

Cost-effective 120B parameter model for general-purpose tasks.

• Health-Aware Round Robin

GPT OSS 20B

zaguanai/gpt-oss-20b

Lightweight 20B parameter model for simple tasks.

• Health-Aware Round Robin

Grok 4 Latest

zaguanai/grok-4-latest

Latest Grok model with advanced reasoning and real-time capabilities.

• Health-Aware Round Robin

Kimi K2 Instruct 0905

zaguanai/kimi-k2-instruct-0905

Instruction-tuned Kimi model with balanced performance.

• Health-Aware Round Robin

Kimi K2 Thinking

zaguanai/kimi-k2-thinking

Reasoning-focused Kimi model with extended thinking capabilities.

• Health-Aware Round Robin

Llama 4 Maverick 17B-128E Instruct

zaguanai/llama-4-maverick-17b-128e-instruct

Experimental Llama 4 variant with enhanced capabilities.

• Health-Aware Round Robin

Llama 4 Scout 17B-16E Instruct

zaguanai/llama-4-scout-17b-16e-instruct

Compact Llama 4 Scout model optimized for efficiency.

• Health-Aware Round Robin

MiniMax M2

zaguanai/minimax-m2

MiniMax M2 model with advanced capabilities.

• Health-Aware Round Robin

GPT-5 Chat Latest

zaguanai/gpt-5-chat-latest

Latest GPT-5 model with state-of-the-art performance.

• Health-Aware Round Robin

GPT-5 Mini Latest

zaguanai/gpt-5-mini-latest

Compact GPT-5 variant for efficient processing.

• Health-Aware Round Robin

GPT-5 Nano Latest

zaguanai/gpt-5-nano-latest

Ultra-lightweight GPT-5 model for cost-effective operations.

• Health-Aware Round Robin

Qwen3 235B A22B Instruct

zaguanai/qwen3-235b-a22b-instruct

Large-scale Qwen3 model with 235B parameters.

• Health-Aware Round Robin

Qwen3 235B A22B Thinking

zaguanai/qwen3-235b-a22b-thinking

Reasoning-enhanced Qwen3 model with thinking capabilities.

• Health-Aware Round Robin

Qwen3 30B A3B

zaguanai/qwen3-30b-a3b

Mid-size Qwen3 model with 30B parameters.

• Health-Aware Round Robin

Qwen3 Coder 30B A3B Instruct

zaguanai/qwen3-coder-30b-a3b-instruct

Specialized coding model with 30B parameters.

• Health-Aware Round Robin

Qwen3 Coder 480B A35B Instruct

zaguanai/qwen3-coder-480b-a35b-instruct

Large-scale coding model with 480B parameters.

• Health-Aware Round Robin

Qwen3 Max

zaguanai/qwen3-max

Flagship Qwen3 model with maximum capabilities.

• Health-Aware Round Robin

Qwen3 Next 80B A3B Instruct

zaguanai/qwen3-next-80b-a3b-instruct

Next-generation Qwen model with 80B parameters.

• Health-Aware Round Robin

Qwen3 Next 80B A3B Thinking

zaguanai/qwen3-next-80b-a3b-thinking

Reasoning-enhanced Qwen model with thinking capabilities.

• Health-Aware Round Robin

How to Use Virtual Models

1. Choose Your Virtual Model

Select a virtual model based on your use case, budget, and performance requirements.

2. Use Like Any Other Model

Virtual models work exactly like regular models. Just use the virtual model identifier in your API calls:

import OpenAI from "openai";

// Configure the standard OpenAI SDK with your Zaguán endpoint and API key
// (the environment variable names here are just examples)
const openai = new OpenAI({ baseURL: process.env.ZAGUAN_BASE_URL, apiKey: process.env.ZAGUAN_API_KEY });

const response = await openai.chat.completions.create({
  model: "zaguanai/deepseek-r1-0528",
  messages: [{ role: "user", content: "Hello!" }]
});

3. Automatic Routing

Zaguán automatically routes your request to the best available provider based on health checks, load balancing, and the configured routing strategy. If one provider fails, the request is automatically retried with the next provider, transparently to your application.
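
Conceptually, that failover step behaves like the sketch below. This is an illustration of the behavior only, not Zaguán's implementation, and your application never needs code like it; you simply receive the final response.

// Conceptual sketch: try providers in the order chosen by the routing strategy
// and return the first successful response.
type Attempt = () => Promise<string>;

async function firstSuccessful(attempts: Attempt[]): Promise<string> {
  let lastError: unknown;
  for (const attempt of attempts) {
    try {
      return await attempt();  // the first provider that answers wins
    } catch (err) {
      lastError = err;         // provider down or rate-limited: move on to the next
    }
  }
  // Only reached if every provider behind the virtual model failed
  throw lastError;
}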