Multi-Provider Routing with Automatic Failover
Virtual Models combine multiple AI providers behind a single model identifier, giving you automatic failover, intelligent load balancing, and provider redundancy without changing your code.
What are Virtual Models?
A Virtual Model is an abstraction layer that routes requests to multiple underlying provider models based on configurable strategies. Instead of calling openai/gpt-4o directly, you call a virtual model like zaguanai/gpt-oss-120b which intelligently routes to the best available provider.
Traditional Approach
model: "openai/gpt-4o"
Single provider, single point of failure.
Virtual Model
model: "zaguanai/gpt-oss-120b"
Routes to OpenAI, Novita, or DeepSeek automatically.
Key Benefits
- Automatic Failover: If one provider is down or rate-limited, requests automatically route to the next available provider.
- Load Balancing: Distribute traffic across multiple providers to avoid hitting rate limits and maximize throughput.
- Cost Optimization: Route to lower-cost providers while maintaining quality and performance standards.
- Zero Code Changes: Switch between single provider and multi-provider routing by changing only the model identifier.
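The zero-code-change benefit comes down to the model identifier being the only thing that differs between a single-provider call and a virtual-model call. A minimal sketch (the `buildRequest` helper is hypothetical, purely for illustration):

```javascript
// Hypothetical helper: builds an OpenAI-compatible chat request body.
// The only field that changes between direct and virtual routing is `model`.
function buildRequest(model, userText) {
  return {
    model,
    messages: [{ role: "user", content: userText }],
  };
}

// Direct, single-provider request (single point of failure):
const direct = buildRequest("openai/gpt-4o", "Hello!");

// Multi-provider virtual model (automatic failover), same shape:
const virtual = buildRequest("zaguanai/gpt-oss-120b", "Hello!");
```

Everything else in the request, and in the response you get back, stays identical.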
Routing Strategies
- Health-Aware Round Robin: Distributes requests evenly across healthy providers, skipping any that fail health checks.
- Weighted Random: Routes based on configured weights, allowing you to prefer certain providers while maintaining redundancy.
- Priority Failover: Always tries the highest-priority provider first, falling back to lower-priority options only when needed.
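The three strategies above can be sketched as simple selection functions over a provider list. This is a client-side illustration only, not Zaguán's actual routing code; the provider names, weights, and priorities are made up:

```javascript
// Illustrative provider table. Lower `priority` number = tried first.
const providers = [
  { name: "openai",   weight: 3, priority: 1, healthy: true },
  { name: "novita",   weight: 2, priority: 2, healthy: true },
  { name: "deepseek", weight: 1, priority: 3, healthy: true },
];

// Health-aware round robin: cycle evenly through healthy providers only.
function roundRobin(list, counter) {
  const healthy = list.filter((p) => p.healthy);
  return healthy[counter % healthy.length];
}

// Weighted random: pick a provider proportionally to its weight.
// `rand` is injectable so the behavior can be shown deterministically.
function weightedRandom(list, rand = Math.random()) {
  const total = list.reduce((sum, p) => sum + p.weight, 0);
  let r = rand * total;
  for (const p of list) {
    if ((r -= p.weight) < 0) return p;
  }
  return list[list.length - 1];
}

// Priority failover: healthy provider with the best (lowest) priority wins.
function priorityFailover(list) {
  return list
    .filter((p) => p.healthy)
    .sort((a, b) => a.priority - b.priority)[0];
}
```

Note how priority failover concentrates traffic on one provider until it fails, while the other two strategies spread load continuously.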
Available Virtual Models
Zaguán currently offers 29 Virtual Models, each optimized for different use cases and price points:
Claude Haiku 4.5 Latest
zaguanai/claude-haiku-4.5-latest
Fast and efficient Claude model for quick responses and cost-effective operations.
Claude Sonnet 4.5 Latest
zaguanai/claude-sonnet-4.5-latest
Balanced Claude model offering strong performance for general-purpose tasks.
DeepSeek R1 0528
zaguanai/deepseek-r1-0528
Advanced DeepSeek reasoning model with enhanced capabilities.
DeepSeek v3
zaguanai/deepseek-v3
Latest DeepSeek v3 model with improved performance and accuracy.
DeepSeek v3.1 Terminus
zaguanai/deepseek-v3.1-terminus
Specialized DeepSeek v3.1 variant optimized for complex reasoning tasks.
GLM 4.5
zaguanai/glm-4.5
Advanced GLM model with strong reasoning and language understanding capabilities.
GLM 4.6
zaguanai/glm-4.6
Latest GLM model with enhanced capabilities.
Google Gemini Flash Latest
zaguanai/gemini-flash-latest
Fast and efficient Gemini model optimized for quick responses.
Google Gemini Flash Lite Latest
zaguanai/gemini-flash-lite-latest
Lightweight Gemini Flash variant for efficient processing.
Google Gemini Pro Latest
zaguanai/gemini-pro-latest
Professional-grade Gemini model with advanced capabilities.
GPT OSS 120B
zaguanai/gpt-oss-120b
Cost-effective 120B parameter model for general-purpose tasks.
GPT OSS 20B
zaguanai/gpt-oss-20b
Lightweight 20B parameter model for simple tasks.
Grok 4 Latest
zaguanai/grok-4-latest
Latest Grok model with advanced reasoning and real-time capabilities.
Kimi K2 Instruct 0905
zaguanai/kimi-k2-instruct-0905
Instruction-tuned Kimi model with balanced performance.
Kimi K2 Thinking
zaguanai/kimi-k2-thinking
Reasoning-focused Kimi model with extended thinking capabilities.
Llama 4 Maverick 17B-128E Instruct
zaguanai/llama-4-maverick-17b-128e-instruct
Experimental Llama 4 variant with enhanced capabilities.
Llama 4 Scout 17B-16E Instruct
zaguanai/llama-4-scout-17b-16e-instruct
Compact Llama 4 Scout model optimized for efficiency.
MiniMax M2
zaguanai/minimax-m2
MiniMax M2 model with advanced capabilities.
GPT-5 Chat Latest
zaguanai/gpt-5-chat-latest
Latest GPT-5 model with state-of-the-art performance.
GPT-5 Mini Latest
zaguanai/gpt-5-mini-latest
Compact GPT-5 variant for efficient processing.
GPT-5 Nano Latest
zaguanai/gpt-5-nano-latest
Ultra-lightweight GPT-5 model for cost-effective operations.
Qwen3 235B A22B Instruct
zaguanai/qwen3-235b-a22b-instruct
Large-scale Qwen3 model with 235B parameters.
Qwen3 235B A22B Thinking
zaguanai/qwen3-235b-a22b-thinking
Reasoning-enhanced Qwen3 model with thinking capabilities.
Qwen3 30B A3B
zaguanai/qwen3-30b-a3b
Mid-size Qwen3 model with 30B parameters.
Qwen3 Coder 30B A3B Instruct
zaguanai/qwen3-coder-30b-a3b-instruct
Specialized coding model with 30B parameters.
Qwen3 Coder 480B A35B Instruct
zaguanai/qwen3-coder-480b-a35b-instruct
Large-scale coding model with 480B parameters.
Qwen3 Max
zaguanai/qwen3-max
Flagship Qwen3 model with maximum capabilities.
Qwen3 Next 80B A3B Instruct
zaguanai/qwen3-next-80b-a3b-instruct
Next-generation Qwen model with 80B parameters.
Qwen3 Next 80B A3B Thinking
zaguanai/qwen3-next-80b-a3b-thinking
Reasoning-enhanced Qwen model with thinking capabilities.
How to Use Virtual Models
1. Choose Your Virtual Model
Select a virtual model based on your use case, budget, and performance requirements.
2. Use Like Any Other Model
Virtual models work exactly like any other model; simply use the identifier in your API calls:
const response = await openai.chat.completions.create({
model: "zaguanai/deepseek-r1-0528",
messages: [{ role: "user", content: "Hello!" }]
});
3. Automatic Routing
Zaguán automatically routes your request to the best available provider based on health checks, load balancing, and the configured routing strategy. If one provider fails, the request is automatically retried with the next provider, completely transparent to your application.
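Conceptually, the retry behavior resembles the loop below, which tries each provider call in order and falls back on failure. This is a simplified client-side sketch of the idea; in Zaguán the equivalent logic runs inside the gateway, so your application never sees the intermediate errors:

```javascript
// Simplified failover sketch: try each provider call in priority order,
// returning the first success and throwing only if every provider fails.
async function callWithFailover(providerCalls) {
  let lastError;
  for (const call of providerCalls) {
    try {
      return await call(); // first success wins
    } catch (err) {
      lastError = err;     // provider down or rate-limited: try the next
    }
  }
  throw lastError;         // every provider failed
}

// Demonstration with fake providers: the first fails, the second succeeds.
callWithFailover([
  async () => { throw new Error("429 rate limited"); },
  async () => "response from fallback provider",
]).then((result) => console.log(result));
```

From the caller's perspective there is exactly one request and one response; the fallback attempts are invisible.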