Multi-Provider Routing with Automatic Failover
Virtual Models combine multiple AI providers behind a single model identifier, giving you automatic failover, intelligent load balancing, and provider redundancy without changing your code.
What are Virtual Models?
A Virtual Model is an abstraction layer that routes requests to multiple underlying provider models based on configurable strategies. Instead of calling openai/gpt-4o directly, you call a virtual model like promptshield/gpt-oss-120b which intelligently routes to the best available provider.
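Conceptually, a virtual model is a named mapping from one public identifier to an ordered set of provider targets plus a routing strategy. A minimal sketch of what such a definition could look like (the shape, field names, and provider model IDs below are hypothetical, not Zaguán's actual configuration format):

```javascript
// Hypothetical shape of a virtual model definition (illustrative only).
const virtualModel = {
  id: "promptshield/gpt-oss-120b",          // the identifier your code calls
  strategy: "health-aware-round-robin",     // how requests are distributed
  providers: [
    { model: "openai/gpt-oss-120b", weight: 1 },   // provider IDs are illustrative
    { model: "novita/gpt-oss-120b", weight: 1 },
    { model: "deepseek/gpt-oss-120b", weight: 1 },
  ],
};
```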
Traditional Approach
model: "openai/gpt-4o"
Single provider, single point of failure
Virtual Model
model: "promptshield/gpt-oss-120b"
Routes to OpenAI, Novita, or DeepSeek automatically
Key Benefits
- Automatic Failover: If one provider is down or rate-limited, requests automatically route to the next available provider.
- Load Balancing: Distribute traffic across multiple providers to avoid hitting rate limits and maximize throughput.
- Cost Optimization: Route to lower-cost providers while maintaining quality and performance standards.
- Zero Code Changes: Switch between single provider and multi-provider routing by changing only the model identifier.
Routing Strategies
- Health-Aware Round Robin: Distributes requests evenly across healthy providers, skipping any that fail health checks.
- Weighted Random: Routes based on configured weights, allowing you to prefer certain providers while maintaining redundancy.
- Priority Failover: Always tries the highest-priority provider first, falling back to lower-priority options only when needed.
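As a rough illustration, the Weighted Random and Priority Failover strategies can be sketched as pure selection functions (the provider fields and names here are hypothetical, not Zaguán's internals):

```javascript
// Weighted Random (sketch): pick a provider proportionally to its weight.
// `rand` is injectable for testing; it defaults to Math.random().
function weightedRandom(providers, rand = Math.random()) {
  const total = providers.reduce((sum, p) => sum + p.weight, 0);
  let r = rand * total;
  for (const p of providers) {
    r -= p.weight;
    if (r < 0) return p.name;
  }
  return providers[providers.length - 1].name; // guard against rounding
}

// Priority Failover (sketch): take the healthy provider with the best priority.
function priorityFailover(providers) {
  const healthy = providers.filter((p) => p.healthy);
  healthy.sort((a, b) => a.priority - b.priority); // lower number = higher priority
  return healthy.length > 0 ? healthy[0].name : null;
}
```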
Available Virtual Models
Zaguán currently offers the following Virtual Models, each optimized for different use cases and price points:
DeepSeek R1 0528
promptshield/deepseek-r1-0528 - Advanced reasoning model with 2x pricing multiplier. Routes between Novita and DeepSeek providers.
DeepSeek v3
promptshield/deepseek-v3 - Latest DeepSeek model with enhanced capabilities. 2x pricing multiplier across 2 providers.
GPT OSS 120B
promptshield/gpt-oss-120b - Cost-effective 120B parameter model. 0.5x pricing multiplier with 3 provider options.
GPT OSS 20B
promptshield/gpt-oss-20b - Lightweight 20B parameter model for simple tasks. 0.5x pricing multiplier.
GLM 4.5
promptshield/glm-4.5 - Advanced GLM model with strong reasoning capabilities. 2x pricing multiplier.
GLM 4.6
promptshield/glm-4.6 - Latest GLM model with enhanced capabilities. 2x pricing multiplier.
Kimi K2 Instruct 0905 (Virtual)
promptshield/kimi-k2-instruct-0905 - Instruction-tuned model with balanced performance. 0.5x pricing multiplier.
Kimi K2 Thinking
promptshield/kimi-k2-thinking - Reasoning-focused model with extended thinking capabilities. 2x pricing multiplier.
Llama 4 Maverick 17B-128E Instruct
promptshield/llama-4-maverick-17b-128e-instruct - Experimental Llama 4 variant with enhanced capabilities. 1x pricing multiplier.
Llama 4 Scout 17B-16E Instruct
promptshield/llama-4-scout-17b-16e-instruct - Compact Llama 4 Scout model optimized for efficiency. 0.5x pricing multiplier.
MiniMax M2
promptshield/minimax-m2 - MiniMax M2 model with advanced capabilities. 2x pricing multiplier.
Qwen3 Coder 30B A3B Instruct
promptshield/qwen3-coder-30b-a3b-instruct - Specialized coding model with 30B parameters. 2x pricing multiplier.
Qwen3 Coder 480B A35B Instruct
promptshield/qwen3-coder-480b-a35b-instruct - Large-scale coding model with 480B parameters. 2x pricing multiplier.
Qwen3 Max
promptshield/qwen3-max - Flagship Qwen3 model with maximum capabilities. 2x pricing multiplier.
Qwen3 Next 80B A3B Instruct
promptshield/qwen3-next-80b-a3b-instruct - Next-generation Qwen model with 80B parameters. 2x pricing multiplier.
Qwen3 Next 80B-A3B Thinking
promptshield/qwen3-next-80b-a3b-thinking - Reasoning-enhanced Qwen model with thinking capabilities. 2x pricing multiplier.
How to Use Virtual Models
1. Choose Your Virtual Model
Select a virtual model based on your use case, budget, and performance requirements. Consider the pricing band and number of providers for redundancy.
2. Use Like Any Other Model
Virtual models work exactly like regular models - just use the identifier in your API calls:
// Assumes an OpenAI-compatible client configured with your Zaguán base URL and API key.
const response = await openai.chat.completions.create({
  model: "promptshield/deepseek-r1-0528",
  messages: [{ role: "user", content: "Hello!" }]
});

3. Automatic Routing
Zaguán automatically routes your request to the best available provider based on health checks, load balancing, and the configured routing strategy. If one provider fails, the request is automatically retried with the next provider, transparently to your application.
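The retry behavior described above can be sketched as a simple loop: try each provider in order and return the first success. The `callProvider` callback below is a hypothetical stand-in for the real provider request:

```javascript
// Transparent failover (sketch): try providers in order, return first success.
// `callProvider` is a hypothetical stand-in for the actual provider request.
function routeWithFailover(providers, callProvider) {
  let lastError;
  for (const provider of providers) {
    try {
      return callProvider(provider); // success: return immediately
    } catch (err) {
      lastError = err; // failure: fall through to the next provider
    }
  }
  throw lastError ?? new Error("no providers configured");
}
```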
Pricing Bands
- 0.5x: Cost-effective models for simple tasks
- 0.5x-1.5x: Balanced performance and cost
- 2x: Premium models with advanced capabilities
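As a worked example, a pricing multiplier simply scales the underlying model's base rate. The base rate used below is hypothetical; actual rates depend on the underlying providers:

```javascript
// Effective cost under a pricing multiplier (base rate is hypothetical).
function effectiveCost(baseCostPerMillionTokens, multiplier, tokens) {
  return baseCostPerMillionTokens * multiplier * (tokens / 1_000_000);
}

// E.g., a 0.5x band halves a $2.00/M-token base rate, while a 2x band doubles it.
```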