Virtual Models

Multi-Provider Routing with Automatic Failover

Virtual Models combine multiple AI providers behind a single model identifier, giving you automatic failover, load balancing, and provider redundancy. The only change to your code is the model identifier.

What are Virtual Models?

A Virtual Model is an abstraction layer that routes requests to multiple underlying provider models according to a configurable strategy. Instead of calling a single provider's model such as openai/gpt-4o directly, you call a virtual model such as promptshield/gpt-oss-120b, which routes each request to an available, healthy provider.

Traditional Approach

model: "openai/gpt-4o"

Single provider, single point of failure

Virtual Model

model: "promptshield/gpt-oss-120b"

Routes to OpenAI, Novita, or DeepSeek automatically
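To make the abstraction concrete, here is a minimal sketch of how a virtual model identifier might map to concrete provider targets. The registry shape, the field names, and the per-provider model IDs are assumptions for illustration only; they are not Zaguán's actual configuration format.

```javascript
// Hypothetical registry entry: field names and provider model IDs are
// illustrative assumptions, not Zaguán's real configuration.
const virtualModels = {
  "promptshield/gpt-oss-120b": {
    strategy: "health-aware-round-robin",
    providers: [
      { name: "openai", model: "gpt-oss-120b" },
      { name: "novita", model: "gpt-oss-120b" },
      { name: "deepseek", model: "gpt-oss-120b" },
    ],
  },
};

// Resolve a requested model ID into its concrete provider targets.
// A direct ID such as "openai/gpt-4o" resolves to exactly one target.
function resolveTargets(modelId) {
  const entry = virtualModels[modelId];
  if (entry) return entry.providers;
  const [name, ...rest] = modelId.split("/");
  return [{ name, model: rest.join("/") }];
}

console.log(resolveTargets("promptshield/gpt-oss-120b").length); // 3
console.log(resolveTargets("openai/gpt-4o").length); // 1
```

The direct identifier gives the router a single target with nothing to fall back to, while the virtual identifier gives it three.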

Key Benefits

  • Automatic Failover: If one provider is down or rate-limited, requests automatically route to the next available provider.
  • Load Balancing: Distribute traffic across multiple providers to avoid hitting rate limits and maximize throughput.
  • Cost Optimization: Route to lower-cost providers while maintaining quality and performance standards.
  • Zero Code Changes: Switch between single-provider and multi-provider routing by changing only the model identifier.

Routing Strategies

  • Health-Aware Round Robin: Distributes requests evenly across healthy providers, skipping any that fail health checks.
  • Weighted Random: Picks a provider at random in proportion to configured weights, so you can favor certain providers while keeping the others available for redundancy.
  • Priority Failover: Always tries the highest-priority provider first, falling back to lower-priority options only when needed.
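The three strategies can be sketched as provider-selection functions. This is a minimal illustration, assuming each provider is a plain object with hypothetical `healthy`, `weight`, and `priority` fields; it is not Zaguán's implementation, and real health checks are outside its scope.

```javascript
// Health-aware round robin: rotate through providers, skipping unhealthy ones.
function createRoundRobin(providers) {
  let next = 0;
  return function pick() {
    for (let i = 0; i < providers.length; i++) {
      const p = providers[(next + i) % providers.length];
      if (p.healthy) {
        next = (next + i + 1) % providers.length; // resume after the chosen provider
        return p;
      }
    }
    return null; // no healthy provider available
  };
}

// Weighted random: choose a healthy provider with probability proportional to its weight.
function pickWeighted(providers, rand = Math.random) {
  const healthy = providers.filter((p) => p.healthy && p.weight > 0);
  const total = healthy.reduce((sum, p) => sum + p.weight, 0);
  let r = rand() * total;
  for (const p of healthy) {
    if (r < p.weight) return p;
    r -= p.weight;
  }
  return healthy.length ? healthy[healthy.length - 1] : null; // guards float rounding
}

// Priority failover: the healthy provider with the lowest priority number wins.
function pickPriority(providers) {
  const candidates = providers
    .filter((p) => p.healthy)
    .sort((a, b) => a.priority - b.priority);
  return candidates[0] ?? null;
}

// Usage with hypothetical weights and priorities; deepseek is marked unhealthy.
const providers = [
  { name: "novita", healthy: true, weight: 3, priority: 2 },
  { name: "deepseek", healthy: false, weight: 1, priority: 1 },
  { name: "openai", healthy: true, weight: 1, priority: 3 },
];
const rr = createRoundRobin(providers);
console.log(rr().name, rr().name, rr().name); // novita openai novita
console.log(pickPriority(providers).name); // novita
console.log(pickWeighted(providers, () => 0.5).name); // novita
```

Note that deepseek has the highest priority but is skipped by all three strategies because it fails the health check.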

Available Virtual Models

Zaguán currently offers the following Virtual Models, each optimized for different use cases and price points:

DeepSeek R1 0528

promptshield/deepseek-r1-0528
Band C - Advanced

Advanced reasoning model with 2x pricing multiplier. Routes between Novita and DeepSeek providers.

2 providers · Health-Aware Round Robin · 2x multiplier

DeepSeek v3

promptshield/deepseek-v3
Band C - Advanced

Latest DeepSeek model with enhanced capabilities. 2x pricing multiplier across 2 providers.

2 providers · Health-Aware Round Robin · 2x multiplier

GPT OSS 120B

promptshield/gpt-oss-120b
Band A - Basic

Cost-effective 120B-parameter model. 0.5x pricing multiplier with 3 provider options.

3 providers · Health-Aware Round Robin · 0.5x multiplier

GPT OSS 20B

promptshield/gpt-oss-20b
Band A - Basic

Lightweight 20B-parameter model for simple tasks. 0.5x pricing multiplier.

1 provider · Health-Aware Round Robin · 0.5x multiplier

GLM 4.5

promptshield/glm-4.5
Band C - Advanced

Advanced GLM model with strong reasoning capabilities. 2x pricing multiplier.

2 providers · Health-Aware Round Robin · 2x multiplier

GLM 4.6

promptshield/glm-4.6
Band C - Advanced

Latest GLM model with enhanced capabilities. 2x pricing multiplier.

2 providers · Health-Aware Round Robin · 2x multiplier

Kimi K2 Instruct 0905 (Virtual)

promptshield/kimi-k2-instruct-0905
Band B - Standard

Instruction-tuned model with balanced performance. 0.5x pricing multiplier.

4 providers · Health-Aware Round Robin · 0.5x multiplier

Kimi K2 Thinking

promptshield/kimi-k2-thinking
Band C - Advanced

Reasoning-focused model with extended thinking capabilities. 2x pricing multiplier.

1 provider · Health-Aware Round Robin · 2x multiplier

Llama 4 Maverick 17B-128E Instruct

promptshield/llama-4-maverick-17b-128e-instruct
Band B - Standard

Experimental Llama 4 variant with enhanced capabilities. 1x pricing multiplier.

3 providers · Health-Aware Round Robin · 1x multiplier

Llama 4 Scout 17B-16E Instruct

promptshield/llama-4-scout-17b-16e-instruct
Band B - Standard

Compact Llama 4 Scout model optimized for efficiency. 0.5x pricing multiplier.

2 providers · Health-Aware Round Robin · 0.5x multiplier

MiniMax M2

promptshield/minimax-m2
Band C - Advanced

MiniMax M2 model with advanced capabilities. 2x pricing multiplier.

1 provider · Health-Aware Round Robin · 2x multiplier

Qwen3 Coder 30B A3B Instruct

promptshield/qwen3-coder-30b-a3b-instruct
Band C - Advanced

Specialized coding model with 30B parameters. 2x pricing multiplier.

2 providers · Health-Aware Round Robin · 2x multiplier

Qwen3 Coder 480B A35B Instruct

promptshield/qwen3-coder-480b-a35b-instruct
Band C - Advanced

Large-scale coding model with 480B parameters. 2x pricing multiplier.

2 providers · Health-Aware Round Robin · 2x multiplier

Qwen3 Max

promptshield/qwen3-max
Band C - Advanced

Flagship Qwen3 model with maximum capabilities. 2x pricing multiplier.

2 providers · Health-Aware Round Robin · 2x multiplier

Qwen3 Next 80B A3B Instruct

promptshield/qwen3-next-80b-a3b-instruct
Band C - Advanced

Next-generation Qwen model with 80B parameters. 2x pricing multiplier.

3 providers · Health-Aware Round Robin · 2x multiplier

Qwen3 Next 80B A3B Thinking

promptshield/qwen3-next-80b-a3b-thinking
Band C - Advanced

Reasoning-enhanced Qwen model with thinking capabilities. 2x pricing multiplier.

2 providers · Health-Aware Round Robin · 2x multiplier

How to Use Virtual Models

1. Choose Your Virtual Model

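Select a virtual model based on your use case, budget, and performance requirements. Consider its pricing band and how many providers back it: more providers mean more failover options, while a single-provider model such as promptshield/minimax-m2 has no alternative provider to fail over to.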
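If you pick models programmatically, a small client-side filter over the catalog can help. The provider counts and multipliers below are copied from the list of available Virtual Models on this page; `chooseModels` is a hypothetical helper for illustration, not part of any Zaguán SDK.

```javascript
// Provider counts and pricing multipliers copied from the Virtual Models list.
const catalog = [
  { id: "promptshield/deepseek-r1-0528", providers: 2, multiplier: 2 },
  { id: "promptshield/deepseek-v3", providers: 2, multiplier: 2 },
  { id: "promptshield/gpt-oss-120b", providers: 3, multiplier: 0.5 },
  { id: "promptshield/gpt-oss-20b", providers: 1, multiplier: 0.5 },
  { id: "promptshield/glm-4.5", providers: 2, multiplier: 2 },
  { id: "promptshield/glm-4.6", providers: 2, multiplier: 2 },
  { id: "promptshield/kimi-k2-instruct-0905", providers: 4, multiplier: 0.5 },
  { id: "promptshield/kimi-k2-thinking", providers: 1, multiplier: 2 },
  { id: "promptshield/llama-4-maverick-17b-128e-instruct", providers: 3, multiplier: 1 },
  { id: "promptshield/llama-4-scout-17b-16e-instruct", providers: 2, multiplier: 0.5 },
  { id: "promptshield/minimax-m2", providers: 1, multiplier: 2 },
  { id: "promptshield/qwen3-coder-30b-a3b-instruct", providers: 2, multiplier: 2 },
  { id: "promptshield/qwen3-coder-480b-a35b-instruct", providers: 2, multiplier: 2 },
  { id: "promptshield/qwen3-max", providers: 2, multiplier: 2 },
  { id: "promptshield/qwen3-next-80b-a3b-instruct", providers: 3, multiplier: 2 },
  { id: "promptshield/qwen3-next-80b-a3b-thinking", providers: 2, multiplier: 2 },
];

// Hypothetical helper: keep models with enough redundancy and an acceptable
// multiplier, sorted cheapest first and then by provider count (most first).
function chooseModels({ minProviders = 1, maxMultiplier = Infinity } = {}) {
  return catalog
    .filter((m) => m.providers >= minProviders && m.multiplier <= maxMultiplier)
    .sort((a, b) => a.multiplier - b.multiplier || b.providers - a.providers)
    .map((m) => m.id);
}

console.log(chooseModels({ minProviders: 3, maxMultiplier: 1 }));
```

With at least 3 providers and at most a 1x multiplier, this returns promptshield/kimi-k2-instruct-0905, promptshield/gpt-oss-120b, and promptshield/llama-4-maverick-17b-128e-instruct, in that order.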

2. Use Like Any Other Model

Virtual models work exactly like regular models. Pass the virtual model identifier as the model parameter in your API calls:

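import OpenAI from "openai";
// Assumption: ZAGUAN_BASE_URL and ZAGUAN_API_KEY are hypothetical env vars holding your Zaguán endpoint and key
const openai = new OpenAI({ baseURL: process.env.ZAGUAN_BASE_URL, apiKey: process.env.ZAGUAN_API_KEY });
const response = await openai.chat.completions.create({
  model: "promptshield/deepseek-r1-0528", // virtual model identifier
  messages: [{ role: "user", content: "Hello!" }]
});
console.log(response.choices[0].message.content);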

3. Automatic Routing

Zaguán routes each request to an available provider based on health checks and the virtual model's configured routing strategy. If a provider fails, Zaguán automatically retries the request with the next provider, transparently to your application.
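Zaguán performs this failover on the server, so your own code does not need it. The sketch below only illustrates the idea: try providers in order and return the first successful response. `completeWithFailover` and `sendToProvider` are hypothetical names, and the simulated rate-limit error stands in for a real outage.

```javascript
// Try each provider in order; return the first successful response.
// sendToProvider is a hypothetical stand-in for the real upstream call.
async function completeWithFailover(providers, request, sendToProvider) {
  const errors = [];
  for (const provider of providers) {
    try {
      return await sendToProvider(provider, request);
    } catch (err) {
      errors.push(`${provider}: ${err.message}`); // record the failure, try the next one
    }
  }
  throw new Error(`All providers failed: ${errors.join("; ")}`);
}

// Usage: the first provider is simulated as rate-limited, so the second answers.
const fakeSend = async (provider, req) => {
  if (provider === "novita") throw new Error("429 rate limited");
  return { provider, content: `echo: ${req.messages[0].content}` };
};

completeWithFailover(
  ["novita", "deepseek"],
  { messages: [{ role: "user", content: "Hello!" }] },
  fakeSend
).then((res) => console.log(res.provider, res.content)); // deepseek echo: Hello!
```

The caller only sees the final result or, if every provider fails, a single error summarizing each attempt.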

Pricing Bands

  • Band A - Basic (0.5x): Cost-effective models for simple tasks.
  • Band B - Standard (0.5x-1.5x): Balanced performance and cost.
  • Band C - Advanced (2x): Premium models with advanced capabilities.
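This page does not say which base rate the multipliers apply to. Assuming a multiplier scales a per-token base rate, the arithmetic works as in the sketch below. The base rate of 1 credit per 1,000 tokens and the `estimateCredits` helper are hypothetical placeholders, not published Zaguán pricing.

```javascript
// Assumption: the band multiplier scales a per-token base rate.
// The base rate below is a hypothetical placeholder, not a real price.
const HYPOTHETICAL_BASE_CREDITS_PER_1K = 1;

function estimateCredits(tokens, multiplier, basePer1K = HYPOTHETICAL_BASE_CREDITS_PER_1K) {
  return (tokens / 1000) * basePer1K * multiplier;
}

console.log(estimateCredits(10_000, 0.5)); // Band A model: 5
console.log(estimateCredits(10_000, 1)); // Band B model at 1x (e.g. Llama 4 Maverick): 10
console.log(estimateCredits(10_000, 2)); // Band C model: 20
```

Under this assumption, the same 10,000-token workload costs four times as much on a Band C model (2x) as on a Band A model (0.5x).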