Routing policies

Automatically route AI requests across providers to optimize cost, performance, and reliability.

Routing policies control which AI provider and model handles each request. Instead of hardcoding a single provider, policies automatically select the best option based on your optimization goals — whether that’s minimizing cost, maximizing uptime, or balancing both with ML-powered routing.

Strategy comparison

| Strategy | Description | Configuration | Detail page |
| --- | --- | --- | --- |
| Single | Always route to one provider | Dashboard or policy API — type: "fallback", 1 provider | Single provider |
| Priority | Try providers in order with automatic failover | Dashboard or policy API — type: "fallback", multiple providers | Priority |
| Least Latency | Route to the fastest provider | Dashboard only | Performance |
| Lowest Cost | Route to the cheapest provider | Dashboard only | Performance |
| Cost Optimized | ML-based routing — ~70% of traffic to cheaper models | Dashboard or policy API — type: "intelligent", axis: "cost" | Intelligent |
| Balanced | ML-based routing — even cost/quality split | Dashboard or policy API — type: "intelligent", axis: "performance" | Intelligent |
| Quality First | ML-based routing — ~70% of traffic to capable models | Dashboard or policy API — type: "intelligent", axis: "intelligence" | Intelligent |

How routing policies are applied

Routing policies are attached to projects or set as the org default. They are not passed per-request. When Gateway receives a request, it resolves the routing policy in this order:

  1. Project-scoped policy — if the request includes a project (via the project_id body field) and that project has a routing policy, Gateway uses it.
  2. Org default policy — if no project is specified, or the project has no routing policy, Gateway falls back to the org-level default policy.
  3. No policy — direct model calls — if neither exists, Gateway routes directly to the model specified in the request.
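The three-step resolution above can be sketched in Python. This is an illustration of the documented order, not Gateway's actual internals, and names like resolve_policy are made up:

```python
def resolve_policy(project_id, project_policies, org_default_policy):
    """Return the routing policy Gateway would apply, or None for a direct model call.

    project_policies maps project IDs to their routing policies (None if a
    project has no policy of its own).
    """
    # 1. Project-scoped policy wins when the request names a project that has one.
    if project_id is not None:
        policy = project_policies.get(project_id)
        if policy is not None:
            return policy
    # 2. Otherwise fall back to the org-level default, if one exists.
    if org_default_policy is not None:
        return org_default_policy
    # 3. No policy: Gateway routes directly to the model named in the request.
    return None
```

Note that a project without its own policy still inherits the org default rather than bypassing routing.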

When a routing policy is active, the model field in your request is optional — the policy selects the provider and model automatically. Omit it for Priority and Intelligent routing. model is only required when no policy resolves for the request.

Switching strategies per request

You cannot pass a routing strategy in the request body. To route different requests through different strategies:

  1. Create one project per strategy — e.g., one with Cost Optimized, another with Quality First.
  2. Pass the appropriate project_id in the request body for each request. All projects share the same API key.

This is useful when different features or user tiers need different cost/quality trade-offs.
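For example, a per-tier mapping could pick the project (and thus the strategy) for each request. The tier names and project IDs below are hypothetical, not part of Gateway:

```python
# Hypothetical mapping from user tier to the project whose policy should apply.
PROJECT_BY_TIER = {
    "free": "cost-optimized-project",
    "pro": "balanced-project",
    "enterprise": "quality-first-project",
}

def project_for_request(user_tier):
    # Unknown tiers return None: omit project_id and let the
    # org default routing policy handle the request.
    return PROJECT_BY_TIER.get(user_tier)
```

The returned value would then be sent as the project_id field in the request body.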

Examples

Request using a routing policy

The project’s policy picks the provider and model. No model field is needed.

```python
from merge_gateway import MergeGateway

client = MergeGateway(api_key="YOUR_API_KEY")

# The "production" project has a Cost Optimized routing policy.
response = client.responses.create(
    input=[
        {"type": "message", "role": "user", "content": "Summarize this quarter's earnings call."},
    ],
    project_id="production",
)

print(response.output[0].content[0].text)
```

With no project_id, the same request falls back to the org default routing policy.

Policy definition

Policies are created in the dashboard (or the policy management API). These JSON bodies describe the policy itself — they are never sent in a POST /responses request.

Cost Optimized

```json
{
  "name": "Production — Cost Optimized",
  "default_strategy": {
    "type": "intelligent",
    "axis": "cost",
    "providers": [
      { "provider": "openai", "model": "gpt-5-mini" },
      { "provider": "anthropic", "model": "claude-sonnet-4-20250514" },
      { "provider": "openai", "model": "gpt-5.2" }
    ]
  }
}
```
Priority (failover)

```json
{
  "name": "HA Failover",
  "default_strategy": {
    "type": "fallback",
    "providers": [
      { "provider": "openai", "model": "gpt-5.2", "priority": 1 },
      { "provider": "anthropic", "model": "claude-sonnet-4-20250514", "priority": 2 }
    ]
  }
}
```

Choosing the right strategy

| Your priority | Recommended strategy | Notes |
| --- | --- | --- |
| Simplicity / dev environment | Single | One provider, no failover |
| High availability / failover | Priority | Ordered failover across providers |
| Fastest response time | Least Latency | Dashboard only |
| Lowest cost (same model, multiple providers) | Lowest Cost | Dashboard only |
| Lowest cost (mixed models, ML-driven) | Cost Optimized | ~40–60% savings |
| General production optimization | Balanced | ~20–35% savings |
| Maximum output quality | Quality First | Routes most traffic to capable models |

Tag-based routing

You can attach tags to requests — like user tier, region, or environment — and use them to route to different policies. Rules are evaluated in priority order: the first matching rule applies, and unmatched requests fall through to the default strategy.

Conditions support AND/OR logic and operators like eq, gt, in, contains, starts_with, and exists. Configure tag-based routing through the dashboard or the API.
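A minimal sketch of that first-match evaluation, assuming a hypothetical rule shape with priority, condition, and strategy fields (the real rule schema may differ):

```python
# Illustrative first-match tag-rule evaluation — not Gateway internals.
OPERATORS = {
    "eq": lambda v, t: v == t,
    "gt": lambda v, t: v is not None and v > t,
    "in": lambda v, t: v in t,
    "contains": lambda v, t: v is not None and t in v,
    "starts_with": lambda v, t: isinstance(v, str) and v.startswith(t),
    "exists": lambda v, t: v is not None,
}

def condition_matches(tags, cond):
    # Conditions can combine sub-conditions with AND/OR logic.
    if "and" in cond:
        return all(condition_matches(tags, c) for c in cond["and"])
    if "or" in cond:
        return any(condition_matches(tags, c) for c in cond["or"])
    value = tags.get(cond["tag"])
    return OPERATORS[cond["op"]](value, cond.get("value"))

def route(tags, rules, default_strategy):
    # Rules are evaluated in priority order; the first match wins,
    # and unmatched requests fall through to the default strategy.
    for rule in sorted(rules, key=lambda r: r["priority"]):
        if condition_matches(tags, rule["condition"]):
            return rule["strategy"]
    return default_strategy
```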

FAQs

Can I pass a routing strategy in the request body?

No. Routing strategies live on policies, which are attached to projects or set as the org default. There is no type, axis, or strategy field on the POST /responses request body. To switch strategies per request, use different projects with different policies and pass the matching project_id field in the request body.

Is the model field ever required?

Only when no routing policy applies. If a policy is active (via project or org default), omit model — the policy picks the provider and model for you. This works for Priority and Intelligent routing.

How do I route different requests through different strategies?

Create one project per strategy and switch via the project_id field in the request body. All projects share the same API key, so you don’t need multiple keys. Omit the field to hit the org default.

Are the JSON examples above request bodies?

No. They are policy definitions — the configuration used when creating a routing policy (via the dashboard or policy API). They are not request-body fields for POST /responses.

How much latency does intelligent routing add?

The complexity scoring step adds ~1–4 ms, which is negligible compared to LLM inference time.

How well does complexity scoring separate simple from complex requests?

Scores show clean separation below 0.4 (simple) and above 0.6 (complex). Edge cases around 0.5 route conservatively to more capable models.
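As a hedged illustration of that conservative behavior (the model labels and the exact cutoff here are assumptions, not Gateway's implementation):

```python
def model_for_score(score):
    # Scores below 0.4 are clearly simple; everything at or above that,
    # including edge cases around 0.5, routes conservatively to the
    # more capable model.
    if score < 0.4:
        return "cheaper_model"
    return "capable_model"
```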

What happens if the complexity scorer fails?

Gateway falls back to the most capable model in your policy. Quality is never compromised by a scorer failure.

Does intelligent routing support newly released models?

Yes. New models work immediately — capabilities are inferred from pricing data.

Can the router select a model outside my policy?

No. The router only selects from models in your policy, never outside of it.

What happens if every provider in the policy fails?

Gateway returns an error after all failover attempts are exhausted. Provider health is tracked automatically, so requests skip providers that are currently down.

Can I create more than one org-level default policy?

No. There is one org-level default. You can have one additional policy per project for project-scoped routing.