Routing policies

Automatically route AI requests across providers to optimize cost, performance, and reliability

Routing policies control which AI provider and model handles each request. Instead of hardcoding a single provider, policies automatically select the best option based on your optimization goals, whether that’s minimizing cost, maximizing uptime, or balancing both with ML-powered routing.

For details on how a policy gets applied to a specific request and how to use default_routing, see Using routing policies.

Strategy comparison

| Strategy | Description | Configuration | Detail page |
| --- | --- | --- | --- |
| Single | Always route to one provider | type: "fallback", 1 provider | Single provider |
| Priority | Try providers in order with automatic failover | type: "fallback", multiple providers | Priority |
| Least Latency | Route to the fastest provider | Dashboard only | Performance |
| Lowest Cost | Route to the cheapest provider | Dashboard only | Performance |
| Cost Optimized | ML-based routing, ~70% traffic to cheaper models | type: "intelligent", axis: "cost" | Intelligent |
| Balanced | ML-based routing, even cost/quality split | type: "intelligent", axis: "performance" | Intelligent |
| Quality First | ML-based routing, ~70% traffic to capable models | type: "intelligent", axis: "intelligence" | Intelligent |
| Build Your Own Router | Score-based routing across the benchmarks you pick | Dashboard only | Build Your Own Router |
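
The Configuration column above can be read as a small lookup from a policy definition to its strategy name. The sketch below is only an illustration of that mapping: strategy_name is a hypothetical local helper, not part of the gateway SDK, and the dashboard-only strategies are left out because the table gives no JSON form for them.

```python
from typing import Any

# Axis-to-strategy mapping copied from the comparison table.
AXIS_TO_STRATEGY = {
    "cost": "Cost Optimized",
    "performance": "Balanced",
    "intelligence": "Quality First",
}


def strategy_name(policy: dict[str, Any]) -> str:
    """Return the strategy label a policy definition corresponds to.

    Hypothetical helper for illustration only.
    """
    strategy = policy["default_strategy"]
    kind = strategy.get("type")
    providers = strategy.get("providers", [])

    if kind == "fallback":
        if not providers:
            raise ValueError("a fallback strategy needs at least one provider")
        # One provider is "Single"; several providers is "Priority".
        return "Single" if len(providers) == 1 else "Priority"
    if kind == "intelligent":
        axis = strategy.get("axis")
        if axis not in AXIS_TO_STRATEGY:
            raise ValueError(f"unknown axis: {axis!r}")
        return AXIS_TO_STRATEGY[axis]
    raise ValueError(f"unknown strategy type: {kind!r}")


# Example policy with a single provider (the policy name is made up).
single = {
    "name": "Dev",
    "default_strategy": {
        "type": "fallback",
        "providers": [{"provider": "openai", "model": "gpt-5-mini"}],
    },
}
print(strategy_name(single))  # Single
```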

Examples

Request using a routing policy

The project’s policy picks the provider and model. No model field is needed.

from merge_gateway import MergeGateway

client = MergeGateway(api_key="YOUR_API_KEY")

# The "production" project has a Cost Optimized routing policy.
response = client.responses.create(
    input=[
        {"type": "message", "role": "user", "content": "Summarize this quarter's earnings call."},
    ],
    project_id="production",
)

print(response.output[0].content[0].text)

With no project_id, the same request falls back to the org default routing policy.

Policy definition

Policies are created in the dashboard. These JSON bodies describe the policy configuration itself. They are never sent in a POST /responses request.

Cost Optimized

{
  "name": "Production - Cost Optimized",
  "default_strategy": {
    "type": "intelligent",
    "axis": "cost",
    "providers": [
      { "provider": "openai", "model": "gpt-5-mini" },
      { "provider": "anthropic", "model": "claude-sonnet-4-20250514" },
      { "provider": "openai", "model": "gpt-5.2" }
    ]
  }
}

Priority (failover)

{
  "name": "HA Failover",
  "default_strategy": {
    "type": "fallback",
    "providers": [
      { "provider": "openai", "model": "gpt-5.2", "priority": 1 },
      { "provider": "anthropic", "model": "claude-sonnet-4-20250514", "priority": 2 }
    ]
  }
}
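
To make the Priority (failover) behavior concrete, here is a minimal local simulation of trying providers in order. It is a sketch under stated assumptions, not the gateway's implementation: it assumes a lower priority number is tried first, and route_with_failover, ProviderError, and fake_call are hypothetical names introduced for the example.

```python
from typing import Any, Callable


class ProviderError(Exception):
    """Hypothetical error type standing in for a failed provider call."""


def route_with_failover(
    providers: list[dict[str, Any]],
    call: Callable[[dict[str, Any]], str],
) -> tuple[dict[str, Any], str]:
    """Try providers in ascending priority and return the first success."""
    errors = []
    # Assumption: priority 1 is attempted before priority 2.
    for entry in sorted(providers, key=lambda p: p["priority"]):
        try:
            return entry, call(entry)
        except ProviderError as exc:
            errors.append(f"{entry['provider']}/{entry['model']}: {exc}")
    raise ProviderError("all providers failed: " + "; ".join(errors))


# Provider list taken from the "HA Failover" policy definition.
PROVIDERS = [
    {"provider": "openai", "model": "gpt-5.2", "priority": 1},
    {"provider": "anthropic", "model": "claude-sonnet-4-20250514", "priority": 2},
]


def fake_call(entry: dict[str, Any]) -> str:
    # Stand-in for a real provider request: the first provider is "down".
    if entry["provider"] == "openai":
        raise ProviderError("simulated outage")
    return f"handled by {entry['provider']}"


chosen, result = route_with_failover(PROVIDERS, fake_call)
print(chosen["provider"], "->", result)  # anthropic -> handled by anthropic
```

In this simulation the second provider only receives the request because the first one raised an error. Which failures actually trigger failover in the gateway is not covered here.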

Build Your Own Router is configured entirely in the dashboard, not via JSON. See Build Your Own Router for the setup.

Choosing the right strategy

| Your priority | Recommended strategy | Notes |
| --- | --- | --- |
| Simplicity / dev environment | Single | One provider, no failover |
| High availability / failover | Priority | Ordered failover across providers |
| Fastest response time | Least Latency | Dashboard only |
| Lowest cost (same model, multiple providers) | Lowest Cost | Dashboard only |
| Lowest cost (mixed models, ML-driven) | Cost Optimized | ~40-60% savings |
| General production optimization | Balanced | ~20-35% savings |
| Maximum output quality | Quality First | Routes most traffic to capable models |
| Routing on your own benchmarks or eval scores | Build Your Own Router | Dashboard-configured; mix curated and custom benchmarks |

Tag-based routing

You can attach tags to requests (user tier, region, environment, and so on) and use them to route to different policies. Rules are evaluated in priority order: the first matching rule applies, and unmatched requests fall through to the default strategy.

Conditions support AND/OR logic and operators like eq, gt, in, contains, starts_with, and exists. Configure tag-based routing through the dashboard.
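
The first-match evaluation described above can be sketched locally as follows. Tag rules are configured in the dashboard and their stored format is not shown on this page, so the rule schema here (priority, when, policy, and the all/any grouping keys) is entirely hypothetical, as are the function names. The operator semantics are a plausible reading of the operator names, not a specification.

```python
from typing import Any


def check(condition: dict[str, Any], tags: dict[str, Any]) -> bool:
    """Evaluate one condition, or a nested AND/OR group, against request tags."""
    if "all" in condition:  # AND group (hypothetical key)
        return all(check(c, tags) for c in condition["all"])
    if "any" in condition:  # OR group (hypothetical key)
        return any(check(c, tags) for c in condition["any"])

    key, op = condition["tag"], condition["op"]
    if op == "exists":
        return key in tags
    if key not in tags:
        return False
    actual, expected = tags[key], condition.get("value")
    if op == "eq":
        return actual == expected
    if op == "gt":
        return actual > expected
    if op == "in":
        return actual in expected
    if op == "contains":
        return expected in actual
    if op == "starts_with":
        return str(actual).startswith(expected)
    raise ValueError(f"unsupported operator: {op!r}")


def select_policy(rules: list[dict[str, Any]], tags: dict[str, Any],
                  default: str = "default strategy") -> str:
    """Return the policy of the first matching rule, else the default."""
    for rule in sorted(rules, key=lambda r: r["priority"]):
        if check(rule["when"], tags):
            return rule["policy"]
    return default


# Made-up rules and policy names, for illustration only.
RULES = [
    {"priority": 1, "policy": "quality-first",
     "when": {"all": [{"tag": "tier", "op": "eq", "value": "enterprise"},
                      {"tag": "region", "op": "starts_with", "value": "eu"}]}},
    {"priority": 2, "policy": "cost-optimized",
     "when": {"tag": "env", "op": "in", "value": ["dev", "staging"]}},
]

print(select_policy(RULES, {"tier": "enterprise", "region": "eu-west-1"}))  # quality-first
print(select_policy(RULES, {"env": "dev"}))                                 # cost-optimized
print(select_policy(RULES, {"tier": "free"}))                               # default strategy
```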

FAQs

The JSON bodies under Policy definition are policy definitions: the configuration used when creating a routing policy in the dashboard. They are not request-body fields for POST /responses.

The complexity scoring step adds ~1-4 ms, which is negligible compared to LLM inference time.

Complexity scores separate cleanly: below 0.4 is simple and above 0.6 is complex. Edge cases around 0.5 route conservatively to more capable models.

If the scorer fails, the gateway falls back to the most capable model in your policy, so quality is never compromised by a scorer failure.
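
The thresholds and the scorer-failure fallback described in these answers can be illustrated with a simplified two-tier sketch. It assumes the scorer returns a value between 0 and 1 and it ignores the policy's axis setting, which also influences the real traffic split. The function names and the placeholder model names are hypothetical.

```python
from typing import Callable

SIMPLE_BELOW = 0.4   # scores under this are treated as simple
COMPLEX_ABOVE = 0.6  # scores over this are treated as complex


def band(score: float) -> str:
    """Label a complexity score using the thresholds quoted above."""
    if score < SIMPLE_BELOW:
        return "simple"
    if score > COMPLEX_ABOVE:
        return "complex"
    return "ambiguous"


def pick_model(prompt: str, scorer: Callable[[str], float],
               cheaper: str = "cheaper-model",
               capable: str = "capable-model") -> str:
    """Choose a tier for one request (simplified, two tiers only)."""
    try:
        score = scorer(prompt)
    except Exception:
        # Scorer failure: fall back to the most capable model.
        return capable
    # Only clearly simple requests go to the cheaper tier; ambiguous
    # scores are routed conservatively to the capable tier.
    return cheaper if band(score) == "simple" else capable


def broken_scorer(prompt: str) -> float:
    raise RuntimeError("scorer unavailable")


print(pick_model("What is 2 + 2?", lambda p: 0.2))        # cheaper-model
print(pick_model("Borderline request", lambda p: 0.5))    # capable-model
print(pick_model("Prove this theorem", lambda p: 0.9))    # capable-model
print(pick_model("Anything", broken_scorer))              # capable-model
```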

Yes. New models work immediately, with capabilities inferred from pricing data.

The router only selects from models in your policy, never outside of it.

Use Build Your Own Router. It lets you mix curated benchmarks with custom benchmarks (run evals or upload scores) and pick a weighted blend.
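
As a rough picture of what a weighted blend means, the sketch below combines per-benchmark scores into one number per model using a normalized weighted average and picks the highest. The averaging formula is an assumption, since this page does not say how the dashboard combines scores. The benchmark names, model names, scores, and weights are all made up for the example.

```python
def blended_scores(benchmarks: dict[str, dict[str, float]],
                   weights: dict[str, float]) -> dict[str, float]:
    """Weighted average of benchmark scores for each model.

    Only models that have a score on every weighted benchmark are ranked.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    models = set.intersection(*(set(benchmarks[name]) for name in weights))
    return {
        model: sum(weights[name] * benchmarks[name][model] for name in weights) / total
        for model in models
    }


# Hypothetical scores: one curated benchmark and one custom eval.
BENCHMARKS = {
    "curated_reasoning": {"model-a": 0.80, "model-b": 0.60},
    "custom_support_eval": {"model-a": 0.50, "model-b": 0.90},
}
# The custom eval counts three times as much as the curated benchmark.
WEIGHTS = {"curated_reasoning": 1.0, "custom_support_eval": 3.0}

scores = blended_scores(BENCHMARKS, WEIGHTS)
best = max(scores, key=scores.get)
print(best)  # model-b
```

With these made-up numbers, model-a leads on the curated benchmark, but the heavier weight on the custom eval puts model-b ahead overall.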