Routing policies

Automatically route AI requests across providers to optimize cost, performance, and reliability.

Routing policies control which AI provider and model handles each request. Instead of hardcoding a single provider, policies automatically select the best option based on your optimization goals — whether that’s minimizing cost, maximizing uptime, or balancing both with ML-powered routing.

Strategy comparison

| Strategy | Description | Configuration | Detail page |
| --- | --- | --- | --- |
| Single | Always route to one provider | Dashboard or policy API — type: "fallback", 1 provider | Single provider |
| Priority | Try providers in order with automatic failover | Dashboard or policy API — type: "fallback", multiple providers | Priority |
| Least Latency | Route to the fastest provider | Dashboard only | Performance |
| Lowest Cost | Route to the cheapest provider | Dashboard only | Performance |
| Cost Optimized | ML-based routing — ~70% of traffic to cheaper models | Dashboard or policy API — type: "intelligent", axis: "cost" | Intelligent |
| Balanced | ML-based routing — even cost/quality split | Dashboard or policy API — type: "intelligent", axis: "performance" | Intelligent |
| Quality First | ML-based routing — ~70% of traffic to capable models | Dashboard or policy API — type: "intelligent", axis: "intelligence" | Intelligent |

How routing policies are applied

Routing policies are attached to projects or set as the org default. They are not passed per-request. When Gateway receives a request, it resolves the routing policy in this order:

  1. Project-scoped policy — if the request includes a project (via the project_id body field) and that project has a routing policy, Gateway uses it.
  2. Org default policy — if no project is specified, or the project has no routing policy, Gateway falls back to the org-level default policy.
  3. No policy — direct model calls — if neither exists, Gateway routes directly to the model specified in the request.
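The three-step resolution above can be sketched in Python. This is an illustration of the documented order, not Gateway's actual internals, and names like resolve_policy are made up:

```python
def resolve_policy(project_id, project_policies, org_default_policy):
    """Return the routing policy Gateway would apply, or None for a direct model call.

    project_policies maps project IDs to their routing policies (None if a
    project has no policy of its own).
    """
    # 1. Project-scoped policy wins when the request names a project that has one.
    if project_id is not None:
        policy = project_policies.get(project_id)
        if policy is not None:
            return policy
    # 2. Otherwise fall back to the org-level default, if one exists.
    if org_default_policy is not None:
        return org_default_policy
    # 3. No policy: Gateway routes directly to the model named in the request.
    return None
```

Note that a project without its own policy still inherits the org default rather than bypassing routing.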

When a routing policy is active, the model field in your request is optional — the policy selects the provider and model automatically. Omit it for Priority and Intelligent routing. model is only required when no policy resolves for the request.

Switching strategies per request

You cannot pass a routing strategy in the request body. To route different requests through different strategies:

  1. Create one project per strategy — e.g., one with Cost Optimized, another with Quality First.
  2. Pass the appropriate project_id in the request body for each request. All projects share the same API key.

This is useful when different features or user tiers need different cost/quality trade-offs.
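For example, a per-tier mapping could pick the project (and thus the strategy) for each request. The tier names and project IDs below are hypothetical, not part of Gateway:

```python
# Hypothetical mapping from user tier to the project whose policy should apply.
PROJECT_BY_TIER = {
    "free": "cost-optimized-project",
    "pro": "balanced-project",
    "enterprise": "quality-first-project",
}

def project_for_request(user_tier):
    # Unknown tiers return None: omit project_id and let the
    # org default routing policy handle the request.
    return PROJECT_BY_TIER.get(user_tier)
```

The returned value would then be sent as the project_id field in the request body.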

Examples

Request using a routing policy

The project’s policy picks the provider and model. No model field is needed.

```python
from merge_gateway import MergeGateway

client = MergeGateway(api_key="YOUR_API_KEY")

# The "production" project has a Cost Optimized routing policy.
response = client.responses.create(
    input=[
        {"type": "message", "role": "user", "content": "Summarize this quarter's earnings call."},
    ],
    project_id="production",
)

print(response.output[0].content[0].text)
```

With no project_id, the same request falls back to the org default routing policy.

Policy definition

Policies are created in the dashboard (or the policy management API). These JSON bodies describe the policy itself — they are never sent in a POST /responses request.

Cost Optimized

```json
{
  "name": "Production — Cost Optimized",
  "default_strategy": {
    "type": "intelligent",
    "axis": "cost",
    "providers": [
      { "provider": "openai", "model": "gpt-5-mini" },
      { "provider": "anthropic", "model": "claude-sonnet-4-20250514" },
      { "provider": "openai", "model": "gpt-5.2" }
    ]
  }
}
```
Priority (failover)

```json
{
  "name": "HA Failover",
  "default_strategy": {
    "type": "fallback",
    "providers": [
      { "provider": "openai", "model": "gpt-5.2", "priority": 1 },
      { "provider": "anthropic", "model": "claude-sonnet-4-20250514", "priority": 2 }
    ]
  }
}
```

Choosing the right strategy

| Your priority | Recommended strategy | Notes |
| --- | --- | --- |
| Simplicity / dev environment | Single | One provider, no failover |
| High availability / failover | Priority | Ordered failover across providers |
| Fastest response time | Least Latency | Dashboard only |
| Lowest cost (same model, multiple providers) | Lowest Cost | Dashboard only |
| Lowest cost (mixed models, ML-driven) | Cost Optimized | ~40–60% savings |
| General production optimization | Balanced | ~20–35% savings |
| Maximum output quality | Quality First | Routes most traffic to capable models |

Tag-based routing

You can attach tags to requests — like user tier, region, or environment — and use them to route to different policies. Rules are evaluated in priority order: the first matching rule applies, and unmatched requests fall through to the default strategy.

Conditions support AND/OR logic and operators like eq, gt, in, contains, starts_with, and exists. Configure tag-based routing through the dashboard or the API.
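A minimal sketch of that first-match evaluation, assuming a hypothetical rule shape with priority, condition, and strategy fields (the real rule schema may differ):

```python
# Illustrative first-match tag-rule evaluation — not Gateway internals.
OPERATORS = {
    "eq": lambda v, t: v == t,
    "gt": lambda v, t: v is not None and v > t,
    "in": lambda v, t: v in t,
    "contains": lambda v, t: v is not None and t in v,
    "starts_with": lambda v, t: isinstance(v, str) and v.startswith(t),
    "exists": lambda v, t: v is not None,
}

def condition_matches(tags, cond):
    # Conditions can combine sub-conditions with AND/OR logic.
    if "and" in cond:
        return all(condition_matches(tags, c) for c in cond["and"])
    if "or" in cond:
        return any(condition_matches(tags, c) for c in cond["or"])
    value = tags.get(cond["tag"])
    return OPERATORS[cond["op"]](value, cond.get("value"))

def route(tags, rules, default_strategy):
    # Rules are evaluated in priority order; the first match wins,
    # and unmatched requests fall through to the default strategy.
    for rule in sorted(rules, key=lambda r: r["priority"]):
        if condition_matches(tags, rule["condition"]):
            return rule["strategy"]
    return default_strategy
```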

FAQs

Can I pass a routing strategy in the request body?

No. Routing strategies live on policies, which are attached to projects or set as the org default. There is no type, axis, or strategy field on the POST /responses request body. To switch strategies per request, use different projects with different policies and pass the matching project_id field in the request body.

Is the model field ever required?

Only when no routing policy applies. If a policy is active (via project or org default), omit model — the policy picks the provider and model for you. This works for Priority and Intelligent routing.

How do I route different requests through different strategies?

Create one project per strategy and switch via the project_id field in the request body. All projects share the same API key, so you don’t need multiple keys. Omit the field to hit the org default.

Are the JSON examples above request bodies?

No. They are policy definitions — the configuration used when creating a routing policy (via the dashboard or policy API). They are not request-body fields for POST /responses.

How much latency does intelligent routing add?

The complexity scoring step adds ~1–4 ms, which is negligible compared to LLM inference time.

How well does complexity scoring separate simple from complex requests?

Scores show clean separation below 0.4 (simple) and above 0.6 (complex). Edge cases around 0.5 route conservatively to more capable models.
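As a hedged illustration of that conservative behavior (the model labels and the exact cutoff here are assumptions, not Gateway's implementation):

```python
def model_for_score(score):
    # Scores below 0.4 are clearly simple; everything at or above that,
    # including edge cases around 0.5, routes conservatively to the
    # more capable model.
    if score < 0.4:
        return "cheaper_model"
    return "capable_model"
```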

What happens if the complexity scorer fails?

Gateway falls back to the most capable model in your policy. Quality is never compromised by a scorer failure.

Does intelligent routing support newly released models?

Yes. New models work immediately — capabilities are inferred from pricing data.

Can the router select a model outside my policy?

No. The router only selects from models in your policy, never outside of it.

What happens if every provider in the policy fails?

Gateway returns an error after all failover attempts are exhausted. Provider health is tracked automatically, so requests skip providers that are currently down.

Can I create more than one org-level default policy?

No. There is one org-level default. You can have one additional policy per project for project-scoped routing.