© Merge 2026

Intelligent routing

ML-powered routing that matches prompt complexity to the right model tier

Intelligent routing analyzes each request and automatically routes it to the most appropriate model based on prompt complexity. Gateway embeds the prompt, scores its complexity from 0 (simple) to 1 (complex), and maps the score to a model tier based on your chosen strategy. The scoring step adds roughly 1-4 ms of latency, which is negligible compared to LLM inference time.

The JSON examples on this page are routing policy definitions: the configuration you set when creating a policy in the dashboard. They are not fields you pass in individual POST /responses requests.

Capability tiers

Models are automatically classified into five tiers based on output token cost:

| Tier      | Output cost (per 1M tokens) | Example models               |
|-----------|-----------------------------|------------------------------|
| Frontier  | >= $5.00                    | Claude Opus 4, GPT-4.5       |
| Advanced  | $2.00 - $5.00               | Claude Sonnet 4, GPT-4 Turbo |
| Standard  | $1.50 - $2.00               | Claude 3.5 Sonnet, GPT-4o    |
| Efficient | $0.10 - $1.50               | Claude Haiku 3.5, GPT-4o-mini |
| Basic     | < $0.10                     | Older / small models         |

You don’t need to manually classify models. Gateway infers tiers from provider pricing data. Any model works, including new releases.
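The tier table above can be read as a simple price lookup. The following is a minimal sketch of that lookup, not Gateway's actual classifier: the function name `classify_tier` is hypothetical, and the handling of boundary prices (each range includes its lower bound) is an assumption, since the table does not say which tier a price of exactly $2.00 or $1.50 falls into.

```python
from typing import List, Tuple

# Lower bounds in USD per 1M output tokens, copied from the tier table.
# Assumption: a range includes its lower bound and excludes its upper bound.
TIER_FLOORS: List[Tuple[float, str]] = [
    (5.00, "Frontier"),
    (2.00, "Advanced"),
    (1.50, "Standard"),
    (0.10, "Efficient"),
]


def classify_tier(output_cost_per_1m: float) -> str:
    """Return the capability tier for an output price in USD per 1M tokens."""
    if output_cost_per_1m < 0:
        raise ValueError("price must be non-negative")
    # Floors are sorted from highest to lowest, so the first match wins.
    for floor, tier in TIER_FLOORS:
        if output_cost_per_1m >= floor:
            return tier
    return "Basic"
```

Because the lookup depends only on price, a newly released model would slot into a tier as soon as its output price is known, which matches the claim that no manual classification is needed.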

Cost optimized

Maximizes cost savings while maintaining quality for complex tasks. The complexity threshold is set low, so ~70% of traffic routes to cheaper models.

Best for: Customer support chatbots, general-purpose assistants, mixed-complexity workloads.

Expected savings: 40-60%.

{
  "name": "Cost Optimized Chat",
  "default_strategy": {
    "type": "intelligent",
    "axis": "cost",
    "providers": [
      { "provider": "openai", "model": "gpt-5-mini" },
      { "provider": "anthropic", "model": "claude-sonnet-4-20250514" },
      { "provider": "openai", "model": "gpt-5.2" }
    ]
  }
}

Balanced

Equal consideration of cost and quality. Complexity scores map linearly to model tiers, with roughly a 50/50 split between cheaper and more capable models.

Best for: General production workloads where quality and cost are equally important.

Expected savings: 20-35%.

{
  "name": "Balanced Production",
  "default_strategy": {
    "type": "intelligent",
    "axis": "performance",
    "providers": [
      { "provider": "openai", "model": "gpt-5-mini" },
      { "provider": "anthropic", "model": "claude-sonnet-4-20250514" },
      { "provider": "openai", "model": "gpt-5.2" }
    ]
  }
}
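To illustrate what "complexity scores map linearly to model tiers" could look like, here is a rough sketch that splits the 0 to 1 score range into equal-width bands, one per model. This is an illustration only: `pick_by_linear_score` is a hypothetical helper, the assumption that models are ordered from cheapest to most capable is mine, and Gateway's real cutoffs (including whatever produces the roughly 50/50 split) are not documented on this page.

```python
from typing import Sequence


def pick_by_linear_score(score: float, models: Sequence[str]) -> str:
    """Map a complexity score in [0, 1] to one of `models`.

    Assumption: `models` is ordered from cheapest to most capable.
    """
    if not models:
        raise ValueError("models must not be empty")
    # Clamp out-of-range scores so the index is always valid.
    clamped = min(max(score, 0.0), 1.0)
    # Equal-width bands; a score of exactly 1.0 is folded into the last band.
    index = min(int(clamped * len(models)), len(models) - 1)
    return models[index]


# Model names taken from the policy definition above.
models = ["gpt-5-mini", "claude-sonnet-4-20250514", "gpt-5.2"]
simple_choice = pick_by_linear_score(0.1, models)
complex_choice = pick_by_linear_score(0.9, models)
```

With three models, scores below one third select the first entry and scores from two thirds upward select the last.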

Quality first

Prioritizes response quality with cost savings as secondary. Most traffic goes to capable models. Only clearly simple prompts route to cheaper tiers.

Best for: Enterprise applications, professional/technical use cases, domains where output quality is critical.

Expected savings: 10-20%.

{
  "name": "Enterprise Quality",
  "default_strategy": {
    "type": "intelligent",
    "axis": "intelligence",
    "providers": [
      { "provider": "openai", "model": "gpt-5-mini" },
      { "provider": "anthropic", "model": "claude-sonnet-4-20250514" },
      { "provider": "openai", "model": "gpt-5.2" }
    ]
  }
}
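The three policy definitions on this page differ only in their "name" and "axis" values. If you keep such definitions in version control before entering them in the dashboard, a small generator can avoid copy-paste drift. The sketch below is a local convenience, not part of any Gateway SDK: `build_policy` is a hypothetical helper, and the set of accepted axis values is limited to the three that appear on this page, which may not be exhaustive.

```python
import json
from typing import Any, Dict, List

# Provider/model pairs copied from the examples on this page.
PROVIDERS: List[Dict[str, str]] = [
    {"provider": "openai", "model": "gpt-5-mini"},
    {"provider": "anthropic", "model": "claude-sonnet-4-20250514"},
    {"provider": "openai", "model": "gpt-5.2"},
]

# Only the axis values shown on this page; others may exist.
KNOWN_AXES = {"cost", "performance", "intelligence"}


def build_policy(name: str, axis: str) -> Dict[str, Any]:
    """Build a routing policy definition shaped like the examples above."""
    if axis not in KNOWN_AXES:
        raise ValueError(f"unknown axis: {axis!r}")
    return {
        "name": name,
        "default_strategy": {
            "type": "intelligent",
            "axis": axis,
            # Copy each entry so callers cannot mutate the shared list.
            "providers": [dict(p) for p in PROVIDERS],
        },
    }


policy_json = json.dumps(build_policy("Enterprise Quality", "intelligence"), indent=2)
```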

If the complexity scorer fails for any reason, Gateway falls back to the most capable model in your policy. Quality is never compromised by a scorer failure.
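The fallback behavior can be pictured as a guard around the scoring step. The sketch below is an illustration under stated assumptions, not Gateway's implementation: `route_with_fallback`, `score_fn`, and `pick_fn` are hypothetical names, and it assumes the model list is ordered so that the last entry is the most capable one.

```python
from typing import Callable, Sequence


def route_with_fallback(
    prompt: str,
    score_fn: Callable[[str], float],
    pick_fn: Callable[[float, Sequence[str]], str],
    models: Sequence[str],
) -> str:
    """Choose a model for `prompt`, using the most capable model if scoring fails.

    Assumption: `models` is ordered from least to most capable.
    """
    if not models:
        raise ValueError("models must not be empty")
    try:
        score = score_fn(prompt)
    except Exception:
        # Scorer failure: do not guess, route to the most capable model.
        return models[-1]
    return pick_fn(score, models)
```

Failing toward the most capable model trades some cost for predictable quality, which is the priority the sentence above describes.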