Reasoning

Use extended thinking and inspect reasoning support per model route

Reasoning lets supported models spend additional tokens planning before they answer. Use it for multi-step coding, math, analysis, tool-use planning, and other tasks where a slower request is worth a better final answer.

Gateway treats reasoning as a vendor-route capability. The same canonical model can expose different reasoning behavior depending on the vendor that serves the request, so check the exact route before sending reasoning controls.

Reasoning can increase latency and output-token cost. Some providers also return thinking by default for specific routes. Use /v1/models to inspect behavior before enabling reasoning broadly.

How reasoning works

Gateway supports two reasoning patterns:

  1. Gateway-controlled thinking uses the top-level thinking request field. Gateway translates it for routes that support that request style.
  2. Provider-native reasoning controls use provider-specific fields such as reasoning_effort, when the selected route advertises that control.

In responses, Gateway normalizes separate provider reasoning into thinking content blocks.

{
  "type": "thinking",
  "thinking": "The model's reasoning text appears here.",
  "signature": null
}

Always render or log these blocks separately from final text. Some providers treat reasoning text as intermediate output, and some applications should hide it from end users.

Discover support

Use GET /v1/models and inspect the vendor route you plan to use.

cURL
curl "https://api-gateway.merge.dev/v1/models?vendor=bedrock" \
  -H "Authorization: Bearer YOUR_API_KEY"

A reasoning-capable route has supports_reasoning: true and a reasoning object.

{
  "model": "anthropic/claude-sonnet-4-6",
  "vendors": {
    "bedrock": {
      "capabilities": {
        "supports_reasoning": true,
        "reasoning": {
          "configurable": true,
          "disable_supported": true,
          "default_enabled": false,
          "controls": ["thinking.budget_tokens"],
          "output_style": "reasoning_content"
        }
      }
    }
  }
}

Field                        Meaning
supports_reasoning           The route can produce separate reasoning or thinking output
reasoning.configurable       The route accepts at least one reasoning control
reasoning.default_enabled    The route may produce reasoning without an explicit request
reasoning.disable_supported  The route supports a disable control
reasoning.controls           Supported request controls for that route
reasoning.output_style       How Gateway exposes reasoning in the response

Check the route you will actually execute. If you use a routing policy, inspect every candidate route that policy can select.
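
As a rough illustration of the check described above, the following Python sketch reads the reasoning metadata for one vendor route from an already-parsed /v1/models entry. The helper name reasoning_info is hypothetical, and the sketch assumes each entry has the shape shown in the example response; the real endpoint may wrap entries in a list or envelope.

```python
from typing import Any, Optional


def reasoning_info(model_entry: dict[str, Any], vendor: str) -> Optional[dict[str, Any]]:
    """Return the reasoning object for one vendor route, or None if unsupported."""
    capabilities = (
        model_entry.get("vendors", {}).get(vendor, {}).get("capabilities", {})
    )
    if not capabilities.get("supports_reasoning"):
        return None
    return capabilities.get("reasoning", {})


# Sample entry mirroring the /v1/models example response for one model.
entry = {
    "model": "anthropic/claude-sonnet-4-6",
    "vendors": {
        "bedrock": {
            "capabilities": {
                "supports_reasoning": True,
                "reasoning": {
                    "configurable": True,
                    "disable_supported": True,
                    "default_enabled": False,
                    "controls": ["thinking.budget_tokens"],
                    "output_style": "reasoning_content",
                },
            }
        }
    },
}

info = reasoning_info(entry, "bedrock")
print(info["controls"] if info else "no reasoning support")
```

Returning None for a missing vendor or a route without supports_reasoning lets callers treat "route not listed" and "route cannot reason" the same way.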

Enable Gateway-controlled thinking

Use the thinking field when the route supports Gateway-controlled thinking. The budget_tokens value is required when type is "enabled" because providers that support explicit thinking need a budget.

curl https://api-gateway.merge.dev/v1/responses \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4-6",
    "vendor": "bedrock",
    "max_tokens": 4096,
    "thinking": {
      "type": "enabled",
      "budget_tokens": 1024
    },
    "input": [
      {
        "type": "message",
        "role": "user",
        "content": "Design a migration plan for splitting a monolith into services. Include risks and sequencing."
      }
    ]
  }'

Pick a reasoning budget that leaves enough room for the final answer. If you set max_tokens: 4096 and budget_tokens: 1024, the provider still needs remaining output capacity for visible text.
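
One way to guard against an oversized budget is to validate it while building the request body. The sketch below is illustrative only: build_thinking_request is a hypothetical helper, and the rule that budget_tokens must be smaller than max_tokens is an assumption inferred from the guidance above, not a documented Gateway constraint.

```python
from typing import Any


def build_thinking_request(model: str, vendor: str, prompt: str,
                           max_tokens: int, budget_tokens: int) -> dict[str, Any]:
    """Build a /v1/responses body with Gateway-controlled thinking enabled."""
    if budget_tokens <= 0:
        raise ValueError("budget_tokens must be positive when thinking is enabled")
    # Assumption: thinking is spent out of max_tokens, so the budget must be
    # strictly smaller to leave room for visible text.
    if budget_tokens >= max_tokens:
        raise ValueError("budget_tokens must be smaller than max_tokens")
    return {
        "model": model,
        "vendor": vendor,
        "max_tokens": max_tokens,
        "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
        "input": [{"type": "message", "role": "user", "content": prompt}],
    }


body = build_thinking_request(
    "anthropic/claude-sonnet-4-6", "bedrock",
    "Design a migration plan for splitting a monolith into services.",
    max_tokens=4096, budget_tokens=1024,
)
print(body["thinking"])
```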

Use provider-native controls

Some routes advertise provider-native controls instead of the top-level thinking field. For example, GPT-OSS routes may expose reasoning_effort.

Only send provider-native controls when /v1/models lists them under reasoning.controls.

{
  "model": "openai/gpt-oss-120b",
  "vendor": "bedrock",
  "reasoning_effort": "low",
  "input": [
    {
      "type": "message",
      "role": "user",
      "content": "Review this incident timeline and identify the most likely root cause."
    }
  ]
}

Provider-native controls are not interchangeable. A control that works on one vendor route can be ignored or stripped on another route for the same canonical model.
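
A client can apply this rule before sending a request by comparing the fields it wants to send with the route's reasoning.controls list. The following is a minimal sketch: split_native_controls is a hypothetical helper, some_other_control is a made-up field name, and the sketch assumes provider-native controls appear in reasoning.controls under the same name as the request field.

```python
from typing import Any


def split_native_controls(requested: dict[str, Any],
                          advertised: list[str]) -> tuple[dict[str, Any], list[str]]:
    """Separate requested provider-native fields into advertised and unadvertised."""
    kept = {name: value for name, value in requested.items() if name in advertised}
    dropped = sorted(name for name in requested if name not in advertised)
    return kept, dropped


# Hypothetical controls list; read the real one from /v1/models for your route.
advertised = ["reasoning_effort"]
kept, dropped = split_native_controls(
    {"reasoning_effort": "low", "some_other_control": "high"}, advertised
)
print(kept)     # fields safe to merge into the request body
print(dropped)  # fields the route does not advertise
```

Reporting the dropped names, rather than discarding them silently, makes it easier to notice when a control that worked on one vendor route is unavailable on another.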

Disable reasoning

If a route advertises disable_supported: true, you can request thinking.type: "disabled".

{
  "model": "deepseek/deepseek-v4-flash",
  "vendor": "deepseek",
  "thinking": {
    "type": "disabled"
  },
  "input": [
    {
      "type": "message",
      "role": "user",
      "content": "Summarize this changelog in three bullets."
    }
  ]
}

Use disable controls for latency-sensitive requests, short classification tasks, or prompts where reasoning tokens are not worth the extra cost. If the route does not advertise disable_supported, Gateway may strip unsupported disable fields before sending the request upstream.
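
Because unsupported disable fields may be stripped, it can help to add the field only when the route advertises it. This sketch assumes you already hold the route's reasoning object from /v1/models; with_reasoning_disabled is a hypothetical helper name.

```python
from typing import Any, Optional


def with_reasoning_disabled(body: dict[str, Any],
                            reasoning: Optional[dict[str, Any]]) -> dict[str, Any]:
    """Return a copy of body with thinking disabled when the route supports it."""
    if reasoning and reasoning.get("disable_supported"):
        return {**body, "thinking": {"type": "disabled"}}
    # The route does not advertise a disable control: leave the body unchanged.
    return dict(body)


base = {
    "model": "deepseek/deepseek-v4-flash",
    "vendor": "deepseek",
    "input": [{"type": "message", "role": "user",
               "content": "Summarize this changelog in three bullets."}],
}
print(with_reasoning_disabled(base, {"disable_supported": True}).get("thinking"))
print(with_reasoning_disabled(base, {"disable_supported": False}).get("thinking"))
```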

Read thinking blocks

Reasoning output appears as thinking content blocks before or alongside text blocks.

{
  "output": [
    {
      "type": "message",
      "role": "assistant",
      "content": [
        {
          "type": "thinking",
          "thinking": "I need to compare the constraints, then propose a sequence.",
          "signature": null
        },
        {
          "type": "text",
          "text": "Start by isolating the billing workflow..."
        }
      ]
    }
  ]
}

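In code, branch on each content block's type field.

# `response` is the parsed Gateway response; `save_internal_reasoning` is your own handler.
for item in response.output:
    for block in item.content:
        if block.type == "thinking":
            # Keep reasoning out of user-facing output.
            save_internal_reasoning(block.thinking)
        elif block.type == "text":
            print(block.text)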

Some providers may include reasoning-like text inside a normal text block. Treat the public /v1/models route metadata as the source of truth for whether Gateway expects separate reasoning blocks.

Reasoning with routing policies

Routing policies can choose among multiple vendors and models. If a request includes thinking, every route that might serve the request must support Gateway-controlled reasoning, or the request can fail during capability checks.

For deterministic behavior, pin model and vendor when you need a specific reasoning mode.

{
  "model": "anthropic/claude-sonnet-4-6",
  "vendor": "bedrock",
  "thinking": {
    "type": "enabled",
    "budget_tokens": 1024
  },
  "input": [
    {
      "type": "message",
      "role": "user",
      "content": "Find the safest rollback plan for this deployment."
    }
  ]
}

Use a routing policy when you only need a reasoning-capable route, not a specific provider behavior. In that case, configure the policy with models whose selected vendor routes all support the same reasoning control.
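
To check that a policy's candidate routes share a reasoning control, you could intersect their reasoning.controls lists. The sketch below assumes you have already collected the capabilities object for each candidate route from /v1/models; shared_controls is a hypothetical helper, not part of Gateway.

```python
from typing import Any


def shared_controls(candidate_capabilities: list[dict[str, Any]]) -> set[str]:
    """Return the reasoning controls supported by every candidate route."""
    control_sets = []
    for capabilities in candidate_capabilities:
        if not capabilities.get("supports_reasoning"):
            # One non-reasoning route means no control is safe for the policy.
            return set()
        controls = capabilities.get("reasoning", {}).get("controls", [])
        control_sets.append(set(controls))
    return set.intersection(*control_sets) if control_sets else set()


candidates = [
    {"supports_reasoning": True,
     "reasoning": {"controls": ["thinking.budget_tokens"]}},
    {"supports_reasoning": True,
     "reasoning": {"controls": ["thinking.budget_tokens", "reasoning_effort"]}},
]
print(shared_controls(candidates))
```

An empty result suggests pinning model and vendor instead of relying on the policy for that request.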

Streaming

Reasoning works with stream: true on supported routes. Gateway accumulates provider reasoning deltas and returns them as thinking content in the streamed response.

{
  "model": "anthropic/claude-sonnet-4-6",
  "vendor": "bedrock",
  "stream": true,
  "max_tokens": 4096,
  "thinking": {
    "type": "enabled",
    "budget_tokens": 1024
  },
  "input": [
    {
      "type": "message",
      "role": "user",
      "content": "Compare two database migration strategies."
    }
  ]
}

Streaming can delay visible text because the model may spend tokens on thinking before it emits final answer text.

Common errors

If a request that includes thinking is rejected, the selected route likely does not support the thinking field. Use GET /v1/models, inspect vendors.<vendor>.capabilities.reasoning.controls, and choose a route that supports thinking.budget_tokens or another compatible control.

If a request with thinking.type set to "enabled" is rejected, check that it includes a positive budget_tokens value. Use a smaller budget for latency-sensitive requests and a larger budget for complex analysis.

If the response contains no thinking blocks, the route may not have produced separate reasoning for that prompt, the route may expose reasoning as normal text, or the selected vendor may not be the route you expected. Set include_routing_metadata: true while debugging and verify routing.vendor_used.
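
While debugging missing reasoning output, a small check over the parsed response can cover the vendor and the thinking blocks in one pass. This is a sketch under assumptions: diagnose_missing_thinking is a hypothetical helper, and it assumes routing metadata appears as a top-level routing object with a vendor_used field when include_routing_metadata is true.

```python
from typing import Any


def diagnose_missing_thinking(response: dict[str, Any], expected_vendor: str) -> list[str]:
    """List likely reasons a response lacks the expected reasoning output."""
    findings = []
    # Assumed location of routing metadata: response["routing"]["vendor_used"].
    used = response.get("routing", {}).get("vendor_used")
    if used is None:
        findings.append("no routing metadata; send include_routing_metadata: true")
    elif used != expected_vendor:
        findings.append(f"served by {used}, expected {expected_vendor}")
    has_thinking = any(
        block.get("type") == "thinking"
        for item in response.get("output", [])
        for block in item.get("content", [])
    )
    if not has_thinking:
        findings.append("no thinking blocks in output")
    return findings


sample = {
    "routing": {"vendor_used": "deepseek"},
    "output": [{"type": "message", "role": "assistant",
                "content": [{"type": "text", "text": "Done."}]}],
}
for finding in diagnose_missing_thinking(sample, "bedrock"):
    print(finding)
```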

Next steps