Multimodal

Send images and documents alongside text in your LLM requests.

Gateway accepts multimodal content natively — include image or document content blocks in your messages and Gateway routes to a capable model. No configuration needed. Gateway automatically detects which models support each modality and translates content to the provider’s format.

Supported content types

| Type | Content Blocks | Source Types | Example Models |
| --- | --- | --- | --- |
| Images | image, image_url | base64, URL | GPT-5.1, Claude Sonnet 4, Gemini 2.0 Flash |
| Documents | document | base64, URL | Claude Sonnet 4, Gemini 2.0 Flash |
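For example, the two block shapes above can be built as plain dictionaries. This is an illustrative sketch: the type values and source types come from the table, but the exact field names for base64 data and media type are assumptions.

```python
import base64

# A tiny placeholder payload stands in for real image bytes.
image_bytes = b"\x89PNG\r\n\x1a\n"

# base64-sourced image block (field names beyond "type" are assumptions).
image_block = {
    "type": "image",
    "data": base64.b64encode(image_bytes).decode("ascii"),
    "media_type": "image/png",
}

# URL-sourced document block.
document_block = {
    "type": "document",
    "url": "https://example.com/report.pdf",
}

# Blocks are combined with text inside a single user message.
message = {
    "type": "message",
    "role": "user",
    "content": [
        {"type": "text", "text": "Summarize the attached document."},
        document_block,
    ],
}
```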

Quick example

from merge_gateway import MergeGateway

client = MergeGateway(api_key="YOUR_API_KEY")

response = client.responses.create(
    model="openai/gpt-5.1",
    input=[
        {
            "type": "message",
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {"type": "image_url", "url": "https://example.com/photo.jpg"},
            ],
        }
    ],
)

print(response.output[0].content[0].text)

Model compatibility

Gateway auto-detects multimodal capabilities from model metadata. To check a specific model, call GET /v1/models and inspect its capabilities.input array and capabilities.vision field.
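A minimal capability check might look like the following. The capabilities.input and capabilities.vision fields are the ones named above, but the surrounding JSON layout of the /v1/models response is an assumption for illustration.

```python
def supports_images(model_entry: dict) -> bool:
    """Return True if a /v1/models entry advertises image input.

    Checks both capability fields mentioned above; the exact
    response layout is assumed for this sketch.
    """
    caps = model_entry.get("capabilities", {})
    return "image" in caps.get("input", []) or caps.get("vision", False)


# Sample entries shaped like a GET /v1/models response (illustrative only).
models = [
    {"id": "openai/gpt-5.1",
     "capabilities": {"input": ["text", "image"], "vision": True}},
    {"id": "example/text-only-model",
     "capabilities": {"input": ["text"], "vision": False}},
]

vision_models = [m["id"] for m in models if supports_images(m)]
```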

| Provider | Images | Documents |
| --- | --- | --- |
| OpenAI | GPT-5.1, GPT-4o | |
| Anthropic | Claude Sonnet 4, Claude Haiku 3.5 | Claude Sonnet 4, Claude Haiku 3.5 |
| Google | Gemini 2.0 Flash, Gemini 2.5 Pro | Gemini 2.0 Flash, Gemini 2.5 Pro |
| Bedrock | Varies by model | Varies by model |

Context compression automatically protects multimodal messages. When trimming is needed, text-only messages are removed first — your images and documents are preserved.
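The trimming priority can be sketched as follows. This is an illustrative model of the behavior described above, not Gateway's actual algorithm: real compression works on token budgets, while this sketch trims by message count.

```python
def compress_context(messages: list[dict], max_messages: int) -> list[dict]:
    """Drop the oldest text-only messages first until the history fits.

    Illustrative sketch of the stated trimming priority: messages
    containing image or document blocks are never removed.
    """
    def is_text_only(msg: dict) -> bool:
        return all(block["type"] == "text" for block in msg["content"])

    messages = list(messages)
    while len(messages) > max_messages:
        # Find the oldest text-only message and remove it first.
        idx = next((i for i, m in enumerate(messages) if is_text_only(m)), None)
        if idx is None:
            break  # only multimodal messages remain; preserve them all
        messages.pop(idx)
    return messages


history = [
    {"role": "user", "content": [{"type": "text", "text": "a"}]},
    {"role": "user", "content": [
        {"type": "text", "text": "b"},
        {"type": "image_url", "url": "https://example.com/photo.jpg"},
    ]},
    {"role": "user", "content": [{"type": "text", "text": "c"}]},
]

trimmed = compress_context(history, max_messages=2)
```

With a budget of two, the oldest text-only message is dropped while the image-bearing message survives.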

Next steps