Question 1

What's the actual SLA on Batch API?

Accepted Answer

All four major providers (Anthropic, OpenAI, Google, DeepSeek) commit to 24-hour completion. Most actual returns are 1-6 hours; spikes during peak demand can push toward the 24h cap. If you need guaranteed faster turnaround, you must use real-time API at full price.

Question 2

Are all model variants supported in batch?

Accepted Answer

Most are, but check provider docs. Anthropic supports Sonnet, Haiku, Opus in batch. OpenAI supports GPT-4o, GPT-4o-mini, o1, o3-mini in batch. Google supports Gemini 1.5/2.x Pro and Flash in batch. DeepSeek supports V3 and R1 in batch. Some specialty endpoints (Anthropic’s computer-use, OpenAI’s real-time API, vision-only models) are not batchable.

Question 3

Does the 50% discount apply to cached input?

Accepted Answer

Provider-dependent. Anthropic prompt-caching pricing remains separate from batch — you can stack cache + batch in some cases for compounded savings. OpenAI’s Batch + cached input give similar layered discounts. Read the per-provider pricing pages carefully; the savings can be substantial when stacked.

Question 4

How do I switch a workload to batch?

Accepted Answer

Three steps: (1) tag your async workloads — anything that doesn't need a live response. (2) Modify the API endpoint URL — instead of POSTing to /v1/messages or /v1/chat/completions, you upload a JSONL file of requests to /v1/batches. (3) Poll for completion or set up a webhook. Most SDKs (Anthropic Python, OpenAI Python) have built-in batch helpers.

Question 5

Are there minimum batch sizes?

Accepted Answer

No strict minimums, but the per-batch overhead means very small batches (1-10 requests) don’t save much in operational time. Sweet spot is 100-10,000 requests per batch. Anthropic caps at 100,000 per batch; OpenAI/Google have similar high caps. Split larger workloads across multiple batches.

Question 6

What about rate limits?

Accepted Answer

Batch API has separate rate limits from real-time API at all four providers — typically much higher daily token caps because the workload is async. Anthropic publishes batch-specific rate limits in their console. Plan accordingly: batch is great for huge volumes that would exceed real-time RPM/TPM caps.

Provider	Real-time	Batch	SLA	Savings
Claude (Anthropic)	$18,750	$9,375	24h	$9,375
OpenAI (GPT-5)	$13,750	$6,875	24h	$6,875
Gemini 2.5 Pro	$6,875	$3,437.5	24h	$3,437.5
DeepSeek (off-peak)	$750	$375	8h	$375

Batch Api Savings Calculator

Nasıl Kullanılır

Ne Zaman Kullanılır

Ne Zaman Kullanılmaz

Yaygın Kullanım Senaryoları

Sık Sorulan Sorular