Batch processing offers a flat 50% discount on LLM API costs from all major providers. If your workload doesn't need real-time responses, you're leaving money on the table. This guide covers everything from basic concepts to production implementation across OpenAI, Anthropic, and Google.
Batch Processing Quick Facts (January 2026)
- Cost Savings: 50% discount on input AND output tokens
- Completion Time: Results within 24 hours (often much faster)
- Providers: OpenAI, Anthropic Claude, Google Gemini all support it
- Rate Limits: Significantly higher (250M+ tokens enqueued)
What Is LLM Batch Processing?
Batch processing is an asynchronous API pattern where you submit multiple requests together and receive results within 24 hours instead of immediately. In exchange for giving up real-time responses, providers offer a 50% discount on all tokens.
The trade-off is simple:
- Real-time API: Instant responses, full price
- Batch API: 24-hour window, 50% off
For many workloads—data processing, content generation, evaluations—this trade-off is a no-brainer.
How Does the Batch API Work?
The workflow is straightforward but different from standard API calls:
- Create a JSONL file — Each line is a valid JSON request identical to real-time API format
- Upload the file — Send the file to the provider's servers
- Submit a batch job — Reference the uploaded file
- Poll for completion — Check status until results are ready
- Download results — Retrieve and map responses to original requests
Include a custom_id in each request so you can match responses back to their original queries.
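Each line of the JSONL file is one complete, self-contained request. Using OpenAI's format (walked through in full below), a single line looks like this:
{"custom_id": "request-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-5-mini", "messages": [{"role": "user", "content": "Summarize this document..."}]}}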
Batch Pricing Comparison (January 2026)
OpenAI Batch Pricing
| Model | Standard Input/1M | Batch Input/1M | Standard Output/1M | Batch Output/1M |
|---|---|---|---|---|
| GPT-5.2 | $1.75 | $0.875 | $14.00 | $7.00 |
| GPT-5-mini | $0.30 | $0.15 | $1.00 | $0.50 |
| GPT-4.1 | $2.00 | $1.00 | $8.00 | $4.00 |
| o4-mini | $1.10 | $0.55 | $4.40 | $2.20 |
Anthropic Claude Batch Pricing
| Model | Standard Input/1M | Batch Input/1M | Standard Output/1M | Batch Output/1M |
|---|---|---|---|---|
| Claude Opus 4.5 | $5.00 | $2.50 | $25.00 | $12.50 |
| Claude Sonnet 4.5 | $3.00 | $1.50 | $15.00 | $7.50 |
| Claude Haiku 4.5 | $1.00 | $0.50 | $5.00 | $2.50 |
Google Gemini Batch Pricing
| Model | Standard Input/1M | Batch Input/1M | Standard Output/1M | Batch Output/1M |
|---|---|---|---|---|
| Gemini 3.0 Pro | $2.00 | $1.00 | $12.00 | $6.00 |
| Gemini 3.0 Flash | $0.50 | $0.25 | $3.00 | $1.50 |
| Gemini 2.5 Pro | $1.25 | $0.625 | $10.00 | $5.00 |
Ideal Use Cases for Batch Processing
Perfect for Batch Processing
- Bulk content generation: Blog posts, product descriptions, marketing copy
- Data extraction at scale: Processing thousands of documents
- Training data generation: Creating datasets for fine-tuning
- Prompt evaluations: Testing prompts against large datasets
- Document classification: Categorizing large document collections
- Nightly analytics jobs: Processing daily data pipelines
- Embedding generation: Vectorizing large document corpora
NOT Suitable for Batch Processing
- User-facing chat: Users expect immediate responses
- Real-time assistants: Interactive applications need instant feedback
- Streaming responses: Progressive rendering requires real-time API
- Time-sensitive decisions: Trading, alerts, urgent notifications
OpenAI Batch API Implementation
Step 1: Create the JSONL File
import json

# Create batch requests
requests = [
    {
        "custom_id": "request-1",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-5-mini",
            "messages": [
                {"role": "system", "content": "You are a helpful assistant."},
                {"role": "user", "content": "Summarize this document..."}
            ],
            "max_tokens": 500
        }
    },
    {
        "custom_id": "request-2",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-5-mini",
            "messages": [
                {"role": "system", "content": "You are a helpful assistant."},
                {"role": "user", "content": "Extract key entities..."}
            ],
            "max_tokens": 500
        }
    }
]

# Write to JSONL file
with open("batch_requests.jsonl", "w") as f:
    for req in requests:
        f.write(json.dumps(req) + "\n")
Step 2: Upload and Submit Batch
from openai import OpenAI

client = OpenAI()

# Upload the batch file
batch_file = client.files.create(
    file=open("batch_requests.jsonl", "rb"),
    purpose="batch"
)

# Create the batch job
batch_job = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h"
)

print(f"Batch ID: {batch_job.id}")
print(f"Status: {batch_job.status}")
Step 3: Poll for Completion
import time

def wait_for_batch(client, batch_id, poll_interval=60):
    """Poll batch status until completion."""
    while True:
        batch = client.batches.retrieve(batch_id)
        print(f"Status: {batch.status}")
        if batch.status == "completed":
            return batch
        elif batch.status in ["failed", "expired", "cancelled"]:
            raise Exception(f"Batch failed with status: {batch.status}")
        time.sleep(poll_interval)

# Wait for completion
completed_batch = wait_for_batch(client, batch_job.id)
Step 4: Download and Process Results
# Download results file
result_file = client.files.content(completed_batch.output_file_id)
results = result_file.text

# Parse results (JSONL format)
for line in results.strip().split("\n"):
    result = json.loads(line)
    custom_id = result["custom_id"]
    response = result["response"]["body"]["choices"][0]["message"]["content"]
    print(f"{custom_id}: {response[:100]}...")
Anthropic Claude Batch API
Anthropic offers batch processing with the same 50% discount. The API is similar but uses Anthropic's message format.
import anthropic

client = anthropic.Anthropic()

# Create batch request
batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": "doc-1",
            "params": {
                "model": "claude-sonnet-4-5-20250929",
                "max_tokens": 1024,
                "messages": [
                    {"role": "user", "content": "Summarize this document..."}
                ]
            }
        },
        {
            "custom_id": "doc-2",
            "params": {
                "model": "claude-sonnet-4-5-20250929",
                "max_tokens": 1024,
                "messages": [
                    {"role": "user", "content": "Extract key insights..."}
                ]
            }
        }
    ]
)

print(f"Batch ID: {batch.id}")
Combining Batch with Prompt Caching
Anthropic uniquely allows stacking discounts. You can combine batch processing (50% off) with prompt caching (90% off cached tokens):
# Batch + Prompt Caching combined
batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": "cached-1",
            "params": {
                "model": "claude-sonnet-4-5-20250929",
                "max_tokens": 1024,
                "system": [
                    {
                        "type": "text",
                        "text": "Long system prompt with context...",
                        "cache_control": {"type": "ephemeral"}
                    }
                ],
                "messages": [
                    {"role": "user", "content": "Question 1"}
                ]
            }
        }
        # More requests with same cached system prompt...
    ]
)
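One caveat: batch requests may be processed concurrently and in any order, so cache hits within a batch are best-effort rather than guaranteed. Grouping requests that share the same cached prefix into one batch improves the hit rate.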
Best Practices for Batch Processing
1. Always Use Custom IDs
Results may return in different order than submitted. Always include a unique custom_id to map responses back to requests.
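One simple pattern (a sketch that reuses the requests list and results string from the OpenAI example above) is to keep a lookup from custom_id to the original input when building the batch, then join on it when parsing results:
# Build the lookup while creating requests
originals = {req["custom_id"]: req["body"]["messages"][-1]["content"] for req in requests}

# Later, when parsing the results file
for line in results.strip().split("\n"):
    result = json.loads(line)
    prompt = originals[result["custom_id"]]
    answer = result["response"]["body"]["choices"][0]["message"]["content"]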
2. Implement Retry Logic
Some requests in a batch may fail. Check the error file and retry failed requests:
def handle_batch_errors(client, batch):
    """Process errors from completed batch."""
    if batch.error_file_id:
        errors = client.files.content(batch.error_file_id).text
        failed_ids = []
        for line in errors.strip().split("\n"):
            error = json.loads(line)
            failed_ids.append(error["custom_id"])
            print(f"Failed: {error['custom_id']} - {error['error']}")
        return failed_ids
    return []
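To actually retry, one approach (a sketch, assuming you still have the original requests list in memory) is to filter the failed IDs into a new JSONL file and submit it as a fresh batch:
def resubmit_failed(client, requests, failed_ids, path="batch_retry.jsonl"):
    """Write failed requests to a new JSONL file and submit a fresh batch."""
    failed = set(failed_ids)
    retry_requests = [r for r in requests if r["custom_id"] in failed]
    with open(path, "w") as f:
        for req in retry_requests:
            f.write(json.dumps(req) + "\n")
    retry_file = client.files.create(file=open(path, "rb"), purpose="batch")
    return client.batches.create(
        input_file_id=retry_file.id,
        endpoint="/v1/chat/completions",
        completion_window="24h"
    )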
3. Optimize Batch Size
The discount doesn't depend on how many requests you put in one batch, but providers do cap batch jobs: OpenAI's Batch API, for example, accepts up to 50,000 requests per input file. Split very large workloads across multiple batches, and keep the 24-hour completion window in mind when sizing them.
4. Set Appropriate Timeouts
Uploading a large JSONL file can take longer than an aggressively configured client timeout allows. If your client uses a short timeout, raise it to 60 seconds or more:
client = OpenAI(timeout=60.0)
5. Monitor Batch Status
Poll status every 30-60 seconds. Don't poll too frequently—it's unnecessary and may hit rate limits.
6. Handle Partial Failures
A batch can complete with some requests failed. Always check both the output file AND the error file.
Real Cost Calculation Example
Let's calculate savings for processing 10,000 documents with GPT-5-mini:
| Metric | Real-time API | Batch API |
|---|---|---|
| Documents | 10,000 | 10,000 |
| Avg input tokens/doc | 2,000 | 2,000 |
| Avg output tokens/doc | 500 | 500 |
| Total input tokens | 20M | 20M |
| Total output tokens | 5M | 5M |
| Input cost | $6.00 | $3.00 |
| Output cost | $5.00 | $2.50 |
| Total cost | $11.00 | $5.50 |
| Savings | - | $5.50 (50%) |
For larger workloads, savings scale linearly. Processing 1M documents monthly saves $550 with GPT-5-mini alone.
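The arithmetic is easy to script if you want to estimate other workloads. A minimal sketch using the GPT-5-mini rates from the pricing table above:
# GPT-5-mini rates from the table above, in dollars per 1M tokens
STANDARD = {"input": 0.30, "output": 1.00}
BATCH = {"input": 0.15, "output": 0.50}

def job_cost(docs, input_tokens_per_doc, output_tokens_per_doc, rates):
    """Total cost of a job at the given per-1M-token rates."""
    total_input = docs * input_tokens_per_doc / 1_000_000
    total_output = docs * output_tokens_per_doc / 1_000_000
    return total_input * rates["input"] + total_output * rates["output"]

standard = job_cost(10_000, 2_000, 500, STANDARD)  # $11.00
batch = job_cost(10_000, 2_000, 500, BATCH)        # $5.50
print(f"Standard: ${standard:.2f}  Batch: ${batch:.2f}  Savings: ${standard - batch:.2f}")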
Batch Processing vs Async Concurrency
Don't confuse batch processing with async/concurrent API calls:
| Feature | Batch API | Async Concurrency |
|---|---|---|
| Response time | Up to 24 hours | Seconds |
| Cost | 50% discount | Full price |
| Rate limits | Much higher (250M+) | Standard limits |
| Use case | Background jobs | Real-time throughput |
Use async concurrency when you need fast responses at scale. Use batch when you can wait 24 hours for 50% savings.
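For comparison, here is what the async-concurrency side looks like: full price, but results in seconds. A minimal sketch using the OpenAI SDK's AsyncOpenAI client with a semaphore to stay under rate limits (the concurrency cap and prompts are illustrative):
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI()
semaphore = asyncio.Semaphore(10)  # illustrative concurrency cap

async def complete(prompt: str) -> str:
    # Bound the number of in-flight requests
    async with semaphore:
        response = await client.chat.completions.create(
            model="gpt-5-mini",
            messages=[{"role": "user", "content": prompt}]
        )
        return response.choices[0].message.content

async def main(prompts):
    return await asyncio.gather(*(complete(p) for p in prompts))

results = asyncio.run(main(["Summarize doc 1...", "Summarize doc 2..."]))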
Common Mistakes to Avoid
1. Using Batch for User-Facing Features
Users won't wait 24 hours. Batch is for background processing only.
2. Not Handling Partial Failures
Some requests in a batch may fail. Always check error files and implement retry logic.
3. Forgetting Custom IDs
Without custom IDs, you can't map responses to requests. Always include them.
4. Polling Too Frequently
Checking status every second wastes resources. Poll every 30-60 seconds.
5. Ignoring the 24-Hour Window
Plan your pipelines around the full 24-hour completion window. Most batches finish much sooner, but don't build schedules that depend on early completion.
Combining with Other Optimizations
Batch processing stacks with other cost optimization techniques:
Batch + Prompt Caching
Anthropic allows combining batch (50% off) with prompt caching (90% off cached tokens). For repeated context across batch requests, this can yield 95%+ savings on cached portions.
Batch + Model Routing
Use model routing within your batch to send simple tasks to cheaper models. Combined with batch discount, you can achieve 75-90% total savings.
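A minimal sketch of routing while building the JSONL file (the is_simple heuristic is a placeholder, and the model names mirror the pricing table above):
def build_request(custom_id, prompt):
    # Send short or simple prompts to the cheaper model, the rest to the larger one
    model = "gpt-5-mini" if is_simple(prompt) else "gpt-5.2"
    return {
        "custom_id": custom_id,
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {"model": model, "messages": [{"role": "user", "content": prompt}]}
    }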
Batch + Smaller Models
For classification and extraction tasks, GPT-5-mini or Claude Haiku 4.5 often suffice. Batch + cheap model = maximum savings.
Decision Framework: When to Use Batch
Use Batch Processing When:
- Processing data overnight or during off-hours
- Generating training data or embeddings
- Running evaluations or benchmarks
- Bulk content generation for queued publishing
- Data transformation pipelines
Use Real-Time API When:
- Users are waiting for responses
- You need streaming for progressive display
- Response latency is critical
- Interactive applications
Track Your Batch Processing Costs with Burnwise
Monitor batch vs real-time usage, track savings, and get recommendations for which workloads to move to batch. One-line SDK integration.
Next Steps
- Audit your workloads: Identify which don't need real-time responses
- Start with one pipeline: Move a single batch job first
- Measure savings: Track actual cost reduction
- Expand gradually: Move more workloads as you gain confidence
- Combine optimizations: Add prompt caching and model routing
For more cost optimization strategies, see our Prompt Caching Guide (50-90% savings) and Model Routing Guide (85% cost reduction).
Check our AI Pricing page for current model costs or use the LLM Cost Calculator to estimate your batch savings.