Prompt Caching

AI & LLM Glossary

What is Prompt Caching?

Prompt Caching: Prompt caching stores processed prompts to reduce cost and latency for repeated requests with similar prefixes.

Prompt Caching Explained

Prompt caching lets providers reuse the processed internal representation (the KV cache) of a prompt prefix rather than recomputing it on every request. If you make multiple requests that share a common prefix, such as a long system prompt, only the content after that prefix needs to be processed again. This can reduce input costs by 50-90% for repeated requests and also cuts time-to-first-token. Anthropic, OpenAI, and Google all offer prompt caching options.

Examples

  • Cache a long system prompt for all requests
  • Cache document content for Q&A sessions
  • Cache few-shot examples
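The first example, caching a shared system prompt, can be sketched as a request payload following the shape of Anthropic's Messages API. The model name and token limit here are illustrative assumptions, not recommendations:

```python
# Hedged sketch: marking a long system prompt as a cacheable prefix,
# following the shape of Anthropic's Messages API.
LONG_SYSTEM_PROMPT = "You are a support agent for Acme Corp. " * 200  # shared prefix

payload = {
    "model": "claude-sonnet-4-20250514",  # illustrative model name
    "max_tokens": 1024,
    "system": [
        {
            "type": "text",
            "text": LONG_SYSTEM_PROMPT,
            # Marks everything up to and including this block as cacheable,
            # so later requests with the same system prompt hit the cache.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    "messages": [{"role": "user", "content": "How do I reset my password?"}],
}
```

Sending this same payload shape with different user messages lets each request reuse the cached system prompt instead of reprocessing it.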
