Prompt Caching
What is Prompt Caching?
Prompt Caching: Prompt caching stores the processed representation of a prompt prefix so that repeated requests sharing an identical prefix cost less and return faster.
Prompt Caching Explained
Prompt caching (sometimes called KV-cache reuse) lets providers store and reuse the processed internal representation of a prompt prefix. When multiple requests share an identical prefix, such as a common system prompt, only the new content after the cached portion needs to be processed. This can reduce input costs by 50-90% for repeated requests, though exact discounts and cache lifetimes vary by provider. Anthropic, OpenAI, and Google all offer prompt caching.
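As a concrete illustration, here is a minimal sketch of caching a long system prompt with the Anthropic Python SDK, which exposes caching through a `cache_control` breakpoint on a content block. The model name and prompt text are placeholders, not recommendations.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Placeholder: in practice this would be thousands of tokens of instructions.
LONG_SYSTEM_PROMPT = "You are a support assistant for Acme Corp. ..."

response = client.messages.create(
    model="claude-sonnet-4-20250514",  # placeholder model name
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": LONG_SYSTEM_PROMPT,
            # Marks the prefix up to and including this block as cacheable.
            # Later requests that repeat the identical prefix read it from
            # the cache instead of reprocessing it.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "How do I reset my password?"}],
)
print(response.content[0].text)
```

Only the user message changes between requests, so every call after the first pays full price for just the new tokens.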
Examples
- Cache a long system prompt shared by all requests
- Cache document content for Q&A sessions (see the sketch after this list)
- Cache few-shot examples
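For the document Q&A case, the same pattern extends across turns. This hedged sketch assumes a hypothetical `manual.txt` and the same Anthropic API as above: the cache breakpoint sits on the document block, so each follow-up question reuses the cached document prefix and only the conversation tail is reprocessed.

```python
import anthropic

client = anthropic.Anthropic()
document = open("manual.txt").read()  # hypothetical long document

# First turn: mark the document block as cacheable.
history = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": document,
                "cache_control": {"type": "ephemeral"},
            },
            {"type": "text", "text": "What does section 3 cover?"},
        ],
    }
]
reply = client.messages.create(
    model="claude-sonnet-4-20250514",  # placeholder model name
    max_tokens=512,
    messages=history,
)

# Follow-up turn: the request repeats the identical document prefix,
# so the provider serves it from cache and processes only the new turns.
history += [
    {"role": "assistant", "content": reply.content[0].text},
    {"role": "user", "content": "Summarize section 5."},
]
reply2 = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=512,
    messages=history,
)
print(reply2.content[0].text)
```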