Latency
What is Latency?
Latency is the time between sending a request and receiving the complete response. For LLMs it breaks down into time-to-first-token (TTFT) and generation time.
Latency Explained
LLM latency has two components: time-to-first-token (TTFT) and token generation rate. TTFT grows with input length, since the model must process the entire prompt before emitting anything, while generation rate is roughly constant for a given model. Larger models are typically slower on both counts. For interactive applications, stream responses so users see output as soon as the first token arrives; this improves perceived latency even when total latency is unchanged. High latency also raises costs indirectly, through retries triggered by timeouts and a degraded user experience.
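As a sketch of how to measure both components, the snippet below times TTFT and token throughput around a streaming call. It assumes the OpenAI Python SDK; the model name and prompt are illustrative, and any OpenAI-compatible streaming endpoint would behave the same way.

```python
import time
from openai import OpenAI  # assumes the OpenAI Python SDK is installed

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def measure_latency(model: str, prompt: str) -> None:
    """Time TTFT and generation rate for one streaming completion."""
    start = time.perf_counter()
    first_token_at = None
    chunk_count = 0

    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta.content
        if delta:
            if first_token_at is None:
                first_token_at = time.perf_counter()  # first visible output
            chunk_count += 1  # chunks approximate tokens; fine for a rough rate

    end = time.perf_counter()
    ttft = first_token_at - start
    gen_time = end - first_token_at
    print(f"TTFT: {ttft * 1000:.0f} ms")
    print(f"Generation: {chunk_count / gen_time:.0f} chunks/sec over {gen_time:.1f} s")


measure_latency("gpt-4o-mini", "Explain latency in one paragraph.")  # model is illustrative
```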
Examples
- **GPT-5-mini**: ~200ms TTFT, 100+ tokens/sec
- **Claude Opus 4.5**: ~500ms TTFT, 30 tokens/sec
- **Gemini 3.0 Flash**: ~150ms TTFT, 150+ tokens/sec
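These figures make back-of-envelope estimates easy: total latency is roughly TTFT plus output tokens divided by tokens per second. A minimal sketch using the numbers above (the 300-token response length is illustrative):

```python
def estimate_latency(ttft_s: float, tokens: int, tok_per_s: float) -> float:
    """Rough total latency: time-to-first-token plus generation time."""
    return ttft_s + tokens / tok_per_s


# For a 300-token response, using the example figures above:
print(estimate_latency(0.5, 300, 30))    # Claude Opus 4.5  -> ~10.5 s
print(estimate_latency(0.15, 300, 150))  # Gemini 3.0 Flash -> ~2.15 s
```

This is why streaming matters: the user starts reading after the TTFT, not after the full generation time.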