Latency

What is Latency?

Latency: The time between sending a request and receiving the complete response. It comprises time-to-first-token and generation time.

Latency Explained

LLM latency has two components: time-to-first-token (TTFT) and token generation rate. TTFT depends on how quickly the model processes the input prompt, while the generation rate is roughly constant for a given model. Larger models are typically slower on both. For interactive applications, streaming responses improves perceived latency, since users see output as soon as the first token arrives rather than after the full response is generated. Latency also affects cost through timeout handling, and it directly shapes user experience.
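
Because TTFT and generation rate behave differently, it helps to measure them separately rather than as a single end-to-end number. Below is a minimal sketch of one way to do that with a streaming client, assuming the OpenAI Python SDK and an OPENAI_API_KEY in the environment; the model name is illustrative (borrowed from the examples below), and chunk count is used as a rough proxy for token count.

```python
# Minimal sketch: measure TTFT and generation rate separately.
# Assumes the OpenAI Python SDK (pip install openai) and an
# OPENAI_API_KEY set in the environment.
import time

from openai import OpenAI

client = OpenAI()

start = time.perf_counter()
first_token_at = None
chunks = 0  # rough proxy for tokens generated

stream = client.chat.completions.create(
    model="gpt-5-mini",  # illustrative; substitute the model you are profiling
    messages=[{"role": "user", "content": "Explain latency in one sentence."}],
    stream=True,
)

for chunk in stream:
    # Some stream events (e.g., a final usage chunk) carry no choices.
    if chunk.choices and chunk.choices[0].delta.content:
        if first_token_at is None:
            first_token_at = time.perf_counter()  # time-to-first-token
        chunks += 1

end = time.perf_counter()
if first_token_at is not None:
    print(f"TTFT: {(first_token_at - start) * 1000:.0f} ms")
    print(f"Generation rate: {chunks / (end - first_token_at):.0f} chunks/sec")
```

Running this a few times against each candidate model yields the two numbers that matter independently, instead of one end-to-end figure that conflates them.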

Examples

  • GPT-5-mini: ~200ms TTFT, 100+ tokens/sec
  • Claude Opus 4.5: ~500ms TTFT, 30 tokens/sec
  • Gemini 3.0 Flash: ~150ms TTFT, 150+ tokens/sec
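
These two numbers combine into a rough estimate of total latency: total ≈ TTFT + output tokens ÷ tokens/sec. For an illustrative 300-token response at the Claude Opus 4.5 figures above, that is roughly 0.5 s + 300/30 s ≈ 10.5 s, while the same response at the Gemini 3.0 Flash figures is roughly 0.15 s + 300/150 s ≈ 2.15 s. The generation term dominates for long outputs, which is why tokens/sec matters most for batch workloads and TTFT matters most for short interactive turns.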
