Streaming
What is Streaming?
Streaming: The delivery of model responses token-by-token as they are generated, improving perceived latency for end users.
Streaming Explained
Streaming responses let your application display output as it's generated rather than waiting for the complete response, which noticeably improves the experience for long outputs. Most LLM APIs support streaming via Server-Sent Events (SSE): the server pushes incremental chunks over a single HTTP connection, and the client renders each chunk as it arrives. With streaming, users see the first tokens quickly even if the full response takes several seconds to generate.
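As an illustration, here is a minimal sketch of consuming a streamed response with the OpenAI Python SDK, one common example of this pattern; other providers expose similar streaming interfaces. The model name and prompt are placeholders, and the sketch assumes an API key is available in the environment.

```python
# Minimal sketch: consuming a streamed chat completion with the OpenAI
# Python SDK. Other LLM providers offer analogous streaming options.
# Assumes the OPENAI_API_KEY environment variable is set.
from openai import OpenAI

client = OpenAI()

# stream=True makes the API return an iterator of incremental chunks
# (delivered over SSE) instead of a single complete response.
stream = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": "Explain streaming in one paragraph."}],
    stream=True,
)

# Print each fragment as soon as it arrives, so the user sees output
# immediately rather than after the full response has been generated.
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:  # some chunks (e.g., the final one) carry no text
        print(delta, end="", flush=True)
print()
```

The key design point is the loop body: each chunk is rendered the moment it arrives, so time-to-first-token, not total generation time, determines how responsive the application feels.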