The OpenRouter API allows streaming responses from any model. This is useful for building chat interfaces or other applications where the UI should update as the model generates the response.
To enable streaming, you can set the stream parameter to true in your request. The model will then stream the response to the client in chunks, rather than returning the entire response at once.
Here is an example of how to stream a response, and process it:
For SSE (Server-Sent Events) streams, OpenRouter occasionally sends comments to prevent connection timeouts. These comments look like:
Comment payload can be safely ignored per the SSE specs. However, you can leverage it to improve UX as needed, e.g. by showing a dynamic loading indicator.
The generation ID is returned in the X-Generation-Id response header for all endpoints (chat completions, completions, responses, and messages), which can be useful for debugging and correlating requests.
Some SSE client implementations might not parse the payload according to spec, which leads to an uncaught error when you JSON.stringify the non-JSON payloads. We recommend the following clients:
Streaming requests can be cancelled by aborting the connection. For supported providers, this immediately stops model processing and billing.
Supported
Not Currently Supported
To implement stream cancellation:
Cancellation only works for streaming requests with supported providers. For non-streaming requests or unsupported providers, the model will continue processing and you will be billed for the complete response.
OpenRouter handles errors differently depending on when they occur during the streaming process:
If an error occurs before any tokens have been streamed to the client, OpenRouter returns a standard JSON error response with the appropriate HTTP status code. This follows the standard error format:
Common HTTP status codes include:
If an error occurs after some tokens have already been streamed to the client, OpenRouter cannot change the HTTP status code (which is already 200 OK). Instead, the error is sent as a Server-Sent Event (SSE) with a unified structure:
Key characteristics of mid-stream errors:
choices array is included with finish_reason: "error" to properly terminate the streamHere’s how to properly handle both types of errors in your streaming implementation:
Different API endpoints may handle streaming errors slightly differently:
ErrorResponse directly if no chunks were processed, or includes error information in the response if some chunks were processedcontext_length_exceeded) into a successful response with finish_reason: "length" instead of treating them as errors