The OpenRouter API has built-in Usage Accounting that lets you track AI model usage without making additional API calls. It reports token counts, costs, and caching status directly in your API responses.
OpenRouter automatically returns detailed usage information with every response, including token counts, cost, and caching status.
This information is included in the last SSE message for streaming responses, or in the complete response for non-streaming requests. No additional parameters are required.
The usage: { include: true } and stream_options: { include_usage: true } parameters are deprecated and have no effect. Full usage details are now always included automatically in every response.
Every response includes a usage object with detailed token information:
cached_tokens is the number of tokens that were read from the cache. cache_write_tokens is the number of tokens that were written to the cache (only returned for models with explicit caching and cache write pricing).
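As a rough illustration of reading these fields, the sketch below pulls the token counts out of a parsed response body. It assumes the OpenAI-style field names `prompt_tokens`, `completion_tokens`, and `total_tokens`, and assumes that `cached_tokens` and `cache_write_tokens` are nested under `prompt_tokens_details`. The helper name `summarize_tokens` is hypothetical; check an actual response for the exact layout.

```python
from typing import Any


def summarize_tokens(response: dict[str, Any]) -> dict[str, int]:
    """Collect token counts from a parsed chat completion response.

    Field names other than cached_tokens and cache_write_tokens are
    assumptions based on the OpenAI-compatible response shape.
    """
    usage = response.get("usage") or {}
    # Assumed location of the cache counters; missing fields default to 0.
    details = usage.get("prompt_tokens_details") or {}
    return {
        "prompt": usage.get("prompt_tokens", 0),
        "completion": usage.get("completion_tokens", 0),
        "total": usage.get("total_tokens", 0),
        "cache_read": details.get("cached_tokens", 0),
        # Only present for models with explicit caching and cache write pricing.
        "cache_write": details.get("cache_write_tokens", 0),
    }
```

Defaulting absent counters to 0 keeps the caller simple, at the price of not distinguishing "zero tokens" from "field not reported".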
The usage response includes detailed cost information:
cost: The total amount charged to your account
cost_details.upstream_inference_cost: The actual cost charged by the upstream AI provider
Note: The upstream_inference_cost field only applies to BYOK (Bring Your Own Key) requests.
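A minimal sketch of reading the cost fields, assuming `usage` is the parsed usage object from a response. The helper name `read_costs` is hypothetical, and `upstream_inference_cost` is treated as optional because it only applies to BYOK requests.

```python
from typing import Any, Optional


def read_costs(usage: dict[str, Any]) -> tuple[Optional[float], Optional[float]]:
    """Return (cost, upstream_inference_cost) from a usage object.

    Either value is None when the field is absent; the upstream cost is
    expected only on BYOK requests.
    """
    cost = usage.get("cost")
    cost_details = usage.get("cost_details") or {}
    upstream = cost_details.get("upstream_inference_cost")
    return cost, upstream
```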
You can also retrieve usage information asynchronously by using the generation ID returned from your API calls. This is particularly useful when you want to fetch usage statistics after the completion has finished or when you need to audit historical usage.
To use this method:
Make your request as usual and note the id field in the response.
Pass that ID to the /generation endpoint to fetch the usage information.
For more details on this approach, see the Get a Generation documentation.
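The lookup described above might be sketched as follows, using only the Python standard library. It assumes the endpoint lives at `https://openrouter.ai/api/v1/generation`, takes the generation ID as an `id` query parameter, and uses Bearer-token authentication. The function names are hypothetical, and because the response shape is not shown here, the parsed JSON is returned unchanged.

```python
import json
import urllib.parse
import urllib.request

# Assumed base URL for the /generation endpoint.
GENERATION_URL = "https://openrouter.ai/api/v1/generation"


def build_generation_request(generation_id: str, api_key: str) -> urllib.request.Request:
    """Build a GET request that looks up one generation by its ID."""
    query = urllib.parse.urlencode({"id": generation_id})
    request = urllib.request.Request(f"{GENERATION_URL}?{query}", method="GET")
    request.add_header("Authorization", f"Bearer {api_key}")
    return request


def fetch_generation(generation_id: str, api_key: str) -> dict:
    """Send the request and return the parsed JSON body unchanged."""
    request = build_generation_request(generation_id, api_key)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Splitting request construction from the network call keeps the URL and header logic easy to check without contacting the API.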
This example shows how to handle usage information in streaming mode:
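A minimal sketch of that handling, assuming each SSE event arrives as a `data: {json}` line, that the stream ends with `data: [DONE]`, and that text deltas sit under `choices[].delta.content` as in OpenAI-style streams. The function name `consume_stream` is hypothetical. Usage is taken from the last chunk that carries a `usage` object.

```python
import json
from typing import Any, Iterable, Optional


def consume_stream(lines: Iterable[str]) -> tuple[str, Optional[dict[str, Any]]]:
    """Accumulate streamed text and return it with the final usage object."""
    text_parts: list[str] = []
    usage: Optional[dict[str, Any]] = None
    for raw in lines:
        line = raw.strip()
        # Skip blank keep-alives and SSE comment lines (those starting with ":").
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # assumed end-of-stream sentinel
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices") or []:
            delta = choice.get("delta") or {}
            text_parts.append(delta.get("content") or "")
        # Usage arrives in the last SSE message; keep the latest one seen.
        if chunk.get("usage"):
            usage = chunk["usage"]
    return "".join(text_parts), usage
```

With a real HTTP client, `lines` would be the decoded line iterator of the streaming response body.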