The Impulse Labs API enforces a sliding window rate limit per user per 60-second period.
Limits by plan
| Plan | Requests / 60 seconds |
|---|---|
| Pro | 120 |
| Team | 300 |
| Enterprise | 1 000 |
Limits apply to validated inference requests: each call to POST /infer counts as one request against your limit.
Rate limit headers
Every response from POST /infer includes these headers:
| Header | Type | Description |
|---|---|---|
| X-RateLimit-Limit | integer | Maximum requests allowed in the window |
| X-RateLimit-Remaining | integer | Requests remaining in the current window |
| X-RateLimit-Reset | integer | Unix timestamp (seconds) when the window resets |
| Retry-After | integer | Seconds to wait (only present on 429 responses) |
Reading headers
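The headers in the table above can be read from any response to POST /infer. The following is a minimal sketch rather than official client code: `RateLimitStatus` and `parse_rate_limit_headers` are hypothetical names, and the header values in the example are made up for illustration. It assumes only the header names and integer types listed in the table.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RateLimitStatus:
    limit: int                  # X-RateLimit-Limit
    remaining: int              # X-RateLimit-Remaining
    reset_at: int               # X-RateLimit-Reset, Unix timestamp in seconds
    retry_after: Optional[int]  # Retry-After, only present on 429 responses


def parse_rate_limit_headers(headers: Mapping[str, str]) -> RateLimitStatus:
    """Extract the documented rate limit headers from a response's headers."""
    # HTTP header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    retry_after = lowered.get("retry-after")
    return RateLimitStatus(
        limit=int(lowered["x-ratelimit-limit"]),
        remaining=int(lowered["x-ratelimit-remaining"]),
        reset_at=int(lowered["x-ratelimit-reset"]),
        retry_after=int(retry_after) if retry_after is not None else None,
    )


# Example with made-up header values (not real API output).
status = parse_rate_limit_headers({
    "X-RateLimit-Limit": "120",
    "X-RateLimit-Remaining": "87",
    "X-RateLimit-Reset": "1700000060",
})
print(f"{status.remaining}/{status.limit} requests left; window resets at {status.reset_at}")
```

Passing in a plain mapping keeps the parser independent of any particular HTTP client; most clients expose response headers as a dict-like object that can be handed to it directly.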
Handling 429 Too Many Requests
When you exceed the limit, the API returns a 429 Too Many Requests response that includes a Retry-After header.
Recommended retry strategy
- Read the Retry-After header and wait that many seconds before retrying.
- Do not retry immediately in a tight loop; this will keep triggering the limit.
- For batch workloads, spread requests over time or use exponential backoff.
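For batch workloads, the two options in the last bullet can be sketched as follows. This is an illustrative sketch, not part of the Impulse Labs API: `min_interval` and `backoff_delay` are hypothetical helpers, and the base delay, cap, and jitter scheme are assumptions you would tune for your own workload.

```python
import random

WINDOW_SECONDS = 60  # window length stated in this document


def min_interval(limit_per_window: int, window: float = WINDOW_SECONDS) -> float:
    """Seconds between requests so a batch is spread evenly across the window."""
    return window / limit_per_window


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0,
                  jitter: bool = True) -> float:
    """Delay before retry number `attempt` (0-based): base * 2**attempt, capped."""
    delay = min(cap, base * (2 ** attempt))
    # Full jitter: pick a random point in [0, delay] so that many clients
    # hitting the limit at once do not all retry at the same moment.
    return random.uniform(0, delay) if jitter else delay


print(min_interval(120))  # Pro plan: 0.5 seconds between requests
print([backoff_delay(n, jitter=False) for n in range(5)])  # [1.0, 2.0, 4.0, 8.0, 16.0]
```

Even pacing avoids hitting the limit in the first place, while backoff is a fallback for when a 429 arrives anyway. When a Retry-After value is available, it is the more precise signal and should take precedence over a computed delay.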
Python — respect Retry-After
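A minimal sketch of the strategy above, using only the standard library. The names `call_with_retry`, `send`, `max_retries`, and `default_wait` are hypothetical; `send` stands in for whatever function performs your POST /infer call and returns a `(status_code, headers, body)` tuple. The demo uses a fake sender, so the responses shown are not real API output.

```python
import time
from typing import Callable, Mapping, Tuple

# (status_code, headers, body) as returned by the caller-supplied `send` function.
Response = Tuple[int, Mapping[str, str], str]


def call_with_retry(send: Callable[[], Response],
                    max_retries: int = 3,
                    default_wait: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call `send`, waiting Retry-After seconds after each 429 before retrying.

    Returns the last response, which may still be a 429 if retries run out.
    """
    response = send()
    for _ in range(max_retries):
        status, headers, _body = response
        if status != 429:
            break
        # Retry-After is documented as an integer number of seconds on 429s.
        # A plain dict lookup is case-sensitive; fall back to default_wait
        # (an assumed value) if the header is missing.
        sleep(float(headers.get("Retry-After", default_wait)))
        response = send()
    return response


# Demo with a fake sender: the first call is rate limited, the second succeeds.
responses = iter([(429, {"Retry-After": "2"}, ""), (200, {}, "ok")])
waits = []
result = call_with_retry(lambda: next(responses), sleep=waits.append)
print(result[0], waits)  # 200 [2.0]
```

Injecting `sleep` keeps the function easy to test without real delays; in production code the default `time.sleep` applies. Capping the number of retries prevents the tight retry loop the list above warns against.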