AI providers limit how fast you can use their APIs, measured in tokens per minute (TPM) and requests per minute (RPM). Understanding these limits explains why the AI chat occasionally returns an error, and it gives you the tools to avoid hitting the ceiling in Maverick.

1. Rate Limits Are Set by Your AI Provider, Not Maverick

Maverick forwards your AI prompt to the provider's API and returns the response. The rate limit is enforced at the provider's end: when you exceed it, the provider rejects the request before Maverick can do anything with it. Changing settings in Maverick will not raise your limit. That happens on the provider's side, either by upgrading to a higher usage tier in your provider account or by switching to a provider with more generous quotas.

2. Both Your Prompt and the AI's Response Count Toward Your Per-Minute Quota

Every token that flows in either direction reduces your tokens-per-minute (TPM) budget for that minute. Your message counts as input tokens; any conversation history Maverick sends as context adds more input tokens; and the AI's reply adds output tokens on top of that. A single detailed prompt with a long project context and a verbose response can consume several thousand tokens, which adds up quickly if multiple team members are active at the same time.
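To make the accounting concrete, here is a rough sketch using OpenAI's tiktoken tokenizer. The counts are approximations, since each provider tokenizes slightly differently, and the prompt, history, and reply strings are invented for illustration.

```python
# Rough estimate of how much TPM budget a single chat turn consumes.
# tiktoken approximates OpenAI-style tokenization; other providers
# tokenize differently, so treat these counts as ballpark figures.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def estimate_tokens(text: str) -> int:
    return len(enc.encode(text))

# Hypothetical chat turn: your message, history sent as context, the reply.
prompt = "Summarize the open tickets for the onboarding project."
history = ["Which projects are active?", "Three projects are currently active."]
reply = "There are twelve open tickets. The oldest is a login bug from May."

input_tokens = estimate_tokens(prompt) + sum(estimate_tokens(m) for m in history)
output_tokens = estimate_tokens(reply)

# Input and output draw from the same per-minute budget.
print(f"TPM consumed this turn: {input_tokens + output_tokens}")
```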

3. A 429 Error in the AI Chat Means Your Quota Window Is Full

When you exceed your rate limit, the provider returns HTTP 429 "Too Many Requests." Maverick displays this as an error message in the AI chat panel. The request is not partially completed — it is rejected entirely. The fix is straightforward: wait for the 60-second quota window to reset, then send the prompt again. You do not need to reconfigure anything. The limit resets automatically each minute.
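If you call the provider's API directly from your own scripts, the same pattern applies: detect the 429, wait out the window, resend. Below is a generic sketch, not Maverick's internal logic; the endpoint URL and payload are placeholders, and the Retry-After header, while commonly sent alongside a 429, is not guaranteed.

```python
# Generic 429 handling: wait out the quota window, then resend.
# The URL and payload are placeholders, not Maverick internals.
import time
import requests

def send_prompt(url: str, payload: dict, api_key: str, max_retries: int = 3) -> dict:
    for attempt in range(max_retries + 1):
        resp = requests.post(
            url,
            json=payload,
            headers={"Authorization": f"Bearer {api_key}"},
        )
        if resp.status_code != 429:
            resp.raise_for_status()  # surface non-rate-limit errors
            return resp.json()
        if attempt == max_retries:
            break
        # Prefer the provider's Retry-After header when present;
        # otherwise assume the 60-second window described above.
        wait = int(resp.headers.get("Retry-After", 60))
        time.sleep(wait)
    raise RuntimeError("Still rate-limited after retries")
```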

4. Assigning Separate API Keys to Each Employee Gives Everyone Their Own Quota

A shared API key means the entire team competes for one rate limit bucket. When several people send prompts in the same minute, they exhaust the shared quota faster, and some requests will fail. Assigning each employee their own API key in Maverick gives each person an independent per-minute budget. One team member's heavy usage does not affect anyone else's quota. This is the most effective structural fix for teams that hit rate limit errors regularly.
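Conceptually, per-employee keys amount to the lookup below: each request is authenticated with the sender's own key, so each sender draws on a separate provider-side bucket. This is a hypothetical sketch for illustration, not how Maverick actually stores or assigns keys.

```python
# Hypothetical per-employee key routing: each user's requests carry
# their own key, so each user has an independent per-minute quota.
API_KEYS = {
    "alice@example.com": "sk-alice-key",  # placeholder values
    "bob@example.com": "sk-bob-key",
}

def key_for(user_email: str) -> str:
    """Return the API key for this user. A shared fallback key would
    reintroduce the single-bucket problem, so fail loudly instead."""
    try:
        return API_KEYS[user_email]
    except KeyError:
        raise LookupError(f"No API key assigned to {user_email}")
```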

5. Groq and Local Ollama Models Offer the Highest Available Limits

If you regularly exceed rate limits on your current provider's free tier, two alternatives offer significantly more headroom. Groq provides high tokens-per-minute limits on its free tier while hosting popular open models including Llama and Mistral at very low cost — it is a practical upgrade for teams that need more throughput without a large budget. Ollama runs models locally on your own hardware with no API limits at all: no TPM ceiling, no RPM ceiling, no 429 errors. The tradeoff is that inference speed depends entirely on the hardware running Ollama, rather than a cloud provider's optimized infrastructure.
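Both options speak the OpenAI-compatible chat API, so switching is mostly a matter of changing the base URL. The sketch below assumes the openai Python package; the model name is an example, so check each provider's current catalog. Note that Ollama ignores the API key value, but the client library requires one.

```python
# Groq and a local Ollama server both expose OpenAI-compatible endpoints,
# so the same client code can target either one.
from openai import OpenAI

# Groq: cloud-hosted open models with generous free-tier TPM.
groq = OpenAI(
    base_url="https://api.groq.com/openai/v1",
    api_key="YOUR_GROQ_KEY",  # from console.groq.com
)

# Ollama: local inference, no provider-side rate limits or 429s.
ollama = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

resp = ollama.chat.completions.create(
    model="llama3",  # example: any model you have pulled locally
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)
```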