The OpenAI API is refusing your requests because you’re sending them too fast.

Here’s what’s actually breaking: your client application is sending requests to the OpenAI API endpoints at a rate exceeding the limits defined for your API key. The API’s gateway, designed to protect its resources and ensure fair usage, is actively rejecting these excess requests with a 429 Too Many Requests status code. This isn’t a bug in your code’s logic; it’s a direct consequence of overwhelming the service.

Common Causes and Fixes

  1. Hitting the Per-Minute Limit:

    • Diagnosis: Monitor your request rate. If you see 429 errors consistently, especially after a burst of activity, you’re likely exceeding the requests-per-minute (RPM) or tokens-per-minute (TPM) limit. Limits vary by model and account tier and change over time; figures such as 60 RPM for gpt-4 and 200 RPM for gpt-3.5-turbo are only rough examples, so check the limits page in your OpenAI account for your actual values.
    • Fix: Implement exponential backoff with jitter. On a 429 error, wait a randomized delay (for example, between 1 and 10 seconds) before retrying. If the retry also fails, double the upper bound of the delay, up to a cap such as 60 seconds, and randomize again.
    • Why it works: This stops your client from retrying immediately and hitting the limit again, so the rate limit window has time to reset before the next attempt. Jitter spreads retries out over time, so multiple clients don’t all retry at the same instant and cause a thundering herd problem.
    • Example (Python, legacy openai<1.0 SDK with tenacity):
      import logging

      import openai  # Legacy interface: openai.Completion and openai.error need openai<1.0
      from tenacity import (
          before_sleep_log,
          retry,
          retry_if_exception_type,
          stop_after_attempt,
          wait_random_exponential,
      )

      logging.basicConfig(level=logging.INFO)
      logger = logging.getLogger(__name__)

      @retry(
          retry=retry_if_exception_type(openai.error.RateLimitError),  # Retry only on rate limit errors
          wait=wait_random_exponential(min=1, max=60),  # Exponential backoff with jitter
          stop=stop_after_attempt(6),  # Give up after 6 attempts
          before_sleep=before_sleep_log(logger, logging.INFO),  # Log each retry before sleeping
          reraise=True,  # Surface the original error instead of tenacity's RetryError
      )
      def call_openai_with_retry(prompt):
          # A RateLimitError propagates to tenacity, which waits and retries.
          # Any other exception is raised immediately without a retry.
          return openai.Completion.create(
              model="text-davinci-003",  # Example model
              prompt=prompt,
              max_tokens=150,
          )

      # Example usage:
      # result = call_openai_with_retry("Tell me a story.")
      
  2. Exceeding Per-Day or Per-Month Limits:

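    • Diagnosis: If 429 errors occur after sustained usage over a longer period, you might be hitting aggregate limits. Check your OpenAI dashboard for usage tiers and limits. A 429 whose message says you exceeded your current quota points to a usage or billing cap rather than a per-minute limit, and retrying will not clear it.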
    • Fix: For immediate needs, request a limit increase from OpenAI support. For ongoing solutions, implement client-side rate limiting to prevent exceeding these limits proactively. This involves tracking your usage over the relevant period (day/month) and pausing requests when approaching the threshold.
    • Why it works: Proactive client-side throttling ensures you never send requests that will be rejected by the API, keeping you within the broader usage quotas.
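
The usage tracking described above could be sketched as a rolling-window counter. This is a minimal illustration, not an official client feature: `DailyBudget`, its parameters, and the 24-hour window are hypothetical, and the real threshold should come from your account's limits.

```python
import time
from collections import deque


class DailyBudget:
    """Tracks requests in a rolling window (hypothetical helper, not an OpenAI API)."""

    def __init__(self, max_requests, window_seconds=24 * 60 * 60, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the logic can be tested without waiting
        self.timestamps = deque()

    def try_acquire(self):
        """Record a request and return True if budget remains, else return False."""
        now = self.clock()
        # Drop requests that have aged out of the window.
        while self.timestamps and now - self.timestamps[0] >= self.window_seconds:
            self.timestamps.popleft()
        if len(self.timestamps) >= self.max_requests:
            return False
        self.timestamps.append(now)
        return True
```

Call `try_acquire()` before each API request and pause or queue the work when it returns False. The sketch keeps one timestamp per request in memory and is not shared across processes; a multi-worker deployment would need a shared store instead.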
  3. Concurrent Requests Overloading:

    • Diagnosis: If your application makes many API calls simultaneously (e.g., across multiple threads or workers), the combined burst can exceed your RPM or TPM limit even when each worker on its own is well within bounds.
    • Fix: Limit the number of concurrent requests your application makes. For example, use a semaphore or a limited thread pool to ensure no more than, say, 5-10 requests are active at any given moment.
    • Why it works: This caps the peak request rate your application sends to the API, preventing sudden spikes that trigger rate limiting.
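
One way to cap in-flight requests is a bounded semaphore around each call. The sketch below assumes a thread-based application; `MAX_IN_FLIGHT`, `limited_call`, and `run_all` are illustrative names, and `fn` stands in for whatever function actually calls the API.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

MAX_IN_FLIGHT = 5  # assumed cap; tune it to your own limits

_slots = threading.BoundedSemaphore(MAX_IN_FLIGHT)


def limited_call(fn, *args, **kwargs):
    """Run fn while holding one of the MAX_IN_FLIGHT slots."""
    with _slots:
        return fn(*args, **kwargs)


def run_all(fn, inputs, workers=20):
    """Process every input, with at most MAX_IN_FLIGHT calls to fn active at once."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order in its results.
        return list(pool.map(lambda item: limited_call(fn, item), inputs))
```

Setting `max_workers` to the cap directly would have the same effect for a single pool; the semaphore is useful when several pools or threads share one limit.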
  4. Incorrect Retry Logic:

    • Diagnosis: You might be retrying too aggressively, with no delay or insufficient backoff, essentially retrying the request before the rate limit window has reset.
    • Fix: Ensure your retry mechanism includes a wait strategy, such as the wait_random_exponential strategy from the tenacity example under cause 1. A simple fixed delay (e.g., 5 seconds) might work for low traffic, but exponential backoff is more robust when errors are frequent.
    • Why it works: A proper wait period allows the API to reset its counters for your key, making the subsequent retry more likely to succeed.
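
If you prefer not to depend on a retry library, the wait strategy can be written by hand. The sketch below uses the "full jitter" variant (a uniform delay between 0 and an exponentially growing ceiling), which differs slightly from the 1-to-10-second starting range mentioned earlier. `backoff_delay`, `retry_with_backoff`, and the `is_rate_limited` predicate are hypothetical names for illustration.

```python
import random
import time


def backoff_delay(attempt, base=1.0, cap=60.0):
    """Delay for a 0-based attempt: uniform in [0, min(cap, base * 2**attempt)]."""
    ceiling = min(cap, base * (2 ** attempt))
    return random.uniform(0, ceiling)


def retry_with_backoff(fn, is_rate_limited, max_attempts=6, sleep=time.sleep):
    """Call fn(); on a rate-limit exception wait and retry, otherwise re-raise."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            last_attempt = attempt == max_attempts - 1
            # Give up on the final attempt or on any error that is not a rate limit.
            if last_attempt or not is_rate_limited(exc):
                raise
            sleep(backoff_delay(attempt))
```

Pass a predicate that recognizes your SDK's rate-limit exception, for example `lambda e: isinstance(e, openai.error.RateLimitError)` with the legacy SDK used above.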
  5. Shared API Key Issues:

    • Diagnosis: If multiple applications or users share a single API key, their combined usage can easily exceed limits, even if each individual component is within its own reasonable bounds.
    • Fix: Assign unique API keys to different applications or services where possible. If not, implement strict internal rate limiting within your application that accounts for the total shared quota.
    • Why it works: Isolating usage by key allows for more granular control and easier debugging of who or what is contributing to rate limit hits.
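
When a key must be shared, one simple approach is to divide the total budget among the services up front and have each one throttle to its own share. The function below is a hypothetical sketch; the service names and weights are made up, and it assumes integer weights.

```python
def split_quota(total_rpm, weights):
    """Divide a shared requests-per-minute budget among services by integer weight.

    Floor division guarantees the shares never add up to more than total_rpm.
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return {name: (total_rpm * w) // total_weight for name, w in weights.items()}


# Illustrative usage with made-up services and weights.
shares = split_quota(60, {"chat": 3, "batch": 1, "search": 2})
```

Each service then enforces its share locally, so no single component can consume the whole shared limit.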
  6. Model-Specific Limits:

    • Diagnosis: Different models have different rate limits. You might be fine with gpt-3.5-turbo but hitting limits with gpt-4. Check the OpenAI documentation for current limits per model.
    • Fix: If you’re frequently hitting limits with a specific model, consider switching to a less rate-limited model for less critical tasks, or implement more aggressive client-side throttling when using the more restricted models.
    • Why it works: By acknowledging and respecting model-specific constraints, you can manage your usage more effectively across your different API interactions.
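
Per-model throttling can be sketched as a pacer that spaces requests to each model by at least 60 / RPM seconds. `ModelPacer` and the `MODEL_RPM` values are assumptions for illustration only; substitute the limits shown for your account. The sketch is single-threaded and would need a lock for concurrent use.

```python
import time

# Hypothetical per-model budgets; replace with your account's real limits.
MODEL_RPM = {"gpt-4": 60, "gpt-3.5-turbo": 200}


class ModelPacer:
    """Spaces requests to each model at least 60 / rpm seconds apart."""

    def __init__(self, rpm_by_model, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = {m: 60.0 / rpm for m, rpm in rpm_by_model.items()}
        self.next_allowed = {m: 0.0 for m in rpm_by_model}
        self.clock = clock
        self.sleep = sleep

    def wait_turn(self, model):
        """Block until a request to `model` is allowed; return the seconds waited."""
        now = self.clock()
        wait = max(0.0, self.next_allowed[model] - now)
        if wait:
            self.sleep(wait)
        # The next request to this model may start one interval after this one.
        self.next_allowed[model] = now + wait + self.min_interval[model]
        return wait
```

Call `pacer.wait_turn(model)` immediately before each request. The more restricted model is automatically spaced further apart, while requests to other models are unaffected.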

Once rate limiting is under control, the next errors you’re likely to see are a 503 Service Unavailable when OpenAI’s own infrastructure is overloaded, or a 400 Bad Request when the request payload itself is malformed. Retrying will not fix a 400; correct the request instead.
