Rate limiting in AWS API Gateway covers two distinct but related mechanisms: throttling and quotas.

Here’s how you can set them up:

Throttling

Throttling limits the number of requests your API can handle per second. It’s designed to protect your backend services from being overwhelmed by traffic spikes.

API Gateway Console:

  1. Navigate to your API in the API Gateway console.
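  2. In the left-hand navigation pane, select Stages and choose the stage you want to configure (throttling is set per stage, and the exact labels vary between console versions).
  3. In the stage’s throttling settings, you’ll see two fields: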
    • Rate: The maximum number of requests per second your API can accept.
    • Burst: The maximum number of requests API Gateway will admit at once. It is the capacity of the token bucket that Rate refills, so it lets short spikes above Rate through without immediate throttling.

Example Setup:

Let’s say you want to allow 100 requests per second with a burst capacity of 200 requests:

  • Rate: 100
  • Burst: 200
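
The same settings can be applied outside the console. The following is a minimal sketch, assuming a REST API managed through boto3's `update_stage` call; the helper names and the way IDs are passed in are illustrative, not part of the original walkthrough. The first function only builds the patch operations, so it can be inspected without AWS credentials.

```python
def stage_throttling_patch(rate_limit, burst_limit):
    """Build patch operations that set default throttling for every method in a stage."""
    if rate_limit < 0 or burst_limit < 0:
        raise ValueError("limits must be non-negative")
    # "/*/*" targets all resources and all HTTP methods in the stage.
    # Patch operation values are passed as strings.
    return [
        {"op": "replace", "path": "/*/*/throttling/rateLimit", "value": str(rate_limit)},
        {"op": "replace", "path": "/*/*/throttling/burstLimit", "value": str(burst_limit)},
    ]


def apply_stage_throttling(rest_api_id, stage_name, rate_limit, burst_limit):
    """Send the patch to API Gateway (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above works without boto3 installed

    client = boto3.client("apigateway")
    return client.update_stage(
        restApiId=rest_api_id,
        stageName=stage_name,
        patchOperations=stage_throttling_patch(rate_limit, burst_limit),
    )


# The example from the text: Rate 100, Burst 200.
ops = stage_throttling_patch(100, 200)
```

Calling `apply_stage_throttling` with your own API ID and a stage name such as `prod` would then push these two values to the stage.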

How it Works:

API Gateway monitors incoming requests. If the rate of requests exceeds the configured Rate, API Gateway starts returning 429 Too Many Requests errors. The Burst setting allows for a temporary surge in requests above the Rate, as long as the average rate over time stays within the limit. This is useful for handling sudden, short-lived spikes in traffic.
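
To make the interaction between Rate and Burst concrete, here is a small simulation of an idealized token bucket, the algorithm AWS documents for API Gateway throttling. It is a sketch for intuition only: the class name and the explicit `now` parameter are assumptions made for this example, and the real service is distributed and enforces limits on a best-effort basis.

```python
class TokenBucket:
    """Idealized token bucket: `burst` is the capacity, `rate` is tokens added per second."""

    def __init__(self, rate, burst):
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)  # start full, as after an idle period
        self.last = 0.0             # time of the previous request, in seconds

    def allow(self, now):
        """Return True to admit a request arriving at time `now`, False for a 429."""
        elapsed = now - self.last
        self.last = now
        # Refill for the elapsed time, never beyond the bucket capacity.
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


bucket = TokenBucket(rate=100, burst=200)
# 250 requests arrive at the same instant: only the burst capacity is admitted.
admitted_at_once = sum(bucket.allow(now=0.0) for _ in range(250))
# One second later, 100 tokens have been refilled, so 100 more get through.
admitted_after_1s = sum(bucket.allow(now=1.0) for _ in range(250))
```

In this model the first spike admits 200 requests and rejects 50, and a second spike one second later admits only the 100 tokens that were refilled.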

Quotas

Quotas limit the number of requests a specific client can make over a longer period, such as a day or a month. This is often used for tiered pricing models or to prevent abuse by individual API consumers.

API Gateway Console:

  1. Navigate to your API in the API Gateway console.
  2. In the left-hand navigation pane, select Usage Plans.
  3. Click Create to create a new Usage Plan.
  4. Name: Give your Usage Plan a descriptive name (e.g., BasicTierPlan).
  5. Description: (Optional) Provide a brief description.
  6. Throttling: You can also set a Rate and Burst on the Usage Plan itself. These limits apply to each API key in the plan, in addition to the stage-level throttling.
  7. Quotas:
    • Limit: The maximum number of requests allowed per period.
    • Period: The time frame over which the limit applies. It must be DAY, WEEK, or MONTH; there is no multiplier, so a quota always covers exactly one day, week, or month.

Example Setup:

For a "Basic Tier" that allows 10,000 requests per month:

  • Limit: 10000
  • Period: MONTH
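
For reference, a usage plan like this can also be created programmatically. The sketch below builds the keyword arguments for boto3's `create_usage_plan`; the helper name and its validation are assumptions made for illustration, and only the quota and throttle fields that the API defines are used.

```python
VALID_PERIODS = {"DAY", "WEEK", "MONTH"}


def usage_plan_request(name, quota_limit, period, rate_limit=None, burst_limit=None):
    """Build keyword arguments for create_usage_plan with a quota and optional throttle."""
    if period not in VALID_PERIODS:
        raise ValueError(f"period must be one of {sorted(VALID_PERIODS)}")
    request = {"name": name, "quota": {"limit": quota_limit, "period": period}}
    # Per-key throttling is optional; include it only when both values are given.
    if rate_limit is not None and burst_limit is not None:
        request["throttle"] = {"rateLimit": float(rate_limit), "burstLimit": int(burst_limit)}
    return request


# The "Basic Tier" from the text: 10,000 requests per month.
basic_tier = usage_plan_request("BasicTierPlan", 10000, "MONTH")

# With credentials configured, the request could be sent like this:
#   import boto3
#   boto3.client("apigateway").create_usage_plan(**basic_tier)
```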

Associating API Stages and API Keys:

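After creating a Usage Plan, you need to associate it with your API stages and API keys. The methods you want metered must also have API Key Required enabled; otherwise requests are not tied to a key and the plan is not applied: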

  1. In the Usage Plan details, go to the Associated API stages section and click Add API stage. Select your API and the stage (e.g., MyAPI and prod).
  2. Go to the Associated API keys section and click Add API key. You can add existing API keys or create new ones.
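
The two association steps map onto two API calls. This is a hedged sketch of the payloads, assuming boto3's `update_usage_plan` and `create_usage_plan_key`; the helper names and all IDs shown are hypothetical placeholders.

```python
def associate_stage_patch(rest_api_id, stage_name):
    """Patch operation that attaches an API stage to an existing usage plan."""
    # The value has the form "<restApiId>:<stageName>".
    return [{"op": "add", "path": "/apiStages", "value": f"{rest_api_id}:{stage_name}"}]


def usage_plan_key_request(usage_plan_id, api_key_id):
    """Keyword arguments that attach an existing API key to a usage plan."""
    return {"usagePlanId": usage_plan_id, "keyId": api_key_id, "keyType": "API_KEY"}


# Hypothetical IDs, for illustration only.
stage_patch = associate_stage_patch("a1b2c3d4e5", "prod")
key_request = usage_plan_key_request("plan123", "key456")

# With credentials configured:
#   import boto3
#   client = boto3.client("apigateway")
#   client.update_usage_plan(usagePlanId="plan123", patchOperations=stage_patch)
#   client.create_usage_plan_key(**key_request)
```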

How it Works:

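When a client makes a request using an API key associated with a Usage Plan, API Gateway tracks the request count against the quota. Once the Limit is reached within the specified Period, subsequent requests from that API key are rejected with a 429 Too Many Requests response (with the message "Limit Exceeded") until the period resets. A 403 Forbidden, by contrast, indicates a missing or invalid API key.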
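
The bookkeeping described above can be pictured as a counter per API key and period. The following toy model is only an illustration of that idea, with a hypothetical class name and key; it is not how API Gateway is implemented, and the real service enforces quotas on a best-effort basis.

```python
from collections import defaultdict
from datetime import date


class MonthlyQuota:
    """Toy model of a per-key monthly quota."""

    def __init__(self, limit):
        self.limit = limit
        self.used = defaultdict(int)  # (api_key, year, month) -> requests counted

    def allow(self, api_key, today):
        """Return True while the key is under its quota for the current month."""
        window = (api_key, today.year, today.month)
        if self.used[window] >= self.limit:
            return False  # quota exhausted until the next period starts
        self.used[window] += 1
        return True


quota = MonthlyQuota(limit=3)
january = [quota.allow("key-basic", date(2024, 1, 15)) for _ in range(4)]
february = quota.allow("key-basic", date(2024, 2, 1))  # a new month starts a new count
```

Here the fourth January request is rejected, while the first request in February is accepted again because a new period has started.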

The Counter-Intuitive Truth About Burst

Many people assume Burst raises the throughput they can sustain. It does not. API Gateway throttles with a token bucket: Burst is the capacity of the bucket, and Rate is the speed at which it refills. When you set Rate to 100 and Burst to 200, API Gateway can admit up to 200 requests at once if the API has been idle long enough for the bucket to fill. After that, tokens come back at only 100 per second. If you continuously send 150 requests per second, the bucket drains by a net 50 tokens per second, so in an idealized model it empties after about 4 seconds (200 / 50), and from then on roughly 50 of every 150 requests get a 429. The burst capacity is for absorbing sudden arrivals of traffic, not for sustained higher throughput.
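
A quick way to reason about sustained overload is to compute how long a full bucket lasts. The helper below is a back-of-the-envelope sketch under an idealized token bucket model with a constant send rate; the function name is an assumption for this example, and real traffic and the real service will deviate from it.

```python
def seconds_until_throttled(rate, burst, send_rate):
    """Time for a full bucket to empty under a constant send rate.

    Returns None when send_rate does not exceed rate, since the bucket never empties.
    """
    if send_rate <= rate:
        return None
    # The bucket loses (send_rate - rate) tokens per second on net.
    return burst / (send_rate - rate)


# Rate 100, Burst 200, sustained 150 requests per second.
drain_time = seconds_until_throttled(rate=100, burst=200, send_rate=150)
```

For the numbers in the text this gives 4 seconds of headroom, and raising Burst only lengthens that window; it never changes the sustained limit.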

The next thing you’ll likely encounter is understanding how to deploy these changes and monitor their effectiveness using CloudWatch metrics.
