When an API endpoint is slow, it’s usually because it spends most of its time waiting on something else, and understanding what it’s waiting for is the key to making it fast.

Let’s watch a real request slow down and then speed it up. Imagine an API that fetches user data, including their recent orders.

// Response from GET /users/123
{
  "userId": 123,
  "userData": {
    "name": "Alice Smith",
    "email": "alice@example.com",
    "address": "123 Main St"
  },
  "orders": [
    {
      "orderId": "A1B2C3D4",
      "items": [
        {"productId": "P987", "quantity": 1},
        {"productId": "P654", "quantity": 2}
      ],
      "totalAmount": 150.75,
      "timestamp": "2023-10-27T10:00:00Z"
    },
    // ... more orders
  ]
}

This looks simple, but if GET /users/123 is taking 2 seconds, something’s amiss.

The Bottleneck Hunt

The most common reason for slow API endpoints isn’t your code’s computation, but external dependencies. Your API likely calls other services or databases.
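A low-tech way to find out what an endpoint is waiting on, before you set up a tracing system, is to time each phase of the handler. The sketch below uses only the standard library. `timed` and `handle_get_user` are hypothetical names, and the `time.sleep` calls stand in for a real database query and a real call to the order service.

```python
import json
import time
from contextlib import contextmanager


@contextmanager
def timed(timings, name):
    """Record the wall-clock duration of the enclosed block in timings[name] (seconds)."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = time.perf_counter() - start


def handle_get_user(user_id):
    """Hypothetical GET /users/<id> handler with stand-in dependency calls."""
    timings = {}
    with timed(timings, "db:user"):
        time.sleep(0.01)  # stand-in for the user query
    with timed(timings, "http:order-service"):
        time.sleep(0.05)  # stand-in for the call to order-service
    with timed(timings, "serialize"):
        body = json.dumps({"userId": user_id, "orders": []})
    return body, timings


body, timings = handle_get_user(123)
# Print phases slowest first; the top entry is what the request is waiting on
for name, seconds in sorted(timings.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<20} {seconds * 1000:8.1f} ms")
```

In a real service you would more likely log these per-request timings, or attach them to spans in your tracer, than print them.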

  1. Database Queries: This is the usual suspect. A poorly optimized query, missing index, or fetching too much data can kill performance.

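    • Diagnosis: Use your database’s slow query log and query statistics. For PostgreSQL, set log_min_duration_statement to log slow statements (check the current value with SHOW log_min_duration_statement;). If the pg_stat_statements extension is enabled, rank queries by average time with SELECT query, calls, mean_exec_time FROM pg_stat_statements ORDER BY mean_exec_time DESC LIMIT 10; (the column is named mean_time before PostgreSQL 13). For MySQL, check SHOW VARIABLES LIKE 'slow_query_log%'; and SHOW VARIABLES LIKE 'long_query_time';, then read the slow log file, or run SELECT * FROM mysql.slow_log; if log_output is set to TABLE. Look for queries issued by your API’s database user that take longer than your acceptable threshold (e.g., 500ms).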
    • Fix: Add an index. If your query is SELECT * FROM orders WHERE user_id = 123 ORDER BY timestamp DESC; and you don’t have an index on (user_id, timestamp), create one: CREATE INDEX idx_orders_user_ts ON orders (user_id, timestamp DESC);. The database can then jump straight to that user’s orders and read them already sorted by timestamp, instead of scanning the whole table and sorting the result.
    • Why it works: Indexes are like a book’s index, letting the database jump directly to the data it needs, bypassing full table scans.
  2. External Service Calls: Your API might be calling another microservice, a third-party API, or a caching layer. If that service is slow, your API will be slow.

    • Diagnosis: Implement distributed tracing. Tools like Jaeger or Zipkin can show you the request flow across multiple services. Look for spans where the time is spent waiting for a downstream service. If your API is written in Python with requests, you might see a long requests.get('http://order-service/orders?userId=123') call.
    • Fix: Optimize the downstream service. If the order-service is slow, you need to investigate its bottlenecks (database, its own dependencies, etc.). Alternatively, if your API can tolerate slightly stale data, implement a cache. For a Python Flask API, using Redis:
      import json
      
      import redis
      
      r = redis.Redis(host='redis-cache', port=6379, db=0)
      
      def get_user_orders(user_id):
          cache_key = f"user_orders:{user_id}"
          cached_orders = r.get(cache_key)  # bytes on a hit, None on a miss
          if cached_orders is not None:
              return json.loads(cached_orders)
      
          # Cache miss: fetch orders from order-service
          # (fetch_orders_from_service is a placeholder for your existing client call)
          orders = fetch_orders_from_service(user_id)
      
          r.set(cache_key, json.dumps(orders), ex=300)  # Cache for 5 minutes
          return orders
      
    • Why it works: Caching stores the result of a slow operation (like fetching orders) in a fast, in-memory store (Redis). Subsequent requests for the same data hit the cache, avoiding the slow external call.
  3. Network Latency: If your API and its dependencies are in different network zones, or if there are network congestion issues, round trips can add up.

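    • Diagnosis: Use ping or traceroute from your API server to the dependency’s server. Look for high RTT (round-trip time) or packet loss. In your tracing tool, compare the client-side span for a downstream call with the server-side span for the same call. A large gap between them is usually time spent on the network or in connection setup rather than inside the dependency.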
    • Fix: Co-locate your services. If your API and the order service talk frequently, run them in the same region and VPC, ideally in the same availability zone. For critical, chatty inter-service communication, consider a more efficient protocol such as gRPC over HTTP/2 instead of REST over HTTP/1.1.
    • Why it works: Reducing the physical distance and overhead of network communication directly lowers latency.
  4. Serialization/Deserialization: Converting data structures to/from formats like JSON can be surprisingly CPU-intensive for large payloads.

    • Diagnosis: Profile your application code. In Python, use cProfile: python -m cProfile -o profile.prof your_api_script.py. Analyze the output (pstats) to see where CPU time is spent. Look for functions like json.dumps or json.loads consuming significant time.
    • Fix: Use a faster serialization library. For Python, orjson is often significantly faster than the standard json library:
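      import orjson
      from flask import Response
      
      def orders_response(data):
          # Instead of json.dumps(data). orjson.dumps returns bytes, not str,
          # so wrap it in a Response with an explicit JSON content type.
          return Response(orjson.dumps(data), mimetype="application/json")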
      
    • Why it works: orjson is implemented in Rust and optimized specifically for JSON, so it typically spends much less CPU time encoding and decoding large payloads than the standard json module.
  5. Excessive Data Fetching: The endpoint might be fetching more data than it actually needs to return to the client.

    • Diagnosis: Examine the API response and compare it to the client’s actual usage. If the client only displays the order ID and total amount, but your API returns all order details, including item lists and timestamps, that’s a red flag. Look at your database queries and API calls to downstream services – are you selecting * when you only need a few columns?
    • Fix: Select only the necessary fields. In SQL, SELECT order_id, total_amount FROM orders WHERE user_id = 123; is better than SELECT * FROM orders WHERE user_id = 123;. If calling another service, check if it supports field selection or projections.
    • Why it works: Fetching less data means less work for the database, less data to transfer over the network, and less data to serialize, all of which contribute to faster response times.
  6. Too Many Small Operations: Sometimes, the issue isn’t one slow operation, but many small, fast operations that add up. For example, fetching user data, then fetching each order individually in a loop.

    • Diagnosis: This is often revealed by tracing. You’ll see many small, distinct calls to downstream services or database queries. In the user/orders example, instead of GET /users/123 and then GET /orders?userId=123, you might see GET /users/123 followed by GET /order/A1B2C3D4, GET /order/E5F6G7H8, etc.
    • Fix: Batch your operations. Modify the order-service to accept a list of order IDs or to return all orders for a user in a single call. Similarly, if your database allows it, fetch related data in a single query using joins or subqueries.
    • Why it works: Each network round trip or database query has overhead. Batching reduces this overhead by consolidating many small requests into fewer, larger ones.
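You can confirm the effect of the index from step 1 (database queries) with the database’s own query planner. This sketch uses SQLite, through Python’s built-in sqlite3 module, as a stand-in for PostgreSQL or MySQL. The orders table is hypothetical and shaped like the example response. In PostgreSQL the equivalent check is EXPLAIN (or EXPLAIN ANALYZE) on the same query. The exact wording of SQLite’s plan varies between versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id TEXT, user_id INTEGER, "
    "total_amount REAL, timestamp TEXT)"
)
# 1,000 synthetic orders spread across 100 users
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(f"O{i}", i % 100, 10.0 * i, f"2023-10-{i % 28 + 1:02d}T10:00:00Z")
     for i in range(1000)],
)

QUERY = "SELECT * FROM orders WHERE user_id = ? ORDER BY timestamp DESC"


def plan(sql, params):
    """Return the 'detail' column (index 3) of SQLite's EXPLAIN QUERY PLAN rows."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


before = plan(QUERY, (42,))
conn.execute("CREATE INDEX idx_orders_user_ts ON orders (user_id, timestamp DESC)")
after = plan(QUERY, (42,))

print("before:", before)  # typically a full-table SCAN plus a temp B-tree for ORDER BY
print("after: ", after)   # typically a SEARCH using idx_orders_user_ts (user_id=?)
```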
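The Redis snippet in step 2 (external service calls) needs a running Redis server. The cache-aside logic itself is simple: check the cache, load on a miss, and store the result with an expiry. You can sketch it in-process with a plain dict, which is handy for tests or for a single-instance service. Unlike Redis, this cache is not shared between processes or servers. `TTLCache` is a hypothetical helper, and `fetch_orders_from_service` here is a stand-in that only records how often it was called.

```python
import time


class TTLCache:
    """In-process cache-aside helper with per-entry expiry (not shared across processes)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock   # injectable so expiry can be tested without sleeping
        self._store = {}     # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        entry = self._store.get(key)
        now = self.clock()
        if entry is not None and entry[0] > now:
            return entry[1]                  # cache hit: skip the slow call
        value = loader()                     # miss or expired: do the slow call
        self._store[key] = (now + self.ttl, value)
        return value


calls = []


def fetch_orders_from_service(user_id):
    """Hypothetical stand-in for the slow order-service call."""
    calls.append(user_id)
    return [{"orderId": "A1B2C3D4", "totalAmount": 150.75}]


fake_now = [0.0]
cache = TTLCache(ttl_seconds=300, clock=lambda: fake_now[0])
cache.get_or_load("user_orders:123", lambda: fetch_orders_from_service(123))
cache.get_or_load("user_orders:123", lambda: fetch_orders_from_service(123))
print(len(calls))  # 1: the second lookup was served from the cache
fake_now[0] = 301.0
cache.get_or_load("user_orders:123", lambda: fetch_orders_from_service(123))
print(len(calls))  # 2: the entry expired, so the service was called again
```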
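The ping suggested in step 3 (network latency) measures ICMP round trips, which some networks block or deprioritize. A rough alternative is to time TCP handshakes to the dependency’s actual service port, because a successful connect() takes about one network round trip. This is a sketch; `tcp_connect_times` is a hypothetical helper.

```python
import socket
import time


def tcp_connect_times(host, port, count=5, timeout=2.0):
    """Time `count` TCP handshakes to host:port and return them in milliseconds.

    connect() returns once the server's SYN-ACK arrives, so each sample is
    roughly one network round trip plus local overhead. If `host` is a name,
    create_connection also resolves it each time, so DNS time is included.
    """
    samples = []
    for _ in range(count):
        start = time.perf_counter()
        with socket.create_connection((host, port), timeout=timeout):
            pass  # handshake done; the socket is closed on leaving the block
        samples.append((time.perf_counter() - start) * 1000)
    return samples
```

For example, run `statistics.median(tcp_connect_times("order-service", 80))` on the API host, substituting your real host and port, to get a rough RTT to the order service. Pass an IP address instead of a hostname to exclude DNS time.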
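The cProfile command in step 4 (serialization) profiles a whole script. To profile just one code path, you can drive cProfile and pstats from code around the suspect function. In this sketch `build_response` is a hypothetical handler body that builds and serializes a large payload shaped like the example response.

```python
import cProfile
import io
import json
import pstats


def build_response(n_orders):
    """Hypothetical handler body: build a large orders payload and serialize it."""
    orders = [
        {"orderId": f"O{i}",
         "items": [{"productId": "P987", "quantity": 1}] * 5,
         "totalAmount": 150.75,
         "timestamp": "2023-10-27T10:00:00Z"}
        for i in range(n_orders)
    ]
    return json.dumps({"userId": 123, "orders": orders})


profiler = cProfile.Profile()
profiler.enable()
body = build_response(20_000)
profiler.disable()

# Sort by cumulative time so json.dumps and the encoder calls beneath it stand out
out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(10)
print(out.getvalue())
```

If `dumps` (or `loads` on the request side) dominates the cumulative column, a faster library such as orjson is worth trying.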
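Step 5’s advice to select only the fields the client displays might look like this in a handler. The sketch uses SQLite and assumes a hypothetical orders table in which item details live in an `items_json` column. Only `order_id` and `total_amount` are read and returned, so item details are never fetched, transferred, or serialized.

```python
import sqlite3


def fetch_order_summaries(conn, user_id):
    """Return only what the client displays (order ID and total) instead of SELECT *."""
    rows = conn.execute(
        "SELECT order_id, total_amount FROM orders "
        "WHERE user_id = ? ORDER BY timestamp DESC",
        (user_id,),
    )
    return [{"orderId": order_id, "totalAmount": total} for order_id, total in rows]


# Hypothetical schema: item details live in items_json and are never read above
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id TEXT, user_id INTEGER, total_amount REAL, "
    "timestamp TEXT, items_json TEXT)"
)
conn.execute(
    "INSERT INTO orders VALUES (?, ?, ?, ?, ?)",
    ("A1B2C3D4", 123, 150.75, "2023-10-27T10:00:00Z",
     '[{"productId": "P987", "quantity": 1}, {"productId": "P654", "quantity": 2}]'),
)
print(fetch_order_summaries(conn, 123))
# [{'orderId': 'A1B2C3D4', 'totalAmount': 150.75}]
```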

After addressing these, the next problem you’re likely to hit is a 503 Service Unavailable when one of your dependencies is genuinely down and you haven’t implemented proper error handling or circuit breakers.
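A circuit breaker stops your API from waiting on a dependency that keeps failing. After a run of errors it fails fast for a cool-down period, and your handler can turn that into an immediate 503 or a degraded response instead of a slow timeout. The sketch below is a minimal, single-threaded illustration, not a production library. `CircuitBreaker` and `CircuitOpenError` are hypothetical names, and the clock is injectable so the behavior can be tested without sleeping.

```python
import time


class CircuitOpenError(Exception):
    """Raised instead of calling the dependency while the breaker is open."""


class CircuitBreaker:
    """Opens after `max_failures` consecutive errors; allows a trial call after `reset_after` s."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the breaker is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("dependency unavailable; failing fast")
            # Cool-down elapsed: let this call through as a trial ("half-open")
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures or self.opened_at is not None:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            raise
        self.failures = 0
        self.opened_at = None  # success closes the breaker
        return result


# Usage sketch inside a handler (order_client.get_orders is hypothetical):
#   try:
#       orders = breaker.call(order_client.get_orders, user_id)
#   except CircuitOpenError:
#       ...return a 503 or a cached/degraded response immediately...
```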
