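Redis pauses when it forks. To write an RDB snapshot (BGSAVE), rewrite the AOF (BGREWRITEAOF), or run a full sync to a replica, Redis calls fork() to create a child process that shares the parent's memory. The kernel doesn't copy the data itself. It copies the process's page tables and marks the pages copy-on-write (COW), so a page is duplicated only when the parent or child later writes to it. Copying the page tables takes time proportional to the memory Redis has mapped, and the main thread can't respond to commands until fork() returns. On a multi-gigabyte instance this pause can reach tens or hundreds of milliseconds. INFO stats reports the duration of the most recent fork in latest_fork_usec.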
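
To see how long your forks actually take, read the duration from INFO output. The sketch below is a minimal illustration that assumes you already have the raw text of redis-cli INFO. The sample values are made up. It parses the field:value lines and normalizes latest_fork_usec by used_memory, so instances of different sizes can be compared. parse_info and fork_ms_per_gb are hypothetical helpers, not part of any Redis client.

```python
"""Read fork timing from raw INFO output (sketch; assumes redis-cli INFO text)."""
from __future__ import annotations


def parse_info(text: str) -> dict[str, str]:
    """Turn 'field:value' lines into a dict, skipping '# Section' headers."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def fork_ms_per_gb(info: dict[str, str]) -> float:
    """Milliseconds spent in the last fork() per GiB of used_memory."""
    fork_ms = int(info["latest_fork_usec"]) / 1000
    used_gib = int(info["used_memory"]) / 2**30
    return fork_ms / used_gib


# Made-up sample: 8 GiB of data, last fork took 96 ms.
sample = """# Memory
used_memory:8589934592
# Stats
latest_fork_usec:96000
"""

info = parse_info(sample)
print(f"last fork: {int(info['latest_fork_usec']) / 1000:.1f} ms")
print(f"per GiB:   {fork_ms_per_gb(info):.1f} ms")
```

A per-GiB figure that is much higher on one host than on others points to something host-specific, such as swapping or the virtualization layer, rather than dataset size alone.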

Common Causes and Fixes for Redis Fork Latency

  1. Large Memory Footprint:

    • Diagnosis: Run INFO memory in redis-cli and check used_memory (used_memory_human is the readable form) and used_memory_rss. If you scrape Redis with the Prometheus redis_exporter, the same value appears as redis_memory_used_bytes. If usage is close to your maxmemory setting or to the total RAM on the server, it's a prime suspect.
    • Fix:
      • Set (or lower) maxmemory: Edit redis.conf and set maxmemory <limit>, for example maxmemory 16gb. Once the dataset reaches this limit, Redis evicts keys according to maxmemory-policy, or rejects writes under the default noeviction policy. Either way, this caps how much memory the fork has to map into the child. Raising the limit does the opposite and lets the fork grow.
      • Reduce Redis Memory Usage: Use redis-cli --bigkeys to find the largest keys and DBSIZE to count keys. Avoid KEYS '*' on production systems, because it blocks the server while it walks the whole keyspace. Set an eviction policy (e.g., maxmemory-policy allkeys-lru) in redis.conf. It only takes effect when maxmemory is set.
      • Increase Server RAM: If possible, add RAM. This doesn't shorten the fork itself, but it leaves headroom for the pages that copy-on-write duplicates while the child runs, so the server doesn't start swapping.
    • Why it works: fork() time grows with the size of the page tables, and those grow with the memory Redis has mapped, so a smaller dataset forks faster. Eviction keeps the dataset from growing without bound.
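
The link between memory size and fork cost can be estimated with a little arithmetic. This rough sketch assumes 4 KiB pages and 8-byte page-table entries, which is typical of x86-64 without huge pages. Under that assumption, fork() copies about 8 bytes of page table for every 4 KiB mapped. The sketch also computes how close used_memory is to a limit such as maxmemory. page_table_bytes and memory_pressure are hypothetical helper names.

```python
"""Back-of-the-envelope fork cost (sketch, not a benchmark).

Assumption: 4 KiB pages and 8-byte page-table entries (x86-64 style).
Huge pages or other architectures change these constants.
"""

PAGE_SIZE = 4096  # bytes per page (assumed standard 4 KiB pages)
PTE_SIZE = 8      # bytes per page-table entry (assumed x86-64)


def page_table_bytes(mapped_bytes: int) -> int:
    """Approximate last-level page-table size that fork() must copy."""
    pages = -(-mapped_bytes // PAGE_SIZE)  # ceiling division
    return pages * PTE_SIZE


def memory_pressure(used_memory: int, limit: int) -> float:
    """Fraction of the limit (maxmemory or total RAM) already in use."""
    if limit <= 0:
        raise ValueError("limit must be positive; maxmemory 0 means no limit")
    return used_memory / limit


GiB = 2**30
print(page_table_bytes(24 * GiB) // 2**20, "MiB of page tables")  # 48
print(f"{memory_pressure(15 * GiB, 16 * GiB):.0%} of maxmemory")  # 94%
```

By this estimate, a 24 GiB instance has about 48 MiB of page tables to copy on every fork. Halving the dataset roughly halves that work.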
  2. High System Load / Other Processes Consuming Memory:

    • Diagnosis: Use top or htop on your Redis server. Look for other processes that are consuming significant CPU and memory, and check whether the machine is swapping (free -m shows swap in use). Under heavy memory pressure, allocating and mapping memory slows down, and the fork slows down with it.
    • Fix:
      • Identify and Relocate or Optimize Other Heavy Processes: If another application is hogging resources, consider moving it to a different server, optimizing its memory usage, or scheduling its heavy operations during off-peak hours for Redis.
      • Tune vm.swappiness: In /etc/sysctl.conf, set vm.swappiness = 1 (or a very low value). This tells the kernel to avoid swapping out application memory unless absolutely necessary.
      • Apply the setting: Run sudo sysctl -p to reload /etc/sysctl.conf, or sudo sysctl -w vm.swappiness=1 to change the value immediately.
    • Why it works: Reducing competition for RAM and CPU allows the operating system to perform the fork operation more efficiently. Lowering swappiness prioritizes keeping application memory in RAM, which is faster to access.
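
To confirm whether swapping affects Redis specifically, Linux exposes per-process swap usage as the VmSwap line in /proc/<pid>/status. The sketch below is Linux-only and assumes you supply the Redis pid yourself (INFO server reports it as process_id). vmswap_kib, read_swappiness, and check_redis_swap are hypothetical helpers.

```python
"""Check whether part of Redis is swapped out (Linux-only sketch)."""
from pathlib import Path


def vmswap_kib(status_text: str) -> int:
    """Return the VmSwap value (kB) from /proc/<pid>/status text, or 0."""
    for line in status_text.splitlines():
        if line.startswith("VmSwap:"):
            # Line format: "VmSwap:\t    1234 kB"
            return int(line.split()[1])
    return 0


def read_swappiness(path: str = "/proc/sys/vm/swappiness") -> int:
    """Current vm.swappiness value."""
    return int(Path(path).read_text().strip())


def check_redis_swap(pid: int) -> None:
    """Print swappiness and how much of the given process sits in swap."""
    swapped = vmswap_kib(Path(f"/proc/{pid}/status").read_text())
    print(f"vm.swappiness = {read_swappiness()}")
    print(f"pid {pid}: {swapped} kB swapped out")
    if swapped:
        print("Commands touching swapped pages will stall while they are read back.")
```

Call check_redis_swap() on the Redis host with the process_id value from INFO server. Any non-zero VmSwap is worth fixing before tuning anything else.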
  3. Fragmented Memory:

    • Diagnosis: Run INFO memory and check mem_fragmentation_ratio, which is used_memory_rss divided by used_memory. A ratio well above 1 (for example 1.5 or more) means the process holds much more resident memory than its data needs. A ratio below 1 usually means part of Redis has been swapped out. Fragmentation at the OS level is harder to attribute to Redis. Look for high system-wide memory usage that doesn't map to any single large allocation.
    • Fix:
      • Restart Redis (with caution): A restart reloads the dataset from the RDB or AOF file into freshly allocated memory, which usually removes fragmentation. If persistence is disabled, a restart loses the data. Expect downtime while the dataset loads.
      • Reboot the Server: This is a more drastic measure, but a reboot also clears OS-level fragmentation.
      • Use jemalloc and Active Defragmentation: Check the mem_allocator field in INFO memory. It should report jemalloc, which is the default when Redis is built on Linux. jemalloc generally resists fragmentation better than libc malloc. With jemalloc you can also set activedefrag yes in redis.conf (Redis 4.0 and later) to defragment memory online, without a restart.
    • Why it works: Fragmentation leaves data scattered across partly used pages, so the process's resident memory (used_memory_rss) exceeds the data it actually holds. Fork cost follows mapped memory, not data size, so a fragmented instance forks more slowly than its dataset warrants. Restarting or defragmenting packs the data into fewer pages.
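
The fragmentation checks above can be scripted against INFO memory output. This sketch recomputes the ratio the same way mem_fragmentation_ratio is defined (used_memory_rss / used_memory) and checks mem_allocator. The 1.5 threshold is an assumed rule of thumb, not a limit Redis defines. On very small datasets the ratio is naturally high and can be ignored. parse_info and fragmentation_report are hypothetical helpers, and the sample values are made up.

```python
"""Flag fragmentation from INFO memory fields (sketch)."""
from __future__ import annotations


def parse_info(text: str) -> dict[str, str]:
    """Turn 'field:value' lines into a dict, skipping '# Section' headers."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        if line and not line.startswith("#") and ":" in line:
            key, _, value = line.partition(":")
            fields[key] = value.strip()
    return fields


def fragmentation_report(info: dict[str, str], threshold: float = 1.5) -> list[str]:
    """Return human-readable notes about fragmentation and the allocator."""
    ratio = int(info["used_memory_rss"]) / int(info["used_memory"])
    notes = [f"fragmentation ratio {ratio:.2f}"]
    if ratio >= threshold:
        notes.append("RSS well above data size: consider activedefrag or a restart")
    elif ratio < 1.0:
        notes.append("ratio below 1: part of Redis is probably swapped out")
    if not info.get("mem_allocator", "").startswith("jemalloc"):
        notes.append("allocator is not jemalloc; activedefrag is unavailable")
    return notes


# Made-up sample: 1 GB of data held in 1.8 GB of resident memory.
sample = """# Memory
used_memory:1000000000
used_memory_rss:1800000000
mem_allocator:jemalloc-5.3.0
"""

for note in fragmentation_report(parse_info(sample)):
    print(note)
```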
  4. Frequent BGREWRITEAOF Operations:

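    • Diagnosis: Check the Redis log for frequent "Background append only file rewriting started" (AOF rewrite) and "Background saving started" (RDB snapshot) lines. Each one marks a fork. INFO persistence also shows aof_rewrite_in_progress, aof_rewrite_scheduled, and aof_last_rewrite_time_sec. Redis runs only one fork child at a time. A rewrite requested during a BGSAVE is scheduled to run right after it, so during busy periods forks can stack up back to back.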
    • Fix:
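      • Tune the AOF fsync Policy: appendfsync sets how often the AOF is fsynced: always, everysec (the default), or no. It doesn't change how often Redis forks, but fsyncs compete for disk I/O with the rewrite child. If durability is less critical, appendfsync no reduces that contention. no-appendfsync-on-rewrite yes has a similar effect, because it skips fsync only while a rewrite or save is running. Avoid appendfsync always when pauses are the concern: it fsyncs on every write and is the most I/O-intensive setting.
      • Control When Rewrites Happen: Automatic rewrites trigger once the AOF exceeds auto-aof-rewrite-min-size (default 64mb) and has grown by auto-aof-rewrite-percentage (default 100) over its size after the last rewrite. Raise these values to make rewrites rarer. Alternatively, set auto-aof-rewrite-percentage 0 to disable automatic rewrites and trigger BGREWRITEAOF yourself during low-traffic periods.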
      • Disable AOF if not needed: If you are using RDB snapshots and don’t require the durability guarantees of AOF, disable it by setting appendonly no.
    • Why it works: Every AOF rewrite is a fork, so fewer rewrites mean fewer fork pauses, and moving rewrites to quiet periods keeps those pauses away from peak traffic. Easing fsync pressure doesn't remove any forks, but it stops disk contention from adding further stalls while the child writes.
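
To predict when an automatic rewrite (and therefore a fork) will fire, the trigger rule can be modelled directly. This sketch follows the documented meaning of auto-aof-rewrite-percentage and auto-aof-rewrite-min-size. It takes sizes as reported by aof_current_size and aof_base_size in INFO persistence. It ignores other conditions Redis checks, such as whether a child process is already running. should_auto_rewrite is a hypothetical helper.

```python
"""Model of the automatic AOF rewrite trigger (sketch)."""

MiB = 2**20


def should_auto_rewrite(current_size: int, base_size: int,
                        percentage: int = 100, min_size: int = 64 * MiB) -> bool:
    """True if an AOF of current_size would trigger an automatic rewrite."""
    if percentage == 0:           # 0 disables automatic rewrites
        return False
    if current_size <= min_size:  # below auto-aof-rewrite-min-size
        return False
    base = base_size or 1         # avoid division by zero on a fresh AOF
    growth = current_size * 100 // base - 100  # percent growth since last rewrite
    return growth >= percentage


# With the defaults, a 64 MiB base triggers a rewrite at 128 MiB.
print(should_auto_rewrite(100 * MiB, 64 * MiB))                  # False: grew 56%
print(should_auto_rewrite(128 * MiB, 64 * MiB))                  # True: grew 100%
print(should_auto_rewrite(128 * MiB, 64 * MiB, percentage=200))  # False
```

Raising the percentage to 200 means the AOF must reach three times its post-rewrite size before the next fork, which spaces rewrites further apart.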
  5. Replication Syncs:

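    • Diagnosis: If you have Redis replicas, every full resynchronization forks the master to produce an RDB snapshot. A partial resync, where PSYNC resumes from the replication backlog, does not. Check sync_full and sync_partial_ok in INFO stats on the master: a steadily rising sync_full means replicas keep needing full syncs. INFO replication on the master lists each replica's offset and lag (slave0:...,offset=...,lag=...), which you can compare against master_repl_offset. Replicas that frequently disconnect and reconnect are the usual cause.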
    • Fix:
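      • Ensure a Stable Network: A stable, high-bandwidth link between master and replica reduces disconnects and shortens syncs. Also size repl-backlog-size so that a replica that drops briefly can resume with a partial resync instead of a full one.
      • Enable repl-diskless-sync: In redis.conf, set repl-diskless-sync yes (the default since Redis 7.0). The forked child then streams the RDB straight to the replica's socket instead of writing it to disk first. repl-diskless-sync-delay (default 5 seconds) waits for more replicas to connect so that one fork can serve all of them.
      • Keep the Snapshot Small: The RDB sent to a replica covers the whole dataset, so fork and transfer time both scale with memory usage. The measures under cause 1 apply here too.
    • Why it works: Diskless replication still forks, but it skips the disk write and read, so the sync finishes sooner and doesn't compete for disk I/O. Fewer full resyncs mean fewer forks.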
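
repl-backlog-size (default 1mb) has to hold every replication byte written while a replica is disconnected. Otherwise the replica falls back to a full, forking resync. master_repl_offset in INFO replication grows by the bytes sent down the replication stream, so two samples give a write rate. This is a sizing sketch: the 2x headroom factor and the outage length you plan for are assumptions, and stream_rate and suggest_backlog_bytes are hypothetical helpers.

```python
"""Size repl-backlog-size from the replication write rate (sketch)."""
import math


def stream_rate(offset1: int, t1: float, offset2: int, t2: float) -> float:
    """Replication bytes per second between two master_repl_offset samples."""
    if t2 <= t1:
        raise ValueError("second sample must be later than the first")
    return (offset2 - offset1) / (t2 - t1)


def suggest_backlog_bytes(rate: float, outage_seconds: float,
                          headroom: float = 2.0) -> int:
    """Backlog that covers an outage of outage_seconds, with assumed headroom."""
    return math.ceil(rate * outage_seconds * headroom)


# Made-up samples taken 10 s apart: 1 MB/s of replication traffic.
rate = stream_rate(1_000_000, 0.0, 11_000_000, 10.0)
print(suggest_backlog_bytes(rate, outage_seconds=60))  # 120000000
```

At 1 MB/s, riding out a one-minute disconnect with 2x headroom needs about 120 MB of backlog, far above the 1mb default.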
  6. Running SAVE Command Manually:

    • Diagnosis: Run INFO commandstats and look for a cmdstat_save line. Its calls counter shows how many times SAVE has run since the last restart or CONFIG RESETSTAT. Unlike BGSAVE, SAVE doesn't fork at all. It writes the snapshot in the main thread and blocks every client until it finishes.
    • Fix:
      • Always Use BGSAVE: Replace any instances of SAVE in your scripts or applications with BGSAVE.
      • Configure RDB Snapshots: Rely on the automatic RDB snapshots configured in redis.conf (e.g., save 900 1, save 300 10, save 60 10000). Each rule that fires runs a BGSAVE, and therefore a fork, so aggressive save rules add fork pauses of their own.
    • Why it works: SAVE blocks Redis for the entire time it takes to write the snapshot. BGSAVE blocks only for the fork() call itself. The child process then writes the snapshot while the parent keeps serving requests.
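
The commandstats check can be automated. This sketch assumes lines of the form cmdstat_<name>:calls=...,usec=...,usec_per_call=..., which is how INFO commandstats reports them. Newer versions append extra fields such as rejected_calls, and the parser carries those along. The sample values are made up, and parse_commandstats is a hypothetical helper.

```python
"""Count SAVE calls and their blocking time from INFO commandstats (sketch)."""
from __future__ import annotations


def parse_commandstats(text: str) -> dict[str, dict[str, float]]:
    """Map command name -> {field: value} for every cmdstat_ line."""
    stats: dict[str, dict[str, float]] = {}
    for line in text.splitlines():
        if not line.startswith("cmdstat_"):
            continue
        name, _, rest = line.partition(":")
        fields = dict(pair.split("=", 1) for pair in rest.strip().split(","))
        stats[name.removeprefix("cmdstat_")] = {k: float(v) for k, v in fields.items()}
    return stats


# Made-up sample: three SAVE calls that blocked for 5.4 s in total.
sample = """# Commandstats
cmdstat_save:calls=3,usec=5400000,usec_per_call=1800000.00,rejected_calls=0,failed_calls=0
cmdstat_bgsave:calls=40,usec=12000,usec_per_call=300.00,rejected_calls=0,failed_calls=0
"""

stats = parse_commandstats(sample)
save = stats.get("save", {"calls": 0.0, "usec": 0.0})
print(f"SAVE ran {save['calls']:.0f} times, blocking clients for {save['usec'] / 1e6:.1f} s")
```

Any non-zero SAVE count is worth tracing back to the script or application that issued it.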

The next error you're likely to see once fork latency is resolved is "Redis connection refused" or a client timeout, particularly if you restarted Redis or the server along the way, as clients re-establish their connections.
