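Redis is hitting pauses because the operating system performs a `fork` whenever Redis needs a child process for persistence or replication. The fork does not copy the data itself; it copies the parent's page tables, and on a large instance even that takes long enough to stall the main thread. Pages that change afterward are duplicated through copy-on-write (COW), which adds memory pressure while the child runs. Until the fork call returns, Redis cannot respond to commands.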
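To put a number on the pause, Redis reports the duration of its most recent fork as `latest_fork_usec` in `INFO stats`. The sketch below is a minimal illustration, not a monitoring tool: it assumes you already have the `INFO` fields as a dictionary (for example from the redis-py client's `info()` call), and the function name `fork_cost` and the sample values are hypothetical.

```python
def fork_cost(info: dict) -> dict:
    """Summarize the last fork using fields from Redis INFO.

    Expects 'latest_fork_usec' (INFO stats) and 'used_memory' (INFO memory).
    """
    fork_ms = info["latest_fork_usec"] / 1000.0
    used_gb = info["used_memory"] / (1024 ** 3)
    return {
        "fork_ms": fork_ms,
        # Normalizing by dataset size makes hosts of different sizes comparable.
        "ms_per_gb": fork_ms / used_gb if used_gb else 0.0,
    }


# Hypothetical sample: a 240 ms fork on an 8 GiB dataset.
sample = {"latest_fork_usec": 240_000, "used_memory": 8 * 1024 ** 3}
print(fork_cost(sample))  # {'fork_ms': 240.0, 'ms_per_gb': 30.0}
```

Tracking the per-GB figure over time shows whether a longer pause comes from a growing dataset or from the host getting slower at forking.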
Common Causes and Fixes for Redis Fork Latency
- Large Memory Footprint:
  - Diagnosis: Check your `redis_memory_bytes` metric for Redis. If this is close to your `maxmemory` setting or the total RAM available on your server, it's a prime suspect. You can also use `INFO memory` in `redis-cli` and look at `used_memory_human`.
  - Fix:
    - Set an appropriate `maxmemory` limit: Edit `redis.conf` and set `maxmemory <new_limit>mb`. For example, `maxmemory 16gb`. Combined with an eviction policy, this makes Redis evict keys once it reaches the limit, which caps the amount of memory that has to be mapped during a fork. With the default `noeviction` policy, Redis rejects writes at the limit instead of evicting.
    - Reduce Redis Memory Usage: Analyze your keyspace using `redis-cli --bigkeys` to identify large keys. To count keys, prefer `redis-cli DBSIZE`; `redis-cli KEYS '*' | wc -l` also works but blocks the server while it scans, so use it with caution on production systems. Implement an eviction policy (e.g., `maxmemory-policy allkeys-lru`) in `redis.conf`.
    - Increase Server RAM: If the dataset cannot shrink, upgrade your server's RAM so the fork and the copy-on-write pages that follow don't push the system into swap.
  - Why it works: A smaller dataset means fewer memory pages to map during the fork and fewer pages to duplicate afterward. Eviction policies ensure Redis doesn't grow indefinitely, keeping the dataset manageable.
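The link between memory size and fork time can be estimated with simple arithmetic. The sketch below assumes a Linux x86-64 host with 4 KiB pages and 8 bytes per page-table entry, and no huge pages; those constants and the function name `page_table_bytes` are assumptions for illustration.

```python
PAGE_SIZE = 4096   # bytes per page (assumed: 4 KiB pages, no huge pages)
PTE_BYTES = 8      # bytes per page-table entry (assumed: x86-64)


def page_table_bytes(rss_bytes: int) -> int:
    """Estimate how much page-table data a fork must copy for a given RSS."""
    pages = -(-rss_bytes // PAGE_SIZE)  # ceiling division
    return pages * PTE_BYTES


# A 24 GiB instance needs roughly 48 MiB of page tables copied at fork time.
print(page_table_bytes(24 * 1024 ** 3) / 1024 ** 2)  # 48.0
```

The estimate scales linearly, so halving the resident set roughly halves the data the kernel copies during the fork.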
- High System Load / Other Processes Consuming Memory:
  - Diagnosis: Use `top` or `htop` on your Redis server. Look for other processes that are consuming significant CPU and memory. If the system is under heavy load or short on free memory, page allocation becomes slow, and that slows the fork.
  - Fix:
    - Identify and Relocate or Optimize Other Heavy Processes: If another application is hogging resources, consider moving it to a different server, optimizing its memory usage, or scheduling its heavy operations during off-peak hours for Redis.
    - Tune `vm.swappiness`: In `/etc/sysctl.conf`, set `vm.swappiness = 1` (or a very low value). This tells the kernel to avoid swapping out application memory unless absolutely necessary.
    - Reload the Settings: Run `sudo sysctl -p` to apply the change without a reboot.
  - Why it works: Reducing competition for RAM and CPU allows the operating system to perform the fork operation more efficiently. Lowering `swappiness` prioritizes keeping application memory in RAM, which is faster to access.
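Before reloading, it can help to confirm which `vm.swappiness` value the file would actually apply. This is a small sketch that parses sysctl-style text; the helper name `configured_swappiness` is hypothetical, and it assumes the last assignment in the file wins because settings are applied in order.

```python
from typing import Optional


def configured_swappiness(sysctl_text: str) -> Optional[int]:
    """Return the vm.swappiness value set in sysctl.conf-style text, if any."""
    value = None
    for raw in sysctl_text.splitlines():
        line = raw.strip()
        # Skip blanks and comment lines ('#' or ';').
        if not line or line[0] in "#;" or "=" not in line:
            continue
        key, _, val = line.partition("=")
        if key.strip() == "vm.swappiness":
            value = int(val.strip())  # later assignments override earlier ones
    return value


example = """
# kernel tuning for the Redis host
vm.swappiness = 60
vm.overcommit_memory = 1
vm.swappiness=1
"""
print(configured_swappiness(example))  # 1
```

On a real host you would pass in the contents of `/etc/sysctl.conf`; the running value can be read from `/proc/sys/vm/swappiness`.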
- Fragmented Memory:
  - Diagnosis: Redis tries to manage memory efficiently, but the allocator and the OS can still end up with fragmented memory. `INFO memory` reports `mem_fragmentation_ratio`, the ratio of resident memory (`used_memory_rss`) to the memory Redis has allocated (`used_memory`); a value well above 1 suggests fragmentation. At the system level, look for high memory usage that doesn't map to a single large allocation.
  - Fix:
    - Restart Redis (with caution): A full restart of Redis will cause it to reallocate memory, which can sometimes de-fragment it. This will cause a brief downtime.
    - Reboot the Server: A more drastic measure, but a server reboot will clear all memory fragmentation.
    - Consider `jemalloc`: Ensure you are using `jemalloc` as your memory allocator. Redis builds on Linux use it by default. Check the `mem_allocator` field in `INFO memory` to confirm. `jemalloc` generally offers better fragmentation resistance.
  - Why it works: Fragmentation inflates the resident set beyond the data Redis actually holds, so the kernel has more pages to map during a fork. Restarting or rebooting forces a fresh, compact allocation.
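As a rough check, the fragmentation ratio can be recomputed from two `INFO memory` fields. The sketch below assumes the fields are available as a dictionary; the 1.5 threshold is a rule of thumb chosen for illustration, not a value from Redis itself, and the function names are hypothetical.

```python
def fragmentation_ratio(info: dict) -> float:
    """Resident memory divided by memory allocated by Redis (INFO memory)."""
    used = info["used_memory"]
    return info["used_memory_rss"] / used if used else float("nan")


def looks_fragmented(info: dict, threshold: float = 1.5) -> bool:
    """Flag instances whose RSS exceeds allocated memory by the given factor."""
    return fragmentation_ratio(info) > threshold


healthy = {"used_memory": 2_000_000_000, "used_memory_rss": 2_200_000_000}
bloated = {"used_memory": 2_000_000_000, "used_memory_rss": 4_000_000_000}
print(looks_fragmented(healthy), looks_fragmented(bloated))  # False True
```

A ratio below 1 is a different problem: it usually means part of the dataset has been swapped out.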
- Frequent `BGREWRITEAOF` or `BGSAVE` Operations:
  - Diagnosis: Check Redis logs for frequent messages such as "Background append only file rewriting started" or "Background saving started." Both operations fork. If they happen close together with other fork-triggering events (like scheduled `BGSAVE` snapshots or replication syncs), the main thread pays the fork cost again and again.
  - Fix:
    - Tune AOF `fsync` Policy: In `redis.conf`, `appendfsync everysec` is the default. Change it to `appendfsync no` if data durability is less critical than preventing pauses; this leaves flushing to the OS. `appendfsync always` syncs on every write and is the most expensive option, so consider it only if your disk I/O can truly handle it without causing other bottlenecks.
    - Schedule AOF Rewrites: If possible, schedule AOF rewrites to occur during low-traffic periods. You can manually trigger `BGREWRITEAOF` when you know traffic is low.
    - Disable AOF if not needed: If you are using RDB snapshots and don't require the durability guarantees of AOF, disable it by setting `appendonly no`.
  - Why it works: `BGREWRITEAOF` forks to rewrite the Append-Only File, so fewer rewrites mean fewer forks. Adjusting `fsync` does not change how often Redis forks, but it reduces disk writes, which eases I/O contention while a rewrite or snapshot is running.
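How often automatic rewrites fire depends on two `redis.conf` settings that the list above does not cover: `auto-aof-rewrite-percentage` and `auto-aof-rewrite-min-size`. The sketch below models that trigger as I understand it from the Redis documentation (growth relative to the AOF size after the last rewrite); treat it as an approximation, with the function name `rewrite_would_trigger` being hypothetical.

```python
MB = 1024 * 1024


def rewrite_would_trigger(current_size: int, base_size: int,
                          percentage: int = 100,
                          min_size: int = 64 * MB) -> bool:
    """Approximate the automatic AOF rewrite condition.

    current_size / base_size correspond to aof_current_size / aof_base_size
    in INFO persistence. Defaults mirror the stock redis.conf values.
    """
    if percentage == 0 or current_size <= min_size:
        return False  # feature disabled, or file still too small to bother
    base = base_size if base_size else 1
    growth = current_size * 100 // base - 100  # percent growth since last rewrite
    return growth >= percentage


print(rewrite_would_trigger(200 * MB, 100 * MB))  # True: file doubled
print(rewrite_would_trigger(150 * MB, 100 * MB))  # False: only 50% growth
```

Raising the percentage or the minimum size makes automatic rewrites, and therefore forks, less frequent at the cost of a larger AOF on disk.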
- Replication Syncs:
  - Diagnosis: If you have Redis replicas, a full synchronization (`SYNC`, or a `PSYNC` that falls back to a full resync) also involves a fork on the master, because the master has to produce an RDB snapshot. Monitor your replication lag with `INFO replication` on the master: compare `master_repl_offset` with the `offset` and `lag` values reported for each replica. If replicas frequently disconnect and reconnect, a full sync might be happening often.
  - Fix:
    - Ensure Stable Network: A stable, high-bandwidth network connection between master and replica is crucial to reduce sync times.
    - Enable `repl-diskless-sync`: In `redis.conf`, set `repl-diskless-sync yes`. This allows the master to send the RDB file directly over the network to the replica without writing it to disk first, speeding up the sync.
    - Optimize RDB Snapshotting: Ensure your RDB snapshots are not too large and are generated efficiently.
  - Why it works: Diskless replication still forks, but it bypasses the disk I/O for the RDB transfer, making the sync faster and less disruptive to the master. Fewer disconnects mean fewer full syncs and therefore fewer forks.
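One way to confirm that full syncs are actually happening is to compare the `sync_full`, `sync_partial_ok`, and `sync_partial_err` counters from `INFO stats` across two samples. The sketch below assumes both samples are dictionaries taken from the same master without a restart in between; the function name and sample values are hypothetical.

```python
def resync_summary(before: dict, after: dict) -> dict:
    """Count full and partial resyncs between two INFO stats samples."""
    keys = ("sync_full", "sync_partial_ok", "sync_partial_err")
    return {key: after[key] - before[key] for key in keys}


# Hypothetical samples taken an hour apart.
earlier = {"sync_full": 3, "sync_partial_ok": 10, "sync_partial_err": 1}
later = {"sync_full": 5, "sync_partial_ok": 11, "sync_partial_err": 3}
print(resync_summary(earlier, later))
# {'sync_full': 2, 'sync_partial_ok': 1, 'sync_partial_err': 2}
```

A rising `sync_full` count alongside failed partial resyncs means replicas are reconnecting but cannot resume from where they left off, so each reconnect costs the master another fork.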
- Running `SAVE` Command Manually:
  - Diagnosis: Check your scripts, cron jobs, and any command audit logs you keep for manual `SAVE` commands. `SAVE` does not fork; it writes the snapshot in the main thread and blocks every client until it finishes. `BGSAVE` is the asynchronous version: it forks a child to write the snapshot.
  - Fix:
    - Always Use `BGSAVE`: Replace any instances of `SAVE` in your scripts or applications with `BGSAVE`.
    - Configure RDB Snapshots: Rely on the automatic RDB snapshotting configured in `redis.conf` (e.g., `save 900 1`, `save 300 10`, `save 60 10000`).
  - Why it works: `SAVE` is synchronous and will pause Redis for the entire duration of the save operation. `BGSAVE` limits the pause to the fork itself and lets a child process do the writing, allowing Redis to continue serving requests.
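If no command log exists, `INFO commandstats` offers another way to spot manual `SAVE` calls, since it keeps a per-command call counter. The sketch below assumes the section has been parsed into nested dictionaries keyed like `cmdstat_save`, which is the shape I would expect from the redis-py client; that shape, the function name, and the sample numbers are assumptions.

```python
def save_call_counts(commandstats: dict) -> dict:
    """Return how many times SAVE and BGSAVE were issued as commands."""
    def calls(name: str) -> int:
        return commandstats.get(f"cmdstat_{name}", {}).get("calls", 0)

    return {"save": calls("save"), "bgsave": calls("bgsave")}


# Hypothetical sample: SAVE was issued four times, BGSAVE never.
stats = {
    "cmdstat_save": {"calls": 4, "usec": 9_200_000, "usec_per_call": 2_300_000.0},
    "cmdstat_get": {"calls": 120_000, "usec": 240_000, "usec_per_call": 2.0},
}
print(save_call_counts(stats))  # {'save': 4, 'bgsave': 0}
```

Snapshots triggered by the `save` rules in `redis.conf` are not issued as commands, so a non-zero `save` count points at a client, script, or operator.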
The next error you’ll likely encounter after resolving fork latency is a "Redis connection refused" or a client timeout error, as the system recovers and clients re-establish connections.