Redis has started to feel sluggish, and you’re seeing timeouts on the application side. The usual culprit is Redis itself taking too long to respond to commands: not a network blip, but the server getting stuck on something.

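The --latency-history mode of redis-cli is your first line of defense. It continuously sends PING to the server, measures the round-trip time, and prints a summary for each sampling window (15 seconds by default), giving you a historical view of when Redis was struggling. (The server-side LATENCY HISTORY command is a separate feature: it reports spikes for a named event and only records data once latency-monitor-threshold is set.)

redis-cli -h 127.0.0.1 --latency-history

This will spit out lines like:

min: 0, max: 1, avg: 0.14 (1314 samples) -- 15.01 seconds range
min: 0, max: 12, avg: 0.31 (1287 samples) -- 15.00 seconds range

The min, max, and avg values are round-trip latencies in milliseconds, followed by the number of samples taken in the window and the length of the window. A window whose max jumps well above its neighbors marks a period when Redis was slow to respond.

This history tells you when Redis was slow, but not which commands were responsible. For that, check the slow log with SLOWLOG GET, which records each command that ran longer than slowlog-log-slower-than, along with its timestamp, execution time in microseconds, and arguments. Now, let’s dig into why.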
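To see which individual commands were slow, the slow log can also be read from application code. The following is a minimal sketch, assuming entries shaped like the dicts that redis-py’s slowlog_get() returns (with a duration in microseconds and a command field); the helper name slowest_commands is hypothetical.

```python
from __future__ import annotations

from typing import Iterable


def slowest_commands(entries: Iterable[dict], top: int = 5) -> list[tuple[float, str]]:
    """Return (duration_ms, command) pairs for the slowest entries, slowest first.

    Assumes each entry has "duration" (microseconds) and "command" (str or bytes).
    """
    rows = []
    for entry in entries:
        command = entry["command"]
        if isinstance(command, bytes):
            command = command.decode("utf-8", errors="replace")
        # Convert microseconds to milliseconds for readability.
        rows.append((entry["duration"] / 1000.0, command))
    rows.sort(key=lambda row: row[0], reverse=True)
    return rows[:top]
```

With a live server and redis-py installed, this could be fed from something like redis.Redis().slowlog_get(128).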

Common Causes of Redis Latency

  1. Slow Commands on Large Data Structures: The most frequent offender. Commands like SMEMBERS on a set with millions of members, LRANGE on a list with millions of elements, or KEYS * (which you should never run in production) can take a long time to execute. Redis is single-threaded for command execution, so a long-running command blocks everything else.

    • Diagnosis: Run SLOWLOG GET to list recent slow commands and look for ones operating on large collections. If you suspect a specific key, use redis-cli --bigkeys to find large keys.
    • Fix: Refactor your application to avoid commands that iterate over entire collections or large parts of them. Instead of SMEMBERS followed by application-side filtering, use SSCAN with a pattern. Replace KEYS * with SCAN in a loop. If you must use a slow command, consider running it during off-peak hours or on a replica if your data model allows.
    • Why it works: SCAN is a cursor-based iterator. Each call returns a small batch and does a bounded amount of work, so Redis can serve other clients between calls instead of blocking for the whole traversal.
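As a rough illustration of replacing KEYS * with a SCAN loop, here is a sketch that assumes a client exposing scan(cursor=..., match=..., count=...) and returning a (next_cursor, keys) pair, as redis-py’s client does. The function name scan_keys and the batch size are hypothetical choices.

```python
from typing import Iterator


def scan_keys(client, pattern: str = "*", batch: int = 500) -> Iterator:
    """Yield keys matching `pattern` using cursor-based SCAN instead of KEYS."""
    cursor = 0
    while True:
        # Each SCAN call does a bounded amount of work on the server,
        # so other clients' commands can run between calls.
        cursor, keys = client.scan(cursor=cursor, match=pattern, count=batch)
        yield from keys
        if cursor == 0:  # the server returns cursor 0 once the iteration is complete
            break
```

SCAN may return the same key more than once, so callers that need uniqueness should deduplicate, for example by collecting results into a set.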
  2. Evictions: If your Redis instance is running out of memory (maxmemory is set and hit), it will start evicting keys to make space. Eviction policies (like allkeys-lru) make Redis sample keys to pick eviction candidates and then free them. This costs CPU and can add latency, particularly when large keys are evicted or when writes keep pushing memory back over the limit.

    • Diagnosis: Check redis-cli INFO memory to compare used_memory against maxmemory, and redis-cli INFO stats for an evicted_keys count that is increasing rapidly. INFO stats also reports instantaneous_ops_per_sec and total_commands_processed, which help you see whether evictions correlate with traffic spikes.
    • Fix: Increase maxmemory if you have available RAM, or optimize your data to use less memory. If you can’t increase maxmemory, consider a more efficient eviction policy if appropriate, but the real fix is usually to reduce memory usage or increase available memory.
    • Why it works: Reducing the need for evictions or providing more memory means Redis doesn’t have to spend CPU cycles finding keys to delete.
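Because evicted_keys is a cumulative counter, "increasing rapidly" is easiest to judge as a rate between two samples. The sketch below assumes you have already read evicted_keys, used_memory, and maxmemory from INFO at two points in time; the function names are hypothetical.

```python
def eviction_rate(prev_evicted: int, curr_evicted: int, seconds: float) -> float:
    """Keys evicted per second between two samples of the evicted_keys counter."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (curr_evicted - prev_evicted) / seconds


def memory_utilization(used_memory: int, maxmemory: int):
    """Fraction of maxmemory in use, or None when maxmemory is 0 (no limit set)."""
    if maxmemory == 0:
        return None
    return used_memory / maxmemory
```

A sustained non-zero eviction rate together with utilization close to 1.0 suggests the instance is constantly working at its memory limit.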
  3. Background Save Operations (RDB/AOF Rewrite): Redis can perform RDB snapshots and AOF rewrites in the background. While it tries to minimize impact, these operations can consume CPU and I/O, especially on slower disks or when the dataset is very large.

    • Diagnosis: Run redis-cli INFO persistence. Look at rdb_bgsave_in_progress and aof_rewrite_in_progress. If these are frequently "1" during your latency spikes, it’s a strong indicator. Also, monitor your system’s I/O wait times (iostat).
    • Fix: Configure save points for RDB snapshots to occur less frequently or during off-peak hours. For AOF, tune auto-aof-rewrite-percentage and auto-aof-rewrite-min-size to trigger rewrites less often. Consider disabling RDB if you primarily use AOF for durability.
    • Why it works: By controlling when and how often these background tasks run, you reduce the chance they’ll overlap with critical command execution times and contend for resources.
  4. High CPU Usage (Other Processes or Redis Itself): Redis is single-threaded for command execution, but other processes on the same machine can steal CPU cycles. Even within Redis, complex Lua scripts or the background save operations can consume significant CPU.

    • Diagnosis: Use top or htop to check overall CPU usage. If redis-server is consistently at 100% CPU or if other processes are hogging CPU, that’s the problem. Also, check redis-cli INFO stats for latest_fork_usec, which indicates how long the last fork (used for background operations) took. A high value means the fork itself caused a significant pause.
    • Fix: If other processes are the issue, move them to a different machine or optimize them. If Redis itself is the bottleneck, consider optimizing your data structures and avoiding slow commands. Because command execution is single-threaded, extra cores help little; a faster CPU or sharding the data across several Redis instances is more effective. For Lua scripts, optimize their logic.
    • Why it works: Ensuring Redis has dedicated CPU resources for command processing, or optimizing its internal CPU-bound operations, directly leads to faster responses.
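The persistence flags and fork time mentioned in the last two items can be pulled out of raw INFO output with a few lines of code. This is a sketch under stated assumptions: the input is the plain "key:value" text that redis-cli INFO prints, the function names are hypothetical, and the 100 ms warning threshold is an arbitrary illustrative value, not a Redis recommendation.

```python
def parse_info(text: str) -> dict:
    """Parse raw INFO output ("key:value" lines, "# Section" headers) into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and section headers
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def background_work(fields: dict, fork_warn_ms: float = 100.0) -> dict:
    """Summarize background persistence activity and the cost of the last fork."""
    fork_ms = int(fields.get("latest_fork_usec", "0")) / 1000.0
    return {
        "bgsave_running": fields.get("rdb_bgsave_in_progress") == "1",
        "aof_rewrite_running": fields.get("aof_rewrite_in_progress") == "1",
        "latest_fork_ms": fork_ms,
        "fork_slow": fork_ms >= fork_warn_ms,  # assumed threshold, tune for your setup
    }
```

Sampling this alongside your latency measurements makes it easier to see whether spikes line up with a running snapshot, an AOF rewrite, or a slow fork.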
  5. Network Issues (Less Common for Redis Internal Latency): While this guide focuses on internal Redis latency, sometimes perceived slowness is due to network saturation or high latency between the client and Redis.

    • Diagnosis: Use redis-cli --latency to measure the round-trip time from the client to the server. High min or avg values indicate network issues. Also, ping the Redis server from your application server.
    • Fix: Optimize network configuration, use dedicated network interfaces, or move your application and Redis closer geographically.
    • Why it works: Reducing the time it takes for requests to reach Redis and responses to return directly improves perceived performance.
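The same round-trip measurement can be taken from the application host itself, which is where the network path actually matters. The sketch below assumes a client object with a ping() method, as redis-py’s client has; the function name ping_latency_ms and the default sample count are hypothetical.

```python
import time


def ping_latency_ms(client, samples: int = 100) -> dict:
    """Time `samples` PING round trips and return min/avg/max in milliseconds."""
    if samples < 1:
        raise ValueError("samples must be at least 1")
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        client.ping()
        timings.append((time.perf_counter() - start) * 1000.0)
    return {
        "min": min(timings),
        "avg": sum(timings) / len(timings),
        "max": max(timings),
        "samples": len(timings),
    }
```

A high min points at the network path, since even the fastest round trip was slow; a low min with a high max points more toward intermittent stalls on the server or the client.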
  6. Replication Lag: If your application is reading from a replica and that replica is lagging behind the primary, you might see stale data or slower reads if the replica is also busy.

    • Diagnosis: Run redis-cli INFO replication on the primary and note master_repl_offset, then run it on the replica and check slave_repl_offset and master_link_status. A large or growing gap between the two offsets, or a master_link_status other than up, indicates lag.
    • Fix: Ensure the replica has sufficient network bandwidth and is not overloaded. If the primary is too busy to send data, you might need to scale the primary.
    • Why it works: A replica that is caught up to the primary can serve reads with minimal delay.
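Replication lag can be expressed as the byte distance between the primary’s and the replica’s replication offsets. The following sketch assumes two dicts holding the INFO replication fields of the primary and the replica (for example as returned by redis-py’s info("replication")); it also reads the replica’s master_link_status field. The function names and the 1 MB threshold are hypothetical.

```python
def replication_lag_bytes(primary_info: dict, replica_info: dict) -> int:
    """Bytes of replication stream the replica has not yet processed."""
    return int(primary_info["master_repl_offset"]) - int(replica_info["slave_repl_offset"])


def replica_healthy(primary_info: dict, replica_info: dict,
                    max_lag_bytes: int = 1_000_000) -> bool:
    """True when the replication link is up and the lag is within the assumed limit."""
    link_up = replica_info.get("master_link_status") == "up"
    return link_up and replication_lag_bytes(primary_info, replica_info) <= max_lag_bytes
```

Because the two offsets are sampled at slightly different moments, a small non-zero lag is normal on a busy primary; the trend over several samples is more telling than a single reading.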

After fixing these, the next error you’ll likely encounter is a READONLY error if you’re trying to write to a replica.
