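The Redis cluster node failed to respond to a client’s connection request within the configured timeout period, most often because the network path between them became saturated or intermittently unavailable. An overloaded node or a stale view of the cluster topology can produce the same symptom.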

Cause 1: Network Congestion

Diagnosis: Monitor network interface traffic on both the client and the Redis node. Look for sustained high utilization (e.g., >80%) on the interfaces involved in the Redis communication.

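# On the client or Redis node:
ip -s link show eth0
# Run it twice a few seconds apart and compare the RX/TX byte counters to
# derive throughput; packet counts alone do not show link utilization.
# Driver-level drop and error counters (names vary by NIC driver):
sudo ethtool -S eth0 | grep -iE 'drop|err'
# Alternatively, watch live throughput with a tool like nload:
nload eth0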
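
As a rough sketch, the utilization figure from the diagnosis can be derived from two samples of the kernel's byte counters. The helper name link_util_pct is hypothetical, and the usage lines assume the interface is eth0 and that /sys/class/net/eth0/speed reports the link speed in Mbit/s, which virtual interfaces often do not.

```shell
# Hypothetical helper: percent utilization from two byte-counter samples.
# Args: bytes_before bytes_after interval_seconds link_speed_mbps
link_util_pct() {
  awk -v b0="$1" -v b1="$2" -v s="$3" -v mbps="$4" \
    'BEGIN { printf "%.1f\n", ((b1 - b0) * 8) / (s * mbps * 1000000) * 100 }'
}

# Live usage (assumes eth0; adjust the interface name and interval):
# b0=$(cat /sys/class/net/eth0/statistics/rx_bytes); sleep 5
# b1=$(cat /sys/class/net/eth0/statistics/rx_bytes)
# link_util_pct "$b0" "$b1" 5 "$(cat /sys/class/net/eth0/speed)"
```

Repeat the measurement with tx_bytes to cover the other direction; a result that stays above 80 during the timeouts points at congestion.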

Fix: Identify and alleviate the source of the congestion. This might involve:

  • Increasing Bandwidth: Upgrade network interfaces or switch to higher-capacity links.
  • Traffic Shaping/QoS: Prioritize Redis traffic over less critical network flows.
  • Reducing Data Transfer: Optimize Redis client operations to send less data or fewer commands.

Why it works: Redis commands and cluster communications travel over the network. If the network pipes are full, packets will be dropped or delayed, leading to timeouts. Increasing bandwidth or prioritizing Redis traffic ensures that Redis packets have a clear path to their destination.

Cause 2: Firewall State Table Exhaustion

Diagnosis: Check the state table size and current usage on any firewalls or network address translation (NAT) devices between the client and the Redis node.

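# Example for iptables on Linux: compare current entries with the limit.
cat /proc/sys/net/netfilter/nf_conntrack_count
cat /proc/sys/net/netfilter/nf_conntrack_max
# Per-CPU counters; non-zero 'drop' or 'early_drop' values suggest the table has filled up.
sudo conntrack -S
# The kernel also logs "nf_conntrack: table full, dropping packet" when the limit is hit.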

If using a dedicated firewall appliance, consult its specific monitoring tools.
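
The comparison of current usage against the limit can be scripted along these lines. This is a minimal sketch: conntrack_check is a hypothetical helper name, and the 80% default threshold is an arbitrary assumption rather than a recommended value.

```shell
# Hypothetical helper: report how full the connection tracking table is.
# Args: current_count max_entries [warn_threshold_percent, default 80]
conntrack_check() {
  count="$1"; max="$2"; threshold="${3:-80}"
  pct=$(( count * 100 / max ))   # integer percentage, rounded down
  if [ "$pct" -ge "$threshold" ]; then
    echo "WARN conntrack ${pct}% full (${count}/${max})"
  else
    echo "OK conntrack ${pct}% full (${count}/${max})"
  fi
}

# Live usage on a Linux host with the nf_conntrack module loaded:
# conntrack_check "$(cat /proc/sys/net/netfilter/nf_conntrack_count)" \
#                 "$(cat /proc/sys/net/netfilter/nf_conntrack_max)"
```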

Fix: Increase the firewall’s state table limit. The exact method depends on the firewall software/hardware.

  • iptables (Linux):
    # View current limit
    cat /proc/sys/net/netfilter/nf_conntrack_max
    # Increase limit (e.g., to 1,000,000)
    sudo sysctl -w net.netfilter.nf_conntrack_max=1000000
    # Make persistent by adding to /etc/sysctl.conf
    echo "net.netfilter.nf_conntrack_max = 1000000" | sudo tee -a /etc/sysctl.conf
    
  • Dedicated Firewalls: Consult vendor documentation for increasing connection tracking limits.

Why it works: Each TCP connection (and many UDP flows) consumes an entry in the firewall’s state table. If this table fills up, the firewall can no longer track new connections, causing them to be dropped or ignored, leading to timeouts. Increasing the limit allows the firewall to manage more concurrent connections.

Cause 3: Redis Node Overload (CPU/Memory)

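Diagnosis: Monitor the CPU and memory utilization on the Redis cluster nodes. Because Redis executes commands on a single main thread, check the redis-server process itself: one core pinned above 90% is enough to cause delays even when overall CPU usage looks low. Memory pressure (e.g., nearing maxmemory if set, or swapping) can likewise prevent Redis from processing new commands quickly enough.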

# On the Redis node:
top -c
htop
# Or for memory usage:
redis-cli INFO memory | grep used_memory
redis-cli INFO persistence | grep rdb_bgsave_in_progress
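
To see how close a node is to its maxmemory limit, the INFO memory output can be reduced to a single percentage. The sketch below assumes the used_memory and maxmemory fields of INFO memory; redis_mem_pct is a hypothetical helper name. INFO lines end in CRLF, so carriage returns are stripped first.

```shell
# Hypothetical helper: read `redis-cli INFO memory` on stdin and print
# used_memory as an integer percentage of maxmemory.
redis_mem_pct() {
  tr -d '\r' | awk -F: '
    $1 == "used_memory" { used = $2 }
    $1 == "maxmemory"   { max = $2 }
    END {
      if (max > 0) printf "%d\n", used * 100 / max
      else print "maxmemory not set"
    }'
}

# Live usage:
# redis-cli INFO memory | redis_mem_pct
```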

Fix: Optimize Redis performance or scale the cluster.

  • Optimize Commands: Avoid slow commands (e.g., KEYS *, SMEMBERS on large sets) in production. Use SCAN instead of KEYS.
  • Increase Node Resources: Add more CPU cores or RAM to the Redis instances.
  • Scale Out: Add more Redis nodes to the cluster to distribute the load.
  • Tune maxmemory: If maxmemory is set, ensure there’s sufficient buffer for operations and eviction.

Why it works: When a Redis node is overloaded, it spends more time on existing tasks (like background saves, evictions, or complex command execution) and less time accepting and processing new client requests. Once that delay exceeds the client’s connection timeout, the client gives up and reports a timeout.

Cause 4: Intermittent Network Packet Loss

Diagnosis: Use ping with a high packet count and size, or mtr (My Traceroute) to check for packet loss between the client and the Redis node.

# On the client, pinging a Redis node IP:
ping -c 100 -s 1024 192.168.1.100
# On the client, using mtr:
mtr --report --report-wide 192.168.1.100

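Look for packet loss percentages greater than 0% at the destination. In mtr output, loss reported only at an intermediate hop that does not persist to the final hop usually reflects ICMP rate limiting on that router rather than real loss.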
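
For repeated checks, the loss figure can be pulled out of the ping summary line. This sketch assumes the usual Linux summary wording ("N% packet loss"); ping_loss_pct is a hypothetical helper name.

```shell
# Hypothetical helper: read ping output on stdin and print the
# packet loss percentage from the summary line.
ping_loss_pct() {
  sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}

# Live usage (192.168.1.100 is the example node IP used above):
# ping -c 100 -s 1024 192.168.1.100 | ping_loss_pct
```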

Fix: Address the underlying network issue causing packet loss. This could involve:

  • Replacing faulty network hardware: Cables, switches, NICs.
  • Resolving duplex mismatches: Ensure network devices are configured for the same speed and duplex settings.
  • Working with your network provider: If the loss occurs on an external network segment.

Why it works: Even small amounts of packet loss can severely degrade performance. Redis relies on reliable TCP connections. Lost packets require retransmissions, significantly increasing latency and potentially causing the client’s timeout to expire before a response is received.

Cause 5: Incorrect Redis Cluster Slot Distribution

Diagnosis: Verify that the Redis client is aware of the correct cluster topology and slot distribution. An outdated or incorrect view of slots can lead to clients trying to connect to nodes that are not responsible for the requested keys.

# On any Redis node, to see slot distribution:
redis-cli cluster slots

Compare this output to what your client library reports or expects.

Fix: Ensure your Redis client library is configured to discover and update its cluster topology dynamically. If the client has a static configuration, update it with the current cluster slots output.

  • Client Configuration: Most libraries have an option to "refresh cluster slots" or similar. Ensure this is enabled or triggered periodically.
  • Manual Update: If necessary, restart the client application or manually trigger a topology refresh.

Why it works: Redis Cluster distributes keys across 16384 hash slots, and clients need to know which node owns which slot. A command sent to a node that doesn’t own the key’s slot is answered with a MOVED or ASK redirection, which a cluster-aware client follows. Timeouts appear when the client’s cached topology is stale enough to point at a node that has failed, been removed, or changed address, so the connection attempt hangs until the timeout expires, or when the client cannot act on redirections and keeps retrying the wrong node.
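
To check which node the cluster itself considers the owner of a slot, the output of redis-cli cluster nodes can be filtered as sketched below. slot_owner is a hypothetical helper name; the sketch assumes the standard CLUSTER NODES line layout, where slot ranges start at the ninth field, and it skips the bracketed entries that mark slots being migrated or imported.

```shell
# Hypothetical helper: read `redis-cli cluster nodes` on stdin and print
# the address field of the node that serves the given slot.
slot_owner() {
  awk -v slot="$1" '
    {
      for (i = 9; i <= NF; i++) {
        if ($i ~ /^\[/) continue            # migrating/importing marker
        n = split($i, r, "-")               # "lo-hi" range or a single slot
        lo = r[1] + 0
        hi = (n == 2 ? r[2] : r[1]) + 0
        if (slot + 0 >= lo && slot + 0 <= hi) print $2
      }
    }'
}

# Live usage (user:1000 is a placeholder key):
# redis-cli cluster nodes | slot_owner "$(redis-cli cluster keyslot user:1000)"
```

If the address printed here differs from the node your client is contacting for that key, the client's topology is stale.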

Cause 6: Aggressive cluster-node-timeout Setting

Diagnosis: Examine the cluster-node-timeout configuration parameter on your Redis cluster nodes. This is the maximum time, in milliseconds, that a node can be unreachable (no reply to cluster bus pings) before other nodes flag it as failing.

# On a Redis node:
redis-cli CONFIG GET cluster-node-timeout

A very low value (e.g., 1000ms or less) might be too sensitive for your network conditions.

Fix: Increase the cluster-node-timeout value. If it has been lowered aggressively, 5000ms (5 seconds) is a common starting point; the default shipped in redis.conf is 15000ms.

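# On every Redis node in the cluster:
redis-cli CONFIG SET cluster-node-timeout 5000
# Persist it by also setting it in redis.conf, or by running CONFIG REWRITE
# (only works if the node was started with a config file). No restart is needed.
redis-cli CONFIG REWRITE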

Why it works: This setting dictates how quickly nodes within the cluster consider each other "down." If the network experiences brief, transient issues, a low cluster-node-timeout can cause nodes to incorrectly mark each other as failed, triggering unnecessary failovers (replicas promoted in place of healthy masters) and cluster instability. Although the setting governs inter-node communication, clients are affected indirectly while the topology changes.
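
A quick check for an overly aggressive value might look like the sketch below. check_node_timeout is a hypothetical helper name, and the 5000ms default floor simply mirrors the starting point suggested above; it assumes the two-line name/value output that redis-cli prints for CONFIG GET when piped.

```shell
# Hypothetical helper: read `redis-cli CONFIG GET cluster-node-timeout`
# on stdin and compare the value with a minimum (default 5000 ms).
check_node_timeout() {
  min_ms="${1:-5000}"
  value=$(tr -d '\r' | tail -n 1)   # last line holds the value
  if [ "$value" -lt "$min_ms" ]; then
    echo "cluster-node-timeout=${value}ms is below ${min_ms}ms"
  else
    echo "cluster-node-timeout=${value}ms is at or above ${min_ms}ms"
  fi
}

# Live usage, run against each node:
# redis-cli CONFIG GET cluster-node-timeout | check_node_timeout
```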

The next error you’ll likely encounter is READONLY You can't write against a read only replica.
