Redis Sentinel reports that the master is down and starts a failover when it can no longer reach the master instance. It then promotes a replica to become the new master.
Cause 1: Master Instance Actually Crashed or Became Unresponsive
Diagnosis: Check the Redis master’s process status. On the master server, run:
ps aux | grep redis-server
If the process is not running, it has crashed. Check Redis logs for crash reasons:
tail -n 100 /var/log/redis/redis-server.log
Look for OOM (Out Of Memory) errors, segmentation faults, or disk full messages.
Fix: If the instance crashed due to OOM, increase Redis’s memory limit or reduce its memory usage.
- Increase maxmemory (if set): Edit redis.conf and set a higher maxmemory value, provided the host has RAM to spare. For example:
maxmemory 8gb
Then restart Redis: run redis-cli shutdown followed by redis-server /etc/redis/redis.conf. If the process has already crashed, only the redis-server step is needed.
- Increase System Memory: If maxmemory was not set or is already high, the system might be out of RAM. Increase the server’s RAM or tune OS-level OOM killer settings (though this is a last resort).
- Address Disk Full: If logs indicate disk full, free up space or expand the disk.
Why it works: Redis needs memory to operate. If it runs out, it can crash or become unresponsive, which triggers Sentinel. Providing more memory or freeing disk space lets Redis run normally again.
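To see how close a running instance is to its limit before it crashes again, the memory section of INFO can be summarized. The following is a minimal sketch, not a definitive tool: mem_usage_pct is a hypothetical helper name, and it assumes the used_memory and maxmemory fields that redis-cli info memory prints.

```shell
# Sketch: report used_memory as a percentage of maxmemory.
# mem_usage_pct is a hypothetical helper; it reads `redis-cli info memory` output on stdin.
mem_usage_pct() {
  # INFO lines end in CRLF, so strip the carriage return before parsing.
  tr -d '\r' | awk -F: '
    $1 == "used_memory" { used = $2 }
    $1 == "maxmemory"   { max = $2 }
    END {
      # maxmemory 0 means no limit is configured.
      if (max + 0 == 0) { print "unlimited"; exit }
      printf "%d\n", (used * 100) / max
    }'
}

# Usage (assumes redis-cli can reach the instance):
#   redis-cli -h <redis_master_ip> -p 6379 info memory | mem_usage_pct
```

A value that stays close to 100 suggests the limit is too tight for the workload; "unlimited" means Redis is bounded only by the host’s RAM, so the kernel OOM killer is the more likely culprit.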
Cause 2: Network Partition Between Sentinel and Master
Diagnosis:
From a Sentinel machine, try to ping the master’s IP address directly.
ping <redis_master_ip>
If ping fails, or if redis-cli -h <redis_master_ip> -p 6379 ping returns Could not connect to Redis, there’s a network issue. Check firewall rules on both the master and Sentinel servers, and any network devices in between.
On the master server, check its network interface status:
ip addr show
And check routes:
ip route show
Fix:
- Firewall Rules: Ensure that port 6379 (or your Redis port) is open for TCP traffic between the Sentinel(s) and the master.
On ufw (Ubuntu/Debian):
sudo ufw allow from <sentinel_ip> to any port 6379
sudo ufw allow from <redis_master_ip> to any port 6379 # If master needs to reach sentinel
On firewalld (CentOS/RHEL):
sudo firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="<sentinel_ip>/32" port port="6379" protocol="tcp" accept'
sudo firewall-cmd --reload
- Routing/Network Configuration: Correct any misconfigured network interfaces, routes, or DNS issues that prevent communication.
Why it works: Sentinels monitor the master by sending PING commands over the network. If network connectivity is lost, Sentinels incorrectly assume the master is down. Restoring the network path or opening the firewall ports lets Sentinels communicate with the master again.
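ICMP ping and TCP reachability can disagree: a firewall may drop ICMP while allowing port 6379, or the reverse. A quick TCP-level probe from the Sentinel host can be sketched as below. check_tcp is a hypothetical helper, and it assumes bash (for /dev/tcp) and the coreutils timeout command are installed.

```shell
# Sketch: test whether a TCP port accepts connections.
# check_tcp is a hypothetical helper; it relies on bash's /dev/tcp and coreutils' timeout.
check_tcp() {
  # $1 = host, $2 = port, $3 = timeout in seconds (defaults to 2)
  timeout "${3:-2}" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Usage, run from a Sentinel host:
#   check_tcp <redis_master_ip> 6379 && echo "port reachable" || echo "port blocked or host down"
```

The function only proves that a TCP handshake completes; it does not confirm that Redis itself answers, so redis-cli ping remains the authoritative check.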
Cause 3: Master Instance is Overloaded and Not Responding to PINGs
Diagnosis: If the master is still running but unresponsive, check its CPU and network load. On the master server:
top -b -n 1 -c -i
Look for Redis processes consuming near 100% CPU. Also, check network traffic:
sar -n DEV 1 5
If Redis is consistently maxing out CPU or network bandwidth, it might not be able to respond to Sentinel’s PING commands in time.
Fix:
- Optimize Redis Workload: Identify slow commands or excessive traffic patterns. Use redis-cli --latency -h <redis_master_ip> -p 6379 to check latency.
- Scale Up/Out: Increase the master’s resources (CPU, RAM, network) or offload read traffic to replicas if applicable.
- Adjust Sentinel Timeout: Increase down-after-milliseconds in the Sentinel configuration if the overload is temporary and acceptable. In sentinel.conf:
sentinel down-after-milliseconds mymaster 60000
This tells Sentinel to consider the master down only after it has not responded for 60 seconds (the default is 30000 ms). Apply the change by restarting Sentinel, or set it at runtime on each Sentinel with redis-cli -p 26379 sentinel set mymaster down-after-milliseconds 60000.
Why it works: High load can prevent Redis from processing PING requests from Sentinels within the configured timeout. Either by reducing the load, increasing master capacity, or giving Sentinel more time to wait, the master becomes responsive again from Sentinel’s perspective.
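Before changing the timeout, it helps to confirm what each Sentinel is currently using. When its output is piped, redis-cli prints the SENTINEL MASTER reply as alternating name and value lines, so one field can be extracted with a small filter. This is a sketch under that assumption; sentinel_field is a hypothetical helper name.

```shell
# Sketch: pull one field out of `SENTINEL MASTER <name>` output read on stdin.
# sentinel_field is a hypothetical helper; it assumes alternating name/value lines.
sentinel_field() {
  # $1 = field name, e.g. down-after-milliseconds
  # Names sit on odd-numbered lines; the value is the line that follows.
  tr -d '\r' | awk -v key="$1" '
    NR % 2 == 1 && $0 == key { if ((getline v) > 0) print v; exit }'
}

# Usage (mymaster is the master name used in this article):
#   redis-cli -p 26379 sentinel master mymaster | sentinel_field down-after-milliseconds
#   redis-cli -p 26379 sentinel master mymaster | sentinel_field flags
```

Running this against every Sentinel shows whether they agree on the timeout; a Sentinel with a much lower value will flag the master as down sooner than its peers.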
Cause 4: Sentinel Configuration Errors (Incorrect Master IP/Port, Wrong Quorum)
Diagnosis:
Examine the Sentinel configuration file (sentinel.conf on each Sentinel instance).
Look for the sentinel monitor <master-name> <ip> <port> <quorum> directive. Ensure the <ip> and <port> correctly point to the current master’s address.
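Check the <quorum> value. Quorum is the number of Sentinels that must agree the master is unreachable before it is marked as objectively down. If it is higher than the number of Sentinels that are actually running and able to reach each other, they cannot agree that the master has failed and no failover starts.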
Fix:
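- Correct Master Details: Update sentinel.conf with the correct IP and port for the master.
sentinel monitor mymaster 192.168.1.100 6379 2
After editing, restart the Sentinel process so it picks up the change. To change the target at runtime instead, run redis-cli -p 26379 -h <sentinel_ip> sentinel remove mymaster followed by redis-cli -p 26379 -h <sentinel_ip> sentinel monitor mymaster 192.168.1.100 6379 2.
- Adjust Quorum: Ensure quorum is less than or equal to the number of Sentinels you have running. If you have 3 Sentinels, a quorum of 2 is usually appropriate.
sentinel monitor mymaster 192.168.1.100 6379 2
Restart Sentinel, or apply the new quorum at runtime with redis-cli -p 26379 -h <sentinel_ip> sentinel set mymaster quorum 2.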
Why it works: Sentinels need accurate information to monitor the correct master and agree on its state. Incorrect IP/port means they’re looking at the wrong target, and an incorrect quorum can prevent them from reaching consensus on a failover, even if the master is truly down.
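Two thresholds are involved: the configured quorum decides when the master is flagged as down, and a majority of all known Sentinels must additionally authorize the failover. The arithmetic can be sketched as follows; can_failover is a hypothetical helper, and the three numbers are ones you supply yourself.

```shell
# Sketch: check whether enough Sentinels are reachable for detection and failover.
# can_failover is a hypothetical helper.
# $1 = total Sentinels configured, $2 = Sentinels currently reachable, $3 = quorum
can_failover() {
  majority=$(( $1 / 2 + 1 ))
  if [ "$2" -lt "$3" ]; then
    echo "no: only $2 reachable, quorum is $3"
    return 1
  fi
  if [ "$2" -lt "$majority" ]; then
    echo "no: only $2 reachable, majority is $majority"
    return 1
  fi
  echo "ok"
}

# Usage:
#   can_failover 3 2 2    # three Sentinels, two reachable, quorum 2
```

Sentinel also ships a built-in version of this check: redis-cli -p 26379 sentinel ckquorum mymaster reports whether the current deployment can reach quorum and the majority needed to authorize a failover.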
Cause 5: Master Redis Version Incompatibility with Sentinel Version
Diagnosis:
Check the Redis and Sentinel versions.
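On the master: redis-cli -h <redis_master_ip> -p 6379 info server | grep redis_version
On the Sentinel: redis-cli -h <sentinel_ip> -p 26379 info server | grep redis_version
Note that redis-cli --version only reports the version of the client binary, not of the server it connects to.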
While Redis Sentinel is generally robust across versions, very old versions of Sentinel might have issues communicating with very new Redis masters, or vice-versa, especially if there were significant protocol changes.
Fix: Upgrade both Redis master and Sentinel instances to the same, recent stable version.
- Upgrade Redis Master: Follow standard Redis upgrade procedures, typically involving downloading the new version, stopping the old instance, configuring the new one, and starting it.
- Upgrade Sentinel: Similarly, update the Sentinel binary and configuration, then restart the Sentinel process.
Why it works: Ensures a consistent communication protocol between Redis instances and their monitors, preventing subtle bugs or misunderstandings that could lead to false positives or failed failovers.
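Comparing versions across several hosts is easier with the version string isolated. A minimal sketch, assuming the redis_version field that INFO server prints on both Redis and Sentinel instances; the function name is a hypothetical helper.

```shell
# Sketch: extract the server version from `INFO server` output read on stdin.
# redis_version is a hypothetical helper name.
redis_version() {
  # INFO lines end in CRLF, so strip the carriage return first.
  tr -d '\r' | awk -F: '$1 == "redis_version" { print $2; exit }'
}

# Usage:
#   master_v=$(redis-cli -h <redis_master_ip> -p 6379 info server | redis_version)
#   sentinel_v=$(redis-cli -h <sentinel_ip> -p 26379 info server | redis_version)
#   [ "$master_v" = "$sentinel_v" ] || echo "version mismatch: $master_v vs $sentinel_v"
```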
Cause 6: DNS Resolution Issues for Master Hostname
Diagnosis: If your Sentinel configuration uses a hostname for the master instead of an IP address:
sentinel monitor mymaster redis.example.com 6379 2
On the Sentinel machine, try resolving the hostname:
nslookup redis.example.com
dig redis.example.com
If these fail, or if they return an incorrect IP address, DNS is the problem.
Fix:
- Correct DNS Records: Update the DNS record for the master hostname to point to the correct IP address.
- Check Sentinel’s DNS Server: Ensure the Sentinel machine is configured to use a reliable DNS server. Check /etc/resolv.conf on the Sentinel machine.
- Use IP Address: As a workaround, change sentinel.conf to use the master’s IP address directly, then restart Sentinel so the change takes effect.
Why it works: When configured with a hostname, Sentinels rely on DNS to find the master. If the hostname doesn’t resolve to the correct IP, the Sentinel cannot reach the master it’s supposed to be monitoring.
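The comparison between what DNS returns and the address you expect can be scripted. The sketch below filters dig +short output, which may list CNAME targets before the addresses, down to the first IPv4 address. first_ipv4 is a hypothetical helper name.

```shell
# Sketch: print the first IPv4 address found in `dig +short <hostname>` output on stdin.
# first_ipv4 is a hypothetical helper; CNAME lines and IPv6 addresses are skipped.
first_ipv4() {
  grep -E '^[0-9]{1,3}(\.[0-9]{1,3}){3}$' | head -n 1
}

# Usage, run on the Sentinel machine:
#   resolved=$(dig +short redis.example.com | first_ipv4)
#   [ "$resolved" = "<redis_master_ip>" ] && echo "DNS matches" || echo "DNS mismatch: $resolved"
```

An empty result means the name did not resolve to any IPv4 address from that machine, which points at the resolver configuration rather than at the record itself.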
Once this is fixed, the next events you are likely to see in the Sentinel log are +sdown master <master-name> <ip> <port>, followed by +odown master <master-name> <ip> <port> and, if a failover completes, +switch-master <master-name> <old-ip> <old-port> <new-ip> <new-port>.