Redis Sentinel is reporting itself as not master or not connected to the master, leaving your Redis deployment in a degraded state.
Common Causes and Fixes

Network Partition Between Sentinel and Master:
- Diagnosis: From the Sentinel server, `ping` the master’s IP address, then test the Redis port itself with `redis-cli -h <master-ip> -p <master-port> PING`. To see the master’s state as Sentinel sees it, run `redis-cli -p 26379 INFO sentinel` against the Sentinel. If the Sentinel cannot reach the master, you’ll see connection errors in the Sentinel logs (`/var/log/redis/sentinel.log` or similar).
- Fix: Ensure firewalls (e.g., `ufw`, `iptables`) allow traffic on the Redis port (default 6379) and the Sentinel port (default 26379) between the relevant IPs. For example, on the master: `sudo ufw allow from <sentinel-ip> to any port 6379`. On each Sentinel, allow the other Sentinels and your clients to reach it: `sudo ufw allow from <peer-sentinel-ip> to any port 26379`.
- Why it works: Sentinel connects to the master on the master’s Redis port and to the other Sentinels on the Sentinel port. If this traffic is blocked, the Sentinel cannot verify the master’s status.
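
The connectivity check above can be scripted. The following is a minimal sketch, assuming bash (for its `/dev/tcp` pseudo-device) and the coreutils `timeout` command; the helper names `port_open` and `report` are hypothetical. It only proves that a TCP connection can be opened, not that Redis answers, so follow it with a `redis-cli ... PING`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a TCP connection to HOST:PORT opens within 2 seconds.
# Uses bash's /dev/tcp pseudo-device, so no extra tools such as nc are required.
port_open() {
  local host="$1" port="$2"
  timeout 2 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# report LABEL HOST PORT: print one line saying whether HOST:PORT accepted a connection.
report() {
  if port_open "$2" "$3"; then
    echo "$1 $2:$3 reachable"
  else
    echo "$1 $2:$3 UNREACHABLE"
  fi
}
```

Run from a Sentinel host, `report master <master-ip> 6379` and `report sentinel <peer-sentinel-ip> 26379` cover the two paths that firewalls most often block.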

Incorrect Sentinel Configuration (`sentinel.conf`):
- Diagnosis: Examine the `sentinel.conf` file on your Sentinel instances. Look for discrepancies in the `sentinel monitor <master-name> <master-ip> <master-port> <quorum>` directive. The `<master-name>` must be identical across all Sentinels monitoring the same master. The `<quorum>` must be set correctly (e.g., 2 for a 3-node Sentinel setup).
- Fix: Correct the `sentinel.conf` file to match the master’s IP and port, and ensure the master name is consistent. For example, if your master is at `192.168.1.100:6379` and you want to call it `mymaster` with a quorum of 2, the directive is `sentinel monitor mymaster 192.168.1.100 6379 2`. Then restart the Sentinel process: `sudo systemctl restart redis-sentinel`.
- Why it works: The `sentinel monitor` directive is how Sentinel learns about the master it’s supposed to be watching. Mismatches lead to Sentinels not recognizing or communicating with the correct master.
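
If you can gather the `sentinel.conf` files from every Sentinel in one place, a mismatch in the `sentinel monitor` directive is easy to spot mechanically. This is a rough sketch, assuming each Sentinel monitors a single master; the function names `distinct_monitor_lines` and `monitors_consistent` are hypothetical.

```shell
# Print the distinct `sentinel monitor` directives found in the given config files,
# with whitespace normalized. Commented-out directives are ignored because their
# first field is "#", not "sentinel".
distinct_monitor_lines() {
  awk '$1 == "sentinel" && $2 == "monitor" { print $1, $2, $3, $4, $5, $6 }' "$@" | sort -u
}

# Succeed only if every file names the same master, address, port, and quorum.
monitors_consistent() {
  [ "$(distinct_monitor_lines "$@" | wc -l | tr -d ' ')" -eq 1 ]
}
```

When `monitors_consistent` fails, the output of `distinct_monitor_lines` shows the differing directives side by side.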

Master Redis Server is Actually Down or Unresponsive:
- Diagnosis: Try connecting to the master Redis server directly using `redis-cli -h <master-ip> -p <master-port> PING`. If it doesn’t respond with `PONG`, the master is indeed down. Check the master Redis logs for any errors.
- Fix: Investigate why the master Redis process is not running or is unresponsive. This could involve checking system logs (`/var/log/syslog`, `journalctl`), memory usage (`free -m`), or disk space (`df -h`). Restart the master Redis process: `sudo systemctl restart redis-server`.
- Why it works: Sentinel’s primary job is to monitor the master. If the master is truly unavailable, Sentinel will correctly report it as such and attempt a failover.
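
To confirm what Sentinel itself has concluded, `SENTINEL master <master-name>` returns field/value pairs that include a `flags` field. It contains `s_down` when that Sentinel considers the master subjectively down and `o_down` once enough Sentinels agree. The sketch below pulls that value out of the command’s piped output; the helper name `master_flags` is hypothetical, and the usage line assumes a Sentinel on port 26379 and a master named `mymaster`.

```shell
# Read `SENTINEL master <name>` output from stdin (one element per line, which is
# how redis-cli prints replies when its output is piped) and print the value that
# follows the "flags" field.
master_flags() {
  awk 'prev == "flags" { print; exit } { prev = $0 }'
}
```

For example: `redis-cli -p 26379 SENTINEL master mymaster | master_flags`. A healthy master prints `master`; a failed one prints something like `s_down,o_down,master`.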

Master Redis Server Overloaded or Crashing:
- Diagnosis: Monitor the master Redis server’s CPU, memory, and I/O. High load can cause Redis to become unresponsive, leading Sentinel to believe it’s down. Use `top`, `htop`, `iotop`, and `redis-cli --bigkeys` or `INFO memory` to check for issues. Look for `OOM` (Out Of Memory) errors in the master’s Redis logs and for OOM-killer entries in the kernel log (`dmesg`).
- Fix: Optimize Redis performance: reduce memory usage by improving data structures or expiring keys, upgrade hardware, or tune Redis configuration parameters like `maxmemory` and `maxmemory-policy` (e.g., `volatile-lru` or `allkeys-lru`). Ensure `appendfsync` is not set to `always` if performance is critical and losing the most recent writes on a crash is acceptable.
- Why it works: An overloaded or memory-starved Redis instance cannot respond to PINGs or Sentinel’s requests in time, triggering a false positive for master failure.
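
To turn the `INFO memory` check into a number you can alert on, compare `used_memory` with `maxmemory`. This is a minimal sketch; the function name `memory_usage_pct` is hypothetical, and it reads the command output from stdin so it can be tried on saved output as well.

```shell
# Read `INFO memory` output from stdin and print used_memory as a whole-number
# percentage of maxmemory, or "unlimited" when maxmemory is 0 (no limit configured).
# INFO lines end in CRLF, so carriage returns are stripped first.
memory_usage_pct() {
  tr -d '\r' | awk -F: '
    $1 == "used_memory" { used = $2 }
    $1 == "maxmemory"   { limit = $2 }
    END {
      if (limit + 0 == 0) print "unlimited"
      else printf "%d\n", used * 100 / limit
    }'
}
```

For example: `redis-cli -h <master-ip> -p <master-port> INFO memory | memory_usage_pct`. A value that stays close to 100 suggests the eviction policy or the memory limit needs attention.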

Sentinel Clock Skew:
- Diagnosis: Ensure all Sentinel nodes and the master node have synchronized clocks. Use `ntpdate -q <ntp-server-ip>` or `timedatectl status` on each node. Significant clock differences (more than a few seconds) or sudden clock jumps can disrupt Sentinel’s status checks and failover timing.
- Fix: Configure NTP on all Redis and Sentinel nodes. For example, ensure `ntpd` or `chronyd` is running and configured to sync with reliable time servers.
- Why it works: Sentinel judges the health of the master and of other Sentinels by comparing the time of the last valid reply against its local clock. An unstable or jumping clock can push a Sentinel into TILT mode, where it stops acting on what it observes, which leads to incorrect state reporting and stalled failovers.
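
For a quick cross-node comparison without NTP tooling, you can compare Unix timestamps directly. This is only a rough sketch: the helper name `abs_offset` is hypothetical, and fetching the remote time over SSH adds up to a second or so of error, so only multi-second offsets are meaningful.

```shell
# Print the absolute difference, in seconds, between two Unix timestamps.
abs_offset() {
  local diff=$(( $1 - $2 ))
  echo "${diff#-}"   # strip a leading minus sign, if any
}
```

For example: `abs_offset "$(date +%s)" "$(ssh <node> date +%s)"`, run once per remote node.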

Sentinel Process Crashed or Not Running:
- Diagnosis: Check if the Sentinel process is running on the Sentinel nodes: `ps aux | grep redis-sentinel`. Also, check Sentinel logs for startup errors.
- Fix: Ensure the Sentinel service is enabled and started: `sudo systemctl enable redis-sentinel` and `sudo systemctl start redis-sentinel`. If it’s crashing, examine `syslog` or `journalctl -u redis-sentinel` for clues.
- Why it works: If the Sentinel process isn’t running, it can’t monitor anything, and the master will appear as "not connected" from the perspective of clients trying to connect through Sentinel.

High Latency Between Sentinels:
- Diagnosis: If Sentinels cannot communicate with each other reliably due to network latency or packet loss, they may disagree on the master’s status or fail to elect a leader. Use `ping` and `mtr` between Sentinel nodes.
- Fix: Investigate and resolve network issues causing high latency or packet loss between your Sentinel nodes. This might involve network hardware checks, routing optimization, or ensuring sufficient bandwidth.
- Why it works: Sentinel relies on consensus among its peers. If they can’t talk to each other effectively, they can’t agree on the state of the master, leading to instability and "not connected" states.

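
The consensus requirement comes down to two numbers: the configured quorum, which is how many Sentinels must agree before the master is marked objectively down, and a majority of all Sentinels, which is needed to authorize one of them to run the failover. The sketch below works through that arithmetic; the function names `majority` and `can_failover` are hypothetical.

```shell
# Majority of N Sentinels: the minimum number needed to elect a failover leader.
majority() {
  echo $(( $1 / 2 + 1 ))
}

# can_failover TOTAL REACHABLE QUORUM
# Succeed when the Sentinels that can still talk to each other are enough both to
# reach the quorum and to form a majority of all TOTAL Sentinels.
can_failover() {
  local total="$1" reachable="$2" quorum="$3"
  [ "$reachable" -ge "$quorum" ] && [ "$reachable" -ge "$(majority "$total")" ]
}
```

With 3 Sentinels and a quorum of 2, losing contact with one Sentinel still allows a failover (`can_failover 3 2 2` succeeds). With 5 Sentinels and a quorum of 2, a partition that leaves only 2 in contact can mark the master down but cannot fail over (`can_failover 5 2 2` fails).
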
The next error you’ll likely encounter after fixing Sentinel issues is a client being unable to connect to Redis because the Sentinel hasn’t successfully promoted a replica to master, or the client configuration points to the wrong Sentinel endpoint.