Redis Sentinel is a fascinating piece of kit, and its automatic failover setup is less about magic and more about a distributed consensus problem that’s surprisingly elegant. The most surprising thing is that no single Sentinel’s failure detection is trusted on its own; the master is only treated as down once a quorum of Sentinels agrees that it is unreachable.

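Let’s see it in action. Imagine a master Redis instance (redis-master) and two Sentinel instances (redis-sentinel-1, redis-sentinel-2) watching it. A failover also needs at least one replica of the master to promote, so assume one is attached. Two Sentinels keep the example small; the Redis documentation recommends at least three for a robust deployment.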

Scenario Setup:

  • Master Redis:

    redis-server /etc/redis/6379.conf --port 6379
    

    (Assuming a standard redis.conf for master, listening on 127.0.0.1:6379)

  • Sentinel 1:

    redis-server /etc/redis/sentinel.conf --sentinel --port 26379
    

    /etc/redis/sentinel.conf:

    port 26379
    dir /tmp
    sentinel monitor mymaster 127.0.0.1 6379 2
    sentinel down-after-milliseconds mymaster 5000
    sentinel failover-timeout mymaster 10000
    sentinel parallel-syncs mymaster 1
    
  • Sentinel 2:

    redis-server /etc/redis/sentinel-2.conf --sentinel --port 26380
    

    /etc/redis/sentinel-2.conf (identical to Sentinel 1’s config, but on port 26380)

The "Monitoring" Dance:

Sentinels constantly ping their configured masters. If a Sentinel gets no valid reply for down-after-milliseconds (5 seconds in our example), it marks the master as Subjectively Down (SDOWN). SDOWN is purely local: it is one Sentinel’s opinion, whether that is Sentinel 1 or Sentinel 2.

But one Sentinel’s opinion isn’t enough for failover. Sentinel 1 will then start asking other Sentinels (Sentinel 2 in this case) if they also think the master is down. If enough Sentinels agree, the master is marked as Objectively Down (ODOWN). "Enough" is the quorum: the final argument, 2, in sentinel monitor mymaster 127.0.0.1 6379 2.
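The SDOWN and ODOWN rules can be modelled in a few lines. This is only an illustrative sketch, not Sentinel’s actual implementation: the SentinelView class and is_odown function are hypothetical names, and time is passed in explicitly so the logic is easy to follow.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class SentinelView:
    """One Sentinel's local view of the master (hypothetical model)."""
    name: str
    last_valid_reply_ms: int = 0
    down_after_ms: int = 5000  # mirrors down-after-milliseconds

    def is_sdown(self, now_ms: int) -> bool:
        # SDOWN: no valid reply for longer than down-after-milliseconds.
        return now_ms - self.last_valid_reply_ms > self.down_after_ms


def is_odown(views: Iterable[SentinelView], now_ms: int, quorum: int) -> bool:
    # ODOWN: at least `quorum` Sentinels each report SDOWN.
    agreeing = sum(1 for view in views if view.is_sdown(now_ms))
    return agreeing >= quorum


s1 = SentinelView("redis-sentinel-1", last_valid_reply_ms=0)
s2 = SentinelView("redis-sentinel-2", last_valid_reply_ms=4000)

# At t=6000 ms only Sentinel 1 has waited long enough, so the quorum of 2 is not met.
print(is_odown([s1, s2], now_ms=6000, quorum=2))
# At t=10000 ms both Sentinels report SDOWN, so the master is ODOWN.
print(is_odown([s1, s2], now_ms=10000, quorum=2))
```

With a quorum of 2 and two Sentinels, both must lose contact with the master before anything further happens.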

The Failover Process:

Once a master is ODOWN, one Sentinel is elected the Leader Sentinel. This leader is responsible for orchestrating the failover. It will:

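  1. Find a good replica: The leader scans all known replicas of the master and picks the "best" one to promote. Candidates are ranked by replica-priority first (lower is preferred, and 0 means never promote), then by replication offset, where the highest offset wins because that replica has received the most data from the old master. Ties are broken by the lexicographically smallest run ID.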
  2. Promote a replica: The leader sends a REPLICAOF NO ONE command to the chosen replica. This turns the replica into a new master.
  3. Reconfigure other replicas: The leader then tells all other former replicas to replicate the new master using REPLICAOF <new_master_ip> <new_master_port>.
  4. Update configuration: All Sentinels update their internal configuration to point to the new master.
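The replica choice in step 1 can be sketched as a sort key. The ordering below follows the one described in the Redis Sentinel documentation (lower replica-priority, then higher replication offset, then smaller run ID); the Replica class and pick_replica function are hypothetical names, and the real Sentinel also filters out replicas that are disconnected or too stale, which this sketch leaves out.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Replica:
    """Hypothetical summary of what the leader knows about a replica."""
    run_id: str
    priority: int      # replica-priority; 0 means "never promote"
    repl_offset: int   # how much of the master's stream it has processed


def pick_replica(replicas: Iterable[Replica]) -> Optional[Replica]:
    # Replicas with priority 0 are excluded from promotion.
    candidates = [r for r in replicas if r.priority != 0]
    if not candidates:
        return None
    # Lower priority first, then HIGHER offset, then smaller run ID.
    return min(candidates, key=lambda r: (r.priority, -r.repl_offset, r.run_id))


replicas = [
    Replica(run_id="aaa", priority=100, repl_offset=500),
    Replica(run_id="bbb", priority=100, repl_offset=900),
    Replica(run_id="ccc", priority=0, repl_offset=9999),
]
# "bbb" is chosen: same priority as "aaa" but more data; "ccc" is excluded.
print(pick_replica(replicas).run_id)
```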

The Client’s Role:

Applications need to be Sentinel-aware. They connect to any Sentinel instance and ask for the current master for a given service name (mymaster), typically with SENTINEL get-master-addr-by-name mymaster. The Sentinel responds with the IP and port of the current master. If the application receives an error (e.g., a READONLY error because the old master has been demoted to a replica, or a connection refused), it should ask the Sentinel again for the master’s address. This is how clients automatically switch to the new master after a failover.
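The rediscovery loop can be sketched without any Redis library. In this sketch resolve_master and operation are hypothetical callables supplied by the caller: in a real application the first would issue SENTINEL get-master-addr-by-name mymaster against a Sentinel, and the second would run a command against the returned address. ReadOnlyError is a stand-in for whatever exception your client raises on a READONLY reply.

```python
import time
from typing import Callable, Optional, Tuple

Address = Tuple[str, int]


class ReadOnlyError(Exception):
    """Stand-in for a READONLY reply from a demoted master (hypothetical)."""


def run_on_master(resolve_master: Callable[[], Address],
                  operation: Callable[[Address], str],
                  retries: int = 5,
                  delay_s: float = 0.0) -> str:
    last_error: Optional[Exception] = None
    for _ in range(retries):
        # Always ask Sentinel again; the answer changes after a failover.
        address = resolve_master()
        try:
            return operation(address)
        except (ConnectionError, ReadOnlyError) as exc:
            last_error = exc
            time.sleep(delay_s)  # give the failover time to complete
    raise RuntimeError("no writable master found") from last_error


# Simulated failover: Sentinel first reports the old master, then the new one.
answers = iter([("127.0.0.1", 6379), ("127.0.0.1", 6380)])


def fake_operation(address: Address) -> str:
    if address[1] == 6379:
        raise ConnectionError("connection refused")
    return f"OK from port {address[1]}"


print(run_on_master(lambda: next(answers), fake_operation))
```

Most client libraries ship this logic already (redis-py, for example, has a redis.sentinel.Sentinel class), so a hand-rolled loop like this is mainly useful for understanding what the library does.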

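The sentinel monitor mymaster 127.0.0.1 6379 2 line is crucial. The 2 is the quorum. This means at least 2 Sentinels must agree the master is down before it is marked ODOWN. The quorum only governs failure detection, though: to actually carry out the failover, the leader must also be authorized by a majority of all known Sentinels. With a quorum of 2 and only one Sentinel running, the quorum would never be reached and failover would never happen. The same is true in our two-Sentinel setup if either Sentinel goes down, which is why at least three Sentinels are recommended.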
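It helps to see the two thresholds side by side. The quorum decides when the master counts as ODOWN, while starting the failover additionally needs votes from a majority of all known Sentinels. The function names below are hypothetical; the arithmetic is the point.

```python
def can_mark_odown(agreeing: int, quorum: int) -> bool:
    # Failure detection: enough Sentinels agree the master is down.
    return agreeing >= quorum


def can_authorize_failover(total_sentinels: int, votes: int, quorum: int) -> bool:
    # Authorization: a majority of ALL known Sentinels, and never fewer than the quorum.
    majority = total_sentinels // 2 + 1
    return votes >= max(majority, quorum)


# Two Sentinels, quorum 2 (the setup in this article): both must be up and agree.
print(can_authorize_failover(total_sentinels=2, votes=2, quorum=2))
print(can_authorize_failover(total_sentinels=2, votes=1, quorum=2))

# Five Sentinels, quorum 2: two can declare ODOWN, but three votes are needed to fail over.
print(can_mark_odown(agreeing=2, quorum=2))
print(can_authorize_failover(total_sentinels=5, votes=2, quorum=2))
print(can_authorize_failover(total_sentinels=5, votes=3, quorum=2))
```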

The sentinel parallel-syncs mymaster 1 setting determines how many replicas can be reconfigured to sync with the new master concurrently. Setting it to 1 means replicas will be reconfigured one by one, which is slower but keeps the others available in the meantime. That matters because a replica may briefly stop serving reads while it loads the bulk data from the new master, so a higher number speeds up the process at the cost of having more replicas unavailable at the same moment.

What most people miss is how the Sentinels themselves agree on who leads a failover. The leader election is similar to the one in the Raft consensus algorithm, but much simpler, and it is built on configuration epochs. When a failover is needed, a Sentinel that sees the master as ODOWN increments the epoch and asks the others for their vote. Each Sentinel votes at most once per epoch, on a first-come, first-served basis. A Sentinel becomes leader only if it collects votes from a majority of the Sentinels (and at least as many as the quorum); only then does it go on to promote a replica. If multiple Sentinels try to become leader simultaneously, the vote can split so that nobody wins that epoch. This is not a deadlock: the candidates back off and retry in a later epoch.
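A toy model makes the one-vote-per-epoch rule concrete. This is a simplified sketch under stated assumptions, not Sentinel’s real protocol: SentinelVoter and elect_leader are hypothetical names, vote requests are given as an explicit arrival order, and timeouts and networking are ignored.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


class SentinelVoter:
    """Hypothetical Sentinel that grants one vote per epoch."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.votes: Dict[int, str] = {}  # epoch -> candidate voted for

    def request_vote(self, epoch: int, candidate: str) -> str:
        # First request in an epoch gets the vote; later requests are
        # answered with the candidate that already holds it.
        return self.votes.setdefault(epoch, candidate)


def elect_leader(voters: List[SentinelVoter],
                 requests: List[Tuple[str, str]],
                 epoch: int,
                 quorum: int) -> Optional[str]:
    """requests: (voter_name, candidate) pairs in arrival order."""
    by_name = {voter.name: voter for voter in voters}
    for voter_name, candidate in requests:
        by_name[voter_name].request_vote(epoch, candidate)
    tally = Counter(v.votes[epoch] for v in voters if epoch in v.votes)
    if not tally:
        return None
    winner, count = tally.most_common(1)[0]
    majority = len(voters) // 2 + 1
    return winner if count >= max(majority, quorum) else None


voters = [SentinelVoter(n) for n in ("s1", "s2", "s3", "s4")]

# Epoch 1: s1 and s3 both stand; each votes for itself and wins one other vote.
split = [("s1", "s1"), ("s3", "s3"), ("s2", "s1"), ("s4", "s3")]
print(elect_leader(voters, split, epoch=1, quorum=2))  # split vote, no leader

# Epoch 2: after backing off, only s3 stands and collects a majority.
retry = [("s3", "s3"), ("s1", "s3"), ("s2", "s3")]
print(elect_leader(voters, retry, epoch=2, quorum=2))
```

Because a vote is tied to an epoch, a failed round never blocks the next one: a new epoch gives every Sentinel a fresh vote.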

The next hurdle is usually configuring Sentinel-aware clients correctly, or understanding the nuances of connection pooling with Sentinel.
