What happens if a master node fails in a Redis Cluster?
Jul 13, 2025, 12:16 AM
Redis Cluster handles master node failure through automatic detection, replica promotion, and client redirection.
1. Nodes detect the failure via the gossip protocol, marking the node as PFAIL and then FAIL if a majority of masters agree.
2. Eligible replicas request votes, and the winner becomes the new master, taking over writes and slot ownership.
3. The cluster does not automatically rebalance slots or replace lost replicas; that requires manual intervention.
4. Failover may result in data loss because replication is asynchronous, though settings like min-replicas-to-write can mitigate the risk.
When a Redis Cluster is running normally, all nodes (both master and replica) communicate using the Redis Cluster Bus. If a master node fails, Redis Cluster has built-in mechanisms to detect the failure, promote a replica to become the new master, and redistribute traffic accordingly.
Here’s how it works in more detail:
1. Failure Detection
Redis Cluster nodes constantly ping each other to check health. If a master node stays unreachable (due to a crash, network partition, etc.) for longer than the configured node timeout, other nodes mark it as PFAIL (possible failure). If enough master nodes report the same thing within a limited time window, the status changes to FAIL.
- Nodes use a gossip protocol to share information.
- At least (N/2) + 1 master nodes (a majority of the N masters) must acknowledge the failure for the cluster to proceed with failover.
This mechanism prevents false positives due to temporary issues or short network hiccups.
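The majority rule above can be sketched in a few lines of Python. This is a toy model rather than Redis's actual implementation: the names `failure_quorum` and `FailureTracker` are made up for illustration, and it ignores details such as timeouts and the expiry of old failure reports.

```python
from dataclasses import dataclass, field


def failure_quorum(master_count: int) -> int:
    """Smallest number of masters that forms a majority: (N/2) + 1."""
    return master_count // 2 + 1


@dataclass
class FailureTracker:
    """Toy model of how a suspected node moves from PFAIL to FAIL."""
    master_count: int
    reporters: set = field(default_factory=set)  # masters that flagged the node

    def report(self, master_id: str) -> str:
        """Record one master's failure report and return the resulting state."""
        self.reporters.add(master_id)
        if len(self.reporters) >= failure_quorum(self.master_count):
            return "FAIL"
        return "PFAIL"


tracker = FailureTracker(master_count=5)
states = [tracker.report(m) for m in ("m1", "m2", "m3")]
print(states)  # ['PFAIL', 'PFAIL', 'FAIL']
```

With five masters the quorum is three, so the node is only marked FAIL once the third master reports it. A single node with a flaky link cannot trigger a failover on its own.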
2. Replica Election and Failover
Once a master is marked as FAIL, one of its replicas is promoted to be the new master.
- Replicas eligible for promotion are those that have successfully replicated data recently.
- The cluster uses a voting system: replicas request votes from the master nodes, and the first replica to collect votes from a majority of masters wins.
- After promotion, the new master starts accepting writes, and the cluster updates its internal state.
Clients that support cluster mode find the new master automatically, either by following MOVED redirections from other nodes or by refreshing their slot map after connection errors.
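To make the redirection step concrete, here is a minimal sketch of how a cluster-aware client might react to a `MOVED` error, which has the form `MOVED <slot> <host>:<port>`. The `Redirect` type, the `parse_moved` helper, and the `slot_owner` dictionary are hypothetical names for illustration; real client libraries typically refresh their whole slot map rather than patching one entry.

```python
from typing import NamedTuple, Optional


class Redirect(NamedTuple):
    slot: int
    host: str
    port: int


def parse_moved(error_message: str) -> Optional[Redirect]:
    """Parse a 'MOVED <slot> <host>:<port>' error; return None for anything else."""
    parts = error_message.split()
    if len(parts) != 3 or parts[0] != "MOVED":
        return None
    host, _, port = parts[2].rpartition(":")
    return Redirect(slot=int(parts[1]), host=host, port=int(port))


# Hypothetical client-side cache: slot number -> (host, port) of the owning master.
slot_owner = {3999: ("127.0.0.1", 6379)}

redirect = parse_moved("MOVED 3999 127.0.0.1:6381")
if redirect is not None:
    # Remember the new owner so later commands for this slot go straight to it.
    slot_owner[redirect.slot] = (redirect.host, redirect.port)

print(slot_owner[3999])  # ('127.0.0.1', 6381)
```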
3. Cluster Rebalancing (Not Immediate)
After failover, the cluster doesn’t immediately rebalance slots or create new replicas. That means:
- The affected master now has one fewer replica than before, because one of them was promoted.
- Slot ownership remains with the new master.
- You’ll need to manually add a new replica later if desired.
This helps avoid unnecessary overhead during transient failures but leaves room for human intervention.
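One way to spot where a replica is missing is to inspect the output of `CLUSTER NODES`. The sketch below parses that text and lists slot-owning masters that have no replica attached. The function name and the sample output are invented for illustration (node IDs are shortened and timestamps zeroed), so treat it as a starting point rather than a monitoring tool.

```python
from typing import List


def masters_without_replicas(cluster_nodes_output: str) -> List[str]:
    """Return addresses of slot-owning masters that no healthy replica points to."""
    masters = {}        # node id -> "host:port"
    replicated = set()  # ids of masters that have at least one replica
    for line in cluster_nodes_output.strip().splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue  # not a complete node line
        node_id, address, master_id = fields[0], fields[1], fields[3]
        flags = fields[2].split(",")
        if "fail" in flags:
            continue  # ignore nodes the cluster has marked as FAIL
        if "master" in flags and len(fields) > 8:  # extra fields are slot ranges
            masters[node_id] = address.split("@")[0]
        elif "slave" in flags:
            replicated.add(master_id)
    return [addr for node_id, addr in masters.items() if node_id not in replicated]


# Invented sample: node bbb was just promoted after aaa failed.
SAMPLE = """
aaa 10.0.0.1:6379@16379 master,fail - 0 0 1 connected
bbb 10.0.0.2:6379@16379 myself,master - 0 0 4 connected 0-5460
ccc 10.0.0.3:6379@16379 master - 0 0 2 connected 5461-10922
ddd 10.0.0.4:6379@16379 slave ccc 0 0 2 connected
eee 10.0.0.5:6379@16379 master - 0 0 3 connected 10923-16383
fff 10.0.0.6:6379@16379 slave eee 0 0 3 connected
"""

print(masters_without_replicas(SAMPLE))  # ['10.0.0.2:6379']
```

In this sample the newly promoted master at 10.0.0.2 is the only one left without a replica, so that is where a new replica would be attached.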
4. What Happens to Data During Failover?
Redis Cluster uses asynchronous replication by default, so there's a small chance of data loss during failover:
- If the master had accepted writes that hadn't yet been replicated, those changes may be lost.
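- To reduce risk, you can make the master refuse writes when fewer than a given number of replicas are connected and reasonably up to date (the min-replicas-to-write and min-replicas-max-lag settings). For individual writes, the WAIT command blocks until a given number of replicas have acknowledged them.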
Still, in most setups, the trade-off between performance and consistency favors this async model.
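The effect of `min-replicas-to-write`, together with its companion setting `min-replicas-max-lag`, can be modeled as a simple check: the master accepts writes only while enough replicas have been heard from recently. The function below is a simplified model of that rule, not Redis source code, and its name and parameters are chosen here for illustration.

```python
from typing import Sequence


def master_accepts_writes(
    replica_lags_seconds: Sequence[float],
    min_replicas_to_write: int,
    min_replicas_max_lag: float,
) -> bool:
    """Model of the rule: enough replicas must have a lag within the allowed maximum."""
    if min_replicas_to_write == 0:
        return True  # a value of 0 disables the check
    good_replicas = sum(
        1 for lag in replica_lags_seconds if lag <= min_replicas_max_lag
    )
    return good_replicas >= min_replicas_to_write


# One replica is 1 s behind, the other 15 s; at least 1 must be within 10 s.
print(master_accepts_writes([1, 15], 1, 10))   # True
# Both replicas are lagging too far, so the master would refuse writes.
print(master_accepts_writes([12, 15], 1, 10))  # False
```

This narrows the window in which unreplicated writes can accumulate, but it does not make replication synchronous: a write accepted just before a crash can still be lost.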
That’s basically how Redis Cluster handles a master node failure: detection, agreement, replica promotion, and client redirection. It’s not foolproof, but it’s solid for most production environments.