A successful failover solution is one that accomplishes its intended goal transparently. There are three elements that contribute to a successful failover mechanism: data replication, node failure detection, and concerted DNS propagation.

Cross Datacenter Failover
Data replication is the mechanism that makes sure changes on the active node, whether those are database or simply content, make their way into the backup node. The assumption here is that the backup node will at some point take over the role the main node. The latter can happen anytime the main node experiences a failure. Data replication can either be real-time or delayed.
Node failure can be caused by several factors such as network failure, process level failure, or infrastructure failure. Regardless of the root cause, the failover mechanism should function as intended. There are several ways to check on the health of a node. There is heart beat and daemon level checks. Heart beat type checks are only useful for local area failover configurations. Regional failover health checks are better performed by probing daemons remotely such as HTTP on port 80 or HTTPS on port 443.
A strong DNS system is essential to a seamless failover solution. That is how domains ultimately resolve to IP addresses. For a seamless failover system, the DNS system has to be dispersed worldwide and have records with very low propagation attributes. Not only does it need to be dispersed to ensure resiliency at the DNS level but all DNS actions have to coordinated amongst all the individual DNS servers.
In this post, we have touched briefly on each of the three elements that make up a failover system. We shall continue covering all aspects of failover in subsequent entries.
Stay tuned!
Failover is a mechanism that ensures service continuity whereby single-point failures are eliminated. It is a mechanism because failover has to be minutiously designed, implemented, and deployed. There are several types of failover mechanisms but the most resilient configuration spans geographical locations.
Geographical failover is a redundant system that is made to survive disruptions that can involve whole regions. Geographical failover eliminates certain weaknesses inherent to natural disasters, man-made disasters, mistakes, and law enforcement orders. Going forward, we will focus on geographical failover or commonly abbreviated as geo failover, as we will refer to it henceforth.
Traditionally, however, failover is implemented at the network level of the infrastructure to prevent network disruptions. But in reality, there are several elements of an infrastructure that could experience disruption and therefore render the whole unusable. Even though network failover is implemented on almost any network today, disruptions still occur. This is unfortunately bound to happen. The latter is analogous to constructing a strong bullet-proof front-door for a house when the windows are vulnerable and fragile.
In the same train of thought, a geo failover house would be a set of two identical houses located in different neighborhoods or cities. Should one house be robbed or completely destroyed, the backup house is used instead. This geo redundancy comes at a cost. One has to build a second house, the hot backup or redundant house, with identical characteristics, furnitures, and provisions.
Of course, there’s a price to pay for an independently successful failover plan. The cost, however, far outweighs the outcome of disruptions. In the case of a business, disruptions cause material loss due to missing out on revenue opportunities because potential customers cannot reach you or your online store. In the next post, we will discuss an implementation of geo failover based on the most successful hosting platform in the world.
Stay tuned!