Intro

I was recently asked to design an application to help track the delivery of critical medicines worldwide, including under extreme circumstances such as a global pandemic. I had already started to think through how I could leverage Azure’s various multi-region services (e.g. Cosmos DB, Traffic Manager) when I was told the application had to withstand the loss of an entire cloud provider. My first thought was how unlikely it would be for multiple Azure regions to go down at once, but after some investigation I learned that there are more reasons than high availability to consider a multi-cloud strategy.

Multi-Cloud Pros

  • No vendor lock-in: you can cherry-pick the services you need from each provider
  • Extremely high availability, since the loss of an entire cloud provider can be tolerated
  • Better responsiveness as data can be hosted closer to the users
  • Better support for data sovereignty regulations, where data must be stored in a particular country or region
  • Can be more cost-efficient, as workloads can be placed with the lowest-cost provider, and you may gain bargaining leverage on price

Multi-Cloud Cons

  • More expensive and time-consuming to develop, test and support given the lack of standardization of cloud provider APIs, services, and infrastructure
  • Running the same workload in multiple clouds may make it difficult to leverage cloud-specific services
  • May be more expensive if services must run in multiple clouds
  • Pricing models, billing and reporting differ between cloud providers
  • Cross-cloud networking is complicated and hard to support

Given the differences between the various cloud providers, layering an abstraction over their APIs is important to simplify development, testing and support. Kubernetes has become that standard abstraction for good reason:

  1. Deploying containers on Kubernetes is simpler than provisioning VMs
  2. Kubernetes standardizes deployments with minimal differences between cloud providers, effectively abstracting away the clouds themselves (see the sketch after this list).
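
To make that concrete, here is a minimal sketch using the official Kubernetes Python client. The kubeconfig context names (aks-prod, gke-prod), image, and namespace are assumptions for illustration; the point is that the identical Deployment spec is applied, unchanged, to a cluster in each cloud.

```python
# Sketch: apply one cloud-agnostic Deployment to clusters in two clouds.
# Assumes the "kubernetes" pip package and kubeconfig contexts named
# "aks-prod" and "gke-prod" (hypothetical names for an AKS and a GKE cluster).
from kubernetes import client, config

def tracker_deployment() -> client.V1Deployment:
    """Build the Deployment for a hypothetical medicine-tracker container."""
    container = client.V1Container(
        name="medicine-tracker",
        image="myregistry.io/medicine-tracker:1.0",  # assumed image name
        ports=[client.V1ContainerPort(container_port=8080)],
    )
    template = client.V1PodTemplateSpec(
        metadata=client.V1ObjectMeta(labels={"app": "medicine-tracker"}),
        spec=client.V1PodSpec(containers=[container]),
    )
    return client.V1Deployment(
        api_version="apps/v1",
        kind="Deployment",
        metadata=client.V1ObjectMeta(name="medicine-tracker"),
        spec=client.V1DeploymentSpec(
            replicas=3,
            selector=client.V1LabelSelector(match_labels={"app": "medicine-tracker"}),
            template=template,
        ),
    )

# The same spec deploys to each cloud; only the kubeconfig context differs.
for context in ("aks-prod", "gke-prod"):
    api = client.AppsV1Api(api_client=config.new_client_from_config(context=context))
    api.create_namespaced_deployment(namespace="default", body=tracker_deployment())
```

The only cloud-specific pieces are the cluster credentials in kubeconfig; the workload definition itself never changes.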

In this case, I wanted the application running in all of the supported clouds at the same time. I didn’t want to worry about the catastrophic event that requires a full cloud failover; multi-cloud should be the usual state. This suggested that writes should be supported in all regions (i.e. a multi-master database), and that data should be replicated across cloud providers.

While various cross-cloud data replication solutions are available, I wanted to explore ways to accomplish this that do not require third-party solutions or introduce additional points of failure into the system. Let’s start with the assumption that our use case does not require ACID guarantees, so a NoSQL database should be fine. That also means we will not have to worry about cross-datacenter (or cross-cloud) transactions, which would slow us down considerably and introduce failure scenarios we would have to design around.

Data updates should propagate across clouds with clear rules for conflict resolution. This is where a provider-specific database would create headaches: the logic for conflict resolution is generally handled inside the database itself, so as data moves across providers, those rules would need to be reimplemented independently in the application, as sketched below.
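
To illustrate that burden, here is a minimal sketch of the kind of last-write-wins merge the application would have to implement, and keep identical, in every cloud. The record shape and field names are hypothetical:

```python
# Sketch of application-level conflict resolution (hypothetical record shape).
# This is essentially the logic a database like Cassandra performs internally
# using write timestamps.
from dataclasses import dataclass

@dataclass
class DeliveryRecord:
    shipment_id: str
    status: str
    updated_at_us: int  # write timestamp, microseconds since epoch

def resolve(local: DeliveryRecord, remote: DeliveryRecord) -> DeliveryRecord:
    """Last-write-wins: the newer timestamp survives; ties break
    deterministically so every cloud converges on the same value."""
    if remote.updated_at_us != local.updated_at_us:
        return remote if remote.updated_at_us > local.updated_at_us else local
    return remote if remote.status > local.status else local  # deterministic tie-break
```

If this logic drifts between clouds, replicas can silently diverge, which is exactly the headache a shared database avoids.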

A better option is to deploy the same database across the different clouds, one that already supports reliable cross-datacenter replication, such as Cassandra. This does require configuring the Cassandra clusters to connect to each other securely across the public network, something you don’t have to worry about with a cloud provider-managed database. But once that is done, Kubernetes will keep your clusters highly available.
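
Here is a minimal client-side sketch of that setup using the DataStax Python driver. The contact point, datacenter names (azure-dc, gcp-dc) and table are assumptions for illustration, with one Cassandra datacenter per cloud:

```python
# Sketch: one Cassandra keyspace replicated across datacenters in two clouds.
# Assumes the "cassandra-driver" pip package; the contact point, DC names
# ("azure-dc", "gcp-dc") and table below are hypothetical.
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT
from cassandra.policies import DCAwareRoundRobinPolicy

# Route queries to the local datacenter and acknowledge writes within it.
profile = ExecutionProfile(
    load_balancing_policy=DCAwareRoundRobinPolicy(local_dc="azure-dc"),
    consistency_level=ConsistencyLevel.LOCAL_QUORUM,
)
cluster = Cluster(["10.0.0.10"], execution_profiles={EXEC_PROFILE_DEFAULT: profile})
session = cluster.connect()

# NetworkTopologyStrategy places three replicas in each cloud's datacenter.
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS deliveries WITH replication = {
        'class': 'NetworkTopologyStrategy', 'azure-dc': 3, 'gcp-dc': 3
    }
""")
session.execute("""
    CREATE TABLE IF NOT EXISTS deliveries.shipments (
        shipment_id text PRIMARY KEY, status text
    )
""")
session.execute(
    "INSERT INTO deliveries.shipments (shipment_id, status) VALUES (%s, %s)",
    ("SHIP-001", "in-transit"),
)
```

With LOCAL_QUORUM, a write is acknowledged within the local datacenter while replication to the other cloud proceeds asynchronously, and conflicting updates are resolved by write timestamp as described above.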

Read on to see how I set this up. It should take 3–4 hours to do it yourself.