
2022-07-20

The Modern Web's Many Single Points of Failure

Running a big web service? Reliability is key to serving your customers well. You decide to move to a cloud service. It boasts 99.99% uptime and plenty of modern services that make development, deployment, and much else easier.

Your backends are hammered by traffic as your service grows. You pay for bandwidth, and your customers are more globally distributed than the two or so regions you run in (that's decent redundancy without being too costly, after all), so you decide to use a CDN. It's cheaper and serves your content faster.

Oh, you also need access control for some services: certain customers must be able to log in, but the services should stay closed to the general public. And DDoS protection wouldn't hurt at all. Cloudflare seems like a good choice.

Of course you keep your code in a git forge and run your CI/CD processes in yet another cloud service. When (not if) these are breached, you'll probably leak code, because you haven't brought all of your services up to the best security practices yet. This doesn't affect service uptime, thankfully, but let's just say shareholders and management aren't too happy.

As far as I know, Cloudflare has never had a major crash, but imagine if it did.

Modern web services build their entire businesses on being available, and their customers have come to expect it. They use a whole suite of tools to reach that goal, but each new tool brings both benefits and risks. Would a slower development pace and lower expectations of uptime and speed bring more security?
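To put a rough number on that risk: when every piece of the stack has to be up for your service to be up, and assuming their failures are independent, their availabilities multiply. The Python sketch below uses hypothetical figures: three dependencies that each advertise 99.99% uptime, which on its own allows about 52.6 minutes of downtime per year. The dependency names and numbers are illustrative assumptions, not measurements of any real provider.

```python
from math import prod

# Minutes in a non-leap year (365 * 24 * 60 = 525,600).
MINUTES_PER_YEAR = 365 * 24 * 60


def composite_availability(availabilities):
    """Availability of a chain where every dependency must be up.

    Assumes failures are independent, so the individual
    availabilities simply multiply.
    """
    return prod(availabilities)


def downtime_minutes_per_year(availability):
    """Expected minutes of downtime per year for a given availability."""
    return (1 - availability) * MINUTES_PER_YEAR


# Hypothetical stack: each dependency advertises 99.99% uptime.
stack = {
    "cloud provider": 0.9999,
    "CDN": 0.9999,
    "access control / DDoS proxy": 0.9999,
}

for name, availability in stack.items():
    print(f"{name}: {downtime_minutes_per_year(availability):.1f} min/year")

total = composite_availability(stack.values())
print(f"combined: {total:.6f} -> {downtime_minutes_per_year(total):.1f} min/year")
```

With these assumed numbers the chain comes to roughly 99.97%, or about three times the yearly downtime of any single provider. Real dependencies are rarely fully independent or strictly serial (a CDN can sometimes be bypassed, and a breached CI service doesn't take the site down), so treat this as a back-of-the-envelope estimate.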

Some Outages

CircleCI leaked login credentials from June to September 2019.

AWS leaked GitHub credentials in January 2020.

AWS went down in November 2020.

CodeCov leaked environment variables, often including access credentials, from January to April 2021.

The CDN provider Fastly crashed in June 2021.

Google Cloud had a networking issue in November 2021.

CircleCI had problems with its service in April 2022.

Microsoft Azure is transparent about its many issues, large and small. See, for example, its networking issue on June 28th, 2022 (post-mortem summary on the 29th).

Multiple cloud services had cooling issues and shut down servers in July 2022.

-- CC0 Björn Wärmedal