Cloud "oops turns out our distributed system is centralized in a single datacenter" Flare

"Well, we *thought* we had High Availability, but we never actually tested that"

- Cloudflare, supposed distributed systems experts, processing a double-digit percentage of the world's web traffic

I heard there's this well-tested distributed network routing protocol that is extremely resilient to provider failures. I think it's called BGP - perhaps we should all be using that instead.

🤔 Cloudflare seems to actually have pissed off the HN commenters this time

Some other juicy bits about the outage:
- No 24/7 (experienced) technician availability at the datacenter that hosted their control plane(!)
- No end-to-end service dependency tracking or diagrams
- Therefore, supposedly HA services depending on non-HA infrastructure
- Even if the "redundant" setup *did* work (it didn't), all three locations would be physically within *the same earthquake zone*

This is absolute clowncar-level network administration, frankly, for something the size and importance of Cloudflare.

@joepie91 wait, the internet went down while I was out eating dinner? I better go read up on this lol

@joepie91 sorry, are you talking about the multi-billion dollar networking company or a 6 month old startup?
