Cloud "oops turns out our distributed system is centralized in a single datacenter" Flare

"Well, we *thought* we had High Availability, but we never actually tested that"

- Cloudflare, supposed distributed systems experts, processing a double-digit percentage of the world's web traffic

I heard there's this well-tested distributed network routing protocol that is extremely resilient to provider failures. I think it's called BGP - perhaps we should all be using that instead.

🤔 Cloudflare seems to actually have pissed off the HN commenters this time

Some other juicy bits about the outage:
- No 24/7 (experienced) technician availability at the datacenter that hosted their control plane(!)
- No end-to-end service dependency tracking or diagrams
- Therefore, supposedly HA services depending on non-HA infrastructure
- Even if the "redundant" setup *did* work (it didn't), all three locations would be physically within *the same earthquake zone*

This is absolute clowncar level network administration, frankly, for something the size and importance of Cloudflare.

@joepie91 wait, the internet went down while I was out eating dinner? I better go read up on this lol

@joepie91 sorry, are you talking about the multi-billion dollar networking company or a 6 month old startup?

@joepie91 haven't followed what happened at Cloudflare but:
- BGP is what Cloudflare is using?
- you very much can fuck up BGP, remember the Facebook outage? BGP route hijacks also happen, sometimes to the extent of your traffic randomly going via China Telecom
- you need to register as an LIR with one of a few orgs like RIPE, and own/lease blocks of IP addresses yourself; not sure what that costs
