I heard there's this well-tested distributed network routing protocol that is extremely resilient to provider failures. I think it's called BGP - perhaps we should all be using that instead.
Some other juicy bits about the #Cloudflare outage:
- No 24/7 (experienced) technician availability at the datacenter that hosted their control plane(!)
- No end-to-end service dependency tracking or diagrams
- Therefore, supposedly HA services depending on non-HA infrastructure
- Even if the "redundant" setup *did* work (it didn't), all three locations would be physically within *the same earthquake zone*
This is absolute clown-car-level network administration, frankly, for something the size and importance of Cloudflare.
@joepie91 wait, the internet went down while I was out eating dinner? I better go read up on this lol
@joepie91 sorry, are you talking about the multi-billion dollar networking company or a 6 month old startup?
@thibaultmol Yeahhhh
@joepie91 haven't followed what happened at Cloudflare but:
- BGP is what Cloudflare is using?
- you very much can fuck up BGP, remember the Facebook outage? BGP route hijacks also happen, sometimes to the extent of your traffic randomly going via China Telecom
- you need to register as an LIR with one of a few orgs like RIPE, and own/lease blocks of IP addresses for yourself, not sure what that costs
🤔 Cloudflare seems to actually have pissed off the HN commenters this time