Topic

Security

A security patch triggers a global outage. A configuration change meant to harden infrastructure accidentally breaks it. These case studies cover what happens when the fix is as dangerous as the vulnerability — and how engineering teams navigate security decisions that affect millions of users without a safety net.

A Security Fix Broke 28% of the Internet for 25 Minutes — Cloudflare's December 2025 Outage

A well-reviewed security patch hit production traffic patterns it had never seen in testing, and a retry amplification loop did the rest.

25 min outage · ~28% internet affected · HTTP 500 errors · Security fix still shipped

GitHub Lost Telemetry and Its Own Security System Started Blocking Real Developer VMs

A telemetry pipeline went silent. GitHub's security automation treated the silence as a threat signal — and locked every Codespaces VM out of its own metadata service.

~6 hr Codespaces outage · All regions affected · Copilot, CodeQL blocked · Self-hosted runners unaffected

Uber Had 150,000 Secrets Scattered Across 25 Vaults — So They Built One Platform to Rule Them

150,000 secrets. 25 separate vaults. Hundreds of teams managing their own credentials in their own ways, some in plain text in version control. At Uber's scale — 5,000 microservices, 5,000 databases, 500,000 analytical jobs per day — secrets sprawl is not a compliance problem. It is an incident waiting to happen. A team of ten engineers decided to fix it.

150,000 secrets managed · 25 vaults → 6 managed vaults · 5,000 microservices secured · 20,000 automated rotations/month · 90% fewer secrets in pipelines