Cloudflare

Every Cloudflare engineering case study on TechLogStack — real production incidents, post-mortems, and fixes.

Cloudflare Reliability
18 min

Cloudflare Fixed a React Security Vulnerability and Broke the Entire Network

On December 5, 2025, Cloudflare was rolling out a fix for a React security vulnerability. As part of that rollout, it disabled an internal testing tool using a global killswitch. The killswitch unexpectedly triggered a bug that returned HTTP 500 errors across Cloudflare's entire global network. This was the third major configuration-related global outage in two years.

Cloudflare Reliability
16 min

Cloudflare's Datacenter Partner Failed and the Control Plane Went Dark for 40 Hours

On November 2, 2023, Cloudflare's primary datacenter partner experienced a power failure. The control plane — the system that lets customers configure DNS, firewall rules, and every Cloudflare service — went dark. It stayed dark, in various forms, for nearly 40 hours. The postmortem introduced a concept Cloudflare hadn't had before: Code Orange.

~40 hours control plane down
Cloudflare Reliability
17 min

A Database Permission Change in ClickHouse Took Down 28% of Cloudflare's HTTP Traffic

On November 18, 2025, Cloudflare experienced a roughly six-hour global outage. The cause: a database permission change in ClickHouse generated a corrupt configuration file that was silently propagated to every server running Cloudflare's Bot Management system, crashing it globally.

28% HTTP traffic impacted
6 hours total duration
2.5h to find root cause