Company

Spotify

Every Spotify engineering case study on TechLogStack — real production incidents, post-mortems, and fixes.

Spotify Reliability
20 min

Spotify Changed a Filter Order in Their Proxy — Then Every Server in the World Crashed at Once

On April 16, 2025, Spotify's engineering team made a change they deemed low risk: reordering the custom filters inside their Envoy Proxy perimeter. They applied it to all regions simultaneously. Within two minutes, every Envoy instance worldwide had crashed. And then the restart loop began — a loop Kubernetes itself was powering, killing each new server as fast as it came back up. 675 million users couldn't load the app. Asia Pacific stayed up, and the reason why told the engineers exactly what was broken.

675M MAU affected 48,000+ Downdetector peak reports