Topic

Performance

A p99 latency of 40ms sounds fine until the product ships and the slowest 1% of requests, which that number says nothing about, start timing out under load. These case studies cover how engineering teams at major companies diagnosed performance problems that only appeared at scale, and the fixes that required rethinking assumptions baked in years earlier.
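
To make the percentile arithmetic concrete, here is a minimal sketch. It assumes a nearest-rank definition of p99 and uses made-up latency samples; the `percentile` helper and every number in it are illustrative, not drawn from any of the case studies. It shows how a 40ms p99 can coexist with 1% of requests taking far longer.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]

# Hypothetical latencies in ms: 990 fast requests and 10 very slow ones (1% of traffic).
latencies_ms = [20] * 600 + [40] * 390 + [2500] * 10

p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)
worst = max(latencies_ms)

print(f"p50={p50}ms p99={p99}ms max={worst}ms")
```

In this sample the p99 sits at 40ms even though ten requests take 2.5 seconds, which is why a dashboard that stops at p99 can look healthy while the tail is timing out.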

Google Built a Free Design Tool That Generates Production Code From a Sentence — Then Added Multiplayer

At Google I/O 2025, Sundar Pichai demoed a tool that turned a plain English description into a complete mobile UI in under 30 seconds. Figma charges $15 per editor per month for collaborative design. Google Stitch does it free. A year later, Google added real-time multiplayer, a streaming design agent, and voice input. The design industry noticed.

350 free generations/month

A Fiber Cut in Seattle Slowed GitHub Clone Speeds to Under 1 MB/s for Eight Hours

Nothing changed in GitHub's codebase. A cable under Seattle got cut, and git clone on the US west coast went from fast to dial-up speed — and TCP made it worse.

<1 MiB/s clone speed · 800 Gbps → 3.2 Tbps · ~8 hr disruption · No data lost

Stripe Converted 3.7 Million Lines of JavaScript in One Pull Request on a Sunday

On Sunday, March 6, 2022, Stripe merged a single pull request that converted their entire largest JavaScript codebase from Flow to TypeScript. 3.7 million lines of code. Hundreds of engineers arrived Monday morning to start writing TypeScript. The migration had been invisible until it wasn't.

3.7M lines converted in 1 PR

Netflix Made Their Workflow Orchestrator 100x Faster by Rewriting the Engine Nobody Thought Was Slow

Maestro had been running Netflix's data and ML workflows successfully for two and a half years. Then Live, Ads, and Games drove sub-hourly scheduling requirements that revealed the orchestrator's overhead — not in crashes or alerts, but in slow step launches that nobody had measured. The fix was a complete engine rewrite that delivered 100x throughput improvement.

100x throughput improvement · 2.5 years before overhead visible · 1M+ tasks/day still supported

Netflix's Containers Were Fighting Their Own CPUs — and Losing

Netflix ran millions of containers per day on modern multi-core CPUs. The containers performed well on benchmarks. In production, under certain workloads, they were mysteriously slower than expected — slower than the hardware should have allowed. The culprit was CPU topology: the operating system was scheduling container workloads in ways that violated modern CPU cache architecture. They called the investigation 'Mount Mayhem.'