GitHub

Every GitHub engineering case study on TechLogStack — real production incidents, post-mortems, and fixes.

GitHub Databases
16 min

How GitHub Upgraded 1200 MySQL Hosts Without Dropping a Single Query

MySQL 5.7 was hitting end-of-life, and GitHub's production database fleet spanned 1,200 hosts, 300 terabytes of data, and 5.5 million queries every second. Getting from here to MySQL 8.0 without disrupting 100 million developers was going to take more than a weekend.

1,200+ MySQL hosts upgraded | 300+ TB data migrated | 5.5M queries/sec maintained | >1 year planning+execution | 50+ clusters zero-downtime
GitHub Distributed Systems
18 min

GitHub Built the Internet's Code Platform — Then AI Agents Broke It

Between May 2025 and April 2026, GitHub experienced 257 incidents — 48 of them major outages. That's roughly one significant disruption every single week. The culprit wasn't a security breach, a botched deployment, or a rogue engineer. It was the thing GitHub had spent years celebrating: AI. Specifically, agentic AI workflows that turned one human developer's footprint into hundreds of commits, thousands of CI minutes, and a dozen simultaneous PR operations — all at once, across millions of accounts. GitHub had been built for humans. Agents are not human.

257 incidents, May 2025 to April 2026 | 48 major outages, 112+ hours total downtime | 57 GitHub Actions outages in 12 months | 10x scaling plan revised to 30x by February 2026
GitHub Reliability
17 min

The Test That Broke GitHub: A Failover Drill Goes Live

June 29, 2023, 17:39 UTC: GitHub engineers initiate a planned live failover test of their brand-new second Internet edge facility — six months of infrastructure work designed to eliminate a single point of failure. Within seconds, instead of validating their redundancy, they've created an outage that takes GitHub offline for millions of developers across North America and South America.

32-minute outage | 2-min detect-to-revert | ~100M devs affected