1,200+
MySQL hosts across Azure Virtual Machines and bare-metal data center hardware, each needing its own upgrade without disturbing its neighbors
300+ TB
Relational data stored across 50+ clusters, sharded both horizontally and vertically using Vitess for GitHub's highest-traffic product domains
5.5M QPS
Queries per second served throughout the year-long upgrade, with no SLO allowed to slip during any single cluster promotion
>1 year
Total duration from preparation start in July 2022 through final cluster upgrades — a timeline that reflects the discipline of doing this safely, not slowly
GitHub started as a Ruby on Rails application with a single MySQL database over 15 years ago. Since then, MySQL had become the foundation of GitHub's core data: repository metadata, pull requests, issues, code review comments, user accounts, billing data, and the entire social graph of 100 million developers. By 2022, MySQL 5.7 was approaching end-of-life: Oracle had announced that support would end in October 2023. The GitHub database team faced a simple choice: keep running the database behind every repository, pull request, and issue without security patches, or upgrade. The only real question was how to upgrade 1,200 hosts, 300+ TB of data, and 5.5 million queries per second without disrupting a single user-visible transaction.
Preparation began in July 2022, well before any production host was promoted to 8.0. The team added MySQL 8.0 to continuous integration (CI) for all applications using MySQL, running 5.7 and 8.0 side by side to catch regressions early. They built MySQL 8.0 debug containers so developers could test their queries against the new version, and they created an internal GitHub Project board to track every cluster's upgrade status across the fleet. The discipline of the preparation phase is what made the execution phase look routine.
THE HIDDEN BREAKING CHANGE
MySQL 8.0 changes the default character set to utf8mb4 and its default collation to `utf8mb4_0900_ai_ci`, a collation based on the Unicode 9.0.0 collation algorithm that MySQL 5.7 does not support. This created a problem: when an 8.0 primary replicates writes to a 5.7 replica, the collation metadata in the binary log can cause replication to break entirely on the downstream 5.7 nodes. GitHub's rollback strategy depended on maintaining backward replication from 8.0 to 5.7, so this had to be solved before a single production primary was promoted.
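A pre-promotion check for this collation problem can be sketched in Python. This is a minimal illustration, not GitHub's tooling: the function names are hypothetical, and treating every collation whose name contains `_0900_` as 8.0-only is a simplifying heuristic (those are the UCA 9.0.0 collations MySQL 8.0 introduced).

```python
# Hypothetical pre-promotion check: which collations seen on an 8.0 primary
# would break replication to a 5.7 replica? Heuristic, not exhaustive.

def collation_supported_on_57(collation: str) -> bool:
    # The UCA 9.0.0 collations MySQL 8.0 introduced carry "_0900_" in their
    # names (e.g. utf8mb4_0900_ai_ci); MySQL 5.7 has none of them.
    # Any other 8.0-only collations are not covered by this check.
    return "_0900_" not in collation


def unsupported_collations(collations_in_use, replica_versions):
    """Return the sorted collations that a 5.7 replica could not apply.

    collations_in_use: collation names observed on the 8.0 primary.
    replica_versions: version strings such as "5.7.40" or "8.0.32".
    """
    if not any(v.startswith("5.7.") for v in replica_versions):
        return []  # no 5.7 replicas downstream, nothing to break
    return sorted({c for c in collations_in_use if not collation_supported_on_57(c)})


print(unsupported_collations(
    ["utf8mb4_0900_ai_ci", "utf8mb4_unicode_ci"],
    ["8.0.32", "5.7.40"],
))  # ['utf8mb4_0900_ai_ci']
```

One mitigation, assumed here rather than stated in the source, is to pin the 8.0 server's default character set and collation (the `character_set_server` and `collation_server` variables) to values that 5.7 also supports, so new objects do not silently pick up the 0900 default.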
Problem
MySQL 5.7 Hits End-of-Life
Oracle announced MySQL 5.7 end-of-life for October 2023, cutting off security patches and bug fixes. GitHub's 1,200+ host fleet running at 5.5M QPS could not safely continue on an unsupported database version. The challenge was executing a major version upgrade across a mixed fleet of Azure VMs and bare-metal hosts without a maintenance window or service disruption.
Cause
Backward Replication Incompatibilities
Testing revealed two breaking changes: MySQL 8.0's new default collation broke downstream 5.7 replicas, and MySQL 8.0's new roles feature caused permission-expansion scripts to generate statements with 8.0-only syntax that 5.7 replicas could not parse. Both had to be fixed before any primary promotion.
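The roles problem can be guarded against by screening generated permission statements before they reach a primary that still replicates to 5.7. The sketch below is hypothetical: the regular expressions cover only common role statements (`CREATE ROLE`, `DROP ROLE`, `SET [DEFAULT] ROLE`, and a role `GRANT`, which has no `ON` clause) and are no substitute for a real SQL parser.

```python
import re

# Hypothetical screen for MySQL 8.0 role syntax that a 5.7 replica cannot parse.
# Regex-based simplification; a production tool would use a real SQL parser.
ROLE_ONLY_PATTERNS = [
    re.compile(r"^\s*CREATE\s+ROLE\b", re.IGNORECASE),
    re.compile(r"^\s*DROP\s+ROLE\b", re.IGNORECASE),
    re.compile(r"^\s*SET\s+(DEFAULT\s+)?ROLE\b", re.IGNORECASE),
    # Granting a role has no "ON <object>" clause: GRANT 'reader' TO 'app'@'%'
    re.compile(r"^\s*GRANT\b(?!.*\bON\b)", re.IGNORECASE | re.DOTALL),
]


def safe_for_57_replicas(statement: str) -> bool:
    return not any(p.search(statement) for p in ROLE_ONLY_PATTERNS)


def split_statements(statements):
    """Partition generated statements into (safe, rejected) lists."""
    safe, rejected = [], []
    for stmt in statements:
        (safe if safe_for_57_replicas(stmt) else rejected).append(stmt)
    return safe, rejected


safe, rejected = split_statements([
    "GRANT SELECT ON github.* TO 'app'@'%'",
    "CREATE ROLE 'reader'",
    "GRANT 'reader' TO 'app'@'%'",
])
print(safe)  # ["GRANT SELECT ON github.* TO 'app'@'%'"]
```

Rejecting rather than rewriting keeps the sketch simple; a real script would more likely expand each role into the equivalent per-user privilege grants that 5.7 understands.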
Solution
Rolling Replica Upgrades + Dual Replication Chains
GitHub built a 5-step playbook: upgrade replicas one data center at a time, reconfigure the replication topology to create parallel 5.7 and 8.0 chains, promote an 8.0 host to primary via graceful failover, keep 5.7 standbys ready for rollback, then clean up after 24 hours of successful traffic.
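The playbook's ordering, and the point at which rollback stops being possible, can be modeled as a small state machine. This is a hypothetical Python sketch of the sequencing only; the step names and the `next_step`/`can_rollback` helpers are illustrative and perform no database operations.

```python
from enum import Enum


class Step(Enum):
    UPGRADE_REPLICAS = 1    # one data center at a time
    SPLIT_TOPOLOGY = 2      # parallel 5.7 and 8.0 replication chains
    PROMOTE_80_PRIMARY = 3  # graceful failover to an 8.0 host
    HOLD_57_STANDBYS = 4    # offline 5.7 hosts kept ready for rollback
    CLEANUP = 5             # only after the soak period


SOAK_HOURS = 24


def can_rollback(step: Step) -> bool:
    # Rollback stays available until cleanup retires the 5.7 standbys.
    return step != Step.CLEANUP


def next_step(step: Step, hours_since_promotion: float = 0.0) -> Step:
    if step == Step.CLEANUP:
        raise ValueError("playbook already complete")
    if step == Step.HOLD_57_STANDBYS and hours_since_promotion < SOAK_HOURS:
        return step  # keep waiting; cleanup is gated on 24 clean hours
    return Step(step.value + 1)


step = Step.UPGRADE_REPLICAS
while step != Step.HOLD_57_STANDBYS:
    step = next_step(step)
print(next_step(step, hours_since_promotion=6).name)   # HOLD_57_STANDBYS
print(next_step(step, hours_since_promotion=24).name)  # CLEANUP
```

Putting the soak period in the guard on the final transition makes it impossible to skip the 24-hour wait by accident, and `can_rollback` makes explicit that the irreversible moment is cleanup, not promotion.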
Result
100% Fleet Upgraded, Zero SLO Violations
Every cluster upgraded without a single SLO violation. The rollback path was preserved throughout the entire year-long process — a 5.7 standby was always available. The project delivered not just the MySQL 8.0 upgrade but a repeatable automation framework for future major version upgrades, so the next one will be faster.
🔄GitHub's engineers hit a replication bug in MySQL 8.0 that only manifested under intensive load over long periods: a host could eventually run out of commit-order sequence numbers and stall. The bug had been fixed in MySQL 8.0.28, so GitHub had to ensure all hosts ran 8.0.28 or later before any long-running cluster was considered safe, adding a version-pinning requirement to an already complex upgrade matrix.
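The version-pinning requirement amounts to a fleet-wide check that no 8.0 host is older than 8.0.28. A minimal sketch, assuming host versions are available as strings like those returned by `SELECT VERSION()` (for example `8.0.27` or `8.0.28-log`); the inventory dictionary and function names are hypothetical.

```python
MIN_SAFE_80 = (8, 0, 28)  # first release with the replication fix


def parse_version(version: str) -> tuple:
    # "8.0.28-log" -> (8, 0, 28); anything after the first "-" is ignored.
    core = version.split("-", 1)[0]
    return tuple(int(part) for part in core.split("."))


def hosts_needing_patch(host_versions: dict) -> list:
    """Return 8.0 hosts older than 8.0.28; 5.7 hosts are out of scope."""
    return sorted(
        host
        for host, version in host_versions.items()
        if parse_version(version)[:2] == (8, 0)
        and parse_version(version) < MIN_SAFE_80
    )


inventory = {
    "db-a": "8.0.27",
    "db-b": "8.0.28-log",
    "db-c": "5.7.40",
    "db-d": "8.0.32",
}
print(hosts_needing_patch(inventory))  # ['db-a']
```

Comparing parsed tuples rather than raw strings avoids the classic trap where `"8.0.9" > "8.0.28"` as text.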
The upgrade process for each cluster was designed to preserve the rollback option at every step. Promoting an 8.0 replica to primary remained reversible until 24 hours of clean traffic had confirmed success. During the brief window of dual replication chains, GitHub kept a set of offline 5.7 replicas specifically for rollback: not serving traffic, not eligible for promotion, just sitting ready. The failover tooling was configured to exclude all 5.7 hosts as failover candidates during this window, preventing an automated failover from accidentally rolling the cluster back to 5.7 during an unplanned outage. The rollback path was designed as carefully as the upgrade path itself.
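The failover rule in this paragraph, excluding 5.7 hosts from automated promotion while keeping them available for a deliberate rollback, can be sketched as two filters over a host inventory. The `Host` type and both functions are hypothetical and do not reproduce any real failover tool's configuration.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    version: str  # e.g. "8.0.32" or "5.7.40"


def is_57(host: Host) -> bool:
    return host.version.startswith("5.7.")


def failover_candidates(hosts):
    """Hosts automated failover may promote: never 5.7 during the window,
    so an unplanned outage cannot silently roll the cluster back."""
    return [h.name for h in hosts if not is_57(h)]


def rollback_pool(hosts):
    """Offline 5.7 hosts reserved for a deliberate, operator-driven rollback."""
    return [h.name for h in hosts if is_57(h)]


cluster = [Host("db-1", "8.0.32"), Host("db-2", "5.7.40"), Host("db-3", "8.0.32")]
print(failover_candidates(cluster))  # ['db-1', 'db-3']
print(rollback_pool(cluster))        # ['db-2']
```

Keeping the two sets disjoint is the point: the same 5.7 hosts that make rollback possible are exactly the ones automation must never choose on its own.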
ℹ️What MySQL 8.0 Actually Unlocked
Beyond escaping end-of-life, MySQL 8.0 delivered features GitHub's database team genuinely wanted. Instant DDL lets many schema changes, such as adding a column, complete without rebuilding the entire table, which matters on a 300+ TB fleet where a traditional ALTER TABLE could take hours. Invisible indexes let engineers control whether the query optimizer uses an index without dropping or rebuilding it: an existing index can be hidden to confirm nothing depends on it before it is dropped, and made visible again instantly if queries regress, which makes index changes dramatically safer. Compressed binary logs reduce replication bandwidth between primaries and replicas, a meaningful saving at 5.5M queries per second.
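The MySQL 8.0 statements behind these features can be illustrated with a small Python helper that assembles them as strings. This is a hedged sketch: the table, index, and column names are hypothetical, identifiers are not quoted or validated, and the SQL should be checked against the MySQL 8.0 reference for the exact minor version in use (the set of operations `ALGORITHM=INSTANT` supports has grown across 8.0 releases).

```python
# Hypothetical helpers that build MySQL 8.0 DDL strings; they do not execute SQL.

def instant_add_column(table: str, column_def: str) -> str:
    # Naming ALGORITHM=INSTANT makes MySQL raise an error instead of
    # falling back to a slower algorithm if the change cannot be instant.
    return f"ALTER TABLE {table} ADD COLUMN {column_def}, ALGORITHM=INSTANT"


def index_retirement_plan(table: str, index: str) -> dict:
    """Stage an index removal using an invisible index as a safety net."""
    return {
        # 1. Optimizer stops using the index; InnoDB still maintains it.
        "hide": f"ALTER TABLE {table} ALTER INDEX {index} INVISIBLE",
        # 2. If queries regress, restore it; this only changes metadata.
        "revert": f"ALTER TABLE {table} ALTER INDEX {index} VISIBLE",
        # 3. Only after a clean observation period, drop it for real.
        "drop": f"ALTER TABLE {table} DROP INDEX {index}",
    }


# Binary log transaction compression (MySQL 8.0.20 and later).
ENABLE_BINLOG_COMPRESSION = "SET GLOBAL binlog_transaction_compression = ON"

print(index_retirement_plan("issues", "idx_state")["hide"])
# ALTER TABLE issues ALTER INDEX idx_state INVISIBLE
```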