The Story
On March 19, 2026, between 16:10 UTC and 00:05 UTC the following morning, developers on the US west coast noticed something strange: cloning from GitHub was slow. Not GitHub-slow. Dial-up slow. Clone speeds dropped below 1 MB/s. Teams running large monorepos watched a simple git pull stretch into a 40-minute operation. Nothing in GitHub's infrastructure had changed. The problem was a cut in the physical fiber that connected GitHub's Seattle edge site to the rest of the backbone.
Why TCP made it worse
The Seattle edge site was the primary entry point for US west coast users. The fiber cut forced all traffic onto the remaining links, which saturated quickly, and the resulting packet loss made TCP connections unstable.

Git's transfer protocol is not built for lossy links. A TCP session that breaks mid-clone means starting over from scratch, which puts the same bytes back onto an already saturated link. Each restart added to the congestion that had caused the failure, so every retry attempt made the next one slower.

The fix wasn't a configuration change or a rollback. GitHub accelerated a planned capacity expansion, upgrading from 800 Gbps to 3.2 Tbps of edge capacity, and activated cloud region capacity as a user redirect during recovery.
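The two effects described above, loss-limited TCP throughput and restart-from-scratch retries, can be put into rough numbers. The sketch below is illustrative only and is not taken from GitHub's analysis. It uses the well-known Mathis approximation for the throughput ceiling of a single TCP flow, plus a deliberately simple model in which a transfer is split into chunks that fail independently and any failure restarts the transfer from zero. The function names, the 1460-byte MSS, the 50 ms round-trip time, and the chunk counts are all assumptions chosen for the example.

```python
import math


def mathis_ceiling_mbps(mss_bytes: int, rtt_s: float, loss_rate: float) -> float:
    """Rough upper bound on one TCP flow's throughput, in Mbit/s.

    Mathis approximation: rate <= (MSS / RTT) * (C / sqrt(p)).
    Only a guide for small, random loss and Reno-style congestion
    control; modern stacks (CUBIC, BBR) behave differently.
    """
    if not 0 < loss_rate < 1:
        raise ValueError("loss_rate must be in (0, 1)")
    if rtt_s <= 0:
        raise ValueError("rtt_s must be positive")
    c = math.sqrt(3 / 2)  # constant from the model
    bits_per_s = (mss_bytes * 8 / rtt_s) * (c / math.sqrt(loss_rate))
    return bits_per_s / 1e6


def restart_amplification(chunks: int, p_fail_per_chunk: float) -> float:
    """Expected chunks sent per chunk delivered when any failure restarts from zero.

    Assumes each chunk fails independently with the same probability,
    which is a simplification of how a real clone fails.
    """
    if chunks < 1:
        raise ValueError("chunks must be at least 1")
    if not 0 <= p_fail_per_chunk < 1:
        raise ValueError("p_fail_per_chunk must be in [0, 1)")
    if p_fail_per_chunk == 0:
        return 1.0
    s = 1 - p_fail_per_chunk
    # Expected trials needed to see `chunks` consecutive successes.
    expected_sent = (s ** -chunks - 1) / p_fail_per_chunk
    return expected_sent / chunks


if __name__ == "__main__":
    # Hypothetical path: 1460-byte MSS, 50 ms round trip.
    for p in (0.0001, 0.001, 0.01):
        print(f"loss {p:.2%}: ceiling ~{mathis_ceiling_mbps(1460, 0.05, p):.1f} Mbit/s")
    # Hypothetical clone split into 100 chunks.
    for q in (0.001, 0.01, 0.05):
        print(f"chunk failure {q:.1%}: ~{restart_amplification(100, q):.1f}x bytes sent")
```

In the first model the ceiling scales with the inverse square root of the loss rate, so a hundredfold increase in loss cuts the bound by a factor of ten. In the second, the bytes sent grow roughly exponentially with the number of chunks that must all succeed in a row, which is why large monorepo clones suffer disproportionately on a lossy link.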
The Fix
GitHub accelerated a capacity upgrade that was already planned but not yet scheduled. The Seattle edge backbone expanded from 800 Gbps to 3.2 Tbps — a 4x increase. Cloud region capacity was brought online during the recovery window to redirect users while the physical infrastructure was restored. The fiber cut itself was repaired by the network provider. The capacity upgrade that the cut forced into emergency execution had been on the roadmap for months.
Solution
Seattle edge capacity expanded from 800 Gbps to 3.2 Tbps. Cloud region fallback activated during recovery. Planned upgrade executed as an emergency instead of a scheduled window.
Seattle edge before vs after
| Metric | Before fiber cut | After upgrade |
|---|---|---|
| Backbone capacity | 800 Gbps | 3.2 Tbps |
| Cloud region fallback | Not active | Active redirect |
| Single-site dependency | Yes | Partially resolved |
Lessons
What to remember
- Fiber cuts are outside your control. Redundant backbone paths at a single edge site are not. Geographic concentration of edge capacity is a physical single point of failure.
- TCP behavior under packet loss multiplies user-visible impact. Throughput falls much faster than the loss rate rises: a 20% packet loss rate can feel like a 90% speed reduction, or worse, to a Git client retrying from scratch.
- Planned capacity expansions that keep getting deprioritized become emergency expansions when something breaks. Track the headroom runway, not just current utilization.
- Cloud region fallback should be pre-configured and tested, not assembled during an active incident with a saturated edge.
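One way to make "headroom runway" concrete is to estimate how many months remain before peak traffic reaches a chosen share of capacity. The sketch below assumes steady compound monthly growth, which real traffic rarely follows exactly. Only the 800 Gbps and 3.2 Tbps capacities come from the incident; the function name, the 80% planning ceiling, the peak traffic figure, and the growth rate are hypothetical inputs for illustration.

```python
import math


def headroom_runway_months(
    capacity_gbps: float,
    peak_gbps: float,
    monthly_growth: float,
    ceiling: float = 0.8,
) -> float:
    """Months until peak traffic reaches `ceiling` * capacity.

    Assumes peak traffic compounds at `monthly_growth` per month
    (0.03 means 3%). Returns 0.0 if the ceiling is already reached
    and infinity if traffic is not growing.
    """
    if capacity_gbps <= 0 or peak_gbps <= 0:
        raise ValueError("capacity_gbps and peak_gbps must be positive")
    usable = capacity_gbps * ceiling
    if peak_gbps >= usable:
        return 0.0
    if monthly_growth <= 0:
        return math.inf
    return math.log(usable / peak_gbps) / math.log(1 + monthly_growth)


if __name__ == "__main__":
    # Capacities are from the write-up; peak and growth are made-up inputs.
    for capacity in (800, 3200):
        months = headroom_runway_months(capacity, peak_gbps=500, monthly_growth=0.03)
        print(f"{capacity} Gbps: about {months:.1f} months of runway")
```

Under this model, tracking the runway figure over time shows when a deferred expansion is about to stop being optional, which current utilization alone does not.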
The upgrade was already planned. The fiber cut just decided the schedule.