GitHub Codespaces Feb 2026: Telemetry Loss Blocked VMs

The Story

On February 2, 2026, between 18:35 UTC and 23:10 UTC, GitHub Codespaces stopped launching. Existing sessions held on, but no new environment could spin up. Copilot Coding Agent, CodeQL, Dependabot, GitHub Enterprise Importer, and GitHub Pages all went down with it. The error offered no useful signal: 'failed to provision VM.' What actually happened was that GitHub's security automation decided every VM on the platform was untrusted.

GitHub's compute infrastructure uses a telemetry pipeline to pass health and identity signals about active VMs to the security layer. When that pipeline dropped — due to an unrelated infrastructure change — the security system didn't see an uncertain situation. It saw silence, and it treated silence as a confirmed threat. Security policies designed to protect backend storage accounts activated automatically, blocking access to VM metadata. Codespaces depends on that metadata service to initialize every new environment. With it blocked, all provisioning failed in every region.
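The failure mode is easier to see in miniature. The sketch below is purely illustrative and is not GitHub's code: `VmTelemetry`, `naive_is_trusted`, and `blocked_vms` are hypothetical names, and the policy is reduced to a single check. It assumes a fail-closed rule in which a missing report is handled exactly like a failed check.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VmTelemetry:
    """Latest health/identity report for one VM (hypothetical shape)."""
    vm_id: str
    identity_verified: bool


def naive_is_trusted(report: Optional[VmTelemetry]) -> bool:
    # Fail-closed: "no report" and "report says untrusted" take the same path.
    if report is None:
        return False
    return report.identity_verified


def blocked_vms(vm_ids: list[str], reports: dict[str, VmTelemetry]) -> list[str]:
    """Return the VMs whose metadata access this naive policy would block."""
    return [vm for vm in vm_ids if not naive_is_trusted(reports.get(vm))]


fleet = ["vm-a", "vm-b", "vm-c"]
healthy = {vm: VmTelemetry(vm, identity_verified=True) for vm in fleet}

print(blocked_vms(fleet, healthy))  # pipeline up: nothing is blocked
print(blocked_vms(fleet, {}))       # pipeline down: every VM is blocked
```

With the pipeline up, nothing is blocked. With an empty report set, which is what a dropped pipeline looks like from the policy's side, the whole fleet is blocked even though no VM reported anything wrong.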

Problem

A telemetry gap caused security policies to activate against backend storage accounts. This blocked VM metadata access, preventing all Codespaces provisioning and halting Actions hosted runners across all regions and runner types.

Problem

Telemetry pipeline drops

At 18:35 UTC, compute telemetry stops flowing. Security policies interpret the silence as a breach signal and activate against backend storage accounts.

Cause

VM metadata access blocked

New Codespaces VMs cannot access their own metadata service — a prerequisite for initialization. All provisioning attempts fail across all regions.

Solution

Policy rollback and telemetry restore

Engineers identify the telemetry gap and roll back the security policy activation. The telemetry pipeline is restored.

Result

Full recovery by February 3

Standard runners recover at 23:10 UTC. Larger runners recover at 00:30 UTC on February 3. Codespaces is fully restored at 00:15 UTC. Self-hosted runners on other providers were unaffected throughout.

The Fix

GitHub separated the telemetry-dependent security policies from the provisioning critical path. New VMs now receive a time-limited provisioning trust token that allows metadata access even when telemetry has not yet fully initialized. The security policies that triggered were moved to a higher activation threshold — multiple corroborating signals are now required before policies can block infrastructure access. A single missing telemetry stream is no longer sufficient to trigger a lockout.
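A minimal sketch of how these two changes could fit together follows. The actual token format, lifetime, and threshold are not described in the source, so everything here is an assumption: `ProvisioningToken`, the 300-second lifetime, the signal names, and the threshold of two are all hypothetical. The sketch also assumes a valid token lets a new VM through without consulting the policy at all.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProvisioningToken:
    """Short-lived grant for metadata access during VM startup (hypothetical)."""
    vm_id: str
    expires_at: float  # seconds, on the same clock as the `now` arguments below


def issue_token(vm_id: str, now: float, ttl_seconds: float = 300.0) -> ProvisioningToken:
    # The 300-second lifetime is an assumed value, not a documented one.
    return ProvisioningToken(vm_id=vm_id, expires_at=now + ttl_seconds)


def token_valid(token: Optional[ProvisioningToken], vm_id: str, now: float) -> bool:
    return token is not None and token.vm_id == vm_id and now < token.expires_at


def should_block(signals: dict[str, Optional[bool]], min_corroborating: int = 2) -> bool:
    """Block only when enough signals positively report a threat.

    Each value is True (threat reported), False (healthy), or None (no data).
    A None value never counts toward the threshold.
    """
    threats = sum(1 for value in signals.values() if value is True)
    return threats >= min_corroborating


def allow_metadata_access(
    vm_id: str,
    token: Optional[ProvisioningToken],
    signals: dict[str, Optional[bool]],
    now: float,
) -> bool:
    if token_valid(token, vm_id, now):
        return True  # assumption: the provisioning path does not wait on telemetry
    return not should_block(signals)


now = 1000.0
token = issue_token("vm-new", now)
silent = {"compute_telemetry": None, "network_anomaly": None}
one_alarm = {"compute_telemetry": None, "network_anomaly": True}
two_alarms = {"compute_telemetry": True, "network_anomaly": True}

print(allow_metadata_access("vm-new", token, silent, now))     # new VM, telemetry silent
print(allow_metadata_access("vm-old", None, one_alarm, now))   # a single uncorroborated alarm
print(allow_metadata_access("vm-old", None, two_alarms, now))  # two corroborating alarms
```

In this sketch only the last call is denied. Silence contributes nothing to the threshold, so a missing telemetry stream on its own cannot produce a block.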

Solution

Provisioning trust token added with time-limited metadata access. Multi-signal threshold required before security policies activate. Telemetry loss alone can no longer trigger a storage lockout.

Lessons

What to remember

Systems that act on the absence of a signal are dangerous. Silence is not evidence of a threat. It is evidence of a monitoring gap.
Security automation that can block production infrastructure needs the same change-control rigor as a production deploy — not just a security review.
If your security posture instantly hardens when telemetry drops, you've built a system that becomes less available the less observable it is.
Verify trust before provisioning starts, not during it. The provisioning path is not the right place to enforce complex security policies mid-operation.
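One way to put the first and third lessons into practice is to give "no data" its own state instead of folding it into "threat". The sketch below is an illustration under assumed names and values: `Verdict`, `classify`, `action_for`, the 60-second freshness window, and the action strings are all hypothetical.

```python
from enum import Enum
from typing import Optional


class Verdict(Enum):
    HEALTHY = "healthy"
    THREAT = "threat"
    UNKNOWN = "unknown"  # no fresh data: a monitoring gap, not an attack


def classify(
    last_report_age_s: Optional[float],
    threat_flag: bool,
    max_age_s: float = 60.0,
) -> Verdict:
    """Turn the most recent report into a verdict, keeping silence separate."""
    # A missing or stale report says nothing about the VM, so it is UNKNOWN
    # regardless of what the last flag said.
    if last_report_age_s is None or last_report_age_s > max_age_s:
        return Verdict.UNKNOWN
    return Verdict.THREAT if threat_flag else Verdict.HEALTHY


def action_for(verdict: Verdict) -> str:
    # UNKNOWN raises an alert for humans instead of hardening automatically.
    return {
        Verdict.HEALTHY: "allow",
        Verdict.THREAT: "block",
        Verdict.UNKNOWN: "allow_and_page_oncall",
    }[verdict]


print(action_for(classify(5.0, threat_flag=False)))
print(action_for(classify(5.0, threat_flag=True)))
print(action_for(classify(None, threat_flag=False)))
```

Whether the unknown state should allow, degrade, or hold is a policy decision. The point of the sketch is only that it is a separate decision from the one made for a confirmed threat.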

"We built the security system to act decisively on threat signals. The mistake was treating a missing signal as a threat signal."
GitHub incident review, February 2026
