The Story
At 12:30 AM AEST on September 18, 2025, Optus began a routine firewall upgrade. Firewall upgrades happen hundreds of times a year across any large telecommunications network. This one went wrong. Within minutes, customers in the Northern Territory, South Australia, Western Australia, and New South Wales could not complete calls to Triple Zero (000), Australia's emergency services number. The outage lasted 13 hours. During that window, approximately 600 emergency calls failed. Four people who needed emergency services and could not get through died.
Problem
A firewall configuration error broke the routing path for Triple Zero emergency calls. No automatic failover activated. Carrying emergency calls is a legal obligation for all Australian carriers. The failure persisted for 13 hours across four states and territories.
The upgrade introduced a configuration error in the call routing layer, the component that passes emergency calls from mobile networks to the Public Safety Answering Point (PSAP) dispatch infrastructure. When the primary path failed, no backup route activated automatically, so emergency calls were not rerouted. For 13 hours, calling Triple Zero from an affected Optus network returned silence. The manual recovery process that followed took hours to complete.
Problem
Firewall upgrade begins
At 12:30 AM AEST, the firewall upgrade is applied. A configuration error in the routing layer breaks the Triple Zero call path within minutes.
Cause
No automatic failover activates
Emergency call routing has no secondary path configured to activate automatically when the primary route fails. Calls return silence.
Solution
Engineers identify and push corrective config
Optus engineers locate the misconfigured routing rule and deploy a corrective configuration to restore the Triple Zero call path.
Result
Emergency calling restored at 1:30 PM AEST
Full restoration of Triple Zero routing after 13 hours. Australian telecommunications regulators open formal investigations across all carriers.
The Fix
Optus implemented mandatory live-call simulation testing for any firewall change that touches the emergency call routing path. A test call is placed on isolated infrastructure before any routing change goes to production. A secondary PSAP routing path was added with automatic activation within two seconds if the primary route fails to complete a call. Australian telecommunications regulators tightened testing and failover requirements for emergency service infrastructure changes across all carriers following the incident.
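The live-call gate described above could look roughly like the following. This is a minimal sketch, not Optus's actual tooling: the names `SimulatedCallResult`, `gate_routing_change`, and `place_test_call`, and the 5-second setup threshold, are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SimulatedCallResult:
    """Outcome of one test call placed on isolated infrastructure (hypothetical shape)."""
    connected: bool
    setup_seconds: float


def gate_routing_change(
    touches_emergency_path: bool,
    place_test_call: Callable[[], SimulatedCallResult],
    max_setup_seconds: float = 5.0,  # assumed threshold, not from the source
) -> bool:
    """Return True only if the change may proceed to production."""
    if not touches_emergency_path:
        # The gate applies only to changes on the emergency call routing path.
        return True
    result = place_test_call()
    # Block the change unless the simulated call connected quickly enough.
    return result.connected and result.setup_seconds <= max_setup_seconds
```

The design point is that the gate fails closed: a change on the emergency path proceeds only after a simulated call has actually completed, rather than after a configuration review alone.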
Solution
Live-call simulation required before production routing changes. Automatic PSAP failover activates within 2 seconds of primary route failure. Regulatory requirements tightened for all Australian carriers.
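As an illustration of the failover behaviour, the sketch below tries an ordered list of routes and moves to the next one when a route does not complete the call within the deadline. It assumes each route is a callable that takes a timeout in seconds and returns whether the call completed. The function name `route_emergency_call` and that interface are hypothetical. Only the two-second figure comes from the text.

```python
from typing import Callable, Sequence

FAILOVER_DEADLINE_SECONDS = 2.0  # from the text: activate within two seconds


def route_emergency_call(
    routes: Sequence[Callable[[float], bool]],
    deadline_seconds: float = FAILOVER_DEADLINE_SECONDS,
) -> int:
    """Try each route in order and return the index of the one that completed the call.

    Each route receives the deadline as its timeout and returns True
    if the call completed within it (assumed interface).
    """
    for index, attempt in enumerate(routes):
        try:
            if attempt(deadline_seconds):
                return index
        except Exception:
            # Any error on a route counts as a failure; fall through to the next route.
            continue
    raise RuntimeError("all emergency call routes failed")
```

For example, `route_emergency_call([primary, secondary])` returns `1` when `primary` fails or raises and `secondary` completes the call.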
Lessons
What to remember
- Emergency call routing cannot be changed as casually as application routing. It requires its own isolated test environment and a mandatory live-call gate before any change touches production.
- Automatic failover for emergency services cannot be optional. A manual recovery that takes 13 hours is not a recovery — it is a systemic failure with human consequences.
- Regulatory compliance defines the floor, not the ceiling. The legal obligation to carry emergency calls demands the same redundancy thinking as the highest-uptime commercial systems.
- A failed emergency call should generate an alert in seconds, not after hours of silence. Build detection for the absence of expected call completions, not just the presence of errors.
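The last point, detecting the absence of expected call completions, can be sketched as a sliding-window watchdog that alerts when calls are being attempted but none complete. The class name `CompletionWatchdog`, the 30-second window, and the three-attempt minimum are assumptions chosen for illustration, not values from the incident.

```python
from collections import deque


class CompletionWatchdog:
    """Alert when emergency calls are attempted but none complete within a time window."""

    def __init__(self, window_seconds: float = 30.0, min_attempts: int = 3):
        self.window_seconds = window_seconds
        self.min_attempts = min_attempts
        self._events = deque()  # (timestamp, completed) pairs, oldest first

    def record(self, timestamp: float, completed: bool) -> None:
        """Record one call attempt and whether it completed."""
        self._events.append((timestamp, completed))

    def should_alert(self, now: float) -> bool:
        # Discard attempts that have aged out of the window.
        while self._events and self._events[0][0] < now - self.window_seconds:
            self._events.popleft()
        attempts = len(self._events)
        completions = sum(1 for _, ok in self._events if ok)
        # Enough attempts with zero completions means calls are failing silently.
        return attempts >= self.min_attempts and completions == 0
```

A check like this fires on silence itself: it needs no error code from the routing layer, only the observation that attempts are arriving and nothing is completing.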
We were upgrading infrastructure to protect the network. Instead, we disconnected people from the one service that protects them.