The conventional wisdom about database scaling at 800 million users is straightforward: you shard. You move to a distributed SQL system. You decompose into microservices each with their own database. You do not run a single primary PostgreSQL instance. OpenAI's ChatGPT does not follow this conventional wisdom. It runs on one Azure PostgreSQL Flexible Server that handles all writes — backed by approximately 50 read replicas spread across multiple regions. The system handles millions of queries per second at low double-digit millisecond p99 latency and has maintained five-nines availability. In twelve months, they had one SEV-0. The story is not that Postgres is magic. The story is that relentless optimization of a boring, proven technology can outperform premature architectural complexity.
WHY SINGLE-PRIMARY WORKS AT THIS SCALE
ChatGPT's workload is overwhelmingly read-heavy. When 800 million users open the app, browse their chat history, or load their settings, those are reads. Writes happen on message submission and account updates, a much smaller fraction of the total traffic. This access pattern is exactly what a single primary with many read replicas handles well: the write path stays narrow, and the read load fans out horizontally across replicas. The architecture is not brilliant. It is appropriate for the workload. That fit is what makes it work.
OpenAI's account presented at PGConf.dev 2025 was unusually candid about both the decisions that worked and the ones that nearly broke the system. The database load grew by more than 10x in a single year following ChatGPT's viral growth. The team responded with aggressive optimization at every layer: connection management, query design, caching, write path discipline, and schema change governance. Each of these deserves examination, not because the techniques are novel, but because executing all of them simultaneously, under extreme growth pressure, with production at risk, is far harder than any one technique in isolation.
🔌OpenAI's Azure PostgreSQL Flexible Server has a maximum of 5,000 concurrent connections. At ChatGPT's scale, application servers would easily exhaust this limit without connection pooling. Before deploying PgBouncer, average connection time was 50ms. After deployment in statement-pooling mode: 5ms. A 10x improvement from one infrastructure change.
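To make the pooling idea concrete, here is a minimal sketch in Python. It is not PgBouncer and not OpenAI's setup: `FakeConnection` and `BoundedPool` are hypothetical names, and the connection is a stand-in that only counts how often it is opened. It shows the core effect: many short requests share a small, fixed set of long-lived server connections, so the connection limit is never approached and the connect cost is paid once per pooled connection rather than once per request.

```python
import queue
from contextlib import contextmanager


class FakeConnection:
    """Hypothetical stand-in for a real database connection."""

    opened = 0  # how many server connections were ever created

    def __init__(self):
        FakeConnection.opened += 1

    def execute(self, sql):
        return f"ok: {sql}"


class BoundedPool:
    """Hands out at most max_size connections; callers wait when all are busy."""

    def __init__(self, max_size):
        self._idle = queue.LifoQueue(maxsize=max_size)
        for _ in range(max_size):
            self._idle.put(FakeConnection())

    @contextmanager
    def statement(self, timeout=1.0):
        # Borrow a connection for a single statement, then return it at once.
        # Raises queue.Empty if none becomes free within `timeout` seconds.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)


pool = BoundedPool(max_size=3)
results = []
for i in range(1000):
    with pool.statement() as conn:
        results.append(conn.execute(f"SELECT {i}"))

print(FakeConnection.opened, len(results))
```

Returning the connection after every statement loosely mirrors statement-level pooling. A real pooler also has to deal with session state, authentication, and failure handling, none of which this sketch attempts.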
Problem
10x Database Load Growth in One Year
ChatGPT's viral growth — 100 million users in two months at launch, 800 million by 2025 — drove database load up more than 10x in a single year. Connection exhaustion became a recurring threat. A 12-table ORM-generated join was causing multiple high-severity incidents when traffic spiked. Write pressure on the single primary was approaching dangerous levels during high-demand events.
Cause
Invisible Query Complexity and Write Pressure
ORMs generate SQL automatically, hiding complexity from developers. Under low load, even a 12-table join is fast enough to go unnoticed. Under 10x load, the same query saturates database CPU. Meanwhile, write-heavy workloads that could be migrated to sharded systems like Azure Cosmos DB remained on the single primary longer than optimal.
Solution
Multi-Layer Defense: Pool + Cache + Rate Limit + Migrate
OpenAI implemented PgBouncer connection pooling (cutting connect time 10x), a cache-locking mechanism to prevent thundering herd on cache misses, multi-layer rate limiting at application, proxy, and query levels, surgical elimination of the worst ORM-generated queries, strict schema change governance (5-second DDL timeout), and a policy of migrating all new write-heavy workloads to sharded systems by default.
Result
One SEV-0 in Twelve Months, Five-Nines Availability
One SEV-0 in twelve months — triggered by the viral launch of ChatGPT ImageGen, which caused a 10x write surge as over 100 million users signed up within a week. Postgres recovered by design. p99 latency held at low double-digit milliseconds. The single-primary architecture remained viable at a scale that surprised the entire database engineering community.
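The cache-locking mechanism named in the Solution above can be sketched as a "single-flight" cache: when many requests miss on the same key at once, only one of them queries the database and the rest wait for its result. The following Python sketch works under stated assumptions. `SingleFlightCache` and `load_from_db` are hypothetical names, the cache is an in-process dictionary, and nothing here reflects how OpenAI implemented it.

```python
import threading
import time


class SingleFlightCache:
    """On a miss, only one caller per key runs the loader; the others wait."""

    def __init__(self, loader):
        self._loader = loader
        self._values = {}
        self._key_locks = {}
        self._guard = threading.Lock()  # protects the two dicts above

    def get(self, key):
        with self._guard:
            if key in self._values:
                return self._values[key]
            key_lock = self._key_locks.setdefault(key, threading.Lock())
        with key_lock:
            # Re-check: another thread may have filled the cache while we waited.
            with self._guard:
                if key in self._values:
                    return self._values[key]
            value = self._loader(key)  # the single expensive database read
            with self._guard:
                self._values[key] = value
            return value


calls = []


def load_from_db(key):
    # Hypothetical slow query; records each invocation.
    calls.append(key)
    time.sleep(0.05)
    return f"row-for-{key}"


cache = SingleFlightCache(load_from_db)
threads = [threading.Thread(target=cache.get, args=("user:42",)) for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(calls))
```

Without the per-key lock, all 20 threads would miss together and issue 20 identical queries. A production version would also need expiry, error handling for a failed load, and cleanup of unused per-key locks.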
OpenAI's schema change governance is one of the most operationally distinctive aspects of their Postgres setup. They enforce a strict rule: schema changes that trigger a full table rewrite are prohibited in production. A rewrite holds an exclusive lock on the table for its entire duration, so operations like ALTER TABLE ADD COLUMN DEFAULT on large tables can block all access for hours while billions of rows are rewritten. (Since PostgreSQL 11, adding a column with a constant default is a metadata-only change; a volatile default or most column type changes still force a rewrite.) This would be catastrophic at ChatGPT's scale. All DDL operations have a 5-second timeout: if the schema change cannot acquire a lock within 5 seconds, it is cancelled automatically. Long-running queries that would block vacuum or DDL are automatically terminated.
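One common way to get the "give up if the lock is not acquired within 5 seconds" behavior in PostgreSQL is the `lock_timeout` setting, scoped to the migration's transaction with `SET LOCAL`. The Python sketch below only builds the statement list; the helper `guarded_ddl` and the `conversations` table are hypothetical, and the source does not say which mechanism OpenAI actually uses.

```python
def guarded_ddl(ddl, lock_timeout="5s"):
    """Wrap one DDL statement so it fails fast instead of queueing for a lock.

    SET LOCAL limits the setting to this transaction. If the lock cannot be
    acquired in time, the DDL statement errors and the transaction rolls back.
    """
    return [
        "BEGIN",
        f"SET LOCAL lock_timeout = '{lock_timeout}'",
        ddl.strip().rstrip(";"),
        "COMMIT",
    ]


# Adding a nullable column without a default is a metadata-only change.
statements = guarded_ddl(
    "ALTER TABLE conversations ADD COLUMN archived_at timestamptz;"
)
for statement in statements:
    print(statement + ";")
```

The reason to fail fast is that a DDL statement waiting for its lock also makes every later query on that table queue behind it, so a short timeout with a later retry is usually safer than an open-ended wait.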
ℹ️The Hot Standby in High-Availability Mode
OpenAI runs the primary database in High-Availability mode with a hot standby — a continuously synchronized replica specifically designated as the failover target. If the primary goes down, the hot standby can be promoted to primary with ~30–60 seconds of downtime. During a primary failure, read traffic on replicas is unaffected — since most ChatGPT requests are reads, a primary failure is not a SEV-0 (because reads remain available). Writes fail until promotion completes. This asymmetry between read and write availability is a conscious architectural tradeoff: the 800 million users who are just browsing conversation history continue being served.
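A quick back-of-the-envelope calculation puts the failover window in context. The sketch assumes availability is measured over a calendar year and that the whole failover counts as downtime; the source does not say how OpenAI measures it, so treat the numbers as illustrative only.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000


def downtime_budget_seconds(availability):
    """Total downtime per year allowed by an availability target."""
    return SECONDS_PER_YEAR * (1 - availability)


budget = downtime_budget_seconds(0.99999)  # five nines
shares = {failover: failover / budget for failover in (30, 60)}

print(f"budget: {budget:.2f} s/year")
for failover, share in shares.items():
    print(f"{failover} s failover uses {share:.1%} of the budget")
```

Five nines leaves roughly 315 seconds per year, so a single 30 to 60 second promotion consumes about 10% to 19% of that budget for writes. Reads served from replicas are not affected, which is the asymmetry described above.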
ℹ️Why Not Shard? The Honest Answer
The engineering question 'why didn't OpenAI shard PostgreSQL?' has a straightforward answer: sharding is expensive and their workload didn't require it yet. Horizontal sharding introduces cross-shard transaction complexity, scatter-gather query patterns, operational overhead of multiple database instances, and application-layer awareness of shard routing. For a read-heavy workload that can be served from replicas, these costs are not justified. OpenAI chose to pay the operational cost of extreme Postgres optimization rather than the architectural cost of sharding — and the math worked out. The 'no new tables' policy ensures this calculation will be revisited for write-heavy workloads as they emerge.
IDLE TRANSACTION TIMEOUTS: THE QUIET KILLER
OpenAI identified a subtle but devastating Postgres pattern at scale: idle transactions. When application code opens a database connection, starts a transaction, does unrelated work (calling an external API, waiting for user input), and only then commits, the transaction holds locks for the entire duration. At ChatGPT's scale, applications that hold open transactions for seconds can block vacuum, block DDL, and degrade query performance for all other connections. OpenAI enforces strict idle_in_transaction_session_timeout settings: any connection idle inside a transaction for more than a few seconds is automatically terminated. This breaks poorly-written code immediately in staging rather than causing incidents in production.
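The application-side fix is usually structural: do the slow, unrelated work first and open the transaction only around the database statements. The Python sketch below measures how long a transaction stays open in each version. `transaction`, `slow_external_call`, and the two handlers are hypothetical stand-ins, and no real database is involved.

```python
import time
from contextlib import contextmanager

open_durations = []  # seconds each simulated transaction stayed open


@contextmanager
def transaction():
    """Hypothetical stand-in for BEGIN ... COMMIT that records its duration."""
    start = time.monotonic()
    try:
        yield
    finally:
        open_durations.append(time.monotonic() - start)


def slow_external_call():
    time.sleep(0.05)  # e.g. an HTTP request to another service
    return "generated title"


def save_title_bad():
    # Anti-pattern: the transaction sits idle while the external call runs.
    with transaction():
        title = slow_external_call()
        _ = f"UPDATE ... SET title = {title!r}"  # placeholder for the write


def save_title_good():
    # Do the slow work first; the transaction covers only the write.
    title = slow_external_call()
    with transaction():
        _ = f"UPDATE ... SET title = {title!r}"  # placeholder for the write


save_title_bad()
save_title_good()
print([round(d, 3) for d in open_durations])
```

With a timeout of a few seconds enforced on the server, the first handler is the kind of code that gets its session terminated as soon as the external call becomes slow. The second never holds a transaction open while idle.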
📊Despite having ~50 read replicas across multiple geographic regions, OpenAI reports near-zero replication lag on most replicas under normal conditions. This is achieved by co-locating PgBouncer, application servers, and replicas in the same region (minimizing network latency in the replication path) and by keeping primary write load within the replication throughput capacity of the replicas. Heavy write events — like the ImageGen launch surge — temporarily increase replication lag, which is why read-your-own-write operations are always routed to the primary.
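The routing rule in the last sentence can be sketched in a few lines of Python. This is an illustration under assumptions, not OpenAI's router: the `Router` class, the `read_your_own_write` flag, and the round-robin choice of replica are all hypothetical.

```python
import itertools


class Router:
    """Sends writes and read-your-own-write reads to the primary."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self._replicas = itertools.cycle(replicas)  # simple round-robin

    def route(self, is_write=False, read_your_own_write=False):
        if is_write or read_your_own_write:
            # A replica may lag behind the primary, so a read that must
            # observe the caller's own recent write cannot use it.
            return self.primary
        return next(self._replicas)


router = Router("primary", ["replica-a", "replica-b", "replica-c"])
targets = [
    router.route(is_write=True),
    router.route(read_your_own_write=True),
    router.route(),
    router.route(),
    router.route(),
    router.route(),
]
print(targets)
```

The cost of this rule is that every read-your-own-write request adds load to the primary, so it is worth reserving the flag for the requests that truly need it.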
⚠️The Write Ceiling Is Real
OpenAI's single-primary architecture has an acknowledged limit: write-heavy events can overwhelm it. The ImageGen SEV-0 was caused by a write surge, not a read surge. The architecture is not defended against arbitrary write load — it is defended against the current write profile, which remains manageable because most new write-heavy workloads are being routed to Cosmos DB. If write load grows faster than the migration effort proceeds, the single-primary architecture will face a harder ceiling. The 'no new tables in Postgres' policy is the operational discipline that buys time.