Before the migration, Discord's message write and read path ran through a monolithic API server directly into the Cassandra cluster. There was no intermediary — every user action that read messages translated directly into a database query, with no protection against fan-out or . The API server held connection pools to Cassandra, handled for message pagination, and relied on Cassandra's own internal mechanisms (memtable, SSTables, compaction) to handle read pressure. Under normal load this worked. Under peak load — a major announcement, a viral moment, a World Cup Final — it did not.
Before: Direct API-to-Cassandra Architecture (Hot Partition Risk)
flowchart TD
client1["Client Browser"] --> api["Discord API Monolith"]
client2["Mobile Client"] --> api
client3["Desktop Client"] --> api
api -->|"CQL query per request"| cass_cluster["Cassandra Cluster (177 nodes)"]
cass_cluster --> node_a[("Node A — hot partition 🔥")]
cass_cluster --> node_b[("Node B")]
cass_cluster --> node_c[("Node C")]
node_a -->|"GC pause triggered"| alert["⚠️ On-call alert"]
node_a -->|"reads SSTables"| sstable1["SSTable 1"]
node_a -->|"reads SSTables"| sstable2["SSTable 2"]
node_a -->|"reads SSTables"| sstable3["SSTable 3"]
sstable1 -->|"merge required"| result["Merged result"]
sstable2 --> result
sstable3 --> result
result -->|"40–125ms p99"| api
SHARD-PER-CORE ARCHITECTURE
The fundamental reason ScyllaDB handles concurrent reads so much better than Cassandra is its
shard-per-core architecture. Each CPU core is assigned its own exclusive slice of the data and handles all requests for that data without coordination with other cores. In Cassandra's JVM-based model, all threads compete for heap memory under a single garbage collector. In ScyllaDB's C++ model,
each core is an independent actor: no cross-core locking, no GC, no stop-the-world. When one partition gets hot, it affects only the core assigned to that shard — it cannot cascade to neighbors.
ℹ️Consistent Hashing: Routing Channels to Service Instances
Each Rust data service instance is responsible for a deterministic subset of channel IDs via . This means if 1,000 users simultaneously load the same popular channel, all 1,000 requests arrive at the same service instance and collapse into one database query.
After: Rust Data Services + ScyllaDB Architecture (Hot Partition Mitigated)
flowchart TD
client1["Client Browser"] --> api["Discord API Monolith"]
client2["Mobile Client"] --> api
client3["Desktop Client"] --> api
api -->|"gRPC call"| ds1["Data Service Instance A\n(consistent hash: channel 0-N)"]
api -->|"gRPC call"| ds2["Data Service Instance B\n(consistent hash: channel N-M)"]
ds1 -->|"request coalescing"| coalesce1["Coalesce Worker\n(1 DB query for N requests)"]
ds2 -->|"request coalescing"| coalesce2["Coalesce Worker"]
coalesce1 -->|"single CQL query"| scylla["ScyllaDB Cluster (72 nodes)"]
coalesce2 --> scylla
scylla --> core_a["Core 0 — Shard A"]
scylla --> core_b["Core 1 — Shard B"]
scylla --> core_c["Core 2 — Shard C"]
core_a -->|"15ms p99 stable"| coalesce1
core_b --> coalesce2
scylla -->|"NVMe fast reads"| superdisk[("Superdisk\nNVMe + Persistent RAID 1")]