Discord's Cassandra to ScyllaDB Migration: How They Moved Tr

The Story

Our Cassandra cluster exhibited serious performance issues that required increasing amounts of effort to just maintain, not improve.

— Bo Ingram, Senior Software Engineer — via Discord Engineering Blog

Discord launched in 2015 with a mission to build the best voice and text chat platform for gamers. By 2017 they had outgrown MongoDB and migrated their entire message store to . Cassandra's promise was compelling: write anywhere, replicate everywhere, scale horizontally forever. For a few years it held. By 2022, however, the promises had curdled into a maintenance nightmare that consumed engineering cycles every single week. The database cluster had grown to 177 nodes holding trillions of messages, and keeping it alive required the kind of expertise and vigilance that should be reserved for nuclear reactor operators, not chat app engineers.

At peak, Discord's Cassandra cluster required engineers to manually reboot individual nodes after spiraled long enough to drop the node from the cluster. This was not a rare emergency — it was routine on-call work.

The core problem was architectural. Cassandra is written in Java, and Java's garbage collector periodically halts all threads in the JVM to reclaim heap memory — a moment engineers call a stop-the-world pause. Under Discord's workloads, these pauses could last long enough to cause cascading latency spikes visible to users, and in severe cases, the JVM's consecutive GC pauses got so bad that a node would effectively fall out of the cluster entirely. An on-call engineer would then have to manually reboot it and babysit it back to health. The p99 latency on historical message reads ranged between 40 and 125 milliseconds depending on whether compaction was running — an unpredictability that made SLO planning impossible. Every time someone tried to improve the cluster rather than merely maintain it, they risked triggering a cascade.

The Hot Partition Problem

Discord's message data model organized messages by channel ID and a fixed time window called a . This was efficient for write distribution and replication, but created a painful read problem. Cassandra performs writes cheaply by appending to a and an in-memory structure called a . Reads, however, must query the memtable and potentially multiple , a dramatically more expensive operation. When a popular Discord server made a major announcement and thousands of users simultaneously opened their apps to read it, every single one of those reads would hammer the same partition. The cluster called these hot partitions, and they were Discord's most common and painful operational incident.

Problem

The Maintenance Spiral

By early 2022, Discord's on-call rotation was spending more time nursing Cassandra than building features. GC pause alerts fired multiple times a week, and the p99 latency on reads ranged from 40ms to 125ms depending on whether compaction was running on the affected node — an unpredictability engineers had simply learned to live with.

Cause

JVM GC + Hot Partition Physics

The root cause split into two layers: on write-heavy nodes created latency cliffs, while Cassandra's read path — requiring merges across multiple SSTables — meant any popular channel partition would spike latency under concurrent user load. The combination made the cluster inherently unpredictable at scale.

Solution

ScyllaDB + Rust Data Services

Discord chose ScyllaDB, a Cassandra-compatible database rewritten in C++ with a . They also built a Rust-based data services layer between the API and the database to absorb hot-partition spikes via request coalescing. The migration tool was rewritten in Rust to achieve 3.2 million records per second transfer speed.

Result

9 Days, 4 Trillion Messages, Zero Users Noticed

The migration completed in nine days — the original estimate using ScyllaDB's Spark migrator had been three months. The cluster footprint shrank from 177 nodes to 72, each ScyllaDB node running with 9 TB of disk versus the average 4 TB on Cassandra. P99 latency for historical reads settled at a stable, predictable 15 milliseconds.

Why cassandra-messages Was the Last to Move

By 2020 Discord had migrated every other database to ScyllaDB — the messages cluster was the lone holdout. They deliberately waited to last because it was the most critical dataset: trillions of messages, nearly 200 nodes, and the one cluster whose failure would be immediately visible to every user. They used the other migrations to tune ScyllaDB for their access patterns first, including filing and waiting on performance improvements to ScyllaDB's reverse query support.

The Tombstone Trap at 99.9999%

The migration nearly ended in drama rather than triumph. After running the Rust migrator for days at 3.2 million records per second, the progress bar hit 99.9999% — and stopped. The migrator was timing out trying to read the last few because they contained gigantic ranges of that had never been compacted away. Engineers had to manually trigger compaction on that token range; seconds later, the migration hit 100%. Automated data validation confirmed correctness by sending a sample of reads to both databases and comparing results. Discord switched to ScyllaDB in May 2022.

THE WORLD CUP TEST

The real stress test came months after go-live: the 2022 FIFA World Cup Final between Argentina and France. Every goal by Messi, every equalizer by Mbappé, every moment in the shootout created a massive spike of simultaneous message reads across Discord's biggest servers. Under the old Cassandra architecture this would have triggered hot-partition alerts and cascading latency. Under ScyllaDB with the Rust data services layer, the monitoring dashboards showed nothing unusual. The system held flat through 120 minutes of football and a penalty shootout.

The Rust data services layer was the architectural insight that made ScyllaDB viable, not just the database choice alone. When a popular server makes an announcement and thousands of users simultaneously open their clients, all those read requests arrive at the data service within milliseconds of each other — all asking for the same messages in the same channel. Without coalescing, each request would hit the database separately, creating a hot partition. With , only one query goes to ScyllaDB; every subsequent request subscribes to the in-flight result and receives the answer when the single database query returns. The data services layer also used to route requests for the same channel to the same service instance, maximizing coalescing opportunity.

The Fix

177→72

Cassandra nodes replaced by ScyllaDB nodes — a 59% reduction in cluster footprint while handling the same workload

15ms

Stable p99 read latency on ScyllaDB, down from an unpredictable 40–125ms range on Cassandra depending on compaction status

9 days

Total migration time for 4+ trillion messages — versus the original 3-month estimate with ScyllaDB's Spark migrator

3.2M/s

Peak migration throughput of the Rust-rewritten migrator, unlocking a single-flip cutover instead of a complex time-based phased approach

The fix had three distinct components, and Discord was deliberate about not rushing any of them. First, they spent years migrating every other database to ScyllaDB to build operational expertise before touching the one cluster that mattered most. Second, they collaborated with the ScyllaDB team to improve reverse query performance — a blocker they hit in early testing — and waited until that was production-grade before proceeding. Third, they built the Rust data services layer before starting the migration, so the new database would go live already protected from hot-partition load patterns. This sequencing was the engineering discipline that made the migration look easy in retrospect.

The Rust Migrator Rewrite

The turning point in the migration timeline was a one-day engineering sprint. ScyllaDB's off-the-shelf estimated three months to move the message data — three months of dual-running two massive database clusters, three months of operational complexity, and three months of potential failure modes. Bo Ingram decided that was three months too long. He and two colleagues rewrote the migrator in Rust in a single day. The new migrator read token ranges from a database, checkpointed them locally via SQLite for crash recovery, and fired them into ScyllaDB as fast as possible. The result: 3.2 million records per second. The new estimate was nine days, and the team chose to do a single-flip cutover instead of a phased time-based approach entirely.

rust

// Simplified version of Discord's request coalescing logic in the Rust data service
// Real implementation uses Tokio async runtime

use std::collections::HashMap;
use tokio::sync::broadcast;

struct CoalescingDataService {
    // Map from cache_key -> active broadcast sender
    // If a task is in flight, subscribers receive the result
    in_flight: HashMap<String, broadcast::Sender<Message>>,
}

impl CoalescingDataService {
    async fn get_messages(
        &mut self,
        channel_id: u64,
        before_id: u64,
    ) -> Result<Vec<Message>> {
        // Build a stable cache key for this exact query
        let key = format!("{}:{}", channel_id, before_id);

        if let Some(sender) = self.in_flight.get(&key) {
            // A query for this channel is already in flight
            // Subscribe and wait — NO second database round-trip
            let mut rx = sender.subscribe();
            return Ok(vec![rx.recv().await?]); // receive the shared result
        }

        // No existing task — we are the first; create the broadcast channel
        let (tx, _rx) = broadcast::channel(16);
        self.in_flight.insert(key.clone(), tx.clone());

        // Execute the single database query to ScyllaDB
        let results = self.scylladb.query_messages(channel_id, before_id).await?;

        // Broadcast to ALL waiting subscribers at once
        let _ = tx.send(results.clone()); // every subscriber wakes up
        self.in_flight.remove(&key);      // clean up the in-flight tracker

        Ok(results)
    }
}

The SuperDisk: Custom Hardware for Cloud Durability

ScyllaDB is optimized for but in cloud environments NVMe is ephemeral — a node restart wipes the disk. Discord engineered a custom RAID 1 configuration they called the Superdisk: writes go to both fast local NVMe and slower persistent network-attached storage simultaneously; reads prefer the NVMe for speed. This gave them NVMe-level read performance with cloud-level data durability.

Zstandard Compression: 50–60% Disk Reduction

Alongside the database migration, Discord enabled Zstandard compression on their ScyllaDB tables. Message data compresses extremely well. The result was a 50–60% reduction in raw disk usage compared to uncompressed Cassandra storage — effectively giving each physical node far more useful capacity at zero hardware cost.

THE VALIDATION STRATEGY

Discord ran automated correctness validation throughout the migration by sending a small percentage of reads to both databases simultaneously and comparing results. Only when reads matched across Cassandra and ScyllaDB was a partition considered successfully migrated. This shadow-read approach caught data inconsistencies without any user-visible impact, and gave the team confidence to flip the cutover switch as a single atomic event rather than a long, hedged transition.

Cassandra vs ScyllaDB at Discord: Before and After Migration

Cassandra vs ScyllaDB at Discord: Before and After Migration
Metric	Cassandra (Before)	ScyllaDB (After)
Cluster Nodes	177	72
Disk per Node (avg)	4 TB	9 TB
p99 Read Latency	40–125ms (variable)	~15ms (stable)
GC Pauses	Frequent stop-the-world	None (C++, no GC)
Hot Partition Risk	High — no coalescing	Mitigated by Rust data services
On-Call Toil	Weekly node babysitting	Dramatically reduced

Architecture

Before the migration, Discord's message write and read path ran through a monolithic API server directly into the Cassandra cluster. There was no intermediary — every user action that read messages translated directly into a database query, with no protection against fan-out or . The API server held connection pools to Cassandra, handled for message pagination, and relied on Cassandra's own internal mechanisms (memtable, SSTables, compaction) to handle read pressure. Under normal load this worked. Under peak load — a major announcement, a viral moment, a World Cup Final — it did not.

Before: Direct API-to-Cassandra Architecture (Hot Partition Risk)

SHARD-PER-CORE ARCHITECTURE

The fundamental reason ScyllaDB handles concurrent reads so much better than Cassandra is its shard-per-core architecture. Each CPU core is assigned its own exclusive slice of the data and handles all requests for that data without coordination with other cores. In Cassandra's JVM-based model, all threads compete for heap memory under a single garbage collector. In ScyllaDB's C++ model, each core is an independent actor: no cross-core locking, no GC, no stop-the-world. When one partition gets hot, it affects only the core assigned to that shard — it cannot cascade to neighbors.

Consistent Hashing: Routing Channels to Service Instances

Each Rust data service instance is responsible for a deterministic subset of channel IDs via . This means if 1,000 users simultaneously load the same popular channel, all 1,000 requests arrive at the same service instance and collapse into one database query.

After: Rust Data Services + ScyllaDB Architecture (Hot Partition Mitigated)

Why Rust for the Data Services Layer

Discord chose Rust for data services because it offered C-level throughput with memory safety guarantees that prevent entire classes of concurrency bugs common in C++ — exactly what you want in a layer handling millions of concurrent subscribers. The Tokio async runtime gave them non-blocking I/O without the GC overhead that had plagued their Cassandra setup. As Bo Ingram noted with characteristic candor: it also let them say they rewrote it in Rust.

Lessons

Discord's migration took years of preparation and nine days of execution. The long preparation was not waste — it was the reason the execution was clean. The lessons here are as much about sequencing and courage as they are about database choice.

What to remember

Migrate your riskiest system last, but don't use that as an excuse to never migrate it. Discord deliberately kept the messages database in Cassandra for two years after migrating everything else, using that time to build ScyllaDB expertise on less critical workloads. However, they committed to a hard deadline once operational confidence was achieved — avoiding the trap of indefinite deferral that plagues many large migrations.
is a force multiplier against hot partitions that no amount of database scaling alone can provide. When you have popular content that thousands of users read simultaneously, add a coalescing layer between your application and your database — the reduction in query fan-out is often more impactful than hardware upgrades.
Rewrite your migration tooling if the estimated duration is unacceptable. A three-month migration estimate is not a constraint — it's a scope definition that you can change. Discord's one-day Rust rewrite of the migrator turned a three-month project into nine days, enabling a simpler single-flip cutover instead of a complex phased approach. Always ask: what would it take to make this ten times faster?
are a predictable, structural problem in Java-based databases at high concurrency — not a tuning problem you can engineer your way out of at Discord's scale. When your on-call team spends more time maintaining a database than improving it, that's the signal to evaluate architecturally different alternatives, not just different JVM flags.
Run shadow reads for data validation before any large-scale cutover. Sending a percentage of reads to both old and new systems simultaneously — and comparing results automatically — gives you objective confidence that your migration is correct without user-visible risk. This pattern is applicable to any database migration and should be standard practice before any atomic cutover switch.

The World Cup Validation

The 2022 FIFA World Cup Final was Discord's unplanned load test — and the system passed cleanly. Every goal, every save, every penalty created message spikes across thousands of servers simultaneously. The combination of ScyllaDB's shard-per-core architecture and Rust data services coalescing kept latency flat through all 120 minutes plus penalties. No hot partition alerts. No on-call pages. No post-match war rooms.

SHADOW READ VALIDATION

Discord's validation strategy during migration was elegantly simple: send a small percentage of reads to both Cassandra and ScyllaDB simultaneously, compare results automatically, and flag any discrepancy. This meant correctness was continuously verified during the nine days of data transfer — not checked at the end in a tense manual review. Any database migration touching production data should implement this pattern before flipping the final switch.

They migrated four trillion messages in nine days, and the most stressful moment was the progress bar stopping at 99.9999% — because even tombstones refuse to die quietly.TechLogStack — built at scale, broken in public, rebuilt by engineers