The size of the Stellar network is growing. Whereas in 2018 the Stellar network hosted just 2.6 million ledger entries, it hosted 43.4 million last year. By the end of 2023, we expect to see more than 300 million hosted on the ledger.
If you’re in the Stellar ecosystem, you already know this. In fact, you’re probably interested in Stellar not because of what our blockchain can do, but because of what it already does. Stellar has a proven track record of functionality, and more people are using it every day.
But soon the network will grow too big for the house that it grew up in. While Stellar’s database, currently implemented by a primary key-value store and a secondary Merkle structure, is sufficient today, it cannot scale with the rest of the network. The situation will only get more dire with the forthcoming mainnet launch of Soroban, which will increase both the size of ledgers and the amount of them that needs to be stored.
Stellar can’t go on with such a tedious system. That’s why Stellar will be using a new, first-of-its-kind storage mechanism that will use just one data structure, instead of two.
It’s called BucketListDB, and was first announced at our annual Meridian conference in Rome last year. Watch a presentation on it from SDF engineer Garand Tyson here:
If you’ve ever worked with a blockchain database before, it most likely had two data structures: a “Merkle structure,” which produces hashes, and a key value store for looking up entries. While Merkle structures produce hashes quickly, they aren’t very good at searching for entries. Key-value stores, meanwhile, can’t produce hashes but offer quick lookups. This means keeping a copy of the entire ledger in two places on disk, one in a key-value store and one in a Merkle structure.
Merkle trees are by far the most common Merkle structure used in blockchains today, but there’s a catch: Merkle trees, for all their benefits, are really slow when adding and modifying ledger entries. As in, can’t-get-faster-than-200 TPS slow, according to experts. As an organization focused on practical use cases that solve real-world problems, we believe that Stellar needs to be able to process far more transactions, far more quickly than that. Humanitarian aid can’t be disbursed at 200 TPS, nor a high volume of international remittances, nor most other critical money movements in the real world.
Stellar doesn’t use Merkle trees, but instead a different Merkle structure for creating hashes called a Bucket List. Validators review the Bucket List to compare hashes with other validators and if any validator’s hashes don’t match, they can download the missing files from a peer. When designing Stellar, we preferred Bucket Lists to Merkle trees partly because they’re temporally structured and record changes to entries rather than storing the entries themselves. This means that if a node goes out of sync, it can catch up by reviewing just the few hours it went offline, comparing its records to neighbor nodes to see what’s out of date.
Additionally, while it might take a Merkle tree hundreds of disk operations to update a single entry, updating an entry on the Bucket List is entirely in-memory and doesn’t require any disk operations. This is more efficient than a Merkle tree, which contains many interconnected hashes that all must be updated at the same time.
In addition to using the Bucket List for hashing, Stellar has historically used a SQL key-value store to search for ledger entries. It’s worked so far, but could be more efficient, because it’s not perfectly suited to Stellar’s disk read/write patterns. For one, to get a change logged in the Bucket List copied over to SQL, Stellar has to perform an excessive amount of disk write operations.
The versions of SQL Stellar supports are also ACID compliant, which keeps read operations from interfering with writes. This is unnecessary for Stellar because it already executes read and write operations at different times during consensus, and thus doesn't run the risk of performing any conflicting read and write operations simultaneously. In other words, ACID compliance makes for unnecessary overhead that slows down our operations for no good reason.
Combined, these two data structures create one more inefficiency: because all ledger entries are redundantly duplicated on the disk, once in the Bucket List and once in the SQL key-value store, Stellar uses up twice the disk space necessary. While SQL databases work well for traditional centralized applications, they are not very fast when it comes to the access patterns of decentralized blockchains.
We want Stellar to validate transactions faster. So instead of having two data structures, Stellar will combine them into one which is both more efficient at searching and hashing than what it used before: a searchable Bucket List called BucketListDB.
In a Bucket List, the main thread writes entries to the top bucket which lives entirely in memory. When the level is filled, it spills into the next level and merges with that larger bucket:
Bucket Lists, however, are hard to read. In order to find which level an entry is in, the validators would have to search through every level. Even if the Bucket List did know what level an entry was in, validators would still have to search through the level itself — and it could be several gigabytes large.
To fix this problem and make the Bucket List a more efficient key-value store, BucketListDB adds two types of indexes: a bloom filter and bucket index. A bloom filter, which is a type of lightweight, low-memory set, helps us identify which level an entry is in. Once the bloom filter knows what level the entry is at, BucketListDB uses the Bucket index to find where our entry is within the level.
It sounds complex, but the result is simple: the bigger the index, the faster reads will be. In order to not use too much memory, configure memory requirements as needed — configuring it to the same requirements as postgresql is likely optimal.
All in all, BucketListDB provides some clear improvements compared to the Bucket List and SQL key-value store system that Stellar has been using. Disk usage decreases by more than 45%, read speeds increase 400%, and validator startup time is slashed in half.
If you’re operating a horizon instance with captive-core, BucketListDB is already turned on. But if you’re running a validator node, you’ll see BucketListDB hidden behind an experimental flag.
Go ahead and turn it on if you want to get used to it early. It will turn on for all nodes on the Stellar network with a forthcoming update, and would love your feedback in the meantime.