Decentralization, Double Time

Author

Justin Rice

Publication date

Roadmap

Stellar consensus protocol

Developer

This blog is part of an ongoing series on the progress made on SDF's 2025 roadmap and the key initiatives we’re driving to improve usability, expand adoption, and strengthen the Stellar ecosystem.

There's a Q4 milestone on the Stellar Development Foundation (SDF) 2025 roadmap that may seem a bit obscure, but that has big implications for the health and decentralization of the Stellar network:

2x Tier 1 Resilience

We currently have 7 tier-1 organizations that allow the network to remain operational with 2 failures. To strengthen open governance and resilience, we plan to add 6 more through improvements in overlay and consensus, reaching 13 by the end of 2025.

Right now, Stellar can sustain the operational failure of 2 organizations; by the end of the year, the goal is to double that. What does that mean? Why now? Why should you care? Read on to find out.

Reputation At Stake

Tier-1 organizations are teams building on Stellar who run validators that, to paraphrase the developer docs, bear the safety and liveness of the Stellar network on their shoulders due to the fact that most other organizations require agreement from them. To understand what that means, it's helpful to understand how validators work in blockchain in general, and how they work in Stellar specifically.

Validators are connected servers that message each other, keep a common ledger, and work in sync to apply transactions to update that ledger. Their interactions are recurring and programmatic, and so the software they run implements rules designed to prevent theft, cheating, and double spending, as well as protocols intended to enforce those rules. On some periodic cadence — which varies from blockchain to blockchain — a protocol selects a validator to assemble a block of transactions, which it then proposes to other validators. They check its validity, ratify it, and apply it to update the state of the ledger.

In proof-of-work, the protocol Bitcoin uses, the right to add blocks goes to the first validator to complete a brute-force math problem, along with a network token reward. You prove you did the work to solve the problem, you propose a block of transactions that follows the rules, it gets accepted by other validators, you get a bitcoin. In proof-of-stake, the protocol most blockchains use at this point, the protocol selects a validator at random to add a block, but to be eligible each validator has to lock protocol tokens in reserve. When a validator is selected and proposes a block that gets accepted by other validators, it's rewarded with protocol tokens. If it fails to propose a valid block, the protocol slashes staked tokens as a penalty. Both proof-of-work and proof-of-stake rely on financial incentives to encourage good behavior: they assume that the combination of rewards and penalties makes it more worthwhile to follow the rules when producing a block than to break them.

Stellar is different. In the Stellar Consensus Protocol — which is a proof-of-agreement protocol — each validator keeps a list of the other validators it trusts, and only interacts with the validators on that list to propose and accept blocks. To create a new block, validators exchange messages with the other validators on their list to share candidate transactions, select a leader to assemble a block from those transactions, and run that block through a series of votes before ratifying and accepting it. The more a validator appears on other validators' lists, the more it is considered when proposing and accepting blocks, the more sway it has over the network.

Crucially, proof-of-agreement requires humans to configure their validators to trust other validators, which they do because they trust the humans who run those validators. To make that possible, Stellar validator operators publicly identify themselves. The real-world reputation of the operator, which is usually an organization doing business on Stellar, is what's at stake: if they don't follow the rules, everyone knows. That's why we talk about tier-1 organizations: on Stellar, a validator manifests an organization's identity.

There are great advantages to proof-of-agreement — it doesn't burn energy like proof-of-work or require up-front funds like proof-of-stake; it doesn't allow anonymous participation, so it's resistant to Sybil attacks; it doesn't rely on financial incentives, so it sidesteps situations where cheating is more lucrative than playing by the rules — and you can read all about those in the recent Stellar's Security Edge post. But for now, let's see what those trust configurations actually look like so we can understand what needs to change in order to reach the 2x goal.

The Network Present

Validator trust configurations are public, and you can view them, along with a graph of the connections between validators, on network explorers like Radar or Stellarbeat. Here, for instance, is the trust configuration for SDF's validator nodes:

Quorum set with threshold 5
- Blockdaemon Inc. with threshold 2
- Stellar Development Foundation with threshold 2
- SatoshiPay with threshold 2
- Franklin Templeton with threshold 2
- LOBSTR with threshold 2
- Creit Technologies LLP with threshold 2
- Public Node with threshold 2

Currently, SDF includes 7 organizations in our quorum set (including ourselves), and our validators are configured to accept a block if any combination of 5 of those orgs also agree to accept it. Note: each of those organizations runs 3 validators for redundancy; any 2 suffice to represent its agreement. On the flip side, that means SDF validators will not be able to accept a block if 3 of those orgs are unavailable or disagree, at which point they halt. As configured, in other words, SDF has a fault tolerance of 2 organizations. What about the network as a whole?
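To make that arithmetic concrete, here is a minimal Python sketch of how a nested quorum set like the one above could be evaluated. The `QuorumSet` class, the `is_satisfied` and `survives_org_failures` functions, and validator names such as `sdf-1` are hypothetical names chosen for illustration, not Stellar Core's actual data structures. The thresholds (5 of 7 orgs, 2 of 3 validators per org) come from the configuration above.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class QuorumSet:
    """Illustrative nested quorum set: `threshold` of its members must agree."""
    threshold: int
    validators: list = field(default_factory=list)  # leaf validator names
    inner_sets: list = field(default_factory=list)  # nested QuorumSet objects


def is_satisfied(qset, agreeing):
    """Return True if at least `threshold` members of `qset` agree."""
    votes = sum(1 for v in qset.validators if v in agreeing)
    votes += sum(1 for inner in qset.inner_sets if is_satisfied(inner, agreeing))
    return votes >= qset.threshold


# Hypothetical short names for the 7 organizations listed above.
ORGS = ["blockdaemon", "sdf", "satoshipay", "franklin-templeton",
        "lobstr", "creit", "public-node"]


def org_validators(org):
    """Each org runs 3 validators (names are made up for this sketch)."""
    return [f"{org}-{i}" for i in range(1, 4)]


# Outer threshold 5 of 7 orgs; each org is an inner set with threshold 2 of 3.
sdf_qset = QuorumSet(
    threshold=5,
    inner_sets=[QuorumSet(threshold=2, validators=org_validators(o)) for o in ORGS],
)


def survives_org_failures(qset, failed_count):
    """True if `qset` stays satisfied for every way `failed_count` orgs can go down."""
    for down in combinations(ORGS, failed_count):
        up = {v for o in ORGS if o not in down for v in org_validators(o)}
        if not is_satisfied(qset, up):
            return False
    return True


print(survives_org_failures(sdf_qset, 2))  # True: any 2 orgs can fail
print(survives_org_failures(sdf_qset, 3))  # False: 3 failed orgs cause a halt
```

The check is exhaustive over every combination of failed organizations, which is practical here only because there are 7 of them.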

The Stellar network as a whole is a cluster of individual validators, each of which has a configuration like the one above, that coheres because those configurations have validators in common. One of the key concepts behind the Stellar Consensus Protocol, which the SCP Research Paper calls "the Internet hypothesis," is that network coherence emerges naturally because network participants connect to interoperate and do business. Just as the Internet, "which everyone understands to mean the single largest transitively connected IP network," is derived from a common desire to transmit information, so Stellar is derived from a common desire to transmit value. To use the network, participants find a way to connect to other participants; in finding a way to connect, they create the network.

Currently, the network coheres around the same 7 organizations SDF has in our quorum set. In some form or fashion, those are the organizations that appear in each individual validator configuration, which means each individual validator requires sufficient agreement from those organizations when adding blocks. They are the points of intersection from which the network is derived. They bear the safety and liveness of the Stellar network on their shoulders due to the fact that most other organizations require agreement from them. They are the tier-1 organizations.
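Why shared points of intersection matter can be illustrated with a brute-force quorum intersection check. This is a sketch under simplifying assumptions: a toy four-node network with hypothetical node names, flat (non-nested) quorum sets, and exhaustive enumeration that is only feasible for tiny networks. It is not the algorithm Stellar Core or any monitoring tool uses.

```python
from itertools import combinations


def is_quorum(candidate, config):
    """A non-empty set is a quorum if it meets the threshold of every member."""
    if not candidate:
        return False
    for node in candidate:
        threshold, trusted = config[node]
        if len(candidate & trusted) < threshold:
            return False
    return True


def all_quorums(config):
    """Enumerate every quorum by trying every subset (tiny networks only)."""
    nodes = sorted(config)
    for size in range(1, len(nodes) + 1):
        for combo in combinations(nodes, size):
            candidate = frozenset(combo)
            if is_quorum(candidate, config):
                yield candidate


def has_quorum_intersection(config):
    """True if every pair of quorums shares at least one node."""
    quorums = list(all_quorums(config))
    return all(a & b for a, b in combinations(quorums, 2))


# Each entry maps a node to (threshold, set of nodes it trusts).
everyone = {"A", "B", "C", "D"}
connected = {node: (3, everyone) for node in everyone}

# Two islands that never reference each other.
split = {
    "A": (2, {"A", "B"}), "B": (2, {"A", "B"}),
    "C": (2, {"C", "D"}), "D": (2, {"C", "D"}),
}

print(has_quorum_intersection(connected))  # True
print(has_quorum_intersection(split))      # False
```

In the `split` configuration, the two islands can each ratify blocks without hearing from the other, which is the situation overlapping configurations are meant to rule out.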

And right now, the tier-1 organizations limit their quorum sets to one another, which means that they don't require input or agreement from additional orgs to add a block. As a result, the network as a whole has the same fault tolerance as the SDF validators: if 3 of the tier-1 orgs go down, the network halts. The roadmap goal is to 2x that fault tolerance, and we'll get into how to make that happen, but first, it's helpful to understand how we got where we are.
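The arithmetic behind the 2x goal can be sketched in a few lines. The sketch assumes a threshold of two-thirds of the organizations, rounded up, which matches the 5-of-7 configuration shown earlier. The threshold that will actually be used with 13 organizations is not stated in this post, so treat the 13-org row as an illustration rather than a planned configuration. `two_thirds_threshold` and `halting_tolerance` are hypothetical helper names.

```python
def two_thirds_threshold(org_count):
    """Smallest whole number of orgs that is at least two-thirds of the total.

    Assumed rule for this sketch; uses integer ceiling division to avoid floats.
    """
    return -(-2 * org_count // 3)


def halting_tolerance(org_count, threshold):
    """How many orgs can be unavailable before the threshold can't be met."""
    return org_count - threshold


for org_count in (7, 13):
    threshold = two_thirds_threshold(org_count)
    print(org_count, threshold, halting_tolerance(org_count, threshold))
# prints "7 5 2" then "13 9 4"
```

Under that assumption, going from 7 to 13 organizations takes the number of tolerated organization failures from 2 to 4, which is the doubling the roadmap describes.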

The Network Past

In the first few years of Stellar, there weren't many resources or tools for validators. Discovery wasn't optimal (there was just a list on GitHub people could add their validators to), and while there was an admin guide in the docs to help you install Stellar Core and join the network, there wasn't much instruction about how to think about quorums, or how to construct a reasonable quorum set. As a result, validator quorum sets were best guesses configured with fingers in the wind, and there was really only one organization common to them all: SDF. SDF was the single point of quorum intersection, which also means it was a single point of failure. If SDF went offline, the network would halt.

Given that one key advantage of blockchain is that it doesn't have a single point of failure, it was pretty clear to anyone who was paying attention (which, granted, was a pretty small group at the time) that the overreliance on SDF wasn't ideal. In 2019, SDF began to coordinate with validators to improve network resilience. We connected key ecosystem participants who were running validators and got them talking, and they began to include one another in their quorum sets. By 2019, 4 organizations in addition to SDF emerged as capable validators: SatoshiPay, LOBSTR, Coinqvest, and Keybase. We started to socialize the recommendation to adjust quorum sets to include those 4 orgs.

Things got off to a rocky start — on May 15, 2019, before the new quorum configurations could really settle, a few of the freshly recommended orgs took down their validators for maintenance at the same time, which caused the network to halt for 67 minutes — but with a bit of coordination, collaboration, and hustle, we managed to stabilize things. We updated Stellar Core to automate parts of quorum configuration, released tools for network monitoring and quorum intersection checking, created a dedicated channel to communicate with orgs in the SDF quorum set, and defined the operational requirements for Tier 1, including the requirement that tier-1 orgs run 3 validators for redundancy. Everyone started following a common process to identify their validators, and Stellarbeat decentralized validator discovery. And as a safety precaution, the newly anointed tier-1 orgs agreed to limit their quorum sets to one another.

By January 2020, Tier 1 had increased to 7 orgs, and though there have been a few substitutions over the years as organizations came and went, that's where it's been ever since. The network hasn't halted again, even when SDF validators had issues for a day, and validators have successfully processed tens of millions of ledgers, applied billions of operations, and upgraded the protocol 11 times.

The Network Future

Despite that stability, however, there is a prevailing sentiment in the ecosystem that the current network with 7 tier-1 validators and a fault tolerance of 2 orgs isn't robust enough. SDF agrees! There should be more high-quality validators and greater fault tolerance!

How many more and who they should be, that's what we still need to figure out. It's difficult to measure decentralization on Stellar (it's secured by reputation, and a bunch of sketchy validators no one trusts don't benefit the network at all, while a single reputable organization everyone trusts does), and it's not yet clear whose participation as a validator will suffice to secure the billions or trillions in value a thriving ecosystem will require. But it's about time to think all of that through, and to start sketching out the possibilities. And while we do, we can also take steps in what is surely the right direction. We can double the current fault tolerance, and take it from there. Hence the 2025 goal to 2x Tier 1 Resilience.

In order to do that, there are tasks to tackle, some of which are already complete, some of which are underway. Specifically, we need to:

Make technical changes to validator software (complete!)

Validators exchange a ton of messages to propose and ratify blocks, and to do that, they use a piece of Stellar Core called the overlay. It's a peer-to-peer network that connects validators and propagates transactions, blocks, and consensus votes. Tier-1 organizations have a lot to communicate to one another, so they hit the overlay much harder than non-tier-1 validators. Until recently, the overlay wasn't really prepared for the additional stress doubling Tier 1 would create.

But due to overlay changes introduced in Stellar Core v22.1.0 and performance improvements introduced in Stellar Core v22.3.0, it is!

Run tests to make sure growing Tier 1 doesn't introduce latency (complete!)

How do I know that the overlay is prepared for the additional stress doubling Tier 1 will create? Because we tested it. Before moving ahead with a plan to radically increase network participation, we modeled the effects in a laboratory environment. We used something called Supercluster to simulate the effects of doubling Tier 1, and the results showed that ledgers should continue to close without introducing too much latency. If you want to play around with Supercluster yourself, it's here.

At this point, given the improvements to the overlay and the successful tests, there are no technical blockers. Woohoo!

Build a pipeline of qualified candidates (complete, and forever underway)

This time a year ago, there weren't many organizations that were serious about running validators. But the advent of smart contracts on Stellar and subsequent growth of the ecosystem encouraged a crop of new projects to spin up validators, and based on casual observation, at least 10 of them meet the tier-1 operational requirements.

Yes: SDF will continue to encourage network participation, and will work with organizations interested in running validators to help them onboard, attain, and maintain greatness. But there are enough candidates right now to unblock the goal!

Explain how SDF chooses to add organizations to its quorum set (underway)

That said, meeting operational requirements is not sufficient for SDF to add an org to our quorum set. We actually need to evaluate them against business criteria to understand if we trust them. Who is the org? Are they legit? Capable? Mission-aligned? Does adding them improve the health and decentralization of the network?

As we started thinking about those questions, we realized that we hadn't yet formalized the process we use to add orgs to our quorum set, and so we decided to write it out, and share it with the rest of the ecosystem. The goal of doing that is to make our motivations clear, and to offer a model to other orgs who also need to make quorum-set decisions. Read all about it here.

Add qualified candidates to SDF quorum set (next up)

Now that we are technically unblocked and have a pipeline of candidates and a method for evaluating them, we can conduct a review, and add any that qualify to the SDF quorum set. When we do, we will also check in with other validators, and specifically with other tier-1 orgs. Ultimately, each will need to decide whether to add new orgs to their quorum set, and we're not sure what those discussions will be like, or where we'll find (offline) agreement. It will be exciting to engage! And we will keep you posted as we do!

Some final words: the point of all of this (the technical improvements, the explicit process to evaluate candidates, the overall goal to 2x tier-1 resilience, even this blog post itself) is to help improve the health and decentralization of the network, which is important in order to future-proof the network so it can handle continued usage and growth. At SDF, we want the network to secure enough value and process enough transactions to achieve our mission of creating equitable access to the global financial system, and that's a pretty tall order. This work will be ever ongoing. The 2025 goal is just the next step. Ready to take it? Here we go…