Scott Sunarto • 2022-07-27
Guide to blockchain finality for busy people
What is finality?

When you go shopping, you might have seen a sign that says, "All sales are final". The sign means that any purchase you make there cannot be refunded or reversed. At its core, this is also what finality means in the context of blockchains: it is the point in time after which a blockchain transaction cannot be reversed.

What is not finality?

A transaction being included in a block is NOT NECESSARILY final. This is a very common misconception that blockchain thought-leaders repeat on Twitter when arguing about how fast their blockchain is.

When is finality?

Now we arrive at the tricky part. How do we know when our transactions on the blockchain are final? Unfortunately, there is no single correct answer: it depends on the blockchain's construction. This is what we will dive into in the next section.
Let's say you go to a coffee shop and pay for your morning latte with a credit card. Is your credit card payment really final? No! Technically, you can issue a chargeback :) However, what are the odds of a customer doing a chargeback for a small purchase? Very low. This is why the coffee shop doesn't hold you back and ask you to wait until the charge is truly finalized. This is how finality works in blockchains based on Nakamoto Consensus (PoW), like Bitcoin and Ethereum.

To understand this better, we need to introduce the concept of the "fork choice rule". The fork choice rule is how blockchain clients determine the correct version of the blockchain when a "fork" happens. In Nakamoto Consensus, the fork choice rule dictates that the longest chain (more precisely, the chain with the most accumulated proof-of-work) is the canonical blockchain. Now, you might notice a subtle problem: what if someone races the network to create a longer chain in which all the transactions are reversed? Whoops, we just invented the 51% attack, which results in a "reorg" of the chain.

Note: reorgs don't only happen during a 51% attack. 1-2 block reorgs happen daily on Ethereum for various reasons, such as network latency or unintentional block production race conditions.

This might sound a bit problematic. Luckily, the deeper a transaction is buried under subsequent blocks, the lower the probability that someone can reverse it! This stems from the fact that the attacker would have to spend more time and resources sustaining the 51% attack to overtake the current chain's block height.

Trivia: This is why centralized exchanges (CEXs) wait for X confirmations before your deposit is credited to your account. However, they only need to do this with chains that have non-instant finality, which brings us to the next flavor.
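To make "the probability goes down with every block" concrete, here is a minimal sketch of the attacker catch-up calculation from the Bitcoin whitepaper (section 11). The function name and the example hash-power share are my own choices for illustration; the model assumes the attacker controls a fixed share `q` of the hash power and the merchant waits for `z` confirmations.

```python
import math


def attacker_success_probability(q: float, z: int) -> float:
    """Probability that an attacker with hash-power share q eventually
    overtakes the honest chain after the merchant waited z confirmations.

    Follows the Poisson-based calculation in the Bitcoin whitepaper.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be a share between 0 and 1")
    if z < 0:
        raise ValueError("z must be non-negative")

    p = 1.0 - q  # honest share of the hash power
    if q >= p:
        # With half or more of the hash power, the attacker always catches up.
        return 1.0

    lam = z * (q / p)  # expected attacker progress while z honest blocks are mined
    prob = 1.0
    for k in range(z + 1):
        poisson = math.exp(-lam) * lam**k / math.factorial(k)
        # Subtract the cases where the attacker, having mined k blocks,
        # fails to close the remaining gap of z - k blocks.
        prob -= poisson * (1.0 - (q / p) ** (z - k))
    return prob


if __name__ == "__main__":
    # Hypothetical attacker holding 10% of the hash power.
    for confirmations in (0, 1, 2, 6):
        print(confirmations, attacker_success_probability(0.10, confirmations))
```

With a 10% attacker, the chance of a successful reversal is roughly 20% after one confirmation and about 5% after two, and it keeps shrinking quickly from there. It never reaches exactly zero, which is why this flavor of finality is called probabilistic.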
In contrast to the Nakamoto Consensus, Practical Byzantine Fault Tolerance (pBFT)-based consensus like Tendermint has instant finality. The transaction is finalized the moment it is executed; you can't convince an online* full node to reverse it.

Note*: The keyword here is online. Most proof-of-stake blockchains have weak subjectivity, which requires nodes to "trust" a peer when going online for the first time or after an offline period. This discussion deserves a separate blog post.

pBFT-based consensus relies on more than 2/3 of the validator set attesting to the proposed block; if more than 1/3 of the participants are byzantine, the blockchain will grind to a halt. However, with more than 2/3 byzantine participants, you would be able to force a malicious block into the chain.

Note: As a general rule of thumb, non-faulty full nodes will still be able to reject these malicious blocks, and a fork driven by social consensus can be executed.

Unfortunately, pBFT comes with non-negligible overhead as the size of the validator set increases. Concretely, as the validator set grows, the network round trips required to obtain the 2/3 attestation to finalize a block also increase linearly. As a result, each block takes more time to finalize. This is why there is a practical limit to how decentralized a blockchain with pBFT-based consensus can be. For instance, most Tendermint-based blockchains only have 100-150 validators; this strikes a balance between time to finality and decentralization.

Here's the most important part: a smaller validator set doesn't necessarily imply lower security. As a matter of fact, pBFT prioritizes safety over liveness (Tim Roughgarden's lecture on the CAP theorem explains this in further detail). In case of failures, pBFT consensus will halt instead of operating under fault. In contrast, the Nakamoto Consensus will continue to operate under fault (e.g. during a network partition).
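The 1/3 and 2/3 thresholds above can be sketched in a few lines. This is a simplified illustration, not Tendermint's actual implementation: the function names are hypothetical, every validator is assumed to have equal voting power, and the quorum is assumed to be strictly more than 2/3 of the total.

```python
def has_quorum(votes: int, total: int) -> bool:
    """True if `votes` is strictly more than 2/3 of `total` (integer math)."""
    return 3 * votes > 2 * total


def classify(byzantine: int, total: int) -> str:
    """Rough outcome for a validator set with `byzantine` faulty members.

    Assumes equal voting power and that byzantine validators either
    withhold their votes or collude on a malicious block.
    """
    if total <= 0 or not 0 <= byzantine <= total:
        raise ValueError("need 0 <= byzantine <= total and total > 0")

    honest = total - byzantine
    if has_quorum(byzantine, total):
        # Byzantine validators can finalize a block on their own.
        return "unsafe"
    if not has_quorum(honest, total):
        # Honest validators cannot reach quorum without byzantine votes,
        # so the chain stops producing blocks instead of finalizing bad ones.
        return "halted"
    return "ok"


if __name__ == "__main__":
    for faulty in (0, 33, 34, 66, 67):
        print(faulty, classify(faulty, 100))
```

Notice the wide "halted" band between the two thresholds: this is the safety-over-liveness trade-off in action. The chain would rather stop than finalize something it cannot stand behind.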
Something to think about: why is liveness preferable over safety for something like Bitcoin?
Ethereum's shift away from proof-of-work will also move it from probabilistic finality to absolute finality. However, in contrast to pBFT's instant finality, Ethereum 2.0 opts for a "slower" finality to support a larger validator set. Whereas a pBFT-based consensus like Tendermint requires 2/3 attestation at every block, Ethereum 2.0 uses a finality gadget called Casper FFG to provide finality every 64 slots (32 slots per epoch x 2 epochs), which at 12 seconds per slot comes to around 13 minutes.
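The "around 13 minutes" figure is just arithmetic, sketched below. The 12-second slot time is the beacon chain's slot duration; the constant and function names are my own for this illustration.

```python
SECONDS_PER_SLOT = 12   # beacon chain slot duration
SLOTS_PER_EPOCH = 32
EPOCHS_TO_FINALITY = 2  # Casper FFG finalizes after two epochs in the normal case


def slots_to_finality(slots_per_epoch: int = SLOTS_PER_EPOCH,
                      epochs: int = EPOCHS_TO_FINALITY) -> int:
    """Number of slots that must pass before a block is finalized."""
    return slots_per_epoch * epochs


def minutes_to_finality(seconds_per_slot: int = SECONDS_PER_SLOT) -> float:
    """Approximate wall-clock time to finality, in minutes."""
    return slots_to_finality() * seconds_per_slot / 60


if __name__ == "__main__":
    print(slots_to_finality(), "slots")
    print(minutes_to_finality(), "minutes")
```

That is 64 slots, or 12.8 minutes, assuming the network is healthy and enough validators attest in each epoch.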
With the growing excitement around rollups, I can't possibly write this without including them! First and foremost, it's important to realize that a single-sequencer rollup (which is what we have now) is more or less a single centralized server; this is what makes them so fast. That said, what allows them to inherit the security of Ethereum are the mechanisms that ensure the correctness of their state transitions, which come in two flavors.
In an Optimistic Rollup (ORU), the sequencer is assumed to be correct unless proven otherwise through a fraud proof. If someone finds that the sequencer misbehaved, they can submit a fraud proof within the challenge window, and the rollup's state will be reverted. Currently, ORUs such as Optimism and Arbitrum have a challenge period of 1 week; this is what determines their time to finality.
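As a toy model of the challenge window, the sketch below treats a rollup state update as final once the window has elapsed without a successful fraud proof. The function names and the `fraud_proven` flag are hypothetical; real rollups enforce this logic in their contracts on the base chain.

```python
from datetime import datetime, timedelta, timezone

CHALLENGE_PERIOD = timedelta(days=7)  # the 1-week window mentioned above


def finalized_at(posted_at: datetime,
                 challenge_period: timedelta = CHALLENGE_PERIOD) -> datetime:
    """Earliest moment a state update posted at `posted_at` can be final."""
    return posted_at + challenge_period


def is_final(posted_at: datetime, now: datetime, fraud_proven: bool = False,
             challenge_period: timedelta = CHALLENGE_PERIOD) -> bool:
    """A state update is final once the window closed with no valid fraud proof."""
    if fraud_proven:
        return False  # the update is reverted, so it never becomes final
    return now >= finalized_at(posted_at, challenge_period)


if __name__ == "__main__":
    posted = datetime(2022, 7, 27, 12, 0, tzinfo=timezone.utc)
    print(finalized_at(posted))
    print(is_final(posted, posted + timedelta(days=3)))
    print(is_final(posted, posted + timedelta(days=7)))
```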
For some reason, there has not been much conversation about time to finality in ZK Rollups, although I find it fairly interesting. In a ZK Rollup (ZKRU), the time to (verifiable) finality is determined by how fast the ZK prover can generate the proof for the transactions and commit the state update, transaction data, and proof on the base chain. A non-obvious observation here is that, since this process also involves writing to the base chain, we have to add the base chain's time to finality on top of it. Currently, ZK proving is extremely slow, although it is highly parallelizable, which can lead to significant speed-ups; this is the main bottleneck for ZKRUs right now, and the cost is currently subsidized by venture funding to bridge the gap. As ZK proving technology matures, the bottleneck for finality in ZKRUs will eventually shift from proving to the finality of the base chain (in this case, Ethereum). It's important to note that while the sequencer should theoretically have no problem performing under heavy transaction load, the ZK prover might struggle to keep up. Unfortunately, this means that the time to finality will balloon during congestion; this is why it is crucial to have a proper fee market in ZKRUs that takes into account the economics of ZK proving. Otherwise, it will be like Solana 2: Electric Boogaloo.
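Here is a back-of-the-envelope model of the two effects described above: time to finality as proving time plus base chain finality, and the prover backlog that builds up during congestion. Everything here is a hypothetical sketch: the function names, the assumption of perfectly parallel proving, and all example numbers are mine, not measurements from any real rollup.

```python
def zk_time_to_finality(proving_seconds: float, provers: int,
                        base_finality_seconds: float,
                        submit_seconds: float = 0.0) -> float:
    """Seconds until a batch is final on the base chain.

    Assumes proving parallelizes perfectly across `provers` machines,
    which is optimistic; real provers have coordination overhead.
    """
    if provers < 1:
        raise ValueError("need at least one prover")
    return proving_seconds / provers + submit_seconds + base_finality_seconds


def prover_backlog(arrival_rate: float, proving_rate: float, minutes: float,
                   initial_backlog: float = 0.0) -> float:
    """Unproven batches left after `minutes`, given batches/minute rates."""
    return max(0.0, initial_backlog + (arrival_rate - proving_rate) * minutes)


if __name__ == "__main__":
    # Hypothetical: 1 hour of sequential proving work spread over 4 provers,
    # then 768 seconds for the base chain to finalize (64 slots x 12 s).
    print(zk_time_to_finality(3600.0, 4, 768.0))
    # Hypothetical congestion: batches arrive faster than they can be proven.
    print(prover_backlog(arrival_rate=10, proving_rate=8, minutes=30))
```

The first function shows why faster proving eventually stops helping: once the proving term is small, the base chain's finality dominates. The second shows why congestion is dangerous: whenever batches arrive faster than they can be proven, the backlog (and with it the time to finality) grows for as long as the congestion lasts.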
The biggest reason why finality matters is user experience. Imagine depositing your crypto into a CEX or using it to pay for coffee… If a blockchain has a long time to finality, you might have to wait longer than you want. That said, users and application developers can always agree on a compromise and settle on a "good enough" point for finality. For example, an optimistic rollup might not reach finality for 1 week, but an exchange might want to settle a deposit earlier if it believes that the sequencer's state transitions are valid.

Note: This can also be dangerous! If an optimistic rollup rolls back after the exchange has settled a significant amount of deposits, the exchange will end up running fractional reserves. Be cautious of this practice!

On top of that, bridging also relies heavily on finality. This is because bridges must wait until they are confident that the initial bridge call will not be reverted before performing the corresponding action on the counterparty chain. With rollups, their (current) long time to finality becomes a hurdle for users who want to bridge to the base chain. This significantly affects the practicality of application-specific rollups, where it is assumed that you will often have to bridge to another chain or rollup. While you can mitigate this with economic bridges (e.g. Hop), they are far from being a silver bullet (tl;dr reliance on economic value + liquidity fragmentation). In contrast, Cosmos blockchains that benefit from Tendermint's instant finality enjoy a seamless bridging experience, making them a good home for application-specific blockchains.

Last but not least, finality is also vital for MEV reorg resistance! (For more detail, check out Saneel Sreeni's MEV research post.)
Thanks to Breck Stodghill, Eddy Lazarin, Amir Bolous, Divya Gupta, DC Posch, Dev Ojha, Jacky Zhao, Žygimantas, Josh Stark, and the ETH University community for feedback on earlier versions of this post.
./smsunarto · Veritas Vos Liberabit