Primary Topic
This episode focuses on the future developments and ongoing improvements within the Solana network, particularly addressing its networking and economic aspects.
Episode Summary
Main Takeaways
- Significant Enhancements in Network Performance: Recent patches have improved the stability and efficiency of the Solana network, addressing critical bugs and optimizing resource management.
- Economic Models Adjustments: The discussion highlights the shift towards more dynamic economic models to manage resources and transaction costs effectively.
- Future Network Innovations: Anticipated developments such as asynchronous execution and increased block space are expected to significantly boost network throughput and reduce transaction costs.
- Challenges in Stake-weighted QoS: Despite improvements, the stake-weighted QoS system still faces issues with reliability and fairness, necessitating ongoing adjustments.
- The Role of Economic and Network Design in Scalability: The integration of better economic strategies and network designs is crucial for scaling the network while maintaining security and decentralization.
Episode Chapters
1. Introduction
The hosts introduce the guests and set the stage for a detailed discussion on Solana's current state and future. Anatoly Yakovenko: "We're continuously working on enhancing the network's capacity and efficiency."
2. Network Improvements
Discussion on recent network improvements and the impact on Solana's performance. Lucas Bruder: "With the new patches, we've seen noticeable improvements in handling network congestion."
3. Economic Strategies
Exploring the adjustments in economic models to better handle transaction costs and network resources. Anatoly Yakovenko: "Adjusting economic models is crucial for maintaining efficiency as we scale."
4. Future Directions
Insights into upcoming technological innovations and their potential impact on the network. Lucas Bruder: "Asynchronous execution could revolutionize transaction processing speeds."
5. Closing Remarks
Summary of the discussion and final thoughts from the hosts and guests. Mert Yeter: "Today's discussion sheds light on the complex, yet promising path Solana is navigating towards scalability and efficiency."
Actionable Advice
- Monitor Network Upgrades: Stay updated with the latest Solana network upgrades to optimize your use of the network.
- Evaluate Economic Impacts: Regularly assess the economic implications of network changes on your transactions and investments.
- Participate in Governance: Engage in Solana’s governance to influence future developments and improvements.
- Leverage Improved Technologies: Utilize the latest Solana technologies to enhance your applications’ performance and security.
- Plan for Long-term Investments: Consider the potential long-term benefits of Solana’s evolving infrastructure in your investment strategies.
About This Episode
Gm! This week we invite Anatoly Yakovenko & Lucas Bruder to the show for a discussion on the current state of Solana. We deep dive into the transaction problem as on chain volumes have exploded in recent months. Toly & Lucas explore the solutions that could help to increase chain performance & what's next for Solana. Enjoy!
People
Anatoly Yakovenko, Lucas Bruder, Mert Yeter, Dan
Companies
Solana, Jito Labs
Books
None
Guest Name(s):
Anatoly Yakovenko, Lucas Bruder
Content Warnings:
None
Transcript
Anatoly Yakovenko
Kind of the end state. What I think Solana should get to is that you, as a validator, you have very, very reliable timing and all the execution. And once you have that, that means you can reduce block times from 400 to 200. Confirmations gets faster. Everything gets faster.
Lucas Bruder
In my mind, like, the end state is, like, something that Toly said where, like, you can double block space or increase block space. Our servers are running at, like, 20% CPU, so, like, there's no reason why you can't double or triple that. Before we get moving on today's episode, just a quick disclaimer. The views expressed on this podcast by either myself, my co-host, or any guests are their personal views and do not represent the views of any associated organization. Nothing in the episode should be construed or relied upon as financial, technical, tax, legal, or any other advice.
Mert Yeter
Okay, let's jump into the episode.
All right, everyone. Hello, everyone. Welcome back to another episode of Lightspeed. We are joined today by Anatoly, the co-founder of Solana, and Lucas, the CEO of Jito Labs, who is building infrastructure and product in and around the Solana ecosystem. Gentlemen, thank you both for joining us today.
We wanted to get you on to really just have a broad discussion on the state of Solana, but really zoning in on, like, what were the issues we've seen over the last three months? What has been done to kind of patch and iterate on those problems? And then where are we going from here? Like, what are the open-ended problems that we still need to think about?
And so broadly, I think that kind of buckets into two issues. One is focused around the networking and engineering side, and another one is more of, like, a philosophical one around the economic problems. So I want to jam on both. But, Anatoly, maybe I'll kick things over to you first, if you can just walk us through Q1. What were the highlights and the lowlights of the quarter, as far as it pertains to chain performance?
And really just start with a high-level summary of what the problems we faced were. I think on the highlights, I think the issues that were there before, like the runtime taking a really large, variable amount of time to execute, blocks being wonky in terms of when they broadcast and stuff like that, those have been solved, and you can tell because you can look at block times and replay, and all this stuff has been really steady. And more or less, the compute budget still isn't perfect, but it more or less works as intended.
Anatoly Yakovenko
What started happening is you started seeing congestion around submission, like a user starting to submit a transaction. And there were a bunch of theories around why that was happening. And I think at the time, there were a bunch of folks that were advocating, like Eugene from Ellipsis and stuff, for an economic solution to this, which meant that you increase effectively the minimum floor price to submit a transaction to the point where spam backs off. And what turned out was that Ben from margin was such an advocate against it that I don't know if it was his gut instinct or whatever, but he really, really took the time to figure out that the underlying issue was actually that stake-weighted QoS was busted due to bugs in how the server uses QUIC and stuff like that. And that was effectively the connection table and all these other, really a collection of ten different bugs that are performance issues and stuff like that, that are causing the QUIC server to fall over and not behave in the way that it's supposed to.
And when you have bugs like that that are really subtle and kind of aren't there all the time, but only occur during particular moments at random almost. It's very, very hard to connect that to specific activity that's happening. But, and it's very easy to start trying to figure out, like, you start with the assumption of, oh, there are no bugs, but there's still problems, let's find a solution. This is where engineers get in trouble is when you start with that assumption, there is a design and it's implemented without any bugs. But still, performance isn't what I want.
It means the design is wrong. Let's go create another design. This is where I think the economic, the economists in the Solana ecosystem, I think are maybe starting at the wrong foot right now, is that there's just underlying bugs that are pretty obvious now to all the engineers involved in the networking stack. Once those are fixed, I think then we can kind of go revisit. There is probably economic issues, too, that can be improved, and then it'll hopefully be obvious what they are and be straightforward to fix them.
Lucas Bruder
Yeah, I want to chime in there. I think the stake-weighted QoS was broken. I guess, like, there's just a lot of broken stuff in the QUIC server. I think it was impacting everyone. Ideally, stake-weighted QoS would have worked better, but it seems like there's definitely been an increase in performance in the QUIC server with some of the newer code that's being tested and, yeah, now it's like you got to break the system down into pieces.
Like what is the pipeline. Where are the boxes? What are the lines that connect these things? This is like the first box in the system. It's definitely performing much better.
The economics are, I guess the economics kind of cover all these boxes somewhat, but it seems like the first box is doing things a little better than it was before. To illustrate this problem, kind of like there's an auction, right, for block space. And on Solana it's trickier because blocks happen so quickly that you have to transmit them as you're building them. So it's not an auction like you have on Ethereum, where you have 12 seconds, you collect a bunch of transactions, and then you pick the highest ones, and then 12 seconds later do it again. You're effectively doing this in real time, and it's more like the Chinese diners auction.
Anatoly Yakovenko
So that part is one part, but then there's like, think of that as the restaurant or the auction house. And then there's a line to get into the restaurant. And inside the restaurant you can have order and you can do priority and stuff like that, and you can give everyone a fair shot. But outside of the auction house, it's the streets. It's wild.
Anyone can do anything. There is no order. And if you can beat everyone up that's standing in line to get into the auction house, you can be the only one that's in the auction house, and you can take all the plates, take all the dinners. So that's kind of like the issues there is. You're trying to create an orderly way to get into this restaurant, which is the block producer and the leader in a way that is fair and open and everyone has a chance to do it.
And there are definitely, like, economics that need to be involved there because there's no way to do it for free, right? There is a resource and stuff like that. And the problem that we run into is it's really hard to debug. And if you assume that there's no bugs but the system's still not working, you want to apply an economic solution. The solution that's been applied to bitcoin and ethereum is that you raise the floor price to the point that it's so expensive to buy the cheapest thing in the restaurant that there is no line.
It's just incredibly luxury restaurant. There's no line to get into it. And it's very easy for you to show up and say, yeah, I'm willing to buy the cheapest thing you got, and it's going to be $100. So that's kind of like, I think of it almost like as napalm. You're just like, kill everyone.
And there is only like a few users that are willing to pay for it and they get great service and everything and very high reliability. But that's not a great solution, unfortunately. So with stake weighted QoS, the intention of the design is really, it's almost, people kind of misunderstand it. The goal there isn't to solve all the problems is to, you have a leader and it's got a bunch of requests coming in from the public Internet to connect to it. And those are just, I want to initiate a connection so I can send you transactions.
Not even like, here's a transaction that pays you money, and if you spam that, it's impossible. You effectively beat up everyone outside the restaurant. No one can come in because you spam the connection table to the point that nobody can connect to. So one way to solve it is you basically limit that resource and you assign certain amount of connections per stake and certain amount of throughput to each allocated stake node. So you take the leader and you effectively spread this problem to the rest of the network.
You have one box that's supposed to be managing connections and you're like, that's not scalable. Let's use the entire network to do that. Everyone that's staked on the network gets, in proportion to their stake, an allocation of the connections and packets that get sent to the leader. Now you've created some order amongst the entire network, but you haven't solved the problem of how do I, as an end user, actually connect to this thing and send a transaction through. This is where the goal is to let all these amazing founders like Lucas, like Mert and whatever.
Hey, I have a business. I know my customers. Now I just need to convince some percentage of the stake to go take transactions from me and send them through their orderly, pre-allocated bandwidth. And they can do it in many different ways. Jito is going to do it based on MEV and stuff like that and those things.
Mert is more developer focused. He can literally put up a coin-operated thing. Hey, send me $1, you get an API token, you can submit 100,000 requests until you run out. Then all those problems of dealing with order and figuring out who gets to propose and not, they're solved at the business layer, where people like Mert know their customers and have a relationship with them.
And that's a different set of requirements from Lucas. And that's really the intent of stake-weighted QoS, is to make it easier to do that, because otherwise, like, what Mert and Lucas have to do, and Lucas basically did that job, is you have to have every leader in the chain install software that then allocates some amount of bandwidth to that approach. Lucas was able to pull that off, but then if you also want to do it for developers, then you have to what? Merge Jito with Mert's pool or whatever. It becomes intractable, it just doesn't work.
Versus 5% of the stake is allocated to devs because Mert has a relationship with those nodes, and 40% is to MEV and whatever, and all that stuff can. Then there's still going to be competition. There's a fixed amount of resources, but that competition is a lot more efficient because you don't have to go install software in every box. Just convince some percentage of stake to go do X or Y or both. I'll play the other side of this just to voice CT, maybe a little bit.
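To make the resource split concrete, here is a minimal, hypothetical sketch in standard-library Rust (not the Agave or Firedancer implementation) of the proportionality described above: a leader's fixed connection and packet budget is divided across staked peers according to their stake, with a small reserve left for unstaked traffic. Peer names, stake amounts, and budgets are all illustrative.

```rust
use std::collections::HashMap;

/// Per-peer allocation a leader hands out for its leader window.
#[derive(Debug)]
struct QosAllocation {
    max_connections: u64,
    max_packets_per_sec: u64,
}

/// Split a fixed connection/packet budget across staked peers in proportion
/// to their stake, keeping a small reserve for unstaked connections.
fn stake_weighted_qos(
    stakes: &HashMap<&'static str, u64>, // peer -> activated stake (illustrative units)
    total_connections: u64,
    total_packets_per_sec: u64,
    unstaked_reserve: f64, // e.g. 0.05 = 5% kept for unstaked traffic
) -> HashMap<&'static str, QosAllocation> {
    let total_stake: u64 = stakes.values().sum();
    let staked_conns = (total_connections as f64 * (1.0 - unstaked_reserve)) as u64;
    let staked_pps = (total_packets_per_sec as f64 * (1.0 - unstaked_reserve)) as u64;

    stakes
        .iter()
        .map(|(peer, stake)| {
            let share = *stake as f64 / total_stake as f64;
            let alloc = QosAllocation {
                // Floor of one so a small staked peer is never completely shut out.
                max_connections: ((staked_conns as f64 * share) as u64).max(1),
                max_packets_per_sec: ((staked_pps as f64 * share) as u64).max(1),
            };
            (*peer, alloc)
        })
        .collect()
}

fn main() {
    let stakes = HashMap::from([
        ("validator_a", 4_000_000),
        ("validator_b", 1_000_000),
        ("rpc_with_small_stake", 50_000),
    ]);
    for (peer, alloc) in stake_weighted_qos(&stakes, 2_500, 100_000, 0.05) {
        println!("{peer}: {alloc:?}");
    }
}
```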
Dan
Okay, so a few criticisms around stake weight were. So, one, there is a threshold that you need to surpass to become active. I believe it's something like 15,000 SOL right now. And so some of the smaller teams, let's say, who don't have access to that, were upset. Then there's maybe some negative externalities around decentralization.
If everybody has stake-weighted QoS, does nobody have it, ultimately? And the other one was, what was the other criticism? Oh, it can't be enforced in protocol, so it kind of leads to these weird side deals.
So I guess maybe if you want to just take those one by one. And Lucas, if you have anything to add there. I guess one more to add is it's additional complexity for people to manage. Like, if you're doing arb or you're market making, and all these deals are happening in private where you can override people's stake, then it's like, this is just complexity that I don't want to manage. I just want to send my transaction and it should land.
Anatoly Yakovenko
So you can kind of start with this thought experiment. You have an open, permissionless network. If you don't have stake weighted qos, an attacker can create arbitrary number of unstaked nodes and eclipse your network. So no matter what you do, when you have, like, an open, permissionless proof of stake network, you have to have some bias towards staked information versus not. Otherwise you can basically partition everyone away from everyone else.
You use an attack and create a bajillion IPs and boxes, and you isolate all the staked nodes from each other, and then you can really mess things up and control all the information as it's passing through the network. So no matter what network you're building, a proof of stake network, especially Ethereum or whatever, there is bias toward staked nodes that have allocated some financial stake to their identity, associated it with an IP, and therefore now have connections amongst the rest of the world. So this is kind of not new to anyone, it's just how things behave. How other proof of stake networks handle this is they also run a mempool on top, that's a general public mempool, and they exchange these transactions between each other and they propagate the most expensive ones, and that's how they limit, that's how they create order outside the restaurant: I trust my set of connections, I associate connections, I bias staked ones more so than non-staked ones. But you can maintain some public, some staked, and you exchange transactions between them and you propagate the more expensive ones until you have a view of the world that tends to be consistent with, like, I have the most expensive transactions and I can look from those.
You can build that right now, you can go build a mempool, you can run it on top of Solana, and you can convince 10% of the stake to go be part of your mempool, and then they will forward those transactions that are most expensive ones to the current leader. And the leader will take x amount of bandwidth from them, x amount of bandwidth from something else. So the whole point to kind of stake weight Qos is to fix bugs that should have been kind of fixed from the get go. You need some bias towards connectivity, towards stake nodes, and allow these other protocols to be built, and there's no way to ever enforce them. This is why flashbots exist in Ethereum, because there's no way to enforce only one mempool.
There are always private mempools on Ethereum. That's just kind of as soon as you have economics for math and transactions and there's some price difference or economic opportunity cost to be first in a block or first before some other transaction touches state. That's a different thing than I just want to censorship resistance, I want to send a payment. Those use cases are always going to segregate and you're going to create different business models on top of each one. There's no way to really stop that or without building a completely permission chain with full control over all this stuff.
So as long as you accept the idea that these are permissionless systems, you're going to have this competition. And I think my view has always been, let's embrace it. If there is an opportunity for this competition to exist, let there be many and let people build awesome products and go optimize for it. The cost or the amount of stake that you need to participate isn't there. That minimum isn't there for apps.
It's really a minimum for the validators themselves. Because again, if you run a non-delinquent validator, you need a certain amount of stake to be able to pay for the vote fees, and that keeps the network somewhat stable. As fixes roll out to make that threshold lower and lower, those limits should be lowered to include all the validators. So then you might be able to have even less stake to get one packet per second to a leader or something like that. There's just some practical limits around all these things, but right now the way that they're set is it covers the entire network.
So as an application, the expectation isn't that you're going to go build a validator with one SOL of stake and then try to submit transactions, because that's just not practical. From just running the network perspective, the idea is that you're going to have some RPC providers, some stake pools or something like that, that work with those, and then you can start managing your connectivity there. And again, people think that this is centralizing, but when you submit a transaction, you don't need 32 ETH to submit a transaction, even though that's what you need to make a block, right? You don't attribute to one transaction the minimum cost to create a block, and on Ethereum that's 32 ETH, because obviously a market exists that allows you to submit a transaction for much, much less than that.
And then all the stuff happens where people are financially motivated to make sure it gets into the block, even though somebody had to pay 32 ETH to actually get access to that. So think of it as: there are some network minimums that create a minimum amount of stake to be able to deal with the allocation of all these connections and all those resources. But simply because there's so much competition for this stuff between different kinds of business cases, the cost to developers is not going to be that you need 15,000 SOL of stake. No, it's going to be go get a connection from, like, Mert's vending machine.
Lucas Bruder
There's a lot to comment on there, but I guess I'll keep it a little shorter. I think stake-weighted quality of service, in my mind, is a short-term solution. I think everyone, at least as far as I know, everyone kind of agrees with that. It's fine given, like, the state of things. And you know, I kind of think of it like, if everyone has stake-weighted quality of service, then at the same time, like, no one has it.
Like, if I run stake-weighted QoS and everyone's sending garbage transactions, then, like, the end state of that is kind of the same. The difference there is the capacity to schedule is much larger than the block capacity. So if you have, like, 100,000 transactions per second that you can prioritize and the blocks only contain on average 1,000 transactions, you only need 1% of your staked nodes to submit good transactions. Yeah, right, so, like, yeah, I think you're kind of assuming that at least 1% of their staked connections is running, like, a decent business if they're selling that connectivity to somebody, right? Like, there's going to be some implicit quality controls there.
In theory, yeah. But I think that looking at the mainnet beta validators channel and Solana Discord, there's definitely a lot of people that are abusing their stake to send the same transactions, a lot. And we're kind of seeing the side effects of that. I think in my mind, like there's stake weighted quality service. The quality service doesn't have to be based on stake weight.
I think on a longer time horizon it will be based on what the validators care about. They care about money, they care about the economics of the chain. How much are they making so they can pay their server bills, feed their family? Are they doing things that are in the best interest of the network over a long time horizon? I think that, as far as I know, everyone acknowledges that, you know, it is what it is, it's not a long-term thing. I think on a long time horizon you'll probably have some reputation-based system or some other.
You just rip out the stake weight part, you replace it with whatever term Solana marketing comes up with that basically looks at what are the packets that this person's sending me? Are they good? Are they bad? Are they landing in blocks? Are they paying a lot?
Are they succeeding or failing? Are they trying to maximize the good usage of block space? And I think that in my mind is probably one of the better end states. I don't know if I feel confident enough to say it's the final end state. But I think that as soon as.
Anatoly Yakovenko
You build that heuristic, all you need to do is to convince some portion of the stake-weighted QoS nodes to run it. Yeah. And it'll outcompete all the other ones. And that's kind of the whole point of this convention; that's all stake-weighted QoS is, a convention. The leader is trying to find the most profitable transactions in the world.
And if you think that this heuristic is really, really great and you convince some percentage of the nodes to do it, it's going to accept, proportionally, some amount of transactions from those. Right. And if you get 1% of stake and you guys have the best possible way to get the most expensive transactions to the leader, you'll fill the entire block. Yeah, I think we've seen that.
Lucas Bruder
Like, I guess Jito is kind of an example of this, where we're building on average less than 10% of the block and making three times the priority fees. And so there's definitely ways to make this more efficient. And like this is a first. I think it's the first reasonable pass. And like there's a lot of criticism.
I think this comes from people that haven't built startups or companies that are moving fast, and like, sometimes you just gotta get something out the door and get it working, and you know, it doesn't have to be the end state of things. Yeah, I guess maybe something that might be concerning is, I guess Toly is a very free-markets maximalist there in terms of competition, which I think makes sense. Long term it's probably the only way. But maybe in the short term it's not so accessible to some people in the face of ecosystem competition. But I guess maybe just zooming back a little bit before we even get to that, maybe let's focus a bit more on the short term.
Dan
Currently we just rolled out, or we didn't, Anza just rolled out the QoS patch. Now I'm not quite sure how effective it is, because Ore also stopped mining around the same time. So that doesn't help. But what do we need to do?
Anatoly Yakovenko
Did the networking engineers' fix work, or are exchange volumes just down? There's no, DEX volumes are down. Yeah. So I can tell you that it fixed some tests that were completely busted, that the Anza team was able to write and simulate a failure state. And with these patches, especially the stuff that's landing in the next releases, those tests no longer blow up.
So at least there's some confidence in that it's fixing real bugs in the network, whether it's fixing all of them, is really going to be hard to tell. And the only way you know is you go like, Anza fixed all the NFT bugs a year ago. We had a bunch of inscriptions that the network didn't fall over. Like, okay, those worked. You only know, like, after you get another event.
Dan
Okay, so would it be fair to say that for the people who are going to be just listening and aren't technical, in terms of short-term network fixes, there's a QoS patch and then there are some upcoming patches, but it's unclear, before they're actually in production, to make any statements about the network's performance. They clearly fix an issue. So you can go talk to the Anza folks. The reason why those patches are there is they can isolate, oh, this is a problem. And that problem is part of the overall congestion in that pipeline.
Anatoly Yakovenko
But it's hard to tell once you fix all of them what impact it's going to have, or you're effectively narrowing it down to the point until you actually do fix it, there's only finite number of bugs. Eventually they'll close all of them. Right. I just want to zoom in one more time there. We know that there's a presence of spam on the network.
Mert Yeter
Toly and I were talking on Twitter the other day, and there were 879 addresses that spammed or sent over 5,000 transactions in a given day, and that accounted for 95% of the failed transactions. Right. And so these addresses are just clearly mass sending the same transaction or, you know, adding one lamport to their fee, so it's a slightly different transaction, and that's their method of executing and landing the transaction. And because we know there's this presence of spam, it's creating congestion on the network, and that congestion is different.
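As a back-of-the-envelope illustration of that kind of analysis (this is not the actual dataset or methodology behind the numbers quoted above), here is a small, hypothetical Rust sketch that groups a day's transactions by fee payer, flags senders above a spam threshold, and computes their share of all failures. Addresses, counts, and the 5,000-transaction threshold are illustrative.

```rust
use std::collections::HashMap;

/// One simplified transaction record: (fee_payer, succeeded).
type TxRecord = (&'static str, bool);

/// Share of all failed transactions attributable to addresses that sent more
/// than `spam_threshold` transactions in the sample.
fn failed_share_from_heavy_senders(txs: &[TxRecord], spam_threshold: usize) -> f64 {
    let mut per_sender: HashMap<&str, (usize, usize)> = HashMap::new(); // payer -> (sent, failed)
    for &(payer, ok) in txs {
        let entry = per_sender.entry(payer).or_insert((0, 0));
        entry.0 += 1;
        if !ok {
            entry.1 += 1;
        }
    }
    let total_failed: usize = per_sender.values().map(|&(_, f)| f).sum();
    let heavy_failed: usize = per_sender
        .values()
        .filter(|&&(sent, _)| sent > spam_threshold)
        .map(|&(_, f)| f)
        .sum();
    if total_failed == 0 {
        0.0
    } else {
        heavy_failed as f64 / total_failed as f64
    }
}

fn main() {
    // Toy sample: one bot-like sender retrying an arb, two ordinary users.
    let mut txs: Vec<TxRecord> = Vec::new();
    txs.extend(std::iter::repeat(("arb_bot", false)).take(9_000));
    txs.extend(std::iter::repeat(("arb_bot", true)).take(1_000));
    txs.push(("alice", true));
    txs.push(("bob", false));
    println!(
        "failed-tx share from senders over 5,000 tx/day: {:.1}%",
        100.0 * failed_share_from_heavy_senders(&txs, 5_000)
    );
}
```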
Anatoly Yakovenko
So again, you're trying to backtrack from what you see in the block. Right, right. That's not, that's actually like, you actually have to go look at the ingress packets and see what percentage of those actually have anything to do with lands on the block. Because you could have very, very, you can have a very large number of arbitrage transaction failures, but they could be a very small percentage of the ingress. Right.
So those are not necessarily true. And it will take some more analysis to figure out what specifically is hitting the RPC nodes and what's being sent to the leader where it's doing the prioritization. We were able to clearly, I think, recognize that Ore was a big part of the ingress. Right, Lucas? I think that was pretty clear.
Lucas Bruder
I think that was a large part of it. People started using bundles towards the end, but there was still a lot of people that were trying to mine through the GPU. So if we know that, I don't know, let's say over a given day there's about 800 transactions per second that are landing, what is being dropped? Do we have any metrics that say, yeah, we're getting 800 in the block, but there's 100,000 that are hitting the leader? Yeah, there's.
Anatoly Yakovenko
The average rate that hits the leader is about 50 to 100,000, and that's when the network is stable. When it's not stable is when it gets to like 500k.
So 500,000 packets hitting the leader, you can go measure it on your own boxes. It's when things start getting wonky because you have to go process all the data. Takes a shitload of memory allocations, all those kind of hit the system, you have to go signature verify all of them and then start sorting them and stuff. And these issues arise because it's usually right when the leader is trying to make a block, and that itself takes resources. So you have to kind of manage those cores and pipelines really, really well.
What stake-weighted QoS is supposed to do is it's supposed to limit that traffic flow. So you're not getting slammed with 500,000 TPS from some random connections. You get a small amount of packets from each validator, and each validator sorts by priority first. Does that make sense? So each validator is receiving a whole bunch of transactions, maybe from their public port, maybe from their RPC provider, maybe from a mempool that they're participating in.
They sort them, and then they forward based on their QoS to the leader, their best known transactions, and then the leader then merges those together and says, oh, I have like x amount of packets from all of these different validators. They're all sending me the best they got. I still have to order and merge all those lists, but I can limit that. That work has now become limited and is within the amount of resources that is safe. It's no longer dealing with these massive spikes and stuff like that.
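A toy sketch of that merge step, under the same caveats as the earlier snippet (illustrative names and fee numbers, standard-library Rust, not the actual Agave scheduler): each forwarder's list arrives already sorted by priority, and the leader k-way merges them within a bounded ingest budget instead of absorbing raw public spam.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

#[derive(Clone, Debug)]
struct ForwardedTx {
    priority_fee: u64, // priority per CU, illustrative units
    id: &'static str,
}

/// Each staked forwarder sends its transactions already sorted by priority
/// (highest first). The leader merges the pre-sorted lists and keeps only
/// what fits in its per-slot ingest budget.
fn leader_merge(forwarders: &[Vec<ForwardedTx>], ingest_budget: usize) -> Vec<ForwardedTx> {
    // Heap entries: (priority, forwarder index, position within that forwarder's list).
    let mut heap = BinaryHeap::new();
    for (i, list) in forwarders.iter().enumerate() {
        if let Some(tx) = list.first() {
            heap.push((tx.priority_fee, Reverse(i), 0usize));
        }
    }
    let mut merged = Vec::with_capacity(ingest_budget);
    while merged.len() < ingest_budget {
        let Some((_, Reverse(i), pos)) = heap.pop() else { break };
        merged.push(forwarders[i][pos].clone());
        if let Some(next) = forwarders[i].get(pos + 1) {
            heap.push((next.priority_fee, Reverse(i), pos + 1));
        }
    }
    merged
}

fn main() {
    let tx = |priority_fee, id| ForwardedTx { priority_fee, id };
    let forwarders = vec![
        vec![tx(900, "mev_bundle"), tx(10, "arb_retry")],
        vec![tx(50, "dex_swap"), tx(5, "nft_list")],
        vec![tx(40, "payment")],
    ];
    for t in leader_merge(&forwarders, 3) {
        println!("{t:?}");
    }
}
```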
Mert Yeter
Okay, so that's what's supposed to happen. So the staked validators that are not the leader, the non-leading validators, are effectively this filtering mechanism, where users or RPCs will send transactions to those validators, they will do some ordering, and then pass their best on to the leader. Okay, so I mean, maybe to appease the economists in the audience, right? Basically, if I'm to sum this up, we first work on the networking bugs, get those correct, and then we start looking at economics based on that data. One of the things I saw recently was maybe questioning the effectiveness of local fee markets just because the accounts aren't getting saturated.
Dan
Like, the hotspots aren't maybe as hot as we would have thought. And then maybe the answer is to actually increase block space instead. That's a great problem. So, like, if the blocks are being filled but the accounts are not hot, awesome. Double the block space, you're going to double revenue.
Anatoly Yakovenko
Why wouldn't you do it? Like, why wouldn't every validator be like, all I have to do is have twice as many banking threads. They probably already have the cores for that because we haven't doubled up the block capacity since launch. So, like, I'm guessing the systems are already, like, good enough to run a two x the block. So, yeah, go do that as soon as possible.
Dan
Okay. And I guess, like, from a user perspective, okay, let's say there's two users and they want to access the same state, and one wants to pay a higher priority fee. And then there's the inherent difficulties of continuous block building and some sort of, I guess, unavoidable randomness. What do you think the end state of priority fees looks like for users?
Anatoly Yakovenko
So the goal for the whole Solana's design is to run multiple concurrent applications in a way that they're isolated from each other's negative externalities. So if you have, like, an NFT mint that's super, super hot, you don't want that to increase the fees for the entire block. Right. Or if you have a bunch of markets that are really, really hot, you don't want that to increase the fees for payments or NFT transactions. And the only reason why that would happen is if the entire chain is saturated and you're at full block capacity now, on ethereum kind of land, that's acceptable, and you don't want to increase block space, and you want to push things off the chain.
In Solana land, that is a signal to just increase block space. Go double the blocks. As soon as we're in a state where the blocks are full but accounts are not saturated, you have basically free money given to you, because you have trivially parallelizable data that you just need to pass through the system. There's no hotspots. All you're doing is increasing the bandwidth of the network in a very trivial way, just adding cores, increasing whatever broadcast needs to do and stuff like that. That is, like, the dream end scenario: the validators run their nodes.
They see that the network is hitting just general limits, but nothing else is falling over. The only thing they need to do is add cores. And cores are cheap. Like to add a system with twice as many cores is not a huge investment in anyone's hardware. What would you say to.
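A hypothetical sketch of that decision rule (the ~48M CU block figure comes up later in the conversation; the per-account write-lock budget and thresholds here are assumed constants, not protocol values): blocks consistently full while no single writable account approaches its own limit is the signal that demand is parallelizable and the block CU limit can simply be raised.

```rust
/// Blocks are full, but is any single writable account actually hot?
/// If not, raising the whole-block CU limit is the easy win, because the
/// extra demand is parallelizable across cores. Constants are illustrative.
fn should_double_block_space(
    block_cu_used: u64,
    block_cu_limit: u64,
    cu_per_account: &[(&str, u64)], // hottest writable accounts and their CU usage
    account_cu_limit: u64,          // assumed per-account write-lock budget
) -> bool {
    let block_full = block_cu_used as f64 >= 0.95 * block_cu_limit as f64;
    let hottest = cu_per_account.iter().map(|&(_, cu)| cu).max().unwrap_or(0);
    let accounts_cool = (hottest as f64) < 0.5 * account_cu_limit as f64;
    block_full && accounts_cool
}

fn main() {
    // Roughly the situation described on the show: blocks near the limit,
    // but even the hottest accounts nowhere near saturated.
    let hot_accounts = [
        ("amm_pool_a", 3_000_000u64),
        ("amm_pool_b", 2_500_000),
        ("nft_mint", 1_000_000),
    ];
    let double_it = should_double_block_space(47_500_000, 48_000_000, &hot_accounts, 12_000_000);
    println!("blocks full, accounts cool -> raise block space? {double_it}");
}
```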
Dan
Obviously in the past month, a lot of ecosystem teams have been frustrated with sharing block space with other people who maybe aren't using it, maybe shitting in the public pool is the analogy used. And they're like, oh, okay, we want to do an L2. And you know, what do you kind of think about that whole thing? Do you think that'll ever be necessary? I think they're frustrated with sharing the line to the auction house with other apps, because that's where the chaos is happening right now, and it's not orderly, and there is no reliable way to get a transaction to the leader to where it's effectively in the scheduler and gets prioritized.
Anatoly Yakovenko
I would say that's probably like 99% of the issues. Because if you look at the skip rate right now is like 5%. If you submit a transaction, it should reliably arrive at the leader 95% of the time. So you should see 95% of them land and confirm within 1.6 seconds. And then the other 5% and the next leader.
Right. The probability of you landing a transaction should basically be 100% within, what is it, 3.2 seconds. But that's not happening because of this chaos right outside of kind of the leaderbox. If the network was at that state, like all the networking bugs are fixed, you have your skip rate is 5% and you're landing 95% of your transactions on the current leader. I think that would alleviate most of the problems.
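The 1.6 s and 3.2 s figures follow from Solana's 400 ms slots with four consecutive slots per leader; here is a small worked version of that arithmetic, treating each leader as independently skipped at the quoted ~5% rate (a simplifying assumption).

```rust
/// Probability that a forwarded transaction lands within the next `n_leaders`
/// leader rotations, assuming each leader is skipped independently with
/// probability `skip_rate` (the ~5% figure quoted above).
fn landing_probability(skip_rate: f64, n_leaders: u32) -> f64 {
    1.0 - skip_rate.powi(n_leaders as i32)
}

fn main() {
    let skip_rate = 0.05;
    let slots_per_leader: u32 = 4; // each leader gets 4 consecutive slots
    let slot_ms: u32 = 400;
    for n in 1..=3u32 {
        println!(
            "within {} leader(s) (~{:.1} s): {:.2}% chance of landing",
            n,
            (n * slots_per_leader * slot_ms) as f64 / 1000.0,
            100.0 * landing_probability(skip_rate, n)
        );
    }
}
```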
Then the question is, for some use cases where they really do rely on sub like 100th of a penny kind of fees, that's where the economics for the entire network, there's trade offs there. Do validators increased block space right now to accommodate for the median fee to drop to that level? Or is that too expensive right now? Like from my perspective, right now? Double the blocks like as fast as you can.
Like there's no reason not to increase block space as long as people are willing to fill it up. Yeah, I think you still need most validators to buy into the economics for this, but if the software is safe to increase to double capacity and things are going to be stable, I don't think anyone would really push back hard. Yeah, I think there will. You're trying to meet some point on the supply and demand curve.
Lucas Bruder
And right now the demand is like validators are getting 100,000 or 50,000 to 100,000 steady state per second. The actual rate is like 10% of that. Obviously there's hopefully if you change the engineering like the scheduler microstructure or whatever you want to call it, and the networking microstructure, then that will have hopefully a good impact. But I think there's going to be a point, there's always going to be a point where the supply doesn't meet demand. And right now we're seeing basically every block is filled up to 47, 48 million compute units.
And so this is happening right now. I think in my mind, like, the end state is something that Toly said, where you can double block space or increase block space. Like, our servers are running at like 20% CPU, so, like, there's no reason why you can't double or triple that. There's, you know, there's going to be other bottlenecks that show up, but, you know, just software bugs. Full blocks are fine.
Anatoly Yakovenko
Like what you need to be looking at is the median fee too high? Like the blocks should always be full because there's always an economic opportunity for me to send a transaction that makes a dollar one in 10,000 times because it's going to go try to do a trade. It'll fail 10,000 times except once and make a dollar. I should submit it 10,000 times. Right?
So blocks should always be full in general, because there's enough markets to arbitrage that there's no reason not to fill them up. But the question is, is the median fee to land a transaction in a block so high that, like, Drip can't work or Visa payments can't work? Like, if you look at right now, the median fee is 0.2 cents. The average is, like, two cents. I think for most use cases, 0.2 cents, even for Drip, is fine.
But if people want lower, then just double block capacity until the median drops. Right, to your earlier point, it's kind of hard to, like, you are looking at what's making it into the block. We know that there's a lot of improvements to be made in the scheduler and the networking code, and I guess we'll see what that fee is with the new priority scheduler and new QUIC code. What about the idea of dynamic block size? I think it might have been 7Layer.
Mert Yeter
Someone from the Overclock team toyed around with this idea of, I guess, that kind of keeps you on the revenue-maximizing train while also giving you the increased upper bound. If, let's say, every block hits 40 million CUs, then you'd add another 2 million CUs to the block or something. Why wouldn't you just keep it at the safe max and just get that revenue?
Yeah, I guess that kind of plays into the dynamic base fee as well, right? If you had some dynamic base fee that was based on the block size, then I could see why you'd want to have a dynamic block size to stay revenue maximizing. But that's not the current state today, right? 5,000 lamports. To increase block size, you're relying on the hardware to actually be able to support it.
Anatoly Yakovenko
So you have to provision it ahead of time. Right. There's like testing and all these other kind of put like a hard limit of. This is as far as we've tested. I don't know what happens after that.
It's probably not safe to increase it beyond that. Right. Like, so you could try. I think, I think Trent said this, but every constant is a bug. And so ideally you can like, you know, obviously we're engineers and we can be optimistic in some areas and, you know, we can be a little realistic.
Lucas Bruder
But I think it was someone, maybe it was cavemanloverboy, or I think Liam from Firedancer said this too. But, like, I think timely vote credits should go in soon, and that will basically let you know how far leaders are behind the tip. And so, you know, this is a massive, massive oversimplification, but I think there's a world where there's some feedback mechanism and it's like, okay, we're filling up 48.
Let's get to, like, let's double it, or let's increase by 10% until pieces of the network start running behind, and you can shrink it then. Yes, that's like async execution. You can basically measure your blocks. You can increase block space as long as all the execution nodes that are voting on the execution hash are close to the tip, because they can fall behind by almost a full epoch.
Anatoly Yakovenko
That's like two days' worth of compute. So yeah, keep increasing it if they're close; if they fall more than, like, 30 minutes behind, start decreasing it. And that's kind of your safety band. Yeah, you could do it without async, but I think with async, you could definitely crank it up way more.
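A minimal sketch of that feedback loop, under assumed constants (the roughly 48M CU starting point and the 30-minute threshold come from the conversation; the 10% step, the 60-second "close to the tip" band, and the tested upper bound are invented for illustration):

```rust
/// One step of a (very) simplified dynamic block-size controller: grow the
/// per-block CU limit while execution nodes stay close to the tip, shrink it
/// once they lag past a safety threshold. All constants are illustrative.
fn adjust_block_cu_limit(
    current_limit: u64,
    worst_replay_lag_secs: u64,
    min_limit: u64,
    max_tested_limit: u64,
) -> u64 {
    const GROW_BELOW_SECS: u64 = 60; // comfortably at the tip
    const SHRINK_ABOVE_SECS: u64 = 30 * 60; // the "30 minutes behind" band
    let next = if worst_replay_lag_secs < GROW_BELOW_SECS {
        current_limit + current_limit / 10 // +10%
    } else if worst_replay_lag_secs > SHRINK_ABOVE_SECS {
        current_limit - current_limit / 10 // -10%
    } else {
        current_limit
    };
    next.clamp(min_limit, max_tested_limit)
}

fn main() {
    let mut limit = 48_000_000u64; // roughly today's block CU limit
    for (step, lag_secs) in [(1u32, 20u64), (2, 25), (3, 2_400)] {
        limit = adjust_block_cu_limit(limit, lag_secs, 48_000_000, 96_000_000);
        println!("step {step}: worst replay lag {lag_secs}s -> CU limit {limit}");
    }
}
```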
Well, we've got to ship. Like, they have to ship Firedancer first. Then you've got to fight all the SIMDs to incrementally get there.
Lucas Bruder
Yeah. People are like, why don't you just do, why don't you do this? Or, why don't you do this? It's like, okay, we have like 40 engineers at Anza and 20 or 30 at Firedancer, or however many it is. And it's like, do you fix fires now or do you, like, build for this fancy future?
And it's easy for people to say that everything should be fancy when you're not in the trenches, when you're not shipping PRs. Yeah. Yeah. So actually, that's one of the questions. By the way, Toly, you mentioned the fees being 0.2, but it's actually like two orders of magnitude less than that right now in terms of median.
Dan
So just before my OCD test.
Lucas Bruder
Got community noted.
Dan
One of the questions was actually about async execution, because I don't think the information on that is very well distributed, let's say. Can you just briefly describe what that even is? Okay, so there's kind of like, I'll start with the most extreme version of this kind of the end state. What I think Solana should get to is that the quorum of the validators, the only thing they control is the fork. They pick forks, they don't actually do anything else.
Anatoly Yakovenko
They don't need to do anything else. They don't need to execute any transactions. All they need to do is to vote. And the only thing that they need to pick forks is the data for the blocks and to process the votes in those blocks. So you can actually strip away all the other functions from the stake nodes, and you have this quorum that's picked somehow, and they follow each other's blocks and they process the votes and they pick those forks, and it kind of just continues in finalizing the state.
So in that environment, you could have this quorum that some fixed size subcommittee, 200 nodes, 400 nodes, where they get their value for how much stake everyone has. It's from some snapshot that some execution people that execute the full chain computed and said, this is a snapshot. And what's cool about this is that if the quorum picks the wrong snapshot, all the honest nodes will just not follow it. They're like, well, that's misconfigured. So the network is down.
Let's go, let's go kick those validators out and find a new quorum based on the actual computed snapshot. So you kind of have this really, really weird thing to think about, is that the validators that pick blocks, they're not responsible for safety. They don't actually depend on a safe version of the state, ever. All they need to know is their stake weights. Anyone can supply those stake weights to them if they pick the wrong ones.
It's a liveness issue, it's not a safety violation. This is really, really weird to think about, but the network can run with a very, very minimal amount of information and dependencies on the full executed state. I was just going to explain it another way, and maybe, I don't know if this is correct or not, but you can correct me as I go. Right now, when you are a validator and you're processing a block, basically you receive all the shreds in the block, you reassemble it, you execute it, and the vote, I'm pretty sure, is like a bank hash. So that is basically the hash of the account state at that time.
Lucas Bruder
And with async execution, you are basically not executing blocks. You're receiving the blocks and doing some very simple verification of them, and then you are saying, this block is chained on this block, and you're not signing the account state. If you're familiar with SVM, it's actually very simple. The vote program is a program like all the other programs on Solana. So you don't execute any other programs, you only execute the vote program.
Anatoly Yakovenko
That's what async execution is. You just execute the vote program, which gives you the fork weight, and then you go vote right away, and some other thread, or maybe even a different box, executes the whole thing and tells you the full bank hash. So you compute the vote hash, and then you go vote right away. So the amount of work that you have to do is much, much smaller. You can skip all the other programs.
The vote program itself is very reliable in the amount of time that it takes, because it just does this very, very trivial memory state computation. It's a very small amount of state, so you never have to page fault. You could probably run this whole thing on an FPGA: if all you're doing is receive a bunch of shreds, erasure decode, reassemble, execute the vote transactions, and compute just the changes to the vote state, that can fit on probably a ten-year-old FPGA at this point. So that could be, like, that's the Solana validators. They're picking blocks and everything else, but businesses can't run on that.
You still, like when you have an NFT listing, you have to compute the price of that listing in the market and stuff like that. So you're going to have these big RPC providers that are computing the entire thing and then figuring out what the state is and running the actual day to day operations of transacting with the chain. So that's kind of the end state of async execution. It's not that hard to build because SVM is just really, really well suited for this because it already does isolation all the way up to the developer's transaction layer. It's very trivial to isolate votes from everything else and only execute those, and to have deterministic guarantees on just that state isn't touching anything else.
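Here is a toy model of that vote-only replay split (invented types, not the Agave or Firedancer data structures): during replay only vote transactions are executed, which is enough to update fork weight and vote immediately, while everything else is handed to a separate full-replay worker.

```rust
/// Toy model of asynchronous execution: replay runs only the vote program,
/// which is enough to vote right away, while non-vote transactions are
/// deferred to a separate full-replay worker.
enum Tx {
    Vote { validator: &'static str, stake: u64 },
    Other { program: &'static str },
}

struct Block {
    slot: u64,
    txs: Vec<Tx>,
}

/// "Replay" only the vote program: return the stake observed voting for this
/// block, which is all fork choice needs in order to vote immediately.
fn vote_only_replay(block: &Block) -> u64 {
    let mut voted_stake = 0u64;
    let mut deferred: Vec<&str> = Vec::new();
    for tx in &block.txs {
        match tx {
            // The vote program touches tiny, predictable state, so it is
            // cheap to execute inline.
            Tx::Vote { validator, stake } => {
                println!("slot {}: counted vote from {}", block.slot, validator);
                voted_stake += *stake;
            }
            // Everything else is skipped here and executed later by the
            // full-replay worker that produces the complete bank hash.
            Tx::Other { program } => deferred.push(*program),
        }
    }
    println!("deferred {} non-vote txs for later execution: {:?}", deferred.len(), deferred);
    voted_stake
}

fn main() {
    let block = Block {
        slot: 250_000_000,
        txs: vec![
            Tx::Vote { validator: "v1", stake: 5_000 },
            Tx::Other { program: "amm_swap" },
            Tx::Vote { validator: "v2", stake: 3_000 },
            Tx::Other { program: "nft_list" },
        ],
    };
    let voted = vote_only_replay(&block);
    println!("slot {}: vote right away with {} stake observed", block.slot, voted);
}
```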
My guess is that, like, the right three engineers from Anza and Firedancer, if we locked them in a room for a week with some pizza and beer, they could get it running on testnet within a week. And then, like, it'll take two years for it to go live because of all the weird tail-end effects of this. But it's not a ton of changes. It is literally that, like, when you do replay, only execute the vote program and then spin up another thread to go execute the whole thing. Talk about the advantages, like, why should people be excited about it?
Well, like I mentioned, the amount of state and processing that you're doing is very, very small to keep the liveness of the network going. So you're kind of not dealing with all the funkiness of different programs taking different amounts of time and kind of blowing the compute budget and stuff like that. And then you can do really, really fancy things, like you can keep track of how far behind are your like staked execution nodes from the tip. And if they're really close, you increase, you increase the block size, if they're starting to fall back, you start decreasing it. So then you kind of start removing these constants from the system to where like as people automatically increase the amount of cores and stuff like that to handle all this stuff, the chain automatically resizes.
A bunch of fancy things like that could be done, but the main advantage is that as a validator you have very, very reliable timing and all the execution. And once you have that, that means you can reduce block times from 400 to 200. Confirmations gets faster. Everything gets faster. Yeah.
Lucas Bruder
Also ignoring this is a big asterisk, but ignoring replay, you can basically pack blocks way faster. So I imagine the TPS would be way higher because you're not actually executing anything. You're just like, Mert sent me a transfer. Dan sent me this. I'm just going to send it out to the rest of the network.
This is accepted, and then you have all your RPCs and other things that are actually replaying it. And so, yeah, I mean, the TPS might be way higher, you might be confirming transactions way faster. You don't necessarily always know the result of what happened until one of the replay nodes catches up. But I mean, in theory, you could run this on an FPGA, you could run this on, like, you know, for the Ethereum people out there, you could run a Raspberry Pi if you wanted to. It's just receiving at line rate.
Yeah, yeah, go buy a rack at a data center and put a Raspberry Pi in it. But yeah, you can basically just get transactions and you pack blocks. You're not actually executing stuff, and you shred and sign them, and you have all your other things that are, you know, replaying stuff fast. Okay. I have like three other things I need to cover, but we don't have that much time, so I'm just going to shoot a sequence of random topics.
Dan
Okay, so one of the complaints that I saw was rent, right? A lot of developers are not too happy with the amount of rent they're paying, especially with deploys and whatnot, and dynamic rent fees, et cetera, et cetera. What do you want to tell them, Toly? There's a SIMD that people are trying to create based on kind of some of the design ideas I had a while back. Basically, the goal is, the issue that the network has is it doesn't have a fixed-size snapshot.
Anatoly Yakovenko
You want to maximize, again, maybe set it as a constant or whatever, but you want to set a maximum of the amount of state that the validators are expected to handle. And as soon as you set that you have to then somehow price what goes into it. And there's a whole bunch of annoying behavior issues with developers. If they ever have a client allocate state, they never clean it up because you're talking about trying to get an incentive of less than a cent for somebody to go delete memory. And there's no way to convince a human to go do something for less than a cent of an incentive.
So as soon as that state is allocated, it's never going to get cleaned up. So you need some automated way to go sweep and, like, merklize or compress a whole bunch of state and move it into a different spot that's not necessary for consensus and voting, that's outside of this fixed-size snapshot. There's a bunch of ideas how to do this.
I had one. The main issue is that really the Firedancer folks are like, we want to ship Firedancer to mainnet and maybe we fix it after that. I think it's up to them, I think it's between the Anza folks and the Firedancer folks to decide on timing. But I think once Firedancer is out, this problem will probably be the next one to get fixed before async execution.
It's fairly straightforward. Use a bonding curve based on the state size to increase the price of the allocation. You get some of that cost back. If you free it, you get 100%. If you don't free it and the price goes up past a certain threshold, you've left that, kind of.
You've abandoned that memory. It gets merklized and booted out of the main snapshot. Don't you just need a Merkle proof? You just need a proof that the data is there. Didn't Ethereum.
Lucas Bruder
I'm trying to remember what the token was called, but they had this thing where, when memory or, like, storage was cheap, people... The problem is they allocated noise at the bottom floor, and then they freed it at the top, which is useless. Yeah. So you have people that, like, create these tokens that represent storage on the network when it's cheap. And then when it goes expensive, it's like the self-destruct, they close it and people trade that.
So another.
Economics and engineering are going to be difficult. The design idea that I had was that as the state gets more expensive, the threshold to merklize your state goes up as well. So think of it as, like, there's a bond that you have to have, just like you have right now with the rent-exempt value. It's not really rent anymore, it's a bond. We should rename it from rent to a bond.
Anatoly Yakovenko
So you have some bond right now, it's fixed. The idea is to change it to a function of the current state size. So as the state size goes up, the cost to the amount that you need to have in your account goes up as well.
There's another curve right below it, which is like the compression curve. If your bond drops below the compression curve every epoch, there's a process that garbage collects everything that's below that value and replaces your data with the hash of the data. Your data is moved into a different snapshot still in the same validator. It's not accessible through SVM, you have to go fetch it and reload it. I see, right, so it's still there, it's still fully replicated.
But because it's not accessible with SVM, it's not in the hot path for any of the snapshot services from Fastboot, from anything like that. Right. So the network is actually doesn't care about it. It's still there. Go get it through an RPC API and then go, go restore it.
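A hypothetical sketch of those two curves (curve shapes, the per-byte base price, and the target snapshot size are all invented; only the bond-above-a-compression-threshold structure comes from the discussion above):

```rust
/// Illustrative bonding curves for the "rent becomes a bond" idea: the bond
/// required per byte rises with total live state, and a second, lower curve
/// sets the threshold below which an account gets merklized out of the hot
/// snapshot at the epoch boundary. Constants and curve shapes are made up.
fn required_bond_lamports(account_bytes: u64, total_state_bytes: u64, target_state_bytes: u64) -> u64 {
    let base_per_byte = 7_000u64; // illustrative per-byte price
    // Price per byte grows with how full the target snapshot already is.
    let pressure = 1.0 + (total_state_bytes as f64 / target_state_bytes as f64).powi(2);
    (account_bytes as f64 * base_per_byte as f64 * pressure) as u64
}

fn compression_threshold_lamports(account_bytes: u64, total_state_bytes: u64, target_state_bytes: u64) -> u64 {
    // Sits some margin below the current required bond; accounts whose bond
    // falls under it are swept, replaced by a hash, and must be reloaded
    // before they are usable from SVM again.
    required_bond_lamports(account_bytes, total_state_bytes, target_state_bytes) / 2
}

fn main() {
    let (account_bytes, target) = (165u64, 500_000_000_000u64); // 500 GB target snapshot, illustrative
    for total in [100_000_000_000u64, 400_000_000_000, 800_000_000_000] {
        let bond = required_bond_lamports(account_bytes, total, target);
        let sweep = compression_threshold_lamports(account_bytes, total, target);
        println!(
            "state {} GB: a 165-byte account needs ~{bond} lamports bonded; swept below ~{sweep}",
            total / 1_000_000_000
        );
    }
}
```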
Lucas Bruder
So it's a bullish case for DA. It's not DA. This is not, if it has data in it, people want to call it DA. It's not DA, it has nothing to do with DA. It is literally just dealing with the state size.
Anatoly Yakovenko
As soon as you've kind of solved this fixed snapshot size and you have some other secondary storage that's cheaper but can grow arbitrary amounts because it's cheaper, you've effectively dealt with the state problem, because the state problem is only an issue for state that is immediately accessible during program execution. Because this is what you need for consensus. You need to guarantee that when I call this program and it references account a, that every validator in the world can get the data for account a within a page fault, like within sub microsecond or whatever. If they can't, you can't reliably time anything, you can't do consensus, you can't do anything. So what does this look like for the user?
Mert Yeter
Right? Like, if I have a piece of state that goes inactive, like my USDC account or whatever, does that just mean I have, like, one additional transaction to say, hey, go refresh the state, pull it into the hot path, and then I'm good to go? Yep, seems like a fair trade-off. Okay, uh, moving on to the next buzzword. So, uh, everybody's talking about restaking.
Dan
Lucas, Jito has something called StakeNet. So one, can you explain what that is? And then two, Toly, what are your thoughts on restaking on Solana? Yes. StakeNet is a way for, uh, liquid staking protocols to be run on Solana in an open, transparent way and be controlled directly from governance.
Lucas Bruder
So right now, stake pools on Solana, they rely, there's a bunch of data that goes into running a stake pool. You want to see, like, how many credits are they earning? What is their commission? How much MEV are they earning? What's their MEV commission?
What version of software are they running? Maybe there's gauges or something like that that certain stake pools have, or you can stake to more validators. Right now everything is running off chain. So you'll have a GCP server. It's using an RPC to read all the state and store it in a database.
And then when you actually go to run your delegation, you load all that data in from a database and you actually compute all the stake. StakeNet is basically, there's two parts of it. One is putting that database on chain in a transparent and verifiable way. So that's live right now. You can go to Jito.net; all the data there, except for the location data, is stored on Solana.
A lot of it is actually pulled from Solana. So you can see, like, the vote credits in the vote account on chain. And so you can just, like, memcpy that over on chain in a program. And then the other piece is the actual delegation. So it's like, okay, I want this data, it's on chain.
These are the inputs to my stake pool. How do I delegate stake across these validators? That was the first use case. I guess zooming out to where restaking is a thing: this is going to be a thing when you have liquid restaking tokens. We're seeing this play out on Ethereum right now, where there's tens to hundreds of AVSs, and within each AVS you have potentially a few node operators.
And so you have this issue of, like, I have this capital on chain and, like, where do I want to deploy it? And you're going to have the same issue where, you know, maybe you have governance, or you have some central server that's reading from all these AVSs and node operators and trying to see how they're doing. But you can basically use this on chain for a world where there's restaking.
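Purely as an illustration of turning that kind of on-chain data into a delegation (this is not Jito's actual StakeNet scoring logic; metrics, weights, and names are made up), a sketch might look like:

```rust
/// Illustrative only: turn a few on-chain validator metrics of the kind
/// StakeNet keeps (vote credits, commission, MEV commission) into delegation
/// weights for a stake pool.
struct ValidatorMetrics {
    name: &'static str,
    epoch_vote_credits: u64,
    commission_bps: u64,     // inflation commission, basis points
    mev_commission_bps: u64, // MEV commission, basis points
}

fn score(v: &ValidatorMetrics, max_credits: u64) -> f64 {
    let uptime = v.epoch_vote_credits as f64 / max_credits as f64;
    let fee_drag = (v.commission_bps + v.mev_commission_bps) as f64 / 20_000.0;
    (uptime * (1.0 - fee_drag)).max(0.0)
}

fn delegate(pool_lamports: u64, validators: &[ValidatorMetrics]) -> Vec<(&'static str, u64)> {
    let max_credits = validators.iter().map(|v| v.epoch_vote_credits).max().unwrap_or(1);
    let scores: Vec<f64> = validators.iter().map(|v| score(v, max_credits)).collect();
    let total: f64 = scores.iter().sum();
    validators
        .iter()
        .zip(scores.iter().copied())
        .map(|(v, s)| (v.name, (pool_lamports as f64 * s / total) as u64))
        .collect()
}

fn main() {
    let validators = [
        ValidatorMetrics { name: "reliable_low_fee", epoch_vote_credits: 430_000, commission_bps: 500, mev_commission_bps: 800 },
        ValidatorMetrics { name: "reliable_high_fee", epoch_vote_credits: 428_000, commission_bps: 1_000, mev_commission_bps: 5_000 },
        ValidatorMetrics { name: "often_delinquent", epoch_vote_credits: 300_000, commission_bps: 0, mev_commission_bps: 0 },
    ];
    for (name, lamports) in delegate(1_000_000_000_000, &validators) {
        println!("{name}: {lamports} lamports");
    }
}
```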
Anatoly Yakovenko
I think the purpose of restaking was for you to pick a quorum. And the feature that was cool about it is that you could have some kind of fraud proof to show that the quorum did something bad, or that one of the quorum participants did something bad, and slash them from the quorum. This is kind of a generalization of the setup that proof-of-stake networks have built: can you build that as a smart contract itself and run it as a service?
Because for whatever reason, straight-up democracy governance tokens with low quorums, where an attacker can go borrow 5% of the supply, post a proposal, and approve it, those don't work. Those fail. But Chihuahua Chain, which runs Tendermint, doesn't fail, even though there are governance protocols with much larger impact and supply and value than the smallest Tendermint network. The reason those Tendermint networks are so robust is that there is a quorum.
It takes time to stake. That takes an epoch, so there's an election process. The faults are attributable. If a single node creates an invalid block, all the other ones observe it, and that node can get slashed. All those mechanisms are built in, and it creates an environment that's very robust at picking good quorums.
If you have a good quorum, the probability of it failing is very low, because you'd need the entire dishonest majority to all collude. If they're really independent, and this process does a pretty decent job of ensuring that, it's very unlikely. At least I don't think we've ever seen any proof-of-stake network have a dishonest majority. So from that perspective, I think restaking is awesome. It should be kind of how governance protocols work.
If you create a DAO, it should be restaked: there should be a process to pick the DAO board or whatever, the validators; they should monitor each other; they should get slashed. And all of this should be built into one little library. All that stuff I think is really cool. It would have saved a lot of headaches from last cycle if that tooling had been there. Yeah.
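(A toy sketch of that "one little library" idea: stake only activates after an epoch boundary, faults are attributable, and the offender gets slashed. Everything here, the types, the one-epoch delay, the 50% slash, is a made-up illustration of the mechanics described above, not a real restaking API.)

```rust
use std::collections::HashMap;

/// Toy model: stake activates after an epoch delay (an "election" process),
/// and an attributable fault gets the offender slashed. Illustrative only.
#[derive(Default)]
struct Quorum {
    current_epoch: u64,
    active: HashMap<String, u64>,         // operator -> active stake
    pending: HashMap<String, (u64, u64)>, // operator -> (stake, activation epoch)
}

impl Quorum {
    /// New stake does not count toward the quorum until the next epoch.
    fn stake(&mut self, operator: &str, amount: u64) {
        let activation = self.current_epoch + 1;
        let entry = self.pending.entry(operator.to_string()).or_insert((0, activation));
        entry.0 += amount;
    }

    /// Roll the epoch forward and activate any pending stake that is due.
    fn advance_epoch(&mut self) {
        self.current_epoch += 1;
        let now = self.current_epoch;
        let mut activated = Vec::new();
        self.pending.retain(|op, (amount, activation)| {
            if *activation <= now {
                activated.push((op.clone(), *amount));
                false
            } else {
                true
            }
        });
        for (op, amount) in activated {
            *self.active.entry(op).or_default() += amount;
        }
    }

    /// An attributable fault (e.g. signing an invalid block) observed by the
    /// rest of the quorum; here the offender loses half its active stake.
    fn slash(&mut self, operator: &str) {
        if let Some(stake) = self.active.get_mut(operator) {
            *stake /= 2;
        }
    }
}

fn main() {
    let mut quorum = Quorum::default();
    quorum.stake("operator-a", 1_000);
    quorum.advance_epoch(); // stake becomes active only after the epoch turns
    quorum.slash("operator-a"); // fault attributed, stake reduced
    println!("{:?}", quorum.active);
}
```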
Lucas Bruder
I think the interesting thing about this on Solana is that, whereas on Ethereum you're seeing a push to move a lot of stuff away from the L1, on Solana I think it will actually pull more into Solana. Look at Neon. From what I understand, Neon EVM is a program on Solana.
Anatoly Yakovenko
It's a pessimistic rollup. It executes everything synchronously, so it doesn't need fraud proofs. Yeah.
Lucas Bruder
So, yeah, on Ethereum you see a lot of this pushing away. I think on Solana you'll actually see more of this happen on chain; you can run your entire AVS and all the coordination there, and you probably don't even need fraud proofs or any of that crazy stuff. You're just running consensus on Solana in your smart contract. Last question. This was a while ago, but I don't think we've had either of you on since to discuss it.
Dan
Lucas, a while ago you guys suspended the mempool for some period of time. Maybe that's more of a short-term thing, because at some point somebody will bring up a mempool again, is kind of the train of thought. What do you think? And this is for both of you: what do you think the end state of MEV on Solana looks like in a few years? I think the end state of MEV is ultra transparent.
Lucas Bruder
I think that's something we've been doing pretty well at. I think we can be a little more transparent about what validators are doing, what others are doing.
I think there will be some auction involved, is my guess, maybe to Toly's disagreement, but some super fast, frequent auction, maybe ten milliseconds or 20 milliseconds. One more point is that having two proposers at the same time is amazing, because you have optionality. Right now, if something is time sensitive, you only have one option. If in the future you have multiple proposers, then you have multiple options. And so it will force people's hands to behave in a way that is good for users.
And yeah, there are a lot of validators that care a lot about being good for users on Solana. So I think we'll continue to see that improve and evolve. The L1's job, I think, is to maximize the competition among all these other kinds of services running on it, to prevent a single parasitic service provider from screwing all the users. You want to make sure that a user always has the option to go pick validator A versus validator B.
Anatoly Yakovenko
Right now they can do that. You can actually do it at the program level. You can sign a transaction that will only succeed if it's executed within a specific slot. You can get very paranoid if you want to, but the goal is to create such an abundance of these options that the user doesn't have to pay a time delay to go pick a specific set of validators that they know are good. Multiple block producers per slot, shorter slot times, all this stuff will increase those options.
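(As an illustration of the program-level option mentioned here, this is a minimal sketch of an instruction that refuses to execute outside a caller-chosen slot window by reading the Clock sysvar. It uses the standard solana_program crate, but the instruction-data layout, two little-endian u64s for the minimum and maximum slot, is invented for this sketch.)

```rust
// Illustrative only: a program that fails unless it runs inside a
// caller-supplied slot window, read from the Clock sysvar.
use solana_program::{
    account_info::AccountInfo,
    clock::Clock,
    entrypoint,
    entrypoint::ProgramResult,
    msg,
    program_error::ProgramError,
    pubkey::Pubkey,
    sysvar::Sysvar,
};

entrypoint!(process_instruction);

pub fn process_instruction(
    _program_id: &Pubkey,
    _accounts: &[AccountInfo],
    instruction_data: &[u8],
) -> ProgramResult {
    // Bytes 0..8: minimum allowed slot; bytes 8..16: maximum allowed slot
    // (a made-up layout for this sketch).
    if instruction_data.len() < 16 {
        return Err(ProgramError::InvalidInstructionData);
    }
    let min_slot = u64::from_le_bytes(instruction_data[0..8].try_into().unwrap());
    let max_slot = u64::from_le_bytes(instruction_data[8..16].try_into().unwrap());

    let clock = Clock::get()?;
    if clock.slot < min_slot || clock.slot > max_slot {
        msg!("slot {} outside allowed window [{}, {}]", clock.slot, min_slot, max_slot);
        return Err(ProgramError::Custom(0));
    }

    // ...the transaction's actual logic would run here, knowing it executed
    // inside the requested slot window (and therefore under the leaders
    // scheduled for those slots)...
    Ok(())
}
```

Ordinary transactions already expire once their recent blockhash ages out, on the order of 150 slots, so a check like this only matters for tighter, user-chosen windows, which is exactly the kind of optionality being described.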
The user can go sign: hey, I only want my transaction executed by leaders X, Y, Z, if the other options are really parasitic. And that creates those free-market dynamics where the parasitic ones don't get any revenue and they die. So those are the things we want to balance: one, transparency is really important, the user needs to know if they're getting screwed; and two, they need to be able to pick an option where they're not getting screwed.
If those things are possible on an L1, I think you're going to see pretty stable, good market dynamics play out. But I think it's very much possible that every kind of attack vector you can think of in MEV could be a feature, if that validator gives a rebate to all the users that use them.
I don't see an issue with somebody who says, hey, we're going to maximize the amount of value we can extract and give all the users rebates, as long as users have the option to pick that one or not. I don't see an issue with that in an open market. One more thing I want to hit on before we wrap up, we touched on it a few times: Lucas, you called Firedancer a fancy future earlier in the show, but we never hit on it directly. I know first they rewrote the networking stack, which, as we spent 30 minutes discussing at the beginning of this episode, is where a lot of the problems are today.
Mert Yeter
How much does the future shift in a positive way once Firedancer hits mainnet? I think it shifts in a very positive way. I think it's going to be rocky at first. I hope it's good, but I want to make sure people's expectations are set realistically. The team there is killer and they've been working closely with the Anza team, but all software has bugs.
Lucas Bruder
I think it will be very good for the performance of the network. And I think there's a lot that can be learned from a ground-up rewrite. You know, Toly and the rest of the team built this while it was flying, like building the plane in the air, however they say it. And if you can zoom out to a 30,000-foot view, look at how the system actually works, isolate these pieces better, and build the pipeline in a better way, then those things can also be back-ported to the Anza client and other clients that pop up.
So I think it's going to be fast, it's going to be good, and it's going to push Anza to make their client just as fast. So I'm pretty excited about it. And then, Toly has mentioned this a few times: there's a lot of cool stuff that we want to get into the protocol, but does that stuff go into the protocol today, or do you wait for Firedancer to catch up?
Because right now the protocol, as the Anza client implements it, is this moving target, and if the target keeps moving, then you have this other team that's constantly trying to catch up. So once it's live, hopefully it will be great, and we can also catch up on some of these other things like runtime v2 and async execution and all that. One thing I want to add is that my hope is that actual protocol development moves faster with both clients than with just one, because the probability of a single team shipping a catastrophic bug is too high.
Anatoly Yakovenko
So they have to move really carefully. The audit cycle and everything else is much, much slower. My hope is that with two client teams, you could actually have both of them moving much faster on protocol development, because the probability of both of them implementing a bug that's catastrophic and triggerable is pretty low. This is, I think, a superpower that Ethereum is now taking advantage of. If you have two strong client teams, or n strong client teams, they should actually be shipping protocol improvements much, much faster.
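(A back-of-the-envelope illustration of that argument, with made-up numbers and an independence assumption: if each client team shipped a triggerable consensus bug in a given release with probability 5%, the chance of both shipping one in the same release would be roughly 0.05 × 0.05 = 0.25%, and the chance of both shipping the same exploitable bug would be lower still.)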
Mert Yeter
Awesome. Love that perspective as well. Well, gentlemen, it has been a pleasure being the village jester in a room of the giga-brain engineers behind Solana. Appreciate all the work you've been doing, and for coming on the show and explaining that work as well. All the best, and we'd be lucky to have you back on in the future. Cheers.