431. Web 3.0 Database Dominance, How to Trust Black-Box ML Models, Google's Ad Business in an LLM-First Search World, and Lessons from Looker, Monte Carlo, and MotherDuck (Tomasz Tunguz)

Primary Topic

This episode dives into the evolving landscape of Web 3.0 databases, the trustworthiness of black-box machine learning models, the impact of large language models on Google's advertising business, and insights from companies like Looker, Monte Carlo, and MotherDuck.

Episode Summary

In an in-depth discussion, Tomasz Tunguz explores significant advancements in Web 3.0 and AI's role in software and business frameworks. The conversation spans the intrinsic value of Ethereum as a database, the practicality of blockchain technologies in mainstream business applications, and the profitability and sustainability of these models. Further, Tunguz discusses the future of Google’s ad revenue in a search landscape dominated by large language models, emphasizing the potential decrease in profitability despite maintaining market dominance. The episode also touches on leveraging AI in venture capital and the broader implications for software investment strategies.

Main Takeaways

  1. Viewed as a database company, Ethereum surpasses many traditional software companies in profitability and market cap.
  2. Blockchain's practical applications extend beyond finance, influencing major corporations' accounting systems.
  3. Google's ad business might see a decline in profitability due to the space taken up by generative AI in search results.
  4. The future of AI and Web 3.0 databases in software development is promising, with significant reductions in operational costs over time.
  5. The episode underscores the growing influence of AI in venture capital decision-making and software development.

Episode Chapters

1: Introduction to Web 3.0

Tomasz Tunguz discusses the impact of Ethereum as a transformative database, highlighting its profitability and broader applications beyond traditional financial uses. Tomasz Tunguz: "Ethereum is possibly the most valuable database company of our time."

2: Trust in Black-Box Models

The conversation shifts to the reliability of machine learning models where Tunguz emphasizes the importance of transparency and accountability in AI applications. Tomasz Tunguz: "It’s crucial to understand how and why AI makes decisions to foster trust and integrate it effectively."

3: Google's Ad Business and AI

Exploring the intersection of AI with Google's business model, Tunguz speculates on the shifts in profitability due to AI integration in search functions. Tomasz Tunguz: "AI may lead to less ad space and potentially lower profits for Google’s search ads."

4: Lessons from Leading Tech Firms

Insights from Looker, Monte Carlo, and MotherDuck illustrate practical AI and data applications, driving home the tangible benefits and challenges of current AI technologies. Tomasz Tunguz: "These companies illustrate the cutting-edge use of AI in solving real-world business problems."

Actionable Advice

  1. Consider the long-term implications of integrating Web 3.0 technologies into your business.
  2. Evaluate the transparency of AI models used in your operations to ensure they align with your company’s ethical standards.
  3. Stay informed about changes in digital advertising and anticipate shifts towards more AI-driven content.
  4. Explore opportunities for applying AI in your business to enhance efficiency and decision-making.
  5. Monitor the evolving landscape of venture capital as it increasingly incorporates AI into its analytical processes.

About This Episode

Tomasz Tunguz of Theory Ventures joins Nick to discuss Web 3.0 Database Dominance, How to Trust Black-Box ML Models, Google's Ad Business in an LLM-First Search World, and Lessons from Looker, Monte Carlo, and MotherDuck. In this episode we cover:

Blockchain and Web3 Databases, with a Focus on Ethereum and Its Potential Dominance
AI, Data Analytics, and Their Applications in Business
Machine Learning Challenges and Opportunities
AI Innovations in Robotics, Self-Driving Cars, and Programming
Using AI to Personalize Sales Pitches and Improve Response Rates
AI's Impact on GDP Growth, Productivity, and Profitability

People

Tomasz Tunguz

Companies

Ethereum, Google, Looker, Monte Carlo, MotherDuck

Books

None

Guest Name(s):

Tomasz Tunguz

Content Warnings:

None

Transcript

Nick Moran
Welcome to the podcast about venture capital, where investors and founders alike can learn how VCs make decisions and reach conviction. Your host is Nick Moran, and this is The Full Ratchet.

Tomasz Tunguz joins us today from Palo Alto. He is founder and general partner at Theory Ventures, an early-stage venture fund investing in software companies that leverage technology discontinuities into go-to-market advantage. Tom has invested in companies including Looker, Monte Carlo, MotherDuck, Mistan and Arbitrum, of course. He is also the author of the ever popular blog tomtunguz.com. Prior to Theory, Tom worked as a managing director at Redpoint Ventures.

Tomasz, welcome back to the show. Thanks for having me, Nick. It's a privilege to be here. So, bring us up to speed. It's been, I don't know, five-plus years since you were on the show.

Obviously you've launched your own firm at this point. Walk us through the path from Redpoint to Theory. Yeah, we have built a little team. We have seven people now at Theory, and the idea is to build a concentrated portfolio where we do a lot of research into the areas where we invest. And it's a real thing.

Tomasz Tunguz
I have a lot of empathy for founders, understanding what it takes, I mean, it's the same thing, right? Getting off the ground. It's a marvelous thing to be able to take an idea and actually see it come to fruition. And it's an incredible position of privilege to have people bet that it will work, whether it's the incredible people who work at Theory or the limited partners who've invested along the way. And we started basically about a year ago.

As I said, we're about seven people now. We've invested in seven companies, should be eight on Monday. And we invest in everything to do with data. So there's the modern data stack, which we love, and then applications and infrastructure of AI. And then also we look at blockchains as databases.

We think Ethereum is one of the most valuable database companies of all time, if not the most valuable database company of all time. And as the price-performance characteristics of those databases improve, they will be much more ubiquitous than they are today. Very interesting. And what stage is the best entry point for you?

We typically become involved either at the seed or the A. So two of the investments we've made so far are formation stage, and then about four Series As, one Series B. Okay, and how many deals will you do for the first fund? Around twelve to 15. So not very many.

Nick Moran
Okay, so average check size, ballpark? It's bimodal. Seed right now is around 3.5 to 4, and then the As are around 13, 13.5. Okay, very good. Awesome. So I think we covered the thesis.

Let's talk a bit about Web3. You mentioned Ethereum. I think there are differing viewpoints on Ethereum. Can Web3 databases be valuable as standalone businesses? So I would say yes.

Tomasz Tunguz
Look at ETH. Ethereum is worth 350 billion in market cap. It's roughly seven times Snowflake's valuation, and it's worth 60% more than Salesforce. And a lot of people will say that's tulips.

And there's, look, there's some truth to that. There are many of these tokens that have very small float, trade at massive multiples, and have no appreciable revenue. In reality, Ethereum as a business, it's really a project, but let's call it a business for now, produced almost 400 million in net income in Q1, which made it the most profitable software company, if we can call it that, in Q1 on a percentage basis, with about a 40% to 45% net income margin. And if you were to take just the aggregate dollars, that 400 million, and consider it a software company, it's number six among software companies in Q1 in terms of total net income production, total profit dollars, after Microsoft, Adobe, Zoom, and a handful of others.
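As a rough back-of-the-envelope check on those figures (both inputs are the speaker's estimates, not audited numbers), the quoted net income and margin imply a quarterly revenue in the high hundreds of millions:

```python
# Back-of-the-envelope check on the Q1 figures quoted above.
# Both inputs are the speaker's estimates, not audited numbers.
net_income = 400_000_000      # ~$400M net income in Q1
net_income_margin = 0.45      # ~45% net income margin

implied_revenue = net_income / net_income_margin
print(f"Implied Q1 revenue: ${implied_revenue / 1e6:.0f}M")  # ~$889M
```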

So look, I mean, it's hard to argue there isn't a real business there. I won't tell you that the net income margin has been as beautiful as that forever for that business. But the reality is most of the software companies that are publicly listed on the Nasdaq still aren't profitable. And so here you have a company that's trading at a very high multiple and producing a lot of cash. How much of that net income do you think is derived from real utility and real applications?

Yeah, I think stablecoin volume is now larger than Visa's. In the first two months of this year there was 2 trillion of stablecoin issuance, and that's more than the Visa network processes. I would argue some fraction of that is real. Then there's a lot of next-generation finance applications, and it's easy to deride those financial applications as not being meaningful, or as not having the clout or the class of the New York Stock Exchange or the Nasdaq. But the reality is there are businesses that are built trading those assets.

There are estimates that somewhere around 150 games will be built on top of blockchains over the next two years that are producing real income. And so it's early, right, but there are real applications. I mean, we know of one Fortune 5 company that has built its accounts payable and accounts receivable system on a blockchain, because the database is immutable; it's effectively a double-entry ledger. And then there's Reddit, in its S-1.

They declared that all of their cash and short-term equivalents are held in stablecoins. They're not held in a bank account. So there are these little points of, okay, there's an innovation here, and that's a real company, and there's an innovation here, and that's a company. And the transaction volumes are going up and it's really big. And so, yes, okay, let's discount it by like 50, 60, 70% just because it's wild, but it's real.

Nick Moran
Which is the Fortune five company? Oh, I don't want to say it's not public. No, it's not. I don't know if it's public, and so I don't want to. All right, so there are various debates about which of the web three databases is going to win.

And there are some issues with Ethereum, naturally, but it is the largest Web3 database by market cap. Do you think it will be the dominant database going forward, or is there a challenge to Ethereum's model that will allow others to overtake it? I think so. In the Ethereum ecosystem, you have Ethereum as a base layer, and then you have the next-generation L2s, which are now processing more in transaction volume. They're processing more in transaction count and in total value most days.

Tomasz Tunguz
And so Arbitrum has about half of that market share, and there are others. I think those layers on ETH will become the dominant ones. And at the end of the day, when everything needs to be settled and written to a permanent ledger, and you can pay higher transaction costs, let's say, it will be written to ETH. Most of the transactions will happen at the L2 or even the L3 layer. You can think about an L2 as a database caching layer, and an L3, in the Ethereum world, is an application-specific database with different parameters.

Then you have Solana, which has had its ups and downs. During FTX, it was ripping, and then it really kind of fell off as FTX imploded. Recently, it's had a resurgence, and there are some technical challenges associated with it. It is a more centralized model. The Rust programming language is a big appeal. And then you have the whole Cosmos ecosystem, which is more predominant in Asia, but has some really compelling attributes, particularly around compatibility. And I think what we'll see is there'll probably be three or four majors here, just the way that we have three or four major clouds, and they'll have different parameters.

One will be the most developer-centric; one will be the one with the highest performance for very demanding workloads. Maybe that's Solana, maybe it's another technology. But I think you'll see three or four that ultimately emerge as the big clouds, let's call them. What's the future for Web3 databases in software development? So I think there are a couple of different ones.

But our long-term view is driven by the price performance of these databases. Let's just level-set for a second. Three years ago, it was a million times more expensive to write to a Web3 database than to RDS. And today, can I get the fireworks going? Today it's about 1,000 times more expensive.

So in three years, we've cut three orders of magnitude off that cost. And at some point, it will asymptote; Web3 databases, because of their decentralized nature, will always be more expensive. But the reality is many software companies with 70% to 80% margins won't care. And for really sensitive data that has to be secure, you will likely write it to a Web3 database, because you can cryptographically prove that the data wasn't hacked.
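Taking the cost figures above at face value (a 1,000,000x premium over RDS three years ago, roughly 1,000x today), the implied rate of improvement works out to about 10x per year:

```python
import math

# Cost premium of writing to a Web3 database vs. RDS, per the figures above.
premium_3y_ago = 1_000_000   # ~1,000,000x more expensive three years ago
premium_today = 1_000        # ~1,000x more expensive today
years = 3

# Total improvement and the implied constant yearly improvement factor.
total_improvement = premium_3y_ago / premium_today   # 1000x overall
yearly_factor = total_improvement ** (1 / years)     # ~10x per year
orders_of_magnitude = math.log10(total_improvement)  # 3.0

print(f"{yearly_factor:.1f}x cheaper per year, "
      f"{orders_of_magnitude:.0f} orders of magnitude in {years} years")
```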

I mean, how many times has your Social Security number been hacked, Nick? How many? Never. Never to my knowledge. Oh, my gosh.

So my Social Security number has been hacked eight or nine times. And it's a hospital system that leaked it, the credit card company, the auto finance business. I mean, I think I'm on. I get these notices every three or four months that say, here, we've enrolled you in a free credit monitoring program.

And for ten years, I've never needed to pay, because somebody leaked a bunch of stuff once and it just keeps recirculating. And so, a funny story. I don't know if you're a bigger target or me after this discussion. Yeah. So my kids went to the dentist, and the dentist asked for my Social Security number, and I said, no, you can't have that.

And the reason they do it is because they look up your policy. But the reality is, who knows what the security policies are of a local dentist office? And if a big local hospital chain cannot secure it, what hope does a dentist have? And so I think we're going to be moving to this ecosystem, or sort of an architecture design, where major corporations want control over their own data. We're seeing that with Iceberg and Snowflake for different reasons.

And users will want to control their own data. And so the sensitive information will very likely move to Web3 databases. And we're talking here about a five- or ten-year period. This is not happening tomorrow. But that's our belief.

And so the price-performance characteristics have to get there. There'll be increasing regulatory pressure and costs associated with storing data: GDPR, CCPA, every state will have their own. I think there are 40 states in 2025 that will have some form of regulation, German data locality rules. And so all those things will really drive a lot more demand for Web3 databases.

But the reality is, the APIs for these databases will look like a Mongo API, or they'll look like an RDS API, and nobody will know. They'll just be marketed as cryptographically secure, can't-leak, HIPAA-compliant databases. Consumers get more control over their data and access to the data. Do you think that creates headwinds for tech? No, I think there is this broad sentiment about not being in control of your own data.

And when I was at Google, there was anti-Google sentiment, because Google just knows so much about you. And so I think there's that underneath the water; a lot of consumers don't prioritize it enough to care. Some companies are finally starting to care about it. Large language models and the use of the information that you have will totally accelerate it. I mean, I think the next-generation architecture for lots of large language models will be running on device.

And we're already seeing it with Google: the voice transcription model is running locally. Some of the generative search, I think, is running locally. And so I think people will want this to happen. And I think at some point the companies that are processing huge amounts of data will also want to move to this kind of architecture, because the compliance cost would be enormous. What do you think happens to Google's advertising business in an LLM-first search world?

Yeah, great question. One of our predictions, one of my predictions for this year, is that half of search queries by the end of the year are generative. Yeah, I know that's really aggressive. It probably won't be, right?

Nick Moran
I don't know. My parents are 70, and every time I go over there, my dad is telling me about ChatGPT. And so if he's using it, that gives some confidence to your prediction. ChatGPT is a top-ten app on the App Store. For most of the queries that I see on my phone, Google generates a paragraph at the top. And so initially I thought that the searches would be fully generative.

Tomasz Tunguz
But I do think we'll probably end up seeing a hybrid where at the top there's a summary and the Q&A box, which is what Google has now, and then a bunch of links at the bottom. And I think in this model two things happen. The first is Google continues to win and continues to have incredible market share. And the search ads business is worse; in other words, it's less profitable.

Why is that? Right now, I don't know, 60% to 70% of a search results page is ads. And because there's now a generative text box at the top that is taking up ad space, the revenue has to decrease. And so I think that's what happens: Google continues to win.

But the search ads business is actually worse, because it's at least ten times more expensive to run a generative search query than a standard query. That number will go down with time. And the ads, you think that's 80% to 90% of the revenue? I think it's less than that.

YouTube is pretty enormous. The content ads business, which is the ad network, is a pretty enormous business. And then the display network. It's been a long time since I've looked at the relative share, but search is the vast majority of profits, because there's no rev share, whereas on the content network you have to pay the publishers for running your ads.

So I think Google wins, but is less profitable. Back on data: what areas or subcategories within the modern data stack present the most white space and opportunity for startups in the coming decade? Yeah, we're excited about the BI layer. So I was involved with Looker.

We backed a company called Omni, which is a handful of the ex-Looker core team building a new model, a new way of working with data that allows a business analyst to say, I want to define cost of customer acquisition. They create that metric, and there are approval workflows to centralize it. It's amazing that talent from Looker is spinning out and doing great things. I remember having you on the show talking about Looker very early on.

It's such a wonderful collection of people and they understand the ecosystem really well. So it's a thrill to be involved with that team. I think the second area is the databases themselves. We're involved with MotherDuck, and Jordan, who was the tech lead at BigQuery, came to this realization that 80% of BI workloads are smaller than ten gigs in size, which means on your or my MacBook Pro, we can analyze that data. And so he has built a company called MotherDuck that's commercializing that technology. And what's really interesting about that technology is not only is it really fast, but you can embed it; Omni actually uses it.

So if you have a really big data set and you want to analyze it and visualize it, Omni will take a gigabyte of it, put it in the cache in your browser, and then actually run a WASM container with DuckDB there, and analyze the data really fast. So it's a new architecture. It's like the new edge. Yeah, it's a new edge. And so you can take advantage of the fact that the laptops that we have are super powerful.
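The embedded, in-process model Tunguz describes is the key idea: the database engine runs inside the host process, so there is no server round-trip. A minimal sketch, using Python's stdlib sqlite3 module as a stand-in for DuckDB (the duckdb Python package exposes a very similar connect/execute API, but isn't in the standard library):

```python
import sqlite3

# In-process, embedded analytics: the database engine runs inside the host
# process itself. sqlite3 is used here only as a stdlib stand-in for DuckDB.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, 10.0), (1, 5.0), (2, 7.5)],
)

# A typical BI-style aggregation, executed locally with no server round-trip.
rows = conn.execute(
    "SELECT user_id, SUM(amount) FROM events GROUP BY user_id ORDER BY user_id"
).fetchall()
print(rows)  # [(1, 15.0), (2, 7.5)]
```

DuckDB takes the same model further with a columnar engine optimized for analytics, plus the WASM build that lets the same queries run inside a browser tab.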

It increases your margin. The latency improvements are significant. I mean, most of the DuckDB queries are less than ten milliseconds. Even on pretty large datasets, the user experience is meaningfully different, even from any other browser-based application. And then the last category is data transformation.

So what we hear from buyers today is lots of different data pipelines. Large language models are stressing those data pipelines, because now all of a sudden, everything downstream of the cloud data warehouse, which used to be offline BI, exploratory analytics, offline customer segmentation, churn prediction, those data pipelines are actually being fed into the training models. And so there's a lot of stress there. And we have made an investment in that category, which we haven't announced yet, but I think there's a lot of innovation around data movement and ETL pipelines. Yeah.

Nick Moran
What is the primary constraint on the future in an AI-first world? Is it going to be bandwidth? Is it going to be processing? I think the scale of data is one, yes, and then the cost associated with the compute is the other.

Tomasz Tunguz
And this is why Snowflake, I mean, you have separation of storage and compute, in the sense that a lot of the major enterprises that are Snowflake customers want to store their data in Iceberg tables inside of S3, manage their own data, and then allow different query engines like Snowflake or DuckDB or Spark to hit them for different applications. And that's driven first by cost, but also there are many more users demanding access to that data. And so that's a really big driver. And then the last is these things can be really expensive to run.

And cost, according to the data leaders that we speak to, setting aside AI initiatives, cost is the single biggest driver. Snowflake used to have 177% net dollar retention; it's down toward 100%, 127% the last time I looked. And so it's real.

People are starting to pay attention to how to segment workloads as a function of cost. You've mentioned to me that you think about and seek out applications of machine learning where the feedback loops are really short. Explain why you're interested in short feedback loops. So I'll give you an example. If you use Google and you click on the first link, and then you come back to the search results page within 5 seconds, and then you click on the second link and Google never sees you again, that's a really fast feedback loop for Google to understand that the first link shouldn't be the first link for that search, especially at scale.

Imagine if it took Google a year to figure that out. The search results would be much worse. But because the feedback loop is basically instant, it's really good. And so what we're looking for with AI are very similar feedback loops. If you use Bard or Gemini and you ask for a result, there's a little G button that says, verify the results.

So we were looking for an SUV with three rows, and I was really curious about the third-row legroom across the most common SUVs in America. And Gemini produced for me a table, and then I hit the verify button, and for two of the models it was correct, and for two of the models it was not. But that feedback loop goes and presumably retrains the model. And so what we're looking for is products that are able to capture that feedback loop. Because, and these are broad numbers, but the way we think about it is, it's pretty easy to get a machine learning system to 80% accuracy. Think about the self-driving car.

It's really hard to get that marginal 15%, to 95% accuracy. The faster a product or a service can achieve that level of accuracy, the more it will differentiate. It's basically exponential in terms of difficulty. And so faster and faster feedback loops allow you to climb that curve ahead of your competitors. Like humans and computers, I would contend.
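The click-back signal Tunguz describes can be sketched as a toy update rule. The scoring scheme, the 5-second bounce threshold, and the learning rate below are illustrative assumptions, not Google's actual ranking system:

```python
# Hypothetical sketch of the fast feedback loop described above: a user who
# clicks a result and bounces back within a few seconds is a negative signal,
# and the result's relevance score is updated immediately.
def update_score(score: float, dwell_seconds: float,
                 bounce_threshold: float = 5.0, lr: float = 0.1) -> float:
    """Nudge a result's relevance score based on one click's dwell time."""
    signal = 1.0 if dwell_seconds >= bounce_threshold else -1.0
    return score + lr * signal

score = 0.5
score = update_score(score, dwell_seconds=2.0)   # quick bounce: demote
score = update_score(score, dwell_seconds=60.0)  # long dwell: promote
print(round(score, 2))
```

The point of the example is the latency, not the math: because each click updates the score immediately, the system climbs toward good rankings in hours rather than the year-long loop Tunguz warns about.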

Nick Moran
Tom, are you more interested in machine learning at the infrastructure or the application layer? I think we're more interested right now at the application layer. There are just many more opportunities. So the way we think about it is, in Web 2.0, there are three companies that are the largest clouds.

Tomasz Tunguz
They're worth about 2 trillion collectively. In order to get to that level of market cap at the application layer, you need 100. We're talking about Snowflake and Netflix and many others. And so we think there's something very similar that will happen here. Look at the capital intensity of the large language models.

Amazon was recently quoted saying that the next generation of models will cost a billion dollars per training run to execute. There are not that many companies that can afford to pay a billion dollars, and I feel for the engineer who has to hit the go button, imagine if they've made a mistake on something that expensive. So I think that capital intensity will really benefit just a handful of players at the infrastructure layer. Now, that being said, I think there's a bunch of developer tools that will be really important to simplify some of these architectures, whether it's query routing, or improving vector computation, or analytics and evaluation. Those categories, I think, will lend themselves to startups more than the large language models will.

But broadly speaking, in terms of the total count, or the addressable opportunity for startups, the application layer is much bigger. One of the challenges with machine learning is that as it gets more sophisticated, technical leaders often don't know how or why a model makes decisions. So the future can be unpredictable and present scenarios that have no precedent in past data. How can CTOs with high-stakes tech infrastructure get comfortable implementing a system where they don't understand and can't explain how it works? So this is, aside from security, one of the biggest questions.

So these models, they're fancy word-prediction machines, and I know I'm simplifying, but they have two characteristics. The first is they're non-deterministic, and the second is that they're chaotic. What does that mean? If I'm typing into a chatbot and I ask, how do I reset my password, question mark?

And then I ask, how do I reset my password, space, question mark? Those are two different inputs and will very likely receive two different outputs. Compare that to a classic machine learning classifier: is this a photo of a dog, or is this a photo of a cat? There, the inputs are constrained, and there are two potential outputs.

The universe of potential inputs into an LLM is enormous, right? I can ask, how do I reset my password in any number of different ways, in any number of different languages with all kinds of creative punctuation. And so consequently, the universe of potential outputs is also equally large. We're kind of measuring orders of infinity. And so as a result, if I'm building a product, it's really difficult.

I can't take an average result and say the average result is pretty good, because the distribution is so broad. So there are a couple of different ways to solve this, or maybe not solve, but manage it. The first is what people are doing now, which is evaluations. You just have example queries, and, just like a test suite that you might run on software, you run through those queries. The second level of evaluation is to watch the user queries in an analytics platform and then take the output of that and make those the test cases, because generating the test cases yourself is an awful lot of work.
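The evaluation approach described above can be sketched as a tiny test harness: a suite of example queries with expected properties, run against the model like a software test suite. The `fake_model` function and the eval cases below are illustrative stand-ins for a real LLM call and a real suite:

```python
# Minimal sketch of an LLM eval suite. `fake_model` stands in for a real
# model call; EVAL_CASES pairs a query with a substring the answer must contain.
def fake_model(query: str) -> str:
    if "password" in query.lower():
        return "Go to Settings > Security and choose 'Reset password'."
    return "I'm not sure."

EVAL_CASES = [
    ("How do I reset my password?", "reset password"),
    ("how do i reset my password ?", "reset password"),  # input variation
]

def run_evals(model, cases):
    """Run every eval case and count how many outputs satisfy the check."""
    passed = sum(expected.lower() in model(q).lower() for q, expected in cases)
    return passed, len(cases)

passed, total = run_evals(fake_model, EVAL_CASES)
print(f"{passed}/{total} evals passed")
```

The second level Tunguz mentions, mining real user queries from an analytics platform, just means `EVAL_CASES` gets populated from production traffic instead of being written by hand.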

The third thing that people are starting to do is narrow the models and use smaller models that are built for purpose. A large language model is really good at handling any kind of incoming query, because it knows so much about so many different things. But if you have a very frequent query, like how do I reset my password, a company may ultimately end up training a very small model that is excellent at handling only that task. And so as a result, the input's narrow and the output's narrow.

But even in those scenarios, even turning down the knob of temperature, which is a way that you control how creative the distribution of a model's outputs will be, you still have the risk of hallucination or just outright wrong answers. I mean, did you see the Canadian airline bereavement case? No. So there was a man who unfortunately suffered a loss in his family, and the Canadian national airline has a chatbot.

And he asked the chatbot, what is the bereavement policy for the airline? And the chatbot provided him a bereavement policy. He took a screenshot, saved it, and booked the flight, and I think it was a 50% reduction in the overall ticket price, because he needed to fly with some urgency. He applied to the customer support team and said, I want my 50% back.

And they said, sir, we don't, this is not our bereavement policy. It's a complete fabrication. And the government, the judicial branch, ruled that the airline needed to respect the policy that was produced by the chatbot. Wow.

Yeah. There were real-world ramifications. One thing I didn't follow: where did the chatbot originate? Was it a Canadian-based chatbot?

I think so. Okay. Yeah, that makes some sense then.

And there was another example where a car buyer negotiated with an online chatbot to buy a car for a dollar, and the chatbot had agreed. Right, because you can manipulate it: you can ask it to write a poem, you can ask it to write a song, you can ask it to negotiate or pretend to be someone, and it will respond to you, because the universe of potential inputs is so large. So I think there are real ramifications, and we will see many more examples of these, where humans, nefarious or not, are manipulating models, just the way that you know to negotiate a good deal. So the examples you cite here, the stakes feel low to me.

Nick Moran
They're monetary, and they're on the order of, maybe with a vehicle, a couple tens of thousands of dollars. With something like self-driving, though, the stakes are a little bit higher, right? Tesla recently launched their FSD 12. At the core of FSD 12 is a shift from traditional programming to neural-network-based decision making, or imitation learning, where raw video from the cameras on the vehicle is processed and the tech translates what it sees directly into driving actions, mimicking human cognitive processes more closely than ever before.

So rather than programming all the individual edge cases, engineers can train the car to function like a human solely by providing numerous examples of humans driving cars. So this kind of gets back to the point: this is higher stakes, and a hallucination in this context could have significant ramifications. So I'd like your general take on what Tesla has done here with FSD 12. I'd also like your opinion on what this means for programming.

If we're getting away from these logic-based if statements and going to a sort of brave new world of how programming gets done. It's a great question. So within the world of robotics, and I would put self-driving cars in the world of robotics, there's a lot of innovation happening that has come from Stanford and from Google Brain, where we are teaching robots how to do things using reinforcement learning. And as opposed to teaching them rules or having them look at data and learn from that data, we're doing the task alongside them. And if you Google Mobile ALOHA, you'll see some of these videos of a two-armed robot cooking a piece of shrimp, or folding some clothes, or putting items into a grocery bag.

Tomasz Tunguz
And this is holding a lot of promise. I'm not that deep in the world of self-driving cars, but I used to study control systems, and I would guess that what's happening inside the self-driving car, and especially with these robots, is that there's some machine learning system suggesting particular actions, and then there are control systems, which are deterministic, with certain parameters. The generative or next-generation machine learning component suggests an action; that goes into a classic control system, which will say: no, if we were to move the steering wheel that fast, the acceleration forces on the passengers would be too great, or the probability that this is the right decision is too low to accept, so let's slow the car down. I don't know if you saw it, but on one of the on-ramps in San Francisco there was a Waymo taxi trying to merge, and there were a bunch of cones, and as a result it was confused, so it just stopped.

And then there were six other Waymo taxis behind it. Anyway, that's an example of a brilliant control system, right? The cognition systems, the sensory systems on the taxi, didn't know what to do, and so it failed in a beautiful way. Nobody was hurt. People were a little bit encumbered with some delays, but that's the idea.

So you have really great sensory systems, and then you have very strong control systems, like autopilot systems for aircraft. That was the subject of one of my final exams in grad school. Those are classical mechanisms, and autopilot has been working in aircraft very safely for 50 years. I think that's how we balance these systems: there are generative systems that suggest.
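The suggest-then-gate pattern described here, where a learned model proposes and a deterministic control layer disposes, can be sketched as follows. This is a minimal illustration, not Tesla's or Waymo's actual architecture; every limit, threshold, and name is a hypothetical placeholder:

```python
from dataclasses import dataclass

@dataclass
class Proposal:
    steering_rate: float  # deg/s suggested by the learned driving model
    confidence: float     # model's estimated probability the action is right

# Hypothetical limits a classical, deterministic control layer might enforce.
MAX_STEERING_RATE = 30.0  # deg/s: beyond this, forces on passengers are too great
MIN_CONFIDENCE = 0.9      # below this, the suggestion is too uncertain to accept

def gate(p: Proposal) -> dict:
    """Deterministic supervisor: accept, clamp, or fall back to slowing down."""
    if p.confidence < MIN_CONFIDENCE:
        # Like the Waymo stopped at the cones: when unsure, fail safely.
        return {"action": "slow_down", "steering_rate": 0.0}
    # Clamp the suggested steering rate to the deterministic limit.
    clamped = max(-MAX_STEERING_RATE, min(MAX_STEERING_RATE, p.steering_rate))
    return {"action": "steer", "steering_rate": clamped}
```

An over-aggressive suggestion like `gate(Proposal(45.0, 0.95))` gets clamped to the 30 deg/s limit, and a low-confidence one like `gate(Proposal(5.0, 0.5))` falls back to slowing down, which is the "fail in a beautiful way" behavior described above.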

And then there are control systems that control. In terms of what it means for software programming, I think these are two separate paradigms. It is possible to teach programming through reinforcement learning, but quite frankly, the copilot systems and Code Llama are just really good. Look at Devin, the fully automated software engineer: there are four panes, right? There's GitHub, there's a terminal window, there's a web browser, and I forget the fourth, and it's tasked with creating a website.

It creates the account at AWS, it downloads the right GitHub repositories, it creates the user accounts, and then it writes the code. It's pretty good. It's still early days, but it's doing a lot. Are you using this in your own workflows at this point? What are some of your favorite recent uses of AI?

So I love it. I've just been testing Llama 3 locally, the 8-billion-parameter model, and it works really well. I love dictating, because I type a lot and there are lots of messages. You've been doing that for years, though, dictation, although with the help of humans. I think that's right.

Yeah, initially it was Dragon, but the new models are so sophisticated that you rarely have to correct an error. So most of what I publish and most of what I write is dictated; the reason I use an Android is actually the dictation. One really fun workflow that's new for me is this idea of a living document. What do I mean by that? Let's say I was preparing to interview you, Nick, and I was collecting a list of questions, and I wanted to know about your background.

I would start a document. I'd say: what is Nick's work history? And it would produce that for me. Then I'd say: here are ten of Nick's recent blog posts, and here are five of Nick's recent podcasts. Remember them all.

Then: put together a list of questions. And it would produce a list of questions for me. And then you would tweet something, and I would say: great, here's the tweet, add a question about this.

And the LLM is actually managing the document. In other words, it's no longer in a Google document; it lives in a ChatGPT or Gemini session. And as new information comes in, the LLM updates it. It's not perfect, but it's a completely different workflow around information management: I'm not editing the document, I'm not organizing the outline, I'm just telling the computer, do this, or change this question, or move this section about his background earlier, and let's finalize with a lightning round or whatever I want to do.

I really enjoy working that way. I have to remember less. I mean, it would be amazing if it were doing that on the fly as we're speaking. Like, "Tom, what you just said contradicted what you said four years ago on the show." I'd be in real trouble.

Nick Moran
That would be pretty good. That would be pretty good. Yeah. It's funny because we are incorporating that a bit in our outbound flow. So we have, like, this platform that scrapes a bunch of stealth deals that people don't know about.

And then it goes out to the profiles: it finds the Twitter, it finds the LinkedIn, it finds similarities. So without even having to go to email, I can just click on a startup in our platform and it will pull up a note that says: hey, I see that you went to a Big Ten school and you were an athlete; I was a swimmer at Indiana. I see you're building a startup. It would be great to connect.

And so it can find similarities between my profile and a founder's profile to customize the message and increase the hit rate. It's the future. And my understanding is that the response rates of the machine-generated emails are very similar; there's maybe a five-to-seven-percentage-point difference in response rate, but at some point they will be the same. And it's so much faster, right?

For me to do that research would take, I don't know, five to ten minutes. The hard part now, though, is you have to remember that context coming into the meeting somehow. It is in the email thread, remember. Yeah, that's right, it is in the email thread.

Tomasz Tunguz
But if the machine does more and more, what I find is I remember less and less. And so I have to go back, read, and recontextualize. This is part of my issue: I take notes at all the meetings I have with startups, and now I've got Circleback and other tools taking these incredible notes for me and summarizing them, but I don't remember the conversations as well as when I took the notes myself. Totally interesting. Yeah, there's something to it.

Nick Moran
So Tom, how has your investment approach been impacted by the overall sentiment from enterprise buyers to consolidate their tools and reduce costs? So this has been the dominant narrative within organizations, especially since the Fed increased rates from effectively zero to about 550 basis points. I'll give you an example. We were talking to one enterprise buyer, and she said: do not sell me another tool. That was the first thing she said on the call.

Tomasz Tunguz
We were just interviewing her about how she's thinking about budgets. I do not need another tool. And I think that's broadly true: people feel that way, and there's a desire to consolidate and a desire to see more value. We saw that in Palo Alto Networks' most recent earnings, where enterprise customers are saying: I bought the 15th or 16th security solution, but I'm not actually more secure than I was in the past.

And so the challenge for a salesperson, convincing buyers of a new budget line item or of marginal SaaS software, is much higher than it was three or four or five years ago. And people buy software really for two reasons: one is to increase revenue, the second is to reduce cost. During the zero-interest-rate environment, you could pitch something further and further away, with a less and less tangible connection to one of those two value propositions. Now it's back to basics.

It's really: how are you increasing my revenue, or how are you meaningfully reducing my cost? We were on a call yesterday, one of my partners, Andy, and I, talking to a head of engineering at a very large company, and we asked him about code autocompletion, specifically a question around latency: how important is it that the autocompletion respond in two seconds as opposed to five? The dominant narrative within the Valley is that it matters.

Those three seconds. I mean, at Google, 100 milliseconds of latency on a click really matters, so a three-second difference should too. And his answer was: look, if it takes three more seconds for a printer to print a piece of paper, am I buying a new printer every year because the sum of all those little epsilons of time aggregates to some huge number of hours, which you then multiply by the hourly rate of the average engineer? No; they can wait. So that has changed: it's not about squeezing as much juice from the lemon as possible, maximizing productivity; it's about big blocks.
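The "sum of the epsilons" calculation the buyer is rejecting looks roughly like this. All of the figures below are hypothetical, chosen only to show the shape of the argument:

```python
# Back-of-the-envelope: what are 3 seconds per completion worth across an org?
seconds_saved = 3           # faster autocompletion response, per completion
completions_per_day = 50    # per engineer (assumed)
engineers = 1_000           # size of the engineering org (assumed)
work_days = 250             # per year
hourly_rate = 100           # fully loaded $/hour per engineer (assumed)

hours_saved = seconds_saved * completions_per_day * engineers * work_days / 3600
annual_value = hours_saved * hourly_rate
print(f"{hours_saved:,.0f} hours, about ${annual_value:,.0f} per year")
```

On paper that is about a million dollars a year, which is exactly the kind of aggregated-epsilon pitch the buyer above says no longer wins a deal unless the value is blatantly obvious.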

It has to be blatantly obvious and completely transparent at the beginning of the pitch that this really will move revenue or reduce costs in a meaningful way. How do you think about whether a new workflow or capability should be built into an existing platform versus standalone? This is a great question. Salesforce, 20 or 25 years ago, perfected this playbook of starting with one particular workflow inside of a CRM, which was sales force automation. They went after Siebel and Oracle, and after that they expanded.

Now it's worth 250 billion. And if you review almost every single software company started since then, they focused on a single workflow, made it brilliant, and then, if they were able to get the scale, tried to broaden out, some more successfully than others. Rippling would be a great counterexample: a company started as a compound startup from day one. Let's just raise a lot of money, hire individual product managers, and build out; I think the last time I looked there were 27 or 28 different products, all integrated.

There is a desire within the buyer ecosystem to find products that look like that, because of these cost dynamics. On the other hand, it is exceptionally difficult to do, both because of the capital requirements and because of the strategic vision a CEO needs, to say this is the way I want to build a business, and to earn that level of market acceptance. So I think if a buyer had a choice, they would take the suite, the compound strategy. But in reality, the Salesforce path to market will likely describe the vast majority of startup efforts: much faster time to feedback, lower capital requirements, and it's just easier in a certain sense. Tom, how do you think AI will impact GDP?

So, the US GDP is growing faster than anybody expected. I saw a statistic two days ago that the US GDP is far outpacing any other G7 economy. I know this is going to sound ridiculous, but I think a bunch of it has to do with AI already.

I think so. Think about the time compression some of these AI tools enable: look at the legal profession, look at the research for sales development reps. I was chatting with the founder of a publicly traded CRM company, and I asked him: what do you think will happen with AEs and SDRs over the next couple of years? And he said, and this is an extreme view, that some significant fraction of BDRs and SDRs will no longer have a job because of the automations we talked about, right?

Because of the research. And so they'll become junior AEs as opposed to becoming SDRs. It's probably too aggressive to say AI is already having a meaningful impact on GDP, but I'm optimistic, and I think you'll see the US grow significantly faster than everybody expects, and see materially better profitability for a lot of these companies. Microsoft is reporting a 75% improvement in productivity from their sales teams.

ServiceNow is saying 50% for their software engineers, and Klarna cut two-thirds of their customer support headcount. So I think you'll see this one-time cascade of profitability over the next couple of years in lots of these companies, and productivity per person, which has been basically flat for 15 to 20 years, will have a huge surge. So I'm really bullish. I mean, is it reasonable to say that over the past two decades software has been one of the primary drivers of GDP growth?

Nick Moran
And now this is kind of the next extension that could make the slope of that line even steeper. I think that's right. Maybe six months ago, I went back and looked at the impact of the personal computer on GDP growth, and it was relatively trivial, and so is most software's. A lot of us have been selling software that we contend should meaningfully improve productivity.

Tomasz Tunguz
But I think this wave will actually fulfill that promise in a way that previous generations of software have not. Instead of listening to a podcast, I can put it into an LLM and in 30 seconds have a summary of it. That saves me an hour, right? Same for a YouTube video.

And instead of drafting some long document, I can have the LLM start it, and it's far easier to edit than to start with a blank sheet of paper. So I think there are enough of those examples, where we're saving 30, 40, 50% of our time, that we should see it in the GDP: better growth, more efficient growth, and salespeople hitting five x their quota in a way that just wasn't possible two or three years ago. I agree with that, but I refuse to believe that software hasn't had a meaningful impact on GDP.

Nick Moran
I'm not sure how the analysis was done, but consider the efficiency outside of AI: the data access, the flow of information, the ability to transact in a healthier way with fewer local arbitrage opportunities. The world has become flatter, and I feel like it's been a rising tide for all countries. I think that's very fair. There's no doubt that it has had an impact, and I need to go back and look at the data, but I was surprised it wasn't a whole percentage point a year.

Tomasz Tunguz
Do you know what I mean? US GDP is 23, 24 trillion. Adding 200 billion in GDP a year as a result of software: I think we'll see something like that from AI. Today it's maybe more like 20 billion a year from software. What are your thoughts on Reddit selling their data, and how many Internet business models are going to transform going forward?

So I think we could see, and this is low-probability guessing, a change in the business model of the Internet. Today, most of the Internet is commercialized through ads, and I spent some time helping build out some of those systems at Google. The core underlying technology that powers ads is the cookie, and the cookie is going away. Facebook lost on the order of $10 billion in annual revenue because Apple removed the mobile equivalent of the cookie, called the IDFA, which changed the way people can target. And the third-party cookie is also going away.

With it, much of the targeting performance of the ad market will go away. So there's this pressure: the publishers, the people who produce websites, are making less and less money as a result of more and more privacy within the ad ecosystem. At the same time, these large language models require more and more data. Llama 3, the 8-billion-parameter model, was trained on 15 trillion tokens, and we were talking about it internally; one of the questions we asked is, how many tokens are there in the world?

How many different words or subwords are there? It's got to be something like 15 to 20 trillion, because we think Google's index is around 20. So in order to improve these models, Google is paying Reddit 60 million a year to access their data. Is that a lot of money or a little? I'd say Reddit is probably underpaid.

And Adobe is paying $3 per minute of video footage in order to train their models to be able to compete. So there'll be this huge business development effort: all of these businesses that have proprietary data, or data that's very recent, or data of a very particular kind, will sell it. For Reddit right now, it's about six, seven, eight percent of their revenue, but it's really high margin, 100% or thereabouts. And so we might see a change where the publisher has a far more predictable revenue source selling their data.

The publisher then specializes in a particular kind of content, approaches Google, and says: I see your model is weak in the following areas; we built this product feature that will produce a lot of data to solve that challenge in your machine learning model. The big model builders are already spending a billion dollars a run on training, and it's really valuable to have the most accurate search engine or LLM-based search, so they'll pay a premium for unique or very recent data. And then the other dynamic is that people selling products want their information to be accurate in the LLM.

So say you have a large-language-model search product, and there's a next-generation soft drink whose reviews are all negative but misinformed. The soft drink maker wants to influence those results. They might pay for the inclusion of certain kinds of data within, let's say, the context window: here are some nutrition facts about this particular kind of soda; balance that against the negative reviews on Reddit. And so you might have a completely different model, where instead of ads, the marketers of the products are paying to inject content into the context window.

And instead of ads being run on Reddit, Google is paying Reddit just for the data. If that happens, you have a pretty different Internet experience. The last point I'll make on this: five years ago, we would have thought it crazy to pay for search as a consumer the way we pay for Netflix. But today there are hundreds of thousands, if not millions, of people paying for ChatGPT or Claude or Gemini, $20 a month, because we want a better experience. Imagine if it looks like television, right, unbundling television.

You buy AMC because you really like British police procedurals, like I do. Or you buy access to Bloomberg because you really care about financial information, or NFL Sunday Ticket. Imagine the same thing in the search experience: I pay $20 a month for search, and because I really care about financial information, I buy the Wall Street Journal upgrade, and then the Bloomberg upgrade, and that premium data is fed into my version of the LLM. All of a sudden I'm paying 50, 60, 70 dollars, maybe $250 a month for an enterprise tier, for a very particular kind of search.

And there's still a free experience which doesn't have that data, but there's this premium search experience. Maybe that's where this all goes. It's still a twinkle in our eyes, so to speak, but you can see it, if the experience is that much better. And people are already paying for it.

Nick Moran
I mean, we had Adrian Aoun from Forward on the podcast, and he was talking about a ten x better healthcare experience. His analogs were: do you pay for Spotify, or do you listen to old radio? Do you pay for Netflix, or do you use an antenna for free broadcast? If the experience is ten x better, people will pay for it.

It calls into question, though, what the future is for publishers, right? Will I have a website, or will I just have, effectively, a database where I put the appropriate information, it gets called in an effective way, and I get paid by ChatGPT every time it's requested? I don't know. Yeah, Substack is that way already, where we aggregate content, and Google, in a certain sense, is that way.

Tomasz Tunguz
Some publishers will have either uniqueness of content or an existing audience large enough to build directly, but for the rest it's probably like public access television versus Netflix, right? It's possible, but I think it might get harder and harder. Tom, if we could feature anyone here on the show, who do you think we should interview, and what topic would you like to hear them speak about? Boy, this is a phenomenal question.

I'd pick Lenny Rachitsky. He has built a marvelous business around content marketing, particularly focused on one initial segment, and as we think about what the future of large language models means for publishers, picking up on that part of the conversation, he would have some brilliant insights. In addition, for all the product people in the audience, his depth of understanding of that particular discipline and domain is unparalleled within the ecosystem.

Nick Moran
Amazing. Tom, give us a book, article, or video that you would recommend to listeners. I recently read a book called The New Map, written by Daniel Yergin, who won the Pulitzer for The Prize, his history of oil. The New Map talks about how the US becoming a net exporter of energy, both natural gas and oil, has completely changed geopolitics. And this is a book published in 2020.

Tomasz Tunguz
It presages the Ukraine conflict, and it's a brilliant summary of how the world will be different over the next ten years as a result of that one change: we no longer rely on external sources of energy. Amazing. I want to plug it into ChatGPT and get the summary. Tom, do you have any habits, tactics, or techniques that are a secret weapon? Persistence, I think.

I was a rower in college, and our freshman-year coach taught us to love the sport. He taught us to love it because it requires a lot of repetition, and if you start with the love of something, you will continue and continue. So: just repetition. How did he teach the love? He put us on the water, he put us in a boat, he took us to beautiful places, he built camaraderie in the boathouse, and he loved it himself.

It's like when one of my family members, who really loves classic rock, spent two or three years educating me about it. I knew nothing about it, and now I can tell you my favorite guitar solo: Prince at the 2004 Rock and Roll Hall of Fame induction. If somebody shows you what they see and why they love something, it won't work every time.

But his name was Joe, and it really spoke to me; I take a lot from that experience, that sort of long-term commitment. And there's this great Michael Phelps ad, which I'll never forget, from the Olympics: what you do in the dark determines what you do in the light. In other words, what you do in your basement, working really hard, is ultimately how you get to where you are. And there's no one for whom that's more true than Phelps.

Nick Moran
Good stuff. And then finally here, Tom, what's the best way for listeners to connect with you and follow along with theory? Yeah, there's a blog@tomtoongooz.com, or follow us on LinkedIn, where we publish all of our content there. And feel free to reach out through LinkedIn messaging. We're quite responsive.

Tomasz Tunguz
Very good. He is Tom Tunguz, and the firm is Theory Ventures. Tom, thanks so much. It was a pleasure. Pleasure was all mine, Nick.

Thanks again for having me. All right, sir. Thank you.

Nick Moran
All right, that'll wrap up today's interview. If you enjoyed the episode or a previous one, let the guest know about it. Share your thoughts on social or shoot them an email. Let them know what particularly resonated with you. I can't tell you how much I appreciate that.

Some of the smartest folks in venture are willing to take the time and share their insights with us. If you feel the same, a compliment goes a long way. Okay, that's a wrap for today. Until next time, remember to over-prepare, choose carefully, and invest confidently.

Thanks so much for listening.