Hyperscaler strategy in AI, the application landscape heats up, and what we know now about agents with Sarah and Elad

Primary Topic

This episode is a host-only discussion on the recent developments and strategic shifts in the AI landscape, focusing on hyperscaler strategies, model advancements, and the evolving role of AI agents.

Episode Summary

Elad Gil and Sarah Guo explore the dynamic AI environment, discussing the rapid emergence of new AI models and the competitive hyperscaler strategies shaping the industry. They delve into the implications of model accessibility, the economics of AI development, and Microsoft's strategic AI initiatives. The conversation highlights the democratization of AI through open-source projects and smaller-scale computational models, challenging the dominance of major players. They also examine voice cloning technologies, regulatory challenges, and the societal impact of AI. The episode provides a comprehensive analysis of how AI technologies are evolving beyond mere tools into agents with significant operational and strategic potential across various sectors.

Main Takeaways

  1. The AI model landscape is becoming more democratized, challenging previous monopolistic trends.
  2. Significant advancements in AI are reducing the cost and computational demands of powerful models.
  3. Corporate giants like Microsoft are intensifying their AI investments, influencing the market's direction.
  4. Voice cloning and other AI technologies face regulatory and societal scrutiny, impacting their adoption.
  5. AI's evolution into 'agents' could redefine user interactions and operational efficiencies in multiple industries.

Episode Chapters

1: Introduction and Recent AI Developments

Elad Gil sets the stage for discussing AI's recent advancements and strategic shifts in the hyperscaler landscape. Elad Gil: "There's so much going on over the last couple weeks in AI... a host only discussion."

2: Democratization of AI Models

Sarah Guo discusses the changing economics of AI models, emphasizing increased accessibility and reduced costs. Sarah Guo: "The model landscape completely shifts versus what people expected to be."

3: Microsoft's AI Strategy and Market Impact

Elad and Sarah analyze Microsoft's AI strategies and its implications for the broader AI ecosystem. Sarah Guo: "Microsoft...saw you as a live player."

4: The Future of Voice Cloning

The hosts discuss the technological and societal implications of voice cloning technologies. Elad Gil: "I think you can do an attestation where when you upload a voice for the first time..."

5: Investment Trends in AI

Elad Gil explains the funding dynamics in the AI space, particularly the role of venture capital and hyperscalers. Elad Gil: "Microsoft's last quarter, they mentioned Azure revenue, which was about 25 billion for the quarter, grew by..."

Actionable Advice

  1. Explore open-source AI models to understand their potential and accessibility.
  2. Stay informed about the regulatory environment impacting AI technologies like voice cloning.
  3. Consider the implications of AI in your business strategy, especially if operating in tech-heavy industries.
  4. Evaluate the long-term strategic impacts of AI investment, focusing on both market trends and technological advancements.
  5. Engage with AI as a transformative agent, not just a tool, to fully leverage its capabilities in operational and strategic contexts.

About This Episode

This week on a host-only episode of No Priors, Sarah and Elad discuss the AI wave as compared to the internet wave, the current state of AI investing, the foundation model landscape, voice and video AI, advances in agentic systems, prosumer applications, and the Microsoft/Inflection deal.

People

Elad Gil, Sarah Guo

Companies

Microsoft, OpenAI, Databricks

Books

None

Guest Name(s):

None

Content Warnings:

None

Transcript

Elad Gil

Today on No Priors, we are going to have a host-only discussion. There's so much going on over the last couple weeks in AI, we just thought it'd be good to take a big deep breath and a step back and talk through some of the really big changes that seem to be happening in the landscape. Sarah, there's been a lot of new models that have come out over the last, even just, week or two. There's Claude, Grok, Databricks, a variety of folks who've launched things. What do you think?

What's going on? Yeah, I think it's a huge update to most people's priors versus a year ago. Right. I think it's very likely at this point that you end this year with a handful of GPT-4 level models, and that some of those are open source. Right.

Sarah Guo

And so I think Mistral first, but then also Databricks with DBRX, they changed the point of view on what you can do with a relatively small amount of compute, tens of millions of dollars of compute, and then also from a scale perspective. The Databricks team in particular just declared a very strong point of view that they call Mosaic's law, where a model of a certain capability will require a quarter of the dollar capital investment every year due to a bunch of improvements on the hardware and algorithmic side. And I don't know if that's grounded in any particular technical belief, but I do think that the model landscape completely shifts versus what people expected it to be. I think most people expected it to be quite monopolistic, or at least oligopolistic, a year ago. Right.

And I think there's still a really big question at the state of the art, because if you go up one level of scale in terms of capital investment, if the dominant factor is still compute scaling, I think that question remains. But there's an awful lot you seem to be able to do with a GPT-4 level model. So I think the net impact of that is pretty good from the application or the enterprise adoption side. Yeah, it definitely feels like the most cutting-edge, smartest models.

Elad Gil

In some sense, you're going to end up with an oligopoly, at least in the next couple of years, just because of the scale of capital needed, but also just how far ahead you start to be, as you have a model that can help you build the future models, even just things like data labeling or certain forms of reinforcement learning through AI feedback, or other things like that. And so as you get better and better model capabilities, you start bootstrapping the next generation of models, although obviously you have to do other breakthroughs to get there. And then to your point, I think under that you have this broader swath of different models and companies and things that are available. One could argue part of what that's going to do is just flip some of the value capture, the revenue, the margin, the people, whatever metric you want to use, over to the clouds, because they're going to be hosting all these things, whether it's Llama or whether it's Claude or whether it's one of these other entrants.

There's just going to be a lot of room, I think, for the clouds to make money over time as well, which I think is a little bit under-discussed in terms of who captures value in this market besides the model providers. Related to the clouds, how do you think about the recent Inflection-Microsoft deal? I think the first reaction is that they're true believers in AI at Microsoft and saw you as a live player. Right. And so I think the obvious observations with Microsoft here would be that they see a product opportunity, and they need AI-aware product leadership and research leadership to go after it across Microsoft properties.

Sarah Guo

Despite all of the initial real traction around Copilot in the code domain, I think we're still far short of the revenue Microsoft actually expects to drive across its productivity suite and in search. And they're ambitious to go after that. And I think this is a leadership change that supports that. Now, they're clearly still working with OpenAI, given direct statements from both companies and the Stargate data center effort. But it's also hard from the outside not to see this as somewhat of a hedge, right?

Not as a criticism of OpenAI, but if you are a true believer that this is the most important technical driver for your company, and then you're reliant on an outside player, that's not a position that a trillion-dollar company likes to be in. Mustafa has had more capital and more compute available to him than the vast, vast majority of entrepreneurs and research teams. And I think one big argument you can make from Microsoft is just that you have direct access to that if you're focused on the research, right? And so I think it supports what you said, where the spend required at perhaps not even this generation, but the next generation, really requires a certain level of sponsorship that is challenging for most independent players. But it's a directly opposing view to, for example, the Databricks narrative today.

Can I ask you a more domain-specific question? OpenAI just announced voice cloning. The interesting thing here is you have companies like ElevenLabs with really great traction, and other competitors out there focused on different feature sets, like latency, that are progressing. But let's go ahead and assume, for argument's sake, given the OpenAI announcement of voice cloning, that both OpenAI and DeepMind, and maybe others, have very, very good voice, video, image, and song models.

And the question has been, will they release that? And what does that do to the market beyond text APIs? Yeah, I mean, it seems like a lot of the hesitancy, as far as I can tell, for these companies to go aggressively after the voice side is just regulatory and societal concerns, right? I think one of the concerns people have on the voice cloning side is, do you end up with different types of deepfakes or other things where it's much harder to tell with a voice what's going on? There's obvious ways around that.

Elad Gil

I think you can do an attestation where, when you upload a voice for the first time, the person who owns the voice, whose voice it actually is, can do some form of attestation, or there's other ways to do verification. My sense of the market is multiple players have this technology, but they've been holding it back, and in some cases they may have had it for a year or two now, because there have also been open-source versions of this, like Tortoise, and things that the Suno team was working on earlier. So I'm surprised in some sense by how little competition there's been. Okay, let's characterize the rest of the investing landscape, and then, Elad, your own view of what's driving the rest of the landscape. Do investors keep funding general foundation model training efforts from here, or more specialized ones?

Sarah Guo

Can you talk about what you think those dynamics will be? It's interesting, because if you look at the scale of capital that's gone into foundation models, venture capitalists have put hundreds of millions of dollars into individual companies, but then the big cloud providers or big tech companies, including Nvidia, have put billions into companies. And so most of the funding of this market is actually being done by the hyperscalers and a few other big tech companies. And that's true in China as well, right? It's the really big pre-existing Internet companies that are funding everything.

Elad Gil

And so the VCs are almost a bootstrap, and the bootstrap is sometimes tens of millions and sometimes it's hundreds of millions. But to get to real scale, it comes from other places. And to the points earlier on the cloud side, there's a strong incentive for the clouds to keep funding these things as long as it drives cloud revenue. So, for example, in Microsoft's last quarter, they mentioned Azure revenue, which was about 25 billion for the quarter, grew by, I think it was 5%, due to AI-related products, which would be another billion, billion and a half a quarter in revenue off of AI. That's 5, 6 billion annualized, and it's probably still growing.
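As a quick sanity check of the back-of-envelope arithmetic quoted there, here's a small sketch. It assumes the roughly 5% growth figure applies to the full ~$25B quarterly Azure base, which is how the numbers are framed in the conversation, not a confirmed reading of Microsoft's actual disclosure:

```python
# Back-of-envelope check of the Azure AI revenue figures quoted above.
# Assumption (from the conversation, not from Microsoft's filings): the
# ~5% of growth attributed to AI applies to the full ~$25B quarterly base.

quarterly_azure_revenue = 25e9  # ~$25B Azure revenue per quarter
ai_growth_share = 0.05          # ~5% of that attributed to AI products

ai_revenue_per_quarter = quarterly_azure_revenue * ai_growth_share
ai_revenue_annualized = ai_revenue_per_quarter * 4

print(f"AI revenue per quarter: ${ai_revenue_per_quarter / 1e9:.2f}B")
print(f"Annualized:             ${ai_revenue_annualized / 1e9:.2f}B")
```

That lands at roughly $1.25B per quarter, or about $5B annualized, consistent with the "billion, billion and a half a quarter" and "5, 6 billion annualized" figures cited.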

And so if you look at it from their perspective, there's a strong incentive to fund these things because they're driving so much utilization and usage. So I definitely think we'll see more funding going into the market. I think on the foundation model side, from a venture capital or angel investor perspective, we're going to see fewer new language models, but we should see models in a lot of other areas. And we have new things happening in music. We talked a little about text-to-speech with ElevenLabs, but then there's a bunch of other areas around video, image gen, physics models, biology models, material science, robotics, et cetera.

And so there's this broad swath of other types of foundation models that are starting to get funded, or that are accelerating in terms of the funding cycles there. One can anticipate we'll see a similar thing there, where we'll probably have venture capitalists do the first set of rounds, and then it'll shift over time to large strategic players who really view those as things that are beneficial. And then there may be other areas where people are doing really interesting things. Applied Intuition is a good example of a company that's doing simulation software, and they've been doing really interesting things in terms of modeling behavior there for years now. Right.

So I just think there's a lot of room to still do lots of interesting things on the foundation model side. But I do think it's going to continue to shift over time. What domains do you find most interesting, or what's your framework for figuring out which of these things are going to be not only important societally, but also good businesses? One basic way to look at this is, what are the capabilities that we are still missing or struggling with? Right.

Sarah Guo

And so one thing that I've been interested in for a long time is just, how do you operate on time series with more general knowledge and reasoning? Right. There are so many ways in which being able to better understand time series would be really, really valuable. Right. And it's a very unsolved problem.

If you look at anomaly detection, anything that is an infrastructure monitoring, security, healthcare, or consumer behavior use case. There's domains, and then there's sort of other dimensions, like context windows and how you handle a particular type of data. That's one domain where I just feel like there's huge commercial applicability and interesting architectural approaches that could allow you to break through. Then I think we take the existing advancements in language models, and people from all fields applying machine learning are now paying attention to this and working at some of the best labs. And they go look at the domains that they've been traditionally focused on.

So, for example, robotics and biotech are two areas where I've been spending a bunch of time. There's something in the water where a bunch of very smart people are showing leading results on traditional benchmarks with these approaches. And the core of it is that you get several smart teams at once thinking that a domain can be solved with a foundation model that has more generality, plus some cleverness in approach. I don't mean that in a trivial way, because, for example, in robotics, most machine learning people will look at it as a data collection problem, where your Internet data, even video data of a bunch of actions, just isn't enough. We need embodied action data, like controls data, in some way.

And then people have very, very different ideas on data collection, both simulation, as you mentioned, but also efficient real-world collection, as kind of a core question for a number of these companies. And then also different ideas in terms of how to split up the value chain: are you doing software that will apply to many different types of hardware, or are you doing a verticalized company? And I am inclined to believe that a lot of these domains are going to be solved, and it's just a question of picking the product path through. I think there's a set of things that are just intellectually interesting.

Elad Gil

I think, for example, the biotech models were really cool, where it seemed like in some examples, long context windows make protein folding easier, which is really neat. And then there's the societal implication side, and there may be some things that never make money but are incredibly societally useful. And then lastly, there's the question of what the big commercial applications are. The thing that I find very fascinating is there's one or two of these areas where I've seen a dozen teams all enter at once, and there's so much white space in AI right now. There's so many different things to do.

I've actually started incubating a few things again, just because people aren't working on certain areas that seem kind of obvious. And I've looked for companies; I'd rather back somebody than incubate something, which is my preference. But it's surprising to me that half a dozen people, or a dozen people, who are all really smart and really talented and really versed in the field, will all jump on one thing, and then there are these wide-open opportunities somewhere else. And so it's a very odd market right now, where you don't see the fast follows for certain things that are clearly working that you'd expect.

Sarah Guo

What's your explanation of that? I think it's a mix of what people view as societally significant, and therefore want to work on. But also I feel like any set of startup waves always has these memetic dynamics, where there are almost these memes of what to build that spread. And that's happened in prior technology waves too. And the memes are often correct.

Elad Gil

Not always, but often. I mean, a great example of that would be in the mobile wave. I knew of literally a dozen different photo apps where people had photo upload from their phone, and they'd go viral and they'd grow like crazy, and then they'd all die because there was no sustainability. And then Instagram came out with the common format, the filters. There was actually a company called Camera+ that did filters before that, but they were charging $20 a download because they just wanted to monetize it.

It was sort of this indie dev shop that never wanted to grow very big that was doing it. And so Instagram had similarly started working on filters, a common format, and then a feed and more of a social product. And that's the thing that became sticky and sustained. But there were at least a dozen other ones I knew of, where I knew the founders who were doing it.

That was memetic but correct. But it took the right product instantiation to do it. In robotics, the correct product instantiation is harder, or in biotech it's harder, or in LLMs it's harder, because these are very big mass markets. And I think people are just excited by the scale of commercial opportunity for these things too, right?

At the same time, there's other markets where there just aren't that many players. And so the question is, why is that, and what's the difference? And it's fascinating to watch this happen again and again. Memetic things are often correct.

So there were a dozen search engines before Google, there were a dozen social networks before Facebook, et cetera. It's really interesting, because it's an entire market driven by technologists right now, right? Everybody's getting nerd-sniped into a few areas. And that, as you said, may be right. But I think the driving factor for me is just how much cross-pollination there is between people who understand the edge of research and then particular domains.

Sarah Guo

The ability to operate on financial or accounting data with these models seems like a very useful commercial capability, and actually perhaps much more tenable, with a sort of linear path to commercial value, than some of the things that we're describing. Like, if you go talk to an experienced executive from the last generation of great biotech platforms, or successful robotics players, these are traditionally not easy industries, right? But there's a lot of accounting and finance software in the world. There's a lot of professional accounting services.

And I just think there's not that much interaction between accountants and controllers and systems engineers and research scientists. That's part of my explanation of it. Yeah, it's also just that there have been a few technology breakthroughs across these different fields that suggest that suddenly some of these things are more tractable than people thought. And so I think there's also a technology "why now?"

Elad Gil

Or at least some proof points that make people excited to go and build out these things that have now been discovered and shown to be possible. What do you think are some of the other areas that are worth looking at for big changes? I mean, video may be an example. I know you're involved with a couple of companies there.

Do you want to talk about the video landscape and some of the shifts you're seeing? Yeah, I actually think that it is kind of a mistake to look at it as just a modality to be solved. And we're going to have members of the Sora team from OpenAI on No Priors, so we should certainly ask them. But I think video, and the control of video for a commercial application, is a little different than video generation and understanding as it belongs in a general multimodal model.

Sarah Guo

And we'll see if the persistence of that is two years versus ten years. This is one of these areas where it is so obvious that the demand is unbounded for commercially viable, or just shareable, high-quality video generation. And then it's not one form, right? Like, many people cut it into A-roll and B-roll. And today, despite the extraordinary advances from a one-second, very small clip with obvious artifacts to where we are with things like Pika and Sora today, we still have a very long way to go, both in terms of interfaces and controllability, like length, quality, et cetera.

And so I think it's an area that deserves a lot more investment. But I think the set of things that you might want to make those assets commercially viable is actually a very deep product problem with specific research involved. And as you mentioned, one of the companies that I'm an investor in is a company called HeyGen, which has grown really, really quickly over the last year, with prosumer and commercial traction, focusing just on video avatars, or the ability to have a clone of you, or a spokesperson for an organization, speak on film to a camera. Not a real camera, right. Generated pixels.

But it's very cool to see the specific ways that you can progress this if you are focused on it, because we get so many requests from end users who have very little idea, or no idea, how any of the technology behind the scenes works. And people are very creative, and they want full control. And so one of the releases from the company this past weekend was: I have video of an avatar moving, walking around, going through gestures, and I want to replace what they were saying. You mentioned the last generation of mobile image apps; one thing you learned from the last generation of video applications was that if you take certain dimensions of creativity away, or you just control them, and you lower the required quality of different dimensions, you make it much, much easier for people to create.

Right. And so I think that's one path that companies like HeyGen are going down. I want to ask you about one thing: I think Devin and Cognition have really woken up a lot of engineers and product teams to how much space there is in exploring agents and different model user experiences. Yeah, it's interesting. I was actually working on a post on this, in terms of, it feels like there's a shift in agentic UIs and what they look like, because I think a lot of what people were doing before was modeled on either ChatGPT or Copilot, where it's either forms of chat or just autocomplete of a different nature, sort of inline, or things like that.

Elad Gil

And I think the Devin UI was really interesting, from the perspective that it was a new way to think about how you display information in terms of what an agent is doing, and even just seeing it. In Devin, they have four tabs: there's the plan that it's following, the shell, the code that's being written, and then there's a chat interface saying, here are the steps that I'm doing. And so you know what's coming, you can see the code that's being written, and you can kind of redirect the agent to do other things.

If you think it's going down the wrong path with what it's browsing, or when it prompts you for things like API keys or tokens, you can interact with it and re-steer it along the way. And I've started to see that UI now pop up in other use cases, where people whose products were demoed to me a week or two before have suddenly shipped things more in this direction if they're doing anything agentic, because you realize that most people don't want to just sit there and wait and wonder if the agent is actually doing what they want. They want to be able to see it, and maybe interrogate it or intervene and put things on the right path so it gets done faster. And the way to think about agents today, I feel, is almost like a junior intern. They're very eager, they're trying really hard to please, but they still have a lot to learn, or you kind of need to give some direction.

And so this is a mechanism by which you almost get that update email from the intern saying, hey, here's what I'm up to. And you say, oh, actually, could you go do this other thing a little bit differently? Or, have you considered doing these three things? And so I think there's a lot of really interesting things that'll be coming as we rethink UIs, and then eventually the entire UI will go away once agents get good enough. Right?

And so I think this is kind of the intermediary step of human-in-the-loop. And eventually, as agents get smarter and smarter, and that's going to be through breakthroughs in the base models, but also breakthroughs in reasoning and other areas, we'll start to see, I think, some of the UI go away over time. And so I think we're in the sort of early form of this stuff, and it's very exciting to just see these new paradigms being created and happening. You know, one thing that you mentioned that I think is kind of interesting is that a lot of the big use cases in AI, surprisingly, are prosumer ones.

Right? That's HeyGen, that's ChatGPT, that's Perplexity, that's a variety of things.

How do you think about the whole prosumer market? Do you think the first real wave of AI is prosumer in some sense? I think it structurally has to be, right? Just based on the pace at which things can be adopted in the enterprise. Prosumer applications get to carve this path of direct user value, which we think we can create a lot of with AI capabilities, that doesn't require fighting existing, incredibly strategic and embedded players, for example, consumer social networks with their network effects, but instead generates its own distribution. And it's not that you don't need to be smart about distribution to consumers and prosumers, but really these companies are growing on the backs of just great products that people want.

Sarah Guo

It is very, very hard to get to millions of enterprise users in a year, simply because of the decision-making and security processes and roadmap involved in getting a large customer to change something internally, and the risk tolerance of all of those, versus "I want to do something that is $10 a month valuable to me or makes me more productive," which is a much faster decision. So I think structurally it's something we should expect. I also think it's just interesting that the Canva numbers are quite public now, and a billion of its 1.7 billion in ARR is prosumer. And so I think I'm just sort of respecting the data here.

The argument for some of these prosumer companies is that you can often grow into a more professional set of use cases, and the increase in capabilities is creating huge markets where they didn't exist before. Right. And you should see many more of these companies, and I do believe that argument. I think they're going to be sneaky big markets. Yeah, it's kind of notable, because if you look at the first Internet wave, if you look at the nineties, the first wave was all consumer, and then the second wave was B2B.

Elad Gil

And so they used to talk about it as B2C versus B2B. Right. And I think we have this odd parallel or analog here, where the very first adopters of this technology are consumers and prosumers. And then there is some enterprise-related stuff, like what Harvey's doing, or others. But the initial wave seemed to be more driven by people who were using it in their personal and professional lives first, which is very similar to what happened with email and the Internet more broadly.

And so it's an interesting parallel. And maybe this is the dynamic of truly fundamental technology shifts. You could argue that was also a lot of the mobile wave: all the really big new companies, or most of them, were consumer companies, between Uber and Instacart and all the rest, WhatsApp, et cetera. Very interesting parallels. Find us on Twitter at @NoPriorsPod, and subscribe to our YouTube channel if you want to see our faces.

Sarah Guo

Follow the show on Apple Podcasts, Spotify, or wherever you listen. That way you get a new episode every week. And sign up for emails or find transcripts for every episode at no-priors.com.