AI hype vs. AI reality

Primary Topic

This episode delves into the discrepancies between the expectations set by AI advancements and the actual performance and reliability of these technologies in practice.

Episode Summary

In "AI Hype vs. AI Reality," The Globe and Mail explores the recurring pattern of AI technologies failing to meet the public and corporate expectations. Joe Castaldo discusses several high-profile AI missteps, such as biased image generators and unreliable AI summaries in search engines, illustrating how these technologies often fall short. The episode scrutinizes the rush to release AI products without adequate testing, driven by fierce competition and the desire to be first to market, often at the expense of accuracy and public trust. This conversation sheds light on the challenges and ethical dilemmas facing AI development, emphasizing the need for improved regulation and more responsible deployment practices.

Main Takeaways

  1. AI technologies frequently fail to live up to the hype, leading to public disappointment.
  2. Companies often rush AI products to market to stay competitive, despite potential inaccuracies.
  3. Mistakes in AI applications can erode public trust and raise serious ethical concerns.
  4. There is a growing need for better regulation to ensure the safety and reliability of AI technologies.
  5. Despite advancements, AI still struggles with inherent issues like bias and inaccuracy that are not easily fixable.

Episode Chapters

1. Introduction

Overview of AI's promise vs. its reality, with examples of AI failures that have impacted public trust. Manika Ramon Wilms: "The promise of generative AI technology is that it will change our lives for the better."

2. The Pattern of AI Launches

Discussion on the typical cycle of AI product launches and their tendency to underdeliver. Joe Castaldo: "There's a lot of hype that this new tool or model is going to improve our lives... and then it falls flat."

3. Case Study: Google's AI in Search

An analysis of the flawed rollout of AI-generated summaries by Google, highlighting the issues with AI accuracy. Joe Castaldo: "Google only rolled it out in the US... and very quickly, users started noticing that these AI overviews could be wrong."

4. The Silicon Valley Mindset

Exploration of the minimal viable product strategy and its impact on the quality and ethics of AI product releases. Joe Castaldo: "The minimal viable product... a bare bones version of something that a company will build and release to test market demand."

5. The Complexity of AI Problems

Discussion on why generative AI problems are complex and hard to fix, contrasting with previous tech solutions. Joe Castaldo: "Generative AI is different in that this technology is a bit more unwieldy. It doesn't always behave the way that you want it to."

Actionable Advice

  1. Be skeptical of AI product claims and maintain a critical perspective.
  2. Research AI technologies before adopting them to understand their limitations.
  3. Advocate for stronger AI regulations to ensure ethical deployment.
  4. Stay informed about AI developments to better navigate their impact on society.
  5. Encourage transparency from companies regarding the functionality and testing of their AI tools.

About This Episode

Artificial Intelligence has been creeping into our lives more and more as tech companies release new chatbots, AI-powered search engines, and writing assistants promising to make our lives easier. But, much like humans, AI is imperfect and the products companies are releasing don’t always seem quite ready for the public.

The Globe’s Report on Business reporter, Joe Castaldo is on the show to explain what kind of testing goes into these models, how the hype and reality of AI are often at odds and whether we need to reset our expectations of Generative AI.

People

Joe Castaldo, Manika Ramon Wilms

Companies

Google, Microsoft

Content Warnings:

None

Transcript

Manika Ramon Wilms
Big tech companies have been leaning into artificial intelligence.

Joe Castaldo
We unveiled the new AI powered Microsoft, Bing and edge to reinvent the future of search. We want everyone to benefit from what Gemini can do. You're using it to debug code, get new insights, and to build the next generation of AI applications.

Manika Ramon Wilms
Mike, it seems like you might be gearing up to shoot a video or maybe even a live stream. Yeah.

In fact, we've got a new announcement to make. Is this announcement related to OpenAI? Perhaps it is. And in fact, what if I were to say that you're related to the announcement or that you are the announcement?

Joe Castaldo
Me?

Manika Ramon Wilms
The announcement is about me. Well, color me intrigued.

The promise of generative AI technology is that it will change our lives for the better.

But so far, many of these AI rollouts are not living up to expectations.

Joe Castaldo
There's the Microsoft Bing chatbot last year that expressed love for a New York Times reporter and suggested the reporter leave.

Manika Ramon Wilms
His partner, Joe Castaldo, is with the Globe's report on business and has been extensively covering artificial intelligence.

Joe Castaldo
Google's Gemini image generator, which did things like depict America's founding fathers as not white chatbots, inventing legal citations and getting lawyers in trouble. Even back in 2016, Microsoft released a chatbot on Twitter called Tay, and people very quickly figured out how to make it say quite heinous things. And Tay was never heard from again.

Manika Ramon Wilms
Today he'll tell us why AI hype is often different from reality, why companies roll out these technologies that don't seem quite ready and what that does to public trust.

I'm Manika Ramon Wilms, and this is the decibel from the Globe and Mail.

Joe, great to have you here.

Joe Castaldo
Thanks for having me.

Manika Ramon Wilms
So, Joe, we've now seen a number of launches for new AI tools. Just tell me, though, about, I guess, the usual pattern that these launches often tend to follow.

Joe Castaldo
Yeah, there does seem to be a bit of a pattern. Generally, there's a lot of hype that this new tool or model is going to improve our lives, improve the way we work or get information, and then the thing is released to the public and it kind of falls flat. It looks a little half baked. People very quickly find out all the ways that this new model or application fails or makes things up and gets things wrong, or it says things that are just unhinged and share examples online. And there's sort of this negative media cycle of bad headlines and bad people.

Sometimes the company acknowledges the mistake and promises a quick fix, and it just seems to happen again and again. And Google being the latest example with its AI overviews in search.

Manika Ramon Wilms
Yeah, let's talk about that as an example because I think this is in people's minds a little bit because this was fairly recent.

What happened there? How did this not go exactly as planned for Google?

Joe Castaldo
Yeah. So this was a big change, actually, to Google search, which makes the company billions of dollars and it's our gateway to the Internet, has been for years. And so they started putting on top of search results, an AI generated summary of whatever the query was.

Manika Ramon Wilms
And this was probably, this wasn't in Canada, it was only being tested in a few places, right?

Joe Castaldo
Yes. Google only rolled it out in the US to start last month and their plans to roll it out into other countries, including Canada, down the road. And Google pitched this as that's a way to get information faster and easier. You don't have to do the hard work of clicking a link yourself. And so again, very quickly, users started noticing that these AI overviews could be wrong or just flat out nonsensical. So an AI overview recommended eating rocks, eating one rock a day for the nutritional benefits in pizza recipe. It included glue as a way to get that cheese to stick to the pizza.

And they're just flat out factual errors. One query was, who was the first muslim president of the United States? And the AI overview said Barack Obama, perhaps picking up on this conspiracy theory that he's some kind of secret muslim. So Google responded fairly quickly and said these instances are rare, but they also made about a dozen technical fixes, they said, and they were clear that this isn't a case of AI hallucinating, which is the phenomenon where an AI model just makes stuff up.

Manika Ramon Wilms
That is actually the term people use, hallucinating them.

Joe Castaldo
Yeah. And it's, you know, people debate if that's an appropriate word or not, but, you know, AI generative AI makes stuff up, basically. And they said it wasn't so much that it was more pulling from websites, maybe, that it shouldn't have been. And so they tried to address that problem and it just has the appearance of something that was released when it wasn't quite ready.

Manika Ramon Wilms
Yeah. So, I mean, this seems to fall into the pattern of what you were talking about earlier, Joe, right. Where we see this happen with these new releases that aren't quite set for the public. But I guess the big question is, why is this happening? Why does this continue to happen?

Joe Castaldo
Yeah, there's a few reasons. I mean, I think the obvious one is just competition.

When chat GPT was released toward the end of 2022, it really touched off an arms race where every company had to do AI. Now, generative AI was seen as the next big thing, the next huge market opportunity. So companies are willing to make mistakes and risk some bad pr in order to get something out in order to be seen as first.

If you're too slow, there are consequences. Google, for instance, has been developing generative AI internally for a long time, but it wasn't necessarily releasing everything to the public when OpenAI released chat. GPT. All of a sudden there's a lot of pressure on Google to start doing something with all of this research.

And as a quick example, Google had an image generator in early 22 called Imagine, but it wasn't released to the public. And the team that made that later left Google and started their own company in Toronto called ideogram, partly because they felt they could move faster outside of Google. And look at Apple, too.

It's one of the few big tech companies that wasn't really doing anything with generative AI. It's a device company in many ways.

There are a lot of questions about what does AI mean for Apple. They finally had an event recently where they're going to partner with OpenAI and integrate generative AI into iOS and a bunch of different ways. And the stock price is up quite a bit since that event.

Manika Ramon Wilms
In response to that.

Joe Castaldo
Yeah, I think because investors, there's some relief on the part of investors who can say, okay, finally, Apple is doing something with AI now.

Manika Ramon Wilms
Okay, so what you're describing is really this pressure on these companies to kind of keep up with each other and roll these things out, even if they're not totally ready yet.

It's interesting. I think we should dig a little bit deeper into this idea because this concept of releasing something, even though it's not really set, seems to be, I guess, part of the Silicon Valley mindset. If I can say that, Joe, it goes beyond just AI. Why is it the way that these companies tend to operate?

Joe Castaldo
Yeah, there's a couple of things there. I mean, there's this concept of the minimal viable product, which is like a bare bones version of something, some tool, some application that a company will build and release to test market demand, customer need, rather than spend a lot of time and money releasing something complete that might flop. So it can be a smart way to do things.

Manika Ramon Wilms
Make sure there's a market before you throw a lot of money into it then.

Joe Castaldo
Exactly, yeah. And also the move fast and break things ethos has been part of Silicon Valley for quite some time.

Facebook being the poster child for that. When the company was really growing. It endured a lot of scandals about privacy concerns and data breaches and being hijacked to manipulate elections and so on. So, yeah, that mentality is there in tech, but I think there's something a little different going on with generative AI, like Facebook. For all of its faults, the core product more or less worked. You add friends, you post pictures you like, you comment, you get served up ads. Generative AI is different in that this technology is a bit more unwieldy. It doesn't always behave the way that you want it to. It makes mistakes. It outputs things that are not true. And that's not a bug that can be fixed with some more work and coding. It's just inherent in how these AI models work.

And companies are doing lots of things to try to improve accuracy, but it's a very, very hard problem to solve. And so until that's addressed, we'll see more flubs and mistakes and launches that go sideways, I guess.

Manika Ramon Wilms
Why is it that these problems are so complex when it comes to generative AI? Maybe that's obvious, but I guess, why are the problems that we're seeing now, why aren't they as easily fixable as previous problems, like you were saying with Facebook before?

Joe Castaldo
Yeah, I mean, this is a simplification, but with a chatbot, for example, or the large language model that underlies the chatbot, it's effectively predicting the next word in a sequence based on tons and tons of data that it has analyzed.

But an AI model has no idea what is true and what is fiction.

Manika Ramon Wilms
We'll be right back.

Jo, can we talk a little bit about how these products are tested? Because, of course, companies are testing these models before they do roll them out.

Do we know what that means, though, exactly what kind of tests are actually run on these tools?

Joe Castaldo
Yeah, there's a concept called red teaming, which is fairly big, where employees, a team of employees, tries to test the vulnerabilities of an AI model.

Can you make this AI chat bot say something it's not supposed to? Can you make it say conspiracy theories or something discriminatory?

Manika Ramon Wilms
So red teaming is kind of like trying to break it in a way to see if it will break?

Joe Castaldo
Yes, exactly like ethical hacking in a way so that you can better understand the vulnerabilities and fix them before it's released to the public.

So that's a big focus, but it's not sufficient.

Manika Ramon Wilms
So I guess still, though, why are we still seeing these problems, even with these measures, this testing that is happening, why is that still such a struggle.

Joe Castaldo
Yeah. It's hard to know without having insight into a particular company before a release. But red teaming, there are tensions between needing to commercialize something and making sure it's safe.

There has been a shift where generative AI previously was. It's kind of a research project.

University labs were working on it, corporate labs were working on it. Again with an eye to commercialize something down the road, it wasn't seen as ready for public release, so there's presumably some tension there with red teaming. Do they have enough time?

Are there enough people?

Is the team diverse enough to find bias and other vulnerabilities?

Manika Ramon Wilms
This is something that we've talked about. Generally, that can be an issue with tech.

How does that play into this?

Joe Castaldo
Well, so let's take image generation, for example. It's a well known problem that image generators have bias and stereotypes kind of built into them, and that's just a reflection of our society and our bias and our problems, because AI models are trained on data that we as humans put out there in the world.

So if you ask an image generator to produce a picture of a CEO, chances are it'll be a white man, a doctor, a man, a nurse, a teacher, a mugshot. Chances are it might overly represent black people, for example. So it takes a diverse team to think about these issues and try to address them before launching. In Google's case, with its Gemini image generator earlier this year, it may have overcorrected, so people found that it was producing historically inaccurate pictures. So again, like America's founding fathers as black people, for example, or German World War two soldiers depicted as people who are not white. Google was trying to inject more diversity into the output, but went too far, perhaps.

And Google paused image generation on Gemini so that it could work to address this. There was a bit of a narrative that like, oh, AI is woke. It's too woke. That's the problem, which is just silly. It's not about that. It's just an indication that these models are hard to control, hard to have accurate, predictable output, and some of the blind spots of teams that are developing them.

Manika Ramon Wilms
Yeah, yeah. It seems to really illustrate that. I'm wondering what experts, I guess, have told you about all of this, Joe, because obviously companies, I guess, see an advantage of releasing products this way. They continue to do it. But what did experts tell you so far about the issues that we've seen?

Joe Castaldo
Yeah. So Ethan Malek, who's a professor at the Wharton School of Business in the US, has this really interesting take that perfection is the wrong standard to use for generative AI. So something doesn't have to be perfect in order for it to be useful. So he and his colleagues in the Boston consulting group did this really interesting study a while back, where they gave some consultants access to GPT four, which is OpenAI's latest model.

Other consultants did not have access to AI, and they gave them a bunch of tasks to do.

To simplify what they found is the consultants who had access to GPT four were much more productive, and they had higher quality results on a lot of tasks than the consultants who did not have AI. And these were tasks like, come up with ten ideas for a new shoe, write some marketing material for it, write a press release for it, so more on the creative end.

But they also designed a task that they knew AI could not do well. And so what they found is the consultants who used AI, their results were worse. Like much worse than consultants who did not use AI. So you might think that's kind of obvious. Like, if a tool isn't up to the job, of course the results are going to be worse. But I think the important takeaway is if you don't have a good understanding of where the limits of generative AI are, you will make mistakes. It can be detrimental to you.

Manika Ramon Wilms
I know you spoke to another expert who was talking about something called the error rate. Jo, I guess this is a little bit about how we trust these tools. Can you tell me about the error rate?

Joe Castaldo
Yeah, so, just trying to figure out how often an AI model might hallucinate. And so that was, I was speaking to Melanie Mitchell, who's a computer science professor in the US.

And this ties back to knowing where the limits of AI are. But it's a little tricky because she was saying that if we know chat, GPT makes mistakes 50% of the time, we won't trust it as much. And we will check the output more often because we know there could be a lot of mistakes. But if it's only 5% of the time or 2% of the time, we won't. Right?

Only 2% of the time. Let's chance it's probably fine, but it might not be. So in that way, more mistakes could slip through. So she was saying that the better system in some ways could actually be riskier because we're more likely to trust it.

Manika Ramon Wilms
This idea of us, how we understand and how we trust these tools, I think is really fascinating. And I guess I wonder, to come back to how these launches are actually done, does the way that they're rolled out and they break and they do all these strange things. Does that actually, I guess, erode our trust, the public's trust in this technology?

Joe Castaldo
It could.

If your first exposure to something new is kind of ho hum, or if it's influenced by a lot of negative coverage, maybe you won't try it again or try it at all.

But the thing is, we're getting generative AI, whether we like it or not.

There's a lot of AI bloat.

Manika Ramon Wilms
What exactly does that mean?

Joe Castaldo
I guess I think of AI bloat as unnecessary AI features that make products, in some cases more expensive or just more annoying to use.

There are laptops coming out that have a Microsoft copilot button on them to get easy access to AI tools.

Manika Ramon Wilms
I think even in WhatsApp now you have the meta. Ask me anything that's there as well.

Joe Castaldo
Yeah, same thing on Instagram. The meta AI chatbot is in search, so it's coming.

But there are still a lot of questions about is this better? Is this something people want?

How does this improve a user's experience?

And you have to wonder too, this companies add in more AI features, are they going to have to jack up the price? So it's a value for money question.

Manika Ramon Wilms
Just in our last few minutes here, Joe. So these products are not perfect right now. We've talked about this extensively. At this point is the understanding, though that they're going to get better over time, that this is just kind of a temporary phase until we actually get over this hump to something better?

Joe Castaldo
Yeah, I mean, that's the arc of technology generally, but the question is how fast?

I think a lot of AI developers assume that this technology is going to improve very quickly.

And if you look at the past few years, it certainly has. Like take for example, I'm sure a lot of people saw the AI generated video of Will Smith eating spaghetti.

Manika Ramon Wilms
Uncle Phil, come try this fresh pasta of Bel Air.

Joe Castaldo
Which was hilarious but also horrifying.

Compare that to some of the AI generated videos now from companies like Runway or OpenAI sora. And the leap in quality is quite astounding. It's not perfect, but it's huge. But progress doesn't necessarily continue at the same rate. The approach to AI now is get a whole lot of data, a whole lot of compute or GPU's. And the more data, the more GPU's you have, the better the AI at the other end of this process.

But there are real challenges in getting more data, and there was a study earlier this year from Stanford about the state of AI and one of the things that noted is that progress on a lot of benchmarks has kind of stagnated, and that could be a reflection of diminishing returns of this approach. Progress might not be as linear as some people are assuming.

Manika Ramon Wilms
Just very lastly here, Jo, when we look at these releases, that the way things are rolled out here, what does this tell us about how seriously these companies are taking the big questions like ethics, the safety of these tools? What can we glean from this?

Joe Castaldo
Yeah, I mean, there has been a lot of concern about the pace of progress and companies releasing AI into the wild. Like last year, there was a very high profile open letter asking for a six month pause on development so that regulations could catch up. And of course, nobody paused and nobody stopped.

And so I guess what we're seeing now with this kind of rush and companies scoring own goals and making mistakes that could have been avoided to some extent with more care and thought, doesn't necessarily bode well for the future, especially if AI models become more powerful, more sophisticated, more integrated into our lives, that arguably carries more risk. So this is where regulation comes in, why so many people are concerned about regulating AI. The EU has passed their AI act.

There's a bill here in Canada. There's lots of efforts in the US.

So there's an argument to be made that if you want companies to behave responsibly, take ethics and safety seriously, you have to force them to through the law.

Manika Ramon Wilms
Joe, this was so interesting. Thank you for being here.

Joe Castaldo
Thanks for having me.

Manika Ramon Wilms
That's it for today. I'm Mainica Ramon Wells. Kelsey Arnett is our intern. Our producers are Madeline White, Cheryl Sutherland, and Rachel Levy McLaughlin. David Crosby edits the show. Adrian Chung is our senior producer and Matt Frayner is our managing editor.

Thanks so much for listening and I'll talk to you soon.