SaaStr 738: How Shopify Implements AI Across Sales and Product with Mike Tamir, Head of AI at Shopify
Primary Topic
This episode explores how Shopify leverages AI across its sales and product teams to enhance business operations and customer experience.
Episode Summary
Main Takeaways
- AI's role in enhancing business productivity involves automating routine tasks and providing deeper insights into data.
- Proper data management is crucial; the quality of data significantly impacts the effectiveness of AI models.
- Companies should integrate AI into their core business operations rather than treating it as an auxiliary tool.
- Cultural adoption of AI within a company requires clear communication and understanding across all business levels.
- Developing AI infrastructure must be tailored to specific business needs and includes considerations of cloud computing resources like GPUs.
Episode Chapters
1. Introduction to AI at Shopify
Mike Tamir discusses Shopify's approach to integrating AI across its operations, emphasizing data quality and management as foundational elements. Mike Tamir: "Data is often the messiest part of AI implementation."
2. AI's Impact on Enterprise
Exploration of how AI is adopted in enterprises and its impact on business processes and customer interaction. Rudina Seseri: "AI adoption at the enterprise level remains superficial in many companies."
3. Building AI-Ready Infrastructure
Discusses the technical and cultural infrastructure necessary for effective AI implementation in a business environment. Mike Tamir: "The infrastructure for AI goes beyond technology; it involves preparing people and processes."
4. Practical Challenges in AI Implementation
Tamir outlines the practical challenges encountered when implementing AI, from data management to ensuring model accuracy. Mike Tamir: "Overfitting and data cleaning are significant challenges in AI modeling."
5. Future of AI in Business
The episode concludes with predictions about the future of AI in business, emphasizing continuous adaptation and learning. Rudina Seseri: "Ambient AI will increasingly become a part of professional environments, enhancing productivity."
Actionable Advice
- Prioritize clean, well-organized data to leverage AI effectively.
- Embed AI within core business processes rather than as standalone projects.
- Cultivate a company culture that embraces AI through education and transparent communication.
- Consider the scalability of AI solutions to ensure they grow with your business needs.
- Regularly evaluate the impact of AI on your business metrics to align strategy with outcomes.
About This Episode
SaaStr 738: How Shopify Implements AI Across Sales and Product with Mike Tamir, Head of AI at Shopify, and Rudina Seseri, Managing Partner at Glasswing Ventures
At SaaStr AI Day, Mike Tamir, Head of AI at Shopify, and Rudina Seseri, founder and Managing Partner at Glasswing Ventures, level-set about where we are in the cycle for Enterprises adopting AI and the critical work being done at Shopify to leverage AI and solve real problems.
Mike and Rudina do a great job of explaining the complex AI terminology for everyday non-technical founders and leaders to understand and apply to their businesses.
People
Mike Tamir, Rudina Seseri
Companies
Shopify
Books
None
Guest Name(s):
None
Content Warnings:
None
Transcript
Welcome to the official SaaStr podcast, where you can hear some of the best SaaStr speakers. This is where the cloud meets. Hey everybody, we're so close now to SaaStr Europa 2024. Me and the entire SaaStr sales team and over 3,000 SaaS and cloud enthusiasts will be together June 4th and 5th in London, England. Everyone from around the world will take over London.
We're down to the last-chance tickets, so use my code JASON20 to save 20% off the last tickets. That's JASON20. Why pay more? Feeling stuck in your talent search? Remote Talent is a leading job board for remote-first companies and startups. Visit remote.com/jobs today to start building your dream team anywhere in the world.
That's remote.com/jobs.
Get more when Northwest Registered Agent starts your business. They'll form your company fast and stand up your entire business identity in minutes. That means free business domain, business email, website hosting, business address, mail scanning, and a business phone app, all within minutes. Visit northwestregisteredagent.com/saastr to start your business today.
This episode is sponsored by our good friends at Pendo, the all-in-one product experience platform. Pendo brings together product analytics, guides, discovery and replay so you can improve your product experiences for every customer and employee. Try Pendo for free at pendo.io/saastr. That's pendo.io/saastr to try Pendo for free.
Up today is a great session from Shopify's Head of AI, Mike Tamir, who joined us at our first AI Day. We're gonna do a huge all-day AI Summit at SaaStr Annual this year in September. You should be there. And this was a gift. Together with Glasswing Ventures, they went all across how Shopify actually uses AI across its sales and product.
So this is a great one for folks wondering how AI actually works in SaaS and B2B. Hello everyone. What Mike and I thought about doing today is: how does one think about AI adoption, both at an individual level and in a business context, as well as the critical work that Mike is doing at Shopify and has previously done at Uber, Susquehanna, amongst others, and the cutting-edge work he does teaching at UC Berkeley. So by way of background, I'm managing partner at Glasswing Ventures.
Rudina Seseri
We are an AI originalist firm. We in fact founded Glasswing back in 2016 with the goal of investing in AI-native companies that solve real problems for the enterprise and cybersecurity markets. Mike, with that, would you like to do a little bit more on your background? I think I would be doing it a disservice. Well, I think you did more than enough.
Mike Tamir
But just to recap, I'm Head of AI here, working with our distinguished ML scientists and running the ML and AI functions. Lovely. So let's delve in. In the first part of this conversation, I will do a bit of level-setting around what's going on with AI for enterprise adoption and where we are in that cycle. An important level-setting point is the fact that today there are many initiatives across large and mid-sized enterprises around AI. In fact, we tell our AI-native companies not to lead with AI when they're speaking to prospective customers, because it's such a top-of-mind topic that it's only a superficial indication of interest.
Rudina Seseri
But most of the engagement, with few notable exceptions, is still at that superficial level, with early measurements, if any, of the impact on the enterprise on an adoption basis. Now, we have seen a few green shoots, if you will, in this space. Particularly, for example, what we saw with Klarna and the impact they have had by bringing AI from a siloed project or customer-interface-level usage to their core business, where they automated the work of 700 humans, over two thirds of customer success is now AI-supported and AI-generated, and it contributed $40 million to their bottom line. And it is because they brought it to their core product, to their core business. And as we expect in the mid and long term, the impact is sizable. But they are noteworthy because they are one of the few companies, and again, Mike will talk to what Shopify is doing as another leader in that category, that have brought AI to the core business.
So what is the mindset, what is the actual work that needs to be done around building the infrastructure to leverage AI for the core business and the highest impact? I would articulate it along three dimensions. One is software and the technology infrastructure, whether it's the tech debt, whether it's the tech decisions, whether it's the data availability, et cetera: getting the enterprise, or the relevant facets of the business, to be AI-ready. The other piece is really around cleaning the data. So one is the algorithmic, technology side.
One is the data that you're going to feed it. And oftentimes there is an almost naive assumption that the data will be easy to leverage: oh, it's an asset, we'll use it and leverage it. And it turns out that is oftentimes the biggest problem. And thirdly, culture: the culture of adopting AI.
And again, we'll delve more into this as we proceed in the discussion. So where are we heading, therefore, as we try to adopt AI in the enterprise? I have a term that I use called ambient AI, which is basically that we, thinking of ourselves as productivity-driven professionals, will be surrounded by AI and leveraging it in this copilot dynamic. One wants to wrap one's mind around: what does it mean for me to be leveraging ambient AI? If the input is you, the business individual, leader, contributor, your activities and your environment, and you think of that as the input into the existing AI tools that you've surrounded yourself with in this copiloting dynamic, then the outputs are insights that you wouldn't have been able to derive on your own.
And secondly, the automation of the mundane, the low-productivity, highly repeatable work, to free one up to be more valuable. So as we think about our roles as professionals and what will make us successful going forward, it's worth considering that, in addition to the well-tried paths to high impact and high productivity, people and professionals will have a leg up if in fact they are leveraging the tools, if in fact they are intelligent business workers, not simply in the intellectual sense, but in leveraging the intelligent tools. And as we shift to the enterprise, it's really these topics of data, infrastructure and culture. I will use this layout and framework to turn it over to Mike to talk about the impact, starting with the data.
My first question, stated here: how do you know if you have the right data? And is data really, as I claimed, oftentimes the biggest bottleneck and the biggest problem? For this business audience, not necessarily a technical audience, how do we think about that? First, absolutely, you couldn't be more correct that data is often the messiest part. There's the time that you think, well, we have the data, why don't we just get to building the model, when in reality it's one of these iceberg cliches, where most of the time you spend is on gathering the data and cleaning the data and making sure that it's ready, that it's of the right quality.
Mike Tamir
And even once you begin the modeling, you're not done. Modeling is really an iterative process where you train your models, you see what signal you're getting from them, you see how good your performance is, and this speaks to the measurement and making sure that you have the right measures. And of course, every different ML technique is going to be slightly different. But very generally, and certainly when you hear about LLMs and a lot of the modeling techniques that are in vogue, it's trial and error, right? A model will see an example: this is the information we have, and this is what we're trying to predict.
It'll take the information that you give it ahead of time and make its own prediction, and then look at how far it missed: how far what it tried to predict is from the truth of what it should have predicted. Then it'll take that error. We hear that failure is the key to learning; this couldn't be more true than in machine learning.
Machine learning literally takes that error and, with a bit of math, tries to adjust all the little parameters, all the little calculations that it makes along the way from input to output, in order to make that error just a little bit smaller the next time. And this leads into a lot of the things that you have to do with your data as best practices. So if I take all of my training data and I teach my model with my training data to close that gap between its predictions and what it should have done, then I might be overfitting. Overfitting is when your model has taken that time to close the gap between its predictions and what it should have done, but what it has learned applies only to those specific examples that you gave it to learn from. Right.
And so it has overfit to those specific examples. But we don't want to build models that can just take the inputs you give them and return the outputs you told them to learn; we already have that data. What we want it to do is learn the pattern. And to learn the pattern means we have to do all of these different things to constrain the model in various ways, so that it predicts well not just on the data that you're giving it, but also on data it has never seen before.
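To make that error-reduction and generalization idea concrete, here is a minimal sketch in Python. Everything in it is a toy assumption rather than anything from Shopify: it fits a simple model and an overly flexible one to a handful of noisy points, then compares the error on the examples the models trained on against the error on held-out examples they never saw.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: a simple underlying pattern (y = 2x + 1) plus noise.
x = rng.uniform(-1, 1, size=30)
y = 2 * x + 1 + rng.normal(scale=0.3, size=30)

# Hold some examples back; the model never trains on these.
train_x, held_out_x = x[:20], x[20:]
train_y, held_out_y = y[:20], y[20:]

def mean_squared_error(coeffs, xs, ys):
    """How far, on average, the model's predictions miss the truth."""
    preds = np.polyval(coeffs, xs)
    return float(np.mean((preds - ys) ** 2))

for degree in (1, 9):
    coeffs = np.polyfit(train_x, train_y, deg=degree)  # "training" the model
    print(
        f"degree={degree:2d}  "
        f"training error={mean_squared_error(coeffs, train_x, train_y):.3f}  "
        f"held-out error={mean_squared_error(coeffs, held_out_x, held_out_y):.3f}"
    )

# The degree-9 model closes the gap on its own training examples, but tends to
# do worse on the unseen points: that widening gap is the overfitting described
# above, and the held-out set is what catches it.
```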
And we usually call that validation data and, ultimately, test data: different stages for figuring out how this model is going to predict accurately out of sample, outside the training examples. So if I am a P&L owner in XYZ business, and I've had this data for 20 years, and Mike's team is helping me sort this out, why is it taking them multiple weeks? What's so hard about it? What's so hard about it? There are two parts to this, which also speaks, I think, to the culture, right?
Especially in a world where you can talk to an LLM and it seems like it's giving you the right answers, and you want to say: you only live once, let's just use whatever the answers are. I'm not going to name any deployments that have happened in the last several months, but we all know the news headlines where maybe that didn't work out so well. So having the culture of checking your results and making sure that you have the right metrics to evaluate, whether it's an LLM or a more traditional model that's not generating text or images but just estimating numbers or classes for whatever the task is, having that culture where you are focusing on what your metrics are and evaluating those results, is really important. On the why-is-it-taking-so-long side of things: when you take the data
(and we're going to have a very nice concrete example of this towards the end of this talk), it can be tricky. The business goal that you have might be something like: I want to make more sales, or I want to delight my customers and see that in feedback. Whatever that goal is, we have to translate it, speaking to the culture of measuring our results, into a concrete metric. And so this is going to be some sort of mathematical function saying: how badly did my prediction miss relative to what I said the right answer was?
And hopefully it's very close and you get some fireworks. Now, figuring out that answer is actually very complicated. It's complicated in finance. It's complicated in producing recommendations or returning search results for a query. That gap between what I want to mathematically measure, which is how the model is going to learn based on that error, and what my actual product result is, really matters.
Another area where you really have to think about what matters is how the data is actually structured. If I give a bunch of examples where this is the kind of input and this is the output I would expect, or this is the kind of output I would not expect, the model is going to learn that specific pattern. So if you give it bad examples or dirty examples, either examples with the wrong answers, or examples where a negative case isn't as informative as an almost-positive case, and we're going to get to a concrete example of that soon, having clean data is really going to be the lifeblood of how the model works and ultimately how successful the product is. Got it? All right.
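As a hedged illustration of why those wrong-answer examples matter, here is a toy sketch, not Shopify's pipeline: the same deliberately simple classifier is trained twice, once on clean labels and once on data with a systematic labeling mistake, and then scored on held-out examples.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy "relevance" data: two features per example, label 1 = relevant, 0 = not.
n = 400
X = rng.normal(size=(n, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # the true underlying pattern

X_train, y_train = X[:300], y[:300]
X_test, y_test = X[300:], y[300:]

def fit_centroids(features, labels):
    """A deliberately simple model: remember the average point of each class."""
    return {c: features[labels == c].mean(axis=0) for c in (0, 1)}

def accuracy(centroids, features, labels):
    dist_to = {c: np.linalg.norm(features - centroids[c], axis=1) for c in (0, 1)}
    preds = (dist_to[1] < dist_to[0]).astype(int)
    return float((preds == labels).mean())

clean_model = fit_centroids(X_train, y_train)

# "Dirty" data: a systematic mistake, every example with a large first feature
# was given the wrong label (imagine one product category mislabeled in bulk).
dirty_y = y_train.copy()
dirty_y[X_train[:, 0] > 1.0] = 1 - dirty_y[X_train[:, 0] > 1.0]
dirty_model = fit_centroids(X_train, dirty_y)

print("held-out accuracy, clean labels:", accuracy(clean_model, X_test, y_test))
print("held-out accuracy, dirty labels:", accuracy(dirty_model, X_test, y_test))
# The model trained on the mislabeled data learns a skewed boundary, and the
# held-out score drops accordingly: garbage in, garbage learned.
```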
Rudina Seseri
Continuing our discussion on the infrastructure side: if you are a productivity business user, you can only affect the decisions that get made at the infrastructure level so much. What should one know? What constitutes a modern AI stack that enterprises can scale on? Ultimately, that's going to be a little bit dependent on the use case and the context, right? Every company is going to have slightly different needs and slightly different applications when it comes to doing the more advanced training and the deep learning models that I've been describing.
Mike Tamir
You're going to need GPUs, for one thing, right? You may need a single GPU if you're a very small startup trying to work on a budget; you may need a cluster of GPUs, in which case you need infrastructure management. In the last several years there has obviously been a very big shortage of GPUs, and so the existence of cloud platforms, AWS, GCP, Azure, has historically been very helpful for sporadic needs.
Oh, I need to rent a GPU for a few hours to do my training and then I don't need it anymore, versus purchasing or a long-term rental, where you reserve an instance for days or for months, and that's oftentimes going to end up costing thousands of dollars. So being able to size up what kind of training you need, and then picking what level you need, whether ad hoc, as a reserved instance, or maybe buying your own infrastructure, which is also often cheaper if you already have an on-prem infrastructure system, is really going to depend on what your anticipated AI training needs are.
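As a back-of-the-envelope way to frame that sizing decision, here is the arithmetic with entirely made-up prices; real GPU rates vary widely by provider, region, and contract:

```python
# Hypothetical numbers, for illustration only.
on_demand_per_hour = 4.00      # rent a GPU ad hoc and pay only while training
reserved_per_month = 1500.00   # reserve an instance whether you use it or not

hours_of_training_per_month = 60

ad_hoc_cost = on_demand_per_hour * hours_of_training_per_month
print(f"ad hoc:   ${ad_hoc_cost:,.0f} per month")
print(f"reserved: ${reserved_per_month:,.0f} per month")

# With these made-up rates, ad hoc rental wins below the break-even point and
# reserving (or owning hardware) starts to pay off above it.
break_even_hours = reserved_per_month / on_demand_per_hour
print(f"break-even at about {break_even_hours:.0f} training hours per month")
```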
Rudina Seseri
So I'm going to throw a curveball. If we think about the migration that enterprises made from on-premise to the cloud, the underlying value prop was that the variability is fairly cheap and you can ramp up fairly fast. In fact, I have this term that I use, that it's a variable fixed cost on your P&L, meaning that you can scale it up and down, but the line will always be there for you as a business, and that's the cloud cost. As we move to the world of AI-nativeness and leveraging it, is it your belief that the existing incumbents, with whatever we're seeing from the large foundation model providers, will become the Swiss Army knife?
I'm speaking, for example, to the AWS and Amazon approach, where they're saying: listen, you want to use OpenAI, you want to use Anthropic, we have our offering, we have Bedrock, you don't have to switch, it's going to be one more feature. So does this line remain, or do we have a new fixed-variable line? It's a very good question. And certainly when you talk to AWS, when you talk to GCP, they all have partnerships, and obviously Azure very much so with OpenAI; they have these partnerships.
Mike Tamir
But it's worth being aware that using a first-party solution, let's say with OpenAI or with Anthropic, is going to have different performance, different latency, than if you go via an intermediary like a cloud. So you maybe get some simplicity by not having to onboard different vendors and different support, but there is going to be an impact in terms of performance, and that, while it might be a temporary problem, is certainly a very real consideration. Is that a high pain point at this moment in time, the delta in performance, or do we believe it's transient?
For the use cases that I've had, my view is that right now it's worth not ignoring. And something else, maybe hopefully transient for '24, is the availability of ad hoc GPUs when you go higher than, say, an L4. If you need an A100 or an H100, something that can host some of these larger open source large language models (we talked about commodity models), what if you want to run something like a 70-billion-parameter model, a Mistral or a Mixtral or any of these?
Those are going to be very hard to get on an ad hoc basis, and you're going to end up having to pay the price for a reserved instance if you want to be able to serve something like that. You've now touched on the trade-offs between using open source platforms versus existing walled-garden incumbents. Culture: how do you effectively implement AI from a cultural, process, and people point of view? One of the beautiful but also double-edged things that is happening is the ease of libraries.
In the olden days, ten years ago, if you wanted to put together a model, a neural net model, you would probably have to build it up layer by layer. You'd have to design it yourself for the most part. Now you get a pre-trained model, that is to say a model that somebody else has trained, and then you're just going to tweak its decisions a little bit to adapt it to your particular data.
Or take the days of coding up your own specific classification model, or the whole host of supervised and unsupervised models you might want to build: the days of having to have ML experts do that by hand are largely gone, or at least that usually doesn't happen. At most you're doing tweaks, minor corrections. And what that means is that you can have non-experts who are not used to a lot of the things I've already touched on, like: how do you make sure that your model's not overfitting?
How do you make sure that your data is clean? How do you make sure you have the right metrics, so that you don't have an embarrassing situation where your LLM offered some free service and now you're on the hook for actually backing up whatever your LLM did, even though it wasn't trained or aligned, as it's called, to avoid that particular mistake? So it's about enabling your engineers to leverage all of these great use cases, including the models that seem to take care of themselves (sometimes deceptively so), while making sure that you still have very rigorous metric supervision and evaluation of how your models are doing and when the worst cases are going to happen. Making sure that you stratify, another technical term, which means you consider all the different situations, carved up into little boxes, so that you're not just performing well on one, you're performing across the board on all of the different use cases. Having that strong culture to back up the performance of your model before you go into release is probably one of the biggest issues in having a strong ML culture as it gets easier and easier for non-experts to start leveraging these tools.
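A minimal sketch of that stratification idea, with invented segment names and outcomes purely for illustration: instead of one overall score, the same metric is reported per slice, so a weak pocket cannot hide behind a good average.

```python
from collections import defaultdict

# Hypothetical evaluation records: (segment, was the prediction correct?)
results = [
    ("apparel", True), ("apparel", True), ("apparel", True), ("apparel", False),
    ("electronics", True), ("electronics", True), ("electronics", False),
    ("long_tail_queries", False), ("long_tail_queries", False), ("long_tail_queries", True),
]

overall = sum(correct for _, correct in results) / len(results)
print(f"overall accuracy: {overall:.2f}")

# The stratified view: the same metric, computed inside each box separately.
by_segment = defaultdict(list)
for segment, correct in results:
    by_segment[segment].append(correct)

for segment, outcomes in sorted(by_segment.items()):
    print(f"  {segment:18s} {sum(outcomes) / len(outcomes):.2f}")

# A strong segment can mask a weak one in the overall number; the per-segment
# breakdown is what shows where the worst cases will actually happen.
```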
Rudina Seseri
And in a five-minute conversation with a business owner, you have the opportunity to completely overwhelm them by throwing out all sorts of technical terms. If that is not what they have done day in and day out, but they're very good at running the P&L, and all of a sudden we are in a new paradigm, how can they stress-test what you are doing? They don't speak your language, you throw out the three-letter acronyms; where do they find themselves? It's a wonderful question.
Mike Tamir
I love it. Yeah. So it's very easy to throw out technical acronyms and to go deeper than is really necessary when you're coming from a technical perspective. Understanding exactly what a transformer model does, or what a multi-head attention network does, and why you need rotary versus non-rotary positional embeddings, or different softmax loss functions and temperature, honestly doesn't make a big difference in terms of making the business decision. Is this model going to perform the way I want?
Is it going to perform out of sample, that is to say, when we're live? Are we cleaning our data? Do we have metrics that match what the model was trained to do, the loss function, the learning metric, how far my prediction was from the truth, versus what the product is supposed to do?
Asking questions about whether those match is going to be the most valuable thing for you to dig in on, because that's what's ultimately going to determine how your AI-driven product performs and succeeds. I couldn't agree more. So let me turn it over to you to delve into some of these examples. Yeah. Shopify is one of the largest e-commerce platforms in North America.
And in particular, that means making sure that we can answer a fundamental question: if a buyer has a query and is searching with that query, which products match it, which ones are relevant matches, and which ones are irrelevant matches. That becomes relevant when there's a particular merchant, a single shop using Shopify, and somebody goes to that shop and wants to search for particular products within it. We also have the Shop app, where you're searching a much broader e-commerce experience across multiple merchants and multiple products hosted across several different providers. And so search relevance in those search contexts, the kind of search contexts that as an end user you're very familiar with, is first and foremost very important. Then there's also just making sure that you have high quality.
Right. If I search for boots and I end up with boot socks and boots and ski boots and a lot of things that might not be relevant, I need to be able to evaluate that. And I need to be able to evaluate that, say, on the first N results, to make sure that when we do these key searches, we're giving the end-user buyers the right kinds of results. And to do that, we need to have an evaluation method. Right.
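One common way to turn "are the first N results right?" into a number is precision at N. This is a generic sketch with made-up relevance judgments, not Shopify's actual evaluation code:

```python
def precision_at_n(result_relevance, n):
    """Fraction of the top n results that a human judge marked as relevant."""
    top = result_relevance[:n]
    return sum(top) / len(top)

# Hypothetical judgments for the query "boots", in ranked order:
# 1 = relevant (actual boots), 0 = not what the buyer meant (boot socks, ski wax, ...).
judged_results = [1, 1, 0, 1, 0, 0, 1, 0, 1, 0]

for n in (3, 5, 10):
    print(f"precision@{n}: {precision_at_n(judged_results, n):.2f}")
```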
How does this product match with this query? And again, this goes back to what I've been banging the drum on all along: you need to care about your metrics. So this is as much a metric as a product in itself. Mike, if I may interject, because I think we can't stress the metrics point enough: also, not all metrics are created equal across different businesses.
Rudina Seseri
So what's a good-enough metric? I'm going to make up a basic example. If I search for an item on Shopify, the boots example, and I get the right item 50% of the time, that might be great. Or not. You will tell us; what's great in one context may be completely insufficient in another context.
So tell us a little bit about how not all results are created equal, back to this metrics point, and how context matters. Sure. Yeah. And maybe to your point, in product search, there are two dimensions.
Mike Tamir
There are two principal directions that you might want to have relevance in, right? One might be: I'm looking for a product and so I want to see that product. I'm looking for boots, I want boots.
Another might be: I'm shopping for boots, and so I'm going to buy boots, and I'm going to buy socks that match the boots, and maybe some cleaner that keeps my boots in nice condition in the wintertime. So, cross-selling and relevance, right. And these can actually be two different metrics. Maybe what you would want to do is have one metric that scores direct relevance and another that scores complementary relevance.
Right. And then it is somewhat a product decision how much you want to balance these, but it's also something that you can balance against GM, against your bottom line. If we balance so that, say, 50% are going to be direct, on-the-nose relevant and 50% are going to be more complementary, and we then tweak that to 60/40 or 70/30, you might actually see a change in your bottom line.
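A hedged sketch of how that direct-versus-complementary balance can show up as a single tunable parameter; the product names, scores, and weights below are invented for illustration:

```python
# Hypothetical per-product scores from two separate relevance signals.
candidates = {
    "leather boots":     {"direct": 0.95, "complementary": 0.10},
    "hiking boots":      {"direct": 0.90, "complementary": 0.15},
    "wool boot socks":   {"direct": 0.20, "complementary": 0.90},
    "boot cleaning kit": {"direct": 0.10, "complementary": 0.80},
}

def ranked(direct_weight):
    """Blend the two signals; whatever weight is left goes to complementary items."""
    blended = {
        name: direct_weight * s["direct"] + (1 - direct_weight) * s["complementary"]
        for name, s in candidates.items()
    }
    return sorted(blended, key=blended.get, reverse=True)

# The 50/50 versus 70/30 trade-off is just this knob; the "right" setting is
# whichever one moves the business metric you actually care about.
print("50/50 blend:", ranked(0.5))
print("70/30 blend:", ranked(0.7))
```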
And that's the way your product, your end goal, is going to be determined by certain parameters that you choose in your actual model deployment. So, back to this interdependency between your teams, your organization, and the P&L owner or the merchandiser, whoever is responsible for the business unit and that bottom line: how does that interaction back and forth work? Whatever tweaking you are doing, a 40% or 50% split, or a 2% improvement, has ramifications for their bottom line, but also for the resources that you have used. We wanted to express the metric in terms of results for the end customer in this example, but what about from an ROI perspective: how much do you have to put in, to get what out?
Yeah. So there are a couple of aspects to that. One is, as a data person, obviously I love seeing the numbers and letting the numbers tell me. So having experimentation, being able to experiment with, to take the simple example, how you balance these two kinds of experiences, is one answer. And then you need to invest in experimentation frameworks, right?
And hopefully, again, these things get easier and easier, as there are existing open source frameworks and tools and vendors that you can leverage for that. That's one answer. The other answer is really keeping close communication with your stakeholders. This really speaks to the culture aspect of it, right? Just having ML engineers and ML scientists working in a vacuum, then coming back and saying, here's your model,
I'm going to go work on the next fun technical thing: that is a mistake, and it has been a mistake; we've seen different versions of this back when my job title was called data scientist and that was the only kind of title there was. What you need is a pretty mixed team: you need engineering for serving, you need your ML scientists and engineers, and you need product, product in the trenches with your ML builders, so that they can understand the pros and cons of not just tweaking a parameter and what a metric is, but also that key question I keep driving home: does your end-goal product metric, what you want to get to, match the metrics that your model is trying to learn from? That's going to make a huge difference in terms of what you actually see in your bottom-line results. Got it.
Rudina Seseri
Vectorization and multimodal LLMs. LLMs most people have heard of in one form or another. Multimodal: initially we started with text to text; now it's text to image, text to video, 2D to 3D, et cetera. Give us vectorization in plain English, and then how it bears relevance in the context of the exchange we've just had.
Mike Tamir
Yeah. In the good old days it was keyword search, right? And people had learned, in the good old days, maybe 15 years ago, that when you're searching, you search for one word; you don't really phrase it as a sentence. You don't talk to Google that way, or at least you used not to, because it's just looking for the main words.
It's ignoring what we call the unimportant words and trying to match the rest against the products, or the documents if you're doing a search on web pages, or whatever it is that you want to search. Vectorization is a somewhat complex topic, but it really can be boiled down to a very simple image, which is a multidimensional space. You can imagine that every word in its context and every image in its context can be represented as a point in that space. And if you're doing your vectorization correctly, then similar words from similar contexts, or similar images, are going to end up located in a similar part of the space, right? So all your cat pictures are going to end up here.
All your dog pictures are going to end up close to the cat pictures, but not in the same place. All of your fireworks pictures, and we've seen a lot of those, are going to end up far away from those. And if you do it right, your image vectorization, where your pictures end up as points in that space, is going to land in the same part of that space as the corresponding words. So if I describe a cat, it's going to end up where the cat pictures are; if I describe a dog, it's going to end up where the dog pictures are.
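A toy sketch of that vector-space picture. These are hand-made three-dimensional vectors standing in for real embeddings from a trained model; the point is only that nearby vectors mean similar content, and a text vector can land near the matching image vectors:

```python
import numpy as np

def cosine_similarity(a, b):
    """Close to 1.0 means pointing the same way in the space; near 0 means unrelated."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hand-made toy vectors standing in for learned embeddings.
vectors = {
    "photo: cat":          np.array([0.9, 0.1, 0.0]),
    "photo: dog":          np.array([0.6, 0.7, 0.0]),
    "photo: fireworks":    np.array([0.0, 0.1, 0.9]),
    "text: 'a small cat'": np.array([0.95, 0.15, 0.05]),
}

query = vectors["text: 'a small cat'"]
for name, vec in vectors.items():
    if name.startswith("photo"):
        print(f"{name:18s} similarity to the cat description: "
              f"{cosine_similarity(query, vec):.2f}")

# Done right, the cat description lands next to the cat photo, nearer to the dog
# photo than to fireworks, and that proximity is what semantic search retrieves on.
```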
And what do I mean by "in its context"? What if I say "bank"? I could say: the robbers were leaving the bank, and they were going at high speed and crashed into the river bank. I've said "bank" twice there, and those are effectively very different words, right? The second one, in that context, is a place by a river.
The first one is a place that has money. And so modern techniques for vectorization have done a very good job of looking at the rest of the sentence, looking at that context, and figuring out that the first instance of the word goes with pictures of institutions that manage money, and the other one goes with pictures of rivers and that sort of thing. What Mike just described is about 80% of the backbone of GenAI, and why we have the ability to interact with some of these applications in plain English and get human-like interactions: because they leverage these vector technologies and deliver what's called embeddings, which is really vectorization through deep learning. Just so that folks actually internalize what this all means. Yeah.
And when you hear embeddings, it's pretty much a synonym for how I'm using the word vectorization: embedding the word or the image into a vector space. So Mike, you're using this in how you're constructing, whether it's the merchandising, the images, the text, the descriptors, pricing. Are you leveraging AI for dynamic pricing of any kind? Dynamic pricing is a complicated area. And very generally, I know from past lives that you never want to do dynamic pricing; you want to do dynamic discounts, as a product-framing
bit of advice. Now, certainly having tools that help merchants understand things like demand and supply curves and those trade-offs is something Shopify is interested in providing for merchants. They are the first-party customers of Shopify; we want to give them the tools so they can be effective entrepreneurs. Real quick, just to frame the evolution of what's going on here: we started with keyword matching, then we might compare vectors, right? And typically, as I had on the previous slide, cross-encoding and two-tower architectures.
These are just very technical neural network architectures, where we either look at the vectors at the same time or look at the vectors separately. And if I do a query, I don't want to have to compare that query against every single product across all of Shopify, or even within a single large merchant, because it's not scalable; if you have hundreds of millions to billions of products to search, you can't do that. And by not scalable we mean it's very expensive from a compute perspective.
Yep, expensive, and honestly prohibitively so; it would take too much time to return results. So that's one side of it, and that is serving: when somebody issues a query, you're going to want to serve those results fast. Now, if you just want to look at quality, and you can do smaller quality checks, then you can look at the query and product simultaneously and get a little bit more information. The same story is now true with LLMs.
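A rough sketch of that scalability point. The vectors are random toys and encode_query is a stand-in for whatever real model would produce the embeddings; the point is that the two-tower style lets you precompute every product vector once and score a new query with cheap dot products, whereas a cross-encoder would have to run a full model on every query-product pair:

```python
import numpy as np

rng = np.random.default_rng(2)

# Pretend catalog: in a two-tower setup every product embedding is computed
# offline, once, and stored.
num_products, dim = 100_000, 64
product_vectors = rng.normal(size=(num_products, dim)).astype(np.float32)

def encode_query(text):
    """Stand-in for a real query encoder; returns a vector in the same space."""
    seed = abs(hash(text)) % (2**32)
    return np.random.default_rng(seed).normal(size=dim).astype(np.float32)

# Two-tower scoring at query time: one encoding plus a batch of dot products.
query_vec = encode_query("winter boots")
scores = product_vectors @ query_vec
top5 = np.argsort(scores)[-5:][::-1]
print("top product ids:", top5.tolist())

# A cross-encoder would instead run a full model forward pass on each
# (query, product) pair, 100,000 passes per query here, which is the
# prohibitively expensive path for serving, though fine for spot-checking
# quality on a small sample.
```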
LLMs, if you domain-adapt them, that is to say, you train them on your data and have them learn the specific behaviors you want to see, can also be very good at taking in that fuller context: I look at this query, I have this product, are they relevant together? Or: I have this product, what are the relevant features? What are the attributes? What are the sorts of structured features?
If you have a t-shirt, it might have a v-neck, it might have a turtleneck, it might have different colors. What sorts of things can I extract from this image and this product description that I can use in search queries down the line? LLMs are remarkably, almost too-good-to-be-true, effective at this sort of work. Good. What are the cons?
They're very slow and they're very expensive. And if you pay for a commodity LLM, which right now does usually beat out open source LLMs, you're going to end up paying per token, and you can't scalably do that for every query or every kind of use case once you have a business of any real size. So what do you do? You might want to take an open source model and do what we call distillation.
Distillation is a technical term. It means I have a small model, and I have the answers that I've gotten from my big model; I know what goes into the big model and what comes out of the big model. So I train my little model to mimic the big model's answers. And how is that different from fine-tuning? So with fine-tuning, you might have any size of model.
The parameters are already trained on a lot of data by someone else, for their data. That data might be close to yours, but it's not exactly the same as yours. So if I'm going to fine-tune my model, I'm going to take my specific data and adapt the base model to my specific domain, on my data. In an LLM context, LLMs these days are really remarkably great at grammar, at understanding the subtleties of language, and often at doing so multilingually. There's a lot of baseline capability where I don't want to put in the work to rebuild that.
Right. That work has been done, and done very effectively, and there's no reason for everyone to be another OpenAI or another Anthropic when it comes to the baseline. But when it comes to my specific data, whatever it is, I'm going to want the model to be extra good at those nuances. And there are all sorts of ways of fine-tuning.
There's a whole lot of technical literature on different ways that you can adapt your model to your specific data; that, in the main, is what fine-tuning is. Back to your example: you were going to take the LLM, the large language model, and then distill it into almost a small language model. That's separate from fine-tuning; the fine-tuning is above and beyond, it sits on top of the base model.
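To make that distinction concrete, here is a schematic toy sketch. The "big model" is just a fixed function and the "small" and "base" models are trivial polynomial fits; none of this is a real LLM workflow, it only mirrors the two ideas: distillation trains a small model to mimic a big model's answers, while fine-tuning nudges an already-trained base model toward your own data.

```python
import numpy as np

rng = np.random.default_rng(3)

# A toy stand-in for the big commodity model: expensive in real life,
# here just a fixed function we can query for answers.
def big_model(x):
    return np.sin(3 * x) + 0.5 * x

# Distillation: collect the big model's answers and train a small model to mimic them.
x_unlabeled = rng.uniform(-1, 1, size=500)
teacher_answers = big_model(x_unlabeled)
small_model = np.polyfit(x_unlabeled, teacher_answers, deg=5)

# Fine-tuning: start from an already-trained base model (here, the same fit,
# as if someone else had trained it on similar data)...
base_model = small_model
# ...then adapt it to *your* domain, which has a quirk the base model never saw.
x_yours = rng.uniform(-1, 1, size=100)
y_yours = big_model(x_yours) + 0.8  # your domain's consistent offset
# A minimal "fine-tune": keep the base model and learn only a small correction.
correction = float(np.mean(y_yours - np.polyval(base_model, x_yours)))

x_probe = 0.25
print("distilled small model   :", round(float(np.polyval(small_model, x_probe)), 3))
print("fine-tuned for domain   :", round(float(np.polyval(base_model, x_probe)) + correction, 3))
print("big model, for reference:", round(float(big_model(x_probe)), 3))
```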
Rudina Seseri
I just want to make sure people have it. Okay? Yes, you fine-tune your smaller open source model. And when I say smaller, it might be 7 to 70 billion parameters; not so small, just small compared to the bigger one.
Mike Tamir
Right. You might fine-tune it in order to mimic the performance of the large commodity model. Got it. I know we have some more content, but what do you wish I had asked you about your work that we didn't get to? Maybe it'd be worthwhile just spending 60 seconds talking about the data issues, because we Easter-egged that.
So in particular, let's think about search results.