Why 'open source' AIs could be anything but, the derailment risks of long freight trains, and breeding better wheat

Primary Topic

This episode delves into the complexities of "open source" AI, the risks associated with long freight trains, and innovations in wheat breeding.

Episode Summary

In a varied discussion, the "Nature Podcast" team explores three stories from the Nature Briefing. They start by examining "open washing" in AI, where companies such as Microsoft and Google describe their models as open without fully meeting openness criteria. The distinction matters because, under the European Union's AI Act, models deemed open face fewer requirements. The conversation then shifts to the derailment risks of longer freight trains, highlighting U.S. research showing that the odds of a derailment rise with train length. Lastly, the episode covers a large study of historic wheat landraces, whose genetic diversity could help modern crops cope with threats such as climate change and disease. Across all three stories, the episode touches on ongoing debates and legislative developments affecting these areas.

Main Takeaways

"Open washing" is misleading in the AI sector, with major companies failing to meet true openness criteria.
Longer freight trains pose greater derailment risks, sparking safety debates and potential regulatory changes in the U.S.
Historic wheat landraces are being brought into modern breeding programs to help crops cope with challenges like climate change and disease.
How "open" a model is deemed matters for regulation: under the EU's AI Act as currently written, open models face fewer requirements, including some transparency obligations.
The episode highlights the intersection of technology, policy, and practical applications in addressing contemporary challenges.

Episode Chapters

1: Open Source AI and Its Misconceptions

This chapter discusses the mislabeling of AI systems as "open" by major tech companies and the implications for regulatory and ethical standards. Nick Petrichow: "A lot of models saying the word open aren't necessarily hitting all these criteria."

2: The Risks of Long Freight Trains

Analysis of U.S. research indicating that the longer the freight train, the higher the risk of derailment. Dan Fox: "A 100-car train is 11% more likely to derail than two 50-car trains."

3: Breeding Better Wheat

Exploration of efforts to bring historic wheat landraces into modern breeding to improve traits such as disease resistance and nitrogen use. Benjamin: "Researchers went back into the past, looking for genetic and phenotypic diversity."

Actionable Advice

Evaluate AI Claims: Scrutinize the openness of AI systems and seek transparency in their development.
Safety in Transportation: Consider the implications of train length on safety and advocate for stringent regulatory oversight.
Support Agricultural Innovation: Engage with and support initiatives that aim to improve crop resilience through genetic diversity.
Stay Informed on AI Regulations: Keep up with legislative developments around AI to understand how they impact transparency and accountability.
Promote Sustainable Farming: Support and implement farming practices that leverage genetic diversity for crop improvement.

About This Episode

Many of the large language models powering AI systems are described as ‘open source’ but critics say this is a misnomer, with restricted access to code and training data preventing researchers from probing how these systems work. While the definition of open source in AI models is yet to be agreed, advocates say that ‘full’ openness is crucial in efforts to make AI accountable. New research has ranked the openness of different systems, showing that despite claims of ‘openness’ many companies still don’t disclose a lot of key information.

People

Nick Petrichow, Dan Fox, Benjamin

Companies

Microsoft, Google, Meta

Books

None

Guest Name(s):

None

Content Warnings:

None

Transcript

Nature Plus
The Nature podcast is supported by Nature Plus, a flexible monthly subscription that grants immediate online access to the science journal Nature and over 50 other journals from the Nature portfolio.

More information at go dot nature.com plus.

Benjamin
Hi, Benjamin from the Nature Podcast here. Slightly different show for you this week. We're going to be diving into the Nature Briefing and chatting about some of the stories that have appeared in it over the past couple of weeks. And joining me to do so are two members of the podcast team: Nick Petrichow. Nick, how are you doing?

Nick Petrichow
I'm very well, thank you, Ben.

Benjamin
And Dan Fox. Dan, how's it going today?

Dan Fox
Hello. I'm good, thanks, Ben.

Benjamin
So three stories to talk about today. Nick, why don't you go first? You've got something about AI this week.

Nick Petrichow
That's right. I've been looking at a story about something called open washing, which I was reading in Nature, by a good colleague and friend, Lizzie Gibney. So you guys, you know, you science journalists, you've probably come across the term greenwashing. This is a similar sort of idea. But instead of giving something more environmental clout than it deserves, this is about giving something, specifically AI, more 'open' clout than it deserves, right?

Benjamin
I mean, yes, 'washing' has negative connotations. So how is it affecting the world of AI, and who is potentially doing this washing?

Nick Petrichow
Well, potentially a lot of people. So this news article focuses on an analysis that has ranked different large language models, the models that power things like ChatGPT, on how open they actually are. It uses different criteria, such as how much of the training data people are able to see and whether people can get at the weights, which are the bits and bobs of the model that help it decide how much weight to put on something, that sort of thing. And yeah, it seems like a lot of models that are saying the word open aren't necessarily hitting all these criteria. So big players like Microsoft, but also Meta and Google, have said that things are open, whereas these researchers are saying, well, actually you're not really meeting some of these criteria.

Dan Fox
So if those are the offenders, who's looking good? Are there any actually open, open models?

Nick Petrichow
Well, the researchers highlight one standout model as being very open. This is a model called BLOOMZ, which has been built by an international and largely academic collaboration. The researchers point to it as a particularly open example because you can see its code and all the data that's gone into it. It's got a scientific paper around it, you can mess around with it, you can change things in it through the application programming interface, and all sorts of things like that. They say this one's a really good example. And whether or not something is open is going to be very important as the EU's AI Act comes into force, because under the legislation as it's currently written, if an AI is open or open source, then there are fewer requirements that it will need to meet. So the EU has very extensive transparency requirements, but if a model is deemed open, then you wouldn't have to meet them. And there are other obligations as well that you may not have to meet if your model is open. But the problem, and the reason this is an issue, is that there isn't really a good agreement on what it means to be open, or open source, in AI. And that's why the researchers have been trying to do this ranking.

Benjamin
We've covered a lot on the show, and you've covered yourself, Nick, the black box that is so often associated with AI, and how nobody really knows how their 'brains', in heavy inverted commas, actually work. So having an open system like this would presumably give researchers more insight into what's going on.

Nick Petrichow
Yeah, exactly. Because while you may never be able to know some aspects of what a model does, just because of the complicated way in which they work and the way they develop, like neurons coming together to make a network of weights and measures and all the rest of it, there are many things that we can know. We can know what a model was trained on, for instance. So if an AI starts spouting, you know, hate speech or something like that, and you're able to see what it was trained on, you may be able to understand why it's doing that, and that could help you improve it. And if you're able to modify the model as well, which you would be able to do if a model is truly open source, then you can better tweak it for different circumstances and avoid things like that. So yeah, the more open the better is certainly how many researchers view AI.

Benjamin
I mean, one could potentially argue that it's somewhat tricky for companies, because they want to protect the products that they've developed while at the same time trying to meet whatever the definition of open is.

Nick Petrichow
Yeah, I mean, that is an argument. And companies may also want to protect themselves from litigation: if something the model has been trained on turns out to be copyrighted, they could be liable for that. But again, there are different ways in which things can be open. There are other aspects that companies can make open, and some companies have shown that it is possible to be open. Now, from the company side, Lizzie reached out to a couple of the companies for the article, and a Google spokesperson says that the company is precise about language: it uses the term open rather than open source to describe its Gemma LLM, and it argues that existing open-source concepts cannot always be directly applied to AI systems. Microsoft said as well that it's being very precise, and that it has made many things available. Meta did not respond to a request for comment from Nature. So, you know, there are definitely arguments to be had. And as I said, there's not really a good definition. So one of the key things that needs to happen is for researchers to come together and actually develop a definition of what open source is. And in particular, the EU needs to define what it means. There are concerns that, as the EU tries to define this, the process could be at risk of lobbying by different companies. So we'll have to see how this one develops.

Benjamin
I mean, it seems like this is a debate that will run and run for a while, especially as more players get into the game.

But let's move on to our second story this week. And, well, it couldn't be more different. Dan, you've got a story about trains.

Dan Fox
Yeah, this is an article originally published in Scientific American about how longer freight trains drive up the risk of derailment. It's based on research done in the US looking at these enormous freight trains, over a mile long, the sort we don't really see here in the UK. The study was published in a journal called Risk Analysis, and it showed that the odds of a derailment increase as a train gets longer. So a 100-car train is 11% more likely to derail than two 50-car trains, and if you carry that on, a 200-car train is 24% more likely to derail than four 50-car trains. And that's taking into account that it's fewer trains. So even though that's one train versus four trains, once you get to that sort of length, it's 24% more likely to derail. These are still relative risks; derailments are still quite rare comparatively. But as the freight train industry in the States looks to bring down costs and be more efficient, these longer trains are being looked at more and more, and this could become a safety issue quite quickly.

Nick Petrichow
So longer trains, more risk of derailing is, I guess, the main point of this paper. But how have they worked this out? Is this based on, like, real life data or is this sort of simulations of trains as they get longer? Does something wacky happen?

Dan Fox
So they used quite an interesting method, actually. It's from real-life data, but the sort of data you would need to really build a sample, in terms of how many trains of what lengths are running all around the country, isn't publicly available. So they've used a method that's previously been used to study car accidents, called quasi-induced exposure. Basically, while you can't get the data about what trains are running, the Federal Railroad Administration does record when there is an accident. And when they record an accident, they get data on how long the train that had the accident was, its route and where it was in the country. Using that data, they picked a type of accident other than a derailment as a proxy: what's called a beat-the-train accident. That's when a car driver tries to get out in front of the train at a crossing before the train comes across, and gets hit by the train. Using that data along with the derailment data, they were able to build out this sample. The key thing is that beat-the-train accidents are hopefully independent of train length. Drivers who are trying to get out in front of the train aren't necessarily looking at how long the train is; they're just trying to get out in front of it to save themselves some time. That allows them to use this data to build a model.

Benjamin
And this model then suggests that one very long train is potentially riskier than four shorter ones that equate to the same length.

Dan Fox
Yeah, yeah. So the same number of cars. But if you put them all in one long run, that's statistically more likely to have a derailment than if you were to break those cars up into separate trains.
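
The quasi-induced exposure idea Dan describes can be sketched in a few lines of Python. This is a minimal illustration, not the study's actual model, which the episode doesn't describe in detail: the distribution of beat-the-train accidents across train-length bands is treated as an estimate of how much each band runs, and each band's share of derailments is divided by that exposure share. The function name `relative_involvement`, the two length bands and all the counts are hypothetical.

```python
"""Minimal quasi-induced exposure (QIE) sketch with hypothetical counts.

Assumption: 'beat the train' collisions are independent of train length, so
their spread across length bands stands in for exposure (how much each band
actually runs). Function name, bands and counts are illustrative only.
"""


def relative_involvement(target_counts, proxy_counts):
    """Return {category: ratio} of target-accident share to exposure share.

    target_counts: accidents of interest per category (here, derailments).
    proxy_counts: induced accidents per category (here, beat-the-train
    collisions), used as a stand-in for exposure.
    A ratio above 1 means the category is over-represented among target
    accidents relative to its estimated exposure.
    """
    total_target = sum(target_counts.values())
    total_proxy = sum(proxy_counts.values())
    if total_target == 0 or total_proxy == 0:
        return {}  # Nothing to compare.
    ratios = {}
    for category, proxy in proxy_counts.items():
        if proxy == 0:
            continue  # No proxy accidents, so no exposure estimate for this band.
        target_share = target_counts.get(category, 0) / total_target
        exposure_share = proxy / total_proxy
        ratios[category] = target_share / exposure_share
    return ratios


# Hypothetical counts by train-length band (NOT data from the study).
derailments = {"short": 20, "long": 30}
beat_the_train = {"short": 60, "long": 40}

ratios = relative_involvement(derailments, beat_the_train)
print({band: round(r, 2) for band, r in ratios.items()})
```

With these made-up numbers, long trains account for 40% of the proxy accidents (the exposure estimate) but 60% of derailments, so their ratio is 1.5, while short trains come out at about 0.67. A fuller analysis would typically fit a statistical model, such as a logistic regression of derailment versus proxy accident on train length, and adjust for other factors; this sketch keeps only the core ratio.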

Nick Petrichow
And, you know, you said that it's hard to get some of the data on this. So do they have any sense as to why it is that these longer trains are at risk of derailment?

Dan Fox
Not really. That's not what this paper's looking at. It's very much looking at the statistical likelihood of derailment rather than the reasons. They do a bit of a literature review in their paper looking at causes of derailment, which are the sort of things you might expect: different grades of rail, and areas that are checked more frequently being less likely to have derailments than areas that are less frequently used. But obviously, that doesn't explain why these longer trains specifically might be more prone to derailing. In the article, though, there is a quote from a former locomotive engineer who compares driving one of these very long freight trains to a Slinky toy, because of the couplings between each car. And they suggest that one problem that comes up in these very long trains is having a mixture of different cars carrying different materials. So the weight distribution can be different in different parts of the train, and that makes safely driving a train that could be, you know, a mile and a half long along the track very difficult.

Benjamin
And you mentioned at the start that the trend seems to be that more of these longer trains are potentially coming into service. I mean, what do you think this information will do for people who run the railways?

Dan Fox
I suppose, with this specific paper, the authors said they wanted to add this evidence into the discussion, and there is a discussion going on in the US around railway safety. There's currently a Railway Safety Act of 2023 being debated in the States, which, if enacted, would require the development of regulations regarding freight train length. Now, the authors of this paper say they're not against long trains, and they point out the benefits in terms of fuel consumption; they're just reporting this potential safety hazard. Having said that, an assistant vice president at the Association of American Railroads has disputed the study's risk estimates, saying it fails to take into account different types of train or car. They make a good point that a 50-car train in the study could mean a train with incredibly long cars or one with very short cars, because the authors used the number of cars rather than actual lengths. So perhaps more research is needed, perhaps with a bigger data set.

Nick Petrichow
Well, it sounds like this will be useful information as this railway act goes through all the legislative hurdles, and hopefully there'll be fewer train accidents in the future. But for now, I want to know what you've got on the Briefing this week, Ben.

Benjamin
Well, I've got a story that I read about in Science, and it's based on a paper in Nature, and it's about wheat, okay? A staple crop, of course, for many, many people around the world. This is looking at efforts to improve it, to give it new traits. And these traits have come from an unexpected place: they may be new traits, but actually they're from old wheat.

Nick Petrichow
I see. So is this more hybridization of wheat sort of crossing them together, or is this genetic engineering to introduce these traits?

Benjamin
Well, this story seems to focus on the crossing, the hybridization of wheat, and that is super important in the story.

This crop, right? So modern wheat was created during the 19th and 20th centuries, okay, through crossbreeding of a few key varieties. That created a wheat that was really high-yielding, but kind of vulnerable to disease, drought, things like this. And wheat is facing a lot of threats: climate change, fungal infection, these sorts of things. So in the hunt to give this wheat new skills, as I say, the researchers went back into the past, looking for genetic and phenotypic diversity in different types of wheat from different areas, okay? These are known as landraces. And these landraces come from an antique collection; many of these different sorts of wheat kind of disappeared a very, very long time ago. The collection was begun in 1924 by Arthur Ernest Watkins, here in the UK. He was studying wheat anatomy, and he amassed loads of different samples of grain from across the world: 32 different countries, as I understand, and thousands of samples.

Nick Petrichow
Man loves his wheat.

Benjamin
Well, clearly he did, right? And it turns out, not just him: curators have kept this collection going, sowing and collecting seeds every few years, except during World War Two, the article states, when some of these landraces were lost. And what's interesting is that it's kind of a snapshot in time, right? These wheat landraces were collected a long time ago, and the researchers behind this study really wanted to see what made them tick, I guess.

Dan Fox
So what were some of the exciting features of past wheats that the researchers found?

Benjamin
Well, they found a bunch of different things, actually, and it was an absolutely Herculean effort. The wheat genome is enormous, right? Lizzie and I have talked on the podcast before about how odd some plants are; they just have enormous genomes, and apparently the wheat genome is 40 times larger than the rice genome. So the team behind the work did a lot of sequencing, and the article says that they had to post a suitcase full of hard drives containing all this genetic data to their collaborators. In total, they sequenced 827 historic landraces and 208 modern varieties. A huge amount of work. One of the researchers describes it as a gold mine of genetic data. But genetic data on its own is one thing, right? Having a sequence is one thing. What the team had to do was work out which of these landraces could have desirable traits. So they crossed them with different wheat, they set up loads of different breeding combinations, and together they created over 6,000 unique populations of wheat, growing them in greenhouses and in fields in the UK and China. I mean, this is like a decade's worth of work. Then they measured the different traits and developed algorithms that could link those traits back to the genetic sequence. They could see, okay, well, that gene or that area of the genome seems to be important in making them taller, or making them less susceptible to heat, or whatever it is. And, as I said, it's kind of an interesting snapshot.

A lot of these samples came before the mass use of cheap fertiliser, right? So the researchers were interested in how some of these landraces would cope in low-nitrogen environments, because they must have grown before fertiliser was widely used. And it turns out they have found a cluster of genes that looks to be related to nitrogen use. Previous work related to this had found a gene that seems to give resistance to a fungal disease called wheat blast. Now, this is pretty bad; it can decimate wheat crops. And the paper says that breeding programs involving this gene have already started around the world, which is kind of interesting.

Nick Petrichow
No, that's super interesting. And obviously, as you said, it's a staple crop, it's essential for a lot of people in the world. But I wonder, how does this transfer from this massive experiment to farms and then eventually people's plates?

Benjamin
I mean, that's a really good question, and I think it is the long game. This is really the start; it's kind of a toolkit, as they describe it. Now, there are other efforts to enhance the ability of wheat to grow in salty soil, for example, and these sorts of things, but it takes a while, right? Breeding in a desirable trait is one thing, but you need to make sure you don't breed in an undesirable trait at the same time. It's not just a one-and-done. So there'll be lots of crosses and lots of plant breeding going on to make sure that wheat with these new old traits, or old new traits, I suppose, is available. And in the article, someone says that this could take a decade or even longer for this style of plant breeding to come up with the goods. So it could be a little while yet, but I think it really opens the door to researchers getting in there and working out, as I say, what makes wheat tick. And, well, it's a gold mine, as one of the researchers is quoted as saying.

Dan Fox
Well, that story's made me hungry for a sandwich, so I think that's time for lunch.

Benjamin
Agreed. I think that's a good place to end this week's Nature Podcast. Listeners, for more on those stories, and for where you can sign up to the Nature Briefing to get even more of them delivered directly to your inbox for free, head over to the show notes for some links. And all that remains to be said this week is: Nick and Dan, thank you so much for joining me.

Nick Petrichow
Thanks for having us.

Dan Fox
Thanks very much.

Nature Plus
Take a deep dive into the world of science with Nature Plus. From the vastness of distant star systems to the intricacies of infectious diseases due to climate change, we've got you covered. Enjoy access to over 55 cutting-edge journals, breaking scientific news, and over 1,000 new articles every month. Whether you're a seasoned researcher or just curious, Nature Plus simplifies complex studies. Plus, it's all available right at your fingertips on nature.com.

Nature, the key to unlocking the world's most significant scientific advances. Subscribe today at go dot nature.com plus.