How End-to-End Learning Created Autonomous Driving 2.0: Wayve CEO Alex Kendall

Alex Kendall founded Wayve in 2017 with a contrarian vision: replace the hand-engineered autonomous vehicle stack with end-to-end deep learning. While AV 1.0 companies relied on HD maps, LiDAR retrofits, and city-by-city deployments, Wayve built a generalization-first approach that can adapt to new vehicles and cities in weeks. Alex explains how world models enable reasoning in complex scenarios, why partnering with automotive OEMs creates a path to scale beyond robo-taxis, and how language integration opens up new product possibilities. From driving in 500 cities to deploying with manufacturers like Nissan, Wayve demonstrates how the same AI breakthroughs powering LLMs are transforming the physical economy. Hosted by: Pat Grady and Sonya Huang

Published: Nov 18, 2025
Uploaded: Jun 11, 2026
File type: Podcast

Full transcript

AI-generated transcript with timestamped sections.

0:00-1:30

[00:00] You know, if you're building a vertically integrated robotic solution, maybe you can go deep. But our ambition is to be the embodied AI foundation model for all of the best fleets and manufacturers around the world. [00:11] And to do that, unless we want to overload the company by building a separate neural network for each application, we need to be able to generalize. We need to be able to amortize our cost over one large intelligence and to be able to very quickly adapt to each different application that our customers care about. That's what we're trying to push. [00:45] Today we're talking with Alex Kendall, CEO of Wayve, about the shift from software 1.0 to 2.0, or from classical machine learning to end-to-end neural networks, in autonomous driving. [00:55] Wayve sells an autonomous driving stack to auto OEMs, similar to Tesla FSD but for non-Tesla automobiles. Major car manufacturers globally, like Nissan, are choosing Wayve to power their AV stacks. Alex started Wayve back in 2017, when most self-driving software stacks were massive, hand-coded C++ codebases covering every possible edge case, like navigating around double-parked cars. Alex bet the farm from the beginning on an end-to-end neural net approach to self-driving, and on synthetic data and world models as the ultimate path to generalization and scaling. [01:25] Today, that architecture is reshaping AV and all of physical AI, including robotics. [01:29] Enjoy the show.

1:32-3:15

[01:32] Alex, thanks for joining us on the show. [01:34] Hey Pat, hey Sonya. [01:36] One of the things that is very special about your company is that it sort of typifies [01:40] AV 2.0, [01:42] meaning a new architectural approach [01:44] that I think has been demonstrated to be superior [01:48] to the AV 1.0 approach [01:51] that people toiled with for so many years. Can we just start by defining: what was AV 1.0? What is AV 2.0? [01:58] For sure. When we started the company in 2017, the opening pitch in our seed deck was that the classical robotics approach at the time was to take perception, planning, mapping, and control, essentially break down the autonomy problem into a bunch of different components, and largely hand-engineer them. [02:16] And our pitch was that, OK, we think the future of robotics is not going to be a system that's hand-engineered to drive with a lot of infrastructure like high-definition maps. [02:26] Instead, we thought that the future of robots would be intelligent machines that have the onboard intelligence to make their own decisions. And of course, the best way we know how to build an AI system is with end-to-end deep learning. So for the last 10 years, we've been promoting an approach, a next-generation approach, AV 2.0, that replaces that stack with one end-to-end neural network. [02:45] Now, of course, that may seem more obvious today, but it has been contrarian for many, many years. But I think today it's maybe unfair to make that basic distinction because, of course, anyone worth their salt will use deep learning in various parts of the stack. But what you see in more incumbent solutions to autonomous driving is, of course, deep learning for perception and maybe for each different component, but still a lot of hand-engineered interfaces, still a lot of infrastructure in high-definition maps,

3:15-5:10

[03:15] and perhaps reliance on a lot of hardware. So our solution has moved on somewhat. Rather than just being an end-to-end network, today, of course, we start to talk about foundation models. We start to talk about more of a general-purpose intelligence, one that can understand not just how to drive that car, but many cars with different sensor architectures and different use cases. And so really it all boils down to: how do we build the most intelligent [03:43] robot that can scale without needing onerous infrastructure? So Wayve is sensor inputs, [03:52] motion output, [03:54] gigantic neural net in the middle. [03:56] That's right, at a very simple level. But some of the interesting things you see that are maybe different from the story we've all heard with large language models is that with autonomous driving, of course, [04:07] there are some interesting new factors. One is, of course, safety. [04:12] We need to make sure the system is safe by design. And what that means is that we can't just pump more data in and hope that hallucinations go away. We need to design an architecture that is still end-to-end and data-driven, but is functionally safe and supports a robust behavioral safety case. So that introduces some interesting architectural challenges. And then, of course, we also need to run in real time on board a robot, on board a vehicle. [04:42] The onboard compute and onboard sensor limitations make it an interesting challenge. But yes, it's the same narrative we're seeing play out in robotics that we've seen play out in all these other AI fields, like language or game-playing agents: an end-to-end, data-learned solution is out-competing anything we can hand-code. And what we're excited to be pioneering is that exact same narrative here in robotics and autonomous vehicles. And when you guys started this in 2017, and it was a very contrarian approach,

5:11-6:44

[05:11] when people from the industry said, "Well, that'll never work because...", [05:15] how did they finish that sentence? [05:17] I could count hundreds of those meetings. Yeah, typical arguments were: look, it's not safe. [05:25] It's not interpretable, you can't understand what it's doing, or even simply, it doesn't make sense; we haven't heard of this AI thing. And look, I think five or 10 years ago, it was probably reasonable to say end-to-end deep learning [05:42] wasn't interpretable. But I don't think that's true today. I think today we have a lot of really great tools for understanding and responding to insights about the way these deep learning systems reason. But moreover, if you have the ambition to build any intelligent machine, I think it's naive to think you can build a [06:02] complex intelligent machine and actually make it, [06:05] you know, let's say, strictly interpretable, to the point where you can point to a single line of code or a single thing that [06:11] causally made the outcome occur. The beauty of intelligent machines is that they are so wonderfully complex, and I think the way that we're going to not just design them but understand them is through a data-driven structure. [06:27] Can you say more about the before and after of the AV 1.0 stack and the billions of lines of code that go into those systems, versus [06:37] the 2.0 systems of today? And how quickly is that changing? Because my sense is that

6:44-8:20

[06:44] deep learning, large neural nets, hitting the physical economy is a much more recent phenomenon than people might appreciate. [06:51] Well, especially when you think about the path to distribution and deploying these systems. I mean, the automotive industry has just gone through a seismic shift in bringing out software-defined vehicles and the right hardware on these cars to be able to make them drive. Maybe one common point of debate is: is it camera-only, or camera, radar, and LiDAR, as a sensor approach to autonomy? [07:13] And just to be clear on our position at Wayve, we want to build an AI that can understand all kinds of different sensor architectures. [07:21] ...only solution makes sense. Sometimes we're camera, radar, LiDAR, and we train our embodied AI model on all of those permutations from very diverse data sources. The car we just drove in is a camera-only stack. We've got other cars that we work on with partners that have radar and LiDAR. And of course, there are different trade-offs that you take there. But more generally, we're seeing mass-produced cars from the best manufacturers around the world have a GPU on board, have surround... [07:51] ...beautiful about that is there's now the opportunity to see this AI come out and benefit people around the world. I think that kind of [07:57] software-defined infrastructure that is happening in automotive [08:02] has perhaps not yet happened to the same degree in other robotics verticals, but I'm sure the market's going to move that way as well. And in general, having the right level of computing infrastructure in a scalable way and opening up these platforms to AI is, I think, what's really making this possible. And that's gone through a tipping point in the last couple of years.
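Alex says Wayve trains one embodied AI model on camera-only, camera-plus-radar, and camera-radar-LiDAR permutations. One generic way to expose a single network to several sensor setups is to present each logged sample under different configurations, masking the modalities a given vehicle would lack (often called modality or sensor dropout). The sketch below illustrates only that general idea under stated assumptions: the configuration list, the hypothetical `sample_sensor_view` helper, and the convention that `None` means "sensor absent" are illustrative and are not a description of Wayve's training pipeline.

```python
import random
from typing import Dict, List, Optional

# Hypothetical set of sensor configurations a single model is asked to support
# (illustrative only; not Wayve's actual list).
SENSOR_CONFIGS: List[frozenset] = [
    frozenset({"camera"}),
    frozenset({"camera", "radar"}),
    frozenset({"camera", "radar", "lidar"}),
]


def sample_sensor_view(
    sample: Dict[str, Optional[object]], rng: random.Random
) -> Dict[str, Optional[object]]:
    """Restrict a logged sample to one randomly chosen supported configuration.

    Only configurations whose sensors were actually recorded in `sample` are
    eligible; modalities outside the chosen configuration are set to None,
    which this sketch uses to mean "this sensor is absent on this vehicle".
    """
    recorded = frozenset(name for name, data in sample.items() if data is not None)
    eligible = [cfg for cfg in SENSOR_CONFIGS if cfg <= recorded]
    if not eligible:
        raise ValueError(f"no supported configuration fits recorded sensors {sorted(recorded)}")
    chosen = rng.choice(eligible)
    return {name: (data if name in chosen else None) for name, data in sample.items()}


# Usage: a clip from a fully equipped car is shown under varying configurations.
rng = random.Random(0)
full_rig = {"camera": "frames", "radar": "returns", "lidar": "points"}
views = [sample_sensor_view(full_rig, rng) for _ in range(5)]  # a random mix of eligible configs
```

Because only configurations whose sensors were actually recorded are eligible, a clip from a camera-only car is never given fabricated radar or LiDAR input; it simply stays camera-only.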

8:30-10:16

[08:30] ...two or three years? Do you think it was FSD 12 that did it? Or when did that mindset start to shift? I miss the contrarian days. But even today, I was in a conversation this morning where I still see a lot of folks say, yes, we need end-to-end AI. They've bought the big tech narrative around the future of AI. But they say things like, we need end-to-end AI with hard... [09:00] ...there can be some belief that some hybrid approach is the way to go, where you want to try and take a rules-based stack and an end-to-end learned stack, but often these approaches can get the worst of both worlds or just add cost and complexity. So I still think there is a distribution in the market of those that are leaning in and moving fast and those that perhaps have some catching up to do. But of course, crediting the breakthrough that, [09:28] for all of us that have been working in deep learning, really made this world-changing and mainstream, we've got to credit the large language model breakthroughs. I think they've inspired the world and opened up the market's mind to be curious about this technology. But [09:43] also, look at what we've been doing at Wayve. A year ago, we were just driving in central London. [09:47] Central London, I think, is a great proving ground because it's this unstructured, incredibly complex and dynamic city that our AI has learned to navigate very smoothly, safely, and reliably. But in the last year, we've taken it to highways, to Europe, Japan, North America. Our cars were in New York City last week driving around there. And so bringing it global, being able to take it to different manufacturers' vehicles and show a product-like experience...

10:17-11:57

[10:17] ...growth has, I think, also really opened up a lot of inspiration around the world. [10:23] Why is it that you're able to launch in hundreds of cities worldwide, [10:27] while some of the AV 1.0 companies need to actually go out and build an HD map? Just say a word on how the [10:35] technical differences are actually leading to differences in how the machine's able to learn and how you're able to roll out. [10:42] Autonomous driving is all about generalization. Generalization means being able to reason about or understand something you've never seen before. [10:50] Every time you go for a drive, you're going to see something new for the first time. [10:56] What did we see today? We saw a road worker rolling out some kind of carpet on a pedestrian crossing, but not wanting to step out, and we had to reason about whether we could pass them without yielding, for example. That's just one example from earlier today, but you could think about all the new things you see on the roads every time you drive. You're never going to see every experience in your training data. [11:20] You need to generalize to things you haven't seen before to be safe and to be useful around the world, and that's what [11:26] has motivated our entire approach. Take a manufacturer giving us one of their vehicles and, within a couple of months, us being able to drive it on the road. A couple of weeks ago, in September this year, we unveiled a vehicle to media with Nissan in Tokyo. [11:45] Just four months earlier was the first time we'd even driven in Tokyo and gotten our hands on this vehicle. Four months later, we were having media drive in the car and experience it. And that was a new country and a new vehicle for us.

11:58-13:44

[11:58] So, [11:59] what that showed is that our AI was able to generalize. It's trained on very diverse data from around the world. It's trained on diverse sensor sets and vehicles. And so it was able to understand that vehicle's new sensor distribution and, of course, the complexity of driving around in central Tokyo. So I think that's a really great demonstration of generalization. And if we think about it, [12:18] if you're building a vertically integrated robotic solution, maybe you can go deep. But our ambition is to be the embodied AI foundation model for all of the best fleets and manufacturers around the world. [12:29] And to do that, unless we want to overload the company by building a separate neural network for each application, we need to be able to generalize. We need to be able to amortize our cost over one large [12:39] intelligence and to be able to very quickly adapt to each different application that our customers care about. [12:45] That's what we're trying to push. You mentioned reasoning in there, in terms of how the model is reasoning through, you know, there's construction work, what do I do now? [12:53] In the LLM world, obviously reasoning is its own [12:56] separate track, with lots of techniques for scaling inference-time compute. Are you deliberately training your models to reason? Is it an emergent behavior of the models? Say more about what you mean by reasoning. [13:10] I think [13:11] reasoning in the physical world can be really well expressed as a world model. [13:16] In 2018, we put our very first world model approach on the road. It was a very small, 100,000-parameter neural network that could simulate a [redacted address] in front of us, but we were able to use it as an internal simulator to train a model-based reinforcement learning algorithm. There's a fun blog post if you want to see the history on that. But fast forward to today, and we've developed GAIA. It's a

13:45-15:19

[13:45] full generative world model that's able to simulate multiple cameras and sensors in very rich and diverse environments. You can control it and prompt the different agents or the scene in it. And that's an example of reasoning, where we can train in the ability to simulate how the world works and what's going to happen next. [14:03] What happens when you bring this kind of representation on the road is you get some really nice emergent behavior. Like today, when we were driving around unprotected turns that were occluded, you saw the car nudge forward until it could see for itself and then complete the turn. Or when it's foggy in London, you see the car slow down and drive according to what it can reason about. [14:33] I think that's key for getting [14:37] safe and smooth autonomous driving. [14:38] So the world models are really key to teaching the model how to reason through new scenarios. [14:45] You mentioned earlier the diversity of your data. Say a word about where all the data comes from. [14:50] It's becoming an enormous amount of data because, of course, [14:54] unlike the language domain or image domain, when we're dealing with a typical self-driving car that has a dozen multi-megapixel cameras, radar, maybe a LiDAR, when you aggregate that up, it's very quickly tens or hundreds of petabytes of data. So it's an enormous amount of data you have to train on, but it's the diversity that's really key.
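To see why a fleet reaches petabyte scale so quickly, here is a back-of-envelope calculation. The dozen cameras come from Alex's description; the 20 Mbit/s compressed bitrate per camera is an assumption chosen purely for illustration, and radar and LiDAR are ignored, so treat the outputs as rough orders of magnitude rather than Wayve's figures.

```python
def bytes_per_hour(num_cameras: int, bitrate_mbps: float) -> float:
    """Compressed camera data recorded per vehicle-hour, in bytes.

    Assumes every camera streams at the same (hypothetical) bitrate in Mbit/s.
    """
    bits_per_second = num_cameras * bitrate_mbps * 1e6
    return bits_per_second / 8 * 3600  # bits -> bytes, seconds -> hours


def hours_for_petabytes(petabytes: float, num_cameras: int, bitrate_mbps: float) -> float:
    """Vehicle-hours of driving needed to accumulate `petabytes` of camera data."""
    return petabytes * 1e15 / bytes_per_hour(num_cameras, bitrate_mbps)


# Usage with the illustrative assumptions: 12 cameras at 20 Mbit/s each.
per_hour_gb = bytes_per_hour(12, 20) / 1e9        # 108.0 GB per vehicle-hour
hours_100pb = hours_for_petabytes(100, 12, 20)    # about 925,926 vehicle-hours
```

At these assumed rates, one car records about 108 GB per hour of driving, so roughly 9,300 vehicle-hours of camera data already add up to a petabyte, and 100 PB corresponds to roughly 926,000 vehicle-hours.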

15:19-16:56

[15:19] And we've solved for diversity in two ways. The first is by becoming a trusted partner across the industry and aggregating data across many different sources, from dash cams to fleets to manufacturers to robot operators. The second is being able to filter and really understand the data. Here, we've worked hard to develop different unsupervised learning techniques to be able to cluster and find [15:44] unusual or anomalous experiences, and of course find the scenarios that our system is [15:51] performing poorly at and then drive the learning curriculum on those. But yeah, today we learn from a diverse set of vehicles, a diverse set of sensor architectures, and countries, and that's really one of the key things that drives the level of generalization. [16:07] Does the increased growth of world models and simulated data mean that you just don't need as many actual on-road miles? [16:16] I think there are two sides to that question, right? On the one side, yes, learning efficiency really matters. On the other, you can't only rely on learning efficiency. At the limit, if we take our current approach and just scale it up, I'm sure it'll produce generalized Level 5 driving. [16:33] You know, at the limit, if you have unlimited training data, this is really just a lookup table of prior experience, but that's not economically or technically feasible. And so the question is how you can train this to be the most data-efficient system, because I think efficiency will lead to not just improved costs but faster time to market and more intelligence.
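The clustering-based data mining Alex describes can be illustrated generically: embed each driving scenario as a vector, cluster the embeddings, and rank scenarios by their distance to the nearest cluster center, so the least typical ones surface first. This is a minimal k-means sketch under assumed inputs (toy two-dimensional embeddings, naive first-k initialization, hypothetical helper names); the episode does not describe Wayve's actual unsupervised techniques.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def kmeans(points: List[Vector], k: int, iters: int = 20) -> List[List[float]]:
    """Plain Lloyd's k-means, initialized with the first k points for determinism."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the previous centroid if a cluster empties out
                centroids[i] = [sum(coord) / len(members) for coord in zip(*members)]
    return centroids


def rank_anomalies(points: List[Vector], k: int, top_n: int) -> List[int]:
    """Indices of the `top_n` points farthest from their nearest centroid."""
    centroids = kmeans(points, k)
    scores = [min(math.dist(p, c) for c in centroids) for p in points]
    return sorted(range(len(points)), key=lambda i: scores[i], reverse=True)[:top_n]


# Usage with hypothetical scenario embeddings.
scenario_embeddings = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1),   # one common kind of scenario
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),   # another common kind
    (10.0, -3.0),                         # a rare, unusual scenario
]
print(rank_anomalies(scenario_embeddings, k=2, top_n=1))  # [6]
```

One caveat of distance-to-centroid scoring: if `k` is large enough for an outlier to claim its own cluster, its score drops to zero, so the number of clusters (or a minimum cluster size) needs care in practice.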

16:56-18:31

[16:56] Efficiency comes from a number of different factors. Most importantly, there's the data curriculum you put in place, but then there are the learning algorithms: how do you magnify the learning you have? And I think world models are a really great opportunity for that. They generate synthetic data and synthetic understanding that doesn't replace real-world data, but recombines it and magnifies it in new ways. It lets you pull in interesting insights, [17:20] and I think these kinds of approaches can really, really improve data efficiency. But across the board, I think [17:26] working under resource constraints has forced our team to develop so many innovations. I'd also call out just the workflow, because [17:36] in traditional robotics, when you're [17:38] tuning parameters or algorithms or designing geometric maps and things like this, there are very well-established cultures and workflows. But [17:47] for our team, when we have 50 model developers working on one main production model, or when we have [17:53] an end-to-end net that we need to understand and introspect, or even in the way that we deploy these systems to simulation or to the road and get feedback, the entire culture at Wayve has been developed from the ground up for [18:04] embodied AI, for end-to-end deep learning for driving. The data infrastructure, the simulation, the safety licensing before we put systems on the road: this has not been a hedge or a side bet for us, but the entire essence of our culture. And I think [18:20] doing this under resource constraints and with full mission-driven conviction has led to a bunch of interesting innovations. Look, in getting to where we are today, everything is about iteration speed.
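Alex's 2018 example (a small learned world model used as an internal simulator to train a model-based reinforcement learning algorithm) and his point that world models recombine and magnify real data can both be illustrated with Dyna-Q, a classic textbook technique swapped in here purely for illustration. The agent stores each real transition in a learned model, then replays imagined transitions from that model to update its value estimates without taking extra real-world steps. The toy corridor, the helper names, and all hyperparameters are hypothetical, and a tabular lookup model stands in for a neural world model; this is not Wayve's algorithm.

```python
import random
from collections import defaultdict
from typing import Dict, Tuple

N_STATES = 6          # hypothetical 1-D corridor: states 0..5, goal at state 5
ACTIONS = (-1, +1)    # step left / step right


def real_step(state: int, action: int) -> Tuple[int, float, bool]:
    """The 'real world': a deterministic corridor with reward 1 on reaching the goal."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def dyna_q(episodes=30, planning_steps=20, alpha=0.5, gamma=0.9, eps=0.1, seed=0):
    rng = random.Random(seed)
    q: Dict[Tuple[int, int], float] = defaultdict(float)
    model: Dict[Tuple[int, int], Tuple[int, float, bool]] = {}  # learned "world model"

    def update(s, a, r, s2, done):
        target = r if done else r + gamma * max(q[(s2, b)] for b in ACTIONS)
        q[(s, a)] += alpha * (target - q[(s, a)])

    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # Epsilon-greedy action, breaking ties between equal Q-values at random.
            if rng.random() < eps:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda b: (q[(s, b)], rng.random()))
            s2, r, done = real_step(s, a)
            update(s, a, r, s2, done)        # learn from the real transition
            model[(s, a)] = (s2, r, done)    # remember it in the world model
            for _ in range(planning_steps):  # imagined updates replayed from the model
                (ps, pa), (ps2, pr, pdone) = rng.choice(list(model.items()))
                update(ps, pa, pr, ps2, pdone)
            s = s2
    return q


def greedy_policy(q) -> Dict[int, int]:
    return {s: max(ACTIONS, key=lambda b: q[(s, b)]) for s in range(N_STATES - 1)}


q = dyna_q()
print(greedy_policy(q))  # {0: 1, 1: 1, 2: 1, 3: 1, 4: 1}
```

Setting `planning_steps=0` reduces this to ordinary Q-learning from real transitions only; raising it lets each real step be reused many times through the learned model, which is one concrete sense in which a world model can magnify limited real-world data.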

18:32-20:12

[18:32] Speaking of your culture, [18:34] I'm picturing, [18:36] you know, a bunch of AI research types, machine learning engineers, that sort of thing. [18:40] How does the culture of your organization [18:42] differ from similar [18:46] applied-lab-type environments, [18:49] given the customer base that you serve, given that you're going after the automotive industry specifically, [18:54] with all of its quirks around supply chain and all of its requirements around safety? How does that influence the culture of your business? [19:03] Hugely, in fact. [19:05] For the first few years of Wayve, we were really a group of passionate embodied AI researchers. But in the last couple of years, I'm really, really proud of how our team has built out deep expertise in understanding the automotive industry, and also the ability to reliably deliver to our partners there. And that's a different culture. [19:35] ...made there is extraordinary. What have you all learned from them? I mean, I'm sure part of your job is to teach them about what's going on in the world of AI. What have you learned from them? [19:45] So, [19:46] I think [19:47] some of the main things I'd call out [19:49] have been [19:50] efficiency and reliability, and the difference between technology and a product. Those would be some of the main themes. I mean, the level of reliability required, but also the level of quality that is seen to really robustly prove these systems out before deployment, and the pride that these companies take in that, has been exceptional.

20:20-22:02

[20:20] ...car to drive? How can your driving personality really match the brand's preferences? How can you provide an experience that really gives brand differentiation? And the great news is that I think we've been able to riff and brainstorm off these and come up with some really neat technical ideas down that vein. But [20:42] yeah, ultimately, safe, high-quality, and personalizable AI has been some great feedback we've got from the industry. [20:50] Can you talk about your path to market, actually, in partnering with the other OEMs? How did you decide to do that? And then how do you think the market landscape will play out for how autonomy rolls out? [21:00] Yeah, of course. Great question, Sonya, because [21:03] since the beginning of Wayve, we've been focused on the pitch I gave around end-to-end learning being the approach to autonomy, but we've tried a number of different go-to-market approaches over the years. But [21:14] in the last couple of years, I've been hugely energized about working and partnering with the biggest and best consumer automotive manufacturers around the world. Why is that? Well, I mentioned how they've begun to introduce software-defined vehicles, so they have the infrastructure to work with autonomy. There's the market belief that this is a technology that can really thrive. And also, it's the chance to get to scale far beyond what we're seeing with the city-by-city robotaxis right now. [21:45] But moreover, these are OEMs that are investing in the right infrastructure to go not just to driver assistance, but to eyes-off autonomy, where you can actually take liability for the drive, give the user a safe ride, and give them time back from their driving experience. So that's awesome.

22:15-23:48

[22:15] ...of the market, I think there's an opportunity to partner with some of these innovative platforms and to bring our AI to market to make these autonomous products possible. And it will only grow from there. These manufacturers don't want to stop at driver assistance. We're working together to build eyes-off and driverless robotaxi products. But the key thing is that by avoiding retrofitting our own hardware on these vehicles, and by putting our AI in natively as a software integration, we can move fast at scale. We can build low... [22:45] ...I think this is going to be the path to see tens and hundreds of thousands of robotaxis rolled out around the world at an affordable price. And of course, this is all possible because of the level of generalization that this AI enables. [22:59] Tesla FSD is just such a game-changing product. My friends that have it can't imagine driving any other way. It's really cool that you're going to empower the 88 million other vehicles sold every year to offer that experience as well. [23:16] A lot of people jump in our car and come for a drive, some of them skeptical about autonomy, but without exception, they step out with a smile on their face. It's a magical experience. And yeah, I can't wait for people to be able to try it around the world and make autonomy not just a robotaxi tourism experience, but bring this experience to people in, eventually, every city. [23:39] What do you make of the sensor fusion confusion debate? Yeah. [23:42] The one that plays out on Twitter every year or so: Tesla gets confused if there's both camera...

23:48-25:41

[23:49] ...and LiDAR coming in. [23:51] Sorry, right there. [23:52] I think it's the wrong debate to be having. It's not the frontier question. The industry, I guess outside of Tesla, has really coalesced around a common architecture of surround cameras, surround radar, and a front-facing LiDAR stack. Now, this costs under $2,000, so it's automotive-grade components, not the retrofit robotaxi components you see today. Having a [24:15] frontier, automotive-grade GPU on the car and that kind of sensor architecture is a really great platform to build [24:23] L3/L4 autonomy, eyes-off or driverless. It gives you the necessary redundancy. It lets you deal with edge cases that, you know, cameras alone... I agree they can get you to human level, but we want to go beyond human level. And so I think this kind of architecture is affordable and scalable, it's got the supply chain for mass manufacture, and it can [24:43] eliminate, I think, you know, eliminate all accidents and really drive superhuman levels of performance. So that's what we're seeing many manufacturers bring out on their vehicles, and we're integrating our AI. Of course, for a driver assistance system, camera-only can work. For a human-level driverless system, or, of course, I should clarify, 90-something, you can look at different stats, but 95% or more of accidents, unfortunately, are caused by human error. [25:13] ...can you be human level, but you can eliminate a lot of human inattention and the accidents caused by that. But there are still accidents that would require perception capabilities beyond vision to solve. And if we want to tackle that long tail, there are many ways to solve it. One of the ways would be to bring in some other sensing modalities, like radar and LiDAR. So we're excited to be working with those kinds of platforms, but, crucially, natively integrated into the OEMs' vehicles themselves.

25:41-27:11

[25:41] Is it the same neural net that can drive on one OEM's car and another's car? How does that even work? Because I imagine each vehicle has slightly differently positioned cameras, things like that. It comes from the same family. We regularly train very large-scale models, and of course, we iterate on them month on month. [26:02] You know, that's one model that's common to [26:05] all of the fleets that we work with. But as you optimize for a specific sensor set or a specific embedded target, of course, you can start to specialize the model. But the beauty is that [26:15] 99%-plus of the cost and the time and the effort is training the base model. And then we can build very efficient personalization for the specific customer. And so this lets us scale, but gives us the ability to squeeze it onto very efficient real-time platforms and adapt it to a specific use case. Are you going to let Pat personalize a super aggressive driver model? I need to know. What driving style would you like, Pat? [26:42] Yeah, pretty aggressive. [26:45] Safe, very safe, but you know. We can do that. [26:50] It's really funny when you build distributions around driving behavior. You can really tell from the human training data we have when it goes from being helpfully assertive, let's say, to unhelpfully aggressive, and we can draw a clean line there. Yeah, there you go. What about you, Sonya? How was the drive we just had? Fantastic. It was comfortable. It was safe.

27:12-28:51

[27:12] ...actually like the way it was kind of nudging up when it couldn't see on the turn. It was very human. Yeah. Well, it's, [27:20] you know, as complex as we can get in Silicon Valley, but come to Tokyo or London, or I was in downtown San Francisco on the weekend, and, [27:28] yeah, you really need the ability to predict and reason about other folks around you to be able to drive in a human-like way. And what we find is that [27:37] you need to be able to smoothly go around double-parked vehicles or deal with other dynamic obstacles, and even the prevailing flow of traffic might not be aligned to the specific lane, but there's a human-like way of driving. What's awesome about the intelligence that we've built is that it's able to reason about these things and keep the traffic flowing, keep interacting with road users in a very human-like way. I think this is going to be key for societies to accept and love robotaxis. I can't wait to make that a reality. [28:07] ...have a hard time with today? [28:09] There are loads, and it's really hard to talk generically about one because they're so rare. [28:19] It's very hard to say, oh, it's always these types, because it's always, [28:22] you know, [28:23] a corner case is a couple of edge cases coming together in a corner, and it's always confounding factors when you get something really obscure. But we've driven in over 500 cities this year, and so when you're driving at that level of scale, of course you see things that you've never seen before, like road signs written in a new language. Actually, maybe one way to break it down: often we talk about driving broken down into safety, utility, and flow. Safety being, of course, safety-critical behavior; flow being the style of driving, is it...

28:53-30:43

[28:53] ...and then utility being the navigation and road semantics. Safety and flow, we've found, generalize exceptionally well throughout the world. We get almost uniform metrics in every country we operate in, in terms of safety and flow, or comfort of the drive. But utility has been the really interesting one as we've gone global: how do you navigate, and how do you deal with road signs? How do you read different languages? How do you deal with different driving cultures? So that's the one that's been interesting. [29:23] ...about this: when we went from the UK to the US, we needed hundreds of hours of data to be able to drive, you know, within 10% of our frontier performance. But then when we went to Europe, to Germany, of course we'd already learned to drive on the right side of the road coming to the US, and we'd learned to do right turns at red lights. Then coming to Germany, we still had to drive on the right side of the road, but [29:47] of course you can't turn right at a red light there. And then on the autobahn, you need to drive... we drive today at up to 140, so pretty fast there. But yeah, it gets more efficient each time, with exponentially less data in each new market, because you've seen some of those things before. [30:05] You mentioned at the beginning that large language models were part of what flipped your approach from contrarian to consensus. Are you integrating large language models at all into your models? I know some of the robotics companies that are getting started now are starting from this [30:19] VLA, VLM base. Is that part of your architecture? 100%. In 2021, we started working on language for driving. I remember my team came to me at the time and said, hey, we should start a project on language. I said, no, no, no, guys, startups are all about focus, keep focused. But they actually gave some pretty compelling arguments, so we started to play around with these things. And a year or so later, we released Lingo, which is the...

30:49-32:26

[30:49] ...was special about this model was that it could not only steer the wheel and drive a car, but also converse in language. It would let you talk to it and ask it questions: why are you finding this risky? What's going to happen next? It could even commentate your drive. And what's interesting about this is that [31:04] there are a few benefits. One is that bringing language into pre-training, of course, just improves the representations. It gives more interesting information to learn from than imagery alone. Second, aligning the representation with language opens up a ton of interesting product features. It enables you to create a chauffeur experience where you could actually talk to your driver. No longer do you need a PhD in robotics to understand the system... [31:34] If you want to race around the commute super fast, then you can demand that. Third, it gives you a really nice introspection tool: you could imagine regulators or our engineering team conversing with the system in language to really diagnose why it's doing what it's doing, or getting it to explain its reasoning. So I think these are really clear benefits, which we're really excited to be pushing. That's super cool. And you're running it on the embedded compute? [32:00] We are. So we've put out demos that run off-board. On-board is challenging with what's in the automotive market today, but some of the next-generation compute, for example the NVIDIA Thor that our next-gen development vehicle is going to be built with, will be large enough to run it on-board. [32:13] That's going to be cool. Very cool. [32:15] You've talked about how autonomous driving provides a path to more generalized, embodied AI. Can you paint that picture for us, how you go from autonomous driving to...

32:26-34:13

[32:26] I don't know, humanoid robots or whatever other things you might want to embody AI in? [32:33] I think in the future we're going to be looking at a ton of interesting use cases for robotics. [32:39] What we're seeing is that mobility is [32:42] becoming possible, I think, well before manipulation. Manipulation is challenging in terms of access to data, [32:50] global supply chains for hardware, and actually even the hardware designs themselves; I think tactile sensing is still a really hard challenge. Inevitably it'll be a massive, transformative thing, but maybe it's at the maturity of where self-driving was in 2015. Today, our system is rapidly becoming a general-purpose navigation agent: given an arbitrary sensor view and a goal condition, it's able to produce a safe trajectory. So I think we're going [33:20] ...rapid advancement, not just in consumer automotive and robotaxis; you can think about trucking and other applications. This AI will enable manufacturers and fleets [33:29] who want to build robots in any kind of mobility application. And of course, we're really excited to be working with frontier developers and applications over time as you go out across that robotics stack. And I expect we'll see more maturity in the coming years in manufacturing and manipulation use cases as well. But in the end, I think [33:51] there are benefits to having a large foundation model. Certainly in automotive, I think we have access to the largest robot and data supply chain, and so we're really lucky in that regard to be able to push forward the intelligence there. But in generalizing that intelligence to new applications, I think there'll be benefits from the model being able to experience multiple different verticals, and it'll only make it more general purpose.

34:14-35:49

[34:14] Um, any applications you're excited about? I mean, I'm psyched to have humanoid robots walking around. Yeah, me too. I think, um, [34:21] I think they're going to be neat. Whichever form factor, I think humanoids will play a big part, other forms of locomotion as well, and then manipulation. There are some really interesting challenges in those spaces, but I think the same story is going to play out a lot. [34:37] ...Working on a narrow application, [34:40] like when self-driving went to Phoenix, Arizona and put in a ton of infrastructure and expensive hardware to make it work, is going to, I think, have limited runway. But working on general-purpose, lean, low-cost hardware stacks that really focus on making the system as intelligent and robust as possible, I think this is the recipe for scale. So, yeah, let's watch that space. Yeah. Do you think there are major research breakthroughs needed to reach physical AGI, so to speak? [35:10] What do you think is the most promising direction? [35:13] Absolutely, I do. I think there's so much more room to scale up the current approaches, and we'll do that, but [35:20] I think... [35:23] we will get compounding returns from, well, I usually talk about four factors that drive performance. There's, of course, data and compute, but then also the algorithmic capabilities and the embodiment: what are the hardware and capabilities of the robots? I think we need to push all four. And on the algorithmic side, there are so many opportunities for growth. I think a key one is measurement. How do you actually measure and quantify these systems?

35:53-37:26

[35:53] ...really have a simulator that closes the gap to the real world at scale and can run efficiently. I mean, it's no secret that these generative world models are very compute intensive, but [36:06] having a good measurement system will just drive efficiency and iteration speed. So that's a key one. People often talk about it being a chicken-and-egg problem: if you have a perfect simulator, you've solved self-driving, and vice versa. And I really believe that. [36:19] AlphaGo showed that when you have a perfect simulator, you can just solve problems through Monte Carlo Tree Search. And so I think that's going to be the case in robotics as well. [36:30] So, yeah, one is measurement. Another pillar is [36:34] building more generality into the model. How can you build out more modalities and align those different modalities in their reasoning? I think this is going to open up [36:43] new use cases, particularly when it comes to human-robot interaction and navigation, going back to the utility problem from before. Some of these things I'm super excited about. And then the last one is just [36:56] engineering efficiency. I mean, the scale of training these systems, and their data requirements, is extraordinary. And so I wouldn't understate it; I think the most sexy part of this problem is the [37:08] efficient infrastructure to train and serve these models. Yeah. And getting that right. [37:13] I think it's a real competitive advantage or disadvantage. [37:19] We started by talking about AV 2.0. [37:22] Thank you. [37:22] Someday, I imagine we might be talking about AV 3.0.

37:26-38:59

[37:26] What could AV 3.0 look like if you go 5, 10, 15 years into the future? Are there any other big leaps in this industry that you think we'll see? [37:34] You said that with such a deadpan. [37:39] So the whole premise of AV 2.0 was all about [37:43] putting intelligence on the car and not needing infrastructure and a ton of, you know, overcooked hardware, but really making the system intelligent. [37:52] And so I think we're seeing that emerge now with this system that can generalize to the world with all of its onboard, scalable intelligence and compute. [38:02] If I were to speculate where AV 3.0 goes (and we haven't thought about it in depth lately), one idea could be taking the intelligence outside the car. I mean, when you start to have [38:14] a prevailing majority of autonomous vehicles, you could imagine a ton of new things you could do when they start to communicate and interact with each other. Why do we need traffic lights in the future if they can coordinate? Why do we need all these sensors if you can actually just [38:29] communicate with the AV in front of you to be able to see around corners? I mean, of course I'm speculating here; it opens up tons of interesting cybersecurity questions, communication latency questions, things like that. But, I don't know, I'm all for embodied AI, and if we can build a safer and more accessible system by taking the intelligence not only into the car but beyond, [38:52] maybe that's a path. Let's see. I think that's really interesting. If AV 3.0 is the point at which it's sort of a mesh network,

38:59-40:44

[38:59] and, you know, at that point, maybe humans aren't allowed to drive because they can't communicate with the mesh network the same way that the robots can, and, you [39:06] ...Or maybe there are special places that humans go to drive just for recreational purposes. We always try to do a taste check. You know, it's all autonomous. [39:14] Yeah, interesting. How do you hire, and how do you attract people, with how hot the AI market is these days? I love that question because, you know, at the end of the day, our team is our product. Our team is the most important thing in making this possible. And we talk a lot at Wayve about [39:30] being a place where you can do the best work of your career. What that means for me in embodied AI is having a set of colleagues around you who inspire and excite you and are world class at what they do, and having the right resources and the right culture to unblock you. But [39:45] I think uniquely at Wayve, we are able to bring together [39:49] a real frontier AI environment with a near-term product opportunity in automotive. So if you want to work on intelligent machines [39:57] and see your system brought out in robotics with the scale of impact of ChatGPT, I think this is a place where we can do it. The other thing is that we've gone global. I mean, we have teams in London, Stuttgart, Tel Aviv, Vancouver, Tokyo, and Silicon Valley, [40:17] some of the major AI and automotive hubs. And we're really looking to build a global culture that can bring this product to the world, work with customers around the world, and, most importantly, collaborate with the very, very best people. So anyone who's interested in pioneering embodied AI, pushing the frontiers, and actually turning it into a game-changing product: come chat, we'd love to speak.

40:44-41:34

[40:44] Wonderful. Alex, you've believed in the future of end-to-end neural nets, self-driving, and the physical economy [40:53] for longer than just about anybody. And it must be incredibly fulfilling to see that vision start to come to life. [41:00] Congratulations, and thank you for joining us. [41:02] Thank you, Sonya. Thank you, Pat. It's such a privilege. [41:05] *music*
