Brian McCullough
Welcome to another bonus episode of the Techmeme Ride Home podcast, another portfolio profile episode. As always, I'm your host, Brian McCullough. And hey, look at this. Our friend is back.
Chris Messina
Hey, I'm back.
Brian McCullough
Chris Messina is here. Hey, Chris.
Chris Messina
Hey. Hello.
Brian McCullough
It's been a while, but we're here because we are going to talk about a company that Chris and I invested in through the Ride Home AI Fund. This company is Leap Labs, and we're speaking to Jessica Rumbelow and Jugal Patel. Hi there. And you are the founders of Leap Labs. So this is going to be more about getting into the tech and what you're actually doing, since this company is making a real advancement, rather than just, oh, we have a product. But before we do that, just tell us a little bit about Leap Labs, what you're attempting to do, and then let's get into the science of it.
Jessica Rumbelow
Yeah. So we are automating scientific discovery from data. There's a lot of data in the world. Companies spend huge amounts of money gathering data, doing R and D. But the outcomes from this process are, like, pretty uncertain, pretty noisy, pretty path dependent. There are lots of good reasons for this, which I'm kind of excited to talk to you guys about. But what we're able to do, basically, is extract even complex combinatorial nonlinear patterns from arbitrary data sets at incredible speed and scale. And we've made a bunch of novel scientific discoveries doing this.
Brian McCullough
Science being the key word there. And we're going to get into all this specifically, but just in a broad sense, you've even written about this online: is there a sense that the current models of ML, and especially LLMs, are not exactly perfectly designed for scientific research and the like?
Jessica Rumbelow
Yeah, yeah. So there are a couple of major problems here. These are language models, right? They're trained on language. And in the first instance, really, language is actually just a really noisy abstraction over the real world, over underlying data, over observations, over the true generative functions of the world. So there's that problem. The much bigger problem is that our scientific literature is absolutely terrible. The replication crisis is real, and LLMs can't tell the difference between papers that replicate and papers that don't. So, like, a lot of the time they're just wrong. Even when they're not hallucinating, even when they are perfectly recalling facts from their training data, those facts are fundamentally unreliable.
Brian McCullough
Right. So you're saying this is even beyond the hallucination problem that any user of an LLM is familiar with.
Chris Messina
Before we get to what you're saying, we have to first explain what the replication crisis actually is, for the basic listener. It's sort of like, how does a law get made? It's sort of like, how does science get made? It's like, you have a theorem or an idea, a thesis about the world, and then you go about designing experiments, or several, to test out that hypothesis. I'm sorry, I'm getting my words confused. See, this is the lossiness of language. And as a result of doing successive trials of that hypothesis, you come into clearer coherence over time that asserts that whatever you thought is true perhaps is true. That's one set of data about the world. And the question is, can you replicate that elsewhere? That's the replication part of this. And so if you can replicate it over and over again, for example, one plus one equals two, it doesn't matter where you are or what language that's in. I mean, not that you would scientifically prove, I suppose, a mathematical theorem, but one plus one is two, basically. In most cases, almost all that we've been able to reproduce comes out the same. And so that's the replication aspect, where science gets its foundation, as opposed to human laws or human insights about the world that are subjective or simply not subject to replication, and therefore whimsical, perhaps. And so I think what I'm hearing, just to play it back for the listener, is that the real challenge is that we have a lot of science out there that asserts that the world is a certain way. There have been some attempts at replication, some of which have been successful, but much of which, in fact, have not. And if there isn't an enormous amount of effort put into replication, then the science from which we derive so many assumptions about the way things work is actually built on faulty pretenses.
Jessica Rumbelow
Exactly. It's actually really upsetting. It's kind of horrible, right? Because this happens because the incentives, largely in academic publishing, are just fundamentally misaligned. They incentivize paper count, they incentivize citation count.
Chris Messina
Sorry, can you say what those incentives are first and then you can talk about the outcomes?
Jessica Rumbelow
Yeah. So if you're a research scientist, say normally in academia and in order to progress in your career, in order to get jobs, get promotions, become an esteemed scientist, you have a.
Chris Messina
The most important thing is to be esteemed.
Jessica Rumbelow
Naturally, like, people care about that.
Chris Messina
It's like being an influencer on TikTok or something. You want to be esteemed in the scientific world?
Jessica Rumbelow
Of course you do. Who doesn't?
Chris Messina
Of course.
Jessica Rumbelow
And the way, like, we kind of metricize this in science is.
Chris Messina
That's a good word.
Jessica Rumbelow
Thank you very much. How many novel papers have you published? How many citations do those papers have? Are those papers published in, like, good journals? Right. And on the face of it, this sounds pretty sensible, but actually it's a terrible, terrible idea, and we should immediately stop doing it and do something else. Because what happens is: you're a scientist, you've got some data, you do some experiment, you have some hypothesis. Maybe you don't actually find anything that interesting, or maybe you find something interesting, but only if you do the analysis in a very specific way. Or maybe you run your analysis many, many times and pick the one that is most exciting and convincing. And then you inflate the importance of your discovery and gloss over the inconvenient details so that you can get that sexy publication that will get lots of citations. And, like, who has the time to replicate other people's work? Right. It's boring.
Chris Messina
Replications almost never get done, especially if it's so idiosyncratic. Right. You have to, like, recreate the biases that led to the outcome, which therefore is unlikely to produce the same results. And so you sort of just wasted a bunch of time, whereas you could be exploring novelty. You know, for fans of, like, the PC revolution: if you played a game called Civilization, like the old version, the scientific world, as you're describing it, is sort of like going out into the black areas and just, like, finding new spaces. But then you're always being raided by barbarian hordes. You know, it's just sort of like you never get to build a civilization. So I think this is kind of what you're talking about.
Jessica Rumbelow
The barbarian hordes in this case being peer reviewers, correct?
Chris Messina
Yes. Yeah, that's right.
Jessica Rumbelow
Yeah. So obviously, you know, this is a generalization. Most scientists are not out there committing academic fraud. I hope, like, lots of scientists actually take.
Chris Messina
How much of it is intentional, would you say, where it's actually, like, fraudulent, versus, like, incidental, where, as you say, the incentives encourage a set of behaviors that are about, you know, glowing up the research?
Jessica Rumbelow
I think everybody who publishes is.
Chris Messina
That's a large indictment.
Jessica Rumbelow
Okay, well, if you want to work as a scientist.
Chris Messina
No, but realistically, sure. I mean, I think what I'm interested in is the scope and scale of the problem. And what it sounds like you're saying is that all the incentives point in one direction, which is sort of like in the social media world, like, number goes up. So if we destroy democracy in the process, that's fine because we got more followers and we got more engagement. Right.
Jessica Rumbelow
We got more grant funding for our research lab. You know, our university has like really.
Chris Messina
High new buildings named after other people, et cetera. Okay.
Jessica Rumbelow
And it's completely understandable, you know. Like, it's not really the scientists' fault at all, and a lot of them are extremely, extremely concerned about this. But it's very hard to kind of change the system whilst also succeeding in it. Right. Yeah. So I just want to be very clear: I'm a scientist by training, by background. I was in academia for a long time. I have a PhD. I've been through the system. Like, the vast majority of our employees here at Leap are also scientists of one kind or another. Like, we love scientists, we're here for the scientists. But they're working inside of this incentive structure that is actively pushing against doing really novel work, really exciting work. The incentives are to play it safe and big up your results.
Chris Messina
Yeah, sorry. Also, just because I think the diagnosis of this problem is critical to arrive at the solution, which you're going to describe momentarily. And I think it's important then to, I guess, ask the question about how science became somewhat perverted, and if it's because of the nexus of science and capitalism, where capitalism tends to infect everything that it touches and therefore absorbs the elements of the profit motive in order to organize effort or labor. So, for example, if you can imagine, and I didn't live back then, but I understand, if there were patrons, you could sort of invest in the sciences, and the idea would actually be that the ideas would battle. It was less about blowing up some big theory, and more about having big ideas about the world, then trying to find ways to discover if those ideas were valid, and then developing various tests. And then the replication piece was actually the economic kind of driver of participation in science. And that was obviously in contrast to religion. Am I off in my history here?
Jessica Rumbelow
I mean, that sounds broadly correct. However, I would point out that science actually seems to work a hell of a lot better in industry than academia, because, like, your outcomes are directly tied to how successful your company is.
Jugal Patel
I see.
Chris Messina
But so I'm trying to sort of, like, trace the lineage of the incentive structure in academia, where blowing up the outcomes of your results, when you're only doing incremental, you know, expansions of a thought or an idea, is, like, that feels like how the incentive structures are misaligned. So you're doing incremental work, but you're trying to blow it up into something that's much more significant. And then you're moving very quickly through the process to get more money and grants to just keep the game going. So it's like an infinite game. But it's not quite the way it works in academia, or, I'm sorry, in industry, where the outcomes of your effort will actually lead to products that get to market, and then you're actually competing in the real marketplace. And so if your stuff doesn't work, then obviously you can't sell a product. And so that's the sort of corrective aspect that exists in the direct capitalist market.
Jessica Rumbelow
Absolutely. I think it's important to note that these problems in academic publishing also infect industry, because everybody's drawing from this literature base, which is incredibly unreliable.
Chris Messina
Okay. And it forms this, it's feeding itself. It's a set of like corrosive functions on the information.
Jessica Rumbelow
Large language models are going to make this worse. So, yeah, that's right.
Brian McCullough
Okay, so, okay, again, to bring it back for dummies to understand here: essentially what we're saying is, we've had this LLM revolution, and everyone's like, great, let's train it on the corpus of scientific literature and we're going to get novel insights. And your hypothesis is that maybe that's not going to be successful. And your solution to that is the Discovery Engine, correct? So, yes, please tell us about the Discovery Engine.
Jessica Rumbelow
Yeah, so kind of leaning into this idea that language is a really lossy abstraction over data, the logical thing to do is to go straight to the data. The problem is that humans are actually really, really bad at looking at massive, or even small, numerical data sets and finding patterns in them. We have some tools, we have some statistical tests, we have some analyses that we can run, but it's incredibly laborious, it's incredibly path dependent, it's full of confirmation bias, because, well, up until recently, you couldn't systematically find all of the insight there is in a data set. You can't find all of the patterns.
Chris Messina
So you use this phrase, path dependency, and I think that's also a little bit jargony. My understanding is that it sort of requires that you do a series of steps, and in those steps you actually cut off a bunch of other possibilities, even if those other possibilities are valid. And so it's almost like going from a CPU, which is, like, sequential, into sort of like a GPU, which is, like, relational. So essentially, creating path dependency means that you don't get to find lateral or latent relationships that might be present, because you've gone down a certain path and going backwards is just too costly or just won't work.
Jessica Rumbelow
It's exactly that, yeah. You end up exploring only, like, a tiny fraction of all of the possible insights, discoveries, information that might be there in your data, which is a problem. And I guess our key insight at Leap, and the thing that powers our technology, is that machine learning models, especially deep neural networks, are just extremely good at finding complex patterns in data. And this has been true for a while. The issue has been that we are really bad at understanding neural networks. So maybe they learn all of these interesting novel patterns that would be really important for us to learn about, but we've got no way of getting them out. And that's kind of where Leap comes in. Our core research is really interpretability. We train big neural networks, or even smaller machine learning models, on completely arbitrary data sets, and then we use interpretability to extract what those models have learned from that data. Often it's a lot of stuff that scientists already know, because they're domain experts, but way more often than you would expect, we find stuff that's completely new. And that's what our recent publications over the past few weeks have been about.
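[Editor's note: the loop Jessica describes, fit a model to tabular data, then use interpretability to read back what it learned, can be sketched in a few lines. This is a toy illustration of the general idea, not Leap Labs' actual system; the synthetic data, the hidden rule, and the choice of permutation importance as the interpretability method are all assumptions made for the example.]

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
# Roughly the scale mentioned in the episode: 700 rows, 7 features.
X = rng.uniform(-1, 1, size=(700, 7))
# Hidden generative rule: a nonlinear interaction between features 0 and 1,
# plus a weak linear term on feature 2; features 3..6 are pure noise.
y = X[:, 0] * X[:, 1] + 0.3 * X[:, 2] + rng.normal(0, 0.05, 700)

# Train a model on the arbitrary tabular data set.
model = RandomForestRegressor(n_estimators=300, random_state=0).fit(X, y)

# Interpretability step: ask which inputs the model's predictions depend on.
imp = permutation_importance(model, X, y, n_repeats=20, random_state=0)
ranked = np.argsort(imp.importances_mean)[::-1]
print("features ranked by importance:", ranked)
```

Run on this synthetic set, the three informative features surface at the top of the ranking even though the interaction term is invisible to a simple per-feature correlation test, which is the point being made about combinatorial, nonlinear patterns.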
Chris Messina
Do you have some examples that could like bring this to life?
Jessica Rumbelow
Yeah. In fact, Jugal loves to talk about, actually, this was our first ever case study that we did. Yeah, it was pretty exciting, with Matt.
Jugal Patel
Yeah, yeah, yeah. So we had spent months working in R and D trying to get this system to work end to end, and debugging it, and it was such a struggle.
Chris Messina
And I assumed all of that was real time. But bugs in software.
Brian McCullough
No, no. Share, share your struggles.
Jugal Patel
Bugs in real anything. But the magic happened when we were thinking, oh, we're going to have to go through a ton of data sets and work with a ton of scientists before we find anything that is worth knowing, that's a novel discovery we can publish. The very first collaborator that we worked with, he was a plant biologist from a research institute in France. He was working on trying to figure out the right combination of the genotype of the plant, the nutrients, and the environmental conditions in order to make the plant's root growth more efficient. And this is very important, because in order to grow climate-resistant crops, you need to understand how to make these plants work in a different way: to be flood resistant, to be drought resistant, et cetera. And this is incredibly important for food security. So he had this data set, and we were not very hopeful about it, because it only had 700 rows, 700 samples of data. These samples only had 20 features. And then when we actually narrowed it down to the features he cared about, it only had seven. So you're talking about, like, a tiny data set, when you think about the size of data sets that are used in AI today.
Chris Messina
So, to set up what you're about to say: tell me a little bit more about how this data was collected, like, over what time period. Because 700 rows sounds like, okay, that's a good amount of information, but it's maybe not so much, as you're saying. Right? If you had 700,000 rows, 7 million rows, that would be much more. So in this case, how long did it take him to assemble this, and what, roughly, was the process by which he gathered it?
Jugal Patel
That's a really good question.
Jessica Rumbelow
I can say, I know all of that. So, Matt, our collaborator from the Institute for Plant Sciences, he does most of his experiments in a screening lab. So he grows, well, I can't pronounce the name, it's in our blog post. It's one of those test species that are used because they grow really quickly.
Chris Messina
I see. So it's, what is it, the fruit fly of plants, for plant biologists.
Jessica Rumbelow
Exactly.
Chris Messina
Okay.
Jessica Rumbelow
So, yeah, so he grows the plants for only, like, maybe 15 days, and he takes lots and lots of measurements, both of the roots, they're on these really cool slides, so you can digitally measure the root structure, and of all of the conditions. So for each plant he's growing, he will typically take one measurement per day. And obviously that measurement would also contain all of the information about, like, the mutation, the genotype, and the nutrient profile of the soil that the plant is growing in.
Chris Messina
Got it. Okay, great. And also, just to put a finer point on this: in the world of digital simulations, the idea is, how many times can you simulate something at some frequency? And the more you can do, obviously, the more you can sort of see different things happening. But in the biological world, if you're dependent on a living organism doing the simulation, then you're a little bit less independent from, well, you're more dependent, I guess, on the actual world of biology. Okay, go ahead.
Jessica Rumbelow
We want to stay as close to the real world as we can in our data. You know, we don't.
Chris Messina
And also on this point, right. If he's actually observing the real world and getting data from the real world, this also might help to address Some of the issues that you were talking about before where if you've got all this data that's in the LLMs, but it, you know, wasn't actually captured with a great amount of fidelity or authenticity, then that can also cause spoilage down the line.
Brian McCullough
Okay, not to interrupt one more time, but this might be useful as people are listening: their website is leap-labs.com, that's L-E-A-P, dash, labs, dot com, and you can see some of the papers and blog posts that we've been talking about. So, Jugal, continue. How did this research go?
Jessica Rumbelow
Yeah, absolutely.
Jugal Patel
So we put this data through the system, and it flew through in a matter of hours. And what came out was not only patterns that the scientist knows about, that he knows to be true within his domain, which gave him a lot of confidence that the system is working, but also a novel genotype and nutrient combination that he was unaware of, one that maximized the root growth efficiency feature he really cared about. This was after he had already, as a domain expert, spent months scrolling through Excel trying to find patterns in this data. And our system, which is completely agnostic, was able to find these patterns that he had missed.
Chris Messina
What does it mean to scroll through Excel? Like, literally, he's looking at numbers and trying to make correlations? That seems like a wild way to work.
Jessica Rumbelow
I imagine he's doing kind of your standard scientific analyses.
Chris Messina
Okay.
Jessica Rumbelow
But they kind of fall down if you're looking for nonlinear patterns.
Chris Messina
No, I imagine it's like being a fly on a dartboard trying to figure out what surface you're on. That feels kind of like what you're describing. And then you zoom out and you're like, oh, here's the red square, and so on. Anyway, I'm not really a dart player, but yeah. Okay, I see.
Jugal Patel
Yeah. I think what was the most exciting was when we got on a call with him after the delivery of these results, and he immediately said, when can we work on another data set together? We have something here. And he's already changing his experimental process because of what our system allows him to do. So previously he would do, like, a very simple targeted experiment and only measure certain things, because he only has the capacity to go through the results in this very limited way.
Chris Messina
I see, so he can broaden the aperture, effectively. Yes, and take in a lot more data, because he now has the capacity to take in more data, whereas before he couldn't actually analyze it. And so you're giving him sort of a super exoskeleton, or like a brain on top of his brain, to be able to understand. Okay, got it.
Jessica Rumbelow
And so his research is going like.
Jugal Patel
Much faster now. And he's using what he got from the Discovery Engine to guide his research.
Chris Messina
Amazing.
Brian McCullough
Is another analogy for the Discovery Engine that, as opposed to, okay, I have a hypothesis, test it, yes or no; hypothesis, test, yes or no; hypothesis, test, yes or no; or maybe I use an LLM and prompt it to generate hypotheses, what you're saying is the Discovery Engine is basically a delivery engine for, here's 18 new hypotheses you might want to try?
Jessica Rumbelow
Kind of, okay, but I think it's really important to note that everything we find is empirically validated. It's not the model. Well, we do two things. We provide patterns, discoveries, insights, whatever, that are empirically validated in the data. So these are not the model extrapolating, saying, like, hey, why not try this? This is: here is a pattern that I have found in your data, and here is all of the evidence for it.
Chris Messina
And then like citations for these discoveries effectively leading back to the data, the.
Brian McCullough
Source of the data or validation. Yeah, yeah.
Jessica Rumbelow
So it's like: here is a subset of the data, and if you filter the data by this pattern, you will see exactly the pattern that the model has found. So it's empirically validated, built in. Of course, we can also get the models to extrapolate from the data that we have, for example, with the plant biology stuff, to find combinations of variables that aren't actually present in the data set. But we flag these as more speculative: the model thinks that, for maximizing the thing you care about, this region of the parameter space seems promising. But a lot of the time, because this data analysis is so laborious when you do it manually, there's so much low hanging fruit just in finding these combinatorial patterns automatically.
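[Editor's note: the "filter by the pattern and see the evidence" idea can be sketched concretely. This is a minimal illustration of what empirical validation of a discovered pattern might look like, not Leap's actual output format; the column names, the pattern, and the synthetic plant data are invented for the example.]

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
# A toy plant-biology-style data set: genotype, a nutrient level, an outcome.
df = pd.DataFrame({
    "genotype": rng.choice(["wt", "mutA"], size=500),
    "nitrate": rng.uniform(0, 1, size=500),
})
# Hidden rule: mutA plants with high nitrate grow longer roots.
df["root_len"] = (1.0
                  + np.where((df.genotype == "mutA") & (df.nitrate > 0.6), 0.8, 0.0)
                  + rng.normal(0, 0.1, 500))

# A candidate "discovered pattern", delivered together with its evidence:
# the matching subset of the data, versus everything else.
mask = (df.genotype == "mutA") & (df.nitrate > 0.6)
inside = df.loc[mask, "root_len"].mean()
outside = df.loc[~mask, "root_len"].mean()
print(f"{mask.sum()} matching rows: mean root length {inside:.2f} vs {outside:.2f} elsewhere")
```

The point is that the claim is checkable directly against the rows that exhibit it, rather than being an unsupported extrapolation: anyone can re-run the filter and see the same effect.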
Chris Messina
So one of the things that sounds interesting, challenging and I don't know, I suppose this is like where you guys are at in terms of the business is thinking about like context and focus. So I'm sort of imagining that what you guys are building from this discovery engine is almost like Google Earth but for reality. And so you can sort of say, okay, if you want to discover some new plant type that survives very well in a certain region, then Google Earth knows where all the temperature zones are and sort of like points you in a part of the world. And then it's like, okay, now you want to zoom in and the level of zoom that you want to have will determine your ability to then Maybe try a set of experiments or to learn about that part of the world. Now this is obviously another very gross bastardization metaphor. But in terms of the known reality that people could relate to, perhaps it's like the world is there, reality exists. The question is, how do you understand it and how do you bring together the right sensor data about it in order to make interpretation of reality? And then how do you apply these mathematical models or machine learning models to see those patterns that exist perhaps through the world from one end of the planet to the other? So from an information perspective. So my question is kind of that it sounds like it would be great to be able to dump in all of the data. Let's say if you just got rid of all of the scientific knowledge that's ever been produced and you just started today with all of the sensors that exist in the world so that you have some ground truth and you just let the models run to say, where are you seeing patterns? And then later on we develop language to describe what these patterns mean. I mean, that would be, on the one hand, amazing. It would take a long time. It would take probably all the compute in the world and we'd have to drain the sun to make that happen. How do you then sort of apply this to the right size problems? 
Right? Like, you've talked about this plant biologist. That sounds very specific, very tight. Now he's expanding, because he doesn't have to worry so much about the data analysis; these patterns will be discovered roughly as a result of him producing more data and putting the data into the system. But at some point you almost end up with too much noise. So is noise ultimately a problem in terms of getting too much data, or is that not something that you're worried about?
Jessica Rumbelow
I mean.
Brian McCullough
While single AI agents can handle specific tasks, the real power comes when specialized agents collaborate to solve complex problems. There is, however, a fundamental gap: we have no standardized infrastructure for these agents to discover, communicate with, and work alongside each other. That's where AGNTCY comes in. AGNTCY is an open source collective building the Internet of Agents, a global collaboration layer where AI agents can work together. It will connect systems across vendors and frameworks, solving the biggest problems of discovery, interoperability, and scalability for enterprises. With contributors like Cisco, CrewAI, LangChain, and MongoDB, AGNTCY is breaking down silos and building the future of interoperable AI. Shape the future of enterprise innovation. Visit agntcy.org to explore use cases. That's A-G-N-T-C-Y dot org.
Unknown
Everyone knows that feeling: wanting to experience more stories, but struggling to find the time. That's where Audible changes everything. With over a million audiobooks and Audible Originals, there's a story waiting to spark anyone's imagination. Take The Paris Apartment by Lucy Foley, the gripping psychological thriller that's keeping listeners on the edge of their seats. Imagine unraveling its mysteries during your morning commute, or losing yourself in its twists and turns while doing household chores. That's the magic of Audible: it transforms daily routines into opportunities for thrilling discoveries. The best part? Members get access to thousands of included titles, with new content added regularly. From bestsellers to hidden gems, every genre imaginable is at their fingertips. And with one easy-to-use app, switching between favorites or discovering new passions has never been simpler. There's more to imagine when you listen. Start a free 30-day Audible trial and get your first audiobook free at audible.com/wondery. That's audible.com/wondery.
Jessica Rumbelow
Okay, so there's, like, loads and loads of stuff to talk about in what you just said.
Chris Messina
I know, I'm sorry.
Jessica Rumbelow
No, no, no, that's fine. It's good, it's good.
Chris Messina
I end up sort of dropping these zip files and then we expand them and they're like all these like files and it'll go into all these different folders.
Jessica Rumbelow
You're like, right, well, which one do I need to talk about first? What can I say? I think noise is a really interesting point. We can talk about the sources of noise in the journey from the real world to our understanding of it.
Chris Messina
And also I don't want to be like pejorative about noise. Noise is beautiful, you know.
Jessica Rumbelow
So, yeah, that's a lovely sentiment. I'm not such a big fan of noise myself. But I think the point about data scale is also really interesting; maybe I'll take that first. A couple of years ago, we were like: we have this idea, we're building this really cool interpretability for neural networks, we think they probably know stuff that we're not aware of, maybe this could be a new scientific method. Like, we think we might have something here. We were kind of envisaging something very similar to what you suggested: massive data, sensors, robots, all of the data in, and we will find all of the patterns, and it will be amazing. What has actually happened, kind of as with our case study with Matt, is that every case study has shown that there is actually so much low hanging fruit, even in small data sets. Humans, God love them, we're just really, really bad at finding these patterns manually ourselves. Like, we estimate there are probably trillions of dollars on the table, hanging around in R and D data sets on servers, just because we don't know how to find them. So I figure we're going to do that first, and then later we will tile the known universe with sensors and figure it all out.
Brian McCullough
Well, to that end, I mean, we're talking about biology, and you're thinking of, like, medical discovery and stuff like that. But is this applicable to basically anything, like materials science? Like, if I'm listening, what's a left-field thing that maybe I could potentially use this for?
Jessica Rumbelow
Oh, left field, yeah. I don't know. So I was having a conversation with my friend the other day who's like really into like Brian Johnson and Quantified self and health and longevity.
Chris Messina
The don't die movement.
Jessica Rumbelow
Don't die. I'm a big fan. I don't want to die, and you shouldn't either. Yeah. So, I mean, obviously I see everything in terms of data sets these days. So I was like, hey, give us your data, we'll run it through disco, we'll find the patterns. We are, very much by design, domain agnostic: to the neural network, it's all numbers, the domain doesn't really matter. In terms of go-to-market, and actually serving this technology to scientists in a way that makes sense for them and fits in with their worldview and their processes, that's obviously a little bit different. But yeah, under the hood, I mean, you can train neural networks on anything these days.
Chris Messina
Go ahead, Brian.
Brian McCullough
Yeah, you mentioned the first case study. How many folks, at least to date, are you working with? So it's not just the one case study; how many other folks have you been working with?
Jessica Rumbelow
Yeah, so we've got, how many publications have we got? We have published four preprints. We are also working on a collaboration with Meta; that should be another publication soon. That's actually in materials. Yeah, and we've done a couple of other case studies that didn't make it to publication because it was just validation. Like, we found a load of known patterns, but nothing new.
Brian McCullough
But across multiple different areas, people are like, this is useful. So you're proving out that it's useful to folks in a lot of different areas.
Jessica Rumbelow
Yeah, we've got plant biology, meteorology, advanced materials, immunology, catalysis, I guess that's advanced materials as well. Oh, Alzheimer's, all of that medical clinical stuff, the organism thing. Oh, and ocean proteomics.
Brian McCullough
Yeah, I want to get this in here. Again, because I'm imagining people listening and being like, oh, I'd like to test this out. So we're not wrapping yet because I want to hear your backgrounds. And Chris has some more questions too. If I am intrigued by what we're talking about right now, where should I go to start working with your model?
Jessica Rumbelow
You should email us. We're in the process of standing up a self-service dashboard, which is obviously very, very exciting. But for now, very much, email us.
Chris Messina
Also, what's a good email for you guys?
Jessica Rumbelow
hello@leap-labs.com. Yeah, dot com.
Chris Messina
Yeah, yeah, yeah.
Jugal Patel
Checking out the blog page on our website will give you a good idea of the variety of different scientific domains we've worked in and what we've been able to find.
Jessica Rumbelow
Yeah. And you can follow me on Twitter for occasional rants about how science is broken and how we must immediately fix it.
Chris Messina
Okay. Sorry, Brian, I'm going to jump in, because I could obviously spool out this conversation indefinitely. I guess it would be valuable to get a sense for where you guys are in terms of your startup journey. What are the next steps? What's the roadmap? I'd love to continue talking about fixing the incentive structure in science, but at the same time, you guys do live in the capitalist system, we did put investment into you, and so we'd love to know where things sit in terms of the evolution of the business.
Jessica Rumbelow
Yeah. Do you want the story from the start, or just the facts?
Chris Messina
Yeah, why not?
Jessica Rumbelow
Yeah. Okay, cool. So Jugal and I founded Leap two years ago.
Chris Messina
Yeah, two and a half years.
Jessica Rumbelow
Basically, to continue some interpretability research that I'd been working on. And initially we were like, we're going to build an interpretability engine because interpretability is really important. You can use it to detect bias, you can use it to predict failure modes on out of distribution data. Oh, and maybe you can use it for scientific discovery as well.
Chris Messina
Sorry, just so I understand, when you talk about interpretability, what is the format of the output? Like, what do I get a report that says, here's how to interpret what you found, or is it like something else?
Jessica Rumbelow
So in interpretability in general, there are many, many different methods. A bunch of these are proprietary stuff at Leap, and they can output information in all kinds of different formats. We're really leaning into violin plots and bar plots at the moment, because.
Chris Messina
I'm sorry, what plots?
Jessica Rumbelow
Violin plot. I need to show you a picture. If you're curious about violin plots, look them up.
Chris Messina
Violin? Like the musical instrument? How do you spell that?
Jessica Rumbelow
Violin. As in. As in.
Chris Messina
Okay, okay, got it. All right.
Jessica Rumbelow
You'll understand if you look at them. Maybe the short answer, because I know time is tight, is that there are many different ways to express the patterns that we find in a human-readable format. We do some charts and plots of various kinds. We also provide logical rules that allow you to filter the data to find the samples that support a given pattern. But there are many different ways; data visualization is incredibly interesting.
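The "logical rules" output Jessica mentions can be pictured as simple predicates over the columns of a data set: a discovered pattern ships with a rule, and applying the rule recovers the samples that support it. Here is a hedged, hypothetical sketch of what that might look like (the rule format, fields, and thresholds are all invented for illustration, not Disco's actual output):

```python
# Hypothetical sketch of a "logical rule" used to filter a data set down to
# the samples that support a discovered pattern. Fields/thresholds invented.

def matches(rule, sample):
    """A rule is a list of (field, op, value) clauses, ANDed together."""
    ops = {
        ">=": lambda a, b: a >= b,
        "<=": lambda a, b: a <= b,
        "==": lambda a, b: a == b,
    }
    return all(ops[op](sample[f], v) for f, op, v in rule)

data = [
    {"genotype": "A", "nitrogen": 0.9, "root_growth": 14.2},
    {"genotype": "A", "nitrogen": 0.2, "root_growth": 6.1},
    {"genotype": "B", "nitrogen": 0.9, "root_growth": 5.8},
]

# e.g. "genotype A under high nitrogen" expressed as a machine-readable rule
rule = [("genotype", "==", "A"), ("nitrogen", ">=", 0.5)]
support = [s for s in data if matches(rule, s)]
```

The appeal of this shape is that a scientist can audit the pattern directly: the rule is human-readable, and `support` is exactly the subset of samples that back it up.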
Chris Messina
Totally.
Jessica Rumbelow
Yeah.
Chris Messina
Okay, so that's kind of like how it started.
Jessica Rumbelow
Yes. Yay.
Chris Messina
Okay. And so you went from interpretability into Disco.
Jessica Rumbelow
Yeah. Because we decided that of all of the different use cases of interpretability, scientific discovery was the most difficult. And we should probably be.
Chris Messina
I appreciate the ambition.
Jessica Rumbelow
It's the most important thing. Right. Scientific progress is the bottleneck on humanity flourishing. Like, it's the biggest lever. So we want.
Chris Messina
Well, sorry. It does occur to me also that, in terms of, let's say, the last 2,000 years of culture, science has a very specific place in it. But truth, reality, authenticity are aspects that are becoming even more important when you can synthesize and generate nearly anything. And so, to your point, the faster we can get to an ability of almost turning raw data into intuition about reality, then that will actually settle a lot of the polarizing topics of our time, because we can simply, as you say, look to the data and have an interpretability layer on top that essentially says: look, here are the patterns that are there, that you as humans, with your grandiose ideas about the world but your very limited perspective on reality, should know about; here's what's actually happening behind the scenes.
Jessica Rumbelow
Making scientific methodology better is a multiplier on basically everything that I care about. So totally seemed like a good path.
Chris Messina
Seems like a good. Yeah, good path.
Jessica Rumbelow
So, yeah, sorry, back in startup-journey land: on the strength of the interpretability research we'd done, we raised a seed round. After that seed round, we decided to really focus on the discovery application of that interpretability research. And so we started prototyping this system. We knew it needed to be automated, because lots and lots of scientists can't train machine learning models, and that's okay.
Chris Messina
Reasonably. Yeah, they can focus on what they're good at. And you can do the other part.
Jessica Rumbelow
Absolutely. But we were like, it's not going to fly if we make them train models for everything, so we'll do that part, and that's fine. So we built this prototype system, and it worked, which was incredible. But it was quite manual and messy, you know; it was a proof of concept.
Chris Messina
How long ago was this?
Jessica Rumbelow
About a year ago. So we've gone from super scrappy prototype to like full automation end to end system. Yeah. And now we're, now we're doing fully automated discovery.
Chris Messina
What's your. Just quickly, what's your tech stack?
Jessica Rumbelow
Oh God, you need to talk to the CTO. My background is as a research scientist, in AI research. So I use Python and PyTorch, and I used to use matplotlib, but now I get ChatGPT to make my plots.
Chris Messina
Nice, nice. I'm wondering, are you guys on raw Nvidia compute, or are you using something else? What's the cloud solution there?
Jessica Rumbelow
We're on GCP. We have a fancy distributed AutoML setup on there that spins up the cluster and stuff. And then the front end is not interesting for me to talk about.
Chris Messina
Okay.
Jessica Rumbelow
That means I don't actually know because I'm not a software developer.
Chris Messina
It's not interesting for you as in it's not interesting to you. You're like okay, it's incidental.
Jessica Rumbelow
There are tests and engineers and things.
Chris Messina
I see. I understand it's a black box as far as you're concerned.
Jessica Rumbelow
Well above my pay grade.
Chris Messina
Understood. Okay. So where are you guys at in your startup journey now then?
Jessica Rumbelow
Yeah, so we're looking for our first industry pilots. We're talking to some really good people, very, very excited about that, and we're just starting to raise our Series A.
Chris Messina
Exciting and how far into that? So you're basically like what kind of investors are you seeking?
Jessica Rumbelow
We want stupid. Well.
Jugal Patel
We're looking for investors that are familiar with deep tech.
Jessica Rumbelow
Right.
Jugal Patel
So investors that were very early in DeepMind, or very early in Anthropic; very long-term, big-vision folks that get it, want to get in early, want to get in on the ground floor, and are familiar with developing really groundbreaking, world-changing technologies like that. Yeah.
Jessica Rumbelow
We're also fortunate in that we've been building relationships with some funds that we really like for a little while now. So yeah, it's feeling pretty good; we're having conversations. Oh, and we'll both be in San Francisco from the 26th, for about three weeks, coming up.
Chris Messina
Where are you guys based?
Jessica Rumbelow
Typically London and San Francisco.
Chris Messina
I could hear that somehow in the accent.
Brian McCullough
So, as you can hear at this point, I tried to come back into the conversation and, again, it didn't work out. All I tried to say was: if you're interested in Leap Labs at all, look them up at leap-labs.com and send them an email if you want to work with them. They're taking all comers at the moment. I'll have an email address in the show notes, as well as the white paper and all that other good stuff.
Chris Messina
No, I think this is great. This is super helpful, and I'm really excited about where you guys are at. I was thinking also, and I feel like this is one of the things that got me excited about the investment when we first talked: I'm kind of a fan of Alan Watts. He describes this concept of the grid of words, which essentially suggests that human language can be understood as if it were laid on graph paper. The words that we use to describe reality are just the dots, and in fact reality is made up of all the parts in between, all the negative spaces. In a large way, what you guys are talking about is being able to map and understand what those negative spaces are. The more we're able to blur out from the dots and see the entire picture, the better off we'll be in terms of understanding reality. That's ultimately what I think you guys are building and applying machine learning and AI to do. So that's why I'm personally excited about it, and I'm super excited that you guys are here at this part of the journey. Your thought leadership will continue to grow, especially once you come to San Francisco and start talking to people here; it'll be infectious. So, anything else you guys want to leave with?
Jessica Rumbelow
Just. Yeah, like if, if you're a scientist and you've got some interesting data, even if it's just a few hundred samples.
Chris Messina
Oh, that's exciting. So with this self serve thing, are people going to be able to go to someplace and like upload a file and just like the, the thing's gonna happen. Are you like charging for this? Like how does that work right now?
Jessica Rumbelow
So right now, we're trying to figure it all out at the moment. Right?
Chris Messina
Okay, great.
Jessica Rumbelow
But my hope is that we'll be able to make the self-serve platform completely free for academics and then do enterprise sales.
Chris Messina
Sure, I got it, actually. Yeah. So, on that point, I think this will be important for both of those audiences: privacy, ownership, and IP. These things are whatever they are, but obviously they're probably quite important to those different groups. So how does that factor into what you guys are doing?
Jessica Rumbelow
We can do on-prem. Basically, we have a secure cluster and stuff, and if you've got your own compute and you want to run Disco on that, that's fine too; we can support that. We totally get it. We're all scientists too; we want to protect your data. And we don't keep the data, we don't aggregate it, we don't sell it, we don't do anything nefarious at all. We are here for the science, and little else.
Chris Messina
Especially if you've got data: this discovery engine is going to look at it in a different way. Have you ever seen those infrared cameras looking at flowers? You can see them entirely differently. You've got to check this out, it's amazing. It's the way that birds, or, I don't know if it's birds, maybe it's bees, anyway, that's a whole different conversation. But there are different ways of seeing, in infrared, that allow you to see the world in an entirely different way. Flowers actually become almost like landing pads, these targets that are so clear and easy to see in infrared, but when you see them with human eyeballs, you don't see them that way. So I feel like that's kind of what you guys are offering: a different way to see through the data and get these insights.
Jessica Rumbelow
You should make all the world a rose garden.
Chris Messina
Love it. Love it. All right, we'll end it there. Thanks so much, guys. This is exciting.
Jessica Rumbelow
Thanks. Thanks.
Chris Messina
See you all in the bay.
Jessica Rumbelow
See you soon.
Chris Messina
Cool.
Techmeme Ride Home: Leap Labs Episode Summary
Episode Information
In this special portfolio profile episode, Brian McCullough interviews Jessica Rumbelow and Jugal Patel, the founders of Leap Labs. Joined by returning guest Chris Messina, the discussion centers around Leap Labs' mission to revolutionize scientific discovery through advanced machine learning technologies.
Notable Quote:
Jessica Rumbelow [01:01]: "We are automating scientific discovery from data. Companies spend huge amounts on data gathering and R&D, but the outcomes are uncertain and path-dependent. We're changing that."
The conversation delves into the current challenges in scientific research, particularly the replication crisis and the inadequacies of Large Language Models (LLMs) in contributing meaningfully to scientific advancements.
Notable Quotes:
Jessica Rumbelow [02:04]: "Language models are a noisy abstraction over real-world data. The scientific literature is terrible, and LLMs can't differentiate between replicable and non-replicable papers."
Chris Messina [03:00]: "The replication crisis means much of our scientific assumptions are built on faulty pretenses."
Leap Labs introduces their groundbreaking Discovery Engine, designed to address the limitations of existing ML models by directly analyzing raw data instead of relying on language abstractions.
Notable Quotes:
Jessica Rumbelow [12:18]: "Machine learning models are excellent at finding complex patterns in data, but understanding these patterns has been a challenge. Our interpretability work extracts meaningful insights from these models."
Jessica Rumbelow [22:36]: "Everything we find is empirically validated. The model provides patterns with evidence from the data itself."
Jessica and Jugal discuss their first successful collaboration with a plant biologist aiming to optimize plant root growth for climate-resistant crops. This case study exemplifies Leap Labs' capability to derive actionable scientific insights from modest data sets.
Notable Quotes:
Jugal Patel [15:08]: "After months of manual analysis with little success, our system identified a novel genotype-nutrient combination that maximizes root growth."
Jessica Rumbelow [20:40]: "Even small data sets reveal a lot of low-hanging fruit that humans miss due to the complexity of pattern recognition."
Beyond plant biology, Leap Labs is actively engaging with multiple scientific fields, demonstrating the versatility and robustness of their Discovery Engine.
Notable Quote:
Jessica Rumbelow [31:08]: "We've published four preprints and are collaborating with Meta on materials science. Our system works across various domains, providing valuable insights."
The founders outline Leap Labs' progression from a research-focused startup to a company seeking industry pilots and preparing for Series A funding.
Notable Quotes:
Jessica Rumbelow [33:42]: "Two years ago, we founded Leap to enhance interpretability in ML. Now, we've evolved to focus on automated scientific discovery."
Jugal Patel [39:20]: "We're seeking investors who understand deep tech and have a long-term vision similar to early backers of DeepMind or Anthropic."
While Jessica provides a high-level overview of their tech stack, the emphasis remains on their proprietary interpretability methods and the forthcoming self-service platform.
Notable Quotes:
Jessica Rumbelow [38:00]: "We're on GCP with a distributed AutoML setup. Front-end details are handled by our engineering team."
Jessica Rumbelow [43:29]: "We can support on-premises deployments, ensuring data privacy and security. We don't aggregate or sell data."
The episode wraps up with an invitation for scientists and interested parties to engage with Leap Labs, emphasizing their commitment to improving scientific methodologies.
Notable Quotes:
Jessica Rumbelow [42:04]: "If you're a scientist with interesting data, even a few hundred samples, reach out to us. We're here to help you uncover valuable insights."
Chris Messina [44:11]: "Leap Labs offers a different way to see through data and gain insights, much like infrared cameras reveal hidden aspects of the world."
This episode of Techmeme Ride Home provides an in-depth look into Leap Labs' innovative approach to scientific discovery. By addressing critical issues in the current scientific landscape and leveraging advanced machine learning techniques, Leap Labs positions itself as a pivotal player in accelerating research across multiple domains.
For more information, visit leap-labs.com or reach out via email at hello@leap-labs.com.