Databricks: From Data to Decisions (Business Breakdowns, EP. 238)
A
This episode is brought to you by Portrait. It's the AI research system that I used to prepare for today's episode and for all Business Breakdowns episodes. Portrait was built by former buy-side investors, and they understand great investing isn't just about having more information from low-quality sources; it's about having the right information organized the right way. And if you listen to the show, you appreciate that diligence consists of many things: diving into the history of a business, framing the nuanced competitive dynamics, tracking key signposts around your thesis. And historically that would take up material time that you do not have. But Portrait is basically like adding an army of analysts to your team. It's powered by an AI system specifically designed for investment research workflows, so you get nuanced idea generation. Portrait assesses the same types of qualitative attributes that we discuss on this show, and that can help identify businesses which fit your frameworks. Portrait also customizes research report generation, and I used Portrait to generate a primer and lay out bull and bear cases ahead of today's episode to help frame the conversation. And third, there's intelligent thesis monitoring. That's where Portrait assesses thousands of data points across value chains each day, extracting the insights driving the business. Again, all this work would typically take hours and hours. It's at your fingertips now. Visit portraitresearch.com to start your free trial today.
B
This is Business Breakdowns. Business Breakdowns is a series of conversations with business investors and operators diving deep into a single business. For each business, we explore its history, its business model, its competitive advantages, and what makes it tick. We believe every business has lessons and secrets that investors and operators can learn from, and we are here to bring them to you. To find more episodes of Breakdowns, check out joincolossus.com. All opinions expressed by hosts and podcast guests are solely their own opinions. Hosts, podcast guests, their employers or affiliates may maintain positions in the securities discussed in this podcast. This podcast is for informational purposes only and should not be relied upon as a basis for investment decisions.
A
This is Matt Russell, and today we are breaking down Databricks. My guest is Alan Tu, Portfolio Manager and Analyst at WCM Investment Management. And you may actually remember we broke down Databricks with a different guest about three years ago. But given how much has changed in this business, and the subsequent capital raises, I was personally interested in revisiting this story. I had a conversation with Alan about nine months ago. WCM had invested in Databricks in December of 2024, and I was curious, just to get a better understanding of what they saw in the business. We got into that in that private conversation, and then really focused on it in this conversation. So we start with what exactly Databricks does for its customers. And I think this is the large private company that might be least understood by the general public, and perhaps that dates back to the unique founding team and origin story, which differentiate Databricks and probably play a role in how it's evolved from a successful initial product into the commercial platform that it is today. Now, this conversation was recorded on December 10th of 2025, so all numbers are reflective of what was publicly available on that date. Please enjoy my conversation with Alan Tu on Databricks. All right, Alan, I am pumped to have you here to talk Databricks. It's rare that we go into the private sphere, but there are certain companies that are 100% worth analyzing in this space, Databricks being one of them. And what I would say differentiates Databricks versus a Stripe or a SpaceX or an OpenAI is that most people understand what those businesses do. I think Databricks is a little bit more of a mystery to most people who are not close to the business, have not invested in the business. So if you could just start off with the simplest explanation you could give in terms of what Databricks actually does.
B
Totally. I think part of the challenge with Databricks is that they address so many different use cases. You may hear folks talk about, oh, we use Databricks for recommending movies or pricing strategy or fraud detection. And it's like, well, these are not necessarily super related use cases, but it sounds like Databricks is very critical for all of them. And then you go a layer deeper and you ask, well, what exactly do they do? And you'll hear folks say, well, they process the data. Which then for me leads to the question of what processing the data actually means. For someone like myself, who does not have a technical background, even that can be hard to grok. The example that resonates for me is that I think we've all had the experience of getting a data file, a spreadsheet in Excel, and you're interested in running an analysis based on that data. And it might be a very simple analysis. It could be something as simple as, I want to understand the average price of a bunch of different items that were sold. But I think we've all had the experience of, well, you get the data in the spreadsheet, but it's not perfectly set up. Not every column has the price exactly where it should be. Perhaps in certain cells the price is in a certain currency; in other cells, maybe the price is even written out as text. So you can't just select all that data and say, what's the average? And so you actually end up spending the majority of your time, sometimes 80 to 90% of your time, just going through the process of unifying all the data into the same format so that you can run that very, very simple calculation of what is the average. So to me, that pain point of actually getting data into a format that allows you to ask even a simple question is this idea of data processing. And now, in the case of Databricks, just think of that at a completely different scale. You're talking about tons of different types of data sources.
There's this concept of unstructured versus structured data. Anything that's rows and columns, that fits in a spreadsheet, is more structured data. But the reality is most of the data out there is unstructured. It could be log files that are big streams of text. It could be image or video. For folks that are analyzing websites, it could be clickstream data. You take that problem of a lot of different data formats, and how do you get them into a state where you can actually run analysis against it? That's how I would think about the core of what Databricks does. Now, they've since expanded and they do all kinds of different things, but that data processing concept is what underpins the primary pain point. And then you can tie that back to all these different use cases. If you think about an e-commerce company that is trying to decide how much inventory it should stock of a particular SKU, a T-shirt, whatever it might be, you could imagine that there are a lot of different inputs that might help you make that decision. It could range from how your digital advertising is performing with that particular T-shirt. It could be how competing types of T-shirts have been selling. It could be credit card data. It could be all kinds of different data. And if you could get access to that data, you could potentially put it into a process of creating a model to answer that question of how many T-shirts should we keep in stock.
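The messy-spreadsheet pain point Alan describes can be sketched in a few lines of Python. This is a toy illustration, not anything from the episode: the records, formats, and helper function are invented, but they show how the real work is normalizing the price field before the "simple" average can even be computed.

```python
import re

# Toy records standing in for a messy spreadsheet export: prices arrive
# as numbers, strings with currency symbols, and thousands separators.
rows = [
    {"item": "shirt", "price": 19.99},
    {"item": "jacket", "price": "$1,249.00"},
    {"item": "hat", "price": "USD 25"},
    {"item": "scarf", "price": " 14.50 "},
]

def normalize_price(value):
    """Coerce a price in any of the formats above into a float."""
    if isinstance(value, (int, float)):
        return float(value)
    # Strip currency markers, commas, and whitespace before parsing.
    cleaned = re.sub(r"[^0-9.]", "", str(value))
    return float(cleaned)

# Most of the code is cleanup; the actual analysis is one line.
prices = [normalize_price(r["price"]) for r in rows]
average = sum(prices) / len(prices)
print(round(average, 2))
```

The cleanup step dwarfs the analysis step even in this four-row toy, which is the 80-to-90% figure from the conversation in miniature.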
A
Makes a lot of sense. I think the painting of the picture of the Excel model will certainly resonate with pretty much anyone. And you can talk to people about how simply taking data that's unstructured, or just not in the proper format, ends up being 95% of the work for what you're doing. And oftentimes I think that can create this environment where you have the idea of doing more analysis or running a more complicated model, but just that workload up front stops you and limits ultimately what you're doing. So I think that actually does bring it to life in a really thoughtful way. I want to go back to the beginning here, because a lot of what you mentioned ties back into some of the unique origins of Databricks. But I wanted to just start with the founding team, the academics. There are all these cliches about academics not being commercial, but here you have this really impressive story of evolution. So can you bring us back to the beginning stage of Databricks, who it was, what it looked like in those beginnings, and tell a bit of the story, which I think is really, really interesting for this business.
B
I think it's a very unique part of the Databricks story that to this day, with the culture and the DNA of the organization, you can trace a lot of the decisions back to this founding story. And so it's seven founders that came out of Berkeley. They were all working in what's called the AMPLab at Berkeley around the 2009 time period. And if you roll back the clock to that time period, that was actually the beginning stages of the cloud. What Ali, the CEO and one of the co-founders, would say is that the seven of them were working in this building together doing research. And on the floor below was another team that was putting out some of the very early research around the data center being the next computer, basically the early concept of cloud computing. And so you had, in one part of the building, a lot of innovation around cloud computing at the hardware layer. And then you had Ali and his colleagues thinking about what the software opportunities in cloud were. They were so close to the research in those early days. And unlike with AI today, where there's been a very clear recognition that AI is going to be a big deal, back then the idea of cloud was still somewhat controversial.
A
Hard to imagine, but yes, totally.
B
This group of folks that were rooted in research gained conviction around the idea of cloud. So then they thought about, well, what are the problems that we should work on within cloud? They actually thought of a few different ideas, but where they ended up was around this idea that data is going to be a really big problem, data at scale. And if you think about all the different use cases, going back to the beginning of the conversation, there's an infinite number of applications around data. The other interesting thing that Ali will say is that during that time period, it was also when Twitter was becoming big, Airbnb was becoming big, Facebook was still becoming big. It was actually a very positive time in technology, and so there was a lot of optimism around entrepreneurship and starting startups. That also fed into the energy of the group. The thought was, well, one of the co-founders was actually the creator of Apache Spark, and they believed that data was going to be an important thing. So how do we create a business around that? The other piece was using open source as another key bet, and that was very aligned with this idea of coming from academia and research. When you roll back the clock, there were really three major bets that they had a view on: cloud was going to be big, data was going to be big, and open source was going to be a good way to build a business. In hindsight, all three of those turned out to be very good bets. At the time it was less clear, but because of that environment where there was a lot of optimism around technology, I think all of that coalesced together to be the beginnings of Databricks.
A
On the point of connecting cloud to data, is it fair to assume that by transitioning to cloud there would actually be more capacity for data to be stored or to be used? Like is that connected in the sense that prior to cloud the data capabilities might have been constrained?
B
Yeah. One of the paradigm shifts of cloud was this concept of scale-out architecture, which basically allowed the ability to use more commodity hardware to address larger and larger amounts of both compute and storage, and data processing in this case. So that was an important underlying trend that enabled this proliferation, this idea of data explosion, that I think you're touching on. When you look at what Apache Spark was, it was leveraging this concept of distributed compute and applying it to data processing. That was very important.
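The scale-out idea behind distributed data processing can be sketched as a toy map-reduce in Python: split the data into partitions, compute a partial result per partition, then combine the partials into a global answer. This is an invented illustration, with threads on one machine standing in for the many machines of a real cluster; Spark's actual APIs and execution model are far richer than this.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy scale-out aggregation: partition the data, compute a partial
# result per partition (the "map" step a cluster would run on separate
# machines), then combine the partials (the "reduce" step).
data = list(range(1, 1_000_001))          # pretend this is too big for one node
partitions = [data[i::4] for i in range(4)]  # split into 4 partitions

def partial_sum_count(part):
    # Each worker sees only its own partition and returns (sum, count).
    return sum(part), len(part)

with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(partial_sum_count, partitions))

# Combine partition-local results into a global average.
total = sum(s for s, _ in partials)
count = sum(c for _, c in partials)
print(total / count)
```

The key property is that no single worker ever needs the whole dataset, which is what lets commodity hardware scale to data sizes no one machine could hold.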
A
That's the reason I asked. It's always interesting to think about second order impacts, or a certain market enabling another market on the back of it, particularly with AI today, where everyone is looking for second order impacts. So it's interesting to hear how they're connected. And to your point, those three ideas, those three bets they made, certainly compounded together in many ways. On that point of the open source: there are all different ways to approach open source, and we have walled gardens versus open source, and there's the very famous Apple versus Windows example, or Apple versus Microsoft. Can you talk about the commercialization and how that played into things? Because I know that was a major stepping stone for the business in terms of evolving from this tool that gained usership. Oftentimes offering things for free is a great way to get people in, but it has to be great, and it certainly was. That feels like such a meaningful point in time for this business, and I'm curious how you would describe what happened and how they really turned this into a great product and evolved it into a great business.
B
There are a lot of examples of successful open source projects, and that's one of the magical things about open source: it enables a level of adoption that, in the consumer world, we'd describe as capturing magic in a bottle. In the enterprise world and with technology, open source is a really good way of getting just massive amounts of awareness, mindshare and adoption. There are a lot of examples of successful open source technologies, but there are actually not very many examples of successful businesses that have been built on top of open source. A lot has been made of Red Hat, one of the first companies that built a business on top of Linux. But the reality is that it's actually one of the hardest things to do. You have to hit two home runs. This is actually one of the ways that Ali explains it: you need to hit the first home run, which is to develop an open source technology that gets mainstream adoption. But what a lot of folks don't think about is the second home run, which is how you actually build a business on top of it. And part of the problem is that the open source technology ends up becoming one of the business's main competitors, because anyone can get the product for free. Anyone, including your customers, but also including competitors, can leverage that technology. And those competitors that have more distribution, more customer relationships, can actually do a better job of monetizing that open source technology. This is where, again, coming from academia really helped inform the strategy for Databricks. The benefit of not being immersed in the commercial market is that you don't have the preexisting notions of what you should or should not do when it comes to building a business on top of open source. Because again, the biggest example before that was Red Hat, which basically provided services and support for Linux. But that was the main thing.
But because the Databricks team wasn't really aware of the precedents, they thought about things in a very first-principles way. And when you think about this challenge of how you monetize something where there's a free alternative, it's actually a simple answer. The answer is you need to create a better product that is worth paying for.
A
Simple answer, maybe not simple execution.
B
Totally. And part of the reason why it's not simple execution is that if you've built your brand and gotten a lot of positive feedback for successfully creating the open source technology, it can be weird to then say, I'm actually going to create a competing product, and I'm not going to put all of the bells and whistles into that open source technology. For a lot of folks, that creates a lot of tension. You've got a lot of folks in the community that are like, how could you do that? There's almost a feeling of betrayal. When you're successful with open source, it can almost be a curse, because you become very popular, you're viewed by technologists as someone who's brought this great thing into the world, and then all of a sudden you need to be willing to be a villain. I think a lot of people have trouble making that jump. But again, in the case of Databricks, they just very much realized that there's no way we can compete unless we have differentiation. So creating that differentiation was just a very important concept. One of the things they did with Databricks, the company, and we can also talk about how they chose the name Databricks, was that they created a new implementation of Spark that was completely proprietary, that had a lot more performance and the things that enterprises want: reliability, scalability, et cetera. And they were just very unabashed about the fact that if you want to use this new implementation, you would have to pay for it.
A
Is that comparable to the free tier of an LLM versus the pro tier of an LLM today, just in terms of being able to feel the difference? Not everybody will have used the different tiers, but I'm trying to grasp the differentiation that, to your point, needs to be there. It's very easy to describe when you're signing up for a subscription to a website: you get three free articles, but after that you have access to the proprietary database for the subscription cost. What comparison would you make, if there is an analogy, for the difference?
B
I actually think that analogy is pretty accurate, but there are nuances to it. As consumers, we are used to this construct of there being a freemium version, and then we pay for the premium version. And that decision about what goes into the premium version is where the nuance is. Again, going back to enterprise, the traditional wisdom is, oh well, the average developer that's working out of their garage, we like that they can use this technology for free. We really want to monetize the enterprises, and so why don't we monetize the enterprise features? Things like single sign-on, and again, governance and security. That actually makes sense. But the reality is that while it is true that enterprises will pay for some of those additional features, how much will they pay for them? At the end of the day, those additional features are not the core product. So going back to your example of the LLM, there are different ways of deciding how to paywall a product. For certain freemium products, you can use it 10 times, but the 11th time you have to pay. In other situations, there are extra features that you have to pay for, and then you have to go to the premium tier. The closest analogy in the case of Databricks would be that the better model, the one that is smarter and will give you better answers, is what you have to pay for. And so I actually think that comparison is not a bad one when you think about it through the lens of the core performance of the model, not just these ancillary things that you're paying for.
A
That differentiation, where it's not added features but the core thing actually being higher quality, is a meaningful differentiator. And this is something, to your point, that every business has to think through. I talk to many people who want to roll out a premium tier of something, and the extra features are just not that valuable, and nobody's going to pay for it. I did want to touch on Databricks, the name. What's the origin story behind it?
B
To draw a contrast, there are a lot of examples of companies that have been formed on the back of a successful open source technology that have basically named the company after the technology. Docker was a startup that was built to commercialize Docker, the technology, and you can think of MongoDB as another example. In the case of Databricks, the analogous thing to do would have been to name the company Spark. The reality is that there would have been a lot of benefits to that, because Spark was a very well appreciated name. The brand and awareness that Databricks, the company, would have gotten from coming out with the name Spark would actually have been very beneficial. But the reason why they went with Databricks is because, from day one, they always felt like it was going to be more than just Spark. Databricks, the way they thought about it, was that there were going to be many, many bricks that could all be applied to this broader problem around data. A simple idea, but underneath the decision to name it Databricks as opposed to Spark is this reflection of long-termism, of thinking about what is actually going to set us up to become much more over time.
A
I often think about that when companies evolve. There's an enterprise value to a brand that has multiple different products; you could stick with that core product, or you can evolve above it. It's a little artsy in terms of the way I describe it, but I do think there's something representative in that. And it alludes to one of the points that you made early on, which I think is a good thing to bring in now, which is that they've evolved in terms of what they offer, and they have many different things under the hood now. I think this coincides with your and WCM's involvement in the business. So can you talk a little bit about the evolution of what Databricks offers to customers and how it's evolved past that original state?
B
Databricks today has truly reached that point of platform. There's a sometimes easy framework of feature, product, platform within enterprise software, and I think Databricks has made strides over the years along that journey. Immediately following the success of commercializing Spark, what the Databricks team did a really good job of was recognizing, hey, who are we serving in the enterprise? It's the data engineers and data scientists. These are folks that, after they process the data, are actually building machine learning models to run some of these predictions and forecasts and recommendation engines, all those use cases we talked about. And there's actually a very complex tool chain to enable that process, which Databricks very naturally extended to. One of the products that they came out with, which again ended up being open source, was called MLflow. So this was another product that just extended the value proposition along the same use case for these data engineers and data scientists. They then came out with another product called Delta, which was a first step towards data warehouses, and we'll get into the convergence between Databricks and Snowflake. But Delta was another step in the direction of saying, okay, a lot of these machine learning use cases are advanced scale-out use cases, but they don't necessarily need the same level of performance that allows for what are called transactional use cases. That was actually an important next product as well.
A
The transactional use cases, does that have to do with speed?
B
Speed is part of it. There's a concept in the database world called ACID. It's an acronym: the letters stand for atomicity, consistency, isolation and durability. Which is a long way of saying that there are certain workloads that require a certain level of guarantees around the quality and integrity of the data. For a lot of traditional analytical workloads, for example, just taking it back to a data scientist running an analysis about how many shirts we should stock in inventory, it actually isn't that important that all the data underneath is perfectly in sync; if one of the data sources is tweaked by a little bit, it won't totally throw off the analysis. But there are other workloads for which you need that guarantee that there are no inconsistencies in the data, that one person isn't changing a piece of data here that has follow-on effects there. So that is another segment of the market when you think about the types of workloads that databases can address, and that was an important extension and evolution of Databricks. And so they came out with the product called Delta, which was a first step towards addressing these ACID requirements. One other really interesting thing about Databricks is that I think one of their core competencies is marketing. There's a funny story where, when they came out with Delta, in order to explain to folks what Delta was, they gave out free T-shirts that said "Delta is Spark on ACID."
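The atomicity piece of ACID, the guarantee that a multi-step change either fully happens or doesn't happen at all, can be illustrated with SQLite from Python's standard library. The accounts, amounts, and constraint below are invented for the sketch; real warehouse engines implement the same guarantee at vastly larger scale.

```python
import sqlite3

# Toy demonstration of atomicity: a two-statement transfer either fully
# commits or fully rolls back, so the books always balance.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
conn.commit()

def transfer(conn, frm, to, amount):
    try:
        # "with conn" opens a transaction: it commits if the block
        # succeeds and rolls back everything if any statement fails.
        with conn:
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                         (amount, frm))
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                         (amount, to))
    except sqlite3.IntegrityError:
        pass  # the CHECK constraint fired; the whole transfer was undone

transfer(conn, "alice", "bob", 30)   # succeeds
transfer(conn, "alice", "bob", 500)  # would overdraw alice: rolled back entirely
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(balances)
```

After the failed transfer, neither half of it is visible: alice was never debited and bob was never credited, which is exactly the "no inconsistencies" guarantee transactional workloads need.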
A
That's amazing.
B
So that was, again, Databricks introducing a product that was a logical extension of their existing products, but also commercializing it well and understanding how to market it to the broader community. And then, to your question, there was in my view a very pivotal moment a couple of years ago, when we got involved at WCM in investing in Databricks, which was when, following Delta, Databricks started to show the ability to address traditional data warehouse workloads, the end point of the ACID journey that we discussed. That was very important, because up until that point you could largely say that the products Databricks had built were all addressing that core persona of the data engineer and the data scientist. But the types of folks that actually engage with the data warehouse are more traditional data analysts. These are folks that typically use SQL to run queries against more structured data, versus data scientists running Python and building machine learning models against unstructured data. But because Databricks had built the foundation and logically laddered their way to the data warehouse, they were then able to come out with a SQL product that was more directly competitive with one of their peers, the public company Snowflake. That was, to me, an incredible proof point, because they already deserved a lot of credit for expanding from a single product to multi-product, but then to expand further to multi-persona, for lack of a better term, was a tremendous TAM expansion in its own right, and also a demonstration of how much goes on underneath to enable the success of a product like that. So it was around a couple of years ago that they introduced their data warehouse product. Earlier this year they announced it's on pace to be a billion dollars in revenue, which is just an incredible amount of scale for a new product.
And that, to me, really started to demonstrate this idea of Databricks successfully becoming a true platform.
A
Yeah, it's very interesting to hear, and we had discussions before this conversation about your involvement in the business. I'm very interested in late stage investors in private businesses and what insights they glean. And when you painted that picture to me of that evolution into a true platform, it checked out to me, just in terms of, okay, this was a unique moment, they've evolved. On that point about competition at the highest level: if I'm an organization using Databricks, am I potentially also using Snowflake? Am I using multiples? Is it not necessarily a winner-takes-all market? How does that work, just in terms of how much dominance there is with customers when they choose one versus the other?
B
The reality of the market is that there tend to be multiple vendors that enterprises will use. And actually, I think Databricks has contributed to a trend of enabling more types of tools for more types of workloads. Now, Databricks, we believe, will then be able to come out with more products to address those different workloads. But to answer your question, it is very much the case that you see customers using both Databricks and Snowflake. For example, if you roll back the clock to that core use case that Databricks addressed early on around data processing, it's actually a very classic situation where an enterprise might use Databricks first to process the data and then store that data in a Snowflake data warehouse. Snowflake has started to try to move upstream to do more data processing, and Databricks has moved downstream to do more of the data warehouse. But that can give you a sense of the way these tools can live together within the same company.
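That division of labor, process the raw data upstream and then land the structured result in a warehouse for SQL-speaking analysts, can be sketched end-to-end as a toy Python pipeline. SQLite stands in for the warehouse, and the log format, schema, and data are all invented for illustration.

```python
import sqlite3

# Stage 1 (the classic "Databricks" role in the pipeline described above):
# parse raw, semi-structured log lines into clean structured rows,
# tolerating malformed input along the way.
raw_logs = [
    "2024-01-05|shirt|2",
    "2024-01-05|hat|1",
    "bad record",          # processing must survive malformed lines
    "2024-01-06|shirt|3",
]

def process(lines):
    """Yield structured (day, sku, qty) rows from raw log lines."""
    for line in lines:
        parts = line.split("|")
        if len(parts) == 3 and parts[2].isdigit():
            yield parts[0], parts[1], int(parts[2])

# Stage 2 (the warehouse role): load the cleaned rows into a table that
# a data analyst can query with plain SQL.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales (day TEXT, sku TEXT, qty INTEGER)")
warehouse.executemany("INSERT INTO sales VALUES (?, ?, ?)", process(raw_logs))

rows = warehouse.execute(
    "SELECT sku, SUM(qty) FROM sales GROUP BY sku ORDER BY sku"
).fetchall()
print(rows)
```

The two stages have different users: the processing function is the data engineer's world, while the final SQL query is the analyst's, which mirrors the two personas discussed in the episode.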
A
I will acknowledge this is very easy for me to say from the cheap seats, but it would seem as though moving from the processing, unstructured world into the warehousing, more structured world might be a smoother evolution than vice versa. Because in my mind, that unstructured world is a very complicated solution that was being offered, and not that data warehousing is not complicated. But do you think that's a fair representation? And definitely feel free to push back on that assumption.
B
Well, I would say that empirically, if you look at the numbers, that has played out to be true, with the data warehouse product scaling to a billion dollars, which dwarfs the analogous revenues that Snowflake has had from moving into data engineering. But, not being able to run an A/B test across different versions of the world, I would say that there was a lot of Databricks execution that led to them having more success moving towards structured. We touched on it briefly earlier, but I thought this was another example of the company being not just great technologists, but really savvy marketers, folks with a commercial gut instinct, when they recognized that they wanted to move into structured. There was this concept that the unstructured world would be called data lakes, and the structured world would be data warehouses. And so Databricks actually came up with this term, the lakehouse, combining the data lake with the data warehouse. And at the time, you can go back to some of the news coverage, there was quite a bit of ridicule about this idea of the lakehouse. It was almost too clever: oh, you're going to combine the two and you've come up with this name. You fast forward to today, and the lakehouse is a very real, defined category that industry observers have all coalesced around. So the credit that Databricks deserves, not only for executing on the product and the technology, as we've talked about with data warehouses, but then doing all the work to educate the market on why the lakehouse architecture is the best of all worlds and why that is the future, is an incredible piece of the story that I think Databricks probably doesn't get enough credit for.
A
Yeah, I think those things are very hard to measure, but you certainly can appreciate them sometimes more after the fact. And I certainly just give companies bonus points when they're having fun while doing this execution. There's something about that that just seems to matter to me; it shows a willingness to enjoy the aspect of business and competition and whatnot.
B
There's a certain amount of fun and, I don't know if they would use these words, but I feel like irreverence. I think this ties back to the founding heritage and DNA, where it's, look, let's have an opinion about where the world is going. As an investor, if you go back to the early bets, they would tell you, these are the three bets that we're making: we're making a bet that cloud will be big, that data will be big, and that open source will be a good way to build a business, or at least build adoption. And here it was, we think that the lakehouse will be big, and we think that this is where the world should go. We think that this will help customers, and we are going to bet behind that. It just makes it very clear that if you're betting on Databricks, you're betting on this future state of the world. And I find that sometimes companies that try to have it all say, oh, we'll be good here, we'll be good there. But the reality is that that detracts from your ability to execute in the way that Databricks did with something like the lakehouse.
A
I think things like first principles can get thrown around a lot. But in preparation for this, watching and reading a lot of the interviews that Ali has done, it checks out in terms of the approach. And there's a certain clarity to the academic world and being born out of that, in terms of understanding exactly what you're doing, that focus, having that clarity in terms of why you're going after things, which I think can sometimes get drowned out by some of the other baggage that comes with academia. But putting that aside.
B
I would also say that a different way of describing this is that Databricks is helping to lead the industry to where they think the industry should go. They identify where the pain points are, and they come up with a solution that they think makes sense, as opposed to looking at an existing market and just saying, okay, well, we can do a me-too product just for the sake of expanding our TAM. What I've found is that one of the underlying things that Databricks does a really good job of is recognizing true value creation, as opposed to just monetizing and revenue growth, if that makes sense.
A
Mm. It's based on customer challenges, but also having a predictive view of what's going to happen in the future. There's a little Steve Jobs thing in there about designing for what the customer doesn't necessarily know they need yet. Not to draw too hard on analogies. On the point of market expansion, TAMs, all of that: when it comes to both Databricks and Snowflake, both are relatively young businesses. Were they replacing industry incumbents? Was it all new market creation? How would you describe the TAM that exists relative to what it was, whether prior to the cloud or even within the cloud, using some of the incumbents?
B
Going back to the starting points of Databricks and Snowflake: Snowflake was really the next-gen cloud version of the data warehouse, which was a market that did exist. In the case of Databricks, it would have been that data lake market, but that had never been as well established, basically for lack of good enough technology. There had been a lot of attempts at creating data lake companies. There was a technology that predated Spark called Hadoop, and there were companies built on top of Hadoop, like Cloudera, that went public at one point. But the problem there was just that, simplistically, the technology wasn't good enough. So to your question of TAM, I think the market technically existed, but there was this period, people forget, in the early 2010s, when big data was a very sexy topic. Companies had the recognition, or at least the inkling, that data was valuable. There was a whole period of a number of years where companies were storing volumes and volumes of data with this underlying view that we should be doing that, because big data, why not? But the reality was that there was, in Gartner's terms, a very hard trough of disillusionment where it was, okay, now we've stored all this data, what are we going to do with it? And it turns out it's very difficult to get anything out of it. That's what, again going back to Databricks' core value proposition, it was really solving that problem. And that was massively TAM-expansionary.
A
Yeah, that makes a lot of sense. And data being the new oil, I think there's certainly some truth to it. But there's also this massive challenge of understanding that this probably has value, but how do we unlock that value and do something with it?
B
Exactly.
A
And that being a problem they solved is quite interesting. I want to get a little granular in terms of a use case, so I truly understand what is going on, to the extent that you can answer this. One of the examples that I saw presented: I make a credit card transaction, and all of this data is flowing through the pipes of my credit card company, maybe my bank, and Databricks is involved in that. And if they see I make a transaction with a non-traditional vendor, it's for a very large amount, and it's in a country that I've never made transactions in before, I can get a fraud alert. My understanding is that flows through the Databricks pipes to some extent, in terms of managing all the variables at play that would cause a fraud detection. In that example, how does it work, with Databricks actually having the ability to make the decision on behalf of the credit card company to send me that fraud alert, versus them presenting this alert back to the company and it coming back to me? It paints an interesting picture of how ingrained they are with their customers. And I know it's going to differ by use case, but can you talk a little bit about that?
B
Sure. For any given credit card company, the implementation might be different than another's. But to bring your use case to life, it is very accurate to think about that core value proposition that Databricks provides. You can imagine the amount of data that goes into making a decision of whether or not a transaction is worthy of a fraud alert. There's a tremendous amount of input that can go into that, and there's probably never enough; you could always add more data to that analysis. And that hits on the core thing that Databricks provides, which is the pipelines to bring in all that data and process that data, because all the different types of data are going to be in different formats, and feed that into a machine learning model that gets fine-tuned by the data scientists, tweaked, and constantly updated based on the facts on the ground and the empirical data, also evaluating those models: are they actually accurate after the fact? And then tweaking those models again. All of that is core Databricks value proposition. Now, once Databricks helps a company come to that decision, who is actually sending the fraud alert? Again, there can be different architectures here, but typically companies will build another application that actually takes the action, and the model output from Databricks will inform that action. That is maybe the classic way to think about the architecture, if that makes sense.
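The pipeline-versus-application split described above can be sketched in a few lines of Python. Everything here is invented for illustration: the feature names, the thresholds, and the rules-based stand-in for what would in reality be a trained machine learning model running on processed pipeline data; none of it reflects an actual Databricks API.

```python
from dataclasses import dataclass

@dataclass
class Transaction:
    amount: float
    country: str
    merchant_known: bool

def score_transaction(txn: Transaction, home_country: str = "US") -> float:
    """Toy fraud score in [0, 1]. In the architecture described, this role
    is played by an ML model fed by data pipelines, not hand-written rules."""
    score = 0.0
    if txn.amount > 5000:            # unusually large amount
        score += 0.4
    if txn.country != home_country:  # unfamiliar geography
        score += 0.3
    if not txn.merchant_known:       # non-traditional vendor
        score += 0.3
    return score

def alerting_app(txn: Transaction, threshold: float = 0.7) -> str:
    """A separate application consumes the model output and takes the action,
    e.g. actually sending the fraud alert to the cardholder."""
    return "SEND_FRAUD_ALERT" if score_transaction(txn) >= threshold else "OK"
```

For example, `alerting_app(Transaction(9000.0, "FR", False))` returns `"SEND_FRAUD_ALERT"`: the scoring layer and the acting layer are deliberately separate, mirroring the model-informs-application architecture described in the answer.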
A
The application that sits atop is ultimately informed by the Databricks models that are analyzing all those things. And I can imagine it being rules-based: if a transaction meets these criteria and gets sent up, then that triggers the fraud alert warning. Where I was going with that is, you'll have companies that are using multiple different vendors, and obviously there are so many different use cases. The one I just brought up is a cost savings; we've already talked about revenue growth and how this can be used. But in terms of ways to measure stickiness and ramping with customers, what does that look like? I just assume the more you use the model, the more ingrained it would be in your business, and therefore less likely to churn.
B
Well, they've disclosed that their net dollar expansion rates are greater than 140%, so embedded in that is a high level of stickiness and also embedded growth. Quantitatively, those are the numbers that they've disclosed. But qualitatively, the right way to think about it is that many of these use cases we've discussed are very core to the fundamental product that businesses are selling. Sometimes in the world of data analytics, you can envision a data scientist or a data analyst in the back office running an analysis for the strategy team; that may or may not feel sticky. But when you think about these use cases where this is a content streamer suggesting the next movie you should watch after you finish a movie, that is core to the product, and that can oftentimes be revenue generating and very mission critical. So there is that level of stickiness once you get embedded into use cases in production, and then there is the added layer of stickiness around the concept of data gravity: once you put in the work to store and catalog data within a data platform, that becomes very sticky. So there are a lot of different dimensions along which Databricks becomes very embedded within a company. The last thing I would mention is that if you've done the work to process data once, you can potentially use it for multiple use cases. You can then imagine how that becomes very sticky as well: even if a certain product gets sunset, there may be another product still leveraging the same data.
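The "greater than 140%" net dollar expansion figure mentioned above can be made concrete with the standard cohort arithmetic. This is a generic illustration of the metric, not Databricks' actual disclosure methodology, and the dollar figures are made up.

```python
def net_dollar_retention(start_arr: float, expansion: float,
                         contraction: float, churn: float) -> float:
    """NDR for a fixed customer cohort over a period:
    (starting ARR + upsell - downgrades - churned ARR) / starting ARR."""
    return (start_arr + expansion - contraction - churn) / start_arr

# A cohort that started the year at $100 of ARR, expanded by $50,
# downgraded by $5, and churned $5 lands exactly at 140%:
ndr = net_dollar_retention(100.0, 50.0, 5.0, 5.0)  # 1.4, i.e. 140%
```

An NDR above 100% means the existing customer base grows on its own, before any new-logo sales, which is what makes the stickiness-plus-embedded-growth point quantitative.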
A
You've made the allusion, and it's the elephant in every room now, of AI. I guess I'll just start at the very highest of levels: what are the impacts of AI on a business like Databricks, to the extent that they're beneficiaries, to the extent that there are risks associated with it? I'll let you wax poetic there.
B
Well, maybe just to start with some quantitative framing: Databricks is now over $4 billion of ARR, and they have disclosed that about a quarter of it, $1 billion, is AI-related revenue. So AI has already become a very large part of the business. But beneath that, there are different ways to slice and dice the impact of AI on Databricks. For me, one of the things that I've really liked about Databricks as an investment is that there are multiple ways to win, starting with the core data processing piece. There's almost a consensus understanding within enterprises now that you don't have an AI strategy without a data strategy. Everyone recognizes, of course, that the model providers are doing what they're doing and every generation of models is getting smarter and smarter. But at the end of the day, if you don't have good, clean, well-cataloged data, the models can only do so much. One of the ways that AI has really benefited the Databricks business is that it's created a tremendous amount of prioritization and awareness of the importance of the core product that Databricks has always provided. In my mind, that is a durable tailwind, work that companies have always needed to do anyway, that is actually not dependent on whether we achieve AGI, or on what the next OpenAI model does. The reality is that as long as there is a general belief and understanding that AI is important, there will be a driver towards more data engineering and data processing. That's a general tailwind for Databricks, and when you think about Databricks the business as an investor, it actually paints a picture of a more durable growth trajectory: perhaps not as spiky on the upside, but also not as volatile on the downside, in the case that sentiment around AI changes.
A
That makes a lot of sense just in terms of a heuristic at a high level for thinking about AI within the business.
B
Yeah, and there are probably a couple of other ways to think about AI's impact on Databricks. Another is the fact that there is this huge cohort of AI-native companies, including the largest AI labs, that can and do use Databricks internally themselves. This is something that companies in the public market also talk about: how is AI actually impacting product and use cases, and are you part of the stack that AI-native companies are utilizing? Databricks very much is. And then the final part of AI's implication for the Databricks business is actually product. One of the big-picture bets that Databricks is making, going back to this idea that they have a DNA of having an opinion about where the world is going and where Databricks can add the most value, is really around the idea that AI and LLMs have already proven, even if the models don't get any better than where they are today, the ability to automate more work. And when you think about how big of a TAM that is, it's probably just as infinite as the TAM that we talked about initially around data. So what Databricks is doing is building products, in the same way that they came out with MLflow to help that whole process of a data scientist building a machine learning model: they're creating an entire stack, with products called Agent Bricks and Lakebase, that all together will help enterprises build their own agentic applications to automate specific use cases and actually automate labor and work, which is just tremendous amounts of ROI.
A
Yeah, and I'll reframe it maybe, and you can tell me how accurate this is. Most of what we talk about with agentic is oftentimes just shrinking the context, giving it very deep context on specific tasks, and therefore the quality of the response is going to be much stronger; it's not going to be pulling from random places on the web. Databricks obviously has the richest data to use in terms of informing those models, and therefore can partner with businesses who might want to develop those agents. Is that right?
B
Exactly, yeah. To your point, part of what the industry is realizing is that there are techniques that need to be leveraged to build effective agentic applications. For example, RAG, retrieval augmented generation: there's a whole set of processes around enabling that with vector databases and embeddings, for example. So Databricks has offerings there. There's also a very important part of building agentic applications around model evaluation. Because of the very unpredictable nature of large language models, it's not always easy to know exactly whether our application or agent is acting the way we think it should. So there's a whole set of technologies around model evaluation, around being able to actually quantify how these models are behaving: are they doing what we think they should be doing? And Databricks again is building products around that, in a way that parallels the machine learning era, where you would go through a similar process of evaluating models. You can get a sense that, while there's a lot of focus on the core large language models, if you actually want to build applications in production, there's so much around and beyond just the model. That is really where I think Databricks has a strong right to win.
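The retrieval step of RAG that this answer refers to can be sketched minimally: documents and the query are represented as embedding vectors, and the store returns the nearest documents by cosine similarity, whose text is then stuffed into the LLM prompt as grounding context. The three-dimensional vectors and document texts below are toy stand-ins; real systems use learned embeddings with hundreds or thousands of dimensions and a dedicated vector database.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query_vec, store, k=2):
    """Return the k documents whose embeddings are most similar to the query."""
    return sorted(store, key=lambda d: cosine(query_vec, d["vec"]), reverse=True)[:k]

store = [
    {"text": "refund policy",          "vec": [0.9, 0.1, 0.0]},
    {"text": "fraud alert thresholds", "vec": [0.1, 0.9, 0.1]},
    {"text": "card spending limits",   "vec": [0.2, 0.8, 0.3]},
]
hits = retrieve([0.0, 1.0, 0.2], store, k=1)
# hits[0]["text"] is the grounding passage handed to the LLM
```

The evaluation piece the answer also mentions sits downstream of this: checking whether the retrieved context, and the model's answer built on it, actually match expectations.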
A
Yes, the infamous question of where the value might accrue within the layers of the AI ecosystem is quite interesting. On the opposite side of the equation, there's been this interesting, solid, good relationship with the cloud infrastructure providers, AWS, Azure. Does that change at all with AI? Because to your earlier point, the data cleanup and all of that becomes even more important. I have personally been able to clean up unstructured data into structured data much more cleanly with AI; different story when it comes to me versus doing this at cloud scale. But what's your view on that relationship? The cloud providers haven't necessarily moved into this category; does that change at all? Is that a risk? Is that something that you think about at all?
B
Well, it's funny you mention that the cloud providers haven't really moved into Databricks' market, because they actually do have offerings there. I think it's more a reflection of the fact that Databricks has done such a good job of both executing on product and just market positioning that you sort of have that premise. But I would say, for as long as I've followed Databricks, and I first met Ali over 10 years ago, when they had just signed their first strategic partnership with Microsoft, which was actually Databricks-branded under Azure as Azure Databricks, that was an incredibly important partnership to jumpstart Databricks' momentum. I bring that up because, from day one, Ali as a business leader has always been extremely pragmatic and strategic about how they operate vis-a-vis the hyperscalers. What has always been the case is that there is coopetition with the hyperscalers. The reality is that customers that are using Databricks will also be consuming the infrastructure, compute and storage, of the hyperscalers. So there is a benefit for the hyperscaler clouds when customers are using Databricks on top of their infrastructure. And that coopetition dynamic I totally expect to continue in the world of AI. But I do think it's a very important question, because this is a big enough market that the hyperscalers do care about it. It's been another, I believe, underappreciated strength of Databricks: their ability to align themselves with the hyperscalers. There are a lot of examples of companies that came out with a great product, had a tremendous amount of momentum, and then Microsoft decided that this was too strategic for them to lose, so they were going to put all of their weight behind killing that product. I think we could all think of different examples here, and that has oftentimes been a real challenge for growth-stage software companies.
I think Databricks has just done a very good job of never positioning themselves in such a way that the hyperscalers are 100% incented to kill them. There's enough alignment, there's enough opportunity for partnership and mutual growth, that the relationship with the hyperscalers has generally been relatively synergistic, despite the fact that they do represent very real competition.
A
I bring it up all the time, but Amazon with something like FedEx and UPS, where they were relying on them, and then FedEx and UPS couldn't deliver during the holidays, and it was a big enough problem that Amazon built out a network and then eventually started to compete with it. But there is something to the coopetition being of high enough quality that it's not creating a problem. There are other dynamics that get involved, but that's very useful framing. I did want to get a little bit more into some of the financial dynamics. You mentioned the $4 billion in ARR at this point. How does it work from a customer perspective? Is it a simple usage-based revenue model?
B
It is, yeah. The way to think about it is that Databricks charges based on the actual compute that's being utilized for any of the workloads that are on Databricks. Going back to a tangible example, your credit card fraud example: every single time the customer wants to run an analysis, everything that happens underneath that, in terms of the pipelines that pull in the data, incurs compute cost. That's how Databricks aligns itself from a monetization perspective. We've talked about this open source piece, but beyond just the core usage-based pricing, Databricks has been really smart about recognizing when certain features or products are strategic versus when they actually have a right to charge for those products. One of the examples that Ali has given is the smartphone: one of the features of your smartphone is the address book. The address book is an incredibly important feature, not just for phone calling; a lot expands from having an address book in your smartphone. But the reality is that no handset maker is going to be able to charge for the address book. Databricks has a lot of different products that are analogous to the address book, where they are effectively, in many cases using open source, giving them away for free in order to get adoption, but that are still very, very strategic. One example is one of their big value propositions: providing a governance layer on top of all the data, so that enterprises have a single pane of glass through which they can see all the metadata that they're processing. So I guess all this is a long way of saying that while Databricks does use usage-based pricing based on compute, the reality is that, at least from my perspective, they're actually monetizing more than just compute. It's a way of monetizing a lot more of the layers of value that Databricks is providing.
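The usage-based model described above amounts to metering compute consumption per workload and multiplying by a rate. The sketch below is purely illustrative: the workload categories, rates, and unit names are invented, and it does not reflect Databricks' actual rate card or billing units.

```python
# Hypothetical $ rates per compute unit, by workload type (made-up numbers).
RATES = {"etl": 0.15, "sql": 0.22, "ml": 0.40}

def monthly_bill(usage):
    """usage: list of (workload_type, compute_units_consumed) tuples.
    The bill is simply consumption x rate, summed over all workloads."""
    return round(sum(units * RATES[kind] for kind, units in usage), 2)

# A customer who ran 1000 units of data pipelines, 500 of warehouse
# queries, and 200 of model training in a month:
bill = monthly_bill([("etl", 1000), ("sql", 500), ("ml", 200)])
# 1000*0.15 + 500*0.22 + 200*0.40 = 150 + 110 + 80 = 340.0
```

The point the answer makes is that the free, open-sourced "address book" layers don't appear as line items here at all, yet they drive the consumption that does get metered.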
A
And there's an intertwined nature to all the different things interacting with one another, which represents value even if it's not directly correlated to the compute cost. On your point about when they realize something has so much value that it should be charged for: does that come from it burning a hole in their pocket via the compute cost, which signals how much value it has, or is it more of a qualitative assessment?
B
I can't speak for all the conversations that are had internally, but my view is that there are a lot of different inputs into every decision around how to monetize and what to make open source. Sometimes it's more of a defensive stance of saying, look, we need to make sure that we get adoption of, for example, the governance product; in other cases it might be offensive. And they've also done a very good job historically of recognizing when they can be disruptive. This might get a little bit into the technical details, but one of the big things that Databricks did very effectively to compete against Snowflake was embracing open formats, where they purposefully decided not to charge for storage. That was something that Snowflake had historically done. That's another example of where, strategically, they're looking at a way to potentially be disruptive, not just from a pure lower-cost perspective, but actually architecturally: enable customers to keep their data wherever they're storing it, don't force them to put it into Databricks; you can just run Databricks on top of where the data already sits. Whereas historically, with Snowflake, you actually had to move all that data into Snowflake. So that's another tangential point around this idea of making decisions not only on when to charge and when not to charge, but also having a strategic view on whether something is effective from a defense and offense perspective.
A
Yeah, competitive forces come into play. And on that point, when it comes to general pricing trends, aside from what they charge for, I can understand that it all gets blended together. But does the pricing trend tend to correlate to the cost of compute, or would you say they're able to raise pricing, or are there pricing wars? I'm just curious. There's the quality of the product, which is going to drive customers' decisions, and some customers might depend on or lean in on price. What drives pricing changes?
B
I would say that in this market it is more about total cost of ownership relative to performance. What a lot of customers care about is: are we able to run our workloads in a performant way? And again, it's not apples to apples in a lot of situations, because there is the infrastructure layer that the hyperscalers monetize. If a vendor like Databricks can actually help customers run their infrastructure more effectively, that may not show up in Databricks' pricing, but from a customer perspective it factors into total cost of ownership. A lot of that is a technology solution: better understanding how certain workloads are behaving, and how you optimize the underlying infrastructure to serve those workloads. So in my experience, it's getting workloads into production and really seeing what that total cost of ownership is to achieve the goal of any particular use case. And the reality is that for a lot of these use cases, going back to the fact that Databricks is oftentimes embedded in the core products, or the nature of the decision is extremely strategic, if you can effectively provide the end value, the ROI is typically very clear for customers.
A
It's an interesting business in the sense that for the use case of credit card fraud, you can actually draw a very clear ROI, fraud being a major issue and a cost problem for credit card providers, and draw those connections. But there are other things that are maybe more difficult to draw direct ROI conclusions from, yet equally as valuable. It's very interesting to hear how different ecosystems and value chains work when it comes to this type of business. On the cost side of the equation, there's the cost of compute, which is theoretically passed through, and you have your overhead. Are there big buckets of cost that we didn't touch on that would be very important?
B
Generally speaking, Databricks' model is fairly capital light. They don't really need to get into the GPU acquisition situation that we're all very aware of these days. The reality is that a lot of core data processing workloads are CPU-based. It is interesting that when you hear Jensen at Nvidia talk about where he sees a lot of value for GPUs in the future, he does have a view that more of these workloads will transition to GPUs. But the reality today is that Databricks' products are not compute intensive in the same way that you would think about a lot of AI-native companies.
A
Which is amazing to me. It just speaks to what training actually requires in terms of compute. But given how much they process, that's actually surprising to me, but interesting.
B
It could evolve. One of the things that Databricks is having success monetizing is called model serving. Going back to some of the examples we've talked about: there is this entire workflow around building the intelligence that underpins an application; you tie that into an LLM and you create what's called an endpoint, which then exposes that intelligence to an application. More and more customers are asking Databricks to actually host that endpoint, which is basically like an API, on behalf of the customer. In those situations, Databricks will actually have GPU costs underneath that. But again, relative to some of the other examples out there, in terms of the scale of cost, it's a very different order of magnitude.
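The "endpoint" idea in the answer above, a hosted API wrapping a customer's model, can be sketched as a single request handler. This is a hypothetical shape, not Databricks' actual serving interface: the function names, the JSON fields, and the toy model are all invented, and a real serving layer would run behind an HTTP server with authentication, batching, and GPU-backed inference.

```python
import json

def toy_model(features):
    """Stand-in for the customer's trained model (here, a trivial formula)."""
    return {"fraud_score": min(1.0, sum(features) / 10.0)}

def endpoint_handler(request_body: str) -> str:
    """What a serving layer does per request: parse the payload,
    run inference, and return a JSON response to the calling app."""
    features = json.loads(request_body)["features"]
    return json.dumps({"prediction": toy_model(features)})

resp = endpoint_handler('{"features": [2, 3, 1]}')
# resp is a JSON string carrying the model's score back to the application
```

The hosting economics follow from this shape: every call through the handler consumes the provider's compute (GPU, for LLM-backed endpoints), which is why serving shows up as a cost bucket where the core platform mostly does not.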
A
Not all GPUs are created equal, and that's a whole other topic. But I interrupted you, I think, on the question of costs. Were there any other things that you would bucket in there that we didn't touch on?
B
So Databricks is running free cash flow positive at the scale that they are, at $4 billion plus in ARR. One could make the argument that free cash flow positive is a very low bar, that perhaps they should be showing more profitability. But a big chunk of their cost is just the traditional software business model: it's investing in people, it's investing in R&D. Of a lot of the scaled software companies out there, this is one of the companies that in my view has as strong a track record as any in terms of demonstrating ROI against organic innovation. So from a cost structure perspective, there's nothing dramatic to call out other than the fact that they are still investing very aggressively behind R&D. And that is a big reason why they've been able to maintain their pace of innovation even as they've expanded into so many new products and areas.
A
I do have to ask. It's capital light, it's free cash flow positive. They've done a lot of fundraising over the years. Where does that capital go?
B
So this is actually a unique dynamic that is not necessarily specific to Databricks, but is more of a dynamic that I think more folks are becoming aware of, which is that some of these tier one, high quality private assets are just staying private longer and longer. Once you reach a certain level of scale, there is a certain expectation for perhaps early investors or, oftentimes more importantly, employees to be able to get liquidity for their options or RSUs. In the case of a lot of these companies, including Databricks, oftentimes the reason why they need to do these big fundraises is a tax consideration: once you've provided employees enough opportunities to get liquidity, the IRS starts treating those RSUs and options as taxable. Historically within startups, one of the benefits of options was that they were deferred tax compensation. But this is something that I think the industry has started to learn: there's a real tax bill for compensating employees via equity. So to answer your question, the majority of the proceeds from Databricks' fundraises have effectively been used to offset the employee stock compensation and the corresponding tax bill associated with it.
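The withholding mechanics that come up in the next exchange (100 shares granted, 66 delivered) follow standard sell-to-cover arithmetic at vest. The 34% rate below is purely illustrative of that example; actual withholding rates vary by jurisdiction and income, and this is not a statement about Databricks' specific plan.

```python
def net_shares(granted: int, tax_rate: float) -> int:
    """Shares withheld at vest cover the employee's tax bill;
    the remainder are delivered to the employee."""
    withheld = round(granted * tax_rate)  # shares sold/withheld for taxes
    return granted - withheld

delivered = net_shares(100, 0.34)  # 34 shares withheld, 66 delivered
```

At company level, funding that withholding in cash (rather than diluting via open-market sales) is one reason a profitable private company might still raise large rounds, which is the dynamic described above.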
A
If I'm understanding correctly, it's not necessarily pure secondary in nature, where the employees are cashing out. Maybe that is happening to some extent, but it's more: when I get my equity grant, I ultimately only end up getting 66 of the 100 shares, because 34 are used to pay the taxes associated with that compensation. That makes sense. And on the point of more private companies staying private for longer, it's been a very interesting dynamic. As a former public markets guy, I can actually understand some of those challenges. But what would you say from your seat? You sit at the crux of all of this. Do you think it's a when, not if, about going public? Or what would you expect that catalyst to be, whether it's specific to Databricks or the industry as a whole?
B
It's literally a trillion dollar question now, with some of these assets and the scale that they've gotten to in terms of both the business and valuation. I think that for that tier one list of privates, there's increasingly a term we all know, the Mag 7, in the public markets, but there's effectively a Mag 7 in the private markets as well. You've now seen that there is enough infrastructure in place from a fundraising perspective, both in terms of capital availability and just the process of doing these very large scale fundraises at late-stage growth, that it has become pretty easy, if you're at a certain level of quality, to stay private. Going public becomes less about the need to access public markets and more of a discretionary decision about the pros and cons. And for each business there's going to be a different decision around that. In the case of Databricks, it's fair to say that because they were private during the 2022 cycle in growth tech, they were able to continue to play offense in a way that a lot of their public peers were unable to. It really helped to accelerate their business for a bunch of different reasons you could imagine, but generally speaking, the ability to continue to invest behind both sales and R&D was something that really benefited Databricks. I can't speak for them, but I would imagine that that was an informative experience, where the thinking would be that the reasons to go public would have to be sufficiently compelling to overcome the benefit that they've already experienced from staying private.
A
It's a very interesting dynamic and I think you put it incredibly well there with a trillion dollar question. So it's going to be one that's interesting to watch. A lot of headlines out there this week about what could happen next year, but it's almost like a believe it when you see it type environment now.
B
Totally.
A
I think we talked a lot about what has gone right and the opportunity set ahead. What stands out to you from a risk perspective? We've glossed over some of those, but when you think about risks for the business, is there anything that pops out most to you?
B
It is a dynamic enough market, where pace of innovation is still very important, that continued R&D execution can't be taken for granted. We've seen examples in this space of companies that have perhaps taken their eye off the ball and been slow to ship certain products, and of how that can really show up in the numbers. So first and foremost, it's easy to say execution, but I do think in the case of Databricks, continued execution at scale is important because they are doing so many different things. How they execute around the newer AI products is going to be important over time, maybe not necessarily over the next two to three years, because of that dynamic we talked about earlier where the core data processing tailwinds are so strong. But for the ambitions that Databricks has, it will be important to execute on some of the AI products in the same way that they executed on the lakehouse. We are in a very similar dynamic right now, where there's a question of category creation. What exactly does an agentic application product portfolio look like, what do you even call that, and how does the industry coalesce around that set of tools and products? That is not just a question of product execution, but again of that marketing and commercialization DNA. In the case of Databricks, time and time again what I've seen is that the way they make decisions has been with a very long term mentality in mind. Go back to the decision around how they named the company. There are always these trade-offs where, if you have a shorter time horizon in mind, you might make a certain decision: you might not open source something, you might be tempted to monetize whatever feature you just came out with.
I think the ability for Databricks to continue to maintain that DNA is going to be really important, because especially as we enter the AI era, every time you make a more short term oriented decision, it inevitably opens you up to some sort of vulnerability down the road. So it'll be really important from a cultural perspective, which is something we spend a lot of time focusing on, that they maintain that core culture of being long term, and also that founding, academia-based DNA of first-principles thinking. It's easier said than done to just say, oh, keep doing that. For us, that's an important thing to stay on top of.
A
Yeah, it might tie back to your original answer on staying private versus public. Having that long term mentality is a little bit easier to do in the private markets, I think, than in the public markets, which is notable at such an interesting point in time with AI. We close these conversations out with lessons, and I think you tapped into some at the end of your answer there. But in the spirit of pattern recognition or anything else, what lessons can you take away from Databricks and from investing in that business?
B
Honestly, it would just be reiterating the last point around long termism. That's something a lot of investors and founders talk about, but in the case of Databricks, you can actually point to so many specific examples where certain decisions were made with a clear trade-off. There are times when people talk about being long term where it isn't clear what the trade-off is. The way the Databricks team is able to talk about the bets they're making, going all the way back to that original founding view of the three bets on where the world is going, and operating with that level of consistency, is something that really stands out about Databricks. We talked about ways they could have monetized sooner. Even going back to the cloud example, it wasn't entirely clear that the industry was all in on cloud, but they were so convicted that cloud was real that they never came out with an on-prem version of their product. You can imagine there were a lot of examples like that, where the decisions they made ended up leading to less monetization in the near term than they might otherwise have had, but they had this very clear view of where the world was going. If you go back through the examples we've touched on, you can point to so many instances of maintaining that long termism and recognizing why certain decisions were made. From following Databricks, that's been something I've started looking for in other companies.
A
I think that's a very interesting point on long termism, and on understanding what the actual trade-off is; it's so easy to gloss over that second part. Very interesting. This has been a pleasure. Alan, thank you for educating me on something I only knew at a surface level.
B
No, this has been great. Thank you. To find more episodes of Breakdowns ranging from Costco to Visa to Moderna, or to sign up for our weekly summary, check out joincolossus.com. That's J-O-I-N-C-O-L-O-S-S-U-S dot com.
A
This episode is produced in collaboration with WCM Investment Management. This discussion reflects WCM's views as of the recording date, December 11, 2025, and should not be considered current investment advice or a recommendation to invest in Databricks or any other security. WCM has a financial interest in Databricks, which creates an inherent bias in this discussion. For additional disclosures, visit wcminvest.com.
Guest: Alan Tu (WCM Investment Management)
Host: Matt Reustle
Recorded: December 10, 2025 | Released: January 8, 2026
This episode provides an in-depth examination of Databricks, a major but relatively mysterious force in enterprise data processing and AI infrastructure. Host Matt Reustle speaks with Alan Tu, portfolio manager and analyst at WCM Investment Management, to dissect Databricks’ origins, business model, technology evolution, competitive landscape, financial characteristics, and lessons for investors and operators.
Key themes include how Databricks turned academic research and open source foundations into a powerful, commercially successful platform, the critical role it plays in enabling both legacy and AI-driven data solutions at scale, and the company's repeated demonstration of long-term, first-principles thinking.
Company founded in 2013 by seven researchers from UC Berkeley's AMP Lab, building on research (including Apache Spark) that began around 2009, amidst early developments in the cloud.
Three foundational bets:
Alan Tu (11:12):
“They believed that data was going to be an important thing. And so, how do we create a business around that? ...In hindsight, it turned out all three of those bets were very good bets.”
Databricks created a proprietary, higher-performance engine, moving essential features behind a paywall, not just “enterprise extras.”
Alan Tu (19:46):
“The better model that is smarter and will give you better answers, you do have to pay for.”
Many enterprises use both; Databricks excels at unstructured data processing, Snowflake at data warehousing.
Both moving into the other’s territory, but Databricks' leap into structured workloads (data warehouse) has been more successful so far.
Alan Tu (32:12):
“[Databricks’] data warehouse product scaling to a billion dollars… has dwarfed the analogous revenues that Snowflake has had around moving to data engineering.”
Databricks coined and legitimized the “Lakehouse” term, merging concepts of data lake and data warehouse.
A quarter of Databricks’ $4B+ ARR is already AI-related revenue.
“You don’t have an AI strategy without a data strategy”—demand for AI boosts demand for Databricks’ core offerings.
Multiple ways to win:
Alan Tu (47:12):
“One of the things I really like as an investment… you don’t have an AI strategy without a data strategy.”
This episode demystifies Databricks, illuminating its evolution from a group of academic founders to a commercial and technical heavyweight. Through disciplined long-term vision, iterative platform building, and careful strategy in open source, partnerships, and category leadership, Databricks set a new standard in enterprise data and AI. For investors, operators, and tech observers, Databricks exemplifies the power and subtlety of first-principles thinking—especially when backed by the stubborn patience to see bets through.