Transcript
A (0:00)
After helping people optimize models for about a decade and a half, I came to a really interesting realization: I was solving the wrong problem. Foundationally, the thing that's holding people back from getting value from these AI systems is not performance. It's not about squeezing out that last half a percent from some eval function or some performance metric. It's about being able to confidently trust these systems. And I can't tell you how many times over the years we would help someone optimize a system and they would say, okay, well, what did you break? What bad behaviors are you introducing? What lack of robustness do I now have because I've overfit this system? And we're seeing people do the exact same thing again today with LLMs, where they're focusing on these high-level metrics, these end outputs, these performance evals, and that ends up masking all of these potentially undesired behaviors within the system itself.
B (0:56)
Welcome to the a16z AI podcast. I'm Derek Harris, and I'm joined this week by a16z partner Matt Bornstein and Distributional co-founder and CEO Scott Clark for a deep dive on deploying and testing AI systems, particularly, but not exclusively, LLMs in enterprise environments. We cover a number of topics under this broad umbrella, but the focus is really on how enterprise users can and are trying to establish a trusting relationship with AI so they can actually put it to work on important tasks and scale their deployments beyond small projects. Generative AI can be inherently chaotic at a systems level, so it's a process that involves invoking some level of control over usage and lots and lots of monitoring and testing to track behaviors and ensure small changes to one part of the system (a system prompt, for example) don't have unexpected impacts on other parts, including customer-facing outputs. The discussion begins with Scott giving his somewhat tongue-in-cheek definitions of machine learning and artificial intelligence before sharing some of his background, including the hard-earned realization that trust, not performance optimization, is the biggest factor in how heavily large companies will deploy AI. As a reminder, please note that the content here is for informational purposes only, should not be taken as legal, business, tax, or investment advice, or be used to evaluate any investment or security, and is not directed at any investors or potential investors in any a16z fund. For more details, please see a16z.com/disclosures.
C (2:27)
So can you just define for us: what's machine learning? What's AI? As someone who's lived through the ups and downs of this market, I remember.
A (2:35)
Answering a similar question, I think, on an a16z podcast maybe eight years ago.
