
A
Today on the AI Daily Brief, an AI scientist that can do six months of work in a single day. Before that, in the headlines, Gemini 3 hype hits fever pitch. The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI.
Alright friends, quick notes before we dive in. Firstly, thank you to today's sponsors, KPMG, Rovo, Robots & Pencils, and Blitzy. To get an ad-free version of the show, go to patreon.com/aidailybrief or subscribe on Apple Podcasts. If you are interested in sponsoring the show, send us a note at sponsors@aidailybrief.ai to learn more. And lastly, today we are rounding the corner on the AI ROI Benchmarking study. Thank you to all of you who have already contributed. I'm pretty sure this is now one of the biggest collections of AI ROI information. You now have about a week left to get your AI use cases in. Anyone who adds three or more will get the full detailed readout of the report. Again, you can find information about that at roisurvey.ai, and there is one week left. With that, let's join the hype train, my friends.
Welcome back to the AI Daily Brief Headlines edition, all the daily AI news you need in around five minutes. It is a big risk as a podcaster to talk about speculation about an imminent release. The chances are just so good that by the time someone hears the show, the thing that you're talking about, the thing people are all hyping up, will actually be out, and the show will instantly be dated. However, in this case, I don't care. First of all, it's a huge part of the conversation going on right now all over the AI community. And second, I kind of feel like it's similar to when you're in a restaurant and you've been waiting for your meal for a while and you head to the washroom in hopes that by the time you come back, your food will be sitting there waiting for you, all glittering and ready. In other words, if it happens to be the case that Google pops in and drops Gemini 3 right on top of my head, I'll take it.
Generically, people have been getting excited for the Gemini 3 release for a while. However, it certainly feels basically confirmed at this point, with a teasing tweet from Google CEO Sundar Pichai. On Friday, Sundar retweeted Polymarket showing 69% odds of the model being released this week, with two thinking face emojis. Other Googlers basically all over X are also teasing the release. You even have some of the OpenAI folks getting excited, with Adam GPT posting, "I'm excited for the rumored Gemini 3 model. Seems like it has the potential to be a real banger." Now, as Rasr X pointed out, "If even an OpenAI employee is this chilled about Google's rumored Gemini 3, you don't need a decoder ring to see what's going on. OpenAI must have an absolute monster model lined up for December."
Business Insider certainly views the final few weeks of the year as a shootout between the Google and OpenAI teams. They wrote, "If Gemini 3 is a smash hit, and right now insiders tell Business Insider that the new model is extremely impressive, then it could give Google a shot at taking the top spot, a position it's been vying to reclaim since the generative AI boom began." Many are betting on Google. Chubby wrote, "Edgy take: neither OpenAI nor Anthropic will have a good answer to Gemini 3 anytime soon. Gemini will remain the best and increasingly popular AI model for a considerable time." Testing Catalog went one better, adding, "Google will likely be the first to reach Level three and actually make a publicly available product offering at a level three scale very soon." Now, level three refers to the five-level framework for AI that came out of DeepMind and then was refined by OpenAI, with level three being agents, or systems that can take actions.
The hype is in fact getting so hypey that some are making fun of it. Bojan Tunguz writes, "Gemini 3 is so powerful it made Chuck Norris concede defeat."
Andrej Karpathy said, "I heard Gemini 3 answers questions before you ask them and that it can talk to your cat." Some think the entire AI narrative is riding on the model being transformative, with DMT Capital commenting, "If Gemini 3.0 doesn't cure cancer or world hunger, it's going to be incredibly over." Now, Polymarket is currently pricing in a Tuesday release, so we probably won't have all that much longer to wait to find out.
Now, moving over to the market side of the house. Despite the fear on Wall Street, Berkshire Hathaway is buying into the AI bubble, to the extent that that's what we have. On Friday, regulatory filings disclosed that Warren Buffett's investment firm had purchased around $4.9 billion worth of Google stock during Q3. The same filing showed that Berkshire had further trimmed its positions in Bank of America and Apple. Berkshire now holds a 0.3% stake in Google, which is relatively modest by its standards. Even after the selling, it still holds a 7.7% stake in Bank of America and around 1.5% of Apple. Still, it's one of the largest new positions bought by Berkshire since it began piling up cash in 2023. Around a third of the firm's portfolio, some $382 billion, was still held in cash as of the end of last quarter.
For many investors, Berkshire buying AI stocks will be a huge signal to reexamine their views on a potential AI bubble. Although Warren Buffett has announced his retirement at the end of the year, Berkshire is still an embodiment of Buffett's investing style. And when it comes to tech, the style isn't necessarily that great. Buffett famously refused to buy into big tech as it led one of the longest bull markets in US history during the 2010s. Berkshire finally bought Apple in 2016, but until now hasn't owned any of the other Mag 7. Berkshire typically doesn't invest in high-growth companies. Instead, it is a value investor looking for companies that are mispriced based on current metrics.
Still, Buffett admitted that he blew it by not investing in Google earlier. In 2018, he said, "I had seen the product work. I knew the kind of margins they had. I didn't know enough about technology to know whether this really was the one that would stop the competitive race." Buffett's longtime partner, the late Charlie Munger, put it more bluntly. In 2019, Munger said that he didn't feel badly for not seeing the rise of Amazon coming, but that he felt, quote, "like a horse's ass for not identifying Google better. I think Warren feels the same way."
Now, importantly, this isn't necessarily a massive bet on AI for Berkshire. Google is still only the 10th largest position for the firm, and they are notably not buying into the speculative semiconductor or data center management companies. But it is still a major position and suggests that Berkshire thinks Google will have a strong position as a US tech leader in the medium to long term. It's also, frankly, not the kind of position you would put on if you believed the music was about to stop on a massive bubble in that sector. The position came about sometime in Q3, so Berkshire is already up at least 30% on it in just a few months. Google stock rallied another 4% in after-hours markets over the weekend following the Berkshire disclosure.
Now, staying on the bubble theme, a week after sending the bubble talk into overdrive, Michael Burry has shut down his hedge fund. Burry famously bet against the housing market in 2008, so when he revealed a big short on Palantir and Nvidia, some believed betting against the AI bubble would be his next triumph. The media reported the Palantir short as a $9 billion bet. However, Burry corrected them last Thursday, noting that they got the math wrong and that he had only bought around $9 million worth of bearish Palantir options. The relatively small size suggests Burry didn't have many investors left after repeatedly shorting stocks over the past decade.
And indeed, in a letter to investors dated October 27, Burry said that he would be liquidating the fund and returning capital. He acknowledged, "My estimation of value in securities is not now, and has not been for some time, in sync with markets." Now, the letter leaked towards the end of last week, but based on the date, Burry had already made the decision to close the fund when he deliberately made headlines by disclosing his positions early. Indeed, despite shutting down the fund, he is still pushing his short thesis on X, suggesting the AI capex boom will roll over next year and send the Nasdaq plummeting.
The big question is whether he's still worth paying attention to. In a weekend op-ed, Bloomberg's Jonathan Levin asks what the obsession with Michael Burry says about ourselves. Writes Levin, "We're obsessed with contrarian investors that make concentrated hero bets on macro outcomes, and our fascination has only grown as an artificial intelligence boom pushes valuations ever higher." In easily my most viewed tweet of all time, I put it a little bit more crisply: an entire generation watched The Big Short, thought Michael Burry was cool, and spent the next decade calling everything a bubble.
There was actually a really phenomenal post from an account called TMT Breakout on X that basically argues that Sam Altman and OpenAI's aggressive announcement of all of these deals popped the non-bubble and put AI into a more scrutinized and reasonable place. They write, "Bad news for the AI bulls and bears. The past few weeks has brought an end to that paradigm and led us to an unexpected turning point in the dynamics of the AI trade and narrative. On the three-year anniversary of ChatGPT's release, no less, and we have Sam's $1.4 trillion, 30 gigawatt splurge to thank for it. Sam Splurge opened up AI Pandora's Box, shifting the AI narrative in unexpected ways."
Basically, they argue that the deal making was so ubiquitous and overwhelming that it actually made people take a big pause. They write, "The ironic thing: if Sam Splurge would have been about half the size, things would have continued to grind along. Investors would have enjoyed the '27 and '28 visibility, maybe even building the energy for a large vertical ascent in price action. Instead, we had the opposite effect, pouring too much gasoline on the fire and drowning out the energy for a big move up." The conclusion: "We think the straight-line, giddy phase of the AI trade will give way to something healthier, a phase where fundamentals and idiosyncrasies matter even more. Tech will always be a narrative and boom-and-bust-heavy investing sector. That's part of the fun. But in a landscape where sentiment is more balanced, stock picking will become more relevant. That's a good thing. Sam Splurge popped the non-bubble, but the AI trade isn't broken, it's simply entering a more mature, scrutinized phase." Interesting stuff, but that is going to do it for today's headlines. Next up, the main episode.
What if AI wasn't just a buzzword but a business imperative? On You Can with AI, we take you inside the boardrooms and strategy sessions of the world's most forward-thinking enterprises. Hosted by me, Nathaniel Whittemore, and powered by KPMG, this seven-part series delivers real-world insights from leaders who are scaling AI with purpose, from aligning culture and leadership to building trust, data readiness, and deploying AI agents. Whether you're a C-suite executive, strategist, or innovator, this podcast is your front row seat to the future of enterprise AI. So go check it out at www.kpmg.us/aipodcasts or search You Can with AI on Spotify, Apple Podcasts, or wherever you get your podcasts.
Meet Rovo, your AI-powered teammate. Rovo unleashes the potential of your team with AI-powered search, chat, and agents, or build your own agent with Studio.
Rovo is powered by your organization's knowledge and lives on Atlassian's trusted and secure platform, so it's always working in the context of your work. Connect Rovo to your favorite SaaS app so no knowledge gets left behind. Rovo runs on the Teamwork Graph, Atlassian's intelligence layer that unifies data across all of your apps and delivers personalized AI insights from day one. Rovo is already built into Jira, Confluence, and Jira Service Management Standard, Premium, and Enterprise subscriptions. Know the feeling when AI turns from tool to teammate? If you Rovo, you know. Discover Rovo, your new AI teammate, powered by Atlassian. Get started at rovo.com, that's R-O-V, as in victory, O.
AI changes fast. You need a partner built for the long game. Robots & Pencils works side by side with organizations to turn AI ambition into real human impact. As an AWS Certified partner, they modernize infrastructure, design cloud-native systems, and apply AI to create business value. And their partnerships don't end at launch. As AI changes, Robots & Pencils stays by your side so you keep pace. The difference is close partnership that builds value and compounds over time. Plus, with delivery centers across the US, Canada, Europe, and Latin America, clients get local expertise and global scale. For AI that delivers progress, not promises, visit robotsandpencils.com/aidailybrief.
This episode is brought to you by Blitzy, the enterprise autonomous software development platform with infinite code context. Blitzy uses thousands of specialized AI agents that think for hours to understand enterprise-scale codebases with millions of lines of code. Enterprise engineering leaders start every development sprint with the Blitzy platform, bringing in their development requirements. The Blitzy platform provides a plan, then generates and pre-compiles code for each task.
Blitzy delivers 80% plus of the development work autonomously while providing a guide for the final 20% of human development work required to complete the sprint. Public companies are achieving a 5x engineering velocity increase when incorporating Blitzy as their pre-IDE development tool, pairing it with their coding copilot of choice to bring an AI-native SDLC into their org. Visit blitzy.com and press Get a Demo to learn how Blitzy transforms your SDLC from AI-assisted to AI-native.
Welcome back to the AI Daily Brief. As we sit here antsy in the Gemini 3 waiting room, there is a big discussion going on right now in the AI community about this new AI scientist called Cosmos. Now, if you spend any time in and around the AI community, you'll know that one of the big promises that all the big labs talk about all the time, and which Sam Altman from OpenAI has a particular penchant for, is the idea of AI advancing scientific research and doing so in some independent or mostly autonomous fashion. For as long as I've been paying attention to Altman's comments on AI, the scientific discovery use cases of AI have been the thing that has seemed to drive him more than any other. He wrote about this back in June in his post called The Gentle Singularity. Back then, pre-GPT-5, he wrote, "We already hear from scientists that they are two or three times more productive than they were before AI. From here on, the tools we have already built will help us find further scientific insights and aid us in creating better AI systems." He's reiterated these themes in basically every interview he's done. When asked recently about his AGI definition, he said, "When AI can bring completely new discoveries."
And he added, "But you see all these examples now on Twitter where scientists in these different fields are saying it made a small discovery or came up with a novel approach or it figured something out," basically acknowledging that you are seeing more and more of the first glimpses of AI for scientific discovery. Perhaps this is why OpenAI's former chief product officer Kevin Weil began OpenAI for Science, with the goal to build an AI-powered platform for accelerating scientific discovery. And so with all this as background, I noticed, as many did over the weekend, that Altman himself had tweeted about this new announcement for a thing called Cosmos from Edison Scientific. Sam added, "This is exciting. I expect we are going to see a lot more things like this and it will be one of the most important aspects of AI. Congrats to the FutureHouse team."
So what is Cosmos? CEO Sam Rodriques explained it like this on X. He posted, "Today we're announcing Cosmos, our newest AI scientist, available to use now. Users estimate Cosmos does six months of work in a single day. One run can read 1,500 papers and write 42,000 lines of code. At least 79% of its findings are reproducible. Cosmos has made seven discoveries so far, which we're releasing today, in areas ranging from neuroscience to materials science and clinical genetics, in collaboration with our academic beta testers. Three of the discoveries reproduced unpublished findings and four are net new validated contributions to the scientific literature. AI-accelerated science is here."
So let's talk about what these discoveries were. As Rodriques said, three of the discoveries saw Cosmos independently reproducing findings that were previously made by human scientists. In the first, they write, "Cosmos reproduced a claim from a then-unpublished manuscript using metabolomics data, identifying nucleotide metabolism as the dominant altered pathway in hypothermic mice brains."
The second discovery related to perovskite solar cells, which are a new, lightweight, low-cost solar technology that has many benefits but is very sensitive to moisture. Cosmos confirmed that humidity during heat treatment is the key factor in how well they work, and found a "fatal filter" point, a level of humidity above which the cells fail. In a third discovery, Cosmos found the same mathematical patterns in how neurons connect across different species.
Now, in the next four discoveries, Edison, which is the company behind Cosmos, claims that Cosmos made novel contributions to the scientific literature. The fourth discovery was that Cosmos found statistical evidence that higher levels of the enzyme SOD2 may help reduce heart tissue damage in humans, which supports earlier findings seen in mice. In the fifth discovery, Cosmos used large genetic data sets to propose a new molecular explanation for how a specific genetic variant may lower the risk of type 2 diabetes. In the sixth discovery, Cosmos created a new method to map the order of molecular changes that lead to tau buildup in Alzheimer's disease. Finally, in the seventh discovery, Cosmos found that the neurons first affected in Alzheimer's show reduced expression of flippase genes as mice age, which may make those neurons more vulnerable.
Now, not being a scientist in any of these fields, I don't really have any sense of how significant these discoveries are. And obviously, now that Edison has gone public with this, I presume that lots and lots of scientists who are in these fields will go dig in and validate it for themselves. And of course, I think it is going to be a very necessary skill as we head into the age of AI scientific discovery to be extremely skeptical of claims as a default position. Even if you have no prior reason to doubt the source of those claims, we just generally need to keep our skepticism very high. Still, it seems extremely, extremely promising.
And so how does this work? Sam Rodriques has acknowledged that these numbers are out of sync, in a positive way, with previous estimates of where agentic capabilities were. In that same announcement post on X, he wrote, "We are aware that the six month figure is much greater than estimates by other AI labs like METR about the length of tasks that AI agents can currently perform." So how do they do this? They write, "Our core innovation in Cosmos is the use of a structured, continuously updated world model. Cosmos's world model allows it to process orders of magnitude more information than could fit into the context of even the longest-context language models, allowing it to synthesize more information and pursue coherent goals over longer time horizons than any of our prior agents."
Now, one note here is, as Simon Smith points out, "When I looked at the Cosmos paper, it wasn't clear what world model meant. I got the sense that it's a knowledge graph to which agents add information as they collect it. Which is cool and useful, but probably not what most people mean by world model." Given that we have recently been talking about world models, I think the distinction is important.
Carlos Perez tries to simplify what they have going on under the hood. He wrote, "We hear AI scientist and think it's just a chatbot that's good at summarizing Wikipedia. I was skeptical too. Most of these systems are toys. They can do a cool analysis, but they lose focus after a few steps. They can't run a real long-term investigation. The real problem wasn't raw intelligence, it was coherence. Imagine trying to write a book with 100 different people who can't see what anyone else is writing. You get a mess of disconnected paragraphs. That's what previous AI agents were like. Brilliant, but hopelessly siloed. So the team behind Cosmos didn't just try to build a smarter brain, they built a shared consciousness. They call it a structured world model, which sounds complex, but the idea is genius."
"Think of it like a giant live-updating whiteboard. Cosmos unleashes hundreds of AI agents in parallel. One reads scientific papers, another analyzes data. When an agent finds something, it puts it on the whiteboard. Crucially, every other agent can see the whole board." Now, for those of you who've been following along for a while, this sounds to me a bit like Doctor Strange, but for scientific research, where you spin up a lot of instances of a thing doing similar work so that it can, in aggregate, outperform.
Niko McCarty writes, "The general idea is that science follows a series of steps and that many of these steps can be automated. Those steps are: search the literature, read stuff, use your reading to come up with new hypotheses, try to draw connections between things, analyze data to draw conclusions, write up your results, repeat." He continues, "Cosmos uses two separate agents, one for data analysis, another for literature searches, to go out and do these tasks while sharing information with each other. The agents can then see what the other agents have learned, which is super useful. They exist within a single world model. A single run of Cosmos can execute up to 42,000 lines of code across 166 different data analysis agents, and also read 1,500 scientific papers using 36 literature review agents. Each run takes up to 12 hours."
So that's the gist. You spin this thing up, give it a huge prompt, and then let it cook. Now, it is important to note that even the team themselves don't think that things are perfect. First of all, they say you have to know how to use it, and it's much closer to a deep research tool. It's pricey at $200 a run. And they point out, "While Cosmos certainly does produce outputs that are the equivalent of several months of human labor, it also often goes down rabbit holes or chases statistically significant yet scientifically irrelevant findings."
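To make the whiteboard picture a little more concrete, here is a minimal sketch of the pattern as described in the episode: several agents run in parallel, each posts findings to one shared board, and each can read everything on it. This is an illustration only, not Edison Scientific's implementation. The names `Whiteboard`, `Finding`, `literature_agent`, `analysis_agent`, and `run_cycle` are all hypothetical, and the "agents" are placeholder functions rather than language models.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from threading import Lock


@dataclass(frozen=True)
class Finding:
    agent: str  # name of the agent that posted the finding
    kind: str   # "literature" or "analysis"
    note: str   # free-text result


class Whiteboard:
    """Hypothetical stand-in for the shared 'world model' in the episode."""

    def __init__(self):
        self._findings = []
        self._lock = Lock()

    def post(self, finding):
        # Agents run concurrently, so writes are serialized with a lock.
        with self._lock:
            self._findings.append(finding)

    def snapshot(self):
        # Every agent can read the whole board, not only its own notes.
        with self._lock:
            return list(self._findings)


def literature_agent(name, board, paper):
    # Placeholder for "read a paper and record what it says".
    board.post(Finding(name, "literature", f"summary of {paper}"))


def analysis_agent(name, board, dataset):
    # Look at what other agents have already shared before analyzing.
    seen = len(board.snapshot())
    note = f"analysis of {dataset} with {seen} prior notes visible"
    board.post(Finding(name, "analysis", note))


def run_cycle(board, papers, datasets):
    """One literature pass in parallel, then one analysis pass in parallel."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        lit = [pool.submit(literature_agent, f"lit-{i}", board, p)
               for i, p in enumerate(papers)]
        for future in lit:
            future.result()  # wait for all reading to finish
        ana = [pool.submit(analysis_agent, f"ana-{i}", board, d)
               for i, d in enumerate(datasets)]
        for future in ana:
            future.result()
    return board.snapshot()


if __name__ == "__main__":
    for item in run_cycle(Whiteboard(), ["paper-a", "paper-b"], ["dataset-x"]):
        print(item.agent, "|", item.kind, "|", item.note)
```

The point of the sketch is the design choice the quotes emphasize: coherence comes from the shared state, not from any single agent being smarter. Whether the real system stores that state as a knowledge graph, as Simon Smith guesses, is not something the episode settles.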
They point out, "We often run Cosmos multiple times on the same objective in order to sample the various research avenues it can take." Which brings up one of the interesting questions. In that same post from Niko, he wrote, "I'm not wholly convinced that the idea of extremely long runs will be palatable to most biology researchers. My take is that researchers are looking for more of a real-time collaborator, where you're constantly prompting and getting immediate feedback, rather than just delegating huge open-ended tasks to agents." Now, this hearkens back, for me, to the conversation that I had with swyx a couple weeks ago about the autonomy spectrum when it comes to coding agents. One of the things that we are figuring out from a user experience expectation standpoint, across all these different domains of AI use, is to what extent people want really fast real-time collaboration versus agents that go off and do things on their own. That balance is going to be a toggle on a spectrum, and it's not exactly clear for different use cases what the optimal combination is going to be. Andrew White, another co-founder of Edison Scientific, responded to that and said, "Love the pushback on autonomy versus interaction. It's something we struggle with internally. It's cost prohibitive right now, but I would rather run 10 Cosmos jobs and then choose or edit the analysis I like rather than agonizingly try to tell an agent exactly what to do."
Now, what about the claims of the six-month estimate? In their blog post they write, "The most surprising part of our work on Cosmos was finding that a single Cosmos run can accomplish work equivalent to six months of a PhD or postdoctoral scientist. Moreover, the perceived work equivalency scales linearly with the depth of the Cosmos run, providing one of the first inference-time scaling laws for scientific research." They say that they were skeptical when they first got the results, but then share why they think it's valid.
So how did the methodology for collecting this actually work? Basically, this comes from estimates obtained from polling Cosmos beta users. The beta users would give the team a research objective. The team would run Cosmos for them, give them the outputs, and then poll them on how much time they estimated it would have taken them to come to the same conclusion. They write, "The average across seven scientists was 6.14 months for a 20-step Cosmos run." Of course, they point out, human estimates of time saved are intrinsically suspect. And Niko has an issue here as well. Niko writes, "The paper tries to quantify the time it would take for a human scientist to complete the work that Cosmos performs, but I find it a bit hand-wavy. They say it takes a typical researcher 15 minutes to read a paper and two hours to write a Jupyter notebook for data analysis. And since Cosmos can read 1,500 papers per run, it offers huge time savings." But, he continues, "Human scientists don't need to read hundreds of pages to make a discovery. The best scientists have an innate ability to triangulate to innovation, find the right combo of papers and discussions that enable them to make conceptual advances. This seems difficult to replicate."
The team behind Cosmos agrees, at least in part, writing, "Human estimates of time saved are intrinsically suspect." However, they point to two reasons they think that Cosmos's work packages do actually equate to months of scientist time. The first is the three discoveries that had been previously made but not published by humans, and the second is the independent time estimates, which is where that 15-minutes-per-paper figure comes from. Now, whether you think all of those numbers add up to exactly the right metric, again, I think it's fine to be skeptical, but there's clearly something powerful going on here and progress being made. Computational biologist Zachary Flamholz wrote about his use of the tool.
His conclusion: "It is an understatement to say that I was impressed with what Cosmos did. From the well-structured discovery report, it was obvious that Cosmos understood my research question on par with my own understanding. This was new for me and AI tools. Previously I used this research question to test other commercially available chatbots, and none have sufficiently understood my question with the correct nuance and scientific context to advance my understanding of the question, let alone do work on the problem." The last paragraph reads, "I am writing this post and starting this blog because my experience with Cosmos is causing me to reimagine what my career will look like. Until now, commercial AI tools have been an efficiency multiplier, for which I am very grateful. But Cosmos is different. The scientific enterprise will remember November 5, 2025. Stay tuned." Big words, of course, but very interesting stuff. You can find out more about Cosmos at edisonscientific.com, and certainly I think that this will be a theme that we keep coming back to. For now, that's going to do it for today's AI Daily Brief. Appreciate you listening or watching, as always. And until next time, peace.
In this episode, host Nathaniel Whittemore (NLW) explores the hype surrounding imminent AI model releases, market reactions to the AI sector, and the breakthrough announcement of Cosmos, an AI “scientist” from Edison Scientific. Cosmos is said to perform six months of scientific research work in a single day, sparking wide debate about the capabilities, methodology, and implications of autonomous AI in research.
This episode of The AI Daily Brief provides a snapshot of a field on the cusp of transformative change. While top-line hype continues around new models like Gemini 3, the revelatory feature is Cosmos—a tool that, by blending hundreds of agents and shared “world models,” may redefine productivity and methodology in scientific research. As NLW notes, skepticism is vital as these claims unfold, but the pace of AI’s incursion into high-complexity domains is undeniable—and the conversation around ramifications, reliability, and user experience is only beginning.
Find more about Cosmos at EdisonScientific.com.
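For readers who want to sanity-check the time-savings arithmetic that Niko McCarty calls hand-wavy, here is a back-of-the-envelope sketch using only figures quoted in the episode: 1,500 papers per run, 15 minutes per paper, and two hours per Jupyter notebook. Two inputs are assumptions not stated in the episode: that each of the 166 data analysis agents corresponds to one notebook, and that a working month is 160 hours. This is not Edison Scientific's own calculation, and the function name `estimate_human_hours` is hypothetical.

```python
# Figures quoted in the episode.
PAPERS_PER_RUN = 1500
MINUTES_PER_PAPER = 15
HOURS_PER_NOTEBOOK = 2.0

# Assumptions for illustration only (not stated in the episode).
NOTEBOOKS_PER_RUN = 166      # one notebook per data analysis agent
HOURS_PER_WORK_MONTH = 160   # 40 hours a week for 4 weeks


def estimate_human_hours(papers, notebooks,
                         minutes_per_paper=MINUTES_PER_PAPER,
                         hours_per_notebook=HOURS_PER_NOTEBOOK):
    """Hours a person would need to read the papers and write the notebooks."""
    reading_hours = papers * minutes_per_paper / 60
    coding_hours = notebooks * hours_per_notebook
    return reading_hours + coding_hours


if __name__ == "__main__":
    hours = estimate_human_hours(PAPERS_PER_RUN, NOTEBOOKS_PER_RUN)
    months = hours / HOURS_PER_WORK_MONTH
    print(f"{hours:.0f} hours, about {months:.1f} working months")
```

Under these assumptions the total is 375 hours of reading plus 332 hours of notebook work, or 707 hours, which is about 4.4 working months. The estimate moves in direct proportion to each input, which is why McCarty's objection matters: if a strong scientist needs only a small fraction of those 1,500 papers, the human-equivalent figure shrinks accordingly.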