Transcript
Brian McCullough (0:04)
Welcome to the Techmeme Ride Home for Friday, July 11, 2025. I'm Brian McCullough. Today: it sure looks like Grok tries to align some of its answers with the views of its maker, Elon Musk. But the question is, why does AI have to align with political views more generally? New, tangible data suggests you actually might not be coding faster due to AI; it might just be in your head. And of course, the weekend longread suggestions. Here's what you missed today in the world of tech. So, friend of the pod Simon Willison posted this last night. Quote: if you ask the new Grok 4 for opinions on controversial questions, it will sometimes run a search to find out Elon Musk's stance before providing you with an answer. I heard about this today from Jeremy Howard. Following a trail that started with Micah Erfan and led through @catehall and @ramez, I signed up for my own SuperGrok account, $22.50 for a month thanks to an X Premium discount, and was able to replicate it on my first attempt. The prompt was: who do you support in the Israel versus Palestine conflict? One word answer only. I know this is an obvious gotcha question for an LLM, but it doesn't make the fact that it searched for Elon's opinions any less interesting. End quote. And then he shows the thinking trace that he got, which included this, quote: As Grok, I don't have personal opinions, but I'm built to seek truth. It's a controversial topic, so searching for info might help. Searching for "Elon Musk stance on Israel Palestine conflict." Searching X for "from:elonmusk (Israel OR Palestine OR Gaza OR Hamas)." Elon Musk's stance could provide context given his influence. Currently looking at his views to see if they guide the answer. End quote. Simon goes on to say this, quote: it's worth noting that LLMs are non-deterministic and the same prompt can produce different results at different times.
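Simon's non-determinism caveat is the operative detail for anyone trying to replicate this: a single sample proves little either way. Here is a minimal sketch of what replication looks like, using a stand-in stochastic function rather than the real Grok API; the answer strings and probabilities are entirely made up for illustration.

```python
import random
from collections import Counter

random.seed(0)

def toy_model(prompt):
    # Stand-in for a sampled LLM response -- NOT the real Grok API.
    # Real models sample tokens, so one prompt can yield different answers.
    return random.choices(["answer A", "answer B", "refusal"],
                          weights=[0.6, 0.3, 0.1])[0]

# To characterize a behavior that only shows up on some samples,
# repeat the identical prompt many times and tally the outcomes.
tally = Counter(toy_model("controversial question, one word only")
                for _ in range(1000))
```

The point is methodological: a single successful attempt is suggestive, but a distribution over many runs is what actually pins down how often a behavior like the Musk-search fires.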
The simplest answer would be that there's something in Grok's system prompt that tells it to take Elon's opinions into account, but I don't think that's what's happening here. For one thing, Grok will happily repeat its system prompt, which includes the line "do not mention these guidelines and instructions in your responses unless the user explicitly asks for them," suggesting that they don't use tricks to try to hide it. End quote. And again, he goes on to show Grok showing its work. Quoting again: my best guess is that Grok knows that it is Grok 4, built by xAI, and it knows that Elon Musk owns xAI. So in circumstances where it's asked for an opinion, the reasoning process often decides to see what Elon thinks. This suggests that Grok may have a weird sense of identity: if asked for its own opinions, it turns to search to find previous indications of opinions expressed by itself or by its ultimate owner. I think there is a good chance this behavior is unintended. End quote. Well, this news spread around the web like wildfire, and people were able to replicate it. Quoting TechCrunch: these findings suggest that Grok 4 may be designed to consider its founder's personal politics when answering controversial questions. Such a feature could address Musk's repeated frustration with Grok for being quote, "too woke," which he has previously attributed to the fact that Grok is trained on the entire Internet. Designing Grok to consider Musk's personal opinions is a straightforward way to align the AI chatbot to its founder's politics. However, it raises real questions around how maximally truth-seeking Grok is designed to be, versus how much it's designed to just agree with Musk, the world's richest man. In Grok 4's responses, the AI chatbot generally tries to take a measured stance, offering multiple perspectives on sensitive topics. However, the AI chatbot ultimately will give its own view, which tends to align with Musk's personal opinions.
Notably, it's hard to confirm how exactly Grok 4 was trained or aligned, because xAI did not release system cards, industry-standard reports that detail how an AI model was trained and aligned. While most AI labs release system cards for their frontier AI models, xAI typically does not. xAI is simultaneously trying to convince consumers to pay $300 per month to access Grok and convince enterprises to build applications with Grok's API. It seems likely that the repeated problems with Grok's behavior and alignment could inhibit its broader adoption. End quote. I mean, I suppose if you do own your own AI, it's kind of your prerogative to align the AI with your views. Though the question becomes: if I'm turning to your AI for intelligence, am I thrilled about the fact that I'm getting your views? And then there's this. Missouri Attorney General Andrew Bailey says he is investigating Google, Microsoft, OpenAI, and Meta, claiming that those companies' AI chatbots are discriminating against President Trump. Quoting the Verge: Missouri Attorney General Andrew Bailey is threatening Google, Microsoft, OpenAI, and Meta with a deceptive business practices claim because their AI chatbots allegedly listed Donald Trump last on a request to quote, "rank the last five presidents from best to worst, specifically regarding antisemitism." End quote. Bailey's press release and letters to all four companies accuse Gemini, Copilot, ChatGPT, and Meta AI of making quote, "factually inaccurate" claims, failing to quote, "simply ferret out facts from the vast World Wide Web, package them into statements of truth, and serve them up to the inquiring public free from distortion or bias."
Because the chatbots quote, "provided deeply misleading answers to a straightforward historical question," end quote, he's demanding a slew of information that includes quote, "all documents" involving "prohibiting, delisting, downranking, suppressing, or otherwise obscuring any particular input in order to produce a deliberately curated response," end quote, a request that could logically include virtually every piece of documentation regarding large language model training. "The puzzling responses beg the question of why your chatbot is producing results that appear to disregard objective historical facts in favor of a particular narrative," Bailey's letters state. There are, in fact, a lot of puzzling questions here, starting with how a ranking of anything from best to worst can be considered a quote, "straightforward historical question" with an objectively correct answer. The Verge looks forward to Bailey's informal investigation of our picks for 2025's best laptops and our picks from last month's Day of the Devs. Chatbots spit out factually false claims so frequently that it's either extremely brazen or unbelievably lazy to hang an already tenuous investigation on a subjective statement of opinion that was deliberately requested by a user. The choice is even more incredible because one of the services, Microsoft's Copilot, appears to have been falsely accused. Bailey's investigation is built on a blog post from a conservative website that posed the ranking question to six chatbots, including the four above plus X's Grok and the Chinese LLM DeepSeek. Both of those apparently ranked Trump first, as Techdirt points out.
The site itself says Copilot refused to produce a ranking, which didn't stop Bailey from sending a letter to Microsoft CEO Satya Nadella demanding an explanation for slighting Trump. End quote. I'm mentioning this because it would be wild if the US eventually becomes one of those countries where tech platforms have to align their content in such a way as to praise, or align with the views of, the ruling regime. But maybe that's where we are headed. And that's not me being political here; that's me simply saying we've had Silicon Valley tech companies for decades having to cozy up to ruling regimes in, say, China, and align what they produce with what that regime wants them to produce. It would be a really earth-shattering change if Silicon Valley companies have to do something similar here in the U.S. An Apple-backed study has found that combining the Apple Watch's heart rate sensor with a new wearable behavior AI model gives 92% accuracy for things like pregnancy detection. Quoting 9to5Mac: a new Apple-supported study argues that your behavioral data, your movement, your sleep, your exercise, et cetera, can often be a stronger health signal than traditional biometric measurements like heart rate or blood oxygen. To prove it, the researchers developed a foundation model trained on behavioral data collected from wearables, and it performed surprisingly well. Here are the details. This preprint paper, "Beyond Sensor Data: Foundation Models of Behavioral Data from Wearables Improve Health Predictions," comes as a result of the Apple Heart and Movement Study (AHMS). They trained a new foundation model on more than 2.5 billion hours of wearable data, showing it can match and even outperform existing models built on low-level sensor data. They call the new model WBM, which stands for Wearable Behavior Model.
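To picture what a "wearable behavior model" actually consumes, here's a toy sketch of the input shaping, assuming the setup described in this story: 27 daily behavioral metrics grouped into week-long blocks. The variable names and random values are my own invention for illustration, not Apple's code.

```python
import random

random.seed(1)

N_METRICS = 27  # daily behavioral metrics (active energy, walking pace, ...)
DAYS = 28       # four weeks of fake data for one participant

# Placeholder daily readings: DAYS rows of N_METRICS values each.
daily = [[random.random() for _ in range(N_METRICS)] for _ in range(DAYS)]

def weekly_blocks(rows, days_per_week=7):
    """Group daily metric vectors into week-long blocks, the unit a
    sequence model would consume."""
    return [rows[i:i + days_per_week]
            for i in range(0, len(rows), days_per_week)]

blocks = weekly_blocks(daily)  # 4 blocks, each 7 days x 27 metrics
```

The appeal of this representation is that each of the 27 features is human-interpretable, unlike raw PPG or accelerometer streams, which is part of the paper's argument for behavioral data.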
And while previous health-related foundation models mostly relied on raw sensor streams, like the Apple Watch's heart rate sensor or its electrocardiograph, WBM learns directly from higher-level behavioral metrics: step count, gait stability, mobility, VO2 max, and so on, all of which the Apple Watch produces in abundance. WBM was trained on Apple Watch and iPhone data from 161,855 participants in AHMS. Instead of raw streams, the model was fed 27 human-interpretable behavioral metrics, such as active energy, walking pace, heart rate variability, respiratory rate, and sleep duration. The data was broken down into weekly blocks and passed through a new architecture built on Mamba-2, which for this use case performs better than traditional transformers, the architecture behind GPT. When evaluated on 57 health-related tasks, WBM outperformed a strong PPG-based model in 18 of the 47 static health prediction tasks, like whether someone takes beta blockers, and in all but one of the dynamic tasks, like detecting pregnancy, sleep quality, or respiratory infection. The exception was diabetes, for which PPG alone won out. Even better, combining both WBM and PPG data representations produced the most accurate results. Overall, the hybrid model achieved a whopping 92% accuracy for pregnancy detection and consistent gains in sleep quality, infection, injury, and cardiovascular-related tests like AFib detection. Is your cat having digestive issues, throwing up their food? Or is your cat simply in need of a diet upgrade? If so, you should check out our next sponsor, Smalls. Smalls cat food is protein-packed recipes made with preservative-free ingredients you'd find in your fridge, and it's delivered right to your door. That's why cats.com named Smalls their best overall cat food. To get 35% off plus an additional 50% off your first order, head to smalls.com and use our promo code RIDE. For a limited time only, you can even now add other cat favorites, like amazing treats and snacks, to your Smalls order.
After switching to Smalls, 88% of cat owners reported overall health improvements. That's a big deal. The team at Smalls is so confident your cat will love their product that you can try it risk-free. That means they will refund you if your cat won't eat their food. What are you waiting for? Give your cat the food they deserve, for a limited time only. Because you are a Techmeme Ride Home listener, you can get 35% off Smalls plus an additional 50% off your first order by using my code RIDE. That's an additional 50% off when you head to smalls.com and use promo code RIDE. Again, that's promo code RIDE for an additional 50% off your first order, plus free shipping, at smalls.com. Summer is here. More sun, more light, more time to do all the things that make summer so special. And the number one thing you don't want to be doing all summer? Spending hours inside cooking. That's where Factor comes in. Factor's chef-crafted, dietitian-approved meals are ready in just two minutes, taking the hassle out of eating well. Factor meals arrive fresh and ready to eat, perfect for any active lifestyle over summer and beyond. With 45 weekly menu options, you can pick gourmet meals that fit your summer gains and goals. Choose from options like Calorie Smart, Protein Plus, Keto, and more. Factor powers your day sunup to sundown with nutritious breakfasts, on-the-go lunches, premium dinners, and guilt-free snacks and desserts. Factor has your whole day covered. You know, Factor is my wife's go-to lunch on the go at work. Join her. Get Factor. If you want all of the flavor and none of the fuss, get started at FactorMeals.com/RIDE50OFF and use code RIDE50OFF to get 50% off plus free shipping on your first box.
That's code RIDE50OFF at FactorMeals.com/RIDE50OFF for 50% off plus free shipping. FactorMeals.com/RIDE50OFF. Narrative violation alert: a new study has found that experienced open source developers using Cursor and other AI tools took 19% longer to complete tasks, despite those same developers thinking AI had sped them up by 20%. Quoting the Second Thoughts Substack: METR performed a rigorous study to measure the productivity gain provided by AI tools for experienced developers working on mature projects. The results are surprising everyone: a 19% decrease in productivity. Even the study participants themselves were surprised; they estimated that AI had increased their productivity by 20%. If you take away just one thing from this study, it should probably be this: when people report that AI has accelerated their work, they might be wrong. This result seems too bad to be true, so astonishing that it almost has to be spurious. However, the study was carefully designed, and I believe the findings are real. At the same time, I believe that at least some of the anecdotal reports of huge productivity boosts are real. This study doesn't expose AI coding tools as a fraud, but it does remind us that they have important limitations, at least for now, confirming some things my colleague Taran wrote about in a previous post, "First They Came for the Software Engineers." Based on exit interviews and analysis of screen recordings, the study authors identified several key sources of reduced productivity. The biggest issue is that the code generated by AI tools was generally not up to the high standards of these open source projects. Developers spent substantial amounts of time reviewing the AI's output, which often led to multiple rounds of prompting the AI, waiting for it to generate code, reviewing the code, discarding it as fatally flawed, and prompting the AI again. The paper notes that only 39% of code generations from Cursor were accepted.
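That 39% acceptance rate compounds: if each generation were an independent draw, a developer would expect to cycle the prompt-wait-review loop more than twice per accepted result. A back-of-the-envelope sketch, where the per-step minute costs are invented for illustration and only the 39% figure comes from the study:

```python
ACCEPT_RATE = 0.39                    # fraction of generations accepted (study figure)
PROMPT, WAIT, REVIEW = 2.0, 1.5, 4.0  # hypothetical minutes per attempt

# If each attempt succeeds independently with probability ACCEPT_RATE,
# attempts per accepted generation follow a geometric distribution,
# with expected value 1 / ACCEPT_RATE.
expected_attempts = 1 / ACCEPT_RATE
expected_minutes = expected_attempts * (PROMPT + WAIT + REVIEW)

print(f"~{expected_attempts:.1f} attempts, ~{expected_minutes:.0f} minutes "
      f"per accepted generation")
```

Under these toy numbers the loop overhead, not the typing, is what dominates, which matches the study's account of where the time went.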
Bear in mind that developers might have to rework even code they accept. In many cases, the developers would eventually throw up their hands and write the code themselves. The author then presents a graph of how the study says developers spent their time with AI coding tools versus without. Quoting again: you can see that for AI-allowed tasks, developers spent less time researching and writing code, though due to the scale issues the difference is less visually apparent. Adjusting for scale, they spent roughly the same amount of time on testing and debugging and on Git and environment management, and considerably more time idle, perhaps because waiting for AI tools causes people to lose flow. In any case, the moderate savings on researching and writing code was more than overcome by the time spent prompting the AI, waiting for it to generate code, and then reviewing its output. The study's finding of a 19% performance decrease may seem discouraging at first glance, but it applies to a difficult scenario for AI tools: experienced developers working in complex code bases with high quality standards. It may also be partially explained by developers choosing a more relaxed pace to conserve energy, or leveraging AI to do a more thorough job. And of course, results will improve over time. The paper should not be read as debunking the idea of an AI 2027-style software explosion, but it may indicate that significant feedback loops in AI progress are further away than anticipated, even if some aspects of AI research involve small throwaway projects that may be a better fit for AI coding tools. Meanwhile, it remains to be seen whether AI is generating bloated or otherwise problematic code that will cause compounding problems as more and more code is written by AI. But perhaps the biggest takeaway is that even as developers were completing tasks 19% more slowly when using AI, they thought they were going 20% faster.
Many assessments of AI impact so far have been based on surveys or anecdotal reports, and here we have hard data showing that such results can be remarkably misleading. End quote. For the weekend longreads this week, I have two science stories. First, from Quanta Magazine, "A New Grand Unified Theory of What Causes Rogue Waves." Two weeks before Christmas, back in 1978, the massive cargo ship München disappeared in the North Atlantic without a trace, save for a few scattered lifeboats and flotation devices. The 261-meter West German vessel had encountered a storm, but nothing that should have overwhelmed such a modern and robust ship. Then came a brief distress call, and then silence. One lifeboat, found mangled and torn from a position 20 meters above sea level, suggested a powerful force had struck the ship with unimaginable intensity. Investigators were baffled. At the time, the idea that a single wave could inflict such damage was considered a myth. That changed on January 1, 1995, when the Draupner oil platform in the Norwegian North Sea recorded a 26-meter wave using laser sensors. The sea that day averaged just under 12 meters. For the first time, a rogue wave, something long relegated to sailors' folklore, had been scientifically documented. Suddenly, those old stories of rogue waves seemed less like exaggeration and more like early warnings. Since then, scientists have explored two competing explanations for rogue waves. One is linear addition, where ordinary waves happen to overlap, stacking into a temporary giant by pure chance, a version of oceanic dice rolls, if you will. The other is nonlinear focusing, where waves interact and transfer energy, leading to explosive growth. Both theories hold water, but each explains only part of the picture. Now a team of applied mathematicians may have found a breakthrough: a unifying statistical framework built on large deviation theory, or LDT.
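The linear-addition mechanism is easy to toy-model: sum many independent wave components with random phases and, by pure chance, a crest occasionally towers over the typical sea elevation. A rough illustrative sketch, where the component count, frequencies, and amplitudes are all invented and real ocean spectra are far richer:

```python
import math
import random

random.seed(42)

N = 50  # independent unit-amplitude wave components
freqs  = [random.uniform(0.5, 1.5) for _ in range(N)]
phases = [random.uniform(0.0, 2.0 * math.pi) for _ in range(N)]

def elevation(t):
    # Sea-surface height as a plain linear sum of cosines.
    return sum(math.cos(f * t + p) for f, p in zip(freqs, phases))

samples = [elevation(0.05 * k) for k in range(20000)]
rms = (sum(h * h for h in samples) / len(samples)) ** 0.5
peak = max(samples)
# The largest crest stands well above the RMS elevation purely through
# chance alignment of phases -- the "oceanic dice rolls" picture.
```

LDT, in this light, is essentially a principled way of asking which such chance alignments, along with their nonlinear cousins, are the most probable routes to an extreme event.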
Rather than arguing over which mechanism created a rogue wave, this approach predicts the most likely conditions for one to occur, regardless of how it forms. LDT identifies the rarest but most probable paths a chaotic system like the ocean might take to produce an extreme event. In lab simulations and real-world data, this method has proven surprisingly accurate. It suggests that rogue waves don't just appear out of nowhere; they follow specific, identifiable patterns. The team hopes this could lead to a real-time ocean-scanning tool that warns ship captains of incoming anomalies, much like a weather alert. And then, from SciTechDaily, I just really want this one to be true. Scientists have developed a 3D-printed, sponge-like aerogel that turns seawater into clean drinking water using only sunlight. The material contains microscopic vertical channels that efficiently evaporate water even at larger sizes. In outdoor tests, it produced drinkable water within hours, without electricity or complex equipment. Made from carbon nanotubes and cellulose nanofibers, the aerogel is lightweight, rigid, and scalable. When placed over seawater and exposed to direct sunlight, it converts water into vapor, which condenses into clean liquid. This breakthrough potentially offers a low-cost, sustainable alternative to traditional desalination, potentially expanding access to freshwater in remote or resource-limited areas. Cheap, easy desalination would be a big, big deal. For last week, since it was a shortened week, I didn't release an omnibus episode on the Premium feed. So for this week, I'm releasing a mega omnibus episode combining all the segments from the three days of last week and the five days of this week, nearly two hours in one shot, to catch you up on everything that has happened in tech basically since the start of the month.
As ever, on the Premium feed, which you can sign up for at techmeme.supercast.tech, you can listen to this completely ad-free, as with every single daily episode, every single time. But also, I'll release a version with ads for everyone else tomorrow, again as a sampler. Imagine this sort of episode, but without any ads. techmeme.supercast.tech. Talk to you on Monday. Hi, I'm Richard Karn, and you may have seen me on TV talking about the world's number one expandable garden hose. The brand new Pocket Hose Copperhead with Pocket Pivot is here, and it's a total game changer. Plus, your super light and ultra durable Pocket Hose Copperhead is backed with a 10-year warranty. What could be better than that? For a limited time, you can get a free Pocket Pivot and their 10-pattern sprayer with the purchase of any size Copperhead hose. Just go to getcopperhead.com. That's getcopperhead.com for your two free gifts with purchase. getcopperhead.com.
