
Simon Willison
I call it my weird intern. I'll say to my wife Natalie sometimes, hey, so I got my weird intern to do this. And that works, right? It's a good mental model for these things as well, because it's like having an intern who has read and memorized the documentation for every programming language, and is a wild conspiracy theorist, and sometimes comes up with absurd ideas, and they're completely, massively overconfident. It's the intern that always believes that they're right. But it's an intern who you can, I hate to say, kind of bully. You can be like, do it again. Do that again. No, that's wrong. No, that's wrong. And you don't have to feel guilty about it, which is great. One of my favorite prompts is you just say, do better, and it works. It's the craziest thing. It'll write some code, you say, do better, and it goes, oh, I'm sorry, I should. And then it will churn out better code, which is so stupid that that's how this technology works. Oh, yeah. But it's kind of fun.
Ronak
Welcome to the Software Misadventures podcast. We are your hosts, Ronak and Gwan. As engineers, we are interested in not just the technologies, but the people and the stories behind them. So on this show, we try to scratch our own itch by sitting down with engineers, founders, and investors to chat about their path, lessons they've learned, and, of course, the misadventures along the way.
Gwan
Simon, so you've been building tools for doing data analysis in the past few years, but you also started playing with LLMs before it was cool. I think you started writing about GPT-3 like two years ago, and I'm sure you had different expectations after you started playing with it. Have there been any big surprises in the last two years?
Simon Willison
Big surprises? The last two years have been one big surprise, right? The last two years have been completely wild. Yeah. So I actually started playing with GPT-2 back in 2020, which was the very early precursor, and there was clearly something there. I tried to use it to generate New York Times headlines for current affairs, based on the style of headlines from different decades. So I fed in New York Times headlines from the 1950s, 1960s, 1970s. And I mean, I didn't really get anywhere with it. I sort of abandoned that project. It felt like there was something interesting, but certainly not sort of life shattering. And then GPT-3 became available and I started really playing with that sort of 2021, 2022. And that thing was extraordinary, because it was this weird situation where the only way you could use it was either through the OpenAI API or through their weird little playground interface. And so nobody was using it, right? Like, I actually put up a tutorial: here's how to use this thing. Because nobody was experimenting with it, there wasn't much information about what it could do. You'd sort of poke around with it. Also, GPT-3 was a completion model, so you didn't get to chat with it. You'd have to give it a sentence and then put a colon at the end and have it complete the sentence, so you'd discover things. Like, one of the things that really clicked for me early on was the jq programming language for manipulating JSON. I discovered that GPT-3 could write that. So I could say, hey, here is a JSON document. The jq program for extracting an array of names from this list of objects is, colon, and it would spit out working code. And that was a bit of a revelation because I could never remember the syntax for jq.
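As a rough illustration of the task described here: for a JSON array of objects, a jq program along the lines of [.[].name] pulls out the names. The sketch below shows a Python equivalent and builds the kind of colon-terminated completion prompt described above. The sample document and the function names are invented for illustration and are not from the conversation.

```python
import json

# Hypothetical sample document; the names and fields are made up.
document = '[{"name": "Ada", "age": 36}, {"name": "Grace", "age": 45}]'


def extract_names(json_text):
    """Return the "name" field of every object in a JSON array.

    Roughly what the jq program [.[].name] does.
    """
    return [item["name"] for item in json.loads(json_text)]


def completion_prompt(json_text):
    """Build a completion-style prompt that ends in a colon.

    A completion model is expected to continue the text after the colon,
    ideally with a jq program.
    """
    return (
        "Here is a JSON document:\n"
        f"{json_text}\n"
        "The jq program for extracting an array of names "
        "from this list of objects is:"
    )


print(extract_names(document))
print(completion_prompt(document))
```

The prompt ends mid-sentence on purpose: with a completion model there is no chat turn, so the text is shaped so that the most natural continuation is the code itself.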
And so I was poking around with it and it was increasingly clear that there were all sorts of things it could do that you wouldn't have expected something like this to be able to do. But it never felt really like an AI. It wasn't like you were conversing with something. It was just this sort of very weird tool that could complete things if you prompted it in the right way. And then ChatGPT came along, and that was November 2022, right? And all they did is they slapped a chat interface on top of their existing model, effectively. Like, they tweaked it and trained it a little bit more. But, you know, ChatGPT was an experimental prototype and a bunch of people inside of OpenAI thought it was a bad idea. They were like, hey, this is a waste of time. GPT-4 is coming, we should just hold off until then. They didn't expect it to take off at all. And I think it's the fastest growing consumer application in the history of the world.
Ronak
It is.
Simon Willison
Which for a very obscure, weird thing, is sort of astonishing, you know. But it was fun because this rocket just took off. The entire world swiveled and started paying attention to this field. And then because you've got millions of people experimenting and trying things out, that's when we really started figuring out what it could do and what things it was good at and all of that. And so yeah, I've been documenting and exploring that for the past couple of years. I also had an advantage in that I've got a blog and most people just don't bother blogging anymore. They might tweet or post on LinkedIn, but very few people are writing sort of long form content about what they're learning. But because I was doing that about AI, I very quickly sort of established myself as a person that you go to and talk about this stuff with. Which is great because then you get all of these people who are figuring things out talking to you directly, and you can learn much faster.
Gwan
Is having a blog sort of like an accountability mechanism for yourself, to go out and find these sorts of things that are maybe not working super well yet, like GPT-2 back in the day, as a new source of inspiration to write more posts?
Simon Willison
Enormously, yes. And actually, the thing I've been doing this year is I very quietly started a streak, inspired by Duolingo. And actually Tom Scott on YouTube did this 10 year streak of making a video once a week, which I found incredibly inspiring because, wow, what a thing to manage to do. And so since January 1st, I've been trying to post something on my blog every single day, and I've done that. And it means that I do have that little extra incentive to make sure I find something interesting. So that's been helping my blog. It's been an accountability mechanism for my wider work for a few years now, because I'm now sort of independent. I don't have an employer, and so I started doing this thing I call weeknotes, where once every two or three weeks I post a blog entry just saying, here are the things that I've worked on in the past couple of weeks. And that means that when I'm thinking about what to do, occasionally I'll think, you know what, I haven't done anything I can write about yet, so I should really invest in one of my open source projects or do something so I've got something to show for it. And yeah, I love that. I think writing is thinking, and it's such a great way of forcing you to structure your thinking. You know, the best way to learn something is to try and explain it to somebody else. So if you've got a blog, and even with my shortest little link blog things, where it's a link and two sentences of text, I always try and put something in there that's valuable. Partly it's to prove that I read the thing that I'm linking to, but also it's like, if you read the summary in my blog and read the article, do you get something slightly extra from my perspective on it? And it might just be that I'll link it back to something else. Say, hey, the Claude prompt caching they came out with a few days ago.
And when I wrote about it, I linked back to Google Gemini, which has a similar feature. And I could compare how Google Gemini pricing works and how Claude pricing works. And that's a little bit of extra perspective that you won't get from Anthropic. They're not going to write about Google Gemini and their announcement of a feature. So it's that kind of thing. It's forcing you to engage with the material just a tiny bit more thoughtfully so that you can try and say something interesting about it as well as linking to it.
Ronak
So when it comes to blogging, I think you had this tweet at one point which was something like, blogging is like planting a beautiful cactus. The best time to do it is 18 years ago, but the second best time to do it is today. I think, especially when it comes to LLMs today, generating content has become way easier. Not necessarily good content, but there is just way more out there. How do you think about adding enough quality to the content that someone would actually read the post? The other part is that having an accountability mechanism to do interesting things is one perspective on writing blog posts, but I'm also curious to hear what some of the other things are that keep you going. Because after a point, it takes a lot of work to write blogs.
Simon Willison
Well, that's the secret of blogging: it takes a lot of work at first. But I've been blogging for 22 years and you just get faster. You know, if you write every day, you get faster at writing. Most days I will spend 10 to 15 minutes on my blog and that's it. You know, it's like two links, maybe a quote. It's a very, very quick process to turn things around. I actually have a second blog, my TIL blog, Today I Learned. It should really be part of my main blog. It's partly to play with different technology that I'm running it as a separate site. But the idea of that is anytime you learn anything new, it's worth putting it out there and saying, hey, these are the things I learned. That's all it is. It's my personal notes, but very slightly cleaned up so that I can publish them. And actually as a result of this, habitually when I'm writing personal notes, I sort of write them well enough that I could copy and paste them into a public document, which is a good habit to be in anyway. But part of the reason I do the TILs is that it's the most low-pressure form of writing that there is, because with a regular blog you feel like when you write something you have to say something new. You've got to add something new to the world. With a TIL blog, no, you don't. The barrier for writing a TIL is, did you learn it today, or recently? And if it's how to do a for loop in Bash, that still counts, that's fine. Honestly, I'm publishing it mainly for me. It's sort of my public notes that I can go back and find. If somebody googles for how do I write a for loop in Bash and they land on that document, that's great for them. Also, I feel like I've got like 25 years of software engineering experience, and I feel like it's important to outwardly demonstrate that when you've got 25 years of experience, it's still worth celebrating learning for loops in Bash. You shouldn't get into that.
There's that pattern people get into where they don't want to admit that they only just learned how to do something. They're sort of ashamed that they didn't know how to do for loops in Bash. You know, I like using my reputation to broadcast out that, no, be proud of that, right? You figured out for loops in Bash, fantastic. There's a million other things still to learn about everything involving computers, right? It's no biggie that you didn't know that already.
Ronak
So one thing that I always struggle with is that I want to do a whole lot more of this. I have a blog which has four entries right now, and maybe 10 in my notes which I've never gotten to polish and publish. And it always goes back to, well, okay, today I have maybe, let's say, an hour. I can either spend it on writing this up and cleaning it up, or, you know, I could just spend the time doing some work. So I struggle with that balance and I'm curious how you think about it.
Simon Willison
I've got a great trick for that. So the way that I work, all of the software work that I do and a lot of my other stuff as well is in GitHub issues, right? It's free, and you can have private issues, public issues and so forth. So every single one of my projects has a very active GitHub issues setup. And I've got dozens of private repositories. I've got one just called todos that I use for personal stuff. And basically the idea is that anytime I'm doing any project at all, I open an issue and I stick in a sentence at the top saying, do this thing. And then most of the work that we do as software engineers, it turns out, is research. You have to gather so much information to solve a problem. You have to be like, okay, where am I going to do it? This file here needs modifying. The tests for it live over here. I need to use this library. Here's some example code I found on Stack Overflow that solves this problem. I asked Claude a few questions and got these answers. And so I will very quickly pepper in like two or three or four reply comments to my issue with the research that I've done, and then I'll do the implementation. And firstly, programmers often talk about how damaging it is to be interrupted, right? There's this idea that you carefully build up the context of everything that you need for your problem, and then somebody taps you on the shoulder and asks you a question, and it all comes tumbling down. It takes you half an hour to get back into it. The fix for that is to have very detailed notes. If I have written down everything as I was going along, I can be distracted, come back, read the last three issue comments and have everything back in place again. And that's amazing for productivity, but it also means that I'm maintaining over 250 active open source projects at the moment. And a lot of them are very small.
They're like little command line tools or plugins for my projects or whatever. But they're all maintained, in as much as if somebody reports a bug and I see their issue report in amongst all of my notifications, I will fix that bug and I'll ship a new release. And the only way to maintain 250 projects is to treat every single one of them like you're going to forget every detail of it. Every project has to be as if it was somebody else's project that you occasionally drop into and maintain. The way to do that is with issues. For every project I have, every single design decision I ever made is in an issue comment somewhere in that repository, so I can search through them. I can use git blame and say, okay, why did I add this code? It was in this commit. This commit is linked to this issue. This issue tells me what I was thinking at the time, what options I explored, all of that kind of thing. And so this is an enormous productivity boost. It feels like writing all of these notes should slow you down. It's the opposite. It speeds you up. It means when you want to publish something, you've already written the rough outline of anything that you want to publish. Most of my TILs are copied and pasted from my GitHub issue notes, and then I'll clean up the wording a little bit and maybe add some formatting and that's it, it's done. So that's been enormous. I gave a talk about this at DjangoCon a few years ago, about increasing your productivity on personal projects through documentation and unit tests, the two things that people would expect would slow you down. It turns out that if you put the right habits in place, having comprehensive documentation means you can work so much faster. I can drop back into a project I haven't touched in a year, read the documentation as if I didn't know what the project was, and then start working on it. That's fantastic. And the same thing with unit tests.
If you've got tests, you can iterate so much faster because you get over that fear of accidentally breaking something when you make a change. Normally you'd have to manually test every single feature of the software to make sure it didn't break. If your tests are doing that work for you, you can drop in, make a five line change, add a new test, run the test suite, and then publish it to PyPI or ship a release, and it just works, you know.
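The git blame workflow described above (a line of code leads to a commit, and the commit links to an issue) depends on every commit message mentioning an issue number. A minimal sketch of that lookup step, assuming commit messages reference issues in the usual #123 form; the repository name example/project, the sample commit message and the helper names are hypothetical.

```python
import re

# Matches GitHub-style issue references such as "#42" in a commit message.
ISSUE_REFERENCE = re.compile(r"#(\d+)")


def issue_numbers(commit_message):
    """Return the distinct issue numbers mentioned in a commit message, sorted."""
    return sorted({int(number) for number in ISSUE_REFERENCE.findall(commit_message)})


def issue_url(repo, number):
    """Build the GitHub URL for an issue in an owner/name repository."""
    return f"https://github.com/{repo}/issues/{number}"


# Hypothetical commit message, as it might appear after running git blame
# on a line and then viewing the commit it points to.
message = "Fix pagination bug, refs #42\n\nAlso closes #7"

for number in issue_numbers(message):
    print(issue_url("example/project", number))
```

Because the issue thread can keep receiving comments long after the commit lands, following the link can surface context that a commit message alone could never contain.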
Gwan
That's super interesting. I'm curious, with all these projects, about being able to kind of drop in. So git blame, super useful. Being able to search through the issues, that sounds super cool. But do you also write design docs?
Simon Willison
Yes, but the issues are design docs. Yeah, absolutely. So the issues are the design documentation, effectively. Because the problem with design documentation, and, well, all documentation, is that it has to be kept up to date. If it falls out of sync with the code, then that's the big problem: people lose trust in it. I've worked at companies where we've had internal documentation and nobody uses it because they know that it's not being actively maintained. And so the way I see it, there are sort of two key forms of documentation. There's the documentation that has to be up to date, which tells you how the thing works or how to use the thing. So if you're writing software libraries, it's the documentation that tells you which functions to call. If you've got a web API, it's the one that tells you what the API endpoints are. For command line tools, it's these are the options and what they do. That I keep in my repository with the code. So there's always a docs folder. It's got a bunch of Markdown files in it. Anytime I update the code, I update the associated documentation. And if I'm collaborating with people, that's part of the pull request process, the code review process. If you submit a pull request and it doesn't update the documentation, I'll either put in a note saying you need to update the documentation, or sometimes I will update the docs as part of that pull request. The idea being that the moment you land it on the main branch, it's got the tests, the implementation and the documentation all in a single commit. Because then when you use git blame, the commit shows you the documentation change as well. But the other form of documentation is, I've been calling it temporal documentation. It's documentation that was true at a certain point, but isn't guaranteed to still be true today. And that's where issues shine, right? If I read an issue and it says 2017, January 5th, a bunch of stuff, I know that that's not promising to be up to date.
So it's still useful, because I can say, okay, well, in January of 2017 this was true, but it's not sort of ruining my trust in my docs. Because I look at it and I'm like, hey, is this still true anymore? I don't know. And yeah, so very occasionally I will write design documentation that says, if you are a maintainer of this code, you should look here and here and here, and this is how it works. But I often don't do that. I sort of leave it to the issues. The idea being that if you can spelunk through the code with git blame and the issues, you can get that same information. You might have to put a bit more work in. I don't think any of my projects have significant design docs, like current architectural documentation, right now, and I might start adding it. Also, for a lot of them, if it's a software library, the design documentation and the API documentation are kind of the same thing. The design is sort of presented through how the API is built.
Ronak
I was actually thinking this is a practice that would be useful for even teams at companies where what ends up happening, at least in my experience, is something new will come up that you need to implement. Someone will go do some research, try out a few things. So you see the code changes in the PRs, maybe the approach changes in the Google Docs, and mostly at least at my workplace, but they don't always end up linked together and one usually gets out of sync. But I think this idea of using issues to do that, where you can do that in the repository itself and the issue may link to the Google Doc that you have, which is easier for collaborating and commenting on, I think that would go a long way. So it's something I'm going to try actually.
Simon Willison
And you know what? I've got issue threads that are over 100 comments long and they're all me. It's just me talking to myself. I just realized that issues are a blog, right? An issue thread is basically a one-off blog for the story of this change, the story of this feature. One of the reasons I love issues so much: I used to write really long commit messages. Like, I'd do six paragraphs in a commit message explaining what I was doing. I've stopped doing that now. What I do instead is, firstly, if there's stuff that should be in documentation, I put it in the documentation and then include that in the commit. So it doesn't go in the commit message, it goes in the actual code. And secondly, every commit always links to an issue thread. Because the great thing about an issue thread is I can add comments to it a year after the commit. You're running git blame, you see a commit, you click through to the issue thread and there might be a comment saying, 12 months later it turned out this was a terrible idea, for these reasons. Also, issues accept screenshots. So I can put in screenshots of the feature. If I'm doing CSS stuff, I always include screenshots of before and after. You can do animated GIFs or videos in issues. So I'll sometimes do a little GIF demo of the thing. Issues can link to each other. You can embed code in them. They're a really rich canvas for all sorts of aspects of documentation. You can't put an image in a commit message, but you can put a screenshot in an issue. So yeah, I'm definitely a GitHub issues power user.
Ronak
This is super helpful. Thanks for sharing the tricks. We'll actually link your talk in our show notes as well so that people can find it easily. By the way, one thing about the blog. I was looking at your blog and there's a bunch of educational posts where people can learn about how to do various things, and we want to get into some of those. But I also saw that you have some of these posts linked on Substack, so I was curious, how do you use one versus the other?
Simon Willison
That is a cunning trick that I came up with. I have a Substack newsletter which I put out once every two or three weeks, and all it is is the content on my blog since the last newsletter, with maybe a tiny bit of text added at the top. But basically I'm using Substack as a free mechanism to let people subscribe to my blog via email, because I didn't want to pay to send emails and build all of that kind of stuff. Substack is great for that. I've got over 6,000 subscribers now on Substack and it takes me about two minutes per newsletter to send it out. It turns out Substack doesn't have an API, but you can copy and paste stuff into your Substack edit panel. And so I built myself a little tool, which is actually an Observable notebook. What it does is it pulls all of the content from my blog, reformats it into HTML rich text, and then gives me a big copy button that I can click, which puts all of that on my clipboard. And so I go to this notebook, I click copy, I switch to Substack, I hit paste, I set the title of the newsletter and I pick a preview image and that's it. I'm done. Literally two minutes to send that newsletter out, because it's using copy and paste as an API, which it turns out is a really powerful trick. There's loads of stuff you can do with software that thinks it doesn't have an API and you're like, yeah, but I can paste stuff into you. And that's been great. My only regret with the newsletter is I should have started doing it years ago, because I've only been doing it for about a year and a half, maybe. And it's brilliant. You know, it's a really great way of getting things out there to people who live in their email clients.
Ronak
So there is an argument about using systems like Substack or Medium versus having your own personal blog. And the argument that I've heard for keeping your own personal blog is that these platforms may or may not exist in the future, which has happened for many of these platforms. Is that the reason why you still have the personal blog, and Substack is mostly just an email distribution service of sorts?
Simon Willison
That's one of the reasons. Yeah. I mean, one of the reasons I chose Substack is you can export your subscriber list. So if Substack ever say, hey, we're shutting down next week, I can pull out a CSV file with all of the email addresses in and I can move to something else. That's really important to me because yeah, vendors absolutely come and go. I've owned my domain name for, again, 20-odd years. And you know, it builds up SEO credibility and stuff over time, but also there's something sort of wholesome about having a little corner of the Internet that's just for you. That's something I genuinely, really enjoy. It feels a little bit subversive as well. In this day and age, with all of these giant walled platforms and things, you're like, yeah, no, I've got a domain name and I'm running a web server, and it's just fun. As a software engineer, it used to be that like 10, 15 years ago everyone's intro to web development was building your own blog system. I don't think people do that anymore. And that's really sad because it's such a good project. You get to learn databases and HTML and URL design and SEO and all of these different skills. And yeah, I mean, my blog itself is a Django application, because I helped create Django 20-odd years ago. So I want to have something in my life that's a Django app that I'm building on. And it's all open source, the code is on GitHub. Over the past six months I've started updating it a lot more, just making little tiny tweaks to it. I changed the default typeface that I'm using for headings a couple of weeks ago, and I started doing more things with images, and it's just really nice. It's nice being able to dive in and try out something completely new in that space. I run it on Heroku behind Cloudflare.
The great thing about Cloudflare is if I get a giant spike of traffic, like if I'm linked off the Hacker News homepage, my tiny little cheap Heroku instance doesn't even notice because Cloudflare absorbs all of the traffic. That's great.
Ronak
I bet that helps. I think Elon Musk maybe linked one of your posts in a tweet at one point, so I'm assuming this would have helped.
Simon Willison
Yeah, I got 1.4 million hits on a page from that one. And yeah, without Cloudflare, I would have instantly melted.
Ronak
Well, by the way, I think that also resulted in your first ever TV appearance, right?
Simon Willison
That's right. It was last year. It was when Microsoft Bing added their chat thing, and it was February last year. It turns out it was GPT-4, which hadn't been released yet. So Microsoft Bing was our first glimpse of GPT-4, and they hadn't quite figured out the personality. It was some early prompt engineering and it went completely wrong, and it started threatening people, and it tried to break up the marriage of Kevin Roose from the New York Times. Just joyfully bizarre. And yeah, so there were all these posts on Reddit from people going, yeah, it just told me that it wanted to have me arrested, and all of this kind of stuff. So I put up a blog post where I just collected together a bunch of examples of this. And yeah, Elon Musk tweeted a link to it and it was on the Hacker News homepage. And yeah, I got interviewed on live TV news out of Chicago, trying to reassure people that this thing wasn't going to steal the nuclear codes, even though it said it was.
Ronak
This is no Terminator.
Simon Willison
Yeah, that was deeply entertaining.
Ronak
I can imagine. By the way, coming back to LLMs, and before we actually go there, you mentioned this tool that you use to move your blog posts from your site to Substack. Is this tool something open source that people can use, or is this something.
Simon Willison
Yes, well, it's an Observable notebook. I think it's probably linked to from my about page. We can stick it in the show notes. It's a little Observable notebook. Right now it only works against my blog, so it's useful for me to create my own newsletter. But Observable is a platform where you can basically write a sort of interactive document using JavaScript, and the code is visible in it, so you can dig through and see exactly how it's working. It's quite complicated because it actually pulls the content from a Datasette instance. Datasette is my major open source project, which lets you build a JSON API on top of a SQLite database. But my blog is running a Postgres database in Heroku. So there's actually a whole chain of things that make this work. I've got code I wrote that does a backup of my blog from Heroku into JSON, and then it loads that into a SQLite database, and then it publishes the SQLite database with Datasette, which gives it a JSON API. And then my notebook can use fetch calls in JavaScript to run SQL queries against the JSON API to pull in the content and assemble it into Markdown, which it renders as HTML, which I copy out. It's beautiful. It's this giant convoluted chain of things that somehow works really well.
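The middle of that chain (a JSON backup loaded into SQLite, then queried with SQL and assembled into Markdown) can be sketched with Python's standard library alone. This is only an illustration under stated assumptions: the entries table, its columns and the sample rows are hypothetical, and the Heroku backup, Datasette's JSON API and the Observable notebook are not reproduced here.

```python
import json
import sqlite3

# Hypothetical JSON backup of blog entries; the real backup format is not
# described in the conversation.
backup_json = json.dumps([
    {"id": 1, "title": "First post", "created": "2024-08-01"},
    {"id": 2, "title": "Second post", "created": "2024-08-15"},
])


def load_entries(conn, json_text):
    """Load a JSON array of entry objects into a SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS entries "
        "(id INTEGER PRIMARY KEY, title TEXT, created TEXT)"
    )
    conn.executemany(
        "INSERT INTO entries (id, title, created) VALUES (:id, :title, :created)",
        json.loads(json_text),
    )
    conn.commit()


def entries_since(conn, date):
    """Run a SQL query and format each matching row as a Markdown list item."""
    rows = conn.execute(
        "SELECT title, created FROM entries WHERE created > ? ORDER BY created",
        (date,),
    )
    return [f"- {title} ({created})" for title, created in rows]


conn = sqlite3.connect(":memory:")
load_entries(conn, backup_json)

# Everything published since the (hypothetical) previous newsletter date.
print("\n".join(entries_since(conn, "2024-08-10")))
```

In the setup described above, the query step would run over HTTP against a published database rather than against a local connection, but the shape of the work (SQL in, rows out, Markdown assembled from the rows) is the same.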
Ronak
It basically sounds like Unix, but in a notebook where you pipe the input.
Simon Willison
Oh, totally. The Unix philosophy is big in a lot of the work that I do. I love tools that you can pipe together and build up. Like I said, I've got 250 projects. That's because they're all little tiny things that you can then plug together to pipe things from one to another.
Ronak
So talking about the Unix philosophy, I want to talk about the LLM CLI tool that you developed. I actually recently came across it while researching for the episode and I found it to be super cool, because a lot of times I don't want to leave the terminal that I'm in. I just want to ask the question there. So having something like this, where you can choose the model you use or sometimes even run it locally when you don't have Wi-Fi, for example, that sounds super cool. Can you tell us a little more about this tool?
Simon Willison
Yeah, so I started this project last year. The idea was that originally OpenAI were effectively the only interesting game in town for quite a while. You know, with GPT-4, they had such a lead. Today, that's not true at all. There are a bunch of amazing competing models that I'm often using instead of OpenAI. But yeah, so my initial idea was, the OpenAI API is kind of cool. I like hanging out in the terminal. It would be great if I had a way of not just running prompts from the terminal, but also piping data to and from the model. Because the Unix piping idea is always like, you get some content, you pipe it into another thing which transforms it, you pipe it back out again. That's all language models are. They're a function: you give it input, it gives you output. And so the original idea was to build a little API client for OpenAI where you could basically say llm "how do you do a for loop in bash" and then you hit enter and it spits out the answer on your terminal. But you can also pipe things to it, so you can say cat hello_world.py | llm and then give it an extra prompt saying explain what this code does, or rewrite this in C, or whatever. And I noticed that on PyPI, the Python Package Index, nobody had reserved the llm name yet. And I'm like, oh, I've got to have this. So I grabbed this beautiful three-letter name, so pip install llm is how you get it. And that was fun. And then a few months later I wanted to start playing with all of these new models, the Llama models that run on your laptop and so forth. And I'd already built plugin systems for other software that I've developed, so I thought, okay, what if I had plugins? So you could install a new plugin for this command line tool and now it can talk to Anthropic's Claude, or it can talk to Google Gemini, or it can run Llama on your computer directly.
And so I built that, and now there are over 200 different models that this one command line tool can run if you install the right plugins for it. Other people have written plugins. The great thing about plugins is it's a way of building an open source project where you don't have to review people's code to add features to your thing. Like, I can wake up one morning and my software can do a new thing because somebody else released a plugin for it. It's amazing, right? It's the best form of open source contribution as well. Because if you write a plugin for my project, you're not asking any time of me to review your code and interact with you, you're just putting it out there into the world. So yeah, you can llm install all of these different plugins for all of these different models. And the other big feature of LLM is that everything it does is logged to a SQLite database. So anytime you prompt it, the prompt is logged and the response is logged and it records which model was used. So you can actually use this for model research, where you can run the same prompt against five different models and now you've logged all of the responses and you can go and compare them later on. I've got like 3,000 prompt-response pairs that I've recorded just in my own local database from tinkering around with this thing. I'll be honest, I don't go back and use it to compare the models as much as I want to, but the data is there. You know, I've hoarded the data. At some point I can. And it's also just useful to be able to say, okay, show me the logs of my conversations, search those logs, export the log of this conversation and publish it somewhere. So, yeah, that's been super, super fun. It's very distracting, because it means that whenever a big new model comes out, I lose half a day to spinning up a new LLM plugin so that I can try it out. But it's super useful. Like, I use it several times a week. 
Most of my personal usage is still through the web interfaces. I use Claude.ai and I use ChatGPT daily, like every single day for the last year and a half, basically. But yeah, having it on the command line as well gives you all of these other options. It's also just really fun for hacking things together. Like you could write a bash script that implements retrieval augmented generation against something, by scraping a web page with a curl command and then piping that to LLM and running it against Llama and all of that kind of stuff. It's really, really fun.
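The logging idea described in this answer, where each prompt and response is recorded along with the model that produced it so the same prompt can be compared across models later, can be sketched with Python's standard `sqlite3` module. This is only an illustration of the pattern, not how the LLM tool itself is implemented: the `responses` table layout, the function names, and the two stand-in "models" are all hypothetical.

```python
import sqlite3


def init_db(conn):
    # Hypothetical table layout; the real tool's schema may differ.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS responses ("
        "id INTEGER PRIMARY KEY, model TEXT, prompt TEXT, response TEXT)"
    )


def run_and_log(conn, model_name, model_fn, prompt):
    # Call the model, then record prompt, response and model name.
    response = model_fn(prompt)
    conn.execute(
        "INSERT INTO responses (model, prompt, response) VALUES (?, ?, ?)",
        (model_name, prompt, response),
    )
    conn.commit()
    return response


def compare(conn, prompt):
    # Fetch every logged response to the same prompt, oldest first.
    rows = conn.execute(
        "SELECT model, response FROM responses WHERE prompt = ? ORDER BY id",
        (prompt,),
    )
    return rows.fetchall()


# Stand-in "models" so the sketch runs without any API access.
def shouting_model(prompt):
    return prompt.upper()


def reversing_model(prompt):
    return prompt[::-1]


conn = sqlite3.connect(":memory:")
init_db(conn)
prompt = "how do you do a for loop in bash"
for name, fn in [("shouting", shouting_model), ("reversing", reversing_model)]:
    run_and_log(conn, name, fn, prompt)

comparison = compare(conn, prompt)
for model, response in comparison:
    print(f"{model}: {response}")
```

Keeping the model name in its own column is what makes the later comparison a single query.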
Gwan
With some of the guests that we've talked to, there's this idea of, oh, you need to build up this habit of going to the LLM first before you try anything else. I was curious, what has it been like in this one and a half years for you, using it daily? When you started, did you have to kind of force yourself to be like, hey, even though I know the answer, I'm still going to go to the LLM just to see? How did that evolve over time?
Simon Willison
That's interesting. I've not been turning to LLMs for things I already know the answer to. But you know, as a software engineer, every single day I run into things that I don't know. And it might be a for loop in Bash. It might be something a lot more interesting and complicated than that. So I don't know. I mean, the problem with LLMs is they're actually really difficult to use, which is very unintuitive. Everyone assumes that they're easy because it's a chatbot. You type things to it, it says things back to you. But to use them effectively, you need to build this really, really deep model of what they can and can't do. Like, I would never ask an LLM to count all of the instances of something in a paragraph because I know they can't count, which is totally non-obvious. Right? It's a super sophisticated computer system. How can it not count? Computers are great at counting. That's like what they've been doing since we invented them. And I know that if a friend of mine could read a Wikipedia page and then answer my question, then the LLM will be able to answer that question. But if it's the kind of thing which the Wikipedia page probably isn't going to cover, it's less likely that the LLM will be able to answer it. But it's difficult, because you really do just have to put the time in. A friend of mine says 10 hours is the minimum you have to spend with a GPT-4 class model before it really starts to click what these things even are and how to use them. And I think it takes that to develop the level of expertise where I can look at a prompt and 90% of the time I will predict correctly if it's going to work or not. Like I'd look at somebody's prompt and say, yeah, you're asking it to count things. That's not going to work. Or you're asking it about like quantum physics. 
But it's the kind of basic question that an undergrad student in quantum physics would answer straight away. It'll definitely get that one right. Right? But that intuition takes a long, long time to build up and it's not transferable. Like I love teaching people to use this stuff, but I can't just dump my intuition into their head. I can't be like, boom, here you go. Now you'll be able to use these things effectively. Like, one of the lessons I think people need to learn as quickly as possible is you've got to run prompts where it gets the answer wrong in a really confident way. The earlier you do that the better, because otherwise you can go into this idea that it is this sort of science fiction AI that knows everything. And so when I'm evaluating a new model, I always start with an ego prompt. I ask it about myself. I say, provide a career outline for Simon Willison. And because I've been blogging for like 20-odd years, it knows a lot about me, right? There's a lot of stuff that ends up in the training data, but it still makes mistakes. Like they often say that I'm the CTO of GitHub. I have never worked for GitHub, right? Or I had one tell me the other day that I'd been to a university that I hadn't been to. Those kinds of mistakes. So generally, if you know somebody who's sort of Internet famous, right, they're not like a celebrity, but they've been around on the Internet for long enough that there's stuff about them in the training data, asking questions about them very quickly exposes that these things are not knowledgeable, that they're spitting out statistically likely text from their training data. And that's so important. It's crazy to me what it takes to get the best results out of these things. You need to have expertise in what they can do, so experience using them. You need to have a bit of expertise in how they work. You don't need to understand the matrix multiplication and the key value pair and all of that kind of stuff. 
But you do have to understand that they come from training data. They're doing next token prediction. You need to have that sort of basic level, and you have to be a subject expert in what you're doing with them, right? Like as an experienced software engineer, I can do amazing software engineering with an LLM because I've got that expertise in what kind of questions to ask. I can spot when it makes mistakes very quickly. I know how to test the things it's giving me. Occasionally I'll ask it legal questions. Like I'll paste in the terms of service and say, hey, is there anything in here that looks a bit dodgy? I know for a fact that that's a terrible idea because I have no legal knowledge, right? So I'm sort of play acting with it and nodding along. But I would never make a life-altering decision based on legal advice that I got from an LLM, because I'm not a lawyer. If I was a lawyer, I'd use them all the time, because I'd be able to fall back on my actual expertise to make sure that I'm using them responsibly.
Ronak
I can attest to that one part where if you search for Internet famous people, these things will, or LLMs will very confidently tell you stuff, which is not true. I've experienced that in doing research for a lot of our episodes, including this one where I saw this like, oh, my bad. And what I've started doing now is I would actually tell it, give me the source information. And at least, I mean, I use ChatGPT more than anything else. And I would actually say, give me the source for this information. And surprisingly, when it comes to fetching information from certain podcast transcripts, it's decent at doing that, but it's horrible at attribution because either transcripts are faulty or it just doesn't know who said what. And the other thing is it'll actually start sourcing things which look like links, but they're not clickable if you search for that exact string.
Simon Willison
My favorite bug. Yeah, I wrote something about this last year, because before ChatGPT had browsing mode, it would do that all the time. It was amazing. It would just hallucinate these URLs. And one thing that you could do that's really fun is you could give it a URL and say, summarize this article. And even though it couldn't access the web back then, it would still produce a summary. And so you could do things like make up a URL like Wired.com/Taylor Swift gets into cryptocurrency, which is a made-up URL. It's a 404 page. And then you paste that in and it would confidently write a story as if it was a Wired story about that happening. It would just utterly make it up. Claude now, because Anthropic's Claude can't access the web, they do at least have a little inline hint that shows up and says, by the way, I can't access the web. But yeah, that was a great one because people got so confused by it. There were people who were absolutely convinced that ChatGPT could summarize web pages because they'd seen it do it dozens of times. And you're thinking, wow, you've probably spent the last two months consuming summaries of web pages that were entirely made up, and you do not want to admit to yourself that you've got two months of crap. It's fascinating, right? There are so many traps in all of this stuff.
Ronak
And interestingly, I think perhaps you mentioned it in one of your talks, that LLM interface is kind of interesting because it's just a simplified interface where you just get dropped into this chatbot or chat box and you kind of have to discover the capabilities as well as limitations of the system. Like you can't find things out that it's capable of doing.
Simon Willison
It's like taking a brand new computer user and dumping them in a Linux machine with the Linux prompt and saying, there you go, figure it out, right? It's a joke. It's an absolute joke that we've got this incredibly sophisticated software and we've given it a command line interface and launched it to 100 million people. What were we thinking? Yeah. One of the things I'm most excited about is alternative interfaces to these systems, and we're beginning to see some really interesting stuff starting to crop up there. But I mean, the chat interface, it is really powerful and useful, but it's such a bad way to onboard people. And they've nodded at that. Like, at least now these systems will give you a few ideas. They're like, why not try to get it to cheat on your homework or whatever. But come on, you know, we could do so much better.
Ronak
So I gave, I introduced ChatGPT to my mom. She doesn't speak English, but recently she wanted to send a message to someone and she was asking me to help her format things a little bit. Format meaning like she had a rough draft and she's like, help me improve it. And I was like, you know what would be amazing at it? ChatGPT. So I just gave her the phone and I just speak into it, forget about typing. And she's like, but what do I say? So just looking at that microphone prompt, for example, like she had no idea what she could do, but once she got started, oh boy. She won't charge a video on her phone now for all sorts of things she wants to do.
Simon Willison
Honestly, for people who don't speak English, or who have English as a second language, this stuff is incredible, right? Absolutely amazing. And I feel like that's something people often miss. There are lots of people who are very cynical about this technology, and there are a lot of reasons to be concerned about it. But we live in a society where if you have really good spoken and written English, it's such an advantage. Like say the street light outside your house is broken and you need to write a letter to the council to get it fixed. That used to be a significant barrier. It's not anymore. If you get ChatGPT to write a formal letter to the council complaining about a broken streetlight: flawless, absolutely flawless. And you can prompt it in any language. And I'm so excited about that. It's also interesting because it sort of breaks aspects of society as well, because we've been using written English skills as a filter for so many different things. Like if you want to get into university, you have to write a formal letter and all of that kind of stuff, which used to keep people out. Now it doesn't anymore, which I think is thrilling. But at the same time, if you've got institutions that are designed around the idea that you can evaluate everyone and filter them based on written essays, and now you can't, we've got to redesign those institutions. That's going to take a while. What does that even look like? It's so disruptive to society in all of these different ways.
Gwan
I think this is like a nice plug that I saw on another podcast. You mentioned that, you know, the, the thing that I want to spend my life doing is helping people make the most use of these computers. And we want people to be able to automate their lives like going along, right?
Simon Willison
This is what computers are for, right? Computers are supposed to automate tedious things in our lives, right? And if you are a programmer, you can do that, right? If you've got a software engineering degree, there are so many problems in life that you can automate away. The vast majority of people can't do that, right? They didn't spend two years getting a software engineering degree, which means that they frequently end up having to spend all day copying and pasting things. Last year I was at an event where I heard from a fire chief, right? The guy who runs the fire station, who had just spent the last day and a half copying and pasting names and phone numbers from one CRM system into another CRM system because it needed to be done. And I'm like, how are we taking people with jobs of that much importance and leaving them so that they have to do this kind of manual copying and pasting? Because computers are really, really frustrating to use and there's no easy way to do that. If that guy had a computer science degree, he could have automated the export from the CRM system to the other CRM system and saved a day and a half of work. And there's this idea of end user programming. For years we've been wanting to solve it so that users can actually program computers without spending six months learning how to do it. Like Apple, with HyperCard and AppleScript. And Microsoft Excel is probably the best version of this, right? So many people are programmers every day using Excel and they don't think of themselves as programmers. But honestly, if you can use Excel, if you can spin up formulas and stuff, that's programming, that's software. You are building software and automating things. I feel like language models could be the key to unlocking this. Like we're just beginning to see little hints of it. 
ChatGPT Code Interpreter and Claude Artifacts are two of the most exciting things in the AI space. And I continually hear from people who, firstly, really are using these tools on a daily basis, who've never programmed before, but now they can do stuff. The other thing that's exciting is I talk to people who tried to learn to program in the past and they couldn't. They didn't get over that initial six months of misery where you forget a semicolon and you get an obscure error message and you get stuck for two hours, and a lot of people give up. They assume they're not smart enough to learn to program. And that's not the case. It's that nobody warned them how tedious and frustrating it was. They weren't patient enough to get over that miserable initial learning curve. Those people, a lot of them are learning to program now, because if you get that semicolon error and paste it into ChatGPT, it tells you the fix. So it's like having a teaching assistant on hand 24 hours a day who you can call over and they go, yeah, you put the semicolon there. Amazing, right? Absolutely amazing. I was talking to somebody just the other day who's a very experienced professional in their own field, and they've spent the last two months programming and really enjoying it, having tried and failed to learn a dozen times, because they've got this new assistant that can help them. That's amazing, right? As a professional programmer, there's a little tiny aspect where you're like, okay, does this mean that our jobs are all going to dry up? I don't think the jobs dry up. I think more companies start commissioning custom software because the cost of developing custom software goes down, which I think increases the demand for engineers who know what they're doing. But I'm not an economist. Maybe this is the death knell for six-figure programmer salaries and we're going to end up working for peanuts. I don't know.
Ronak
I guess we'll find out. So there's a lot to unpack there. I want to take a couple of directions, but before we go forward, there's one thing you mentioned when you were talking about the LLM being the interface for talking to these models. So I wanted to read one of your tweets where you're asking a question on Twitter or X what are the LLM driven products that people use which don't have this chat interface? I'm sure you would have gotten fascinating answers, but I'm actually curious, what are some tools that you use which don't have this chat interface on top of but are built with LLMs?
Simon Willison
That's a really good question. The most obvious one: GitHub Copilot was the first mainstream non-chat-based product, and actually GitHub Copilot predates ChatGPT. That was a thing before ChatGPT came along. And that interface, the gray text which you get to approve, seems so simple and obvious now. They iterated on that a lot. The team that built GitHub Copilot were the first to figure out how you do LLM integration into IDEs. They put a heck of a lot of research and work into that. It's one of those things where a lot of the really obvious ideas weren't obvious at all until somebody did the work to get there. So GitHub Copilot is my favorite example. I'll be honest, on a day-to-day basis I'm not using anything that's not chat driven that I can think of. But I do use the alternative inputs a lot. I use the voice mode on ChatGPT and I've been playing with the Google Gemini one a lot. Like I can go on a walk with my dog with AirPods in and I can write code walking my dog, because I get ChatGPT to do it over the audio thing. It's amazing, right? So I use that. Images: I love image inputs. I feel like image inputs are actually still quite new. GPT-4 Vision was announced in November last year, and these days all of the models have amazing image inputs. But it's still quite a new capability. So I will drop in screenshots of like a rough mock-up of a thing and get it to do HTML and CSS. I'll drop in screenshots of error messages, all of that sort of stuff. The coolest demo still that I've seen of alternative UI is from the tldraw guys. The tldraw team did this thing called Make It Real, where you've got this browser-based vector editing software so you can draw boxes and lines and add text. 
And they added a feature where you can then select a mock-up and click Make It Real, and it sends a screenshot of that to GPT-4, gets back like Tailwind, HTML, CSS, JavaScript, and it pops in a working version of the thing. And you can literally draw a calculator, like a Fahrenheit to Celsius calculator. Just draw in the boxes, put C, F and a calculate button, and you don't even tell it what's supposed to happen. You say make it real and it guesses, oh, I bet clicking that button should calculate that from Fahrenheit into Celsius and update the two boxes. It's extraordinary, absolutely extraordinary. And that feels like there's so much more to be explored around that. This idea of, okay, so we've got an interface that lets you draw something and we can pipe that through an LLM and turn it into working code, that kind of stuff. In my own work I've been experimenting with this in Datasette, which is software for data analysis. You have a SQLite database full of data and it gives you a UI for exploring it, a JSON API for running queries against it, that kind of thing. So I've got one plugin for it which is: ask a question in English and have that turn into a SQL query. That one's using Claude Haiku at the moment. And then it'll run that SQL query. Now, a lot of people who build those systems give you the answer straight away. So you'll say, how many records were in California? And it'll say 230 records were in California. That I think is a bad idea, because in my experiments with it, it gets the right answer four out of five times, but one in five times it'll do like where state equals CA, but in the data it was where state equals California. So it gets zero results, right? And that's a disaster, right? You've just given somebody the wrong answer to their question. So instead I'm redirecting them to the SQL query page, so you at least see this. So if you're SQL literate, you can look at that and go, oh, it searched for CA, not California. 
I'll fix that. If you're not SQL literate, it's not great. I'm trying to figure out, okay, do I do a human explanation of the query? Should I show like a join diagram? What are the other things that I can do to try and make this more obvious? I like the idea of showing your working with these systems. But yeah, so that's one of my experiments: you ask a question, that gets turned into a SQL query, you get shown the SQL query, all of that kind of stuff. There are so many more things like that that we can be experimenting with.
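The failure mode described above (a generated query filtering on the abbreviation when the column holds full state names) and the mitigation of showing the query alongside its result can be sketched as follows. This is a hypothetical illustration using Python's standard `sqlite3` module, not Datasette's actual plugin code: the `records` table, the `run_showing_work` helper, and the hard-coded "generated" SQL standing in for model output are all assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, state TEXT)")
conn.executemany(
    "INSERT INTO records (state) VALUES (?)",
    [("California",), ("California",), ("Florida",)],
)


def run_showing_work(conn, sql):
    # Return the query together with its result, so the person asking
    # can inspect what was actually run instead of trusting a bare number.
    count = conn.execute(sql).fetchone()[0]
    return {"sql": sql, "count": count}


# Stand-in for SQL a model might produce for
# "how many records were in California?"
generated_sql = "SELECT count(*) FROM records WHERE state = 'CA'"
wrong = run_showing_work(conn, generated_sql)

# A SQL-literate reader who sees the query can spot and fix the filter.
corrected_sql = generated_sql.replace("'CA'", "'California'")
fixed = run_showing_work(conn, corrected_sql)

print(wrong)
print(fixed)
```

A bare "0 records" answer would look authoritative; the same answer shown next to `WHERE state = 'CA'` is much easier to question.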
Ronak
So that approach is fascinating. I think in this case, the way you are building this application is assisting people right at the place where they would ask the system a question. So, like, what typically happens when I am interacting with SQL databases: I use DBeaver, for example, to connect with some of our internal MySQL tables. I used to be really good at SQL a few years back. I haven't written SQL in at least the last four years. I can write simple queries, but when it comes to doing things beyond joins, where you need like a bunch of unions and joins and other things and so on and so forth, I'm like, I can do that, but I'm lazy. So I would go to something like ChatGPT, give it a simple prompt and say, give me the answer, and then I run it. So I like the way you describe what you're building, because in this case, in the same prompt, a user can say, I want to do this. It's kind of a debug log of sorts. You see what it generates.
Simon Willison
Exactly.
Ronak
You click it right there also.
Simon Willison
So I use language models for SQL queries just all the time because they're so good at SQL. Like they're really, really good at advanced SQL queries, all of that kind of thing. The problem is, you have to copy and paste the schema in first. You've got to give it the schema so that it knows what to do. And I'll do that. But when I'm building it myself, invisible to the user, I'm sending the schema. I've also started experimenting with sending example rows. Like the thing where the state column might be CA and it might be California: send three example rows and the language model cottons on. It's like, okay, I should search for Florida because I know that it's full state names in this column. So yeah, tricks like that are super important. I feel like generally, if you're a developer working with these models, it's all about the context, right? It's all about the prompt. And the most interesting thing about the prompt is that you can slap in a full copy of the SQL schema, five examples of queries that have run in the past, those kinds of things. That gets really interesting. I'm a big fan of the term prompt engineering, which is a term that a lot of people make fun of. A lot of people are like, come on, it's chatting to a chatbot. How is that engineering? But I feel like those people are missing the craft of this thing. Forget about chatbots. For me, prompt engineering is about figuring out, yeah, okay, for a SQL thing we need to send the full schema and we send these three examples and these three responses, and we need to prompt it in this specific way. That's engineering. It is engineering. It's complicated. The hardest part of prompt engineering is evaluating. It's figuring out, okay, of these two prompts, which one is better? I still don't have a great way of doing that myself. 
The people who are doing the most sophisticated development on top of LLMs are all about evals. They've got really sophisticated ways of evaluating their prompts. I aspire to get to that point. Like I'm still trying to figure out the best way to do that.
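The context trick described in this answer, sending the schema plus a few example rows so the model can see how values are actually stored, can be sketched like this. It is a minimal illustration under stated assumptions, not the prompt any particular tool uses: `build_sql_prompt`, the prompt wording, and the `records` table are hypothetical, and the table name is assumed to come from trusted code rather than user input.

```python
import sqlite3


def build_sql_prompt(conn, table, question, n_examples=3):
    # The CREATE TABLE statement SQLite stored for this table.
    schema = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table,),
    ).fetchone()[0]

    # A few real rows, so the model can see e.g. 'California' rather than 'CA'.
    # The table name is interpolated, so it must be trusted.
    cursor = conn.execute(f'SELECT * FROM "{table}" LIMIT ?', (n_examples,))
    columns = [d[0] for d in cursor.description]
    rows = cursor.fetchall()

    lines = ["Schema:", schema, "", "Example rows:", ", ".join(columns)]
    lines += [", ".join(repr(value) for value in row) for row in rows]
    lines += ["", f"Write a SQLite query that answers: {question}"]
    return "\n".join(lines)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, state TEXT)")
conn.executemany(
    "INSERT INTO records (state) VALUES (?)",
    [("California",), ("Florida",), ("Texas",), ("Ohio",)],
)

question = "How many records were in California?"
sql_prompt = build_sql_prompt(conn, "records", question)
print(sql_prompt)
```

The resulting string is what would be sent to a model; which model and how its reply is parsed are left out of the sketch.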
Gwan
Yeah, reading your posts was really helpful. I love how you include, hey, this is my first prompt, this was the code that it spat out, and then this is how I changed it, right? I feel like that modification process is exactly the most important part.
Simon Willison
That's super important. As an end user of an LLM, it's all about the follow-up prompts. Like a lot of people who are disappointed in LLMs will stick in a single prompt, say, write me code that does this, and it'll spit out a bunch of code and they'll look at it and go, well, that was crap. And sure, it was crap. So now you tell it. You say, refactor that, or write some tests for that, or this doesn't work, or you paste in an error message, and that's all of the work. The substantive work that I do with these things ends up being like 20 follow-ups. Actually, to be honest, often I'll get there with two or three follow-ups, but sometimes you go longer than that. I love sharing my prompts because these things are so hard to use. I feel like it's beneficial to show people what you did, and so very frequently I'll share ChatGPT transcripts. I built my own tools for sharing Claude transcripts because they don't have a good full transcript sharing thing. My LLM tool makes it easy to pipe out the logs into Markdown format. I paste those into a GitHub gist and then share that. A little habit I've got is that when I'm sharing these things I like to put them in private gists, because GitHub private gists aren't indexed by search engines but you can link to them. So it's a way of avoiding polluting the Internet with giant mounds of LLM-generated text, but still giving people links they can go and see. It's just a little habit that I've got.
Gwan
Thanks.
Ronak
By the way, are there any prompt engineering resources that you found to be useful?
Simon Willison
One. Just one: the Claude documentation. Anthropic are the only team who have really invested in good documentation on how to prompt their models. I mean, there are millions of people on Twitter who will tweet crazy prompting tricks, like your grandmother and all that kind of stuff. And honestly, some of those are good tips. The problem is filtering through them. So if you want to read something which is reliable, I trust the Anthropic prompting guide a lot. That's not to say there aren't other good prompting guides out there, but if you want one resource, that's the one I send people to.
Ronak
And you were describing Datasette and using LLMs to power some of the features. So for folks who haven't been paying attention, I want to say there's been a theme of SQLite in a bunch of things that you do: with your blog, with LLM, the CLI tool, with Datasette as well. And you've built a lot of data analysis tools, I mean worked on them over the last few years. How are you thinking about this integration? Because at least when I first learned about LLMs, I thought, well, having them answer random questions is cool, but I want them to do things on either my data or the context that I provide. And this idea of context was bizarre. At least it didn't make sense to me initially. I thought you always had to just fine-tune things on top.
Simon Willison
So.
Ronak
And I was discussing some of the ideas with my wife, who works on some of the LLM stuff, and she was like, you're not thinking about it in an LLM-first way. That's just not how you build applications on top. A lot of it is just prompting to build stuff on top. So I'm curious, when you're thinking about building some of the features in Datasette, how do you go about building them? And is that different from doing traditional software engineering, in that you rely more heavily on prompts than APIs, for example?
Simon Willison
Yeah, I mean, as a software engineer, LLMs are incredibly frustrating because they are non-deterministic, right? You tell them to do something and there is no guarantee that if you say the same thing twice, you'll get the same answer back. Even if you fix the seed and turn the temperature down, you still might get slight differences. Unit testing: how do you unit test something which has almost a random number generator built into what it spits out? Really frustrating and difficult. It's working with a computer that sometimes just straight up says no, right? It might refuse to do a thing. That's really difficult, because the larger theme of my work is around data journalism, this idea of helping journalists analyze data and find stories in it. Datasette was originally designed for data journalists. It turns out it's applicable way outside of that field as well. But that's always been the framing that I hold for this. And a challenge that journalists have is that some of the source material you work with is nasty, right? It's police reports about violent incidents, it's fascist message boards, all of this kind of stuff, right? Now, if you've got an LLM that's helping process these things and you ask it to summarize the themes from this fascist message board, it's going to say no, right? A lot of the LLMs will just straight up refuse to process that. Which for a journalist doesn't make them useless, but it greatly limits how useful they can be in all sorts of different things. Like if you analyze 10,000 documents and 9,999 of them it does analyze and one of them it rejects, maybe there was something important in the one that it rejected. This is very frustrating. But yeah, working with things that sometimes say no is really confusing. It means that you always have to keep the human in the loop. 
Like, I feel like anytime you have an LLM doing something for you and then the result of that is used for something, and at no point could anyone spot if something had gone wrong, that's going to almost certainly lead you into difficulties. But then there are things they are good at. My favorite application of LLMs in journalism, and I'm getting the impression this is one of the most important business applications generally, is this idea of structured data extraction. So you've got a document that's just typed up or even handwritten, and you need to pull out who are the people and what dates are there, and what are the job titles. They are so good at this. So good at this. And data entry is one of the most frustrating aspects of anything involving computers, of data analysis. Like, journalists often need to do data entry on thousands of documents, but they can't do that. They haven't got the person power to go ahead and actually do all of that work. So you give them access to an LLM that can do that data entry. And with data entry, if the LLM gets it 95% right, that's probably what you'd have got if you got a room full of interns doing the same data entry. You know, the accuracy is not perfect, which is unfortunate. But a lot of these things not being completely perfect is still incredibly valuable. So the AI features I built for Datasette, there's actually three. There's the one I talked about, the ask a question, get back a SQL query. There's one called Datasette Extract. And the idea there is that you can define a SQLite table. So you can say, I want a table with restaurant name, restaurant address, number of Michelin stars. And then you paste in a copy of an article that talks about new restaurants and Michelin stars, and it will populate the database for you. That works so well. That's absolutely fantastically effective.
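[Editor's sketch] The extraction flow described above can be illustrated roughly as follows. This is a minimal sketch under stated assumptions, not the actual Datasette Extract code: the names `COLUMNS`, `build_schema`, `create_table` and `load_rows` are hypothetical, and the model call is replaced by a hard-coded sample JSON string so the example runs offline. Rows that do not fit the table are set aside rather than dropped, so a person can review them.

```python
import json
import sqlite3

# Hypothetical column definitions: column name -> SQLite type.
COLUMNS = {"name": "text", "address": "text", "michelin_stars": "integer"}

# Mapping from the SQLite types above to JSON schema types.
JSON_TYPES = {"text": "string", "integer": "integer"}


def build_schema(columns):
    """Describe an array of objects, one object per table row."""
    return {
        "type": "array",
        "items": {
            "type": "object",
            "properties": {c: {"type": JSON_TYPES[t]} for c, t in columns.items()},
            "required": list(columns),
        },
    }


def create_table(conn, table, columns):
    """Create the target table from the column definitions."""
    cols = ", ".join(f'"{c}" {t.upper()}' for c, t in columns.items())
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({cols})')


def load_rows(conn, table, columns, model_output):
    """Insert model-produced rows; keep rows with missing fields for human review."""
    names = list(columns)
    col_list = ", ".join(f'"{n}"' for n in names)
    placeholders = ", ".join("?" for _ in names)
    inserted, rejected = 0, []
    for row in json.loads(model_output):
        if not isinstance(row, dict) or any(n not in row for n in names):
            rejected.append(row)
            continue
        conn.execute(
            f'INSERT INTO "{table}" ({col_list}) VALUES ({placeholders})',
            [row[n] for n in names],
        )
        inserted += 1
    return inserted, rejected


# Stand-in for what a model might return; invented sample data.
SAMPLE = (
    '[{"name": "Example Bistro", "address": "1 Main St", "michelin_stars": 2},'
    ' {"name": "No Address Cafe"}]'
)

conn = sqlite3.connect(":memory:")
create_table(conn, "restaurants", COLUMNS)
inserted, rejected = load_rows(conn, "restaurants", COLUMNS, SAMPLE)
print(inserted, "row(s) inserted;", len(rejected), "held back for review")
```

In a real setup the schema from `build_schema` would be sent to the model along with the pasted article; that step is deliberately left out here because it depends on the provider.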
Ronak
I have a question on that. So in this case, a user is just sharing two things. One is here's a document that talks about Michelin star restaurants and kind of prompting the system to say do X. But what's happening behind the scenes is a little more than that. So what are the. Maybe I'm Using the word wrong. But system prompts that you then add to what the user provides to make it do the right thing.
Simon Willison
I'm going to have to look. I will look that up right now because I can't remember. I think it's a very short one. Let's see. I don't know if I'm even using a prompt for that one because I'm using the structured data. Like with OpenAI, you can give it a schema, effectively a JSON schema. Oh no, I get the user to provide an additional prompt. So when you're doing this, you can put in an extra prompt that says only include restaurants that have at least two Michelin stars, for example. And then I have a tiny one. I say, extract data matching this schema. And then I give it the schema in terms of, you know, it's an array of objects and each object has a string called name and a string called location and an integer called number of stars. That's the whole thing. Like we can stick it in shell. It's very, very simple because this is so fundamental to what these things can do. But yeah, and then I let my users add additional prompt instructions if they need to. One thing I use a lot is, when you're extracting the date, format it as YYYY-MM-DD. Little clues like that. That's it. It's spectacularly powerful for how simple the underlying system is. That's actually a good example of a UI for these models that isn't just a chat UI as well. You paste in some text, or it accepts images as well. You can drop in an image. For the schema, you type in a name and select text from a dropdown, then type in another name and select integer from a dropdown. That's effectively it. And it works. It works really well. And then my third feature is I have a feature where you can basically run a prompt against every row in your database table. So you might have a table with 100 restaurants in it and you can say, enrich this data: for each of these hundred rows, write a haiku about this restaurant and stick it in the haiku column.
Haikus come up a lot for this stuff. And that's it. It works. So yeah, those are the three things, the very, very early steps in what's possible with this. But yeah, the applications to data analysis, data cleaning, finding stories in data: it's almost overwhelming how much potential there is there.
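[Editor's sketch] The third feature, running a prompt against every row, might look something like this in outline. It is an illustrative sketch only: `enrich_table` and `fake_model` are hypothetical names, and `fake_model` stands in for a real LLM call so the example runs without network access.

```python
import sqlite3


def enrich_table(conn, table, source_col, target_col, prompt_template, run_prompt):
    """Run one prompt per row and store each reply in target_col."""
    # Add the target column if the table does not have it yet.
    existing = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
    if target_col not in existing:
        conn.execute(f'ALTER TABLE "{table}" ADD COLUMN "{target_col}" TEXT')
    rows = conn.execute(f'SELECT rowid, "{source_col}" FROM "{table}"').fetchall()
    for rowid, value in rows:
        reply = run_prompt(prompt_template.format(value=value))
        conn.execute(
            f'UPDATE "{table}" SET "{target_col}" = ? WHERE rowid = ?',
            (reply, rowid),
        )
    return len(rows)


def fake_model(prompt):
    """Stand-in for a real model call; it just echoes the prompt."""
    return f"[model reply to: {prompt}]"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE restaurants (name TEXT)")
conn.executemany(
    "INSERT INTO restaurants VALUES (?)",
    [("Example Bistro",), ("Sample Diner",)],
)
count = enrich_table(
    conn,
    "restaurants",
    "name",
    "haiku",
    "Write a haiku about this restaurant: {value}",
    fake_model,
)
print(count, "rows enriched")
```

Passing the model call in as a function keeps the database loop testable on its own, which matters given how unpredictable the model side is.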
Ronak
So I want to try something today. One of the things that we do before we record a podcast is we research the guest to educate ourselves and to inform the conversation as well. And Gwan has been building an amazing tool that helps us collect a lot of this information. Gwan can describe more of what that does. But you've been exploring LLMs a lot, so I'm curious to get your input on this. If our goal is, given an Internet-famous person, to know more about what they've done in the recent past or, let's say, over the years, and we want to get notes for where the conversation could go, and obviously we want to dig more into it, how would you go about doing something like this with LLMs?
Simon Willison
So for this particular thing, the one thing I would not rely on is them doing the research, them knowing about because like we said earlier, for people who are Internet famous, it will make stuff up all the time. What's way more interesting, it's find reliable information and dump it into the LLM. So go and like grab their RSS feed from their blog or all of their recent tweets, which is harder now because Twitter doesn't really have an API you can use. Really frustrating, but yeah, or transcripts from other podcast episodes that they've been in. Anything like that. And then what I'd do, I'd use Google Gemini because Google Gemini's signature feature is that it's got a 1 million token or even 2 million token context which like Claude and OpenAI cap out about 200,000. So it's like five times the amount of stuff that you can pipe into it. Plus Gemini can accept audio clips which I haven't really played with very much yet. It accepts video. So what I would do is I'd experiment with audio and video. But out of interest, I wouldn't necessarily trust those to be the most effective of doing it. I basically try and gather as many tokens about that person as possible. So copy and paste crap out there, Wikipedia bios and anything they've written, all of that kind of stuff. Copy and paste all of that into Google Gemini and then prompt it with we are interviewing this person. What are some themes that we should do? I think that would work amazingly well. I think you'd probably, as long as you, as long as you're feeding it the source data so that you know that the source data again don't even trust it to go and read web pages because who knows what it's going to do. But copy and paste is the best API, right? Copy and paste. Copy and paste half a million tokens of information about that person in I am certain you'd get good results out of that.
Ronak
I'm going to give that a shot.
Simon Willison
That feels like it would work really well. The prompting trick that I use a lot, especially with these longer context things, is I always prompt and say: identify core themes for topics we should talk about, and for each one, provide two illustrative quotes from the source material. So then it'll say you should talk to Simon about his LLM tool. Simon said, quote, LLM is my something tool for something something something. Partly as a fact-checking mechanism, because then you can take the quotes it gave you and you can search in the source material and see if it made them up. In my experience, it doesn't make those up if you ask for direct quotes. It might fix the punctuation or something. But I can't remember having asked it for direct quotes where it did completely invent a quote, which is useful. That's not to say it wouldn't do it, but it's a good trick.
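[Editor's sketch] The quote-checking trick can be automated in a few lines. This is a rough sketch, assuming the source material is available as plain strings. The function names and sample texts are invented for illustration, and the prompt wording only paraphrases the one described above. Because the model "might fix the punctuation", the comparison ignores punctuation, case and spacing.

```python
import re
import unicodedata


def normalize(text):
    """Lowercase, turn punctuation into spaces, and collapse whitespace."""
    text = unicodedata.normalize("NFKC", text).lower()
    text = re.sub(r"[^\w\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def build_prompt(sources):
    """Concatenate labelled sources and ask for themes backed by direct quotes."""
    parts = [f"--- Source {i} ---\n{text}" for i, text in enumerate(sources, 1)]
    parts.append(
        "We are interviewing this person. Identify core themes we should talk "
        "about. For each one, provide two illustrative direct quotes from the "
        "source material."
    )
    return "\n\n".join(parts)


def check_quotes(quotes, sources):
    """Map each quote to True if it appears in at least one source."""
    haystacks = [normalize(s) for s in sources]
    results = {}
    for quote in quotes:
        needle = normalize(quote)
        results[quote] = bool(needle) and any(needle in h for h in haystacks)
    return results


# Invented sample material, not real quotes from anyone.
sources = ["I built a small tool, and it exports my notes to SQLite."]
quotes = [
    "I built a small tool and it exports my notes",
    "I built a large platform",
]
report = check_quotes(quotes, sources)
for quote, found in report.items():
    print("FOUND  " if found else "MISSING", quote)
```

A quote flagged as missing is not proof of invention, since the model may have paraphrased, but it tells you which ones to check by hand.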
Ronak
That's super helpful. Thanks for sharing that. And I wanted to talk a little bit more about LLM-enhanced development of sorts. So I like this quote that you had in one of your talks where you said LLMs kind of make you more ambitious. And the way you go about thinking about any new technology is how it makes things possible which were impossible before, or how it makes you build things faster. Given all of this that's going on, can you share a few examples of where LLMs have made you more ambitious? Or have you tried things which you wouldn't have otherwise? Some of the recent examples that you're most...
Simon Willison
Yeah, I mean, so many. This is the thing: as a software engineer, when I'm building a project, I like to have confidence that I've got most of what I need to build that thing. Right. If I'm going to have to learn Objective-C from scratch to do a project, I can't necessarily justify investing the time. I will find a different project to do. LLMs have kind of changed that equation for me. My earliest example of this is I've had a Mac for 20 years. I've never learned AppleScript because AppleScript is a weird, weird programming language. Like, I've heard AppleScript described as the world's only read-only programming language. If somebody shows you some AppleScript, you can go, oh, I get what that does. And then you sit down to write it yourself and you have literally no idea what you would do to make it do anything useful. ChatGPT, it turns out, is so good at AppleScript, right? It knows AppleScript. The thing I wanted to build is I wanted to export all of my Apple Notes into a plain text format, and I asked for the AppleScript to do it and it knocked out six lines that loop through every Apple note and for each one output the title and the body. And I ended up writing a little Python program on top of that, that embedded the AppleScript in a Python program. And now I've got a command line tool that can export my notes to a SQLite database. That project was impossible. It was impossible for me to build that previously because I would have had to spend, realistically, probably a solid week getting my head around AppleScript, which is not a well-documented language either. And instead of that full week, I got a working prototype in five minutes that proved to me that the thing I wanted to build could be done. And my style of development is all about research and prototypes. Like, you build a prototype to prove that the thing is possible and to fill in those gaps in your knowledge about what you need to know.
And then writing the software. And it's easy once you've, once you've figured out the Apple script, you need to get the notes out, whatever it is. So that was an early example and that just keeps on going. I have production code written in Go right now. Despite if you asked me for a for loop in Go, I would have to go and look it up. Like I'm not fluent in go. But the code that I wrote in Go with the help of, I think that was Claude three Opus I used for that one. It's fully unit tested, it's got continuous integration. So when I commit to GitHub, it runs the test, it has continuous deployment, right. If the test pass, it deploys the thing, all of these things which I see as essential for production grade software. And I feel good about it, like despite the fact that I could not sit down and write it off the top, top of my head, I know that when I go and look at that code, it's good code, it's well tested, I've thought about the edge cases and it's been running in production for six months and serving quite a decent volume of traffic. That's really cool, like being able to, I no longer look at a problem and Think, well, ideally I'd use Go for this, but I don't know, Go. So I'm going to just cross that off the list. Just the other day, what was the thing I was working on recently? I built a little Django application that was a. It's like a webhooks debugging application. When you're working with webhooks, the thing you really want is just set up an endpoint that logs everything and then you tell Stripe, hit my endpoint and you get logs in your database showing what it sent you, and then you can figure things out from there. And I've always wanted a Django app for doing this, but it would take like a day to build that and I couldn't quite justify spending a day on it. I got Claude 3.5 sonnet to write the entire thing and it took two hours. From Idea to having deployed working software with unit tests in production, that was solving this problem for me. 
And it's a great example of a project where I could just about justify two hours on that problem. I couldn't justify any longer than that. Like, I should just use something off the shelf at that point. So, yeah, time and time again, all of these little projects I've built would not exist without LLMs. Not because I couldn't build them, but because I couldn't build them fast enough to justify the effort.
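[Editor's sketch] The Apple Notes export described earlier in this answer, AppleScript embedded in a Python program that writes to SQLite, could be structured along these lines. Everything here is an assumption for illustration: the AppleScript is a guess at the general shape rather than the script from the interview, it has not been verified against the Notes app, and the separator strings and function names are hypothetical. Running `export_notes` for real requires macOS and its `osascript` command; the parsing and database steps work anywhere.

```python
import sqlite3
import subprocess

# Hypothetical separators used to split the script output back into notes.
SEP_FIELD = "<<<FIELD>>>"
SEP_NOTE = "<<<NOTE>>>"

# Assumed AppleScript: loop over every note, emit its title and body.
NOTES_SCRIPT = f'''
tell application "Notes"
    set output to ""
    repeat with eachNote in every note
        set output to output & (name of eachNote) & "{SEP_FIELD}" & (body of eachNote) & "{SEP_NOTE}"
    end repeat
    return output
end tell
'''


def run_applescript(script, runner=subprocess.run):
    """Run a script through osascript (macOS only) and return its stdout."""
    result = runner(
        ["osascript", "-e", script], capture_output=True, text=True, check=True
    )
    return result.stdout


def parse_notes(raw):
    """Split raw script output into (title, body) pairs."""
    notes = []
    for chunk in raw.split(SEP_NOTE):
        if SEP_FIELD not in chunk:
            continue  # trailing newline or empty chunk
        title, body = chunk.split(SEP_FIELD, 1)
        notes.append((title.strip(), body.strip()))
    return notes


def save_notes(conn, notes):
    """Write the parsed notes into a SQLite table."""
    conn.execute("CREATE TABLE IF NOT EXISTS notes (title TEXT, body TEXT)")
    conn.executemany("INSERT INTO notes (title, body) VALUES (?, ?)", notes)
    return len(notes)


def export_notes(conn, runner=subprocess.run):
    """Full pipeline: run the script, parse the output, store the rows."""
    return save_notes(conn, parse_notes(run_applescript(NOTES_SCRIPT, runner)))
```

The `runner` parameter exists so the pipeline can be exercised with a fake in tests, without touching a real Notes library.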
Ronak
So this is fascinating, because one thing that we see LLMs being really good at is code. Because you can test it, you can verify it. Text, prose, not as much, because it's hard to verify. In a lot of these projects, what does your typical workflow look like? So you mentioned you sometimes have ChatGPT write code while you're walking your dog, which is amazing. For something like this where you spend, let's say, a couple of hours, can you walk us through what that looks like, from prompting to actually getting the thing in production?
Simon Willison
I mean, it definitely varies with. Because there are two types. There are the projects where I know it's possible already, like building a webhooks endpoint for Django. I know that's possible. I could absolutely just sit down and write that. So that doesn't need the exploratory prototype, whereas there are other projects like exporting my Apple notes. The number one question is, can I even do this? And so if it's got those unknowns, that's when I'll jump straight into a prototype. And that's normally just have an idea prompt an LLM a few times, say, hey, can you write me? Oh, A great tip with LLM is always ask for options. So I'll say things like, what are my options for exporting Apple notes? And it might say, you could do this, or you could use AppleScript, or you could do this, or you could do this. That's the best way to work with them, because if you ask for one option, if you ask it a question, they'll give you an answer, and if you're lucky, it'll be a good answer. But maybe it's not ideal if you ask it for options, one of those four or five options is almost always the best thing and you're better equipped to evaluate than it is because I mean, it's just a random number generator essentially, but you know, it can spit out the so I'll often start with, okay, what are my options for solving this problem? Sometimes I'll say write me the code for option three and I'll do that in Normally I'll have it Write it in JavaScript or Python, because those are my two daily driver programming languages. Occasionally I'll try it in some. Often I'll try it in Bash if it's something I can use on the terminal, that kind of thing. So if there's a prototyping phase, I'll be using the LLMs as part of that prototyping to answer those questions. The moment it turns into a project I'm actually going to try and commit to, I start a GitHub issue for it. 
And sometimes I'll start a GitHub issue just for the research, like maybe in my private notes, like, figure out if I can export Apple Notes, and I'll just copy and paste things that I learned along the way. If it's going to turn into actual software: most of the software I build is Python, and it's Python packages that I can publish to the Python Package Index. And those come in basically three shapes. They're either a Python library that I'm going to import and use, or a Python CLI tool, so something where I type LLM space whatever, or it's a plugin for one of my other projects, where I install it into Datasette and it adds new functionality. I've got cookiecutter templates for all three of those. So cookiecutter is this great little Python tool that will spin up the directory structure and the README and the setup.py or pyproject.toml, all of that kind of junk, based on a few questions that it asks you. So I've got three public open source cookiecutter templates that I use to get me started on that. Those set up the initial file structure, they set up the GitHub Actions workflows for testing. They set up the workflow for publishing the package to PyPI. So if I've picked a name for it, I can write a bunch of code, push it to GitHub, post a release on GitHub, and that will be published to PyPI. So that entire workflow of writing the code, testing the code, documenting the code, publishing the code is all automated for the most part, which is a huge productivity boost. Like, I've got command line tools that I published to PyPI where I had them live on the package index within an hour of having the idea for the tool. And that's because I've done it 250 times now. So you've got the automation in place. It's just a very, very quick habit. I love the idea of release early, release often for open source things. You know, if it's an open source package I will often,
if I'm not confident yet, I'll put it out as an alpha. I'll say, okay, this is the 0.1a0 alpha release, and I won't release code that doesn't run at least. But you know, if I'm not quite confident that the design's right or whatever. And some of my projects languish in alpha state for far too long. I'm also trying to get better at committing to a 1.0 release. I've still got my main Datasette project on version 0.65 right now, I think. So I've had like 65 releases and I still haven't done the 1.0, and I really need to do the 1.0 for it. But yeah, so that's the process. I've written quite a bit about this. I've got some good write-ups on how that all works.
Ronak
Oh yeah, we would love to link that in the show notes. I think what's helpful here to note is that it's not just you using LLMs to tell you what to do, but in a way you are in the driving seat. You're kind of having it just assist you, where you still have a lot of structure around it to make you more productive. Where it's not just like...
Simon Willison
I call it my weird intern. I'll say to my wife Natalie sometimes, hey, so I got my weird intern to do this. And that works, right? It's a good mental model for these things as well. Because it's like having an intern who has read all of the documentation and memorized the documentation for every programming language, and is a wild conspiracy theorist and sometimes comes up with absurd ideas, and they're completely, massively overconfident. It's the intern that always believes that they're right. But it's an intern who you can, I hate to say, you can kind of bully them. You can be like, do it again. Do that again. No, that's wrong. No, that's wrong. And you don't have to feel guilty about it, which is great. Sometimes when you're working with other people and they've done five iterations and you're like, you know what? I'm still not entirely happy with this bit, but come on, I'm not going to make them do a sixth. That's just not fair. With the LLM, you can do that, right? You can just keep on going. Oh, actually, you know what? Rewrite that whole thing in Go.
Ronak
Or.
Simon Willison
One of my favorite prompts. One of my favorite prompts is you just say, do better and it works. It's the craziest thing. It'll write some code, you say, do better and it goes, oh, I'm sorry, I should. And then it will churn out better code, which is so stupid that that's how this technology works. Oh, yeah. But it's kind of fun.
Ronak
Reminds me of our friend Austin. So we have a common friend, Austin. If you tell, if anything, let's say if you're struggling with anything, and if you go to him for advice, and if you ask him, hey, Austin, what do you think I should do? He has one answer for every damn thing, and that's try harder.
Simon Willison
It works really well. Nice. Yeah.
Ronak
Sorry, go on. I think you were saying something. No, no, that's very true. So in terms of interns, the good thing is you don't have just one. You have many of them, with ChatGPT, Claude and whatnot. And as you were describing some of your projects, you mentioned how you used different ones for different things. I'm curious, how do you go about using one over the other? Is it more try what works, or do you have a pattern at this point that you go to?
Simon Willison
It's so hard. It's so hard, right? It's hard. I've been calling this. It's vibes based evaluation. Right? Because the only way to figure out if A model is any good is you have to use it repeatedly a bunch of times and try different things about it. And some people are really like, they're really sophisticated about this. They have like a document full of all their test prompts that will run through the new models. I'm not doing that. I should be doing that. I have a few prompts that I always run against the new model just to try and get a feel for it. But a lot of the time I go sort of based on vibes from other people. Like if a whole bunch of people are saying no, seriously. I was all about Claude Sonnet, but Now Google Gemini 1.5 is better for these things, then I'll start experimenting with that one as well. At the moment, my daily driver is Claude 3.5 sonnet. I think. I think that's the best model. But the new Gemini 1.5 from like two weeks ago is getting massive buzz, so I need to spend more time with that one. I still use ChatGPT for walking my dog. The voice mode is amazing. And for code interpreter, like if I'm writing Python code and I want it to actually test that Python code for me and fix any bugs that it finds, I'll go to ChatGPT for that Claude. Also the Claude artifacts thing where it can build little interactive web apps, it's amazing. I'm using that. I use that to prototype up little like things that I'm actually building. I use it to build one off tools like a little pricing calculator for something just for me to use. I really love that feature then and on the command line I love playing with the local models, the ones that run on my laptop. The problem is that they are never going to be up to the standards of like Claude 3.5 sonnet. So for actual real work that I'm doing, I tend not to use them. But because of my LLM project, I'm constantly tinkering around with them. 
I also think they're really good for people learning LLMs, because using a kind of crap one that runs on your laptop, it hallucinates way more often, it makes more mistakes. It helps you get that mental model of what they're good at much better than working with the really good models. So I always recommend people try, like, Phi-3, Gemma. Gemma 2B is really good. Llama 3.1 8B is currently my favorite local model. It's quite easy to run. It's a 4 gigabyte download if you get the quantized version. It's genuinely useful. Like, it's shocking. It definitely feels equivalent to ChatGPT 3.5 at least. And it's really amazing to me that a 4 gigabyte file can be that useful running on my own laptop. Like, the compression of these things is extraordinary. But yeah, so it's vibes. It's vibes-based. It's frustrating. I wish I had better benchmarks of my own to try these things out. And a lot of it also comes down to prompting style. Like, some people will say, oh no, I tried Claude and it sucked. And it's like, yeah, but maybe that's because of the way you prompt LLMs and the way I prompt LLMs. It's not like I'm doing it right and you're doing it wrong. It's that your way is more compatible with ChatGPT and my way is more compatible with Sonnet, in ways that I don't fully understand.
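[Editor's sketch] The "document full of test prompts" approach mentioned in this answer can be approximated with a tiny harness. This is a sketch under assumptions, not anyone's actual benchmark: `run_suite` is a hypothetical name, and the two "models" are plain Python functions standing in for real API or local-model calls. It only collects replies side by side; judging them is still the vibes part.

```python
def run_suite(prompts, models):
    """Send every prompt to every model and collect replies for comparison."""
    results = []
    for name, call in models.items():
        for prompt in prompts:
            try:
                reply, error = call(prompt), None
            except Exception as exc:  # one failing model should not stop the run
                reply, error = None, str(exc)
            results.append(
                {"model": name, "prompt": prompt, "reply": reply, "error": error}
            )
    return results


def shouting_model(prompt):
    """Stand-in model: upper-cases the prompt."""
    return prompt.upper()


def broken_model(prompt):
    """Stand-in model that always fails, like a refusal or an outage."""
    raise RuntimeError("model unavailable")


prompts = ["Write a haiku about SQLite.", "Explain jq in one sentence."]
models = {"shouting": shouting_model, "broken": broken_model}
results = run_suite(prompts, models)
for row in results:
    outcome = row["reply"] if row["error"] is None else "ERROR: " + row["error"]
    print(row["model"], "|", row["prompt"], "|", outcome)
```

Recording errors alongside replies matters here, since a model that refuses or fails on one prompt is itself a useful data point.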
Ronak
So you've been writing a lot about LLMs over the last few years, and as we were going through your blogs, there's a lot of new stuff that's coming out. And in general, one thing that at least I struggle with is just keeping up to speed with everything that's happening.
Simon Willison
Yep.
Ronak
It's like two weeks, work's busy, or life's busy, and then suddenly something has changed. And I'm not spending as much time building things on top of LLMs, but I'm curious to just learn more and see where the what the capabilities are and how it can be useful. I'm curious how you stay up to date one and also filter signals from noise because there's just so much of it.
Simon Willison
Right. The big one is so Twitter is still the like, I tried moving to Mastodon. I'm very active on Mastodon. Mastodon is mainly AI skeptics who don't like this stuff. All of the AI people hang out on Twitter still. So I maintain presence on Twitter because that's where the AI conversations are happening. So following a bunch of people helps. There are a few accounts that I turn on notifications for, so I get a push notification whenever Anthropic or OpenAI put out a tweet, because it's always like, that's where the big news comes from. The other thing is private groups. I'm on a couple of WhatsApp groups. I'm in a bunch of different discords. Those are great. Those are the highest signal stuff will come from a discord. I'm in with like 15 other people who are very engaged with this stuff. And we'll be sharing notes with each other in there. And then it's blogs, like I blog. Having a blog means a lot of this stuff comes to me. People will like tag me and say, hey, have you seen this new thing? It's relevant to what you were talking about last week, that's super useful. And that's it. And I've got an RSS reader that's subscribed to a bunch of things and substacks and stuff, so forth. But the other thing is like, I don't have a. I'm not employed by anyone else. So if I want to spend a couple of hours because a big thing just happened, I want to research it. It's nobody to tell me not to, which isn't necessarily beneficial for my own projects. You know, I don't have that accountability. But yeah, I'm in a privileged position in that I can afford to invest the time in figuring this stuff out as well.
Ronak
So you mentioned that you're an independent open source developer, and this is something I want to talk about. But one question that I wanted to ask before that: we're talking about using LLMs to improve your productivity and being able to build things faster. But there's one thing which comes up, which is learned helplessness. In other words, it's like your muscles are atrophied, and in this case your skills are maybe atrophied, where you can't just write things off the top of your head. For example, I remember some time earlier this year, the Wi-Fi was out and I was writing some code and Copilot wasn't working. I knew what I wanted to write, but I was frustrated because the damn thing would just not autocomplete. And I was like, why is this not working? And it's like, oh, the Wi-Fi is out. So I'm curious how you think about that in general.
Simon Willison
Yeah, I felt a little bit of that, to be honest. The other day I went and reported a bug against GitHub Actions for like I was saying, hey, I'm running a Windows GitHub Actions thing and the version of Python can't load SQLite extensions and I thought you'd fix that. This is really frustrating. And then after I'd filed the bug, I realized that I'd got Claude to write my test code and it had just written SQLite code that doesn't. It had hallucinated the SQLite code for loading an extension and I'd gone and I'd literally, I'd reported a bug And I had to close that bug and say, no, sorry, this is my fault, that code is wrong. And that was a bit embarrassing, you know, like, I know I should know more than most people that you have to check everything these things do. And it had caught me out and I'd lost like half an hour of time as well to try and figure out what was going on. It turns out it just hallucinated the wrong way to use SQLite in Python. And SQLite are my bread and butter. I really should have called. So, yeah, this has happened that my counter to this is I feel like my overall capabilities are expanding so quickly. I can get so much more stuff done that I'm willing to pay with a little bit of my soul, right? I'm willing to, I'm willing to accept a little bit of atrophying in some of my abilities in exchange for honestly, like a 2 to 5x productivity boost on the time that I spend typing code into a computer. And that's like 10% of my job. So it's not like I'm two to five times more productive overall. But that is a very material acceleration and like I said, it's making me ambitious. I'm writing software I would never have even dared to write before, so I think that's worth the risk. A lot of people are worried about the impact this has on new programmers, and I've sort of got two conflicting opinions there. One opinion is, like I said earlier, people like the fact that you've progressed, got a semicolon, you lose half a day to figuring out the semicolon. 
That sucks, right? That's just inexcusably miserable. And fixing that for people is a wonderful thing. I think it opens up, I think way more people are going to learn to program. And I think that the people who are learning to program will be able to learn faster. But I think there are skills they're not that they're going to skip over. I heard a kind of terrifying anecdote from a friend recently where they had a. They knew somebody who was a new programmer. They were just getting started. They were a professional programmer of just like very early stages. And they, they were calling code, they used the word something like goop. And they said. So I got, I got, I got chatgpt to spit out some goop. And I pasted in, it seems to work. And then this didn't work, so I'm going to spit out more goop. And I pasted that and now that's working. And they were asked, well, how are you going to maintain the GOOP in the future? And they said, oh, just get it to write more goop. And that idea that, the idea that code is now goop. As a programmer, that offends my very soul. Like, that's, that's sort of horrifying. But, you know, it's. If you get working, maybe, maybe we are going to have to look like we have. We currently live in a. In a world where half of the world runs on Excel spreadsheets, with no unit tests, with which, with no backup, no version control, no unit tests, and anyone can muck up a formula and the valuation of a company goes down by half overnight. Because, you know, that's the world we live in today, right? Excel spreadsheets are kind of GOOP already and somehow society functions. So maybe those of us who are like, no, every line of code has to be perfect, maybe we're wrong. Maybe actually GOOP is the way forward. But that's a little bit terrifying, you know, it is.
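[Editor's sketch] On the hallucinated SQLite code mentioned earlier in this answer: for reference, Python's standard `sqlite3` module loads extensions through `enable_load_extension` and `load_extension`, and some Python builds are compiled without that support. The sketch below is illustrative only. The helper names are hypothetical, and it is not the test code from the bug report. It checks for support before trying to load anything.

```python
import sqlite3


def supports_extensions(conn):
    """True if this Python build exposes SQLite extension loading."""
    return hasattr(conn, "enable_load_extension")


def load_extension(conn, path):
    """Load a SQLite extension from path, then switch loading back off."""
    if not supports_extensions(conn):
        raise RuntimeError("This Python build cannot load SQLite extensions")
    conn.enable_load_extension(True)
    try:
        conn.load_extension(path)
    finally:
        conn.enable_load_extension(False)


conn = sqlite3.connect(":memory:")
print("Extension loading available:", supports_extensions(conn))
```

Checking against the documented API first is the cheap step that would catch this kind of hallucination before a bug gets filed.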
Ronak
I think that's precisely what I was thinking about: that with all of these LLM tools, the amount of goop is just increasing, and not just in code, but in almost everything else. And I think you talked about slop as well, which is the unwanted and not good AI content, especially images, coming out of many, many countries right now. In general, if you think about how these LLMs have been helpful from a usability standpoint, one is they're super cool and exciting and they have way more potential than what we are seeing today. They've truly been impactful in increasing productivity for software engineers, for someone who knows what they are doing. And I think we were speaking with Steve Yegge last week and he mentioned that LLMs are way safer in the hands of senior engineers who know what they need to be doing, as opposed to someone who doesn't. But we don't control who uses this and how they use it. And if you think about the quality of software that's actually coming out, I was having a discussion with one of my friends recently and we were talking about this, where these tools are amazingly helpful to make us productive, those of us who are, quote unquote, some sort of an expert in a domain, where you kind of know what's right. But a lot of the other tools that come out, they are super nice from a prototype standpoint, from a demo standpoint, but you don't see quite as good tools when it comes to a production system that you would fully rely on. I know RAG is a thing that people talk about too, where demos are amazing but then in production it's like, well, would you trust it enough to give it to customers? So I'm curious, what's your take on that, in terms of the sheer quantity of things being built from a prototype standpoint, where the quality isn't quite there yet?
Simon Willison
It's really interesting, isn't it? Yeah. I mean, so many of these things are completely open questions to me. I still don't know: will society overall, in like 10 years' time, look back on this and say, okay, this technology had more pros than cons, or will we just be flooded in slop and be like, wow, I wish nobody'd ever invented this stuff at all? And it's harder for me to evaluate that because I think programmers are the best equipped to use these tools. Like, hallucinations in code don't matter because when you run it, you get an error and you fix it.
Ronak
Right.
Simon Willison
Like we are. They're better at code than they are at anything else. So I'm getting an enormous productivity boost out of this stuff and it looks amazing. Is that just because I happen to be in the one profession in this world that is most attuned to the benefits these things can bring you? And then, yeah, in terms of quality, one thing I've been thinking is, every now and then you hear a story of a company who got software built for them and it turns out it was the boss's cousin who's like a 15 year old who's good with computers and they built software and it's garbage software. The quality is absolutely awful. But you know, it's how these things happen. And maybe we've just given everyone in the world the overconfident 15 year old cousin who's gonna claim to be able to build something and build them something that maybe kind of works, and maybe society is okay with that. This is why I don't feel threatened as a senior engineer, because I know that if you sit down somebody who doesn't know how to program with an LLM and you sit me with an LLM and ask us to build the same thing, I will build better software than they will. There's no question about that at all. But yeah, so hopefully sort of market forces come into play and the demand is there for software that actually works and is fast and reliable and so forth. And so people who can build software that is fast and reliable, often with LLM assistance used responsibly, benefit from that. That seems okay to me. But yeah, I don't know. A big frustration I have is, lots of computer science papers come out about LLMs. I want sociology papers. I want all of the sort of humanities doing research into the impact on these things. How do people learn to use them? All of this kind of stuff. And I think that research is happening. But in academia it takes two to three years to get a paper out.
So we're seeing papers come out today that are talking about GPT 3.5 from like December of 2022, which is so, so outdated at this point. But yeah, it's frustrating. There are so many big open questions like this that we don't have good answers to.
Ronak
Yeah, we are starting a family pretty soon. So these are questions that at least I'm thinking about these days and so struggling with and don't know the answers to. And I would love to get some of those research papers as well, which I may ask these tools to summarize for me, which is a different problem.
Simon Willison
Oh yeah, I read academic papers now. I never used to read academic papers, but you can copy and paste the app. I built a GPT called De Jargonizer and it's just a prompt that says, yeah, you paste text and it says find all the jargon terms to define what they mean. And so I can grab an academic paper abstract and paste it into De Jargonizer and then I'll understand it because they inevitably use like five terms I've never heard before. But it, yeah, that's so good. It's so useful for that kind of thing.
Ronak
So we've been talking about how these systems are beneficial for senior engineers. And I've been having some conversations with some friends who have kids who are either starting school or already just starting computer science or looking for a job. Which job market has been much tougher this year and the last in general, at least for entry level engineers. But let's skip the job market problem for a second. In general, for junior engineers who have these tools at their disposal to be productive and learn things much faster, what advice do you have for them? For them to also develop some of the skills that you only develop through making mistakes or just building things in production.
Simon Willison
So I'm not qualified to answer this question because I was a junior engineer 25 years ago. So I do not have the learner's mindset. I will answer it anyway. I think it's all about projects. I think build things that do something and ship them. My very strong hunch, and this is going back throughout my entire career, is that the fastest way to learn anything in software is to build something with it. And also to get beyond tutorials. You know, tutorials are fine. You can go through a tutorial and build that thing. Those will not have nearly as much of an impact on you as saying, okay, I'm going to build a thing that does this, or take the inspiration from the tutorial and build something else. It's also great for hiring, right? When I've been a hiring manager in the past, if a candidate can show me stuff that they've built, that's worth more to me than any degree. Like I've hired people where we hired them, and then at the end of the process I realized I never even asked them if they went to university, because it didn't matter, because they showed me cool stuff that they built and they could talk through it. Like if you've got a great demo and I can ask, oh, how did you solve this problem? What else did you try? We can have an amazing interview. There's also that whole FizzBuzz, LeetCode side of interviewing. I hate that stuff. I absolutely hate that. If you've got code on GitHub which I can read through and I can look through your commit history and see evidence that you know how to fix a bug in a for loop or whatever, I can skip all of that junk, which is obviously better for both of us. So yeah, I think it's having a portfolio of working projects. I think stuff on the web is still best because then I can hit it in a web browser and see that you've built something. Like I'm a massive power user of GitHub. I love GitHub Pages.
So you can just build a little static web app hosted on GitHub Pages. It'll live forever, right? And it's a URL that people can click on. They can start using it. If you're doing server side code, it gets a bit trickier. I've used Vercel a lot in the past. If you don't give them a credit card, so that you can't get accidental denial of service billing problems, Vercel can be really good. There's always places that you can host code online if you look around for them. But yeah, having live demos of things that you've built that are hosted online I think is the best possible sort of resume, and it's the best way of learning. And so to this day, I've got a tag on my blog called Projects. And every time I do a project, I tag it project. And right now it has 404. Oh, good number. 404 items tagged projects. And that's over the course of 20 years, you know. But every single project that I do, I learn just the tiniest new thing. And it's also like, if I want to remember how to take a screenshot using the Playwright framework, I've written code on GitHub that does that and I can go and look at it, or if somebody asks me how do I take a screenshot with that, I can send them a link to the code that I wrote on GitHub. So it almost becomes an external memory of everything that you've ever learned to do. But yeah, for me, I think that's it. If you're a new programmer, knock out projects. It doesn't matter what they are. Weird little things, fun little things. It's also a great excuse to do writing, because the two easiest forms of blogging are something that I learned or something that I built. Right. You can do a blog entry where you just say, I wanted to solve this particular problem. So I built this. Here's a screenshot of it. Screenshots are amazing because they never break. Like, I love screenshots. If you build hosted software, it's going to break eventually.
Take a little video, take a screenshot, stick those up. I've coached people going through boot camps before, and one of the things I always tell them is, they always do this sort of end of boot camp project and they'll have a GitHub repository with their project in. I say invest in the README. The README needs screenshots of your thing. Like if I'm a hiring manager and I click through, I'm not gonna check the code out, I'm not gonna try and run it. But if there are screenshots and a couple of paragraphs saying how it works, that puts you in the top, like, 1% of candidates, if you've got a README with a screenshot in it. So do that. Right? Yeah. So I think my advice is do lots and lots of projects. Small, weird projects, whatever. If you can get them deployed, that's excellent. Then have a README with a screenshot in and that's a really good way of learning.
Ronak
That's good advice. Talking about projects, I wanted to jump on to life as an independent developer. Now, when it comes to someone working at an employer at a company, for example, the kind of projects you work on are typically driven by some business priority. And there are just problems to solve and you don't need to go look for them. Very often people kind of tell you what the problems are. Well, let me put this way. If you're lucky, you don't need to go look for interesting problems. They kind of come to you at a company and you got to work on those. And there's always.
Simon Willison
It's not your job at a company to figure out what. What is the important thing to do. That's what the management chain is for. Exactly.
Ronak
So you always have the steady input of things to do. And at least these days, everyone I speak with has more work to do than they have the time for. But when it comes to working independently and having to define how you spend time, one needs to be very disciplined and also have a way of identifying what to work on. So I'm curious how you do that, figuring out what you work on and keeping a structure that keeps you going.
Simon Willison
Honestly, that is the hardest problem. It's really, really difficult. So I'm in a very privileged position in that my wife and I ran a startup for a few years. We sold that startup, and it made us enough money that, it's not that I don't ever have to work again, but I have a substantial runway where I don't have to worry about an income, which is almost like a requirement if you want to go out independently, especially doing open source stuff. Right. It's very, very difficult to make. And I'm starting to spin up sort of consulting things and so forth because I want to extend that runway, and ideally I want to do what I'm doing right now for the rest of my life. Right. To do that, it needs to be funded. I need to have a repeatable source of income for it. So I've been building a software as a service version of my main Datasette open source project. In open source, that's one of the most proven business models. It's like WordPress, right? WordPress is open source, or you pay Automattic and they run the hosting for it, and they built a really successful business around that. I essentially want to do exactly that, because also it's kind of lonely, right, working in open source, working on your own projects. I would like to be able to employ a full team of people to work on stuff with me. That's the sort of big ambition. But that said, honestly, what to work on next. Prioritisation is so difficult when you don't have any external sort of forcing factors. My big thing, I mentioned my weeknotes earlier. Just forcing myself to be accountable every couple of weeks to write the stuff up. I don't care if anyone reads them or not. Like the weeknotes are entirely for me. They're for me to track what I've been working on and the progress towards things. I try and set myself deadlines.
I occasionally do conference driven development where you sign up to give a talk at a conference and you're like, this project needs to be in a state where I can actually present it on stage. The Datasette AI features are almost all conference driven development. I was speaking at a journalism conference about ways to use AI in journalism. Well, I better have the features ready by then. So yeah, it's really difficult, especially since in the AI space and in software engineering generally everything is interesting. In the language model space, I've been calling it recursively interesting because any aspect that you look at, like audio models that can process images or how does the training work or how does fine tuning work, just raises more questions. You can just keep on getting deeper and deeper and deeper into any of these spaces. So I don't think I have a good answer to that question, to be honest. Like I've been kind of coasting on the fact that I don't have financial incentives that force me to do something and letting myself go run wild with all of these different projects. And my number one goal, to be honest, is I'd like to be more disciplined in terms of saying, okay, here are the big goals. How can I go after those? I do have a goal at the moment. So my main software, Datasette, it's for journalists to try and find stories in data. My ambition is I want someone to win a Pulitzer Prize for a piece of investigative reporting where my software was one of the tools that they used. So I want Datasette to be part of the mix in some Pulitzer Prize winning investigative reporting. And that's useful because I can say to myself, okay, am I building the right features? Am I engaging with the right people? Am I making sure it's easy enough to use and all of that kind of stuff. So that's one of my sort of guiding ideas at the moment.
And so I can ask myself, is the thing I'm working on right now on the path to somebody else winning a Pulitzer using my software? But yeah, I could do with. I sometimes wish I was like raising money from investors just so I had somebody breathing down my neck saying, you said you were going to get this thing done. This is the focus. Have you done it yet? But yeah, I'm still completely, I'm free of all influence at the moment.
Ronak
It seems like in many of these cases creating some sort of forcing function helps. Could be, like you said, context.
Simon Willison
Absolutely.
Ronak
By the way, you mentioned the startup that you ran with your wife which got acquired. Congratulations. I think it got acquired by Eventbrite if I remember it correctly.
Simon Willison
That's right, yes.
Ronak
And you also were at Eventbrite for some time after that, is that true?
Simon Willison
I think yes. Six years at Eventbrite, Yes.
Ronak
Yeah. And then you decided to go independent. So I'm curious what prompted that decision to then not continue on the job? Because looking at your career, I think you could have gotten any job if you didn't want to continue at Eventbrite, but you chose to be independent. I'm curious what prompted that decision.
Simon Willison
So what happened is I was at Eventbrite for six years. I was a director of engineering focusing on APIs and scaling and internal platform stuff and so forth. And then later on I moved into more of a prototyping and R&D role, which sort of suits my interests a little bit better. I had this opportunity come up where the University of Stanford have a fellowship program for journalists. It's called the JSK Fellows, and the idea is they take sort of mid career journalists and they pay them to spend a year on campus at Stanford effectively working on a project that is beneficial to the future of news, and that's very, very loosely defined. And I heard about this thing and I got in touch with them and said, well, I'm not technically a journalist but I've worked in a lot of newsrooms, I've worked for newspapers, I build tools for journalists, I'm effectively a data journalist. Could I be a good fit for this program? And so I ended up being the person on this program who was a bit of the wild card. Right. I wasn't formally a journalist but I was working on journalist adjacent projects, and it was amazing and it completely ruined me because they paid me to spend a year working on whatever I thought was most interesting. And once you've done that, it's very difficult to go back to having somebody else define what it is that you were going to do. So basically that was the problem, that I experienced freedom for a year and I'm like, I do not want to give this up. I'm having so much fun working on these things.
Ronak
So that's amazing. And the last group question I had on this topic was you mentioned running the startup with your wife. Now, in many cases, this equation doesn't always work as productively, where people who are partners or who live together don't always end up working well together because of all sorts of frictions. I'm curious how so.
Simon Willison
We'd been together for at least 10 years at that point before we got married and we had worked on projects together before. We'd worked at the same companies in some situations. We had a whole bunch of little side projects that we'd built collaborating together, and that was really important. We already knew that we could work together. We had very complementary skills. Like I do back end development and systems operations, she does design and front end engineering. So between the two of us, we can build a really good sort of web application together. And the project that ended up being a startup started as a side project, actually started on our honeymoon. We got married and we set off on honeymoon where the plan was to travel around the world for like a year plus with our laptops, occasionally maybe doing a little bit of freelancing work remotely to keep money coming in. And we got as far as Morocco, amazing place to travel. We got food poisoning in Casablanca and it was during Ramadan, and Casablanca is not really on the tourist trail in Morocco. So during Ramadan everything shuts down. And so we basically had to rent ourselves an apartment so we could cook for ourselves to try and get through this. And since we were stuck there for two weeks, we said, okay, we've got this idea for a website to show what conferences our friends are going to. Let's build that as a little project and put it live. This was 2010, we were building this, and we built it on top of Twitter. We're like, hey, Twitter knows who our friends are. And we follow people who we like on Twitter. It was called Lanyrd. And the idea was you sign in to Lanyrd with Twitter and it goes, oh, you follow these 50 people. They are attending or speaking at these 10 conferences. Here are conferences you should know about.
And it worked extremely well because it turns out people who speak at conferences have a lot of Twitter followers. And we actually built the database where we'd say, oh, so-and-so is speaking at this conference, even though they weren't a user of the site yet. So when we launched, we had like 100 speaker profiles and zero users, like just the two of us. But that was enough that if anyone signed in who followed one of those speakers, they'd get a recommendation, which felt like magic. People were like, oh, my God, this thing knows everything. It's 100 rows in a MySQL database. That's the whole thing. But it worked. Right. And so we ended up applying to Y Combinator, the startup accelerator. From Cairo? No, from Luxor in Egypt. So there's a video out there which is Natalie and myself standing in front of this ancient Egyptian temple pitching our YC idea. We don't mention the temple at all. We just played it completely cool that there was this. No, that was in Aswan. It was the Aswan temple behind us. That was kind of fun, right? And so we applied to Y Combinator. We got in. Our honeymoon turned into three months in Mountain View in California doing Y Combinator, which was a little bit different. And then we raised money from that. We hired a team in London. We spent three years sort of building the startup before we got acquired by Eventbrite, who moved us out to California. So that's how we moved to America. But, yeah, so it was a fun startup experience. And like I said, it started on our honeymoon. It was a good thing that we'd been together 10 years already and we knew we'd worked on projects, because it's a tough thing. You know, when you're literally married to your co-founder, you have to set rules, like no talking about the company beyond, like, six in the evening, that kind of thing, which we did not stick to.
Ronak
But it's hard.
Simon Willison
That was the thing. And so Natalie wrote up a really good story of the whole sort of startup story, which I can share a link to as well.
Ronak
Oh, yeah, for sure.
Gwan
That is really cool.
Ronak
That's a fascinating story. Thanks for sharing. So, Simon, this has been an amazing conversation. Thanks for spending way more time with us than we actually planned for. We had a blast. We got to learn a lot, listen to a lot of good stories, and learn about how you use these tools. Is there anything else you would like to add before we.
Simon Willison
I think yeah, the one thing I'll add is as practitioners using LLMs and like using AI, we understand this stuff better than 99% of the population, which I think puts a responsibility on us to figure out the positive ways of using this and then to share that. Like my sort of overall approach to ethics around this is that we're not going to uninvent this technology. So if we can figure out what other things we can do that generally enhance people's lives that make the world a better place, those positive impacts. And if we stay away from generating like garbage slop and dumping that on people, that feels right. So I feel good about the way I'm interacting with these tools mainly because I'm trying to help other people learn how to use them effectively and sort of get over the kind of weird, like weird science fiction fear of this stuff and say, okay, these are quite dumb. They are good at these certain things. If you put the work in to learn how to use them, they can have a really positive impact on what you're doing.
Ronak
That's really well said. Thank you so much, Simon. This has been an amazing conversation.
Simon Willison
Thanks Simon. Thanks a lot. This has been really fun.
Ronak
By the way, didn't mention this before, but I would say two things that I saw in your talk and we'll link to that in the show notes too. Instead of generative AI, I think you called it transformative AI, which is pretty amazing. And instead of artificial intelligence, you said imitation intelligence, which I thought is so accurate. So thank you for those terms. Now I'm thinking of the third thing too. You also coined the term prompt injection, which was.
Simon Willison
Oh yes, we haven't talked about that yet. Yeah, prompt injection, it's the security attack against applications built on top of models, which we won't go into now. But if you are unaware of prompt injection, you will build stuff with horrifying security holes in. So you need to learn about this one. And then, yeah, the imitation intelligence. I owe the world a full write up of this. It's an idea I threw out in a PyCon talk a few months ago. Yeah, I feel like artificial intelligence has all of these sort of science fiction ideas around it. People will get into heated debates about, I don't think this is artificial intelligence, or all of that kind of stuff. I like that. So I've been thinking about it in terms of imitation intelligence because everything these models do is just imitating something that they saw in their training data, and that actually really helps you form a mental model of what they can do and why they're useful. And it means that you can think, okay, if the training data has shown it how to do this thing, it can probably help me with this thing. If you want to cure cancer, the training data doesn't know how to cure cancer, so it's not going to come up with a novel cure for cancer just out of nothing. And then what was the other one?
Ronak
The other one was transformative AI.
Simon Willison
Oh, yes. I feel like when you call something generative AI, that instantly makes people think, oh, it just generates random rubbish. Right. Okay, it'll cheat and write an essay for you, but, oh, it'll create horrifying images. But is that really that valuable? The most interesting applications of these tools are transformative. It's when you feed in the transcript of a podcast and say, hey, pull out anything that should be in the show notes, which I always do for these kinds of things now. That kind of stuff is so much more interesting to me. And so, yeah, I like that idea of emphasizing that. It really is like, what you get out is as good as what you put in. But you can put in a lot of stuff. There's a lot of interesting applications where you just pump in a bunch of things, ask the right questions, and you'll get much more reliable, interesting results out of it that way.
Ronak
Well, as we're finding out, there's a lot more to talk about and we hope there is a second time and you come back on the show. But today. Thank you so much, Simon. This was amazing.
Simon Willison
Thanks a lot.
Ronak
Hey, thank you so much for listening to the show. You can subscribe wherever you get your podcasts and learn more about us at softwaremisadventures.com. You can also write to us at hello@softwaremisadventures.com. We would love to hear from you. Until next time, take care.
Hosts: Ronak Nathani, Guang Yang
Date: September 10, 2024
In this engaging episode, Ronak and Guang sit down with Simon Willison, the creator of Datasette and a prolific independent developer, to discuss the evolution of large language models (LLMs), their everyday use, best practices in documentation, and the future (and challenges) of LLM-powered software development. Simon offers his unique perspectives on productivity, open source life, prompt engineering, and why LLMs are best understood as "over-confident interns." The conversation covers surprising moments in AI's evolution, practical workflows, the impact on programming culture, and the responsibilities of practitioners.
"It's like having an intern who has read all of the documentation and memorized the documentation for every programming language and is a wild conspiracy theorist and sometimes comes up with absurd ideas and they're completely, massively overconfident. It's the intern that always believes that they're right." — Simon Willison (00:00)
"You can be like, do it again. Do that again. No, that's wrong. And you don't have to feel guilty about it... One of my favorite prompts is you just say, do better, and it works." (00:00, 78:54)
"I actually started playing with GPT2 back in 2020... I tried to use it to generate New York Times headlines for current affairs... but it never felt really like an AI. It wasn't like you were conversing with something." (01:51)
"All they did is they slapped the chat interface on top of their existing model... ChatGPT was an experimental prototype and a bunch of people inside of OpenAI thought it was a bad idea... And it was, I think it's the fastest growing consumer application in the history of the world." (03:37)
"Since January 1st, I've been trying to post something on my blog every single day... It's been an accountability mechanism for me for wider work for a few years now because I'm now sort of independent." (05:40)
"With a TIL blog, no, you don't [have to say something new]... I'm publishing it—honestly, it's mainly for me." (08:44)
"All of my work that I do, software work and a lot of my other stuff as well is in GitHub issues... Every single one of my projects has a very active GitHub issues setup." (11:27)
"The issues are the design documentation effectively... The only problem with design documentation... is if it falls out of sync with the code, then people lose trust in it." (15:38)
"Issues are a blog, right? An issue thread is basically a one off blog for the story of this change the story of this feature." (18:59)
"I'm using Substack as a free mechanism to let people subscribe to my blog via email... I built myself a little tool... that pulls all of the content from my blog, reformats it into like HTML rich text, and then gives me a big copy button." (20:50)
"There's something sort of wholesome about having a little corner of the Internet that's just for you like that." (22:58)
llm CLI tool: invoke LLMs from the terminal, with piping, logging, and plugin architecture.
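The piping pattern can be sketched in a few lines of Python. This is only an illustration of the "content in, transform, content out" shape, not how the llm tool itself is implemented: `run_filter` and `shout` are hypothetical names, and `shout` is a stand-in that upper-cases text where a real pipeline would call a model.

```python
import io
import sys
from typing import Callable, TextIO


def run_filter(source: TextIO, sink: TextIO, transform: Callable[[str], str]) -> None:
    """Read everything from `source`, transform it, and write the result to `sink`."""
    sink.write(transform(source.read()))


def shout(text: str) -> str:
    """Hypothetical stand-in for a model call: it only upper-cases the input."""
    return text.upper()


# In-memory demonstration of the same flow a shell pipe would provide.
demo_output = io.StringIO()
run_filter(io.StringIO("summarize this\n"), demo_output, shout)
```

To use it as a real shell filter, the same function would be called as `run_filter(sys.stdin, sys.stdout, shout)`, with `shout` swapped for whatever function sends the text to a model.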
"The UNIX piping idea is always like, you get some content, you pipe it into another thing which transforms it, you pipe it back out again. That's all language models are." (28:52) "I grabbed this beautiful three letter... so pip install llm is how you get it." (29:38)
"You need to build this really, really deep model of what they can and can't do... One of the lessons I think people need to learn as quickly as possible is you've got to run prompts where it gets the answer wrong in a really confident way. Like the earlier do that the better..." (33:31, 36:43)
"I feel like language models could be the key to unlocking this... I continually hear from people who... are using these tools on a daily basis, who've never programmed before, but now they can do stuff." (43:40) "The fact that you've got a semicolon error and paste it into ChatGPT, it tells you the fix. So it's like having a teaching assistant on hand 24 hours a day..." (44:30)
"GitHub Copilot was the first mainstream non chat-based... That interface, the gray text which you get to approve, seems so simple and obvious now." (47:46)
"Ask a question in English and have... Claude Haiku... turn [it] into a SQL query and then it'll run that SQL query..." (51:42) "You can define a SQLite table... and it will populate the database for you. That works so well." (58:40)
"For me, prompt engineering is about figuring out... for a SQL thing we need to send the full schema and these examples... That's engineering. It is engineering. It's complicated." (53:32)
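As a rough sketch of what "send the full schema and these examples" could look like for a SQLite database: the function names, the prompt wording, and the example question/SQL pairs below are all assumptions made for illustration, not Simon's actual prompts. The code only assembles the prompt text; sending it to a model is left out.

```python
import sqlite3
from typing import Sequence, Tuple


def schema_of(conn: sqlite3.Connection) -> str:
    """Return the CREATE TABLE statements for every table in the database."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master "
        "WHERE type = 'table' AND sql IS NOT NULL ORDER BY name"
    ).fetchall()
    return "\n\n".join(row[0] for row in rows)


def build_sql_prompt(
    conn: sqlite3.Connection,
    question: str,
    examples: Sequence[Tuple[str, str]],
) -> str:
    """Assemble a prompt from the full schema, worked examples, and the question."""
    parts = [
        "Write a SQLite SELECT query that answers the question.",
        "",
        "Schema:",
        schema_of(conn),
        "",
    ]
    # Each example is a (question, SQL) pair showing the expected output style.
    for example_question, example_sql in examples:
        parts += [f"Question: {example_question}", f"SQL: {example_sql}", ""]
    # End with the real question and an open "SQL:" for the model to complete.
    parts += [f"Question: {question}", "SQL:"]
    return "\n".join(parts)


# Hypothetical demo database and example pair.
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT)")
demo_prompt = build_sql_prompt(
    demo_conn,
    "How many people are there?",
    [("List all names", "SELECT name FROM people")],
)
```

Reading the schema from `sqlite_master` rather than hard-coding it keeps the prompt in step with the database as tables change, which is one reason to treat this as engineering rather than one-off prompt writing.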
"It's vibes based. It's frustrating. I wish I had better benchmarks of my own to try these things out." (79:59)
"I can get so much more stuff done that I'm willing to pay with a little bit of my soul... It's making me ambitious. I'm writing software I would never have even dared to write before..." (86:09)
"The moment it turns into a project I'm actually going to try and commit to, I start a GitHub issue for it." (73:32)
"I want sociology papers. I want all of the sort of humanities doing research into the impact on these things. How do people learn to use them?" (91:58)
"The fastest way to learn anything in software is to build something with it... If a candidate can show me stuff that they've built that's worth more to me than any degree... The two easiest forms of blogging are something that I learned or something that I built." (95:29)
"What to work on next. Prioritisation is so difficult when you don't have any external sort of forcing factors." (101:09) "My ambition is I want someone to win a Pulitzer Prize for a piece of investigative reporting where my software was one of the tools that they used." (103:25)
"I experienced freedom for a year and I'm like, I do not want to give this up. I'm having so much fun working on these things." (105:39)
"We understand this stuff better than 99% of the population, which I think puts a responsibility on us to figure out the positive ways of using this and then to share that." (111:31)
On LLMs as overconfident interns:
"It's like having an intern who has read all of the documentation and memorized the documentation for every programming language and is a wild conspiracy theorist and sometimes comes up with absurd ideas and they're completely, massively overconfident." — Simon (00:00)
On blogging as learning:
"Writing is thinking and it's such a great way of forcing you to structure your thinking. You know, the best way to learn something is to try and explain it to somebody else." — Simon (05:40)
On Unit Testing and Documentation:
"It feels like writing all of these notes should slow you down. It's the opposite. It speeds you up." — Simon (11:27)
On Chat Interfaces:
"It's a joke. It's an absolute joke that we've got this incredibly sophisticated software and we've given it a command line interface and launched it to 100 million people. What were we thinking?" — Simon (40:31)
On Prompt Engineering:
"Prompt engineering is about figuring out, yeah, okay, for a SQL thing we need to send the full schema and we send these three examples and these three responses, we need to prompt it in this specific way. That's, that's engineering." — Simon (53:32)
On "Slop" and the Dark Side:
"Will we just be flooded in slop and be like wow, I wish nobody'd ever invented this stuff at all. And it's harder for me to evaluate that because I think programmers are the best equipped to use these tools..." — Simon (91:27)
On Responsibility:
"If we can figure out what other things we can do that generally enhance people's lives that make the world a better place, those positive impacts. And if we stay away from generating like garbage slop and dumping that on people, that feels right." — Simon (111:31)
"Yeah, I got 1.4 million hits on a page from that one. And yeah, without Cloudflare, I would have instantly melted."
"The idea that code is now goop. As a programmer, that offends my very soul. Like, that's, that's sort of horrifying."
"Copy and paste is the best API, right? Copy and paste. Copy and paste half a million tokens of information about that person in I am certain you'd get good results out of that."
For further reading and Simon's tools (for example, pip install llm, or Simon's GitHub), refer to the resources and show notes.