Transcript
B (0:12)
We're here at NeurIPS with John Yang of SWE-bench and many other things. Welcome.
A (0:17)
Thanks so much for having me. Yeah, really happy to be here.
B (0:20)
Last year I talked to Ofir and, I think, Carlos as well, one of your co-authors. How's SWE-bench doing, just generally? The project is, like, one and a half years old.
A (0:31)
Yeah, I think one and a half years old in terms of when it was actually useful. We put it out in October 2023, and then people didn't really touch it too much. And then, of course, Cognition came on the scene, and Devin was an amazing release. I think after that, it kind of kicked off the arms race.
B (0:48)
Did they tell you beforehand, or did they just show up?
A (0:50)
You know, I got an email about, like, two weeks before. I think it was from Walden. He was like, "Hey, you know, we have a number on it." I was like, "Wow, congrats. Thanks for using it." And then the release was, like, mind-blowing. I was like, wow, these guys did an excellent job.
B (1:05)
Yeah, amazing. And then SWE-bench Verified was, like, maybe last year.
A (1:09)
That's right. Yeah.
B (1:11)
Catch us up on this year. You have other languages. There's a whole bunch of varieties of SWE-bench now. So what should people know?
A (1:19)
Yeah, for sure. I think there's a couple of extensions that have happened. One is, like, more SWE-benches: SWE-bench Pro, SWE-bench Live.
B (1:27)
Oh, SWE-bench Pro. Was that with you guys? Because it looks independent. It's, like, different authors.
A (1:31)
It's completely independent. Yeah.
B (1:32)
So they just called it SWE-bench Pro without your blessing?
A (1:35)
Yeah, I think we're okay with it. When it came out, we were like, oh, cool, interesting. It would have been, you know, fun to be part of it. But, I mean, congrats to them. It's a great benchmark.