Dagger, with Solomon Hykes - Kubernetes Podcast from Google

Kubernetes Podcast from Google: Episode Summary

Title: Dagger, with Solomon Hykes
Hosts: Abdel Sagiwar, Kaslan Fields
Release Date: September 17, 2024

Introduction

In this episode of the Kubernetes Podcast from Google, hosts Abdel Sagiwar and Kaslan Fields engage in an insightful conversation with Solomon Hykes, the co-founder of Dagger and the renowned creator of Docker. The discussion delves into the evolution of CI/CD pipelines, the innovative approach Dagger brings to the table, and Solomon's perspectives on the open-source business landscape.

News Roundup

Before diving into the interview, Abdel and Kaslan provided a concise update on recent developments in the Kubernetes ecosystem:

Kubeadm Configuration Update (00:31):
- Kubernetes 1.31 introduces a new configuration version for Kubeadm.
- Notable Quote: Kaslan Fields mentions, "The old version, v1beta3, is officially deprecated and will be removed after three minor Kubernetes versions."
Kubernetes 1.32 Release Cycle (00:52):
- The 1.32 release cycle commenced on September 9th, with an anticipated launch on December 11th, 2024.
- Notable Quote: Abdel Sagiwar expresses optimism, "We wish the release team good luck and look forward to interviewing the release lead."
CKA Exam Updates (01:05):
- The Certified Kubernetes Administrator Certification is undergoing changes in competencies required for passing.
- These updates will be effective no earlier than November 25, 2024.
Generative AI Survey (01:24):
- The CNCF and Linux Foundation Research are conducting a 2024 survey to understand generative AI's deployment and challenges in organizations.
Azure Container Networking Enhancements (01:50):
- Microsoft's Azure Container Networking Team announced new features for Advanced Container Networking Services, including fully qualified domain name filtering to enhance security.

Interview with Solomon Hykes

Introduction to Solomon Hykes and Dagger [02:15 – 04:35]

Abdel Sagiwar introduces Solomon Hykes, highlighting his pivotal role in creating Docker and co-founding Dagger. Solomon begins by defining Dagger as an engine that can run pipelines and containers anywhere, aiming to streamline and standardize CI/CD processes.

Notable Quote:
Solomon Hykes at [03:02]:
"Dagger is an engine that can run your pipelines and containers and can run them anywhere."

Challenges in Traditional CI/CD Pipelines [04:35 – 10:16]

Solomon elaborates on the complexities and inefficiencies of traditional CI/CD pipelines, comparing them to the early days of application deployment before Docker standardized environments. He emphasizes that while applications have become more portable, pipelines remain rigid and often become a tangled mess over time.

Notable Quote:
Solomon Hykes at [05:14]:
"The pipeline logic, the logic that builds and tests and deploys and automates all the tasks to get your application ready and deployed, that's basically its own application now. It's very complicated."

Dagger's Distinct Approach [10:16 – 22:41]

Dagger differentiates itself by treating CI/CD pipelines as applications that are programmable through SDKs available in languages like Python, Go, and TypeScript. This programmable nature allows teams to define and manage pipelines using familiar coding practices rather than rigid configuration files like YAML.

Notable Quotes:
Solomon Hykes at [05:14]:
"So containerizing a CI/CD pipeline is not the same as containerizing a web application. If it were, people would do it already."

Solomon Hykes at [03:33]:
"Dagger is an attempt at fixing that, and we're doing it in a way that's very similar to how Docker actually fixed very similar problems for the application."

The OS for Pipelines Concept [19:27 – 26:50]

Solomon introduces the concept of Dagger as an "OS for pipelines," leveraging Directed Acyclic Graphs (DAGs) to model and execute pipeline tasks concurrently. By building on BuildKit—the engine behind Docker builds—Dagger provides a robust framework for defining, running, and managing complex CI/CD workflows.
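
To make the DAG idea concrete, here is a minimal sketch in plain Python of running a graph of tasks so that independent tasks overlap. It is not Dagger or BuildKit code: the `run_dag` helper, the task names, and the thread pool standing in for containers are all assumptions made for illustration, and it needs Python 3.9 or later for `graphlib`.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter


def run_dag(tasks, deps, max_workers=4):
    """Run each task as soon as its dependencies finish.

    tasks: task name -> zero-argument callable (stand-in for a containerized step)
    deps:  task name -> set of task names it depends on
    Returns the task names in the order they completed.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    completed = []
    running = {}  # future -> task name
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Submit everything whose dependencies are already satisfied.
            for name in sorter.get_ready():
                running[pool.submit(tasks[name])] = name
            # Block until at least one running task finishes.
            done, _ = wait(list(running), return_when=FIRST_COMPLETED)
            for future in done:
                future.result()  # re-raise any exception from the task
                name = running.pop(future)
                completed.append(name)
                sorter.done(name)  # may unlock downstream tasks
    return completed


# Hypothetical pipeline: "build" and "lint" only need "source", so they may overlap.
deps = {
    "source": set(),
    "build": {"source"},
    "lint": {"source"},
    "test": {"build"},
    "deploy": {"test", "lint"},
}
tasks = {name: (lambda n=name: print("running", n)) for name in deps}

if __name__ == "__main__":
    print(run_dag(tasks, deps))
```

The scheduler never needs to be told which steps can run in parallel; that falls out of the dependency edges, which is the property the episode attributes to modeling a pipeline as a graph.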

Notable Quote:
Solomon Hykes at [19:27]:
"So we're thinking of this as an OS for pipelines."

Developer Experience and SDKs [26:50 – 37:31]

The conversation shifts to the developer experience, emphasizing the importance of allowing pipeline definitions in multiple programming languages. Initially, Dagger required pipeline definitions to be written in the CUE configuration language, but the team pivoted to generating SDKs in imperative languages to reduce friction and cater to developers' existing language preferences.

Notable Quotes:
Solomon Hykes at [16:27]:
"Every pipeline system that starts out as a strict nice, pleasant configuration gradually devolves into trying to be code."

Solomon Hykes at [25:18]:
"Now, Dagger is an engine that runs your pipelines as a graph of concurrent tasks with data flowing through them."

Daggerverse: A Marketplace for Dagger Modules [44:00 – 62:10]

Daggerverse is introduced as a marketplace for sharing and discovering Dagger modules—extensible components that allow users to build and compose pipelines efficiently. Unlike Docker Hub's binary-centric approach, Daggerverse focuses on source code, enhancing transparency and trust.

Notable Quote:
Solomon Hykes at [44:07]:
"We created a system where you can package your Dagger functions written in your language into what we call a module."

Integration with Infrastructure as Code [61:30 – 65:20]

Solomon discusses how Dagger complements Infrastructure as Code (IaC) tools like Terraform and Pulumi. While Dagger focuses on stateless, cacheable pipelines, IaC tools manage the synchronization of infrastructure state, making them harmonious components within a modern DevOps toolkit.

Notable Quote:
Solomon Hykes at [46:18]:
"Dagger is not an infrastructure as code platform, it's a pipeline platform."

Open Source Business Model [48:15 – 51:31]

Addressing the open-source business landscape, Solomon outlines Dagger's approach, inspired by Red Hat's model. Dagger maintains an open-source license for the engine, ensuring transparency and community trust, while controlling the trademark to maintain product integrity.

Notable Quote:
Solomon Hykes at [48:40]:
"Our model is the Red Hat model. We have a regular open source license with no plans of changing it and strict control on the trademark."

Key Takeaways

Standardization and Portability:
- Dagger seeks to bring the same level of standardization to CI/CD pipelines as Docker did for application deployment, ensuring consistency across environments.
Programmable Pipelines:
- By offering SDKs in multiple languages, Dagger allows developers to define pipelines using familiar programming paradigms, enhancing flexibility and reducing complexity.
OS for Pipelines:
- Treating pipelines as applications managed by an OS-like engine enables more efficient execution, caching, and scalability.
Community-Driven Development:
- Dagger emphasizes community-led growth, leveraging platforms like Discord for feedback and collaborative development.
Transparency and Trust:
- Through Daggerverse, Dagger promotes a source code-centric marketplace, addressing trust issues associated with binary distributions.
Complementary to IaC Tools:
- Dagger integrates seamlessly with Infrastructure as Code solutions, enhancing the overall DevOps ecosystem without overlapping functionalities.

Conclusion

This episode offered a deep dive into the innovative solutions Dagger brings to the CI/CD landscape, challenging traditional paradigms with a programmable, standardized, and developer-friendly approach. Solomon Hykes' insights shed light on the future of pipeline management, emphasizing efficiency, flexibility, and community collaboration. For DevOps professionals and Kubernetes enthusiasts alike, understanding Dagger's methodology provides valuable perspectives on optimizing CI/CD workflows in modern development environments.

Notable Quotes:

Solomon Hykes [03:33]:
"Dagger is an attempt at fixing that, and we're doing it in a way that's very similar to how Docker actually fixed very similar problems for the application."
Solomon Hykes [16:27]:
"The point of a pipeline is to connect pieces together. It's to be the glue, right?"
Solomon Hykes [19:27]:
"So we're thinking of this as an OS for pipelines."
Solomon Hykes [25:18]:
"Now, Dagger is an engine that runs your pipelines as a graph of concurrent tasks with data flowing through them."
Solomon Hykes [48:40]:
"Our model is the Red Hat model. We have a regular open source license with no plans of changing it and strict control on the trademark."

For more detailed insights and ongoing updates, listeners are encouraged to join Dagger's active community channels and explore the Daggerverse for a wealth of shared modules and collaborative opportunities.

Transcript

[00:00]
Abdel Sagiwar
Hi and welcome to the Kubernetes podcast from Google. I'm your host Abdel Sagiwar.
[00:04]
Kaslan Fields
And I'm Kaslan Fields.
[00:16]
Abdel Sagiwar
In this episode we speak to Solomon Hykes. Solomon is the co-founder of Dagger, and is probably best known as the creator of Docker. We spoke about Dagger, CI/CD and more.
[00:27]
Unknown
But first let's get to the news.
[00:31]
Kaslan Fields
Kubeadm is moving to a new configuration version with the release of Kubernetes 1.31. The new version, v1beta4, introduces some changes to the configuration file used to deploy Kubernetes with kubeadm. The old version, v1beta3, is officially deprecated and will be removed after three minor Kubernetes versions.
[00:52]
Abdel Sagiwar
The 1.32 release cycle for Kubernetes began on September 9th with an expected release date of December 11th, 2024. We wish the release team good luck and look forward to interviewing the release lead.
[01:05]
Kaslan Fields
The CNCF announced updates to the CKA exam. The Certified Kubernetes Administrator certification was one of the first to be available for platform administrators. The new updates introduce changes to competencies required for passing the exam and will go into effect no earlier than November 25, 2024.
[01:24]
Abdel Sagiwar
The CNCF and Linux Foundation Research are running a 2024 generative AI survey. The survey aims to understand the deployment, use and challenges of generative AI technologies in organizations and the role of open source in this domain. The target population for the survey is professionals familiar with the use of generative AI in their organizations. It should only take about 10 minutes to complete the survey. If you are interested in participating, you can find the link in the show notes.
[01:50]
Kaslan Fields
Microsoft's Azure Container Networking team has announced new enhancements to Advanced Container Networking Services. Advanced Container Networking Services is a new product offering designed to address the observability and security challenges of modern containerized applications. The updates include introducing fully qualified domain name filtering as a new security feature.
[02:12]
Unknown
And that's the news today.
[02:15]
Abdel Sagiwar
I'm talking to Solomon Hykes. Solomon is the co-founder of Dagger. He's probably known as the creator and co-founder of Docker, the tool that changed how developers package, run and distribute software in the last 11 years or so. His impact on our industry is undeniable and I'm incredibly honored to have him on the show today. Welcome to the show, Solomon.
[02:35]
Solomon Hykes
Thank you. Thanks for having me.
[02:37]
Abdel Sagiwar
I think that we don't really need to do any introductions. People pretty much know you. If you have used Docker, you have used your software before, right? But we are here today to talk about Dagger, which I learned about, and the more I dug into it, the more intriguing it became to me as a concept, especially what you're trying to achieve. Let's hear it from you. What is Dagger? How can you describe Dagger to people?
[03:02]
Solomon Hykes
Dagger is, well, it's an engine that can run your pipelines and containers and can run them anywhere. So that's our short version. It's most commonly used to improve CI, continuous integration, which a lot of times is a mess in a lot of software teams. It's something that you just kind of cobble together over time as you ship your app, and it just kind of gets more complicated and messy, and you ignore it because you have to move fast. And then eventually, if your project lives long enough, you can't ignore it anymore.
[03:33]
Unknown
Yes.
[03:33]
Solomon Hykes
And it breaks, you know, it becomes super slow or things just stop working. Maybe the person who wrote the first version is gone. Now it's kind of just glued together. And so Dagger is an attempt at fixing that, and we're doing it in a way that's very similar to how Docker actually fixed very similar problems for the application. So the insight is that the problems that you have in your CI CD pipelines are very similar to the problems that people used to have with their applications, starting with the fact that you can't run it locally and then trust that it'll run the same on the server. It was also very hard to run them across different servers. Right. So that. That was just reality. Before Docker, every server was kind of a unique snowflake. Right. And reproducing environments was very hard. That's still the reality today with your pipeline. So the application now is more portable, but the pipelines that deliver the application aren't.
[04:36]
Abdel Sagiwar
So it's interesting. The way you described Dagger when you started talking about it, you said it's a way to run pipelines. But from my understanding, at least when I was looking into it, it's also a way to define the pipeline, because you have the SDK components, which are available in TypeScript, Node.js and Go. Right?
[04:53]
Solomon Hykes
Python.
[04:53]
Abdel Sagiwar
No, Go, Python. Go, Python and JavaScript. And then you have the runtime part. Right? And that's quite different from how most CI tools are today, because they tend to be the runtime for the CI and then a configuration language, which is usually YAML or something. Right? So how do you think Dagger is different from how current CI tools work today?
[05:14]
Solomon Hykes
Yeah, the best parallel is to what happened with Docker and containerization about 10 years ago now. You had applications that were stuck on a server and they only worked on that server. That was a unique snowflake, very hard to reproduce environments. Right. And so the solution was to containerize. Let's lift that application and package it into some sort of portable format so that you can give it to another server, give it to another developer to run on their machine. It's a thing now. I can look at it and I can run it here and here. And there's a runtime that guarantees that the behavior will be the same. Like I was saying, the CI/CD pipelines have not benefited from that. So the application is now more portable across servers and from dev to production, but the pipeline isn't. It's stuck on the CI server. And so it's very hard, for example, to make changes and improvements to your CI/CD pipeline because you gotta push and pray. We call it push and pray. Right. You make a change in your YAML or Groovy or whatever. Git commit, git push, pray. Oh no, I made a typo. Start over. It's that way because CI started out as just a server. So the pipeline definition was just a configuration for a server that runs your builds. But now it's much more than that because applications got much more complicated. And so the pipeline logic, the logic that builds and tests and deploys and automates all the tasks to get your application ready and deployed, that's basically its own application now. It's very complicated. And if you look at it as an application, all of a sudden you see an application that is in desperate need of better tooling and a better development experience. So it's in the stone age compared to the application itself. So the starting point is how do we containerize that? And the reason it hasn't been containerized is that your pipeline is an application, but it's a really special kind of application.
So containerizing a CI CD pipeline is not the same as containerizing a web application. If it were, people would do it already. And the difficulty comes from a few places. One is that you can't just run the whole thing in one container. Right? You need to actually run each individual step of the pipeline in its own container. And then you need to orchestrate the movement of data, of artifacts flowing from one step to the next. And first generation container engines, Docker, et cetera, they don't know how to do that. They don't know how your pipeline works. So you can give them the whole thing and they'll run it or you can give them each step in a container and they'll run each step, but they don't know what's going on between the steps. And so really, they just don't know how your pipeline works. And so that's what gets us to your question, which is the coding part. The key to containerizing your pipeline so you can solve the problem of the pipeline not being portable, not being standardized, is to make the container engine, the engine that runs it, smarter. You need to be able to describe the pipeline to it. So you need an API. The dagger engine has an API that lets you describe your pipeline as a graph. Here's a step that does the build, here's the step that downloads the source code, here's the step that deploys whatever, and here's the linkage between them, here's the exact artifact that will flow through. Then once you have that API, on top of that, you add SDKs in native languages like Python, Go and TypeScript to allow the people who understand the pipeline best to describe it.
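As an illustration of "here's a step, and here's the exact artifact that will flow through", the sketch below describes a pipeline as named steps plus the links between them, then resolves a target by walking those links. It is a toy model in plain Python, not the Dagger API: `Step`, `execute`, and the step names are hypothetical, and the lambdas stand in for work that would really run in containers.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Step:
    name: str
    run: Callable[..., str]  # receives upstream artifacts, returns its own artifact
    inputs: List[str] = field(default_factory=list)  # names of upstream steps


def execute(steps: Dict[str, Step], target: str,
            artifacts: Optional[Dict[str, str]] = None) -> str:
    """Produce the artifact for `target`, resolving its upstream steps first.

    Each step runs at most once; its artifact is kept and handed to every
    downstream step that lists it as an input.
    """
    artifacts = {} if artifacts is None else artifacts
    if target not in artifacts:
        step = steps[target]
        upstream = [execute(steps, dep, artifacts) for dep in step.inputs]
        artifacts[target] = step.run(*upstream)
    return artifacts[target]


# Hypothetical pipeline: the strings stand in for real artifacts (files, images).
steps = {
    "source": Step("source", lambda: "src"),
    "build": Step("build", lambda src: f"binary({src})", ["source"]),
    "test": Step("test", lambda binary: f"report({binary})", ["build"]),
    "deploy": Step(
        "deploy",
        lambda binary, report: f"deployed({binary}, {report})",
        ["build", "test"],
    ),
}

if __name__ == "__main__":
    print(execute(steps, "deploy"))
```

Because the links are data rather than lines in a script, an engine can inspect them before anything runs, which is what the transcript means by making the engine "smarter" about what happens between the steps.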
[09:11]
Unknown
Yes.
[09:12]
Solomon Hykes
And that's the team developing the application that the pipeline will deploy. So that's the key. Right now you kind of have these silos, you have the people developing the application, and then you have the people creating the pipelines that will build and deploy that application, right?
[09:29]
Unknown
Yeah.
[09:30]
Solomon Hykes
And usually they don't really. They're not able to help each other because the pipeline is this complicated mess of YAML and shell scripts. It's its own machine, right?
[09:39]
Unknown
Yeah.
[09:40]
Solomon Hykes
So you got the DevOps team or the SRE team or the build team, the designated DevOps people in charge of that. But the problem is they're centralizing work that grows exponentially as your team grows. Because the tool chains of these teams is always evolving, right?
[10:01]
Unknown
Yes.
[10:01]
Solomon Hykes
Now there's an AI feature. So now there's all these new tools to do inference and set up models. And, you know, there's a whole new team, they're doing data engineering and Python now and all these tools, and they need those tools added to the CI CD pipeline, right?
[10:16]
Unknown
Yeah.
[10:16]
Solomon Hykes
But now they don't know how to change the CI/CD pipeline, so they're waiting for the DevOps team to do it. The DevOps team is not familiar with the new tooling, so they have to go and figure out, okay, how do I integrate this new tool chain into the CI/CD pipeline? So everyone's kind of stuck waiting on each other. You can fix that if you allow each team to program their pieces of the pipeline in the language that they're familiar with. So the Python team, a team that develops in Python, will also develop their pipeline logic in Python. Same thing for Go, TypeScript, et cetera.
[10:50]
Abdel Sagiwar
Right, right. So, I mean, there are quite a lot of interesting ramifications to what you described. I think the first one that jumped to mind is, even after Dockerization, when we got to the step where you could write your CI pipeline as a set of containers, where each task is an actual container, there is always a question of who is responsible for the container that executes that step. Is it the DevOps team in this case, or is it the developer team? Right? So what if the developers move from one version of Python to another? Who is going to update that container step? Whose responsibility is it?
[11:24]
Solomon Hykes
Right, yeah.
[11:26]
Abdel Sagiwar
And I think that's the other thing that came to mind. From the way you describe it, it sounds like with Dagger, you could potentially run the pipeline on any server. You don't need a CI server, right?
[11:38]
Solomon Hykes
Exactly.
[11:39]
Abdel Sagiwar
Like. Yeah, CI becomes just like any other application, as long as the engine is on that server. You could just run the CI there and it would just work.
[11:50]
Solomon Hykes
Right, exactly. That's a big part of the appeal, is basically unbundling CI, taking the logic from these pipelines that do valuable things, that can automate work in your software project, and separating it from a particular server and a particular infrastructure platform. And a lot of times those are proprietary platforms, right?
[12:15]
Unknown
Yeah.
[12:15]
Solomon Hykes
These pipelines will only work on machines operated by GitHub or CircleCI or GitLab or whatever. And there's a separation of concerns that's needed because a lot of times you want to run those pipelines locally, you want to shift left. And what happens usually is people already do. They just have to do the work twice. So there's the build, there's the pipeline, there's the official pipeline that runs when you push a new version of the code to the GIT server. And then there's the semi official or unofficial pipeline that is the set of shell scripts and make files and more that are glued together that you can run locally. They don't have feature parity.
[13:06]
Abdel Sagiwar
Yeah, they're not identical.
[13:07]
Solomon Hykes
They have a ton of drift, but fundamentally they aspire to be the same thing. It's just that tooling, you can't easily take the CI pipeline and run it locally, so people find a way. Again, it's exactly the same problem that we were addressing with Docker. That was just everyday life for the application. Just running it locally and having the same thing was just impossibly hard. And now we take it for granted. So we hope the same thing will be true for the pipeline, that your CI server is just a server that happens to be running this set of pipelines because you need the power. Or there's something about these servers that you like, but you could change tomorrow. You're not locked in.
[13:50]
Unknown
Yeah.
[13:51]
Abdel Sagiwar
On the topic of people handcrafting shell scripts and Makefiles, sometimes cloud providers would provide you with an emulator that emulates your CI pipeline locally.
[14:00]
Solomon Hykes
Yeah.
[14:01]
Abdel Sagiwar
Which of course means they then have to maintain two pieces of software.
[14:04]
Abdel Sagiwar
Right?
[14:04]
Solomon Hykes
Totally. Yeah, totally. And it's, you know, remember OpenStack? That was the same idea. Like, you know, it was considered very weird and unfamiliar, what we were doing with containers.
[14:15]
Unknown
Yeah.
[14:15]
Solomon Hykes
And I think, because it was very abstract initially when we said, don't you see this as not portable? Wouldn't it be cool if you could just run the same thing? And the answer was, oh, yeah, we're on that. We're just going to make the VM layer, the machine layer, very standard and open and easy to replicate.
[14:36]
Unknown
Right.
[14:37]
Solomon Hykes
And so, sure, you're running on a set of proprietary VMs over there, but we're going to force everyone to implement these standards and there's going to be this implementation that does the same thing as AWS, right?
[14:47]
Unknown
Yeah.
[14:48]
Solomon Hykes
But it turns out it's never actually the same thing. Plus it's not efficient to do it that way. You don't actually want all of that. If you look at. We look at CI pipelines all day long, that's all we do. We look at people's pipelines. And a typical CI CD pipeline today, it drags so much stuff. It's so heavyweight. There's so many dependencies. There's so much complexity. It's just layers and layers and layers of band aids. And at the bottom is always shell scripts, but there's always layers to hide the shell scripts and then another layer to abstract away the abstraction. And so who wants to run that locally? You know, I admire. You know, there's a project called act. You're probably thinking of that one. There's a few, but there's. Yeah, there's a project called act that aspires to run your GitHub Actions workflows locally.
[15:34]
Unknown
Yeah.
[15:34]
Solomon Hykes
And I could never do that. I mean, I admire people who just take on that project because it's an impossible target. Full compatibility with the system. That's not designed to be portable, but you try anyway. You go as far as you can. And then every day I'm sure someone brings this terrible Use case where they really need compatibility. And now you're just, you're debugging two things. You're debugging your CI pipeline and you're debugging the thing that promises compatibility. So yeah, yeah, that was application deployment 10 years ago and it's pipeline deployment today.
[16:10]
Abdel Sagiwar
Yeah, because we started talking about CI being configuration and shell scripts and Makefiles. So Dagger is primarily programming language driven.
[16:19]
Solomon Hykes
Yes.
[16:20]
Abdel Sagiwar
So there is an SDK, there are programming languages. What was the appeal for doing that instead of just another YAML?
[16:27]
Solomon Hykes
Well, that was really the starting point. You know, there's something about pipelines that just makes it impossible to solve the problem we want to solve without programmable pipelines. It has to be real code because of the very nature of a pipeline. The point of a pipeline is to connect pieces together. It's to be the glue, right? It's modular by nature. It's that build connected to that source control, connected to that deployment, et cetera, et cetera, et cetera. And the et cetera part is important because it's always changing, it's always growing. The I in CI is integration. There's a built-in network effect. You're always looking to connect another thing. The component system, the composition system is everything. It can't be this optional thing you tack on later. I think every pipeline system that starts out as a strict, nice, pleasant configuration gradually devolves into trying to be code. And sometimes they don't realize it, sometimes they have an epiphany like, oh shit, this should have been code. Okay, okay, God, let's try to fix it. But we're just embracing the software aspect of a pipeline from day one. So our starting point is day one, and we've been working for years on finding the right model. Okay, if I built an OS, a specialized OS for running pipelines, these kinds of pipelines, right?
[17:58]
Unknown
Yeah.
[17:58]
Solomon Hykes
Build, test deployment, but also data pipelines. Now there's AI pipelines and all of those are completely intertwined. Today. If you go to any software team's stack and you look at their pipelines, build, test deployment, data engineering, and now increasingly AI inference, fine tuning, whatever, those are not cleanly separated at all. They're like glued together. So if I were to design an OS that can run these things as an application, that I can program with a full blown software ecosystem where I'm just installing new components to my pipeline with the same level of productivity and ease that I would when I add a library to my mobile app or my web app. Yeah, okay, what would that look like? That's day one for us, right? That was day one, yeah. Otherwise it's a non starter. That's my opinion. You always, you run into a wall at some point. You run into some pipelines that you just can't deal with because you didn't, you didn't bake them, you didn't think of that particular shape of a pipeline. So you made assumptions in your pipeline system, in your runtime. Yeah, Every pipeline is going to be these three phases. You know, there's going to be this phase and then this phase. And then one day a user says, well, actually I need a sep. Completely different. I do this completely differently, you know.
[19:14]
Unknown
Yeah.
[19:14]
Solomon Hykes
And then you're screwed. So it's got to be. Yeah, it's got to be like an OS, something that's programmable with an API. And when I say programmable, like you can write code and there's a runtime and you know, system calls, et cetera.
[19:27]
Unknown
Yeah.
[19:27]
Solomon Hykes
So we're thinking of this as an OS for pipelines.
[19:30]
Unknown
Yeah.
[19:30]
Abdel Sagiwar
And one of the first things I noticed when I was looking into the documentation, and this is one of my absolute favorite things in programming languages in general, is the function chaining, the "with".
[19:40]
Solomon Hykes
Right.
[19:41]
Abdel Sagiwar
Start with an image, do this, add a file, remove a file, do this, do this, do this, dot execute. What's your take? I mean, obviously you implemented it, so you like it. There are people who don't, for reasons. What's your opinion about that? What's your take? Function chaining, or using variables to pass stuff between steps in a function?
[20:00]
Solomon Hykes
Oh, I see, you're saying those are two opposites, like option A, option B.
[20:04]
Abdel Sagiwar
These typically tend to be the two sides of the conversation. There are people who like function chaining for readability, like me, and there are people who absolutely hate it.
[20:15]
Solomon Hykes
Oh, I see. There are design constraints here to get to the best system. Right. And the goal for us is the best OS to develop and run your pipelines. Right. I'm simplifying. We don't use OS in our docs, but it's helpful to think of it as an OS because these are applications. Right. These pipelines are applications. And so the starting point for us was what does the engine look like, what does the kernel look like? What's the execution model for a pipeline? Because pipelines are different from regular applications. And so the starting point is, okay, the ideal model for executing a pipeline, for modeling it, is a DAG, right?
[20:51]
Unknown
Yeah.
[20:51]
Solomon Hykes
That's why we're called Dagger. It's a directed graph. So it's really boxes and arrows. And a box is a task and the arrow is an artifact flowing from one task to the next.
[21:01]
Unknown
Yes.
[21:02]
Solomon Hykes
Or actually you could flip it. The box is the artifact and the arrow is the task that's transforming it. It works both ways, but the point is, it's a graph either way. And each task in the graph, we call them functions, is executed concurrently. Right. So concurrency, you know, running things in parallel, is baked in. And that dictated our choice of technology. So we use BuildKit as our kernel, which is the same tech that powers Docker build. So when you write a Dockerfile, internally Docker build will convert it to this graph definition that then is executed by BuildKit. And we discovered BuildKit can do way more than builds. It's more of a general purpose DAG execution engine. And so you could think of Dagger as what if you built an OS on top of BuildKit, and instead of just running builds, you ran the entire pipeline.
[21:59]
Unknown
Yeah.
[22:00]
Solomon Hykes
Turns out it works great. So that's our starting point. Okay. We have this engine. It's the most powerful way to model and run pipelines. You get all these benefits. It's faster caching everywhere. You get all these benefits. Okay, but how do I program it? You know, so we have. This is the best way to run pipelines as a dag, with parallel tasks and then separately. The other insight, the only way to really solve pipeline development and pipeline deployment for everyone is to have a programming model that's real software where people can actually exchange components, reuse each other's codes, you know, so you can really take any pipeline and run it on Dagger. Right. So you need both of these things.
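The "caching everywhere" benefit mentioned here can be pictured with a toy example: if a step's result depends only on its name and its inputs, the engine can skip the step whenever it has already seen that combination. The sketch below is an assumption-laden illustration in plain Python, not how BuildKit or Dagger actually implement caching; `CachingRunner` and its key scheme are hypothetical, and inputs are assumed to be JSON-serializable.

```python
import hashlib
import json


class CachingRunner:
    """Runs steps, reusing the stored result when name and inputs are unchanged."""

    def __init__(self):
        self.cache = {}       # cache key -> stored result
        self.executions = 0   # how many times a step actually ran

    def run(self, name, func, *inputs):
        # The key covers the step name and every input, so changing either
        # one produces a different key and forces a fresh run.
        key = hashlib.sha256(json.dumps([name, inputs]).encode()).hexdigest()
        if key not in self.cache:
            self.executions += 1
            self.cache[key] = func(*inputs)
        return self.cache[key]


def build(source_version):
    # Stand-in for an expensive build step.
    return f"binary-from-{source_version}"


if __name__ == "__main__":
    runner = CachingRunner()
    runner.run("build", build, "v1")
    runner.run("build", build, "v1")  # same inputs: served from the cache
    runner.run("build", build, "v2")  # new input: runs again
    print(runner.executions)
```

This only works when steps are free of hidden state, which is one reason a graph of explicit inputs and outputs lends itself to caching.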
[22:41]
Unknown
Yeah.
[22:42]
Solomon Hykes
How do we connect them? How do you program this weird engine that runs things as DAGs and has a declarative API for modeling that DAG? That's the key thing. The engine is declarative. The API for the engine is declarative, always will be. Because you can't program a graph of concurrent tasks imperatively. Because an imperative model is you're telling one computer, do this, then do this, then do this. But the DAG is lots of little computers, basically, and each one is performing its tasks. So it's like a factory where there's a bunch of stations and the robots in the factory are each doing their thing in parallel. So you can't just write a python script to describe that or shell script. It doesn't map. So most of our Struggle before launching, looking for the right design has been figuring out this dilemma. How do you create a great programming model, a great developer experience for this inherently declarative engine? And initially we thought we found the solution. We found a declarative configuration language that was more powerful than the others. And it looked and felt kind of like a nice, familiar imperative language. So it was like YAML, but better. And you had reasonable components, you had templating, you had comments, you had lots of cool stuff that was a language called Q. So when we launched, we launched, the only way to program Dagger was to write these pipeline definitions in this language called Q. Mm. And then we launched on that. And then we spent basically six months supporting people and helping them build their pipelines. And we realized, okay, people just don't want to learn a new language. You know, they love the power of this engine, but it's just too much friction to have to learn this whole new language. And so we went back to the drawing board and we found a way to generate SDKs in a declarative language, sorry, imperative language like Python, go TypeScript, that could then query a declarative API. 
And the model for us is actually a very familiar model. There's a precedent, which is SQL. So when you're writing a web app in PHP and you make SQL queries, that's an imperative language dynamically calling a declarative API. SQL is a declarative language. And so in our case, we used GraphQL. GraphQL is also a declarative language. Turns out GraphQL is great for navigating graphs. Who knew?
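The SQL precedent mentioned here can be sketched in a few lines. This is only an illustration of an imperative host language issuing a declarative query, using Python's standard `sqlite3` module instead of PHP; the `builds` table and its contents are made up for the example and have nothing to do with Dagger itself.

```python
import sqlite3

# Imperative host code: open a connection and load some rows, step by step.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE builds (name TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO builds VALUES (?, ?)",
    [("api", "passed"), ("web", "failed"), ("cli", "passed")],
)


def passed_builds(connection):
    # Declarative query: it states WHAT rows we want, not HOW to scan the table.
    rows = connection.execute(
        "SELECT name FROM builds WHERE status = ? ORDER BY name", ("passed",)
    )
    return [name for (name,) in rows]


result = passed_builds(conn)
```

The Python code decides when to query and what to do with the answer; the database engine decides how to produce it. That split is the one described above for the SDKs and the GraphQL API.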
[25:17]
Happy questions.
[25:18]
Yeah. So summarizing all that, long story short: now, Dagger is an engine that runs your pipelines as a graph of concurrent tasks with data flowing through them, which is the best model. It's the optimal model for running a pipeline at a fundamental level. And that engine is driven by a GraphQL API that lets any client written in any language (you can do it in curl if you want, you can do it from a web browser) describe this DAG of tasks, and these tasks run in containers, et cetera. And so you have the full power of this engine available to you in this declarative API. And then on top of that, we have imperative SDKs that make it easy to query and extend that API in your language. So that's the stack. Back to your question. When you're at that point, making your way to a great developer experience for programming DAGs gets you to one of the options, naturally. It's not a matter of taste, like, oh, I like chaining or I don't like chaining. If you don't like chaining, you're going to hate the performance and extensibility of your pipeline, because that's what a pipeline is: you're chaining operations. So the point is it's not a matter of superficial, subjective preference. In my mind, there's one path that gets you to the solution and another that gets you back to where you started, just with another abstraction that won't scale.
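To make the "imperative SDK on top of a declarative API" idea concrete, here is a minimal sketch of function chaining that does no work itself and only builds up a nested, GraphQL-style query. The `Node` class is hypothetical, and the field names (`container`, `from`, `withExec`, `stdout`) are placeholders modeled loosely on a container pipeline; they have not been checked against Dagger's real schema or SDKs.

```python
class Node:
    """One point in a chain; immutable, so chains can be shared and branched."""

    def __init__(self, steps=()):
        self.steps = tuple(steps)

    def call(self, field, **args):
        # Chaining does no work: it only records another step in the description.
        return Node(self.steps + ((field, args),))

    def to_query(self):
        # Render the recorded chain as a nested, GraphQL-style selection.
        query = ""
        for field, args in reversed(self.steps):
            rendered = ", ".join(f'{key}: "{value}"' for key, value in args.items())
            head = f"{field}({rendered})" if args else field
            query = f"{head} {{ {query} }}" if query else head
        return f"{{ {query} }}"


# Hypothetical pipeline: the chain is a description, not an execution.
pipeline = (
    Node()
    .call("container")
    .call("from", address="alpine")
    .call("withExec", args="echo hello")
    .call("stdout")
)
query = pipeline.to_query()
```

Each chained call maps to one level of nesting in the query, which is one way to read the remark that a pipeline is chained operations: the chain is the graph description that an engine would then be free to schedule.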
[26:51]
Unknown
Yeah.
[26:52]
Abdel Sagiwar
I mean, it's definitely interesting to hear the logic from you, in the sense that you walked your way backwards toward the SDK: not starting with function chaining and then getting into how the engine is designed, but the other way around. Right?
[27:07]
Yeah, absolutely. Yeah. And I mean, we did not have any preconceived notion of, oh, this is the syntax we want, to the point that we started with a completely different language. Really, we're on a quest to find the best developer experience in a very empirical way. Which is why the community part is so important.
[27:28]
Unknown
Yeah.
[27:28]
Solomon Hykes
We call it community-led growth. So everything we do starts with a community. It's like the first feature of the platform. It's the same with Docker. So we're kind of refining and improving the model. So we have a Discord server that's very active. We have all these events and calls, and it's just a really fun and engaging place to be. You know, it's like a support group for traumatized pipeline engineers. We all talk about our terrible pipeline stories and how to fix them. But it's not just a gimmick that there's this community. It's that we need the constant feedback. So we're developing the thing in the open, and every day we're drinking from the fire hose of feedback. People take the new version and they go and try to improve their pipelines, and they come back and say, this was great, this didn't work. And we've been doing this for years. Starting with, hey, try this language. And then, okay, no. This, yes. This, no. And then we just keep going. So, yeah, no preconceived notion, but a very ruthless, pragmatic, empirical method of asking what's working here, what do people love, what's making them more productive. And we make assumptions, and sometimes we're right, sometimes we're wrong. But yeah, that's how we got to this point.
[28:45]
Sounds like I need to join the Discord server because there are people.
[28:48]
Oh, yeah, you should join. If any of this seems interesting to anyone listening, the one takeaway is you should join our Discord, because a lot of people in that Discord share your weird niche interest in DAGs and pipeline engineering.
[29:02]
And function chaining.
[29:03]
Yeah, function chaining. And if you hate it, come tell us why.
[29:06]
Unknown
Yes.
[29:07]
Abdel Sagiwar
So the next question was going to be, because you talked a little bit about the logic of starting from the core, the API, the declarative parts, and then the imperative parts with the SDK. One thing I noticed when I was looking at the documentation is that you can write your pipeline as a set of functions, and you can execute each function separately with the Dagger CLI, which is the equivalent of executing each step in a pipeline separately. But you could also execute just, quote unquote (I'm putting air quotes here), the last step, which, for example, for an image is publishing, and it will automatically resolve all the previous steps.
[29:43]
Right.
[29:44]
Can we talk about that? Because that's actually super interesting.
[29:47]
Yeah, I agree. And it sort of requires rethinking the model. You know, you have to change your frame for how you look at the problem, and then once it clicks, everything is much more fun. And so this frame that you're talking about, it's similar to just-in-time manufacturing.
[30:09]
Unknown
Yeah.
[30:10]
Solomon Hykes
You're familiar with that? Yeah. So it used to be you would just build 100,000 units at a time of this one car model, and then you store them in a warehouse, and then you wait for people to buy them, order them, or whatever. And then, you know, Toyota and this whole Kanban movement came in. And I guess now it has gone through a whole hype curve. But I mean, it was revolutionary, the idea that you would manufacture on demand. If your systems could support it, it required retooling everything, rethinking everything. But the efficiencies were massive because you just didn't have to deal with all this inventory, and you could adapt and be much faster. The parallel here is: start with what you need plus the DAG of all the dependencies, and then the engine will figure out what needs to be executed, or what has been executed before and can be loaded from cache. So you don't talk to Dagger in terms of what to do. You describe a full graph of what could be done, and then you say what you want, and from there the engine will give you what you want. So if you want to publish an image to this registry, say that. And Dagger knows exactly how to get you there, right?
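The pull model described here (say what you want, let the engine work out what must run and what can come from cache) can be sketched with a toy engine. Everything below is hypothetical and only illustrates the idea; the `Engine` class, step names, and string "artifacts" are assumptions for the example, not how Dagger is implemented.

```python
class Engine:
    """Toy pull-based engine: resolves a target on demand and caches results."""

    def __init__(self):
        self.steps = {}     # name -> (dependency names, function)
        self.cache = {}     # name -> previously computed result
        self.executed = []  # names actually run, in order

    def step(self, name, deps, fn):
        self.steps[name] = (deps, fn)

    def get(self, name):
        if name in self.cache:  # already produced: load from cache
            return self.cache[name]
        deps, fn = self.steps[name]
        inputs = [self.get(dep) for dep in deps]  # pull dependencies first
        self.executed.append(name)
        self.cache[name] = fn(*inputs)
        return self.cache[name]


# Describe everything that COULD be done...
engine = Engine()
engine.step("source", [], lambda: "src")
engine.step("image", ["source"], lambda src: f"image({src})")
engine.step("docs", ["source"], lambda src: f"docs({src})")
engine.step("publish", ["image"], lambda img: f"published {img}")

# ...then ask only for what you want.
first = engine.get("publish")
ran_first_time = list(engine.executed)
second = engine.get("publish")  # served from cache, nothing re-runs
```

Asking for `publish` runs `source`, `image`, and `publish`, and never touches `docs`, because nothing requested depends on it. The second request runs nothing at all.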
[31:26]
Unknown
Yeah.
[31:26]
Solomon Hykes
What's interesting is, over time, you start collapsing, because here's what happens. You start. Usually what happens is you have an existing set of pipelines.
[31:34]
Unknown
Yes.
[31:35]
Solomon Hykes
And then you start. For whatever reason, you start using Dagger. You want to simplify it or make it faster. And so usually you start with a small piece: you pick the pipeline that's the most painful and you daggerize it. We call it daggerizing. And you can do that very easily, very incrementally, because you don't have to throw away your existing CI. Right. It's just a tool that will run inside your CI, and over time you gradually eat your CI from the inside. It just becomes this envelope for running Dagger pipelines. Right. And then the same pipelines you can run locally, outside of CI. So as you do that, you start finding opportunities to collapse things with this on-demand model, this pull model. It's a pull model as opposed to pushing. For example, in a typical CI/CD pipeline these days, there are a lot of intermediary artifacts that are being built and then pushed somewhere, only to then have another pipeline pull them to do something else with them. Sometimes there are several steps of that, especially now that there are models and other additional layers. And we just kind of got used to this, like, oh, yeah, what's this pipeline's job? Oh, it's to build and push. What's this other pipeline's job? Oh, it's to pull and do something else and then push, or kubectl apply, or whatever. So what happens is you daggerize one. Great. You just made it more efficient. And then later you say, okay, let's daggerize more stuff. Let me daggerize this. So now you have a Dagger pipeline pushing to a registry and then another Dagger pipeline, triggered by some complicated system, that pulls from that registry to do something else. And at some point you're like, wait, why do I need this registry? It's just a cache.
[33:16]
Yes.
[33:16]
I'm just using it as a cache, literally. And I could remove another 500 lines if I just merge these two functions and call the second one, and it will, on demand, call the other one, and the artifact will just flow through. And Dagger has a cache. So now, literally, the artifact that you used to push and pull explicitly is in the cache.
[33:38]
Unknown
Yeah.
[33:39]
Solomon Hykes
So I think that's a really powerful mechanism, and I think it's going to take several years for it to play out. And it's not just Dagger, it's a general efficiency that just needs to be implemented, because this is so inefficient right now. And I think the end result is that the very concept of an intermediary CI server kind of goes away.
[34:00]
Unknown
Yeah.
[34:01]
Solomon Hykes
Because that whole thing in the middle is the embodiment of this push model. I'm developing here. So when I'm developing, I need to know the results of my tests, or I need to lint my code, or I need to do all these things. Right. So there are things that you need during development and then there are things you need at deployment. So really it's the production server that needs the final container or Kubernetes configuration or whatever to apply. And so ideally, and I don't think we're ready as an ecosystem yet, but eventually I think the production server can just ask for the exact artifact it needs, and then the pipeline to produce that artifact kicks off on demand at that moment. So whatever tests or builds or code generation steps or configuration generation, anything at all, can be kicked off on demand from deployment. So if you have development and then CI and CD, to simplify, I think development and deployment kind of eat CI. They each eat half of it and you end up with two things, development and production.
[35:21]
Yeah. And just to be clear, for people who are listening, about what we've been talking about for the last five minutes, I guess. Let's take, and you correct me if I'm wrong, a very simple example of a pipeline. You need to build an image, push it somewhere, test it and deploy it, or publish it. Well, let's say build, test, publish. Right. If you are using a typical pipeline, that would be three steps in YAML, you know: build, test, publish. With all this intermediary push and pull you're talking about, or it doesn't necessarily have to be a push and pull. It could be saving to a local folder shared across the steps. That's an example. Right. But with Dagger, what you would do is you would write your build function, write your test function, which depends on the build function, then write your publish function, which depends on the test function. And then once you execute the publish function, Dagger automatically knows, oh, this thing needs the test, which itself needs the build. And that's the graph part you talked about.
[36:22]
Exactly. It's really similar to targets in a Makefile or rules in Bazel. Right. So build systems have this similar DAG model with rules or targets.
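The build, test, publish example and the Makefile comparison can be sketched with Python's standard `graphlib` module. This is an illustration of dependency resolution in general, not Dagger code; the `dependencies` mapping and the `plan` helper are made up for the example.

```python
from graphlib import TopologicalSorter

# Each key depends on the names in its set, like a Makefile target and its prerequisites.
dependencies = {
    "build": set(),
    "test": {"build"},
    "publish": {"test"},
}


def plan(target, deps):
    """Return the steps needed for `target`, dependencies first."""
    needed, stack = {}, [target]
    while stack:
        name = stack.pop()
        if name not in needed:
            needed[name] = deps[name]
            stack.extend(deps[name])
    # static_order() yields every node after all of its prerequisites.
    return list(TopologicalSorter(needed).static_order())


publish_plan = plan("publish", dependencies)
test_plan = plan("test", dependencies)
```

Asking for `publish` yields build, then test, then publish; asking for `test` stops before publishing. The caller names only the final target and the ordering falls out of the graph.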
[36:34]
Exactly.
[36:34]
But it's that. But applied to the entire pipeline.
[36:38]
Yes. And I think the powerful part is the fact that you don't have to do that intermediary. What am I going to do with my artifact when I move from one step to another?
[36:46]
Right, yeah, yeah, you still can. I mean, of course. Yeah, there's an adoption part which is, I think. It's not like we had this novel idea and no one had thought of this before. It's hard to design and implement correctly, but we're hardly the first to ship a good implementation of this kind of a DAG model.
[37:04]
Right, yeah.
[37:04]
What's really hard is making it practical for people to adopt, and practical to adopt in a ubiquitous way. So it adapts to enough real software projects out there that you can reach critical mass. And in this case, the whole point is critical mass. If you don't have critical mass, then you're not useful. You can't justify the overhead of the complexity. So for us, making it very easy to adopt incrementally is very important.
[37:31]
Unknown
Yeah.
[37:32]
Abdel Sagiwar
And so my next question was going to be: you wrote all your pipelines and all your stuff, and then you're at the stage where you're going to use Dagger to run your pipeline locally. Your functions could be in camel case, which is, you know, each word has a capital letter. But then the CLI will execute them in kebab case, which I find super fun to write.
[37:54]
I see you've actually played with dagger.
[37:55]
Yes, I did, yeah. So kebab case basically means that if your function is called Publish with a capital P, you execute it as dagger run publish with a small p. But if it's a more complex name, it will put all the words in small letters and then chain them with dashes. Essentially, kebab case versus camel case. This is quite interesting because, I mean, if you take Java as an example, Java will have a flag where you can pass the class name as a full camel case class name.
[38:22]
Abdel Sagiwar
Right.
[38:22]
Abdel Sagiwar
So what was the logic there? Why kebab case versus camel case?
[38:26]
Yeah, well, first of all, this is a hotly debated topic.
[38:29]
All right?
[38:29]
It's a great question because it points at a really important dimension of Dagger, which is the importance of a cross-language ecosystem. So, you know, everyone needs a pipeline, and everyone wants to develop the part of their pipeline that's relevant to them in the language that's familiar to them. That means in order to actually solve the problem for everyone, Dagger has to work great across many languages, and it has to allow composition and linking of the different steps of the pipeline across languages. That's a really hard problem to solve. And at some point you're going to have a collision between the conventions and expectations of each language silo. Right. And so how to capitalize things is one area where you have this sort of culture shock.
[39:25]
Yes.
[39:25]
And the place where the culture shock takes place is Dagger. Right. So just to give a little context: Dagger is an engine. It has an API. And you call this API to describe what to do declaratively as a DAG. You do that by chaining functions. So if a graph has boxes and arrows, the boxes are function calls. You call these functions through this API, but what functions do you call? What's available to you? The Dagger API comes with batteries included. So there's a set of core types, and core functions attached to those types, for fundamental operations, and then you kind of build up from there. So the fundamental operations are pull a container image, run a container, move files around, git clone, things like that. Also networking: you can bind a port from one container to another, you can set up tunnels. And so from those building blocks you can build almost any pipeline. And we're still expanding that API, but a lot of it's there. Oh, there's secrets also. There's a core secret type, so you can safely pass secrets around, et cetera. So with that, you can write a client in Go or a client in Python, and then it calls that API and it does something cool. But very quickly what happens is you want to extend that API. You don't want to just run a cool pipeline from your little corner. You want to encapsulate that pipeline logic you just wrote and abstract it in a new type that you define, like a custom artifact that represents your Python project with a particular way of building it, or your deployment platform with a particular set of tokens you have to pass, whatever. And so the extension is key. And so we created a system where you can package your Dagger functions written in your language into what we call a module. And that module is basically an extension for the Dagger API. So the Dagger engine loads it and then it does magic.
And then if a client calls that engine and queries the API, your new types and your functions are also available. And then you can do that recursively. So you can load a module, that module itself depends on other modules, and those dependencies happen cross-language. So that's the key. So I can write a module in Python using the Python SDK, and then I can use someone else's module written in Go, et cetera, et cetera. Where everyone meets is this GraphQL layer. All the way back to your question: in each of these silos, we want developers to feel at home. When you write Dagger functions in Python, it should feel like real Python. Same in Go, same in each SDK. Then as an extra feature... I realize I'm talking a lot, maybe too much context.
[42:21]
No, I love it.
[42:22]
But I guess this will act as a reference. Then you can also call any of those functions from the CLI. So basically you can run dagger call and compose a pipeline dynamically from the command line by saying, call this function from this module and then chain to this, to this, to this. And in the CLI, there's also an expectation of capitalization, like you said, kebab case. It's weird for a shell scripter. A DevOps person writing YAML and shell scripts types shell commands all day, and it's very rare that they're capitalized, right? It's just weird. Like in Java, things are capitalized, and that feels weird to the shell scripter. We don't want it to feel weird, we want it to feel familiar. So what we do is we translate the capitalization. So you write your function name in Python with a Python convention, and then if someone calls that function from the command line, we'll expose the same function name, but in a shell-friendly capitalization. If a Go developer calls that function in their module, we're going to generate Go bindings for them that have the capitalization of Go, etc. So occasionally people get confused by that, like, why? Or we mess it up. There are lots of edge cases, like where do you put the dashes? Anyway, what a rabbit hole. What a fun rabbit hole. But it's worth it. That part works really well. It's what makes Dagger work. You know, cross-language composition is really hard. You know, there's gRPC.
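The name translation described here can be sketched as a small conversion function. This is one plausible set of rules, assumed for illustration; it is not Dagger's actual conversion logic, and the "where do you put the dashes" edge cases mentioned above (runs of capitals, digits) are exactly where real implementations may differ.

```python
import re


def to_kebab(name):
    """Convert camelCase, PascalCase, or snake_case to kebab-case."""
    # Insert a dash between a lowercase letter or digit and an uppercase letter.
    dashed = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "-", name)
    # Split a run of capitals before a capitalized word: "HTTPServer" -> "HTTP-Server".
    dashed = re.sub(r"(?<=[A-Z])(?=[A-Z][a-z])", "-", dashed)
    return dashed.replace("_", "-").lower()


examples = {
    name: to_kebab(name)
    for name in ["Publish", "buildAndPublish", "build_and_publish", "HTTPServer"]
}
```

A Go-style `Publish`, a camel case `buildAndPublish`, and a Python-style `build_and_publish` all end up as lowercase, dash-separated names that look natural on a command line.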
[43:49]
Unknown
Yes.
[43:50]
Solomon Hykes
There's just rust, I guess.
[43:51]
Unknown
Yeah.
[43:52]
Solomon Hykes
But yeah, I'm really glad with how it turned out.
[43:56]
Unknown
Yeah.
[43:56]
Solomon Hykes
I am coming from the shell world, so for me it felt familiar. That's why.
[44:00]
Yeah.
[44:01]
So you talked also about composition and the fact that you can reuse other people's functions or modules. And those are published into the Daggerverse?
[44:08]
Yes, well, yes, they're searchable in the Daggerverse.
[44:12]
Oh, so they can be hosted somewhere else.
[44:14]
Yeah, so we copied the Go module system. Exactly. So these modules are just code. First of all, they're not binaries. You don't distribute binary artifacts. It's a source code ecosystem. That's one thing I was always frustrated with in Docker: we had an ecosystem of binaries. These images, yeah, obviously very powerful. But you also want to know what's behind these images, what's inside, how was this built? And we never got around to standardizing that because, you know, everyone adopted Docker and then different platforms went their different ways and created their versions of this. So Red Hat had their own source code ecosystem, Cloud Foundry, you know, a gazillion different platforms. Right, yeah. And that was a source of frustration for me because there are some features, some things you can do only if you have visibility into the source code. Right? You can just do more stuff with the platform. So this is a pure source code ecosystem. Also, it's easier to trust and verify. You can go look at the code. Do I trust this or not? We don't host the modules. So just like Go's model, which I love, it's code. And we already have a very efficient system for distributing code. It's called Git. So actually, yeah, any Git server can host a Dagger module, and then you don't need any third-party service to access it. You just point your Dagger CLI or SDK at that repo and it will load it and just do its thing. But if you want to find modules or get information about modules, like is this trusted? Is this popular? Where are the modules? We have a search engine called Daggerverse that basically indexes all these modules, and over time we're going to give you more useful information. And again, Google has their package index. I forget what it's called.
[46:02]
Yeah, I forgot the name. I know what you're talking about.
[46:04]
Yeah, same thing. Same thing.
[46:06]
Nice, nice. Awesome. So I'm going to take the conversation in a slightly different direction, but it's kind of still related. Infrastructure as code: where do you see Dagger fitting, if it fits into that world at all?
[46:18]
I think it's very complementary. I mean, it's a very common integration, and superficially you'd think, oh, there's overlap. This is code, that is code. Surely Dagger will try to do infrastructure and, I don't know, Terraform or Pulumi will try to do deployment pipelines, builds, et cetera. And superficially you can always find a maximalist for every tool who will try to do everything with that tool.
[46:42]
Unknown
Yes.
[46:43]
Solomon Hykes
But in the case of Dagger, we're very clear that Dagger is not an infrastructure as code platform, it's a pipeline platform. And one very common task that is automated in the pipeline is infrastructure provisioning. So it's very common to have a dagger pipeline that includes a step that calls Terraform or Pulumi or something like that. And because it's code on both sides, actually it's much nicer to integrate. Yeah, you know, there's a lot more you can do.
[47:18]
Unknown
Nice.
[47:18]
Solomon Hykes
So mostly it's that they're very complementary tools, because fundamentally the model is different. Dagger is about one-way, stateless, cacheable pipelines, just a flow of artifacts.
[47:31]
Unknown
Yes.
[47:31]
Solomon Hykes
Infrastructure management is fundamentally about two way sync of state. Right. You have the state of your cloud resources and then you have the view of that state and then you try to reconcile. That's really kind of like the ARC thing. I would hate to do that. It's really hard. But yeah, it's a great business if you can do it because you have all this lock in like who's going to go and mess with the AWS provider? Right. It's like a driver.
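The contrast drawn here, a one-way flow of artifacts versus a two-way sync of state, is easier to see with a sketch of the reconcile step that infrastructure tools perform. This is a deliberately simplified, hypothetical illustration; the resource names and the `reconcile` function are assumptions and do not reflect how Terraform or Pulumi work internally.

```python
def reconcile(desired, actual):
    """Compare desired and actual resources and return the actions to converge."""
    actions = []
    for name in sorted(desired.keys() | actual.keys()):
        if name not in actual:
            actions.append(("create", name))
        elif name not in desired:
            actions.append(("delete", name))
        elif desired[name] != actual[name]:
            actions.append(("update", name))
    return actions


# Hypothetical view of what we want versus what exists in the cloud.
desired = {"bucket": {"region": "eu"}, "vm": {"size": "small"}}
actual = {"vm": {"size": "large"}, "queue": {}}
actions = reconcile(desired, actual)
```

Unlike a pipeline step, the result depends on state that lives outside the tool and can change underneath it, which is why the two models are described as complementary and not interchangeable.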
[47:55]
Unknown
Yes.
[47:56]
Solomon Hykes
And so we gladly integrate that and just make sure all the hard work by Pulumi and Terraform is available as a first-class citizen in your pipeline, because your pipeline involves provisioning infrastructure and also 10,000 other things that you need to glue together. So that's our job. Yeah.
[48:14]
Unknown
Nice. Nice. Awesome.
[48:15]
Abdel Sagiwar
This is going to be my last question. I took a lot of time from you.
[48:18]
Oh, this is. I love talking about my product, so don't worry about that.
[48:23]
Awesome, awesome. So what are your thoughts on the open source business, in light of everything that's been happening recently? Without mentioning any names.
[48:32]
Oh, like licensing changes, things like that.
[48:34]
Just to name a few: licensing changes, no longer publishing built artifacts for free, things of that nature.
[48:41]
Yeah, I think, I don't know, I feel like at some point people started thinking open source was like a business category. Like, what business are you in? Oh, I'm in the business of open source. But that's not a business category. It's an implementation detail of a product and a business. Right. So I think if you group everything into open source and not open source, then you'll get confused, because there are a lot of very different products and businesses that happen to create open source code or be involved in open source. So we're one model. On the other end of the spectrum, you have a business like PostHog, just to pick one that I'm familiar with. You know, it's like the open source Amplitude: product analytics, but open source. And it's a very different situation because in their case open source is a business argument: hey, this is open source, so that means you can go and run it yourself if you want. You're not locked into us. It's a pragmatic business decision. And in our case it's different, because Dagger not being open source does not make sense: we need this developer ecosystem. And so you need to give developers what they need to be able to customize and build their own software on top of the platform. So in our case it's not about convincing a buyer that they won't be locked in. I mean, it helps, but it's part of a community-led growth strategy. Right. So I don't know, I guess licensing doesn't matter, in my opinion, as much as people think. If you have a business that requires changing the license, that means you probably screwed up something else, you know. Yeah, I don't know. Here's our model. Our model is the Red Hat model. Basically, extreme openness on IP, so we have a regular open source license with no plans of changing it, and strict control on the trademark.
So you can take the Dagger engine and modify it, redistribute it, do anything you want with it, because it's true open source. But if you want to call it Dagger. Yeah, then there are rules, right? You can't take Dagger, patch it, rebuild it, change it in any way and then still call it Dagger because that's our trademark. So that's really not about open source, really, it's about any software product. Because otherwise, let's say someone ships like a broken modified version or they ship a feature that they thought was great but we don't like, then that's confusing to users. What is Dagger, what is not Dagger? So that's really important to us. And also it was very important to Red Hat. I think it's a great model. That way you stay in control of what's your product, what's your product experience. And then if the community doesn't like what you're doing, they can always fork it and create something else with a different name. And everyone's happy, you know.
[51:31]
Awesome, awesome. And I think that we couldn't have ended it better, so we'll leave it at that. Thank you very much for your time, Solomon. It was a pleasure talking to you.
[51:39]
Oh, my pleasure. Yeah, thanks for having me.
[51:41]
Awesome.
[51:44]
Kaslan Fields
Thank you very much, Abdel, for that interview.
[51:46]
Kaslan Fields
It's really exciting, of course, to get to speak to someone who has had such an impact in this area of the industry.
[51:53]
Kaslan Fields
I've seen Solomon talk about Dagger at KubeCon. He gave a keynote where he talked about it a little bit, but I hadn't looked into it myself. So I was excited to learn a little bit about what it is. And so it's a CI/CD solution, and we'll get more into that in a second. But one thing I wanted to say about my perspective on this is I feel like CI/CD is always something that I try to avoid in our world. It's very closely tied to the concept of containers because of the whole concept of "it works on my machine" being kind of resolved by the way that containers encapsulate the processes. And so it's very close to concepts of CI/CD, and it's used in a lot of CI/CD solutions. And so the two are close enough to each other that there's some assumption, I think, that when you understand something about containers and Kubernetes, you also know a bit about CI/CD. And I have aggressively avoided that personally, I think because I come from more of the sysadmin side of things, and I really love that side of things, and anything that brings me closer to the developer side of things, I need to avoid. I love getting things to run in production, but I don't like the process of getting to production from the developer space. But with Solomon describing it, I was like, maybe I should look into this more.
[53:23]
Abdel Sagiwar
Yeah, I think that the biggest drawback or challenge, or whatever you want to call it, of CI/CD for most people is the fact that when people talk about it, it's usually synonymous with a very slow feedback loop, in the sense that you have to submit your code and then wait for the pipeline to run to finally realize, oh, there is an error.
[53:46]
Kaslan Fields
Right.
[53:47]
Abdel Sagiwar
And most CI/CD tools, most open source CI/CD tools at least, don't have a way to run your pipeline locally. Right. So that you get your fast feedback loop. I mean, there are solutions, for example Skaffold, which Google released a while ago, that can be used both to run local pipelines and to run them on some CI/CD system. But yeah, most of the time people have this slow feedback loop, which I think is a huge drawback if you're a developer, basically, because as a developer you want to see if your code works, and you want to see that very fast.
[54:20]
Unknown
It is one of those things where when you talk to developers they're like, ah, yeah, CI/CD.
[54:26]
Abdel Sagiwar
And that's what Dagger is trying to solve. Right. I mean, besides trying to solve it through code instead of configuration files, it's also trying to be that thing you can run locally and in the cloud with the same expected behavior. I think the easiest way to phrase it is, basically, consistency in terms of how it works.
[54:48]
Kaslan Fields
I think there's a lot of philosophy also in the world of CI/CD. In the testing and observability world, there's the whole concept of running things in production: you never really know if something is going to run in production when you're running it in a development environment, because you can try to make it as close as you can, but you can never be quite sure. So that CI/CD step is really important, and when done well, it can prevent a lot of challenges, but it's a very difficult thing to do well. So, interesting.
[55:24]
Abdel Sagiwar
Yeah. And as Solomon said in the episode, the inspiration or the reason why Dagger exists comes from this realization that most people start with something simple and then start building things into their own CI/CD tools, which then turns into a Frankenstein of configuration files plus bash scripts plus hacks to make it do whatever you need it to do. Right. So that was the starting point of why Dagger exists in the first place. Right.
[55:53]
Unknown
Which is very understandable because getting from solving the problem of it worked on my machine and it's not working over here is a very difficult one. And that's kind of the problem that CICD is generally trying to solve is you've got it running in one environment and you need it to run in a different environment. How do you make sure along the way that by the time you get to that other environment it's going to work?
[56:16]
Yeah.
[56:16]
So I liked that he called out the origin of the name: directed acyclic graphs.
[56:23]
Abdel Sagiwar
Yeah, DAG. Yeah. Besides the fact that Solomon seems to have some very interesting things that start with D, so Docker, Dagger, you know.
[56:34]
Unknown
Yeah, that's true.
[56:35]
Abdel Sagiwar
Actually there is a video on YouTube of him answering specifically this question of like.
[56:42]
Unknown
Really?
[56:42]
Solomon Hykes
Yeah, yeah.
[56:43]
Abdel Sagiwar
There is actually a video. I don't remember which conference, but we will make sure to have it in the show notes. That's essentially because, preparing for the episode, I was looking up his stuff on YouTube to see, you know, what he talks about and how to not ask him the same questions that have been asked before. And one of the questions specifically was like, why do you have an affinity for things that start with D?
[57:04]
Unknown
I had not thought about that as a trend, but.
[57:08]
Abdel Sagiwar
Well, yeah, I should have asked him the question, like, what's after Dagger?
[57:12]
Unknown
Yeah. Is it something else that starts with a D? Yeah, exactly.
[57:15]
Solomon Hykes
I think one funny.
[57:16]
Abdel Sagiwar
One funny, interesting thing that, I mean, is totally off topic really, but it's just interesting to me, is that right before jumping on the episode to record, I was on Wikipedia, because of course he has a Wikipedia page. And I realized that he actually speaks French. So, yeah, when we started, we started.
[57:35]
Solomon Hykes
Talking in French and he was like, why?
[57:37]
Abdel Sagiwar
How do you know French? And I was like, because he lived in France for a while, so it was quite interesting for me. They're just like, completely.
[57:43]
Unknown
Yeah, he talked about that.
[57:45]
Abdel Sagiwar
Yeah, yeah, it was.
[57:46]
Solomon Hykes
It was.
[57:46]
Unknown
He talked about that at Kubecon in Paris in the keynote. Yes, yes.
[57:50]
Abdel Sagiwar
It was just very interesting because he has such a unique name. Like Solomon Hykes is not really a. I mean, I think the last name is probably common, but, like, Solomon is not a very common name.
[58:00]
Kaslan Fields
That's true.
[58:01]
Abdel Sagiwar
So that's why I was like, why do you speak French? And then, yeah, he gave me a little bit of the backstory.
[58:07]
Unknown
But getting back to Dagger and the concept of directed acyclic graphs, what he was saying was that pipelines are always graphs, if you get right down to it. It's about how you connect one system to another, essentially in the steps along the way. So it can always be represented as a graph. So he's starting with that as the baseline, which I found very interesting. And I liked the way that he compared it to Docker. Docker was about standardizing at the right level. He talked about how there were also efforts going on around the same time trying to standardize the way that applications were run, but those were happening at a lower level, at like a hardware level, which I can't imagine ever working. That would have been very difficult to implement on a global scale. But what I think was really successful about Docker was that it did standardize at the right level. So he's trying to do that again here, kind of, starting with the directed acyclic graph.
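The point that a pipeline can always be represented as a directed acyclic graph can be sketched in a few lines. This is a minimal illustration using Python's standard `graphlib` module, not Dagger's implementation; the step names and the `execution_order` and `is_acyclic` helpers are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical pipeline: each step maps to the set of steps it depends on.
PIPELINE = {
    "checkout": set(),
    "build": {"checkout"},
    "unit_test": {"build"},
    "lint": {"checkout"},
    "publish": {"unit_test", "lint"},
}


def execution_order(pipeline):
    """Return one valid order in which every step runs after its dependencies."""
    return list(TopologicalSorter(pipeline).static_order())


def is_acyclic(pipeline):
    """Return True if the dependency graph has no cycles."""
    try:
        TopologicalSorter(pipeline).prepare()
    except CycleError:
        return False
    return True


order = execution_order(PIPELINE)
print(order)
```

Because the graph is acyclic, a valid execution order always exists; a cycle (step A needs B, B needs A) would make the pipeline impossible to run from start to finish.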
[59:06]
Abdel Sagiwar
Yeah. I mean, if there is anything that people should take from this episode, it is one of the core features that I really like about Dagger, because I tested it, and it's obvious in the episode that I'm a big fan: as you are moving your artifact from one step to another in your CI/CD pipeline, you don't have to save that artifact somewhere. Right. Like, as you said, the graph representation of a CI/CD pipeline is steps connected by lines. As you are moving from one step to another, the artifact is taken care of by Dagger. In a typical CI/CD pipeline, what you have to do is build the Docker image, for example, and then push it somewhere so that the next step can pull it and do things with it. Right. Either you push it to a registry or you save it to some local directory. But with Dagger, you don't have to do that. Right. Like, the artifact itself will be passed along to the next step in that graph. So it's technically less work or less code that you will have to care about.
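The contrast between pushing an artifact somewhere and handing it directly to the next step can be sketched as plain function composition. This is an illustration of the idea only and does not use the Dagger SDK; `Artifact`, `build`, `verify`, and `publish` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Artifact:
    """Hypothetical stand-in for a build output such as a container image."""
    name: str
    content: bytes


def build(source: str) -> Artifact:
    # Stand-in for a real build: the output is returned, not pushed to a registry.
    return Artifact(name="app-image", content=source.encode("utf-8"))


def verify(artifact: Artifact) -> Artifact:
    # The step receives the artifact as an argument instead of pulling it.
    if not artifact.content:
        raise ValueError("empty artifact")
    return artifact


def publish(artifact: Artifact) -> str:
    return f"published {artifact.name} ({len(artifact.content)} bytes)"


# Each step's return value is the next step's input; there is no intermediate storage.
result = publish(verify(build("print('hello')")))
print(result)
```

The steps never agree on a registry URL or a shared directory; the only contract between them is the value that is passed along.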
[60:08]
Unknown
And something else he did mention there, which I also thought was really interesting, was his frustration with Docker images being a binary file and that being the unit that you kind of share around because you lose so much in the conversion from the source code to the binary file. And he talked a bit about how in Dagger, they tried to address that, and I didn't quite understand exactly what that was, so it sounded kind of like some kind of marketplace of Dagger pipelines that are shared as source code.
[60:40]
Kaslan Fields
Is that what it was?
[60:41]
Abdel Sagiwar
Yeah.
[60:41]
Solomon Hykes
So the Daggerverse, they call it.
[60:42]
Abdel Sagiwar
It's a marketplace of basically functions, because in the Dagger world, steps in your CI/CD pipelines are functions. And so basically you don't have to reinvent the world. You can just use everybody's or somebody's function. So they have this place where they can share the code. But what's interesting about that specific thing is they don't host the code, right? They basically index it so you have a place to search for stuff, but the code is not in the Daggerverse, it's somewhere else. They merely index the modules. So you can just find them in a global search database kind of thing, and then you can just reuse them or import them, or import the code and modify it or do whatever you want with it. Right. So, yeah, that's essentially the Daggerverse.
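The distinction between indexing code and hosting it can be sketched as a search index whose entries hold only a pointer to where the source lives. This is a hypothetical model for illustration, not how the Daggerverse is actually built; `ModuleRef`, `ModuleIndex`, and the example.com URLs are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModuleRef:
    """Index entry: metadata plus a pointer to the source; no code is stored."""
    name: str
    description: str
    source_url: str


class ModuleIndex:
    def __init__(self):
        self._entries = {}

    def register(self, ref):
        self._entries[ref.name] = ref

    def search(self, term):
        """Return entries whose name or description contains the term."""
        term = term.lower()
        return [
            ref
            for ref in self._entries.values()
            if term in ref.name.lower() or term in ref.description.lower()
        ]


index = ModuleIndex()
# Hypothetical modules hosted in their authors' own repositories.
index.register(ModuleRef("go-build", "Build a Go binary", "https://example.com/alice/go-build"))
index.register(ModuleRef("py-test", "Run Python tests", "https://example.com/bob/py-test"))

matches = index.search("build")
print([m.source_url for m in matches])
```

A search returns a location, so the user fetches and reads the source from wherever its author keeps it.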
[61:31]
Kaslan Fields
I'm seeing it in my head as.
[61:32]
Unknown
More like a Stack Overflow of pretty.
[61:35]
Abdel Sagiwar
Much, I guess, components. Yeah, minus the opinions, I guess.
[61:41]
Unknown
Yeah, good point. That's probably the main part of Stack Overflow. So it's basically kind of a marketplace of source code, of functions, and you compose a single Dagger pipeline out of multiple functions. So it's like the components that you can use to build the right pipeline for you. Which I feel like makes sense with the issue he was talking about: you wouldn't want that to be a binary. Something like that.
[62:10]
Yes.
[62:10]
Abdel Sagiwar
Because that was his frustration, as you said, with Docker: the fact that Docker Hub is essentially binaries, not necessarily with the code itself. Right. And I think it boils down to a matter of trust. Like, do you actually trust a built binary whose source you don't have access to? And to remedy this, in the Daggerverse, they basically index the code, so the code is not on the Daggerverse. Right. So, yeah, I think it's quite interesting as an approach, and it comes from learning from stuff that they have done at Docker that they wanted to do differently.
[62:42]
Unknown
I felt very validated when he said that too, because I have also been frustrated by this about binaries. Like, I'll get a binary of something and I'm like, how do I know what's in this thing?
[62:54]
Abdel Sagiwar
And you shouldn't actually trust stuff from Docker Hub.
[62:58]
Unknown
I was like, I guess this is just the way it's done and I should just be happy with it. But if Solomon Hykes isn't happy about it, I don't have to be either.
[63:05]
Abdel Sagiwar
Correct.
[63:06]
Solomon Hykes
Yeah.
[63:06]
Abdel Sagiwar
Usually the way I describe it when I do talks about security, generally speaking, is: do not download "dockerhub.io/iamahacker/hackmeplease" kind of images. Don't do that. Right. Because you never know.
[63:21]
Unknown
I wonder if something like that exists. It would be pretty funny.
[63:25]
Abdel Sagiwar
Probably not the same way I'm describing it, but I am quite confident it does.
[63:31]
Unknown
Well, yes, definitely does in that respect. But something with that naming would be pretty funny from like a.
[63:37]
Abdel Sagiwar
It would be very funny.
[63:38]
Unknown
A security professional who's trying to run, like, some kind of capture the flag or something would love to see that. That would be pretty funny. Yes.
[63:46]
Abdel Sagiwar
If you listen to this and you find something on Docker Hub that has this, please send it to us.
[63:51]
Solomon Hykes
It would be interesting to see.
[63:53]
Unknown
Yes, please do. And one last thing from the interview that I wanted to talk about was his discussion about how IaC relates to Dagger. I loved how he framed it. It's very understandable that people would compare the two and wonder if Dagger was going to be an IaC solution of some sort. IaC, that is, infrastructure as code. But yeah, it makes sense that folks would wonder if they're going to be in that space, because infrastructure management and pipeline management can be very conceptually similar things. But I loved how he framed it as infrastructure management is two way.
[64:42]
Kaslan Fields
It's not something that you do once.
[64:44]
Unknown
You're not just moving something from local to production, you're setting something up and then you are managing it over some amount of time. So you're going to have to be.
[64:55]
Kaslan Fields
Able to manage changes to it as well. Which is very different from what he's.
[64:59]
Unknown
Trying to do with Dagger, which is a one way street. It is a directed acyclic graph where you go from the beginning to the end and then you don't do things again in the middle. So I loved that explanation of like infrastructure as code integrates with Dagger, but Dagger has no interest in becoming an infrastructure as code solution itself.
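The difference between a one-way pipeline and two-way infrastructure management can be sketched side by side. Neither function reflects a real tool's API; `run_pipeline` and `reconcile` are hypothetical, and the state dictionaries are made-up examples.

```python
def run_pipeline(steps, value):
    """One-way: apply each step once, in order, then stop."""
    for step in steps:
        value = step(value)
    return value


def reconcile(desired, actual):
    """Two-way: compare desired state with actual state and return the changes needed.

    Meant to be called repeatedly over the lifetime of the infrastructure.
    """
    changes = {}
    for key, want in desired.items():
        if actual.get(key) != want:
            changes[key] = want
    for key in actual:
        if key not in desired:
            changes[key] = None  # None marks a resource to delete
    return changes


# A pipeline runs from the beginning to the end and is done.
pipeline_result = run_pipeline([str.strip, str.upper], "  release-1.0  ")

# Infrastructure management keeps comparing what exists with what should exist.
drift = reconcile({"replicas": 3, "region": "eu"}, {"replicas": 2, "zone": "a"})
print(pipeline_result, drift)
```

The pipeline function has no notion of existing state, while the reconcile function is defined entirely in terms of it, which is one way to read the "one-way street" framing.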
[65:20]
Abdel Sagiwar
Yes, and also the fact that he described it as these things can be complementary to each other. Right. Because, I mean, the very typical conversation you would have with people is: where does infrastructure as code start and stop, and where does your CI/CD pipeline start? Like, do you use infrastructure as code tools to do everything, including deploying the app itself, or do you use other things? So having that clear distinction between the two is pretty good, but having them be kind of complementary to each other is also interesting, in my opinion.
[65:56]
Unknown
I think that is one thing that has kind of put me off from CI/CD in the past: it is such a broad area. Where does it start and stop? Yes.
[66:07]
Yes.
[66:08]
And so I like this take on.
[66:09]
Kaslan Fields
It and I'm glad that we learned about it today. Thank you very much Abdel for conducting that interview and thank you Solomon for being on.
[66:15]
Abdel Sagiwar
Thank you. I hope you have enjoyed it.
[66:18]
Kaslan Fields
Thank you everyone for listening. That brings us to the end of another episode. If you enjoyed the show, please help us spread the word and tell a friend. If you have any feedback for us, you can find us on social media @KubernetesPod or reach us by email at kubernetespodcast@google.com. You can also check out the website at kubernetespodcast.com, where you'll find transcripts, show notes, and links to subscribe. Please consider rating us in your podcast player so that we can.
[66:45]
Unknown
Help more people find and enjoy the show. Thanks for listening and we'll see you next time.