
Sylvain Corlay
QuantStack is an open source technology software company specializing in tools for data science, scientific computing, and visualization. They're known for maintaining vital projects such as Jupyter, the conda-forge package channel, and the Mamba package manager. Sylvain Corlay is the CEO of QuantStack. He joins the podcast to talk about his company, Conda, Mamba, the new Mamba 2.0 release, software supply chain security, and more. Gregor Vand is a security-focused technologist and is the founder and CTO of MailPass. Previously, Gregor was a CTO across cybersecurity, cyber insurance, and general software engineering companies. He has been based in Asia Pacific for almost a decade and can be found via his profile at vand.hk.
Gregor Vand
Hi Sylvain, welcome to Software Engineering Daily.
Sylvain Corlay
Hi Gregor, thanks for having me here.
Gregor Vand
Yeah, very exciting to have you here today, Sylvain. I think a lot of the listeners will know quite a bit about you and the projects you work on, and you might not need any introduction to them. But equally we'll have a lot of listeners today who know nothing, which is also exciting, and you get to talk to some completely new people. In terms of what we're here to talk about today, generally speaking it's Mamba, and through the company QuantStack. So I think the main thing is just to set the scene here. You're the CEO of QuantStack and based in Paris. What is QuantStack, and what's the relationship to Mamba? How did it come to be, I think, a key maintainer of Mamba? Maybe just start there.
Sylvain Corlay
Sure. So QuantStack is mostly a team of open source maintainers of key projects of the scientific computing ecosystem. One of the main projects that we are active in is Jupyter. The team comprises over 10 people working full time on the project, and we've been some of the main drivers of the recent innovations in Jupyter, such as collaborative editing, the visual debugger for JupyterLab, and the new flavor of the Jupyter Notebook that came out recently. So Jupyter has been one of the main projects that we've been active in over the past years.
We're also very active in the open source package management ecosystem for science, mostly with the Mamba project that we're going to talk about today, and with conda-forge. And finally, there is a new chapter that we started just a few weeks ago with the Apache Arrow project. In the wake of the recent layoffs at Voltron Data, a bunch of maintainers of Apache Arrow showed up at PyData Paris and told us that they were looking for a new home. So we are starting this new chapter, which is very exciting.
QuantStack is more than a team of open source developers. We're not a startup, so we operate under more of a service consultancy model, where people and companies that depend on these tools in their operations or in their products contract us to do bug fixes, maintenance, and sometimes add new features.
It really started as a form of self-employment for myself and a couple of others. At the very beginning we did not have any sort of strategy for growth, and nearly all of the business was inbound. It was like this for a few years, eventually it became a bit more deliberate, and now we are a team of about 30 people. The team is not just in France: the biggest group, about half of it, is in France, and then we have a significant team in Germany and also folks in Austria, the UK, and now Spain.
Gregor Vand
Awesome. And just to clarify, has it always been quite scientific leaning in terms of the focus or did that come out of a project?
Sylvain Corlay
It's always been focused on sciences indeed, since the very beginning, just from our professional backgrounds and the projects that we were focused on.
Gregor Vand
I think that's a good distinction to make in terms of where any of this has come from. So again, for anyone not familiar with any of these companies or packages that we'll be talking about today: we're here to talk about Mamba, and it has this sort of intertwined relationship with some other things like Conda, conda-forge, and Anaconda. Do you want to maybe speak at a high level to what the interrelationship is between them?
Sylvain Corlay
Yeah, so assuming that people don't necessarily know what Conda is, maybe I should define it. Conda is a general-purpose package manager that works on multiple platforms like Windows, Linux, and OS X, and it's very popular in the scientific computing ecosystem. Maybe I should better define what it's not, because there is a lot of confusion about it. Conda is not a Python package manager, in that it's not a package manager for the Python programming language. It's more similar to yum or dpkg, like apt-get and rpm, you know, the classical Linux package managers, in that what is installed is binary packages and already prebuilt assets. It's different from Linux package managers in that we can create multiple software environments in multiple locations on the file system, and it's cross-platform.
Gregor Vand
And again, sort of in the context of a lot of science based projects, is it fair to say that they're often working with multiple environments and actually that could be using quite different packages or languages? And that's sort of part of the inspiration behind all of this.
Sylvain Corlay
Yeah, so it came out of the Python scientific computing community, even though it's not a Python package manager. It started from the observation that a lot of the popular Python packages were actually built upon Fortran or C code bases for efficiency and had a thin layer of Python for usage in the Python interpreter, and these posed significant distribution challenges. This whole thing started before Python had the wheel format for distributing packages, which embeds binaries in Python packages. So the raison d'être of the project was really to enable a better story for distributing binary packages.
The other thing is that I think the notion of environment is really key. People use Docker as a way to bundle a bunch of stuff together that you can easily distribute, but it's still more of a way to distribute something where there could be a lot of mess. Package managers are really key to distributing software in a way that's reproducible, in my opinion. And reproducibility is another key problem in scientific computing and in science in general. So it's about being able to switch back and forth. Let's say you have a Jupyter notebook that you wrote maybe 10 years ago, which does some number crunching and produces a few plots that you used for a scientific paper. How do you run it today? How do you reproduce the set of packages and datasets that you were using to produce these plots? This is one of the reasons why having a strong environment story for creating such bundles of packages is really important in scientific computing.
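The reproducibility question raised here, whether today's environment is the same set of packages that produced the original plots, can be illustrated with a minimal Python sketch. It assumes an environment is recorded as a mapping from package name to a pinned version and build string; the function name, package versions, and build strings are hypothetical and this is not how Conda or Mamba store environments.

```python
import hashlib
import json


def environment_fingerprint(packages):
    """Return a stable SHA-256 digest for a set of pinned packages.

    `packages` maps a package name to a (version, build) pair. All names,
    versions, and build strings in this example are made up.
    """
    # Sort by name so the digest does not depend on insertion order.
    canonical = sorted(
        (name, version, build) for name, (version, build) in packages.items()
    )
    payload = json.dumps(canonical, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# The environment recorded when the paper was written (hypothetical pins).
paper_env = {"python": ("3.6.1", "build_0"), "numpy": ("1.12.1", "build_3")}
# The same pins, listed in a different order.
rebuilt_env = {"numpy": ("1.12.1", "build_3"), "python": ("3.6.1", "build_0")}
# An environment where one package has drifted to a newer version.
drifted_env = {"python": ("3.6.1", "build_0"), "numpy": ("1.13.0", "build_0")}

print(environment_fingerprint(paper_env) == environment_fingerprint(rebuilt_env))  # True
print(environment_fingerprint(paper_env) == environment_fingerprint(drifted_env))  # False
```

Pinning both the version and the build matters in this sketch because binary packages with the same version can still differ in how they were built.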
Gregor Vand
Yeah, makes a lot of sense. I feel we've got a lot to cover today, so we're going to dive straight in. Mamba 2.0, I believe, was released not that long ago. What would you describe as the big changes there? And I guess Mamba generally is written in C++, is that correct?
Sylvain Corlay
Right. Mamba was initially meant to be a drop-in replacement for Conda. Basically, when the Conda community grew, there was this community channel of packages called conda-forge that started to outgrow the rest of the ecosystem and had tens of thousands of packages. And the way the Conda solver was devised, it was actually crumbling under the weight of conda-forge. Mamba was created by Wolf Vollprecht, who was an employee of QuantStack at the time, initially as just a hack: basically, I'm going to delegate the solving of the environment and the dependency resolution to another library written in C, and use Conda for everything else, which was still fragile but proved very promising.
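To give a feel for what "solving an environment" involves, here is a toy dependency resolver in Python. It is a naive backtracking search over a hypothetical three-package index, not the algorithm used by Conda, Mamba, or the C library mentioned above; package names and version numbers are invented for illustration.

```python
def solve(index, requirements, chosen=None):
    """Pick one version per package so that every constraint holds.

    index:        {name: {version: {dependency_name: set_of_allowed_versions}}}
    requirements: {name: set_of_allowed_versions, or None for "any version"}
    Returns a {name: version} mapping, or None if no assignment exists.
    """
    chosen = dict(chosen or {})
    pending = [name for name in requirements if name not in chosen]
    if not pending:
        return chosen
    name = pending[0]
    allowed = requirements[name]
    # Prefer newer versions, and backtrack to older ones on conflict.
    for version in sorted(index[name], reverse=True):
        if allowed is not None and version not in allowed:
            continue
        merged = dict(requirements)
        compatible = True
        for dep, dep_allowed in index[name][version].items():
            current = merged.get(dep)
            narrowed = dep_allowed if current is None else current & dep_allowed
            if not narrowed or (dep in chosen and chosen[dep] not in narrowed):
                compatible = False
                break
            merged[dep] = narrowed
        if not compatible:
            continue
        result = solve(index, merged, {**chosen, name: version})
        if result is not None:
            return result
    return None


# Hypothetical index: the newest plotlib needs numlib 2, but statlib needs numlib 1.
index = {
    "plotlib": {2: {"numlib": {2}}, 1: {"numlib": {1, 2}}},
    "statlib": {1: {"numlib": {1}}},
    "numlib": {1: {}, 2: {}},
}

print(solve(index, {"plotlib": None, "statlib": None}))
# {'plotlib': 1, 'statlib': 1, 'numlib': 1}
```

Even in this tiny case the resolver has to abandon the newest `plotlib` and back up to an older one. With tens of thousands of packages and many versions each, that search space is what makes solver performance matter.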
So our first approach was to reach out to the Conda project, Anaconda Inc., tell them about it, and ask if they would be willing to fund QuantStack to make this a reality. At the time they were not so interested yet in Mamba, so we decided to make it its own thing. What Mamba is today is really an alternative to Conda for installing the packages of the same ecosystem, one that's fully compatible with Conda and even supports the same command line options. But unlike Conda, it's written in C++, which really helps with speed in some areas.
There is another flavor of Mamba that is also very popular, called micromamba. Micromamba is a single statically linked bundle of Mamba. It's essentially the same code base but with different linkage. If you want to install micromamba, it's just a four-megabyte download, while an installer for a basic installation of Conda requires a Python interpreter and a bunch of dependencies in the base environment, and download sizes are in the tens of megabytes, maybe 80 or 90 megabytes, or over 100 on some platforms. So micromamba has also proven very useful in CI workflows, where we want to bootstrap an environment very quickly. It is fully self-contained and can be used to create Conda environments from scratch.
Gregor Vand
Yeah, exactly. So Mamba 2 was released in just the last couple of months, is that right?
Sylvain Corlay
Yes, that's right.
Gregor Vand
Okay, so talk to us about that.
Sylvain Corlay
Yes. So Mamba 2 is, first of all, the result of a very boring refactor of Mamba, really. I would say that the first major release of Mamba was sort of the result of a rush. Basically, there was such demand for it in the community that we were rushing to cover the features of Conda, and it was entirely built to be used as a command line utility. But then some people started using it as a toolkit. They wanted to use the internal components of Mamba, and these were not necessarily written in a way that was very solid. If we assume that people use it as a command line utility, we can make all kinds of assumptions that are not necessarily satisfied if you are making a toolkit or a library that ought to be used in web services and be thread safe and whatnot. So Mamba 2 is in many ways almost a rewrite of Mamba with a more deliberate, bottom-up software engineering approach.
There are a bunch of new features, though. First, as of Mamba 2, we can specify mirrors for package channels, which is really important for things I'm going to talk about later. This will soon be utilized in popular installers such as those of conda-forge and others. We also support more protocols for downloading packages. One key protocol that we wanted to support, and that is enabled in Mamba 2 behind an experimental flag, is getting packages from OCI registries and other cloud storage solutions.
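The idea of channel mirrors can be sketched in a few lines of Python: a client holds an ordered list of mirrors for a channel and falls back to the next one when a download fails. This is only an illustration of the concept, not Mamba's implementation or configuration format; the function names and the `example.org` URLs are hypothetical, and the downloader is a stand-in that performs no network access.

```python
def fetch_from_mirrors(mirrors, filename, download):
    """Try each mirror in order and return (mirror, payload) from the first success.

    `download` is any callable taking (mirror, filename) and returning bytes,
    raising OSError (or a subclass such as ConnectionError) on failure.
    """
    errors = []
    for mirror in mirrors:
        try:
            return mirror, download(mirror, filename)
        except OSError as exc:
            # Remember why this mirror failed and move on to the next one.
            errors.append(f"{mirror}: {exc}")
    raise OSError("all mirrors failed: " + "; ".join(errors))


def fake_download(mirror, filename):
    """Stand-in downloader: any mirror with 'down' in its URL is unreachable."""
    if "down" in mirror:
        raise ConnectionError("mirror unreachable")
    return f"{filename} from {mirror}".encode()


# Hypothetical mirror list for one channel.
mirrors = ["https://mirror-down.example.org", "https://mirror-ok.example.org"]
mirror, payload = fetch_from_mirrors(mirrors, "example-pkg-1.0.tar", fake_download)
print(mirror)  # https://mirror-ok.example.org
```

Falling back across mirrors improves availability, but on its own it says nothing about whether a given mirror can be trusted, which is where package signing comes in.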
Gregor Vand
Gotcha. So let's dive into one of the key topics I think would be great to cover today. And a thread throughout this would always be: what has Mamba 2 brought to this in any different way over version one, or maybe there's been no change. But let's talk about vendor neutrality. Let's take Mamba specifically for now, and then if you would like to branch out into any other areas: what's preventing a single organization, and I guess in this case it might be QuantStack, from, for example, dominating the project's direction? What would you say to that?
Sylvain Corlay
Before we get to Mamba specifically, I just wanted to take a step back and talk a bit more about why open source at all. I think there is a specific reason in the case of scientific computing that is not necessarily as present in other areas of computing: in my opinion, there would be a very deep contradiction for physicists, for example, in trying to understand nature with a tool that they don't have the right to understand. This contradiction is the key reason why scientists, long before the open source movement was a thing, were sharing code and sending descriptions and in-depth documentation of their code to each other. This is really the origin of the World Wide Web, which was started at CERN by a physicist.
If we look at more recent events, the reason why the Python open source scientific computing community is so big is that it was also built as a reaction to the scientific computing world being held by a few corporations that were imposing very high costs on licenses for computing tools. These costs were preventing people from certain countries from engaging, because they couldn't afford the licenses. They also prevented students, in some cases, from using the tools. So they were really building a walled garden around scientific computing. Everyone in this field has this sort of atavistic need for openness and for making sure that we are not giving ourselves up to one corporation.
Distribution of packages and package management is really one of the key areas in which having one actor dominating how you distribute and install packages can be very harmful. Obviously, in the case of Conda, we're talking about Anaconda Inc., which in many ways is a company that I really look up to and that has achieved great things in this area, making scientific software more usable and accessible. But Conda has challenges with respect to vendor neutrality. If you check out the Conda documentation, it's pointing to the Anaconda distribution and channels, and in some areas Anaconda channels are hard-coded in the Conda code base. One more, I would say, problematic thing is that there are some root cryptographic keys that are also hard-coded in the code base and that are used in the package signing protocol. That protocol, which is some kind of extension of The Update Framework, uses asymmetric key pairs for signing packages and ensuring their authenticity. But at the moment, only Anaconda Inc. can sign packages that can be verified with the Conda client.
Anaconda Inc. and the Conda community have taken steps to resolve this. For example, some of the documentation has been fixed recently. There are open issues, and maybe open PRs as well, about removing the hard-coding of Conda channels. But I think in the end the elephant in the room is the branding and identity of the project: Anaconda and Conda share a nontrivial part of the name, and the visual identity and the logos of the company and the project are very similar. Even if they really wanted to solve this, they are facing a difficult situation.
So this is one of the main reasons why just the existence of Mamba is important: it allows projects like conda-forge to potentially rely on another implementation of the Conda content trust protocol, and in case of a takeover or failure of the company, we have fallbacks that we can use. And that is without even getting into the technical benefits of Mamba, which people could disagree about.
Gregor Vand
Yeah. I guess you've laid out a very clear reason why there are aspects of conda-forge plus Anaconda where their approach can work, but there are just various obvious drawbacks if you take it at face value. In terms of how Mamba has approached this, if we think about things like community engagement, I'm very curious about policies: how have you set out policies to ensure that contributors, regardless of affiliation, for example, have equal opportunity? I think that's kind of what you're getting at there. And a follow-on from that is things like the signing of packages. What policies are around that, if it's not this one entity who can effectively design their own policy, and you can use it or not? Where has Mamba gone with that?
Sylvain Corlay
So it's not so much about who can put code into the code base as about what's in the code base in terms of hard-coding of channels and hard-coding of keys. I think these two things are really important. If people want to, for example, start a new community channel, hold a key signing ceremony to start a well-devised software supply chain security policy for the channel, and then use Mamba, there is no place where we would prevent this from happening at the moment. I think that's the main thing. For example, if you download the main micromamba binary, everything can be overridden, and it's not hard-coding any channel or any package source by default. So you have to configure it and choose where you're going to get your packages.
Gregor Vand
And I think an obvious place to go next is really supply chain security. We've covered a couple of products and frameworks in the past on this topic, and I believe this is a core tenet of what Mamba is supposed to be helping with. So could you speak a bit to what it brings in that way? And again, if there are any comparisons, with conda-forge or otherwise, as to what is different and why, I think that's very interesting to understand.
Sylvain Corlay
So it's a broad subject, software supply chain security. One thing that is currently in the Mamba code base is an implementation of the Conda content trust protocol, but in a way that would allow anyone to install their public keys on the system so that we could check packages. It's going to be really important for a community channel like conda-forge to be able to sign packages in the future, as it's increasingly becoming almost a regulatory constraint. With the recent laws that were enacted in the US, and that are also coming to the EU, state agencies won't be able to use packages that are not implementing these kinds of good practices. As a consequence, it's going to percolate through the entire industry. So preventing conda-forge, which is the de facto main source of packages for scientific computing, from signing packages can, I think, be really harmful. That's why I think having an independent implementation of the current protocol, without even getting into any kind of innovation, is really important, and it's really bound to vendor neutrality.
And getting back to this question of vendor neutrality, one thing I didn't say is that I think we have a path forward that should allow everyone to continue operating in a way that's satisfactory, including Anaconda, conda-forge, and everyone else, and that would also resolve this branding issue around the Conda project. What I really think is that Anaconda is really trying to resolve this situation. They acknowledge that there is a problem at the moment, and so they have opened up Conda for more community-led governance. They have transferred the Conda trademark to the NumFOCUS foundation. A proposal for the future would be, rather than trying to have the Conda project become more open but still be bound by this weird situation with the name, to create a broader organization that would encompass Mamba and Conda, but also other clients like Pixi, and that would not be named after any of these projects. Conda would just be one of the members of this broader community. And this community should be built upon common standards and have an open governance for deciding what the future should be.
Gregor Vand
I believe, at least at the moment, you have biweekly dev meetings that you host for the project, which people can come and be a part of. Is that correct? Does that play into what you've just been talking about? Is that almost a grassroots thing, where people who have the same vision can come together, and is it a forum for talking about this as well?
Sylvain Corlay
So Conda has its own governance and regular meetings, and they also now have a really nice system for proposing changes in Conda with the CEP process. We also host public meetings for the Mamba project. conda-forge is its own thing as well, and is really focused on the tools for building packages, but obviously they have a vested interest in the tooling. This is why my take is that we need to acknowledge all of that, have a broader community movement, and not continue with the snake-based names. We should call it something else completely: not Rattler, not Conda, not Mamba. Maybe call it, I don't know, Scikit Packaging or whatever, something that really conveys the idea that we are about package management in a generic way and that we have these scientific roots.
Gregor Vand
Yeah, I mean, this is obviously a bit of a sidebar, but I was curious here: why Mamba? At the beginning of the episode you said you had approached Anaconda to suggest this concept. But what was the thinking behind continuing the snake naming, if the idea was to have something that didn't sit completely opposed to Conda, but is meant to represent a different direction?
Sylvain Corlay
So in the very beginning we weren't really thinking too much about it. Mamba was probably the result of a name search for a fast snake, and speed was the main reason why it was started. We were just continuing the series of snakes in that ecosystem around Python and Conda and whatnot. Since it was just a demo in the very beginning, we went to Anaconda as the natural outlet for this demo and as a potential client that could fund this work. Then we continued working on it, but outside of billable cycles, as a side project for Wolf Vollprecht at QuantStack and a bunch of others. Eventually the problems that Conda was facing became key for some of our clients, we managed to put some Conda- and Mamba-related deliverables in some client contracts, and it became a thing. So it's only recently that we realized that we needed to be more thoughtful about it and give it this standing in the community: oh, maybe this should be a thing, and it should be part of a bigger thing that includes Conda and Pixi and whatnot.
Gregor Vand
So just hopping back in from that, I'd like to touch on the supply chain security bit a little more, and then we're going to switch gears to WebAssembly, because I think that's an interesting place to go and leave this stuff behind. In terms of the software supply chain side of things, I'm more familiar with, for example, Node Package Manager and NVM and that kind of ecosystem, and with understanding what policies and decisions have been made around that to, by no means fix a lot of problems, but at least address certain issues that have come up in the last couple of years. So again, what is Mamba doing to that end?
Sylvain Corlay
Yeah, so one of the features that came with Mamba 2 was the support of package mirrors, which is great from a vendor neutrality standpoint, right? But it actually brings more challenges for software supply chain security: if you have a network of mirrors, could there be a bad actor, could there be a mirror that is compromised? There are a number of ways to be a bad actor in a network of mirrors for a package channel. And this is, sort of by chance, a good reason why the Conda content trust protocol, which is based on The Update Framework (TUF), is a really good fit, because it addresses some of the key challenges that can arise in software distribution.
For example, what if someone managing a mirror froze their packages at a certain date, before a security update? All of the packages that they host would still be legitimate, everything would be real, and the signatures would presumably still verify, right? So how can we prevent that? TUF addresses this by requiring some kind of cryptographic heartbeat: when you try to get a package from this channel, the client will download a cryptographic key that has actually expired, and unless the root origin of the package sources has re-signed this content, the client is not going to consider it valid. So there are a number of attacks on a network of mirrors that Conda content trust really addresses. That's why it was important for us to implement it.
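The "cryptographic heartbeat" can be illustrated with a small Python sketch in which signed channel metadata carries an expiry date, so a frozen mirror eventually serves metadata that clients reject even though its signature is still valid. This is a simplified model under loud assumptions: it uses a shared-secret HMAC from the standard library as a stand-in for the asymmetric signatures that TUF and Conda content trust rely on, and the metadata layout, key, and function names are hypothetical.

```python
import hashlib
import hmac
import json
from datetime import datetime, timedelta, timezone

# Stand-in secret. Real systems sign with a private key and verify with a public key.
ROOT_KEY = b"demo-root-key"


def sign_metadata(snapshot, expires, key=ROOT_KEY):
    """Sign a package snapshot together with its expiry date."""
    body = json.dumps({"snapshot": snapshot, "expires": expires.isoformat()}, sort_keys=True)
    signature = hmac.new(key, body.encode(), hashlib.sha256).hexdigest()
    return {"body": body, "signature": signature}


def verify_metadata(signed, now, key=ROOT_KEY):
    """Accept metadata only if the signature matches and it has not expired."""
    expected = hmac.new(key, signed["body"].encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, signed["signature"]):
        return "rejected: bad signature"
    expires = datetime.fromisoformat(json.loads(signed["body"])["expires"])
    if now >= expires:
        return "rejected: metadata expired"
    return "accepted"


issued = datetime(2024, 1, 1, tzinfo=timezone.utc)
signed = sign_metadata({"example-pkg": "1.0"}, issued + timedelta(days=7))

# A mirror that keeps syncing serves this metadata while it is still fresh.
print(verify_metadata(signed, issued + timedelta(days=1)))   # accepted
# A frozen mirror serves the same, validly signed metadata a month later.
print(verify_metadata(signed, issued + timedelta(days=30)))  # rejected: metadata expired
# A mirror that edits the metadata cannot produce a matching signature.
tampered = {**signed, "body": signed["body"].replace("1.0", "0.9")}
print(verify_metadata(tampered, issued + timedelta(days=1)))  # rejected: bad signature
```

The design point is that the origin must keep re-signing fresh metadata, so a mirror cannot stay silent about updates indefinitely without clients noticing.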
Now, I really wish that we could fix the upstream code base in Conda so that we can install alternative keys for conda-forge and other channels. Then we will be able to enable package mirrors for these key channels used by the community.
But really, distribution is just one part of supply chain security. There is another entire field that Mamba is not addressing at all, which is reproducible builds and guaranteeing that the content of the package that is shipped is legitimate, which is also a concern for many organizations. We've worked with companies that build their own Conda-based distribution from source and don't get any binary from the Internet, but effectively use conda-forge as some kind of Wikipedia of how to build stuff, because it's actually really hard to build the entire scientific computing stack from scratch. So conda-forge is a really good source of information on this.
Gregor Vand
Yeah, I think we had an episode not that long ago with a company, Chainguard, and that's their goal: reproducible builds, containers that they build from source. I don't know what interaction, if any, they have with Anaconda or Conda-based packages at the moment. But if any listeners are interested in that topic generally, that's maybe an episode to go to. Are there plans to deliberately work with more open source projects or companies that are aiming to go in this direction on reproducible builds, or is it a sort of major concern at the moment?
Sylvain Corlay
Not specifically reproducible builds, because we don't have funding for this. It's a hard enough problem that we can't just hack our way around it. What we're talking about is making sure that the packages that are uploaded to channels have been produced by the people we think they were produced by. So really signing at build time and then uploading them to package servers.
Gregor Vand
Let's switch gears to WebAssembly. It's kind of exciting that there's quite a large amount of support now for WebAssembly from the Mamba ecosystem. Could you speak to what inspired the decision to add this support, what the journey of evolution for it has been, and what's being produced as a result?
Sylvain Corlay
Yeah, so this is probably the thing that I'm the most enthusiastic about these days in my work overall. It all started from a grant proposal. We were writing a grant for the French government, because we're a French company, to develop a Jupyter-based platform for secondary education, so high school education, to get kids to learn Python.
Around the same time, a group of high school teachers here in Paris worked on an interesting project called Basthon, which is a pun in French: it kind of sounds like Python, but it's a slang word for a fist fight. Basthon was a sort of fork of the classic Jupyter Notebook, but instead of calling out to a server to execute the code that you type, it would use a Python distribution in the browser called Pyodide. This whole thing started in 2019. They got Basthon into a workable state, they deployed it for the Paris school district, and it worked really well. Other districts in the country started showing interest, so they signed agreements with the other districts and progressively expanded this project to the entire country.
The thing has matured a bit and become more solid over the years. Now this deployment, which is called Capytale, has half a million registered users and over 200,000 user sessions per week, and all of it is served from one machine. How come? The main reason is that by using WebAssembly and running user code in the browser, we become free of having to run a Docker image in the cloud for each user session. So the scalability is crazy.
Just to give a comparison, UC Berkeley runs a data science course called Data 8. It's a really big one; I think they have over 10,000 registered students. But there is a team of DevOps engineers that operates the Kubernetes-based deployment of Jupyter underlying this, and it costs over $100,000 per year to run in cloud compute. So there is a team of DevOps engineers, and then there are significant hosting costs for allowing this, which is essentially having one Docker image per user session. Now, if you don't need this, all there is on this server, hosted literally in the basement of a high school here in Paris, is a content management system for the user notebooks. And that's it.
Now, if we start making multiplications, you realize, oh wait, France only has so many high school students. It's not a very big country, right? And not all of them learn Python anyway. But consider a bigger country like Nigeria. They have over 200 million people at the moment, and the forecast is that the population is probably going to grow by at least 100 million in the next 25 years. Most of these kids are going to go to high school, and presumably in the 21st century they will want to learn programming. At this scale, is it even feasible to have a Kubernetes-based deployment of Jupyter to learn Python? I'm not so sure. And if Nigeria wanted to do this, they are not the home of Microsoft or AWS or Alibaba. They would probably need to rent that space on someone else's cloud, whereas if they use a system based on what I just described, on WebAssembly, they will be able to host their platform in a sovereign fashion. So to me this is really enormous, because this model can be used to teach programming to a billion kids. It just works, because basically you run everything in the browsers of the end users.
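The "multiplications" mentioned above can be made explicit with a rough back-of-envelope calculation in Python. It uses only the figures quoted in the conversation, both of which were given as lower bounds, and it assumes that server-side cloud cost scales linearly with the number of users. That linearity is an illustrative assumption, not a measured result, and the estimate ignores the cost of the DevOps team entirely.

```python
# Figures quoted in the conversation (each was stated as "over" this amount).
data8_students = 10_000
data8_cloud_cost_usd_per_year = 100_000
capytale_registered_users = 500_000

# Assumption: cloud cost grows linearly with the number of users.
cost_per_student = data8_cloud_cost_usd_per_year / data8_students
extrapolated_cost = cost_per_student * capytale_registered_users

print(f"${cost_per_student:.0f} per student per year")
# $10 per student per year
print(f"${extrapolated_cost:,.0f} per year for {capytale_registered_users:,} users")
# $5,000,000 per year for 500,000 users
```

Under these assumptions, serving Capytale's user base with one container per session would cost on the order of millions of dollars per year in cloud compute, compared with a single machine when the code runs in the browser.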
Gregor Vand
Just to put this in context, just to take super basic points here, what is a very base level of machine needed to run that just in a nice way on the client side, is what I mean.
Sylvain Corle
Well, depends on what you want to run.
Gregor Vand
Obviously the example you gave with, for example, the high school kids learning Python.
Sylvain Corle
Okay, so what kind of Python is a high school kid going to write? They are going to learn how to compute the greatest common divisor. The complexity of that code is probably lower than that of rendering the UI. So for very basic things we don't need much, right? And you can already expose this kind of interactive computing environment for them.
So the problem with Capytale, this deployment that I talked about earlier, is that they forked the original classic notebook from 2016, and this code base obviously is not maintained anymore and poses many challenges in terms of accessibility and security. And so this was the reason why we proposed to build a JupyterLab-based solution that would also use this new JupyterLab-based notebook. And we called it JupyterLite.
And as soon as we started the JupyterLite project, it also grew in adoption very quickly. Now if you go to numpy.org and you scroll a bit, you'll find the JupyterLite console. You can try NumPy in the browser and there is nothing running in the cloud. If you go to the scikit-learn documentation you will find runnable code snippets. And now they are going to run the MOOC on JupyterLite as well. If you visit the SymPy project documentation, you will also find a console there to try out SymPy in the browser.
So after we developed JupyterLite we realized that we wanted to start expanding a bit what you can do in the browser. Pyodide is really meant to be about Python and the Python packaging ecosystem, and we wanted to do a lot more, so we started this project called emscripten-forge, which is a distribution of conda packages for the browser that goes way beyond the Python ecosystem. One thing, for example, is that we recently released a JupyterLite terminal with an emulator of bash that runs in the browser. And then you can start typing bash commands like grep, sed, touch, cat, less, whatever, and see it reflected in your file system and access these files in notebooks running in your browser. And there is another ongoing project to build R packages for WebAssembly, all of this using the same conda-based and Mamba-based package manager.
And even beyond education, I think this is going to be really important for scientific publishing and long-term reproducibility. For example, today any binary that runs on your computer, be it ARM or x86, is probably going to require some kind of emulator to run on a machine in 20 years, while WebAssembly is a web standard. So presumably WebAssembly binaries should be runnable by web browsers in 20 years. So what I think is that a bundle of a Jupyter notebook and a bunch of WebAssembly packages and a small data set, all served statically at a given URL, is like a time capsule. It's like a website from the 90s that we can still see today. So this time capsule, as a research paper doing some number crunching and data analysis and some discovery, could still be runnable in 20 years. And this to me is a real revolution. So we're trying to push as much as possible the boundaries of what's possible in the browser.
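For reference, the greatest-common-divisor (GCD) exercise mentioned in this answer is the kind of program that runs comfortably in a browser-based Python kernel. A minimal sketch using Euclid's algorithm follows; the function name `gcd_euclid` is illustrative and does not come from the episode.

```python
def gcd_euclid(a: int, b: int) -> int:
    """Return the greatest common divisor of two integers (Euclid's algorithm)."""
    # Work with non-negative values so the result is never negative.
    a, b = abs(a), abs(b)
    while b != 0:
        # Replacing (a, b) with (b, a mod b) leaves the GCD unchanged.
        a, b = b, a % b
    return a


if __name__ == "__main__":
    print(gcd_euclid(48, 18))  # prints 6
```

The loop only needs integer arithmetic, which fits the point made above that very basic exercises need little from the client machine.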
Gregor Vand
Yeah, it's a really fascinating way of looking at it. I mean, at the end of the day the browser has kind of become almost the OS of the average user these days. Generally most, I would say, non-power users are basically just working through their browser for most things. But that said, there's still a way to go for what could in theory be run in the browser. And obviously you've given this great example of JupyterLite, made possible by emscripten-forge. And I guess my question is, for Mamba as a whole, is there going to be a fork in the road where, if the WebAssembly side of things, I don't want to say takes off, but you know what I mean, is there going to be a point where you have to actually make a choice about what the focus of Mamba is, potentially?
Sylvain Corle
So this refactor that I spoke about earlier, where we try to make Mamba more of a toolkit, is really paying off here, because we are writing JavaScript bindings to some of the components of that toolkit so that we can actually do the dependency resolution in the browser, download things from CDNs from the browser directly, and create conda environments in the browser. So yeah, in our team at the moment there are probably more people working on the WebAssembly focus than on the core C++ code base of Mamba. But that's for obvious reasons: the initial lift is really significant and there is a lot of code to be written for pieces that are simply missing.
Gregor Vand
Gotcha. So, I mean, we are slightly cruising towards the end of the episode today. I just want you to have a platform to be able to speak to developers here, as much as being able to speak on behalf of Mamba and QuantStack generally. What's the most important thing that you think people should know about your vision for the future of package management, in terms of picking up Mamba versus anything else, or just generally?
Sylvain Corle
For me this WebAssembly story is the crossover between Jupyter and our effort on package management, and this idea of providing a platform that can be used to teach programming to a billion kids, potentially, and that will just work at this scale. If I have the opportunity to do this, I think it's probably the greatest thing that I could do professionally in my life. And so I'm taking it. I want to try. Right. Maybe it's going to be another platform that's going to be that. But I think we have a shot.
So I want to do this upholding the principles that we laid out in the very beginning about the open source movement and the fact that the tools should be open and openly governed. So join us. That's the message to the people. I think there is something happening now in the package management ecosystem as well as in the Jupyter ecosystem that's really important.
And I've read somewhere that there was a survey. Is it IBM? I'm not sure. They were trying to estimate the number of users of Jupyter, and the answer was probably in the order of magnitude of 10 million people in the world, which I think is pretty fair. Certainly more than a million, and it's probably not 100 million. Right. So 10 million is probably not a crazy number. If we ever get to 100 million, it's going to be with a tool like JupyterLite. That's what I think. And yeah, we need help.
Gregor Vand
Yeah. Well, just off the back of that, where's the best place to go? What's the best sort of inroad for someone when you talk about joining you? Where should they start?
Sylvain Corle
We have a number of easy-fix issues in the relevant repositories on GitHub. Both Jupyter and Mamba have public meetings that you can find references to on our websites. So yeah, it's easy to engage, and just a pull request fixing the tiny usability thing that you find annoying when using it is already super welcome.
Gregor Vand
Awesome. I mean, I think that's just great advice for anyone looking to get interested in open source, full stop. But I think you've heard it from the source today: it's a very welcoming community, it sounds like, and you can just lend a small hand and who knows where that can go. So Sylvain, it's been fantastic to have you here today. I think, again, some quite meaty topics, and obviously Mamba is doing some pretty huge things. There are always going to be different groups in every area of tech, and everyone has their different approaches, and I think it's always great for our listener base to get to hear from those. They might have seen some online discussions, or even just read a pull request or GitHub issue thread, and now they're getting to hear from the voice behind it. So I think it's been a really valuable discussion.
Sylvain Corle
Thanks. I wanted, before we leave, to give a shout-out to Wolf Vollprecht. So Wolf is a former employee of QuantStack, and he's the person who started the Mamba project while he was at QuantStack and did a lot of the initial work. Now Mamba is maintained by a team of four or five people working on the code base. And Wolf actually is still a big driver in this broader community. He founded a company called prefix.dev, which is the company behind Pixi. It's a set of tools built in Rust for addressing other package management issues, and it also seeks compatibility with the current ecosystem in terms of package format. And I would say that Wolf, in the past few years, not just with Mamba but also with what's going on with Pixi at the moment, has been one of the main drivers for change and innovation in that space. And maybe that's another message for people: they should really follow what's going on there.
Gregor Vand
Fantastic. Yeah, well, thanks for calling that one out, and I hope we get to catch up again in, who knows, maybe a year's time or something like that. We'd love to see where things are going, especially on the WebAssembly side. That just sounds like a very exciting place to be, kind of at the epicenter.
Sylvain Corle
So yeah, thank you, Gregor.
Gregor Vand
Thanks so much.
Podcast Summary: Software Engineering Daily - "Mamba and Software Package Security with Sylvain Corlay"
Release Date: January 23, 2025
Guests: Sylvain Corlay (CEO of QuantStack), Gregor Vand (Host, Software Engineering Daily)
Topics: QuantStack's role in the scientific computing ecosystem, Mamba and its 2.0 release, software supply chain security, vendor neutrality, and the integration of WebAssembly in package management.
Timestamp: [00:00 - 03:59]
Sylvain Corlay opens the discussion by introducing QuantStack, an open-source technology company based in Paris, specializing in tools for data science, scientific computing, and visualization. QuantStack is renowned for maintaining essential projects like Jupyter, the Conda Forge package channel, and the Mamba package manager.
Key Points:
Notable Quote:
“QuantStack more than a team of open source developers. We're not a startup, so we operate under more of a service consultancy model...” — Sylvain Corlay [03:12]
Timestamp: [04:10 - 07:57]
The conversation shifts to Conda, a pivotal tool in the scientific computing ecosystem. Sylvain clarifies common misconceptions about Conda, emphasizing that it is a general-purpose package manager akin to YUM or dpkg, rather than being specific to Python.
Key Points:
Notable Quote:
“Reproducibility is also another key problem in scientific computing and in science in general. So being able to switch back and forth… is really important.” — Sylvain Corlay [07:09]
Timestamp: [08:17 - 12:46]
Sylvain introduces Mamba, an alternative to Conda designed for speed and efficiency. Originally created as a workaround for Conda's performance issues with large package repositories like Conda Forge, Mamba has evolved into a robust package manager written in C++.
Key Points:
Notable Quote:
“MAMBA was originally built to be used as a command line utility, but some people started using it as a toolkit. So Mamba 2 is in many ways almost a rewrite of Mamba in a more deliberate software engineering approach.” — Sylvain Corlay [11:12]
Timestamp: [13:18 - 17:39]
A significant portion of the discussion delves into the importance of vendor neutrality in package management. Sylvain underscores the necessity of an open and unbiased ecosystem to prevent any single entity from dominating the software distribution landscape.
Key Points:
Notable Quote:
“The notion of environment is really key… reproducibility is also another key problem in scientific computing and in science in general.” — Sylvain Corlay [07:09]
Timestamp: [18:34 - 29:03]
Gregor shifts the conversation to software supply chain security, an increasingly critical aspect of modern software development. Sylvain explains how Mamba 2.0 incorporates features to bolster security, ensuring the integrity and authenticity of packages.
Key Points:
Notable Quote:
“Conda separation security… Mamba 2 allows anyone to provide their own install their public keys… and it's bound to vendor neutrality.” — Sylvain Corlay [20:03]
Timestamp: [31:18 - 41:32]
Exploring beyond traditional package management, Sylvain discusses the integration of WebAssembly (WASM) into the Mamba ecosystem, highlighting its transformative potential in education and scientific publishing.
Key Points:
Notable Quote:
“This time capsule as a research paper doing some number crunching and data analysis and some discovery, could still be runnable in 20 years. And this to me is a real revolution.” — Sylvain Corlay [39:49]
Timestamp: [42:05 - 45:00]
As the discussion nears its conclusion, Sylvain shares his optimistic vision for the future of package management, emphasizing open governance and the transformative potential of WebAssembly.
Key Points:
Notable Quote:
“If I have the opportunity to do this, I think it's probably the greatest thing that I could do professionally in my life.” — Sylvain Corlay [42:08]
Timestamp: [45:00 - 46:17]
In the closing remarks, Sylvain acknowledges key contributors and reiterates the importance of community involvement in driving innovation within the package management space.
Key Points:
Notable Quote:
“Maybe that's another message for people. They should really follow what's going on there.” — Sylvain Corlay [45:23]
The episode provides an in-depth exploration of Mamba's evolution, its role in enhancing software package management, and the broader implications for scientific computing. Sylvain Corlay’s insights highlight the crucial interplay between vendor neutrality, security, and innovative technologies like WebAssembly in shaping the future of software engineering.
For developers and enthusiasts interested in contributing or learning more, Sylvain encourages engaging with open-source repositories on GitHub and participating in public meetings hosted by the Mamba and Jupyter communities.