Dockter for Nix 🌀


#1

Give us your feedback on a proposal to extend Dockter to allow it to build Nix environments.

What’s the problem we’re trying to solve?

Docker is useful for creating reproducible computing environments for research. Dockter is a tool we’ve been working on lately which aims to make it easier for researchers to create Docker images.

Docker is a great tool, but there are aspects of Docker images that hinder their use for reproducible research. These include:

  • long build times (due to poor reuse of build artifacts)

  • very large images (2-3 GB is common for images with R and/or Python)

  • poor reusability of images (e.g., adding one R package might create a new big Docker layer/image) thus making it hard to capture the long tail of domain-specific packages and save space on shared computing infrastructures

  • weak isolation of tools, i.e. it’s hard to control for OS-specific influences on reproducibility (Python 2.7 compiled for an Ubuntu image might behave differently from Python 2.7 compiled on CentOS) and to trace hidden dependencies

Nix has the potential to provide solutions to these problems. There are plenty of resources explaining why Nix is a great solution for creating reproducible computing environments. In the research context see:

We are already using Nix for Stencila to create our core environments and Docker images. And we always saw Dockter as a stepping stone towards a similar approach based on Nix.

But, although it’s powerful, Nix has a very steep learning curve. This Hacker News post has some great comments referencing Nix’s power vs usability issue (our emphasis):

I love the core concepts behind Nix, and have great respect for their engineering abilities. However I am skeptical of their ability to achieve broad adoption beyond their current community of passionate experts. There are two reasons for my skepticism: 1. User experience. Unless you’re one of the “passionate experts”, the Nix user experience is pretty terrible. The learning curve is punishing compared to competing systems.

this is basically the same story as git (sans the cultural gravity of Linus). Incredibly well thought-out internal architecture, extremely talented developers, functional-style immutable data structures, idiosyncratic frontend that’s a low-level façade over the internals, painful learning curve, often difficult (but possible!) to recover from real-world corruption, etc.

Docker lacks many of the features that make Nix so powerful. On the other hand Docker is much more pleasant to use, for me at least. If a tool came along that combined the simplicity of Docker with the power of Nix, I would probably stop using both, and just use that. I think Nix itself could be that tool… if only they fixed the issues I described above.

So, in summary, the problem is “Docker is good, but has problems. Nix has solutions to these problems, but is very intimidating to use.”

What’s the solution?

Dockter currently provides an easier way for researchers to use Docker. We want to do the same thing for Nix: create a smoother on-ramp for researchers to use Nix.

Dockter currently takes source code and/or requirements files in a folder and generates a Dockerfile. When this project is finished it will be able to generate a default.nix file from which a Nix environment can be built (or Nix-based Docker image created).

We have designed the architecture of the Dockter code so that it can be easily extended to build targets other than Docker. Originally we thought we would reuse the code but create another, Nix specific, command line tool (called tonix or similar). But from a code maintenance perspective, despite the name, it makes more sense just to extend Dockter with this functionality.

This project will add the --nix switch to the Dockter command line tool. When the switch is used, Dockter will generate a default.nix file for a project instead of a Dockerfile. The same compile, build, and execute subcommands will be available but will have different semantics with the --nix flag.

Compile

Instead of learning how to write the Nix language, a user will be able to generate a default.nix file for the project using:

dockter --nix compile

For this subcommand, users won’t need either Docker or Nix to be installed.
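
As a rough illustration of what such a generated file might contain, here is a minimal sketch that renders a default.nix for a list of Python packages. It is not Dockter's actual output or code: the function name `render_default_nix` is hypothetical, and the sketch assumes every package name is already a valid attribute under nixpkgs' python3 package set.

```python
from typing import Iterable


def render_default_nix(python_packages: Iterable[str]) -> str:
    """Render a minimal default.nix that exposes a Python environment.

    Assumption: each name is already a valid nixpkgs python3 package attribute.
    """
    # De-duplicate and sort so that the generated file is deterministic.
    names = sorted(set(python_packages))
    package_lines = "\n".join(f"    ps.{name}" for name in names)
    return (
        "{ pkgs ? import <nixpkgs> {} }:\n"
        "\n"
        "pkgs.mkShell {\n"
        "  buildInputs = [\n"
        "    (pkgs.python3.withPackages (ps: [\n"
        f"{package_lines}\n"
        "    ]))\n"
        "  ];\n"
        "}\n"
    )


if __name__ == "__main__":
    print(render_default_nix(["numpy", "pandas"]))
```

Because this only produces text, it needs neither Docker nor Nix on the machine that runs it, which matches the requirement stated above.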

Build

A user will be able to build a Docker image, with a Nix environment in it, using the build subcommand with the --nix flag:

dockter --nix build

Why would a user want to do this? Nix can generate lightweight Docker images (i.e. that only include the binaries and dependencies for the tools in default.nix and no OS tools) which can be considerably smaller than the Docker image generated via a Dockerfile. Dockter users that want the advantages of Nix for defining reproducible computing environments but that want to, or are limited to, running Docker might opt for this approach.

For this subcommand, users will need to have Docker installed but not Nix.
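
One way Nix can produce such a lightweight image is through the `dockerTools` functions in nixpkgs. The sketch below renders a Nix expression that uses `dockerTools.buildLayeredImage`; it is only an assumption about how this could be done, not a description of what Dockter will generate. The function name `render_docker_image_nix` and the image name are hypothetical.

```python
from typing import Iterable


def render_docker_image_nix(image_name: str, python_packages: Iterable[str]) -> str:
    """Render a Nix expression that builds a Docker image holding only a Python env.

    Assumption: package names are valid nixpkgs python3 package attributes.
    """
    names = " ".join(f"ps.{name}" for name in sorted(set(python_packages)))
    # Doubled braces are literal braces in the f-string; "${{env}}" becomes "${env}".
    return f"""{{ pkgs ? import <nixpkgs> {{}} }}:

let
  env = pkgs.python3.withPackages (ps: [ {names} ]);
in
pkgs.dockerTools.buildLayeredImage {{
  name = "{image_name}";
  tag = "latest";
  contents = [ env ];
  config.Cmd = [ "${{env}}/bin/python" ];
}}
"""


if __name__ == "__main__":
    print(render_docker_image_nix("project-env", ["numpy", "pandas"]))
```

The resulting image contains the Python environment and its dependencies but no base OS layer, which is where the size saving comes from.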

Execute

A user will be able to execute the project within a Nix shell (which automatically installs the dependencies defined in the environment if necessary):

dockter --nix execute

This will be an alternative to handwriting your own default.nix file and running:

nix-shell

What else is out there? How does this differ?

There are some existing tools that do exactly what we propose to do in this project: generate Nix files from source code and/or language requirements files. But they are language specific:

Also, all of these use a requirements file (i.e. requirements.txt or package.json) so don’t do the source code scanning for library, require etc that Dockter already does.
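
To make the idea of source code scanning concrete, here is a minimal sketch that finds `library(...)` and `require(...)` calls in R source. It is not Dockter's implementation: the regular expression and the function name `scan_r_packages` are assumptions for illustration, and a real scanner would also need to handle cases such as `pkg::fn` references.

```python
import re
from typing import List

# Matches library(pkg), library("pkg"), require(pkg) and require('pkg').
R_IMPORT = re.compile(
    r"""\b(?:library|require)\s*\(\s*["']?([A-Za-z][A-Za-z0-9.]*)["']?"""
)


def scan_r_packages(source: str) -> List[str]:
    """Return the sorted, de-duplicated package names loaded in R source code."""
    found = set()
    for line in source.splitlines():
        # Drop trailing comments (naive: does not handle '#' inside strings).
        code = line.split("#", 1)[0]
        found.update(R_IMPORT.findall(code))
    return sorted(found)


if __name__ == "__main__":
    example = 'library(ggplot2)\nrequire("dplyr")\n# library(ignored)\n'
    print(scan_r_packages(example))
```

The advantage over a requirements file is that the user does not have to declare anything: the package list is derived from the code they already wrote.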

There is a repo2docker buildpack for Nix which builds a Docker image from an existing default.nix (what we describe in “Docker images via Nix” below). But this requires you to write the Nix files yourself - the whole point of what we are doing with this extension is to generate the Nix file for you.

Also worth noting is nijs: a potentially useful tool for us to generate Nix files from within JavaScript/TypeScript.

Who is this for?

We try to identify groups of researchers that are the primary target for something we develop. For this project it’s:

  • researchers that are advanced coders, are currently using Docker, and want to get started with Nix as a better way of building reproducible compute environs
  • researchers that are novice coders, who don’t want to learn either Docker or Nix, and just want to have a compute environment automagically built for them that they can share with others
  • research computing infrastructure providers who want to have a more robust and efficient way of providing for the almost infinite number of compute environments that are necessary to cater for the long tail of packages and package versions.

How does this fit with our vision of composable, reusable, and accessible executable documents?

Having reproducible computing environments is a necessary foundation for executing any documents that have system dependencies (i.e. everything that can’t be run in the browser: R, Python etc). Dockter adds value for our users because it allows for custom compute environments for a researcher’s project based on the files in that project. This project goes a step further by using Nix to improve the speed at which those compute environments can be (a) rebuilt after each change, and (b) provisioned for each user session.


#2

Commenting here with my background being nix/nixpkgs, some npm development, and primarily a python developer. If I am reading this correctly the idea is to convert a combination of requirements.txt, package.json, DESCRIPTION, etc and produce a nix representation of the dependencies in a file default.nix. I do not have much experience with R so I cannot speak about R packaging.

I see two routes: using nixpkgs, or using node2nix, pip2nix, etc. to develop the default.nix file.

Using nixpkgs has some limitations. JavaScript will most likely never be fully supported in nixpkgs since the packages update too quickly and there are too many of them. Python support is great, but right now the package names are not all normalized correctly, which will make conversion from requirements.txt non-trivial. Additionally, requirements.txt, package.json, etc. allow you to fix the version. This is not currently the nixpkgs philosophy.
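
To illustrate the two conversion issues (name normalization and version pins), here is a minimal sketch that parses requirements.txt lines, normalizes names using the PEP 503 rule, and keeps any `==` pin. The function names are hypothetical, only `==` pins are recognised, and mapping a normalized name to the matching nixpkgs attribute would still need a separate lookup that is not shown here.

```python
import re
from typing import List, Optional, Tuple

# A package name optionally followed by an exact "==" version pin.
REQUIREMENT = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)\s*(?:==\s*([^\s;#]+))?")


def normalize_name(name: str) -> str:
    """Apply PEP 503 normalization: lowercase, runs of '-', '_' and '.' become '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def parse_requirements(text: str) -> List[Tuple[str, Optional[str]]]:
    """Return (normalized name, pinned version or None) for each requirement line."""
    parsed = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or line.startswith("-"):
            continue  # skip blanks, comments and pip options such as "-r other.txt"
        match = REQUIREMENT.match(line)
        if match:
            # Specifiers other than "==" (e.g. ">=") are ignored, so the pin is None.
            parsed.append((normalize_name(match.group(1)), match.group(2)))
    return parsed


if __name__ == "__main__":
    print(parse_requirements("Scikit_Learn==0.20.2\nnumpy\n"))
```

A pinned version that differs from the one nixpkgs currently ships is exactly the case where this route gets hard, since nixpkgs generally carries one version per package.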

So that leaves us with using node2nix, pip2nix, etc. All of these tools require visiting their respective package repositories and building the default.nix for the dependencies. This can be time consuming, especially for npm dependencies. I don’t think that it would be a problem to combine them, so I would assume this would be the easier route, but it is also not guaranteed to work properly.

Of course, a combination of both is also possible.


#3

Thanks @costrouc, I think you raise some valid points. We are probably going to try the easier route first of reusing node2nix and pip2nix and see how far we can go with those tools and how we can deal with the issues you mentioned. R will probably require implementing our own solution, which, depending on how it goes, might make us revisit node and python for improvements.


#4

One issue that I see you will run into quite quickly with pip2nix and similar tools is that scientific software typically requires some compiled C code. For example, numpy in Python requires compiling a C extension. I cannot see an easy way for pip2nix to handle this. Nixpkgs does handle this. I also think that R support within nixpkgs is not out of reach and is quite doable. I just don’t think that the R community in nixpkgs is that large, and I would love to see it grow.

So sadly no perfect solution. Long-term nixpkgs should be the right choice for anything besides npm.