Give us your feedback on a proposal to extend Dockter to allow it to build Nix environments.
What’s the problem we’re trying to solve?
Docker is useful for creating reproducible computing environments for research. Dockter is a tool we’ve been working on lately which aims to make it easier for researchers to create Docker images.
Docker is a great tool, but there are aspects of Docker images that hinder their use for reproducible research. These include:
- long build times (due to poor reuse of build artifacts)
- very large images (2-3 GB is common for images with R and/or Python)
- poor reusability of images (e.g., adding one R package might create a new, large Docker layer or image), which makes it hard to capture the long tail of domain-specific packages and to save space on shared computing infrastructure
- weak isolation of tools, i.e. it’s hard to control for OS-specific influences on reproducibility (Python 2.7 compiled for an Ubuntu image might behave differently from Python 2.7 compiled on CentOS) and to trace back hidden dependencies
Nix has the potential to provide solutions to these problems. There are plenty of resources explaining why Nix is a great solution for creating reproducible computing environments. In the research context see:
We are already using Nix for Stencila to create our core environments and Docker images. And we always saw Dockter as a stepping stone towards a similar approach based on Nix.
But, although it’s powerful, Nix has a very steep learning curve. This Hacker News post has some great comments referencing Nix’s power vs usability issue (our emphasis):
I love the core concepts behind Nix, and have great respect for their engineering abilities. However I am skeptical of their ability to achieve broad adoption beyond their current community of passionate experts. There are two reasons for my skepticism: 1. User experience. Unless you’re one of the “passionate experts”, the Nix user experience is pretty terrible. The learning curve is punishing compared to competing systems.
this is basically the same story as git (sans the cultural gravity of Linus). Incredibly well thought-out internal architecture, extremely talented developers, functional-style immutable data structures, idiosyncratic frontend that’s a low-level façade over the internals, painful learning curve, often difficult (but possible!) to recover from real-world corruption, etc.
Docker lacks many of the features that make Nix so powerful. On the other hand Docker is much more pleasant to use, for me at least. If a tool came along that combined the simplicity of Docker with the power of Nix, I would probably stop using both, and just use that. I think Nix itself could be that tool… if only they fixed the issues I described above.
So, in summary, the problem is “Docker is good, but has problems. Nix has solutions to these problems, but is very intimidating to use.”
What’s the solution?
Dockter currently provides an easier way for researchers to use Docker. We want to do the same thing for Nix: create a smoother on-ramp for researchers to use Nix.
Dockter currently takes source code and/or requirements files in a folder and generates a Dockerfile. When this project is finished, it will also be able to generate a default.nix file from which a Nix environment can be built (or a Nix-based Docker image created).
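As a rough illustration of what generating a default.nix could involve, here is a minimal TypeScript sketch. The function name `renderDefaultNix` and the file layout are our assumptions for illustration, not Dockter’s actual output, and it assumes the detected packages have already been mapped to nixpkgs attribute names (such as `python3` or `R`).

```typescript
// Hypothetical sketch: render a default.nix from a list of nixpkgs attribute names.
// `renderDefaultNix` and the generated layout are illustrative assumptions,
// not Dockter's actual implementation.

/** Render a Nix expression that `nix-shell` can use to enter an environment. */
function renderDefaultNix(packages: string[]): string {
  // Deduplicate and sort so that the same inputs always produce the same file.
  const unique = packages
    .filter((name, index) => packages.indexOf(name) === index)
    .sort();
  const inputs = unique.map((name) => `    pkgs.${name}`).join("\n");
  return [
    "{ pkgs ? import <nixpkgs> {} }:",
    "",
    "pkgs.mkShell {",
    "  buildInputs = [",
    inputs,
    "  ];",
    "}",
    "",
  ].join("\n");
}

// Example: a project that needs Python 3 and R (the duplicate is dropped).
const exampleNix = renderDefaultNix(["python3", "R", "python3"]);
```

Writing `exampleNix` to a file named default.nix in the project folder should give something that `nix-shell` can pick up. Pinning nixpkgs to a fixed revision, which matters for reproducibility, is left out of this sketch.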
We have designed the architecture of the Dockter code so that it can be easily extended to build targets other than Docker. Originally we thought we would reuse the code but create another, Nix-specific, command line tool (called tonix or similar). But from a code maintenance perspective, despite the name, it makes more sense just to extend Dockter with this functionality.
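To make the idea of “build targets other than Docker” a little more concrete, here is one way such an extension point could look. The `Generator` interface, the two generator objects, and `selectGenerator` are hypothetical names, and the generated file contents are placeholders rather than what Dockter really emits.

```typescript
// Hypothetical sketch of a target-agnostic generator interface.
// None of these names are taken from the Dockter code base.

interface Generator {
  /** Name of the file this target writes into the project folder. */
  fileName: string;
  /** Turn a list of package names into the contents of that file. */
  generate(packages: string[]): string;
}

// Placeholder Docker target: installs the packages with apt.
const dockerGenerator: Generator = {
  fileName: "Dockerfile",
  generate: (packages) =>
    [
      "FROM ubuntu:18.04",
      `RUN apt-get update && apt-get install -y ${packages.join(" ")}`,
      "",
    ].join("\n"),
};

// Placeholder Nix target: lists the packages as shell build inputs.
const nixGenerator: Generator = {
  fileName: "default.nix",
  generate: (packages) =>
    `with import <nixpkgs> {}; mkShell { buildInputs = [ ${packages.join(" ")} ]; }\n`,
};

/** Pick the target based on whether a Nix switch was given. */
function selectGenerator(nix: boolean): Generator {
  return nix ? nixGenerator : dockerGenerator;
}
```

With a shape like this, the code that scans a project folder stays the same for both targets; only the final rendering step differs. Real package names differ between apt and nixpkgs, so an actual implementation would need a mapping step that this sketch ignores.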
This project will add the --nix switch to the Dockter command line tool. When the switch is used, Dockter will generate a default.nix file for a project instead of a Dockerfile. The same compile, build, and execute subcommands will be available but will have different semantics with the --nix switch.
Instead of learning how to write the Nix language, a user will be able to generate a default.nix file for the project using:
dockter --nix compile
For this subcommand, users won’t need either Docker or Nix to be installed.
A user will be able to build a Docker image, with a Nix environment in it, using the build subcommand and the --nix switch:
dockter --nix build
Why would a user want to do this? Nix can generate lightweight Docker images (i.e. images that include only the binaries and dependencies for the tools in default.nix, and no OS tools), which can be considerably smaller than the Docker image generated via a Dockerfile. Dockter users who want the advantages of Nix for defining reproducible computing environments, but who want to run Docker or are limited to it, might opt for this approach.
For this subcommand, users will need to have Docker installed but not Nix.
A user will be able to execute the project within a Nix shell (which automatically installs the dependencies defined in the environment if necessary):
dockter --nix execute
This will be an alternative to handwriting your own default.nix file and starting a Nix shell from it yourself.
What else is out there? How does this differ?
There are some existing tools that do exactly what we propose to do in this project: generate Nix files from source code and/or language requirements files. But they are language specific:
Also, all of these use a requirements file (e.g. package.json), so they don’t do the source code scanning for require etc. that Dockter already does.
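To show what we mean by source code scanning, here is a deliberately simple sketch that pulls package names out of JavaScript-style `require("...")` calls. The function `scanRequires` is a hypothetical name, and the regular expression is a heuristic; it is not how Dockter does this, and a robust scanner would more likely parse the code properly.

```typescript
// Hypothetical sketch: find package names in `require("...")` calls.
// A regular expression is a heuristic here; it can be fooled by comments or strings.

function scanRequires(source: string): string[] {
  const pattern = /\brequire\(\s*['"]([^'"]+)['"]\s*\)/g;
  const found: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(source)) !== null) {
    const name = match[1];
    // Skip relative and absolute paths: they point at project files, not packages.
    if (name.charAt(0) === "." || name.charAt(0) === "/") continue;
    // Record each package only once, in order of first appearance.
    if (found.indexOf(name) === -1) found.push(name);
  }
  return found;
}

const sampleSource = [
  'const fs = require("fs-extra");',
  "const helper = require('./helper');",
  'const yaml = require("js-yaml");',
  'require("fs-extra");',
].join("\n");

// Yields the two external packages; the local file and the repeat are ignored.
const requiredPackages = scanRequires(sampleSource);
```

The resulting list is the kind of input a generator would then need to translate into package names for the chosen build target.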
There is a repo2docker buildpack for Nix which builds a Docker image from an existing default.nix (what we describe in “Docker images via Nix” below). But this requires you to write the Nix files yourself - the whole point of what we are doing with this extension is to generate the Nix file for you.
Also worth noting is nijs: a potentially useful tool for us to generate Nix files from within JavaScript/TypeScript.
Who is this for?
We try to identify groups of researchers that are the primary target for something we develop. For this project it’s:
- researchers who are advanced coders, are currently using Docker, and want to get started with Nix as a better way of building reproducible compute environments
- researchers that are novice coders, who don’t want to learn either Docker or Nix, and just want to have a compute environment automagically built for them that they can share with others
- research computing infrastructure providers who want to have a more robust and efficient way of providing for the almost infinite number of compute environments that are necessary to cater for the long tail of packages and package versions.
How does this fit with our vision of composable, reusable, and accessible executable documents?
Having reproducible computing environments is a necessary foundation for executing any documents that have system dependencies (i.e. everything that can’t be run in the browser: R, Python etc.). Dockter adds value for our users because it allows for custom compute environments for a researcher’s project based on the files in that project. This project goes a step further by using Nix to improve the speed at which those compute environments can be (a) rebuilt after each change, and (b) provisioned for each user session.