Tuesday, December 19, 2017

Bypassing NPM's content addressable cache in Nix deployments and generating expressions from lock files

Roughly half a year ago, Node.js version 8 was released that also includes a major NPM package manager update (version 5). NPM version 5 has a number of substantial changes over the previous version, such as:

  • It uses package lock files that pinpoint the resolved versions of all dependencies and transitive dependencies. When a project with a bundled package-lock.json file is deployed, NPM will use the pinpointed versions of the packages that are in the lock file making it possible to exactly reproduce a deployment elsewhere. When a project without a lock file is deployed for the first time, NPM will generate a lock file.
  • It has a content-addressable cache that optimizes package retrieval processes and allows fully offline package installations.
  • It uses SHA-512 hashing (as opposed to the significantly weakened SHA-1), for packages published in the NPM registry.

Although these features offer significant benefits over previous versions, e.g. NPM deployments are now much faster, more secure and more reliable, it also comes with a big drawback -- it breaks the integration with the Nix package manager in node2nix. Solving these problems were much harder than I initially anticipated.

In this blog post, I will explain how I have adjusted the generation procedure to cope with NPM's new conflicting features. Moreover, I have extended node2nix with the ability to generate Nix expressions from package-lock.json files.

Lock files


One of the major new features in NPM 5.0 is the lock file (the idea itself is not so new since NPM-inspired solutions such as yarn and the PHP-based composer already support them for quite some time).

A major drawback of NPM's dependency management is that version specifiers are nominal. They can refer to specific versions of packages in the NPM registry, but also to version ranges, or external artifacts such as Git repositories. The latter category of version specifiers affect reproducibility -- for example, the version range specifier >= 1.0.0 may refer to version 1.0.0 today and to version 1.0.1 tomorrow making it extremely hard to reproduce a deployment elsewhere.

In a development project, it is still possible to control the versions of dependencies by using a package.json configuration that only refers to exact versions. However, for transitive dependencies that may still have loose version specifiers there is only very little control.

To solve this reproducibility problem, a package-lock.json file can be used -- a package lock file pinpoints the resolved versions of all dependencies and transitive dependencies making it possible to reproduce the exact same deployment elsewhere.

For example, for the NiJS package with the following package.json configuration:

{
  "name": "nijs",
  "version": "0.0.25",
  "description": "An internal DSL for the Nix package manager in JavaScript",
  "bin": {
    "nijs-build": "./bin/nijs-build.js",
    "nijs-execute": "./bin/nijs-execute.js"
  },
  "main": "./lib/nijs",
  "dependencies": {
    "optparse": ">= 1.0.3",
    "slasp": "0.0.4"
  },
  "devDependencies": {
    "jsdoc": "*"
  }
}

NPM may produce the following partial package-lock.json file:

{
  "name": "nijs",
  "version": "0.0.25",
  "lockfileVersion": 1,
  "requires": true,
  "dependencies": {
    "optparse": {
      "version": "1.0.5",
      "resolved": "https://registry.npmjs.org/optparse/-/optparse-1.0.5.tgz",
      "integrity": "sha1-dedallBmEescZbqJAY/wipgeLBY="
    },
    "requizzle": {
      "version": "0.2.1",
      "resolved": "https://registry.npmjs.org/requizzle/-/requizzle-0.2.1.tgz",
      "integrity": "sha1-aUPDUwxNmn5G8c3dUcFY/GcM294=",
      "dev": true,
      "requires": {
        "underscore": "1.6.0"
      },
      "dependencies": {
        "underscore": {
          "version": "1.6.0",
          "resolved": "https://registry.npmjs.org/underscore/-/underscore-1.6.0.tgz",
          "integrity": "sha1-izixDKze9jM3uLJOT/htRa6lKag=",
          "dev": true
        }
      }
    },
    ...
}

The above lock file pinpoints all dependencies and development dependencies including transitive dependencies to exact versions, including the locations where they can be obtained from and integrity hash codes that can be used to validate them.

The lock file can also be used to derive the entire structure of the node_modules/ folder in which all dependencies are stored. The top level dependencies property captures all packages that reside in the project's node_modules/ folder. The dependencies property of each dependency captures all packages that reside in a dependency's node_modules/ folder.

If NPM 5.0 is used and no package-lock.json is present in a project, it will automatically generate one.

Substituting dependencies


As mentioned in an earlier blog post, the most important technique to make Nix-NPM integration work is by substituting NPM's dependency management activities that conflict with Nix's dependency management -- Nix is much more strict with handling dependencies (e.g. it uses hash codes derived from the build inputs to identify a package as opposed to a name and version number).

Furthermore, in Nix build environments network access is restricted to prevent unknown artifacts to influence the outcome of a build. Only so-called fixed output derivations, whose output hashes should be known in advance (so that Nix can verify its integrity), are allowed to obtain artifacts from external sources.

To substitute NPM's dependency management, populating the node_modules/ folder ourselves with all required dependencies and substituting certain version specifiers, such as Git URLs, used to suffice. Unfortunately, with the newest NPM this substitution process no longer works. When running the following command in a Nix builder environment:

$ npm --offline install ...

The NPM package manager is forced to work in offline mode consulting its content-addressable cache for the retrieval of external artifacts. If NPM needs to consult an external resource, it throws an error.

Despite the fact that all dependencies are present in the node_modules/ folder, deployment fails with the following error message:

npm ERR! code ENOTCACHED
npm ERR! request to https://registry.npmjs.org/optparse failed: cache mode is 'only-if-cached' but no cached response available.

At first sight, the error message suggests that NPM always requires the dependencies to reside in the content-addressable cache to prevent it from downloading it from external sites. However, when we use NPM outside a Nix builder environment, wipe the cache, and perform an offline installation, it does seem to work properly:

$ npm install
$ rm -rf ~/.npm/_cacache
$ npm --offline install

Further experimentation reveals that NPM augments the package.json configuration files of all dependencies with additional metadata that are prefixed by an underscore (_):

{
  "_from": "optparse@>= 1.0.3",
  "_id": "optparse@1.0.5",
  "_inBundle": false,
  "_integrity": "sha1-dedallBmEescZbqJAY/wipgeLBY=",
  "_location": "/optparse",
  "_phantomChildren": {},
  "_requested": {
    "type": "range",
    "registry": true,
    "raw": "optparse@>= 1.0.3",
    "name": "optparse",
    "escapedName": "optparse",
    "rawSpec": ">= 1.0.3",
    "saveSpec": null,
    "fetchSpec": ">= 1.0.3"
  },
  "_requiredBy": [
    "/"
  ],
  "_resolved": "https://registry.npmjs.org/optparse/-/optparse-1.0.5.tgz",
  "_shasum": "75e75a96506611eb1c65ba89018ff08a981e2c16",
  "_spec": "optparse@>= 1.0.3",
  "_where": "/home/sander/teststuff/nijs",
  "name": "optparse",
  "version": "1.0.5",
  ...
}

It turns out that when the _integrity property in a package.json configuration matches the integrity field of the dependency in the lock file, NPM will not attempt to reinstall it.

To summarize, the problem can be solved in Nix builder environments by running a script that augments the package.json configuration files with _integrity fields with the values from the package-lock.json file.

For Git repository dependency specifiers, there seems to be an additional requirement -- it also seems to require the _resolved field to be set to the URL of the repository.

Reconstructing package lock files


The fact that we have discovered how to bypass the cache in a Nix builder environment makes it possible to fix the integration with the latest NPM. However, one of the limitations of this approach is that it only works for projects that have a package-lock.json file included.

Since lock files are still a relatively new concept, many NPM projects (in particular older projects that are not frequently updated) may not have a lock file included. As a result, their deployments will still fail.

Fortunately, we can reconstruct a minimal lock file from the project's package.json configuration and compose dependencies objects by traversing the package.json configurations inside the node_modules/ directory hierarchy.

The only attribute that cannot be immediately derived are the integrity fields containing hashes that are used for validation. It seems that we can bypass the integrity check by providing a dummy hash, such as:

integrity: "sha1-000000000000000000000000000=",

NPM does not seem to object when it encounters these dummy hashes allowing us to deploy projects with a reconstructed package-lock.json file. The solution is a very ugly hack, but it seems to work.

Generating Nix expressions from lock files


As explained earlier, lock files pinpoint the exact versions of all dependencies and transitive dependencies and describe the structure of the entire dependency graph.

Instead of simulating NPM's dependency resolution algorithm, we can also use the data provided by the lock files to generate Nix expressions. Lock files appear to contain most of the data we need -- the URLs/locations of the external artifacts and integrity hashes that we can use for validation.

Using lock files for generation offer the following advantages:

  • We no longer need to simulate NPM's dependency resolution algorithm. Despite my best efforts and fairly good results, it is hard to truly make it 100% identical to NPM's. When using a lock file, the dependency graph is already given, making deployment results much more accurate.
  • We no longer need to consult external resources to resolve versions and compute hashes making the generation process much faster. The only exception seems to be Git repositories -- Nix needs to know the output hash of the clone whereas for NPM the revision hash suffices. When we encounter a Git dependency, we still need to download it and compute the output hash.

Another minor technical challenge are the integrity hashes -- in NPM lock files integrity hashes are in base-64 notation, whereas Nix uses heximal notation or its own custom base-32 notation. We need to convert the NPM integrity hashes to a notation that Nix understands.

Unfortunately, lock files can only be used in development projects. It appears that packages that are installed directly from the NPM registry, e.g. end-user packages that are installed globally through npm install -g, never include a package lock file. (It even seems that the NPM registry blacklist the lock files when publishing a package in the registry).

For this reason, we still need to keep our own implementation of the dependency resolution algorithm.

Usage


By adding a script that augments the dependencies' package.json configuration files with _integrity fields and by optionally reconstructing a package-lock.json file, NPM integration with Nix has been restored.

Using the new NPM 5.x features is straight forward. The following command can be used to generate Nix expressions for a development project with a lock file:

$ node2nix -8 -l package-lock.json

The above command will directly generate Nix expressions from the package lock file, resulting in a much faster generation process.

When a development project does not ship with a lock file, you can use the following command-line instruction:

$ node2nix -8

The generator will use its own implementation of NPM's dependency resolution algorithm. When deploying the package, the builder will reconstruct a dummy lock file to allow the deployment to succeed.

In addition to development projects, it is also possible to install end-user software, by providing a JSON file (e.g. pkgs.json) that defines an array of dependency specifiers:

[
  "nijs"
, { "node2nix": "1.5.0" }
]

A Node.js 8 compatible expression can be generated as follows:

$ node2nix -8 -i pkgs.json

Discussion


The approach described in this blog post is not the first attempt to fix NPM 5.x integration. In my first attempt, I tried populating NPM's content-addressable cache in the Nix builder environment with artifacts that were obtained by the Nix package manager and forcing NPM to work in offline mode.

NPM exposes its download and cache-related functionality as a set of reusable APIs. For downloading packages from the NPM registry, pacote can be used. For downloading external artifacts through the HTTP protocol make-fetch-happen can be used. Both APIs are built on top of the content-addressable cache that can be controlled through the lower-level cacache API.

The real difficulty is that neither the high-level NPM APIs nor the npm cache command-line instruction work with local directories or local files -- they will only add artifacts to the cache if they come from a remote location. I have partially built my own API on top of cacache to populate the NPM cache with locally stored artifacts pretending that they were fetched from a remote location.

Although I had some basic functionality supported, it turned out to be much more complicated and time consuming to get all functionality implemented.

Furthermore, the NPM authors never promised that these APIs are stable, so the implementation may break at some point in time. As a result, I have decided to look for another approach.

Availability


I just released node2nix version 1.5.0 with NPM 5.x support. It can be obtained from the NPM registry, Github, or directly from the Nixpkgs repository.

1 comment: