Deterministic evaluation of Nix expressions (and tracking who produced a derivation) #553

copumpkin · 2015-06-03T22:11:55Z

I don't have a good sense of how gnarly this is. It's certainly impure from a Haskell pure FP standpoint, and might not even be possible in the way I describe.

What I want is something like:

let
  bar = "hi ${import ./name.nix}";
  dataFile = /home/copumpkin/datafile.txt;
  foo = import ./blah.nix { msg = bar; data = dataFile; pkgs = import <nixpkgs> {}; };
in builtins.evalClosure foo

to return some summary that would allow me to reconstruct the evaluation of foo later, in as faithful a manner as possible. As such, it would need to include something like:

- What ./name.nix resolved to (a full path)
- What ./blah.nix resolved to (a full path)
- That we loaded dataFile from the store (with the full store path used)
- That we used <nixpkgs> and what path it resolved to
- Any additional dependencies beyond those above that arise from evaluating the rest, including other NIX_PATH entries and anything else involved in the evaluation

I could see one implementation just making evalClosure into a derivation containing a "self-contained" nix tree that represents the elements I list above in a way that just evaluates properly. Such a primop could be called snapshot or something like that. I don't expect it to be possible to make it 100% effective, but if it could capture most evaluation structures or fail loudly, that would achieve 99% of the goal.

The ultimate goal, if it isn't obvious, is providing reproducible evaluations. I want cryptographic linkage between a derivation in the store and a "nix language closure" that leads to it.

I don't know what would happen if you pass a "thunk that has already been forced" (in Haskell lingo) into the function. Ideally it would still work properly, but that seems even harder.

Does what I'm saying make sense? Is there something I'm missing that would make this fundamentally impossible?


copumpkin · 2015-06-04T17:15:30Z

Thoughts @edolstra @shlevy?

copumpkin · 2015-06-13T21:51:38Z

Okay, I think I have a more concrete proposal for how this could work:

My goal

I want a cryptographically verifiable link between a derivation and a nix expression that built it (the most common case will be NixOS's configuration.nix, but this would be handy for any derivation).

The problem I'm tackling is the fact that something like system.copySystemConfiguration only scratches the surface. The system derivation contains the configuration.nix that produced it, but configuration.nix doesn't exist in isolation. When it's evaluated, it gets evaluated in the context of a particular nixpkgs or nixos. Just copying the configuration.nix doesn't preserve that for me. Furthermore, I could be calling any number of things to copy files from outside the store, or be rebuilding my NixOS from a local nixpkgs clone whose state might not even correspond to a git commit!

What I want is a "complete" copySystemConfiguration that captures all those additional sources of variation, so that when I look at that configuration.nix, I can audit the complete set of expressions and environment that produced it. With deterministic builds, I can also verify that nobody's tampered with my filesystem. Even without them, I can greatly reduce the diff space I'd have to consider when double checking the contents of my store.

If you accept its value for configuration.nix, it shouldn't be a huge jump to see why someone might want it for arbitrary derivations, particularly if it doesn't incur much disk space overhead.

To be clear, I want this to be more auditable than .drv files are. I consider .nix to be source, .drv to be some low-level IR, and store outputs to be opaque binaries. I want to take an opaque binary, find its source, audit it, and then verify that the source produces (something very close to) the binary I have.

Building blocks

A "record mode" for Nix

This is not a primop, but rather a flag you run nix-build and friends with. It keeps track of all "impure" things during an evaluation. Then a playback mode will have nix reading in this file and whenever it sees the things described in here, it returns the saved value rather than re-evaluating. For example, if you run something like nix-build --record -A foo.bar, you might end up with the following file in the store:

recording:

# The expression being evaluated
foo.bar

<nixpkgs> -> /nix/store/1234j1kafjsklaghkl15-nixpkgs/

# Nix spotted that I was importing something from myownchannel and copied it into the store for the sake of this recording.
<myownchannel> -> /nix/store/6316j13kl5j3k1l6kh612k-mychannel/

# We encountered ./some/path when evaluating our expression in path/someExpression.nix, so we record that it was used and make sure it points somewhere we save in the store
path/someExpression.nix:./some/path/data.file -> /nix/store/1253812672819612j-data.file

# Someone even used one of these! Don't let it escape!
builtins.getEnv "MYENV" -> "SOME IMPURE ENVIRONMENT VARIABLE"

# Someone called readFile! call the cops! (and copy the stuff into the store, making sure we redirect the file read to the new location)
builtins.readFile "/home/copumpkin/data.txt" -> builtins.readFile /nix/store/14781927581abjkljkalf9-data.txt

# You get the idea...
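The recording is stored under a name derived from its contents, so it helps to encode it canonically: two runs that recorded the same impurities must produce byte-identical files. Below is a minimal sketch in Python, assuming a JSON encoding and a truncated SHA-256 digest. Real Nix store paths use a different hashing scheme and base-32 encoding, and the `serialize_recording`/`recording_store_path` helpers and the `-recording` suffix are hypothetical.

```python
import hashlib
import json


def serialize_recording(entry_point, entries):
    """Canonically encode a recording so equal contents give equal bytes.

    `entries` maps a description of an impure operation (a lookup path,
    a builtin call, ...) to the value it produced during the recorded run.
    """
    doc = {"expression": entry_point, "impurities": entries}
    # sort_keys and fixed separators make the bytes independent of dict
    # insertion order, which the content-derived name below relies on.
    return json.dumps(doc, sort_keys=True, separators=(",", ":")).encode("utf-8")


def recording_store_path(data, store_dir="/nix/store"):
    """Hypothetical content-addressed name: a truncated SHA-256 of the bytes."""
    digest = hashlib.sha256(data).hexdigest()[:32]
    return f"{store_dir}/{digest}-recording"


entries = {
    "<nixpkgs>": "/nix/store/1234j1kafjsklaghkl15-nixpkgs/",
    'builtins.getEnv "MYENV"': "SOME IMPURE ENVIRONMENT VARIABLE",
}
data = serialize_recording("foo.bar", entries)
path = recording_store_path(data)
```

Because the name depends only on the contents, any later edit to the recording changes its name, which is what makes the "verify the hash on the recording itself" step of playback cheap.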

Thus, the semantics of --record are roughly:

1. Evaluate as normal.
2. If you come across a "source of impurity" (we'd have to go through the various builtin/path constructs and decide which count):
   - If it's "simple", record any relevant input to the builtin and its output so you can replay it later.
   - If it involves the filesystem (reads a file or directory, resolves a path, etc.) and isn't looking at the store, copy the specified entity into the store and record a redirection to the store location.
3. Output a spec file like the example above to the store, named by a hash of its contents.
4. This is where things get funky: inject a reference to the spec file generated in (3) into the output derivation (at some canonical location, like a symlink from nix-support/recording), making sure that the hash of the spec file perturbs the hash of the output derivation appropriately.
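The two kinds of impurity in the semantics above can be sketched as a wrapper around impure builtins. This is a minimal Python model, not Nix's evaluator: the `Recorder` class, the injected `env`/`fs` dictionaries standing in for the environment and filesystem, and the toy store naming are all hypothetical. It shows getEnv as a "simple" impurity recorded verbatim and readFile as a filesystem impurity that is copied into the store and recorded as a redirection.

```python
import hashlib


class Recorder:
    """Hypothetical record mode: evaluate normally, but log every impure call.

    `env` and `fs` stand in for the process environment and the filesystem,
    and `store` is a toy content-addressed store (path -> bytes). None of
    these names correspond to real Nix internals.
    """

    def __init__(self, env, fs):
        self.env = env
        self.fs = fs
        self.store = {}
        self.log = {}  # description of the impure call -> recorded result

    def _add_to_store(self, path, data):
        # Name the copy after a hash of its contents (illustrative scheme only).
        digest = hashlib.sha256(data).hexdigest()[:32]
        name = path.rsplit("/", 1)[-1]
        store_path = f"/nix/store/{digest}-{name}"
        self.store[store_path] = data
        return store_path

    def get_env(self, name):
        # "Simple" impurity: record the input and the output verbatim.
        # Like builtins.getEnv, an unset variable yields the empty string.
        value = self.env.get(name, "")
        self.log[f'builtins.getEnv "{name}"'] = value
        return value

    def read_file(self, path):
        if path.startswith("/nix/store/"):
            # Store paths are immutable, so reading one needs no recording.
            return self.store[path] if path in self.store else self.fs[path]
        # Filesystem impurity: copy the file into the store and record a
        # redirection, so playback reads the copy instead of the original.
        data = self.fs[path]
        store_path = self._add_to_store(path, data)
        self.log[f'builtins.readFile "{path}"'] = store_path
        return data
```

For example, `Recorder(env={"MYENV": "x"}, fs={"/home/u/data.txt": b"hello"})` followed by `get_env("MYENV")` and `read_file("/home/u/data.txt")` leaves two entries in `log`, the second pointing at a `/nix/store/...-data.txt` copy. A real implementation would also have to cover path resolution, directory reads and `<...>` lookups.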

A corresponding playback mode

Playback mode would take a derivation that contains a nix-support/recording and play it back:

1. It verifies the hash on the recording itself. This is trivial because the filename should contain a hash based on its contents.
2. It then starts evaluating the expression at the top of the recording file.
3. Whenever it encounters one of the aforementioned "impure builtins", it checks the recording file to see whether that impurity was recorded. If not, it fails loudly; if so, it skips the impure operation and returns the recorded value.
4. After evaluating the expression, it performs the same injection of the recording into the output derivation and checks that the resulting hash matches the one in the store (this would reveal whether someone had tampered with the recording).
5. It builds the derivation, much like nix-build --check would, reporting whether there are differences (and potentially what they are).
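The first and third playback steps can be sketched the same way. This is a hypothetical Python model: the `Player` class, the `UnrecordedImpurity` exception, the JSON layout and the "truncated SHA-256 prefix in the name" convention are assumptions made for illustration, not how Nix names store paths.

```python
import hashlib
import json


class UnrecordedImpurity(Exception):
    """Raised when playback meets an impure operation missing from the recording."""


class Player:
    """Hypothetical playback mode over a recording encoded as JSON.

    The recording's name is assumed to start with a truncated SHA-256 of its
    bytes (an illustrative scheme, not Nix's real store-path hashing).
    """

    def __init__(self, name, data):
        digest = hashlib.sha256(data).hexdigest()[:32]
        # The name must match the contents, or the file was tampered with.
        if not name.startswith(digest + "-"):
            raise ValueError(f"recording {name!r} does not match its contents")
        doc = json.loads(data)
        self.expression = doc["expression"]
        self.recorded = doc["impurities"]

    def impure(self, description):
        # Never perform the operation; return the recorded value or fail loudly.
        try:
            return self.recorded[description]
        except KeyError:
            raise UnrecordedImpurity(description) from None
```

Failing loudly on an unrecorded impurity, rather than falling back to performing it, is what keeps playback from silently reintroducing the variation the recording was meant to pin down.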

copumpkin · 2015-06-13T22:12:25Z

I also think the intensional store wouldn't solve the same thing this solves, although it would change its flavor a bit. What this does (or tries to, at least) is provide a strong link between the input and the output.

vcunat · 2015-07-31T16:15:05Z

I probably don't understand your goals. Currently you can re-evaluate a configuration and see if it produces the same derivation (even the name is a strong hash). With some hacking, the derivations should even be well-diffable.

The problem I see is that in principle, all packages transitively referenced from configuration do affect your system, which is quite a lot to audit by itself.

magnetophon · 2015-11-15T15:52:32Z

We were just talking about this at NixCon 2015, and I have to say: +1!

fkz · 2015-11-16T10:34:31Z

Following the NixCon discussion, +1 from me too.

bbarker · 2017-10-18T19:58:16Z

Is there any update on this? I'm just looking for something along the lines of yarn.lock (JavaScript) or pip freeze (Python).

copumpkin · 2017-10-18T20:05:35Z

Those two are sort of different from this, and more akin to either how Nix works already or #520 if you squint.

Unless I'm misunderstanding you, Nix builds are already pretty deterministic, and if you lock down the channel and the expression you're evaluating, you're more locked down than with, e.g., pip freeze or yarn.lock.

I did actually start work on a new version of #709 but haven't put much effort into it recently. I'd want to understand whether we already do what you want, though, because in most cases we behave more like the locked versions of other package managers than anything else. This issue is about an even higher level of determinism and reproducibility.

copumpkin · 2018-04-12T13:52:56Z

I've been doing a lot of work on this privately recently, and it's renewed my interest in it. I can't promise a timeline, but this is definitely not dead.

bbarker · 2018-04-12T15:14:44Z

@copumpkin Sorry for the super delayed reply. I never have as much time for Nix (and other things) as I'd like. Is there a way to do --record at the environment level? If I could easily generate reproducible environments for scientific pipelines, this would be exactly what I'm looking for.

Also, I applaud the efforts in this issue to make such a feature even "more deterministic". Awesome!

Edit: I wrote up my attempt at doing this at: https://stackoverflow.com/questions/50242387/how-to-record-a-reproducible-profile-in-nix-especially-from-nix-env/50257762#50257762 It looks like it will probably work, but I haven't tested it heavily yet. Also, the user experience of this sort of thing could probably be improved if it were integrated more into the standard tooling.

CMCDragonkai · 2018-07-25T13:06:07Z

@copumpkin I had started thinking about this concept just now.

Basically, I noticed that the Nix expressions I write are not really pure, since they refer to things in the external world. For example, import ./foo.nix is not really pure: it refers to some other Nix expression that could be swapped out underneath it. So two evaluations of the same Nix expression do not necessarily produce the same result.

Instead, what really happens is a sort of multi-stage evaluation: we have source that executes within the context of the outside world, and the derivation produced from that Nix expression, which lives in /nix/store, is the "closed expression" or "closure" that ends up being deterministic.

I got thinking about this while comparing development workspaces, such as project-based nix-shell workflows versus Go's shared workspace structure, and how the names we use in programming sometimes refer to something deterministic, sometimes depend on constraints (such as versions), and other times refer to whatever happens to be at that path.

I'd be interested in this feature though, seems like it would be useful for debugging.

edolstra · 2018-07-25T13:43:40Z

@CMCDragonkai Regarding import ./foo.nix being impure, you may be interested in the --pure-eval flag in Nix 2, which disallows access to any files unless they were fetched by a call to builtins.fetch{Git,Mercurial,Tarball} with a revision or content hash. So multiple evaluations of the same command line arguments will produce the same result. See d4dcffd.
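The rule behind --pure-eval can be modeled as "an input is only admissible if it is pinned to content". Below is a minimal Python sketch of that rule. The `pure_fetch` function and `ImpureAccess` exception are hypothetical, `download` is any callable supplied by the caller, and the check hashes raw bytes; Nix's own fetchers use their own hash formats and, for unpacked trees, hash a NAR serialization rather than the downloaded file.

```python
import hashlib


class ImpureAccess(Exception):
    """Raised when an input is requested without a pinning content hash."""


def pure_fetch(download, url, sha256=None):
    """Hypothetical model of a fetch under pure evaluation.

    `download` is any callable returning the bytes at `url`. Without an
    expected hash the result could change between runs, so it is refused.
    """
    if sha256 is None:
        raise ImpureAccess(f"{url}: no content hash given")
    data = download(url)
    actual = hashlib.sha256(data).hexdigest()
    if actual != sha256:
        raise ValueError(f"{url}: hash mismatch, got {actual}")
    return data
```

Since every admitted input is fixed by its hash, the same command-line arguments can only ever evaluate against the same inputs, which is the reproducibility property described above.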

copumpkin · 2018-07-25T14:06:30Z

I love the direction --pure is going, but in practice we use enough of those constructs in nixpkgs that I at least can't just live in the --pure world without some sort of mechanism like what we put together in that CR. I do actually have a nearly finished new version of that CR using a lot more pure Nix (with scopedImport) to do most of the work, but ran into real life 😦

CMCDragonkai · 2018-07-28T02:17:08Z

@edolstra What about a builtins.fetchFile with a content-addressed constraint as well? We would have to extend all the nix-prefetch-* commands to support all these pure possibilities.

virusdave · 2019-09-11T19:17:53Z

@copumpkin Any update on this?

nixos-discourse · 2020-01-18T04:49:24Z

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/see-diff-between-generation-configurations-or-see-previous-generations-configuration-nix/5493/4

stale · 2021-02-16T03:49:08Z

I marked this as stale due to inactivity.

stale · 2022-04-29T00:09:44Z

I closed this issue due to inactivity.

roberth · 2023-11-17T11:14:03Z

The idea of a record mode is useful beyond just figuring out the evaluation-level inputs of an output like a NixOS toplevel.

Unfortunately "recording" doesn't compose, because the language is call-by-need: when you record one thing and then record the next, things that were already evaluated for the first will not be evaluated again, and therefore not be recorded. So a faithful recording of a second thing requires a restart of the evaluator.
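The composition problem can be reproduced with a toy call-by-need model in Python (not Nix's evaluator). A `Thunk` computes its value at most once; a global `reads` list plays the role of the active recording; the file names are made up. Two targets share one thunk, so whichever target is recorded second never sees the shared read.

```python
class Thunk:
    """A call-by-need value: computed at most once, then cached."""

    def __init__(self, compute):
        self._compute = compute
        self._forced = False
        self._value = None

    def force(self):
        if not self._forced:
            self._value = self._compute()
            self._forced = True
        return self._value


reads = []  # the active "recording": files read while forcing


def read_file(path):
    reads.append(path)
    return f"<contents of {path}>"


# A value shared between two evaluation targets, e.g. a common nixpkgs import.
shared = Thunk(lambda: read_file("nixpkgs/default.nix"))
first = Thunk(lambda: shared.force() + read_file("a.nix"))
second = Thunk(lambda: shared.force() + read_file("b.nix"))

first.force()
recording_first = list(reads)
reads.clear()
second.force()
# Only "b.nix" is recorded here: the shared file was already forced while
# recording `first`, so it is missing even though `second` depends on it.
recording_second = list(reads)
```

Getting a faithful recording for `second` would require fresh thunks, i.e. restarting the evaluator, as described above.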

Of course that doesn't mean that we couldn't take advantage of such a tracking feature. Often enough we only need one thing, and such a single thing that comes to mind in particular is evaluating a devshell.
Those tend to be affected by only part of the inputs that are tracked in the evaluation cache, which means that shells are evaluated far more often than needed. Composition is not an issue here, because the evaluator will run for a single purpose and (almost) a single attribute.
By tracking which files actually affect the devshell, and storing that information in a clever index (non-trivial), we could speed up repeated devshell invocations significantly and provide a "watch" mode; both could significantly speed up dev tooling that relies, or could rely, on dev shells and such.
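A rough Python sketch of the tracking part of this idea: each cached result remembers the hash of every file read while producing it, and stays valid only while those hashes are unchanged. `TrackedCache`, the `fs` dictionary standing in for the filesystem, and the attribute names are hypothetical, and the flat dictionary here is deliberately naive compared with the "clever index" mentioned above.

```python
import hashlib


def digest(data):
    return hashlib.sha256(data).hexdigest()


class TrackedCache:
    """Hypothetical eval cache keyed on the files an evaluation actually read.

    `fs` maps paths to bytes and stands in for the filesystem.
    """

    def __init__(self, fs):
        self.fs = fs
        self.entries = {}  # attribute -> (result, {path: hash})
        self.evaluations = 0

    def _still_valid(self, deps):
        # A dependency counts as changed if it was deleted or its hash differs.
        return all(p in self.fs and digest(self.fs[p]) == h for p, h in deps.items())

    def evaluate(self, attr, evaluator):
        cached = self.entries.get(attr)
        if cached is not None and self._still_valid(cached[1]):
            return cached[0]  # no tracked input changed: skip evaluation
        deps = {}

        def read(path):
            # Every file read is tracked together with its current hash.
            data = self.fs[path]
            deps[path] = digest(data)
            return data

        self.evaluations += 1
        result = evaluator(read)
        self.entries[attr] = (result, deps)
        return result

    def watched_files(self, attr):
        """Files a watch mode would need to monitor for this attribute."""
        return sorted(self.entries.get(attr, (None, {}))[1])
```

Editing a file the shell never read (say, a README) leaves the entry valid, while editing one it did read forces a re-evaluation; `watched_files` is exactly the set a watch mode would monitor.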

So I would argue that functionality like this should still be implemented, as part of the eval cache layer (which is currently separate from the normal evaluator).
I can't think of a good way for Nix to store this information in the more free and arbitrary world of impure evaluation, and perhaps it's not even a significant improvement compared to nix-instantiate -v, which shows which Nix files are loaded.

Unfortunately storing it in the output or in the store does not seem feasible because it's really up to the "evaluation driver" to discover and use this information, and only "after" the underlying two layers have done their work (evaluator and store).

nixos-discourse · 2024-05-29T10:04:25Z

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/pre-rfc-implement-dependency-retrieval-primitive/43418/14

copumpkin mentioned this issue Jun 14, 2015

Export & Compare NixOS configurations. NixOS/nixpkgs#7092

Open

copumpkin mentioned this issue Jul 31, 2015

stage-1: fix typo that breaks resume NixOS/nixpkgs#9056

Merged

copumpkin changed the title ~~A primop for [stored] "evaluation-time closure" of a value~~ Deterministic evaluation of Nix expressions (and tracking who produced a derivation) Nov 15, 2015

fkz mentioned this issue Nov 19, 2015

WIP: deterministic evaluation of expressions/derivations #709

Closed


copumpkin mentioned this issue Feb 2, 2017

A couple of simple primops for easier generic manipulation of n-ary functions #1213

Open

shlevy added the backlog label Apr 1, 2018

shlevy assigned peti Apr 1, 2018

copumpkin self-assigned this Apr 12, 2018

peti removed their assignment Apr 12, 2018

peti added the improvement label Apr 12, 2018

peti added question and removed backlog labels Apr 21, 2018

CMCDragonkai mentioned this issue Aug 11, 2018

Networked Expressions MatrixAI/Architect#12

Open

stale bot added the stale label Feb 16, 2021

stale bot closed this as completed Apr 29, 2022

thufschmitt reopened this Feb 24, 2023

stale bot removed the stale label Nov 17, 2023

roberth added the language and stale labels Nov 17, 2023

stale bot removed the stale label Nov 17, 2023

roberth added the feature, new-cli, and performance labels Nov 17, 2023
