Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Self-contained outputs #7087

Merged
merged 4 commits into from
Jan 30, 2023
Merged

Conversation

ncfavier
Copy link
Member

@ncfavier ncfavier commented Sep 23, 2022

Fixes #5633

Adds a new derivation attribute possibleReferences which restricts the list of paths that are searched in build outputs in order to establish runtime references.

Useful when creating filesystem images containing their own embedded Nix store: setting possibleReferences = []; makes them self-contained blobs of data with no runtime dependencies.

To do:

  • allow specifying output names just like in allowedReferences. Not clear if this is useful, but why not. We can intersect the list of output names with the keys of scratchOutputs when adding those to the set.
  • should we have impossibleReferences as well? (I don't think so)

EDIT: the attribute has been made a __structuredAttrs boolean (unsafeDiscardReferences.out = true;) and hidden under the discard-references experimental feature.

@ncfavier
Copy link
Member Author

ncfavier commented Sep 23, 2022

I think CI is failing because the daemon doesn't have the patch? Checks pass locally

EDIT: ah, it's because of the check against a nixUnstable daemon. I guess it's ok then.

Copy link
Member

@thufschmitt thufschmitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, that's a much needed thing, glad to see a PR for that!

Left a few comments/suggestions, but overall 👍

tests/check-refs.sh Outdated Show resolved Hide resolved
Comment on lines 2158 to 2162
std::set_intersection(referenceablePaths.begin(), referenceablePaths.end(),
spec.begin(), spec.end(),
std::inserter(tmp, tmp.begin()));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be useful to also log (notice I would say) the references that we erase here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit skeptical: I don't think it adds any information that the user doesn't already know, and I think there are usually too many paths in referenceablePaths for this to be useful and not spam.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it could be useful to debug some potential errors where you're accidentally referencing too much. But I might be wrong and the spamming argument is totally valid. So let's keep things that way)

tests/check-refs.sh Outdated Show resolved Hide resolved
tests/check-refs.nix Outdated Show resolved Hide resolved
doc/manual/src/language/advanced-attributes.md Outdated Show resolved Hide resolved
tests/check-refs.sh Show resolved Hide resolved
@ncfavier ncfavier force-pushed the referenceablePaths branch 2 times, most recently from 03f04dc to ab929ec Compare September 24, 2022 22:56
@ncfavier
Copy link
Member Author

I renamed the attribute to possibleReferences.

@edolstra
Copy link
Member

I'm not in favor of this. It breaks a fundamental property of Nix, namely the closure invariant, i.e. every reference of a valid path is also present and valid, and thus there are no "gaps" in closures. It decouples the references stored in the Nix DB from the actual references in the filesystem.

I realize that the main use case here is filesystem images, where it is desirable and reasonable to ignore references, but this PR also opens the possibility of optional runtime dependencies and is sure to be (ab)used to for that. E.g. if I have a package like digikam that has an expensive reference to marble (which is very big), I could mark marble as an ignored dependency. So digikam will be able to use marble if it happens to be present. But that means that closures no longer behave in a predicable way; their behaviour would depend on other paths that may be present in the store.

A minor issue is that it's currently at least theoretically possible to recover the references graph (i.e. the contents of the Nix database) simply by scanning all store paths for references to all other store paths. With this PR, that property no longer holds since you need the possibleReferences attribute of the derivations that built the store paths. So it wouldn't be possible anymore to have a nix rebuild-db command.

@ncfavier
Copy link
Member Author

ncfavier commented Sep 26, 2022

Regarding the first point, Nix does not have that property. I can easily make a program that effectively references a store path in a way that is not detected by the scanner, simply by mangling the hash. We are using a heuristic to detect references, which works well enough but is nonetheless sometimes faulty, and this PR allows overriding the heuristic when it is faulty. This does not change any fundamental property of Nix. If you fear that it would be abused, we can always add a big red warning in the documentation.

I agree that the second point is more of a problem. How about, as an alternative, storing the list of possible references in a designated path under the output tree, say /nix-support/possible-references? Pros: reference scanning is deterministic. Obvious cons: only works for derivations producing directories; scanning gets a bit weird because the scanner then needs to ignore that file and scan the rest of the directory. Or maybe just /nix-support/references that says exactly what the references are? (This seems like the easiest to implement.)

@thufschmitt
Copy link
Member

I agree that this isn't really breaking the closure invariant since we can already rot13 (or any reversible transformation) twice the output to get rid of the dependencies. It's just making this less hacky for the cases where it's needed.

@edolstra Is the digikam example something you've seen asked for in practice? It feels to me that if I really wanted this kind of optional dependencies and couldn't afford a proper wrapper I'd rather rely on an env variable pointing to some path outside of the store (like what NixOS does with the opengl drivers or just any script relying on $PATH) rather than hardcoding a possibly existing path

If that's the case, would it be reasonable to just make it possible to ignore all the references? That should solve the most common use-case while not allowing this kind of optional dependencies pattern

@edolstra
Copy link
Member

Of course you can hide dependencies, just as you can hide pointers to a conservative garbage collector by negating the bits in the pointer, but then it's kind of clear you're doing something naughty. This would make dangling references a supported feature and that's a pretty fundamental change.

At the very least, this should be an experimental feature.

@ncfavier
Copy link
Member Author

I'm not sure an experimental feature really makes sense for this.

If the intent is to make sure it's not "too easy" to (ab)use, how about giving it a scary name like unsafeRestrictReferences and not including it in the documentation, like unsafeDiscardStringContext (which can also be used for optional dependencies)?

@Ericson2314
Copy link
Member

A minor issue is that it's currently at least theoretically possible to recover the references graph (i.e. the contents of the Nix database) simply by scanning all store paths for references to all other store paths.

That's just not true.

  1. We would need an exception for drv files. The output paths in drv files do not count as references.

  2. Anyone is free to manually type a non-present nix store path in a build, e.g. /nix/store/himynameisjohn-foobar and arrange for it to be in an output. References of outputs are always a subset of drv inputs. Since things string merely looks like a store path but isn't a drv intput it won't count. But if we were to scan for references after the fact as you purpose, we would have no reason not to count it!

    With input-addressed paths, scanning for references to rebuild the DB can in fact find spurious cycles this way!

Bottom line is, the references are not mere metadata, or a mere cache of scanning. References are data proper, as much part of the store object as its file system objects are. This is the only correct way to account for how Nix actually behaves.

@Ericson2314
Copy link
Member

Experimental feature still makes sense, however. By default things should start as experimental features this is good.

@ncfavier
Copy link
Member Author

I've renamed the attribute to __visibleReferences, hidden it under the experimental feature invisible-references and documented it as experimental.

I'm not sure what else needs to be done, e.g. should this interact with requiredSystemFeatures somehow?

@thufschmitt
Copy link
Member

I'm not sure what else needs to be done, e.g. should this interact with requiredSystemFeatures somehow?

That could be good (so that you don't dispatch a build to a builder that doesn't have the xp feature), but it's not a blocker imho. And maybe not really needed at all since I hope that this won't stay experimental for too long (ideally it could ship as Xp for the next release and then stable in the following one)

doc/manual/src/language/advanced-attributes.md Outdated Show resolved Hide resolved
Comment on lines 2158 to 2162
std::set_intersection(referenceablePaths.begin(), referenceablePaths.end(),
spec.begin(), spec.end(),
std::inserter(tmp, tmp.begin()));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it could be useful to debug some potential errors where you're accidentally referencing too much. But I might be wrong and the spamming argument is totally valid. So let's keep things that way)

@ncfavier
Copy link
Member Author

ncfavier commented Oct 10, 2022

Not sure how to get the test daemon to pick up the experimental feature, enableFeatures invisible-references doesn't work either.

@thufschmitt
Copy link
Member

Not sure how to get the test daemon to pick up the experimental feature, enableFeatures invisible-references doesn't work either.

If you haven't done that, I think you need to call restartDaemon after that to force the daemon to take the new config into account

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/nix-team-meeting-minutes-oct-7-2022/22369/1

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/2022-10-14-nix-team-meeting-minutes-4/22811/1

@ncfavier
Copy link
Member Author

Is anything else blocking this?

@Ericson2314
Copy link
Member

@ncfavier @edolstra will need to be convinced, or delegate the decision, or similar. This is in backlog while the Nix team is getting going.

@roberth
Copy link
Member

roberth commented Nov 8, 2022

Here's another case.

It'd be much easier to test as well, if it wouldn't pull in all the contents as transitive dependencies of the test.

@thufschmitt thufschmitt self-assigned this Nov 25, 2022
@7c6f434c
Copy link
Member

7c6f434c commented Feb 3, 2023

Well, this as experimental will not really get us data on the drawbacks of this compared to the safer «declare an output self-contained», as the hiding will need to be very explicitly asked by the user.

Hopefully this will stay experimental…

@roberth
Copy link
Member

roberth commented Feb 3, 2023

@7c6f434c makes a good point.

Actually having this as experimental makes the feature much worse because any mistakenly unconfigured builder will ignore the attribute and produce an output with a ton of dependencies that will propagate through caches etc. Users won't know why it happened and then think that this feature has a bug.

The real problem is forward compatibility. I don't think we have a way for a derivation to request that it's built by a Nix that has specific support for something, do we? requiredSystemFeatures could be that, but I don't know if we want to use it for this purpose, as any new such feature requires remote builder reconfiguration and such.

@ncfavier
Copy link
Member Author

ncfavier commented Feb 3, 2023

will ignore the attribute

It will fail, actually.

error: experimental Nix feature 'discard-references' is disabled; use '--extra-experimental-features discard-references' to override

I think ignoring the attribute would actually be preferable. People who know about the experimental feature can try it out. Others get the status quo ("ton of dependencies").

@roberth
Copy link
Member

roberth commented Feb 3, 2023

Right, so now we have the worst of all options

  • no guarantee that discard-references will work, because stable nix does still ignore it
  • no possibility to actually enable this in widely used expressions, as expression authors will not want to require users to enable the feature
  • users who might encounter problems have to opt in, but maybe they don't do it

I don't know what we're afraid of or why we should make life so hard for everyone including ourselves (Nix team) who need to learn about this feature's use. Unknown unknowns, really?

@7c6f434c
Copy link
Member

7c6f434c commented Feb 3, 2023

I don't know what we're afraid of or why we should make life so hard for everyone including ourselves (Nix team) who need to learn about this feature's use. Unknown unknowns, really?

Well, it violates a pretty fundamental assumption that everyone considers a foundational principle of Nix. It definitely can be abused badly… Is it that hard to overrideDerivation its use into any image-generating expression, though (provided you have it enabled)?

@ncfavier
Copy link
Member Author

ncfavier commented Feb 3, 2023

It's not hard, provided the image builder supports __structuredAttrs: NixOS/nixpkgs#214373

With the above PR I'm able to build a reference-free ISO with config.system.build.isoImage.overrideAttrs (_: { unsafeDiscardReferences.out = true; }).

@roberth
Copy link
Member

roberth commented Feb 3, 2023

Well, it violates a pretty fundamental assumption that everyone considers a foundational principle of Nix.

So does unsafeDiscardStringContext, and yet I have never encountered anyone who has a problem with it.

It definitely can be abused badly…

Abused sure, anything can be abused, but badly? In a way where the responsibility falls on Nix?

I suggest we remove the experimental flag soon, and if anyone wants to exercise their caution, work on a forward compatibility attribute instead; a known problem.

@7c6f434c
Copy link
Member

7c6f434c commented Feb 4, 2023

Abused sure, anything can be abused, but badly? In a way where the responsibility falls on Nix?

«I have done a GC and now a package I have installed has reduced functionality, isn't this supposed to be impossible» is quite likely to be blamed on Nix.

@roberth
Copy link
Member

roberth commented Feb 4, 2023

Sure. I'd rather have a useful tool than a safe tool. Let's agree to disagree.

@7c6f434c
Copy link
Member

7c6f434c commented Feb 4, 2023

It's not merely about a safe tool, it is about whether a tool gives you guarantees you can quickly understand and reason about. Reducing unpredictable interactions is more or less the whole point of Nix, so exceptions to it that can be hidden in an expression reduce the value.

As a useful tool I would prefer a trusted user dropping all the references of a specific path, as this is a clearly informed local choice, and the user could just do this anyway via pushing to a local cache and dropping the references (or prefetching as FOD, or whatever they prefer)

@ncfavier
Copy link
Member Author

ncfavier commented Feb 4, 2023

«I have done a GC and now a package I have installed has reduced functionality, isn't this supposed to be impossible» is quite likely to be blamed on Nix.

How often do you see people complaining that Nix is not pure enough, or not safe enough? How often do you see people complaining about Nix being too resource-hungry or wasteful? I certainly see the latter way more often.

exceptions to it that can be hidden in an expression reduce the value.

No they don't. GHC wouldn't exist without unsafePerformIO. Nix isn't a proof assistant, it's a build system.

Using a heuristic for determining references and not providing an escape hatch for false positives in the heuristic is nonsensical.

a trusted user dropping all the references of a specific path

The burden shouldn't fall on the user though (no matter how trusted or high-up-the-supply-chain). As a derivation author, I know that the hashes appearing in out aren't actual references, so I make the informed choice to use unsafeDiscardReferences.

@7c6f434c
Copy link
Member

7c6f434c commented Feb 4, 2023

How often do you see people complaining that Nix is not pure enough, or not safe enough?

I personally? A lot of time. Like a significant part of discussion about flakes is specifically about what is the proper make to get things more pure. I do also see discussions of how to make sandboxing safer.

I do see complaints that are basically «deduplication needs to be improved a lot», which seems to be primary motivation for Content-Addressed work.

Of model-breaking complaints, I mostly see «how do I make one store path writeable and seeded by some nix-build provided contents». Somehow, the idea of actually making it an option doesn't seem to get traction. Some of the issues can be resolved with carefully constructed FHS user envs, but not all of then.

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/nix-team-report-2022-10-2023-03/27486/1

@RaitoBezarius
Copy link
Member

Hello, we are multiple projects who are interested into that feature:

Unfortunately, the way this feature is designed makes it very hard to let user who want to opt-in into this unsafeness get the benefits.

For example, unsafeDiscardReferences.out = true; will eval but not build if you don't have the experimental features, this prevent graceful degradation to the usual behavior, which would be nice.

Furthermore, if we want to have it in the Hydra buildfarm for our current disk images, this would make it very hard because of the experimental status.

I don't know what are the right ways to get out of this situation, but I would appreciate if we could make this feature useful for the interested consumers, it would definitely bring awesome improvements for the community.

I understand where @edolstra and @7c6f434c come from on the risk but I feel like we can be proactive against abuse and be very clear about it. I would even go and say there could be some flag which can disable this feature at runtime with a flag or an environment variable if needed?

Thank you to everyone involved into this!

@7c6f434c
Copy link
Member

7c6f434c commented May 11, 2023 via email

@Ericson2314
Copy link
Member

Having the same derivation build two different ways depending on experimental feature usage sounds very bad to me. Failing so the success means the same result is much better.

We could add a primop to ask if an experimental feature is enabled (impure mode only until eval cache keys improve), and that will allow ergonomic falling back at a better layer of abstraction.

What is hard about Hydra? If one is running ones own hydra, one can just change the Nix configuration.

@7c6f434c
Copy link
Member

7c6f434c commented May 11, 2023 via email

@Ericson2314
Copy link
Member

Then s/build/realization process/ or whatever. "Derivation -> set of store objects" is the mapping I want to not depend on which extensions are enabled (modulo failing to produce a key-value pair at all).

@roberth
Copy link
Member

roberth commented May 12, 2023

Another idea is to add a flag to the Nixpkgs config so that users who have enabled the experimental feature can opt in explicitly.
That's a bit more machinery than I would like, but it'd work. Something like nixpkgs.config.experimentalFeatures.unsafeDiscardReferences = true; on NixOS?

So to summarize, it seems that we have to give up on one of

  • reproducible: references will be the same, as long as Nix isn't old
  • easy to collaborate with non-experimental users
  • easy to enable: only in nix.conf and not in Nixpkgs

The last one seems least bad to sacrifice.

@Ericson2314
Copy link
Member

Yeah I agree experimental features should probably not be changing behavior, more like monotonically increasing behaviors. e.g. when we add something like __contentAddressed it is never ignored: we technically break anything that used it before to instead fail if the experimental feature is not enabled.

@Ericson2314
Copy link
Member

Ericson2314 commented May 12, 2023

Also on a tangent, instead of periodically stealing __blah attrs, we could have some sort of proactive stealing of an entire namespace (sub attribute set with structured attrs? prefix without structured attrs?) and Nix could simply refuse to build a derivaiton with a so-designated magic attribute it doesn't know about. This gives us better forward-compat.

@7c6f434c
Copy link
Member

Then s/build/realization process/ or whatever. "Derivation -> set of store objects" is the mapping I want to not depend on which extensions are enabled (modulo failing to produce a key-value pair at all).

Well, the set of objects in the store here is the same. If the argument for the feature is that the references ignored don't matter, well, then they shouldn't matter in either direction. (And dropping some references when pushing to a cache is already pretty easy).

@thufschmitt
Copy link
Member

I think it makes sense to just stabilize this per https://nixos.org/manual/nix/stable/contributing/experimental-features.html#lifecycle-of-an-experimental-feature. @NixOS/nix-team, any thoughts on that?

That obviously wouldn't solve all the problems (since everything but the latest Nix versions would still fail on it), so the idea of a nixpkgs config option like @roberth mentioned seems very good. That's fairly similar to what CA derivations have (there's a nixpkgs.config.contenAddressedByDefault flag that can be flipped to start using them). That would give us a nice migration path until the required Nix version to evaluate nixpkgs gets recent-enough to have this working and stable.

@thufschmitt thufschmitt mentioned this pull request May 12, 2023
8 tasks
thufschmitt pushed a commit to tweag/nix that referenced this pull request May 12, 2023
It has been there for a few releases now (landed in 2.14.0), doesn't
seem to cause any major issue and is wanted in a few places
(NixOS#7087 (comment)).
@thufschmitt
Copy link
Member

I think it makes sense to just stabilize this

#8322

thufschmitt pushed a commit to tweag/nix that referenced this pull request Aug 7, 2023
It has been there for a few releases now (landed in 2.14.0), doesn't
seem to cause any major issue and is wanted in a few places
(NixOS#7087 (comment)).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature Feature request or proposal
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Override or remove the runtime references of a path
9 participants