Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

allowSubstitutes is an anti-feature #4442

Closed
domenkozar opened this issue Jan 11, 2021 · 27 comments
Closed

allowSubstitutes is an anti-feature #4442

domenkozar opened this issue Jan 11, 2021 · 27 comments

Comments

@domenkozar
Copy link
Member

domenkozar commented Jan 11, 2021

I can't describe how many hours I and people I have helped have lost to save a few miliseconds of machine time.

For example preferLocalBuild ignores builders and --max-jobs per #3810 giving the user the impression that flags are being ignored.

Having such exceptions completely change the way Nix works makes it also really hard to debug binary cache issues - I've gotten lots of reports of why is binary cache not substituting.

Refs #3686

@grahamc
Copy link
Member

grahamc commented Jan 11, 2021

I think I would rather keep preferLocalBuild and have it respect --max-jobs instead. I don't relish the idea of uploading 10's of GB of data to a remote server over DSL to write a short text file, just to download it again.

@Ninlives
Copy link
Contributor

the preferLocalBuild overwrites the --max-jobs may be an anti-feature, but allowSubstitutes/preferLocalBuild themselves are useful. I do not want to waste time on querying binary cache when I just want to write a text file.

@Ma27
Copy link
Member

Ma27 commented Jan 11, 2021

The problem I have with removing preferLocalBuild is that I regularly use it to configure on a per-derivation level whether or not I want remote builds. As @grahamc stated, this is helpful to avoid uploading tons of data to a remote server just to build e.g. a proprietary package where only an unpack and a patchelf happens. This is particularly helpful for me since I'm regularly (not during the pandemic though) at places where I have rather bad internet.

@domenkozar
Copy link
Member Author

It's a great power feature that hits Nix beginners really had and given that it's used in nixpkgs there no opt-out.

Of course all expirienced Nix users will say they have no problem with it, but it makes behavior with substituters and remove builders hard to debug and reason about.

@bgamari
Copy link
Contributor

bgamari commented Jan 11, 2021

I use preferLocalBuild quite heavily as it can mean the difference between waiting a few seconds and a few minutes in the case of derivations with large inputs and outputs.

@domenkozar
Copy link
Member Author

I often (need to start tracking how often) get Cachix support request why is something not being substituted and I have templated an answer, but the response from Nix beginners is usually: wow

This probably can be fixed with better logging/messaging and fixing all the bugs, but I feel like that's one of those things that will never materialize.

@Ma27
Copy link
Member

Ma27 commented Jan 11, 2021

Maybe it would help to make this clear in at least the new progress bar that's currently under development: #4296

@domenkozar
Copy link
Member Author

domenkozar commented Jan 11, 2021

I understand that preferLocalBuild can save minutes, but allowSubstitutes saving is really marginal: as those paths still need to be uploaded, so really you only save the substitution.

For preferLocalBuild there should at least be good -v story to explain why something was built where it was.

@roberth
Copy link
Member

roberth commented Jan 11, 2021

allowSubstitutes can cause the download of build tools for small store paths that have already been built and uploaded. It's really not that clever, last time I checked. Mostly relevant for C/D agents and other cases where only runtime closures are necessary.

@rickynils
Copy link
Member

From the discussion here, it seems there is a majority in favor of keeping preferLocalBuild as it works today, but do something about the max-jobs=0 situation. I'm also of the opinion that text files and similar should be built locally by default, but that it should be possible to not build them locally. The "prefer" word seems to me imply that this was intended from the start, and that the max-jobs=0 behavior is just a bug. I've opened a PR resolving that bug.

@edolstra edolstra added improvement and removed bug labels Jan 13, 2021
@bhipple
Copy link
Contributor

bhipple commented Jun 23, 2021

preferLocalBuild = true I can sorta understand, but is there any real use case for allowSubstitutes = false? I'm running into issues with CI and #4364, and while I see there are workarounds like https://github.com/Mic92/nix-build-uncached/#packages-with-allowsubstitutes--false I'm wondering if I should just patch nix itself to always return allowSubstitutes = true?

@ajs124
Copy link
Member

ajs124 commented Jun 23, 2021

is there any real use case for allowSubstitutes = false

NixOS/nixpkgs#123943

@domenkozar
Copy link
Member Author

That seems like a bug to me rather than a use case for allowSubstitutes.

@colemickens
Copy link
Member

colemickens commented Oct 9, 2021

As described in #5358 this lends itself to some very frustrating user experiences. It is very disappointing to me that nix-store -r --dry-run and nix-build --dry-run can basically give a completely different answer, all owing to this "feature".

I can't even really begin to even account for how much time I wasted on this, I've invested hours here and there over months to try to figure out why the ideal (per-my-understanding-before-now) CI flow for Nix kept rebuilding unnecessarily.

@bgamari
Copy link
Contributor

bgamari commented Oct 11, 2021

I think it is important that we split the question here: I personally see very good reasons for having preferLocalBuild; I would be strongly opposed to removing it without providing some other way to avoid extraneous network I/O for "trivial" derivations. On the other hand, I couldn't care less about allowSubstitutes. In my opinion, this ticket should be restricted to discussing the latter.

@domenkozar domenkozar changed the title allowSubstitutes/preferLocalBuild are an anti-feature allowSubstitutes is an anti-feature Oct 13, 2021
@rickynils
Copy link
Member

NixOS' top-level system derivation has allowSubstitutes = false, which means building NixOS configs can take quite a bit longer than really necessary, if you for example run the build from a CI machine with an empty nix store (like GitHub Actions). Then Nix will try to build all dependencies of the system individually instead of just fetching the top level derivation with its dependency closure from a binary cache.

Some interesting history:
allowSubstitutes = false was implied if preferLocalBuild = true, but this was changed by @edolstra in b64988b. In that commit, Eelco states that allowSubstitutes = false is kind of hurtful for performance.

I think we have some different options for improving this situation:

  • Nuke allowSubstitutes and simply ignore the fact that under some specific circumstances it could be faster to build a derivation than to fetch it (though you still have the ability to control this with preferLocalBuild)
  • Make it possible override allowSubstitutes with a Nix option (something like always-substitute). That way you can force substitution when it makes sense for you (like in CI etc).
  • Nuke allowSubstitutes and instead race building and fetching against each other. This could either be done for all derivations or just for the ones that has allowSubstitutes = false today.

@roberth
Copy link
Member

roberth commented Nov 10, 2021

@rickynils perhaps another option:

  • Ignore allowSubstitutes and honor preferLocalBuild only when all dependencies* are already present. This is
    • quick on user systems and build agents with a persistent store
    • but also efficient for pre-built server configs, ephemeral build environments
    • and it does not put undue load on binary caches (from races with a predictable winner)

*: all dependencies, perhaps excluding dependencies that have preferLocalBuild = true themselves so as to allow not just single derivations but subgraphs of preferLocalBuild derivations to behave as suggested.

I'd have to read up on the updated 2.4 realisation logic tbh.

@Kha
Copy link
Contributor

Kha commented Nov 10, 2021

Has there been any proposal for finer-grained (per-derivation/flake) substituter lists? It seems like allowSubstitutes = false is often used to mean "well it's certainly not in cache.nixos.org". Now that substituters can be configured in flake.nix, it would be an interesting optimization to apply only them and not cache.nixos.org etc. to derivations of that flake. Though "derivations of that flake" is not a concept that exists right now afaik and could be subtle regarding overrides etc...

@michaelpj
Copy link

Nuke allowSubstitutes and instead race building and fetching against each other.

Interesting. I suggested this a long time ago in #3019 as a way to make adding substituters guaranteed to never slow things down, I think you're right that it would also make allowSubstitutes pointless.

rycee added a commit to nix-community/home-manager that referenced this issue Nov 12, 2021
rycee added a commit to nix-community/home-manager that referenced this issue Nov 12, 2021
peterhoeg pushed a commit to peterhoeg/home-manager that referenced this issue Nov 29, 2021
michaelpj added a commit to input-output-hk/haskell.nix that referenced this issue Mar 2, 2022
`runCommandLocal` sets both `preferLocalBuild=true` and
`allowSubstitutes=false`. There's an
[argument](NixOS/nix#4442) that the latter is a
misfeature. There's really no reason not to download something from a
substitutor if it's actually been built already.

We can get the effect that we want easily enough by just setting
`preferLocalBuild=true` alone.
michaelpj added a commit to input-output-hk/haskell.nix that referenced this issue Mar 2, 2022
`runCommandLocal` sets both `preferLocalBuild=true` and
`allowSubstitutes=false`. There's an
[argument](NixOS/nix#4442) that the latter is a
misfeature. There's really no reason not to download something from a
substitutor if it's actually been built already.

We can get the effect that we want easily enough by just setting
`preferLocalBuild=true` alone.
hamishmack pushed a commit to input-output-hk/haskell.nix that referenced this issue Mar 3, 2022
`runCommandLocal` sets both `preferLocalBuild=true` and
`allowSubstitutes=false`. There's an
[argument](NixOS/nix#4442) that the latter is a
misfeature. There's really no reason not to download something from a
substitutor if it's actually been built already.

We can get the effect that we want easily enough by just setting
`preferLocalBuild=true` alone.
@stale stale bot added the stale label Aug 13, 2022
@Gerschtli
Copy link

Any news here? This issue still is relevant.

@nrdxp
Copy link

nrdxp commented Nov 2, 2022

some derivations with preferLocalBuild may have heavy dependencies that should still be substituted (a NixOS system derivation is one good example). I currently build my own system in CI using a remote store build (--store) and then upload the result to my cache using nix copy --from $builder --to $cache.

What I would prefer is if nix copy just had a flag to ignore any derivations in the chain of dependencies that have preferLocalBuild set. That way I can still make sure all the heavy dependencies are built for my system, ensuring they are cached, while not wasting time and cache space on those derivations that are gonna be built locally anyway.

mupdt pushed a commit to mupdt/nix that referenced this issue Mar 1, 2023
This resolves the problems detailed here:
NixOS#4442
The `allowSubstitute=false` "feature" wreaks havoc on CI / AWS / container
type builds, because we *cannot* cache hit on AWS pkgs, and further features
like `nix-build-if-changed` do not work without really ugly wrapper hacks:
NixOS#4364
Seen in the wild, in addition to the above, we've hit two problems at PDT specifically:
1. User buildEnv results can't be substituted out of S3. This means that in
   pdtnix, we can't push them to S3 from CA and have hosts like mtsync pull
   them. This then results in mtsync failing, because it tries to build, but it
   doesn't have DTS-8 installed.
2. On AWS, there are >10,000 pkgs in texlive that say
   allowSubstitutes=false. These are all in S3, but if you cold start a host it
   will rebuild them all!

Moreover, our buildPDTEnv calls are sometimes very expensive (20,000+ symlinks
in turnover-tool style bundles!), so the whole notion that this is faster to
build locally than substitute out of S3 is flawed. We also run pytest
suites, which involve starting tests on NFS, etc., and do a bunch of `find`
filtering to find tests.

Per comment upstream, this is a controversial change and unlikely to be
accepted upstream without more discussion (currently underway in the
community), so we'll keep it internal for now.
@lovesegfault
Copy link
Member

I have a PR in nixpkgs which allows users to configure allowSubstitutes to be ignored: NixOS/nixpkgs#221048

It's not a perfect solution, since it causes rebuilds, but it does help.

@domenkozar
Copy link
Member Author

It also breaks horribly if your local system doesn't match the one you're substituting, as Nix will try to build and fails because it can't.

@elaforge
Copy link

elaforge commented Oct 9, 2023

Just to pile on here, I spent quite a bit of time debugging due to allowSubstitutes=false, because it causes queryUnbuilts to basically lie. I had to go tracing down into the queryUnbuilts source to finally find the hardcoded thing in there, and before that while I knew (and heavily use) preferLocalBuilds I had no idea allowSubstitutes=false existed. I'm still not really sure why both are needed.

Now that I know it exists, it means I can't trust queryUnbuilts output, but have to do a two step: queryUnbuilts, then for things it claims are unbuilt, parse the drvs to their outs, and then wopQuerySubstitutablePaths on those outs... which seems to not use narinfo cache or local cache so is less efficient, though I'm not sure. I couldn't find any documentation for narinfo cache at all, let alone which ops use it. Anyway different story.

I think over the years this was also the source of a lot of mysterious rebuilds I never understood, that went away when we stopped using nixpkgs writeText and wrote our own.

@lovesegfault
Copy link
Member

Folks, I'm happy to report that #8047 has been merged!

In the next release, you'll be able to completely disable allowSubstitutes by setting always-allow-substitutes = true 🥳

@elaforge
Copy link

Nice, just slightly disappointed it wasn't named disallow-allow-substitutes-is-false = true for maximum confusion.

@fricklerhandwerk
Copy link
Contributor

Closed by #8047

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests