Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bintools: Add response file support to ld-wrapper #213831

Merged

Conversation

Gabriella439
Copy link
Contributor

@Gabriella439 Gabriella439 commented Jan 31, 2023

The motivation behind this is to alleviate the problem described in #41340. I'm not sure if this completely fixes the problem, but it eliminates one more area where we can exceed command line length limits.

This is essentially the same change as in #112449, except for ld-wrapper.sh instead of cc-wrapper.sh.

However, that change alone was not enough; on macOS the ld provided by darwin.cctools fails if you use process substitution to generate the response file, so I put up a PR to fix that:

tpoechtrager/cctools-port#131

… and I included a patch referencing that fix so that the new ld-wrapper still works on macOS.

I verified that the ld provided by .#bintools now correctly handles response files without a segmentation fault

Things done
  • Built on platform(s)
    • aarch64-darwin
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • Fits CONTRIBUTING.md.

@Gabriella439
Copy link
Contributor Author

For convenience, here is the relevant excerpt of the build failure from the eval:

error: builder for '/nix/store/a1qy9fhlggpwsz71kcc2pqy5b481cgx3-aws-c-common-0.8.9.drv' failed with exit code 2;
       last 10 log lines:
       > 423/423 Test #309: device_rand_u16_distribution .........................................   Passed    6.65 sec
       >
       > 99% tests passed, 1 tests failed out of 423
       >
       > Total Test time (real) =   8.33 sec
       >
       > The following tests FAILED:
       >   118 - bus_async_test_churn (SEGFAULT)
       > Errors while running CTest
       > make: *** [Makefile:71: test] Error 8
       For full logs, run 'nix log /nix/store/a1qy9fhlggpwsz71kcc2pqy5b481cgx3-aws-c-common-0.8.9.drv'.

… which appears to be a flaky test failure for an unrelated test that just happened to be downstream of this change: https://github.com/awslabs/aws-c-common/blob/8c51377ddff0aea9e21ecd9f58e9a4d283539751/tests/bus_test.c#L354-L436

Copy link
Contributor

@trofi trofi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Very nice fix! Survived a few thousand packages on x86_64-linux from my system as well.

@Gabriella439
Copy link
Contributor Author

Since this is deep in the dependency tree, do I need to select another base branch to merge into? I don't know exactly what is the policy for PRs like these

(I'm still doing some additional testing before merging this, too, on our internal codebase)

@trofi
Copy link
Contributor

trofi commented Feb 2, 2023

Ah, yes. It should be rebased against staging. Otherwise it should be OK.

trofi

This comment was marked as outdated.

@Gabriella439 Gabriella439 changed the base branch from master to staging February 2, 2023 19:34
@trofi
Copy link
Contributor

trofi commented Feb 2, 2023

Eval failed with:

The following tests FAILED:
        118 - bus_async_test_churn (SEGFAULT)
Errors while running CTest
make: *** [Makefile:71: test] Error 8
error: builder for '/nix/store/mgcshrniwk02rqv9mzr6vzqq4d9g6zsb-aws-c-common-0.8.9.drv' failed with exit code 2;

Not sure if by related.

@trofi
Copy link
Contributor

trofi commented Feb 2, 2023

@ofborg eval

@Gabriella439
Copy link
Contributor Author

Yeah, I think that test is flaky and just happens to be downstream of this change. It failed earlier on this PR, then passed, then failed again

The motivation behind this is to alleviate the problem
described in NixOS#41340.
I'm not sure if this completely fixes the problem, but it
eliminates one more area where we can exceed command line
length limits.

This is essentially the same change as in NixOS#112449,
except for `ld-wrapper.sh` instead of `cc-wrapper.sh`.

However, that change alone was not enough; on macOS the
`ld` provided by `darwin.cctools` fails if you use process
substitution to generate the response file, so I put up a
PR to fix that:

tpoechtrager/cctools-port#131

… and I included a patch referencing that fix so that the
new `ld-wrapper` still works on macOS.
Without this fix `ld` fails when response files are enabled and
the command line length exceeds 64KB.  For more details, see:

tpoechtrager/cctools-port#132
@Gabriella439
Copy link
Contributor Author

So I found the root cause of the failures, which I fixed in 1d9ab50. It turns out the failures were indicative of a real problem which is gone now.

I've also stress tested this on our internal Haskell codebase which has a large number of dependencies (both Haskell and non-Haskell dependencies) and is also itself a Haskell project with ~5000 modules (so, a lot to link in a single command, which is what motivated this fix in the first place). I verified that with this change and https://gitlab.haskell.org/ghc/ghc/-/merge_requests/9897 we no longer hit the ARG_MAX linker limit due to consistent use of response files across the stack.

There was one failure that ofborg reported (on x86_64-darwin) that seems to just be a timeout, but the other platforms built fine, so I'm inclined to merge once the upstream PR against cctools-port (tpoechtrager/cctools-port#132) is merged and then I'll adjust this PR to update the patch URL before merging.

The hash remains the same (I checked)
@Gabriella439 Gabriella439 merged commit 79484b1 into NixOS:staging Feb 24, 2023
@Gabriella439 Gabriella439 mentioned this pull request Feb 28, 2023
6 tasks
Gabriella439 added a commit to MercuryTechnologies/nixpkgs that referenced this pull request Feb 28, 2023
When testing NixOS#213831 internally we ran into a build failure caused by
that change for the `proj` package (on `x86_64-linux`).  We're not sure
how that's possible, but the nature of the build failure was that the
network tests for the `proj` package were failing due to missing
certificates, so I fixed the build failure by adding `cacert` as a
test dependency.

It's still not clear (A) why the cert suddenly became necessary after
the change in NixOS#213831 or (B) why the build worked at all before, but
this is probably the right thing to do regardless because the test
suite does have a network component.
Gabriella439 added a commit to MercuryTechnologies/nixpkgs that referenced this pull request Feb 28, 2023
When testing NixOS#213831 internally we ran into a build failure caused by
that change for the `proj` package (on `x86_64-linux`).  We're not sure
how that's possible, but the nature of the build failure was that the
network tests for the `proj` package were failing due to missing
certificates, so I fixed the build failure by adding `cacert` as a
test dependency.

It's still not clear (A) why the cert suddenly became necessary after
the change in NixOS#213831 or (B) why the build worked at all before, but
this is probably the right thing to do regardless because the test
suite does have a network component.
Gabriella439 added a commit that referenced this pull request Mar 1, 2023
When testing #213831 internally we ran into a build failure caused by
that change for the `proj` package (on `x86_64-linux`).  We're not sure
how that's possible, but the nature of the build failure was that the
network tests for the `proj` package were failing due to missing
certificates, so I fixed the build failure by adding `cacert` as a
test dependency.

It's still not clear (A) why the cert suddenly became necessary after
the change in #213831 or (B) why the build worked at all before, but
this is probably the right thing to do regardless because the test
suite does have a network component.
reckenrode added a commit to reckenrode/nixpkgs that referenced this pull request May 31, 2024
This changes ld-wrapper to use a temporary file for the response file
passed to ld instead of using process substitution.

ld64 does not handle long command-lines when reading from the response
file, which defeats the point of using a response file to handle long
command-lines. cctools-port was patched to work around this, but nixpkgs
is now using Apple’s source release directly instead of the port.

Since it’s preferable not to patch Apple’s release heavily (to reduce
the difficulty of updating to new versions and to match upstream’s
behavior), use the approach that was adopted in cc-wrapper to work
around issues with response files in newer versions of clang.

Related PRs (cctools-port):
- NixOS#213831
- tpoechtrager/cctools-port#132

Related PRs (cc-wrapper):
- NixOS#245282
- NixOS#258608
reckenrode added a commit to reckenrode/nixpkgs that referenced this pull request Jun 2, 2024
This changes ld-wrapper to use a temporary file for the response file
passed to ld instead of using process substitution.

ld64 does not handle long command-lines when reading from the response
file, which defeats the point of using a response file to handle long
command-lines. cctools-port was patched to work around this, but nixpkgs
is now using Apple’s source release directly instead of the port.

Since it’s preferable not to patch Apple’s release heavily (to reduce
the difficulty of updating to new versions and to match upstream’s
behavior), use the approach that was adopted in cc-wrapper to work
around issues with response files in newer versions of clang.

Related PRs (cctools-port):
- NixOS#213831
- tpoechtrager/cctools-port#132

Related PRs (cc-wrapper):
- NixOS#245282
- NixOS#258608
reckenrode added a commit to reckenrode/nixpkgs that referenced this pull request Jul 13, 2024
This changes ld-wrapper to use a temporary file for the response file
passed to ld instead of using process substitution.

ld64 does not handle long command-lines when reading from the response
file, which defeats the point of using a response file to handle long
command-lines. cctools-port was patched to work around this, but nixpkgs
is now using Apple’s source release directly instead of the port.

Since it’s preferable not to patch Apple’s release heavily (to reduce
the difficulty of updating to new versions and to match upstream’s
behavior), use the approach that was adopted in cc-wrapper to work
around issues with response files in newer versions of clang.

Related PRs (cctools-port):
- NixOS#213831
- tpoechtrager/cctools-port#132

Related PRs (cc-wrapper):
- NixOS#245282
- NixOS#258608
reckenrode added a commit to reckenrode/nixpkgs that referenced this pull request Jul 16, 2024
This changes ld-wrapper to use a temporary file for the response file
passed to ld instead of using process substitution.

ld64 does not handle long command-lines when reading from the response
file, which defeats the point of using a response file to handle long
command-lines. cctools-port was patched to work around this, but nixpkgs
is now using Apple’s source release directly instead of the port.

Since it’s preferable not to patch Apple’s release heavily (to reduce
the difficulty of updating to new versions and to match upstream’s
behavior), use the approach that was adopted in cc-wrapper to work
around issues with response files in newer versions of clang.

Related PRs (cctools-port):
- NixOS#213831
- tpoechtrager/cctools-port#132

Related PRs (cc-wrapper):
- NixOS#245282
- NixOS#258608
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants