
cannot run binary that dlopens shared objects from a {py,sh}_test #6700

Closed
crorvick opened this issue Nov 19, 2018 · 19 comments
Labels
P4 This is either out of scope or we don't have bandwidth to review a PR. (No assignee) team-Rules-CPP Issues for C++ rules type: bug

Comments

@crorvick

Description of the problem / feature request:

I am trying to run a cc_binary target as part of an sh_test. Passing the binary target as a data dependency causes Bazel to build it, but it does not create the runfiles hierarchy.

Feature requests: what underlying problem are you trying to solve with this feature?

The binary I am trying to run depends on the runfiles hierarchy existing so that its dependencies are met. Without it, my test fails unless I explicitly build the cc_binary target first.

Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

chris@remi ~/code $ git clone https://github.com/crorvick/bazel-hello.git
Cloning into 'bazel-hello'...
remote: Enumerating objects: 11, done.
remote: Counting objects: 100% (11/11), done.
remote: Compressing objects: 100% (10/10), done.
remote: Total 22 (delta 0), reused 9 (delta 0), pack-reused 11
Unpacking objects: 100% (22/22), done.

chris@remi ~/code $ cd bazel-hello
chris@remi ~/code/bazel-hello (master=) $ bazel run //test:hello_test
Starting local Bazel server and connecting to it...
INFO: Analysed target //test:hello_test (20 packages loaded).
INFO: Found 1 target...
Target //test:hello_test up-to-date:
  bazel-bin/test/hello_test
INFO: Elapsed time: 4.226s, Critical Path: 0.60s
INFO: 5 processes: 5 linux-sandbox.
INFO: Build completed successfully, 13 total actions
INFO: Build completed successfully, 13 total actions
exec ${PAGER:-/usr/bin/less} "$0" || exit 1
Executing tests from //test:hello_test
-----------------------------------------------------------------------------
Running src/hello/hello
Hello, world!

chris@remi ~/code/bazel-hello (master %=) $ ls bazel-bin/src/hello
hello
hello-2.params
_objs

chris@remi ~/code/bazel-hello (master %=) $ bazel build //src/hello:hello
INFO: Analysed target //src/hello:hello (0 packages loaded).
INFO: Found 1 target...
Target //src/hello:hello up-to-date:
  bazel-bin/src/hello/hello
INFO: Elapsed time: 0.179s, Critical Path: 0.01s
INFO: 0 processes.
INFO: Build completed successfully, 3 total actions

chris@remi ~/code/bazel-hello (master %=) $ ls bazel-bin/src/hello
hello
hello-2.params
hello.runfiles
hello.runfiles_manifest
_objs

chris@remi ~/code/bazel-hello (master %=) $ 

What operating system are you running Bazel on?

Gentoo

What's the output of bazel info release?

release 0.18.0- (@non-git)

If bazel info release returns "development version" or "(@non-git)", tell us how you built Bazel.

emerge bazel

@crorvick
Author

Is anyone able to comment on whether this is expected behavior or not?

@benjaminp
Collaborator

Yes, the runfiles tree of the cc_binary will be merged into the sh_test's runfiles.

@crorvick
Author

@benjaminp, got it, thank you for the reply. Does that mean there is a problem with the construction of the test?

def not_really_a_test(name, binary):
    native.sh_test(
        name = name,
        srcs = [ "test.sh" ],
        args = [ "$(location %s)" % binary ],
        data = [ binary ],
    )

We are running the binary using the path passed in on the command line as determined using the $(location ...) mechanism. In the real scenario this binary is dlopen(3)'ing shared objects from the runfiles hierarchy, but this fails when the binary is used in the context of the test unless we explicitly build it first. Any suggestions on what we should be doing?

@benjaminp
Collaborator

You'll probably want to locate the binary with the bash runfiles library.

@crorvick
Author

@benjaminp, your suggestion sounded very promising and I was hoping for an easy fix, but unfortunately it does not seem to help. I mocked up a test script around the actual binary I am having problems with and ran into the same error. The script takes the path of a cc_binary (as expanded by $(location ...)), resolves it with rlocation, prints the result, and runs it:

#!/bin/bash

set -euo pipefail
# --- begin runfiles.bash initialization ---
if [[ ! -d "${RUNFILES_DIR:-/dev/null}" && ! -f "${RUNFILES_MANIFEST_FILE:-/dev/null}" ]]; then
  if [[ -f "$0.runfiles_manifest" ]]; then
    export RUNFILES_MANIFEST_FILE="$0.runfiles_manifest"
  elif [[ -f "$0.runfiles/MANIFEST" ]]; then
    export RUNFILES_MANIFEST_FILE="$0.runfiles/MANIFEST"
  elif [[ -f "$0.runfiles/bazel_tools/tools/bash/runfiles/runfiles.bash" ]]; then
    export RUNFILES_DIR="$0.runfiles"
  fi
fi
if [[ -f "${RUNFILES_DIR:-/dev/null}/bazel_tools/tools/bash/runfiles/runfiles.bash" ]]; then
  source "${RUNFILES_DIR}/bazel_tools/tools/bash/runfiles/runfiles.bash"
elif [[ -f "${RUNFILES_MANIFEST_FILE:-/dev/null}" ]]; then
  source "$(grep -m1 "^bazel_tools/tools/bash/runfiles/runfiles.bash " \
            "$RUNFILES_MANIFEST_FILE" | cut -d ' ' -f 2-)"
else
  echo >&2 "ERROR: cannot find @bazel_tools//tools/bash/runfiles:runfiles.bash"
  exit 1
fi

binary=$(rlocation "__main__/$1")

echo "CMD: $binary"
"$binary"

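For context on what the script above relies on: when only a manifest is available, `rlocation` maps a runfiles-relative path such as `__main__/source/test_binary/test_binary` to an absolute path by looking it up in the manifest, whose lines have the form `<runfiles path> <absolute path>` (the same format the `grep ... | cut` line in the initialization block depends on). The hypothetical `manifest_lookup` helper below is a simplified sketch of that lookup, not the actual runfiles.bash implementation; it ignores directory-based runfiles and path escaping.

```shell
#!/usr/bin/env bash
# Simplified sketch of a manifest-mode runfiles lookup.
# NOT the real runfiles.bash code: no directory mode, no escaping.
set -euo pipefail

# manifest_lookup MANIFEST RUNFILES_PATH
# Prints the absolute path mapped to RUNFILES_PATH, or returns 1 if absent.
manifest_lookup() {
  local manifest="$1" key="$2" line
  # Each manifest line is "<runfiles-relative path> <absolute path>".
  while IFS= read -r line; do
    if [[ "${line%% *}" == "$key" ]]; then
      printf '%s\n' "${line#* }"
      return 0
    fi
  done < "$manifest"
  return 1
}
```

Either way, `rlocation` only tells the test where the binary is; it does not change how the binary's own dynamic loader resolves `$ORIGIN`.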
I then added an sh_test. Normally this is set up with a macro, but I list it explicitly below:

sh_test(
    name = "my_test",
    srcs = ["mock_test.sh"],
    args = ["$(location %s)" % "//source/test_binary"],
    data = ["//source/test_binary"],
    deps = ["@bazel_tools//tools/bash/runfiles"],
)

Running the test runs into an error:

$ bazel run //tests:my_test
INFO: Analysed target //tests:my_test (1 packages loaded, 2 targets configured).
INFO: Found 1 target...
Target //tests:my_test up-to-date:
  bazel-bin/tests/my_test
INFO: Elapsed time: 1.502s, Critical Path: 0.00s, Remote (0.00% of the time): [queue: 0.00%, setup: 0.00%, process: 0.00%]
INFO: 0 processes.
INFO: Build completed successfully, 4 total actions
INFO: Build completed successfully, 4 total actions
exec ${PAGER:-/usr/bin/less} "$0" || exit 1
Executing tests from //tests:my_test
-----------------------------------------------------------------------------
CMD: /home/crorvick/.cache/bazel/_bazel_crorvick/467abba1d07a1b9bff494b98f194083f/execroot/__main__/bazel-out/k8-opt/bin/tests/my_test.runfiles/__main__/source/test_binary/test_binary
Intel MKL FATAL ERROR: Cannot load libmkl_core.so.

I do see libmkl_core.so in the test script's runfiles, but the test binary doesn't seem to be able to dlopen it from there. The binary uses an RPATH relative to $ORIGIN to find this shared object. If I explicitly build the //source/test_binary target, though, the test works again.

Any suggestions?

@crorvick
Author

crorvick commented Nov 22, 2018

@benjaminp @mhlopko, is the underlying problem perhaps that Bazel just doesn't robustly support dlopen(3)'ing shared objects? Looking at the RPATH of my binary, I see that shared objects that are linked at startup (third-party libraries included in the srcs attribute of a cc_library) are found via $ORIGIN/../../_solib_local/..., while those that are dlopen'd (included as data dependencies) are found via $ORIGIN/binary_name.runfiles/.... Bazel is clearly trying to make this work, since it inserts a path for the dlopen'd libraries into the RPATH, but this path does not resolve when the binary is run from another Bazel target.
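Expanding the two kinds of RPATH entries by hand may help show why only the `$ORIGIN/binary_name.runfiles/...` one breaks. The sketch below assumes glibc on Linux, where `$ORIGIN` for the main executable is taken from `/proc/self/exe` and therefore names the directory of the *symlink-resolved* binary. Under that assumption, launching the binary through the test's runfiles symlink still gives `$ORIGIN = bazel-bin/source/test_binary`, so `$ORIGIN/../../_solib_local/...` resolves, while `$ORIGIN/test_binary.runfiles/...` exists only once that tree has been created, which would be consistent with the test passing after building `//source/test_binary` directly. The helpers `origin_of`, `expand_origin` and `check_rpath` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: expand $ORIGIN in a binary's RPATH/RUNPATH entries and report which
# directories exist. ASSUMPTION: glibc on Linux, where $ORIGIN for the main
# executable is the directory of the symlink-resolved binary (/proc/self/exe).
set -euo pipefail

# origin_of BINARY: directory the loader is assumed to use for $ORIGIN.
origin_of() {
  dirname -- "$(readlink -f -- "$1")"
}

# expand_origin ORIGIN_DIR ENTRY: replace $ORIGIN and ${ORIGIN} in ENTRY.
expand_origin() {
  local origin="$1" entry="$2"
  local plain='$ORIGIN' braced='${ORIGIN}'
  entry=${entry//"$braced"/"$origin"}
  entry=${entry//"$plain"/"$origin"}
  printf '%s\n' "$entry"
}

# check_rpath BINARY: print every expanded RPATH/RUNPATH directory with a
# marker saying whether it exists on disk right now.
check_rpath() {
  local bin="$1" origin
  origin=$(origin_of "$bin")
  # readelf -d prints lines like: Library runpath: [$ORIGIN/../lib:/opt/lib]
  readelf -d "$bin" \
    | sed -n 's/.*Library r[a-z]*path: \[\(.*\)]$/\1/p' \
    | tr ':' '\n' \
    | while IFS= read -r entry; do
        [[ -n "$entry" ]] || continue
        dir=$(expand_origin "$origin" "$entry")
        if [[ -d "$dir" ]]; then
          echo "ok      $dir"
        else
          echo "MISSING $dir"
        fi
      done
}
```

Calling `check_rpath "$binary"` from the mock test script, before and after building `//source/test_binary` directly, should show whether the `.runfiles` entry is the one reported as missing.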

@crorvick
Copy link
Author

@mhlopko @ulfjack @philwo, can anyone comment as to whether this looks like a bug or if I am doing something wrong? Thanks!

@crorvick crorvick changed the title adding binary as data dependency of test does not create runfiles hierarchy cannot run binary that dlopens shared objects from a {py,sh}_test Nov 26, 2018
@hlopko
Member

hlopko commented Nov 27, 2018

It is absolutely a bug, and quite a long-standing one. @oquenchil @lberki, is this only about adding another rpath, or is there something more involved?

@hlopko hlopko added the team-Rules-CPP Issues for C++ rules label Nov 27, 2018
@ulfjack
Copy link
Contributor

ulfjack commented Nov 28, 2018

The path changes depending on the name of the binary that includes the dynamic library. I'm not sure whether adding an rpath will be sufficient.

@lberki
Copy link
Contributor

lberki commented Nov 28, 2018

/cc @laszlocsomor

Well... the problem is that the RPATH you choose matters no matter where you use the binary. So I'd rather we take a step back and figure out what RPATH and dynamic library loading should look like instead of applying a quick band-aid.

FWIW, the runfiles of the cc_binary are merged into the runfiles of the sh_binary, so they should be available there. Worst case, you can try running the binary with LD_LIBRARY_PATH=<something useful> from the shell test. It's not nice, but it doesn't require changes to Bazel.
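A minimal sketch of that workaround in the shell test, assuming the runfiles library has been initialized as in the mock test script earlier in the thread. `prepend_library_dir` is a hypothetical helper, and the MKL runfiles path in the comment is a placeholder for wherever the library actually lives in your workspace.

```shell
#!/usr/bin/env bash
# Sketch of the LD_LIBRARY_PATH workaround for an sh_test.
set -euo pipefail

# prepend_library_dir DIR: put DIR first in LD_LIBRARY_PATH. The ${...:+...}
# form avoids a trailing empty entry, which the loader would treat as the
# current directory.
prepend_library_dir() {
  export LD_LIBRARY_PATH="$1${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
}

# Usage in the test, after the runfiles.bash initialization. The library
# path is a HYPOTHETICAL placeholder; use the real runfiles path of the .so:
#   lib=$(rlocation "__main__/third_party/mkl/lib/libmkl_core.so")
#   prepend_library_dir "$(dirname "$lib")"
#   "$(rlocation "__main__/$1")"
```

Because the variable is exported before the binary starts, a later dlopen(3) of a bare library name would also search that directory.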

@benjaminp
Copy link
Collaborator

I think this is why you want your runtime dynamic linker to support $EXEC_ORIGIN.

@crorvick
Copy link
Author

@benjaminp, can you point me to something that elaborates on this? Is this a thing outside of Google?

@crorvick
Copy link
Author

crorvick commented Nov 28, 2018

@benjaminp, I found a thread on the glibc mailing list that explains $EXEC_ORIGIN. I hope I don't have to rely on support for this being merged in order to resolve my issue, as I expect it would be years before I would see it. It seems like this is more about working around implementation details of Bazel than something actually required for performing hermetic builds.

Note that Bazel seems to add <binary>.runfiles/path/to/libs/for/dlopening to the RPATH for supporting this use case but I do not see that path in my test's runfiles hierarchy, so I'm not even sure the idea of re-pointing $ORIGIN is a solution in itself.

@hlopko hlopko added P2 We'll consider working on this in future. (Assignee optional) and removed untriaged labels Jan 14, 2019
@hlopko hlopko self-assigned this Jan 14, 2019
@hlopko hlopko removed their assignment Dec 6, 2019
@crorvick
Copy link
Author

@hlopko, I see you unassigned the issue, but I'm curious whether any progress was made. I haven't been able to upgrade Bazel for a while due to difficulties with migrating away from my CROSSTOOL file.

@c-mita c-mita added P4 This is either out of scope or we don't have bandwidth to review a PR. (No assignee) and removed P2 We'll consider working on this in future. (Assignee optional) labels Nov 23, 2020
@github-actions

Thank you for contributing to the Bazel repository! This issue has been marked as stale since it has not had any activity in the last 2+ years. It will be closed in the next 14 days unless any other activity occurs or one of the following labels is added: "not stale", "awaiting-bazeler". Please reach out to the triage team (@bazelbuild/triage) if you think this issue is still relevant or you are interested in getting the issue resolved.

@github-actions github-actions bot added the stale Issues or PRs that are stale (no activity for 30 days) label Apr 12, 2023
@crorvick
Copy link
Author

Was the $EXEC_ORIGIN patch merged into glibc?

@github-actions github-actions bot removed the stale Issues or PRs that are stale (no activity for 30 days) label Apr 13, 2023
@novas0x2a

EXEC_ORIGIN was never merged, no.

@fmeum
Copy link
Collaborator

fmeum commented May 2, 2024

@crorvick If this is still an issue, could you update your reproducer to demonstrate the problem you are seeing when using the runfiles library? A few issues related to missing RPATH entries have been fixed since you last tried this.

@fmeum
Copy link
Collaborator

fmeum commented May 30, 2024

With 75e5d2f and cc_shared_library, I'm not aware of any cases that don't work. The commit includes a py_test that uses the runfiles library to dlopen a shared object that itself has shared library dependencies.

Please give this a try with Bazel 7.2.0rc2 and create a new issue if problems persist.

@fmeum fmeum closed this as completed May 30, 2024

10 participants