Skip to content
This repository has been archived by the owner on Aug 16, 2023. It is now read-only.

BinaryProvider 0.5.9 breaks packages #194

Closed
fingolfin opened this issue May 7, 2020 · 31 comments · Fixed by #195
Closed

BinaryProvider 0.5.9 breaks packages #194

fingolfin opened this issue May 7, 2020 · 31 comments · Fixed by #195

Comments

@fingolfin
Copy link
Member

As the title says. I see errors like this:

   Building Polymake → `~/.julia/packages/Polymake/wYSJS/deps/build.log`
┌ Error: Error building `Polymake`:
│ [ Info: MacOS(:x86_64, compiler_abi=CompilerABI(:gcc8, :cxx03))
│ [ Info: Downloading https://github.com/JuliaPackaging/Yggdrasil/releases/download/MPFR-v4.0.2-1/MPFR.v4.0.2.x86_64-apple-darwin14.tar.gz to /Users/mhorn/.julia/packages/Polymake/wYSJS/deps/usr/downloads/MPFR.v4.0.2.x86_64-apple-darwin14.tar.gz...
│ ERROR: LoadError: LoadError: LibraryProduct(nothing, ["libmpfr"], :libmpfr, "Prefix(/Users/mhorn/.julia/packages/Polymake/wYSJS/deps/usr)") is not satisfied, cannot generate deps.jl!
│ Stacktrace:
│  [1] error(::String) at ./error.jl:33
│  [2] write_deps_file(::String, ::Array{LibraryProduct,1}; verbose::Bool, isolate::Bool) at /Users/mhorn/.julia/packages/BinaryProvider/GeAtj/src/Products.jl:419
│  [3] top-level scope at /Users/mhorn/.julia/packages/Polymake/wYSJS/deps/build_MPFR.v4.0.2.jl:48
│  [4] include(::Module, ::String) at ./Base.jl:377
│  [5] include(::String) at ./loading.jl:1124
│  [6] top-level scope at /Users/mhorn/.julia/packages/Polymake/wYSJS/deps/build.jl:77
│  [7] eval at ./boot.jl:331 [inlined]
│  [8] evalfile(::String, ::Array{String,1}) at ./loading.jl:1120 (repeats 2 times)
│  [9] top-level scope at /Users/mhorn/.julia/packages/Polymake/wYSJS/deps/build.jl:96
│  [10] include(::String) at ./client.jl:439
│  [11] top-level scope at none:5
│ in expression starting at /Users/mhorn/.julia/packages/Polymake/wYSJS/deps/build_MPFR.v4.0.2.jl:48
│ in expression starting at /Users/mhorn/.julia/packages/Polymake/wYSJS/deps/build.jl:77

or this

   Building GAP → `~/Projekte/OSCAR/GAP.jl/deps/build.log`
[ Info: Downloading https://github.com/bicycle1885/ZlibBuilder/releases/download/v1.0.4/Zlib.v1.2.11.x86_64-apple-darwin14.tar.gz to /Users/mhorn/Projekte/OSCAR/GAP.jl/deps/usr/downloads/Zlib.v1.2.11.x86_64-apple-darwin14.tar.gz...
[00:52:19] ######################################################################## 100.0%##O#- #    [00:52:20] ######################################################################## 100.0%
ERROR: LoadError: LoadError: LibraryProduct(nothing, ["libz"], :libz, "Prefix(/Users/mhorn/Projekte/OSCAR/GAP.jl/deps/usr)") is not satisfied, cannot generate deps.jl!
Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] write_deps_file(::String, ::Array{LibraryProduct,1}; verbose::Bool, isolate::Bool) at /Users/mhorn/.julia/packages/BinaryProvider/GeAtj/src/Products.jl:419
 [3] top-level scope at /Users/mhorn/Projekte/OSCAR/GAP.jl/deps/build_Zlib.v1.2.11.jl:48
 [4] include(::Module, ::String) at ./Base.jl:377
 [5] include(::String) at /Users/mhorn/Projekte/OSCAR/GAP.jl/deps/build.jl:43
 [6] top-level scope at /Users/mhorn/Projekte/OSCAR/GAP.jl/deps/build.jl:43
 [7] eval at ./boot.jl:331 [inlined]
 [8] top-level scope at /Users/mhorn/Projekte/OSCAR/GAP.jl/deps/build.jl:43
 [9] include(::String) at ./client.jl:439
 [10] top-level scope at none:5
in expression starting at /Users/mhorn/Projekte/OSCAR/GAP.jl/deps/build_Zlib.v1.2.11.jl:48
in expression starting at /Users/mhorn/Projekte/OSCAR/GAP.jl/deps/build.jl:36

And on the Julia slack, I saw other people report similar issues.

CC @fieker

@fingolfin fingolfin changed the title BinaryProvide 0.5.9 breaks packages BinaryProvider 0.5.9 breaks packages May 7, 2020
@staticfloat
Copy link
Member

Can you set verbose = true and do a manual Pkg.build()? I see you have your own hand-crafted build.jl that reads in the other build.jl files, so you'll need to evaluate that inside of the modules you're creating here: https://github.com/oscar-system/GAP.jl/blob/master/deps/build.jl#L41-L43

@fingolfin
Copy link
Member Author

OK, I edited build_Zlib.v1.2.11.jl to hardcode verbose = true. Got this:

(@v1.4) pkg> build -v GAP
   Building GAP → `~/Projekte/OSCAR/GAP.jl/deps/build.log`
[ Info: Directory /Users/mhorn/Projekte/OSCAR/GAP.jl/deps/usr/lib does not exist!
[ Info: Downloading https://github.com/bicycle1885/ZlibBuilder/releases/download/v1.0.4/Zlib.v1.2.11.x86_64-apple-darwin14.tar.gz to /Users/mhorn/Projekte/OSCAR/GAP.jl/deps/usr/downloads/Zlib.v1.2.11.x86_64-apple-darwin14.tar.gz...
[01:47:22] ######################################################################## 100.0%##O=#  #   [01:47:22] ######################################################################## 100.0%
[ Info: No hash cache found
[ Info: Calculated hash a28fb652f94f1b1548197a3b3c9a71ec56fa50bb7d39454c5e12320335d30934 for file /Users/mhorn/Projekte/OSCAR/GAP.jl/deps/usr/downloads/Zlib.v1.2.11.x86_64-apple-darwin14.tar.gz
[ Info: Installing /Users/mhorn/Projekte/OSCAR/GAP.jl/deps/usr/downloads/Zlib.v1.2.11.x86_64-apple-darwin14.tar.gz into /Users/mhorn/Projekte/OSCAR/GAP.jl/deps/usr
[ Info: Found a valid dl path libz.1.2.11.dylib while looking for libz
[ Info: /Users/mhorn/Projekte/OSCAR/GAP.jl/deps/usr/lib/libz.1.2.11.dylib matches our search criteria of libz
[ Info: Found a valid dl path libz.1.dylib while looking for libz
[ Info: /Users/mhorn/Projekte/OSCAR/GAP.jl/deps/usr/lib/libz.1.dylib matches our search criteria of libz
[ Info: Found a valid dl path libz.dylib while looking for libz
[ Info: /Users/mhorn/Projekte/OSCAR/GAP.jl/deps/usr/lib/libz.dylib matches our search criteria of libz
[ Info: Could not locate libz inside /Users/mhorn/Projekte/OSCAR/GAP.jl/deps/usr/lib
ERROR: LoadError: LoadError: LibraryProduct(nothing, ["libz"], :libz, "Prefix(/Users/mhorn/Projekte/OSCAR/GAP.jl/deps/usr)") is not satisfied, cannot generate deps.jl!
Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] write_deps_file(::String, ::Array{LibraryProduct,1}; verbose::Bool, isolate::Bool) at /Users/mhorn/.julia/packages/BinaryProvider/GeAtj/src/Products.jl:419
 [3] top-level scope at /Users/mhorn/Projekte/OSCAR/GAP.jl/deps/build_Zlib.v1.2.11.jl:48
 [4] include(::String) at ./client.jl:439
 [5] top-level scope at /Users/mhorn/Projekte/OSCAR/GAP.jl/deps/build.jl:44
 [6] include(::String) at ./client.jl:439
 [7] top-level scope at none:5
in expression starting at /Users/mhorn/Projekte/OSCAR/GAP.jl/deps/build_Zlib.v1.2.11.jl:48
in expression starting at /Users/mhorn/Projekte/OSCAR/GAP.jl/deps/build.jl:36
┌ Error: Error building `GAP`:
└ @ Pkg.Operations /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.4/Pkg/src/Operations.jl:892

BTW, similar error with a package not by us:

(@v1.4) pkg> add Libtask
  Resolving package versions...
  Installed Libtask ─ v0.4.0
   Updating `~/.julia/environments/v1.4/Project.toml`
  [6f1fad26] + Libtask v0.4.0
   Updating `~/.julia/environments/v1.4/Manifest.toml`
  [6f1fad26] + Libtask v0.4.0
   Building Libtask → `~/.julia/packages/Libtask/Q1ugc/deps/build.log`
┌ Error: Error building `Libtask`:
│ [ Info: Downloading https://github.com/JuliaBinaryWrappers/Libtask_jll.jl/releases/download/Libtask-v0.3.1+0/Libtask.v0.3.1.x86_64-apple-darwin14.tar.gz to /Users/mhorn/.julia/packages/Libtask/Q1ugc/deps/usr/downloads/Libtask.v0.3.1.x86_64-apple-darwin14.tar.gz...
│ ERROR: LoadError: LibraryProduct(nothing, ["libtask_v1_3"], :libtask_v1_3, "Prefix(/Users/mhorn/.julia/packages/Libtask/Q1ugc/deps/usr)") is not satisfied, cannot generate deps.jl!
│ Stacktrace:
│  [1] error(::String) at ./error.jl:33
│  [2] write_deps_file(::String, ::Array{LibraryProduct,1}; verbose::Bool, isolate::Bool) at /Users/mhorn/.julia/packages/BinaryProvider/GeAtj/src/Products.jl:419
│  [3] top-level scope at /Users/mhorn/.julia/packages/Libtask/Q1ugc/deps/build.jl:58
│  [4] include(::String) at ./client.jl:439
│  [5] top-level scope at none:5
│ in expression starting at /Users/mhorn/.julia/packages/Libtask/Q1ugc/deps/build.jl:58

@staticfloat
Copy link
Member

What happens if you try to dlopen("/Users/mhorn/Projekte/OSCAR/GAP.jl/deps/usr/lib/libz.dylib")?

@fingolfin
Copy link
Member Author

julia> using Libdl

julia> dlopen("/Users/mhorn/Projekte/OSCAR/GAP.jl/deps/usr/lib/libz.dylib")
Ptr{Nothing} @0x00007fcb71cdf830

@fingolfin
Copy link
Member Author

Here is another one I just run into, with yet another package:

   Building FFMPEG → `~/.julia/packages/FFMPEG/guN1x/deps/build.log`
┌ Error: Error building `FFMPEG`:
│ [ Info: Downloading https://github.com/JuliaBinaryWrappers/Bzip2_jll.jl/releases/download/Bzip2-v1.0.6+1/Bzip2.v1.0.6.x86_64-apple-darwin14.tar.gz to /Users/mhorn/.julia/packages/FFMPEG/guN1x/deps/usr/downloads/Bzip2.v1.0.6.x86_64-apple-darwin14.tar.gz...
│ ERROR: LoadError: LoadError: LibraryProduct(nothing, ["libbz2"], :libbzip2, "Prefix(/Users/mhorn/.julia/packages/FFMPEG/guN1x/deps/usr)") is not satisfied, cannot generate deps.jl!
│ Stacktrace:
│  [1] error(::String) at ./error.jl:33
│  [2] write_deps_file(::String, ::Array{LibraryProduct,1}; verbose::Bool, isolate::Bool) at /Users/mhorn/.julia/packages/BinaryProvider/GeAtj/src/Products.jl:419
│  [3] top-level scope at /Users/mhorn/.julia/packages/FFMPEG/guN1x/deps/build_Bzip2.v1.0.6.jl:48
│  [4] include(::Module, ::String) at ./Base.jl:377
│  [5] include(::String) at /Users/mhorn/.julia/packages/FFMPEG/guN1x/deps/build.jl:44
│  [6] top-level scope at /Users/mhorn/.julia/packages/FFMPEG/guN1x/deps/build.jl:45
│  [7] include(::String) at ./client.jl:439
│  [8] top-level scope at none:5
│ in expression starting at /Users/mhorn/.julia/packages/FFMPEG/guN1x/deps/build_Bzip2.v1.0.6.jl:48
│ in expression starting at /Users/mhorn/.julia/packages/FFMPEG/guN1x/deps/build.jl:38
[09:42:33] ######################################################################## 100.0%##O#- #    [09:42:34] ######################################################################## 100.0%
└ @ Pkg.Operations /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.4/Pkg/src/Operations.jl:892

@fingolfin
Copy link
Member Author

It seems this issue only happens for me on macOS; I just tried it on a Linux server, and everything (?) seems fine so far

@fingolfin
Copy link
Member Author

BTW, is it on purpose that the master branch is stuck at v0.5.5? The only branch containing 0.5.9 seems to be rollback, is that the official "development" branch then?!

@MagnusKoudahl
Copy link

Chiming in to report similar issues with Libtask on Arch Linux. Manually running the build file, ie julia build.jl allowed the package to build as normal. FFMPEG works out of the box

@giordano
Copy link
Member

giordano commented May 8, 2020

BTW, is it on purpose that the master branch is stuck at v0.5.5? The only branch containing 0.5.9 seems to be rollback, is that the official "development" branch then?!

History is a bit convoluted, there are two branches: master has the yet-to-be-released v0.6.0 which will require Julia v1.3, while rollback has the v0.5 series which supports also older versions of Julia.

Chiming in to report similar issues with Libtask on Arch Linux. Manually running the build file, ie julia build.jl allowed the package to build as normal. FFMPEG works out of the box

I also use Arch Linux, but I can't reproduce any of the issues reported. For example, Libtask:

(tmp) pkg> add Libtask
   Updating registry at `~/.julia/registries/General`
   Updating git-repo `https://github.com/JuliaRegistries/General.git`
  Resolving package versions...
  Installed Libtask ─ v0.4.0
   Updating `/tmp/Project.toml`
  [6f1fad26] + Libtask v0.4.0
   Updating `/tmp/Manifest.toml`
  [b99e7846] + BinaryProvider v0.5.9
  [6f1fad26] + Libtask v0.4.0
  [2a0f44e3] + Base64 
  [ade2ca70] + Dates 
  [b77e0a4c] + InteractiveUtils 
  [76f85450] + LibGit2 
  [8f399da3] + Libdl 
  [56ddb016] + Logging 
  [d6f4376e] + Markdown 
  [44cfe95a] + Pkg 
  [de0858da] + Printf 
  [3fa0cd96] + REPL 
  [9a3f8284] + Random 
  [ea8e919c] + SHA 
  [9e88b42a] + Serialization 
  [6462fe0b] + Sockets 
  [cf7118a7] + UUIDs 
  [4ec0a83e] + Unicode 
   Building Libtask → `~/.julia/packages/Libtask/Q1ugc/deps/build.log`

(tmp) pkg> build Libtask
   Building Libtask → `~/.julia/packages/Libtask/Q1ugc/deps/build.log`

(tmp) pkg>

@yufongpeng
Copy link

(@Strings) pkg> build
   Building PCRE2 → `C:\Users\sciph\.julia\packages\PCRE2\fIxtF\deps\build.log`
┌ Error: Error building `PCRE2`: 
│ ERROR: LoadError: LibraryProduct(nothing, ["libpcre2-8"], :libpcre2_8, "Prefix(C:\\Users\\sciph\\.julia\\packages\\PCRE2\\fIxtF\\deps\\usr)") is not satisfied, cannot generate deps.jl!
│ Stacktrace:
│  [1] error(::String) at .\error.jl:33
│  [2] write_deps_file(::String, ::Array{LibraryProduct,1}; verbose::Bool, isolate::Bool) at C:\Users\sciph\.julia\packages\BinaryProvider\GeAtj\src\Products.jl:419
│  [3] write_deps_file(::String, ::Array{LibraryProduct,1}) at C:\Users\sciph\.julia\packages\BinaryProvider\GeAtj\src\Products.jl:400
│  [4] top-level scope at C:\Users\sciph\.julia\packages\PCRE2\fIxtF\deps\build.jl:43
│  [5] include(::String) at .\client.jl:439
│  [6] top-level scope at none:5
│ in expression starting at C:\Users\sciph\.julia\packages\PCRE2\fIxtF\deps\build.jl:43
└ @ Pkg.Operations D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.4\Pkg\src\Operations.jl:892

I encounter the same issue, with Window10, and julia v1.4.1.

@fingolfin
Copy link
Member Author

I've now also experienced this on a Gentoo machine:

(v1.4) pkg> add GAP
  Resolving package versions...
  Installed GAP ─ v0.4.0
   Updating `/var/autofs/nfs/cip/alggeom/horn/.julia/environments/v1.4/Project.toml`
  [c863536a] + GAP v0.4.0
   Updating `/var/autofs/nfs/cip/alggeom/horn/.julia/environments/v1.4/Manifest.toml`
  [c863536a] + GAP v0.4.0
   Building GAP → `~/.julia/packages/GAP/vvuPQ/deps/build.log`
┌ Error: Error building `GAP`:
│ [ Info: Downloading https://github.com/bicycle1885/ZlibBuilder/releases/download/v1.0.4/Zlib.v1.2.11.x86_64-linux-gnu.tar.gz to /users/cip/alggeom/horn/.julia/packages/GAP/vvuPQ/deps/usr/downloads/Zlib.v1.2.11.x86_64-linux-gnu.tar.gz...
│ ERROR: LoadError: LoadError: LibraryProduct(nothing, ["libz"], :libz, "Prefix(/users/cip/alggeom/horn/.julia/packages/GAP/vvuPQ/deps/usr)") is not satisfied, cannot generate deps.jl!
│ Stacktrace:
│  [1] error(::String) at ./error.jl:33
│  [2] write_deps_file(::String, ::Array{LibraryProduct,1}; verbose::Bool, isolate::Bool) at /users/cip/alggeom/horn/.julia/packages/BinaryProvider/GeAtj/src/Products.jl:419
│  [3] top-level scope at /users/cip/alggeom/horn/.julia/packages/GAP/vvuPQ/deps/build_Zlib.v1.2.11.jl:48
│  [4] include(::Module, ::String) at ./Base.jl:377
│  [5] include(::String) at /users/cip/alggeom/horn/.julia/packages/GAP/vvuPQ/deps/build.jl:43
│  [6] top-level scope at /users/cip/alggeom/horn/.julia/packages/GAP/vvuPQ/deps/build.jl:43
│  [7] eval at ./boot.jl:331 [inlined]
│  [8] top-level scope at /users/cip/alggeom/horn/.julia/packages/GAP/vvuPQ/deps/build.jl:43
│  [9] include(::String) at ./client.jl:439
│  [10] top-level scope at none:5
│ in expression starting at /users/cip/alggeom/horn/.julia/packages/GAP/vvuPQ/deps/build_Zlib.v1.2.11.jl:48
│ in expression starting at /users/cip/alggeom/horn/.julia/packages/GAP/vvuPQ/deps/build.jl:36

The same (add GAP) worked fine on another machine (running under ArchLinux, not that I think it necessarily matters).

@giordano
Copy link
Member

Can someone please try and debug the issue on the platforms where they can reproduce it? It's a bit hard for me to understand what's going wrong without being able to experience it.

@fingolfin
Copy link
Member Author

I'd love to but I have no idea where to start? I guess the "satisfied" function needs to be scrutinized? I'll try tonight, but some guidance would be appreciated

@giordano
Copy link
Member

This should be the main difference between v0.5.8 and v0.5.9: e6710a7. For the rest I backported some changes there were on master but not on rollback. This is the full diff: v0.5.8...v0.5.9

@yufongpeng
Copy link

I think the problem is isolated dlopen in Products.jl:

function write_deps_file(depsjl_path::AbstractString, products::Vector{P};
                         verbose::Bool=false) where {P <: Product}	                         verbose::Bool=false, isolate::Bool=true) where {P <: Product}
.
.
.
# Begin by ensuring that we can satisfy every product RIGHT NOW
    for p in products
        if !satisfied(p; verbose=verbose, isolate=isolate)
            error("$p is not satisfied, cannot generate deps.jl!")
        end
    end
.
.
.

If I turn isolate into false, it works.
I don't really know what "isolated dlopen" is, though.

@giordano
Copy link
Member

I think the problem is isolated dlopen in Products.jl:

This is interesting, since e6710a7 was meant to fix another bug, observed in jump-dev/CSDP.jl#56.

I don't really know what "isolated dlopen" is, though.

It runs dlopen in a new Julia session.

@fingolfin
Copy link
Member Author

So, interestingly, I can't reproduce the issue if I do it "interactively". Here is a verbatim copy of something I just tried; note that I first get an error when satisfied for libz is called as part of the build process of GAP.jl; but it works when I call it manually:

$ julia
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.4.1 (2020-04-14)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

(v1.4) pkg> build -v GAP
   Building GAP → `~/.julia/packages/GAP/vvuPQ/deps/build.log`
ERROR: LoadError: LoadError: LibraryProduct(nothing, ["libz"], :libz, "Prefix(/users/cip/alggeom/horn/.julia/packages/GAP/vvuPQ/deps/usr)") is not satisfied, cannot generate deps.jl!
Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] write_deps_file(::String, ::Array{LibraryProduct,1}; verbose::Bool, isolate::Bool) at /users/cip/alggeom/horn/.julia/packages/BinaryProvider/GeAtj/src/Products.jl:419
 [3] top-level scope at /users/cip/alggeom/horn/.julia/packages/GAP/vvuPQ/deps/build_Zlib.v1.2.11.jl:48
 [4] include(::Module, ::String) at ./Base.jl:377
 [5] include(::String) at /users/cip/alggeom/horn/.julia/packages/GAP/vvuPQ/deps/build.jl:43
 [6] top-level scope at /users/cip/alggeom/horn/.julia/packages/GAP/vvuPQ/deps/build.jl:43
 [7] eval at ./boot.jl:331 [inlined]
 [8] top-level scope at /users/cip/alggeom/horn/.julia/packages/GAP/vvuPQ/deps/build.jl:43
 [9] include(::String) at ./client.jl:439
 [10] top-level scope at none:5
in expression starting at /users/cip/alggeom/horn/.julia/packages/GAP/vvuPQ/deps/build_Zlib.v1.2.11.jl:48
in expression starting at /users/cip/alggeom/horn/.julia/packages/GAP/vvuPQ/deps/build.jl:36
┌ Error: Error building `GAP`:
└ @ Pkg.Operations /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/Pkg/src/Operations.jl:892

julia> using BinaryProvider ; prefix = abspath(joinpath(dirname(Base.find_package("GAP")), "..", "deps", "usr"))
"/users/cip/alggeom/horn/.julia/packages/GAP/vvuPQ/deps/usr"

julia> p = LibraryProduct(Prefix(prefix), ["libz"], :libz)
LibraryProduct(nothing, ["libz"], :libz, "Prefix(/users/cip/alggeom/horn/.julia/packages/GAP/vvuPQ/deps/usr)")

julia> satisfied(p; verbose=true, isolate=true)
[ Info: Found a valid dl path libz.so while looking for libz
[ Info: /users/cip/alggeom/horn/.julia/packages/GAP/vvuPQ/deps/usr/lib/libz.so matches our search criteria of libz
true

@fingolfin
Copy link
Member Author

I've used the traditional debugging method of "insert lots of println to get a better idea", and am ever more baffled... Here is what the code inside isolate now looks in my hacked up version:

                        # Isolated dlopen is a lot slower, but safer
                        @show dl_path
                        @show dl_esc_path = replace(dl_path, "\\"=>"\\\\")
                        println("Base.julia_cmd() = ", Base.julia_cmd())
                        if success(`$(Base.julia_cmd()) -e "import Libdl; Libdl.dlopen(\"$(dl_esc_path)\")"`)
                            println("SUCCESS with ", dl_path)
                            return dl_path
                        else
                            println("FAILURE with ", dl_path)
                        end

And here is what I see:

(v1.4) pkg> build -v GAP
   Building GAP → `~/.julia/packages/GAP/vvuPQ/deps/build.log`
dl_path = "/users/cip/alggeom/horn/.julia/packages/GAP/vvuPQ/deps/usr/lib/libz.so"
dl_esc_path = replace(dl_path, "\\" => "\\\\") = "/users/cip/alggeom/horn/.julia/packages/GAP/vvuPQ/deps/usr/lib/libz.so"
Base.julia_cmd() = `/var/autofs/nfs/merkur.soft/julia-1.4.1/bin/julia -Cnative -J/var/autofs/nfs/merkur.soft/julia-1.4.1/lib/julia/sys.so -O0 -g1`
FAILURE with /users/cip/alggeom/horn/.julia/packages/GAP/vvuPQ/deps/usr/lib/libz.so
dl_path = "/users/cip/alggeom/horn/.julia/packages/GAP/vvuPQ/deps/usr/lib/libz.so.1"
dl_esc_path = replace(dl_path, "\\" => "\\\\") = "/users/cip/alggeom/horn/.julia/packages/GAP/vvuPQ/deps/usr/lib/libz.so.1"
Base.julia_cmd() = `/var/autofs/nfs/merkur.soft/julia-1.4.1/bin/julia -Cnative -J/var/autofs/nfs/merkur.soft/julia-1.4.1/lib/julia/sys.so -O0 -g1`
FAILURE with /users/cip/alggeom/horn/.julia/packages/GAP/vvuPQ/deps/usr/lib/libz.so.1
dl_path = "/users/cip/alggeom/horn/.julia/packages/GAP/vvuPQ/deps/usr/lib/libz.so.1.2.11"
dl_esc_path = replace(dl_path, "\\" => "\\\\") = "/users/cip/alggeom/horn/.julia/packages/GAP/vvuPQ/deps/usr/lib/libz.so.1.2.11"
Base.julia_cmd() = `/var/autofs/nfs/merkur.soft/julia-1.4.1/bin/julia -Cnative -J/var/autofs/nfs/merkur.soft/julia-1.4.1/lib/julia/sys.so -O0 -g1`
FAILURE with /users/cip/alggeom/horn/.julia/packages/GAP/vvuPQ/deps/usr/lib/libz.so.1.2.11
ERROR: LoadError: LoadError: LibraryProduct(nothing, ["libz"], :libz, "Prefix(/users/cip/alggeom/horn/.julia/packages/GAP/vvuPQ/deps/usr)") is not satisfied, cannot generate deps.jl!
Stacktrace:
 [1] error(::String) at ./error.jl:33

And yet, if I directly perform the "isolated" call to Libdl.dlopen, it works fine:

$ julia -e "import Libdl; Libdl.dlopen(\"/users/cip/alggeom/horn/.julia/packages/GAP/vvuPQ/deps/usr/lib/libz.so\")" ; echo $?
0

"But", one may complain, " this is not exactly what the code in BinaryProvider does, it uses Base.julia_cmd()!". OK, let's try that; I already printed it above for that reason:

$ /var/autofs/nfs/merkur.soft/julia-1.4.1/bin/julia -Cnative -J/var/autofs/nfs/merkur.soft/julia-1.4.1/lib/julia/sys.so -O0 -g1 -e "import Libdl; Libdl.dlopen(\"/users/cip/alggeom/horn/.julia/packages/GAP/vvuPQ/deps/usr/lib/libz.so\")" ; echo $?
0

So this still works. At this point the only thing I can think of is "what's the environment like". And so I start julia and do println(ENV); and I also modify the code above to also do that, and diff the result... and there is one difference: when running the code inside locate that I modified, one extra env var is set (and presumably passed down to the subprocess:

JULIA_LOAD_PATH=@:/tmp/jl_PeoWcW

I assume this is being set by sandbox() from Pkg.jl.

I can investigate further tomorrow but now need to get some sleep.

@fingolfin
Copy link
Member Author

@giordano any other ideas what I could/should try? This issue is really concerning to me; while it only affects a fraction of users, for those it's a showstopper. I've considered to just hardcode a dependency on BinaryProvider 0.5.8 in all our packages, but that's not nice either.

@giordano
Copy link
Member

If you have a system where you can reproduce the issue, can you play with the value of the isolate keyword to see if there is a combination that makes it work?

@giordano
Copy link
Member

BTW, I don't know if it's clear that I have zero experience with BinaryProvider, #192 has been my first and only contribution so far 🙂

@fingolfin
Copy link
Member Author

@giordano ah, sorry, I did in fact not realize it :-) Just had a look at https://github.com/JuliaPackaging/BinaryProvider.jl/graphs/contributors and I guess the person to bug about this is @staticfloat .

Sorry for being dogged about this, but I really think this is a serious issue that potentially affects many, many users (and then is basically impossible to debug for them); in my eyes, BinaryProvider is really a central package in the Julia ecosystem, and quite frankly the fact that it's been broken for some of us for >11 days now troubles me.

@fingolfin
Copy link
Member Author

And yeah, turning isolate off "fixes" it, but that doesn't explain the underlying problem (which seems to be related to the Pkg sandboxing code -- see also my extensive comments above).

I have a machine where I can reproduce this 100% and still am willing to run any kind of tests there, but I need some guidance, because e.g. understanding and then debugging the Pkg sandbox code looks like a LOT of work if done blindly.

@GregPlowman
Copy link

I have the same problem on Windows 10. Several packages fail to build using BinaryProvider v0.5.9.
Currently I have BinaryProvider pinned at v0.5.8 and all seems fine, with all packages building successfully.

I'm willing to help test on Windows 10, if someone can provide guidance.

@giordano
Copy link
Member

I'd step into write_deps_file and try to understand why having isolate = false on this line makes so much difference.

BTW, to fix the issue can't you just set isolate = false in your calls to write_deps_file?

@fingolfin
Copy link
Member Author

I am not sure people saw this, so let me repeat parts of my comments from above... First, some background: I have two Linux systems I test on a lot; on one the issue occurs, on the other it does not. They run two different Linux flavors, and I am guessing some minor difference in there triggers the bug. Now, I did a lot of testing on the two systems in parallel.. identical Julia 1.4.1 setups.

First observation: if I run satisfied(LibraryProduct(Prefix(prefix), ["libz"], :libz); verbose=true, isolate=true) interactively, it works on both machines. But if it gets run as part of the build.jl of a package, it only works on one of the two.

If I extract the shell command which gets executed, i.e., $(Base.julia_cmd()) -e "import Libdl; Libdl.dlopen(\"$(dl_esc_path)\")" and run that directly (to be clear: I am printing that command, and executing that), then again, it works fine on both machines!

So it's not the command that is being invoked "in isolation", but rather something else that effects it... in its "environment"... so let's look at environment variables:

I then went on and inserted print statements, including println(ENV), into the locate function of BinaryProvider (this is the function which eventually performs the relevant call to dlopen). As it turns out, everything I can think of is identical when running the script interactively vs. running it from build.jl, with one exception: the environment variable JULIA_LOAD_PATH=@:/tmp/jl_PeoWcW is only set when called from build.jl, not when doing this interactively. I believe this environment variable is set by Pkg.jl as parts of its sandboxing code. So perhaps this part of the ingredients for the difference?

Alas, I tried to "disable" JULIA_LOAD_PATH before running the command, but no luck so far.

At this point I can't think of anything else that could be a source of difference, or anything else to try. I am happy to perform more tests, but as I said before, I need suggestions.

And to be clear, the reason isolate = true matters is because dlopen fails in the Julia launched as a subprocess, but not in the active Julia process.

@benlorenz
Copy link

benlorenz commented May 20, 2020

If you can reproduce this under linux you could try running the subprocess under strace or with debugging from ld-linux by adding one of the following to the julia command in locate:

env LD_DEBUG=libs:files LD_DEBUG_OUTPUT=/tmp/ldlinux.log $(Base.julia_cmd()) ...
strace -o /tmp/dlopen-err.log $(Base.julia_cmd()) ...

Maybe this gives an idea why dlopen fails, at least for linux, I have no idea if something similar can be done on mac os.

I have tried on gentoo, manjaro and opensuse and so far cannot reproduce this.

@fingolfin
Copy link
Member Author

fingolfin commented May 22, 2020

Thanks @benlorenz that was very helpful! Both command told me that in the "bad" case (triggered via e.g. build -v GAP) there is not even an attempt to open libz.so. In the good case (= copy&pasting the cmd as printed by Julia in my hacked version of locate into a shell session), the dlopen is clearly visible. The ldlinux.log reveals no further info; but in the strace log gave the key insight: the problem (at least for me!) are startup files, which BinaryBuilder should ignore. I submitted PR #195 which resolves the issue for me. Perhaps @MagnusKoudahl @Jejulia @GregPlowman can check whether on their affected systems there is also a .julia/config/startup.jl file, and whether my PR resolves the issue for them?

@GregPlowman
Copy link

Ignoring startup file seems to have fixed the issue for me. Thanks!

Julia v1.4.1 on Windows 10:
BinaryProvider v0.5.8 - Pkg.build successful for all packages
BinaryProvider v0.5.9 - Pkg.build failed for several packages
BinaryProvider v0.5.9 with patch --startup-file=no applied manually - all builds successful

@giordano
Copy link
Member

Awesome, thanks for the confirmation! Note that the fix is already in v0.5.10: JuliaRegistries/General#15196

@giordano giordano linked a pull request May 23, 2020 that will close this issue
@giordano
Copy link
Member

Ok, it does look like the problem was that the startup file was creating some issues: JuliaMath/DecFP.jl#126 (comment) is a further evidence of it, and also on Slack a user reported that they had a non-trivial startup file that was probably making dlopen fail. For these reasons, I'm going to close this as fixed by #195.

While BinaryProvider was not doing the right thing by loading the startup file, this points out that all users experiencing the issue have code in their init file that cause conflicts when running Julia non-interactively, so it's probably a good occasion to revisit its content. For example, if you use Revise or OhMyRepl I'd like to remind you that these are the recommended configuration to use:

Unconditionally loading these packaging and not guarding them with a try-catch block can be an issue in other places besides BinaryProvider.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants