Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

xgboost 1.0.0 #50467

Closed
wants to merge 1 commit into from
Closed

xgboost 1.0.0 #50467

wants to merge 1 commit into from

Conversation

ankane
Copy link
Contributor

@ankane ankane commented Feb 20, 2020

  • Have you followed the guidelines for contributing?
  • Have you checked that there aren't other open pull requests for the same formula update/change?
  • Have you built your formula locally with brew install --build-from-source <formula>, where <formula> is the name of the formula you're submitting?
  • Is your test running fine brew test <formula>, where <formula> is the name of the formula you're submitting?
  • Does your build pass brew audit --strict <formula> (after doing brew install <formula>)?

XGBoost 1.0 added support for OpenMP with clang (which is important for performance).

@hcho3
Copy link

hcho3 commented Feb 20, 2020

@ankane Is it possible to also install the Python binding along with libxgboost.so? It would be useful to many users, e.g. dmlc/xgboost#4949. On my local machine, I can run

cd python-package
python3 setup.py install

to install the Python binding.

@ankane
Copy link
Contributor Author

ankane commented Feb 20, 2020

@hcho3 I think that'll be more difficult to do and add a lot of additional Homebrew dependencies (python, numpy, scipy). The Homebrew maintainers may have an opinion on this as well.

I think an alternative approach is to publish a prebuilt wheel with the dylib. LightGBM does this. The downside is if you decide to build with OpenMP, Mac users will also need to run brew install libomp to use it (unless there's a way for the XGBoost library to dynamically detect it).

@hcho3
Copy link

hcho3 commented Feb 20, 2020

@ankane I took a shot at adding the Python binding:

diff --git a/Formula/xgboost.rb b/Formula/xgboost.rb
index 16057d0dfe..dacc42c02a 100644
--- a/Formula/xgboost.rb
+++ b/Formula/xgboost.rb
@@ -14,6 +14,9 @@ class Xgboost < Formula

   depends_on "cmake" => :build
   depends_on "libomp"
+  depends_on "python"
+  depends_on "numpy"
+  depends_on "scipy"

   def install
     mkdir "build" do
@@ -29,6 +32,9 @@ class Xgboost < Formula
       system "make"
       system "make", "install"
     end
+    cd "python-package" do
+      system "python3", "setup.py", "install"
+    end
     pkgshare.install "demo"
   end

@@ -38,5 +44,6 @@ class Xgboost < Formula
       cp "../binary_classification/mushroom.conf", "."
       system "#{bin}/xgboost", "mushroom.conf"
     end
+    system "python3", "-c", "import xgboost; print(xgboost.__version__)"
   end
 end

As for whether it's okay to add three dependencies (Python, NumPy, SciPy), let us wait for the opinion of Homebrew maintainers. If it's not okay, I can think of other options.

@SMillerDev
Copy link
Member

Are those runtime dependencies then? Or can we get away with buildtime-only?

@hcho3
Copy link

hcho3 commented Feb 20, 2020

Yes, they are runtime dependencies, since we would like users to be able to import xgboost module from their Python session.

@SMillerDev
Copy link
Member

For runtime dependencies that's a lot to add for something homebrew has never had a request for up till now. I'm not in favor of this, that wouldn't stop you from keeping it in a tap though.

@hcho3
Copy link

hcho3 commented Feb 20, 2020

@SMillerDev Got it. I will create a separate formula xgboost-python for the Python binding and put it under my tap repository. You can review this pull request as-is.

@zbeekman
Copy link
Contributor

@hcho3 FYI, it's possible that there could be issues with mixed linkage to the OpenMP library. I don't recall off the top of my head whether scipy or numpy have any non-Fortran source codes with OpenMP. But they both consume OpenBLAS which uses libgomp from GCC as the OpenMP back end.

@zbeekman
Copy link
Contributor

(I'm still trying to make sure I understand the nuances of OpenMP on macOS with Homebrew, my current understanding is admittedly incomplete.)

Copy link
Contributor

@zbeekman zbeekman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that since this is a library using OpenMP, and, presumably, intended for consumption by other packages, GCC and libgomp should be used for compilation. If @fxcoudert has a chance to check/confirm my assessment that would be much appreciated.

@@ -13,10 +13,19 @@ class Xgboost < Formula
end

depends_on "cmake" => :build
depends_on "libomp"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on my currently incomplete understanding of how Homebrew, OpenMP and formulae all interact, I think this formula should be compiled with GCC, and pull in GCCs native libgomp. Here is the relative text from a discussion among maintainers that I feel pretty OK/confident is correct:

The current stand on OpenMP is: use Apple’s clang + libomp (which is a hack) only for standalone formulas that strongly benefit from OpenMP and do not expose this to dependents (good example being imagemagick).

So if we take that as truth I think you need to do:

Suggested change
depends_on "libomp"
depends_on "gcc"
fails_with :clang # because this is an OpenMP enabled library like OpenBLAS

args << "-DOpenMP_C_LIB_NAMES=omp"
args << "-DOpenMP_CXX_FLAGS=\"-Xpreprocessor -fopenmp -I#{libomp.opt_include}\""
args << "-DOpenMP_CXX_LIB_NAMES=omp"
args << "-DOpenMP_omp_LIBRARY=#{libomp.opt_lib}/libomp.dylib"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Double check this, but if you make the changes I proposed above to enable OpenMP and in a manner such that it is consumable by other packages CMake should default to using gcc-9, g++-9 and pull in libgomp. You can run brew linkage xgboost locally after compiling to see what libraries are getting linked. You should see something like the output of running that command on OpenBLAS:

$ brew linkage openblas
System libraries:
  /usr/lib/libSystem.B.dylib
Homebrew libraries:
  /usr/local/opt/gcc/lib/gcc/9/libgfortran.5.dylib (gcc)
  /usr/local/opt/gcc/lib/gcc/9/libgomp.1.dylib (gcc)
  /usr/local/opt/gcc/lib/gcc/9/libquadmath.0.dylib (gcc)
Missing libraries:
  /usr/local/lib/gcc/9/libgcc_s.1.dylib

@hcho3
Copy link

hcho3 commented Feb 21, 2020

@zbeekman The previous version of the formula used Apple Clang instead of GCC because GCC was considered a heavy dependency. See discussion at #43246.

For the record, XGBoost is designed to work with both Apple Clang and GCC. I defer to Homebrew maintainers as to which compiler should be used.

@zbeekman
Copy link
Contributor

@hcho3 Hopefully @fxcoudert can weigh in, as I think he probably has the most complete understanding of the state of affairs of OpenMP in homebrew. But my suggestion was based on discussion with other maintainers as well as @fxcoudert's comment here: #50252 (comment)

GCC is definitely a heavy dependency, but if you anticipate people wanting/needing to use this with Python (which if this is targeted and NN/ML applications, I think adding the python bindings to the formula would make sense) then there will be at least an indirect dependency on OpenBLAS which builds with GCC to avoid mixed linkage against libomp and libgomp....

@hcho3
Copy link

hcho3 commented Feb 21, 2020

@zbeekman It is true that the Python binding of XGBoost will often be used with NumPy and SciPy. However, this pull request does not include the Python binding, per request of @SMillerDev. So the package under review is a CLI application. Anyhow, let us wait for @fxcoudert.

@zbeekman
Copy link
Contributor

@hcho3 Yes I understand that currently it doesn't include python and the the wonderful @SMillerDev requested this, but I'm wondering whether or not this makes sense. If lots of people use xgboost from python, then, in my mind, it would make sense to include the python bindings by default. Python is the most popular language for machine learning, and this seems like a machine learning library. Let's first hear about gcc vs libomp.

@zbeekman zbeekman added the needs response Needs a response from the issue/PR author label Feb 23, 2020
@ankane
Copy link
Contributor Author

ankane commented Feb 23, 2020

Hey @zbeekman, it looks like this was just marked as "needs response from the PR author". What do you need from me?

@zbeekman
Copy link
Contributor

@ankane sorry, I forgot we were waiting on FX to take a look. But I’m 85% sure this should be using the patch I suggested above. (Dependency on GCC and fails_with :clang, remove libomp dependency since gcc will pull in libgomp.) To progress this forwards my recommendation is to implement and test the changes needed to make this happen. Libraries like this one and OpenBLAS that may be or become dependencies of other formulae and that use and provide OpenMP threading should use this approach. libgomp was conditionally accepted for stand alone programs that could benefit, like imagemagick.

@stale
Copy link

stale bot commented Mar 16, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

@stale stale bot added the stale No recent activity label Mar 16, 2020
@stale stale bot closed this Mar 23, 2020
@ankane
Copy link
Contributor Author

ankane commented Mar 31, 2020

Bump to mark as unstale

@ankane ankane mentioned this pull request Apr 23, 2020
5 tasks
@lock lock bot added the outdated PR was locked due to age label May 6, 2020
@lock lock bot locked as resolved and limited conversation to collaborators May 6, 2020
@ankane ankane deleted the xgboost-1-0 branch June 13, 2020 06:45
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
needs response Needs a response from the issue/PR author outdated PR was locked due to age stale No recent activity
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants