Allow JVM-Package to access inplace predict method #9167
Conversation
Squashed commits:
[2aa500b] Try a different way of creating the proxy
[c92d2df] try only one path
[2f81ff3] dMatrix Handle null check and more statments
[2e6df6f] exit on entering ifs
[688fc73] More null checks
[3e37688] change error logging
[710dae5] missing semicolon
[489c337] Put checks back in, change check in c_api
[ad5972d] time to break the cpu predictor
[7dfa3c0] No more check, do it live
[abd64ff] dereference for logging
[eb4cd9a] See if we can log what the type for proxy is
[74c4376] maybe log something
[a367ca2] uncomment for test, pass dMatrix through Booster.java
[3e98f14] comment out for testing
[66f0d8c] Changes to tests
[737b293] some more changes to test
[b3796b7] a dMatrix for testing
[e995228] Convert long to handle
[66ca8c3] c h e c k s t y l e
[14f2243] Checkstyle failure
[163bfca] missing comma
[c869a14] First try at getting dMatrix logic into changes
Squashed commits:
[99b8f1f] Remove pom change
[7b80fef] Remove changes from cpu_predictor, fix rebase miss
[a531053] Cleaning up XGBoosterInplacePredict Further de-newification More clean-up
[de6e109] pom change git didn't pick up
(+5 squashed commits) Squashed commits:
[6182244] One missed pom
[ab5bc5e] Update all the poms, even the unused ones
[a96c3b9] More clean-up
[aea9cd3] Further de-newification
[1539100] Cleaning up XGBoosterInplacePredict
[a81361d] increment our version
(+15 squashed commits) Squashed commits:
[c4d6d52] missed param
[b343f89] Remove redundant line
[5609102] Remove need to pass DMatrix
[e31c53d] another missed logging message
[f6a2508] comment out some logging
[148e29e] undo static cast
[6372805] More logging
[a173763] change how proxy is cast in cpu predictor
[5a08ff8] proxy to stuff
[823c92f] Dense adapter changes
[9a3214b] change some variables and things
[d9dd5c4] uncomment stuff, more logging
[7968bca] Set dense data
[9e79c2e] Try some stuff with DMatrix
[2b8cca6] Print result of p_m.get()
Force-pushed the branch from c77e146 to ee0a029.
Force-pushed the branch from 9ecad16 to 5ce8d65.
Just a few cleanup comments from my side. I am not an expert in the core functionality; I can only review the JVM-related changes.
src/c_api/c_api.cc (outdated)

                                  out_dim, out_result);
}

XGB_DLL int XGBoosterInplacePredict(BoosterHandle handle,
Hi, this is the same function as XGBoosterPredictFromDense; can we use it instead?
Hi, I've been working on moving us to XGBoosterPredictFromDense, but a direct usage would require something similar to what the Python code does. From reading it, the arrays that are passed through are created using NumPy methods, and there doesn't seem to be any native support for those types in Java. Should we build a custom byte stream to send in with all the metadata, or would adding something like com.google.gson.Gson to the Java imports be acceptable?
Hi, I think one can build the JSON interface in xgboost.cpp using the xgboost::Json class.
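For illustration only, here is a rough Java sketch of the kind of array-interface document the dense prediction path consumes. The field layout ("data" as a pointer/read-only pair, "shape", "typestr" of "<f4" for little-endian float32, and "version") follows the NumPy array-interface convention used by the Python bindings; the helper name and the idea of formatting the string in Java are assumptions for this sketch, since the PR ultimately assembles the JSON on the C++/JNI side.

// Hypothetical helper: builds a JSON array-interface document describing a dense
// float32 buffer. `dataAddress` stands in for the native address of the pinned
// float array that the JNI layer would obtain; it is only a placeholder here.
static String denseArrayInterface(long dataAddress, int numRows, int numCols) {
  return String.format(
      "{\"data\": [%d, false], \"shape\": [%d, %d], \"typestr\": \"<f4\", \"version\": 3}",
      dataAddress, numRows, numCols);
}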
Hi, I've tried a few different ways of building the JSON interface locally, but the best I've managed still failed the type check in PredictFromDense, so the conversion is incorrect. Could you point me to the methods in the Json class I should be using to convert a Java float array to the format the API expects? I'm inexperienced with C++ and could use additional guidance on that side of the pull request.
Apologies for the slow reply. I will look into your PR tomorrow and see if I can help.
if (predContribs) {
  optionMask = 4;
}
DMatrix d_mat = new DMatrix(data, num_rows, num_features, missing);
Hmm, inplace_predict doesn't need the DMatrix, which is the primary motivation for developing this method.
Sorry for the delay. I'm looking through the Python code that calls predictFromDense to see how that works. Should we be implementing and using a proxy DMatrix on the Java side of things? I'll start working on that change if you can confirm it is the correct approach.
Historically, we didn't have a DMatrix here when we initially made our changes in 1.4. The DMatrix was added because of changes in cpu_predictor.cc that now require one; the old version of inplacePredict took the parameter dmlc::any const &x, and we were sending a null pointer for the DMatrix.
Thank you for sharing! We unified the interface under the proxy DMatrix class. Let me do some more investigation into the JVM implementation first. I think I can help with the C++ code.
TODOs for me:
@trivialfis Hi, I wanted to check in and see if there's anything I or my team should be working on for this pull request at the moment.
Apologies for the long wait; I have been clearing up items for the 2.0 release. Will get to this once #9440 is sorted out.
Please take a look at sovrn#9.
Update from our end: I merged sovrn#9 into this branch and built a new version for testing. However, one of our projects is still using models built on 1.4, so I can't do a test in a live environment yet. We're also going to look into adding a test to verify thread safety.
I think xgboost can load old models. We are still testing models from 0.9 and 1.0: https://github.com/dmlc/xgboost/blob/master/tests/python/test_model_compatibility.py. The problem arises when you load a new model with an old xgboost.
We're failing to load the 1.4 model in our test environment. We have a 1.7 model training now to determine whether it's something on our end. Upgrading that project is already in our backlog, so I'll let you know if we're seeing an actual compatibility issue.
We're able to load the new model trained on a 1.7 version. The models we have problems loading were trained on 1.4.1.
Which model format are you using?
We use the binary format for the model.
 *
 * @return predict Result matrix
 */
public float[][] inplace_predict(float[] data,
@wbo4958 I just learned about the existence of BigDenseMatrix; do you think we should return the prediction in that?
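For context, here is a minimal sketch of how a caller might invoke the method shown in the snippet above, based on the overloads exercised in the BoosterImplTest diff later in this thread. The wrapper class is hypothetical, and it assumes the method throws XGBoostError like the other native-backed Booster methods.

import ml.dmlc.xgboost4j.java.Booster;
import ml.dmlc.xgboost4j.java.XGBoostError;

final class InplacePredictUsageSketch {
  // `testX` is a row-major float array with testRows * features entries;
  // Float.NaN marks missing values, matching the tests further down.
  static float[][] predict(Booster booster, float[] testX, int testRows, int features,
                           float[] margin) throws XGBoostError {
    // Basic call: one prediction row per input row, no explicit margin.
    float[][] preds = booster.inplace_predict(testX, testRows, features, Float.NaN);
    assert preds.length == testRows;

    // Overload with an explicit iteration range, prediction type and base margin,
    // as exercised in BoosterImplTest below; [0, 0) means "use all trained iterations".
    int[] iterationRange = new int[] {0, 0};
    return booster.inplace_predict(testX, testRows, features, Float.NaN,
        iterationRange, Booster.PredictionType.kValue, margin);
  }
}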
Feel free to share (maybe in private); we can help debug the issue.
Thank you. I haven't verified yet whether it is the model that's failing, since the error appears at a level where it could be either our model or our mapping. I'm not too concerned about troubleshooting this, as upgrading these models is already in our backlog, which should fix whatever is going on. If it's helpful, I can provide a model file if compatibility checks for that version fail on your end. Otherwise it might be unneeded effort, since we'll have a fix available once this PR is merged and we can move to an official version.
Hello! I am also from Sovrn and just wanted to check in and see if you had any specific thread-safety concerns that the test should address. I've been reading through some PRs, and it looks like a few Java folks want to drop the synchronized off the predict call; this is on the 2.0 road map: #7027. In that PR there is a reference to the "JVM iterator" not being thread-safe, which also seems to be the same concern that came up in one of the branches for this PR: sovrn#9. I'm not sure this is related to the inplace_predict that we're trying to merge, since we're not using a DMatrix but rather a float[][]. The method we're specifically looking to get merged is inplace_predict, which uses "XGBoosterPredictFromDense"; comments I've read in some PRs indicate that this method is thread-safe. So, with that said, are there any specific thread-safety issues you're seeing or concerned about on the Java side? Thank you!
No concern with thread safety at the moment. As you have discussed, the mentioned issues are related to the JVM iterator rather than to this PR. I will review the PR. @wbo4958, it would be great if you could take a look as well.
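Since thread safety came up, here is a minimal sketch of the kind of concurrent check discussed above, assuming the four-argument inplace_predict overload used elsewhere in this thread; the class and method names are hypothetical.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

import static org.junit.Assert.assertArrayEquals;

import ml.dmlc.xgboost4j.java.Booster;

final class InplacePredictThreadSafetySketch {
  // Runs the same in-place prediction from several threads and checks that every
  // thread sees the same result as a single-threaded call on the same inputs.
  static void checkConcurrentPredictions(Booster booster, float[] testX,
                                         int testRows, int features) throws Exception {
    float[][] expected = booster.inplace_predict(testX, testRows, features, Float.NaN);

    ExecutorService pool = Executors.newFixedThreadPool(4);
    List<Future<float[][]>> futures = new ArrayList<>();
    for (int t = 0; t < 4; t++) {
      Callable<float[][]> task =
          () -> booster.inplace_predict(testX, testRows, features, Float.NaN);
      futures.add(pool.submit(task));
    }
    for (Future<float[][]> future : futures) {
      float[][] actual = future.get();
      for (int i = 0; i < expected.length; i++) {
        assertArrayEquals(expected[i], actual[i], 0.0f);
      }
    }
    pool.shutdown();
  }
}

Zero tolerance is used on the assumption that repeated predictions over identical inputs are deterministic; a small delta could be substituted if that turns out not to hold.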
namespace linalg = xgboost::linalg;
jfloat *data = jenv->GetFloatArrayElements(jdata, nullptr);
auto t_data = linalg::MakeTensorView(
    xgboost::Context::kCpuId,
You might need to rebase the PR; we made some changes to these utilities.
/**
 * Handle base margin
 */
BoosterHandle proxy{nullptr};
Please help add a test for prediction with margin. With regression using reg:squarederror:

pred0 = booster.inplace_predict(X, margin)
pred1 = booster.inplace_predict(X) + margin
assert pred0 == pred1
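Translated into the Java idiom, the suggested check might look roughly like the sketch below; the class and method names are hypothetical, and the full version eventually landed in the BoosterImplTest diff later in this thread. A small delta is used because the two code paths accumulate floats slightly differently, and the equality only holds when the booster was trained with base_score set to 0.0, which the final test does explicitly.

import java.util.Arrays;

import static org.junit.Assert.assertArrayEquals;

import ml.dmlc.xgboost4j.java.Booster;
import ml.dmlc.xgboost4j.java.XGBoostError;

final class MarginPropertySketch {
  // Assumes `booster` is a reg:squarederror model trained with base_score = 0.0.
  static void checkMarginProperty(Booster booster, float[] testX, int testRows,
                                  int features) throws XGBoostError {
    // Constant base margin of 0.5 for every row, as in the test further down.
    float[] margin = new float[testRows];
    Arrays.fill(margin, 0.5f);

    float[][] withMargin = booster.inplace_predict(testX, testRows, features, Float.NaN,
        new int[] {0, 0}, Booster.PredictionType.kValue, margin);
    float[][] withoutMargin = booster.inplace_predict(testX, testRows, features, Float.NaN);

    // predict(x, margin) should equal predict(x) + margin, up to floating-point noise.
    for (int i = 0; i < withoutMargin.length; i++) {
      for (int j = 0; j < withoutMargin[i].length; j++) {
        withoutMargin[i][j] += margin[i];
      }
      assertArrayEquals(withMargin[i], withoutMargin[i], 1e-6f);
    }
  }
}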
@trivialfis Hello! I wrote a test, but it appears to be failing. I think it may be related to this: #9536
Is the in-place prediction not working properly with margins at the moment?
The bug happens when you are running prediction with base margin on GPU while the input is from the CPU or the other way around. I don't think it's relevant to this PR.
Feel free to push your tests; I can take a look.
Thank you very much! I pushed the new tests. Let me know if you have any questions! Right now, what I'm seeing with the failing test is that prediction(x with margin) != prediction(x) + margin.
Awesome! That did the trick. I'll post your changes from below shortly.
Great! Please merge/rebase the latest branch.
Done! Let me know if there's anything else needed!
Fix the CI errors, if there are any. ;-)
It looks like the CI failed with some of the new C++ code you added from the sovrn#9 branch. I tried taking a look at it, but by no means am I a C++ expert. It looks like some sort of parameter type mismatch is occurring. Could you take a look?
…e tests: inplacePredictTest and inplacePredictMultiPredictTest. Extracted common logic of generating random data sets into generateRandomDataSet method. Added new test inplacePredictWithMarginTest which should test the utilization of a margin when performing predictions.
…ounds that we're training on is 10, and we want to use all rounds arbitrarily for this test.
…on_range as iterationRange, so it is clear. Set this iteration range to [0,0) which mimics the calls in inplace_predict methods without margin.
Anyway:

diff --git a/jvm-packages/xgboost4j/src/test/java/ml/dmlc/xgboost4j/java/BoosterImplTest.java b/jvm-packages/xgboost4j/src/test/java/ml/dmlc/xgboost4j/java/BoosterImplTest.java
index 0d6020b00..e57368672 100644
--- a/jvm-packages/xgboost4j/src/test/java/ml/dmlc/xgboost4j/java/BoosterImplTest.java
+++ b/jvm-packages/xgboost4j/src/test/java/ml/dmlc/xgboost4j/java/BoosterImplTest.java
@@ -31,7 +31,7 @@ import static org.junit.Assert.fail;
/**
* test cases for Booster Inplace Predict
- *
+ *
* @author hzx and Sovrn
*/
public class BoosterImplTest {
@@ -229,43 +229,46 @@ public class BoosterImplTest {
DMatrix testingMatrix = new DMatrix(testX, testRows, features, Float.NaN);
testingMatrix.setLabel(testY);
- //Set booster parameters
+ // Set booster parameters
Map<String, Object> params = new HashMap<>();
params.put("eta", 1.0);
- params.put("max_depth",2);
- params.put("silent", 1);
+ params.put("max_depth", 2);
params.put("tree_method", "hist");
+ params.put("base_score", 0.0);
Map<String, DMatrix> watches = new HashMap<>();
watches.put("train", trainingMatrix);
watches.put("test", testingMatrix);
- //Train booster on training matrix.
+ // Train booster on training matrix.
Booster booster = XGBoost.train(trainingMatrix, params, 10, watches, null, null);
- //Create a margin
+ // Create a margin
float[] margin = new float[testRows];
Arrays.fill(margin, 0.5f);
- //Define an iteration range to use all training iterations, this should match the without margin call
- //which defines an iteration range of [0,0)
- int[] iterationRange = new int[] {0, 0};
+ // Define an iteration range to use all training iterations, this should match
+ // the without margin call
+ // which defines an iteration range of [0,0)
+ int[] iterationRange = new int[] { 0, 0 };
float[][] inplacePredictionsWithMargin = booster.inplace_predict(testX,
- testRows,
- features,
- Float.NaN,
- iterationRange,
- Booster.PredictionType.kValue,
- margin);
+ testRows,
+ features,
+ Float.NaN,
+ iterationRange,
+ Booster.PredictionType.kValue,
+ margin);
float[][] inplacePredictionsWithoutMargin = booster.inplace_predict(testX, testRows, features, Float.NaN);
- for(int i = 0; i < inplacePredictionsWithoutMargin.length; i++) {
- for(int j = 0; j < inplacePredictionsWithoutMargin[i].length; j++) {
+ for (int i = 0; i < inplacePredictionsWithoutMargin.length; i++) {
+ for (int j = 0; j < inplacePredictionsWithoutMargin[i].length; j++) {
inplacePredictionsWithoutMargin[i][j] += margin[j];
}
}
- assertArrayEquals(inplacePredictionsWithMargin, inplacePredictionsWithoutMargin);
+ for (int i = 0; i < inplacePredictionsWithoutMargin.length; i++) {
+ assertArrayEquals(inplacePredictionsWithMargin[i], inplacePredictionsWithoutMargin[i], 1e-6f);
+ }
}
private float[] generateRandomDataSet(int size) {
…g the base_score as a parameter for the booster and asserting the values between the predict with margin and predict without margin are similar within a delta (since floating point calculations vary slightly)
@trivialfis I took a stab at trying to fix the C++ problems. I was able to build successfully locally and pass tests. If you could review and, if it looks good, re-trigger the CI, I would appreciate it! Thank you!
Thank you for the work; let me look into it next week. I don't have access to my device at the moment.
Looks good to me. Thank you for the nice feature addition!
Looks like the BuildKite multi-GPU job failed. I took a look at the error, and it appears to be related to a connection being reset rather than anything in the PR. Let me know if you need me to do anything for this. Thank you!
Here at Sovrn we have been using a fork of xgboost with custom changes that allow us to access the inplace_predict method from the JNI. We believe this would be of use to others and are looking for feedback/reviews to add this functionality to the main xgboost library.
Close #5951.