-
-
Notifications
You must be signed in to change notification settings - Fork 8.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make sure metrics work with federated learning #9037
Conversation
@trivialfis CI seems a bit flaky, but this is ready to be reviewed. I'm following this up with extracting out common functionalities that encapsulate different behaviors with column split / vertical federated learning. |
Are you able to reproduce this error on master https://buildkite.com/xgboost/xgboost-ci-windows/builds/2208#01879482-95ad-4868-a96f-2c79e7768b38 ? Seems to be only reproducible on Windows. I can try a Windows build tomorrow if you haven't already. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good overall, one minor issue with duplicated code.
@@ -28,9 +28,8 @@ | |||
#include <algorithm> // for stable_sort, copy, fill_n, min, max | |||
#include <array> // for array | |||
#include <cmath> // for log, sqrt | |||
#include <cstddef> // for size_t, std |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies, these were generated by include-what-you-want with the latest clang. Seems difficult to maintain at this point, considering that it's not always correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I think clang-tidy was complaining about it.
src/metric/rank_metric.cc
Outdated
@@ -385,6 +378,33 @@ class EvalRankWithCache : public Metric { | |||
} | |||
|
|||
double Evaluate(HostDeviceVector<float> const& preds, std::shared_ptr<DMatrix> p_fmat) override { | |||
double result{0.0}; | |||
if (p_fmat->Info().IsVerticalFederated()) { | |||
// TODO(rongou): better abstraction for this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems not too messy to de-duplicate the code with the one in the common header.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK I merged the aggregator branch to take care of this. Not sure about the name, if you can come up with something better I'd be happy to change it.
@rongou Could you please help take a look into this flaky test when you are available https://buildkite.com/xgboost/xgboost-ci/builds/2349#0187bd27-5ff0-4397-a0d7-cf52b62284c2 ? Sanitizer might be helpful. |
I think the problem is the generated random port is sometimes already in use. I'll take a look. |
This looks like an IPv6 address |
I helped implement the IPv6 support for xgboost, maybe something is wrong with the loopback. |
Based on #9020.
Mainly adding tests to verify metrics for horizontal and vertical federated learning, a few changes to the actual code.
Needs better abstractions to deal with the behavior differences between row vs. column split, distributed vs. federated learning.