
Make sure metrics work with federated learning #9037

Merged: 26 commits merged into dmlc:master on Apr 19, 2023.


@rongou (Contributor) commented Apr 13, 2023

Based on #9020.

This mainly adds tests that verify metrics for horizontal and vertical federated learning, plus a few changes to the actual code.

The code still needs better abstractions to handle the behavior differences between row vs. column split, and between distributed vs. federated learning.
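To make the row vs. column split difference concrete, here is a minimal, simulated sketch. It is not XGBoost code: `DataSplit`, `WorkerView`, `AllreduceSum`, `PartialRmse`, and `GlobalRmse` are hypothetical names, and the "allreduce" runs in-process. With a row split, each worker holds different rows together with their labels, so workers must sum partial statistics (squared error and row count) before finishing the metric. With a column split, the sketch assumes only rank 0 holds labels, so only that rank can evaluate, and its result would then be broadcast.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical split modes; illustrative names, not XGBoost's API.
enum class DataSplit { kRow, kCol };

// One worker's view. Row split: each worker has its own rows and labels.
// Column split (assumed here): every worker has predictions for all rows,
// but only the label owner (rank 0) has labels; the others have none.
struct WorkerView {
  std::vector<float> preds;
  std::vector<float> labels;
};

// Stand-in for an allreduce-sum; a real system would go over the network.
double AllreduceSum(std::vector<double> const& per_worker) {
  double total = 0.0;
  for (double v : per_worker) total += v;
  return total;
}

// Partial statistics for RMSE on one worker: sum of squared errors and count.
void PartialRmse(WorkerView const& w, double* sse, double* count) {
  *sse = 0.0;
  for (std::size_t i = 0; i < w.labels.size(); ++i) {
    double d = static_cast<double>(w.preds[i]) - w.labels[i];
    *sse += d * d;
  }
  *count = static_cast<double>(w.labels.size());
}

double GlobalRmse(std::vector<WorkerView> const& workers, DataSplit split) {
  if (split == DataSplit::kRow) {
    // Row split: reduce the partial sums first, then finish the metric once.
    std::vector<double> sse(workers.size()), count(workers.size());
    for (std::size_t r = 0; r < workers.size(); ++r) {
      PartialRmse(workers[r], &sse[r], &count[r]);
    }
    return std::sqrt(AllreduceSum(sse) / AllreduceSum(count));
  }
  // Column split: only rank 0 can evaluate; a real implementation would
  // broadcast this value so every worker reports the same result.
  double sse = 0.0;
  double count = 0.0;
  PartialRmse(workers.front(), &sse, &count);
  return std::sqrt(sse / count);
}
```

For example, if worker 0 has squared errors 0 and 4 and worker 1 has a squared error of 9, the global RMSE is sqrt(13/3) ≈ 2.08, whereas averaging the per-worker RMSEs (sqrt(2) and 3) would give about 2.21.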

@rongou (Contributor, Author) commented Apr 18, 2023

@trivialfis CI seems a bit flaky, but this is ready to be reviewed.

I'll follow this up by extracting common functionality that encapsulates the different behaviors under column split / vertical federated learning.

@trivialfis (Member) commented

Are you able to reproduce this error on master? https://buildkite.com/xgboost/xgboost-ci-windows/builds/2208#01879482-95ad-4868-a96f-2c79e7768b38 It seems to be reproducible only on Windows. I can try a Windows build tomorrow if you haven't already.

@trivialfis (Member) left a review

Looks good overall; one minor issue with duplicated code.

@@ -28,9 +28,8 @@
#include <algorithm> // for stable_sort, copy, fill_n, min, max
#include <array> // for array
#include <cmath> // for log, sqrt
#include <cstddef> // for size_t, std
@trivialfis (Member):

Apologies, these were generated by include-what-you-use with the latest clang. They seem difficult to maintain at this point, considering that the tool isn't always correct.

@rongou (Contributor, Author):

Yeah, I think clang-tidy was complaining about them.

@@ -385,6 +378,33 @@ class EvalRankWithCache : public Metric {
}

double Evaluate(HostDeviceVector<float> const& preds, std::shared_ptr<DMatrix> p_fmat) override {
double result{0.0};
if (p_fmat->Info().IsVerticalFederated()) {
// TODO(rongou): better abstraction for this.
@trivialfis (Member):

It doesn't seem too messy to de-duplicate this code with the version in the common header.

@rongou (Contributor, Author):

OK, I merged the aggregator branch to take care of this. I'm not sure about the name; if you can come up with something better, I'd be happy to change it.
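A minimal sketch of the kind of helper an "aggregator" could provide, assuming the label owner under vertical federated learning is rank 0. `SimulatedComm` and `ApplyWithLabels` are hypothetical names; the communicator is simulated in-process (ranks are called one after another, root first), and the helper actually merged in the aggregator branch may look different.

```cpp
#include <functional>
#include <optional>

// Hypothetical in-process communicator. A real one would send bytes over the
// network; here Broadcast just hands the root's value to every later caller.
class SimulatedComm {
 public:
  // The root stores its value; other ranks read it back. value() throws
  // std::bad_optional_access if a non-root rank calls before the root.
  double Broadcast(int rank, double value, int root) {
    if (rank == root) slot_ = value;
    return slot_.value();
  }

 private:
  std::optional<double> slot_;
};

// Hypothetical helper: under vertical federated learning only the label
// owner (assumed to be rank 0) runs `fn`, and its result is broadcast to all
// ranks. Without a vertical split, every rank simply runs `fn` locally.
double ApplyWithLabels(SimulatedComm& comm, int rank, bool vertical,
                       std::function<double()> const& fn) {
  if (!vertical) return fn();
  constexpr int kLabelOwner = 0;
  double result = 0.0;
  if (rank == kLabelOwner) result = fn();
  return comm.Broadcast(rank, result, kLabelOwner);
}
```

Broadcasting the label owner's result, rather than letting every worker compute its own, keeps the labels on one party and guarantees that all workers report the same metric value.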

@trivialfis merged commit 42d100d into dmlc:master on Apr 19, 2023
@trivialfis (Member) commented

@rongou Could you please take a look at this flaky test when you're available? https://buildkite.com/xgboost/xgboost-ci/builds/2349#0187bd27-5ff0-4397-a0d7-cf52b62284c2 A sanitizer build might help.

@rongou (Contributor, Author) commented Apr 26, 2023

I think the problem is that the randomly generated port is sometimes already in use. I'll take a look.
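If the test picks a random port itself, a common way to avoid collisions is to let the operating system choose: bind to port 0 and read back the assigned port with `getsockname`. This is a POSIX-only sketch (`FindFreePort` is a hypothetical helper, not part of the test suite), and it binds the IPv4 loopback; an IPv6 target such as `[::1]` would use `AF_INET6`, `sockaddr_in6`, and `in6addr_loopback` instead. It still has a small race: the port is freed when the probe socket closes, so another process could take it before the server binds.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// Ask the OS for a currently free TCP port on 127.0.0.1 by binding to
// port 0. Returns -1 on failure.
int FindFreePort() {
  int fd = ::socket(AF_INET, SOCK_STREAM, 0);
  if (fd < 0) return -1;
  sockaddr_in addr{};
  addr.sin_family = AF_INET;
  addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
  addr.sin_port = 0;  // 0 means "let the kernel pick an ephemeral port"
  socklen_t len = sizeof(addr);
  int port = -1;
  if (::bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) == 0 &&
      ::getsockname(fd, reinterpret_cast<sockaddr*>(&addr), &len) == 0) {
    port = ntohs(addr.sin_port);  // the port the kernel actually assigned
  }
  ::close(fd);  // releases the port; see the race caveat above
  return port;
}
```

Where the server itself can bind to port 0 and report the port it received (for example, gRPC's `grpc::ServerBuilder::AddListeningPort` accepts an optional `int* selected_port` output), that race goes away.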

@trivialfis (Member) commented

This looks like an IPv6 address: target_address:"[::1]:59993".

@trivialfis (Member) commented

I helped implement the IPv6 support for XGBoost; maybe something is wrong with the loopback address.
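The square brackets are what mark `[::1]:59993` as an IPv6 host (the IPv6 loopback `::1`) plus a port, as opposed to `127.0.0.1:59993` or `localhost:59993`. A small illustrative sketch follows; `SplitHostPort` and `IsIpv6Loopback` are hypothetical helper names, and the loopback check uses POSIX `inet_pton`.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

#include <optional>
#include <string>
#include <utility>

// Split "host:port". A bracketed host ("[::1]:59993") is treated as an IPv6
// literal; an unbracketed host containing ':' is rejected as ambiguous.
// Returns std::nullopt on malformed input.
std::optional<std::pair<std::string, int>> SplitHostPort(std::string const& target) {
  std::string host;
  std::string::size_type colon = 0;
  if (!target.empty() && target.front() == '[') {
    auto close = target.find(']');
    if (close == std::string::npos || close + 1 >= target.size() ||
        target[close + 1] != ':') {
      return std::nullopt;
    }
    host = target.substr(1, close - 1);  // strip the brackets
    colon = close + 1;
  } else {
    colon = target.rfind(':');
    if (colon == std::string::npos) return std::nullopt;
    host = target.substr(0, colon);
    if (host.find(':') != std::string::npos) return std::nullopt;
  }
  std::string digits = target.substr(colon + 1);
  if (digits.empty() || digits.size() > 5) return std::nullopt;
  int port = 0;
  for (char c : digits) {
    if (c < '0' || c > '9') return std::nullopt;
    port = port * 10 + (c - '0');
  }
  if (port > 65535) return std::nullopt;
  return std::make_pair(host, port);
}

// True if `host` is an IPv6 literal for the loopback address (::1).
bool IsIpv6Loopback(std::string const& host) {
  in6_addr addr{};
  return ::inet_pton(AF_INET6, host.c_str(), &addr) == 1 &&
         IN6_IS_ADDR_LOOPBACK(&addr);
}
```

Checking the parsed host this way separates "the address is IPv6 loopback" from "the port is unavailable", which are the two explanations raised in this thread.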

@rongou deleted the vertical-federated-metrics branch on September 25, 2023 at 16:41