Add support to optimize for NDCG at a given truncation level #3425
Conversation
To correctly optimize for NDCG@_k_, one should exclude pairs in which both documents fall beyond the top-_k_ (swapping such a pair does not affect NDCG@_k_).
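A minimal sketch of the rationale, using a truncated DCG discount (the helper names here are illustrative, not the PR's code): swapping two documents that both sit at or below rank _k_ leaves DCG@_k_ unchanged, so such pairs contribute nothing to the gradient.

```cpp
#include <cmath>
#include <cstddef>

// Truncated DCG discount: positions at or beyond k contribute nothing to DCG@k.
double DiscountAtK(std::size_t rank, std::size_t k) {
  return rank < k ? 1.0 / std::log2(2.0 + static_cast<double>(rank)) : 0.0;
}

// |delta DCG@k| from swapping the documents at 0-based ranks i and j, given
// their label gains: |(gain_i - gain_j) * (discount_j - discount_i)|.
// If both i >= k and j >= k, both discounts are zero and the delta vanishes,
// which is exactly why such pairs can be skipped.
double AbsDeltaDcgAtK(std::size_t i, std::size_t j,
                      double gain_i, double gain_j, std::size_t k) {
  return std::fabs((gain_i - gain_j) * (DiscountAtK(i, k) - DiscountAtK(j, k)));
}
```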
Any benchmarks for the change?
For the MSLR-WEB30K dataset (fold 1), using the default train.conf, performance on the validation set:
[attached plots: optimize for NDCG@20; optimize for NDCG@10; optimize for NDCG@5; optimize for NDCG@3; optimize for NDCG@1]
I think this PR is a new feature.
@metpavel Is the result of "full rank" the same as before?
const data_size_t low = sorted_idx[j];
// start accumulating lambdas over pairs that contain at least one document above the truncation level
for (data_size_t i = 0; i < cnt - 1 && i < truncation_level_; ++i) {
  for (data_size_t j = i + 1; j < cnt; ++j) {
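A standalone toy to see which pairs that loop bound keeps (illustrative values; `truncation_level` mimics `truncation_level_` from the snippet):

```cpp
#include <cstdio>

int main() {
  const int cnt = 5;
  const int truncation_level = 2;
  // Same loop bounds as the snippet above: the outer index i stays within the
  // truncation level, so every emitted pair (i, j) has at least one member in
  // the top-k; pairs lying entirely below the cutoff are never generated.
  for (int i = 0; i < cnt - 1 && i < truncation_level; ++i) {
    for (int j = i + 1; j < cnt; ++j) {
      std::printf("(%d, %d)\n", i, j);  // prints (0,1) ... (0,4), (1,2) ... (1,4)
    }
  }
  return 0;
}
```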
Doesn't this start from '0'? If not, why?
Updated:
I see. Never mind.
src/objective/rank_objective.hpp (outdated)
const double high_label_gain = label_gain_[high_label];
const double high_discount = DCGCalculator::GetDiscount(high_rank);

const data_size_t low_rank = label[sorted_idx[i]] > label[sorted_idx[j]] ? j : i;
Remove the additional branching? You could get high_rank and low_rank with a single `if`.
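A sketch of the kind of refactor being suggested here, reusing names from the snippet above (the pointer types are assumptions; the merged code may differ):

```cpp
#include <cstdint>

using data_size_t = int32_t;
using label_t = float;

// Pick the better-labeled ("high") member of the pair (i, j) with a single
// branch, instead of evaluating the label comparison in two ternaries.
inline void PickHighLow(const label_t* label, const data_size_t* sorted_idx,
                        data_size_t i, data_size_t j,
                        data_size_t* high_rank, data_size_t* low_rank) {
  if (label[sorted_idx[i]] > label[sorted_idx[j]]) {
    *high_rank = i;
    *low_rank = j;
  } else {
    *high_rank = j;
    *low_rank = i;
  }
}
```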
done
Co-authored-by: Guolin Ke <[email protected]>
remove the additional branching: get high_rank and low_rank by one "if".
From the results, it seems we need a slightly larger truncation level, e.g. optimizing for NDCG@k with a truncation level of k + m.
@guolinke, perhaps one may want to treat that m as a tunable hyper-parameter, balancing better alignment with the desired cutoff k (smaller m) against having more pairs to train on (larger m).
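For illustration, a hypothetical train.conf fragment following that k + m idea (objective, eval_at, and lambdarank_truncation_level are existing LightGBM parameters; the values are only an example):

```
# optimize NDCG@10, but keep pairs reaching up to rank 20 (i.e. m = 10)
objective = lambdarank
eval_at = 10
lambdarank_truncation_level = 20
```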
@metpavel Yes, maybe you can add this description to the documentation for the truncation level parameter?
@metpavel we usually write the doc for parameters in […]
BTW, can you also merge master, to ensure CI passes?
add description to lambdarank_truncation_level parameter
@jameslamb can you help fix the R tests? Some ranking score constraints need to be updated, since the ranking objective has changed.
update expected NDCG value for a test, as it was affected by the underlying change in the algorithm
update NDCG@3 reference value
ok sure, I can help
@metpavel I just created a pull request into this branch on your fork. If you merge it, it should fix the R tests.
    // lambda is negative, so use minus to accumulate
    sum_lambdas -= 2 * p_lambda;
  }
  // update the accumulated lambda/hessian for the "high" document
  lambdas[high] += static_cast<score_t>(high_sum_lambda);
  hessians[high] += static_cast<score_t>(high_sum_hessian);
}
if (norm_ && sum_lambdas > 0) {
  double norm_factor = std::log2(1 + sum_lambdas) / sum_lambdas;
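For context, a sketch of how that factor would then be applied (assumed from the surrounding snippet, not necessarily the exact merged code): every lambda and hessian in the query is scaled by log2(1 + sum_lambdas) / sum_lambdas, which damps the gradient for queries that accumulate many large pairwise lambdas.

```cpp
// Illustrative: scale each document's gradient and hessian by norm_factor.
for (data_size_t i = 0; i < cnt; ++i) {
  lambdas[i] = static_cast<score_t>(lambdas[i] * norm_factor);
  hessians[i] = static_cast<score_t>(hessians[i] * norm_factor);
}
```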
/gha run-valgrind
/gha run r-valgrind
^ this comment started this run checking that these changes pass our valgrind tests: https://github.com/microsoft/LightGBM/runs/1296338580?check_suite_focus=true
this didn't show any new issues...but I'll run it again once my fixes are merged into this PR: #3425 (comment)
/gha run-valgrind
/gha run r-valgrind
if this passes, I'll come back and approve: https://github.com/microsoft/LightGBM/runs/1310275649?check_suite_focus=true
fix R learning-to-rank tests
Co-authored-by: Guolin Ke <[email protected]>
You need to re-run the parameter generator, since the default value has changed.
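For reference, a hedged sketch of that step (the helper path is recalled from the LightGBM repo and should be verified there):

```
# re-generate parameter docs/code from include/LightGBM/config.h (path assumed)
python helpers/parameter_generator.py
```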
This pull request has been automatically locked because there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.
Fixes #3420