
ranking metric acceleration on the gpu #5398

Merged · 21 commits · Mar 22, 2020

Conversation

@sriramch (Contributor) commented Mar 8, 2020

This is the last part of #5326 that has been split out. The performance numbers are here.

Please note the following:

  • Metrics applicable only to ranking datasets are fully accelerated on the GPU: map, ndcg, and pre.
  • Metrics applicable to both ranking and non-ranking datasets (auc, aucpr) work as follows:
    • When computed on non-ranking datasets, these metrics are not GPU-accelerated; they still run on the CPU, but the CPU path has been optimized, so they should be faster than before (see here).
      • GPU acceleration for this case can be worked on in a follow-up PR.
    • When computed on ranking datasets, these metrics are semi-accelerated: multiple groups are still processed in parallel, but there is a linear iteration over the predictions within each group to bucketize them.
      • Note: this is still much faster than the version that runs on the CPU (~6x-8x eval time improvement), even for datasets with a large number of elements per group and a small group cardinality, which are atypical for ranking datasets. I tried a couple of hundred groups with ~250k items/group, and it still performed well.
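For reference, the per-group computation that the GPU code parallelizes across groups can be illustrated with a minimal, unoptimized Python sketch of ndcg (this is an illustrative reference only, not the CUDA implementation in this PR; the exponential `2^rel - 1` gain follows the convention commonly used for ndcg in ranking metrics):

```python
import math

def dcg(relevances):
    # Discounted cumulative gain with the exponential 2^rel - 1 gain.
    return sum((2 ** rel - 1) / math.log2(i + 2)
               for i, rel in enumerate(relevances))

def ndcg_per_group(preds, labels, group_sizes):
    # Compute ndcg independently per group, then average across groups.
    # The GPU version processes the groups in parallel; this loop is serial.
    scores = []
    start = 0
    for size in group_sizes:
        p = preds[start:start + size]
        l = labels[start:start + size]
        start += size
        # Ranking induced by the model: labels sorted by descending prediction.
        ranked = [rel for _, rel in sorted(zip(p, l), key=lambda t: -t[0])]
        ideal = sorted(l, reverse=True)
        idcg = dcg(ideal)
        scores.append(dcg(ranked) / idcg if idcg > 0 else 1.0)
    return sum(scores) / len(scores)
```

A group ranked perfectly by the model scores 1.0; any inversion lowers the group's score.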

@RAMitchell @trivialfis - please review.

@mli (Member) commented Mar 8, 2020

Codecov Report

Merging #5398 into master will not change coverage.
The diff coverage is n/a.


@@           Coverage Diff           @@
##           master    #5398   +/-   ##
=======================================
  Coverage   84.07%   84.07%           
=======================================
  Files          11       11           
  Lines        2411     2411           
=======================================
  Hits         2027     2027           
  Misses        384      384           

Last update a38e7bd...810e537.

  - This will *significantly* help training on non-ranking datasets that use the auc metric (which
    I hear is quite popular!)
  - I'll post the performance numbers shortly.
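For context on what the auc metric computes, here is a plain-Python sketch of the standard rank-statistic (Mann-Whitney U) formulation, with ties handled by average rank. This is an illustration of the metric itself, not the optimized CPU or GPU code in this PR:

```python
def auc_rank(preds, labels):
    # AUC via the Mann-Whitney U statistic: based on the ranks of the
    # positive examples among all predictions, ties get the average rank.
    n = len(preds)
    order = sorted(range(n), key=lambda i: preds[i])  # ascending by prediction
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and preds[order[j + 1]] == preds[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # 1-based average rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        return float("nan")  # auc is undefined without both classes
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

For the ranking-dataset case described above, this quantity is computed per group and the per-group results are then combined.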
@sriramch (Author) commented:

auc metric performance numbers

test environment

  • 1 socket
  • 6 cores/socket
  • 2 threads/core
  • 80 gb system memory
  • v100 gpu

test

  • uses all cpu threads
  • builds 100 trees
  • metric eval times are reported below

results

  • no additional gpu memory was used
  • all times are in seconds
  • mortgage dataset used 60m training instances (to be able to fit into the available gpu memory)
  dataset     eval time (master)    eval time (this pr)
  higgs       70.79                 1.42
  mortgage    266.98                30.19

Review thread on src/metric/rank_metric.cu (outdated, resolved).
@sriramch (Author) commented:

@trivialfis I would appreciate your review when you get a chance.

@trivialfis (Member) left a comment

LGTM! Sorry for the long wait. I previously mentioned to @RAMitchell that simple functions might be more suitable for implementing the GPU metrics, as I think the registry is just too tricky and unnecessary. But I won't block the PR over this, as we can refactor later when needed (e.g. when implementing other metrics).
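The registry-versus-plain-functions trade-off being discussed can be sketched in a few lines (a hypothetical Python illustration; the actual xgboost metric registry is implemented in C++):

```python
# A minimal string-keyed metric registry (hypothetical names). Registration
# via decorator allows lookup by the metric's configured name, at the cost
# of indirection compared to calling plain functions directly.
METRICS = {}

def register(name):
    def wrap(fn):
        METRICS[name] = fn
        return fn
    return wrap

@register("err")
def error_rate(preds, labels):
    # Fraction of examples misclassified at a 0.5 threshold.
    return sum(int((p > 0.5) != bool(y))
               for p, y in zip(preds, labels)) / len(labels)

def evaluate(name, preds, labels):
    # Dispatch to whichever metric was registered under `name`.
    return METRICS[name](preds, labels)
```

With plain functions, the `evaluate` indirection disappears and callers invoke the metric directly, which is the simplification suggested above.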

@RAMitchell RAMitchell merged commit d2231fc into dmlc:master Mar 22, 2020
@lock lock bot locked as resolved and limited conversation to collaborators Jun 24, 2020