Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Expected Reciprocal Rank metric #31891

Merged
merged 5 commits into from
Jul 12, 2018
Merged

Conversation

cbuescher
Copy link
Member

This change adds Expected Reciprocal Rank (ERR) as a ranking evaluation metric
as descriped in:

Chapelle, O., Metlzer, D., Zhang, Y., & Grinspan, P. (2009).
Expected reciprocal rank for graded relevance.
Proceeding of the 18th ACM Conference on Information and Knowledge Management.
https://doi.org/10.1145/1645953.1646033

ERR is an extension of the classical reciprocal rank to the graded relevance
case and assumes a cascade browsing model. It quantifies the usefulness of a
document at rank i conditioned on the degree of relevance of the items at ranks
less than i. ERR seems to be gain traction as an alternative to (n)DCG, so it
seems like a good metric to support. Also ERR seems to be the default optimization
metric used for training in RankLib, a widely used learning to rank library.

Relates to #29653

@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search-aggs

Christoph Büscher added 2 commits July 10, 2018 10:58
This change adds Expected Reciprocal Rank (ERR) as a ranking evaluation metric
as descriped in:

Chapelle, O., Metlzer, D., Zhang, Y., & Grinspan, P. (2009).
Expected reciprocal rank for graded relevance.
Proceeding of the 18th ACM Conference on Information and Knowledge Management.
https://doi.org/10.1145/1645953.1646033

ERR is an extension of the classical reciprocal rank to the graded relevance
case and assumes a cascade browsing model. It quantifies the usefulness of a
document at rank `i` conditioned on the degree of relevance of the items at ranks
less than `i`. ERR seems to be gain traction as an alternative to (n)DCG, so it
seems like a good metric to support. Also ERR seems to be the default optimization
metric used for training in RankLib, a widely used learning to rank library.

Relates to elastic#29653
Copy link
Contributor

@mayya-sharipova mayya-sharipova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @cbuescher! Nice implementation and tests. Just very small editing comments.

I was wondering if we also want to add documentation on this metric to rank-eval.asciidoc?

import static org.elasticsearch.index.rankeval.EvaluationMetric.joinHitsWithRatings;

/**
* Implemention of the Expected Reciprocal Rank metric described in:<p>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

implementation?


private final double two_pow_maxRelevance;

public static final String NAME = "err";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"err" doesn't sound like "error"? Have you though of using a diff name here and in code? It is fine to keep this name as well if you think so.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see the problem. On the other hand, the metric is abreviated like this in the paper itself and in most places citing it. I'm on the fence here. So far we have:

  • "precision" -> precision at K
  • "dcg" -> discounted cumulative gain
  • "mean_reciprocal_rank" -> well... mean reciprocal rank

I kind of like the shorter names, they are only ids in the end. But I can get this might be a source of confusion. I would go with the full "expected_reciprocal_rank" then, a bit long but one is not supposed to type it all the time. Will give it some additional thought though before changing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Decided to go with "expected_reciprocal_rank" now, its long but only four chars more than "mean_reciprocal_rank", so what the heck...

for (int i = 0; i < relevanceRatings.length; i++) {
rated.add(new RatedDocument("index", Integer.toString(i), relevanceRatings[i]));
hits[i] = new SearchHit(i, Integer.toString(i), new Text("type"), Collections.emptyMap());
hits[i].shard(new SearchShardTarget("testnode", new Index("index", "uuid"), 0, null));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if SearchShardTarget is the same for all hits, would it be reasonable to create it outside of for loop? here and in the following test?

@cbuescher cbuescher merged commit 4ae4ac0 into elastic:master Jul 12, 2018
cbuescher pushed a commit that referenced this pull request Jul 12, 2018
This change adds Expected Reciprocal Rank (ERR) as a ranking evaluation metric
as descriped in:

Chapelle, O., Metlzer, D., Zhang, Y., & Grinspan, P. (2009).
Expected reciprocal rank for graded relevance.
Proceeding of the 18th ACM Conference on Information and Knowledge Management.
https://doi.org/10.1145/1645953.1646033

ERR is an extension of the classical reciprocal rank to the graded relevance
case and assumes a cascade browsing model. It quantifies the usefulness of a
document at rank `i` conditioned on the degree of relevance of the items at ranks
less than `i`. ERR seems to be gain traction as an alternative to (n)DCG, so it
seems like a good metric to support. Also ERR seems to be the default optimization
metric used for training in RankLib, a widely used learning to rank library.

Relates to #29653
dnhatn added a commit that referenced this pull request Jul 12, 2018
* master:
  [TEST] Mute SlackMessageTests.testTemplateRender
  Docs: Explain closing the high level client
  [ML] Re-enable memory limit integration tests (#31328)
  [test] disable packaging tests for suse boxes
  Add nio transport to security plugin (#31942)
  XContentTests : Insert random fields at random positions (#30867)
  Force execution of fetch tasks (#31974)
  Fix unreachable error condition in AmazonS3Fixture (#32005)
  Tests: Fix SearchFieldsIT.testDocValueFields (#31995)
  Add Expected Reciprocal Rank metric (#31891)
  [ML] Get ForecastRequestStats doc in RestoreModelSnapshotIT (#31973)
  SQL: Add support for single parameter text manipulating functions (#31874)
  [ML] Ensure immutability of MlMetadata (#31957)
  Tests: Mute SearchFieldsIT.testDocValueFields()
  muted tests due to #31940
  Work around reported problem in eclipse (#31960)
  Move build integration tests out of :buildSrc project (#31961)
  Tests: Remove use of joda time in some tests (#31922)
  [Test] Reactive 3rd party tests on CI (#31919)
  SQL: Support for escape sequences (#31884)
  SQL: HAVING clause should accept only aggregates (#31872)
  Docs: fix typo in datehistogram (#31972)
  Switch url repository rest tests to new style requests (#31944)
  Switch reindex tests to new style requests (#31941)
  Docs: Added note about cloud service to installation and getting started
  [DOCS] Removes alternative docker pull example (#31934)
  Add Snapshots Status API to High Level Rest Client (#31515)
  ingest: date_index_name processor template resolution (#31841)
  Test: fix null failure in watcher test (#31968)
  Switch test framework to new style requests (#31939)
  Switch low level rest tests to new style Requests (#31938)
  Switch high level rest tests to new style requests (#31937)
  [ML] Mute test failing due to Java 11 date time format parsing bug (#31899)
  [TEST] Mute SlackMessageTests.testTemplateRender
  Fix assertIngestDocument wrongfully passing (#31913)
  Remove unused reference to filePermissionsCache (#31923)
  rolling upgrade should use a replica to prevent relocations while running a scroll
  HLREST: Bundle the x-pack protocol project (#31904)
  Increase logging level for testStressMaybeFlush
  Added lenient flag for synonym token filter (#31484)
  [X-Pack] Beats centralized management: security role + licensing (#30520)
  HLRest: Move xPackInfo() to xPack().info() (#31905)
  Docs: add security delete role to api call table (#31907)
  [test] port archive distribution packaging tests (#31314)
  Watcher: Slack message empty text (#31596)
  [ML] Mute failing DetectionRulesIT.testCondition() test
  Fix broken NaN check in MovingFunctions#stdDev() (#31888)
  Date: Add DateFormatters class that uses java.time (#31856)
  [ML] Switch native QA tests to a 3 node cluster (#31757)
  Change trappy float comparison (#31889)
  Fix building AD URL from domain name (#31849)
  Add opaque_id to audit logging (#31878)
  re-enable backcompat tests
  add support for is_write_index in put-alias body parsing (#31674)
  Improve release notes script (#31833)
  [DOCS] Fix broken link in painless example
  Handle missing values in painless (#30975)
  Remove the ability to index or query context suggestions without context (#31007)
  Ingest: Enable Templated Fieldnames in Rename (#31690)
  [Docs] Fix typo in the Rollup API Quick Reference (#31855)
  Ingest: Add ignore_missing option to RemoveProc (#31693)
  Add template config for Beat state to X-Pack Monitoring (#31809)
  Watcher: Add ssl.trust email account setting (#31684)
  Remove link to oss-MSI (#31844)
  Painless: Restructure Definition/Whitelist (#31879)
  HLREST: Add x-pack-info API (#31870)
dnhatn added a commit that referenced this pull request Jul 12, 2018
* 6.x:
  Force execution of fetch tasks (#31974)
  [TEST] Mute SlackMessageTests.testTemplateRender
  Docs: Explain closing the high level client
  [test] disable java packaging tests for suse
  XContentTests : Insert random fields at random positions (#30867)
  Add Get Snapshots High Level REST API (#31980)
  Fix unreachable error condition in AmazonS3Fixture (#32005)
  [6.x][ML] Ensure immutability of MlMetadata (#31994)
  Add Expected Reciprocal Rank metric (#31891)
  SQL: Add support for single parameter text manipulating functions (#31874)
  muted tests due to #31940
  Work around reported problem in eclipse (#31960)
  Move build integration tests out of :buildSrc project (#31961)
  [Test] Reactive 3rd party tests on CI (#31919)
  Fix assertIngestDocument wrongfully passing (#31913) (#31951)
  SQL: Support for escape sequences (#31884)
  SQL: HAVING clause should accept only aggregates (#31872)
  Docs: fix typo in datehistogram (#31972)
  Switch url repository rest tests to new style requests (#31944)
  Switch reindex tests to new style requests (#31941)
  Switch test framework to new style requests (#31939)
  Docs: Added note about cloud service to installation and getting started
  [DOCS] Removes alternative docker pull example (#31934)
  ingest: date_index_name processor template resolution (#31841)
  Test: fix null failure in watcher test (#31968)
  Watcher: Slack message empty text (#31596)
  Switch low level rest tests to new style Requests (#31938)
  Switch high level rest tests to new style requests (#31937)
  HLREST: Bundle the x-pack protocol project (#31904)
  [ML] Mute test failing due to Java 11 date time format parsing bug (#31899)
  Increase logging level for testStressMaybeFlush
  rolling upgrade should use a replica to prevent relocations while running a scroll
  [test] port archive distribution packaging tests (#31314)
  HLRest: Move xPackInfo() to xPack().info() (#31905)
  Increase logging level for :qa:rolling-upgrade
  Backport: Add template config for Beat state to X-Pack Monitoring (#31809) (#31893)
  Fix building AD URL from domain name (#31849)
  Fix broken NaN check in MovingFunctions#stdDev() (#31888)
  Change trappy float comparison (#31889)
  Add opaque_id to audit logging (#31878)
  add support for is_write_index in put-alias body parsing (#31674)
  Ingest: Enable Templated Fieldnames in Rename (#31690) (#31896)
  Ingest: Add ignore_missing option to RemoveProc (#31693) (#31892)
  [Docs] Fix typo in the Rollup API Quick Reference (#31855)
  Watcher: Add ssl.trust email account setting (#31684)
  [PkiRealm] Invalidate cache on role mappings change (#31510)
  [Security] Check auth scheme case insensitively (#31490)
  HLREST: Add x-pack-info API (#31870)
  Remove link to oss-MSI (#31844)
  Painless: Restructure Definition/Whitelist (#31879)
@jimczi jimczi added v7.0.0-beta1 and removed v7.0.0 labels Feb 7, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants