Adding French team contribution points #302

imenelydiaker · 2024-03-31T21:17:25Z

After discussing it with the team, we agreed on splitting the total numbers of points for French on 5 since we consider that we all contributed equally. I suggest the following and have some questions before adding the point to the points.md file:

Datasets:

DiaBLaBitextMining.py : 2 = 2 (1 language pair)
FloresBitextMining.py: 2 * 200 (200 languages pairs) = 400... How should we handle bitextmining tasks?
MasakhaNEWSClassification.py : 2 * 16 = 32 (may add 4 bonus points for some languaes since it's their first dataset on mteb)
AlloProfClusteringP2P.py: 2
AlloProfClusteringS2S.py: 2 (S2S and P2P datasets are the same, only the columns used for the task are different, should we count it as 1 contribution or 2?)
HALClusteringS2S.py: 2
MasakhaNEWSClusteringP2P.py: 2 * 16 = 32
MasakhaNEWSClusteringS2S.py: 2 * 16 = 32
OpusparcusPC.py: 2 * 6 = 12
AlloprofReranking.py : 2 + 4 (first dataset for French)
SyntecReranking.py: 2 + 4 (first dataset for French)
AlloprofRetrieval.py: 2
BSARDRetrieval.py: 2
SyntecRetrieval.py: 2
SickFrSTS.py: 2
STSBenchmarkMultilingualSTS.py: 2 * 10 = 20
SummEvalFrSummarization.py : 6 = 2 + 4 (first dataset for French Summarization).

Evaluated models:

42 = 1 * 42 models evaluated

I didn't add PR review points for @wissam-sib and I, I lost track of them 😅 but I'll try to do that quickly!

KennethEnevoldsen · 2024-04-01T08:36:21Z

FloresBitextMining.py: 2 * 200 (200 languages pairs) = 400... How should we handle bitextmining tasks?

I would argue this as one dataset so 2 points, but then you get the 2 bonus point pr. language which is not previously covered within Bitext mining (Tataoba). That should still give quite a few points for a valuable dataset.

Note: that the bonus points have been updated slightly to accommodate for exactly these cases. Let me know what you think.

MasakhaNEWSClassification.py : 2 * 16 = 32 (may add 4 bonus points for some languaes since it's their first dataset on mteb)

Def. add the bonus points. Generally it seems like you are missing quite a few bonus points (e.g. for retrieval x French)

SummEvalFrSummarization.py : 6 = 2 + 4 (first dataset for French Summarization).

In relation to MMTEB I don't believe we should add points for this as it is only machine-translated.

docs/mmteb/points.md

KennethEnevoldsen · 2024-04-03T13:02:04Z

I didn't add PR review points for @wissam-sib and I, I lost track of them 😅 but I'll try to do that quickly!

Will you add these as well?

imenelydiaker · 2024-04-03T15:24:40Z

@KennethEnevoldsen here is an update, still have some questions about dataset points. For PR reviews I just gave 1 point per review:

Datasets:

DiaBLaBitextMining.py : 2
FloresBitextMining.py: 2 * 129 (204 languages pairs - 75 pairs already handled by mteb) = 258
MasakhaNEWSClassification.py : 2 * 16 + 2 * 14 = 32 + 28 = 60 (16 languages with 14 languages newly handled).
AlloProfClusteringP2P.py: 2
AlloProfClusteringS2S.py: 2
HALClusteringS2S.py: 2
MasakhaNEWSClusteringP2P.py: 2 * 16 = 32 (I counted the bonus for languages in MasakhaNEWSClassification, should it be counted again?)
MasakhaNEWSClusteringS2S.py: 2 * 16 = 32 (same as previously)
OpusparcusPC.py: 2 * 6 = 12
AlloprofReranking.py : 2 + 2 (first dataset for French) = 4
SyntecReranking.py: 2 + 2 (first dataset for French) = 4
AlloprofRetrieval.py: 2 + 2 (at the time it was the first French retrieval dataset) = 4
BSARDRetrieval.py: 2 (should I add the bonus points again here?)
SyntecRetrieval.py: 2 (same as previously)
HagridRetrieval.py : 2
SickFrSTS.py: 2
STSBenchmarkMultilingualSTS.py: 2 * 10 = 20
SummEvalFrSummarization.py : 2

Evaluated models:

42 = 1 * 42 models evaluated

Total for the French contrib: 486
Nb points for each member: 486 / 5 = 97

PR reviews:

KennethEnevoldsen · 2024-04-03T16:42:50Z

FloresBitextMining.py: 2 * 129 (200 languages pairs - 75 pairs already handled by mteb) = 258

Shouldn't this be 2 * 125 = 250?

BSARDRetrieval.py: 2 (should I add the bonus points again here?)

I believe the best approach is to not have bonuses twice.

42 = 1 * 42 models evaluated

I would not include these as models evaluated is on the whole of MMTEB, which will change due to new additions.

imenelydiaker · 2024-04-03T16:54:52Z

FloresBitextMining.py: 2 * 129 (200 languages pairs - 75 pairs already handled by mteb) = 258

Shouldn't this be 2 * 125 = 250?

Sorry Flores is 204 languages. So it's 204 - 75 = 129

BSARDRetrieval.py: 2 (should I add the bonus points again here?)

I believe the best approach is to not have bonuses twice.

Ok we'll keep them on only one of the datasets.

42 = 1 * 42 models evaluated

I would not include these as models evaluated is on the whole of MMTEB, which will change due to new additions.

You don't count the language specific models? Some of the 42 models were French only models.. Also, I don't think contributors will be able to run models on the whole benchmark, it's really huge. I expect them to run models on their proposed datasets 🤔

KennethEnevoldsen · 2024-04-03T17:11:44Z

You don't count the language specific models? Some of the 42 models were French only models..

Yea so had the idea that all models would be run on everything, but that seems problematic/wasteful. We should problably make it more clear what we mean by running a model. @imenelydiaker do you (or someone from your team) have the time to outline that segment?

For now I would remove the models and get this PR merged it (then we can always add it in a new PR).

Also, I don't think contributors will be able to run models on the whole benchmark, it's really huge.

Yea I believe the compute cost of the MMTEB is something that we have to bring down. One solution might be to limit very large datasets. Another option is also to estimate the performance on unseen datasets in a smart way (e.g. estimate the latent factor of a models).

I expect them to run models on their proposed datasets 🤔

But to keep models comparable (evaluated on all dataset) everyone would need to run >20 models on their dataset.

imenelydiaker · 2024-04-03T18:28:36Z

You don't count the language specific models? Some of the 42 models were French only models..

Yea so had the idea that all models would be run on everything, but that seems problematic/wasteful. We should problably make it more clear what we mean by running a model. @imenelydiaker do you (or someone from your team) have the time to outline that segment?

For now I would remove the models and get this PR merged it (then we can always add it in a new PR).

I'll take care of that 🙂

Also, I don't think contributors will be able to run models on the whole benchmark, it's really huge.

Yea I believe the compute cost of the MMTEB is something that we have to bring down. One solution might be to limit very large datasets. Another option is also to estimate the performance on unseen datasets in a smart way (e.g. estimate the latent factor of a models).

Yep agreed, we may not need huge datasets since we have a lot of them. We should discuss this. 🤔

I expect them to run models on their proposed datasets 🤔

But to keep models comparable (evaluated on all dataset) everyone would need to run >20 models on their dataset.

Yes!

imenelydiaker · 2024-04-03T18:31:45Z

@KennethEnevoldsen final proposition, also added a column for affiliations (not all my colleagues have affiliations since some of them contributed during their free time):

Datasets:

DiaBLaBitextMining.py : 2
FloresBitextMining.py: 2 * 129 (204 languages pairs - 75 pairs already handled by mteb) = 258
MasakhaNEWSClassification.py : 2 * 16 + 2 * 14 = 32 + 28 = 60 (16 languages with 14 languages newly handled).
AlloProfClusteringP2P.py: 2
AlloProfClusteringS2S.py: 2
HALClusteringS2S.py: 2
MasakhaNEWSClusteringP2P.py: 2 * 16 = 32
MasakhaNEWSClusteringS2S.py: 2 * 16 = 32
OpusparcusPC.py: 2 * 6 = 12
AlloprofReranking.py : 2 + 2 (first dataset for French) = 4
SyntecReranking.py: 2
AlloprofRetrieval.py: 2 + 2 (at the time it was the first French retrieval dataset) = 4
BSARDRetrieval.py: 2
SyntecRetrieval.py: 2
HagridRetrieval.py : 2
SickFrSTS.py: 2
STSBenchmarkMultilingualSTS.py: 2 * 10 = 20
SummEvalFrSummarization.py : 2

Total for the French contrib: 442
Nb points for each member: 442 / 5 = 88

PR reviews:

KennethEnevoldsen · 2024-04-04T11:33:56Z

Wonder thanks @imenelydiaker I think this is very reasonable, feel free to merge it in.

Muennighoff · 2024-04-04T12:36:55Z

As you're now really familiar with the point system, would you be open to also add points to people for these PRs (maybe in a separate PR):
#214
#197
#116

* Update points.md * Update docs/mmteb/points.md * Update points.md * Update points.md

Update points.md

822b142

imenelydiaker requested a review from KennethEnevoldsen March 31, 2024 21:17

imenelydiaker changed the title ~~Adding French team contribution~~ Adding French team contribution points Mar 31, 2024

KennethEnevoldsen reviewed Apr 3, 2024

View reviewed changes

docs/mmteb/points.md Outdated Show resolved Hide resolved

Update docs/mmteb/points.md

bf5f0da

Update points.md

cfa2b66

imenelydiaker added 2 commits April 4, 2024 14:32

Update points.md

75c91c4

Merge branch 'main' into chore/add-french-team-points

a2c5d3a

imenelydiaker merged commit 23c9fdd into main Apr 4, 2024
5 checks passed

imenelydiaker deleted the chore/add-french-team-points branch April 4, 2024 12:42

imenelydiaker mentioned this pull request Apr 4, 2024

docs: first stab at point attribution #280

Closed

MartinBernstorff pushed a commit that referenced this pull request Apr 10, 2024

Adding French team contribution points (#302)

5305b58

* Update points.md * Update docs/mmteb/points.md * Update points.md * Update points.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Adding French team contribution points #302

Adding French team contribution points #302

imenelydiaker commented Mar 31, 2024 •

edited

Loading

KennethEnevoldsen commented Apr 1, 2024

KennethEnevoldsen commented Apr 3, 2024

imenelydiaker commented Apr 3, 2024 •

edited

Loading

KennethEnevoldsen commented Apr 3, 2024

imenelydiaker commented Apr 3, 2024 •

edited

Loading

KennethEnevoldsen commented Apr 3, 2024

imenelydiaker commented Apr 3, 2024 •

edited

Loading

imenelydiaker commented Apr 3, 2024 •

edited

Loading

KennethEnevoldsen commented Apr 4, 2024

Muennighoff commented Apr 4, 2024

Adding French team contribution points #302

Adding French team contribution points #302

Conversation

imenelydiaker commented Mar 31, 2024 • edited Loading

KennethEnevoldsen commented Apr 1, 2024

KennethEnevoldsen commented Apr 3, 2024

imenelydiaker commented Apr 3, 2024 • edited Loading

KennethEnevoldsen commented Apr 3, 2024

imenelydiaker commented Apr 3, 2024 • edited Loading

KennethEnevoldsen commented Apr 3, 2024

imenelydiaker commented Apr 3, 2024 • edited Loading

imenelydiaker commented Apr 3, 2024 • edited Loading

KennethEnevoldsen commented Apr 4, 2024

Muennighoff commented Apr 4, 2024

imenelydiaker commented Mar 31, 2024 •

edited

Loading

imenelydiaker commented Apr 3, 2024 •

edited

Loading

imenelydiaker commented Apr 3, 2024 •

edited

Loading

imenelydiaker commented Apr 3, 2024 •

edited

Loading

imenelydiaker commented Apr 3, 2024 •

edited

Loading