
Adding French team contribution points #302

Merged
merged 5 commits into main from chore/add-french-team-points on Apr 4, 2024

Conversation

@imenelydiaker (Contributor) commented Mar 31, 2024

After discussing it with the team, we agreed on splitting the total number of points for French by 5, since we consider that we all contributed equally. I suggest the following, and have some questions before adding the points to the points.md file:

Datasets:

Evaluated models:

  • 42 = 1 * 42 models evaluated

I didn't add PR review points for @wissam-sib and me; I lost track of them 😅 but I'll try to do that quickly!

@imenelydiaker changed the title from "Adding French team contribution" to "Adding French team contribution points" on Mar 31, 2024
@KennethEnevoldsen (Contributor)

FloresBitextMining.py: 2 * 200 (200 language pairs) = 400... How should we handle bitext mining tasks?

I would argue this counts as one dataset, so 2 points, but then you get the 2 bonus points per language not previously covered within bitext mining (Tatoeba). That should still give quite a few points for a valuable dataset.

Note that the bonus points have been updated slightly to accommodate exactly these cases. Let me know what you think.
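
To make that concrete, here is a minimal sketch of the rule I have in mind (a hypothetical helper, not code from this repo; the language count is illustrative):

```python
# Hypothetical sketch of the proposed scoring rule (illustrative only,
# not code from this repo): a bitext-mining task counts as ONE dataset
# (2 points), plus 2 bonus points per language not previously covered.

def bitext_mining_points(n_new_languages: int) -> int:
    dataset_points = 2                    # one dataset, regardless of pair count
    language_bonus = 2 * n_new_languages  # 2 bonus points per new language
    return dataset_points + language_bonus

# e.g. if all 200 Flores languages were new to bitext mining:
print(bitext_mining_points(200))  # 402, vs. 400 under per-pair counting
```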

MasakhaNEWSClassification.py: 2 * 16 = 32 (may add 4 bonus points for some languages since it's their first dataset on MTEB)

Def. add the bonus points. Generally it seems like you are missing quite a few bonus points (e.g. for retrieval x French)

SummEvalFrSummarization.py: 6 = 2 + 4 (first dataset for French summarization).

In relation to MMTEB I don't believe we should add points for this as it is only machine-translated.

@KennethEnevoldsen (Contributor)

I didn't add PR review points for @wissam-sib and me; I lost track of them 😅 but I'll try to do that quickly!

Will you add these as well?

@imenelydiaker (Contributor, Author) commented Apr 3, 2024

@KennethEnevoldsen here is an update; I still have some questions about dataset points. For PR reviews I just gave 1 point per review:

Datasets:

Evaluated models:

  • 42 = 1 * 42 models evaluated

Total for the French contribution: 486
Number of points for each member: 486 / 5 ≈ 97
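
A quick sanity check on the arithmetic (plain Python, no project code involved); note the division doesn't come out even, so the per-member figure above is rounded down:

```python
# Quick arithmetic check of the equal split across the team:
total_points = 486   # total for the French contribution
team_size = 5        # points split equally across the team

print(total_points / team_size)    # 97.2
print(total_points // team_size)   # 97  (rounded down, as used above)
```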

PR reviews:

@KennethEnevoldsen (Contributor)

FloresBitextMining.py: 2 * 129 (200 language pairs - 75 pairs already handled by MTEB) = 258

Shouldn't this be 2 * 125 = 250?

BSARDRetrieval.py: 2 (should I add the bonus points again here?)

I believe the best approach is to not have bonuses twice.

42 = 1 * 42 models evaluated

I would not include these, as models evaluated is counted over the whole of MMTEB, which will change due to new additions.

@imenelydiaker (Contributor, Author) commented Apr 3, 2024

FloresBitextMining.py: 2 * 129 (200 language pairs - 75 pairs already handled by MTEB) = 258

Shouldn't this be 2 * 125 = 250?

Sorry, Flores is 204 languages. So it's 204 - 75 = 129.

BSARDRetrieval.py: 2 (should I add the bonus points again here?)

I believe the best approach is to not have bonuses twice.

OK, we'll keep them on only one of the datasets.

42 = 1 * 42 models evaluated

I would not include these, as models evaluated is counted over the whole of MMTEB, which will change due to new additions.

You don't count the language-specific models? Some of the 42 models were French-only models. Also, I don't think contributors will be able to run models on the whole benchmark, it's really huge. I expect them to run models on their proposed datasets 🤔

@KennethEnevoldsen (Contributor)

You don't count the language-specific models? Some of the 42 models were French-only models.

Yeah, so I had the idea that all models would be run on everything, but that seems problematic/wasteful. We should probably make it clearer what we mean by running a model. @imenelydiaker do you (or someone from your team) have the time to outline that segment?

For now I would remove the models and get this PR merged (then we can always add them in a new PR).

Also, I don't think contributors will be able to run models on the whole benchmark, it's really huge.

Yeah, I believe the compute cost of MMTEB is something we have to bring down. One solution might be to limit very large datasets. Another option is to estimate the performance on unseen datasets in a smart way (e.g. estimate the latent factors of a model).
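
To illustrate the latent-factor idea, a minimal sketch (toy scores and numpy only; this is not anything from the MTEB codebase): treat the model x dataset table as a partially observed matrix and fill the never-run cells with a low-rank fit.

```python
# Toy illustration of latent-factor score estimation: fit a low-rank
# approximation to the observed model x dataset scores, and read off
# estimates for the cells that were never run. All numbers are made up.
import numpy as np

# 6 models x 4 datasets, with some entries never evaluated (NaN).
scores = np.array([
    [0.71, 0.65, np.nan, 0.60],
    [0.68, np.nan, 0.55, 0.58],
    [np.nan, 0.70, 0.61, 0.64],
    [0.75, 0.69, 0.63, np.nan],
    [0.62, 0.57, np.nan, 0.51],
    [np.nan, 0.73, 0.66, 0.68],
])

observed = ~np.isnan(scores)
filled = np.where(observed, scores, np.nanmean(scores))  # init missing with global mean

# Iteratively project onto a rank-1 approximation, keeping observed entries fixed.
for _ in range(100):
    u, s, vt = np.linalg.svd(filled, full_matrices=False)
    low_rank = s[0] * np.outer(u[:, 0], vt[0])  # rank-1 "latent factor" fit
    filled = np.where(observed, scores, low_rank)

print(np.round(filled, 3))  # unseen entries estimated from the latent factors
```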

I expect them to run models on their proposed datasets 🤔

But to keep models comparable (evaluated on all datasets), everyone would need to run >20 models on their dataset.

@imenelydiaker (Contributor, Author) commented Apr 3, 2024

You don't count the language-specific models? Some of the 42 models were French-only models.

Yeah, so I had the idea that all models would be run on everything, but that seems problematic/wasteful. We should probably make it clearer what we mean by running a model. @imenelydiaker do you (or someone from your team) have the time to outline that segment?

For now I would remove the models and get this PR merged (then we can always add them in a new PR).

I'll take care of that 🙂

Also, I don't think contributors will be able to run models on the whole benchmark, it's really huge.

Yeah, I believe the compute cost of MMTEB is something we have to bring down. One solution might be to limit very large datasets. Another option is to estimate the performance on unseen datasets in a smart way (e.g. estimate the latent factors of a model).

Yep, agreed. We may not need huge datasets, since we have a lot of them. We should discuss this 🤔

I expect them to run models on their proposed datasets 🤔

But to keep models comparable (evaluated on all datasets), everyone would need to run >20 models on their dataset.

Yes!

@imenelydiaker (Contributor, Author) commented Apr 3, 2024

@KennethEnevoldsen final proposal; I also added a column for affiliations (not all my colleagues have affiliations, since some of them contributed during their free time):

Datasets:

Total for the French contribution: 442
Number of points for each member: 442 / 5 ≈ 88

PR reviews:

@KennethEnevoldsen (Contributor)

Wonderful, thanks @imenelydiaker! I think this is very reasonable; feel free to merge it in.

@Muennighoff (Contributor)

As you're now really familiar with the point system, would you be open to also adding points to people for these PRs (maybe in a separate PR)?
#214
#197
#116
#134
#137
#227
#224
#210

@imenelydiaker merged commit 23c9fdd into main on Apr 4, 2024
5 checks passed
@imenelydiaker deleted the chore/add-french-team-points branch on April 4, 2024 at 12:42
MartinBernstorff pushed a commit that referenced this pull request Apr 10, 2024
* Update points.md

* Update docs/mmteb/points.md

* Update points.md

* Update points.md