Automatically add metadata to Hugging Face Hub repos when uploading projects #793

Open
wants to merge 10 commits into
base: main

@juhoinkinen juhoinkinen commented Jun 17, 2024

With this PR, when running annif upload:

  • if README.md (Model Card) does not exist in the destination repository, then README.md is created with default contents and some metadata of the uploaded projects,
  • if README.md exists, its metadata are updated as necessary.

Closes #790.

The metadata includes the following fields:

language:
- <language-code tags automatically obtained from the uploaded projects>
tags:
- annif   # custom tag
pipeline_tag: text-classification  # HFH tag

The Model Card text content is very minimal: it has just the repo name as the heading and instructions for downloading projects from the repo. See an example at https://huggingface.co/juhoinkinen/Annif-models-upload-testing.
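
The create-or-update behaviour for the metadata can be sketched in plain Python. This is only an illustration under stated assumptions, not the code in this PR: `upsert_metadata` is a hypothetical helper, the metadata is modelled as a plain dict rather than a Hub model card object, and keeping an existing `pipeline_tag` instead of overwriting it is a guess about the intended behaviour.

```python
def upsert_metadata(existing, project_langs):
    """Return model card metadata updated for the uploaded projects.

    `existing` is the metadata dict of the current README.md, or None when
    the repository has no README.md yet (hypothetical representation).
    """
    metadata = dict(existing or {})

    # Merge language codes, keeping existing ones and avoiding duplicates.
    languages = metadata.get("language") or []
    if isinstance(languages, str):  # a card may hold a single language string
        languages = [languages]
    languages = list(languages)
    for lang in project_langs:
        if lang not in languages:
            languages.append(lang)
    metadata["language"] = languages

    # Make sure the custom "annif" tag is present without dropping other tags.
    tags = list(metadata.get("tags") or [])
    if "annif" not in tags:
        tags.append("annif")
    metadata["tags"] = tags

    # Assumption: set the pipeline tag only if none has been chosen already.
    metadata.setdefault("pipeline_tag", "text-classification")
    return metadata
```

Calling `upsert_metadata(None, ["fi", "sv"])` covers the case where README.md does not exist yet; passing the existing metadata dict covers the update case, and unrelated keys such as a license are left untouched.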

@juhoinkinen

About @osma's suggestions in #790 (comment):

For example it could include the Annif version used for training, the backend, vocabulary name and size, possibly some of the hyperparameters / configuration settings as well.

  • Annif version:
    • The Annif version used for training is not currently stored anywhere, and the version performing the upload is not necessarily the same. This kind of metadata should first be stored somewhere, which is tracked in issue #329 (Store metadata of project training).
  • Backend, vocabulary name and other project configuration:


sonarcloud bot commented Jun 18, 2024

Quality Gate passed

Issues
  • 6 New issues
  • 0 Accepted issues

Measures
  • 0 Security Hotspots
  • No data about Coverage
  • 0.0% Duplication on New Code


codecov bot commented Jun 18, 2024

Codecov Report

Attention: Patch coverage is 96.66667% with 4 lines in your changes missing coverage. Please review.

Project coverage is 99.60%. Comparing base (3b5f7a1) to head (55b0ffc).
Report is 30 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| annif/hfh_util.py | 91.66% | 4 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #793      +/-   ##
==========================================
- Coverage   99.64%   99.60%   -0.05%     
==========================================
  Files          91       93       +2     
  Lines        6817     7048     +231     
==========================================
+ Hits         6793     7020     +227     
- Misses         24       28       +4     


@juhoinkinen

@CodiumAI-Agent /review

@CodiumAI-Agent

CodiumAI-Agent commented Jun 18, 2024

PR Reviewer Guide 🔍

(Review updated until commit 845f53d)

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 PR contains tests
🔒 No security concerns identified
⚡ Key issues to review

Error Handling
The function upsert_modelcard lacks error handling for potential failures during the push_to_hub operation. Consider adding try-except blocks to handle exceptions that might arise during the push operation, ensuring that the function can gracefully handle errors and provide meaningful feedback to the user.
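
One way to act on this suggestion is sketched below. It is a hedged illustration, not the PR's implementation: `push_card` and `ModelCardUploadError` are hypothetical names, `card` is any object with a `push_to_hub(repo_id)` method, and catching `OSError` is an assumption that should be adjusted to whatever exceptions the Hub client actually raises.

```python
import logging

logger = logging.getLogger(__name__)


class ModelCardUploadError(Exception):
    """Hypothetical exception for a failed model card upload."""


def push_card(card, repo_id, errors=(OSError,)):
    """Push `card` to `repo_id`, converting failures into one clear error.

    `errors` lists the exception types to translate. The default is an
    assumption, not a statement about the Hub client library.
    """
    try:
        card.push_to_hub(repo_id)
    except errors as err:
        logger.error("Model card upload to %s failed: %s", repo_id, err)
        # Chain the original exception so the root cause stays visible.
        raise ModelCardUploadError(
            f"Could not update README.md in {repo_id}: {err}"
        ) from err
```

Translating the failure into a single exception type lets the CLI layer print one readable message instead of a traceback from deep inside the upload code.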

Configuration Error Handling
The error handling in _read_config might not provide clear feedback to the user since it directly raises ConfigurationException with err.message, which might not be defined. It's recommended to ensure that the exception message is informative and user-friendly.
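
A minimal sketch of the safer pattern follows, assuming a configparser-based reader. `read_config_string` is a hypothetical function and `ConfigurationException` here is a local stand-in for Annif's exception class. The point is that `str(err)` works for any exception, whether or not it has a `message` attribute.

```python
import configparser


class ConfigurationException(Exception):
    """Local stand-in for the project's configuration exception."""


def read_config_string(text):
    """Parse CFG-format text, reporting parser errors in a uniform way."""
    parser = configparser.ConfigParser()
    try:
        parser.read_string(text)
    except configparser.Error as err:
        # str(err) is always defined; chaining keeps the original traceback.
        raise ConfigurationException(str(err)) from err
    return parser
```

With this shape, a duplicated section in the input surfaces as a `ConfigurationException` whose text is the parser's own description of the problem.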

)
def test_upsert_modelcard_existing_card(ModelCard, _list_files_in_hf_hub, project):
    repo_id = "annif-user/Annif-HFH-repo"
    project.vocab_lang = "fi"
Member Author

The project fixture does not provide vocab_lang, so for these tests it is simply set here. Maybe not super clean?
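
One possible alternative, sketched under assumptions, is a small factory that builds a stand-in project with the language already set, so individual tests do not mutate the shared fixture. `make_project` and the `project_id` default are hypothetical and not part of the existing test suite.

```python
from unittest import mock


def make_project(project_id="dummy-fi", vocab_lang="fi"):
    """Build a stand-in project object with the attributes the tests read."""
    project = mock.Mock()
    project.project_id = project_id
    project.vocab_lang = vocab_lang
    return project
```

Such a helper could also be wrapped in a pytest fixture if several tests need the same language.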

@juhoinkinen juhoinkinen marked this pull request as ready for review June 18, 2024 10:31
@juhoinkinen

Possible Bug:
Ensure that the upsert_modelcard function handles cases where project language data might be missing or malformed. The current implementation assumes that proj.vocab_lang is always available and valid.

Good point by the AI, but I think the project language is always set if this point is reached...?
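
If a guard were wanted anyway, it could look like the sketch below. `collect_languages` is a hypothetical helper, and the only assumption about a project object is that it may or may not carry a string `vocab_lang` attribute.

```python
def collect_languages(projects):
    """Return sorted unique language codes, skipping projects without one."""
    languages = set()
    for proj in projects:
        lang = getattr(proj, "vocab_lang", None)
        # Ignore missing, non-string, or blank values instead of failing.
        if isinstance(lang, str) and lang.strip():
            languages.add(lang.strip())
    return sorted(languages)
```

Whether silently skipping such projects is preferable to raising an error is a design choice; if the language really is always set at this point, the guard is redundant.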

@juhoinkinen juhoinkinen requested a review from osma June 18, 2024 10:38
@CodiumAI-Agent

Persistent review updated to latest commit 845f53d

@juhoinkinen

I added an automatically updating Projects section to the modelcard, like this: https://huggingface.co/juhoinkinen/Annif-models-upload-testing#projects
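
An automatically updating section of this kind can be sketched as a text replacement between two markers. This is an assumption about the general technique, not a description of how the PR does it: the marker strings and `update_projects_section` are hypothetical.

```python
START = "<!-- projects:start -->"  # hypothetical marker, not from the PR
END = "<!-- projects:end -->"  # hypothetical marker, not from the PR


def update_projects_section(card_text, project_lines):
    """Replace the marked Projects section, or append it if it is absent."""
    section = "\n".join([START, "## Projects", *project_lines, END])
    start = card_text.find(START)
    end = card_text.find(END)
    if start != -1 and end != -1 and start < end:
        # Keep everything outside the markers exactly as the user wrote it.
        return card_text[:start] + section + card_text[end + len(END):]
    # No existing section: append one after the current content.
    return card_text.rstrip("\n") + "\n\n" + section + "\n"
```

Because only the text between the markers is rewritten, repeated uploads leave hand-written parts of the card intact, and applying the same update twice gives the same result.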

annif/config.py
logger.debug("Reading configuration from a string in CFG format")
read_method = self._config.read_string
source = projstr
self._read_config(read_method, source)

Check failure

Code scanning / CodeQL

Potentially uninitialized local variable Error

Local variable 'read_method' may be used before it is initialized.
logger.debug("Reading configuration from a string in CFG format")
read_method = self._config.read_string
source = projstr
self._read_config(read_method, source)

Check failure

Code scanning / CodeQL

Potentially uninitialized local variable Error

Local variable 'source' may be used before it is initialized.
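
This kind of warning usually appears when a variable is assigned only in some branches of an if/elif chain and used afterwards. The sketch below shows one way to make every path either bind both names or fail explicitly. `ConfigReader` is a hypothetical, simplified class and not the actual code in annif/config.py.

```python
import configparser


class ConfigReader:
    """Hypothetical, simplified reader illustrating the fix."""

    def __init__(self, filename=None, projstr=None):
        self._config = configparser.ConfigParser()
        if filename is not None:
            read_method, source = self._config.read, filename
        elif projstr is not None:
            read_method, source = self._config.read_string, projstr
        else:
            # Without this branch, both names would be unbound below.
            raise ValueError("Either filename or projstr must be given")
        read_method(source)

    def sections(self):
        """Return the names of the sections that were read."""
        return self._config.sections()
```

An alternative with the same effect is to call the reading method inside each branch, so no variable needs to outlive the branch that sets it.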

sonarcloud bot commented Sep 19, 2024
