Lock cache file of HF model list #6628

Merged
merged 3 commits into from
Oct 15, 2024


@tohtana (Contributor) commented Oct 15, 2024

The error in the following log suggests that the cache file for the HF model list can become corrupted:
https://github.com/microsoft/DeepSpeed/actions/runs/11343665365/job/31546708118?pr=6614

The root cause of that error is unclear, but _hf_model_list can corrupt the cache file when multiple processes call it concurrently. This PR locks the cache file so that _hf_model_list reads and writes it safely.
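
For readers unfamiliar with the pattern, here is a minimal sketch of guarding a cache file with an exclusive lock. It is not the code from this PR: the function name `load_or_refresh_cache`, the `.lock` sidecar file, the JSON layout, and the one-hour expiry are all assumptions made for illustration, and it uses the standard-library `fcntl.flock`, which is available on Unix-like systems only.

```python
import fcntl
import json
import os
import time

# Hypothetical expiry; the real cache lifetime is not stated in this PR.
CACHE_EXPIRATION_SECONDS = 60 * 60


def load_or_refresh_cache(cache_path, fetch_fn, max_age=CACHE_EXPIRATION_SECONDS):
    """Return the cached model list, rebuilding it under an exclusive lock."""
    lock_path = cache_path + ".lock"  # sidecar lock file (an assumption)
    with open(lock_path, "w") as lock_file:
        # Blocks until every other process holding the lock has released it.
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            if os.path.isfile(cache_path):
                try:
                    with open(cache_path, "r") as f:
                        cached = json.load(f)
                    if time.time() - cached["timestamp"] < max_age:
                        return cached["models"]
                except (ValueError, KeyError, TypeError):
                    # Unreadable or malformed cache: fall through and rebuild.
                    pass
            models = fetch_fn()
            with open(cache_path, "w") as f:
                json.dump({"timestamp": time.time(), "models": models}, f)
            return models
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
```

Because both the read and the write happen while the lock is held, a second process cannot observe a half-written file. Locking a separate sidecar file rather than the cache itself avoids truncating the cache when the lock file is opened.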

@tohtana changed the title from "Lock hf model list file" to "Lock cache file of HF model list" on Oct 15, 2024
@tohtana added this pull request to the merge queue on Oct 15, 2024
Merged via the queue into master with commit 1a45bd8 on Oct 15, 2024
11 checks passed