
Update huggingface_hub Version in the storage initializer to fix ImportError #2180

Conversation

helenxie-bit
Contributor

What this PR does / why we need it:
The pinned huggingface_hub v0.19.3 does not provide `split_torch_state_dict_into_shards`, which newer versions of accelerate import. Therefore, I updated the version pinned in requirements.txt for the storage initializer to fix the ImportError.
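The version mismatch behind the ImportError can be illustrated with a small, self-contained sketch (not part of the PR): a plain tuple comparison of release numbers. The 0.23.0 minimum used below is an assumption for illustration only; the actual fix is simply raising the pin in requirements.txt.

```python
# Hedged sketch (not from the PR): a minimal version check that mirrors
# why the old pin breaks. We assume split_torch_state_dict_into_shards
# first appeared in huggingface_hub 0.23.0; the pinned 0.19.3 predates it.

def parse_version(v: str) -> tuple:
    """Turn a release string like '0.19.3' into (0, 19, 3) for comparison."""
    return tuple(int(part) for part in v.split("."))

def provides_split_helper(installed: str, minimum: str = "0.23.0") -> bool:
    """True if the installed huggingface_hub should export the helper."""
    return parse_version(installed) >= parse_version(minimum)

print(provides_split_helper("0.19.3"))  # the old pin -> False
print(provides_split_helper("0.23.4"))  # a new enough version -> True
```

Note this naive tuple comparison ignores pre-release suffixes; a real check would use `packaging.version`.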

Which issue(s) this PR fixes (optional, in Fixes #<issue number>, #<issue number>, ... format, will close the issue(s) when PR gets merged):
Fixes #2179

Checklist:

@coveralls

Pull Request Test Coverage Report for Build 10026064625

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall first build on helenxie/fix_huggingface_hub_version at 35.398%

Totals Coverage Status
Change from base Build 9999203579: 35.4%
Covered Lines: 4377
Relevant Lines: 12365

💛 - Coveralls

Member

@tenzen-y left a comment


It seems that this error is irrelevant to the huggingface_hub version.
Which peft version are you using locally?

I guess your local peft version is newer than v0.3.0:

@helenxie-bit
Contributor Author

It seems that this error is irrelevant to the huggingface_hub version. Which peft version are you using locally?

I guess your local peft version is newer than v0.3.0:

@tenzen-y My local peft version is 0.3.0:

Name: peft
Version: 0.3.0
Summary: Parameter-Efficient Fine-Tuning (PEFT)
Home-page: https://github.com/huggingface/peft
Author: The HuggingFace team
Author-email: [email protected]
License: Apache
Location: /opt/homebrew/anaconda3/envs/kubeflow/lib/python3.11/site-packages
Requires: accelerate, numpy, packaging, psutil, pyyaml, torch, transformers
Required-by: 

And here is the detailed information of the error:

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/app/storage_initializer/storage.py", line 2, in <module>
    from .hugging_face import HuggingFace, HuggingFaceDataset
  File "/app/storage_initializer/hugging_face.py", line 8, in <module>
    from peft import LoraConfig
  File "/usr/local/lib/python3.11/site-packages/peft/__init__.py", line 22, in <module>
    from .mapping import MODEL_TYPE_TO_PEFT_MODEL_MAPPING, PEFT_TYPE_TO_CONFIG_MAPPING, get_peft_config, get_peft_model
  File "/usr/local/lib/python3.11/site-packages/peft/mapping.py", line 16, in <module>
    from .peft_model import (
  File "/usr/local/lib/python3.11/site-packages/peft/peft_model.py", line 22, in <module>
    from accelerate import dispatch_model, infer_auto_device_map
  File "/usr/local/lib/python3.11/site-packages/accelerate/__init__.py", line 16, in <module>
    from .accelerator import Accelerator
  File "/usr/local/lib/python3.11/site-packages/accelerate/accelerator.py", line 34, in <module>
    from huggingface_hub import split_torch_state_dict_into_shards
ImportError: cannot import name 'split_torch_state_dict_into_shards' from 'huggingface_hub' (/usr/local/lib/python3.11/site-packages/huggingface_hub/__init__.py)
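The traceback above shows the failure surfacing deep in the chain (peft → accelerate → huggingface_hub). As a hypothetical sketch, not how the storage initializer is actually written, such deep import failures could be wrapped to attach an actionable hint; `import_with_hint` and the message text are illustrative names:

```python
# Hedged sketch: wrapping a transitive import so the user sees a hint
# (e.g. "upgrade huggingface_hub") instead of a deep traceback.
import importlib

def import_with_hint(module: str, name: str, hint: str):
    """Import `name` from `module`, re-raising failures with a hint attached."""
    try:
        mod = importlib.import_module(module)
        return getattr(mod, name)
    except (ImportError, AttributeError) as err:
        raise ImportError(f"cannot import {name!r} from {module!r}: {hint}") from err

# Demonstrated with a stdlib module and a missing attribute:
try:
    import_with_hint("json", "no_such_name", "upgrade huggingface_hub (e.g. >=0.23.0)")
except ImportError as err:
    print(err)
```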

Do you have any idea where the problem could be?

@andreyvelich
Member

@tenzen-y @helenxie-bit Getting the same error on my side with the huggingface_hub==0.19.3 version.
I think this update can be related: #2056.

@tenzen-y @johnugeorge @deepanker13 Should we move this forward to fix the errors in the train API?

Additionally, @helenxie-bit if you could help us with some simple e2e tests for train API that would be amazing!

@helenxie-bit
Contributor Author

Yeah, of course. I can help with the e2e tests.

@tenzen-y
Member

@tenzen-y @helenxie-bit Getting the same error on my side with the huggingface_hub==0.19.3 version. I think this update can be related: #2056.

@tenzen-y @johnugeorge @deepanker13 Should we move this forward to fix the errors in the train API?

Additionally, @helenxie-bit if you could help us with some simple e2e tests for train API that would be amazing!

SGTM

Member

@andreyvelich left a comment


I think we can merge it. Thanks @helenxie-bit!
/lgtm
/approve


[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andreyvelich

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@google-oss-prow google-oss-prow bot merged commit 0f8588a into kubeflow:master Aug 5, 2024
39 checks passed
andreyvelich pushed a commit to andreyvelich/training-operator that referenced this pull request Aug 29, 2024
…portError (kubeflow#2180)

Signed-off-by: helenxie-bit <[email protected]>
Signed-off-by: Andrey Velichkevich <[email protected]>
google-oss-prow bot pushed a commit that referenced this pull request Aug 29, 2024
* Update `huggingface_hub` Version in the storage initializer to fix ImportError (#2180)

Signed-off-by: helenxie-bit <[email protected]>
Signed-off-by: Andrey Velichkevich <[email protected]>

* [SDK] Fix trainer error: Update the version of base image and add "num_labels" for downloading pretrained models (#2230)

* fix trainer error

Signed-off-by: helenxie-bit <[email protected]>

* rerun tests

Signed-off-by: helenxie-bit <[email protected]>

* update the process of num_labels in trainer

Signed-off-by: helenxie-bit <[email protected]>

* rerun tests

Signed-off-by: helenxie-bit <[email protected]>

* adjust the  default value of 'num_labels'

Signed-off-by: helenxie-bit <[email protected]>

---------

Signed-off-by: helenxie-bit <[email protected]>
Signed-off-by: Andrey Velichkevich <[email protected]>

---------

Signed-off-by: helenxie-bit <[email protected]>
Signed-off-by: Andrey Velichkevich <[email protected]>
Co-authored-by: Hezhi Xie <[email protected]>
Co-authored-by: Hezhi (Helen) Xie <[email protected]>

Successfully merging this pull request may close these issues.

"ImportError" when running fine-tuning API
4 participants