[AIR] `HuggingFaceTrainer`&`Predictor` implementation #23876

Yard1 · 2022-04-12T23:27:23Z

Why are these changes needed?

Implements HuggingFaceTrainer & HuggingFacePredictor.

Related issue number

Checks

- I've run scripts/format.sh to lint the changes in this PR.
- I've included any doc changes needed for https://docs.ray.io/en/master/.
- I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
  - Unit tests
  - Release tests
  - This PR is not tested :(

python/ray/ml/train/integrations/huggingface/huggingface_trainer.py

amogkam

LGTM! Just some minor nits

python/ray/ml/predictors/integrations/huggingface/huggingface_predictor.py

python/ray/ml/tests/test_huggingface_predictor.py

amogkam · 2022-04-28T02:13:41Z

python/ray/ml/tests/test_huggingface_predictor.py

+    model_config = AutoConfig.from_pretrained(model_checkpoint)
+    model = AutoModelForCausalLM.from_config(model_config)
+    predictor = HuggingFacePredictor(
+        pipeline=pipeline(


Nice, loving the use of pipeline!
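As a rough illustration of why wrapping a pipeline is convenient, the sketch below shows a predictor that delegates everything to a pipeline-like callable. `SketchPredictor`, its `predict` method, and `fake_pipeline` are hypothetical stand-ins written for this note; they are not the `HuggingFacePredictor` API from this PR, and `fake_pipeline` only imitates the shape of a callable that `transformers.pipeline(...)` would return.

```python
from typing import Callable, List


class SketchPredictor:
    """Hypothetical stand-in for a predictor that wraps a pipeline callable."""

    def __init__(self, pipeline: Callable[[List[str]], List[str]]):
        # The pipeline owns tokenization, inference, and decoding.
        self.pipeline = pipeline

    def predict(self, prompts: List[str]) -> List[str]:
        # Assumption: the predictor adds no logic of its own and
        # simply forwards the batch to the wrapped pipeline.
        return self.pipeline(prompts)


def fake_pipeline(prompts: List[str]) -> List[str]:
    # Assumption: stands in for a real text-generation pipeline.
    return [prompt + " ..." for prompt in prompts]


predictor = SketchPredictor(pipeline=fake_pipeline)
```

Because the predictor only depends on "something callable on a batch", a test can swap in a cheap fake like the one above instead of loading model weights.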

python/ray/ml/train/integrations/huggingface/huggingface_trainer.py

python/ray/ml/utils/huggingface_checkpoint_utils.py

python/ray/ml/examples/huggingface/huggingface_basic_language_modeling_example.py

amogkam · 2022-04-29T18:21:03Z

python/ray/ml/train/integrations/huggingface/huggingface_utils.py

+
+        def get_train_dataloader(self):
+            if self.train_dataset is None:
+                raise ValueError("Trainer: training requires a train_dataset.")


Is this error message exposed to users? If so, can we have a better message here?

This is the same message & validation as in HuggingFace.
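To make the check under discussion concrete, here is a minimal, self-contained sketch of the same guard. `SketchTrainer` and `dataloader_error` are hypothetical names used only for illustration; they are not the class from this PR or from `transformers`. Only the `None` check and the error message are taken from the diff above.

```python
class SketchTrainer:
    """Hypothetical stand-in; only the guard mirrors the reviewed diff."""

    def __init__(self, train_dataset=None):
        self.train_dataset = train_dataset

    def get_train_dataloader(self):
        # Same validation and message as in the reviewed snippet.
        if self.train_dataset is None:
            raise ValueError("Trainer: training requires a train_dataset.")
        # Assumption: a real trainer would build a DataLoader here;
        # this sketch just returns an iterator over the dataset.
        return iter(self.train_dataset)


def dataloader_error(trainer):
    """Return the message raised by the guard, or None if it passes."""
    try:
        trainer.get_train_dataloader()
    except ValueError as exc:
        return str(exc)
    return None
```

Calling `dataloader_error(SketchTrainer())` surfaces the message a user would see when no training dataset is supplied.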

amogkam · 2022-04-29T18:31:02Z

python/ray/ml/train/integrations/huggingface/huggingface_trainer.py

+        if resume_from_checkpoint:
+            self._param_dict[
+                "resume_from_checkpoint"
+            ] = self._convert_directory_checkpoint_to_sync(resume_from_checkpoint)


Ah I see we need this in both as_trainable and setup since we have to support both user-specified resuming and Tune resuming.

If we change the logic here a little bit: https://github.com/ray-project/ray/blob/master/python/ray/ml/trainer.py#L347-L352

to be like this instead

    if checkpoint_dir:
        config["resume_from_checkpoint"] = Checkpoint.from_directory(checkpoint_dir)
    trainer = trainer_cls(**config)

then _convert_directory_checkpoint_to_sync only needs to be called in init, right?

Correct, my suggestion would still resolve both cases, right?

That should work, yeah. Not sure if this change should be made here though, as it would impact all trainers (possibly in unforeseen ways). Let's do that in a followup PR
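For reference, the proposed follow-up could look roughly like the sketch below. `Checkpoint`, `SketchTrainer`, `_convert`, and `build_trainer` are hypothetical stand-ins written for this illustration, not Ray's actual classes or functions. The sketch only shows how folding a Tune-provided `checkpoint_dir` into `config` lets the conversion happen in one place, the constructor, for both user-specified and Tune-driven resuming.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class Checkpoint:
    """Hypothetical stand-in for a directory-backed checkpoint."""

    path: str

    @classmethod
    def from_directory(cls, path: str) -> "Checkpoint":
        return cls(path=path)


class SketchTrainer:
    """Hypothetical trainer; the conversion happens only in __init__."""

    def __init__(self, resume_from_checkpoint: Optional[Checkpoint] = None, **kwargs):
        self.kwargs = kwargs
        self.resume_from_checkpoint = None
        if resume_from_checkpoint:
            self.resume_from_checkpoint = self._convert(resume_from_checkpoint)

    @staticmethod
    def _convert(checkpoint: Checkpoint):
        # Placeholder for _convert_directory_checkpoint_to_sync; its real
        # behavior is not shown in this thread, so this only tags the path.
        return ("converted", checkpoint.path)


def build_trainer(trainer_cls, config: Dict[str, Any], checkpoint_dir: Optional[str] = None):
    # Copy so the caller's config is not mutated.
    config = dict(config)
    if checkpoint_dir:
        # Tune resuming: route the directory through the same argument
        # that a user would set explicitly.
        config["resume_from_checkpoint"] = Checkpoint.from_directory(checkpoint_dir)
    return trainer_cls(**config)
```

Note that in this sketch a Tune-provided `checkpoint_dir` takes precedence over a user-specified `resume_from_checkpoint`; whether that ordering is desirable for every trainer is part of what the follow-up would need to settle.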


Yard1 added 15 commits April 7, 2022 12:27

WIP

b72c722

WIP

9a0b41e

WIP

5ca35b0

WIP

f8e153a

Make datasets arg mandatory

240e661

WIP

196905a

Merge branch 'master' into hf_trainer_implementation

d73991e

Add docs

f6e9daf

WIP

55633ef

Merge branch 'master' into hf_trainer_implementation

e94d55d

Merge branch 'master' into hf_trainer_implementation

587f8ad

HuggingFaceTrainer

3152272

Add basic example

7b19a01

Remove notebook

0a0223e

Doc

ec83f90

Yard1 added this to the Ray AIR milestone Apr 12, 2022

Yard1 requested review from maxpumperla, matthewdeng, richardliaw, gjoliver, amogkam and krfricke April 12, 2022 23:27

Yard1 assigned ericl, matthewdeng, richardliaw, amogkam and krfricke Apr 12, 2022

Yard1 commented Apr 12, 2022

python/ray/ml/train/integrations/huggingface/huggingface_trainer.py (outdated, resolved)

Yard1 added 2 commits April 12, 2022 23:30

Doc

66cef4a

Better example

347ba0b

Yard1 requested review from clarkzinzow and ericl April 27, 2022 21:09

Yard1 added 2 commits April 27, 2022 21:14

Clarify

8f085b4

Remove shuffle mention from docstring

2126556

Yard1 removed the @author-action-required label Apr 27, 2022

Yard1 added 3 commits April 28, 2022 12:47

Merge branch 'master' into hf_trainer_implementation

89e5d55

Doc fix

0abab40

Upgrade torch version

3c69367

Yard1 requested a review from sven1977 as a code owner April 28, 2022 20:33

Yard1 added 5 commits April 28, 2022 23:47

Update requirements_ml_docker.txt

dc4fb41

Update requirements_dl.txt

2c83878

Revert

58d8136

Revert

790a31e

Update huggingface_basic_language_modeling_example.py

3ed8551

ericl approved these changes Apr 28, 2022


Yard1 added 8 commits April 29, 2022 00:23

Update huggingface_basic_language_modeling_example.py

3bc93e4

Update huggingface_basic_language_modeling_example.py

e48794c

Merge branch 'ray-project:master' into hf_trainer_implementation

79ad5b4

Update custom_directives.py

42d99dd

Update custom_directives.py

cf322b2

Merge branch 'ray-project:master' into hf_trainer_implementation

27f1d70

Merge branch 'ray-project:master' into hf_trainer_implementation

1757ff1

Better checkpoint detection

71d2f1b

amogkam approved these changes Apr 29, 2022


Yard1 and others added 3 commits April 29, 2022 20:34

Apply suggestions from code review

3b45939

Co-authored-by: Amog Kamsetty

Add context

b26f67f

Load huggingface checkpoint to staticmethod

2c8f48c

amogkam reviewed Apr 29, 2022

python/ray/ml/examples/huggingface/huggingface_basic_language_modeling_example.py (outdated, resolved)

Update python/ray/ml/examples/huggingface/huggingface_basic_language_modeling_example.py

b549dbe

amogkam merged commit ff0ced1 into ray-project:master Apr 29, 2022
